Cluster: 1 source · last seen 4h ago · first seen 4h ago

llama.cpp Gemma 4 using up all system RAM on larger prompts

Something I'm noticing that I don't think I've noticed before. I've been testing out Gemma 4 31B with 32GB of VRAM and 64GB of DDR5. I can load up the UD_Q5_K_XL Unsloth quant with about 100k context with plenty of VRAM headroom, but what ends up killing me is sending a few prompts and the actual…
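The excerpt above is about long contexts exhausting memory even when the weights fit. Back-of-the-envelope KV-cache arithmetic shows why: cache size grows linearly with context length, layers, and KV heads. The sketch below uses hypothetical model dimensions (these are illustrative values, not Gemma's actual configuration) and assumes an unquantized fp16 cache:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Rough KV-cache footprint for a transformer decoder.

    The factor of 2 accounts for storing both K and V tensors;
    bytes_per_elem=2 assumes fp16/bf16 cache entries (llama.cpp can
    quantize the cache further, which shrinks this estimate).
    """
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Hypothetical dimensions for illustration only (NOT the real Gemma config):
size_gib = kv_cache_bytes(
    n_layers=48, n_kv_heads=8, head_dim=128, ctx_len=100_000
) / 2**30

print(f"KV cache at 100k context: {size_gib:.1f} GiB")
```

At these assumed dimensions a 100k-token cache alone lands in the high-teens of GiB, which is how a model that "fits" in 32GB of VRAM can still spill into system RAM once long prompts start filling the cache.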

Lead: r/LocalLLaMA · Bigness: 20 · tags: meta, cpp, gemma, ram, larger
📡 Coverage: 10 · 1 news source
🟠 Hacker News: 0
🔴 Reddit: 44 · 16 upvotes across 1 sub
📈 Google Trends: 0
Full methodology: How scoring works

Receipts (all sources)

llama.cpp Gemma 4 using up all system RAM on larger prompts
REDDIT · r/LocalLLaMA · 4h ago · ⬆ 16 · 💬 26 · score 113
