Cluster: 1 source · last seen 4h ago · first seen 4h ago

llama.cpp Gemma 4 using up all system RAM on larger prompts

Something I'm noticing that I don't think I've noticed before. I've been testing out Gemma 4 31B with 32GB of VRAM and 64GB of DDR5. I can load up the UD_Q5_K_XL Unsloth quant with about 100k context with plenty of VRAM headroom, but what ends up killing me is sending a few prompts and the actual…
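The excerpt above is about long contexts exhausting memory even when the weights fit. Back-of-the-envelope KV-cache arithmetic shows why: cache size grows linearly with context length, layers, and KV heads. The sketch below uses hypothetical model dimensions (these are illustrative values, not Gemma's actual configuration) and assumes an unquantized fp16 cache:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Rough KV-cache footprint for a transformer decoder.

    The factor of 2 accounts for storing both K and V tensors;
    bytes_per_elem=2 assumes fp16/bf16 cache entries (llama.cpp can
    quantize the cache further, which shrinks this estimate).
    """
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Hypothetical dimensions for illustration only (NOT the real Gemma config):
size_gib = kv_cache_bytes(
    n_layers=48, n_kv_heads=8, head_dim=128, ctx_len=100_000
) / 2**30

print(f"KV cache at 100k context: {size_gib:.1f} GiB")
```

At these assumed dimensions a 100k-token cache alone lands in the high-teens of GiB, which is how a model that "fits" in 32GB of VRAM can still spill into system RAM once long prompts start filling the cache.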

Lead: r/LocalLLaMA · Bigness: 20 · tags: meta, cpp, gemma, ram, larger
📡 Coverage: 10 · 1 news source
🟠 Hacker News: 0
🔴 Reddit: 44 · 16 upvotes across 1 sub
📈 Google Trends: 0
Full methodology: How scoring works

Receipts (all sources)

llama.cpp Gemma 4 using up all system RAM on larger prompts
REDDIT · r/LocalLLaMA · 4h ago · ⬆ 16 · 💬 26 · score 113
