Rising · 1 source · last seen 5h ago · first seen 5h ago

My biggest Issue with the Gemma-4 Models is the Massive KV Cache!!

I mean, I have 40GB of VRAM and I still cannot fit the entire Unsloth Gemma-4-31B-it-UD-Q8 (35GB) even at 2K context size unless I quantize the KV cache to Q4? WTF? For comparison, I can fit the entire UD-Q8 Qwen3.5-27B at full context without KV quantization! If I have to run a Q4 Gemm…
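For anyone sanity-checking the complaint: KV-cache memory is roughly 2 × layers × KV heads × head dim × context length × bytes per element, on top of the weights. The sketch below plugs in made-up configs (neither is the actual Gemma-4 or Qwen3.5 architecture, which the post doesn't give) just to show how layer and KV-head counts drive the gap between two models at the same context, and why quantizing the cache to Q4 roughly quarters it versus FP16.

```python
# Back-of-the-envelope KV-cache sizing. Every architecture number below is an
# ASSUMPTION for illustration only -- these are not the real Gemma-4 or
# Qwen3.5 configs (the post doesn't give them).

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int, ctx: int,
                 bytes_per_elem: float) -> float:
    """Keys + values (the leading 2x) cached per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 2**30

# Two hypothetical attention layouts for models of similar size:
configs = {
    "many KV heads (32)": (62, 32, 128),   # layers, KV heads, head dim
    "aggressive GQA (4)": (48, 4, 128),
}

for name, (layers, kv_heads, head_dim) in configs.items():
    for ctx in (2_048, 32_768):
        fp16 = kv_cache_gib(layers, kv_heads, head_dim, ctx, 2.0)  # 16-bit K/V
        q4 = kv_cache_gib(layers, kv_heads, head_dim, ctx, 0.5)    # ~4-bit K/V
        print(f"{name:>20} @ ctx {ctx:>6}: fp16 KV ≈ {fp16:6.2f} GiB, Q4 KV ≈ {q4:5.2f} GiB")
```

Under these assumed numbers the cache-heavy layout needs about 10× the KV memory of the GQA layout at every context length, which is the kind of gap that makes one Q8 model fit comfortably and another spill over the same 40GB budget.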

Lead: r/LocalLLaMA · Bigness: 28 · biggest-issue-gemma-4-massive-cache
📡 Coverage
10 · 1 news source
🟠 Hacker News: 0
🔴 Reddit: 66 · 118 upvotes across 1 sub
📈 Google Trends: 0
Full methodology: How scoring works

Receipts (all sources)

My biggest Issue with the Gemma-4 Models is the Massive KV Cache!!
REDDIT · r/LocalLLaMA · 5h ago · ⬆ 118 · 💬 68
score 124
