
In the recent KV rotation PR it was found that the existing q8 KV-cache quants tank performance on AIME25, but most of the loss can be recovered with rotation.
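At a high level, the rotation trick works because an orthogonal rotation spreads the energy of outlier channels across all dimensions, so a symmetric int8 scale no longer has to stretch to cover a few huge values while crushing everything else. Here's a minimal NumPy sketch of that idea (not the llama.cpp implementation — the Hadamard construction and the toy outlier data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization, with round-trip dequantization."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

def hadamard(n):
    """Sylvester construction of an orthonormal n x n Hadamard matrix (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

# A toy KV-cache-like vector: mostly small values plus a few outlier channels.
d = 128
x = rng.normal(0, 0.02, d)
x[:4] = rng.normal(0, 2.0, 4)  # outliers force a large quantization scale

# Direct quantization: the outliers set the scale, crushing the small values.
err_direct = np.abs(quantize_int8(x) - x).mean()

# Rotate first: the Hadamard transform spreads outlier energy evenly, so the
# int8 scale fits the data far better. The rotation is orthogonal, so it is
# undone exactly after dequantization.
H = hadamard(d)
x_rec = H.T @ quantize_int8(H @ x)
err_rotated = np.abs(x_rec - x).mean()

print(f"mean abs error, direct:  {err_direct:.6f}")
print(f"mean abs error, rotated: {err_rotated:.6f}")
```

Running this, the rotated path shows noticeably lower reconstruction error than quantizing the raw vector, which is the intuition behind recovering the AIME25 scores.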

The comment: [https://github.com/ggml-org/llama.cpp/pull/21038#issuecomment-4150413357](https://github.com/ggml-org/llama.cpp/pull/21038#issuecomment-4150413357). I think this could be great for existing q8 users. Personally, I'll be sticking with fp16 for the foreseeable future.

