Rising · 1 source · last seen 2h ago · first seen 2h ago
In the recent KV rotation PR, the existing q8 KV cache quants were found to tank performance on AIME25, though most of the loss can be recovered with rotation
The comment: [https://github.com/ggml-org/llama.cpp/pull/21038#issuecomment-4150413357](https://github.com/ggml-org/llama.cpp/pull/21038#issuecomment-4150413357) I think this could be great for existing q8 users. Personally, I'll be sticking with fp16 for the foreseeable future.
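For anyone wondering why a rotation would help an 8-bit cache at all, here is a rough toy sketch, not the PR's code. It assumes a q8_0-style block (32 values sharing one absmax-derived scale) and uses a normalized Hadamard transform as the rotation, which is my assumption about a typical choice rather than something taken from the PR. The input vector with one large outlier is made up for illustration, and the helper names (`hadamard`, `q8_roundtrip`, `mse`) are hypothetical.

```python
import math

def hadamard(v):
    """Normalized fast Walsh-Hadamard transform (length must be a power of two).
    The normalized transform is orthogonal and symmetric, so it is its own inverse."""
    out = list(v)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [x / norm for x in out]

def q8_roundtrip(v):
    """Quantize one block to signed 8-bit with a shared absmax scale, then dequantize.
    Simplified: the scale stays a Python float instead of being stored as fp16."""
    amax = max(abs(x) for x in v)
    if amax == 0.0:
        return list(v)
    d = amax / 127.0
    return [max(-127, min(127, round(x / d))) * d for x in v]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

# Made-up block: one large outlier plus 31 small values (illustrative only).
x = [8.0] + [0.01 * ((i * 7) % 11 - 5) for i in range(1, 32)]

# Plain path: the outlier sets the scale, so the small values round coarsely.
direct = q8_roundtrip(x)

# Rotated path: rotate, quantize, dequantize, rotate back.
rotated = hadamard(q8_roundtrip(hadamard(x)))

mse_direct = mse(x, direct)
mse_rotated = mse(x, rotated)
print(f"direct  MSE: {mse_direct:.3e}")
print(f"rotated MSE: {mse_rotated:.3e}")
```

The rotation spreads the outlier's energy across the whole block, so the shared scale shrinks and every value is rounded on a finer grid. Real KV tensors will not look like this toy vector, so treat the size of the gap as illustrative only.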
REDDIT · r/LocalLLaMA · 2h ago · ⬆ 75 · 💬 33