
In the recent KV rotation PR it was found that the existing q8 KV-cache quants tank performance on AIME25, but most of the loss can be recovered with rotation.
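At a high level, the rotation trick works because an orthogonal rotation spreads the energy of outlier channels across all dimensions, so a symmetric int8 scale no longer has to stretch to cover a few huge values while crushing everything else. Here's a minimal NumPy sketch of that idea (not the llama.cpp implementation — the Hadamard construction and the toy outlier data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization, with round-trip dequantization."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

def hadamard(n):
    """Sylvester construction of an orthonormal n x n Hadamard matrix (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

# A toy KV-cache-like vector: mostly small values plus a few outlier channels.
d = 128
x = rng.normal(0, 0.02, d)
x[:4] = rng.normal(0, 2.0, 4)  # outliers force a large quantization scale

# Direct quantization: the outliers set the scale, crushing the small values.
err_direct = np.abs(quantize_int8(x) - x).mean()

# Rotate first: the Hadamard transform spreads outlier energy evenly, so the
# int8 scale fits the data far better. The rotation is orthogonal, so it is
# undone exactly after dequantization.
H = hadamard(d)
x_rec = H.T @ quantize_int8(H @ x)
err_rotated = np.abs(x_rec - x).mean()

print(f"mean abs error, direct:  {err_direct:.6f}")
print(f"mean abs error, rotated: {err_rotated:.6f}")
```

Running this, the rotated path shows noticeably lower reconstruction error than quantizing the raw vector, which is the intuition behind recovering the AIME25 scores.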

The comment: [https://github.com/ggml-org/llama.cpp/pull/21038#issuecomment-4150413357](https://github.com/ggml-org/llama.cpp/pull/21038#issuecomment-4150413357). I think this could be great for existing q8 users. Personally, I'll be sticking with fp16 for the foreseeable future.

