Big · 1 source · last seen 5h ago · first seen 5h ago
Llama.cpp with Turboquant, Heavy-Hitter Oracle (H2O), and StreamingLLM. Even more performance!
Following TheTom's great work yesterday showing Turboquant working in Llama.cpp, I added a few other things that bring some more complementary speedups to Llama.cpp. So far, the CPU and CUDA versions build and are fully usable. I'm seeing full-speed token generation on my 16 GB 4060 Ti at up to 256k+ context.
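The post does not say how the added features are implemented. As background only: in their original papers, StreamingLLM and H2O are both policies for deciding which tokens stay in the KV cache as context grows. The sketch below illustrates those two selection rules in plain Python. The function names, parameters, and default sizes are hypothetical and are not taken from Llama.cpp or from the poster's code.

```python
def streaming_llm_keep(seq_len, n_sink=4, window=8):
    """StreamingLLM-style rule: keep the first n_sink positions
    (the "attention sinks") plus the most recent `window` positions."""
    if seq_len <= n_sink + window:
        return list(range(seq_len))  # cache not full yet, keep everything
    return list(range(n_sink)) + list(range(seq_len - window, seq_len))


def h2o_keep(acc_scores, n_heavy=4, n_recent=4):
    """H2O-style rule: keep the n_recent newest positions plus the
    n_heavy older positions with the largest accumulated attention.

    acc_scores[i] is assumed to hold the attention mass that position i
    has received so far, summed over decoding steps.
    """
    seq_len = len(acc_scores)
    if seq_len <= n_heavy + n_recent:
        return list(range(seq_len))  # cache not full yet, keep everything
    recent_start = seq_len - n_recent
    older = range(recent_start)
    # Rank the older positions by accumulated attention, highest first.
    heavy = sorted(older, key=lambda i: acc_scores[i], reverse=True)[:n_heavy]
    return sorted(heavy) + list(range(recent_start, seq_len))


if __name__ == "__main__":
    print(streaming_llm_keep(20))
    print(h2o_keep([5.0, 0.1, 0.2, 3.0, 0.1, 0.4, 0.3, 0.2], n_heavy=2, n_recent=3))
```

Both rules bound the cache to a fixed number of positions regardless of context length, which is one plausible reason long contexts could fit on a 16 GB card. How they interact with Turboquant in this build is not stated in the post.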
Lead: r/LocalLLaMA · Bigness: 49 · meta · cpp · turboquant · heavy-hitter · oracle
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 39 (13 upvotes across 1 sub)
📈 Google Trends: 90 (Meta AI: 90/100)
Receipts (all sources)
Llama.cpp with Turboquant, Heavy-Hitter Oracle (H2O), and StreamingLLM. Even more performance!
REDDIT · r/LocalLLaMA · 5h ago · ⬆ 13 · 💬 12
score 109