
Llama.cpp with Turboquant, Heavy-Hitter Oracle (H2O), and StreamingLLM. Even more performance!

Following TheTom's great work yesterday showing Turboquant working in Llama.cpp, I added a few other things that bring some complementary speedups to Llama.cpp. So far the CPU and CUDA backends build and are fully usable. I'm seeing full-speed token generation on my 16 GB 4060 Ti at context lengths of 256k and beyond.
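For anyone unfamiliar with the two cache-eviction ideas named in the title, here is a rough Python sketch of how they choose which KV-cache positions to keep. This is not the code from the Llama.cpp work described above, and it says nothing about how that work combines them. The function name `positions_to_keep` and its parameters are hypothetical, chosen only for illustration. StreamingLLM keeps the first few tokens (the "attention sinks") plus a window of recent tokens. H2O additionally keeps the "heavy hitters", the older tokens that have received the most accumulated attention.

```python
def positions_to_keep(attn_mass, n_sink=4, n_recent=8, n_heavy=4):
    """Pick which KV-cache positions to retain (illustrative only).

    attn_mass[i] is assumed to be the accumulated attention that cached
    position i has received so far. All names here are hypothetical.
    """
    n = len(attn_mass)

    # StreamingLLM-style: always keep the first few "attention sink" tokens...
    sinks = set(range(min(n_sink, n)))
    # ...and a sliding window of the most recent tokens.
    recent = set(range(max(0, n - n_recent), n))
    keep = sinks | recent

    # H2O-style: among everything else, keep the positions with the
    # highest accumulated attention (the "heavy hitters").
    candidates = [i for i in range(n) if i not in keep]
    candidates.sort(key=lambda i: attn_mass[i], reverse=True)
    keep.update(candidates[:n_heavy])

    return sorted(keep)


if __name__ == "__main__":
    mass = [0.1] * 20
    mass[7] = 5.0   # two older tokens that attracted a lot of attention
    mass[9] = 3.0
    print(positions_to_keep(mass, n_sink=2, n_recent=3, n_heavy=2))
```

With a fixed budget of `n_sink + n_recent + n_heavy` positions, the cache stops growing with context length, which is the general reason these techniques help at very long contexts.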

Lead: r/LocalLLaMA
