Big · 2 sources · last seen 1h ago · first seen 9h ago
llama.cpp MTP support landed - Qwen3.6 27B at 2.44× on a Strix Halo, 2.17× on a RTX 3090 rig
PR #22673 (commit 4f13cb7) landed MTP speculative decoding in mainline llama.cpp on May 16. I tested it on two separate rigs. Qwen3.6 27B, single-stream chat, temperature 0, median of 5 runs:

Strix Halo (Framework Desktop, ROCm 7.0.2):
* Q4_K_M: 11.7 → 21.2 tok/s (1.81×)
* Q8_0: 7.4 → 18.1
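The speedup figures quoted in the post are ratios of MTP throughput to baseline throughput, each taken as a median over several runs. A minimal sketch of that arithmetic follows. The helper names `speedup` and `median_speedup` are hypothetical and are not part of llama.cpp or the original benchmark scripts; only the Q4_K_M medians (11.7 and 21.2 tok/s) come from the post.

```python
from statistics import median


def speedup(baseline_tok_s, mtp_tok_s):
    """Return the ratio of MTP throughput to baseline throughput."""
    if baseline_tok_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return mtp_tok_s / baseline_tok_s


def median_speedup(baseline_runs, mtp_runs):
    """Compute the speedup from the median of each set of per-run tok/s values."""
    return speedup(median(baseline_runs), median(mtp_runs))


# Reported Strix Halo Q4_K_M medians from the post: 11.7 -> 21.2 tok/s.
q4_speedup = speedup(11.7, 21.2)
print(f"Q4_K_M speedup: {q4_speedup:.2f}x")  # prints "Q4_K_M speedup: 1.81x"
```

Using the median rather than the mean keeps a single slow or fast outlier run from skewing the reported ratio, which matters when only five runs are taken.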
Lead: r/LocalLLaMA · Bigness: 65 · benchmarking · meta · cpp's · mtp · support
📡 Coverage: 50 (2 news sources)
🟠 Hacker News: 11 (1 pts, 0 comments)
🔴 Reddit: 56 (57 upvotes across 1 sub)
📈 Google Trends: 0
Full methodology: How scoring works
Receipts (all sources)
Benchmarking llama.cpp's new MTP support on Strix Halo
HACKERNEWS · Hacker News · 1h ago · ▲ 1
score 155
llama.cpp MTP support landed - Qwen3.6 27B at 2.44× on a Strix Halo, 2.17× on a RTX 3090 rig
REDDIT · r/LocalLLaMA · 9h ago · ⬆ 57 · 💬 29
score 114