Big · 2 sources · last seen 1h ago · first seen 9h ago

llama.cpp MTP support landed - Qwen3.6 27B at 2.44× on a Strix Halo, 2.17× on an RTX 3090 rig

PR #22673 (commit 4f13cb7) landed MTP speculative decoding in mainline llama.cpp on May 16. I tested it on two separate rigs. Qwen3.6 27B, single-stream chat, temperature 0, median of 5 runs:

Strix Halo (Framework Desktop, ROCm 7.0.2):
* Q4_K_M: 11.7 → 21.2 tok/s (1.81×)
* Q8_0: 7.4 → 18.1
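For anyone wanting to sanity-check the ratios quoted above, here is a minimal sketch of how a median-of-runs speedup can be computed. The helper name `median_speedup` is hypothetical and comes from neither the post nor llama.cpp. The post publishes only medians, not per-run samples, so the example feeds each median in as a one-element list.

```python
from statistics import median


def median_speedup(baseline_runs, mtp_runs):
    """Return (baseline median, MTP median, speedup ratio) for tok/s samples.

    Hypothetical helper for illustration; not part of llama.cpp or the post.
    """
    base = median(baseline_runs)
    mtp = median(mtp_runs)
    return base, mtp, mtp / base


# Only the medians are reported, so each list holds that single value here.
# With raw data, pass all five per-run tok/s measurements instead.
base, mtp, ratio = median_speedup([11.7], [21.2])
print(f"Q4_K_M: {base} -> {mtp} tok/s ({ratio:.2f}x)")
```

Using the median rather than the mean keeps a single slow or fast outlier run from shifting the reported figure, which is presumably why the post quotes it.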

Lead: r/LocalLLaMA · Bigness: 65
📡 Coverage: 50 (2 news sources)
🟠 Hacker News: 11 (1 pts, 0 comments)
🔴 Reddit: 56 (57 upvotes across 1 sub)
📈 Google Trends: 0

Receipts (all sources)

Benchmarking llama.cpp's new MTP support on Strix Halo
HACKERNEWS · Hacker News · 1h ago · ▲ 1
score 155
