Strix Halo Llama.cpp MTP Benchmarks: 27B Gets Much Faster, 35B Is Mixed
### **TL;DR**

All models were Qwen3.6.

**27B-MTP vs Base 27B (15k single-turn): faster overall**

* **Total time (wall):** 87.44s → 77.39s (**10.05s faster** / -11.50%)
* **Generation:** 7.63 → 16.15 t/s (+111.77% speedup)
* **Prompt processing:** 279.75 → 244.90 t/s (-12.46% slowdown)

**35B-MTP vs
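As a quick sanity check, the percentage deltas in the TL;DR follow from simple (new − old) / old arithmetic. A minimal sketch using the figures quoted above; sub-0.1% disagreements with the post's percentages are expected if it was computed from unrounded timings:

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new; negative means a regression."""
    return (new - old) / old * 100.0

# 27B-MTP vs base 27B, numbers copied from the TL;DR above
print(f"wall time:    {pct_change(87.44, 77.39):+.2f}%")   # faster overall
print(f"generation:   {pct_change(7.63, 16.15):+.2f}%")    # large decode speedup
print(f"prompt proc:  {pct_change(279.75, 244.90):+.2f}%") # pp regression
```

Note the trade-off this makes visible: MTP roughly doubles decode throughput while costing ~12% on prompt processing, which is why the long-prompt 15k run still comes out ahead on wall time.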
### Receipts (all sources)
What setup are you using for Qwen 27B on a single 3090? Here's what I've started using today. It has to compact often, but I'm worried about giving up more accuracy and reliability with a lower quant: `llama-server -m /Models/q3.6/Qwen3.6-27B-Q5_K_S.gguf -c 65536 -ngl -1 -t 8 -ctk q8_0 -ctv q8_0 -`