Testing llama.cpp MTP support on Qwen3.6 - RTX 5090
**Setup:**

* RTX 5090, 32 GB, Linux
* Built llama.cpp from 4f13cb7 (the official `ghcr.io/ggml-org/llama.cpp:server-cuda` image hasn't picked up the merge yet as of writing, so I had to docker build from source with `CUDA_DOCKER_ARCH=120`)
* Unsloth's `Qwen3.6-27B-MTP-Q4_K_M.gguf`
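For anyone reproducing the build, here's a minimal sketch, assuming llama.cpp's `.devops/cuda.Dockerfile` and its `server` build target; the image tag `local/llama.cpp:server-cuda` is just a placeholder, and `CUDA_DOCKER_ARCH=120` corresponds to the 5090's compute capability 12.0:

```bash
# Clone and check out the commit that includes the MTP merge.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git checkout 4f13cb7

# Build the CUDA server image from source.
# CUDA_DOCKER_ARCH=120 targets Blackwell (RTX 5090, sm_120).
docker build -t local/llama.cpp:server-cuda \
  --build-arg CUDA_DOCKER_ARCH=120 \
  --target server \
  -f .devops/cuda.Dockerfile .
```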
Saw some posts about PP (prompt processing) being slower, so I was cautious about trying it. Here's a real-world datapoint.

**Settings:**

* Headless RTX 3090 24G
* OpenCode
* Model: unsloth's `Qwen3.6-27B-MTP-Q4_K_M.gguf`
* 128k context
* q8_0 KV cache
* `--spec-draft-n-max: 3`
* `--draft-p-min: 0`

A launch sketch with these settings is shown below.

**Use Cases:**
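For concreteness, a hedged sketch of a `llama-server` launch matching the settings list above, reusing the locally built image. The image tag and model path are placeholders, `-ngl 99` (full offload) is an assumption not stated in the post, and the two draft flags are copied verbatim from the list, so verify their exact names in your build with `llama-server --help`:

```bash
# Launch sketch for the settings above.
# -c 131072 is the 128k context; the two cache-type flags give the
# q8_0 KV cache. The draft flags are quoted as-is from the post.
docker run --gpus all -p 8080:8080 \
  -v "$PWD/models:/models" \
  local/llama.cpp:server-cuda \
  -m /models/Qwen3.6-27B-MTP-Q4_K_M.gguf \
  -c 131072 \
  -ngl 99 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --spec-draft-n-max 3 \
  --draft-p-min 0 \
  --host 0.0.0.0 --port 8080
```

Depending on the build, you may also need to enable flash attention for the quantized V cache to be accepted.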
PR [22673](https://github.com/ggml-org/llama.cpp/pull/22673) has been merged into master! 🎉