Rising1 sources· last seen 5h ago· first seen 10h ago

110 tok/s with 12GB VRAM on Qwen3.6 35B A3B and ik_llama.cpp

Had been getting [great MTP performance](https://www.reddit.com/r/LocalLLaMA/comments/1t82zxv/80_toksec_and_128k_context_on_12gb_vram_with/) with [llama.cpp](https://github.com/ggml-org/llama.cpp) on my RTX 4070 Super 12GB, until they actually merged the MTP PR. Then, performance tanked and was bare

Lead: r/LocalLLaMABigness: 31110tok12gbvramqwen3

Open primary source

📡 Coverage

1 news source

🟠 Hacker News

🔴 Reddit

265 upvotes across 1 sub

📈 Google Trends

Full methodology: How scoring works

Receipts (all sources)

110 tok/s with 12GB VRAM on Qwen3.6 35B A3B and ik_llama.cpp

REDDIT · r/LocalLLaMA · 5h ago · ⬆ 167 · 💬 58

score 127

Qwen3.6 27B and llama.cpp appreciation post

REDDIT · r/LocalLLaMA · 10h ago · ⬆ 98 · 💬 57

score 116

To preface, here's my config: llama-server \ --host 0.0.0.0 \ --port 1235 \ --models-preset %h/Software/models.ini \ --models-max 1 \ --sleep-idle-seconds 3600 \ --timeout 3600 \ --parallel 1 \ --device ROCm0,ROCm1 [*] flash-attn