Rising · 1 source · last seen 3h ago · first seen 1d ago

Qwen3.6-27B-INT4 clocking 100 tps with 256k context length on 1x RTX 5090 via vllm 0.19

Thanks to the community, Qwen3.6-27B keeps getting faster. The following improves on my recipe from [yesterday](https://www.reddit.com/r/LocalLLaMA/comments/1sv8eua/qwen3627b_at_80_tps_with_218k_context_window_on/) and delivers 100+ tps of token generation (TG). Model: [https://huggingface.co/Lo

Lead: r/LocalLLaMA · Bigness: 27 · qwen36-27b-int4 · clocking · 100tps

📡 Coverage

1 news source

🟠 Hacker News

🔴 Reddit

408 upvotes across 1 sub

📈 Google Trends

Receipts (all sources)

Qwen3.6-27B-INT4 clocking 100 tps with 256k context length on 1x RTX 5090 via vllm 0.19

REDDIT · r/LocalLLaMA · 3h ago · ⬆ 88 · 💬 22

score 125

Qwen3.6-27B at ~80 tps with 218k context window on 1x RTX 5090 served by vllm 0.19

REDDIT · r/LocalLLaMA · 1d ago · ⬆ 320 · 💬 125

score 103

Qwen3.6-27B has been out for a few days, and an NVFP4 quantization with MTP was posted on HF earlier: [https://huggingface.co/sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP](https://huggingface.co/sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP). You can follow the same recipe I used for Qwen3.5-27B to achieve \~80 tps on a single RTX