Rising · 1 source · last seen 3h ago · first seen 1d ago

Qwen3.6-27B-INT4 clocking 100 tps with 256k context length on 1x RTX 5090 via vllm 0.19

Thanks to the community, Qwen3.6-27B keeps getting faster. The following improves on my recipe from [yesterday](https://www.reddit.com/r/LocalLLaMA/comments/1sv8eua/qwen3627b_at_80_tps_with_218k_context_window_on/) and delivers 100+ tps in token generation (TG). Model: [https://huggingface.co/Lo

Lead: r/LocalLLaMA · Bigness: 27 · qwen36-27b-int4clocking100tps
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 64 (408 upvotes across 1 sub)
📈 Google Trends: 0

Receipts (all sources)

score 125

Qwen3.6-27B at ~80 tps with 218k context window on 1x RTX 5090 served by vllm 0.19
REDDIT · r/LocalLLaMA · 1d ago · ⬆ 320 · 💬 125
score 103

Qwen3.6-27B has been out for a few days, and an NVFP4 build with MTP was posted earlier on HF: [https://huggingface.co/sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP](https://huggingface.co/sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP). You can follow the same recipe I used for Qwen3.5-27B to achieve ~80 tps on a single RTX