Qwen3.6-27B-INT4 clocking 100 tps with 256k context length on 1x RTX 5090 via vllm 0.19
Thanks to the community, Qwen3.6-27B speed keeps getting better. The following improves on my recipe from [yesterday](https://www.reddit.com/r/LocalLLaMA/comments/1sv8eua/qwen3627b_at_80_tps_with_218k_context_window_on/) and delivers 100+ tps of token generation (TG). Model: [https://huggingface.co/Lo
Receipts (all sources)
Qwen3.6-27B has been out for a few days, and an NVFP4 quant with MTP was posted on HF earlier: [https://huggingface.co/sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP](https://huggingface.co/sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP) You can follow the same recipe I used for Qwen3.5-27B to achieve \~80 tps on a single RTX