Rising · 1 source · last seen 19h ago · first seen 19h ago

Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090

Hey fellow Llamas, your time is precious, so I'll keep it short. We built a GGUF port of DFlash speculative decoding: a standalone C++/CUDA stack on top of ggml that runs on a single 24 GB RTX 3090 and hosts the new Qwen3.6-27B. We call it Luce DFlash ([https://github.com/Luce-Org/lucebox-hub](https://gi
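For readers new to the idea: speculative decoding uses a cheap draft model to propose several tokens ahead, then has the large target model verify them in one pass, keeping the longest accepted prefix. The sketch below is a minimal, hypothetical illustration of that greedy accept/reject loop — the toy `draft_model` and `target_model` functions stand in for real networks and are not part of Luce DFlash.

```python
def draft_model(prefix, k):
    """Cheap draft: propose the next k tokens (toy rule: count upward)."""
    out, last = [], prefix[-1]
    for _ in range(k):
        last += 1
        out.append(last)
    return out

def target_model(prefix):
    """Expensive target: greedy next token for a prefix.
    Toy rule: agrees with the draft but caps values at 5."""
    return min(prefix[-1] + 1, 5)

def speculative_step(prefix, k=4):
    """One round: draft k tokens, verify each against the target
    (a real implementation checks all k positions in a single target
    forward pass; this loop simulates that serially). On a mismatch,
    keep the target's token and stop."""
    proposal = draft_model(prefix, k)
    accepted = []
    for tok in proposal:
        expected = target_model(prefix + accepted)
        if tok == expected:
            accepted.append(tok)       # draft matched: token comes "for free"
        else:
            accepted.append(expected)  # mismatch: fall back to target, stop
            break
    return prefix + accepted

print(speculative_step([1], k=4))  # full acceptance: [1, 2, 3, 4, 5]
print(speculative_step([4], k=4))  # early rejection: [4, 5, 5]
```

When the draft agrees with the target often, each target forward pass yields several tokens instead of one, which is where the claimed throughput gain comes from.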

Lead: r/LocalLLaMA · Bigness: 34 · tags: luce dflash, qwen3.6-27b, throughput
📡 Coverage: 10 · 1 news source
🟠 Hacker News: 0
🔴 Reddit: 85 · 602 upvotes across 1 sub
📈 Google Trends: 0
Full methodology: How scoring works

Receipts (all sources)

Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090
REDDIT · r/LocalLLaMA · 19h ago · ⬆ 602 · 💬 166 · score 115

Hey fellow Llamas, your time is precious, so I'll keep it short. We built a GGUF port of DFlash speculative decoding. Standalone C++/CUDA stack on top of ggml, runs on a single 24 GB RTX 3090, hosts the new Qwen3.6-27B. We call it Luce DFlash ([https://github.com/Luce-Org/lucebox-hub](https://gi