Cluster · 1 source · last seen 6d ago · first seen 6d ago
DFlash speculative decoding on Apple Silicon: 4.1x on Qwen3.5-9B, now open source (MLX, M5 Max)
A few days ago I posted early results from a native MLX implementation of DFlash. Since then I rewrote the benchmark methodology, fixed numerical issues, and open sourced the whole thing. A small draft model generates 16 tokens in parallel via block diffusion, the target verifies them in one forwar
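The draft-then-verify loop described in the excerpt can be sketched roughly as follows. This is a generic greedy speculative-decoding acceptance step, not the DFlash or MLX code from the post: the block-diffusion drafter is not modeled, and `verify_block`, `toy_target_pass`, and `toy_target_next` are hypothetical names invented for illustration. The toy target simply counts upward, and the example uses a 4-token block rather than the 16 mentioned in the post.

```python
from typing import Callable, List, Sequence


def toy_target_next(seq: Sequence[int]) -> int:
    # Hypothetical stand-in for the target model: it always predicts last + 1.
    return (seq[-1] + 1) % 50


def toy_target_pass(context: Sequence[int], draft: Sequence[int]) -> List[int]:
    # Emulates a single target forward pass over context + draft.
    # Returns len(draft) + 1 predictions: entry i is the target's greedy
    # next token given context + draft[:i]. A real model would produce all
    # of these from one batched pass; this toy loops for clarity.
    seq = list(context)
    preds = []
    for tok in list(draft) + [None]:
        preds.append(toy_target_next(seq))
        if tok is not None:
            seq.append(tok)
    return preds


def verify_block(
    context: Sequence[int],
    draft: Sequence[int],
    target_pass: Callable[[Sequence[int], Sequence[int]], List[int]],
) -> List[int]:
    # Greedy acceptance (an assumption; the post may use a different rule):
    # keep draft tokens while they match the target's own choice, and at the
    # first mismatch substitute the target's token and stop.
    preds = target_pass(context, draft)
    accepted: List[int] = []
    for i, tok in enumerate(draft):
        if tok != preds[i]:
            accepted.append(preds[i])
            return accepted
        accepted.append(tok)
    # Every draft token matched, so the target's extra prediction is free.
    accepted.append(preds[len(draft)])
    return accepted


partial = verify_block([1, 2, 3], [4, 5, 9, 7], toy_target_pass)
full = verify_block([1, 2, 3], [4, 5, 6, 7], toy_target_pass)
print(partial, full)
```

Under this rule each target pass yields at least one token and at most one more than the block length, which is where any speedup comes from: the more draft tokens the target agrees with, the fewer target passes are needed per generated token.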
Lead: r/LocalLLaMA
Bigness: 15
dflash speculative decoding apple silicon
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 28 (30 upvotes across 1 sub)
📈 Google Trends: 0
Receipts (all sources)
DFlash speculative decoding on Apple Silicon: 4.1x on Qwen3.5-9B, now open source (MLX, M5 Max)
REDDIT · r/LocalLLaMA · 6d ago · ⬆ 30 · 💬 13
score 119