Cluster · 1 source · last seen 6d ago · first seen 6d ago

DFlash speculative decoding on Apple Silicon: 4.1x on Qwen3.5-9B, now open source (MLX, M5 Max)

A few days ago I posted early results from a native MLX implementation of DFlash. Since then I rewrote the benchmark methodology, fixed numerical issues, and open-sourced the whole thing. A small draft model generates 16 tokens in parallel via block diffusion, and the target verifies them in one forward pass…
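The draft-then-verify step described in the excerpt can be illustrated with a minimal sketch. It assumes greedy (argmax) acceptance, which the excerpt does not confirm; the function name `accept_draft` and the token values are hypothetical and are not taken from the DFlash code.

```python
from typing import List, Sequence

BLOCK_SIZE = 16  # draft block length mentioned in the post


def accept_draft(draft_tokens: Sequence[int], target_tokens: Sequence[int]) -> List[int]:
    """Greedy verification of one drafted block (illustrative assumption).

    draft_tokens[i]  : the draft model's guess for position i of the block.
    target_tokens[i] : the target model's argmax for position i, computed in a
                       single forward pass over the prefix plus the whole draft.
                       It holds one extra entry: the target's prediction for
                       the position after the last drafted token.
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have len(draft_tokens) + 1 entries")

    accepted: List[int] = []
    for i, tok in enumerate(draft_tokens):
        if tok != target_tokens[i]:
            break  # first disagreement: discard the rest of the draft
        accepted.append(tok)

    # Always emit one token from the target itself: the correction at the
    # first mismatch, or the extra token if the whole draft was accepted.
    accepted.append(target_tokens[len(accepted)])
    return accepted


if __name__ == "__main__":
    # Draft agrees on the first two positions, then diverges.
    print(accept_draft([1, 2, 3, 4], [1, 2, 9, 7, 8]))
```

Each verification step yields at least one token and at most the block length plus one, so any speedup depends on how often the draft agrees with the target.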

Lead: r/LocalLLaMA · Bigness: 15 · dflash · speculative decoding · apple silicon

📡 Coverage

1 news source

🟠 Hacker News

🔴 Reddit

30 upvotes across 1 sub

📈 Google Trends

Receipts (all sources)

DFlash speculative decoding on Apple Silicon: 4.1x on Qwen3.5-9B, now open source (MLX, M5 Max)

REDDIT · r/LocalLLaMA · 6d ago · ⬆ 30 · 💬 13

score 119