Rising · 1 source · last seen 3h ago · first seen 3h ago

DFlash speculative decoding on Apple Silicon: 85 tok/s, 3.3x on Qwen3.5-9B (MLX, M5 Max)

I'm building a native MLX implementation of DFlash ([paper](https://arxiv.org/abs/2602.06036)) for Apple Silicon. A small draft model generates 16 tokens in parallel via block diffusion, and the target model verifies them in a single forward pass. Output is bit-for-bit identical to the baseline (greedy exact argmax ma…
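The accept/reject rule behind the bit-for-bit claim can be sketched in plain Python. This is a minimal illustration of generic greedy speculative verification, assuming argmax acceptance. It is not the DFlash or MLX code. The draft is treated as a black box: the real drafter produces its 16-token block in parallel via block diffusion, while the hypothetical `toy_draft` below is sequential and deliberately wrong at some positions. `toy_target`, `verify_block`, `speculative_generate`, and `greedy_generate` are hypothetical names introduced for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (illustration only, not a real library API):
# TargetFn returns, for every position i, the target's greedy next token after
# seq[: i + 1], i.e. what one causal forward pass plus argmax would give.
TargetFn = Callable[[Sequence[int]], List[int]]
# DraftFn proposes a block of k tokens following the prefix.
DraftFn = Callable[[Sequence[int], int], List[int]]


def verify_block(prefix: List[int], draft: List[int], target: TargetFn) -> List[int]:
    """Return the tokens committed from one draft block under greedy verification."""
    seq = prefix + draft
    preds = target(seq)  # a single target pass over prefix + draft
    accepted: List[int] = []
    for j, tok in enumerate(draft):
        expected = preds[len(prefix) - 1 + j]  # target's token for this slot
        if tok != expected:
            # First mismatch: commit the target's own token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    # Whole block matched: the final prediction is a free "bonus" token.
    accepted.append(preds[len(seq) - 1])
    return accepted


def speculative_generate(prompt: List[int], target: TargetFn, draft_fn: DraftFn,
                         block_size: int, max_new: int) -> Tuple[List[int], int]:
    """Draft-then-verify loop; returns (tokens, number of target passes)."""
    out, passes = list(prompt), 0
    while len(out) - len(prompt) < max_new:
        draft = draft_fn(out, block_size)
        out.extend(verify_block(out, draft, target))
        passes += 1
    return out[: len(prompt) + max_new], passes


def greedy_generate(prompt: List[int], target: TargetFn, max_new: int) -> List[int]:
    """Baseline: one target pass per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out)[-1])
    return out


def toy_target(seq: Sequence[int]) -> List[int]:
    # Toy causal "model": the prediction at position i depends only on seq[: i + 1].
    return [(seq[i] * 3 + i) % 11 for i in range(len(seq))]


def toy_draft(prefix: Sequence[int], k: int) -> List[int]:
    # Toy drafter: follows the target's rule, but is wrong whenever the
    # index of the last token is a multiple of 4, so rejections occur.
    out = list(prefix)
    for _ in range(k):
        i = len(out) - 1
        guess = (out[-1] * 3 + i) % 11
        if i % 4 == 0:
            guess = (guess + 1) % 11
        out.append(guess)
    return out[len(prefix):]


spec, passes = speculative_generate([1, 2], toy_target, toy_draft, block_size=16, max_new=40)
base = greedy_generate([1, 2], toy_target, max_new=40)
print(spec == base, passes)  # prints: True 10
```

Every committed token is the target's own argmax given a prefix that has already been verified, so the result matches plain greedy decoding exactly. The speedup comes only from committing several tokens per target pass: in this toy, 4 tokens per pass, so 10 passes instead of 40.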

Lead: r/LocalLLaMA · Bigness: 27


📡 Coverage

1 news source

🟠 Hacker News

🔴 Reddit

134 upvotes across 1 sub

📈 Google Trends


Receipts (all sources)

DFlash speculative decoding on Apple Silicon: 85 tok/s, 3.3x on Qwen3.5-9B (MLX, M5 Max)

REDDIT · r/LocalLLaMA · 3h ago · ⬆ 134 · 💬 27

score 127