Rising1 sources· last seen 3h ago· first seen 3h ago

DFlash speculative decoding on Apple Silicon : 85 tok/s, 3.3x on Qwen3.5-9B (MLX, M5 Max)

I'm building a native MLX implementation of DFlash ([paper](https://arxiv.org/abs/2602.06036)) for Apple Silicon. A small draft model generates 16 tokens in parallel via block diffusion, the target verifies them in one forward pass. Output is bit-for-bit identical to baseline (greedy exact argmax ma

Lead: r/LocalLLaMABigness: 27dflashspeculativedecodingapplesilicon
📡 Coverage
10
1 news source
🟠 Hacker News
0
🔴 Reddit
65
134 upvotes across 1 sub
📈 Google Trends
0
Full methodology: How scoring works

Receipts (all sources)

score 127

I'm building a native MLX implementation of DFlash ([paper](https://arxiv.org/abs/2602.06036)) for Apple Silicon. A small draft model generates 16 tokens in parallel via block diffusion, the target verifies them in one forward pass. Output is bit-for-bit identical to baseline (greedy exact argmax ma