Rising · 1 source · last seen 3h ago · first seen 3h ago
DFlash speculative decoding on Apple Silicon: 85 tok/s, 3.3x on Qwen3.5-9B (MLX, M5 Max)
I'm building a native MLX implementation of DFlash ([paper](https://arxiv.org/abs/2602.06036)) for Apple Silicon. A small draft model generates 16 tokens in parallel via block diffusion, and the target model verifies them in one forward pass. Output is bit-for-bit identical to baseline (greedy exact argmax ma
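To make the draft-then-verify loop in the excerpt concrete, here is a minimal sketch of greedy speculative verification in plain Python. It is not the DFlash or MLX implementation: `verify_block`, `speculative_generate`, `toy_target`, and `toy_draft` are hypothetical names invented for illustration, the "models" are deterministic toy functions, and the toy draft builds its block sequentially rather than by block diffusion. The only thing it aims to show is why accepting drafted tokens up to the first mismatch, and then taking the target's own token, reproduces greedy decoding exactly.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces for illustration only (not the DFlash or MLX API).
# A TargetFn returns, for every position i, the target model's greedy next
# token given tokens[: i + 1], i.e. what one causal forward pass provides.
TargetFn = Callable[[Sequence[int]], List[int]]
# A DraftFn proposes k tokens at once for the given prefix.
DraftFn = Callable[[Sequence[int], int], List[int]]


def verify_block(prefix: Sequence[int], draft: Sequence[int],
                 target_argmax: TargetFn) -> List[int]:
    """Check one drafted block; return the tokens to append (prefix non-empty)."""
    preds = target_argmax(list(prefix) + list(draft))  # one "forward pass"
    start = len(prefix) - 1  # preds[start + j] is the target's pick for draft[j]
    accepted: List[int] = []
    for j, tok in enumerate(draft):
        expected = preds[start + j]
        if tok != expected:
            accepted.append(expected)  # replace the first wrong token and stop
            return accepted
        accepted.append(tok)
    accepted.append(preds[start + len(draft)])  # whole block accepted: bonus token
    return accepted


def speculative_generate(prompt: Sequence[int], draft_block: DraftFn,
                         target_argmax: TargetFn, max_new: int,
                         block_size: int = 16) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        draft = draft_block(out, block_size)
        out.extend(verify_block(out, draft, target_argmax))
    return out[: len(prompt) + max_new]  # trim any overshoot from the last block


def greedy_generate(prompt: Sequence[int], target_argmax: TargetFn,
                    max_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target_argmax(out)[-1])
    return out


def toy_next(tokens: Sequence[int]) -> int:
    # Arbitrary deterministic rule standing in for a model's argmax.
    return (sum(tokens) * 31 + len(tokens)) % 97


def toy_target(tokens: Sequence[int]) -> List[int]:
    return [toy_next(tokens[: i + 1]) for i in range(len(tokens))]


def toy_draft(prefix: Sequence[int], k: int) -> List[int]:
    # Imperfect drafter: agrees with the target except at block offset 5.
    seq, block = list(prefix), []
    for j in range(k):
        tok = toy_next(seq)
        if j == 5:
            tok = (tok + 1) % 97  # deliberate mistake
        block.append(tok)
        seq.append(tok)
    return block


prompt = [1, 2, 3]
spec = speculative_generate(prompt, toy_draft, toy_target, max_new=40)
base = greedy_generate(prompt, toy_target, max_new=40)
assert spec == base  # identical output, fewer target calls
```

In this toy setup each verification step appends several tokens for one target call, while the baseline appends one token per call. Any real speedup depends on how often the draft agrees with the target and on the relative cost of the two models.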
Lead: r/LocalLLaMA · Bigness: 27 · dflash · speculative · decoding · apple · silicon
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 65 (134 upvotes across 1 sub)
📈 Google Trends: 0
Receipts (all sources)
DFlash speculative decoding on Apple Silicon: 85 tok/s, 3.3x on Qwen3.5-9B (MLX, M5 Max)
REDDIT · r/LocalLLaMA · 3h ago · ⬆ 134 · 💬 27
score 127