Rising · 1 source · last seen 17h ago · first seen 17h ago
Needle: We Distilled Gemini Tool Calling Into a 26M Model
We open-sourced Needle, a 26M-parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices. We were always frustrated by how little effort goes into building agentic models that run on budget phones, so we conducted investigations that led…
Lead: r/LocalLLaMA · Bigness: 31 · needle · distilled · google · calling · 26m
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 75 (305 upvotes across 1 sub)
📈 Google Trends: 0
Receipts (all sources)
Needle: We Distilled Gemini Tool Calling Into a 26M Model
REDDIT · r/LocalLLaMA · 17h ago · ⬆ 305 · 💬 40
score 114