Rising · 1 source · last seen 12h ago · first seen 12h ago
Local model on coding has reached a certain threshold to be feasible for real work
We ran open-weight 27B–32B models on Terminal-Bench 2.0 (89 tasks, `terminal-bench-2.git @ 69671fb`) through our agent harness. Best result was Qwen 3.6-27B at **38.2% (34/89)** under the **default** per-task timeout — the same constraint the public leaderboard uses ([Qwen's official post uses a mor
Lead: r/LocalLLaMA · Bigness: 26 · local coding reached certain threshold
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 61 (93 upvotes across 1 sub)
📈 Google Trends: 0
Full methodology: How scoring works
Receipts (all sources)
Local model on coding has reached a certain threshold to be feasible for real work
REDDIT · r/LocalLLaMA · 12h ago · ⬆ 93 · 💬 34
score 113