Rising · 1 source · last seen 12h ago · first seen 12h ago

Local models for coding have reached the point where they are feasible for real work

We ran open-weight 27B–32B models on Terminal-Bench 2.0 (89 tasks, `terminal-bench-2.git @ 69671fb`) through our agent harness. Best result was Qwen 3.6-27B at **38.2% (34/89)** under the **default** per-task timeout — the same constraint the public leaderboard uses ([Qwen's official post uses a mor…
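As a quick check on the headline number, a minimal sketch assuming the score is the plain ratio of resolved tasks to total tasks (the excerpt gives 34/89 and does not mention partial credit or averaging over repeated trials). The `pass_rate` helper is hypothetical, written only for this illustration.

```python
# Minimal sketch, assuming the score is resolved tasks / total tasks.
# `pass_rate` is a hypothetical helper, not part of any benchmark tooling.

def pass_rate(resolved: int, total: int) -> float:
    """Return the percentage of tasks resolved, e.g. 34 of 89."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * resolved / total

score = pass_rate(34, 89)  # headline result from the post
step = pass_rate(1, 89)    # how much a single task moves the score
print(f"{score:.1f}% (one task = {step:.2f} points)")
```

With only 89 tasks, each task is worth about 1.1 percentage points, so a gap of a point or two between models can come down to a single task.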

Lead: r/LocalLLaMA · Bigness: 26
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 61 (93 upvotes across 1 sub)
📈 Google Trends: 0

Receipts (all sources)

score 113