Big · 1 source · last seen 6h ago · first seen 6h ago
SWE-rebench Leaderboard (Feb 2026): GPT-5.4, Qwen3.5, Gemini 3.1 Pro, Step-3.5-Flash and More
Hi, we’ve updated the **SWE-rebench leaderboard** with our **February runs** on **57 fresh GitHub PR tasks** (restricted to PRs created in the previous month). The setup is standard SWE-bench: models read real PR issues, edit code, run tests, and must make the full suite pass. Key observations: *
Lead: r/LocalLLaMA · Bigness: 54 · swe-rebench leaderboard feb2026 gpt-5
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 63 (97 upvotes across 1 sub)
📈 Google Trends: 76 (Gemini AI: 76/100 ↑9%)
Receipts (all sources)
SWE-rebench Leaderboard (Feb 2026): GPT-5.4, Qwen3.5, Gemini 3.1 Pro, Step-3.5-Flash and More
REDDIT · r/LocalLLaMA · 6h ago · ⬆ 97 · 💬 57
score 123