Rising · 1 source · last seen 18h ago · first seen 18h ago

On a difficult new SWE benchmark, ProgramBench, GPT5.5 high/xhigh solves a task for the first time and significantly outperforms Opus 4.7

Link to tweets: https://x.com/KLieret/status/2054215545663144217?s=20
Link to GitHub: [https://github.com/facebookresearch/ProgramBench/](https://github.com/facebookresearch/ProgramBench/)
Link to ProgramBench website: [https://programbench.com/blog/gpt-5-5-first-solve/](https://programbench.co

Lead: r/singularity · Bigness: 32 · Tags: difficult, swe, benchmark, programbench, openai
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 80 (437 upvotes across 1 sub)
📈 Google Trends: 0
