Rising · 1 source · last seen 3h ago · first seen 3h ago

GPT-5.5 was used to flag fatal errors in FrontierMath problems

FrontierMath is supposed to be one of the hard benchmarks for frontier models, and now Epoch is saying an AI-assisted review found fatal errors in about a third of Tiers 1-4. Noam Brown says the initial flags came from GPT-5.5. Obviously we’ll have to wait for the corrected scores, but this is a p

Lead: r/singularity · Bigness: 26 · Keywords: gpt-5, used, flag, fatal, errors
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 61 (96 upvotes across 1 sub)
📈 Google Trends: 0

Receipts (all sources)

GPT-5.5 was used to flag fatal errors in FrontierMath problems
REDDIT · r/singularity · 3h ago · ⬆ 96 · 💬 16
score 125
