Rising · 1 source · last seen 3h ago · first seen 3h ago

GPT-5.5 was used to flag fatal errors in FrontierMath problems

FrontierMath is supposed to be one of the hard benchmarks for frontier models, and now Epoch is saying an AI-assisted review found fatal errors in about a third of Tiers 1-4. Noam Brown says the initial flags came from GPT-5.5. Obviously we’ll have to wait for the corrected scores, but this is a p

Lead: r/singularity · Bigness: 26 · gpt-5 · used · flag · fatal · errors

📡 Coverage

1 news source

🟠 Hacker News

🔴 Reddit

96 upvotes across 1 sub

📈 Google Trends

Receipts (all sources)

GPT-5.5 was used to flag fatal errors in FrontierMath problems

REDDIT · r/singularity · 3h ago · ⬆ 96 · 💬 16

score 125