Rising · 1 source · last seen 4h ago · first seen 4h ago

The ARC-AGI2 Illusion Of Progress: If Changing the Font Breaks the Model, It Doesn't Understand

Over the past few weeks, Claude Opus 4.6, Gemini 3.1 Pro, and Gemini 3 Pro Deepthink were released, scoring a record-breaking 68%, 77%, and 84% respectively on ARC-AGI2. I became extremely excited and started to believe these new models could kick off recursive self-improvement any minute. Indeed, the

Lead: r/singularity · Bigness: 28 · arc-agi2, illusion, progress, changing, font
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 68 (151 upvotes across 1 sub)
📈 Google Trends: 0

Receipts (all sources)

score 127
