Rising · 1 source · last seen 12h ago · first seen 12h ago

New Benchmark "InsanityBench", Gemini 3.1 Pro scores 15%

InsanityBench is supposed to be a benchmark that captures something we deeply care about (the "insane" leaps of creativity often needed in science), can hardly be gamed (because every task is completely different from the others), and is nowhere near saturated yet (the best model scores 15%). Leaderboa

Lead: r/singularity · Bigness: 31
benchmark · insanitybench · google · pro · scores
📡 Coverage: 10 · 1 news source
🟠 Hacker News: 0
🔴 Reddit: 75 · 281 upvotes across 1 sub
📈 Google Trends: 0
Full methodology: How scoring works

Receipts (all sources)

New Benchmark "InsanityBench", Gemini 3.1 Pro scores 15%
REDDIT · r/singularity · 12h ago · ⬆ 281 · 💬 51
score 120
