Rising · 1 source · last seen 12h ago · first seen 12h ago
New Benchmark "InsanityBench", Gemini 3.1 Pro scores 15%
InsanityBench is supposed to be a benchmark encapsulating something we deeply care about (the "insane" leaps of creativity often needed in science), can hardly be gamed (because every task is completely different from another) and is nowhere near saturated yet (the best model scores 15%). Leaderboa
Lead: r/singularity · Bigness: 31 · benchmark · insanitybench · google · pro · scores
📡 Coverage: 10 · 1 news source
🟠 Hacker News: 0
🔴 Reddit: 75 · 281 upvotes across 1 sub
📈 Google Trends: 0
Full methodology: How scoring works
Receipts (all sources)
New Benchmark "InsanityBench", Gemini 3.1 Pro scores 15%
REDDIT · r/singularity · 12h ago · ⬆ 281 · 💬 51
score 120