Cluster · 1 source · last seen 20h ago · first seen 20h ago
New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously
Researchers at Carnegie Mellon University built a new benchmark that measures how far AI agents can go when exploiting real vulnerabilities in Google's V8 engine. Mythos leads GPT-5.5 by a wide margin but costs twelve times as much.
Lead: The Decoder · Bigness: 4 · benchmark · shows · anthropic · mythos · gpt-5
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 0
📈 Google Trends: 0
Full methodology: How scoring works
Receipts (all sources)
New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously
RSS · The Decoder · 20h ago
score 143
Researchers at Carnegie Mellon University built a new benchmark that measures how far AI agents can go when exploiting real vulnerabilities in Google's V8 engine. Mythos leads GPT-5.5 by a wide margin but costs twelve times as much.