Cluster · 1 source · last seen 20h ago · first seen 20h ago

New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously

Researchers at Carnegie Mellon University built a new benchmark that measures how far AI agents can go when exploiting real vulnerabilities in Google's V8 engine. Mythos leads GPT-5.5 by a wide margin but costs twelve times as much.

Lead: The Decoder
Bigness: 4
Tags: benchmark, shows, anthropic, mythos, gpt-5
📡 Coverage: 10
1 news source
🟠 Hacker News: 0
🔴 Reddit: 0
📈 Google Trends: 0

