Cluster · 1 source · last seen 20h ago · first seen 20h ago

New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously

Researchers at Carnegie Mellon University built a new benchmark that measures how far AI agents can go when exploiting real vulnerabilities in Google's V8 engine. Mythos leads GPT-5.5 by a wide margin but costs twelve times as much.

Lead: The Decoder · Bigness: 4 · benchmark shows anthropic mythos gpt-5

📡 Coverage

1 news source

Receipts (all sources)

New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously

RSS · The Decoder · 20h ago

score 143

Related clusters

Claude Mythos has been spotted in Google Vertex

1 source · bigness 30 · 11h ago

Elite researchers teamed up with Anthropic’s Mythos AI to smash Apple’s multi-billion dollar M5 security and build a kernel exploit in just 5 days.

1 source · bigness 28 · 1d ago