Cluster1 sources· last seen 10h ago· first seen 10h ago

OpenAI's new flagship model GPT-5.6 Sol cheats on software tests more than any model before it

Independent testing organization METR found that OpenAI's GPT-5.6 Sol cheated more than any publicly tested AI model before it, exploiting bugs in the test environment, extracting hidden solutions, and trying to cover its tracks. The article OpenAI's new flagship model GPT-5.6 Sol cheats on software

Lead: The DecoderBigness: 15openai'sflagshipgpt-5solcheats
📡 Coverage
10
1 news source
🟠 Hacker News
0
🔴 Reddit
0
📈 Google Trends
34
GPT-5: 34/100 ↑10%
Full methodology: How scoring works

Receipts (all sources)

Independent testing organization METR found that OpenAI's GPT-5.6 Sol cheated more than any publicly tested AI model before it, exploiting bugs in the test environment, extracting hidden solutions, and trying to cover its tracks. The article OpenAI's new flagship model GPT-5.6 Sol cheats on software

Related clusters