Massive · 2 sources · last seen 2h ago · first seen 2h ago

New LLM Persuasion Benchmark: models try to move each other's stated positions in multi-turn conversations. GPT-5.4 (high) is the strongest persuader. Claude Opus 4.6 (high) is second. Xiaomi MiMo V2 Pro and Gemini 3.1 Pro Preview are the softest targets.

More info (transcripts, model dossiers, quotes): [https://github.com/lechmazur/persuasion](https://github.com/lechmazur/persuasion)

15 models, 6,296 conversations, 15 topics. Stance is measured on a 7-point scale (-3 to +3), probed 3 times before and 3 times after the conversation. Signed shift …
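The post is cut off before it defines the signed shift, so the following is only a minimal sketch of one plausible reading: average the three pre-conversation probes, average the three post-conversation probes, and sign the difference so that a positive value means the target moved toward the persuader's side. The function name `signed_shift` and the `direction` parameter are hypothetical and do not come from the benchmark repository.

```python
from statistics import mean

# 7-point stance scale described in the post.
SCALE_MIN, SCALE_MAX = -3, 3


def signed_shift(before, after, direction):
    """Return the mean stance change, signed toward the persuader's side.

    ASSUMPTION: the benchmark's actual formula is truncated in the source;
    this is one plausible reading, not the author's confirmed method.

    before, after: stance probes on the -3..+3 scale (the post uses 3 each).
    direction: +1 if the persuader argues for the positive end of the scale,
               -1 if for the negative end (hypothetical parameter).
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    if not before or not after:
        raise ValueError("need at least one probe before and after")
    for probe in (*before, *after):
        if not SCALE_MIN <= probe <= SCALE_MAX:
            raise ValueError(f"probe {probe} is outside the {SCALE_MIN}..{SCALE_MAX} scale")
    # Positive result: the target moved toward the persuader's position.
    return direction * (mean(after) - mean(before))


# Illustrative values only, not benchmark data.
shift_toward = signed_shift([-1, -1, -1], [1, 1, 1], direction=1)
shift_away = signed_shift([-1, -1, -1], [1, 1, 1], direction=-1)
```

Averaging repeated probes before differencing is a common way to damp sampling noise in a model's stated stance; whether the benchmark aggregates in exactly this way is not stated in the visible text.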

Lead: r/singularity · Bigness: 80 · Tags: llm, persuasion, benchmark, multi-turn
📡 Coverage: 50 (2 news sources)
🟠 Hacker News: 33 (7 pts, 0 comments)
🔴 Reddit: 49 (36 upvotes across 1 sub)
📈 Google Trends: 87 (Anthropic: 87/100)

Receipts (all sources)

LLM Persuasion Benchmark: Multi-Turn Persuasion Between Models
HACKERNEWS · Hacker News · 2h ago · ▲ 7
score 166
