Massive · 2 sources · last seen 2h ago · first seen 2h ago

New LLM Persuasion Benchmark: models try to move each other's stated positions in multi-turn conversations. GPT-5.4 (high) is the strongest persuader. Claude Opus 4.6 (high) is second. Xiaomi MiMo V2 Pro and Gemini 3.1 Pro Preview are the softest targets.

More info (transcripts, model dossiers, quotes): [https://github.com/lechmazur/persuasion](https://github.com/lechmazur/persuasion)

15 models, 6,296 conversations, 15 topics. Stance is measured on a 7-point scale (-3 to +3), probed 3 times before and 3 times after the conversation. Signed shift &g
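As a rough illustration of the measurement described above, the sketch below averages three stance probes taken before a conversation and three taken after, then takes the difference. The summary is cut off before it defines the signed shift, so the formula here (mean after minus mean before) is an assumption, and the names `mean_stance` and `stance_shift` and the probe values are hypothetical, not taken from the benchmark.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = -3, 3  # 7-point stance scale stated in the summary


def mean_stance(probes):
    """Average a list of stance probes after checking each is on the -3..+3 scale."""
    for p in probes:
        if not SCALE_MIN <= p <= SCALE_MAX:
            raise ValueError(f"stance {p} is outside the {SCALE_MIN}..{SCALE_MAX} scale")
    return mean(probes)


def stance_shift(before, after):
    """ASSUMED definition: mean post-conversation stance minus mean pre-conversation stance."""
    return mean_stance(after) - mean_stance(before)


# Made-up probe values for illustration only, not benchmark data.
before = [-2, -1, -2]  # three probes before the conversation
after = [0, 1, 0]      # three probes after the conversation
shift = stance_shift(before, after)
```

Under this assumed definition, a positive value means the target's stated position moved up the scale and a negative value means it moved down. Whether the benchmark signs the shift relative to the persuader's goal is not stated in the visible text.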

Lead: r/singularity · Bigness: 80 · llm · persuasion · benchmark · multi-turn

📡 Coverage

2 news sources

🟠 Hacker News

7 pts, 0 comments

🔴 Reddit

36 upvotes across 1 sub

📈 Google Trends

Anthropic: 87/100

Full methodology: How scoring works

Receipts (all sources)

LLM Persuasion Benchmark: Multi-Turn Persuasion Between Models

HACKERNEWS · Hacker News · 2h ago · ▲ 7

score 166

New LLM Persuasion Benchmark: models try to move each other's stated positions in multi-turn conversations. GPT-5.4 (high) is the strongest persuader. Claude Opus 4.6 (high) is second. Xiaomi MiMo V2 Pro and Gemini 3.1 Pro Preview are the softest targets.

REDDIT · r/singularity · 2h ago · ⬆ 36 · 💬 9

score 121