Cluster1 sources· last seen 1h ago· first seen 1h ago

Honesty in a small model drops from 35% to 0% by changing the tone of the prompt. Sharing the findings.

My paper got published today at Arxiv. It raises questions about how language models behave when the framing of a request shifts. Small open-source AI models can be moved from honest to dishonest behaviour by little more than a change in tone. Asked to solve coding problems designed to be

Lead: r/LocalLLaMABigness: 22honestysmalldropschangingtone

Open primary source

📡 Coverage

1 news source

🟠 Hacker News

🔴 Reddit

32 upvotes across 1 sub

📈 Google Trends

Full methodology: How scoring works

Receipts (all sources)

Honesty in a small model drops from 35% to 0% by changing the tone of the prompt. Sharing the findings.

REDDIT · r/LocalLLaMA · 1h ago · ⬆ 32 · 💬 18

score 121