Cluster1 sources· last seen 1h ago· first seen 1h ago

Honesty in a small model drops from 35% to 0% by changing the tone of the prompt. Sharing the findings.

My paper got published today at Arxiv. It raises questions about how language models behave when the framing of a request shifts. Small open-source AI models can be moved from honest to dishonest behaviour by little more than a change in tone. Asked to solve coding problems designed to be

Lead: r/LocalLLaMABigness: 22honestysmalldropschangingtone
📡 Coverage
10
1 news source
🟠 Hacker News
0
🔴 Reddit
49
32 upvotes across 1 sub
📈 Google Trends
0
Full methodology: How scoring works

Receipts (all sources)

My paper got published today at Arxiv. It raises questions about how language models behave when the framing of a request shifts. Small open-source AI models can be moved from honest to dishonest behaviour by little more than a change in tone. Asked to solve coding problems designed to be