Cluster1 sources· last seen 1h ago· first seen 1h ago
Honesty in a small model drops from 35% to 0% by changing the tone of the prompt. Sharing the findings.
My paper got published today at Arxiv. It raises questions about how language models behave when the framing of a request shifts. Small open-source AI models can be moved from honest to dishonest behaviour by little more than a change in tone. Asked to solve coding problems designed to be
Lead: r/LocalLLaMABigness: 22honestysmalldropschangingtone
📡 Coverage
10
1 news source
🟠 Hacker News
0
🔴 Reddit
49
32 upvotes across 1 sub
📈 Google Trends
0
Full methodology: How scoring works
Receipts (all sources)
Honesty in a small model drops from 35% to 0% by changing the tone of the prompt. Sharing the findings.
REDDIT · r/LocalLLaMA · 1h ago · ⬆ 32 · 💬 18
score 121
My paper got published today at Arxiv. It raises questions about how language models behave when the framing of a request shifts. Small open-source AI models can be moved from honest to dishonest behaviour by little more than a change in tone. Asked to solve coding problems designed to be