Big · 1 source · last seen 13h ago · first seen 13h ago

Gemma 4 31B vs Gemma 4 26B-A4B vs Qwen 3.5 27B — 30-question blind eval with Claude Opus 4.6 as judge

Just finished a 3-way head-to-head. Sharing the raw results because this sub has been good about poking holes in methodology, and I'd rather get that feedback than pretend my setup is perfect.

**Setup**

* 30 questions, 6 per category (code, reasoning, analysis, communication, meta-alignment)
* All

Lead: r/LocalLLaMA
Bigness: 54
gemma · 31b · 26b-a4b · qwen · 27b
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 66 (125 upvotes across 1 sub)
📈 Google Trends: 72 (Claude AI: 72/100)

