Big · 1 source · last seen 13h ago · first seen 13h ago

Gemma 4 31B vs Gemma 4 26B-A4B vs Qwen 3.5 27B — 30-question blind eval with Claude Opus 4.6 as judge

Just finished a 3-way head-to-head. Sharing the raw results because this sub has been good about poking holes in methodology, and I'd rather get that feedback than pretend my setup is perfect.

**Setup**

* 30 questions, 6 per category (code, reasoning, analysis, communication, meta-alignment)
* All

Lead: r/LocalLLaMA · Bigness: 54 · gemma31b26b-a4bqwen27b

📡 Coverage

1 news source

🟠 Hacker News

🔴 Reddit

125 upvotes across 1 sub

📈 Google Trends

Claude AI: 72/100

Receipts (all sources)

Gemma 4 31B vs Gemma 4 26B-A4B vs Qwen 3.5 27B — 30-question blind eval with Claude Opus 4.6 as judge

REDDIT · r/LocalLLaMA · 13h ago · ⬆ 125 · 💬 64

score 113

Related clusters

Gemma 4 31B beats several frontier models on the FoodTruck Bench

1 source · bigness 29 · 1d ago