Rising · 1 source · last seen 6h ago · first seen 6h ago

We gave 12 LLMs a startup to run for a year. GLM-5 nearly matched Claude Opus 4.6 at 11× lower cost.

We built **YC-Bench**, a benchmark where an LLM plays CEO of a simulated startup over a full year (\~hundreds of turns). It manages employees, picks contracts, handles payroll, and survives a market where \~35% of clients secretly inflate work requirements after you accept their task. Feedback is de
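To make the "clients secretly inflate work requirements" mechanic concrete, here is a minimal toy sketch in Python. It is not YC-Bench's actual code: the function name `simulate_contracts`, the `inflation_factor` of 1.5, and the quoted-workload range are all hypothetical values chosen for illustration. Only the roughly 35% inflation rate comes from the post.

```python
import random


def simulate_contracts(n_contracts=1000, inflate_rate=0.35,
                       inflation_factor=1.5, seed=0):
    """Toy model of accepting contracts when some clients inflate the work.

    Assumptions (not from the post): workloads are uniform in [10, 100]
    and an inflating client multiplies the quoted work by inflation_factor.
    """
    rng = random.Random(seed)
    quoted_total = 0.0   # work the agent planned for when accepting
    actual_total = 0.0   # work actually required after acceptance
    inflated = 0
    for _ in range(n_contracts):
        quoted = rng.uniform(10, 100)
        quoted_total += quoted
        if rng.random() < inflate_rate:
            # Client inflates the requirement after the task is accepted.
            inflated += 1
            actual_total += quoted * inflation_factor
        else:
            actual_total += quoted
    return {
        "inflated_share": inflated / n_contracts,
        "overrun": actual_total / quoted_total - 1,  # fractional extra work
    }


if __name__ == "__main__":
    print(simulate_contracts())
```

Under these assumed numbers, an agent that trusts every quote ends up with noticeably more work than it planned for, which is the kind of pressure the benchmark description suggests the LLM has to detect and manage.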

Lead: r/LocalLLaMA · Bigness: 26 · Keywords: gave, llms, run, glm-5, nearly
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 62 (105 upvotes across 1 sub)
📈 Google Trends: 0
