Rising · 1 source · last seen 1h ago · first seen 1h ago
I built a 103B-token Usenet corpus (1980–2013) — pre-web, human-only, zero AI contamination. Got strong traction on r/ML, thought this community would find it useful.
Posted this to r/MachineLearning a couple weeks ago (30K views, 100+ upvotes) and have been meaning to share it here where the fine-tuning angle is more directly relevant. I spent years building and processing a complete Usenet corpus from 1980 to 2013. Here’s why it might matter for local model wo
Lead: r/LocalLLaMA · Bigness: 28
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 66 (127 upvotes across 1 sub)
📈 Google Trends: 0
Receipts (all sources)
REDDIT · r/LocalLLaMA · 1h ago · ⬆ 127 · 💬 56
score 130