Big · 2 sources · last seen 17h ago · first seen 17h ago

Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA

I benchmarked vision-capable LLMs (the "just attach the PDF and let the model read it" pattern) against OCR-based pipelines on 30 long, image-heavy PDFs from MMLongBench-Doc ([https://github.com/mayubo2333/MMLongBench-Doc](https://github.com/mayubo2333/MMLongBench-Doc)). There were 171 questions in

Lead: r/artificial · Bigness: 62


📡 Coverage

2 news sources

🟠 Hacker News

🔴 Reddit

67 upvotes across 2 subs



Receipts (all sources)

Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA

REDDIT · r/artificial · 17h ago · ⬆ 40 · 💬 17

score 100

Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA

REDDIT · r/LocalLLaMA · 17h ago · ⬆ 27 · 💬 14

score 98