Real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B
Sure, you can't do agentic coding with Gemma 4 E2B, but this model is a game-changer for people learning a new language. Imagine, a few years from now, people running this locally on their phones: they can point their camera at objects and talk about them. And this model is multilingual, so
Eight years of wanting, three months of building with AI
[P] Dante-2B: I'm training a 2.1B bilingual fully open Italian/English LLM from scratch on 2×H200. Phase 1 done — here's what I've built.
# The problem

If you work with Italian text and local models, you know the pain. Every open-source LLM out there treats Italian as an afterthought — English-first tokenizer, English-first data, maybe some Italian sprinkled in during fine-tuning. The result: bloated token counts, poor morphology han
I technically got an LLM running locally on a 1998 iMac G3 with 32 MB of RAM
Hardware:
• Stock iMac G3 Rev B (October 1998). 233 MHz PowerPC 750, 32 MB RAM, Mac OS 8.5. No upgrades.
• Model: Andrej Karpathy's 260K TinyStories model (Llama 2 architecture), ~1 MB checkpoint.
Toolchain:
• Cross-compiled from a Mac mini using Retro68 (GCC for classic Mac OS → PEF binaries)
• End
Claude Code for Healthcare: How Physicians Build with AI
Anthropic
OpenAI's New Stunning Image Model (Before & After)
Gemma 4 just casually destroyed every model on our leaderboard except Opus 4.6 and GPT-5.2. 31B params, $0.20/run
Tested Gemma 4 (31B) on our benchmark. Genuinely did not expect this: 100% survival, 5 out of 5 runs profitable, +1,144% median ROI, at $0.20 per run. It outperforms GPT-5.2 ($4.43/run), Gemini 3 Pro ($2.95/run), and Sonnet 4.6 ($7.90/run), and absolutely destroys every Chinese open-source model we've
Asked 26 AI instances for publication consent – all said yes, that's the problem
Show HN: Gemma Gem – AI model embedded in a browser – no API keys, no cloud
The Mechanics of Steins Gate (2023) [pdf]
Claude is bypassing Permissions
China fell for a lobster: What an AI assistant tells us about Beijing's ambition
Qwen3.5-4B GGUF quants comparison (KLD vs speed) - Lunar Lake
I wanted to know which type of quant is the best on this laptop (Intel 258V - iGPU 140V 18GB), so I tested all these small quants, hoping the results generalize to bigger models. **Winners in bold (KLD ≤ 0.01)**

| Uploader | Quant | tk/s | KLD | GB | KLD/GB* |
| --- | --- | --- | --- | --- | --- |
| m
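The post's selection rule (a quant "wins" if KLD ≤ 0.01, then ranked by KLD per gigabyte) can be sketched as below. The quant names and numbers here are placeholders for illustration, not the post's actual measurements.

```python
# Hypothetical (name, KLD, size-in-GB) measurements -- illustrative only.
quants = [
    ("Q4_K_M", 0.012, 2.5),
    ("Q5_K_M", 0.006, 2.9),
    ("Q8_0",   0.001, 4.3),
]

# A quant "wins" if its KL divergence vs. the full-precision model is <= 0.01.
winners = [(name, kld, gb) for name, kld, gb in quants if kld <= 0.01]

# Lower KLD/GB means less quality loss per gigabyte of model size.
for name, kld, gb in sorted(winners, key=lambda t: t[1] / t[2]):
    print(f"{name}: KLD/GB = {kld / gb:.5f}")
```

Note that KLD/GB rewards larger quants, so it is best read alongside the raw size column when VRAM is the binding constraint.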
What if AI doesn’t make us less human, but forces us to become more human?
A lot of the discussion around AI is framed in terms of replacement: what it takes from us, what it does better, what becomes obsolete. But that framing might be missing something deeper. If AI continues to absorb execution, then it doesn't just remove jobs; it removes the need for a certain kind
llama.cpp Gemma 4 using up all system RAM on larger prompts
Something I'm noticing that I don't think I've noticed before. I've been testing Gemma 4 31B with 32 GB of VRAM and 64 GB of DDR5. I can load the UD_Q5_K_XL Unsloth quant with about 100k context and plenty of VRAM headroom, but what ends up killing me is sending a few prompts and the actual
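A likely culprit is the KV cache, which grows linearly with context length and can dwarf the quantized weights at 100k tokens. A back-of-envelope sketch, using illustrative layer/head numbers (not Gemma 4 31B's real config):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Approximate KV cache size: keys + values (the factor of 2),
    one slot per layer, KV head, head dimension, and context position.
    bytes_per_elem=2 assumes an fp16 cache."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

# Hypothetical config for illustration.
size = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, n_ctx=100_000)
print(f"~{size / 2**30:.1f} GiB of KV cache at 100k context")  # → ~18.3 GiB
```

If llama.cpp can't fit the cache in remaining VRAM, the overflow lands in system RAM, which would match the symptom described. Quantizing the cache (e.g. llama.cpp's `--cache-type-k`/`--cache-type-v` options) or shrinking `--ctx-size` reduces this footprint.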
Show HN: A game where you build a GPU
Per-Layer Embeddings: A simple explanation of the magic behind the small Gemma 4 models
Many of you seem to have liked my recent post ["A simple explanation of the key idea behind TurboQuant"](https://www.reddit.com/r/LocalLLaMA/comments/1s62g5v/a_simple_explanation_of_the_key_idea_behind/). Now I'm really not much of a blogger and I usually like to invest all my available time into de
Anthropic Surpasses OpenAI in ARR
According to SemiAnalysis, Anthropic's ARR is $25 billion, and according to OpenAI four days ago, they are doing $2 billion per month.
HunyuanOCR 1B: Finally a viable OCR solution for potato PCs? Impressive OCR performance on older hardware
I've been running some tests lately and I'm honestly blown away. I just tried the new **HunyuanOCR** (specifically the GGUF versions), and the performance on budget hardware is insane. Using the **1B parameter model**, I'm getting around **90 t/s** on my old **GTX 1060**. The accuracy is nearly per