Cluster · 1 source · first seen 2h ago · last seen 2h ago
Running Llama 3.2 1B entirely on an AMD NPU on Linux (Strix Halo, IRON framework, 4.4 tok/s)
I got Llama 3.2 1B running inference entirely on the AMD NPU on Linux. Every operation (attention, GEMM, RoPE, RMSNorm, SiLU, KV cache) runs on the NPU; no CPU or GPU fallback. As far as I can tell, this is the first time anyone has publicly documented this working on Linux. ## Hardware - AMD Ryze…
Lead: r/LocalLLaMA · Bigness: 17 · Tags: running, meta, entirely, amd, npu
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 35 (13 upvotes across 1 sub)
📈 Google Trends: 0
Full methodology: How scoring works
Receipts (all sources)
Running Llama 3.2 1B entirely on an AMD NPU on Linux (Strix Halo, IRON framework, 4.4 tok/s)
REDDIT · r/LocalLLaMA · 2h ago · ⬆ 13 · 💬 2
Score: 114