Cluster · 1 source · last seen 3h ago · first seen 3h ago
Gemma-4-31B NVFP4 inference numbers on 1x RTX Pro 6000
Ran a quick inference sweep on Gemma 4 31B in NVFP4 (using [nvidia/Gemma-4-31B-IT-NVFP4](https://huggingface.co/nvidia/Gemma-4-31B-IT-NVFP4)). The NVFP4 checkpoint is 32GB, half the size of the BF16 checkpoint from Google (63GB), which suggests a mix of BF16 and FP4 that lands at roughly FP8 size overall. This model uses a ton of V…
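A quick back-of-envelope sketch of where that 32GB figure sits (my own arithmetic, not from the post): assuming ~31e9 weights, and assuming NVFP4 stores 4-bit values plus one FP8 scale per 16-element block (~4.5 effective bits per weight), a pure-NVFP4 checkpoint would come out far smaller than 32GB, which is consistent with the post's guess that some tensors stay in BF16.

```python
# Back-of-envelope checkpoint sizes for a ~31B-parameter model.
# Assumptions (mine, not from the post): ~31e9 weights, and NVFP4 at
# ~4.5 bits/weight (4-bit values + one FP8 scale per 16-element block).
PARAMS = 31e9
GB = 1e9

def size_gb(bits_per_weight: float) -> float:
    """Checkpoint size in GB for a given average bits-per-weight."""
    return PARAMS * bits_per_weight / 8 / GB

for name, bits in [("BF16", 16), ("FP8", 8), ("NVFP4 (4b + block scales)", 4.5)]:
    print(f"{name:28s} ~{size_gb(bits):5.1f} GB")

# BF16  ~62.0 GB -> matches the reported 63GB Google checkpoint.
# FP8   ~31.0 GB -> close to the observed 32GB NVFP4 file.
# NVFP4 ~17.4 GB -> a pure-FP4 model can't explain 32GB, so some
#                   tensors (e.g. embeddings) are presumably kept in BF16.
```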
Lead: r/LocalLLaMA · Bigness: 21 · tags: gemma-4-31b, nvfp4, inference, numbers, rtx
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 47 (29 upvotes across 1 sub)
📈 Google Trends: 0
Full methodology: How scoring works
Receipts (all sources)
Gemma-4-31B NVFP4 inference numbers on 1x RTX Pro 6000
REDDIT · r/LocalLLaMA · 3h ago · ⬆ 29 · 💬 15
score 118