Cluster · 1 source · last seen 2h ago · first seen 2h ago

llama.cpp DeepSeek v4 Flash experimental inference

Hi, [here you can find](https://github.com/antirez/llama.cpp-deepseek-v4-flash) experimental llama.cpp support for DeepSeek v4, and [here](https://huggingface.co/antirez/deepseek-v4-gguf) is the GGUF you can use to run inference with "just" (lol) 128GB of RAM. The model, even quantized at

Lead: r/LocalLLaMA · Bigness: 20 · Tags: meta, cpp, deepseek, flash, experimental
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 43 (15 upvotes across 1 sub)
📈 Google Trends: 0

Receipts (all sources)

llama.cpp DeepSeek v4 Flash experimental inference
REDDIT · r/LocalLLaMA · 2h ago · ⬆ 15 · 💬 24
score 114

