Cluster: 1 source · last seen 2h ago · first seen 2h ago

[Qwen3.6 35b a3b] Used the top config for my setup (8 GB VRAM and 32 GB RAM) and found that, somehow, the Q4_K_XL model from Unsloth runs slightly faster and uses fewer output tokens than Q4_K_M, despite higher memory usage.

Config

* CtxSize: 131,072
* GpuLayers: 99
* CpuMoeLayers: 38
* Threads: 16
* BatchSize/UBatchSize: 4096/4096
* CacheType K/V: q8_0
* Tool Context: file mode (tools.kilocode.official.md)

|Metric|M Model|XL Model|Difference|
|:-|:-|:-|:-|
|**Avg Tokens/sec**|28.92|29.78|**+0.86 (+3.0%)**|
|**Median
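For reference, the settings above map onto standard llama.cpp server flags. This is a minimal sketch, not the poster's exact command; the model filename is an assumption, and `--n-cpu-moe` is the llama.cpp option for keeping MoE expert layers on the CPU (here matching the 38 CpuMoeLayers):

```shell
# Hypothetical llama-server invocation matching the posted config.
# Model filename is an assumption; flags are standard llama.cpp options.
llama-server \
  -m Qwen3-35B-A3B-Q4_K_XL.gguf \
  --ctx-size 131072 \
  --n-gpu-layers 99 \
  --n-cpu-moe 38 \
  --threads 16 \
  --batch-size 4096 \
  --ubatch-size 4096 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```

Quantizing the KV cache to q8_0 is what makes the 131,072-token context feasible alongside only 8 GB of VRAM.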

Lead: r/LocalLLaMA · Bigness: 14
📡 Coverage: 10 (1 news source)
🟠 Hacker News: 0
🔴 Reddit: 25 (6 upvotes across 1 sub)
📈 Google Trends: 0
Full methodology: How scoring works
