Every popular local LLM, and exactly which hardware runs it. Each guide is a full fit matrix: which GPUs and machines hold the model, the best GGUF quant for each, the file size, and both theoretical and owner-measured tokens per second. The numbers are computed by the same engine as our Can I Run It? calculator, so they match the tool exactly.
Not sure where to start? If you know your hardware, the fastest path is the calculator (it tests any model against your exact machine). These pages are the reverse: pick the model, see every machine that runs it.
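For a rough sense of what "holds the model" means, the sketch below estimates a GGUF file size from parameter count and bits per weight, then checks it against available memory. It is a rule-of-thumb illustration, not the calculator's actual engine: the function names, the 4.8 bits-per-weight figure (an approximation for a 4-bit K-quant), and the 1.5 GB allowance for KV cache and runtime overhead are all assumptions made for this example.

```python
def gguf_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight file size in GB (1 GB = 10^9 bytes).

    params_billion * 1e9 weights * bits_per_weight / 8 bytes each,
    divided by 1e9 bytes per GB, simplifies to the expression below.
    """
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, overhead_gb: float = 1.5) -> bool:
    """True if the weights plus an assumed overhead fit in memory_gb.

    overhead_gb is a hypothetical allowance for KV cache and runtime
    buffers; real overhead grows with context length.
    """
    return gguf_size_gb(params_billion, bits_per_weight) + overhead_gb <= memory_gb


# Example: an 8B dense model at an assumed ~4.8 bits per weight.
size = gguf_size_gb(8, 4.8)
print(f"8B at ~4.8 bpw: about {size:.1f} GB, fits in 12 GB: {fits(8, 4.8, 12)}")
```

For a mixture-of-experts model, pass the total parameter count: every expert has to be resident in memory even though only a few are used per token.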
The guides, by how much memory they need
Runs on almost any GPU (12 GB and up)
- Ministral-3-3B – 3.8B dense
- Granite-4.1-8B – 8B dense
- Llama 3.1 8B – 8B dense
- Ministral-3-8B – 9B dense
- Ministral-3-14B – 14B dense
Needs about 16 GB
- GPT-OSS-20B – 21B total / 3.6B active MoE
- Devstral-Small-2 – 24B dense
- Gemma 3 27B – 27B dense
- GLM-4.7-Flash – 29.1B total / 14.6B active MoE
- Granite-4.1-30B – 30B dense
- Qwen3-30B-A3B – 30B total / 3B active MoE
Needs about 24 GB
- Qwen3-32B – 32B dense
- Mixtral 8x7B – 47B total / 13B active MoE
- Kimi-Linear-48B-A3B-Instruct – 48B total / 3B active MoE
Needs 32 GB or more
- Llama-3.3-70B – 70B dense
Needs a 64 GB+ unified-memory box
- Mistral-Small-4 – 119B total / 6.5B active MoE
- GPT-OSS-120B – 120B total / 5.1B active MoE
- Mistral-Medium-3.5 – 128B dense
Frontier scale: a 256 GB+ box, a multi-GPU server, or the cloud
- MiniMax-M2.7 – 229B dense
- DeepSeek-V4-Flash – 284B total / 13B active MoE
- GLM-4.7 – 358B MoE
- MiniMax-M1-80k – 456B total / 45.9B active MoE
- Nemotron-3-Ultra-550B – 550B total / 55B active MoE
- DeepSeek-V3.2 – 685B total / 37B active MoE
- GLM-5.2 – 753B MoE
- Kimi-K2-Thinking – 1000B total / 32B active MoE
- DeepSeek-V4-Pro – 1600B total / 49B active MoE
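The total and active figures in the lists above matter for different reasons: total parameters set how much memory the model needs, while active parameters largely set how fast it can generate. A common back-of-envelope ceiling for decode speed is memory bandwidth divided by the bytes read per token. The sketch below applies that estimate; it is an illustration under stated assumptions, not necessarily how the guides compute their theoretical numbers. The function name, the 4.8 bits per weight, and the 400 GB/s bandwidth are hypothetical values chosen for the example.

```python
def theoretical_tps(active_params_billion: float, bits_per_weight: float,
                    bandwidth_gb_s: float) -> float:
    """Upper-bound decode speed in tokens per second.

    Assumes each generated token reads every active weight once and that
    memory bandwidth is the only bottleneck. Ignores KV cache reads,
    compute limits, and runtime overhead, so real speeds are lower.
    """
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical machine with 400 GB/s of memory bandwidth, ~4.8 bpw quant.
dense = theoretical_tps(32, 4.8, 400)  # 32B dense: all 32B weights active
moe = theoretical_tps(3, 4.8, 400)     # 30B total MoE with 3B active
print(f"32B dense ceiling: {dense:.1f} tok/s")
print(f"30B-total / 3B-active MoE ceiling: {moe:.1f} tok/s")
```

Under these assumptions the two models need similar amounts of memory, yet the MoE model's ceiling is roughly ten times higher because far fewer weights are read per token.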
Or use the tools directly
- Can I Run It? tests any model against any hardware, including your own machine.
- Quant picker shows which GGUF file to download and the full quant ladder.
- Cost calculator weighs buying hardware against renting a cloud GPU.
- All hardware, one sortable chart compares every machine we track.
