Local LLM Hardware Guides: Which Machine Runs Which Model

Every popular local LLM, and exactly which hardware runs it. Each guide is a full fit matrix: which GPUs and machines hold the model, the best GGUF quant, the file size, and both theoretical and owner-measured tokens per second. The numbers come from the same engine as our Can I Run It? calculator, so they match the tool exactly.
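To give a feel for what a fit matrix encodes, here is a minimal sketch of the usual back-of-envelope arithmetic. It is not the calculator's actual engine. The bits-per-weight values are rounded approximations, and the 2 GB overhead allowance and the function names are assumptions made for illustration.

```python
# Approximate bits per weight for a few GGUF quants (rounded, assumed values).
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q8_0": 8.5, "F16": 16.0}


def gguf_size_gb(params_b: float, quant: str) -> float:
    """Rough weight size in GB: billions of parameters x bits per weight / 8."""
    return params_b * BITS_PER_WEIGHT[quant] / 8


def fits(total_params_b: float, quant: str, memory_gb: float,
         overhead_gb: float = 2.0) -> bool:
    """True if the weights plus an assumed allowance for KV cache and
    runtime buffers fit in the given memory."""
    return gguf_size_gb(total_params_b, quant) + overhead_gb <= memory_gb


def theoretical_tps(active_params_b: float, quant: str,
                    bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed, assuming each generated token reads
    every active weight from memory exactly once."""
    return bandwidth_gb_s / gguf_size_gb(active_params_b, quant)


if __name__ == "__main__":
    # An 8B dense model at Q4_K_M on a 12 GB card.
    print(f"{gguf_size_gb(8, 'Q4_K_M'):.1f} GB")
    print(fits(8, "Q4_K_M", memory_gb=12))
    # Hypothetical 1000 GB/s memory bandwidth: 8B dense vs. 3B active.
    print(f"{theoretical_tps(8, 'Q4_K_M', 1000):.0f} tok/s")
    print(f"{theoretical_tps(3, 'Q4_K_M', 1000):.0f} tok/s")
```

For MoE models the two inputs differ: whether the model fits depends on the total parameter count, while the speed ceiling depends on the active parameters read per token. Owner-measured speeds typically land below this ceiling.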

Not sure where to start? If you know your hardware, the fastest path is the calculator (it tests any model against your exact machine). These pages are the reverse: pick the model, see every machine that runs it.

The guides, by how much memory they need

Runs on almost any GPU (12 GB and up)

Ministral-3-3B – 3.8B dense
Granite-4.1-8B – 8B dense
Llama 3.1 8B – 8B dense
Ministral-3-8B – 9B dense
Ministral-3-14B – 14B dense

Needs about 16 GB

GPT-OSS-20B – 21B total / 3.6B active MoE
Devstral-Small-2 – 24B dense
Gemma 3 27B – 27B dense
GLM-4.7-Flash – 29.1B total / 14.6B active MoE
Granite-4.1-30B – 30B dense
Qwen3-30B-A3B – 30B total / 3B active MoE

Needs about 24 GB

Qwen3-32B – 32B dense
Mixtral 8x7B – 47B total / 13B active MoE
Kimi-Linear-48B-A3B-Instruct – 48B total / 3B active MoE

Needs 32 GB or more

Llama-3.3-70B – 70B dense

Needs a 64 GB+ unified-memory box

Mistral-Small-4 – 119B total / 6.5B active MoE
GPT-OSS-120B – 120B total / 5.1B active MoE
Mistral-Medium-3.5 – 128B dense

Frontier scale: a 256 GB+ box, a multi-GPU server, or the cloud

MiniMax-M2.7 – 229B dense
DeepSeek-V4-Flash – 284B total / 13B active MoE
GLM-4.7 – 358B MoE
MiniMax-M1-80k – 456B total / 45.9B active MoE
Nemotron-3-Ultra-550B – 550B total / 55B active MoE
DeepSeek-V3.2 – 685B total / 37B active MoE
GLM-5.2 – 753B MoE
Kimi-K2-Thinking – 1000B total / 32B active MoE
DeepSeek-V4-Pro – 1600B total / 49B active MoE

Or use the tools directly

Can I Run It? tests any model against any hardware, including your own machine.
Quant picker shows which GGUF file to download and the full quant ladder.
Cost calculator weighs buying hardware against renting a cloud GPU.
All hardware, one sortable chart compares every machine we track.
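
The buy-versus-rent comparison comes down to a break-even calculation. The sketch below is one simple way to frame it, not the cost calculator's actual method. The function name, the optional electricity term, and the example prices are all hypothetical.

```python
def break_even_hours(hardware_price: float,
                     cloud_rate_per_hour: float,
                     power_cost_per_hour: float = 0.0) -> float:
    """Hours of use after which owning costs less than renting.

    Every hour on owned hardware saves the cloud rate minus the assumed
    electricity cost of running the machine locally.
    """
    saving_per_hour = cloud_rate_per_hour - power_cost_per_hour
    if saving_per_hour <= 0:
        # Renting is never more expensive per hour, so buying never pays off.
        return float("inf")
    return hardware_price / saving_per_hour


if __name__ == "__main__":
    # Hypothetical figures: a 2000-unit machine vs. a 1.00-per-hour cloud GPU,
    # with 0.05 per hour of electricity when running locally.
    print(f"{break_even_hours(2000, 1.00, 0.05):.0f} hours")
```

This ignores resale value, depreciation, and idle time, all of which shift the break-even point.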