The open-weight race has mostly been a text race. MiniMax just changed that. MiniMax M3, released June 1, 2026, is the first open-weight frontier model that's natively multimodal: it reads text, images and video in one model, carries a million-token context, and lands near the top of the open-weight leaderboards. It's a genuine milestone. It also comes with two catches the launch coverage mostly skips: a license that isn't what "open" usually means, and a hardware bill to match its 428 billion parameters. We haven't run it ourselves; this synthesizes MiniMax's docs, the technical report, and the hardware math.
What it is, and what MiniMax claims
M3 is a Mixture-of-Experts model with 428B total parameters and ~23B active per token (see our MoE explainer for why that ratio is what makes a model this size movable at all). What sets it apart is that it's natively multimodal: per MiniMax, it was trained on mixed text/image/video data from step 0 rather than having vision bolted on afterward. It also ships a 1-million-token context. On the independent Artificial Analysis index it's among the leading open-weight models, and MiniMax reports a top open-weight result on SWE-Bench Pro. The weights are on Hugging Face at MiniMaxAI/MiniMax-M3 with SGLang, vLLM and Transformers serving paths.
The research: MiniMax Sparse Attention
What makes a 1M-token multimodal context tractable is an architecture change, documented in a verifiable paper: MiniMax Sparse Attention (MSA) (arXiv:2606.13392). MSA keeps a Grouped-Query-Attention backbone but layers block-level sparse selection on top, operating on real, uncompressed key-values rather than a lossy summary. The reported payoff is large: validated on a 109B-parameter MoE, MSA matches dense GQA quality while cutting per-token attention compute by 28.4× at 1M context, and with a co-designed kernel it delivers up to 14.2× faster prefill and 7.6× faster decoding on an H800 (analysis here). This is the same class of "make long context affordable to serve" work as GLM-5.2's IndexShare: the real engineering behind the headline number.
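To make "block-level sparse selection over real key-values" concrete, here is a minimal pure-Python sketch for a single query and a single head. It is not MiniMax's implementation: the function name, the mean-key block scoring, and the `block_size` and `top_k` parameters are all assumptions for illustration, and the paper's actual selection rule, GQA layout, and kernel are not reproduced here.

```python
import math
import random

def block_sparse_attention(q, keys, values, block_size=4, top_k=2):
    """Attend one query over only the top_k highest-scoring key blocks.

    q: list of d floats; keys, values: lists of n vectors, with n divisible
    by block_size. Scoring blocks by their mean key is an illustrative
    assumption, not the selection rule from the MSA paper.
    """
    d = len(q)
    n_blocks = len(keys) // block_size

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # 1. Score each block with a cheap summary (used for selection only).
    block_scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        mean_key = [sum(col) / block_size for col in zip(*block)]
        block_scores.append(dot(q, mean_key))

    # 2. Keep the top_k blocks, restored to their original order.
    ranked = sorted(range(n_blocks), key=lambda b: block_scores[b])
    chosen = sorted(ranked[-top_k:])

    # 3. Gather the real, uncompressed keys/values of the chosen blocks.
    token_idx = [b * block_size + i for b in chosen for i in range(block_size)]

    # 4. Ordinary softmax attention restricted to those tokens.
    logits = [dot(q, keys[t]) / math.sqrt(d) for t in token_idx]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out = [sum(w * values[t][j] for w, t in zip(weights, token_idx)) / total
           for j in range(len(values[0]))]
    return out, token_idx

random.seed(0)
d, n = 8, 16
q = [random.gauss(0, 1) for _ in range(d)]
keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

out, used = block_sparse_attention(q, keys, values, block_size=4, top_k=2)
print(len(used), "of", n, "tokens attended")
```

The summary is only used to pick blocks; the attention itself runs on the original keys and values, which is the distinction the paragraph above draws against lossy-summary approaches. With `top_k` equal to the number of blocks, the sketch reduces to ordinary dense attention.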
The catch most coverage skips: the license
"Open weights" and "open source" are not the same thing, and M3 is a clean example. The weights are downloadable, but they ship under the MiniMax Community License, not MIT or Apache, and commercial use requires a separate agreement. The local-AI community flagged this immediately: by the strict definition, critics argue, it isn't "open source" at all. To MiniMax's credit, a representative acknowledged the license "isn't perfect" and said it shouldn't be called a "modified MIT license." None of that makes M3 less impressive, but if you're building a product on it, read the license before you build, not after. (Contrast GLM-5.2's straight MIT, or Qwen3's Apache-2.0.)
Can you run it? The hard reality
Short version: not on a normal machine. Only ~23B parameters are active per token, but all 428B have to sit in memory, so the footprint is set by the total:
| Precision | Memory needed | What runs it |
|---|---|---|
| FP16 (full) | ~931 GB | A multi-GPU server |
| Q4_K_M (4-bit) | ~220–250 GB | Several RTX 5090s / A100s, or a 256GB+ Mac |
| 2-bit dynamic | ~138 GB | A 256GB Mac Studio or a multi-GPU rig; quality drops |
So the realistic local options, per Unsloth's GGUF notes, are a high-VRAM multi-GPU workstation or a big unified-memory Mac for the 2-bit build; otherwise you rent the GPUs by the hour or use the API. One more point worth making: if all you want is multimodality on local hardware, M3 is the wrong tool. A small vision-language model in the 7–30B range will read your images on a single 24 GB card today; M3's multimodality is a frontier-research milestone, not a "run vision at home" upgrade.
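The memory figures in the table above can be roughly sanity-checked with parameters times bits per weight. The sketch below is a weights-only back-of-envelope: it ignores the KV cache, activations, and runtime overhead, and the effective bits-per-weight values for the quantized builds are assumptions (real GGUF files mix precisions). Expect ballpark agreement with the sourced figures, not an exact match.

```python
def weight_memory_gb(total_params, bits_per_weight):
    """Weights-only footprint in decimal GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 428e9   # total parameters, as stated in the article
ACTIVE_PARAMS = 23e9   # approximate active parameters per token, as stated

# Effective bits per weight are assumptions for illustration only.
ASSUMED_BITS = {
    "FP16": 16.0,
    "4-bit (assumed ~4.5 bits/weight)": 4.5,
    "2-bit dynamic (assumed ~2.6 bits/weight)": 2.6,
}

for name, bits in ASSUMED_BITS.items():
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    print(f"{name}: ~{gb:.0f} GB of weights")

# Compute per token scales with the active parameters; memory does not.
print(f"active fraction per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Memory follows the total parameter count, while per-token compute follows the much smaller active count. That is why a sparse model can generate quickly once loaded yet still needs server-class memory to load at all.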
To see exactly where your hardware lands, run it through our Can I run it? calculator; use the quant picker for the right GGUF if you do have the memory; and if you're weighing a workstation build against renting, the cost calculator shows the break-even.
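For a rough feel of the buy-versus-rent break-even before reaching for a calculator, the arithmetic is one division. This is a minimal sketch, not the logic of any particular tool: the function name is hypothetical, and the prices are placeholders to be replaced with real quotes.

```python
def break_even_hours(hardware_cost, rental_per_hour, power_cost_per_hour=0.0):
    """Hours of use after which buying is cheaper than renting."""
    saving_per_hour = rental_per_hour - power_cost_per_hour
    if saving_per_hour <= 0:
        # Running your own box costs as much per hour as renting: no break-even.
        return float("inf")
    return hardware_cost / saving_per_hour

# Placeholder prices for illustration only; substitute real quotes.
hours = break_even_hours(hardware_cost=10_000,
                         rental_per_hour=8.0,
                         power_cost_per_hour=0.5)
print(f"break-even after ~{hours:.0f} hours of use")
```

The sketch leaves out resale value, depreciation, and idle time, all of which shift the answer, so treat the result as a first filter.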
The verdict
MiniMax M3 is a real landmark: the first open-weight frontier model you can download that genuinely sees and reads, with serious engineering (MSA) behind its long context. But for a local-AI audience it's an "admire from a distance" release: you need a server or a high-end Mac to run it at all, and a careful read of the license before you build anything commercial on it. It pushes the frontier of what open weights can do; it does not change what most people can run at home. For that, the small MoE models are still the answer.
Sources & how we researched this
We have not run MiniMax M3 first-hand. This synthesizes the Hugging Face model card and GitHub repo (specs, modalities, serving, license); the verified MiniMax Sparse Attention paper and an independent analysis (the MSA numbers); Artificial Analysis for the ranking; Unsloth's GGUF sizes for the hardware math; and community/press coverage for the license discussion. Benchmark and parameter figures are the creators'/sources' claims; treat them as directional.