MiniMax M3: The First Open-Weight Multimodal Frontier Model (and the License Catch)

MiniMax M3 is the first open-weight frontier model that natively reads text, images and video. But it ships under a restricted license, and running 428B locally is a server-or-big-Mac job.

Thomas Newkirk June 25, 2026 4 min read

The open-weight race has mostly been a text race. MiniMax just changed that. MiniMax M3, released June 1, 2026, is the first open-weight frontier model that's natively multimodal: it reads text, images and video in one model, carries a million-token context, and lands near the top of the open-weight leaderboards. It's a genuine milestone. It also comes with two catches the launch coverage mostly skips: a license that isn't what "open" usually means, and a hardware bill to match its 428 billion parameters. We haven't run it ourselves; this synthesizes MiniMax's docs, the technical report, and the hardware math.

What it is, and what MiniMax claims

M3 is a Mixture-of-Experts model with 428B total parameters and ~23B active per token (see our MoE explainer for why that ratio is what makes a model this size movable at all). What sets it apart is that it's natively multimodal: per MiniMax, it was trained on mixed text/image/video data from step 0 rather than having vision bolted on afterward. It also ships with a 1-million-token context. On the independent Artificial Analysis index it's among the leading open-weight models, and MiniMax reports a top open-weight result on SWE-Bench Pro. The weights are on Hugging Face at MiniMaxAI/MiniMax-M3 with SGLang, vLLM and Transformers serving paths.
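To put that ratio in numbers, here is a quick back-of-envelope calculation using only the two parameter counts quoted above (MiniMax's figures, not something we measured). The function and variable names are ours, for illustration.

```python
# Figures quoted in the article (MiniMax's claims, not measured here).
TOTAL_PARAMS = 428e9   # total parameters
ACTIVE_PARAMS = 23e9   # approximate parameters active per token

def active_fraction(total: float, active: float) -> float:
    """Share of the weights that take part in computing one token."""
    return active / total

fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Active per token: {fraction:.1%} of the weights")
print(f"Roughly 1 in {round(1 / fraction)} parameters is used per token")
```

That small active share is what keeps per-token compute manageable. It does not shrink the memory bill: all 428B parameters still have to be stored somewhere the runtime can reach.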

The research: MiniMax Sparse Attention

What makes a 1M-token multimodal context tractable is an architecture change, documented in a verifiable paper: MiniMax Sparse Attention (MSA) (arXiv:2606.13392). MSA keeps a Grouped-Query-Attention backbone but layers block-level sparse selection on top, operating on real, uncompressed key-values rather than a lossy summary. The reported payoff is large: validated on a 109B-parameter MoE, MSA matches dense GQA quality while cutting per-token attention compute by 28.4× at 1M context, and with a co-designed kernel it delivers up to 14.2× faster prefill and 7.6× faster decoding on an H800 (figures from the paper and an independent analysis). This is the same class of "make long context affordable to serve" work as GLM-5.2's IndexShare: the real engineering behind the headline number.
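To make "block-level sparse selection" concrete, here is a toy sketch of the general idea in plain Python. It is not MSA itself: the block size, the number of blocks kept, and the rule for scoring blocks (dot product with each block's mean key) are all our assumptions, and the paper's selection mechanism may differ. What it does show is the shape of the technique: pick a few blocks cheaply, then run exact attention over the real keys and values inside them.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def block_sparse_attention(query, keys, values, block_size=4, top_blocks=2):
    """Attend one query over only the highest-scoring blocks of keys.

    Illustrative only: blocks are ranked by the query's dot product with
    each block's mean key, which is an assumption, not the paper's rule.
    """
    scale = 1.0 / math.sqrt(len(query))

    # 1. Split key positions into contiguous blocks.
    starts = range(0, len(keys), block_size)
    blocks = [list(range(s, min(s + block_size, len(keys)))) for s in starts]

    # 2. Rank blocks cheaply using each block's mean key.
    def block_score(idx):
        mean_key = [sum(keys[i][d] for i in idx) / len(idx)
                    for d in range(len(query))]
        return dot(query, mean_key)

    chosen = sorted(blocks, key=block_score, reverse=True)[:top_blocks]
    selected = sorted(i for idx in chosen for i in idx)

    # 3. Exact softmax attention over the real keys/values in those blocks.
    logits = [dot(query, keys[i]) * scale for i in selected]
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    output = [sum(w * values[i][d] for w, i in zip(weights, selected)) / total
              for d in range(len(values[0]))]
    return output, selected

random.seed(0)
dim, seq_len = 8, 32
keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
values = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
query = [random.gauss(0, 1) for _ in range(dim)]

out, used = block_sparse_attention(query, keys, values)
print(f"attended {len(used)} of {seq_len} positions: {used}")
```

With these toy settings the query touches 8 of 32 positions. Keeping every block (top_blocks=8 here) reduces the function to ordinary dense attention, which is a handy sanity check on the sketch.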

The catch most coverage skips: the license

"Open weights" and "open source" are not the same thing, and M3 is a clean example. The weights are downloadable, but they ship under the MiniMax Community License, not MIT or Apache, and commercial use requires a separate agreement. The local-AI community flagged this immediately: by the strict definition, critics argue, it isn't "open source" at all. To MiniMax's credit, a representative acknowledged the license "isn't perfect" and said it shouldn't be called a "modified MIT license." None of that makes M3 less impressive, but if you're building a product on it, read the license before you build, not after. (Contrast GLM-5.2's straight MIT, or Qwen3's Apache-2.0.)

Can you run it? The hard reality

Short version: not on a normal machine. 428B parameters is a lot of memory even sparsely activated:

Precision	Memory needed	What runs it
FP16 (full)	~931 GB	A multi-GPU server
Q4_K_M (4-bit)	~220–250 GB	Several RTX 5090s / A100s, or a 256 GB+ Mac
2-bit dynamic	~138 GB	A 256 GB Mac Studio or a multi-GPU rig (quality drops)

So the realistic local options, per Unsloth's GGUF notes, are a high-VRAM multi-GPU workstation or a big unified-memory Mac for the 2-bit build. Otherwise you rent GPUs by the hour or use the API. One more point is worth making: if all you want is multimodality on local hardware, M3 is the wrong tool. A small vision-language model in the 7–30B range will read your images on a single 24 GB card today; M3's multimodality is a frontier-research milestone, not a "run vision at home" upgrade.
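The arithmetic behind numbers like these is simple enough to sketch. The snippet below is a weights-only estimate: parameters times bits per weight, divided by eight. The bits-per-weight values for the quantized builds (4.5 and 2.6) and the 15% headroom are our rough assumptions, not Unsloth's published figures, and real files also carry some tensors at higher precision. Expect it to land near the table's sourced numbers, not to reproduce them exactly.

```python
TOTAL_PARAMS = 428e9  # from the article (MiniMax's figure)

def weights_gb(params: float, bits_per_weight: float) -> float:
    """Weights-only footprint in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

def fits(params: float, bits_per_weight: float, memory_gb: float,
         headroom: float = 0.15) -> bool:
    """True if the weights fit with a fraction of memory held back.

    The 15% headroom for KV cache and runtime buffers is a placeholder.
    """
    return weights_gb(params, bits_per_weight) <= memory_gb * (1 - headroom)

# Assumed average bits per weight for each build (illustrative values).
for label, bits in [("16-bit", 16), ("4.5-bit", 4.5), ("2.6-bit", 2.6)]:
    print(f"{label:>8}: ~{weights_gb(TOTAL_PARAMS, bits):.0f} GB of weights")

print("2.6-bit in 256 GB:", fits(TOTAL_PARAMS, 2.6, 256))
print("2.6-bit in 24 GB: ", fits(TOTAL_PARAMS, 2.6, 24))
```

Even at the most aggressive setting, the weights alone are several times larger than a 24 GB consumer card, which is why the options narrow to multi-GPU rigs and large unified-memory machines.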

To see exactly where your hardware lands, run it through our Can I run it? calculator; use the quant picker for the right GGUF if you do have the memory; and if you're weighing a workstation build against renting, the cost calculator shows the break-even.
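For the build-versus-rent question, the break-even logic can be sketched in a few lines. Every price below is a made-up placeholder, not a quote for any real hardware or cloud provider, and the function name is ours; substitute your own figures.

```python
def break_even_hours(hardware_cost: float, rental_per_hour: float,
                     power_cost_per_hour: float = 0.0) -> float:
    """Hours of use at which buying has cost the same as renting."""
    saving_per_hour = rental_per_hour - power_cost_per_hour
    if saving_per_hour <= 0:
        raise ValueError("owning costs at least as much per hour as renting")
    return hardware_cost / saving_per_hour

# Placeholder inputs for illustration only.
hours = break_even_hours(hardware_cost=10_000,
                         rental_per_hour=8.0,
                         power_cost_per_hour=0.5)
print(f"Break-even after ~{hours:.0f} hours of use")
```

The sketch ignores resale value, depreciation and idle time, all of which can shift the answer in either direction.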

The verdict

MiniMax M3 is a real landmark, the first open-weight frontier model you can download that genuinely sees and reads, with serious engineering (MSA) behind its long context. But for a local-AI audience it's an "admire from a distance" release: you need a server or a high-end Mac to run it at all, and a careful read of the license before you build anything commercial on it. It pushes the frontier of what open weights can do; it does not change what most people can run at home. For that, the small MoE models are still the answer.

Sources & how we researched this

We have not run MiniMax M3 first-hand. This synthesizes the Hugging Face model card and GitHub repo (specs, modalities, serving, license); the verified MiniMax Sparse Attention paper and an independent analysis (the MSA numbers); Artificial Analysis for the ranking; Unsloth's GGUF sizes for the hardware math; and community/press coverage for the license discussion. Benchmark and parameter figures are the creators'/sources' claims; treat them as directional.
