On June 26, 2026, OpenAI previewed GPT-5.6, a three-model family (Sol, Terra, and Luna) that it calls its most capable work yet. Here is the catch most coverage buried: you cannot use it. The preview went to a small group of partners whose participation was disclosed to the U.S. government, pending a national-security review, with general availability promised only in "the coming weeks." Even once it opens up, it is a closed, API-only model: you rent it by the token, and every prompt you send leaves your machine.
So this is a fair moment to ask the question this site exists to answer. If the frontier just moved again behind a paywall and a government gate, how far behind is the model you can download and run yourself today, what does it take to run it, and what does that cost compared with paying GPT-5.6 rates? We have not run GPT-5.6 (almost nobody has), so what follows synthesizes the announcement, independent capability tracking, and the hardware math, with every source linked.
What GPT-5.6 is, and why you can't have it
GPT-5.6 comes in three tiers: Sol (the flagship, for the hardest workloads), Terra (balanced, for everyday work), and Luna (fast and cheap). OpenAI claims gains in coding, biology, and cybersecurity, and Sol adds two new reasoning settings: a "max" effort that lets it think longer, and an "ultra" mode that spins up subagents to chew through complex tasks. The published API pricing, per million tokens:
| Model | Input / 1M | Output / 1M |
|---|---|---|
| GPT-5.6 Sol | $5.00 | $30.00 |
| GPT-5.6 Terra | $2.50 | $15.00 |
| GPT-5.6 Luna | $1.00 | $6.00 |
The release itself is the headline. As Axios and CNN reported, the model is going out under restrictions, to a short list of approved organizations, while it clears additional national-security review. For everyone else, GPT-5.6 is a press release. That is the practical backdrop for going local: the strongest closed models are getting harder to access, more expensive per token, and more tightly coupled to sending your data to someone else's servers.
How far behind is open, really?
Less than you would guess. The cleanest measurement comes from Epoch AI, which tracks a single Capabilities Index across models. Their finding for 2026: since January, the best open-weight models have trailed the best closed models by an average of about four months, or roughly 8 index points (measured January 1 to May 28, 2026). That gap has widened by a month versus their 2023-2025 reading, so this is not a victory lap. But four months is a remarkably short lag for something you can download for free and run on your own hardware.
The gap is also uneven by task. A Q2 2026 analysis by Digital Applied found that on coding benchmarks the gap has "effectively closed," while closed models still lead on reasoning by a few points (3 to 8 points on tests like GPQA Diamond and Humanity's Last Exam) and keep a clearer lead on multimodal. In concrete terms from that same write-up: an open model like Gemma 4 31B already posts 84.3% on GPQA Diamond, and an open coding model like MiniMax M2.7 lands at 56.22% on SWE-Pro. These are not frontier-beating numbers, but for most real work they are good enough that the closed-model premium no longer pays for itself.
The open models leading right now
Mid-2026, the front of the open pack is crowded, and most of it ships from Chinese labs. The current standouts, by reputation on the open-weight leaderboards (the Artificial Analysis Intelligence Index and community boards like BenchLM):
- GLM-5.2 (Z.ai), a roughly 750-billion-parameter mixture-of-experts model that tops several open-weight intelligence rankings. Running it locally is brutal, and that is the whole story; we covered it separately.
- DeepSeek-V4, the value leader, frontier-class on coding for a fraction of closed-model pricing.
- Kimi K2 (Moonshot), the agentic-stability pick, consistent tool calling across long sessions.
- Qwen3 (Alibaba), the strongest open family on math and reasoning, and the source of the small, fast 30B-class MoE models that real people run at home.
- MiniMax M3, which we flagged as the first open-weight model to combine frontier coding, long context, and native multimodality in one release.
The catch, and the reason this is a hardware site and not a leaderboard site: the models at the very top of that list are enormous. A 750-billion-parameter model does not fit on your gaming GPU.
What you can run, in two tiers
Sort the runnable options into two buckets.
Tier 1: frontier-class open, but you need a real machine (or you rent). GLM-5.2, DeepSeek-V4, and Kimi K2 are the models that get closest to GPT-5.6, and they are the hardest to host. Even compressed to 4-bit, a mixture-of-experts model in the 700-billion-parameter range needs hundreds of gigabytes of fast memory (at half a byte per weight, about 350 GB for the weights alone), which means a multi-GPU server, a maxed-out unified-memory box, or renting cloud GPUs by the hour. Our per-model pages lay out the specifics: GLM-5.2, DeepSeek-V4, and Kimi K2. For most individuals, the real verdict on this tier is: run it on rented hardware when you need it, and read our note on when renting beats owning.
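The "hundreds of gigabytes" figure falls out of simple arithmetic. Here is a minimal sketch, assuming a flat bits-per-weight number and counting the weights only; the function name is ours, and real 4-bit quant files typically average a little more than 4 bits per weight and need extra room for the KV cache and runtime overhead.

```python
def weight_memory_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every weight is stored at the same bit width. KV cache,
    activations, and runtime overhead are NOT included.
    """
    total_bytes = total_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Parameter counts in the range the article discusses for Tier 1 models.
for params_b in (700, 750):
    print(f"{params_b}B params at 4-bit: ~{weight_memory_gb(params_b, 4):.0f} GB")
    print(f"{params_b}B params at 8-bit: ~{weight_memory_gb(params_b, 8):.0f} GB")
```

Treat the result as a floor, not a fit guarantee: if the number is already larger than your combined fast memory, the model will not load without offloading.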
Tier 2: very good, and it fits on hardware you can buy. This is where local AI gets real for normal budgets. The sweet spot in mid-2026 is the mixture-of-experts design, where a model is large in total but only activates a few billion parameters per token, so it loads big but runs fast.
| Model class | What it takes | Gets you |
|---|---|---|
| gpt-oss-120B (MoE) | A 64-128 GB unified-memory box, an 80 GB card, or dual 24 GB GPUs with some of the model offloaded to system RAM | The strongest "runs at home" open model, frontier-adjacent on many tasks |
| 30-35B-A3B (MoE), e.g. Qwen3 / AgentWorld | A 24 GB card or a 32 GB-plus unified machine | Fast, capable daily-driver quality at low cost |
| 70B dense, e.g. Llama 3.3 | 48 GB across two cards, or a 64 GB Mac | Solid general model, slower than the MoEs |
None of these will top GPT-5.6 Sol on a hard reasoning benchmark. All of them will handle the bulk of everyday coding, writing, summarizing, and retrieval that most people reach for a frontier API to do. The mechanics of why "120B" can fit where "70B" struggles come down to active parameters and quantization, which we walk through in the MoE explainer, the quantization guide, and the VRAM-sizing explainer.
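A rough way to see why the mixture-of-experts rows run faster than the dense one: token generation is usually limited by how many bytes of weights must be read from memory per token, and an MoE reads only its active parameters. The sketch below uses that rule of thumb. The 1,000 GB/s bandwidth and the flat 4 bits per weight are our illustrative assumptions, not measurements, and the result is an upper bound that ignores KV cache reads, compute, and runtime overhead.

```python
def decode_tokens_per_sec_upper_bound(
    active_params_billion: float,
    bits_per_weight: float,
    mem_bandwidth_gb_s: float,
) -> float:
    """Bandwidth-bound ceiling on decode speed.

    Assumption: each generated token requires reading every ACTIVE weight
    once from memory, and nothing else limits throughput.
    """
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return mem_bandwidth_gb_s * 1e9 / bytes_per_token


BANDWIDTH_GB_S = 1000  # hypothetical memory bandwidth, for illustration only

moe_a3b = decode_tokens_per_sec_upper_bound(3, 4, BANDWIDTH_GB_S)     # ~3B active
dense_70b = decode_tokens_per_sec_upper_bound(70, 4, BANDWIDTH_GB_S)  # all 70B active

print(f"30B-A3B MoE ceiling: ~{moe_a3b:.0f} tok/s")
print(f"70B dense ceiling:   ~{dense_70b:.0f} tok/s")
print(f"Ratio: ~{moe_a3b / dense_70b:.1f}x")
```

Real throughput lands well below these ceilings, but the ratio is the point: memory capacity is set by total parameters, while speed tracks active parameters.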
The cost case: self-host versus GPT-5.6 rates
This is where the closed frontier looks least attractive for steady, high-volume use. Take a heavy workload of 20 million output tokens a month (an agent loop, a coding assistant running all day). At GPT-5.6 Sol's $30 per million, that is $600 a month in output alone, before input tokens. Even Luna, the cheap tier, runs $120 a month at that volume.
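To redo that math for your own volume, here is a small sketch using the per-tier prices from the table earlier in this piece. The function and dictionary names are ours, and the 60 million input tokens in the second example is a made-up figure to show how input adds to the bill.

```python
# (input USD per 1M tokens, output USD per 1M tokens), from the pricing table above
GPT56_PRICES = {
    "Sol": (5.00, 30.00),
    "Terra": (2.50, 15.00),
    "Luna": (1.00, 6.00),
}


def monthly_api_cost(tier: str, input_tokens_m: float, output_tokens_m: float) -> float:
    """Monthly API bill in USD for token volumes given in millions."""
    input_price, output_price = GPT56_PRICES[tier]
    return input_price * input_tokens_m + output_price * output_tokens_m


# Output-only cost at the article's 20M output tokens per month.
for tier in GPT56_PRICES:
    print(f"{tier}: ${monthly_api_cost(tier, 0, 20):.2f} output only")

# Hypothetical: the same workload with 60M input tokens added.
print(f"Sol with input: ${monthly_api_cost('Sol', 60, 20):.2f}")
```

Agent loops tend to resend large contexts, so the input side is often not negligible; plug in your own ratio.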
Now run a Tier 2 model locally. A 30B-A3B model on a single high-end consumer GPU generates on the order of 100 tokens per second. One million tokens then takes about 10,000 seconds, or just under three hours; at roughly 575 watts that is about 1.6 kWh, and at about $0.17 per kWh it comes to roughly $0.27 of electricity per million tokens. Twenty million tokens a month costs you roughly $5 to $6 in power. The trade is upfront: you buy the hardware once, then the marginal cost per token is rounding error, and your data never leaves the building. That privacy point is not abstract here: the whole reason GPT-5.6 is gated is who gets to see what it can do.
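The electricity figure is easy to recompute for your own setup. This is a minimal sketch of the arithmetic, assuming constant throughput and constant power draw for the whole run; the function name is ours, and the default inputs are the article's example numbers, not measurements of any specific machine.

```python
def electricity_cost_per_million_tokens(
    tokens_per_sec: float,
    watts: float,
    usd_per_kwh: float,
) -> float:
    """Electricity cost in USD to generate one million tokens.

    Assumption: throughput and power draw stay constant while generating.
    """
    hours = 1_000_000 / tokens_per_sec / 3600
    kwh = watts / 1000 * hours
    return kwh * usd_per_kwh


per_million = electricity_cost_per_million_tokens(
    tokens_per_sec=100, watts=575, usd_per_kwh=0.17
)
print(f"Per 1M tokens: ${per_million:.2f}")
print(f"Per 20M tokens: ${per_million * 20:.2f}")
```

Measure power at the wall if you can: a whole system draws more than the GPU alone, and real throughput drops as context grows.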
The break-even depends entirely on your volume, your hardware price, and your local power cost, so do not take our example as your answer. Put your real numbers into the cost calculator, which compares buy versus rent versus API directly. For bursty or occasional heavy use, renting a cloud GPU often wins; for steady daily use, owned hardware usually pays for itself within months.
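For a first pass before you open the calculator, the break-even reduces to one division. The sketch below is ours, not the calculator's logic, and the $3,000 hardware price is a placeholder assumption; it also treats the local model as a drop-in replacement for the API tier, which is a quality judgment you have to make yourself.

```python
from typing import Optional


def breakeven_months(
    hardware_usd: float,
    monthly_api_usd: float,
    monthly_power_usd: float,
) -> Optional[float]:
    """Months until owned hardware costs less than the API.

    Returns None when local power costs as much as the API or more,
    because the hardware then never pays for itself.
    """
    monthly_savings = monthly_api_usd - monthly_power_usd
    if monthly_savings <= 0:
        return None
    return hardware_usd / monthly_savings


HARDWARE_USD = 3000.0      # hypothetical hardware price, replace with yours
MONTHLY_POWER_USD = 5.43   # about 20M tokens at the article's example power cost

vs_sol = breakeven_months(HARDWARE_USD, 600.0, MONTHLY_POWER_USD)
vs_luna = breakeven_months(HARDWARE_USD, 120.0, MONTHLY_POWER_USD)
print(f"Versus Sol output pricing:  {vs_sol:.1f} months")
print(f"Versus Luna output pricing: {vs_luna:.1f} months")
```

With these placeholder inputs the payback is roughly five months against Sol's output pricing and roughly 26 months against Luna's, which is why the tier you would otherwise be paying for matters as much as the hardware price.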
So what should you do?
| If you... | The move |
|---|---|
| Need the absolute best and can send data to an API | Wait for GPT-5.6 general access (or use the current closed frontier). The top is still closed. |
| Want frontier-adjacent quality and own a 24 GB-plus GPU or a 64 GB-plus unified box | Run gpt-oss-120B or a 30-35B-A3B model locally. This covers most real work. |
| Want the strongest open model and have a server budget, or rent | GLM-5.2, DeepSeek-V4, or Kimi K2 on multi-GPU or rented cloud. |
| Care most about privacy or steady high volume | Self-host. The per-token cost and the data-control case both favor local. |
The pattern of 2026 is clear enough: the closed frontier keeps inching ahead and getting harder to reach, while open weights stay roughly four months behind and run on hardware you control. GPT-5.6 being locked to a handful of approved companies is the sharpest version of that story yet. The good news is that "four months behind the frontier" is, for the first time, something you can download and run in your own house.
Sources and how we researched this
We have not tested GPT-5.6 first-hand. This piece synthesizes: OpenAI's GPT-5.6 Sol preview and the reporting on its restricted release from Axios, CNN, and 9to5Mac (the source for the per-tier pricing); the open-versus-closed gap measurement from Epoch AI; the per-task gap breakdown and the cited open-model scores from Digital Applied; and open-weight rankings from the Artificial Analysis Intelligence Index and BenchLM. The hardware fit and cost figures are our own calculations using the model sizes and the assumptions stated above. Your numbers will vary by model, quant, runtime, and power cost.
Related guides
- Can I run it? calculator, check whether a specific open model fits your exact machine
- Quant picker, find the right GGUF file to download for your hardware
- Cost calculator, buy vs rent vs API for any model and workload
- Mixture-of-Experts, explained, why a 120B model can run where a 70B struggles