RTX 5090 vs RTX Pro 6000 for AI: A Benchmark Deep-Dive (and Why VRAM Wins)

A developer ran the RTX 5090, a 1,000W Lightning 5090, and the 96GB RTX Pro 6000 through real AI workloads. The surprising takeaway for buyers: raw VRAM beats raw wattage.

Everyone benchmarks the RTX 5090 for games. Almost nobody pushes it on AI. So when developer-YouTuber Alex Ziskind lined up a regular RTX 5090, a 1,000-watt "Lightning" 5090, and NVIDIA's RTX Pro 6000 and ran them all through local-LLM and video-generation workloads, the results were a useful reality check for anyone about to spend serious money on a desktop AI rig.

The three contenders

The test bench had three cards:

  • A regular RTX 5090 (32 GB VRAM, 600 W cap)
  • A 5090 "Lightning" (liquid-cooled, limited edition, a wild 1,000 W cap)
  • The RTX Pro 6000 Blackwell, the workstation card with 96 GB of VRAM (but, notably, also capped at 600 W)

What the benchmarks actually showed

A few findings that matter if you're buying for AI, not frame rates:

  • For prompt processing (prefill), the Pro 6000 is in another league — roughly twice as fast as either 5090.
  • But for chatting (token generation), they're basically tied at ~160 tok/s across the board, and the pricey Pro 6000 was actually the slowest by a hair. So if you just talk to models, the expensive card buys you nothing.
  • VRAM is the real story. A 14B model in full BF16 needs ~28 GB for the weights alone, so on a 32 GB 5090 there is almost nothing left for runtime overhead, let alone many simultaneous requests. Both 5090s collapsed under concurrency, while the 96 GB Pro 6000 didn't flinch (12,000+ tok/s). That headroom is what you're really paying for.
  • The Lightning's 1,000 watts is mostly theater for AI. Across every LLM test it never pulled more than ~695 W (inference is memory-bound); video generation topped out near 650 W. Only a pure matrix-multiply torture test actually hit 1,000 W. The upside: it runs ~20° cooler and quieter than a stock 5090.
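
To see why the 14B BF16 case is so tight on 32 GB, here is a rough back-of-the-envelope sketch of the VRAM math. Only the 14B parameter count, BF16 (2 bytes per weight), and the 32 GB and 96 GB card sizes come from the benchmarks above. The layer count, KV-head count, head size, context length, and 2 GB overhead figure are illustrative assumptions, not numbers from the video, so treat the output as a ballpark rather than a measurement.

```python
# Rough VRAM budget for serving a model: weights + KV cache + runtime overhead.
# All sizes use 1 GB = 1e9 bytes.

def weights_gb(params_billion=14.0, bytes_per_param=2):
    """Weight size in GB. BF16 stores each parameter in 2 bytes."""
    return params_billion * bytes_per_param


def kv_cache_gb(context_tokens, sequences,
                layers=48, kv_heads=8, head_dim=128,  # assumed model shape
                bytes_per_value=2):
    """KV-cache size in GB for `sequences` concurrent requests."""
    # Each token stores one key and one value vector per layer per KV head.
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token_bytes * context_tokens * sequences / 1e9


def vram_needed_gb(context_tokens, sequences, overhead_gb=2.0):
    """Total estimate; overhead_gb is a guess for runtime buffers and the CUDA context."""
    return weights_gb() + kv_cache_gb(context_tokens, sequences) + overhead_gb


if __name__ == "__main__":
    for sequences in (1, 16):
        need = vram_needed_gb(context_tokens=8192, sequences=sequences)
        for card, vram in (("RTX 5090", 32), ("RTX Pro 6000", 96)):
            verdict = "fits" if need <= vram else "does not fit"
            print(f"{sequences:>2} concurrent: ~{need:.1f} GB needed, "
                  f"{card} ({vram} GB) {verdict}")
```

With these assumed numbers, a single 8K-token conversation lands just under 32 GB, while 16 concurrent ones need roughly 56 GB. That is comfortable on a 96 GB card and impossible on a 32 GB one. Swap in your own model's shape and context length before drawing conclusions.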

What viewers are saying

  • "Been wanting to see this exact setup comparison — thank you again!" — @JustinFYI
  • "The RTX Pro 6000 runs games just fine as well, you know. Thanks for the video, Alex." — @b1lleman
  • "Excellent video, as always." — @accuratecalcs

The bottom line

If you're training models or running sustained, pure-compute workloads (and want a card that stays cool and quiet), the 1,000 W RTX 5090 Lightning earns its keep. But for most local-AI work — inference and image/video generation — that extra wattage sits unused, and the thing that actually unlocks bigger models is raw VRAM. That's why, for AI-only builds, the 96 GB RTX Pro 6000 keeps winning despite being the "slower" card on paper. Watch Alex's full breakdown above for the numbers.