How we research and score hardware

Vetted Consumer reviews local-LLM hardware without testing most of it first-hand, and we say so on every page. What we do instead is aggregation done carefully: we cross-check what owners report against independent lab tests and the published research, cite everything, and keep the incentives clean. This page is the full method.

Where our information comes from

Owner reports: threads on r/LocalLLaMA, r/LocalLLM, r/ollama, r/MachineLearning, and product-specific communities (r/MacStudio, r/framework, NVIDIA Jetson forums). We prioritize accounts from people who own the hardware, especially people who own both sides of a comparison, and we link every quote to its source so you can verify it.
Independent technical testing: first-hand lab benchmarks from outlets like The Register, Phoronix, Chips and Cheese, ServeTheHome, and established YouTube reviewers. These are labeled as lab tests, distinct from owner sentiment.
Primary sources: research papers (including arXiv preprints, with IDs we verify before citing), official vendor spec sheets and documentation, and project docs for llama.cpp, vLLM, Ollama, and MLX.

How sources are weighted

Specs come from vendors. Performance claims need at least one of: multiple independent owner reports that agree, or an independent lab test. Where owners and reviewers disagree, we say so. Single-seed benchmarks, marketing numbers, and unverifiable claims get flagged or excluded; we have skipped widely shared benchmark posts because their own comment sections showed the methodology was unsound. We always include critical threads for balance: if owners are unhappy, that goes in the article.
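As a rough illustration, the evidence rule for performance claims could be expressed like this in Python. The class and field names are hypothetical, and "multiple" is assumed here to mean at least two agreeing reports.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """A performance claim and the evidence behind it (hypothetical fields)."""
    text: str
    agreeing_owner_reports: int = 0   # independent owner reports that agree
    independent_lab_tests: int = 0    # first-hand lab tests that support it


def is_publishable(claim: Claim, min_owner_reports: int = 2) -> bool:
    """Accept a claim backed by multiple agreeing owner reports or a lab test.

    The default threshold of 2 is an assumption about what "multiple" means.
    """
    return (
        claim.agreeing_owner_reports >= min_owner_reports
        or claim.independent_lab_tests >= 1
    )
```

Under these assumptions, a claim with a single owner report and no lab test would be flagged rather than published.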

Scoring

Products in our database carry a 0–10 score only after a full review. Until then they're marked pending review and show no number; we don't publish fake precision. Scores weigh: local-LLM capability (memory capacity and bandwidth, what it can run, at what speed), value at the verified price, owner sentiment over time, software/ecosystem maturity, and efficiency/noise for always-on use.

Concretely, a score is a weighted blend: 30% capability (memory capacity and bandwidth, computed from specs), 25% value (capability per dollar at the verified price), 20% owner sentiment and 15% ecosystem maturity (the two human-judgment inputs, each with a written rationale), and 10% efficiency (power draw, for always-on use). The spec-driven parts recompute mechanically whenever prices or specs change; only the two judgment inputs are set by hand, and we publish them alongside the data.
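A minimal sketch of that blend, using the weights stated in the paragraph above. The function name, the dictionary keys, and the assumption that every input is already normalized to a 0 to 10 scale are illustrative; the example values are made up.

```python
# Weights as stated in the paragraph above; they sum to 1.0.
WEIGHTS = {
    "capability": 0.30,
    "value": 0.25,
    "owner_sentiment": 0.20,
    "ecosystem": 0.15,
    "efficiency": 0.10,
}


def blended_score(inputs: dict[str, float]) -> float:
    """Weighted blend of five inputs, each assumed to be on a 0-10 scale."""
    missing = WEIGHTS.keys() - inputs.keys()
    if missing:
        raise ValueError(f"missing inputs: {sorted(missing)}")
    return round(sum(WEIGHTS[k] * inputs[k] for k in WEIGHTS), 1)


example = {  # made-up values, for illustration only
    "capability": 8.0,
    "value": 6.0,
    "owner_sentiment": 7.0,
    "ecosystem": 8.0,
    "efficiency": 5.0,
}
print(blended_score(example))
```

Because the spec-driven inputs are plain numbers, a price change only requires recomputing the value input and calling the function again.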

The rule that governs everything: scores and verdicts are set before any affiliate consideration. Affiliate programs never decide what we cover, what we recommend, or how products rank in comparisons. Some links earn us a commission at no cost to you (see our disclosure); if a product is bad, we say it's bad and link it anyway.

Prices and freshness

Hardware prices go stale fast. Every price in our comparison components carries a "price checked" date, and we re-verify tracked prices on a regular cadence. If the date looks old, trust the date, not the number. Spec data is corrected whenever vendors revise it.
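A reader-side freshness check might look like the sketch below. The 30-day threshold is purely an assumption for illustration; the re-verification cadence is not specified here.

```python
from datetime import date, timedelta


def is_price_stale(checked_on: date, today: date, max_age_days: int = 30) -> bool:
    """True when the "price checked" date is older than the allowed age.

    max_age_days is a hypothetical threshold, not a published cadence.
    """
    return today - checked_on > timedelta(days=max_age_days)
```

Passing `today` explicitly keeps the function easy to test; in practice it could be called with `date.today()`.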

When we get it wrong

We keep a public, dated corrections log. If a reader or owner shows us a claim that's wrong or a comparison that wasn't apples-to-apples, we fix the article, note the fix, and log it. Finding mistakes is a feature of publishing in public, so please tell us.

How we score hardware

Every reviewed product gets a 0–10 Vetted Score from five inputs (three computed from the data layer, two human judgments), all set before any affiliate consideration:

Capability (30%): usable memory + bandwidth, judged against what its tier should deliver.
Value (20%): capability per dollar, relative to what is normal for the tier.
Owner sentiment (25%): aggregated real owner reports.
Ecosystem (15%): software/runtime maturity (CUDA, Metal/MLX, ROCm, and so on).
Efficiency (10%): performance per watt.

Scores are tier-relative: each product is judged against peers with the same job, so a great budget card and a great workstation card can both score well. Tiers are Enthusiast, Workstation/Server, Unified-memory, Mainstream, Budget/Entry, and Edge. A final calibration spreads the field across roughly 4–9 so the number actually discriminates. We re-run it as prices and owner reports change, and nothing in the score reads the affiliate links.
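The calibration step is not specified in detail. One simple way to spread a field across a target range is a linear (min-max) rescale, sketched below. The function name and the choice of a linear mapping are assumptions for illustration, not a description of the actual calibration.

```python
def calibrate(raw: dict[str, float], low: float = 4.0, high: float = 9.0) -> dict[str, float]:
    """Linearly rescale raw scores so the field spans [low, high].

    Assumes `raw` is non-empty. The 4-9 defaults mirror the range
    mentioned above; the linear mapping itself is an assumption.
    """
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:  # all products tied: place them at the midpoint
        return {name: (low + high) / 2 for name in raw}
    scale = (high - low) / (hi - lo)
    return {name: round(low + (score - lo) * scale, 1) for name, score in raw.items()}
```

With this mapping the weakest product in the field lands at the bottom of the range and the strongest at the top, so small raw differences become visible in the published number.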

Benchmarks and testers we rely on

We don't run first-hand hardware tests. Our numbers come from the people who do, plus vendor specs and published research, and when a figure is owner-measured we name and link the source on the page itself. The testers and communities we read most closely:

Hardware Corner: on-hardware llama-bench GPU rankings and per-model runs.
Puget Systems Labs: formal TensorRT-LLM and llama.cpp suites across consumer and workstation GPUs.
ServeTheHome: in-depth hardware reviews and mini-PC / DGX Spark coverage.
Level1Techs and the r/LocalLLaMA benchmarkers (lhl, kyuz0): reproducible Strix Halo and multi-GPU sweeps.
Alex Ziskind and Digital Spaceport: comparative GPU/Mac and home-server video benchmarks.
EXO Labs: DGX Spark and Mac Studio hybrid-inference measurements.

Our job is to read every test so you don't have to, then turn the consensus into a plain verdict and into tools you can run yourself: the "Can I run it?" calculator, the buy-vs-rent-vs-API cost calculator, the quant picker, and the used-GPU price index. Developers and AI assistants can query the same engine directly through our public API and MCP server.