
How we research and score hardware

Vetted Consumer reviews local-LLM hardware without testing most of it first-hand — and we say so on every page. What we do instead is aggregation done carefully: we cross-check what owners report against independent lab tests and the published research, cite everything, and keep the incentives clean. This page is the full method.

Where our information comes from

  • Owner reports — threads on r/LocalLLaMA, r/LocalLLM, r/ollama, r/MachineLearning and product-specific communities (r/MacStudio, r/framework, NVIDIA Jetson forums). We prioritize accounts from people who own the hardware — especially people who own both sides of a comparison — and we link every quote to its source so you can verify it.
  • Independent technical testing — first-hand lab benchmarks from outlets like The Register, Phoronix, Chips and Cheese, ServeTheHome, and established YouTube reviewers. These are labeled as lab tests, distinct from owner sentiment.
  • Primary sources — peer-reviewed papers (arXiv, with IDs we verify before citing), official vendor spec sheets and documentation, and project docs for llama.cpp, vLLM, Ollama, and MLX.

How sources are weighted

Specs come from vendors. A performance claim needs at least one of two things: agreement across multiple independent owner reports, or an independent lab test. Where owners and reviewers disagree, we say so. Single-seed benchmarks, marketing numbers, and unverifiable claims get flagged or excluded; we have skipped widely shared benchmark posts because their own comment sections showed the methodology was unsound. We always include critical threads for balance: if owners are unhappy, that goes in the article.
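To make the weighting rule concrete, here is a minimal sketch of how it could be expressed as code. It is illustrative only, not our actual tooling: the `Evidence` record, the `SourceKind` categories, and `claim_status` are hypothetical names, and "multiple" is assumed to mean two or more agreeing owner reports.

```python
from dataclasses import dataclass
from enum import Enum


class SourceKind(Enum):
    OWNER_REPORT = "owner_report"
    LAB_TEST = "lab_test"
    MARKETING = "marketing"


@dataclass
class Evidence:
    kind: SourceKind
    agrees: bool = True        # does this source support the claim?
    verifiable: bool = True    # can a reader follow a link and check it?
    single_run: bool = False   # e.g. a one-off, single-seed benchmark


def claim_status(evidence: list[Evidence]) -> str:
    """Hypothetical rule: return 'accepted', 'disputed', or 'flagged' for a performance claim."""
    # Drop sources the method excludes outright: marketing, unverifiable, single-run.
    usable = [
        e for e in evidence
        if e.verifiable and not e.single_run and e.kind is not SourceKind.MARKETING
    ]
    agreeing_owners = sum(1 for e in usable if e.kind is SourceKind.OWNER_REPORT and e.agrees)
    agreeing_labs = sum(1 for e in usable if e.kind is SourceKind.LAB_TEST and e.agrees)
    any_dissent = any(not e.agrees for e in usable)

    # Assumed threshold: two or more agreeing owners, or at least one lab test.
    if agreeing_owners >= 2 or agreeing_labs >= 1:
        # The claim is supported, but any disagreement gets reported alongside it.
        return "disputed" if any_dissent else "accepted"
    return "flagged"
```

A "disputed" result here stands for "publish it, and say that owners and reviewers disagree", rather than silently picking a side.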

Scoring

Products in our database carry a 0–10 score only after a full review. Until then they're marked pending review and show no number — we don't publish fake precision. Scores weigh: local-LLM capability (memory capacity and bandwidth — what it can actually run, at what speed), value at the verified price, owner sentiment over time, software/ecosystem maturity, and efficiency/noise for always-on use.

Concretely, a score is a weighted blend: 30% capability (memory capacity and bandwidth, computed from specs), 25% value (capability per dollar at the verified price), 20% owner sentiment and 15% ecosystem maturity (the two human-judgment inputs, each with a written rationale), and 10% efficiency (power draw, for always-on use). The spec-driven parts recompute mechanically whenever prices or specs change; only the two judgment inputs are set by hand, and we publish them alongside the data.
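The blend described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not our production code: it assumes every sub-score has already been normalized to the same 0–10 scale (how capability and value are normalized is not spelled out here), and `overall_score` and its argument names are hypothetical. Returning `None` models the "pending review, no number" rule.

```python
from typing import Optional

# Weights as stated in the methodology; they sum to 1.0.
WEIGHTS = {
    "capability": 0.30,   # memory capacity and bandwidth, computed from specs
    "value": 0.25,        # capability per dollar at the verified price
    "sentiment": 0.20,    # owner sentiment (hand-set, with written rationale)
    "ecosystem": 0.15,    # software/ecosystem maturity (hand-set, with rationale)
    "efficiency": 0.10,   # power draw, for always-on use
}


def overall_score(subscores: dict[str, float], reviewed: bool) -> Optional[float]:
    """Blend 0-10 sub-scores into a 0-10 score; None means 'pending review'."""
    if not reviewed:
        return None  # no number is published until a full review is done
    missing = WEIGHTS.keys() - subscores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= subscores[name] <= 10.0:
            raise ValueError(f"{name} must be on a 0-10 scale")
    total = sum(WEIGHTS[name] * subscores[name] for name in WEIGHTS)
    return round(total, 1)  # one decimal place; more would be fake precision
```

For example, sub-scores of 8 (capability), 6 (value), 7 (sentiment), 8 (ecosystem) and 5 (efficiency) blend to 2.4 + 1.5 + 1.4 + 1.2 + 0.5 = 7.0. Because capability and value are derived from specs and prices, a price change alters only those inputs, and the overall number recomputes without anyone touching the hand-set ones.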

The rule that governs everything: scores and verdicts are set before any affiliate consideration. Affiliate programs never decide what we cover, what we recommend, or how products rank in comparisons. Some links earn us a commission at no cost to you (see our disclosure); if a product is bad, we say it's bad and link it anyway.

Prices and freshness

Hardware prices go stale fast. Every price in our comparison components carries a "price checked" date, and we re-verify tracked prices on a regular cadence — if the date looks old, trust the date, not the number. Spec data is corrected whenever vendors revise it.
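"Trust the date, not the number" amounts to a simple age check on the "price checked" date. The sketch below shows that check from a reader's side. The 30-day threshold and the `price_is_stale` helper are hypothetical; this page does not publish a specific re-verification cadence.

```python
from datetime import date
from typing import Optional

STALE_AFTER_DAYS = 30  # hypothetical threshold, not a published cadence


def price_is_stale(checked_on: date, today: Optional[date] = None,
                   max_age_days: int = STALE_AFTER_DAYS) -> bool:
    """True if a 'price checked' date is more than max_age_days in the past."""
    today = today or date.today()
    return (today - checked_on).days > max_age_days
```

A price checked on 2024-01-01 and viewed on 2024-03-01 is 60 days old, so it would be flagged; at that point, re-check the retailer before relying on the figure.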

When we get it wrong

We keep a public, dated corrections log. If a reader or owner shows us a claim that's wrong or a comparison that wasn't apples-to-apples, we fix the article, note the fix, and log it. Finding mistakes is a feature of publishing in public — please tell us.
