performance · 29 May 2026 · 13 min

Why Lighthouse Scores Lie (And What Actually Matters)

A 98 in Lighthouse can sit happily next to falling rankings and complaining users. Here's why lab scores and the field data Google actually uses diverge, what the Core Web Vitals really measure in 2026, and how high-performing teams optimize for reality instead of the number.

Author
Anna Hartung
  • lighthouse
  • performance
  • seo
  • core-web-vitals
  • crux

A developer inspecting performance metrics and Core Web Vitals in browser devtools.

Lighthouse has become the most misunderstood tool in modern web development. Teams celebrate hitting 95+, going all-green, "solving performance" — and then rankings stall, conversions soften, and real users complain that the site feels slow. This isn't a coincidence or bad luck. It's the predictable result of optimizing for a number that answers a different question than the one Google is asking. Lighthouse measures how fast your site could be under ideal conditions. Google measures how fast it is for the people actually using it. Those are not the same thing, and the gap between them is where a lot of "but our score is great" confusion lives.

Key takeaways

| Point | Details |
| --- | --- |
| Lighthouse ≠ ranking | Lighthouse is a lab test under ideal conditions; Google ranks on field data from real Chrome users (CrUX), not your laptop's synthetic run. |
| Lab and field diverge | A 98 lab score and failing real-world metrics coexist routinely, most visibly in the gap between lab TBT and field INP. |
| INP is the 2026 metric to watch | INP replaced FID in March 2024. It's the metric most sites fail, fixing it is an architecture change rather than a tweak, and it rarely shows up in Lighthouse. |
| Judge at p75 over 28 days | A URL is "good" only when LCP (2.5s or less), INP (200ms or less) and CLS (0.1 or less) all pass at the 75th percentile, aggregated over CrUX's rolling 28-day window. |
| CWV is a modest tiebreaker | Core Web Vitals are a confirmed but modest signal. Content quality and E-E-A-T matter more; CWV mainly break ties in competitive niches and on mobile. |

The core problem: Lighthouse measures a controlled fantasy

A Lighthouse run happens in a clean room: a simulated device, a fixed and throttled network profile, no browser extensions, no competing tabs, none of the noise of real life. That's deliberate and useful — it makes the test repeatable, which is exactly what you want for debugging and catching regressions. But it also means the result describes a controlled best case, not the messy reality of a user on a three-year-old Android phone, on a patchy mobile connection, with a dozen tabs open and a battery-saver throttling the CPU.

Google, for ranking purposes, looks at that messy reality. Its signal comes from the field — aggregated, anonymized data from real Chrome users — not from a synthetic lab run on your laptop. So a pristine lab score and a poor real-world experience can coexist comfortably, and when they do, the lab score is the one that's lying to you. Not maliciously; it's just answering "how fast could this be?" while you needed the answer to "how fast is this, for them, right now?"

Why high Lighthouse scores often correlate with bad SEO decisions

Here's the uncomfortable part: chasing the score actively pushes teams toward choices that hurt real users. To make the lab number climb, teams delay content behind interactions, lazy-load aggressively past the point of sense, lean on hydration tricks, mask slow data with skeleton screens, and defer rendering in ways that look great to a synthetic test and feel worse to a human. You can move the main content out of the initial paint to flatter one metric and degrade the actual experience in the process. The green number goes up; the outcome goes down. Optimizing for the measurement instead of the thing being measured is a classic trap, and Lighthouse makes it unusually easy to fall into because the reward (a satisfying score) is immediate and the cost (worse field data) is delayed and invisible on your machine.

The three performance worlds — and why only one ranks you

It helps to separate three realities that get blurred together.

Lab metrics (Lighthouse). Synthetic, repeatable, developer-friendly. Genuinely good for debugging, catching regressions, and comparing two versions of a change under identical conditions. Limited for predicting rankings, real UX, or business outcomes — because the conditions aren't real.

Field metrics (CrUX). The Chrome User Experience Report: real users, real devices, real networks. This is the data Google's page-experience signal draws on. If your field data is bad, a beautiful Lighthouse score won't save you, because Google isn't looking at your lab run — it's looking at your users.

Business metrics (the ones everyone forgets). Bounce rate, conversion, scroll depth, real interaction delay. These are the outcomes performance is supposed to protect, and they often track real-world speed far more faithfully than any lab composite. A site can be "98 in Lighthouse" and still bleed conversions because the experience that matters — the one users actually have — was never the one being optimized.

The practical hierarchy: debug in the lab, judge by the field, and validate against the business.

A monitoring dashboard tracking real-user latency, errors and Core Web Vitals over time.

The Lighthouse "wins" that lose in reality

A few specific patterns recur often enough to name.

Artificially delaying LCP. Hide the main content, defer its render, show a placeholder, and Largest Contentful Paint looks better in the lab. Real users still wait for the real content, and Google sees that wait in the field. You've optimized the appearance of speed, not speed.

Over-aggressive JavaScript deferral. Deferring scripts reliably boosts the lab score, but pushed too far it delays interactivity, breaks analytics timing, and — most importantly in 2026 — causes INP regressions. The page benchmarks beautifully and feels laggy the moment someone tries to use it. This is the single biggest lab-vs-field divergence, and it deserves its own point below.

Chasing CLS while breaking UX. Layout-shift fixes that reserve enormous blocks of space or freeze the layout unnaturally can technically lower Cumulative Layout Shift while making the page feel stiff or broken. CLS is a signal about stability, not a target to be gamed; a "perfect" CLS achieved by harming usability is a loss dressed as a win.

The TBT-vs-INP trap (the part most teams miss)

This is the technical heart of why a great Lighthouse score and bad field responsiveness can live side by side. Lighthouse, running in the lab, can't measure real interactions because there's no real user clicking. So it uses a proxy for responsiveness: Total Blocking Time (TBT), which adds up how long each main-thread task ran past 50 milliseconds during load. Google's actual ranking metric for responsiveness is Interaction to Next Paint (INP), a field measurement that observes every interaction during a real user's page visit and reports one of the slowest.

TBT and INP are related but routinely diverge. A page can have low TBT during the brief load window Lighthouse observes, and still have terrible INP because the laggy interactions happen later — when a user opens a heavy dropdown, filters a large list, or triggers state changes in a bloated client app. Lighthouse never sees those moments; real users live in them. So "great TBT, great Lighthouse score, failing INP in the field" is not a paradox — it's the normal outcome for interaction-heavy single-page apps. If you remember one thing: the lab's responsiveness number and the field's responsiveness number are different metrics, and only the field one ranks you.
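To make the divergence concrete, here is a minimal TypeScript sketch with made-up numbers. Both functions are simplified illustrations, not Lighthouse's or Chrome's actual implementation: `totalBlockingTime` assumes you already have the durations of the main-thread tasks in the load window, and `interactionToNextPaint` approximates INP as the slowest interaction of a visit, skipping one outlier for every 50 interactions. The sample arrays are hypothetical.

```typescript
// Lab proxy: sum the part of each main-thread task that exceeds 50 ms.
function totalBlockingTime(taskDurationsMs: number[]): number {
  return taskDurationsMs.reduce((sum, d) => sum + Math.max(0, d - 50), 0);
}

// Field metric, simplified: the worst interaction latency of the visit,
// skipping one outlier for every 50 interactions.
function interactionToNextPaint(latenciesMs: number[]): number | undefined {
  if (latenciesMs.length === 0) return undefined;
  const worstFirst = [...latenciesMs].sort((a, b) => b - a);
  const skip = Math.min(
    Math.floor(latenciesMs.length / 50),
    worstFirst.length - 1,
  );
  return worstFirst[skip];
}

// Hypothetical numbers for an interaction-heavy single-page app.
const loadTasksMs = [30, 45, 60, 80, 40];       // tasks seen during load
const interactionsMs = [40, 90, 120, 480, 650]; // interactions after load

const tbt = totalBlockingTime(loadTasksMs);         // 10 + 30 = 40
const inp = interactionToNextPaint(interactionsMs); // 650
```

In this example the load looks healthy (40 ms of blocking time) while the slowest interaction takes 650 ms, well past the 500 ms mark where INP is rated poor. The lab never sees the second array.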

Pro tip: If your Lighthouse score is green but users say the site feels laggy, stop staring at TBT and pull your field INP from CrUX or the Search Console Core Web Vitals report. The gap between the two is almost always where the real problem is hiding.
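Pulling field INP can be scripted against the public CrUX API. The sketch below is a minimal example under stated assumptions: the endpoint and field names follow the CrUX API as I understand it, `buildCruxRequest` and `readP75` are hypothetical helper names, and the types cover only the fields this example reads. You need your own API key.

```typescript
// Public CrUX API endpoint; the API key goes in the `key` query parameter.
const CRUX_ENDPOINT =
  "https://chromeuserexperience.googleapis.com/v1/records:queryRecord";

// Only the response fields this sketch reads; the real payload has more.
type CruxMetric = { percentiles?: { p75?: number | string } };
type CruxResponse = { record?: { metrics?: Record<string, CruxMetric> } };

// Build the POST request for one URL on phones. Send `origin` instead of
// `url` in the body to query the whole origin.
function buildCruxRequest(pageUrl: string, apiKey: string) {
  return {
    endpoint: `${CRUX_ENDPOINT}?key=${encodeURIComponent(apiKey)}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        url: pageUrl,
        formFactor: "PHONE",
        metrics: ["interaction_to_next_paint"],
      }),
    },
  };
}

// Read a metric's 75th percentile. Some values (CLS) arrive as strings,
// so everything is converted with Number().
function readP75(response: CruxResponse, metric: string): number | undefined {
  const raw = response.record?.metrics?.[metric]?.percentiles?.p75;
  if (raw === undefined) return undefined;
  const value = Number(raw);
  return Number.isNaN(value) ? undefined : value;
}
```

In a runtime with `fetch`, you would call `fetch(req.endpoint, req.init)` with the object returned by `buildCruxRequest`, parse the JSON, and pass it to `readP75(data, "interaction_to_next_paint")`. A missing record usually means CrUX has too little traffic for that URL, in which case querying the origin is the fallback.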

What actually matters in 2026 (Google reality)

Let's be precise about the field signals worth your attention.

Field Core Web Vitals, assessed at the 75th percentile. Google's three Core Web Vitals are LCP (loading), INP (responsiveness, which replaced FID in March 2024), and CLS (visual stability). The "good" thresholds are an LCP of 2.5 seconds or less, an INP of 200 milliseconds or less, and a CLS of 0.1 or less. Crucially, a URL is judged "good" only when all three pass at the 75th percentile of real visits, aggregated over a rolling 28-day window in CrUX. That means the experience of your slower-than-median users counts, not your best-case one. One fast test on your machine means nothing against that. (And note the granularity: Google evaluates per-URL where it has enough data, and falls back to a page-group or whole-origin assessment when a specific URL has too little traffic, so a slow template can drag down pages that individually lack data.)
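The p75 rule is easy to underestimate, so here is a small TypeScript sketch of it. It assumes you have per-visit samples from your own real-user monitoring; CrUX itself aggregates differently, and the `Visit` shape, the nearest-rank `percentile` helper, and the sample numbers are all hypothetical illustrations.

```typescript
// Nearest-rank percentile: the smallest sample that at least p percent
// of all samples are at or below.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// "Good" thresholds for the three Core Web Vitals.
const GOOD = { lcpMs: 2500, inpMs: 200, cls: 0.1 };

type Visit = { lcpMs: number; inpMs: number; cls: number };

// A page passes only if all three metrics are good at the 75th percentile.
function assess(visits: Visit[]) {
  const p75 = {
    lcpMs: percentile(visits.map((v) => v.lcpMs), 75),
    inpMs: percentile(visits.map((v) => v.inpMs), 75),
    cls: percentile(visits.map((v) => v.cls), 75),
  };
  const passes =
    p75.lcpMs <= GOOD.lcpMs && p75.inpMs <= GOOD.inpMs && p75.cls <= GOOD.cls;
  return { p75, passes };
}

// Hypothetical page: five fast visits, three slow ones.
const lcpSamplesMs = [1000, 1100, 1200, 1300, 1400, 3000, 3200, 3500];
const visits: Visit[] = lcpSamplesMs.map((lcpMs) => ({
  lcpMs,
  inpMs: 100,
  cls: 0.01,
}));

const result = assess(visits); // p75 LCP is 3000 ms, so passes is false
```

The median visit here loads in about 1.3 seconds, yet the page fails, because three slow visits out of eight are enough to push the 75th percentile to 3 seconds.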

Backend latency (TTFB). The silent killer. Time to First Byte sits upstream of everything: a slow API or origin server delays LCP no matter how optimized the frontend is, and most lab runs — served warm, close to the test machine — hide it. Users on real networks don't get that courtesy. Slow backends quietly cap your LCP and, with it, your rankings.
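A rough way to see the cap is to treat TTFB as time already spent before the frontend can do anything. The sketch below is back-of-the-envelope arithmetic, not a measurement method: `frontendBudgetMs` is a hypothetical helper, and subtracting one p75 from another is only an approximation.

```typescript
// LCP of 2.5 seconds or less is the "good" threshold.
const LCP_GOOD_MS = 2500;

// LCP cannot happen before the first byte arrives, so whatever TTFB uses
// up is no longer available for download, parse, and render.
function frontendBudgetMs(ttfbMs: number, lcpTargetMs = LCP_GOOD_MS): number {
  return Math.max(0, lcpTargetMs - ttfbMs);
}

const fastOrigin = frontendBudgetMs(300);  // 2200 ms left for the frontend
const slowOrigin = frontendBudgetMs(1800); // 700 ms left for the frontend
```

With a 1.8-second TTFB, the frontend has 700 ms to fetch, parse, and paint the largest element. No amount of image optimization wins that back.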

Interaction under load (INP). As above, INP punishes heavy client logic, oversized JavaScript bundles, and stateful UIs that block the main thread. It's the metric most sites fail in 2026, and fixing it isn't a tweak — it means breaking up long tasks, deferring non-critical work, and yielding to the main thread, i.e. a genuine shift in how the frontend is architected. It rarely shows up in a Lighthouse score.
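Breaking up long tasks can be sketched in a few lines of TypeScript. This is one common pattern, not the only one: process a big list in small batches and give control back between batches so input handling and painting can run. The helper names (`chunk`, `yieldToMainThread`, `processInChunks`) and the default batch size are hypothetical, and `setTimeout` is used as a portable way to yield.

```typescript
// Resolve on a later macrotask, so pending input and paint work can run
// before the next batch starts.
function yieldToMainThread(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Split a list into batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Handle every item, yielding after each batch instead of running
// one long uninterrupted task.
async function processInChunks<T>(
  items: readonly T[],
  handle: (item: T) => void,
  size = 100,
): Promise<void> {
  for (const batch of chunk(items, size)) {
    batch.forEach((item) => handle(item));
    await yieldToMainThread();
  }
}
```

The right batch size depends on how expensive `handle` is; the goal is to keep each batch comfortably under 50 ms on a slow phone. Newer Chromium browsers also offer `scheduler.yield()` for the same purpose.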

Predictability. Consistency pays off because of how field data is aggregated: the verdict comes from the 75th percentile over 28 days, so your bad days count as much as your good ones. Performance volatility is itself a risk: a site that's fast on a good day and falls apart under traffic looks worse in aggregated field data than a site that's merely steady.

A calibration worth stating plainly, because it cuts both ways: Core Web Vitals are a confirmed but modest ranking signal. Google has been consistent that content relevance, quality, and E-E-A-T matter more — CWV act mainly as a tiebreaker between otherwise comparable pages, with the effect most visible in competitive niches and on mobile. So the goal isn't to worship the field numbers any more than the lab ones; it's to stop bleeding users and stop handing competitors an easy edge. Don't expect green CWV to rescue thin content — and don't let poor CWV quietly disadvantage content that deserves to win.

Lighthouse is a tool, not a KPI

Used correctly, Lighthouse is genuinely helpful — for local debugging, for comparing the before-and-after of a specific change, for catching regressions in CI. Used incorrectly, it's actively dangerous — as a performance validation, as SEO proof, or as a number to put in a client report and call performance "done." The two clean statements to keep in mind: a green Lighthouse score does not mean a fast site, and a genuinely fast site does not always score green. Treat it as an instrument, not a verdict.

Why this matters more in modern frameworks

Frameworks like Next.js cut both ways. They make it easy to game lab metrics — to defer, hide, and stage rendering so the synthetic test is delighted — and equally easy to build genuinely fast systems through server rendering, sensible code-splitting, and disciplined data loading. The framework isn't the villain or the hero; the architecture is. The same tool that lets you hide latency behind a skeleton also lets you eliminate that latency at the source. Which one you get depends on whether the team is optimizing the number or the experience.

What high-performing teams do instead

The teams that consistently win in search and in conversion share a posture. They monitor CrUX, not Lighthouse, as their source of truth. They track real-user Core Web Vitals continuously rather than running a test before launch and forgetting it. They set explicit performance budgets so regressions get caught at the door. They optimize data flow and backend latency, not animations and vanity micro-optimizations. They treat performance as a shared backend-and-frontend responsibility, because TTFB and INP live on opposite ends of the stack. And a practical force-multiplier: they fix templates, not individual URLs — a single LCP problem in a blog template affects every post, so one fix repairs hundreds of pages at once. In short, they stop celebrating numbers and start controlling systems.

A product team mapping performance budgets and field metrics on a whiteboard.

Pro tip: Set a performance budget in CI tied to field-relevant metrics, not just a Lighthouse threshold. A budget that fails the build when a bundle balloons or a long task creeps in catches INP regressions before they ever reach real users.
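A budget check of this kind can be very small. The TypeScript sketch below assumes your build already produces two numbers, total bundle size and the longest main-thread task from a lab trace. The `PerfBudget` and `BuildReport` shapes, the function name, and the limits are hypothetical and should be replaced with whatever your pipeline actually measures.

```typescript
type PerfBudget = { maxBundleKb: number; maxLongTaskMs: number };
type BuildReport = { bundleKb: number; longestTaskMs: number };

// Return one human-readable message per exceeded limit.
// An empty array means the build is within budget.
function checkBudget(report: BuildReport, budget: PerfBudget): string[] {
  const violations: string[] = [];
  if (report.bundleKb > budget.maxBundleKb) {
    violations.push(
      `bundle ${report.bundleKb} kB exceeds budget ${budget.maxBundleKb} kB`,
    );
  }
  if (report.longestTaskMs > budget.maxLongTaskMs) {
    violations.push(
      `longest task ${report.longestTaskMs} ms exceeds budget ${budget.maxLongTaskMs} ms`,
    );
  }
  return violations;
}

// Hypothetical limits and a build that breaks both of them.
const budget: PerfBudget = { maxBundleKb: 250, maxLongTaskMs: 50 };
const violations = checkBudget({ bundleKb: 310, longestTaskMs: 120 }, budget);
```

In CI, you would print the messages and exit with a non-zero status whenever `violations` is not empty, which fails the build before the regression ships.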

The H-Studio approach: reality-based performance

At H-Studio we don't treat a Lighthouse score as success. We look at real-user Core Web Vitals, backend latency, regression risk, actual SEO impact, and business outcomes. If Lighthouse improves as a side effect of fixing those, wonderful. If it doesn't, we care a great deal less — because Google largely doesn't. The point of performance work isn't a satisfying lab number; it's a site that's fast for real people, consistently, in the conditions they actually use it.

And the final thought is really the whole article in one line: Lighthouse doesn't lie on purpose. It just answers the wrong question precisely. The right question is the only score that reliably matters — how fast is your site for real users, consistently, at the 75th percentile, on the devices and networks they actually have? Answer that, and the lab number tends to follow. Chase the lab number, and you can spend a quarter making it green while the experience that ranks you, and the experience that converts, quietly gets worse.

— Anna

Get a performance audit based on real data

If your Lighthouse scores are green but rankings or conversions are slipping, you're optimizing for the wrong metrics — and the fix is to measure what Google and your users actually feel. We start with React performance optimization for the INP and main-thread problems that lab scores hide, pair it with Core Web Vitals and technical SEO to repair the field signals that move rankings, and reach upstream into DevOps and cloud engineering when TTFB and backend latency are the real ceiling. Browse all our engineering services, or get in touch and we'll audit what your real users experience — not what your laptop reports.

FAQ

Is a high Lighthouse score bad for SEO?

Not inherently — but optimizing for it can be. Lighthouse is a lab test under ideal conditions; Google ranks on field data from real users (CrUX). Tactics that flatter the lab score (hiding content, over-deferring JS) can worsen the real experience Google measures.

What does Google actually use to rank — Lighthouse or Core Web Vitals?

Field Core Web Vitals, from real Chrome users, assessed at the 75th percentile over a rolling 28-day window. Lighthouse is a lab tool that doesn't directly feed rankings.

What are the Core Web Vitals thresholds in 2026?

LCP of 2.5 seconds or less, INP of 200 milliseconds or less, CLS of 0.1 or less. All three must pass at the 75th percentile for a "good" assessment. INP replaced FID as the responsiveness metric in March 2024.

Why does my Lighthouse score look great but my site feels laggy?

Most often the TBT-vs-INP gap. Lighthouse uses Total Blocking Time (a lab proxy) for responsiveness; Google uses INP (real interactions across a page visit). Interaction-heavy apps can have good TBT at load and poor INP later, when users actually interact.

How much do Core Web Vitals affect rankings?

They're a confirmed but modest signal — a tiebreaker between otherwise comparable pages, strongest in competitive niches and on mobile. Content relevance and quality matter more, but poor CWV can still cost you against similar competitors.
