Category: technical SEO · ~15 min read · Library article · Author: Anna Hartung
If you run a site built on Next.js, React, or any other component-based framework and you've opened a Semrush Site Audit, you've probably met the warning: "Low text-to-HTML ratio." It sounds like a verdict — as if Google has quietly downgraded your pages for being too code-heavy. In practice, it's one of the most misread signals in technical SEO, and on a modern stack it fires constantly on pages that are completely healthy.
This guide goes past the one-line "ignore it" advice you'll find elsewhere. We'll cover what the ratio actually measures, why frameworks like Next.js trip it by design, whether Google treats it as a ranking factor at all, and — the part that actually matters — how to tell the difference between a harmless warning and a genuine rendering problem hiding behind it.
Key takeaways
| Point | Detail |
|---|---|
| It's not a ranking factor | Google's John Mueller has repeatedly confirmed the text-to-HTML ratio is not, and never was, a Search ranking signal. |
| Next.js trips it by design | Component frameworks emit deep structural markup, hydration data, and scripts, so the ratio reads low even on perfectly healthy pages. |
| The real question is rendering | Ask whether your content is present in the server HTML crawlers receive — not whether a percentage clears some threshold. |
| AI crawlers don't run JavaScript | GPTBot, ClaudeBot, and PerplexityBot generally read raw HTML only, so client-rendered content is effectively invisible to them. |
| Fix the cause, not the number | Padding text to raise the ratio never moves rankings; address thin content or a rendering gap instead and the ratio follows. |
Is text-to-HTML ratio a Google ranking factor?
The short answer is no, and Google has been unusually blunt about it. Search Advocate John Mueller has repeatedly said the code-to-text (or text-to-HTML) ratio is not, and never has been, a factor in Google Search. Asked directly on more than one occasion, he's described it as a metric that makes "absolutely no sense at all for SEO," and Gary Illyes has echoed the same line — that it doesn't matter as long as the page has decent content. There is no internal "ratio" threshold a page has to clear to rank.
So where does the "ideal ratio of 25–70%" figure that some tools quote come from? Not from Google. That range is folklore that circulates among auditing tools and SEO blogs; it describes a correlation that used to hold on simple static sites — text-heavy pages tended to be cleaner and lighter — not a rule Google enforces. Treating it as a target is exactly the mistake Mueller was warning against.
That doesn't make the warning meaningless, though. It makes it indirect. A genuinely bloated page — megabytes of markup, scripts, and styles wrapped around a sentence of content — can be slow, hard to crawl efficiently, and thin on substance. None of those is "the ratio"; they're separate, real problems that a low ratio sometimes points at. The ratio is the check-engine light, not the engine.
What does Semrush actually measure — and why does Next.js trigger it?
Semrush's Site Audit raises the warning when a page's text-to-HTML ratio drops to 10% or less, calculated roughly as the byte size of the visible text divided by the byte size of the full uncompressed HTML document. Crucially, Semrush files this under Warnings, not Errors: the middle of its three severity tiers (Errors, Warnings, Notices), below the issues that actually break crawling or indexing. Semrush appears to keep the check around largely for continuity; auditing tools rarely retire established issue types, even ones whose SEO relevance has faded.
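To make the calculation described above concrete, here is a minimal sketch of a text-to-HTML ratio in TypeScript. It is not Semrush's actual algorithm, which isn't public in this detail: the function name `textToHtmlRatio` and the regex-based stripping rules are assumptions for illustration, and a real tool would use a proper HTML parser.

```typescript
// Rough approximation only. The stripping rules below are assumptions,
// not Semrush's implementation.
function textToHtmlRatio(html: string): number {
  const encoder = new TextEncoder();
  const htmlBytes = encoder.encode(html).length;
  if (htmlBytes === 0) return 0;

  const visibleText = html
    // Drop script and style blocks, whose contents are never shown as text.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, " ")
    // Drop every remaining tag, keeping only the text between tags.
    .replace(/<[^>]+>/g, " ")
    // Collapse runs of whitespace so indentation doesn't count as text.
    .replace(/\s+/g, " ")
    .trim();

  // Visible text bytes divided by total document bytes (both UTF-8).
  return encoder.encode(visibleText).length / htmlBytes;
}
```

Run against a component-heavy page, the denominator grows with every wrapper element and inlined script while the numerator stays the size of the copy, which is why the result lands under 0.10 so easily.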
Now consider how a Next.js or React page is built. The framework ships a tree of components, each wrapped in its own structural markup. Layout primitives, design-system wrappers, navigation, card grids, filters, and repeated UI elements all emit <div> after <div> — the "divitis" that visual builders and component systems produce as a side effect. On top of that sits the framework's own runtime: hydration payloads, inlined JSON state, preload tags, and script references in the <head>. The result is a page where the markup is large by construction, while the visible text on a given page — say, a service overview or a portfolio grid — is comparatively short. Divide one by the other and you land under 10% without anything being wrong.
This is why the warning clusters on exactly the page types where it's least informative: listing and index pages (blog indexes, tag archives, case-study overviews), legal and utility pages (privacy policy, terms, brand disclaimer), and design-led landing pages that lead with imagery and components rather than long copy. E-commerce and Shopify stores hit it for the same reason — product grids plus third-party scripts inflate the denominator. The metric was designed for a web made of hand-written HTML documents. Component-rendered applications simply don't fit its assumptions.
The question that actually matters: can Google see your content?
Here's the reframe that turns this from a vanity metric into a useful diagnostic. The right question was never "is my ratio above some number?" It's "is my real content present in the HTML that crawlers receive — and can search engines render it reliably?" On a JavaScript framework, that question has real teeth, and it's worth understanding the mechanics.
Googlebot processes JavaScript pages in two separate waves. In wave one, it fetches the raw HTML your server returns and indexes whatever is immediately present — text, links, metadata, structured data. This pass is fast. If your content is in that initial HTML, it's eligible for indexing right away. In wave two, pages that depend on JavaScript are placed in a render queue, where Google's Web Rendering Service — a headless, evergreen Chromium kept current with modern JS support — executes the scripts, builds the final "rendered DOM," and updates the index. The catch is that this second wave is queued against finite resources and can lag the first by hours, days, or in some cases weeks. The delay is a function of render-queue capacity, not of whether Googlebot can run your code; server-rendered HTML skips the queue entirely.
This is where rendering strategy decides everything, and Next.js gives you the full spread:
In client-side rendering (CSR) — a plain React SPA — the server returns a near-empty shell, often little more than a <div id="root"></div> plus a script bundle. In wave one, Googlebot sees essentially no content, no internal links, and frequently no real metadata. Everything depends on wave two succeeding and arriving in time. If rendering is delayed or fails (a script error, a slow API, a misconfiguration), there is nothing to index. Internal linking — the connective tissue Google uses to discover and distribute authority across your site — is invisible on the first pass. A CSR page can show a low text-to-HTML ratio and a genuine indexing problem at the same time, and the ratio is the symptom people notice while the rendering dependency is the disease.
In server-side rendering (SSR) and static site generation (SSG), the HTML arrives complete: content, links, meta tags, and structured data are all in wave one. This is the reliable baseline for anything you want ranked. Next.js also offers incremental static regeneration (ISR) — pre-built pages refreshed in the background — and per-route control, so you can statically render your marketing and content pages for SEO while leaving genuinely app-like, logged-in screens as CSR. The point isn't "never use client rendering"; it's "never make public, rankable content depend on it."
So when the Semrush warning appears, the productive move is not to chase the percentage. It's to confirm that the page's meaningful content exists in the server response. The fastest checks: view the page's raw source (not the rendered DevTools DOM) and search for a core sentence; and run the URL through Google Search Console's URL Inspection to see the actual rendered HTML Google captured. If your headings and key paragraphs are there, a 7% ratio is a non-issue. If they're missing from the raw HTML and only appear after JavaScript runs, you've found something worth fixing — and it has nothing to do with the ratio.
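The view-source check can also be scripted. The sketch below fetches a URL without executing JavaScript and looks for a core sentence in the markup. The names `rawHtmlContains` and `checkUrl` and the User-Agent string are hypothetical, and the regex-based tag stripping is a simplification. Script contents are stripped on purpose: a sentence that exists only inside an inlined JSON state blob is not rendered text.

```typescript
// Normalise whitespace, case, and two common entities so that a sentence
// split across tags or lines still matches.
function normalise(text: string): string {
  return text
    .replace(/&nbsp;/gi, " ")
    .replace(/&amp;/gi, "&")
    .replace(/\s+/g, " ")
    .trim()
    .toLowerCase();
}

// True if the sentence appears in the markup text of the raw HTML,
// ignoring anything that lives only inside <script> or <style> blocks.
function rawHtmlContains(rawHtml: string, sentence: string): boolean {
  const text = rawHtml
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
    .replace(/<[^>]+>/g, " ");
  return normalise(text).includes(normalise(sentence));
}

// Fetch the server response as-is (no JavaScript execution) and check it.
// The User-Agent value is a made-up identifier for this sketch.
async function checkUrl(url: string, sentence: string): Promise<boolean> {
  const response = await fetch(url, {
    headers: { "User-Agent": "content-parity-check/0.1" },
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return rawHtmlContains(await response.text(), sentence);
}
```

A `false` result for a sentence you can see in the browser suggests the content arrives only after client-side rendering. It is worth confirming with URL Inspection before drawing conclusions, since this sketch does not decode every entity or handle text split mid-word by inline tags.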
The 2026 twist: AI crawlers raise the stakes on initial HTML
There's a newer reason to care about what's in your server-rendered HTML, and it's reshaping the long tail of this topic. The crawlers behind AI answer engines — GPTBot, ClaudeBot, PerplexityBot, and others — generally do not execute JavaScript at all. They read the raw HTML and stop. So content that only materialises in wave-two rendering is effectively invisible to them, even though Google might eventually index it.
As more discovery shifts toward answer engines and AI overviews — the territory people now call Answer Engine Optimization (AEO) and Generative Engine Optimization (GEO) — a CSR architecture quietly excludes you from citation and inclusion in those surfaces. The old advice ("Google can render JS, so you're fine") was already shaky because of the wave-two delay; against AI crawlers it simply doesn't hold. Server-rendering your substantive content is no longer just a Google-indexing nicety. It's the price of being readable by the systems that increasingly mediate discovery.
When the warning is genuinely safe to ignore
Put together, a low text-to-HTML ratio is almost always harmless when the page is doing its job and its content is server-rendered. A blog index that lists posts, a tag page, a privacy policy, a visually rich landing page where the framework emits deep component markup — these will trip the 10% threshold routinely, and that's expected. Google doesn't score the ratio; it asks whether the page satisfies its intent, whether it can render and index the content, and whether the page is internally linked and reachable. If a human can tell what the page offers within a few seconds and the core text is present in the raw HTML, the warning is noise. Inflating such a page with filler text to "fix" the number does nothing for rankings and often hurts the design and conversion rate it was built for.
When a low ratio is a symptom worth chasing
The warning earns your attention when it travels with other signals — that's the discipline. Treat it as a prompt to investigate, then look for the things that actually matter:
- A page with genuinely little content behind the markup (the classic example: a service page that's two sentences and a grid of cards) targeting a commercial or informational query it has no depth to win.
- Semrush flagging Low Word Count or Thin Content on the same URL. Those are the more meaningful warnings, and a low ratio next to them corroborates a real gap.
- Google Search Console showing weak impressions, or the page sitting in "Crawled – currently not indexed" or "Discovered – currently not indexed."
- The rendering red flag above: important content present only after client-side execution, absent from the initial HTML.

In every one of these cases the ratio isn't the disease; low word count, thin content, weak intent coverage, or a rendering dependency is. Fix those, and the ratio takes care of itself as a byproduct.
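The triage logic above can be written down as a small decision function. This is a sketch under assumptions: the `PageSignals` shape, the field names, and the status labels are invented for illustration and do not mirror any Semrush or Search Console export format. "Weak impressions" is left out because the article gives no threshold for it.

```typescript
interface PageSignals {
  ratio: number; // text-to-HTML ratio between 0 and 1
  contentInRawHtml: boolean; // core content found in the server response
  lowWordCountFlag: boolean; // a low word count or thin content warning on the same URL
  indexStatus:
    | "indexed"
    | "crawled-not-indexed"
    | "discovered-not-indexed"
    | "unknown";
}

interface Verdict {
  action: "ignore" | "investigate";
  reasons: string[];
}

// The ratio itself is deliberately never a reason to act. Only the
// corroborating signals can turn the warning into an investigation.
function triage(signals: PageSignals): Verdict {
  const reasons: string[] = [];

  if (!signals.contentInRawHtml) {
    reasons.push("core content missing from the initial HTML");
  }
  if (signals.lowWordCountFlag) {
    reasons.push("low word count or thin content flagged on the same URL");
  }
  if (
    signals.indexStatus === "crawled-not-indexed" ||
    signals.indexStatus === "discovered-not-indexed"
  ) {
    reasons.push("page is not indexed according to Search Console");
  }

  return { action: reasons.length > 0 ? "investigate" : "ignore", reasons };
}
```

A page at 7% with server-rendered content, no thin-content flag, and a healthy index status comes back as `ignore`; the same 7% with content missing from the raw HTML comes back as `investigate`, with the rendering gap named as the reason.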
One specific footgun worth naming: a robots.txt rule that blocks JavaScript or CSS files. It's common on sites that blocked staging assets and never cleaned up before launch. Block those resources and you collapse wave-two rendering for every page that relies on them — Googlebot can't run what it can't fetch. That's a real, ranking-affecting problem that no ratio metric will surface.
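A quick way to catch this is to test a few asset paths against your robots.txt. The sketch below is a simplified matcher, not Google's parser: it assumes exact user-agent tokens, supports only the `*` and `$` wildcards, and applies the longest-match rule with Allow winning ties. The function names and the sample paths are hypothetical.

```typescript
type Rule = { allow: boolean; pattern: string };

// Parse robots.txt into a map of lower-cased user-agent token -> rules.
function parseRobots(robotsTxt: string): Map<string, Rule[]> {
  const groups = new Map<string, Rule[]>();
  let agents: string[] = [];
  let lastWasAgent = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one group of rules.
      if (!lastWasAgent) agents = [];
      agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if (field === "allow" || field === "disallow") {
      lastWasAgent = false;
      if (value === "") continue; // an empty rule matches nothing
      for (const agent of agents) {
        const rules = groups.get(agent) ?? [];
        rules.push({ allow: field === "allow", pattern: value });
        groups.set(agent, rules);
      }
    }
  }
  return groups;
}

// Convert a robots.txt path pattern ("*" wildcard, optional "$" end anchor)
// into a regular expression anchored at the start of the path.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.endsWith("$");
  const body = (anchored ? pattern.slice(0, -1) : pattern)
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + body + (anchored ? "$" : ""));
}

// True if the path is disallowed for the given user agent. Uses the agent's
// own group if one exists, otherwise falls back to the "*" group.
function isBlocked(robotsTxt: string, path: string, userAgent = "googlebot"): boolean {
  const groups = parseRobots(robotsTxt);
  const rules = groups.get(userAgent.toLowerCase()) ?? groups.get("*") ?? [];
  let best: Rule | undefined;
  for (const rule of rules) {
    if (!patternToRegExp(rule.pattern).test(path)) continue;
    const longer = !best || rule.pattern.length > best.pattern.length;
    const tieButAllow =
      best && rule.pattern.length === best.pattern.length && rule.allow;
    if (longer || tieButAllow) best = rule;
  }
  return best !== undefined && !best.allow;
}
```

Calling `isBlocked(robotsTxt, "/_next/static/chunks/main.js")` for a handful of real script and stylesheet paths from your page source is usually enough to spot a leftover staging rule. Treat a positive result as a prompt to verify in Search Console rather than as a final answer.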
How we audit this at H-Studio
When this warning shows up in a client audit, we don't open a text editor — we run a short sequence of checks, in priority order.
1. Rendering parity: does the raw server HTML contain the page's headings, primary paragraphs, internal links, and structured data? We compare view-source against the rendered DOM and confirm with Search Console's URL Inspection what Google actually captured.
2. Intent and depth: does the page have enough substance to satisfy the query it targets? We judge it as a reader would, not by word count alone but by whether it answers the question.
3. The corroborating signals: low word count, thin content, indexation status in GSC, and whether the page is even internally linked from somewhere relevant.
4. The bigger issues that almost always outrank this one: Core Web Vitals and performance, indexability and canonicalisation, and keyword cannibalization between pages competing for the same intent.

Only if content quality is genuinely lacking do we recommend expanding or restructuring, and then for the reader, not for the metric.
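The rendering-parity check lends itself to a small diff. The sketch below lists headings that exist in the rendered DOM but not in the raw server HTML. It assumes you already have both as strings (for example, view-source output and the rendered HTML copied from DevTools or URL Inspection). The function names are hypothetical, and the regex extraction is a simplification that only looks at `h1` to `h3`.

```typescript
// Extract the text of h1-h3 headings from an HTML string.
function extractHeadings(html: string): string[] {
  const headings: string[] = [];
  const pattern = /<h([1-3])\b[^>]*>([\s\S]*?)<\/h\1\s*>/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    // Remove inline tags inside the heading and tidy the whitespace.
    const text = match[2]
      .replace(/<[^>]+>/g, " ")
      .replace(/\s+/g, " ")
      .trim();
    if (text) headings.push(text);
  }
  return headings;
}

// Headings present after rendering but absent from the server response.
function headingsMissingFromRaw(rawHtml: string, renderedHtml: string): string[] {
  const inRaw = new Set(extractHeadings(rawHtml).map((h) => h.toLowerCase()));
  return extractHeadings(renderedHtml).filter(
    (h) => !inRaw.has(h.toLowerCase())
  );
}
```

An empty result means the heading structure is already in the initial HTML. Any heading that shows up in the list depends on client-side execution, and the paragraphs beneath it probably do too.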
In years of doing this, I can't recall a single case where raising a text-to-HTML ratio, on its own, moved a ranking. Every time the warning mattered, it was standing in front of a real problem — usually thin content or a client-side rendering gap — and fixing that is what changed the outcome.
Fixing it the right way (only when justified)
If an audit does turn up a genuine weakness, the work is about quality and rendering, never padding. The levers that actually help:
Make sure essential content (headings, the main copy, structured data) is server-rendered in Next.js so it lands in wave one and is readable by non-rendering crawlers. Where a page is legitimately thin for its intent, add a concise, useful introduction (often 150–300 words) that frames what the page offers, a clear "what we do / how it works" explanation, and an FAQ answering real questions buyers ask. Keep key content out of tabs and accordions that only load it on interaction, because that keeps it out of the initial HTML. And on the code side, trim what's genuinely wasteful (unused JavaScript, redundant component nesting, oversized third-party scripts), because that helps page speed and Core Web Vitals, which are real signals, even though the ratio that improves alongside them is not the goal.
What never helps is manufacturing text to hit a number. Stuffing a clean landing page with paragraphs nobody will read inflates the ratio, dilutes the message, and tends to lower conversion — trading a phantom SEO gain for a real business loss.
What actually moves rankings here
Strip the warning back and the priorities are clear. Google ranks pages, not metrics. What it rewards is content that meets intent, delivered in HTML it can render and index reliably, on a site that's fast (Core Web Vitals — LCP, CLS, and INP, which replaced FID as the responsiveness metric in 2024 — are confirmed signals), well-structured, and internally linked so authority and discovery flow. The text-to-HTML ratio sits downstream of all of that. Improve the things that matter and the ratio drifts upward on its own; chase the ratio in isolation and you've spent effort on the one number Google told you to ignore.
For modern Next.js sites, the verdict is simple: treat "low text-to-HTML ratio" as a diagnostic hint, not an action item. Let it prompt the two questions worth asking — is my content in the server HTML, and does this page deserve to rank for its intent? — and act only on the answers.
If you want a second pair of eyes on how your Next.js site renders for crawlers, our SEO migration & relaunch work starts exactly here — rendering parity, indexability, and intent — before anyone touches a word count. A 30-minute intro call is the easiest way to start.
— Anna
Frequently asked questions
Is a low text-to-HTML ratio bad for SEO?
Not on its own. Google's John Mueller has said plainly that the ratio is not and never has been a ranking factor. It only matters indirectly, as a possible hint of page bloat, thin content, or a rendering problem — each of which you'd address directly, not by changing the ratio.
What is a good text-to-HTML ratio?
There's no Google-defined target. The "25–70%" figure quoted by some tools is industry folklore, not a Google rule. Semrush simply flags pages at 10% or below. A page can rank perfectly well below that threshold.
Why does my Next.js or React site have a low text-to-HTML ratio?
Because component frameworks emit large amounts of structural markup — nested layout elements, design-system wrappers, hydration data, and scripts — relative to visible text. The denominator is large by design, so the ratio reads low even on healthy pages.
How do I know if Google can actually see my content?
Look at the raw server HTML (view-source, not the rendered DevTools DOM) and search for a core sentence; then use URL Inspection in Google Search Console to see the rendered HTML Google captured. If your headings and main paragraphs are present, you're fine.
Does client-side rendering hurt SEO?
It can. With CSR, Googlebot's first wave sees a near-empty shell and depends on a delayed second rendering wave, and AI crawlers like GPTBot and ClaudeBot generally don't run JavaScript at all. Server-side rendering or static generation puts your content in the initial HTML, which is the reliable choice for anything you want ranked or cited.
Should I add text to fix the warning?
Only if the page is genuinely thin for the query it targets. Add content that helps the reader — a clear intro, an explanation of the offering, a real FAQ. Never pad a page just to raise the ratio; it doesn't help rankings and usually hurts clarity and conversion.
Edited and fact-checked by Anna Hartung.