How Web Accessibility Affects AI Citation

A page that a screen reader can read well is a page that AI can read well. Accessibility (a11y) is a machine-readability foundation before it is a compliance checkbox: semantic HTML, heading hierarchy, alt text, and table markup serve assistive technology and AI extractors at the same time. Yet most pages fail at that foundation: WCAG 2 failures were detected on 95.9% of the top one million home pages [WebAIM Million, 2026]. That matters because AI crawlers read markup, not a rendered screen [Vercel × MERJ, 2025].

How do AI crawlers actually "read" a page?

They do not view a rendered screen the way a person does. They read the raw HTML the server sends. The major AI crawlers do not execute JavaScript: an analysis of 500M+ GPTBot fetches found zero evidence of JS execution. Gemini is the exception, since it can render pages on Googlebot infrastructure (as measured in early 2025) [Vercel × MERJ, 2025].
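To make the difference concrete, here is a minimal sketch of what a reader that never runs JavaScript can see. It uses only Python's standard `html.parser`; the `RawTextExtractor` class, the `raw_text` helper, and both sample pages are hypothetical illustrations, not how any particular crawler is implemented.

```python
from html.parser import HTMLParser


class RawTextExtractor(HTMLParser):
    """Collects the text present in raw HTML, without running any script."""

    # Script and style bodies are code, not content, so they are skipped.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def raw_text(html: str) -> str:
    """Return the text a non-rendering reader could extract from html."""
    parser = RawTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page whose content is in the HTML the server sends.
server_rendered = "<main><h1>Pricing</h1><p>Plan A costs $10.</p></main>"

# Hypothetical page whose content only exists after a script runs.
client_rendered = (
    '<div id="app"></div>'
    '<script>document.getElementById("app").textContent = "Plan A costs $10.";</script>'
)

print(repr(raw_text(server_rendered)))  # 'Pricing Plan A costs $10.'
print(repr(raw_text(client_rendered)))  # ''
```

The second page carries the same sentence, but only as a string inside a script. A reader that does not execute the script gets nothing from it.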

These crawlers do request JS files (11.5% of ChatGPT's requests and 23.8% of Claude's), but they do not run them, so content drawn on the client is invisible to them. Raw HTML is also too bulky to process as-is: 29.3% of a Common Crawl sample exceeded 32k tokens, so AI pipelines lean on preprocessing that pulls out the main content [Dripper, 2025, preprint]. The cue for telling body content from boilerplate appears to be the semantic markup itself.
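As a rough illustration of why semantic markup helps that preprocessing step, the sketch below drops text inside landmark elements that usually hold boilerplate and keeps the rest. This is an assumption about how a simple extractor could work, not the method of any cited pipeline; the `BOILERPLATE` set, the `extract_main` helper, and the sample pages are hypothetical.

```python
from html.parser import HTMLParser


class MainContentExtractor(HTMLParser):
    """Keeps text outside elements that typically mark boilerplate."""

    # Assumed boilerplate containers; a real extractor would use richer signals.
    BOILERPLATE = {"nav", "footer", "aside", "script", "style"}

    def __init__(self):
        super().__init__()
        self._boilerplate_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BOILERPLATE:
            self._boilerplate_depth += 1

    def handle_endtag(self, tag):
        if tag in self.BOILERPLATE and self._boilerplate_depth > 0:
            self._boilerplate_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._boilerplate_depth == 0:
            self.chunks.append(text)


def extract_main(html: str) -> list[str]:
    parser = MainContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# The same hypothetical page, once with landmarks and once with bare divs.
semantic = (
    "<nav><a href='/'>Home</a><a href='/blog'>Blog</a></nav>"
    "<main><h1>Guide</h1><p>Body text.</p></main>"
    "<footer>Copyright</footer>"
)
div_soup = (
    "<div><a href='/'>Home</a><a href='/blog'>Blog</a></div>"
    "<div><div>Guide</div><div>Body text.</div></div>"
    "<div>Copyright</div>"
)

print(extract_main(semantic))  # ['Guide', 'Body text.']
print(extract_main(div_soup))  # ['Home', 'Blog', 'Guide', 'Body text.', 'Copyright']
```

With landmarks, the menu and footer fall away on their own. With bare divs, the extractor has no markup cue and has to guess from the text.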

This way of reading should sound familiar. It is how a screen reader works.

Where do accessibility and machine readability overlap?

The overlap is the structural layer. Semantic HTML, heading hierarchy, alt text, table markup — the signals a screen reader uses to interpret a page are the same ones AI uses to extract from it. But not all of WCAG overlaps. Color contrast and keyboard operation, for instance, have nothing to do with AI readability.

There is an industry framing that "an LLM is a non-visual user, so meeting WCAG raises machine readability" (accessiBe, Siteimprove, and others). It is intuitive, but those pieces are all inference — none presented measured data. What is solid is the layer underneath. A page whose meaning is marked with native elements like <h2>, <table>, and <nav> carries more information to a parser than a pile of meaningless <div>s. That difference is backed by the W3C's first rule of ARIA use: if a native HTML element can do the job, use it [W3C ARIA in HTML, 2025].

Is there evidence that better accessibility gets you cited by AI?

There is no direct causal measurement yet. We could not find an A/B test showing "we raised our accessibility score and AI citations went up," nor any citation-share tracking. So "accessibility guarantees citation" is not a claim the evidence supports. What exists is mechanism evidence, plus adjacent experiments where better structure raised visibility.

The mechanism: give an LLM the same table as HTML markup and it understands it more accurately, by 6.76% over delimiter-separated natural-language text [Table Meets LLM, Microsoft, 2024]. In an adjacent experiment, adding quotations, statistics, and cited sources raised a source's visibility inside generative engines by up to ~40% [Princeton GEO, 2024]. Both point the same way: format affects understanding and citation.
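For a sense of what the two table formats look like, the sketch below serializes one small table both ways. The helper names and the sample data are hypothetical, and the exact serialization used in the cited benchmark may differ. The only point is that the HTML form states which cells are headers, while the delimited form leaves that to inference.

```python
from html import escape


def to_delimited(header, rows, sep="|"):
    """Serialize a table as delimiter-separated lines."""
    lines = [sep.join(header)]
    lines += [sep.join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)


def to_html_table(header, rows, caption):
    """Serialize the same table as HTML with explicit header cells."""
    head = "".join(f'<th scope="col">{escape(h)}</th>' for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return (
        f"<table><caption>{escape(caption)}</caption>"
        f"<thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
    )


# Hypothetical sample data.
header = ["Plan", "Price"]
rows = [["A", 10], ["B", 20]]

delimited_form = to_delimited(header, rows)
html_form = to_html_table(header, rows, "Monthly pricing")

print(delimited_form)
print(html_form)
```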

So accessibility does not guarantee citation. It builds a precondition for it — a structure AI can extract from. What is measurable is that precondition, not a citation count.

Headings, alt, tables — what shapes extraction, and how?

Three signals open three different channels. Heading hierarchy is the table of contents for passage extraction; alt text is the only text channel for image information; table markup decides whether data is extracted with its row-and-column relationships intact. And the real web is broken on all three.

Start with headings: 41.8% of pages skip a heading level (h2 straight to h4, for example), 18.1% have multiple h1s, and 7.5% have no headings at all [WebAIM Million, 2026]. Break the hierarchy and you blur the passage boundaries.

On images, 16.2% of home-page images (10.8 per page on average) have no alt text [WebAIM Million, 2026]. Google uses alt text, computer vision, and surrounding text together to understand an image, and alt is the primary explicit signal of what it contains [Google Search Central, 2025].

Tables are worse: of 948,225 observed tables, only 19% had correct data-table markup (<th> and so on) [WebAIM Million, 2026].
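The heading problems named above (skipped levels, multiple h1s, no headings at all) are easy to count. Here is a minimal sketch using the standard library; `HeadingCollector`, `heading_report`, and the sample markup are hypothetical, and real audit tools apply more rules than this.

```python
from html.parser import HTMLParser

HEADING_TAGS = {f"h{n}" for n in range(1, 7)}


class HeadingCollector(HTMLParser):
    """Records heading levels in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.levels.append(int(tag[1]))


def heading_report(html: str) -> dict:
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels
    # A skip is any step that goes more than one level deeper at once.
    skipped = [(prev, cur) for prev, cur in zip(levels, levels[1:]) if cur > prev + 1]
    return {
        "levels": levels,
        "h1_count": levels.count(1),
        "skipped": skipped,
        "no_headings": not levels,
    }


# Hypothetical page with a skipped level and two h1 elements.
sample = "<h1>Guide</h1><h2>Setup</h2><h4>Details</h4><h1>Another title</h1>"
print(heading_report(sample))
```

For the sample, the report lists one skip, from level 2 to level 4, and an `h1_count` of 2.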

| Accessibility signal | What a screen reader gets | What AI gets |
| --- | --- | --- |
| Heading hierarchy (h1→h2→h3) | Page outline, skip-to navigation | Passage boundaries, topic-unit extraction |
| Alt text | Spoken description of the image | Text representation of image content |
| Table markup (th/scope/caption) | Row/column header relationships | Data extracted with row/column relations |

(The WebAIM figures above come from automated detection, so real failure rates may be higher.)
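The image and table signals can be counted the same way. The sketch below is a simplified stand-in for an automated check, with hypothetical names (`MediaAudit`, `audit_media`) and sample markup. It treats an empty `alt=""` as present, since that is the usual way to mark a decorative image, and it accepts any `<th>` as data-table markup, which is looser than a full audit would be.

```python
from html.parser import HTMLParser


class MediaAudit(HTMLParser):
    """Counts images without an alt attribute and tables without <th> cells."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.images_missing_alt = 0
        self.tables = 0
        self.tables_with_th = 0
        self._open_tables = []  # one flag per open table: has a <th> been seen?

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
            if "alt" not in dict(attrs):
                self.images_missing_alt += 1
        elif tag == "table":
            self.tables += 1
            self._open_tables.append(False)
        elif tag == "th" and self._open_tables and not self._open_tables[-1]:
            # First header cell seen in the innermost open table.
            self._open_tables[-1] = True
            self.tables_with_th += 1

    def handle_endtag(self, tag):
        if tag == "table" and self._open_tables:
            self._open_tables.pop()


def audit_media(html: str) -> dict:
    audit = MediaAudit()
    audit.feed(html)
    audit.close()
    return {
        "images": audit.images,
        "images_missing_alt": audit.images_missing_alt,
        "tables": audit.tables,
        "tables_with_th": audit.tables_with_th,
    }


# Hypothetical markup: three images and two tables.
sample_media = (
    '<img src="a.png" alt="Chart of monthly sales">'
    '<img src="b.png">'
    '<img src="c.png" alt="">'
    '<table><tr><th scope="col">Plan</th></tr><tr><td>A</td></tr></table>'
    "<table><tr><td>layout cell</td></tr></table>"
)
print(audit_media(sample_media))
```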

Is more ARIA the fix?

No. Layering on more ARIA is not the answer. The W3C's first rule is to use a native HTML element instead of ARIA wherever one exists — bad ARIA actively harms structure. "No ARIA is better than bad ARIA" [W3C ARIA in HTML, 2025].

The data leans the same way. ARIA attributes now average 133 per page, up 27% year over year, yet pages that use ARIA carry more errors, not fewer (59.1 vs 42 on average) [WebAIM Million, 2026]. That is a correlation, not a cause — complex pages tend to use more ARIA. So the lesson is not "use more ARIA," it is "use native elements first."
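One way to act on "native elements first" is to flag generic elements that use an ARIA role where a native element already exists. The sketch below is a small, assumption-laden example: the `NATIVE_FOR_ROLE` mapping covers only a handful of roles, and `RoleAudit`, `find_replaceable_roles`, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

# Partial mapping from ARIA role to the native element that carries it.
NATIVE_FOR_ROLE = {
    "button": "<button>",
    "navigation": "<nav>",
    "main": "<main>",
    "heading": "<h1> to <h6>",
    "table": "<table>",
    "list": "<ul> or <ol>",
    "link": "<a href>",
}


class RoleAudit(HTMLParser):
    """Finds div/span elements whose role has a native HTML equivalent."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("div", "span"):
            return
        role = (dict(attrs).get("role") or "").strip().lower()
        if role in NATIVE_FOR_ROLE:
            self.findings.append((tag, role, NATIVE_FOR_ROLE[role]))


def find_replaceable_roles(html: str) -> list[tuple[str, str, str]]:
    audit = RoleAudit()
    audit.feed(html)
    audit.close()
    return audit.findings


# Hypothetical markup mixing ARIA-on-div with native elements.
sample_aria = (
    '<div role="button">Buy</div>'
    '<div role="navigation"><a href="/">Home</a></div>'
    "<button>OK</button>"
    '<span role="note">Aside</span>'
)

for tag, role, native in find_replaceable_roles(sample_aria):
    print(f'<{tag} role="{role}"> could be {native}')
```

The native `<button>` and the `note` role are left alone. Only the two cases with a native replacement in the mapping are reported.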

How do you check your own page?

List the structural signals above and verify them as facts. Does the heading hierarchy hold? Do images have alt text? Are tables marked up as data tables? Each is a countable value. zupzup diagnoses these signals across 8 categories and 84 analyzers, and shows what to fix first through four scores that include items like table accessibility.

zupzup does not track search rankings or AI citation counts — it cannot. It diagnoses the signals that form the precondition for citation, as facts. Only what we can measure.

Conclusion / next step

Accessibility is a structural investment that serves two readers at once — a person's assistive technology and an AI's extractor. It does not guarantee citation. But the precondition is clear: if a structure cannot be extracted, citation never starts. And that precondition is measurable.

Start by checking whether your page clears it. Run an accessibility and citability diagnosis with zupzup.


References

  1. Vercel × MERJ, 2025: The rise of the AI crawler
  2. WebAIM Million, 2026: The WebAIM Million 2026
  3. W3C ARIA in HTML, 2025: ARIA in HTML
  4. Table Meets LLM, Microsoft, 2024: Sui et al., Table Meets LLM, WSDM 2024
  5. Google Search Central, 2025: Image SEO best practices
  6. Princeton GEO, 2024: Aggarwal et al., GEO: Generative Engine Optimization, KDD 2024
  7. Dripper, 2025: arXiv 2511.23119 (preprint)
  8. Ministry of Science and ICT and NIA survey, 2024: 2024 Web Accessibility Survey (published 2025-03)
