Published 2026-06-12

What Makes a Page AI Will Cite

Whether AI answers cite your page is decided by passage-level citability, not by your search rank. In a March 2026 Ahrefs analysis, 31.0% of the sources Google's AI Overview cited came from pages ranked beyond the organic top 100. A low rank doesn't rule you out. And citability is something you can measure — and improve.

What does AI actually look at when it decides to cite?

AI doesn't cite a whole page. It pulls a self-contained answer block — a passage — and quotes that. A short paragraph that answers one question completely, with facts, statistics, and sources inside it, is the signal. So "does this paragraph answer the question?" matters more than "does this page rank?"

This isn't a guess; it's a tested direction. The Generative Engine Optimization (GEO) paper from Princeton researchers (Aggarwal et al., KDD 2024) found that adding statistics, quotations, and cited sources to a page raised its visibility inside generative-engine answers by up to roughly 40%. Lower-ranked pages — around position five — gained the most.

Placement is a signal too. Consistent with the same Princeton finding, put the core answer high on the page and make it stand on its own. A page that buries its conclusion is a page that's harder to cite.

How is search rank (SEO) different from AI citation (GEO)?

SEO is the contest for rank in search results. GEO is the work of becoming an answer worth quoting. They correlate, but they aren't the same thing. Rank builds the foundation; it doesn't guarantee the citation. The passage decides that.

The numbers show the gap. In an Ahrefs (2026) study of 863,000 search results and 4 million AI Overview URLs, only 37.9% of AI Overview citations went to pages in the organic top 10. Another 31.2% came from positions 11–100, and 31.0% from beyond 100 (figures as of the 2026-06 source). On the organic-blue-links-only cut the split is 37.1% / 26.2% / 36.7%; either way, the top 10 accounts for under half. Over time, the top-10 share of citations fell from about 76% in July 2025 to about 38% in early 2026 (Ahrefs / Search Engine Journal, 2026).
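The "under half" claim is easy to check by hand. Here is a minimal Python sketch that uses only the percentages quoted above; the dictionary keys and the function name are our own labels, not Ahrefs terminology.

```python
# Citation shares quoted in the text, in percent of AI Overview citations.
# The keys are illustrative labels, not Ahrefs terminology.
ALL_RESULTS = {"top_10": 37.9, "pos_11_100": 31.2, "beyond_100": 31.0}
BLUE_LINKS_ONLY = {"top_10": 37.1, "pos_11_100": 26.2, "beyond_100": 36.7}


def outside_top_10(split: dict[str, float]) -> float:
    """Share of citations that went to pages ranked below the organic top 10."""
    return round(split["pos_11_100"] + split["beyond_100"], 1)


for name, split in (("all results", ALL_RESULTS), ("blue links only", BLUE_LINKS_ONLY)):
    total = sum(split.values())  # can differ slightly from 100 because of rounding
    print(f"{name}: outside top 10 = {outside_top_10(split)}%, total = {total:.1f}%")
```

On both cuts, a little over 62% of citations (62.2% and 62.9%) went to pages outside the top 10.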

None of this means abandon SEO. By one analysis (BrightEdge / Search Engine Journal, 2025), about 52% of queries trigger no AI Overview at all. For most of those, classic organic search is still everything. SEO is the necessary base layer; GEO is a separate layer you build on top of it.

What structural conditions raise citability?

Six things. First, a self-contained direct-answer block — roughly 40–60 words — under a question-style heading. Second, a real density of facts and statistics with clear sources. Third, comparisons set in a table. Fourth, semantic HTML and structured data. Fifth, E-E-A-T and entity signals that tie an author to the operating organization. Sixth, information that isn't a rerun of everyone else's. Extractable structure — and a source that looks safe to stand behind — is what pulls citations, not polished prose.

The recommended shape for a direct answer is a short, self-contained passage that resolves one question without surrounding context (Averi, 2026). "Roughly 40–60 words" is a guideline, not a hard rule — the point is self-containment, not length. Lift that one paragraph out, and it should still read as a complete answer.
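If you want a rough first pass over your own pages, the guideline can be turned into a quick check. The sketch below is an assumption-heavy illustration, not a standard and not how any AI system scores passages. The function name, the 40–60 defaults, and the "opens with a back-reference" heuristic are all our own.

```python
import re

# Openers that usually lean on earlier context (a crude, hypothetical heuristic).
BACKREFERENCE = re.compile(r"^(this|that|these|those|it|as mentioned)\b", re.IGNORECASE)


def check_answer_block(heading: str, passage: str, low: int = 40, high: int = 60) -> dict:
    """Report simple signals for one heading plus the paragraph right below it."""
    words = len(passage.split())  # whitespace-separated word count
    return {
        "question_heading": heading.strip().endswith("?"),
        "words": words,
        "within_guideline": low <= words <= high,
        "opens_with_backreference": bool(BACKREFERENCE.match(passage.strip())),
    }


if __name__ == "__main__":
    report = check_answer_block(
        "Does something like llms.txt work?",
        "Right now, not really. To control crawler access, use robots.txt.",
    )
    print(report)
```

A passage outside the word range is not automatically bad. Treat the result as a prompt to reread the paragraph on its own, which is the real test.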

The same content can land very differently. Compare:

Before: "Our tool looks at lots of things from many angles and helps make your page better. Read on below for the details."

After: "zupzup scans your page across 8 categories and 84 analyzers. It does not track search rankings. Instead it flags the signals that shape whether your page gets found in search and answered by AI — direct-answer blocks, fact density, table structure, semantic markup — and gives you a priority order for what to fix first. Analysis runs entirely in your browser."

Lift the second version out from under its heading and it still answers one question ("what does this tool do?") on its own. It trades vague adjectives for facts, and the first sentence already carries the point.

Structured data (JSON-LD) is where you should hedge honestly. Google and Microsoft have confirmed they use schema in generative features, and it helps extraction accuracy (Search Engine Land, 2026). But a study tracking 1,885 pages that added JSON-LD found essentially zero change in AI citations (Ahrefs via Stan Ventures, 2026). Schema is hygiene that aids extraction — not a guarantee that adding it gets you cited.

The fifth signal is who is speaking. When AI picks a source to cite, it weighs whether the answer looks safe to stand behind; experience, expertise, authoritativeness, and trust (E-E-A-T) are the axes of that judgment. So a named author with real credentials, and a consistent entity signal linking the author to the operating organization, both shape citability. A page that shows who wrote it and on what basis reads as a safer source than anonymous body text. This authority also accrues beyond a single page, at the site and author level. Those off-page signals (site-level authority, freshness, off-site mentions) are covered separately in our article on GEO in 2026.
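One common way to state the author-to-organization link explicitly is schema.org markup in JSON-LD. The Python sketch below builds such a block with the standard json module. The author name, URLs, and function name are placeholders we made up for illustration, and as noted above, schema is extraction hygiene rather than a citation guarantee.

```python
import json


def article_jsonld(headline: str, date_published: str,
                   author_name: str, author_url: str,
                   org_name: str, org_url: str) -> dict:
    """Build a schema.org Article that ties the author to the publishing organization."""
    org = {
        "@type": "Organization",
        "@id": org_url + "#org",  # hypothetical identifier reused by the author below
        "name": org_name,
        "url": org_url,
    }
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,
        "author": {
            "@type": "Person",
            "name": author_name,
            "url": author_url,
            "worksFor": {"@id": org["@id"]},  # same entity as the publisher
        },
        "publisher": org,
    }


if __name__ == "__main__":
    data = article_jsonld(
        "What Makes a Page AI Will Cite", "2026-06-12",
        "Jane Doe", "https://example.com/authors/jane-doe",  # placeholder author
        "Example Org", "https://example.com/",               # placeholder organization
    )
    # Paste the output into a <script type="application/ld+json"> element.
    print(json.dumps(data, indent=2))
```

The point of reusing one "@id" is that the author and the publisher resolve to the same organization entity, instead of two name strings that merely look alike.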

The sixth is unique information. AI avoids repeating what it can't verify, so a page that merely restates the existing top results is rarely the one it picks (Surfer, 2025). A page carrying numbers you measured, first-hand observation, or a viewpoint of your own is easier to lift as "information you can only get here" than one more retelling of the same fact. Saying what everyone else says is a weak reason to be cited.

Classic SEO vs. GEO checkpoints

Checkpoint	Classic SEO view	GEO (AI citation) view
Unit	Page / keyword rank	Passage (self-contained answer block)
Headings	Include the keyword	Question form + direct answer right below
Facts & stats	Nice to have	Cited, with real density
Comparisons	Prose in the body	Structured as a table
Structured data	For rich results	Aids extraction (citation boost unproven)
Author / trust signals	Byline optional	E-E-A-T + entity (author↔organization) — looks safe to cite
Information uniqueness	Keyword coverage is enough	Avoid restatement — unique data/viewpoint is the reason to cite
Access control	robots.txt	robots.txt (llms.txt currently inert)

Query fan-out: AI splits one question into many

AI search breaks one question into several sub-questions, searches them in parallel, then synthesizes an answer. This is called query fan-out. A page that answers many sub-questions of a topic gets cited more readily than one aimed at a single keyword.

Google has described this mechanism on the record. In its May 2025 announcement, Google said AI Mode runs a custom version of Gemini, and that Deep Search issues hundreds of sub-queries from a single question (Google, 2025). Industry analyses estimate a typical query expands into roughly 8–12 sub-questions (iPullRank and others, 2025). The more sub-questions one page shows up in, the higher its odds of being cited. And beyond a single page, a cluster of pages — one per sub-question, owned at the site level — does even better; that cluster strategy is covered in our article on GEO in 2026.
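Search engines do not publish the sub-questions they generate, so any coverage check has to start from your own guesses. The sketch below assumes a hand-written list of sub-questions and uses plain token overlap against your headings as a crude proxy for "this page addresses that sub-question". The stopword list, the 0.5 threshold, and the function names are all illustrative assumptions.

```python
import re

# Small illustrative stopword list; extend it for real use.
STOPWORDS = {"a", "an", "the", "is", "are", "do", "does", "what", "how",
             "why", "of", "to", "in", "for", "be", "should"}


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def covers(sub_question: str, heading: str, threshold: float = 0.5) -> bool:
    """True if enough of the sub-question's tokens appear in the heading."""
    q = tokens(sub_question)
    if not q:
        return False
    return len(q & tokens(heading)) / len(q) >= threshold


def fan_out_coverage(sub_questions: list[str], headings: list[str]) -> tuple[float, list[str]]:
    """Return the covered fraction and the sub-questions no heading covers."""
    if not sub_questions:
        return 0.0, []
    gaps = [q for q in sub_questions if not any(covers(q, h) for h in headings)]
    return 1 - len(gaps) / len(sub_questions), gaps


if __name__ == "__main__":
    ratio, gaps = fan_out_coverage(
        ["what is query fan-out", "does llms.txt work", "how long should a direct answer be"],
        ["Query fan-out: AI splits one question into many", "Does something like llms.txt work?"],
    )
    print(f"covered: {ratio:.0%}; gaps: {gaps}")
```

The useful output is the gap list: each uncovered sub-question is a candidate for a new question heading with its own self-contained answer.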

Does something like llms.txt work?

Right now, not really. As of 2025–2026, the major AI bots don't actually read llms.txt files, so the file does nothing for citation. To control crawler access, use robots.txt, not llms.txt. On a citation-readiness pass, treat llms.txt as a low-priority item and put your effort elsewhere.

The evidence is solid. Semrush (2025) installed an llms.txt file and monitored it from August to October 2025; the major bots (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) visited that file zero times. Google has said so officially: John Mueller stated that no AI system currently uses llms.txt (2025), and Gary Illyes confirmed Google doesn't support it and has no plans to (Search Central Live APAC, 2025-07-23). By contrast, the same major crawlers all respect robots.txt (Search Engine Land, 2025). Spend your checks on signals that actually fire.
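Since robots.txt is the file that matters, it is worth confirming what yours actually tells these crawlers. Here is a minimal sketch using RobotFileParser from Python's standard library. The sample robots.txt and the example.com URL are made up for illustration, and the check only evaluates the rules as written; it cannot tell you whether a given bot honors them.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the text above.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

# Hypothetical robots.txt: blocks GPTBot from /private/, allows everything else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""


def crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Map each AI crawler token to whether the rules allow it to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_CRAWLERS}


if __name__ == "__main__":
    result = crawler_access(SAMPLE_ROBOTS_TXT, "https://example.com/private/report")
    for agent, allowed in result.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

To test a live site instead of a pasted string, RobotFileParser also offers set_url() and read(), which fetch the file over the network.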

Does AI citation affect traffic quality?

This one is hard to state flatly. One vendor analysis found ChatGPT referral traffic converting better than non-branded organic; a separate academic study found it converting worse. It splits by category and buying path. So zupzup makes no promise about conversion — only what we can measure.

In detail: an analysis of GA4 data across 94 e-commerce brands (Visibility Labs via Search Engine Land, 2025) put ChatGPT referral conversion at 1.81% versus 1.39% for non-branded organic. Yet in the same analysis, average order value ran 14.3% lower; even one dataset points in different directions depending on the metric. And Kaiser & Schulze's academic study (Marketing Science, 2025) reported ChatGPT referral traffic converting at a lower rate and earning less per session than Google organic or paid traffic. The two results conflict, so don't read a citation as a revenue guarantee.

How do you check whether your page is citation-ready?

Turn the conditions above into a checklist and walk through them — question headings with self-contained answers, fact-and-source density, table structure, semantic HTML and schema, a named author and operating organization, unique information, robots.txt access. zupzup runs this as 8 categories and 84 analyzers, and gives you not a score but a priority order: what to fix first.

zupzup does not track search rankings or AI citation counts. We don't promise what we can't track. Instead, it diagnoses the signals that shape whether your page gets found in search and answered by AI — reporting them as facts, and validating actual reachability, anchor-link reachability, and table accessibility across layers. Analysis stays in your browser; your page content is never sent to a server.

Conclusion / Next steps

It's citability, not rank. AI doesn't take your page whole — it lifts the passage that answers the question and cites that. Self-contained answers, fact-and-source density, table structure, semantic markup: these signals are measurable, and once you measure them, you can see what to fix first.

If you want to see, on one screen, whether your page meets these conditions today, run it through zupzup. Not a score — a direction. Only what we can measure.

→ Diagnose your page with zupzup