
Chunking (Passage Retrieval)

Chunking is the process by which AI engines slice web pages into smaller, semantically coherent passages — typically a few hundred tokens each — that can be independently indexed, retrieved, and cited.

What is Chunking (Passage Retrieval)?

Chunking is the unglamorous, structural process that determines whether your content has any chance of being retrieved by an AI engine. Before any large language model generates an answer, the source content it might draw from has been broken down into smaller units — chunks — that are individually embedded, indexed, and made retrievable. A single web page is rarely retrieved as a whole; instead, it is split into anywhere from a handful to several dozen passages, each of which competes independently for relevance to a user query. The chunk, not the page, is the atomic unit of AI retrieval — and once you internalize that, a great deal of GEO strategy stops being abstract and becomes concrete.

The mechanics matter. Chunking strategies vary across engines and indexing pipelines, but most use some combination of fixed token windows (for example, 256 to 512 tokens per chunk), semantic boundary detection (splitting at paragraph or section breaks), and overlap (where chunks share some content with their neighbors to preserve context). The output of this process is a collection of self-contained passages, each tagged with metadata about its source page, position, and surrounding structure. When a user query is processed, the engine retrieves chunks — not URLs — and the language model composes its answer from the chunks that scored highest on semantic relevance to the query.
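
To make these mechanics concrete, here is a minimal sketch of a fixed-window chunker with overlap, in Python. It is illustrative only: the whitespace tokenizer stands in for a real subword tokenizer, and the window and overlap sizes simply mirror the ranges mentioned above.

    # Minimal illustration of fixed-window chunking with overlap.
    # Whitespace "tokens" stand in for subword tokens; real pipelines
    # also detect semantic boundaries rather than cutting mid-paragraph.
    def chunk_page(url: str, text: str, window: int = 512, overlap: int = 64) -> list[dict]:
        tokens = text.split()
        step = window - overlap  # neighboring chunks share `overlap` tokens of context
        chunks = []
        for position, start in enumerate(range(0, len(tokens), step)):
            passage = " ".join(tokens[start:start + window])
            # Each chunk carries metadata tying it back to its source page.
            chunks.append({"source": url, "position": position, "text": passage})
            if start + window >= len(tokens):
                break
        return chunks

Each resulting passage then competes independently for relevance; the metadata is what lets the engine cite the source page when a chunk wins.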

The strategic implication for content creators is direct and counterintuitive: writing long, flowing prose that builds an argument across many paragraphs may be excellent for a human reader but actively harmful for AI retrieval. If the answer to a likely user question is spread across three paragraphs that depend on each other, no single chunk will fully contain the answer, and the page may simply not surface. By contrast, a page that opens each section with a self-contained, BLUF-style answer — followed by supporting context — produces chunks that each carry retrievable, citable value on their own. This is why FAQ pages, structured comparison tables, and definition-led entries (exactly like the one you are reading) tend to dominate AI citations far in excess of their traditional SEO weight.

Chunking is also why technical content extractability — clean HTML, semantic headings, structured data, and proper paragraph breaks — translates so directly into AI visibility outcomes. A page where the HTML structure mirrors the logical structure of the content gives the chunker clean boundaries to split on, producing coherent, self-contained passages. A page built as one undifferentiated block of text, or as a JavaScript-rendered single-page application with no clear DOM structure, gives the chunker nothing to work with and produces fragmented, low-relevance passages. Two pages with identical word-for-word content can therefore have radically different AI visibility outcomes purely on the basis of how they are structured for chunking.
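
The connection between markup and chunk quality is easy to demonstrate. The sketch below, which assumes the BeautifulSoup library, splits a page at its <h2> and <h3> boundaries; how much structure the splitter recovers depends entirely on how much structure the HTML exposes.

    # Sketch: deriving passage boundaries from semantic headings.
    # Assumes beautifulsoup4 (pip install beautifulsoup4); real indexing
    # pipelines are more elaborate, but the principle holds:
    # clean heading boundaries -> clean chunk boundaries.
    from bs4 import BeautifulSoup

    def split_on_headings(html: str) -> list[dict]:
        soup = BeautifulSoup(html, "html.parser")
        sections, current = [], {"heading": None, "text": []}
        for el in soup.find_all(["h2", "h3", "p", "li"]):
            if el.name in ("h2", "h3"):
                if current["text"]:
                    sections.append(current)
                # A heading opens a new, self-contained section.
                current = {"heading": el.get_text(strip=True), "text": []}
            else:
                current["text"].append(el.get_text(strip=True))
        if current["text"]:
            sections.append(current)
        return sections

Run this on a well-structured page and you get one coherent section per topic; run it on a page with no headings and everything collapses into a single oversized section with no heading, which is exactly the failure mode described above.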

Why it matters

Key points about Chunking (Passage Retrieval)

1. The chunk — not the page or the URL — is the atomic unit of AI retrieval, meaning every paragraph and section on a page competes independently for visibility in AI-generated answers.

2. Self-contained passages dramatically outperform argument-style prose: an answer that lives entirely inside one chunk is retrievable, while an answer split across three dependent paragraphs may not surface at all.

3. HTML and structural quality directly affect chunking quality — clean semantic markup, proper headings, and clear paragraph breaks give the chunker coherent boundaries, while undifferentiated text blocks produce fragmented, low-value chunks.

4. BLUF-style writing, FAQ blocks, comparison tables, and definition-led sections are disproportionately effective for AI visibility precisely because they produce chunks that are individually complete and citable.

5. Two pages with identical content can have very different AI visibility outcomes based purely on how they are structured for chunking — making content architecture, not just content quality, a core GEO discipline.

Frequently asked questions about Chunking (Passage Retrieval)

How large is a typical chunk?
Most production AI retrieval systems use chunks in the range of 200 to 800 tokens, with 512 tokens being a common default. Some engines use much smaller passages (100 to 200 tokens) for high-precision retrieval, and some use larger ones (up to 1,500 tokens) when context preservation matters more than retrieval granularity. The exact size is rarely disclosed by AI engines, but the strategic implication is constant: each section of your content should make sense at roughly paragraph-to-short-section scale.
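
If you want a rough sense of whether your sections fall in that range, an open tokenizer gives a usable estimate. The sketch below assumes the tiktoken library and its cl100k_base encoding; exact counts vary by tokenizer, but the order of magnitude is what matters.

    # Rough sanity check: is each section chunk-sized?
    # Assumes tiktoken (pip install tiktoken); counts vary by tokenizer.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    def token_count(passage: str) -> int:
        return len(enc.encode(passage))

    # A self-contained section should land roughly in the 200-800 range.
    print(token_count("Chunking is the process by which AI engines slice web pages into smaller passages."))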
Do AI engines use the same chunking strategy?
No — chunking strategies vary across engines, retrieval pipelines, and even across query types within the same engine. Perplexity, Google AI Overviews, ChatGPT Search, and Gemini all use different chunking and embedding approaches, and these are continuously tuned. The practical takeaway is that you cannot optimize for one specific chunk size; instead, write content that produces coherent, self-contained passages at multiple scales.
Can I control how my content gets chunked?
Indirectly, yes. You cannot dictate chunk boundaries to an AI engine, but you strongly influence them through HTML structure, heading hierarchy, paragraph length, list formatting, and structured data. A page with semantic HTML, clear <h2> and <h3> boundaries, well-bounded paragraphs, and consistent FAQ or definition patterns will be chunked far more cleanly than one without — and the resulting chunks will carry more retrievable value.
How does Chunking relate to embeddings?
Chunking comes first; embeddings come second. The page is split into chunks, each chunk is then converted into an embedding (a high-dimensional vector representing its meaning), and those embeddings are what get stored in the retrieval index. When a user query arrives, the query is also embedded and matched against the chunk embeddings via vector search. Bad chunking produces incoherent embeddings; good chunking produces clean, semantically focused embeddings that retrieve well.
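
A toy version of that pipeline fits in a few lines. The sketch below assumes the sentence-transformers library and the open all-MiniLM-L6-v2 model; production engines use proprietary embedding models, but the chunk, embed, match sequence is the same. The chunks here are hypothetical one-sentence passages.

    # Toy chunk -> embed -> retrieve pipeline.
    # Assumes sentence-transformers (pip install sentence-transformers).
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    chunks = [
        "Chunking splits a page into passages of a few hundred tokens each.",
        "Embeddings map each passage to a vector so meaning can be compared.",
        "BLUF writing puts the answer in the first sentence of a section.",
    ]
    chunk_vectors = model.encode(chunks)   # one vector per chunk

    query_vector = model.encode("how do AI engines break pages into passages?")
    scores = util.cos_sim(query_vector, chunk_vectors)[0]  # cosine similarity

    best = int(scores.argmax())
    print(chunks[best], float(scores[best]))

Note that what comes back is a chunk, not a URL; the metadata attached at chunking time is what connects the winning passage to its source page.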
Does Chunking apply to PDFs and other document formats?
Yes. AI engines that index PDFs, Word documents, and other formats apply chunking to those as well, often using format-specific heuristics (page breaks, section headings, table boundaries). The same principles apply: well-structured documents with clear sections and self-contained passages chunk cleanly and surface in AI answers; long, undifferentiated documents do not. This is particularly relevant for B2B brands publishing whitepapers, research reports, and technical documentation.

Related terms

BLUF (Bottom Line Up Front)

A content structuring principle originating from military communication that places the most critical information — the conclusion, recommendation, or key takeaway — in the opening sentence or paragraph, ensuring that readers and AI extraction systems capture the essential message even if they process nothing else.

Content Extractability

Content extractability measures how easily AI engines can identify, isolate, and cite specific pieces of information from your web content — determined by factors including BLUF structure, heading hierarchy, clean HTML, citable claims, FAQ blocks, and the separation of distinct ideas into parseable units that AI retrieval systems can process and quote.

Embeddings (Vector Search)

Embeddings are mathematical representations of text — high-dimensional vectors in which semantically similar concepts cluster together — that allow AI engines to retrieve content based on meaning rather than exact keyword matches.

RAG (Retrieval-Augmented Generation)

Retrieval-Augmented Generation (RAG) is the mechanism by which AI engines fetch real-time information from the web, databases, or document repositories and inject it into the language model's context window before generating an answer — enabling AI systems like Perplexity, Google AI Overviews, and ChatGPT with browsing to produce responses grounded in current, source-backed data rather than relying solely on static training knowledge.


Want to measure your AI visibility?

Our AI Visibility Intelligence Platform analyzes your brand across ChatGPT, Perplexity, Gemini, Claude, and Grok — and turns these concepts into actionable scores.