AI Search Engine Deep Dive

How ChatGPT Works

The world's most used AI — and why it plays by completely different rules than Perplexity

Launched: November 2022
Weekly users: 800M
Daily queries: 1B+
Fortune 500: 92%
Architecture: Training + RAG
Cites sources: In search mode

Most guides about getting cited by ChatGPT treat it as a single system. It isn't. ChatGPT has two fundamentally different operating modes — and optimizing for the wrong one is the most common mistake brands make.

Everything on this page is sourced from official OpenAI documentation, peer-reviewed academic papers, and direct public statements from OpenAI's CEO and founders. Where we don't have a verified source, we say so explicitly.

What is ChatGPT?

This is the most important thing to understand about ChatGPT.

ChatGPT does not operate as a single system. It has two fundamentally distinct modes that work in completely different ways — and your visibility strategy depends entirely on which mode is active when your prospect asks their question.

Mode 1, Training Data Mode: No web retrieval. Answers come from the model's learned knowledge. Static: content published today cannot change what an already-trained model knows. Typically triggered by general knowledge questions, creative tasks, and reasoning.

Mode 2, Search Mode (ChatGPT Search): Live web retrieval via Bing and partner providers. Answers come from real-time sources with citations. Dynamic: optimizable right now. Typically triggered by current events, comparisons, "best X in 2026", and time-sensitive queries.

If you're optimizing for ChatGPT citations, focus almost exclusively on Search Mode. Training data has a knowledge cutoff: content published today cannot reach a model that has already been trained, and can at best influence future models through GPTBot crawling.

Technical architecture

How ChatGPT retrieves and generates answers

When a query triggers Search Mode, ChatGPT follows a documented process before generating a response. But first, you need to understand the three crawlers that feed the system — and how to configure your robots.txt for each.

"If a biotech researcher asked 'what's the latest on the development of drugs that target CCR8 for cancer?', ChatGPT might initially query using 'CCR8 immunotherapy drug development 2025.' After reviewing the initial results, it may send additional queries like 'CHS-114 conference 2025.'"
OpenAI Help Center — ChatGPT search official documentation, direct quote

Three Crawlers, Three Purposes

OpenAI operates three distinct web crawlers, each with a different function. Understanding which one matters for visibility — and how to configure your robots.txt for each — is a technical prerequisite before any content strategy.

GPTBot — Crawls publicly accessible websites to collect training data for OpenAI's language models. This is for the model's static knowledge, not for real-time search results. (robots.txt tag: GPTBot. Impact: long-term, indirect.)

OAI-SearchBot — Indexes content specifically for ChatGPT Search results — the real-time, cited answers users see when ChatGPT searches the web. (robots.txt tag: OAI-SearchBot. Impact: direct — this determines if you appear in cited search responses.)

ChatGPT-User — Accesses content on demand when a user initiates browsing, uses plugins, or triggers GPT Actions. (robots.txt tag: ChatGPT-User. Impact: situational.)

OpenAI has confirmed that OAI-SearchBot and GPTBot share crawl information with each other to avoid duplicate crawling — meaning allowing both maximizes efficiency.

The practical implication: if OAI-SearchBot is blocked in your robots.txt, you cannot appear in ChatGPT's cited search responses — regardless of how well your content is written. This is a binary technical gate.

Confirmed: OpenAI Developer Documentation — "Overview of OpenAI Crawlers" (developers.openai.com).
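
One way to check this gate before working on content is to test a robots.txt file against the three crawler names listed above. The sketch below uses Python's standard-library urllib.robotparser. The sample robots.txt, the example.com URL, and the crawler_access helper are illustrative assumptions, not anything published by OpenAI, and the standard-library parser may differ in edge cases from how OpenAI's crawlers read the file.

```python
from urllib.robotparser import RobotFileParser

# Crawler names as they appear in robots.txt User-agent lines.
OPENAI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]

# Hypothetical policy, shown only to demonstrate that each crawler is
# evaluated independently: opt out of training, stay eligible for search.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""


def crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return, for each OpenAI crawler, whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in OPENAI_CRAWLERS}


if __name__ == "__main__":
    report = crawler_access(SAMPLE_ROBOTS_TXT, "https://example.com/pricing")
    for bot, allowed in report.items():
        print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

To audit a real site, pass the text of its own robots.txt and a URL you want cited. A blanket "User-agent: *" with "Disallow: /" blocks all three crawlers unless a more specific group overrides it.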

Query Rewriting and Fan-Out

ChatGPT doesn't send your exact question to a search provider. It rewrites the query — often into multiple targeted sub-queries — to maximize retrieval quality.

This multi-query behavior — called query fan-out — means your content needs to match the sub-queries ChatGPT generates, not just the original user question.

Confirmed: OpenAI Help Center documentation, direct quote with official example.
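
To make the fan-out idea concrete, the sketch below checks which sub-queries a page plausibly answers. It is a rough heuristic only: OpenAI does not disclose how retrieved pages are matched to sub-queries, so the token-overlap score, the 0.5 threshold, and the sample page sections are all assumptions made for illustration. The two sub-queries are the ones from OpenAI's published example.

```python
import re

# Sub-queries taken from the OpenAI Help Center example quoted on this page.
SUB_QUERIES = [
    "CCR8 immunotherapy drug development 2025",
    "CHS-114 conference 2025",
]

# Hypothetical page content: heading -> body text.
PAGE_SECTIONS = {
    "CCR8 drug development pipeline":
        "Overview of immunotherapy candidates targeting CCR8 in 2025.",
    "About our lab":
        "We study tumor biology.",
}


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; hyphenated terms such as 'chs-114' stay whole."""
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()))


def coverage(sub_queries, sections, threshold=0.5):
    """Map each sub-query to its best-matching heading, or None if no
    section contains at least `threshold` of the sub-query's tokens."""
    report = {}
    for query in sub_queries:
        query_tokens = tokens(query)
        best_heading, best_score = None, 0.0
        for heading, body in sections.items():
            if not query_tokens:
                break  # nothing to match against
            shared = query_tokens & tokens(heading + " " + body)
            score = len(shared) / len(query_tokens)
            if score > best_score:
                best_heading, best_score = heading, score
        report[query] = best_heading if best_score >= threshold else None
    return report


if __name__ == "__main__":
    for query, heading in coverage(SUB_QUERIES, PAGE_SECTIONS).items():
        print(f"{query!r} -> {heading or 'no matching section'}")
```

A sub-query that maps to no section is a content gap: the page covers the broad topic but not the narrower follow-up a researcher would ask.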

Retrieval via Bing + Partner Providers

ChatGPT Search retrieves content primarily through Microsoft Bing's index, supplemented by partner search providers and direct publisher agreements.

Publisher partnerships announced at ChatGPT Search launch include Axel Springer, News Corp, Wall Street Journal, The Atlantic, Reuters, Le Monde, Associated Press, Financial Times, Condé Nast, Hearst, and Time. Content from these partners likely carries elevated trust in ChatGPT's retrieval system, although OpenAI has not published how partner content is weighted.

Confirmed: OpenAI, "Introducing ChatGPT search" announcement (October 2024), which named the launch publisher partners.

Synthesis and Inline Citations

ChatGPT synthesizes retrieved content into a response with inline citations. Users can hover over citations to preview the source and click through to the original page.

Confirmed: OpenAI Help Center — "ChatGPT responses that use search will contain inline citations."

Deep Research — A Different Beast

Standard ChatGPT Search and Deep Research are not the same product. They operate differently, cite differently, and require different strategies. Deep Research is increasingly the mode used by decision-makers doing serious vendor evaluation.

In OpenAI's own words: "Deep research is OpenAI's next agent that can do work for you independently — you give it a prompt, and ChatGPT will find, analyze, and synthesize hundreds of online sources to create a comprehensive report at the level of a research analyst."

It is powered by the o3-deep-research model — a specialized version of OpenAI's o3, optimized for web browsing and data analysis. A lightweight version using o4-mini-deep-research is also available.

The pipeline works in three steps: (1) Clarification — an intermediate model (gpt-4.1) asks follow-up questions to understand intent, preferences, and constraints. (2) Prompt Rewriting — the same model rewrites the query into a detailed, expanded prompt. (3) Agentic Research — the o3-deep-research model autonomously decomposes the expanded prompt into sub-questions, browses hundreds of sources, and synthesizes a citation-rich report.

Key difference from standard Search Mode: standard ChatGPT Search runs a handful of rewritten queries and synthesizes an answer within seconds. Deep Research performs many iterative searches, reads deeply, pivots based on what it finds, and takes 5-30 minutes to complete.

Why this matters for B2B brands: when someone runs a Deep Research query like "compare the top AI visibility platforms for enterprise brands in France", the agent decomposes it into sub-questions and browses your website, your reviews on G2 and Capterra, your press mentions, and your case studies, then does the same for competitors. Depth of source coverage matters more here than in standard search, and third-party review platforms are read and cited. Users can also restrict or prioritize the sources the agent consults, so becoming a domain that buyers put on their trusted lists is a long-term strategic asset.

Confirmed: OpenAI — "Introducing Deep Research" (February 2025). OpenAI Help Center — Deep Research FAQ. OpenAI API Documentation — Deep Research guide.
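
The three-step flow described above can be pictured as a small pipeline. The sketch below is a toy, model-free illustration of the control flow only: it calls no OpenAI model or API, and every name in it (ResearchJob, clarify, rewrite, plan_research, the sample questions and research angles) is a hypothetical stand-in rather than OpenAI's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchJob:
    prompt: str
    answers: dict[str, str] = field(default_factory=dict)  # replies to follow-ups
    expanded_prompt: str = ""
    sub_questions: list[str] = field(default_factory=list)


def clarify(job: ResearchJob) -> list[str]:
    """Step 1 stand-in: return the follow-up questions still unanswered."""
    questions = ["Which market or region?", "Which buyer segment?"]
    return [q for q in questions if q not in job.answers]


def rewrite(job: ResearchJob) -> ResearchJob:
    """Step 2 stand-in: fold the answers into a more detailed prompt."""
    details = "; ".join(f"{q} {a}" for q, a in job.answers.items())
    job.expanded_prompt = f"{job.prompt} ({details})" if details else job.prompt
    return job


def plan_research(job: ResearchJob, angles: list[str]) -> ResearchJob:
    """Step 3 stand-in: decompose the expanded prompt into sub-questions,
    one per research angle. A real agent would also browse and iterate."""
    job.sub_questions = [f"{job.expanded_prompt} :: {angle}" for angle in angles]
    return job


if __name__ == "__main__":
    job = ResearchJob("compare the top AI visibility platforms for enterprise brands")
    print("Follow-ups:", clarify(job))
    job.answers = {
        "Which market or region?": "France",
        "Which buyer segment?": "Enterprise",
    }
    rewrite(job)
    plan_research(job, ["vendor websites", "G2 and Capterra reviews",
                        "press mentions", "case studies"])
    for sub_question in job.sub_questions:
        print(sub_question)
```

The point of the sketch is the ordering: the original prompt is never searched directly, and each research angle becomes its own line of inquiry, which is why thin coverage on any one angle can leave a brand out of the final report.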

What we know — and what we don't

Intellectual honesty is the point of this page. Most content about ChatGPT optimization mixes verified facts with educated guesses without distinguishing between them. We don't do that.

Confirmed by official sources

Two distinct operating modes: training data and search mode
Three separate crawlers with distinct functions (GPTBot, OAI-SearchBot, ChatGPT-User)
Search mode uses Bing's index + partner providers
Query rewriting / fan-out behavior: one question becomes multiple sub-queries
Publisher partnership program with named media partners (Axel Springer, Reuters, Le Monde...)
800 million weekly active users (Sam Altman, DevDay, October 2025)
ChatGPT Search available to all logged-in users since December 2024, and to everyone without sign-up since February 2025
Deep Research powered by o3-deep-research model with 3-step agentic pipeline

Not publicly disclosed

The exact weighting between Bing rankings and ChatGPT's internal scoring
How domain authority is factored into source selection
The specific signals that trigger Search Mode vs Training Data Mode
Whether schema markup directly influences citation probability
How freshness is scored relative to authority

Google Search vs Perplexity vs ChatGPT Search

The same question, three completely different systems.

Criterion	Google Search	Perplexity	ChatGPT Search
Default mode	Links	Real-time RAG	Training data + optional search
Retrieval	Periodic crawl	Real-time, every query	Real-time via Bing (search mode only)
Citations	No	Always — inline numbered	Yes — in search mode
Crawler	Googlebot	PerplexityBot	GPTBot + OAI-SearchBot + ChatGPT-User
Index used	Google's own	Proprietary	Bing + partners
Publisher deals	Google News partnerships	Publishers Program	Axel Springer, WSJ, Reuters, Le Monde...
Optimization	SEO	GEO — always active	GEO — search mode only
Traffic generated	Click-through	Referral possible	Referral possible (search mode)

The critical difference from Perplexity: Perplexity always retrieves in real time. ChatGPT only retrieves in real time when Search Mode is triggered. For queries that don't trigger search, ChatGPT answers entirely from training data — and there is no optimization lever for that mode beyond long-term brand building across the web.

Practical implications

What this means for your brand's visibility

Five implications derived directly from ChatGPT's confirmed architecture.

1. Optimize for Search Mode — not the base model

Content published today cannot enter ChatGPT's training data retroactively. The only optimizable channel right now is Search Mode, which retrieves via Bing. Strong Bing indexation is therefore a direct prerequisite for ChatGPT citations.

Source: OpenAI Help Center + ChatGPT Search announcement

2. Allow OAI-SearchBot in your robots.txt

Without OAI-SearchBot access, you cannot appear in ChatGPT's search-cited responses. This is the single most impactful technical action you can take today — and it takes five minutes.

Source: OpenAI Developer Documentation

3. Structure content for sub-queries, not just the main question

ChatGPT rewrites queries into multiple targeted sub-queries before searching. Your content needs to answer the specific sub-questions a researcher would ask, not just the broad topic.

Source: OpenAI Help Center, official query rewriting example

4. Publisher partnerships create a trust tier

OpenAI has signed direct agreements with Axel Springer, News Corp, WSJ, Reuters, Le Monde, The Atlantic and others. Earning coverage in these publications is plausibly among the strongest external authority signals for ChatGPT citations, though the exact weighting is not publicly documented.

Source: OpenAI — "Introducing ChatGPT search" announcement, October 2024

5. Deep Research reads deeper than standard search

Deep Research browses hundreds of sources and reads case studies, methodology pages, and review platforms. For B2B brands, this means content depth and third-party presence on G2, Capterra, and Trustpilot directly affect whether you are cited in vendor evaluations.

Source: OpenAI — "Introducing Deep Research", February 2025

Frequently asked questions about ChatGPT

How is ChatGPT different from Perplexity?

The fundamental difference: Perplexity always retrieves from the web in real time for every query. ChatGPT has two modes — it only searches the web when Search Mode is triggered. For other queries, it answers entirely from training data with no web retrieval and no citations. This means optimizing for ChatGPT requires a different strategy focused specifically on Search Mode triggers and Bing indexation.

Does ChatGPT cite its sources?

Only in Search Mode. When ChatGPT searches the web, responses include inline citations that users can hover over to preview and click through to the original source. In Training Data Mode (no web search), there are no citations — the model answers from its learned knowledge without attribution.

What is the difference between ChatGPT Search and Deep Research?

ChatGPT Search performs a quick web search, retrieves a few sources, and synthesizes an answer in seconds. Deep Research is an agentic mode that autonomously decomposes your question into sub-queries, browses hundreds of sources over 5-30 minutes, and produces a comprehensive citation-rich report. Deep Research is increasingly used for vendor evaluation and competitive analysis.

How do I get my brand cited by ChatGPT?

Focus on Search Mode: ensure OAI-SearchBot is allowed in your robots.txt, optimize for Bing indexation, structure content to answer sub-queries (not just broad topics), build presence on publisher partner platforms, and maintain rich profiles on review sites like G2 and Capterra for Deep Research visibility.

Which robots.txt bot should I allow for ChatGPT?

For search citations, OAI-SearchBot is essential: it is what indexes content for ChatGPT Search results. GPTBot feeds training data (long-term influence). ChatGPT-User handles on-demand browsing. OpenAI confirms that OAI-SearchBot and GPTBot share crawl results, so allowing both avoids duplicate crawling, and allowing all three maximizes visibility.

Sources cited on this page

Every factual claim on this page is sourced. We link to primary sources directly.

OpenAI Help Center. ChatGPT search: how it works. Official documentation.
OpenAI. Introducing ChatGPT search. October 2024, updated February 2025. Official documentation.
OpenAI Developer Documentation. Overview of OpenAI Crawlers. Official documentation.
Sam Altman. OpenAI DevDay keynote (reported by TechCrunch). October 2025. Founder statement.
OpenAI. Introducing Deep Research. February 2025. Official documentation.
OpenAI API Documentation. Deep Research guide. Official documentation.
Aggarwal et al. GEO: Generative Engine Optimization (KDD 2024, Princeton / IIT Delhi). 2024. Academic paper.
Profound. Analysis of 730,000 ChatGPT conversations with citations, Oct-Dec 2025. 2025. Independent study.
