AI Search Engine Deep Dive
How ChatGPT Works
The world's most used AI — and why it plays by completely different rules than Perplexity
Launched: November 2022
Weekly active users: 800M
Daily queries: 1B+
Fortune 500 adoption: 92%
Architecture: Training + RAG
Cites sources: In search mode
Most guides about getting cited by ChatGPT treat it as a single system. It isn't. ChatGPT has two fundamentally different operating modes — and optimizing for the wrong one is the most common mistake brands make.
Everything on this page is sourced from official OpenAI documentation, peer-reviewed academic papers, independent studies, and direct public statements from OpenAI's CEO and founders. Where we don't have a verified source, we say so explicitly.
What is ChatGPT?
This is the most important thing to understand about ChatGPT.
ChatGPT does not operate as a single system. It has two fundamentally distinct modes that work in completely different ways — and your visibility strategy depends entirely on which mode is active when your prospect asks their question.
Mode 1 — Training Data Mode: No web retrieval. Answers from the model's learned knowledge. Static — your content cannot enter this mode retroactively. Triggered by general knowledge questions, creative tasks, and reasoning.
Mode 2 — Search Mode (ChatGPT Search): Live web retrieval via Bing and partner providers. Answers from real-time sources with citations. Dynamic — optimizable right now. Triggered by current events, comparisons, "best X in 2026", and time-sensitive queries.
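If you're optimizing for ChatGPT citations, you need to focus almost exclusively on Search Mode. Training data has a knowledge cutoff: content you publish today cannot change what the current model already knows, and it can reach future models only indirectly, through GPTBot crawls used for later training runs.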
Technical architecture
How ChatGPT retrieves and generates answers
When a query triggers Search Mode, ChatGPT follows a documented process before generating a response. OpenAI's Help Center illustrates the query-rewriting step with this example:
"If a biotech researcher asked 'what's the latest on the development of drugs that target CCR8 for cancer?', ChatGPT might initially query using 'CCR8 immunotherapy drug development 2025.' After reviewing the initial results, it may send additional queries like 'CHS-114 conference 2025.'"
OpenAI Help Center — ChatGPT search official documentation, direct quote
Three Crawlers, Three Purposes
OpenAI operates three distinct web crawlers, each with a different function. Understanding which one matters for visibility, and how to configure your robots.txt for each, is a technical prerequisite for any content strategy.
GPTBot: Crawls publicly accessible websites to collect training data for OpenAI's language models. It feeds the model's static knowledge, not real-time search results. (robots.txt user-agent token: GPTBot. Impact: long-term, indirect.)
OAI-SearchBot: Indexes content specifically for ChatGPT Search results, the real-time, cited answers users see when ChatGPT searches the web. (robots.txt user-agent token: OAI-SearchBot. Impact: direct; this determines whether you can appear in cited search responses.)
ChatGPT-User: Accesses content on demand when a user initiates browsing, uses plugins, or triggers GPT Actions. (robots.txt user-agent token: ChatGPT-User. Impact: situational.)
OpenAI has confirmed that OAI-SearchBot and GPTBot share crawl information: if both are allowed, OpenAI may reuse a single crawl for both purposes, which avoids duplicate requests to your server.
The practical implication: if OAI-SearchBot is blocked in your robots.txt, you cannot appear in ChatGPT's cited search responses — regardless of how well your content is written. This is a binary technical gate.
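Because the gate is binary, it is worth checking it mechanically rather than by eye. The sketch below uses Python's standard-library `urllib.robotparser` to report which of the three OpenAI user-agent tokens may fetch a given URL. The robots.txt content, the `openai_access` helper, and the example.com URLs are hypothetical; the configuration shown opts out of training crawls (GPTBot) while staying eligible for ChatGPT Search (OAI-SearchBot).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks the training crawler (GPTBot) but explicitly
# allows the search crawler (OAI-SearchBot). ChatGPT-User has no group of its
# own, so it falls back to the catch-all "*" group.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /private/
"""

OPENAI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")


def openai_access(robots_txt: str, url: str) -> dict:
    """Return {user-agent token: allowed?} for each OpenAI crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {agent: parser.can_fetch(agent, url) for agent in OPENAI_AGENTS}


if __name__ == "__main__":
    access = openai_access(ROBOTS_TXT, "https://example.com/blog/post")
    for agent, allowed in access.items():
        print(f"{agent:14} {'allowed' if allowed else 'BLOCKED'}")
    if not access["OAI-SearchBot"]:
        print("This URL cannot appear in ChatGPT Search citations.")
```

Two caveats apply. Python's parser uses the first matching Allow/Disallow rule within a group, whereas RFC 9309 specifies the longest (most specific) match, so overlapping rules may be read differently by real crawlers. And a passing check only confirms eligibility; it does not show that OpenAI has actually crawled the page.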
Query Rewriting and Fan-Out
ChatGPT doesn't send your exact question to a search provider. It rewrites the query — often into multiple targeted sub-queries — to maximize retrieval quality.
This multi-query behavior — called query fan-out — means your content needs to match the sub-queries ChatGPT generates, not just the original user question.
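One way to audit a page against fan-out is to list the sub-queries a researcher's question is likely to spawn and check whether each one lands on a section of your page. The sketch below does this with plain keyword overlap, an assumption made purely for illustration: ChatGPT's actual retrieval and ranking are not public and are certainly not this simple. The two sub-queries come from OpenAI's CCR8 example quoted above; the page sections, the `sub_query_coverage` helper, and the 0.5 threshold are hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens; a crude stand-in for real relevance scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def sub_query_coverage(sub_queries, sections, threshold=0.5):
    """Map each sub-query to the page section containing most of its terms.

    A sub-query maps to None when no section contains at least `threshold`
    of its terms, which flags a likely content gap.
    """
    report = {}
    for query in sub_queries:
        query_terms = tokens(query)
        if not query_terms:
            report[query] = None
            continue
        best_title, best_score = None, 0.0
        for title, body in sections.items():
            overlap = query_terms & tokens(f"{title} {body}")
            score = len(overlap) / len(query_terms)
            if score > best_score:
                best_title, best_score = title, score
        report[query] = best_title if best_score >= threshold else None
    return report


# Sub-queries from OpenAI's CCR8 example; the page sections are hypothetical.
SUB_QUERIES = [
    "CCR8 immunotherapy drug development 2025",
    "CHS-114 conference 2025",
]
PAGE_SECTIONS = {
    "CCR8 pipeline overview": "Drug development status of CCR8 immunotherapy candidates in 2025.",
    "Company history": "Founded in 2010.",
}

if __name__ == "__main__":
    for query, section in sub_query_coverage(SUB_QUERIES, PAGE_SECTIONS).items():
        print(f"{query!r} -> {section or 'GAP: no section answers this'}")
```

Here the first sub-query maps to the pipeline section, while the CHS-114 sub-query maps to None: the hypothetical page never covers the conference, so it would have nothing to offer for that branch of the fan-out.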
Retrieval via Bing + Partner Providers
ChatGPT Search retrieves content primarily through Microsoft Bing's index, supplemented by partner search providers and direct publisher agreements.
Publisher partnerships announced at the ChatGPT Search launch include Axel Springer, News Corp (publisher of The Wall Street Journal), The Atlantic, Reuters, Le Monde, Associated Press, Financial Times, Conde Nast, Hearst, and Time. Content from these partners may carry elevated trust signals in ChatGPT's retrieval system, although OpenAI has not published how partner content is weighted.
Synthesis and Inline Citations
ChatGPT synthesizes retrieved content into a response with inline citations. Users can hover over citations to preview the source and click through to the original page.
Deep Research — A Different Beast
Standard ChatGPT Search and Deep Research are not the same product. They operate differently, cite differently, and require different strategies. Deep Research is increasingly the mode used by decision-makers doing serious vendor evaluation.
In OpenAI's own words: "Deep research is OpenAI's next agent that can do work for you independently — you give it a prompt, and ChatGPT will find, analyze, and synthesize hundreds of online sources to create a comprehensive report at the level of a research analyst."
It is powered by the o3-deep-research model — a specialized version of OpenAI's o3, optimized for web browsing and data analysis. A lightweight version using o4-mini-deep-research is also available.
The pipeline works in three steps:
1. Clarification: an intermediate model (gpt-4.1) asks follow-up questions to understand intent, preferences, and constraints.
2. Prompt rewriting: the same model rewrites the query into a detailed, expanded prompt.
3. Agentic research: the o3-deep-research model autonomously decomposes the expanded prompt into sub-questions, browses hundreds of sources, and synthesizes a citation-rich report.
Key difference from standard Search Mode: standard ChatGPT Search runs a small set of rewritten queries and answers immediately. Deep Research performs many iterative searches, reads sources in depth, pivots based on what it finds, and takes 5 to 30 minutes to complete.
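The contrast between standard search and Deep Research is largely one of search depth. The toy model below runs the same breadth-first loop (issue a query, read the results, queue the follow-up queries they suggest) under two query budgets. The seed query and its first follow-up come from OpenAI's CCR8 example; the third query, the example.org URLs, the `TOY_INDEX` data, and both budgets are hypothetical, and OpenAI does not publish how many queries either mode actually issues.

```python
from collections import deque

# Hypothetical toy index: query -> list of (source URL, follow-up queries a
# reader would chase after reading that source). Not real data.
TOY_INDEX = {
    "CCR8 immunotherapy drug development 2025": [
        ("https://example.org/ccr8-review", ["CHS-114 conference 2025"]),
    ],
    "CHS-114 conference 2025": [
        ("https://example.org/chs-114-abstract", ["CHS-114 trial results"]),
    ],
    "CHS-114 trial results": [
        ("https://example.org/chs-114-trial", []),
    ],
}


def iterative_search(seed_query, index, max_queries):
    """Breadth-first loop: issue a query, collect sources, queue follow-ups.

    max_queries is the search budget: a small budget behaves like a quick
    search, a large one like a research agent that keeps pivoting.
    """
    issued, sources = set(), []
    frontier = deque([seed_query])
    while frontier and len(issued) < max_queries:
        query = frontier.popleft()
        if query in issued:
            continue  # never spend budget on a repeated query
        issued.add(query)
        for url, follow_ups in index.get(query, []):
            if url not in sources:
                sources.append(url)
            frontier.extend(follow_ups)
    return sources


if __name__ == "__main__":
    seed = "CCR8 immunotherapy drug development 2025"
    print("quick search :", iterative_search(seed, TOY_INDEX, max_queries=1))
    print("deep research:", iterative_search(seed, TOY_INDEX, max_queries=10))
```

With a budget of one query the loop returns only the first source; with a budget of ten it follows the chain to all three. This mirrors why depth of coverage matters more for Deep Research: pages that only answer follow-up questions are reached only when the agent has budget to pursue them.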
Why this changes everything for B2B brands: when someone runs a Deep Research query like "compare the top AI visibility platforms for enterprise brands in France", the agent can decompose it into sub-questions and browse your website, your reviews on G2 and Capterra, your press mentions, and your case studies, then do the same for your competitors. Depth of source coverage matters more here than in standard search, and third-party review platforms are read and cited. Users can also restrict Deep Research to specific sources or prioritize them, so getting your domain onto the lists of sources your buyers trust is a long-term strategic asset.
What we know — and what we don't
Intellectual honesty is the point of this page. Most content about ChatGPT optimization mixes verified facts with educated guesses without distinguishing between them. We don't do that.
Confirmed by official sources
- Two distinct operating modes: training data and search mode
- Three separate crawlers with distinct functions (GPTBot, OAI-SearchBot, ChatGPT-User)
- Search mode uses Bing's index + partner providers
- Query rewriting / fan-out behavior: one question becomes multiple sub-queries
- Publisher partnership program with named media partners (Axel Springer, Reuters, Le Monde...)
- 800 million weekly active users (Sam Altman, DevDay, October 2025)
- ChatGPT Search available to all users, with no sign-up required, in regions where ChatGPT is available since February 2025
- Deep Research powered by o3-deep-research model with 3-step agentic pipeline
Not publicly disclosed
- The exact weighting between Bing rankings and ChatGPT's internal scoring
- How domain authority is factored into source selection
- The specific signals that trigger Search Mode vs Training Data Mode
- Whether schema markup directly influences citation probability
- How freshness is scored relative to authority
Google Search vs Perplexity vs ChatGPT Search
The same question, three completely different systems.
| | Google Search | Perplexity | ChatGPT Search |
|---|---|---|---|
| Default mode | Links | Real-time RAG | Training data + optional search |
| Retrieval | Periodic crawl | Real-time, every query | Real-time via Bing (search mode only) |
| Citations | No | Always — inline numbered | Yes — in search mode |
| Crawler | Googlebot | PerplexityBot | GPTBot + OAI-SearchBot + ChatGPT-User |
| Index used | Google's own | Proprietary | Bing + partners |
| Publisher deals | Google News partnerships | Publishers Program | Axel Springer, WSJ, Reuters, Le Monde... |
| Optimization | SEO | GEO — always active | GEO — search mode only |
| Traffic generated | Click-through | Referral possible | Referral possible (search mode) |
The critical difference from Perplexity: Perplexity always retrieves in real time. ChatGPT only retrieves in real time when Search Mode is triggered. For queries that don't trigger search, ChatGPT answers entirely from training data — and there is no optimization lever for that mode beyond long-term brand building across the web.
Practical implications
What this means for your brand's visibility
Five implications derived directly from ChatGPT's confirmed architecture.
1. Optimize for Search Mode — not the base model
Content published today cannot enter ChatGPT's training data retroactively. The only optimizable channel right now is Search Mode, which retrieves primarily via Bing. Strong Bing indexation is therefore a direct prerequisite for ChatGPT citations.
Source: OpenAI Help Center + ChatGPT Search announcement
2. Allow OAI-SearchBot in your robots.txt
Without OAI-SearchBot access, you cannot appear in ChatGPT's search-cited responses. This is the single most impactful technical action you can take today — and it takes five minutes.
Source: OpenAI Developer Documentation
3. Structure content for sub-queries, not just the main question
ChatGPT rewrites queries into multiple targeted sub-queries before searching. Your content needs to answer the specific sub-questions a researcher would ask, not just the broad topic.
Source: OpenAI Help Center, official query rewriting example
4. Publisher partnerships create a trust tier
OpenAI has signed direct agreements with Axel Springer, News Corp (including the WSJ), Reuters, Le Monde, The Atlantic and others. Earning coverage in these publications is likely among the highest-impact external authority signals for ChatGPT citations, although OpenAI has not disclosed how partner content is weighted.
Source: OpenAI — "Introducing ChatGPT search" announcement, October 2024
5. Deep Research reads deeper than standard search
Deep Research browses hundreds of sources and reads case studies, methodology pages, and review platforms. For B2B brands, this means content depth and third-party presence on G2, Capterra, and Trustpilot directly affect whether you are cited in vendor evaluations.
Source: OpenAI — "Introducing Deep Research", February 2025
Frequently asked questions about ChatGPT
How is ChatGPT different from Perplexity?
Does ChatGPT cite its sources?
What is the difference between ChatGPT Search and Deep Research?
How do I get my brand cited by ChatGPT?
Which robots.txt bot should I allow for ChatGPT?
Sources cited on this page
Every factual claim on this page is sourced. We link to primary sources directly.
- OpenAI Help Center, "ChatGPT search: how it works". Official documentation.
- OpenAI, "Introducing ChatGPT search" (October 2024, updated February 2025). Official documentation.
- OpenAI Developer Documentation, "Overview of OpenAI Crawlers". Official documentation.
- Sam Altman, OpenAI DevDay keynote, reported by TechCrunch (October 2025). Founder statement.
- OpenAI, "Introducing Deep Research" (February 2025). Official documentation.
- OpenAI API Documentation, "Deep Research" guide. Official documentation.
- Aggarwal et al., "GEO: Generative Engine Optimization" (KDD 2024, Princeton / IIT Delhi). Academic paper.
- Profound, "Analysis of 730,000 ChatGPT conversations with citations", October to December 2025 (2025). Independent study.
Other AI search engines
- The reasoning engine that searches when it needs to, not by default
- Google Gemini: One model, many surfaces, and one robots.txt tag that determines if your brand gets cited
- Google AI Overviews: The AI feature that reaches more people than any other product in the world
- Grok: The only AI engine trained on real-time social media data, and what that means for your brand
- Microsoft Copilot: The only AI engine that retrieves from both the public web and your organization's private data
- Perplexity AI: The answer engine that cites its sources
Does your brand appear when your prospects ask ChatGPT about what you do?
Most brands don't know. Storyzee runs systematic prompt testing across Perplexity, ChatGPT, Gemini and Claude — and turns the results into a score out of 100 with a prioritized action plan.