Only 25% of cited sources overlap between ChatGPT’s different reasoning modes [Study]

Most AI visibility strategies treat ChatGPT as a single system. The data suggests that may be a mistake.

When ChatGPT operates in high-reasoning mode, it cites a different set of brands, surfaces different source types, and behaves differently than when it’s in minimal reasoning mode.

Kevin Indig calls this gap between what shows in one model versus another “reasoning lift.” To investigate it, we partnered with Kevin and analyzed data from the Semrush AI Visibility Toolkit.

Here’s what we found:

Key takeaways

ChatGPT with higher reasoning is essentially a different search engine. Only 25.6% of cited domains overlap between minimal and high reasoning for the same prompts. Nearly three in four cited sources are different.

Citation behavior changes dramatically when higher reasoning is on. From minimal to high reasoning, the citation rate jumps from 50% to 68%, sources per response nearly double (2.6 → 4.5), and the model fires 4.6x more internal sub-queries.

Source types shift when reasoning turns on. Reddit and other user-generated content (UGC) sites lose roughly half their share of citations in Thinking mode compared to Instant mode, while government, academic, and official documentation sites gain ground.

Under high reasoning, the same brand can stay in the conversation from a buyer’s first question to their last. This happened in 4 of the 20 journeys we tested. Under minimal reasoning, it didn’t happen in any of them.

Switching from minimal to high reasoning affects some industries far more than others. Citation rates for Finance content jump by 28 percentage points. Consumer Tech barely changes.

Top-of-funnel content has real value under high reasoning. Brands cited in a user’s early research questions tend to keep appearing in their later, more specific queries from the same conversation — but only with a high-reasoning mode.

Methodology

We partnered with Kevin Indig from Growth Memo to analyze data from the Semrush AI Visibility Toolkit.

We ran 100 prompts through GPT-5.2 twice: once with minimal reasoning and once with high reasoning, for 200 total responses.

In ChatGPT’s interface, minimal reasoning corresponds to Instant mode (the default fast-response experience), and high reasoning corresponds to Thinking mode (the deeper research mode designed for more complex, multi-step tasks).

The 100 prompts we analyzed cover 20 buyer journeys across four categories:

B2B SaaS
Finance
Consumer Tech
Health and Lifestyle

Each buying journey breaks into five stages:

Problem: Recognizing a need or pain point
Exploration: Researching what options exist
Comparison: Evaluating alternatives side by side
Validation: Confirming the leading choice
Selection: Committing to a specific brand or product

For each response, we tracked:

Citation rate: The share of responses that cite at least one external source
Average citations: The number of sources per cited response
Fan-out queries: The number of sub-queries the model runs to research a prompt before answering
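To make these three definitions concrete, here is a minimal Python sketch of how they could be computed from a set of logged responses. The `Response` record and its field names are hypothetical, not the format the Semrush AI Visibility Toolkit uses. Following the definition above, average citations is taken over cited responses only.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    """One logged model response. All field names are hypothetical."""
    prompt_id: str
    mode: str  # "minimal" or "high"
    cited_urls: list[str] = field(default_factory=list)
    fan_out_queries: int = 0


def summarize(responses: list[Response]) -> dict[str, float]:
    """Compute the three tracked metrics for one batch of responses."""
    total = len(responses)
    cited = [r for r in responses if r.cited_urls]
    return {
        # Share of responses that cite at least one external source
        "citation_rate": len(cited) / total if total else 0.0,
        # Sources per response, counting only responses that cite something
        "avg_citations": (
            sum(len(r.cited_urls) for r in cited) / len(cited) if cited else 0.0
        ),
        # Sub-queries per response, counting every response
        "avg_fan_out": (
            sum(r.fan_out_queries for r in responses) / total if total else 0.0
        ),
    }


# Made-up sample data for illustration only
sample = [
    Response("p1", "high", ["https://a.example/x", "https://b.example/y"], 6),
    Response("p2", "high", [], 2),
]
metrics = summarize(sample)
print(metrics)
```

Running `summarize` once per reasoning mode gives two metric sets you can compare side by side.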

Let’s explore the findings.

1. High reasoning cites sources and uses web searches much more

When you turn high reasoning on, ChatGPT relies more heavily on active research:

Citation rate: This climbs from 50% in Instant mode to 68% in Thinking mode (+18 percentage points)

Citation rate in minimal vs high reasoning in ChatGPT

Average citations: The number of citations per response nearly doubles from Instant mode to Thinking mode (2.6 to 4.5)
Fan-out queries: The number of sub-queries run is 4.6x higher in Thinking mode than in Instant mode

Citations and fan-out queries per response: minimal vs high reasoning in ChatGPT

High reasoning also pulled from 173 unique domains across the test set vs. 127 for minimal reasoning. And 99 of the domains cited under high reasoning never appear under minimal reasoning at all.

At the same time, high-reasoning mode gives only slightly longer responses. This means that the increase in citations isn’t simply a byproduct of generating more text. Instead, the model is doing substantially more research behind the scenes and packing more evidence into roughly the same length of output.

Average response length: minimal vs high reasoning in ChatGPT

This matters even for free-tier users, because ChatGPT routes complex prompts (comparisons, evaluations, regulatory questions, and other multi-step decisions) into high-reasoning mode automatically.

For brands, the implication is direct: when your audience asks one of those complex questions, you’re not competing for a single placement in one response. You’re competing for visibility across every sub-search the model runs along the way to that answer.

2. Each reasoning mode cites different domains

For the same prompt, only 25.6% of cited domains are shared between minimal- and high-reasoning modes. Almost three in four cited sources are different.
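If you want to run a similar check on your own prompts, one way to measure domain overlap is sketched below. It assumes a Jaccard-style ratio (shared domains divided by all domains cited in either mode). The exact formula behind the 25.6% figure isn't specified here, so treat this as an illustration rather than a reproduction of the study. The function names and sample URLs are made up.

```python
from urllib.parse import urlparse


def domains(urls):
    """Reduce URLs to lowercase hostnames, dropping a leading 'www.'."""
    hosts = set()
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            hosts.add(host)
    return hosts


def domain_overlap(minimal_urls, high_urls):
    """Shared domains divided by all domains cited in either mode."""
    a, b = domains(minimal_urls), domains(high_urls)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up citations for one prompt, answered in both modes
minimal = ["https://www.reddit.com/r/sales/1", "https://vendor.example/pricing"]
high = [
    "https://vendor.example/docs",
    "https://agency.example/report",
    "https://uni.example/study",
]
overlap = domain_overlap(minimal, high)
print(f"{overlap:.0%} of cited domains are shared")
```

In this sample, one of four distinct domains appears in both modes, so the overlap is 25%.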

The overall source mix also shifts:

Reddit appearances drop from 15% with minimal reasoning to 7% with high reasoning
UGC and review sites shrink from 14.3% with minimal reasoning to 6% with high reasoning
Government and academic sources more than quadruple from 1.9% with minimal reasoning to 8.8% with high reasoning
Official documentation and support pages grow from 12.4% with minimal reasoning to 17.5% with high reasoning
Brands appear almost equally (62.4% with minimal reasoning vs. 60.6% with high reasoning)

Share of citations by source type: minimal vs high reasoning in ChatGPT

“The brand that wins under minimal reasoning is not the brand that wins under high reasoning. The mix of source types is different. The stages where citations appear are different. These are two different systems.”

— Kevin Indig, Growth Advisor

Here’s the practical implication: If most of your AI citations currently come from Reddit threads, Quora, or UGC review sites, you’re winning via Instant mode but might be losing via Thinking mode.

To balance performance in both modes, focus your content investment on the source types high reasoning actually pulls from.

That means owning more official documentation and reference pages on your own site, publishing original research that gives writers and academics something to cite, and getting your brand referenced in .gov, .edu, and trade-association resources through partnerships, expert contributions, and data sharing.

3. The biggest mode gap shows up early in the buyer journey

The citation rate gap between minimal and high reasoning isn’t constant. It depends on where the user sits in the buyer journey, and what kind of question they’re asking at that point.

To illustrate, a buyer evaluating CRM software might progress through the five stages using these questions:

Problem: “How do I know if my sales team needs a CRM?”
Exploration: “What types of CRM software exist for B2B SaaS?”
Comparison: “HubSpot vs. Salesforce vs. Pipedrive for a 50-person sales team”
Validation: “Is HubSpot worth the price for mid-market B2B companies?”
Selection: “How do I get started with HubSpot Sales Hub?”

Across all 20 journeys, three patterns stood out:

The gap between the two modes is widest early in the journey. At the Problem stage, the citation rate in high-reasoning mode is 35 percentage points higher than in minimal reasoning. By the Validation stage, the gap shrinks to 5 points. Minimal-reasoning mode often answers early-funnel questions without citing external sources, while high-reasoning mode is more likely to research and cite them.

The Comparison stage is where high-reasoning mode does the most research. It fires 24 sub-queries per Comparison prompt, compared to 5.5 for minimal reasoning. Average citations per response peak here too: 9.8 with high reasoning vs. 5.8 with minimal reasoning.

At the Selection stage, high reasoning still pulls more sources than minimal reasoning. Each high-reasoning response cites 4.7 sources on average, vs. 2.6 for minimal reasoning. Both modes cite the web heavily here; high reasoning just goes deeper.

Citation rate by buyer journey stage: minimal vs high reasoning in ChatGPT

Across the 100 prompts we tested, minimal reasoning ran 245 web searches in total. High reasoning ran 1,130 web searches, about 4.6x as many. Most of that extra research happens at the Comparison and Selection stages, when the user is choosing between specific products.

Fan-out queries follow the same shape and are substantially higher under high reasoning at every stage. They spike at Comparison (24 sub-queries per response vs. 5.5 for minimal reasoning) and again at Selection (15.4 vs. 2.6), which are the stages where the model is actively working through specific product options.

Fan-out queries per response by buyer journey stage: minimal vs high reasoning in ChatGPT

When high-reasoning mode gets a prompt like “Salesforce vs. HubSpot vs. Pipedrive for a 50-person sales team,” it doesn’t just search for that specific prompt. It breaks the question into roughly 8 sub-queries (things related to pricing tiers, API integrations, security compliance, and developer documentation) and runs a separate search for each one.

The brand that wins the answer isn’t necessarily the one that ranks for the original prompt. It’s the one that has pages showing up clearly across many of those sub-searches.

How high reasoning in ChatGPT turns one prompt into multiple retrievals

This means you shouldn’t dismiss top-of-funnel content as just brand awareness. Most users ask a mix of casual and complex prompts, and the complex ones trigger high-reasoning mode automatically.

Treat your early-funnel content pieces as citation sources. Name your product, methodology, or framework explicitly, so the AI has something to attribute when it surfaces those pages.

4. Under high-reasoning mode, brands persist across the journey

LLM sessions are conversations rather than single queries. So a key question is: Does a brand cited at the start of a journey carry through to the end?

Under high reasoning, yes. Under minimal reasoning, no.

We measured brand persistence by checking whether a brand cited at the Problem stage survived to the Selection stage of the same journey:

Minimal reasoning: No journeys show this kind of full-funnel persistence
High reasoning: Brand continuity is maintained in four of the 20 journeys
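The persistence check described above can be sketched in a few lines of Python. This assumes each journey is stored as a mapping from stage name to the set of brands cited at that stage. That data structure, the function names, and the sample brands are all hypothetical.

```python
FIRST_STAGE = "Problem"
LAST_STAGE = "Selection"


def persistent_brands(journey):
    """Brands cited at both the Problem stage and the Selection stage."""
    return journey.get(FIRST_STAGE, set()) & journey.get(LAST_STAGE, set())


def count_persistent_journeys(journeys):
    """Number of journeys where at least one brand survives start to finish."""
    return sum(1 for journey in journeys.values() if persistent_brands(journey))


# Made-up journeys for illustration only
journeys = {
    "crm": {
        "Problem": {"BrandA"},
        "Exploration": {"BrandA", "BrandB"},
        "Comparison": {"BrandA", "BrandB", "BrandC"},
        "Validation": {"BrandA"},
        "Selection": {"BrandA"},
    },
    "payroll": {
        "Problem": {"BrandD"},
        "Selection": {"BrandE"},
    },
}
persistent_count = count_persistent_journeys(journeys)
print(f"{persistent_count} of {len(journeys)} journeys show full-funnel persistence")
```

Run this once on minimal-reasoning responses and once on high-reasoning responses to compare the two counts.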

High reasoning also returns to the same source more than once within a single answer. In 51 of 100 high-reasoning responses, the same domain appears multiple times in the same response (vs. 26 of 100 for minimal).

This within-answer repetition, which we’ll call anchoring, is a different effect from journey persistence: anchoring is about depth (how heavily the model leans on one source within a single answer), while persistence is about continuity (whether the same brand keeps appearing across a multi-step conversation).
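Anchoring is straightforward to detect if you have the list of URLs cited in a response. The sketch below counts how many times each domain appears in one answer and returns those that repeat. The function name and sample URLs are illustrative assumptions.

```python
from collections import Counter
from urllib.parse import urlparse


def repeated_domains(cited_urls):
    """Return domains cited more than once within a single response."""
    counts = Counter(urlparse(url).hostname for url in cited_urls)
    return {domain: n for domain, n in counts.items() if domain and n > 1}


# Made-up citations from one response
urls = [
    "https://docs.vendor.example/pricing",
    "https://docs.vendor.example/security",
    "https://other.example/review",
]
anchors = repeated_domains(urls)
print(anchors)
```

A response counts as anchored when this function returns a non-empty result.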

“Top-of-funnel content isn’t just brand awareness for AI visibility. Under high-reasoning mode, it’s a leading indicator of where the model lands at decision time.”

— Kevin Indig, Growth Advisor

To ensure brand continuity, audit your AI visibility across full buyer journeys and intent categories. In the AI Visibility Toolkit, open the Questions report and explore the key topics your customers ask AI tools, categorized by intent and funnel stage.

Exploring AI search intent in Semrush AI Visibility Toolkit

Then, analyze the specific questions people ask across each stage and topic.

Exploring audience questions grouped by intent in the Semrush AI Visibility Toolkit

Finally, head to the Narrative Drivers report to see how your brand appears in key conversations across the funnel compared to your competitors.

If you show up for decision-stage prompts (Comparison, Validation, Selection) but not for early-stage ones (Problem, Exploration), that’s a gap worth closing.

With high-reasoning mode, brands cited early in a journey often continue to be cited later, so investing in Problem-stage content can compound your existing Selection-stage visibility.

5. Reasoning lift varies sharply by category

Citation rates don’t rise equally across the categories we analyzed when high-reasoning mode turns on. The lift varies by industry:

Finance: A 28 percentage point increase in citation rate from minimal reasoning to high reasoning
Health and Lifestyle: A 24 percentage point increase in citation rate from minimal reasoning to high reasoning
B2B SaaS: A 16 percentage point increase in citation rate from minimal reasoning to high reasoning
Consumer Tech: A 4 percentage point increase in citation rate from minimal reasoning to high reasoning

Citation rate by content category: minimal vs high reasoning in ChatGPT
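A note on the arithmetic: the lifts above are percentage-point differences, not relative increases. The sketch below shows both calculations. Because per-category base rates aren't listed in this article, the example uses the overall figures (50% and 68% citation rates across 100 prompts per mode). The function name is our own.

```python
def reasoning_lift(minimal_cited, high_cited, total_prompts):
    """Return (percentage-point lift, relative lift in percent)."""
    minimal_rate = minimal_cited / total_prompts
    high_rate = high_cited / total_prompts
    points = (high_rate - minimal_rate) * 100
    # Relative lift is undefined when the minimal-reasoning rate is zero
    relative = (high_rate / minimal_rate - 1) * 100 if minimal_rate else float("inf")
    return round(points, 1), round(relative, 1)


# Overall study figures: 50 of 100 responses cited a source under
# minimal reasoning, 68 of 100 under high reasoning.
points, relative = reasoning_lift(50, 68, 100)
print(f"+{points} percentage points ({relative}% relative increase)")
```

The same 18-point gap is a 36% relative increase, so it’s worth stating which one you mean when you report lift for your own category.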

Consumer Tech stands out.

Even though high reasoning runs more sub-queries per Consumer Tech prompt (13.4) than any other category we tested, it ends up citing many of the same brands and sources as minimal reasoning.

In other words, the extra research barely changes the Consumer Tech answer, which suggests ChatGPT already has strong internal knowledge of common Consumer Tech topics from its training data and doesn’t need fresh research to land on the same brands.

For Finance and Health brands, optimizing for high reasoning means producing the content the model actively pulls into its sub-searches.

In practice, that means publishing official product documentation, white papers backed by your own data, and structured content (clear claims per section, named entities, explicit stats) the model can pull cleanly into a single sub-query response.

How to adjust your AI visibility strategy for each reasoning mode

The findings suggest minimal-reasoning and high-reasoning behavior shouldn’t be treated as a single visibility surface. They pull from different sources, favor different content types, and can produce very different winners for the same prompt.

The goal is not to pick one mode and optimize for it. It’s to make sure you’re visible in both.

Here’s how:

Split your tracking by reasoning mode. Use a tool like Prompt Tracking to group the prompts you already monitor into two buckets: complex queries (multi-criteria evaluation, side-by-side comparisons, regulatory or compliance questions) and simple queries (definitions, single-factor lookups, basic “what is X” questions). Track citation rate, mention rate, and the top cited domains for each bucket separately. Where the two buckets diverge most is where reasoning lift is reshaping who wins.

Build a two-track content strategy. For minimal-reasoning visibility, invest in comparison-stage content, Reddit and review-site presence, and clear product-focused pages on your own site. For high-reasoning visibility, invest in early-funnel education, official product documentation, white papers, and authoritative reference material that lives at a citable URL.

Map and audit your priority buyer journeys by stage. For each priority journey, write down the question a buyer would ask at each of the five stages (Problem, Exploration, Comparison, Validation, Selection). Then run those questions through ChatGPT with Thinking mode on and note where your brand appears and where it drops out. Stages where you’re missing are your highest-leverage content gaps.
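For the first step, splitting tracked prompts into complex and simple buckets, a rough first pass can be automated with keyword matching. The sketch below is a simple heuristic of our own, not how ChatGPT decides which mode to use, and the patterns are assumptions you should tune against your own prompt list.

```python
import re

# Hypothetical signals of a multi-criteria, comparison, or regulatory prompt
COMPLEX_PATTERNS = [
    r"\bvs\b",
    r"\bcompare\b",
    r"\bcompliance\b",
    r"\bregulat",
    r"\bworth\b",
]


def bucket(prompt):
    """Label a prompt 'complex' if it matches any pattern, else 'simple'."""
    text = prompt.lower()
    if any(re.search(pattern, text) for pattern in COMPLEX_PATTERNS):
        return "complex"
    return "simple"


prompts = [
    "HubSpot vs. Salesforce vs. Pipedrive for a 50-person sales team",
    "What is a CRM?",
]
labels = {prompt: bucket(prompt) for prompt in prompts}
print(labels)
```

Review the output by hand before relying on it, since keyword rules will misclassify some prompts.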

Understanding these differences starts with measuring AI visibility at the prompt and journey level.

The Semrush AI Visibility Toolkit shows you which prompts and intent categories drive your brand’s visibility in AI answers, which sources influence those answers, and how your presence shifts across the buyer journey.

Even without a built-in reasoning-mode filter, that data is what tells you where reasoning lift is most likely to be in play and where to invest in closing the gap.
