Google releases Olympiad medal-winning Gemini 2.5 'Deep Think' AI publicly

Google has officially launched Gemini 2.5 Deep Think, a new variation of its AI model engineered for deeper reasoning and complex problem-solving, which made headlines last month for winning a gold medal at the International Mathematical Olympiad (IMO) — the first time an AI model achieved the feat.

However, the released model is not the same one that won the gold medal. It is, in fact, a less powerful “bronze” version, according to Google’s blog post and Logan Kilpatrick, Product Lead for Google AI Studio.

As Kilpatrick posted on the social network X: “This is a variation of our IMO gold model that is faster and more optimized for daily use. We are also giving the IMO gold full model to a set of mathematicians to test the value of the full capabilities.”

Now available through the Gemini mobile app, this bronze model is accessible to subscribers of Google’s most expensive individual AI plan, AI Ultra, which costs $249.99 per month with a 3-month starting promotion at a reduced rate of $124.99/month for new subscribers.

Google also said in its release blog post that it would bring Deep Think with and without tool usage integrations to “trusted testers” through the Gemini application programming interface (API) “in the coming weeks.”

Why ‘Deep Think’ is so powerful

Gemini 2.5 Deep Think builds on the Gemini family of large language models (LLMs), adding new capabilities aimed at reasoning through sophisticated problems.

It employs “parallel thinking” techniques to explore multiple ideas simultaneously and includes reinforcement learning to strengthen its step-by-step problem-solving ability over time.
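
Google has not published how parallel thinking is implemented, so the following is only a toy sketch of the general idea: run several independent attempts at once, then keep the one that checks out best. Everything in it is an assumption made for illustration, including the names `explore`, `error`, and `parallel_think` and the choice of problem (estimating a square root from different starting guesses).

```python
from concurrent.futures import ThreadPoolExecutor

TARGET = 144  # toy problem: find x such that x * x == TARGET


def explore(start: float, steps: int = 3) -> float:
    """One hypothetical 'line of thought': refine a guess with a few Newton steps."""
    x = start
    for _ in range(steps):
        x = 0.5 * (x + TARGET / x)
    return x


def error(x: float) -> float:
    """Verifier: how far the candidate is from actually solving the problem."""
    return abs(x * x - TARGET)


def parallel_think(starts: list[float]) -> float:
    """Explore all starting points concurrently, then keep the best candidate."""
    with ThreadPoolExecutor() as pool:
        candidates = list(pool.map(explore, starts))
    return min(candidates, key=error)


best = parallel_think([1.0, 10.0, 50.0])
print(best)
```

The point of the sketch is the shape of the computation, not the math: several attempts proceed independently, and a separate check decides which result to keep.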

The model is designed for use cases that benefit from extended deliberation, such as mathematical conjecture testing, scientific research, algorithm design, and creative iteration tasks like code and design refinement.

Early testers, including mathematicians such as Michel van Garrel, have used it to probe unsolved problems and generate potential proofs.

AI power user Ethan Mollick, a professor at the Wharton School of the University of Pennsylvania, also posted on X about a prompt he often uses to test the capabilities of new models: “create something I can paste into p5js that will startle me with its cleverness in creating something that invokes the control panel of a starship in the distant future.” Deep Think turned it into a 3D graphic, which Mollick said was the first time he had seen a model do that.

Had early access to Gemini with Deep Think. Very good model, big gains over standard Gemini 2.5 Pro for a lot of problems.
Here is the first attempt at the starship control panel prompt I try with every model. First time I have seen a model make a 3D interface in response. https://t.co/8iW2Pn6Xpu pic.twitter.com/bLFF2IcOP3
— Ethan Mollick (@emollick) August 1, 2025

Performance benchmarks and use cases

Google highlights several key application areas for Deep Think:

Mathematics and science: The model can simulate reasoning for complex proofs, explore conjectures, and interpret dense scientific literature
Coding and algorithm design: It performs well on tasks involving performance tradeoffs, time complexity, and multi-step logic
Creative development: In design scenarios such as voxel art or user interface builds, Deep Think demonstrates stronger iterative improvement and detail enhancement

The model also leads benchmark evaluations such as LiveCodeBench V6 (for coding ability) and Humanity’s Last Exam (covering math, science, and reasoning).

It outscored Gemini 2.5 Pro and competing models like OpenAI’s GPT-4 and xAI’s Grok 4 by double-digit margins in some categories (Reasoning & Knowledge, Code generation, and IMO 2025 Mathematics).

Gemini 2.5 Deep Think vs. Gemini 2.5 Pro

While both Deep Think and Gemini 2.5 Pro are part of the Gemini 2.5 model family, Google positions Deep Think as a more capable and analytically skilled variant, particularly when it comes to complex reasoning and multi-step problem-solving.

This improvement stems from the use of parallel thinking and reinforcement learning techniques, which enable the model to simulate deeper cognitive deliberation.

In its official communication, Google describes Deep Think as better at handling nuanced prompts, exploring multiple hypotheses, and producing more refined outputs. This is supported by side-by-side comparisons in voxel art generation, where Deep Think adds more texture, structural fidelity, and compositional diversity than 2.5 Pro.

The improvements aren’t just visual or anecdotal. Google reports that Deep Think outperforms Gemini 2.5 Pro on multiple technical benchmarks related to reasoning, code generation, and cross-domain expertise. However, these gains come with tradeoffs in responsiveness and prompt acceptance.

Here’s a breakdown:

Capability / Attribute	Gemini 2.5 Pro	Gemini 2.5 Deep Think
Inference speed	Faster, low latency	Slower, extended “thinking time”
Reasoning complexity	Moderate	High — uses parallel thinking
Prompt depth and creativity	Good	More detailed and nuanced
Benchmark performance	Strong	State-of-the-art
Content safety & tone objectivity	Improved over older models	Further improved
Refusal rate (benign prompts)	Lower	Higher
Output length	Standard	Supports longer responses
Voxel art / design fidelity	Basic scene structure	Enhanced detail and richness

Google notes that Deep Think’s higher refusal rate is an area of active investigation. This may limit its flexibility in handling ambiguous or informal queries compared to 2.5 Pro. In contrast, 2.5 Pro remains better suited for users who prioritize speed and responsiveness, especially for lighter, general-purpose tasks.

This differentiation allows users to choose based on their priorities: 2.5 Pro for speed and fluidity, or Deep Think for rigor and reflection.

Not the gold medal-winning model, just a bronze

In July, Google DeepMind made headlines when a more advanced version of the Gemini Deep Think model achieved official gold-medal status at the 2025 IMO — the world’s most prestigious mathematics competition for high school students.

The system solved five of six challenging problems and became the first AI to receive gold-level scoring from the IMO.

Demis Hassabis, CEO of Google DeepMind, announced the achievement on X, stating the model had solved problems end-to-end in natural language — without needing translation into formal programming syntax.

The IMO board confirmed the model scored 35 out of a possible 42 points, enough to meet the gold-medal threshold. Competition president Gregor Dolinar described the model’s solutions as clear, precise, and in many cases easier to follow than those of human competitors.
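
The score is consistent with the problem count reported above. Each IMO problem is graded out of 7 points, so six problems give a maximum of 42, and five fully solved problems give 35. A minimal check, assuming full marks on each solved problem and none on the sixth (the helper name `imo_score` is hypothetical):

```python
POINTS_PER_PROBLEM = 7  # each IMO problem is marked out of 7
NUM_PROBLEMS = 6


def imo_score(fully_solved: int) -> int:
    """Score when `fully_solved` problems earn full marks and the rest earn none."""
    if not 0 <= fully_solved <= NUM_PROBLEMS:
        raise ValueError("fully_solved must be between 0 and 6")
    return fully_solved * POINTS_PER_PROBLEM


max_score = imo_score(NUM_PROBLEMS)
model_score = imo_score(5)
print(model_score, "out of", max_score)
```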

However, the Gemini 2.5 Deep Think released to users is not that same competition model. It is a lower-performing but apparently faster version.

How to access Deep Think now

For now, Gemini 2.5 Deep Think is available exclusively in the Google Gemini mobile app for iOS and Android, and only to users on the Google AI Ultra plan, part of the Google One subscription lineup. Pricing is as follows:

Promotional offer: $124.99/month for 3 months, then it kicks up to…
Standard rate: $249.99/month
Included features: 30 TB of storage, access to the Gemini app with Deep Think and Veo 3, as well as tools like Flow, Whisk, and 12,500 monthly AI credits
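
To put the pricing in perspective, here is a rough sketch of what a new subscriber would pay over a given number of months. It assumes the promotional rate applies to the first three months only, the standard rate applies afterwards, and taxes and regional pricing are ignored. The function name `ultra_cost` is illustrative, not part of any Google tool.

```python
from decimal import Decimal

PROMO_RATE = Decimal("124.99")     # per month for the first 3 months (from the article)
STANDARD_RATE = Decimal("249.99")  # per month afterwards (from the article)
PROMO_MONTHS = 3


def ultra_cost(months: int) -> Decimal:
    """Total pre-tax cost for a new subscriber over `months` months."""
    if months < 0:
        raise ValueError("months must be non-negative")
    promo = min(months, PROMO_MONTHS)
    return PROMO_RATE * promo + STANDARD_RATE * (months - promo)


first_year = ultra_cost(12)
print(first_year)  # 2624.88
```

Under those assumptions, the first year comes to $2,624.88, compared with $2,999.88 for twelve months at the standard rate.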

Subscribers can activate Deep Think in the Gemini app by selecting the 2.5 Pro model and toggling the “Deep Think” option.

It supports a fixed number of prompts per day and is integrated with capabilities like code execution and Google Search. The model also generates longer and more detailed outputs compared to standard versions.

The lower-tier Google AI Pro plan, priced at $19.99/month (with a free trial), does not include access to Deep Think, nor does the free Gemini AI service.

Why it matters for enterprise technical decision-makers

Gemini 2.5 Deep Think represents the practical application of a major research milestone.

It allows enterprises and organizations to tap into a variant of a Math Olympiad medal-winning model and have it join their staff, albeit only through an individual user account for now.

For researchers receiving the full IMO-grade model, it offers a glimpse into the future of collaborative AI in mathematics. For Ultra subscribers, Deep Think provides a powerful step toward more capable and context-aware AI assistance, now running in the palm of their hand.
