
Introducing Gemini 2.5 Deep Think: Google's 'deep thinking' AI

It is the most advanced version (for the time being) of the Gemini 2.5 model

6 August 2025

3 min read

It is the most advanced version (for the time being) of the Gemini 2.5 model, and its purpose can be summarised as follows: to tackle complex problems with an unprecedented ability to reason. Google has started to make Deep Think available only to subscribers to the AI Ultra plan ($249.99 per month), via a dedicated button in the prompt bar. Following its official unveiling at I/O 2025, it thus becomes an add-on in its own right and is billed as the best-performing LLM tool for problem solving, thanks to an approach based on so-called parallel thinking. More time to think and a greater depth of analysis are the cornerstones of a technology capable of exploring multiple hypotheses simultaneously, evaluating them and combining them over time to arrive at the most effective and coherent solution.

Mathematics Champion

Other generative AI models, such as Grok 4 (the most advanced model from Elon Musk's xAI), have already adopted parallel thinking. On some of the best-known benchmarks in this field (e.g. LiveCodeBench, a test dedicated to programming), however, Gemini 2.5 Deep Think now performs far better than it did at its presentation a few months ago. Its ability, as several tech sites have reported in recent hours, is particularly evident in highly complex scientific and mathematical problems. In a demo, Deep Think achieved results equivalent to a bronze medal at the International Mathematical Olympiad (IMO) 2025, a remarkable achievement for a model that can be used on a daily basis.

Gemini 2.5 Deep Think - benchmarks

In fact, the top of the ranking in this benchmark is held by a research variant of this model, which can take hours to arrive at an answer. The current release is faster while preserving the depth of reasoning, and it generates longer, more articulate and refined answers, helped by compatibility with tools such as Google Search. Looking at the capabilities currently fielded by the various players in generative AI, in short, Google's new multi-agent system surpasses OpenAI o3 and Grok 4 in all the publicly available official tests, raising the bar even higher in code writing. What makes the difference there is the sparse Mixture-of-Experts architecture (also used in the Mixtral models and, reportedly, in GPT-4), which selectively activates only a subset of the model's parameters, the most relevant 'experts', for each token.
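To make the idea of sparse expert routing concrete, here is a minimal toy sketch in Python. It is not Gemini's implementation, whose details are not public: the experts, the fixed router scores and the function names (`top_k_route`, `moe_layer`) are all hypothetical, and a real model would use learned neural networks rather than one-line functions. The sketch only shows the general principle that, for each token, a router scores every expert and only the top few are actually run.

```python
import math

def softmax(values):
    """Turn raw scores into weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and normalise their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in ranked])
    return list(zip(ranked, weights))

def moe_layer(token, experts, router, k=2):
    """Run only the selected experts and blend their outputs."""
    routes = top_k_route(router(token), k)
    output = sum(weight * experts[i](token) for i, weight in routes)
    return output, [i for i, _ in routes]

# Hypothetical toy experts: each is just a simple function of the token value.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: -x, lambda x: x * x]

# Hypothetical router: fixed scores here; in a real model these are learned.
router = lambda token: [0.1, 2.0, -1.0, 1.0]

output, used = moe_layer(3.0, experts, router, k=2)
print("experts activated:", used)  # experts 1 and 3; experts 0 and 2 never run
```

Because only two of the four experts are evaluated for this token, the compute cost per token stays well below that of running the whole model, which is the practical appeal of the sparse design.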

The areas of use

The key to Deep Think's operation is, as mentioned above, 'parallel thinking': the AI's ability not to follow a single logical path but to evaluate a multitude of ideas simultaneously. The approach gives the Gemini model significantly more inference time (referred to as 'thinking time'), prompting it to explore extended reasoning paths and take more information into account in order to solve problems in a more intuitive, creative and efficient manner. The potential of the tool, as observed by some insiders, is especially evident in tasks requiring iterative development, such as code optimisation and website design, where Deep Think has shown that it can improve both aesthetics and functionality. In general, as confirmed by Google, the technology relies on reinforcement learning techniques specifically developed to encourage longer and more articulated reasoning, which is essential to raise the level of accuracy in advanced mathematical problems or to interpret particularly complex scientific texts.
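The general pattern of exploring several lines of attack at once and keeping the best one can be illustrated with a small Python sketch. This is only an analogy under stated assumptions, not a description of how Deep Think works internally: the toy problem (finding the integer whose square is closest to a number), the three strategies and the `explore_in_parallel` helper are all hypothetical, and the sketch simply selects the best candidate rather than combining candidates as the article describes.

```python
from concurrent.futures import ThreadPoolExecutor

def explore_in_parallel(problem, strategies, score):
    """Run every strategy concurrently, then keep the best-scoring candidate."""
    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        candidates = list(pool.map(lambda strategy: strategy(problem), strategies))
    best = max(candidates, key=lambda candidate: score(problem, candidate))
    return best, candidates

# Hypothetical "reasoning paths" for the toy problem.
def guess_half(n):
    """Crude guess: half of n."""
    return n // 2

def guess_by_digits(n):
    """Rough guess based on the number of digits in n."""
    return 10 ** (len(str(n)) // 2)

def guess_newton(n):
    """Refine a guess with a few Newton iterations for the square root."""
    x = float(n)
    for _ in range(20):
        x = (x + n / x) / 2
    return round(x)

def score(n, candidate):
    """Higher is better: penalise the distance between candidate squared and n."""
    return -abs(candidate * candidate - n)

best, candidates = explore_in_parallel(
    150, [guess_half, guess_by_digits, guess_newton], score
)
print("candidates:", candidates, "best:", best)
```

The longer 'thinking time' mentioned above corresponds, in this analogy, to running more strategies or letting each one iterate for longer before the candidates are compared.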

Possible limits: from power to safety

Google has announced that in the coming weeks it will also make Deep Think's features available via API for developers and companies, paving the way for new applications in the professional and research spheres. It has also admitted that its latest creation requires greater computational resources than traditional models. And then, as some observers have pointed out, there could be security issues. Deep Think's official model card notes that the model reaches the alert threshold for CBRN Uplift Level 1, meaning that it could potentially assist malicious use of the technology in chemical, biological, radiological or nuclear contexts. The critical risk thresholds, Google points out in this regard, have not been exceeded, and ad hoc measures have been implemented to monitor usage and block abusive accounts. There is a further issue related to the tone of the replies, which are more measured but also tend to refuse harmless requests more often than they should. As all LLMs approach the thresholds of 'critical capability levels', the question of finding the right balance between reasoning ability and the potential danger of the output provided to the user becomes more central than ever in the debate on the new frontiers of AI.