Today we're introducing Gemini 2.5, our most intelligent AI model. Our first 2.5 release is an experimental version of 2.5 Pro, which is state-of-the-art on a wide range of benchmarks and debuts at #1 on LMArena by a significant margin. Gemini 2.5 models are thinking models: they can reason through their thoughts before responding, which improves both their performance and their accuracy.
In the field of AI, a system’s capacity for “reasoning” covers more than just classification and prediction. It refers to the ability to analyze information, draw logical conclusions, take context and nuance into account, and make informed decisions. For a long time, we’ve explored ways of making AI smarter and more capable of reasoning through techniques like reinforcement learning and chain-of-thought prompting. Building on this, we recently introduced our first thinking model, Gemini 2.0 Flash Thinking.
With Gemini 2.5, we have achieved a new level of performance by combining a significantly improved base model with enhanced post-training. Going forward, we are building these thinking capabilities directly into all of our models so that they can handle more complex problems and support even more capable, context-aware agents.
Introducing Gemini 2.5 Pro
Gemini 2.5 Pro Experimental is our most advanced model for complex tasks. It tops the LMArena leaderboard, which measures human preferences, by a significant margin, indicating a highly capable model with high-quality style. 2.5 Pro also has strong reasoning and code capabilities, and it performs strongly on common coding, math, and science benchmarks.
Gemini 2.5 Pro is available now in Google AI Studio and in the Gemini app for Gemini Advanced users, and it will be coming to Vertex AI soon. In the coming weeks, we will also introduce pricing so that users can run 2.5 Pro with higher rate limits for scaled production use.
Improved reasoning
Gemini 2.5 Pro is state-of-the-art across a range of benchmarks that require advanced reasoning. On math and science benchmarks like GPQA and AIME 2025, 2.5 Pro leads without test-time techniques that increase cost, such as majority voting. It also scores a state-of-the-art 18.8%, among models without tool use, on Humanity’s Last Exam, a dataset designed by hundreds of subject matter experts to capture the human frontier of knowledge and reasoning.
Advanced coding
We’ve been focusing on coding performance, and Gemini 2.5 is a significant step up from 2.0, with more improvements to come. 2.5 Pro excels at creating visually appealing web applications and agentic code applications, as well as at editing and transforming code. On SWE-Bench Verified, the industry standard for agentic code evaluations, Gemini 2.5 Pro scores 63.8% with a custom agent setup. As one example, 2.5 Pro can use its reasoning capabilities to create a video game, producing the executable code from a single-line prompt.
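For context on the reasoning results above: majority voting means sampling several answers to the same question and returning the most frequent one. Every extra sample is an extra model call, which is where the added cost comes from. The sketch below is a minimal Python illustration of the general technique, not a description of how any benchmark here was run. `sample_answer` is a hypothetical stand-in for a model call, and the whitespace and case normalization is an assumption made for the example.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Assumption: answers that differ only in whitespace or case count as equal.
    normalized = [a.strip().lower() for a in answers]
    # most_common orders equal counts by first occurrence.
    return Counter(normalized).most_common(1)[0][0]


def answer_with_voting(sample_answer: Callable[[], str], n_samples: int = 5) -> str:
    """Call a (hypothetical) model n_samples times and vote on the results.

    Cost grows linearly with n_samples, since each sample is a full model call.
    """
    return majority_vote([sample_answer() for _ in range(n_samples)])


if __name__ == "__main__":
    # Stand-in for a model: replays canned answers instead of calling an API.
    canned = iter(["12", "15", "12 ", "9", "12"])
    print(answer_with_voting(lambda: next(canned), n_samples=5))  # prints 12
```

A single-sample result, like the ones reported above, corresponds to calling the model once and skipping the vote entirely.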
Building on the best of Gemini
Gemini 2.5 builds on what makes Gemini models great: native multimodality and a long context window. 2.5 Pro ships today with a 1 million token context window (with 2 million coming soon) and strong performance that improves on previous generations. It can comprehend vast datasets and handle complex problems that draw on different information sources, including text, audio, images, video, and even entire code repositories.
Developers and enterprises can start experimenting with Gemini 2.5 Pro in Google AI Studio now, and Gemini Advanced users can select it in the model dropdown on desktop and mobile. It will be available on Vertex AI in the coming weeks.
As always, we welcome feedback so that we can keep improving Gemini’s new abilities at a rapid pace, with the goal of making our AI more helpful.