
Gemini 3 Flash gets Agentic Vision: here's how it works

Google introduces AI capable of actively analysing images

by Jader Liberatore

3 min read

Translated by AI
Italian version

Agentic Vision can interpret the content of an image, write code to manipulate it and provide more accurate answers

However advanced they may be, most artificial intelligence models that can read images process them statically, taking in the input at a single glance. If they miss a less visible detail, such as a road sign in the distance or a barely legible serial number on a microchip, they are forced to guess.

Google has now announced an important new feature for Gemini 3 Flash, called Agentic Vision, which turns image understanding into an agentic, reasoning-driven process so that the model can return more accurate answers. By combining visual reasoning with code execution, the model can plan how to inspect, zoom into and manipulate an image, and then base its answer on visual evidence.

In practice, the model works in a loop of three phases. In Think, it analyses the query and the initial image. In Act, it runs Python code to work on the image directly, for example by cropping, rotating, annotating or analysing it. In Observe, it adds the processed image to its context window, so that it can reason over this richer context before generating a final answer.
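To make the Think, Act, Observe cycle more concrete, here is a toy sketch in plain Python. It is not Google's implementation: the image is assumed to be a small grayscale grid of integers, and the hypothetical `think` function stands in for the model's reasoning with a fixed rule (crop a 2x2 window around the brightest pixel and zoom it). The `crop`, `zoom` and `rotate90` helpers illustrate the kind of operations the Act phase might perform in code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Image = List[List[int]]  # toy grayscale image: rows of pixel values

def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Act: cut out a rectangular region of interest."""
    return [row[left:left + width] for row in img[top:top + height]]

def zoom(img: Image, factor: int) -> Image:
    """Act: enlarge an image by nearest-neighbour upsampling."""
    return [[px for px in row for _ in range(factor)]
            for row in img for _ in range(factor)]

def rotate90(img: Image) -> Image:
    """Act: rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

@dataclass
class Context:
    question: str
    images: List[Image] = field(default_factory=list)  # the "context window"

def think(ctx: Context) -> Optional[Tuple[int, int, int, int]]:
    """Think (hypothetical stand-in for the model): if only the original
    image is in context, plan a 2x2 crop around its brightest pixel;
    otherwise there is nothing left to inspect, so stop."""
    if len(ctx.images) > 1:
        return None
    img = ctx.images[0]
    r, c = max(((r, c) for r in range(len(img)) for c in range(len(img[0]))),
               key=lambda rc: img[rc[0]][rc[1]])
    # Clamp so the 2x2 window stays inside the image.
    return (min(r, len(img) - 2), min(c, len(img[0]) - 2), 2, 2)

def run_loop(question: str, image: Image, max_steps: int = 3) -> Context:
    ctx = Context(question, [image])
    for _ in range(max_steps):
        plan = think(ctx)                               # Think
        if plan is None:
            break
        observed = zoom(crop(ctx.images[0], *plan), 2)  # Act
        ctx.images.append(observed)                     # Observe
    return ctx
```

After `run_loop`, the context holds both the original image and the zoomed detail, which is what a final answer would then be based on. The fixed `think` rule is only a placeholder for the model's own planning.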

From Google's Gemini app to start-ups, many developers have already integrated this functionality. For instance, PlanCheckSolver.com, an artificial intelligence-based platform for validating floor plans, improved its accuracy by 5 per cent by enabling code execution with Gemini 3 Flash to inspect high-resolution inputs.

This is only the beginning. Planned updates include making more of these behaviours implicit, so that the model triggers them without explicit prompting, adding further tools such as web search and reverse image search, and extending the functionality to other model sizes beyond Flash. As announced in a blog post by Rohan Doshi, Product Manager at Google DeepMind, Agentic Vision is already available through the Gemini API in Google AI Studio and Vertex AI. It is also rolling out in the Gemini app, where it can be used by selecting Thinking from the model drop-down menu.
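For developers using the Gemini API, the sketch below builds (but does not send) a REST `generateContent` request body that pairs an image with a question and enables the code execution tool, which the article links to Agentic Vision. The model ID `gemini-3-flash-preview` is an assumption for illustration, and the `build_request` helper is hypothetical; check the current Gemini API documentation for the exact model name before use.

```python
import base64
import json

# Assumed model ID; verify against the Gemini API model list.
MODEL_ID = "gemini-3-flash-preview"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL_ID}:generateContent"
)

def build_request(image_bytes: bytes, question: str,
                  mime_type: str = "image/jpeg") -> dict:
    """Hypothetical helper: build a generateContent body with an inline
    image, a text question and the code execution tool enabled."""
    return {
        "contents": [{
            "parts": [
                {"inline_data": {
                    "mime_type": mime_type,
                    # Inline image data must be base64-encoded text.
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
                {"text": question},
            ]
        }],
        # Enabling code execution lets the model run Python on the image.
        "tools": [{"code_execution": {}}],
    }

if __name__ == "__main__":
    body = build_request(b"\x89PNG...", "Read the chip's serial number.", "image/png")
    print(ENDPOINT)
    print(json.dumps(body)[:120])
```

The JSON body would then be sent as a POST to `ENDPOINT` with the API key in the `x-goog-api-key` header; the sending step is left out so the sketch has no network dependency.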

But there is another important piece of news for developers: the premium benefits of the Google Developer Program (GDP) are now included in the Google AI Pro and Google AI Ultra subscriptions at no extra cost, with the aim of bridging the gap between an idea sketched out in a chat and an app published online. Until now, GDP Premium gave access to more powerful models such as Gemini 3 Pro and was ideal for developing and testing new solutions. Once a project was ready for production, however, the path became complicated: billing had to be managed separately on Google Cloud, adding unnecessary obstacles to what should have been a seamless flow. Including GDP Premium in the Google AI Pro and Google AI Ultra plans changes this. Thanks to the integrated Google Cloud credits, the resources needed for deployment are already available, so developers can move from experimentation to publication without interruption, using the same tools from start to finish. Those with an active subscription can take advantage of the new benefits by visiting the Google Developer Program.

Copyright reserved ©