DeepMind: here are the AI models that help robots perform their tasks
Called Gemini Robotics-ER 1.5, where 'ER' stands for Embodied Reasoning, the new model is already available through the Gemini API in Google AI Studio. It is the latest model from DeepMind, the Google unit that, under the Alphabet umbrella, is competing in the artificial intelligence race. What sets it apart? It can take longer to think, which helps robots detect objects and understand the physical world more accurately and in context. It can also look up information on the web in real time (via Google Search) and transfer skills from one machine to another. It works as a thinking, strategic 'brain' that estimates the progress of a given process and assesses its risks. Those directly involved confirm this, and DeepMind's blog post is very illustrative in this regard. Its abilities allow a second model, Gemini Robotics 1.5 (accessible only to selected partners), to control the robots' actions in the real world, pushing them beyond elementary actions and training them to perform more complex tasks.
From simple tasks to multi-step tasks: the machine thinks before it acts
The two new Gemini models meet two key requirements of the paradigm that is increasingly bringing robotics and artificial intelligence together. The first ('ER') orchestrates reasoning. The second is a Vision-Language-Action (VLA) system that converts visual information and linguistic instructions into motor commands, enabling robots to carry out, and transfer, specific actions in the real world. Google's intention seems clear: to give robots capabilities that elevate them from mere executors to intelligent agents able to adapt to their surroundings. This is a new leap forward compared with the first Gemini Robotics models, released last spring, which brought multimodal AI (that of Gemini 2.0) into physical platforms. The update raises the level of capability in both reasoning and action.
How does this design translate into practice? Put simply, instead of single, simple operations such as folding a sheet of paper or undoing a zip, robots running the new models will be able to handle multi-step activities. Examples include separating laundry by colour, packing a suitcase with the weather at a given destination in mind, or sorting waste according to regulations looked up online. Carolina Parada, head of robotics at DeepMind, summed up this evolution when she explained that 'with this update we are moving from single instructions to a real understanding and resolution of physical problems'. The aim is to enable machines to 'think several steps ahead before acting'.
In other words, the robot reasons through a sequence of steps in natural language, breaks complex tasks down into simpler operations and explains its decisions, making the process more transparent. Its learning abilities also allow skills to be transferred to another robot, avoiding the need to train each platform individually. In essence, the shift is from executing instructions to building logical chains that solve problems. There is also greater attention to safety, in the form of semantic checks and constantly updated datasets that ensure compliance with physical and behavioural constraints.
Opportunities and obstacles in a competition between giants
Sharing skills between different robots is one of the main and most distinctive features of Gemini Robotics 1.5. DeepMind has shown that skills developed on ALOHA 2 (a system with two mechanical arms) work without adaptation on the two-armed Franka robot and even on Apptronik's Apollo humanoid. This, Google points out, opens the way to unified AI models. Such models could control robots with very different configurations and, as already explained, transfer knowledge from one machine to another. The potential is undoubtedly considerable, but several obstacles still stand in the way of large-scale deployment of these technologies. First and foremost, the dexterity, reliability and safety of AI-driven robots must improve further. What is intuitive for humans, DeepMind's own engineers admit, is still extremely complex for machines. On the other hand, steady progress in large-scale data processing and in the reasoning capabilities of models such as Gemini suggests that a sharp turning point for AI applications in robotics cannot be ruled out, much like the one ChatGPT marked for generative AI. What is certain is that many players want to compete in (and win) this game, from Google to OpenAI to Tesla, all of which are developing robots with built-in generative AI tools. Underlying it all is a common goal: to revolutionise entire sectors such as manufacturing, logistics and healthcare with increasingly autonomous machines.


