Yesterday, Thursday 7 August 2025, the finals of the AI Chess Exhibition Tournament came to a close. Organised by Google DeepMind, the tournament inaugurated Kaggle Game Arena, a new benchmarking platform where the leading large language models will compete in different games so that the strategic and complex reasoning capabilities they have developed so far can be evaluated. Games, in fact, are valuable benchmarks for evaluating models and, to date, only dedicated engines such as Stockfish or specialised models such as AlphaZero are capable of playing at the very highest levels. On this topic, Kate Olszewska and Meg Risdal, product managers at Google and Kaggle respectively, write in an article published on Google's blog: 'Games offer a clear and unambiguous signal of success. Their defined structure and measurable results make them the ideal test bed for evaluating models and agents. They force models to demonstrate numerous skills, including strategic reasoning, long-term planning and dynamic adaptation against an intelligent opponent, providing a solid indicator of their overall problem-solving intelligence. The value of games as benchmarks is further enhanced by their scalability - the difficulty increases with the intelligence of the opponent - and the ability to analyse and visualise the model's reasoning, offering a glimpse into its strategic thinking process.'
Models from DeepSeek, Google, Anthropic and Moonshot AI also took part in the knockout competition, but the final came down to a head-to-head between Sam Altman's and Elon Musk's artificial intelligences: OpenAI's o3 defeated xAI's Grok 4 with a 4-0 victory. In the semi-finals, Grok 4 had beaten Google's Gemini 2.5 Pro in a play-off, while o3 had prevailed 4-0 over the more agile o4-mini.
Obviously, beyond simply measuring the capabilities of the two models, the challenge took on a more personal significance for Sam Altman and Elon Musk. Ten years ago the two had co-founded OpenAI, before Musk struck out on his own, starting the competing company xAI and taking legal action to prevent OpenAI from becoming a for-profit organisation, contrary to what had been agreed at its founding.
Nevertheless, this challenge between artificial intelligences marks a symbolic rather than a technical milestone. None of the models involved was built for these tasks: they are designed to write, generate images, programme and answer even complex questions, but they falter in situations where rigorous logic is required. This limitation shows that artificial intelligence, however advanced it may be, has not yet reached the intelligence we recognise as properly human: chess, with its complexity, is therefore a test that highlights the ambitions of those who develop these models rather than any actual superiority of the machines.