Digital Economy

Claude makes it four: Opus 4 and Sonnet 4 arrive and change everything

Anthropic aims to turn artificial intelligence into a true working partner: more precise, more autonomous, more human.

23 May 2025

3 min read

The race for the best-performing generative AI shows no sign of slowing. Anthropic raises the stakes with Claude Opus 4 and Claude Sonnet 4, successors to its established models, designed to tackle the most complex tasks, from software development to content generation and multi-step reasoning. The release marks a concrete step towards the company's stated goal: turning AI into a true virtual collaborator.

With Claude 4, Anthropic is aiming high: 'we want to set a new standard for man-machine collaboration'. And that is not just a claim. The new models can sustain long-running tasks, integrate external tools, keep information consistent and solve problems at scale. In short: more reliable, more intelligent, more useful.

Claude Opus 4: the AI that programs (better than most humans)

Opus 4 is the flagship model and, according to Anthropic, the best coding model in the world. The company reports 72.5% on SWE-bench Verified and 43.2% on Terminal-bench, results that place it at the top of international rankings for real-world programming tasks. In testing, it worked autonomously on a complex project for almost seven consecutive hours. The feat has impressed companies such as Rakuten, Replit and Cursor, which describe it as a tool capable of writing code across multiple files, fixing bugs, following complex instructions and staying consistent on demanding projects.

Claude Sonnet 4: controlled power, refined thinking

Claude Sonnet 4 also makes a quantum leap over its predecessor, Claude Sonnet 3.7. It scores 72.7% on SWE-bench, follows instructions more precisely, handles codebases more effectively and solves complex problems with more refined reasoning. GitHub has already integrated it into its new Copilot agent, while companies such as Sourcegraph, iGent and Augment Code emphasise its positive impact on code quality, navigation and autonomy in multifunctional tasks.

Both models are hybrid: they can answer near-instantly or switch to a longer reasoning mode known as 'extended thinking'. During this phase, the models can call on external tools, such as web search or local files, alternating between reasoning and action in a fluid, coordinated way. They can also use several tools in parallel, improve their responses and build a persistent memory. When developers authorise it, they can save and update relevant information, maintaining continuity across complex projects and over time.
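For readers who want a feel for what 'multiple tools in parallel' and 'persistent memory' mean in practice, here is a minimal Python sketch of the general pattern. It is an illustration only, not Anthropic's implementation or API: the tool functions, the `agent_step` helper and the `memory.json` file are all hypothetical, and the 'tools' return canned strings instead of calling a real service.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical stand-ins for external tools. A real agent would query a
# search service or read from disk; these just return canned strings.
def web_search(query: str) -> str:
    return f"results for {query!r}"

def read_file(path: str) -> str:
    return f"contents of {path}"

TOOLS = {"web_search": web_search, "read_file": read_file}

def run_tools_in_parallel(calls):
    """Run several (tool_name, argument) calls concurrently, preserving order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(TOOLS[name], arg) for name, arg in calls]
        return [future.result() for future in futures]

def load_memory(path: Path) -> dict:
    """Read saved notes from disk, or start empty if none exist yet."""
    return json.loads(path.read_text()) if path.exists() else {}

def save_memory(path: Path, memory: dict) -> None:
    path.write_text(json.dumps(memory, indent=2))

def agent_step(calls, memory_path: Path) -> dict:
    """One action step: load memory, run the tools, record results, save."""
    memory = load_memory(memory_path)
    results = run_tools_in_parallel(calls)
    for (name, arg), result in zip(calls, results):
        memory[f"{name}:{arg}"] = result
    save_memory(memory_path, memory)
    return memory

# Two steps sharing one memory file: the second step still "remembers"
# what the first one stored.
memory_file = Path(tempfile.mkdtemp()) / "memory.json"
first = agent_step([("web_search", "SWE-bench"), ("read_file", "notes.txt")], memory_file)
second = agent_step([("web_search", "Terminal-bench")], memory_file)
print(sorted(second))
```

The point of the sketch is the shape of the loop: independent tool calls are dispatched together rather than one after another, and whatever is learned is written to a file so that a later step, or a later session, can pick it up again.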

Claude 4 also shows remarkable progress on the behavioural side. According to Anthropic, the new models are 65% less likely than the previous version to take shortcuts, avoiding hasty, often invented or unfinished solutions in favour of more accurate and complete answers. In addition, a system of thinking summaries has been introduced: when the reasoning runs long, an auxiliary model summarises it for the end user, maintaining transparency without sacrificing security. For advanced users, a Developer Mode is available, giving access to the model's full chain of thought.

Claude Code: for developers

Another significant change is the general availability of Claude Code, the suite of tools that brings Claude into the heart of the development environment. Users can now run the model directly in the terminal, in IDEs such as VS Code and JetBrains, or in the background via the SDK. Claude Code supports GitHub Actions and lets users build custom AI agents with an extensible SDK. A beta integration with GitHub is also available and can be installed with a single command. On the API front, Anthropic introduces four new features: a code execution tool, an MCP connector, a Files API and the ability to cache prompts for up to one hour. These features are designed to extend the capabilities of AI agents built with Claude and make them even more flexible and autonomous.

Feedback from companies that have already tested the new models confirms the promises. Yusuke Kaji, General Manager AI at Rakuten, recounts how Opus 4 coded autonomously for almost seven hours on an open-source project, describing it as a huge leap in AI capabilities. Pablo Arredondo of Thomson Reuters highlighted the model's effectiveness on an extremely complex legal task, while Michele Catasta of Replit praised its accuracy in multi-file editing. Block, Databricks, Cognition and Manus also praised the new models, citing execution speed, contextual understanding and the quality of responses in real-world scenarios.