Digital Economy

Anthropic launches Claude Opus 4.5: AI giants battle over code

New model tops benchmarks at a reduced price, in a contest that redefines the boundaries of artificial intelligence

by Marco Trabucchi

26 November 2025

4' min read

Translated by AI

Versione italiana

The war over artificial intelligence has a new battleground: autonomous programming. This week saw a tight duel between two giants vying for technological supremacy. On one side is Anthropic, with Claude Opus 4.5, launched last Monday. On the other is Google, which a few days earlier had introduced the Gemini 3 family, including the Pro model. The rivalry is not just a matter of prestige: it is worth billions of dollars and is redrawing the balance of the tech sector.

Claude first for coding

When it comes to AI models, benchmarks are the ultimate yardstick. And here the battle gets interesting. According to data published by the company, Claude Opus 4.5 scored 80.9 per cent on SWE-bench Verified, one of the most frequently cited tests of a model's ability to solve real-world problems drawn from GitHub repositories. That result puts it ahead of all others: OpenAI's GPT-5.1-Codex-Max stops at 77.9 per cent, Anthropic's own Claude Sonnet 4.5 at 77.2 per cent, and Gemini 3 Pro - its direct rival - comes in at 76.2 per cent.

These differences may seem subtle, but in the world of AI, every percentage point counts. Especially when it comes to solving real software engineering problems. SWE-bench Verified analyses 500 authentic issues from GitHub repositories, problems that human developers have actually faced and solved. The ability of a model to understand the context, navigate a complex codebase and produce a working solution is the true test of practical intelligence.
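To make the size of these gaps concrete, the short sketch below converts the reported percentages into an approximate number of issues on the 500-issue test. It is only an illustration: the helper names are hypothetical, and because published scores may be averaged over several runs, the issue counts are approximations rather than official figures.

```python
# Scores as reported in the article (per cent on SWE-bench Verified).
SCORES = {
    "Claude Opus 4.5": 80.9,
    "GPT-5.1-Codex-Max": 77.9,
    "Claude Sonnet 4.5": 77.2,
    "Gemini 3 Pro": 76.2,
}

# SWE-bench Verified contains 500 issues, as stated in the article.
TOTAL_ISSUES = 500


def approx_issues(score_pct, total=TOTAL_ISSUES):
    """Convert a percentage score into an approximate number of issues."""
    return score_pct / 100 * total


def gap_in_issues(leader, rival):
    """Approximate lead of one model over another, expressed in issues."""
    gap_pct = SCORES[leader] - SCORES[rival]
    return round(approx_issues(gap_pct), 1)


if __name__ == "__main__":
    for rival in SCORES:
        if rival != "Claude Opus 4.5":
            lead = gap_in_issues("Claude Opus 4.5", rival)
            print(f"Opus 4.5 leads {rival} by about {lead} issues")
```

On a 500-issue test, one percentage point corresponds to five issues, so a lead of a few points translates into a couple of dozen additional problems solved.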

But Anthropic does not limit itself to claiming the gold medal for coding. On OSWorld, the benchmark that measures the ability to use a computer as a human would, Claude Opus 4.5 scores 66.3 per cent, confirming it as the absolute best model for 'computer use' - the ability to navigate interfaces, click buttons and fill in forms.

The speed at which these models improve is itself a defining feature of the race. Anthropic released Sonnet 4.5 in September, Haiku 4.5 in October, and now Opus 4.5: three different models in three months. OpenAI responded with multiple variants of GPT-5 during 2025, including Codex Max in November. Google unleashed Gemini 3 after months of development, with a performance leap that surprised even insiders.

Developer and AI expert Simon Willison summed it up: 'Models improve faster than our ability to evaluate them,' adding: 'Benchmarks struggle to keep up. It's a real problem: when models consistently exceed 70-80% on standard tests, you have to invent more difficult tests. But this also makes it more complicated to understand real progress'. The difficulty is sharpest for those who have to compare technologies that change practically every quarter.

Prices, revenues and strategies: where another game is played

Besides pure numbers, the other battle is monetisation. Anthropic has chosen an aggressive strategy, sharply cutting the price of its top-of-the-range model: Claude Opus 4.5 costs USD 5 per million input tokens and USD 25 per million output tokens, compared with USD 15 per million input tokens for its predecessor, Opus 4.1. The cut is designed to make the model affordable for large-scale use and more competitive with GPT-5.1 and Gemini 3 Pro. "It's not just about having the most powerful model," explains Alex Albert, head of developer relations at Anthropic. "It's about making it usable at scale. Opus 4.5 requires fewer tokens to solve the same problems, which means lower operating costs for those who use it intensively."
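As a rough illustration of how per-token pricing turns into a bill, the sketch below applies the Opus 4.5 list prices quoted above to a workload. The function name and the token volumes are hypothetical, chosen only to show the arithmetic; real invoices may depend on factors not covered in this article.

```python
# Opus 4.5 list prices as quoted in the article (USD per million tokens).
INPUT_PRICE_USD = 5.0
OUTPUT_PRICE_USD = 25.0


def workload_cost_usd(input_tokens, output_tokens,
                      input_price=INPUT_PRICE_USD,
                      output_price=OUTPUT_PRICE_USD):
    """Estimate the cost of a workload from its input and output token counts."""
    input_cost = input_tokens / 1_000_000 * input_price
    output_cost = output_tokens / 1_000_000 * output_price
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 2 million input tokens, 500,000 output tokens.
    baseline = workload_cost_usd(2_000_000, 500_000)

    # Same hypothetical workload, solved with 20 per cent fewer output tokens.
    leaner = workload_cost_usd(2_000_000, 400_000)

    print(f"Baseline: USD {baseline:.2f}")
    print(f"With fewer output tokens: USD {leaner:.2f}")
```

Because output tokens are priced at five times the input rate, a model that reaches the same answer with fewer generated tokens lowers the bill even when the list price stays the same, which is the point Albert makes about token efficiency.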

The strategy fits an established trajectory: Anthropic reported USD 2 billion in annualised revenue in Q1 2025, more than double the previous period. In addition, the number of customers spending over USD 100,000 per year has grown eightfold. Maintaining this pace requires a model that companies can afford to use on a massive scale.

Beyond the tests: what really changes

The real difference, however, emerges when these models are put to the test in the real world. Anthropic claims that Opus 4.5 scored higher than any human candidate on the programming test it uses internally to select engineers. This is a company assessment, not an independent one, but it signals a crucial point: models are entering territory that until now was the exclusive domain of experienced developers.

Google, for its part, plays a different game: less focused on the 'single best model', more on distribution. Gemini 3 is already integrated in Search, the Gemini app, AI Studio and Vertex AI. According to data released by the company, the app has more than 650 million monthly users, while more than 13 million developers use Google's tools to build AI applications. This is a breadth of distribution that no competitor can match, not even OpenAI with ChatGPT. Anthropic, on the other hand, focuses on model quality, safety and alignment: according to its internal data, Opus 4.5 is among the most resistant to prompt injection attacks and the most predictable in its behaviour.

Who wins?

For now, on pure coding benchmarks, Claude Opus 4.5 has a measurable advantage. But Gemini 3 Pro dominates on advanced mathematical reasoning, multimodal comprehension and some general reasoning tests. These are different models, optimised to excel in different areas, and naming a single 'winner' would be an oversimplification. The real game will be played in the coming months, when millions of developers, companies and users choose which model to use for their real projects. There, in addition to benchmarks, reliability, cost, integration with existing tools and quality of support will count. One thing is certain: this rivalry is not going to subside, and 2026 promises to be the year of real competition in artificial intelligence.