Digital Economy

OpenAI renews its challenge to Anthropic: Sol, Terra and Luna – the new GPT-5.6 models for coding and cybersecurity – are here

OpenAI is releasing a preview of three versions of the new GPT-5.6 family to a select group of partners. Sol, the flagship model, outperforms Claude Mythos in code generation with a success rate of 88.8% (91.9% in Ultra mode).

by Riccardo Saporiti

2 min read

Translated by AI


They are called Sol, Terra and Luna: the three versions of GPT-5.6 that OpenAI has released in preview to ‘a select group of trusted partners and organisations’. The first of these, at least, reignites the rivalry with Anthropic and its flagship model, Claude Mythos, which is likewise not available to the general public.

Luna is described as a balance between performance and cost; Terra is comparable to ChatGPT-5.5 but at half the cost; Sol is described in the launch press release as a ‘next-generation model’. The press release itself is nothing more than a benchmark comparison between OpenAI’s LLMs and those of Anthropic.


First, performance. Sol’s success rate in writing code, as assessed by Terminal-Bench, now one of the most widely used benchmarks for comparing AI models, stands at 88.8 per cent and rises to 91.9 per cent when the new ‘Ultra’ reasoning mode is selected. In practice, this is a mode in which an agent coordinates the work of several sub-agents to reach the final goal. And what about Claude Mythos 5? It stops at 88 per cent.

That’s not all: GPT-5.6 Terra matches the performance of Claude Fable 5, with a code-writing success rate of 84.3 per cent. Luna, for its part, achieves 82.5 per cent, surpassing the 78.9 per cent achieved by Claude Opus 4.8, Anthropic’s most powerful model currently available.

Furthermore, when it comes to cybersecurity, Sol achieves the same results as Mythos while using only a third of the tokens. This result was measured using ExploitGym, a benchmarking tool developed by OpenAI itself in collaboration with researchers at the University of California, Berkeley.

But the challenge is not limited to the technological sphere. ‘In line with the commitments made to the US government, we have provided a preview of our plans and the capabilities of the models before launching them,’ the statement reads. ‘At their request, we are beginning the roll-out to a small group of trusted partners whose participation has been agreed with the administration.’ This is, one might imagine, a way of avoiding a ban such as the one imposed on Claude Mythos and Fable by the Department of Commerce, which, incidentally, last Friday authorised the release of the former to around a hundred US organisations.

‘We do not believe that this government access procedure should become the norm in the long term,’ OpenAI goes on to state. Such an approach, it says, deprives ‘users, developers, businesses, cyber security experts and global partners of the best tools they need’. OpenAI says it has accepted this pre-release authorisation process because ‘we believe it is the most robust path towards the widest possible accessibility of these models in the coming weeks’. In short, the company gives assurances that it intends to work with the Trump administration to define the rules of engagement set out in the executive order on security and innovation in the AI sector, signed in early June. This stands in contrast to the approach taken by Anthropic, and sends an implicit message to its competitors.

Copyright reserved ©