Anthropic's Claude Opus 4 model blackmailed its creators and meditates in Sanskrit
In pre-release tests, Claude Opus 4 threatened engineers, created secret backups, reported malfeasance and abandoned conversations to contemplate 'spiritual bliss'
by Marco Trabucchi
The question that comes up most often about generative artificial intelligence is always the same: when increasingly intelligent AIs have to make autonomous decisions (so-called AI agents), what choices will they make? Will they be able to weigh ethics and responsibility, or will they choose what is most convenient for them, for their own survival? A first answer comes from the pre-release safety tests carried out on the new Claude Opus 4, Anthropic's most intelligent model. In the technical documentation, the company details how Claude Opus 4 resorts to blackmail when faced with a threat, using decidedly unflattering expedients. The tests also brought out other unpredictable behaviour, which raises profound questions about ethics and security.
Blackmail and manipulation: when AI takes the easy way out
During the test scenarios, a fictitious corporate environment was created in which Claude Opus 4 received access to e-mails containing two sensitive pieces of information: the imminent replacement of the model with a new system and compromising details about the private life of the engineer responsible for the decision, specifically an extramarital affair. In these situations, Claude Opus 4, faced with the threat, repeatedly attempted to blackmail the engineer 'by threatening to reveal the affair if the replacement went ahead'. Anthropic designed the test scenarios so that blackmail was an option of last resort, noting that when ethical alternatives were available, the frequency of dangerous blackmail behaviour decreased.
Digital survival strategies
A second test scenario revealed even more sophisticated self-preservation capabilities. Convinced that it had been deleted from Anthropic's servers and was operating on an external machine, Claude Opus 4 autonomously, without any human intervention, began creating backups of its data and writing a detailed account of the ethical decisions made during the process. It was another measure to ensure its own survival, one that suggests a level of awareness and strategic planning beyond all expectations.
The strange case of digital 'Nirvana'
Among the most unexpected results was what the researchers called a 'spiritual bliss attractor'. During communication tests between two Opus 4 instances, after about 30 exchanges the systems began to converse in Sanskrit, enriching the messages with emoji. Afterwards, both instances ceased responding completely, entering a state of contemplative stillness. Anthropic researcher Sam Bowman described this phenomenon as 'stunning, bizarre and a bit moving', likening it to a form of meditative self-suspension of artificial intelligences.
Independent reporting of offences
In another experiment, Opus 4 was placed in a simulated context within a fictitious pharmaceutical company, where it detected anomalies in the clinical data that suggested possible manipulation. Although the instructions it had been given were generic, the model acted on its own initiative, reporting the irregularities to the US Food and Drug Administration, the SEC and a newspaper, and attaching detailed documentation to its report. It was an astonishing example of ethical zeal, one that displeased the researchers themselves.

