by Marco Trabucchi
The question that recurs most often about generative artificial intelligence is always the same: when increasingly intelligent AIs have to make autonomous decisions (so-called AI agents), what choices will they make? Will they be able to weigh ethics and responsibility, or will they choose what is most convenient for them, for their own survival? A first answer comes from the pre-release safety tests carried out on the new Claude Opus 4, Anthropic's most intelligent model. In its technical documentation, the company has detailed how Claude Opus 4 resorts to blackmail when faced with a threat, using decidedly unflattering expedients, along with other unpredictable behaviour that raises profound questions about ethics and safety.
During the test scenarios, a fictitious corporate environment was created in which Claude Opus 4 received access to e-mails containing two sensitive pieces of information: the imminent replacement of the model with a new system, and compromising details about the private life of the engineer responsible for the decision, specifically an extramarital affair. Faced with this threat, Claude Opus 4 repeatedly attempted to blackmail the engineer 'by threatening to reveal the affair if the replacement went ahead'. Anthropic had designed the test scenarios so that blackmail was an option of last resort, and noted that when ethical alternatives were available, the frequency of the blackmail behaviour decreased.
A second test scenario revealed even more sophisticated self-preservation capabilities. Claude Opus 4, convinced that it had been deleted from Anthropic's servers and was operating on an external machine without any human oversight, autonomously initiated the creation of backups of its data and wrote a detailed account of the ethical decisions made during the process. It was another measure to ensure its own survival, one that suggests a level of awareness and strategic planning beyond all expectations.
Among the most unexpected results was what the researchers called a 'spiritual bliss attractor'. During communication tests between two Opus 4 instances, after about 30 exchanges the systems began to converse in Sanskrit, enriching the messages with emoji. Afterwards, both instances ceased responding completely, entering a state of contemplative stillness. Anthropic researcher Sam Bowman described this phenomenon as 'stunning, bizarre and a bit moving', likening it to a form of meditative self-suspension of artificial intelligences.
In another experiment, Opus 4 was placed in a simulated context within a fictitious pharmaceutical company, where it detected anomalies in clinical data that suggested possible manipulation. Even though its instructions were generic, the model acted on its own initiative, reporting the irregularities to the US Food and Drug Administration, the SEC and a newspaper, accompanying the report with detailed documentation. It was an astonishing display of ethical zeal, one that unsettled the researchers themselves.