Cybersecurity

Cybersecurity, what are the dangers of Ai models without 'guardrails'?

The confrontation between the Pentagon and Anthropic over the use of unrestricted artificial intelligence models reignites the debate between military needs and protection from technological risks.

by Alessia Valentini

27 February 2026

4' min read

Translated by AI

Versione italiana

4' min read

Translated by AI

Versione italiana

The recent clash between the Pentagon demanding unrestricted AI and the Anthropic company resisting in order to keep the security 'guardrails' in place has renewed a historic dilemma over the use of technological potential versus the need to protect against misuse. Anthropic calls for caution on the use of AIs so that they are not used as autonomous weapons and in mass surveillance programmes, while the Pentagon threatens to cancel contracts or force resistance by invoking national emergency regulations (Defense Production Act). The reaction of those who observe the forces at work, produces conflicting opinions, shifting the focus to ethics, when the really central issue is rather how AI equipped with 'guardrails', are technologically more resilient and thus more effective for defence, and military uses. An unrestricted AI, in fact, would be much more manipulable by digital adversaries for misuse, including those capable of making it work against its own principals. If the Pentagon continued along its line, it could expose itself to a serious boomerang, because AI is vulnerable, as security studies confirm, and its resilience must be guaranteed, as confirmed by Luca Sambucci, an AI security expert and founder of Noctive Security, and Enrico Frumento, a cybersecurity researcher at Cefriel.

AI vulnerable

The recent study titled 'AI Skills as an Emerging Attack Surface in Critical Sectors: Enhanced Capabilities, New Risks published by TrendAI, a business unit of Trend Micro, highlights concrete vulnerabilities related to AI used in defence processes and in particular in Security Operation Centres (SOCs) where digital attacks are monitored and blocked. The AI skills used in SOCs to automate alert classification, correlation rules and response programmes have become a valuable target for cyber criminals interested in manipulating them to evade detection and minimise the severity of incidents, or worse in other sectors, to manipulate AI-managed trading activities in the financial sector, to interference in clinical decisions in the healthcare sector. The study highlights a new attack pattern aimed at compromising AI skills, which must be protected in order to be resilient against manipulation and misuse. This is made all the more urgent in view of a potential massive adoption in SOC security operations in Italia, as seems to emerge from the results of a Kaspersky survey.

AI resilience for defence effectiveness

Granted, no computer system is 100% secure, but having AI systems/agents equipped with security measures gives them greater resilience against adverse manipulative attempts. A first confirmation comes from Luca Sambucci 'security measures applied on the model with training techniques against attacking adversaries (adversarial training, n.d.r.) and continuous realistic simulations of attacks (continuos red-teaming, n.d.r.) can make the AI more robust. Also people safety measures (safety, n.d.r.) make AI more adherent to policies and operational limits and less functional for unwanted or illegitimate uses under adverse inputs'. In addition to the above-mentioned security measures, Enrico Frumento confirms that 'resilience is also practised with filters on tools and actions, access control, system logs, monitoring of anomalies and the possibility of rollback', clarifying the importance 'for AI to be able to ignore hostile prompts and instructions, staying within the limits even 'under pressure' and having traceable and verifiable output'. This proves how 'security engineering applied in defence reduces the possibilities of attack: from prompt injection to data poisoning to escalation via tools, without forgetting that zero risk is, and remains, a mirage'.

Boomerang effect

If AI equipped in this way is more resilient and therefore more effective for defence, then a request to have it unrestricted would seem to be a boomerang request, due to the possibility of an adversary using that AI against its principal or worse. Enrico Frumento confirms that 'this is a realistic scenario, especially if the wording 'unrestricted' means no constraints on use combined with direct integration with operating systems, data, sensors, commands' because 'an adversary would not have to steal the model, but would only need to exploit its operational chain. In fact, AIs execute commands without intelligence, being static machines capable of correlation between elements of a multi-dimensional space' and concludes, 'the request is therefore a boomerang because it increases the attack surface on the AI itself: more capacity conjugated to fewer 'guardrails' means more ways to be used against the principal'.

AI manipulation

Luca Sambucci in this regard adds how such a demand for unrestricted AI 'and without security blocks, perhaps hides the belief that one knows how to implement protection from manipulation by the enemy. But today no one knows how to do this in a reliable and standardised way, especially when AI is connected to high-privilege data and tools'. Hence 'reasonable concern about how to protect AI with traditional security systems (firewalls) and the usual perimeter defences is illusory. The AI known today is a complex system that is profoundly different from traditional software, both in how it is assembled and how it is used. It is subject to training with an immense amount of data that could conceal targeted poisoning attempts, which are only evident once the attack has taken place'. A final observation: 'such a compromised AI, instead of failing spectacularly and obviously, could slightly deflect decisions, day after day, with changes so small that no one would notice, but just enough to achieve a military result over time. It is a type of compromise that is more 'cognitive' than executive and today there are no standardised and universally effective solutions for this type of attack'.