CrowdStrike one year on: what happened and what has changed since the computer bug of the century
An incident that changed cybersecurity: greater diversification and investment in more resilient infrastructure
5' min read
One summer ago, on the night of 19 July to be precise, the world of technology and beyond was shaken by an event that came to be described as a 'Black Monday' for CrowdStrike, a US company specialising in cybersecurity solutions. What are we talking about? A massive computer failure caused by a faulty update to the 'Falcon' endpoint protection software, which affected millions of Windows devices and paralysed thousands of companies around the world. Today, twelve months later, it is time to take stock: what exactly happened and, above all, has anything changed (and why) in the cybersecurity landscape?
The antivirus that crashed half the world's computers
PCs that suddenly stop working, printers that become unusable and servers that crash, all within a few hours: this is exactly what happened because of an error in a routine content configuration update (Rapid Response Content) for Windows sensors, issued as part of normal telemetry operations. As the company explained in an official note at the time, the flaw caused a system crash on Windows machines running Falcon sensor version 7.11 or higher. A specific content file (Channel File 291) introduced a mismatch between the input data the sensor expected and the data actually supplied, which led the sensor to perform an out-of-bounds memory read during processing and, in turn, crashed the system.
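To make the mismatch described above concrete, here is a minimal sketch in Python of the kind of check that refuses content referring to more inputs than are actually supplied. It is an illustration only: the function names, the field indices and the data layout are hypothetical and do not reproduce CrowdStrike's file format or code. In Python an out-of-range index merely raises an exception, whereas in kernel-level code the same mistake can read invalid memory and bring the machine down.

```python
from typing import List, Optional, Sequence


def validate_rule(referenced_fields: Sequence[int], supplied_inputs: Sequence[str]) -> bool:
    """Return True only if every field index the rule refers to exists in the inputs."""
    # Hypothetical model: a rule is a list of indices into the supplied inputs.
    return all(0 <= idx < len(supplied_inputs) for idx in referenced_fields)


def evaluate_rule(
    referenced_fields: Sequence[int], supplied_inputs: Sequence[str]
) -> Optional[List[str]]:
    """Read the referenced fields, or return None if the rule would read out of bounds."""
    if not validate_rule(referenced_fields, supplied_inputs):
        # Reject the content instead of reading past the end of the inputs.
        return None
    return [supplied_inputs[idx] for idx in referenced_fields]
```

The design choice here is to validate before use: a content update that expects more data than it receives is discarded, rather than trusted and processed.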
In technical terms the result was a BSOD, the 'Blue Screen of Death', and the impact was immediate and devastating: companies of all sizes and from various sectors (including banks, hospitals and transport) found their IT infrastructures (almost) totally paralysed. Tens of thousands of organisations were affected, with economic losses from the forced suspension of activities estimated in the order of many millions of dollars in the first hours after the incident alone. From Europe to the United States, where there were also problems with lines connected to 911, the emergency telephone number, service interruptions followed one another, and the most emblematic images of the disaster include those of airports with enormous queues at boarding and check-in desks.
The origin of the crash
Right from the start, various tech publications placed the emphasis, even dramatically, on a factor unknown or almost unknown to the general public: the over-dependence of modern digital infrastructures on a handful of IT security providers (a year ago CrowdStrike held about 15 per cent of this market by value). An episode of vulnerability as extensive as the one that affected the Texan company's threat monitoring software has, after all, happened very rarely, for instance in 2017 with the WannaCry ransomware. Unlike that episode, however, the crash was not triggered by malicious code distributed by cyber criminals, but by an antivirus platform that uses deep access to 'endpoint' systems (laptops, servers and routers) to detect malware and suspicious activity that could indicate a compromise. Security software needs this constant, extensive and highly sensitive level of access in order to start running before any malicious programme is installed on the system, reaching the parts where attackers might try to insert malicious code. Yet it is precisely this access that increases the chances that the software itself, or one of its updates, might bring down the entire IT architecture.
And that is what happened on 19 July a year ago. CrowdStrike's CEO himself, George Kurtz, publicly explained that the fault was generated by a 'defect' in the software's code, ruling out the hypothesis of a cyber attack and confirming that the cause was an update flawed by a bug (a 'logical error', as it was classified) in one of his company's products, Falcon to be precise. Microsoft, for its part, reiterated in a note that it was 'the software update that was responsible for the disruption of numerous computer systems globally', while acknowledging that it had no oversight of the updates CrowdStrike makes to its systems.
The lesson to be learned
CrowdStrike's intervention to resolve the problem was immediate, although hampered by initially fragmented communication with client companies given the scale of the disaster, and resulted in the release of corrective updates within hours to mitigate the damage. The incident, as one might imagine, nevertheless opened the door to intense discussion of a key cybersecurity issue, namely the methodologies for testing and releasing software updates. What the incident of twelve months ago clearly revealed, according to various experts, is how delicate any change is when it is made to protection systems that operate at such a deep level of the IT infrastructure that a fault in them can compromise its functioning. The need for more robust staging environments (protected digital locations in which to test a new site or software updates) and for more effective rollback strategies (plans that determine how to restore a system or application after an unwanted operation) has understandably become an undisputed priority, prompting many companies to re-examine their internal processes. It is difficult, on the other hand, to draw a 'lesson learned' that would radically solve this type of problem, because similar IT failures will continue to happen, not least because of the progressive digitisation and interconnection that is affecting every industry and every sector. Many still believe, even today, that CrowdStrike could have prevented the crash, but before 19 July 2024 the Falcon programme had never shown any problems, and the distribution of the faulty update lasted only about an hour and a half. That time, however, was enough to knock out millions of computers around the globe.
A few hours after 'Black Monday', some pointed out the wisdom of rolling out updates gradually, or even only after manual approval, but this practice has gradually become less common because of the need to respond very quickly to new vulnerabilities and threats (think of particularly impactful malware such as WannaCry). The access to the Windows kernel (i.e. the programme at the heart of the operating system, which generally has complete control of the entire system) granted to an external partner such as CrowdStrike inevitably became a subject of controversy. It was Microsoft itself, however, that pointed out that this authorisation was in fact the result of an agreement made with the European Commission in 2009, as part of the measures taken by Brussels to counter Redmond's then monopolistic position in the field of web browsers with its Internet Explorer.
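The gradual rollout and rollback practices mentioned above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of how CrowdStrike or any other vendor deploys updates: the ring names and the `apply_update`, `is_healthy` and `rollback` callbacks are hypothetical placeholders that a real system would have to supply.

```python
from typing import Callable, List, Sequence, Tuple

Ring = Tuple[str, Sequence[str]]  # (ring name, hosts in that ring)


def staged_rollout(
    rings: Sequence[Ring],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Update one ring at a time; on any unhealthy host, roll back and stop.

    Returns True if every ring was updated and stayed healthy, False otherwise.
    """
    updated: List[str] = []
    for _ring_name, hosts in rings:
        for host in hosts:
            apply_update(host)
            updated.append(host)
        # Check the ring before moving on to a wider audience.
        if not all(is_healthy(host) for host in hosts):
            # Undo the update everywhere it was applied, most recent first.
            for host in reversed(updated):
                rollback(host)
            return False
    return True
```

With rings ordered from a small 'canary' group to the broad fleet, a faulty update is stopped at the first ring that shows problems, so later rings are never exposed. The trade-off is the one the article describes: each health check adds delay before protection against a new threat reaches every machine.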

