Microsoft users across the globe should review the state of their infrastructure security setups in the wake of the botched CrowdStrike software update that took millions of Windows devices offline on Friday 19 July 2024.
In a blog post published on 20 July 2024, David Weston, Microsoft’s vice-president of enterprise and operating system security, stated that “this was not a Microsoft incident” but one that “impacts our ecosystem” and had disrupted businesses and “the daily routines of many individuals”.
According to Microsoft’s calculations, around 8.5 million Windows devices, which equates to less than 1% of the global total of Windows machines in use, were affected by the incident.
And while that percentage might seem small in the grand scheme of things, Owen Sayers – an independent security consultant with more than 20 years’ experience advising public sector and policing clients on how to secure their systems – said the numbers involved are “terrifying” when coupled with information gleaned from CrowdStrike’s own incident reporting blog.
As confirmed by its “Technical details: Falcon content update for Windows hosts” blog, the corrupt software update that caused the Friday 19 July outage was online for only 78 minutes before it was taken down and replaced with a fixed version. “It affected less than 1% of global Windows devices in that time – that’s impressive,” said Sayers. However, the speed at which the damage spread also has worrying implications for the state of our global IT systems.
The knowledge that a bug in a third-party security product can wreak so much havoc in so short a time could give nation-state hackers food for thought on how to wage their next wave of attacks.
“The Chinese and Russians now know how to bring down global IT systems – just find a security product used by your target, and modify that code,” said Sayers. “And there is a damn good chance it’ll wipe them out within an hour and a half.”
Travel disruption
The CrowdStrike incident caused travel disruption at major airports and train stations, as well as affecting the day-to-day operations of GP surgeries, retailers and other businesses running Microsoft technologies. And, in some cases, its effects are still being felt days later.
“Folks like to think about outages like this in terms of full-day terms or even a weekend [of disruption being caused] due to the ongoing effect, but when you distil the cause down to lasting less than an hour and a half, it gets more impactful,” said Sayers.
“This time the error was in a third-party product that only a very small number of organisations use, but look at the scale and the spread of the damage.”
With this in mind, what would happen if a third-party product with higher rates of take-up within the Microsoft user community suffered a similar botched software update? Or, if Microsoft rolled out an operating system or service pack update to its user base that similarly risked bricking its customers’ devices?
It might sound like a scaremongering question to ask, but Eric Grenier, director analyst at market watcher Gartner for Technical Professionals, told Computer Weekly that any IT supplier who “hooks into” the Windows kernel in a similar way to CrowdStrike could suffer a similar fate if they were to release a defective update.
“You can even go a step higher and say that every vendor who releases an update has the potential to release a ‘bad patch’,” he said.
For this reason, Grenier said the situation should give the entire software industry pause for thought, so that other suppliers do not become the next CrowdStrike. “This is a good time for everyone in the software industry to review their quality assurance processes as well as their software update testing processes, and fortify them the best they can,” he added.
User protection
End-user organisations whose Windows systems were unaffected by the Friday 19 July update should view the situation as a wake-up call, rather than a lucky escape, said Rich Gibbons, head of IT asset management market development and engagement at independent software licensing advisory Synyega.
“If your organisation avoided this issue, [it is] likely because they are not a CrowdStrike customer, so take this as a wake-up call,” he told Computer Weekly.
“Unfortunately, all organisations are open to the risk of their business being negatively impacted by a third-party supplier making a huge error. Accepting that risk and having a strong disaster recovery and business continuity planning [strategy] is key and must be a priority for every business.”
Having robust IT asset management (ITAM) and software asset management (SAM) systems in place is also a must, continued Gibbons. “Knowing what software and hardware you have, where it is, [as well as its] support and end of life status, last patch and update time and date are also key [to] having an effective disaster recovery and business continuity plan, whether resources are on-premise or in the cloud within hybrid environments,” he said.
As Gartner’s Grenier points out, having disaster recovery and business continuity strategies in place is one thing, but enterprises must also test them regularly.
“This is not the last time a vendor will release a ‘bad patch’, so to mitigate risk, client organisations will need to review [these] strategies and actually test them to be sure they meet the standard they are looking for in terms of ‘time-to-recovery’,” he said.
“Organisations should also take the opportunity to review which applications in their environment are on ‘auto-update’ and gauge the potential fall-out from a ‘bad update’.”
That’s not to say Grenier is advocating a blanket switch-off of “auto-update” functionality across enterprises worldwide to mitigate the risk of another CrowdStrike-style incident.
“That should be determined by the organisation’s risk acceptance level and whether they can patch applications themselves,” he said. “Enterprises should have a documented inventory of which applications are set to ‘auto-update’, whether or not you can turn ‘auto-update’ off and know what the impact could be if a bad update or patch is delivered for each application set to ‘auto-update’.
“If they choose to manually update applications they will also need to build processes and workflows around testing updates for each application,” said Grenier.