CrowdStrike blames outage on content configuration update


Source is ComputerWeekly.com

Under-fire cyber firm CrowdStrike has published an initial post-incident review setting out more information on the update-gone-wrong that brought down millions of Microsoft Windows devices on 19 July, causing global chaos.

In a post published earlier on 24 July, the firm said it had released a content configuration update for its Falcon sensor on Windows hosts early on the morning of Friday 19 July.

This ‘rapid response’ update formed part of the normal dynamic protection mechanisms that the Falcon platform uses for cyber threat detection and remediation. Essentially, CrowdStrike uses such updates to identify new indicators of threat actor behaviour and improve its detection and prevention capabilities.

Such cloud-delivered updates would normally pass without drawing any attention. This one, however, caused Windows hosts running Falcon sensor version 7.11 or later that were online at the time to crash.

The root of the issue dates back to February 2024, when CrowdStrike released Falcon sensor version 7.11 containing templates to detect a new attack technique that abuses named pipes, a client-server interprocess communication mechanism. These templates were later stress tested and validated before being released to production, and three more template instances were deployed over the following weeks, again without incident.

Fast forward to 19 July, when two additional template instances for the same attack technique were lined up for deployment. On this occasion, however, said CrowdStrike, a bug in the automated content validator used to check updates allowed one of them to pass validation “despite containing problematic content data”.

The instance was deployed on the strength of the testing performed back in March, but when the sensor received and loaded it, the problematic content in channel file 291 caused an out-of-bounds memory read. This triggered an exception that the sensor could not handle gracefully, crashing Windows.
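CrowdStrike's review does not publish the format of channel file 291 or the code involved, so the following is only a minimal, hypothetical Python sketch of the general failure class described: content that asks an interpreter to read an input slot beyond the ones actually supplied. The names `TemplateInstance`, `validate_instance` and `read_field`, and the field counts used, are illustrative assumptions rather than CrowdStrike components. A validator that compares every referenced index with the number of available inputs can reject such content before release, and a bounds check at load time turns a bad read into a recoverable error instead of a crash. (In Python an out-of-range index raises `IndexError` rather than reading stray memory; the explicit checks mirror what a C or C++ parser would need.)

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TemplateInstance:
    """Hypothetical content record: which input fields its matching criteria reference."""
    name: str
    field_indices: list[int]


def validate_instance(instance: TemplateInstance, num_inputs: int) -> list[str]:
    """Pre-release check: report every referenced field that does not exist."""
    errors = []
    for idx in instance.field_indices:
        if not 0 <= idx < num_inputs:
            errors.append(
                f"{instance.name}: field {idx} out of range (0..{num_inputs - 1})"
            )
    return errors


def read_field(inputs: list[str], idx: int) -> str | None:
    """Load-time defence: return None instead of reading past the end of the inputs."""
    if 0 <= idx < len(inputs):
        return inputs[idx]
    return None  # in unchecked C this would be an out-of-bounds read


inputs = [f"value{i}" for i in range(20)]       # 20 inputs supplied (illustrative)
good = TemplateInstance("instance_a", [0, 5, 19])
bad = TemplateInstance("instance_b", [0, 20])    # references a field that is not supplied

print(validate_instance(good, len(inputs)))  # []
print(validate_instance(bad, len(inputs)))   # ['instance_b: field 20 out of range (0..19)']
print(read_field(inputs, 20))                # None
```

The point of the two layers is defence in depth: the validator is meant to stop bad content from shipping, and the load-time check limits the damage if a validator bug lets it through anyway.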

The faulty update was live for 78 minutes, from 04:09 UTC to 05:27 UTC (05:09 BST to 06:27 BST) on Friday, before CrowdStrike reverted it, but this was long enough to crash more than eight million devices worldwide and leave them displaying the infamous blue screen of death, photos of which spread around the world.

CrowdStrike CEO George Kurtz again apologised to customers and others affected, including the many thousands of people whose flights were delayed or cancelled.

“All of CrowdStrike understands the gravity and impact of the situation. We quickly identified the issue and deployed a fix, allowing us to focus diligently on restoring customer systems as our highest priority,” said Kurtz.

Kurtz also reiterated that neither CrowdStrike nor Microsoft had fallen victim to any kind of cyber attack, and reaffirmed that Linux and Mac hosts were not affected.

“CrowdStrike is operating normally, and this issue does not affect our Falcon platform systems. There is no impact to any protection if the Falcon sensor is installed. Falcon Complete and Falcon OverWatch services are not disrupted,” he said.

“We have mobilised all of CrowdStrike to help you and your teams. If you have questions or need additional support, please reach out to your CrowdStrike representative or Technical Support.

“We know that adversaries and bad actors will try to exploit events like this. I encourage everyone to remain vigilant and ensure that you’re engaging with official CrowdStrike representatives. Our blog and technical support will continue to be the official channels for the latest updates.”

Kurtz added: “Nothing is more important to me than the trust and confidence that our customers and partners have put into CrowdStrike. As we resolve this incident, you have my commitment to provide full transparency on how this occurred and steps we’re taking to prevent anything like this from happening again.”

What happens next?

CrowdStrike has now set out an extensive preliminary plan designed to keep such an incident from occurring again.

This includes making rapid response updates more resilient through additional developer testing, update and rollback testing, stress testing, fuzzing and fault injection, stability testing, and content interface testing. More validation checks will be added to the content validator, and existing error handling in the sensor's content interpreter will be enhanced.

Future rapid response deployments will also be staggered, rolling out gradually to larger portions of the Falcon sensor base, starting with a so-called ‘canary’ deployment. Sensor and system performance will be monitored more closely during these rollouts, and customers will be given greater control over the delivery of such updates, which will now also come with release notes.
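CrowdStrike has not published how its staggered rollout will work, so the following is only a hypothetical Python sketch of the general canary pattern: an update goes to a small ring of hosts first, and each ring must stay below a crash-rate threshold before the next, larger ring receives it. The ring sizes, the 0.1% threshold and the `deploy_to` callback are all assumptions for illustration.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical rollout rings: cumulative fraction of the fleet covered after each stage.
RINGS = [0.001, 0.01, 0.10, 0.50, 1.00]
MAX_CRASH_RATE = 0.001  # assumed threshold: halt if more than 0.1% of a ring's new hosts crash


def staged_rollout(
    fleet_size: int,
    deploy_to: Callable[[int], int],
    rings: list[float] = RINGS,
    max_crash_rate: float = MAX_CRASH_RATE,
) -> tuple[bool, int]:
    """Deploy ring by ring; deploy_to(n) pushes to n new hosts and returns how many crashed.

    Returns (completed, hosts_updated). Stops at the first unhealthy ring so the
    update can be rolled back before it reaches the rest of the fleet.
    """
    updated = 0
    for fraction in rings:
        target = max(1, round(fleet_size * fraction))
        batch = target - updated
        if batch <= 0:
            continue  # ring adds no new hosts at this fleet size
        crashed = deploy_to(batch)
        updated = target
        if crashed / batch > max_crash_rate:
            return False, updated  # halt: only this ring and earlier ones are exposed
    return True, updated


# A healthy update reaches the whole fleet; a crashing one is stopped in the canary ring.
print(staged_rollout(1_000_000, lambda n: 0))  # (True, 1000000)
print(staged_rollout(1_000_000, lambda n: n))  # (False, 1000)
```

Under these assumptions, an update that crashed every host it touched would be halted after reaching 0.1% of a million-host fleet rather than all of it.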
