CrowdStrike details what went wrong in preliminary report

News
24 Jul 2024 | 3 mins
Business Operations, Industry, Security

Points to its validator giving the green light to “problematic content” in an update.

CrowdStrike has detailed what went wrong on 19 July in a preliminary post-incident review report.

CrowdStrike issued a content update on 19 July, which impacted 8.5 million Microsoft Windows devices globally, taking down major organisations, supermarkets, retailers, airlines and banks.

In its report released on 24 July, CrowdStrike said the problem was caused by a rapid response content update that contained “problematic content data” and carried an undetected error.

As part of regular operations, CrowdStrike released a content configuration update for the Windows sensor to gather telemetry on possible novel threat techniques.

These updates, CrowdStrike said, are a regular part of the dynamic protection mechanisms of the Falcon platform. The problematic Rapid Response Content configuration update resulted in a Windows system crash, also known as the blue screen of death (BSOD).

Systems in scope include Windows hosts running sensor version 7.11 and above that were online between Friday, July 19, 2024 04:09 UTC and 05:27 UTC and received the update. Mac and Linux hosts were not impacted.
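For administrators triaging their own fleet, the scope criteria above can be expressed as a simple check. The following is a minimal sketch only: the function name, its parameters and the shape of the host data are assumptions made for illustration, not part of any CrowdStrike tooling or API.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Exposure window and minimum sensor version as stated in the report.
WINDOW_START = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)
MIN_SENSOR_VERSION = (7, 11)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '7.11' into (7, 11) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def in_scope(
    os_name: str,
    sensor_version: str,
    online_from: datetime,
    online_until: datetime,
    received_update: bool,
) -> bool:
    """Hypothetical helper: True if a host matches the published scope criteria."""
    if os_name.lower() != "windows":
        return False  # Mac and Linux hosts were not impacted
    if parse_version(sensor_version) < MIN_SENSOR_VERSION:
        return False
    # The host must have been online at some point inside the window.
    overlaps = online_from <= WINDOW_END and online_until >= WINDOW_START
    return overlaps and received_update
```

Versions are compared as tuples of integers rather than as strings, because a plain string comparison would rank "7.9" above "7.11".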

As for how CrowdStrike aims to prevent this from happening again, the vendor provided a list of action points for software resiliency, testing and rapid response content deployment.

It will add validation checks to its content validator for rapid response content, including a new check, currently in process, intended to stop the type of problematic content that caused this incident from being deployed again.

CrowdStrike is also improving rapid response content testing through a raft of testing mechanisms including local developer, content update and rollback, stress, stability and content interface testing as well as fuzzing and fault injection techniques.

Additionally, it is enhancing existing error handling in its content interpreter.

CrowdStrike said it will implement a staggered, gradual strategy for rapid response content, starting with a canary deployment.
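To illustrate the general idea of a staggered rollout that begins with a canary, here is a minimal sketch. It is not CrowdStrike's implementation: the stage names, percentages, the `deploy` and `healthy` callbacks and the failure threshold are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Stage:
    name: str
    percent: int  # cumulative share of the fleet covered once this stage completes


# Hypothetical stages; a real rollout plan would set its own.
STAGES = [Stage("canary", 1), Stage("early", 10), Stage("broad", 50), Stage("full", 100)]


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    stages: Sequence[Stage] = STAGES,
    max_failure_rate: float = 0.0,
) -> dict:
    """Deploy stage by stage and halt as soon as a batch looks unhealthy."""
    done = 0
    halted_at: Optional[str] = None
    for stage in stages:
        # Cumulative target for this stage, with at least one host in the canary.
        target = min(len(hosts), max(1, len(hosts) * stage.percent // 100))
        batch = hosts[done:target]
        for host in batch:
            deploy(host)
        done = max(done, target)
        failures = sum(1 for host in batch if not healthy(host))
        if batch and failures / len(batch) > max_failure_rate:
            halted_at = stage.name  # stop before the update reaches more hosts
            break
    return {"halted_at": halted_at, "deployed": done}
```

In this sketch, a faulty update that crashes the canary host halts the rollout after a single deployment instead of reaching the whole fleet at once.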

It is also looking to improve monitoring of sensor and system performance, give customers more control over the delivery of rapid response content updates, and provide content update details by way of release notes.

A day after the outage, the incident was labelled a “very serious incident for the Australian economy” by Home Affairs Minister and Senator Clare O’Neil.

“I’ve seen it reported that this is the biggest IT outage in world history,” she said during a media conference on 20 July. “It is absolutely possible that that’s the case, certainly the largest in the time I’ve been alive.”