Future Tech

CrowdStrike blames a test software bug for that giant global mess it made

Tan KW
Publish date: Thu, 25 Jul 2024, 08:10 AM

CrowdStrike has blamed a bug in its own test software for the mass-crash-event it caused last week.

A Wednesday update to its remediation guide added a preliminary post incident review (PIR) that offers the antivirus maker's view of how it brought down 8.5 million Windows boxes.

The explanation opens by detailing that CrowdStrike's Falcon Sensor ships with "sensor content" that steers and defines its threat-detection engine's capabilities. The software is also updated with "rapid response content" that allows it to detect and handle emerging malware and other unwanted system activity. This rapid response content is delivered to users in those channel files you've been hearing about.

The base sensor content includes what's called "template types," which are blocks of prepared code that can be used by rapid response content to identify malicious stuff on a system. As such, rapid response content is delivered in those channel files as so-called "template instances," which CrowdStrike describes as "instantiations of a given template type."

Thus, the sensor content defines a bunch of code templates, and the rapid response content customizes those templates, with each piece of this response content being a template instance that tackles a specific system behavior for the sensor software to observe, detect, or prevent.
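To make that relationship concrete, here is a minimal sketch. CrowdStrike has not published its real data structures, so the class names, the field names, and the "*" wildcard below are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateType:
    """Hypothetical stand-in for a template type shipped in sensor content."""
    name: str
    field_names: tuple  # the parameters an instance is expected to fill in


@dataclass(frozen=True)
class TemplateInstance:
    """Hypothetical stand-in for one piece of rapid response content."""
    template: TemplateType
    values: tuple  # one value per field, in the same order as field_names

    def matches(self, event):
        # An event matches when every configured field equals the observed
        # value; "*" is treated as a wildcard in this sketch.
        return all(
            value == "*" or event.get(field) == value
            for field, value in zip(self.template.field_names, self.values)
        )


# The template type says what can be matched; the instance says what to match.
ipc = TemplateType("ipc", ("pipe_name", "process"))
rule = TemplateInstance(ipc, ("suspicious_pipe", "*"))

print(rule.matches({"pipe_name": "suspicious_pipe", "process": "a.exe"}))
print(rule.matches({"pipe_name": "harmless_pipe", "process": "a.exe"}))
```

The point of the split is that the template type changes rarely and ships with the sensor, while instances are small pieces of data that can be pushed out quickly.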

In February 2024, CrowdStrike introduced a new "inter-process-communication (IPC) template type" for rapid response content to use, designed by the vendor to detect "novel attack techniques that abuse Named Pipes." The IPC template type passed testing on March 5, and a rapid response template instance was released to use it.

Three more IPC template instances were deployed between April 8 and April 24. All ran without crashing 8.5 million Windows machines - although, as we reported earlier this week, Linux machines had problems with CrowdStrike in April.

On July 19, CrowdStrike introduced two more IPC template instances. One included "problematic content data," but made it into production anyway, because of what CrowdStrike described as "a bug in the content validator."

The post doesn't detail the content validator's role; we'll assume it's supposed to do what the name suggests and likely in an automated manner.

Whatever the validator does or is supposed to do, it did not prevent the release to customers of the dodgy July 19 template instance despite it being a dud. This test software should have detected that the content update was broken but approved it anyway because the validator was buggy.
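Since the PIR doesn't say what the validator actually checks, the following is only a guess at the kind of sanity check such a tool might run before content ships. The function name and both rules are assumptions, not CrowdStrike's implementation.

```python
def validate_instance(field_names, values):
    """Return a list of problems; an empty list means the instance may ship.

    Hypothetical checks: the instance must supply exactly one value per
    field, and every value must be a non-empty string.
    """
    problems = []
    if len(values) != len(field_names):
        problems.append(
            f"expected {len(field_names)} values, got {len(values)}"
        )
    for name, value in zip(field_names, values):
        if not isinstance(value, str) or value == "":
            problems.append(f"field {name!r} is empty or not a string")
    return problems


fields = ("pipe_name", "process")
print(validate_instance(fields, ("suspicious_pipe", "*")))  # well-formed
print(validate_instance(fields, ("suspicious_pipe",)))      # one value short
```

A validator like this is only as good as its rules: if a check is missing or wrong, broken content sails through with a clean bill of health.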

CrowdStrike thus assumed the July 19 channel file release would be OK; the tests had, after all, passed the IPC template type delivered in March and the subsequent related IPC template instances.

History tells us that was a very bad assumption. As we concluded in our earlier analysis of the crash, Falcon loaded and parsed the new content and was confused by the broken template instance, which "resulted in an out-of-bounds memory read triggering an exception" within CrowdStrike's Windows driver-level code, which would bring down the whole box.

On reboot, it would start up and crash all over again. CrowdStrike's Falcon suite runs at the operating system level to give it good visibility for its threat detection operations. When its content interpreter is misled into accessing memory it shouldn't, however, as happened here with the bad data, it has the potential to take out the OS and running applications with it.
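For illustration, this is roughly what a defensive read looks like: check the index against the data actually supplied before touching it. It is a sketch in Python with an invented helper name, not a reconstruction of Falcon's driver code.

```python
def read_field(values, index):
    """Return values[index] if it exists, otherwise None.

    Hypothetical helper: it refuses any index outside the supplied data
    instead of trusting the content to be well-formed.
    """
    if 0 <= index < len(values):
        return values[index]
    return None  # the caller can skip the rule rather than crash


instance_values = ["pipe_a", "proc_b"]  # made-up content with two values
print(read_field(instance_values, 1))   # within bounds
print(read_field(instance_values, 5))   # out of bounds, handled gracefully
```

Python would raise an IndexError on a bad index even without the check. Kernel-mode code written in C gets no such safety net: an unchecked read past the end of a buffer is undefined behavior, and an invalid access at that level takes the operating system down with it.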

"This unexpected exception could not be gracefully handled, resulting in a Windows operating system crash," Team CrowdStrike said.

On around 8.5 million machines.

The incident report includes promises to test future rapid response content more rigorously, stagger releases, offer users more control over when to deploy it, and provide release notes.
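A staggered release can be sketched in a few lines. The ring names, the deploy callback, and the failure threshold below are assumptions made for the example; the PIR does not describe how CrowdStrike will implement it.

```python
def staged_rollout(rings, deploy, max_failure_rate=0.0):
    """Deploy ring by ring and stop once a ring fails too often.

    `rings` is a list of host lists, earliest ring first. `deploy(host)` is
    a hypothetical callback that returns True on success. Returns the hosts
    that received the update.
    """
    updated = []
    for batch in rings:
        results = [deploy(host) for host in batch]
        updated.extend(batch)
        failures = sum(1 for ok in results if not ok)
        if batch and failures / len(batch) > max_failure_rate:
            break  # halt: later rings never receive the bad content
    return updated


rings = [["canary-1", "canary-2"], ["prod-1", "prod-2", "prod-3"]]
print(staged_rollout(rings, lambda host: True))   # healthy update reaches all
print(staged_rollout(rings, lambda host: False))  # bad update stops at canaries
```

The trade-off is speed: each ring adds delay before new detections reach everyone, which is presumably why rapid response content was pushed to all machines at once in the first place.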

You read that right: Release notes. Be still your beating heart.

The report also includes a pledge to release a full root cause analysis once CrowdStrike has finished its investigation.

Take all the time you want: Some of us are still busy rebuilding machines you broke. ®

 

https://www.theregister.com//2024/07/24/crowdstrike_validator_failure/
