Future Tech

CrowdStrike blames a test software bug for that giant global mess it made

Tan KW
Publish date: Wed, 24 Jul 2024, 02:27 PM

CrowdStrike has blamed a bug in its own test software for the mass-crash-event it caused last week.

A Wednesday update to its remediation guide added a Preliminary Post Incident Review (PIR) that offers the vendor's view of how it brought down 8.5 million Windows boxes.

The explanation opens by detailing that CrowdStrike's Falcon Sensor ships with "Sensor Content" that defines its capabilities. The software is updated with "Rapid Response Content" that allows it to detect and collect info on new threats.

Sensor Content relies on "Template Types" - code that includes pre-defined fields for threat detection engineers to leverage in Rapid Response Content.

Rapid Response Content is delivered as "Template Instances," which CrowdStrike describes as "instantiations of a given Template Type."

Each Template Instance maps to specific behaviors for the sensor software to observe, detect or prevent.
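The relationship between a Template Type and its Template Instances can be pictured with a small sketch. Everything here is hypothetical: the class names, the field names and the `missing_fields` helper are illustrative assumptions, not CrowdStrike's actual data model, which the post does not publish.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateType:
    """Hypothetical stand-in: ships with the sensor and declares the available fields."""
    name: str
    fields: tuple[str, ...]


@dataclass(frozen=True)
class TemplateInstance:
    """Hypothetical stand-in: delivered as content, fills in fields of one Template Type."""
    template_type: TemplateType
    values: dict[str, str]

    def missing_fields(self) -> list[str]:
        # Fields the type declares but this instance does not supply.
        return [f for f in self.template_type.fields if f not in self.values]


# Illustrative field names only; the real IPC Template Type's fields are not public here.
ipc_type = TemplateType(name="IPC", fields=("pipe_name", "process_name"))
instance = TemplateInstance(template_type=ipc_type, values={"pipe_name": "example"})
```

In this picture the Template Type changes rarely and arrives with the sensor, while Template Instances are the small, frequently shipped pieces that plug values into it.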

In February 2024, CrowdStrike introduced a new "InterProcessCommunication (IPC) Template Type" that the vendor designed to detect "novel attack techniques that abuse Named Pipes."

The IPC Template Type passed testing on March 5, so a Template Instance was released to use it.

Three more IPC Template Instances were deployed between April 8 and April 24. All ran without crashing 8.5 million Windows machines - although, as we reported earlier this week, Linux machines had problems with CrowdStrike in April.

On July 19, CrowdStrike introduced two more IPC Template Instances. One included "problematic content data" - but made it into production anyway, because of what CrowdStrike described as "a bug in the Content Validator."

The post doesn't detail the Content Validator's role - we'll assume it's supposed to do what the name suggests.

Whatever the Validator does or is supposed to do, it did not prevent the release of the July 19 Template Instance, despite it being a dud. That happened because the IPC Template Type delivered in March, and the IPC Template Instances that followed it, had all passed testing - so CrowdStrike assumed the July 19 release would be OK too.

History tells us that was a very bad assumption. Once loaded by the sensor, the problematic content "resulted in an out-of-bounds memory read triggering an exception."

"This unexpected exception could not be gracefully handled, resulting in a Windows operating system crash."

On around 8.5 million machines.
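For readers unfamiliar with the failure class, here is a minimal sketch of the two usual defenses against an out-of-bounds read: a bounds check at the point of access, and a pre-release check that flags bad indices before content ships. The function names and the idea of content referring to inputs by index are assumptions made for illustration; the post does not describe how the sensor or the Content Validator actually work. Python raises an `IndexError` where lower-level code would read stray memory, so this only models the logic.

```python
from typing import Optional, Sequence


def read_field(inputs: Sequence[str], index: int) -> Optional[str]:
    """Return inputs[index], or None when the index is out of range."""
    if 0 <= index < len(inputs):
        return inputs[index]
    return None


def out_of_range_indices(field_indices: Sequence[int], input_count: int) -> list:
    """Return every index that would fall outside the supplied inputs."""
    return [i for i in field_indices if not 0 <= i < input_count]


# Hypothetical content that refers to three inputs when only two are supplied.
problems = out_of_range_indices([0, 1, 2], input_count=2)
```

A validator built along these lines would reject the content whenever `problems` is non-empty, and the runtime check in `read_field` would act as a second line of defense if something slipped through anyway.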

The incident report includes promises to test future Rapid Response Content more rigorously, stagger releases, offer users more control over when to deploy it, and provide release notes.

You read that right: release notes. Be still your beating heart.txt.
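As for the promise to stagger releases, one common way to do it is to hash each machine into a stable bucket and widen the released percentage over time. The sketch below assumes that approach; the function names, the host ID format and the 100-bucket scheme are hypothetical, and the report does not say how CrowdStrike will implement its own staggering.

```python
import hashlib


def rollout_bucket(host_id: str, buckets: int = 100) -> int:
    """Map a host ID to a stable bucket number in the range [0, buckets)."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def should_receive(host_id: str, percent_released: int) -> bool:
    """True if this host falls inside the currently released percentage."""
    return rollout_bucket(host_id) < percent_released
```

Because the bucket is derived from the host ID, a machine that receives an update at 10 percent still receives it at 50 percent, and a bad update can be halted while most of the fleet remains untouched.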

The report also includes a pledge to release a full root cause analysis, once CrowdStrike has finished its investigation.

Take all the time you want: some of us are still busy rebuilding machines you broke. ®

 

https://www.theregister.com//2024/07/24/crowdstrike_preliminary_incident_report/
