CrowdStrike outage: Community thoughts?

I took this Blue Screen of Death photo in a large retail chain store on Saturday.

Late on Friday afternoon I was wondering why I couldn’t get into internet banking and various other apps on my phone. Within a few hours, the mass outage caused by CrowdStrike (NASDAQ: CRWD) was well understood. The disruption cascaded through countless systems underpinning our daily lives, from shopping and banking to transportation and communication; even the F1 was disrupted. Despite affecting less than 1% of Windows machines worldwide (around 8.5 million devices, by Microsoft’s own estimate), the incident exposed the fragility of our interconnected digital world.

Thank you to the IT Professionals who deployed patches

First and foremost, I want to extend a massive shoutout to everyone who worked tirelessly over the weekend to deploy the patches needed to resolve the CrowdStrike outage incident on Friday! Thank you - from everyone.

This blog is a summary of the themes from the lively ITP Slack channel discussions that ran over the weekend and well into this week. In short - we were surprised, disappointed and frustrated by this outage, but can also see it as a wake-up call.

Failure in Testing Before Deployment

It turns out this was not the first outage caused by CrowdStrike this year. In April, a CrowdStrike update caused all Debian Linux servers in a civic tech lab to crash simultaneously and refuse to boot. The update proved incompatible with the latest stable version of Debian, despite the specific Linux configuration being supposedly supported. The lab's IT team discovered that removing CrowdStrike allowed the machines to boot and reported the incident. It took CrowdStrike weeks to provide a root cause analysis, revealing that the Debian Linux configuration was not included in their test matrix.

Our members were pretty frustrated to learn this. CrowdStrike’s own report on what went wrong buries the testing issues and doesn’t acknowledge that it had done this before. More on this from RNZ.
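To make the test matrix gap described above concrete, here is a minimal sketch of the kind of check that would flag it: compare the configurations a vendor says it supports against the ones it actually tests. The function name and the platform values are hypothetical, chosen purely for illustration; nothing here reflects CrowdStrike's real tooling or support list.

```python
from typing import List, Set, Tuple

# A configuration is a (distribution, version) pair. Illustrative only.
Config = Tuple[str, str]


def untested_configs(supported: Set[Config], tested: Set[Config]) -> List[Config]:
    """Return supported configurations that are missing from the test matrix."""
    # Set difference: everything promised to customers but never exercised.
    return sorted(supported - tested)


if __name__ == "__main__":
    # Hypothetical data, not a real vendor support list.
    supported = {("debian", "12"), ("ubuntu", "22.04"), ("rhel", "9")}
    tested = {("ubuntu", "22.04"), ("rhel", "9")}
    for distro, version in untested_configs(supported, tested):
        print(f"Supported but untested: {distro} {version}")
```

A check like this could run as a release gate, so that an update cannot ship while any supported configuration is missing from the test matrix.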

A Lesson in Reliance on SaaS

Over the years, SaaS suppliers have developed a strong record of high-quality releases, leading many organisations to implement upgrades with limited testing. We as technology specialists have come to rely on this, implementing automated updates and deployment. This is especially true for something like CrowdStrike’s Falcon sensors, where speed of deployment is crucial for defense against new attacks, so significant dependency is placed on companies like CrowdStrike to get testing and deployment right.

One member emphasized the importance of assessing risks when considering SaaS products. Just because a company is big and has a pretty dashboard doesn't guarantee trustworthiness. Our members discussed turning down more than one SaaS product due to concerns about data security, robustness of process and transparent documentation.

The upshot is that we need to strike a balance between rapid deployment and rigorous testing, to maintain security without causing widespread disruption. We had lots of discussion about staggered deployment practices as a way to achieve this.
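For readers less familiar with staggered deployment, the idea can be sketched roughly as follows: push an update to a small "canary" group first, check those machines are still healthy, and only then move on to larger groups. This is a simplified illustration, not how any particular vendor does it. The `Ring` class, the `staged_rollout` function, and the `deploy` and `is_healthy` callbacks are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Ring:
    """A group of hosts that receive an update at the same stage."""
    name: str
    hosts: List[str]


def staged_rollout(
    rings: List[Ring],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> Dict[str, object]:
    """Deploy ring by ring, halting once a ring exceeds the failure budget."""
    completed: List[str] = []
    for ring in rings:
        failures = 0
        for host in ring.hosts:
            deploy(host)
            if not is_healthy(host):
                failures += 1
        # An empty ring counts as having no failures.
        rate = failures / len(ring.hosts) if ring.hosts else 0.0
        if rate > max_failure_rate:
            # Stop here: later (larger) rings never receive the bad update.
            return {"status": "halted", "halted_at": ring.name, "completed": completed}
        completed.append(ring.name)
    return {"status": "complete", "halted_at": None, "completed": completed}
```

In practice `deploy` would call whatever tooling pushes the update and `is_healthy` would check that a machine came back up and is reporting in. The key design choice is that a failure in the small first ring stops the rollout before it reaches everyone else.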

The Microsoft effect

As we know, the outage primarily affected Microsoft devices, leading to speculation that New Zealand, with its smaller number of large corporations, might have escaped the brunt of the impact. However, this shouldn’t downplay the global fragility of our interconnected digital world.

Microsoft distanced itself from the event, which was widely seen as pretty disappointing. Folks felt that Microsoft could have stepped up - been more proactive in communications, provided better support for customers and acknowledged it has a role to play in supporting the resolution.

This situation highlights the need to consider our reliance on any one platform and to assess the risk of single points of failure. It also sparked conversation on the need for a more localised approach.

What about compensation?

Experts are calling this the largest IT outage in history, and some estimate the losses at between US$1 billion and US$5.4 billion. Affected companies are asking both Microsoft and CrowdStrike to cover their losses.

From the NZ Herald: “CrowdStrike’s Australian president Michael Sentonas has apologised for the cyber security company’s role in causing an outage that crippled global IT systems - and conceded it would be hard to avoid affected businesses seeking compensation or litigation.” The same article also discusses a potential EU fine, which could be up to 4% of CrowdStrike’s revenue and may be imminent.

From our members’ discussion: compensation isn’t always sought in NZ, but the sentiment on this outage is that compensation is due to every company impacted.

Vic MacLennan

CEO of IT Professionals, Te Pou Haungarau Ngaio, Vic believes everyone in Aotearoa New Zealand deserves an opportunity to reach their potential. As a technologist by trade, she is dedicated to changing the face of the digital tech industry so it becomes more inclusive, a place where everyone belongs. Vic is also on a quest to close the digital divide. Find out more about her mahi on LinkedIn.
