“You don’t have a snail problem; you have a duck deficiency.” ~ Bill Mollison
On July 19, 2024, a global IT outage impacted many different sectors, including 911 services, airlines, healthcare, and other key services. The outage was caused by a faulty update to the CrowdStrike platform that crashed its driver on Microsoft Windows operating systems. At the time of this writing some of the effects are still being felt; however, we should note that many IT teams mounted an impressive response to this large outage.
Monocropping
One of the things we should examine from this event is our reliance on “monocropping” of the IT infrastructure of the globe. What is monocropping? Monocropping is the phenomenon in modern agriculture of planting large swaths of a single crop in the field with no genetic or species diversity. Typically, in modern agriculture these are corn, soy, and wheat. This results in the use of massive amounts of pesticides and synthetic fertilizer on our fields and food. The pesticides are needed as the predator/prey balance is disrupted and things such as grasshoppers, snails, locusts, or aphids can run amok in the crop. It is a downward spiral as the pesticides also kill any of the beneficial predators, which generally exist in lower numbers than the prey.
Why is that relevant to cyber security, specifically in this case?
We can learn a great deal from the natural world. Nature has stress tested systems for millions of years, and all we must do is observe. In a more natural or regenerative agriculture vision, we should have strips of diverse crops interposed with natural elements. This limits the uncontrolled spread of pests and gives us a more healthy and diverse food supply.
Currently, IT is a monocropped field of a few technologies. We will explore that below.
The question we pose as we examine the admittedly early fallout of this event is “Are we too dependent on a few systems?” Do we have too much corn, soy, and wheat in our IT environment, leaving us vulnerable to infestation of pests or extreme risk of failure? I think the events of the last few days speak for themselves.
So, what is the technology-based monocropping we are seeing? Windows holds approximately 72% of the share of global computing. In Endpoint Detection and Response (EDR), CrowdStrike makes up approximately 15%, and Microsoft’s own Defender EDR accounts for another 40%. A handful of products in two categories (operating system and EDR) account for the vast majority of deployments.
How about cloud? We have seen rapid consolidation in cloud platforms that are now mission critical to everyday life and business operations.
We have an underappreciated risk in microservices architecture. Microservices are small, independently deployed services that other applications call to do part of their work. One of the problems of this scheme is that in many cases, these services are not redundant but are relied on in many critical areas. There is also a large non-human identity problem, where the API keys and login credentials are very complex to understand and map. This makes security monitoring difficult, and the complexity can make meaningful analysis infeasible.
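To make the non-redundancy risk concrete, here is a minimal sketch of how one might flag shared services that have no redundancy. The function name, the dependency map, and the replica counts are all hypothetical illustrations, not data from any real environment or tool.

```python
def single_points_of_failure(dependencies, replicas, min_dependents=2):
    """Flag non-redundant services that several applications rely on.

    dependencies: dict mapping an application name to the set of
                  service names it calls (hypothetical inventory).
    replicas:     dict mapping a service name to the number of
                  independent instances or providers behind it.
                  Services missing from this dict are assumed to have one.
    Returns a dict of service name -> sorted list of dependent applications.
    """
    # Invert the map: for each service, which applications depend on it?
    dependents = {}
    for app, services in dependencies.items():
        for service in services:
            dependents.setdefault(service, []).append(app)

    # Keep only services with a single instance and enough dependents.
    return {
        service: sorted(apps)
        for service, apps in sorted(dependents.items())
        if replicas.get(service, 1) <= 1 and len(apps) >= min_dependents
    }


if __name__ == "__main__":
    deps = {
        "dispatch": {"auth", "geo"},
        "billing": {"auth", "pdf"},
        "portal": {"auth", "geo"},
    }
    counts = {"auth": 1, "geo": 3, "pdf": 1}
    print(single_points_of_failure(deps, counts))
```

In this made-up example only the shared, single-instance service is flagged. The hard part in practice is building an accurate dependency map in the first place, which is exactly the mapping problem described above.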
Natural Systems, Patterns and IT
As I have written before, I think natural systems can show us a lot of ideas and patterns for IT. In this case, we have a strong example. Some of the detrimental impacts of monocropping map to cyber in the abstract and some don’t. Let’s examine how this looks.
Relying on a monoculture commodity leaves a farmer vulnerable to pest infestation or crop failure, as the lack of diversity tends to disadvantage beneficial predators and advantage the pests. This leads to the use of pesticides to control pests, which arguably has health impacts for humans. The analog to cyber is that a single operating system leaves us vulnerable to massive attacks by adversaries who can find a vulnerability in that operating system. By removing competition among security vendors, we are killing off the beneficial predators who hunt the pests, again leaving us vulnerable. There is a tradeoff, however: the redundancy gained from a polyculture in IT is partly offset by the larger attack surface and the added training burden of tool-diverse environments.
Clearly, this isn’t an easy problem to solve, but let’s examine some ways to possibly mitigate.
Options for Mitigation
First, let’s state upfront that there are no easy answers here. One of the items that complicates this is licensing cost, both the existing capital expenditure (CAPEX) already sunk into current products and the future licensing of multiple products. The incentives are not aligned with diversifying our infrastructure. There is another compounding issue: how can we train and equip our staff to handle a diverse EDR or operating system deployment? This could work against our goals and would be a very valid criticism of this scheme.
One way we could limit the risk of our infrastructure monoculture is to dedicate roughly 20% of our mission-critical units to a different EDR or operating system. Much as regenerative agriculture can build in strips of wildflowers or tree-lined buffers in riparian areas to attract beneficial predators or encourage fungal soil improvement, we can diversify our EDR/operating system base. By bringing in this 20% diversity, we build some natural defenses into our monoculture IT infrastructure so that we can keep operating in an event like July 19th.
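As a rough illustration of the 20% idea, the sketch below picks a share of hosts from every business unit to run an alternative stack, so that each unit keeps at least one machine outside the monoculture. The inventory format, the function name, and the per-unit rounding rule are assumptions made for this example, not a prescribed method.

```python
from collections import defaultdict


def pick_diversified_hosts(hosts, percent=20):
    """Choose hosts to move to an alternative OS/EDR stack.

    hosts:   list of (hostname, business_unit) tuples
             (a hypothetical inventory format).
    percent: whole-number share of each unit to diversify.
    Rounds up within each unit, so every unit with at least one
    host gets at least one diversified host when percent > 0.
    """
    by_unit = defaultdict(list)
    for name, unit in hosts:
        by_unit[unit].append(name)

    selected = []
    for unit in sorted(by_unit):
        names = sorted(by_unit[unit])
        # Integer ceiling division avoids floating-point rounding surprises.
        count = (len(names) * percent + 99) // 100
        selected.extend(names[:count])
    return selected


if __name__ == "__main__":
    inventory = (
        [(f"d{i:02d}", "dispatch") for i in range(10)]
        + [(f"b{i}", "billing") for i in range(5)]
        + [("k0", "kiosk")]
    )
    print(pick_diversified_hosts(inventory))
```

Spreading the selection across units, rather than taking 20% of the fleet as a whole, is a deliberate choice in this sketch: the goal is for every critical function to retain some capacity during a platform-wide failure.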
Finally, we should examine our cloud convergence. Much as we have seen in our EDR and operating system convergence, we may have similar risks. There is a great reliance on the Microsoft cloud tech stack for much of our modern business communication and Identity and Access Management (IAM). We are tied in via email, Teams, Entra ID, and so forth. Diversifying our selections here could help in the event of a large outage at one of the cloud providers. Others are heavily tied into Amazon Web Services (AWS). While the business case for AWS makes sense, the risk model doesn’t necessarily work out as well. Yes, we can gain regional Disaster Recovery (DR) and the ability to auto scale and swap hardware; however, we are tied to a single provider, and one outage there could be catastrophic.
The natural world can teach us a great deal about recent cyber events. We should look to emulate these natural systems and diversify our core infrastructure. There could be great benefit, with some risk, to moving our core systems out of a monoculture system to a more diverse, and I would argue, healthier, ecosystem.