Early IT takeaways from the CrowdStrike outage

As the IT world recovers from the massive outage triggered by CrowdStrike’s Falcon update, CISOs and CIOs would be wise to keep a running ledger of lessons learned. Here are some initial considerations.

Whether you survived the CrowdStrike incident or didn’t use CrowdStrike and are merely seeing the impact on others, taking the time to learn lessons from this event is vital. After all, if you couldn’t recover easily from this, you may be lost trying to recover from a ransomware attack.

At issue are potential shifts you might want to consider making to your staffing strategies, technical processes, and communication channels and culture, as well as your approach to ensuring hardened assets overall.

The list of lessons learned from CrowdStrike will likely grow as more information comes to light about the impacts the outage has had on organizations around the globe. For now, the following look at the recovery process offers insights into how you might want to reconsider or reinforce your strategy around key processes and resources to ensure a more robust response going forward.

Staffing rethink

Recovering from CrowdStrike has been an all-hands-on-deck event. In some instances, companies have needed humans to be able to touch and reboot impacted machines in order to recover — an arduous process, especially at scale.

If you have outsourced IT operations to managed service providers, consider that those MSPs may not have enough staff on hand to mitigate your issues along with those of their other clients, especially when a singular event has widespread fallout.

Instead, you may have only your existing staff to call on to remedy the situation, and you may have to train people unaccustomed to technology tasks to perform key steps so the network comes back online as soon as possible. Alternatively, you may need to ship replacement equipment or find other ways to reinstall or refresh operating systems, as was the case with CrowdStrike, all of which requires personnel.

Thinned staffs that are over-reliant on service providers are at risk of a poor recovery from incidents, no matter the source.

Tighten up your technical resources

As Microsoft points out in its response to CrowdStrike, besides getting into safe mode and being able to enter commands, your next hurdle may be getting access to something intended to protect your device: BitLocker.

When the computer reboots to enter safe mode, you will be asked to enter a recovery key if BitLocker is enabled. I can say from experience that, more often than not, tracking down BitLocker recovery keys takes time. They may be backed up in your local Active Directory, or they may be printed out and filed somewhere that, in the first frantic moments, no one can remember.

Review recovery steps and processes on a regular basis to ensure that your team knows exactly where those recovery keys are and what is required to obtain them. While BitLocker is often mandated for compliance reasons, it also adds a layer of complication you may not be prepared for.
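One way to rehearse this is with the built-in manage-bde tool, which lists the protectors on a volume and, for the numerical password protector, the recovery password itself. A minimal sketch, run from an elevated command prompt on a healthy machine:

rem List BitLocker key protectors for the C: volume; the recovery
rem password appears under the "Numerical Password" protector
manage-bde -protectors -get C:

Walking through this periodically confirms both that a recovery password exists and that your team knows where its backed-up copy lives, whether in Active Directory or a printed record.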

During this event, we’ve seen interesting workarounds for getting systems operational. Via social media, people such as LetheForgot shared the following:

“We went into advanced restart options to launch the command prompt, skip the bitlocker key ask which then brought us to drive X and ran ‘bcdedit /set {default} safeboot minimal’ which let us boot into safemode and delete the sys file causing the bsod.”

Another poster recommended: “Even in safe mode, crowdstrike folder access was denied. Used cacls to give more rights to user (bypassing admin) and deleted file.”

If you are wondering why this works without demanding a BitLocker recovery key: the boot configuration change is made from the recovery environment’s unencrypted X: drive, and in the accounts above, booting into safe mode did not trigger the recovery-key prompt. You still need to provide valid user credentials to access the C: drive, which brings up the next roadblock in recovering access. Do you have access to the domain controller, or will you need a local username to get to the C: drive and delete the file that must be removed to restore the machine? If you use LAPS or other software that randomizes the local administrator password, you will need access to that resource as well.

Once you have access to the machine, you can delete the faulty file with the following command:

del C-00000291*.sys
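Pulling the pieces above together, here is a minimal sketch of the sequence, run first from the recovery environment’s command prompt and then from safe mode. The CrowdStrike driver directory shown is the path widely cited in public remediation guidance, so verify it on your own systems, and the icacls line (the modern replacement for cacls) is only needed if, as the second poster found, access to the folder is denied:

rem From the recovery environment's command prompt (drive X:),
rem set the next boot to enter safe mode
bcdedit /set {default} safeboot minimal

rem After rebooting into safe mode and signing in, grant the
rem signed-in account access to the CrowdStrike folder if denied
icacls "C:\Windows\System32\drivers\CrowdStrike" /grant %USERNAME%:F

rem Delete the faulty channel file
del "C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys"

rem Remove the safe-boot setting so the next restart boots normally
bcdedit /deletevalue {default} safeboot

Treat this as a sketch of what the posters described rather than official guidance; CrowdStrike’s own remediation instructions should take precedence.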

The lesson here is not only to review recovery steps often but also to follow community discussions closely for creative technical solutions when a collective IT disaster unfolds.

Build a culture of communication

That brings up another key resource needed during any incident: clear information regarding what is happening and what to do.

Late on the evening of Thursday, July 18, it was clear from comments on social media that something was happening, and the underlying culprit, a faulty CrowdStrike update, was quickly identified. In other incidents you may not be so quickly informed; it may not be clear what has happened or which assets have been impacted. Often you’ll need to reach out to the staff working most closely with the impacted assets to determine what is going on and what to do, and the actions you first settle on may not be the actions you ultimately need to take. You may also find easier steps along the way.

In addition, you may need to determine whether a Plan B would be the better course of action. In this instance, I’ve seen companies decide to move up plans to redeploy computer systems to replace impacted machines: because a hardware refresh was already scheduled for the coming weeks, they simply accelerated the redeployment rather than attempt to fix the broken machines.

All of that requires clear communication among all parties involved — a culture you need to build, in addition to having incident communication strategies and processes in place.

Reassess strategies in wake of lessons learned

Just as with any incident, cleanup and follow-up are essential.

For those who have machines back up and recovered post-CrowdStrike, there are certain items you should review. The first is BitLocker: if recovery keys were handed out manually during the incident, consider them exposed and plan to reissue and rotate them.
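As a hedged sketch of what that rotation might look like with the built-in manage-bde tool ({NEW_ID} and {OLD_ID} are placeholders for the protector IDs the tool reports, and the adbackup step assumes your keys are escrowed to Active Directory):

rem List current protectors and note the ID of the exposed
rem numerical recovery password
manage-bde -protectors -get C:

rem Add a new, randomly generated recovery password
manage-bde -protectors -add C: -RecoveryPassword

rem Escrow the new recovery password to Active Directory
manage-bde -protectors -adbackup C: -id {NEW_ID}

rem Remove the old recovery password by its protector ID
manage-bde -protectors -delete C: -id {OLD_ID}

Verify the new key is safely backed up before removing the old one, and script the process if keys were handed out at scale.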

If you are considering changes to your infrastructure, rather than ripping out your technology and replacing it with a different operating system, consider the alternative of changing how you deploy software and restricting what is allowed to run on these special-purpose machines. We use antivirus because we place no limit on what runs on our systems; if we spent the time and resources limiting what is allowed to run, machines would be more secure.

Of course, you do need to reconsider which operating system is used for which purpose. We’ve seen too many social media posts of blue screens on what are essentially overgrown notification displays. Do you truly need a full operating system just to present information, or are there alternative ways to deliver it?

Can you rely on vendors to do their own quality control? From Microsoft to, now, CrowdStrike, it’s unclear whether shrinking budgets for testing staff are the true root cause of such issues. In CrowdStrike’s case, CEO George Kurtz wrote that a logic error in its Falcon update was to blame. Exactly how that came about will need to be sorted out in the fallout.

Even if you weren’t impacted by this event, you may want to review how quickly you roll out update files. From vendor updates to definition updates, we may trust too much that our vendors have done their due diligence, and with many firms cutting budgets, we can no longer take that quality control for granted. Consider establishing update rings and your own testing and validation process for rolling out updates, even for antivirus and protection suites. Ultimately, no software should be completely trusted.
