Community post by Andrés Vega, M42 and Technical Leader, CNCF TAG Security
Recent events involving CrowdStrike’s Falcon security software have underscored a critical lesson across the industry: the importance of having a robust, secure release process. Such incidents serve as a reminder of the consequences that can result from faulty software updates. These consequences can affect a wide range of critical industries and services, from government agencies and financial institutions to healthcare systems and emergency services.
The Incident
On July 18, 2024, a buggy update to CrowdStrike’s Falcon security software caused widespread system crashes across their customer base. Windows-based systems experienced a Blue Screen of Death (BSOD), the Windows equivalent of a kernel panic, leading to significant disruptions in critical services, including government agencies, banks, airlines, payment processors, 911 call centers, television networks, and health systems.
The aftermath of this incident has been challenging, with organizations scrambling to implement fixes ranging from simple reboots to complex recovery procedures involving encrypted disks. This situation highlights the need for more robust safeguards in the software release process.
Justin Cappos, Professor at NYU and co-creator of in-toto, expressed his shock at the incident:
“It strains credibility that any organization, much less a security company, would fail to have robust software supply chain validation mechanisms in place. The fact that software could be released without testing demonstrates a level of negligence, I’m shocked to see from a major security company.”
This statement underscores the gravity of the situation and the critical need for robust software supply chain validation in all organizations, especially those dealing with critical security software.
The Self-Inflicted Ransomware Comparison
One particularly concerning aspect of this incident was highlighted by cryptography and infrastructure engineer Tony Arcieri on Mastodon, who compared the situation to a self-inflicted ransomware attack.
This comparison underscores the critical importance of not just having robust release processes, but also maintaining proper key management practices. Without access to the recovery keys for their encrypted disks, organizations find themselves unable to access their own data, mimicking the effects of a ransomware attack, but caused by their own security measures.
The Need for Robust Release Processes
This incident emphasizes the critical need for robust release processes that are not just guidelines, but technically enforced safeguards. As Cappos points out:
“Release processes aren’t just a checklist of best practices. They should be enforced technically in a way that makes it implausible to bypass these safeguards. This technical enforcement is key to preventing incidents like the one we are seeing.”
With this in mind, let’s explore the critical components of a robust release process:
- Comprehensive Testing: Thorough testing of updates in environments that closely mimic production systems.
- Integrity Verification: Using cryptographic techniques to verify the integrity of each step in the release process, ensuring that no unauthorized changes can be made without detection.
- Staged Rollouts: Implementing gradual rollout strategies to detect issues before they affect a wide user base.
- Quick Rollback Mechanisms: The ability to swiftly revert to a previous stable version when issues are detected.
- Transparent Communication: Clear, timely communication with affected users about the nature of the problem and steps for resolution.
- Proper Key Management: Ensuring that encryption keys and recovery keys are properly managed and readily available when needed.
By enforcing these processes in software, organizations can create more robust systems where it becomes extremely difficult, if not impossible, to release software updates without going through all the necessary security checks and balances.
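To make "technically enforced" concrete, here is a minimal Python sketch of a staged rollout that halts and rolls back on its own when health degrades. It is an illustration under stated assumptions, not any vendor's actual mechanism: the function `staged_rollout`, the callbacks `deploy`, `failure_rate`, and `rollback`, and the 1% failure budget are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RolloutResult:
    completed: bool       # True if every stage passed its health check
    last_fraction: float  # largest fleet fraction that received the update
    rolled_back: bool     # True if the rollout was reverted


def staged_rollout(
    stages: List[float],
    deploy: Callable[[float], None],
    failure_rate: Callable[[], float],
    rollback: Callable[[], None],
    max_failure_rate: float = 0.01,  # hypothetical failure budget
) -> RolloutResult:
    """Deploy to increasing fractions of the fleet, halting on bad health."""
    last = 0.0
    for fraction in stages:
        deploy(fraction)
        last = fraction
        # The gate is part of the rollout itself, so it cannot be skipped.
        if failure_rate() > max_failure_rate:
            rollback()
            return RolloutResult(completed=False, last_fraction=last, rolled_back=True)
    return RolloutResult(completed=True, last_fraction=last, rolled_back=False)
```

Because the health check runs inside the rollout loop rather than living in a runbook, a faulty update in this sketch stops at the first stage where the failure rate exceeds the budget instead of reaching the whole fleet.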
in-toto: The Blueprint for Secure and Verifiable Software Updates
At the CNCF Security Technical Advisory Group (TAG Security), we strongly advocate for the adoption of frameworks that can help prevent such incidents. One such framework is in-toto, which provides a comprehensive approach to supply chain security.
What is in-toto?
in-toto is an open-source framework that cryptographically ensures the integrity of the software supply chain. It allows for the definition of supply chain layouts and the creation of cryptographically verifiable metadata about the steps in the chain.
An in-toto layout specifies:
- The steps that should occur in the software supply chain
- The order in which these steps should happen, which is enforced through verification procedures carried out on the final product.
- Who is authorized to perform each step through the use of cryptographic keys.
- What materials (inputs) each step should use and what products (outputs) it should generate.
- Thresholds that determine how many signatures are required for a step to be considered valid.
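The threshold idea can be sketched in a few lines of Python. This is a simplified model rather than in-toto's own implementation: `step_meets_threshold` is a hypothetical helper, and it assumes that each signer key ID passed in has already had its signature cryptographically verified elsewhere.

```python
from typing import Iterable, Set


def step_meets_threshold(
    authorized_keyids: Set[str],
    signer_keyids: Iterable[str],
    threshold: int,
) -> bool:
    """Return True if enough distinct authorized keys signed the step.

    Assumes every entry in signer_keyids already passed signature
    verification; this function only counts who signed.
    """
    # Duplicates and keys not authorized for this step do not count.
    valid_signers = set(signer_keyids) & authorized_keyids
    return len(valid_signers) >= threshold


# Placeholder key IDs for illustration only.
reviewers = {"alice-key", "bob-key"}
print(step_meets_threshold(reviewers, ["alice-key", "alice-key"], 2))  # False
print(step_meets_threshold(reviewers, ["alice-key", "bob-key"], 2))    # True
```

Counting distinct authorized signers means one person signing twice cannot satisfy a two-person rule on their own.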
Example of a Simple in-toto Layout
Here’s a simplified example of what an in-toto layout might look like for a basic software release process:
{
  "steps": [
    {
      "name": "code-review",
      "expected_materials": [
        ["MATCH", "src/*", "WITH", "PRODUCTS", "FROM", "coding"]
      ],
      "expected_products": [
        ["MATCH", "src/*", "WITH", "MATERIALS", "FROM", "coding"]
      ],
      "pubkeys": ["4895703d8b8f0a8369348e5e2f364e8a312273e8"],
      "expected_command": ["git", "push", "origin", "main"],
      "threshold": 1
    },
    {
      "name": "unit-test",
      "expected_materials": [
        ["MATCH", "src/*", "WITH", "PRODUCTS", "FROM", "code-review"]
      ],
      "expected_products": [
        ["CREATE", "test-results.xml"]
      ],
      "pubkeys": ["c8650e5c8a17c1f83a2db650f438e899bd76e62c"],
      "expected_command": ["pytest", "tests/"],
      "threshold": 1
    },
    {
      "name": "build",
      "expected_materials": [
        ["MATCH", "src/*", "WITH", "PRODUCTS", "FROM", "unit-test"]
      ],
      "expected_products": [
        ["CREATE", "dist/app-1.0.tar.gz"]
      ],
      "pubkeys": ["f3b93b345b2e6f356dc4a418d66213be51b4e5f4"],
      "expected_command": ["python", "setup.py", "sdist"],
      "threshold": 1
    }
  ],
  "inspections": [
    {
      "name": "final-product-test",
      "expected_materials": [
        ["MATCH", "dist/app-1.0.tar.gz", "WITH", "PRODUCTS", "FROM", "build"]
      ],
      "expected_products": [
        ["CREATE", "test-results.xml"]
      ],
      "run": ["pytest", "tests/"]
    }
  ]
}
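Because a layout like the one above is plain JSON, it can be sanity-checked with ordinary tooling before anyone relies on it. The Python sketch below is not part of in-toto; `lint_layout` is a hypothetical helper that assumes the simplified field names used in this post (`steps`, `pubkeys`, `threshold`, `expected_materials`, `expected_products`). The embedded layout and its key IDs are placeholders.

```python
import json
from typing import Dict, List

# Hypothetical two-step layout in the same shape as the example above;
# the key IDs are placeholders, not real keys.
LAYOUT_JSON = """
{
  "steps": [
    {
      "name": "unit-test",
      "expected_materials": [],
      "expected_products": [["CREATE", "test-results.xml"]],
      "pubkeys": ["key-tester"],
      "threshold": 1
    },
    {
      "name": "build",
      "expected_materials": [
        ["MATCH", "src/*", "WITH", "PRODUCTS", "FROM", "unit-test"]
      ],
      "expected_products": [["CREATE", "dist/app-1.0.tar.gz"]],
      "pubkeys": ["key-builder"],
      "threshold": 2
    }
  ]
}
"""


def lint_layout(layout: Dict) -> List[str]:
    """Return human-readable problems found in a layout-shaped dict."""
    problems = []
    step_names = {step["name"] for step in layout["steps"]}
    for step in layout["steps"]:
        name = step["name"]
        # A threshold above the number of authorized keys can never be met.
        if step["threshold"] > len(step["pubkeys"]):
            problems.append(f"{name}: threshold exceeds number of authorized keys")
        # Rules of the form [..., "FROM", <step>] must name a defined step.
        for rule in step["expected_materials"] + step["expected_products"]:
            if "FROM" in rule:
                source = rule[rule.index("FROM") + 1]
                if source not in step_names:
                    problems.append(f"{name}: rule references undefined step {source!r}")
    return problems


problems = lint_layout(json.loads(LAYOUT_JSON))
print(problems)  # ['build: threshold exceeds number of authorized keys']
```

Run against the simplified layout above, a check like this would flag the `coding` step that `code-review` refers to, since the example leaves that step undefined for brevity.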
Santiago Torres-Arias, co-creator of in-toto and Assistant Professor at Purdue University, expounds on the depth of assurance that in-toto provides:
“Supply chain integrity is not just about securing the ‘what’ of software. Only by answering ‘how’ something was made can you be really sure of its integrity. That is where in-toto shines — it enables automated security and compliance checks throughout the entire software supply chain, ensuring not just the end product, but every step of its creation is verifiable and secure.”
This comprehensive approach to software supply chain integrity is crucial in preventing widespread outages from software updates. By verifying each step of the development and release process, in-toto provides a framework for catching potential issues before they can impact end-users.
How in-toto Layouts Could Help Prevent Incidents Like the CrowdStrike Update Issue
- Enforced Process: By defining a clear layout, in-toto ensures that all required steps in the release process are followed, reducing the chance of skipping crucial steps like thorough testing.
- Verification of Inputs and Outputs: Each step’s materials and products are verified, ensuring that unexpected changes aren’t introduced during the process.
- Authorized Actions: By specifying who can perform each action, in-toto reduces the risk of unauthorized or accidental changes to the software.
- Auditability: The layout provides a clear record of what should happen in the release process, making it easier to audit and identify potential issues.
- Final Product Verification: The inspection step ensures that the final product meets all specified criteria before release.
By implementing in-toto with a well-designed layout along with comprehensive testing procedures, gating criteria, and gradual rollouts, organizations can significantly reduce the risk of releasing buggy updates, thereby preventing incidents similar to the CrowdStrike update issue.
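The verification of inputs and outputs described above comes down to comparing cryptographic hashes across steps. The following Python sketch shows the idea under simplifying assumptions: `mismatched_artifacts` is a hypothetical helper, each step is represented only as a mapping from artifact path to SHA-256 digest, and signature checks on that metadata are assumed to happen elsewhere.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def mismatched_artifacts(products: Dict[str, str], materials: Dict[str, str]) -> List[str]:
    """Return paths whose hash as consumed differs from the hash as produced.

    Both arguments map artifact path -> SHA-256 hex digest. A path that the
    later step consumed but the earlier step never produced also counts.
    """
    return sorted(
        path for path, digest in materials.items()
        if products.get(path) != digest
    )


# What a build step recorded as its product, and what a later
# (hypothetical) shipping step recorded as its material.
built = {"dist/app-1.0.tar.gz": sha256_hex(b"release bytes")}
shipped_ok = {"dist/app-1.0.tar.gz": sha256_hex(b"release bytes")}
shipped_bad = {"dist/app-1.0.tar.gz": sha256_hex(b"tampered bytes")}

print(mismatched_artifacts(built, shipped_ok))   # []
print(mismatched_artifacts(built, shipped_bad))  # ['dist/app-1.0.tar.gz']
```

If an artifact changes between the step that produced it and the step that consumed it, the digests no longer match and verification fails, regardless of whether the change was malicious or accidental.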
Priyanka Sharma, Executive Director of CNCF, emphasizes the importance of maintaining integrity in software update systems:
“The recent incident with CrowdStrike’s software update serves as a stark reminder of the critical importance of secure software supply chains in our increasingly interconnected digital world. At CNCF, we’ve long advocated for the adoption of robust security practices and tools throughout the entire software lifecycle. This event underscores the need for comprehensive, end-to-end security measures that go beyond traditional approaches. It’s why we support projects like in-toto, which provides a framework for ensuring the integrity of the software supply chain. As we move forward, it’s imperative that organizations not only implement best practices but also leverage cutting-edge tools and frameworks to prevent such incidents. The security of our digital infrastructure is a shared responsibility, and CNCF is committed to fostering the development and adoption of technologies that enhance the security and reliability of the cloud native ecosystem.”
Conclusion
The recent CrowdStrike incident serves as a powerful reminder of the critical importance of secure, robust release processes in software systems that people depend on in their daily lives. By learning from this event and implementing frameworks like in-toto, we can work towards preventing similar incidents in the future, ensuring the stability and security of the software we all rely on.
As CNCF TAG Security, we urge all organizations to take this opportunity to review and enhance their release processes. The security of our digital infrastructure depends on our collective commitment to best practices in software development and deployment.
Learn more and get started – https://in-toto.io/