LitmusChaos is an open source chaos engineering platform that enables teams to identify weaknesses and potential outages in infrastructures by inducing chaos tests in a controlled way. Chaos engineering verifies the resilience of business services and helps DevOps pipelines proactively build code that is more resilient against software and infrastructure faults.
The CNCF Technical Oversight Committee (TOC) has voted to approve LitmusChaos’ move from the CNCF Sandbox to Incubation level.
The Litmus project was started in 2017 to provide simple chaos jobs in Kubernetes. It became a CNCF Sandbox project in 2020 and today has maintainers from five different organizations across cloud native vendors, solution providers, and end users.
“The CNCF ecosystem has helped us build a strong and vibrant community around Litmus,” said Uma Mukkara, maintainer of the Litmus project and CEO of ChaosNative. “We have received consistent feedback from users since the launch of 1.0 release last year, which helped us come up with a robust set of features and a stable platform for cloud native chaos engineering.”
The project is used in production by more than 25 organizations, including large end users like Intuit, Lenskart, and Orange, and technology organizations like Red Hat and VMware.
“Litmus is a great tool that offers out-of-the-box generic chaos tests with different types of probes for performing validations at different time(s) during the experiment, which make(s) automation easy,” said Samar Sidharth, lead engineer at Orange.
“Litmus was our top choice when it came to developing our cloud native chaos scenarios,” said Jordi Gil, senior software engineer at Red Hat. “Its extensive list of experiments, open source nature, and friendly community gave us all the ingredients we needed to successfully complete our goals.”
LitmusChaos 2.0, released in August, brought improved scalability along with new features. These include testing against and measuring outputs from the steady-state hypothesis, and an expanded set of Prometheus metrics for instrumenting application dashboards for better observability. Since the beginning of the year, Litmus operator installations have grown from 50 per day to more than 2,000 per day.
“Chaos engineering techniques enable organizations to cultivate reliability and robustness into their production environments,” said Chris Aniszczyk, CTO of CNCF. “This practice will be key to building robust systems and LitmusChaos has already seen success among organizations looking to improve the resilience of their production deployments. We look forward to continuing to cultivate the growth of the LitmusChaos community and spreading chaos engineering practices.”
Main Components:
- Chaos Operator – Built using the Operator SDK framework; manages the lifecycle of a chaos experiment.
- ChaosHub – Hosts most of the chaos experiments needed for a quick start in chaos engineering.
- Litmus Workflows – Chaos experiments are chained either in sequence or in parallel to build a chaos scenario. The workflows are declarative, schedulable, and browsable. Workflow analytics are also available.
- ChaosCenter – A centralized control plane to design, schedule, and monitor Litmus Workflows, with the ability to manage chaos across multiple target environments via agents. ChaosCenter supports teaming to facilitate collaboration on chaos scenarios and helps analyze resilience behaviour across runs.
- Litmus Probes – Various probes help users create complete chaos scenarios with automated steady-state validation and remediation actions, close to the real application experience upon failure.
- Chaos Observability – Litmus exports Prometheus metrics that help highlight and quantify the impact of chaos on applications or infrastructure in real time, via in-house dashboards and external visualization in APM tools.
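To make the workflow and probe ideas above concrete, here is a minimal Python sketch of a chaos scenario: experiments are grouped into steps that run in sequence, the experiments within a step can optionally run in parallel, and a steady-state probe is checked before and after each step. This is only a toy model under stated assumptions, not the Litmus API. The `Experiment`, `Step`, and `run_scenario` names, the in-memory `state` dictionary, and the fault functions are all hypothetical, and the experiment names are used purely as labels.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Tuple

# NOTE: every name in this sketch is hypothetical; none of it is Litmus code.


@dataclass
class Experiment:
    name: str
    inject: Callable[[], None]  # function that injects the fault


@dataclass
class Step:
    experiments: List[Experiment]
    parallel: bool = False  # run this step's experiments concurrently?


def run_scenario(
    steps: List[Step], probe: Callable[[], bool]
) -> List[Tuple[List[str], str]]:
    """Run steps in order, checking the steady-state probe around each step."""
    results = []
    for step in steps:
        before = probe()  # steady state must hold before chaos is injected
        if step.parallel:
            with ThreadPoolExecutor() as pool:
                # list() forces completion and re-raises any fault errors
                list(pool.map(lambda e: e.inject(), step.experiments))
        else:
            for experiment in step.experiments:
                experiment.inject()
        after = probe()  # steady state must hold again after chaos
        names = [e.name for e in step.experiments]
        results.append((names, "Pass" if before and after else "Fail"))
    return results


# Toy system under test: a replica count and a network flag.
state = {"healthy_replicas": 3, "network_ok": True}


def pod_delete() -> None:
    # Lose one replica, then a simulated controller restores it.
    state["healthy_replicas"] -= 1
    state["healthy_replicas"] += 1


def network_loss() -> None:
    # A fault the toy system never recovers from.
    state["network_ok"] = False


def steady_state() -> bool:
    return state["healthy_replicas"] >= 2 and state["network_ok"]


scenario = [
    Step([Experiment("pod-delete", pod_delete)]),
    Step(
        [
            Experiment("pod-delete", pod_delete),
            Experiment("pod-network-loss", network_loss),
        ],
        parallel=True,
    ),
]

results = run_scenario(scenario, steady_state)
for names, verdict in results:
    print(names, verdict)
```

In this toy run the first step passes because the system recovers, while the second fails because the steady-state check no longer holds after the unrecovered network fault. A real scenario would probe a live application rather than an in-memory dictionary.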
Notable Milestones:
- 2,350+ GitHub stars
- 4,000+ pull requests
- 1,000+ issues
- 400+ contributors
- 60+ releases
“The cross-section of personas practicing chaos has grown wider over the past couple of years,” said Karthik Satchitanand, Litmus project maintainer and open source lead at ChaosNative. “This has brought forth numerous viewpoints, resulting in features around chaos management, observability & CI/CD integrations. It is also heartening to see developers build their own probes for steady-state hypothesis validation and experiments using Litmus’s BYOC (bring-your-own-chaos) approach. The future looks exciting!”
The project roadmap includes a number of new features and collaboration with other CNCF projects in the areas of continuous delivery and service mesh to enable a holistic view of cloud native environments. New features will include an increased set of experiments both for Kubernetes and non-Kubernetes targets, improved observability and integration with other platforms via OpenTelemetry, and more.
As a CNCF-hosted project, LitmusChaos is part of a neutral foundation aligned with its technical interests, as well as the larger Linux Foundation, which provides governance, marketing support, and community outreach. LitmusChaos joins incubating technologies Argo, Buildpacks, Cilium, CloudEvents, CNI, Contour, Cortex, CRI-O, Crossplane, Dapr, Dragonfly, emissary-ingress, Falco, Flagger, Flux, gRPC, KEDA, KubeEdge, Longhorn, NATS, Notary, OpenTelemetry, Operator Framework, SPIFFE, SPIRE, and Thanos. For more information on maturity requirements for each level, please visit the CNCF Graduation Criteria.