Ambassador post originally published on the Logz.io blog by Dotan Horovits

Troubleshooting within Kubernetes environments can be a daunting task. If only we had a magical artificial intelligence advisor that could gather all the data about what goes on in the system, tell us what’s wrong, and even explain how to solve it. Wouldn’t it be nice?

K8sGPT is a young open source project that uses generative AI to give Kubernetes superpowers to everyone. It recently turned a year old, and is now part of the Cloud Native Computing Foundation (CNCF). 

Does this open source project go beyond the Gen AI hype and get us closer to diagnosing and triaging issues in plain English? I sat down with Thomas Schuetz, core maintainer of K8sGPT and a fellow CNCF Ambassador, to hear all about it on a recent episode of OpenObservability Talks. Thomas is also a Principal Cloud Architect at Kapsch TrafficCom AG, and teaches at the Austrian University of Applied Sciences, focusing on cloud-native technologies.

The Promise of Generative AI for Kubernetes 

The challenges that Kubernetes administrators face are multifaceted and often recurring. Simple misconfigurations, such as missing service accounts or services without endpoints, can lead to failed deployments, a nightmare for DevOps engineers and SREs.

Traditionally, SREs embark on a manual hunt through logs, events, and configuration files, attempting to discern the root cause of the problem. However, this process can be time-consuming and error-prone, often resulting in delayed resolution and operational disruptions. These were the pains that led Alex Jones to create K8sGPT, later joined by Thomas as the second maintainer.

How K8sGPT Works

K8sGPT functions much like a seasoned SRE, continuously monitoring Kubernetes clusters for anomalies and issues. It analyzes relevant data, identifies potential problems, and leverages external AI engines to provide insights and recommendations. 

The journey begins with data collection, where K8sGPT selectively gathers pertinent information from Kubernetes clusters. Contrary to misconceptions, K8sGPT does not indiscriminately flood AI providers with raw cluster data. Instead, it adopts a targeted approach, filtering out extraneous information and anonymizing collected data to preserve privacy and security.

Once the relevant data is collected, K8sGPT leverages AI algorithms to analyze and interpret the information, in much the same way that an SRE would. For example, it would identify issues such as pods not running, and would check the event stream for potential causes such as a replica set where a service account is missing. With those findings in hand, a generative AI model can be prompted to explain the problem.
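
To make the analysis step more concrete, here is a minimal Python sketch of the idea. It is not K8sGPT's actual code (the project itself is written in Go); the data shapes, the find_failing_pods and build_prompt helpers, and the sample event are all hypothetical and only illustrate the flow: find pods that are not running, pull the related warning events, and turn each finding into a prompt for a generative AI model.

```python
def find_failing_pods(pods, events):
    """Return one finding per pod that is not running, with related warnings."""
    findings = []
    for pod in pods:
        if pod["phase"] in ("Running", "Succeeded"):
            continue  # healthy pods are filtered out and never reach the AI model
        related = [
            e["message"]
            for e in events
            if e["type"] == "Warning" and e["involved"] == pod["name"]
        ]
        findings.append(
            {"pod": pod["name"], "phase": pod["phase"], "errors": related}
        )
    return findings


def build_prompt(finding):
    """Turn a finding into a plain-English question for a generative AI model."""
    errors = "; ".join(finding["errors"]) or "no related warning events found"
    return (
        f"A Kubernetes pod named {finding['pod']} is in phase {finding['phase']}. "
        f"Related events: {errors}. "
        "Explain the likely cause and suggest a fix."
    )


# Hypothetical sample data standing in for what would be read from the cluster.
pods = [
    {"name": "web-7d9f-abcde", "phase": "Pending"},
    {"name": "db-0", "phase": "Running"},
]
events = [
    {
        "type": "Warning",
        "involved": "web-7d9f-abcde",
        "message": "0/3 nodes are available: 3 Insufficient memory.",
    },
]

findings = find_failing_pods(pods, events)
prompts = [build_prompt(f) for f in findings]
for prompt in prompts:
    print(prompt)
```

Only the pending pod produces a finding here, which reflects the targeted approach described above: the model sees a short, focused problem statement rather than raw cluster data.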

Sometimes, these insights can even surprise experienced SREs, such as Alex and Thomas themselves: “At one point we wrote an analyzer for some resources and Alex ran it and said yes, this cannot be correct, but in the end it was correct.”

Bring Your Own Gen AI Provider

K8sGPT connects to a variety of AI providers for its analysis. Currently K8sGPT provides out-of-the-box integration with OpenAI, Azure, Cohere, Amazon Bedrock, Amazon SageMaker, Google Gemini, and Vertex.

It’s important to emphasize that K8sGPT anonymizes information such as pod names first, before dispatching the prompt. K8sGPT also enables connecting to local models, which proves very useful for organizations that cannot send their information to external providers.
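
The anonymization idea can be sketched in a few lines of Python. How K8sGPT implements masking internally was not covered in the conversation, so everything below is an assumption made for illustration: the anonymize and deanonymize helpers, the placeholder format, and the sample names are hypothetical. The sketch swaps sensitive names for placeholders before the prompt leaves the cluster, then maps them back in the reply.

```python
def anonymize(text, sensitive_names):
    """Replace each sensitive name with a placeholder; return text and mapping."""
    mapping = {}
    # Longest names first, so a name that is a prefix of another is not clobbered.
    ordered = sorted(set(sensitive_names), key=lambda n: (-len(n), n))
    for i, name in enumerate(ordered):
        token = f"<RESOURCE_{i}>"
        mapping[token] = name
        text = text.replace(name, token)
    return text, mapping


def deanonymize(text, mapping):
    """Restore the original names in a reply that still uses placeholders."""
    for token, name in mapping.items():
        text = text.replace(token, name)
    return text


# Hypothetical prompt containing names that should not leave the organization.
prompt = "Pod payments-api-6c8d in namespace billing-prod is in CrashLoopBackOff."
masked, mapping = anonymize(prompt, ["payments-api-6c8d", "billing-prod"])
print(masked)

# Simulated provider reply; a real reply would come from the configured AI backend.
reply = "Inspect the logs of <RESOURCE_0> in <RESOURCE_1>."
restored = deanonymize(reply, mapping)
print(restored)
```

The mapping never leaves the local process, so the external provider only ever sees the placeholders.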

The crux of K8sGPT’s functionality lies in its ability to generate descriptive problem summaries and pragmatic solutions. Thomas underscores how the project’s focus extends beyond rudimentary problem scenarios, evolving to encompass a diverse array of issues, including external integrations. 

K8sGPT started as a CLI that was triggered to provide these diagnostics in a one-off fashion. Then, the project evolved into an automated SRE assistant, adding a Kubernetes operator that runs in the cluster, continuously looks for problems, and writes them out into a resource.

This continuous supervision not only saves the admin from having to call the CLI when an issue arises, but can also proactively detect when things start to deviate from normal, before the admin or end user notices the issue.

Growing Excitement and Adoption

The adoption of the K8sGPT project underscores its growing significance in the Kubernetes ecosystem. Thomas shared that within one or two weeks of its launch, the project already had around 1,000 stars on its GitHub repo, and the first fork was created on the first day after the release. To date, the project has around 5,000 GitHub stars and over 500 forks.

Going beyond the star count, companies like Kubermatic, SpectroCloud, and Nethopper have enthusiastically embraced K8sGPT, integrating it into their infrastructure to streamline troubleshooting processes. Thomas calls on other adopters to reach out and share their use cases, so the project maintainers can better steer the project based on actual usage, and to support its maturation within the CNCF.

The maintainer team has also grown nicely. Thomas shared that the project now has between 30 and 40 contributors. He also emphasized that, unlike many popular projects, there is no company behind this project, and no business plan behind it.

K8sGPT Joins the CNCF Sandbox

K8sGPT just turned a year old, marking a significant journey of innovation and growth. The culmination of this journey is marked by K8sGPT’s recent acceptance into the CNCF sandbox. This milestone not only validates the project’s technical prowess but also opens doors to greater collaboration and adoption.

Thomas shared an interesting roadmap for the project. The main focus is on integrations with cloud provider infrastructure, such as Amazon and Google, in order to find problems at the infrastructure level. He also listed scanning CRDs and integration with additional tooling in the AI landscape.

K8sGPT currently integrates with Slack for sending downstream notifications, and Thomas mentioned that the team is looking into additional targets such as MS Teams and Mattermost.

Want to learn more? Check out my OpenObservability Talks podcast episode: KubeCon Paris Highlights and AI Spotlight on K8sgpt.