Solving Cloud Native Operation Challenges with AI Agents

Oh no! Your application is unreachable, buried under multiple connection hops—how do you pinpoint the broken link? How do you generate an alert or bug report from Prometheus when certain conditions are met? You need to roll out a new version of your application—how do you execute a progressive rollout using Argo Rollouts? How do you safely enable zero trust network security when your application scales beyond a single cluster or cloud? With so many projects in the cloud native ecosystem, how do you figure out which ones are right for your needs and layer them together with proper configuration management?

Sound familiar? As a leader in cloud networking, security, and reliability, we hear these questions all the time from platform and DevOps engineers working with CNCF projects like Kubernetes, Envoy, Istio, Prometheus, and Argo. Our customer-facing engineers often resolve these issues on their own, but sometimes they need to pull in other internal specialists.

As we continue to scale out and support many more customers, we asked ourselves: how can we leverage our in-house expertise more efficiently? Could we clone our top experts to assist with these issues? This led to a lightbulb moment—why not build AI agents to tackle common challenges and support our engineers and customers? Why not create a catalog of AI agents for the cloud native ecosystem, enabling everyone to run, build, and share AI-driven solutions?

What is kagent?

You’ve probably interacted with ChatGPT, an AI chatbot that provides natural language responses. Agentic AI, however, goes beyond simple chat interactions—it uses advanced reasoning and iterative planning to autonomously solve complex, non-deterministic multi-step problems, turning insights into actions that enhance productivity.
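To make the contrast with a single chat response concrete, here is a minimal sketch of an iterative plan, act, observe loop applied to the "broken link" question from the introduction. Everything in it is an assumption for illustration: the hop names, the `HEALTHY` table, and the `check_hop`, `next_step`, and `run_agent` functions are hypothetical, and the rule-based `next_step` merely stands in for the reasoning a language model would do.

```python
from typing import Dict, List, Optional

# Hypothetical connection chain and health data; stand-ins for real probes.
HOPS: List[str] = ["ingress", "gateway", "service", "pod"]
HEALTHY: Dict[str, bool] = {
    "ingress": True,
    "gateway": True,
    "service": False,
    "pod": True,
}


def check_hop(hop: str) -> bool:
    """Hypothetical tool: report whether a hop responds."""
    return HEALTHY[hop]


def next_step(observations: Dict[str, bool]) -> Optional[str]:
    """Stand-in for the model's reasoning: choose the next hop to probe.

    Returns None once a failing hop has been seen or every hop is checked.
    """
    if any(not ok for ok in observations.values()):
        return None
    for hop in HOPS:
        if hop not in observations:
            return hop
    return None


def run_agent(max_steps: int = 10) -> str:
    """Loop: plan the next action, run a tool, record the observation."""
    observations: Dict[str, bool] = {}
    for _ in range(max_steps):
        hop = next_step(observations)        # plan
        if hop is None:
            break
        observations[hop] = check_hop(hop)   # act and observe
    broken = [h for h, ok in observations.items() if not ok]
    return f"broken hop: {broken[0]}" if broken else "all hops healthy"


if __name__ == "__main__":
    print(run_agent())
```

The point of the sketch is the loop shape: each decision depends on what earlier steps observed, and the step cap keeps the loop from running indefinitely.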

What if we applied agentic AI to the day-to-day operational challenges faced by cloud native engineers? That’s where kagent comes in.

Kagent is an open source programming framework designed for DevOps and platform engineers to run AI agents in Kubernetes. It helps engineers build powerful internal platforms by tackling cloud native tasks such as configuration, troubleshooting, complex deployment scenarios, observability pipelines and dashboards, and safely enabling network security (such as mTLS and authentication/authorization changes). Recognizing that different users have different challenges, we designed kagent to be extensible, allowing DevOps engineers, platform teams, and tool developers to create and share their own AI agents.

Built on Microsoft’s AutoGen open source framework, kagent provides a powerful foundation for AI-driven solutions in cloud native environments.

Architecture of kagent

Kagent is built on three key layers: Tools, Agents, and the Framework.

Figure: The three layers of kagent (tools, agents, and framework).

Based on our experience, we’ve built tools to interact with Kubernetes, Prometheus, Istio, and Argo, along with AI agents to autonomously solve some of the most common cloud native problems.
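One way to picture how the three layers relate is the sketch below. It is not kagent's API or AutoGen's: the tool functions, the `Agent` dataclass, and the `Framework` class are hypothetical names invented here, and the tools return canned strings instead of talking to a cluster. The only assumption it encodes is that tools are callable functions, an agent is a set of instructions plus the tools it may use, and the framework registers agents and dispatches their tool calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Tools layer: plain functions (hypothetical stubs with canned output).
def get_pod_status(name: str) -> str:
    return f"pod {name}: Running"


def query_metric(name: str) -> str:
    return f"metric {name}: 0.42"


TOOLS: Dict[str, Callable[[str], str]] = {
    "get_pod_status": get_pod_status,
    "query_metric": query_metric,
}


# Agents layer: instructions plus the names of the tools the agent may use.
@dataclass
class Agent:
    name: str
    instructions: str
    tool_names: List[str]


# Framework layer: registers agents and dispatches their tool calls.
class Framework:
    def __init__(self, tools: Dict[str, Callable[[str], str]]) -> None:
        self.tools = tools
        self.agents: Dict[str, Agent] = {}

    def register(self, agent: Agent) -> None:
        missing = [t for t in agent.tool_names if t not in self.tools]
        if missing:
            raise ValueError(f"unknown tools: {missing}")
        self.agents[agent.name] = agent

    def call_tool(self, agent_name: str, tool_name: str, arg: str) -> str:
        agent = self.agents[agent_name]
        if tool_name not in agent.tool_names:
            raise PermissionError(f"{agent_name} may not use {tool_name}")
        return self.tools[tool_name](arg)


if __name__ == "__main__":
    fw = Framework(TOOLS)
    fw.register(Agent("k8s-agent", "Troubleshoot workloads.", ["get_pod_status"]))
    print(fw.call_tool("k8s-agent", "get_pod_status", "web"))
```

Keeping the tool list on the agent means the framework can refuse calls the agent was never granted, which is one plausible reason to separate the layers.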

Join the kagent Community

If you’re a platform or DevOps engineer, and leveraging AI agents to solve cloud native operation challenges excites you, or if you’re a developer or CNCF project maintainer interested in building AI agents to enrich our ecosystem, we’d love to collaborate!

Check out our website and GitHub repository. Follow our getting started guide to experiment with AI agents in your Kubernetes cluster, and contribute your own agents and tools to extend the cloud native AI ecosystem. Join the discussion in the #kagent channel on CNCF Slack—we’d love to hear from you!

Let’s build the future of AI agents to solve cloud native operation challenges, together.