Community post by Pavan Navarathna Devaraj and Shwetha Subramanian

AI is an exciting, rapidly evolving field with the potential to enhance every major enterprise application. It can improve cloud-native applications through dynamic scaling, predictive maintenance, resource optimization, and personalized user experiences. However, many challenges still stand in the way of mass adoption, particularly around infrastructure, operations, and data management. Fortunately, cloud-native infrastructure, combined with open-source software, models, tools, and databases, enables experimental and production-ready AI models to be trained, tested, and deployed efficiently.

Training machine learning models involves iterative runs over vast datasets. These models often generate high-dimensional data that's stored in vector databases. Training, testing, and deploying AI models is resource-intensive and demands significant compute power and GPU cycles. As these iterations accumulate, vector databases grow to hold the results of these expensive operations, making them invaluable to advancing AI workloads.

Why Vector Databases Matter in AI Workloads

Vector databases store high-dimensional vectors that represent unstructured data like text, images, and audio. These vectors enable the similarity searches that Retrieval-Augmented Generation (RAG) uses to pull relevant context out of the massive datasets that have been vectorized and stored. That additional context improves the quality of responses generated by Large Language Models (LLMs). Backing up these databases is essential for maintaining data integrity and preventing costly data loss that could disrupt AI applications.

By keeping AI applications and vector databases together on cloud-native infrastructure, organizations can streamline operations and manage their infrastructure more easily. However, given the sheer volume of high-dimensional embeddings stored in vector databases, data protection is critical: iterative training results must be preserved before ephemeral compute resources are terminated. Losing this data could set back critical AI workloads, making robust backup and disaster recovery (DR) strategies indispensable.

How We’re Solving It with Kanister

At KubeCon + CloudNativeCon North America 2024 in Salt Lake City, UT, our talk, "Building Resilience: Effective Backup and Disaster Recovery for Vector Databases on Kubernetes," will demonstrate an efficient and secure backup and restore strategy for a popular vector database using Kanister, an open-source CNCF Sandbox project. Kanister is a workflow management tool that simplifies data management on Kubernetes by performing atomic data operations on applications via custom resources called Blueprints and ActionSets.
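
To make that concrete, here is a minimal sketch of what a Blueprint can look like. The `cr.kanister.io/v1alpha1` API group and the `KubeExec` Kanister function are real, but the resource names, namespaces, and the `vectordb-cli` snapshot commands below are placeholders for illustration, not the exact Blueprint from our demo:

```yaml
# Hypothetical Blueprint sketch: backup and restore actions for a
# vector database running as a StatefulSet. Names and commands are
# placeholders; adapt them to your database's native snapshot tooling.
apiVersion: cr.kanister.io/v1alpha1
kind: Blueprint
metadata:
  name: vectordb-blueprint
  namespace: kanister
actions:
  backup:
    phases:
      - func: KubeExec            # run a command inside the database pod
        name: takeSnapshot
        args:
          namespace: "{{ .StatefulSet.Namespace }}"
          pod: "{{ index .StatefulSet.Pods 0 }}"
          command:
            - sh
            - -c
            - |
              # Placeholder: call the database's own snapshot tooling here
              vectordb-cli snapshot create --out /backups/snapshot.tar.gz
  restore:
    phases:
      - func: KubeExec
        name: restoreSnapshot
        args:
          namespace: "{{ .StatefulSet.Namespace }}"
          pod: "{{ index .StatefulSet.Pods 0 }}"
          command:
            - sh
            - -c
            - |
              # Placeholder: load the snapshot back into the database
              vectordb-cli snapshot restore --in /backups/snapshot.tar.gz
```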

Here’s how it works: an ActionSet instructs the Kanister controller to execute an action, like a backup, while a Blueprint defines the steps needed to perform that action on a specific database. During our talk, we’ll walk through this workflow end to end on a popular vector database.
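
Here is a matching ActionSet sketch that asks the controller to run the backup action from the hypothetical Blueprint above (again, the names are placeholders):

```yaml
# Hypothetical ActionSet sketch: triggers the "backup" action defined in
# vectordb-blueprint against a StatefulSet named "vectordb".
apiVersion: cr.kanister.io/v1alpha1
kind: ActionSet
metadata:
  generateName: backup-vectordb-   # the controller appends a unique suffix
  namespace: kanister
spec:
  actions:
    - action: backup
      blueprint: vectordb-blueprint
      object:
        kind: StatefulSet
        name: vectordb
        namespace: vectordb
```

Because the manifest uses `generateName`, it is submitted with `kubectl create -f` rather than `kubectl apply`, and the controller reports progress in the ActionSet's status as it executes each Blueprint phase.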

This practical demonstration will give you a clear roadmap for using Kanister to protect your AI/ML data and keep your workloads resilient and efficient.

Why You Should Join Us!

In a world where AI is transforming every industry, protecting the infrastructure that powers these models is essential. If you’re working with AI, handling vector databases, or managing applications on Kubernetes, this session is for you! You’ll come away with practical strategies for backing up and restoring vector databases on Kubernetes.

When and Where

Join us for our talk, “Building Resilience: Effective Backup and Disaster Recovery for Vector Databases on Kubernetes,” in Grand Ballroom GI at the Salt Palace in Salt Lake City, UT.

We’ll show you how to future-proof your AI applications and secure your cloud-native infrastructure. Don’t miss this opportunity to learn how to safeguard your AI workloads and ensure business continuity in a cloud-native world!