Community post by Dave Smith-Uchida, Technical Leader, Veeam (LinkedIn, GitHub)
Data on Kubernetes is growing as databases, object stores, and other stateful applications move to the platform. The Data Protection Working Group (DPWG) focuses on data availability and preservation for Kubernetes – including backup, restore, remote replication, and the facilitation and orchestration of these processes. At the Data Protection Working Group Deep Dive at KubeCon + CloudNativeCon Salt Lake City (Nov. 13, 2:30 PM), Xing Yang, Cloud Native Storage Tech Lead at Broadcom/VMware, and I will cover topics including:
- The need for Kubernetes data protection
- Deep Dive on Changed Block Tracking (CBT)
- Deep Dive on Volume Group Snapshots
- Best Practices to Prepare Kubernetes Applications for Data Protection White Paper (new project)
- The structure of the Data Protection Working Group
- How to get involved with the Data Protection Working Group
The Need for Kubernetes Data Protection
Kubernetes has evolved from its original mission as an orchestrator for stateless containers that uses external services for data storage into a platform that supports data storage and state within a Kubernetes cluster. State can be stored in Persistent Volumes (PVs) but also in Kubernetes resources as native Kubernetes applications take advantage of the Kubernetes API server to store their working information. The evolution of Kubernetes into a stateful platform has created a need to protect the data stored in Kubernetes against loss, corruption, and other threats such as ransomware attacks. The Data Protection Working Group has published a white paper that outlines when you need data protection in Kubernetes. We’ll cover the high points during the session at KubeCon, but in the meantime, we invite you to read the paper here: https://github.com/kubernetes/community/blob/master/wg-data-protection/data-protection-workflows-white-paper.md
Deep Dive on Changed Block Tracking (CBT)
The Data Protection Working Group has been working on adding CBT to Kubernetes and the Container Storage Interface (CSI). CBT improves the performance of backup and replication of large volumes by tracking the blocks that have changed between two snapshots. When a backup is performed, the backup system creates a volume snapshot, retrieves the list of blocks that have changed since the previous backup’s snapshot was taken, and copies only those blocks. Many storage systems, both traditional and cloud-based, implement CBT, but the APIs are proprietary. Since Kubernetes has already created standard APIs for allocating, attaching, snapshotting, and cloning volumes, adding CBT is the next step in standard APIs for storage systems. Veeam Kasten has been a leader in using proprietary CBT systems for Kubernetes data protection, and we’re proud to have been a participant in creating the Kubernetes CBT API, which is currently in alpha with Kubernetes 1.32.
Changed Block Tracking KEP: https://github.com/kubernetes/enhancements/issues/3314
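To make the incremental backup flow described above concrete, here is a minimal toy sketch in Python. It is not the Kubernetes or CSI CBT API: the function names (take_snapshot, changed_blocks, incremental_backup, restore) and the in-memory volume model are assumptions made for illustration. In particular, this sketch finds changed blocks by comparing snapshot contents, whereas a real storage system would typically report them from its own change-tracking metadata.

```python
from typing import Dict, List, Sequence, Tuple


def take_snapshot(volume: Sequence[bytes]) -> Tuple[bytes, ...]:
    """Return an immutable point-in-time copy of the volume's blocks."""
    return tuple(volume)


def changed_blocks(base: Sequence[bytes], target: Sequence[bytes]) -> List[int]:
    """Return the indices of blocks that differ between two snapshots.

    Assumption: comparing contents stands in for the change metadata
    that a real storage system would expose.
    """
    if len(base) != len(target):
        raise ValueError("snapshots must cover the same number of blocks")
    return [i for i, (old, new) in enumerate(zip(base, target)) if old != new]


def incremental_backup(base: Sequence[bytes], target: Sequence[bytes]) -> Dict[int, bytes]:
    """Copy only the blocks that changed since the base snapshot."""
    return {i: target[i] for i in changed_blocks(base, target)}


def restore(full_backup: Sequence[bytes], increments: Sequence[Dict[int, bytes]]) -> List[bytes]:
    """Rebuild a volume from a full backup plus ordered incremental backups."""
    blocks = list(full_backup)
    for increment in increments:
        for index, data in increment.items():
            blocks[index] = data
    return blocks


# A hypothetical four-block volume.
volume = [b"aaaa", b"bbbb", b"cccc", b"dddd"]

# First backup: snapshot the volume and copy every block.
snap1 = take_snapshot(volume)
full = list(snap1)

# The application modifies two blocks before the next backup.
volume[1] = b"BBBB"
volume[3] = b"DDDD"

# Second backup: snapshot again and copy only the changed blocks.
snap2 = take_snapshot(volume)
increment = incremental_backup(snap1, snap2)

restored = restore(full, [increment])
```

In this toy run the second backup copies two of the four blocks, and restoring the full backup plus the increment reproduces the second snapshot. The savings grow with volume size when only a small fraction of blocks change between backups.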
Deep Dive on Volume Group Snapshots
Volume Group Snapshots are another Kubernetes enhancement that supports data protection. When an application uses multiple volumes, taking a consistent snapshot of all of the volumes is important. Taking snapshots one by one while the application is running may result in inconsistencies between the volumes, and may create a backup that will not be usable. One way to get consistency is to quiesce the application and snapshot each volume it uses, but this can take considerable time and the application will be unavailable while in the quiesced state. A Volume Group Snapshot captures all of the volumes in the group together, achieving consistency without needing to quiesce the application. As with CBT, this is a feature offered by many storage systems, but only with proprietary APIs.
Volume Group Snapshots KEP: https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/3476-volume-group-snapshot
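The consistency problem described above can be illustrated with a small toy simulation in Python. This is a sketch under stated assumptions, not the Volume Group Snapshot API: the two volumes ("data" and "index"), the application invariant, and all function names are hypothetical. Real group snapshots are taken by the storage system rather than by copying lists in memory.

```python
import copy
from typing import Dict, List

Volumes = Dict[str, List[int]]


def app_write(volumes: Volumes, record_id: int) -> None:
    """Simulated application write: store the record, then index it."""
    volumes["data"].append(record_id)
    volumes["index"].append(record_id)


def is_consistent(snapshot: Volumes) -> bool:
    """Hypothetical invariant: every indexed record exists on the data volume."""
    return all(record in snapshot["data"] for record in snapshot["index"])


def snapshot_one_by_one(volumes: Volumes, interleaved_write: int) -> Volumes:
    """Snapshot each volume separately while the application keeps writing."""
    snap = {"data": list(volumes["data"])}
    # The application is not quiesced, so a write lands between the two snapshots.
    app_write(volumes, interleaved_write)
    snap["index"] = list(volumes["index"])
    return snap


def snapshot_group(volumes: Volumes) -> Volumes:
    """Snapshot all volumes in the group at the same point in time."""
    return copy.deepcopy(volumes)


volumes = {"data": [1], "index": [1]}

# Individual snapshots: the index snapshot references record 2,
# which is missing from the earlier data snapshot.
individual = snapshot_one_by_one(volumes, interleaved_write=2)

# Group snapshot: both volumes are captured together.
grouped = snapshot_group(volumes)
```

Here the one-by-one snapshot set violates the invariant because a write slipped in between the two snapshots, while the group snapshot satisfies it. Quiescing the application would also avoid the problem, at the cost of downtime, which is the trade-off the section describes.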
White Paper: Best Practices to Prepare Kubernetes Applications for Data Protection
A new project for the group is creating a white paper on Best Practices to Prepare Kubernetes Applications for Data Protection. When working with data protection solutions like backup and restore, applications need to be structured so that the backup and restore process can make consistent backups and be restored to a working state. This is an ongoing project, and we invite everyone to join us and share their needs, experiences, and ideas for how to best prepare their applications for data protection.
Structure of the Data Protection Working Group
The Data Protection Working Group consists of participants who use Kubernetes and create applications, storage, and data protection solutions for the platform. We’re open to anyone who is interested in protecting their data on Kubernetes. Come join our session to exchange ideas, find out how to contribute, and let us know what your needs are!
Click here to learn more about the Kubernetes Data Protection Working Group.
Come see us at KubeCon + CloudNativeCon North America 2024!
Stop by Veeam’s booth (#K7) for in-person demonstrations of our Veeam Kasten data protection solution and talk to our subject matter experts.
Kanister
Kanister is an open source framework for data protection and management on Kubernetes. It is a CNCF Sandbox project and can be found at: https://www.kanister.io/.
Come see Xing and me at the Data Protection Working Group Deep Dive at KubeCon + CloudNativeCon Salt Lake City (Nov. 13, 2:30 PM).