There are three high-level requirements to operating an etcd cluster in production:
Each etcd member must be bootstrapped: The etcd binary has to be on the host and the runtime parameters must be defined.
The list of members must be kept up to date: This list must come from some external source of truth such an infrastructure API or the administrator themselves.
Periodic backups of the etcd state must be configured.
In this talk, we present etcdadm, a project recently adopted by the Kubernetes Cluster Lifecycle SIG. etcdadm is a command-line tool with a kubeadm-like interface that makes it easy to meet the first requirement by simplifying the bootstrap process. The project’s goal is to also make it easy to meet the second and third requirements by programmatically querying a variety of APIs for cluster membership, and by integrating existing services to store backups. Etcdadm will enable the automated operation of etcd clusters with a user experience analogous to CoreOS’s etcd operator, but without the prerequisite of a functional Kubernetes cluster.
We will explain the etcdadm design in-depth, and discuss how our experience running etcd clusters in production informed that design. We will also demonstrate deploying a highly available etcd cluster using etcdadm, recovering the cluster from both partial and complete failures. In the process, we will cover important etcd runtime parameters and caveats of dynamic cluster reconfiguration.