Case Study

DigitalOcean

Scaling for the future with Cilium

Challenge

DigitalOcean is a cloud service provider with a target market of small to medium-sized businesses, developers, and startups. They offer a full suite of cloud infrastructure and services to their customers, including a managed Kubernetes service known as DigitalOcean Kubernetes Service (DOKS). 

They initially built the overlay networking layer of DOKS with Flannel, but as the platform expanded in size and scale, they realized they needed a more mature networking solution to handle this growth.

Solution

DigitalOcean selected Cilium as the preferred networking, security, and observability solution for their DOKS platform due to its prominent role in the cloud native ecosystem, its capabilities in IP Address Management (IPAM), and its powerful network policies. Cilium’s use of eBPF to enhance performance further solidified its selection.

Impact

Switching the dataplane of DOKS and their internal Kubernetes clusters from Flannel to Cilium, powered by eBPF, enabled DigitalOcean to onboard more sophisticated customers, scale to meet their demands, and secure their multi-tenant environment. The performance, security, and observability provided by Cilium allowed them to meet the critical requirements necessary for these customers to use their platform.

Published: June 24, 2024

Projects used

Cilium
Kubernetes

By the numbers

637,000 customers

190 countries

16 data centers

Migrating to Cilium for Performance, Security, and Additional Features

During the early days of building their managed Kubernetes service (DOKS), DigitalOcean used Flannel for networking. However, as their DOKS platform grew, they recognized the need for a more sophisticated CNI that could scale with their platform.

Motivated by this requirement, they searched for a solution that offered powerful network policies to keep customers separated in a multi-tenant environment, along with simpler IP address management. After evaluating their options, they narrowed the choice to Cilium and Calico.

“The first overlay network implementation for DOKS customers was Flannel. It was simple and did not have any network policy features. As the Kubernetes offering of DigitalOcean matured, it was clear that there needed to be something more mature on the network side. There were only two ways to go from that point: it was either Cilium or Calico.”

Ingo Gottwald, Senior Engineer, DigitalOcean

DigitalOcean chose Cilium for its prominence in the cloud native ecosystem, its improved IPAM and network policies, and its use of eBPF, which significantly enhances performance.

“One of the factors behind choosing Cilium was that you could see that Cilium was slowly establishing itself as the default CNI in the cloud native ecosystem. Cilium was also the first CNI powered by eBPF and XDP, which was promising in terms of performance.

Network policy handling, especially for our internal multi-tenant environment, was critical because we couldn’t move to this new and more flexible architecture without having these security barriers in place. Another factor was that Cilium would fully handle IPAM for us.”

Ingo Gottwald, Senior Engineer, DigitalOcean

After deciding on Cilium, they devised a custom installation process to manually migrate their cluster network, aiming to execute the migration with minimal downtime.

“In terms of installation, we have custom Kubernetes controllers and other reconcilers that manage customer control planes. They are responsible for detecting the type of configuration that is live and the kind of software deployed on those clusters. To integrate this effectively, we maintain a configuration pipeline that first renders Helm charts into YAML manifests statically, which are then incorporated into our code base and dynamically tailored to the needs of a given cluster at runtime.

We moved from etcd-backed Cilium to CRD-backed storage while migrating to our new architecture. Managing clusters on a CRD basis proved far easier for us than managing direct etcd access.”

Ingo Gottwald, Senior Engineer, DigitalOcean
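
The two-stage pipeline described in the quote above, static rendering followed by per-cluster tailoring at runtime, might look roughly like the sketch below. It is an illustration under stated assumptions, not DigitalOcean's actual code: the tailor_cilium_config helper and the cluster name are hypothetical, and the dictionary stands in for a ConfigMap that the Helm chart would have rendered ahead of time. The cluster-name and identity-allocation-mode keys are existing cilium-config options; switching the latter from kvstore to crd is one plausible reading of the move from etcd-backed to CRD-backed storage.

```python
import copy

# Stand-in for a manifest rendered once, ahead of time, from the Helm chart.
# The values here are illustrative assumptions.
RENDERED_CILIUM_CONFIG = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "cilium-config", "namespace": "kube-system"},
    "data": {
        "cluster-name": "default",
        "identity-allocation-mode": "kvstore",
    },
}


def tailor_cilium_config(base: dict, cluster_name: str) -> dict:
    """Return a per-cluster copy of the rendered manifest (hypothetical helper)."""
    manifest = copy.deepcopy(base)  # never mutate the shared static render
    manifest["data"]["cluster-name"] = cluster_name
    # Store identities as Kubernetes CRDs instead of in an external kvstore.
    manifest["data"]["identity-allocation-mode"] = "crd"
    return manifest


if __name__ == "__main__":
    tailored = tailor_cilium_config(RENDERED_CILIUM_CONFIG, "customer-cluster-1")
    print(tailored["data"])
```

Copying the static render before changing it keeps the checked-in manifest as a single reviewed baseline, while each cluster receives only the overrides it needs.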

Cilium is now integrated into DigitalOcean’s multi-tenant environment, utilized both in their managed Kubernetes service and for internal infrastructure. In DOKS, Cilium is provided to customers as the default CNI option with limited configuration flexibility. Internally, it is used to isolate individual control planes of clusters created by customers, ensuring they remain separate.

“We use Cilium in two places. We have the product DOKS, which is our managed Kubernetes offering, and all of those Kubernetes clusters come with Cilium as the default CNI. For our customers, there’s no possibility of choosing a different one; everything is optimized to use Cilium. It’s just a plain CNI implementation, but we also recently started shipping Hubble and Hubble Relay for observability. 

Then there’s the other side, which is in our infrastructure. To simplify infrastructure management, we run our own Kubernetes clusters to host our customers’ Kubernetes control planes. We’re running Kubernetes in Kubernetes. Our internal Kubernetes clusters also run Cilium and run it more elaborately. Since it is a multi-tenant environment, we’re making heavy use of Cilium network policies and we’re running Cilium in a default-deny enforcement mode. Cilium provides an important security barrier. We use Cilium network policy to isolate every control plane within its namespace, helping to ensure it cannot communicate with anything in the cluster except the neighboring services essential for its operation.”

Ingo Gottwald, Senior Engineer, DigitalOcean
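
The per-namespace isolation described above can be sketched as a CiliumNetworkPolicy manifest. The Python below is a minimal illustration, not DigitalOcean's actual policy: the namespace, the policy name, and the choice of kube-dns as the one permitted neighboring service are all assumptions made for the example. It builds the manifest as a plain dictionary and prints it as JSON, which Kubernetes accepts alongside YAML.

```python
import json


def control_plane_isolation_policy(namespace: str) -> dict:
    """Build a CiliumNetworkPolicy that confines pods to their own namespace.

    Hypothetical example: the policy name and the kube-dns exception are
    illustrative assumptions, not taken from the case study.
    """
    # In a namespaced CiliumNetworkPolicy, an empty selector matches every
    # endpoint in the policy's own namespace and nothing outside it.
    same_namespace = {}
    # Reaching another namespace requires naming it explicitly.
    kube_dns = {
        "matchLabels": {
            "k8s:io.kubernetes.pod.namespace": "kube-system",
            "k8s-app": "kube-dns",
        }
    }
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": "isolate-control-plane", "namespace": namespace},
        "spec": {
            "endpointSelector": {},  # apply to all pods in the namespace
            "ingress": [{"fromEndpoints": [same_namespace]}],
            "egress": [{"toEndpoints": [same_namespace, kube_dns]}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(control_plane_isolation_policy("tenant-a"), indent=2))
```

Once a policy with ingress and egress sections selects an endpoint, Cilium drops traffic in those directions that no rule allows, so the explicit allow list doubles as the isolation boundary.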

On the internal side, with Cilium now a core part of its networking infrastructure, DigitalOcean benefits from broader networking and security capabilities thanks to Cilium’s rich feature set, simpler network management, and enhanced observability.

“Cilium brings a lot of network features that we could not offer otherwise because they’re complicated to develop. It enables a more intelligent network than just having a flat layer three for your Kubernetes clusters. Additionally, aspects of network management, such as IPAM, have become much easier with Cilium.”

Ingo Gottwald, Senior Engineer, DigitalOcean

Providing Enhanced Observability to Customers with Hubble

Observability has been one of the top benefits of migrating to Cilium for DigitalOcean and its customers. Internally, they use Hubble to monitor their network traffic and assess the performance of their network policies, and Hubble is one of the features most requested by their customers.

DigitalOcean appreciates Hubble’s observability capabilities, especially the service map feature, which their customers highly praise.

“Regarding observability, with Hubble, it’s night and day. With Hubble, it’s easy to see where traffic is going and where it’s denied. In a distributed world, observability is a key feature.

Hubble was one of the features our customers most frequently requested to be enabled, and we recently enabled it. One feature they love is the Hubble UI, especially the services map feature, where you can see all of your pods and their communication. Our customers greatly value this because it gives them a better understanding of their applications and network connections.

Internally, we also use Hubble because we want easy debuggability for network policies. We didn’t want to log into every Cilium agent and run the Cilium monitor command to identify issues; we wanted one central place that could simplify this process, and that’s what we got with Hubble.”

Ingo Gottwald, Senior Engineer, DigitalOcean
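
To make the "one central place" idea concrete, the sketch below summarizes policy drops from flow records in JSON form, the kind of output the Hubble CLI can emit with hubble observe --verdict DROPPED -o json. The sample records are hand-written for illustration, and their field names (flow, verdict, source, destination, namespace, pod_name) are modeled on Hubble's JSON output as an assumption that should be checked against the Hubble version in use.

```python
import json
from collections import Counter

# Hand-written sample records; pod and namespace names are made up.
SAMPLE_FLOWS = """\
{"flow": {"verdict": "FORWARDED", "source": {"namespace": "tenant-a", "pod_name": "api-0"}, "destination": {"namespace": "tenant-a", "pod_name": "etcd-0"}}}
{"flow": {"verdict": "DROPPED", "source": {"namespace": "tenant-a", "pod_name": "api-0"}, "destination": {"namespace": "tenant-b", "pod_name": "etcd-0"}}}
{"flow": {"verdict": "DROPPED", "source": {"namespace": "tenant-a", "pod_name": "api-0"}, "destination": {"namespace": "tenant-b", "pod_name": "etcd-0"}}}
"""


def count_dropped(lines):
    """Count dropped flows per (source pod, destination pod) pair."""
    drops = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        flow = json.loads(line).get("flow", {})
        if flow.get("verdict") != "DROPPED":
            continue
        src = flow.get("source", {})
        dst = flow.get("destination", {})
        key = (
            f'{src.get("namespace")}/{src.get("pod_name")}',
            f'{dst.get("namespace")}/{dst.get("pod_name")}',
        )
        drops[key] += 1
    return drops


if __name__ == "__main__":
    for (src, dst), n in count_dropped(SAMPLE_FLOWS.splitlines()).items():
        print(f"{src} -> {dst}: {n} dropped")
```

Grouping drops by pod pair shows at a glance which policy boundary is being hit, without logging into individual agents.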

Enhancing Business Growth with Cilium 

Integrating Cilium has been a huge success for the DigitalOcean team. They have improved the security, observability, and performance of their platform’s network both internally and for their customers. Migrating to Cilium has also helped the DigitalOcean team acquire more customers with stringent platform requirements and keep these customers happy.

“Cilium enabled us to acquire more sophisticated customers. Once you enter the small and medium business market, where some companies must meet certain benchmarks, policies, and regulations, having specific performance, security, and observability features becomes critical for them to even consider your product.”

Ingo Gottwald, Senior Engineer, DigitalOcean

“We have customers who build applications like video streaming and fintech applications, which require solid networking performance, and thanks to Cilium they get a good experience from DOKS.

Other customers have stringent security requirements, which are primarily driven by their own needs and compliance. Cilium provides network policies, encryption, and various other security mechanisms that help them stay secure. 

Finally, the thing that pushed us to look at Hubble was troubleshooting. Building complex real-time applications is hard, and when you run into performance issues, even if they happen occasionally, you need to be able to troubleshoot them. Our customers needed Hubble and we could just enable it with Cilium. Each of these features in the overall Cilium platform is yet another value proposition for customers.”

Bikram Gupta, Lead Product Manager, Kubernetes & App Platform, DigitalOcean

Future Plans and Offering More Capabilities to Customers with Cilium

The DigitalOcean team has further plans for Cilium, both internally and for their customers. They plan to safely offer more customization of Cilium deployments in their customers’ clusters. Internally, they aim to move from partial to full kube-proxy replacement to unlock more features. Additionally, they are closely monitoring the development of the Cilium ingress controller and ClusterMesh.

“We are considering letting customers customize their Cilium configuration to a degree that is still safe for them to do without risking issues with our managed offering. At the same time, our current kube-proxy replacement configuration is partial; that is what we started with, and we’re still on that. We’re looking into moving to the full replacement mode to run kube-proxy-less because it enables so many Cilium features that we cannot use today. This would build the basis for all the features that we want to enable for our customers.”

Ingo Gottwald, Senior Engineer, DigitalOcean

“Cilium is a very exciting product in the Kubernetes space, and we are closely watching Cilium’s ingress and cluster mesh features. They both look exciting, and I believe have great potential for our customers.”

Bikram Gupta, Lead Product Manager, Kubernetes & App Platform, DigitalOcean