Case Study

Seznam

Seznam opts for Cilium to boost performance and reduce costs

Challenge

Seznam is a Czech technology company that offers a variety of services including a search engine, news aggregator, TV streaming, email, and maps. 

Initially, Seznam used F5 hardware load balancers and IPVS to manage network traffic, along with Calico for network security within their infrastructure. After discovering Cilium and evaluating its capabilities, Seznam recognized that it could replace their existing network solutions, simplifying cluster security and management while lowering costs.

Solution

Seznam selected Cilium as their solution for networking, security, and observability due to its high performance, open source nature, popularity within the ecosystem, cluster mesh feature, and label-based identity management and security.

Impact

Migrating their infrastructure networking to Cilium has allowed Seznam to increase performance while reducing both capital and operational costs and achieving deeper visibility into their network for faster debugging.

Published:
July 30, 2024

Projects used

Cilium
Kubernetes

By the numbers

1 Million

HTTP requests/second

100+

Clusters in a cluster mesh

100k+

Endpoints across the DCs

Adopting Cilium for High-Performance Load Balancing

Seznam’s platform and infrastructure team, consisting of 20 members, collaborates to build, support, and operate their infrastructure for 700 developers. This infrastructure operates across three data centers and includes 10,000 bare metal hosts, as well as Kubernetes clusters running on OpenStack.


Seznam initially built their load-balancing infrastructure on F5 load balancers but later moved to IPVS to reduce costs. When Cilium 1.8 was released, the Seznam team began evaluating the project and were particularly impressed by the kube-proxy replacement feature and the performance improvements Cilium offered over IPVS. The turning point came with Cilium 1.9, which added support for the Maglev algorithm, followed by Cilium 1.10, which introduced the standalone Layer 4 XDP load balancer. These additions prompted them to consider migrating their existing load-balancing setup to Cilium.

“Initially, we utilized F5 load balancers, which was a paid solution, before transitioning to IPVS, a software-based solution integrated into the Linux kernel. When Cilium 1.8 was released, featuring the kube-proxy replacement, we began extensively evaluating Cilium and noticed significant performance gains. However, it wasn’t until Cilium 1.9, which introduced support for the Maglev algorithm, that we seriously considered switching. We had been using IPVS with the Maglev algorithm, and if we were to replace it, we wanted to maintain that consistency. 

The announcement in Cilium version 1.9 regarding Maglev support was pivotal for us. Furthermore, in Cilium version 1.10, the introduction of a standalone load balancer component, capable of running independently, aligned perfectly as a direct replacement for our existing load balancing setup. That was the point where we said ‘This is stunning, we need to look more into Cilium and try to use it’.”

Ondřej Blažek, Software Engineer, Seznam
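
For readers unfamiliar with Maglev, the sketch below shows the general idea behind Maglev-style consistent hashing: each backend claims slots in a fixed-size lookup table in round-robin turns, and a flow is mapped to a backend by hashing into that table. It is an illustration of the published algorithm only, not Cilium's or IPVS's implementation; the hash function, the table size of 251, and the function names are assumptions chosen for the example.

```python
import hashlib
from typing import List, Optional


def _hash(value: str, salt: str) -> int:
    """Stable 64-bit hash; SHA-256 is an arbitrary choice for this sketch."""
    digest = hashlib.sha256(f"{salt}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_maglev_table(backends: List[str], table_size: int = 251) -> List[str]:
    """Fill a lookup table Maglev-style. table_size must be a prime number."""
    if not backends:
        raise ValueError("at least one backend is required")
    # Each backend gets its own permutation of slots, defined by offset and skip.
    offsets = [_hash(b, "offset") % table_size for b in backends]
    skips = [_hash(b, "skip") % (table_size - 1) + 1 for b in backends]
    next_index = [0] * len(backends)
    table: List[Optional[str]] = [None] * table_size
    filled = 0
    while filled < table_size:
        # Backends take turns, each claiming its next preferred free slot.
        for i, backend in enumerate(backends):
            while True:
                slot = (offsets[i] + next_index[i] * skips[i]) % table_size
                next_index[i] += 1
                if table[slot] is None:
                    table[slot] = backend
                    filled += 1
                    break
            if filled == table_size:
                break
    return table


def pick_backend(table: List[str], flow_key: str) -> str:
    """Map a flow (for example a 5-tuple rendered as a string) to a backend."""
    return table[_hash(flow_key, "flow") % len(table)]


if __name__ == "__main__":
    table = build_maglev_table(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    print(pick_backend(table, "198.51.100.7:443->203.0.113.9:52100/tcp"))
```

Because every load balancer node that sees the same backend list builds the same table, a given flow lands on the same backend regardless of which node handles it, which is the consistency property the quote refers to.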


When they compared Cilium’s standalone Layer 4 load balancer with their IPVS-based one, they found a 72x reduction in CPU consumption, which translated into substantial hardware savings. The load balancer became the first piece of Cilium to go into production in their network infrastructure.

“We have had Cilium’s layer 4 load balancer running in production for a couple of years now. It works very well and has saved us a lot of money. We saved a lot of money on load balancing because, with our previous solution, IPVS, we needed a lot of machines. Now, with the XDP-based one from Cilium, we are fine with just a couple of nodes, and the rest can be moved into OpenStack to add to our available CPU resources. Switching over to XDP-based acceleration, you can see the savings right away.”

Ondřej Blažek, Software Engineer, Seznam

Simplifying Network Security and Connectivity with Cilium Identities

For OpenStack networking, the Seznam team utilized Calico with IP-based security groups. However, while evaluating Cilium for load balancing, they realized that Cilium could also help them secure their network. 

“We had Calico as the networking layer in OpenStack, but we encountered a problem finding a solution that would help us secure everything. Our users typically have their APIs, backends, and frontends in Kubernetes, but most of the databases used in our services run in OpenStack and VMs. 

We need to secure the traffic between these services and allow traffic to the databases and clients. This is where we saw how Cilium could fix everything for us. If we could run Cilium in OpenStack, it would mean no more security groups or IP-based security groups; we would just have the label-based identities that Cilium provides for network security. We just love the label-based security.”

Ondřej Blažek, Software Engineer, Seznam

Besides securing their network, Seznam was also looking for a way to mesh all their OpenStack and Kubernetes clusters together. During their evaluation, they saw that Cilium could help with this challenge as well.

“We were looking for a solution that would allow us to connect multiple OpenStack and Kubernetes clusters and help us understand what’s going on in them. Before Cilium, we had Calico everywhere; if you wanted to connect to the OpenStack cluster from the Kubernetes cluster, you’d need to know the pod CIDRs and the IP addresses of the Kubernetes cluster and then specify that in the OpenStack (security groups). 

This process was not user-friendly because you needed to keep track of when a new pod subnet was added to Kubernetes to manually insert it into the security groups to allow access. On the other hand, if you use Cilium, all of this happens by default; you only need to care about knowing which label you want to allow access from.”

Ondřej Blažek, Software Engineer, Seznam
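
The difference between the two approaches can be sketched with a toy model. The code below is a simplified illustration, not Cilium's or OpenStack's actual policy engine; the `Endpoint` class, the function names, and the addresses and labels are all hypothetical. It shows why an IP-based rule breaks when a new pod subnet appears, while a label-based rule keeps working.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Endpoint:
    """A workload with an IP address and a set of key/value labels."""
    ip: str
    labels: Dict[str, str] = field(default_factory=dict)


def allowed_by_cidr(source: Endpoint, allowed_cidrs: List[str]) -> bool:
    """IP-based rule: the source must fall inside a listed subnet."""
    addr = ipaddress.ip_address(source.ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


def allowed_by_labels(source: Endpoint, required: Dict[str, str]) -> bool:
    """Label-based rule: the source must carry every required label."""
    return all(source.labels.get(k) == v for k, v in required.items())


if __name__ == "__main__":
    # Security group written when only 10.1.0.0/16 was used for pods.
    security_group = ["10.1.0.0/16"]
    # Label rule: allow anything labeled app=api.
    label_rule = {"app": "api"}

    old_pod = Endpoint("10.1.4.20", {"app": "api"})
    new_pod = Endpoint("10.2.0.5", {"app": "api"})  # pod from a newly added subnet

    for pod in (old_pod, new_pod):
        print(pod.ip, allowed_by_cidr(pod, security_group), allowed_by_labels(pod, label_rule))
```

In the toy run, the pod from the new subnet is rejected by the CIDR rule until someone updates the list by hand, whereas the label rule admits it without any change.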

Realizing that Cilium could ultimately help them solve the majority of their networking and security issues, they set out to run Cilium across their whole infrastructure. They reached out to some of the Cilium maintainers to help them figure out how to run Cilium in OpenStack.

“We talked to Thomas Graf and other Cilium maintainers to figure out a way to run Cilium in OpenStack and managed to build our own plugin and get it working.”

Ondřej Blažek, Software Engineer, Seznam

Deploying Cilium on OpenStack and Kubernetes for Seamless Networking

To ensure seamless connectivity across their infrastructure, Seznam deploys Cilium on both Kubernetes and OpenStack clusters. Cilium agents run on each hypervisor in their OpenStack clusters, similar to how they would run on Kubernetes nodes. These agents apply eBPF programs on the network interfaces, enforcing policies at the ingress points to the VMs, which serve as Kubernetes nodes.

To bridge the gap between Kubernetes and OpenStack, Seznam devised a solution where they run a lightweight Kubernetes API, such as K3s, alongside Cilium in OpenStack. This allows users to apply the same network policies they write for Kubernetes to their OpenStack environment. The Cilium agents in OpenStack watch this Kubernetes control plane and enforce the policies on the ingress points to the VMs, including databases. By integrating Cilium with both Kubernetes and OpenStack, Seznam achieved a seamless mesh across Kubernetes and OpenStack. This enables clients in Kubernetes to easily communicate with databases in OpenStack and vice versa, provided the correct policies are applied.

“We have lots of OpenStack clusters where Cilium is running on each hypervisor. Just like Kubernetes nodes, there are hypervisors where Cilium agents are running, and they are implementing or applying eBPF programs on the network interfaces that serve as the Ingress to the VMs. These VMs are also Kubernetes nodes.

The Cilium agents in OpenStack are not as native as in Kubernetes. In Kubernetes, you have Kubernetes objects you can apply, such as network policies and Cilium cluster-wide network policies. So, we thought: how can we do this in OpenStack where you just have the Cilium agent? We figured out we could run just the Kubernetes API, so our users can use the same network policies they write for Kubernetes and apply them to OpenStack. The Cilium agent, then, is not watching network policies like in vanilla Kubernetes but has its own Kubernetes control plane where users can apply policies. The Cilium agent will just watch and enforce those policies on the ingress points to the VMs, for example, to the databases. 

If you then look at Kubernetes clusters where you have Cilium as the CNI, the classic scenario everybody runs, and you mesh those together with OpenStack, you have the cluster mesh we wanted to have. Now if a client is in Kubernetes and wants to talk to the database in OpenStack, it can do so easily. Once a developer applies the correct Cilium network policies, their application can easily talk to the database and back, and vice versa. 

It doesn’t matter; with Cilium the whole network works in the same way.”

Ondřej Blažek, Software Engineer, Seznam

Automating the Migration from Calico to Cilium

Seznam originally chose Calico as the CNI for their Kubernetes clusters. However, after switching their load balancing to Cilium, they began to see other areas where Cilium could benefit them.

“With Calico, we were blind. Someone would have bad latencies or there might be some drops, but we couldn’t see anything. There was no way to understand what was exactly going on. Sometimes our best bet was to ask if they had tried turning it off and on again. With Cilium, we could see it was a whole different world with the Cilium monitor command and Hubble. We could see what was happening in the network and come up with a solution.”

Ondřej Blažek, Software Engineer, Seznam

Seznam operates many Kubernetes clusters, some of which are purely Cilium-native. However, they also have many older clusters based on Calico, which have been challenging to decommission. To facilitate the migration to Cilium, Seznam developed a tool that translates Calico workload endpoints into Cilium endpoints and IP identity pairs. This approach allows for a smoother transition without forcing users to move everything to new clusters immediately.

Given the size of these clusters, a gradual migration was necessary. Their solution lets users incrementally move components and apply policies on the Cilium side while still operating within Calico clusters. The tool watches for changes in Calico cluster information, such as label updates, and converts these into IP-identity pairs that are shared across the entire cluster mesh. It also leverages etcd to store key-value pairs, facilitating information migration and policy enforcement across environments.

“After going all in on Cilium, we were thinking, how can we help ease the migration to the Cilium world? We created a tool which can translate from Calico naming to Cilium. In the Calico world, it’s called a workload endpoint and we translate those workload endpoints into Cilium endpoints and into the IP identity pairs. 

The tool migrates the information about a pod and IP address and the labels they have into an object in etcd. With information in key-value storage, the tool can just convert it into an IP identity pair which is then shared across the whole cluster mesh.

With this, we have Cilium in OpenStack, in Kubernetes, and now can translate from our Calico clusters too. Our users don’t need anything else and with that, we can connect everything and enforce policies. Users in the Calico world don’t have to understand the Cilium world yet, but they can access it and everything in OpenStack which helps ease their migration to Cilium.”

Ondřej Blažek, Software Engineer, Seznam
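
Seznam's tool is not public in this case study, so the following is only a rough sketch of the kind of translation described above: taking a pod's name, IP addresses, and labels from a Calico-style workload endpoint and producing key-value entries that could be written to etcd. The input dictionary is a simplified stand-in rather than the exact Calico schema, and the key prefix, value layout, and function name are hypothetical.

```python
import ipaddress
import json
from typing import Dict, List, Tuple

# Hypothetical key prefix; the real layout used by Seznam or Cilium may differ.
KEY_PREFIX = "example/ip-identity/"


def workload_endpoint_to_kv(wep: Dict, cluster: str) -> List[Tuple[str, str]]:
    """Turn one simplified workload endpoint into (key, value) pairs, one per IP."""
    labels = dict(wep.get("labels", {}))
    labels["cluster"] = cluster  # assumption: tag entries with their source cluster
    pairs = []
    for network in wep["ip_networks"]:
        # Entries such as "10.1.2.3/32" are reduced to the bare address.
        ip = str(ipaddress.ip_interface(network).ip)
        value = json.dumps(
            {
                "ip": ip,
                "pod": wep["name"],
                "namespace": wep["namespace"],
                "labels": labels,
            },
            sort_keys=True,
        )
        pairs.append((f"{KEY_PREFIX}{cluster}/{ip}", value))
    return pairs


if __name__ == "__main__":
    endpoint = {
        "name": "orders-db-0",
        "namespace": "shop",
        "labels": {"app": "orders-db"},
        "ip_networks": ["10.1.2.3/32"],
    }
    for key, value in workload_endpoint_to_kv(endpoint, "calico-a"):
        print(key, value)
```

A watcher would rerun a translation like this whenever an endpoint's labels or addresses change, so that the stored entries stay in step with the Calico cluster.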

Beyond their migration tool, a significant feature for Seznam was Cilium’s policy audit mode, which let them observe traffic and see which connections a policy would drop without actually enforcing it. This helped them write correct network policies before switching to full enforcement.

“There is this policy audit mode which allows you to observe the traffic and see what would be dropped if you ran in strict enforcement mode. This audit mode was huge for us. It helped us a lot to write proper network policies before production traffic dropped.

Server by server, cluster by cluster, we installed, deployed, or migrated to Cilium in Kubernetes. Our new solution is implemented on bare metal, VMs, and Kubernetes, with Cilium on top to manage the entire network.”

Ondřej Blažek, Software Engineer, Seznam
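
The idea behind audit mode can be shown with a small model. This is a conceptual sketch only, not Cilium's datapath logic: the `Verdict` names, the `evaluate` function, and the rule format are assumptions made for illustration, and the model treats any flow that matches no allow rule as denied.

```python
from enum import Enum
from typing import Dict, List


class Verdict(Enum):
    FORWARDED = "forwarded"  # matched an allow rule
    AUDIT = "audit"          # would be dropped, but audit mode lets it through and records it
    DROPPED = "dropped"      # denied under strict enforcement


def evaluate(source_labels: Dict[str, str],
             allow_rules: List[Dict[str, str]],
             audit_mode: bool) -> Verdict:
    """Check a flow's source labels against a list of label-based allow rules."""
    matched = any(
        all(source_labels.get(k) == v for k, v in rule.items())
        for rule in allow_rules
    )
    if matched:
        return Verdict.FORWARDED
    return Verdict.AUDIT if audit_mode else Verdict.DROPPED


if __name__ == "__main__":
    rules = [{"app": "api"}]
    flows = [{"app": "api"}, {"app": "reporting"}]
    for labels in flows:
        print(labels, evaluate(labels, rules, audit_mode=True).value)
```

Reviewing the flows marked as audited shows which rules are still missing, so policies can be completed before enforcement is switched on and legitimate traffic starts being dropped.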

Enhancing Networking Capabilities and Reducing Infrastructure Costs with Cilium

Cilium has been a game changer for the Seznam team. By replacing their previous load balancer with the Cilium Layer 4 load balancer, they have saved a significant amount of money, and they have also tightened their network security and given their developers more flexibility.

“Using Cilium as our complete networking solution has made things easier for all our users, it’s easier to implement network security policies without the need to know anything about IP addresses. They just need to care only about the labels and that’s it. They are super happy that it works.”

Ondřej Blažek, Software Engineer, Seznam

One of Seznam’s goals is to have Cilium universally deployed across their environment, enabling them to understand and fix issues as they arise. This migration path, while still ongoing, has allowed them to enhance their networking capabilities and security posture significantly.

“The end goal for us is to have Cilium everywhere and understand the code. It’s all about being able to read the code and fix it if something doesn’t work because we want to work as a part of the Cilium open source community.”

Ondřej Blažek, Software Engineer, Seznam