Case Study

Kakao

Kakao improves network performance and lowers costs with Cilium

Challenge

Kakao Corp is a South Korean company known for its popular messaging application, KakaoTalk. They not only help people stay connected but also offer additional services such as maps, comics, online shopping, banking, taxis, and more. Their platform has over 50 million users, primarily in South Korea.


Initially, Kakao built their Kubernetes platform using Cilium, both for its performance and for the lower network costs that eBPF makes possible. However, they ran Cilium alongside kube-proxy and Nginx Ingress, which led to several network problems. kube-proxy in particular made the system more complicated and slowed down debugging because there were too many components to investigate.

Solution

After reevaluating their options, the platform team at Kakao once again selected Cilium as the CNI for their Kubernetes platform. Its use of eBPF helps reduce networking costs, and its kube-proxy replacement option eliminates the need for kube-proxy in their system.

Impact

Leveraging Cilium and kube-proxy replacement in their Kubernetes platform has lowered Kakao’s network and performance costs, and enabled faster debugging of network issues.

Published:
July 25, 2024

By the numbers

7,000+

Clusters

120,000+

Nodes across multi-zones

102.5 Million

Monthly active users (domestic)

Replacing kube-proxy with Cilium for Better Performance and Faster Debugging

Kakao’s platform engineering team is tasked with building, operating, and maintaining their Kubernetes platform. This platform offers Kubernetes as a service and is built on OpenStack. They run the platform on-premises, managing over 7,000 clusters and 120,000 nodes.

The Kakao platform engineering team initially built their Kubernetes platform using Cilium, kube-proxy, and Nginx Ingress. However, they began experiencing frequent network issues primarily due to kube-proxy, which led to an increase in support tickets from their developers. The complexity of their network stack made timely debugging difficult. Faced with these challenges, Kakao needed to reconsider the structure of their platform’s networking.

“We were using Cilium already, but we combined it with kube-proxy which was causing us to have a lot of network issues. It was hard for us to tackle these issues because we had to analyze kube-proxy and Cilium and also Nginx Ingress for L7 policy. There were so many components to dig into when we had calls about network issues.”

Kwang Hun Choi, Cloud Engineer, Kakao Corp

During their re-evaluation, Kakao wanted a CNI that performed well, could replace kube-proxy, and was easy to install.

“While we were re-evaluating, we wanted a CNI that was good with performance, easy to manage, and could be installed automatically since we have so many clusters. For performance, we were trying to support multi-zone clusters, which could lead to more network issues if the performance was low. We also considered removing kube-proxy since we wanted to reduce the number of components in our platform for easier debugging.”

Kwang Hun Choi, Cloud Engineer, Kakao Corp

Kakao evaluated Cilium again, this time alongside Calico and Flannel, and selected Cilium as the best option for two reasons: its use of eBPF, which can reduce network costs, and its kube-proxy replacement option, which enabled them to eliminate kube-proxy from their platform.

“We compared Calico, Flannel, and Cilium. We thought Cilium was the best because of its use of eBPF which can reduce network cost and also its kube-proxy replacement option reduced the complexity of debugging network issues by just focusing fully on Cilium itself. Finally, Cilium has many options, but its default options serve us well in almost every case.”

Kwang Hun Choi, Cloud Engineer, Kakao Corp

After their re-evaluation, Kakao decided to upgrade their Kubernetes clusters and use Cilium with full kube-proxy replacement mode.

“Our current setup right now is the same but without kube-proxy. Cilium is the default CNI in our platform.”

Kwang Hun Choi, Cloud Engineer, Kakao Corp

Reducing Costs for Network and Performance with Cilium

Cilium has been a tremendous success for Kakao and is now a key part of their Kubernetes platform. With Cilium, they have lowered their network costs, which matters a great deal given the size of their platform. They have also completely replaced kube-proxy, which has significantly reduced the number and complexity of network issues they face.

“As an engineer, I can say that Cilium has lowered our costs for performance and network. Because we have so many clusters and nodes, we always need more machines for new services to be served. By reducing network costs or CPU consumption with Cilium, we can manage with fewer nodes. Compared to kube-proxy, Calico, or Flannel, Cilium offers more value in that aspect.”

Kwang Hun Choi, Cloud Engineer, Kakao Corp

“With kube-proxy, we needed more time to figure out when there were networking issues. Without it, we can just look at Cilium’s options or endpoints list. Also, if there are issues in Cilium, the Cilium community provides updates very regularly. We don’t have many problems these days and even if there are, there will be patches soon. We don’t need to look at issues so much now unlike before.”

Kwang Hun Choi, Cloud Engineer, Kakao Corp

The Kakao platform team also appreciates the help they have received from the Cilium community, both in learning about the project and in fixing the issues they run into.

“The Cilium community provides a lot of tutorials like videos or books, which are all available online. You can look at all the books and videos to learn more about eBPF and how Cilium works as a CNI. The community is good and they look through issues very quickly, so patches are being released very often.”

Kwang Hun Choi, Cloud Engineer, Kakao Corp

Looking ahead, Kakao’s platform team is evaluating Cilium Cluster Mesh and Hubble after hearing about these features from the community and other end users.

“We explored Cluster Mesh while we were at KubeCon Paris 2024 and noticed many people were using it. We are currently provisioning clusters in multiple zones, which is a bit different from cluster mesh, so I am planning to examine the pros and cons between provisioning multi-zone clusters and using cluster mesh. I also heard that many other companies use Hubble for debugging issues and we are considering using it in the future.”

Kwang Hun Choi, Cloud Engineer, Kakao Corp

You can try out Cilium with interactive labs and also watch Kakao’s talk at KubeCon EU 2024 to get a better understanding of their infrastructure.