
Streamlining Global Logistics with Cilium at DB Schenker
Challenge
The IT unit for the land transportation business of DB Schenker identified an opportunity to elevate its IT infrastructure to meet global delivery demands. Troubleshooting the networking stack was challenging due to limited visibility and reliance on multiple tools like Wireshark. Their existing CNI offered limited capabilities for performance enhancements and lacked advanced observability, making it hard to monitor and troubleshoot microservices effectively. The team needed a modern, scalable, and secure networking solution to address latency, increase observability, enhance security, and simplify operations.
Solution
The team adopted Cilium for its eBPF-based networking capabilities after evaluating various options. Cilium’s chaining mode allowed for an incremental migration that minimized disruptions. With features like Hubble for real-time observability, WireGuard for encryption, and seamless integration with existing infrastructure, Cilium addressed DB Schenker’s needs for improved performance, enhanced visibility, and simplified operations. Removing kube-proxy and their service mesh further optimized networking, reducing latency and overhead.
Impact
Cilium transformed DB Schenker’s IT operations, optimizing service connectivity and improving network stability. Enhanced observability enabled faster troubleshooting, while performance improvements reduced latency and made resource usage more efficient. Security was bolstered through fine-grained network policies and encryption, and the modular deployment allowed for smooth adoption. Cilium has become a cornerstone of DB Schenker’s strategy to build a future-ready, scalable IT platform.
By the numbers
460+ Kubernetes Nodes
700+ Business Applications
7 Million+ HTTP Requests/Hour
Challenges Delivering Networking Around the World
DB Schenker’s IT infrastructure underpins the operations of a logistics powerhouse. As a global leader in logistics, it operates over 1,850 locations worldwide, employs more than 72,700 people, and handles land, air, and ocean transportation. Its IT infrastructure supports thousands of services and applications across multiple Kubernetes clusters, managed by a team of eight platform engineers. Scaling this infrastructure to meet global demands presented significant challenges.
For the company’s platform engineers, keeping these clusters running smoothly is a monumental task. At the core of these operations lies a Kafka-powered ecosystem capable of handling over a million requests per second, a workload that demands exceptional stability and performance.
Initially, DB Schenker relied on Calico as its CNI. While functional during the early stages, Calico began to show its limitations as the platform scaled. Amir Kheirkhahan, a platform engineer at DB Schenker, explained, “Calico met our requirements at the time, but it lacked the visibility and performance enhancements we needed. Troubleshooting was complex and time-consuming.” The team recognized the need for more robust monitoring capabilities for microservice connections rather than relying on tools like Wireshark for packet tracing. “The lack of real-time monitoring challenged the ability to trace communication between services,” Kheirkhahan added.
By transitioning to an eBPF-based solution, the team significantly improved performance and achieved lower latency than with the previous iptables-based approach. “We realized that eBPF could simplify connection tracking and provide faster TCP connections,” Kheirkhahan noted. The team’s goals were clear: improve observability, reduce latency, and enhance security.
After evaluating alternatives, including Calico eBPF, the team selected Cilium. What set Cilium apart was its robust eBPF-based architecture and seamless integration capabilities. “Cilium’s documentation made it clear what we needed to do, and the migration process was smooth, thanks to Cilium’s well-documented and modular approach,” Kheirkhahan shared. You can read more about their migration in the blog on the Cilium website.
Faster Networking, Better Observability with Cilium
The team began the transition using Cilium’s chaining mode, which allowed them to retain their existing VPC CNI setup during the initial phases. This incremental approach minimized disruptions and provided a safety net for testing. One of the first milestones was the removal of kube-proxy, a major source of latency and overhead. “By removing kube-proxy and leveraging eBPF, we achieved faster TCP connections and reduced latency,” Kheirkhahan reported. “Cilium’s eBPF capabilities simplified connection tracking, removing overhead and improving speed.”
Another game-changing feature was Hubble, Cilium’s observability tool. With Hubble, the team gained real-time, layer-seven visibility into network traffic, enabling them to troubleshoot issues with unprecedented efficiency. “Cilium provided real-time, layer-seven visibility into network traffic, which was previously unavailable,” Kheirkhahan explained.
Simplifying Service Mesh with Encryption from Cilium
Cilium provided an integrated security model that strengthened defensive capabilities. Replacing the service mesh with Cilium’s WireGuard-based encryption simplified the architecture while enhancing security.
“Our service mesh’s discontinuation of open source support was a trigger for us to explore alternatives. We replaced a fragmented set of tools with Cilium, streamlining our operations significantly. Cilium’s built-in encryption capabilities were a perfect fit for what we needed from a service mesh.” – Amir Kheirkhahan, platform engineer at DB Schenker
The ability to enforce namespace isolation with Cilium’s network policies also strengthened DB Schenker’s security posture. “Cilium’s network policies gave us finer-grained control over application access,” Kheirkhahan shared. “The ability to enforce namespace isolation with Cilium enhanced our zero-trust security model.”
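As a rough illustration of what namespace isolation can look like, the sketch below builds a CiliumNetworkPolicy manifest that admits ingress only from pods in the same namespace. It is not DB Schenker’s actual policy: the helper name `namespace_isolation_policy`, the policy name, and the `payments` namespace are assumptions made for this example.

```python
import json


def namespace_isolation_policy(namespace: str) -> dict:
    """Build a CiliumNetworkPolicy manifest (hypothetical helper) that only
    admits ingress from pods running in the same namespace."""
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": f"isolate-{namespace}", "namespace": namespace},
        "spec": {
            # An empty selector applies the policy to every pod in the namespace.
            "endpointSelector": {},
            "ingress": [
                {
                    # Only peers labeled with the same namespace are allowed in.
                    "fromEndpoints": [
                        {
                            "matchLabels": {
                                "k8s:io.kubernetes.pod.namespace": namespace
                            }
                        }
                    ]
                }
            ],
        },
    }


if __name__ == "__main__":
    # "payments" is an illustrative namespace, not one named in the case study.
    print(json.dumps(namespace_isolation_policy("payments"), indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML. Once a pod is selected by an ingress policy like this one, traffic from other namespaces is dropped unless another rule allows it.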
Reduced Operational Complexity and Downtime with Cilium
The results were transformative. Pods were scheduled faster, and nodes ensured networking readiness before being marked available. Application resilience increased significantly, resulting in smoother and more consistent operational performance. “After switching to Cilium, incident rates have decreased significantly,” Kheirkhahan shared. The improved stability allowed the team to focus on strategic initiatives.
Replacing their service mesh with Cilium also eliminated unnecessary complexity. “Our microservices previously relied on service mesh functionality, but it became a limiting factor as our requirements evolved,” Kheirkhahan explained. “With Cilium’s integrated encryption and observability features, we no longer needed a separate service mesh. This streamlined our architecture and reduced operational overhead.”
Efficiency gains, such as improved resource utilization, have contributed to overall cost reductions. “We dynamically scale our AWS instances based on workload requirements,” Kheirkhahan noted. “Cilium’s efficiency allows us to make even better use of our infrastructure.”
Gateway API and Cost Optimization Future Goals
DB Schenker is always looking for ways to improve its platform for developers, and Cilium has multiple features the team is evaluating for the future. “Layer-seven visibility is the next step in our roadmap,” Kheirkhahan said. “We’re excited about the potential it offers for enhancing our security posture and operational efficiency.”
Cilium has optimized networking at DB Schenker, addressing complex challenges while enhancing performance, observability, security, and stability. By adopting an eBPF-powered solution, DB Schenker has ensured its IT infrastructure is prepared to meet the demands of a rapidly evolving logistics landscape. As Kheirkhahan said, “Cilium is not just a tool; it’s a cornerstone of our platform’s success.”