Implementing Zero Trust Security with Cilium
Challenge
WSO2 is a software company that builds products for API management, enterprise integration, and identity and access management. WSO2 recently introduced Choreo, a comprehensive Internal Developer Platform as a Service that supports the complete enterprise software engineering lifecycle. Choreo serves different roles within an organization, including developers, architects, engineering managers, and DevOps teams.
A key feature of Choreo is its observability capability, designed to provide insightful monitoring for microservices and APIs. However, supplying the libraries needed to collect observability metrics from customer workloads and applications on Choreo proved difficult. Security posed another significant challenge: the platform aimed to establish a zero trust architecture that strictly separates customers’ workloads and business data while complying with regulatory requirements.
WSO2 embarked on a search for solutions to enhance the observability capabilities of Choreo, implement comprehensive zero-trust security, and ensure compliance with various data protection regulations.
Solution
WSO2 migrated all their clusters in Azure Kubernetes Service (AKS) to Cilium and implemented network traffic encryption using Cilium’s IPsec. Additionally, they adapted their cell-based architecture to incorporate Cilium’s Layer 3 and Layer 4 network policies, which helped secure the boundaries between tenants. To generate and visualize observability data for their cell architecture, they also set up Hubble.
Implementing Cilium provided WSO2 with a complete solution to the security, observability, and networking needs of their Internal Developer Platform as a Service.
Impact
With Cilium, WSO2 now has zero trust networking for its developer platform, which is key to onboarding new customers. With Hubble, the team can also provide deep observability for customer applications.
By the numbers
60+ trillion transactions/year
20+ trillion managed API transactions/year
1+ billion managed identities/year
WSO2 is a global software company that develops products and services for API management, enterprise integration, and identity and access management. Founded in 2005, the company serves 750 direct customers, 5,000 OEM customers, and over 25,000 open source customers. Its products play a key role in the digital transformation journeys of many companies, handling more than 60 trillion transactions and managing over 1 billion identities annually.
The company operates offices in ten countries, with its primary research and development center in Sri Lanka. This center is staffed by over 400 engineers who develop WSO2’s product stack and offerings. WSO2 also has SRE teams that run operations for Choreo, its internal developer platform offered as SaaS. Choreo’s cloud data plane is a multi-tenant resource that runs on the managed Kubernetes services of several cloud providers.
The developer platform lets teams create microservices, web apps, tasks, or APIs in the languages of their choice. That flexibility made it difficult to provide consistent observability tooling across programming languages. In addition, because Choreo’s cloud data plane is shared, it requires stringent security measures, including isolated runtime environments, encrypted network traffic, and strong authentication and authorization. To address these issues, WSO2 needed to make Choreo a zero-trust platform that secures customer data and workloads while meeting regulatory requirements.
“Choreo is an internal developer platform that supports various personnel like developers and architects within the organization. With it, developers can easily write their code as microservices, web apps, tasks or even APIs in their preferred language. The first challenge was around observability. Since the microservices can be written in various languages, providing application-level libraries to collect these observability metrics was challenging. We needed a universal way to collect these metrics and offer insightful observability.
The next major challenge was security. In Choreo, when customers are using the cloud data plane, it is a shared resource. We need to protect their workload and data, isolate their runtimes, encrypt all the network traffic, and provide authentication and authorization for their business applications. Essentially what we aim to implement is a zero trust platform that provides all the necessary security requirements for a customer’s workload and business data.
In addition to these challenges, we also need to comply with various regulations. For example, with private data plane deployments, we were challenged to retain all the observability and runtime traffic within the private data plane itself without moving it into a central location.”
Lakmal Warusawithana, Senior Director – Cloud Architecture, WSO2
The WSO2 team conducted an extensive evaluation of various solutions, but found that they were either too resource intensive or incompatible with their existing architecture. Ultimately, they selected Cilium for its compatibility with their needs: it is eBPF-based, offers observability out of the box, integrates seamlessly into their architecture, and enables the implementation of zero-trust security through its network policy and transparent encryption capabilities.
“Initially, we explored different service mesh solutions, but the additional resource requirement for running a sidecar with each workload was not justifiable for us. Additionally, they had architectural limitations, particularly in retaining captured data within the user data plane which they do not natively support.
For universal observability, we identified that an eBPF-based solution would be ideal because eBPF allows us to capture metrics data without needing to instrument the application code. We initially used another eBPF-based metrics project but later encountered some architectural and stability issues.
When it came to implementing a zero trust platform, we understood that just adopting the tool was not enough; it had to be supported by the platform architecture too. For our initial implementation, we combined our cell-based architecture with Kubernetes, using namespaces and network policies. However, this approach did not meet our future requirement to complete a fully zero trust platform. This is where Cilium came into the picture.”
Lakmal Warusawithana, Senior Director – Cloud Architecture, WSO2
After selecting Cilium, WSO2 migrated all their clusters in AKS to utilize Cilium. They implemented transparent network traffic encryption using Cilium’s IPsec. Additionally, they adapted their cell-based architecture to incorporate Cilium’s Layer 3 and Layer 4 network policies, which helped secure the boundaries of the cell architecture. To generate and visualize observability data for their cell architecture, they also set up Hubble.
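The sketch below makes the idea of a cell boundary concrete. It builds a namespaced CiliumNetworkPolicy that uses only Layer 3 and Layer 4 rules and prints it as JSON, which `kubectl apply -f` accepts. It rests on three hypothetical assumptions:

- each cell maps to one Kubernetes namespace;
- outside traffic reaches a cell only through a gateway in a namespace called `cell-gateway`, labeled `app: gateway`;
- cell workloads listen on TCP port 8080.

These names are illustrative and are not taken from WSO2’s configuration.

```python
import json


def cell_boundary_policy(cell_ns: str, gateway_ns: str = "cell-gateway",
                         port: int = 8080) -> dict:
    """Build a CiliumNetworkPolicy that admits traffic into a cell only from
    workloads in the same cell (namespace) or from the cell gateway.

    The gateway namespace, its label, and the port are hypothetical.
    """
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": "cell-boundary", "namespace": cell_ns},
        "spec": {
            # An empty selector selects every endpoint in the policy's namespace.
            "endpointSelector": {},
            "ingress": [
                # L3: in a namespaced policy, an empty fromEndpoints selector
                # matches only endpoints in the same namespace (the same cell).
                {"fromEndpoints": [{}]},
                # L3 + L4: the gateway may enter the cell, but only on one TCP port.
                {
                    "fromEndpoints": [{
                        "matchLabels": {
                            "k8s:io.kubernetes.pod.namespace": gateway_ns,
                            "app": "gateway",
                        }
                    }],
                    "toPorts": [{"ports": [{"port": str(port), "protocol": "TCP"}]}],
                },
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(cell_boundary_policy("tenant-a-cell"), indent=2))
```

The policy selects every endpoint in the cell and defines ingress rules. Cilium therefore drops ingress traffic these rules do not allow, unless another policy permits it, so workloads in other tenants’ cells cannot reach this cell directly.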
“We migrated all our clusters to Cilium, enhancing our cell-based architecture with Cilium’s advanced Layer 3 and Layer 4 network policies. Our setup also includes Hubble, integrated with Layer 3 policies for observability within the cell. We encrypted our network traffic with Cilium’s IPsec transparent encryption, ensuring all our outbound data transfers are encrypted. The distinctive advantage of Cilium’s transparent encryption lies in its kernel-level operation, which gives it a significant performance edge over user-space solutions.”
Lakmal Warusawithana, Senior Director – Cloud Architecture, WSO2
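In Cilium’s IPsec setup, the documentation has operators store the IPsec key in a Kubernetes Secret named `cilium-ipsec-keys` in `kube-system`. Its `keys` entry is a single string made of four parts:

- a key ID;
- the algorithm `rfc4106(gcm(aes))`;
- 20 random bytes in hex;
- the 128-bit ICV length.

The sketch below builds that string with Python’s `secrets` module. The exact format varies across Cilium versions, including the `+` suffix on the key ID in recent releases, so treat it as an assumption to verify against the docs for your version. This is not WSO2’s tooling.

```python
import secrets


def ipsec_key_string(key_id: int = 3, per_tunnel_keys: bool = True) -> str:
    """Build a value for the `keys` field of the cilium-ipsec-keys Secret.

    rfc4106(gcm(aes)) with a 128-bit AES key takes 20 bytes of key material:
    a 16-byte AES key followed by a 4-byte salt. The trailing 128 is the ICV
    (authentication tag) length in bits. The "+" suffix format is assumed
    from recent Cilium documentation.
    """
    if key_id < 1:
        raise ValueError("key ID must be a positive integer")
    key_material = secrets.token_hex(20)  # 20 random bytes -> 40 hex characters
    prefix = f"{key_id}+" if per_tunnel_keys else str(key_id)
    return f"{prefix} rfc4106(gcm(aes)) {key_material} 128"


if __name__ == "__main__":
    print(ipsec_key_string())
```

The resulting string would be passed to `kubectl create -n kube-system secret generic cilium-ipsec-keys --from-literal=keys="..."`, with encryption enabled in Cilium’s Helm values (`encryption.enabled=true`, `encryption.type=ipsec`).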
Migrating their platform to Cilium provided WSO2 with a complete solution for their security, observability, and networking needs.
“Cilium provides more than just a CNI, it’s a complete service mesh solution. To us, Cilium is a single solution that covers a large number of our platform feature requirements. Before, we couldn’t find a single solution to all of our challenges but when we used Cilium, it was a perfect match. It provided all the network level functionality, all of the observability requirements, as well as the service mesh functionality.”
Lakmal Warusawithana, Senior Director – Cloud Architecture, WSO2
Providing Observability to Customers With Hubble
After setting up Cilium and configuring its network policies, the team also set up Hubble to offer observability for customers’ workloads and to debug performance issues when they arose.
“We use Hubble in two scenarios. One is to provide an observability view to our customers, which we call our cell-based observability view. They can come to our Choreo dashboard and view their observability data like error count, request count, latency and the different HTTP status codes. All this data is captured from Hubble Relay, which we collect and store in Azure ADX and then visualize. Second, if a customer has some performance issues, we directly use the Hubble CLI to look at the Layer 3 packets to see where things are failing and what optimization we can do. Ultimately, we use Hubble CLI and its observability data to optimize our network traffic, find out issues, and resolve them.”
Lakmal Warusawithana, Senior Director – Cloud Architecture, WSO2
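The first scenario derives request counts, error counts, latency, and status codes from Hubble. It can be approximated with a short script over Hubble’s JSON flow output. The sketch below reads flows such as those printed by `hubble observe --namespace <cell-namespace> --type l7 -o json` and summarizes HTTP responses.

Several details are assumptions:

- the JSON field names (`l7`, `type`, `latency_ns`, `http.code`) are modeled on Hubble’s flow protobuf and may differ between versions;
- counting 5xx responses as errors is a choice made for this sketch;
- the file name in the usage comment is hypothetical.

WSO2’s actual pipeline collects data from Hubble Relay and stores it in Azure ADX, which this sketch does not reproduce.

```python
import json
import math
import sys
from collections import Counter


def summarize_http_flows(lines, error_threshold=500):
    """Summarize Hubble L7 HTTP response flows, one JSON flow per line.

    Field names are assumed to follow Hubble's flow protobuf
    (l7.type, l7.latency_ns, l7.http.code); verify against your version.
    """
    codes = Counter()
    latencies_ms = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        flow = record.get("flow", record)  # accept wrapped or bare flow objects
        l7 = flow.get("l7") or {}
        http = l7.get("http")
        # Only responses carry a status code and a request/response latency.
        # The type may be serialized as the enum name or its number (2).
        if l7.get("type") not in ("RESPONSE", 2) or not http:
            continue
        codes[int(http.get("code", 0))] += 1
        if "latency_ns" in l7:
            # uint64 values may be encoded as JSON strings; int() accepts both.
            latencies_ms.append(int(l7["latency_ns"]) / 1_000_000)

    latencies_ms.sort()
    n = len(latencies_ms)
    return {
        "requests": sum(codes.values()),
        "errors": sum(c for code, c in codes.items() if code >= error_threshold),
        "status_codes": dict(codes),
        "avg_latency_ms": sum(latencies_ms) / n if n else None,
        # Nearest-rank 95th percentile.
        "p95_latency_ms": latencies_ms[math.ceil(0.95 * n) - 1] if n else None,
    }


if __name__ == "__main__":
    # e.g. hubble observe --type l7 -o json | python summarize_flows.py
    print(json.dumps(summarize_http_flows(sys.stdin), indent=2))
```

HTTP-level flows only appear for traffic that Cilium parses at Layer 7, for example traffic selected by an L7 policy. The request count here is approximated by counting responses.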
Cilium Onboards Customers Faster
Cilium has become an integral part of WSO2’s Choreo platform, addressing its security, networking, and observability needs in a single solution. As a result, the platform team can onboard customers to Choreo more quickly.
“Since we are providing an Internal Developer Platform to our customers, zero trust security is key for them to come onboard into our system. They have to make sure that we are protecting their data and isolating their runtime environment. These directly impact our business in terms of onboarding customers and Cilium plays a major role in making this possible. With Cilium, all our customers can make sure their data is protected even though they are using shared resources within our cloud.”
Lakmal Warusawithana, Senior Director – Cloud Architecture, WSO2
With Cilium now a key part of the platform, WSO2 is exploring additional uses for it. The team is currently implementing egress control and service authorization within its cell-based architecture and is evaluating Tetragon for runtime enforcement. WSO2 is also considering adopting more Cilium service mesh features and evaluating the Cilium node proxy as an internal API gateway.
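Egress control of this kind is commonly written in Cilium as an egress policy with two parts. First, it allows DNS lookups through Cilium’s DNS proxy. Second, it permits connections only to named external hosts via `toFQDNs`.

The sketch below builds such a CiliumNetworkPolicy. The cell namespace and the `api.example.com` destination are hypothetical. The kube-dns labels assume the cluster DNS pods run in `kube-system` with the label `k8s-app=kube-dns`. The policy does not describe WSO2’s actual rules.

```python
import json


def egress_allowlist_policy(cell_ns, allowed_hosts):
    """Build a CiliumNetworkPolicy restricting a cell's egress traffic.

    The namespace and host names passed in are hypothetical placeholders.
    """
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": "cell-egress-allowlist", "namespace": cell_ns},
        "spec": {
            "endpointSelector": {},  # every endpoint in the cell's namespace
            "egress": [
                # DNS to kube-dns, inspected by Cilium's DNS proxy so that the
                # toFQDNs rule below learns which IPs the allowed names resolve to.
                {
                    "toEndpoints": [{"matchLabels": {
                        "k8s:io.kubernetes.pod.namespace": "kube-system",
                        "k8s:k8s-app": "kube-dns",
                    }}],
                    "toPorts": [{
                        "ports": [{"port": "53", "protocol": "ANY"}],
                        "rules": {"dns": [{"matchPattern": "*"}]},
                    }],
                },
                # HTTPS only to the explicitly allowed external hosts.
                {
                    "toFQDNs": [{"matchName": host} for host in allowed_hosts],
                    "toPorts": [{"ports": [{"port": "443", "protocol": "TCP"}]}],
                },
                # Traffic between workloads inside the same cell stays allowed.
                {"toEndpoints": [{}]},
            ],
        },
    }


if __name__ == "__main__":
    policy = egress_allowlist_policy("tenant-a-cell", ["api.example.com"])
    print(json.dumps(policy, indent=2))
```

Because the policy defines egress rules for every endpoint in the cell, Cilium drops any other outbound traffic from those endpoints unless another policy allows it.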
“To further increase the security of our platform, we are currently examining Tetragon for runtime security, egress control and fine-grained access control for interservice communication, and mutual authentication with Cilium Service Mesh.
We are also looking to introduce a few more service mesh features such as retries, circuit breakers, and advanced deployment strategies with traffic partitioning using Cilium’s Gateway API implementation. Finally, we started experimenting with using the Cilium node proxy as an internal API gateway. We are a company that provides API management as a solution, so we have an Envoy proxy as an API gateway. Since Cilium already has an Envoy node proxy, we are looking at doing a POC where we can reuse the node proxy as a gateway itself without running a separate one.”
Lakmal Warusawithana, Senior Director – Cloud Architecture, WSO2
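Traffic partitioning with the Gateway API is typically expressed as an HTTPRoute. The route’s rule lists several `backendRefs`, each with a relative `weight`. The sketch below builds a canary-style HTTPRoute that sends a given percentage of requests to a new version of a service.

The Gateway name `cell-gateway`, the Service names, and the port are hypothetical. The sketch assumes Cilium’s Gateway API support is enabled and the `gateway.networking.k8s.io/v1` CRDs are installed.

```python
import json


def canary_route(ns, stable_svc, canary_svc, canary_percent,
                 port=8080, gateway="cell-gateway"):
    """Build an HTTPRoute splitting traffic between two Services by weight.

    Weights are relative; because the two weights sum to 100 they act as
    percentages. All names and the port are hypothetical placeholders.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": f"{stable_svc}-split", "namespace": ns},
        "spec": {
            "parentRefs": [{"name": gateway}],  # the Gateway this route attaches to
            "rules": [{
                "backendRefs": [
                    {"name": stable_svc, "port": port, "weight": 100 - canary_percent},
                    {"name": canary_svc, "port": port, "weight": canary_percent},
                ],
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(canary_route("tenant-a-cell", "orders-v1", "orders-v2", 10), indent=2))
```

To roll out gradually, raise `canary_percent` in steps and re-apply the route. Setting it to 0 sends all traffic back to the stable Service.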
If you would like to read more about WSO2’s Choreo, cell-based architecture and how Cilium fits in, you can read about it in this blog.