Member post by Kyuho Han, SK Telecom

Background: Age of collaboration

Since the World Economic Forum (WEF) 2021, the "great reset" of our society through digital transformation has been accelerating.

In Korea, digital transformation is accelerating not only in tech industries such as games, search, and telecommunications, but also in traditional industries such as education, real estate, and finance. In the financial industry in particular, the MyData project, which was revamped in 2021, allows many different providers to offer financial services based on data analysis.

“MyData is a policy initiated by the South Korean government that enables individuals to directly manage and utilize their own data. The initiative aims to return data sovereignty to individuals, enabling them to receive a variety of customized services. As such, service providers are required to pass personal data to third parties via standardized APIs upon customer request.”

Traditional industries often cannot develop such services with their own workforce alone, so multiple development organizations need to build services as microservices and cooperate. Kubernetes fulfills the role of PaaS for building microservices in this era.

SK Telecom provides TKS (SK Telecom Kubernetes Service) to traditional business fields such as finance and broadcasting, with the following benefits:

  1. Efficient use of resources
  2. Increased operational efficiency through standardization of operational functions such as failover and auto scaling
  3. Unified visibility into infrastructure and services

New Challenge: Distributed and implicit security configurations

As services grow and become more personalized, more and more microservices are being developed, deployed, and operated. This has naturally led to demand for multi-tenant configurations, where a single Kubernetes cluster is shared by microservices built by multiple development organizations, sometimes with separate organizations handling development and operations.

In most cases, multiple Kubernetes clusters are created and managed according to geo-redundancy and purpose (DEV/STG/PRD). Therefore, different security policies must be applied for each microservice and for each cluster purpose.

As a basic feature, TKS supports multi-cluster user management, RBAC by user and group, and single sign-on (SSO) to the dashboard, various services, and the Kube API.

However, these features alone do not allow for more granular control. We therefore wrote various security guides and configured the CI/CD pipeline, Kubernetes RBAC, and various IT services to comply with them. Some parts of the guides could only be implemented as guidelines for developers and operators to follow.

Although this setup appears to solve everything, there was a high risk of hard-to-find security threats caused by misunderstood security guides or incorrect configuration of individual systems. In addition, making policy compliance a duty of developers and operators placed a burden on them and made it difficult for them to make full use of their individual capabilities.

Especially difficult was the fact that the security configurations implementing the security policy were distributed across multiple IT systems. This not only made the configurations difficult to manage, but also required integration with multiple IT systems to establish unified monitoring and auditing of security violations. The growing number of management points made the system harder to maintain.

Solution: Kubernetes admission control as Code

Before we explain the solution we chose, let’s start with the image we have in our minds. Many construction sites or factories use Travel Restraint Systems. These are devices that protect workers from hazards by restricting their radius of motion. They are very simple, intuitive, and minimally restrictive of the worker’s movement. They can also provide safety without much effort or awareness on the part of the worker.


Building microservices with Kubernetes (K8s), where various stakeholders collaborate with each other, is similar. To innovate, we need to ensure maximum convenience for developers and operators. We also need to give them a system that keeps them from breaching security or creating security holes without realizing it.

We call this system a governance system. The requirements of a governance system can be summarized in three main points.

Clear policy descriptions

During the policy formulation phase, it’s important to clearly state what the policy is – you don’t want people looking at the policy and having different interpretations. However, a clear policy description may not express the intent of the policy. We struggled with the choice between clarity and intent. In the end, we decided that intent can lead to differences in interpretation, which can be confusing for developers/operators, so we prioritized clarity.

We therefore adopted Policy as Code, in which the policy itself is described clearly as code.
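To make the idea concrete, here is a minimal sketch of a policy written as code. It uses Python purely for illustration; it is not how TKS or any particular policy engine expresses policies. The required labels and the function name `require_labels` are hypothetical. The point is that the rule becomes an executable, testable artifact that leaves little room for differing interpretations.

```python
# Hypothetical policy: every workload must carry these labels.
REQUIRED_LABELS = ("owner", "cost-center")


def require_labels(obj: dict, required=REQUIRED_LABELS) -> list:
    """Return one violation message per missing label (empty list means compliant)."""
    labels = (obj.get("metadata") or {}).get("labels") or {}
    return [f"missing required label: {name}" for name in required if name not in labels]


if __name__ == "__main__":
    pod = {"kind": "Pod", "metadata": {"name": "web", "labels": {"owner": "team-a"}}}
    for violation in require_labels(pod):
        print(violation)
```

Because the policy is a plain function over the object, it can be unit-tested and reviewed like any other code before it is enforced anywhere.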

Accurate policy enforcement

The existing system works by applying a defined policy to multiple systems. The settings of each system are designed for the unique purpose of that system. Therefore, no matter how clearly a policy is described, a gap can arise when the policy is translated into settings for each system it applies to. This gap grows with the number of individual systems that must be configured, and so does the chance of malfunction due to incorrect settings.

Therefore, the best practice is to have a single policy enforcement target. In Kubernetes, every request goes through the Kube API, so everything can be controlled just before a request is persisted by using the admission control extension points that the Kube API Server provides.
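As a rough illustration of what a single enforcement point looks like, the sketch below builds the response a validating admission webhook returns to the Kube API Server. The `AdmissionReview` field names (`request.uid`, `response.allowed`, `response.status`) follow the `admission.k8s.io/v1` API; the policy `no_privileged_containers` and the function `review` are hypothetical, and a real webhook would additionally need an HTTPS server and a webhook registration.

```python
def no_privileged_containers(obj: dict) -> list:
    """Hypothetical policy: reject containers that request privileged mode."""
    containers = (obj.get("spec") or {}).get("containers") or []
    return [
        f"container {c.get('name', '?')} must not be privileged"
        for c in containers
        if (c.get("securityContext") or {}).get("privileged") is True
    ]


def review(admission_review: dict, policies) -> dict:
    """Evaluate every policy against the incoming object and build the response."""
    request = admission_review["request"]
    obj = request.get("object") or {}  # "object" is null for DELETE requests
    violations = [msg for policy in policies for msg in policy(obj)]
    response = {"uid": request["uid"], "allowed": not violations}
    if violations:
        response["status"] = {"code": 403, "message": "; ".join(violations)}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Every policy is evaluated in one place, so there is no per-system translation step where the policy and its settings could drift apart.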

Visibility

After policies are applied, you need to observe how well they are adhered to, so you can remove unnecessary restrictions and continue to add necessary ones. The structure for visibility should be built on the same principles as the structure for accurate policy enforcement. Having fewer points of policy enforcement, ideally a single one, makes it simpler to monitor policy enforcement.
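When all decisions come from one enforcement point, monitoring can be as simple as aggregating one stream of decision records. The sketch below assumes a hypothetical record shape with `policy`, `namespace`, and `allowed` keys; real tooling would define its own format.

```python
from collections import Counter


def summarize_denials(decisions) -> Counter:
    """Count denials per (policy, namespace) from a stream of decision records."""
    return Counter(
        (d["policy"], d["namespace"]) for d in decisions if not d["allowed"]
    )


if __name__ == "__main__":
    # Hypothetical decision log collected at the single enforcement point.
    decisions = [
        {"policy": "require-labels", "namespace": "payments", "allowed": False},
        {"policy": "require-labels", "namespace": "payments", "allowed": False},
        {"policy": "no-privileged", "namespace": "media", "allowed": False},
        {"policy": "no-privileged", "namespace": "payments", "allowed": True},
    ]
    for (policy, namespace), count in summarize_denials(decisions).most_common():
        print(f"{policy} in {namespace}: {count} denial(s)")
```

A summary like this shows which policies are frequently violated and where, which helps decide which restrictions to relax and which to add.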

Below is our proposed governance system.

[Image: proposed governance system]

Open Source

The Kube API Server in Kubernetes can be extended through admission webhooks.

[Image: admission webhooks in the Kube API Server]

Two representative tools that can provide governance as policy as code at admission time are OPA Gatekeeper, which runs as an admission webhook, and Validating Admission Policy, an in-process alternative to validating webhooks that became a stable feature in Kubernetes 1.30.

Both technologies express policies as Kubernetes API resources (Custom Resources for OPA Gatekeeper, built-in resources for Validating Admission Policy), so you can manage the deployment of policies through a general pipeline like GitOps.

The two solutions can be briefly compared as follows.

| | OPA Gatekeeper | Validating Admission Policy |
|---|---|---|
| Policy as Code language | Rego | CEL |
| Features | Validating, Mutating, Audit, External Data Support, Rich policy library | Validating |
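Because policies are deployed as Kubernetes resources, a pipeline can generate one manifest per cluster purpose and commit it to a GitOps repository. The sketch below is an assumption-heavy illustration: it presumes a `K8sRequiredLabels` ConstraintTemplate like the one in the Gatekeeper documentation is already installed and accepts a list of label names, and the mapping from cluster purpose to `enforcementAction` is hypothetical.

```python
import json

# Hypothetical mapping from cluster purpose to Gatekeeper enforcement action.
ENFORCEMENT_BY_PURPOSE = {"DEV": "warn", "STG": "dryrun", "PRD": "deny"}


def required_labels_constraint(purpose: str, labels: list) -> dict:
    """Build a Gatekeeper constraint requiring the given labels on Namespaces."""
    return {
        "apiVersion": "constraints.gatekeeper.sh/v1beta1",
        "kind": "K8sRequiredLabels",  # assumes this ConstraintTemplate exists
        "metadata": {"name": f"ns-required-labels-{purpose.lower()}"},
        "spec": {
            "enforcementAction": ENFORCEMENT_BY_PURPOSE[purpose],
            "match": {"kinds": [{"apiGroups": [""], "kinds": ["Namespace"]}]},
            "parameters": {"labels": list(labels)},
        },
    }


if __name__ == "__main__":
    # JSON manifests can be applied the same way as YAML ones.
    for purpose in ENFORCEMENT_BY_PURPOSE:
        print(json.dumps(required_labels_constraint(purpose, ["owner"]), indent=2))
```

Generating the manifests from one function keeps the rule identical across clusters while only the enforcement strictness varies by purpose.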

We were able to develop 20 policies in advance through interviews with our customers. These policies required not only validation but also mutation, and some required validation against external data. Therefore, we selected OPA Gatekeeper as the final solution.
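To show what these two extra requirements mean at the admission level, the sketch below is written in Python rather than in Gatekeeper's own mutation and external data resources. The `patchType` and base64-encoded `patch` fields follow the `admission.k8s.io/v1` response format; the default labels, the function names, and the `is_approved` lookup (standing in for an external data source) are hypothetical.

```python
import base64
import json


def _escape(key: str) -> str:
    """Escape a key for use in a JSON Pointer path (RFC 6901)."""
    return key.replace("~", "~0").replace("/", "~1")


def mutate_response(request: dict, default_labels: dict) -> dict:
    """Build the inner admission response that adds any missing default labels."""
    labels = ((request.get("object") or {}).get("metadata") or {}).get("labels")
    if labels is None:
        # No labels map yet: create it in one operation.
        patch = [{"op": "add", "path": "/metadata/labels", "value": dict(default_labels)}]
    else:
        patch = [
            {"op": "add", "path": f"/metadata/labels/{_escape(k)}", "value": v}
            for k, v in default_labels.items()
            if k not in labels
        ]
    response = {"uid": request["uid"], "allowed": True}
    if patch:
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()
    return response


def validate_images(obj: dict, is_approved) -> list:
    """Validate images with a lookup backed by external data (e.g. an allowlist)."""
    containers = (obj.get("spec") or {}).get("containers") or []
    return [
        f"image not approved: {c['image']}"
        for c in containers
        if not is_approved(c["image"])
    ]
```

Mutation changes the object through a patch instead of rejecting it, while external data validation depends on information that lives outside the cluster, which is why a validation-only tool did not cover our 20 policies.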

As mentioned earlier, the products/services that organizations offer, the technology they use, and the way people work are constantly changing, so policies need to be constantly changing as well. We need the ability to easily change and create policies, and easily import best practices.

OPA Gatekeeper has a learning curve because it requires knowledge of both Kubernetes and Rego, a less widely used language. Further development is therefore needed to make creating and editing policies easier and to make policy updates seamless.