Guest post originally published on the Magalix blog by Mohamed Ahmed
What Is OPA?
Open Policy Agent (OPA) is an open-source project, started in 2016, that aims to unify policy enforcement across different technologies and systems. Today, OPA is used by major players in the tech industry. For example, Netflix uses OPA to control access to its internal API resources, and Chef uses it to provide IAM capabilities in its end-user products. Many other companies, such as Cloudflare and Pinterest, use OPA to enforce policies on their platforms (for example, Kubernetes clusters). OPA is a CNCF project: it was incubating when this post was first published and graduated in 2021.
What Does OPA Bring To The Table?
You may be wondering how OPA came about and what problems it tries to solve. Policy enforcement over APIs and microservices is as old as microservices themselves; there has never been a production-grade application that didn’t enforce access control, authorization, or policy of some kind. To understand the role of OPA, consider the following use case: your company sells laptops through an online portal. Like similar applications, the portal has a front page where clients see the latest offerings and perhaps some limited-time promotions. Customers who want to buy something need to log in or create an account, then pay by credit card or another method. To keep clients coming back, you invite them to sign up for your newsletter, which may contain special discounts, and they may opt in to browser notifications as soon as new products are announced. A very typical online shopping app, right? Now, let’s depict that workflow in a diagram to visualize the process:
The diagram above shows how our system might look internally: a number of microservices that communicate with each other to serve our customers. Obviously, Bob, a customer, shouldn’t see any of the internal workings of the system. For example, he shouldn’t be able to view (or even know about) the S3 bucket where payments get archived, or which services the notification API can talk to. But what about John? He’s one of our application developers, and he needs access to all the microservices to troubleshoot and debug when issues occur. Or does he? What if he accidentally (or intentionally) made an API call to the database service to change a customer’s delivery address? Even worse, what if he had read permissions on the customers’ credit card numbers? To address those risks, we place an authorization control in front of each microservice. The control checks whether the authenticated user has the privileges required to perform the requested operation. Such an authorization system may be internal and home-grown, or external, such as AWS IAM. That’s how a typical microservices application is built and secured. But consider the drawbacks of using several assorted authorization systems, especially as the application grows:
- Modifying existing policies, or introducing new ones, is a nightmare. Just think of how many places you’ll need to visit to give Alice read access to all storage-related systems. This means S3, MySQL, MongoDB, and perhaps an external API to name a few.
- There’s no standard way for developers to enforce policies on their own systems. They can hardcode authorization logic in the application, but that makes things worse: unifying policies across different microservices becomes very complicated.
- Adding to the previous point, introducing a new policy to local services may require code changes and, therefore, new releases of every affected microservice.
- What if you want to integrate policies with an existing user database, such as the HR database?
- You’ll need to visualize the policy to ensure that it’s doing what it’s supposed to do. This becomes increasingly important as your policies get more complex.
- Modern systems comprise multiple technologies and services which are written in different languages. For example, you may have the core of your system running on Kubernetes, and a bunch of legacy APIs that are not part of the cluster written in Java, Ruby, and PHP. Each platform has its own authorization mechanism.
Let’s look at Kubernetes as an example. If all users had unrestricted access to the entire cluster, lots of nasty things could happen, such as:
- Allowing pods to run without bounded resource requests and limits, which can starve nodes of resources and cause random pods to be evicted.
- Pulling and using untested, haphazard images that may contain security vulnerabilities or malicious content.
- Exposing Ingresses without TLS, allowing unencrypted, unsecured traffic to reach the application.
- Numerous other unforeseen risks due to the overall complexity.
You can definitely use RBAC and Pod security policies (PodSecurityPolicy has since been replaced by Pod Security Admission) to impose fine-grained control over the cluster. But again, this only applies to the cluster: Kubernetes RBAC does nothing for the systems that live outside it.
That’s where Open Policy Agent (OPA) comes into play. OPA was introduced to provide a unified way of enforcing policy across the entire stack.
How Does OPA Work?
Earlier, we explored policy-enforcement strategies and the problems OPA tries to solve; that was the “what” part. Now, let’s take a look at the “how.”
Let’s say that you’re implementing the Payments service of our example application. This service is responsible for handling customer payments. It exposes an API that accepts payments from customers and also lets you query the payments made by a specific customer. So, to obtain an array containing the purchases made by Jane, one of the company’s customers, you send a GET request to the API with the path /payments/jane, providing your credentials in the Authorization header. The response is a JSON array with the data you requested. However, since you don’t want just anyone with network access to the Payments API to see such sensitive data, you need to enforce an authorization policy. OPA addresses the issue in the following way:
- The Payments API queries OPA for a decision. It accompanies this query with some attributes like the HTTP method used in the request, the path, the user, and so on.
- OPA evaluates those attributes against the policies and data already loaded into it.
- OPA returns its decision, for example allow or deny, to the requesting API.
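To make the three steps above concrete, here is a minimal Python sketch of the Payments service’s side of the exchange. It assumes OPA runs locally with its REST API on the default port 8181 and that the policy lives in a hypothetical package named payments with a rule named allow. OPA’s Data API accepts a POST to /v1/data/<package path>/<rule> with the attributes wrapped in an "input" key and returns the decision under "result"; the helper names below are illustrative and not part of any OPA SDK.

```python
import json
import urllib.request

# Hypothetical package "payments" with a rule named "allow"; 8181 is OPA's default port.
OPA_URL = "http://localhost:8181/v1/data/payments/allow"


def build_input(method: str, path: str, user: str) -> dict:
    """Turn an incoming HTTP request into the attributes OPA will evaluate."""
    # "/payments/jane" -> ["payments", "jane"]
    segments = [s for s in path.strip("/").split("/") if s]
    return {"input": {"method": method, "path": segments, "user": user}}


def parse_decision(body: dict) -> bool:
    """Read OPA's answer: only an explicit true result counts as allow."""
    return body.get("result") is True


def ask_opa(method: str, path: str, user: str) -> bool:
    """Send the decision query to OPA and return True (allow) or False (deny)."""
    payload = json.dumps(build_input(method, path, user)).encode("utf-8")
    request = urllib.request.Request(
        OPA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=1) as response:
        return parse_decision(json.load(response))
```

A request handler would call ask_opa before returning any payment data and respond with 403 Forbidden when it returns False. Splitting the path into segments is a design choice that keeps path matching simple on the policy side.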
The important thing to notice here is that OPA decouples our policy decision from policy enforcement. The OPA workflow can be depicted in the following diagram:
OPA is a general-purpose, domain-agnostic policy engine. It can be integrated with APIs, the Linux SSH daemon, an object store like Ceph, and more. OPA’s designers deliberately avoided tying it to any other project. Accordingly, policy queries and decisions do not follow a fixed schema: you can send any valid JSON as the request attributes, as long as it contains the data the policy needs. Similarly, the decision coming back from OPA can be any valid JSON. You choose what goes in and what comes out. For example, you can have OPA return a boolean true or false, a number, a string, or even a complex object.
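Because OPA imposes no output schema, the enforcing service and the policy author have to agree on what a decision looks like. The Python sketch below assumes two hypothetical conventions, a bare boolean or an object such as {"allow": false, "reason": "..."}, and treats anything else as a deny so that a malformed decision can never grant access.

```python
from typing import Any, Tuple


def interpret_decision(result: Any) -> Tuple[bool, str]:
    """Normalize an OPA decision into (allowed, reason).

    Supported (hypothetical) conventions:
      - a bare boolean:            true / false
      - an object with a reason:   {"allow": false, "reason": "..."}
    Any other shape is treated as a deny.
    """
    if isinstance(result, bool):
        return result, "allowed" if result else "denied by policy"
    if isinstance(result, dict) and isinstance(result.get("allow"), bool):
        return result["allow"], str(result.get("reason", ""))
    # Numbers, strings, None, or unexpected objects: fail closed.
    return False, "unrecognized decision format"
```

Returning a reason alongside the verdict is optional, but it makes denials much easier to explain to users and to debug.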
OPA Internals
To fully understand OPA and start implementing it in your own projects, you must familiarize yourself with its features and components. Let’s start with how you define your policies.
Policy Language: Rego
Rego is a high-level declarative language built specifically for OPA. It makes it easy to define policies that answer questions like: Is Bob allowed to perform a GET request on /api/v1/products? Which records is he actually allowed to view?
Deployment
When it comes to deploying OPA, you have more than one option depending on your specific scenario:
- As a Go library: if your application is written in Go, you can embed OPA as a library in the application.
- As a daemon: if you’re not using Go, you can deploy OPA like any other service, as a daemon. In this case, it’s recommended that you run it as a sidecar container or at the host level, because this design improves performance and availability. Imagine that OPA is deployed in Kubernetes in a separate pod that happens to run on a different node from your application pod. Every time your service needs a policy decision, it has to make a call over the network to reach the pod where OPA is running. This introduces unnecessary latency and may slow the application down at peak times.
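The sidecar recommendation can be sketched from the client side as well. The Python snippet below assumes OPA is reachable on 127.0.0.1:8181, as it would be from a sidecar or a host-level daemon, and uses a deliberately tight, hypothetical 50 ms timeout; if OPA is slow or unreachable, the request is denied rather than left waiting. Failing closed is one reasonable design choice, not the only one, and the post parameter exists only so the transport can be swapped out in tests.

```python
import json
import urllib.request
from typing import Callable

# Assumption: OPA runs as a sidecar or host-level daemon, so it is reachable on
# localhost instead of across the network. The package path is hypothetical.
SIDECAR_URL = "http://127.0.0.1:8181/v1/data/payments/allow"
TIMEOUT_SECONDS = 0.05  # hypothetical latency budget for a local call


def http_post(url: str, payload: dict, timeout: float) -> dict:
    """POST a JSON payload to OPA and decode its JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def decide(
    payload: dict,
    post: Callable[[str, dict, float], dict] = http_post,
    url: str = SIDECAR_URL,
    timeout: float = TIMEOUT_SECONDS,
) -> bool:
    """Ask OPA for a decision and fail closed if it is slow or unreachable."""
    try:
        return post(url, payload, timeout).get("result") is True
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses; ValueError covers
        # a response body that is not valid JSON. Either way, deny the request.
        return False
```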
How To Manage And Control OPA?
To further reduce latency, OPA’s designers decided that it should keep all policy data in memory, so OPA never has to query another service for data while making a decision. To manage OPA, you have a set of APIs that serve different purposes:
- Bundle service API: used for delivering policies and data to OPA. OPA periodically polls the bundle service for new versions of the policy; when it finds one, it downloads and activates it.
- Status service API: OPA reports its status to this service, including which policy version is currently active.
- Decision log service API: every time OPA makes a policy decision, it logs it and later sends batches of those logs to the decision log service. This is particularly useful for auditing and troubleshooting.
- Tools for building, testing, and debugging policies: a set of command-line tools such as opa test, opa run, opa check, and so on. There’s also a VS Code plugin available to ease development.
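To give a feel for what a bundle service actually serves, the Python sketch below packs a policy, a data document, and a manifest into a gzipped tarball, which is the format OPA bundles use. The file layout, revision string, and helper name are assumptions for illustration; the revision in .manifest is the version string OPA reports as active. In practice, opa build can produce bundles for you.

```python
import io
import json
import tarfile

# A payments policy like the one written later in this post.
POLICY = """package play

default allow = false

allow = true {
    input.method = "GET"
    input.path = ["payments", customer_id]
    input.user = customer_id
}
"""


def build_bundle(policy: str, data: dict, revision: str) -> bytes:
    """Pack a policy, a data document, and a manifest into a .tar.gz bundle."""
    files = {
        "play/policy.rego": policy.encode("utf-8"),
        "data.json": json.dumps(data).encode("utf-8"),
        # The revision is the version string OPA reports as active.
        ".manifest": json.dumps({"revision": revision}).encode("utf-8"),
    }
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buffer.getvalue()
```

Serving the returned bytes from any HTTP endpoint that OPA is configured to poll is enough to roll out a new policy version without redeploying the services that consult OPA.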
Your First OPA Policy
By now you should have a pretty clear picture of why OPA came into existence, the problems it tries to solve, and the way it was designed and managed. It’s time to test the waters and see what it’s like to create a policy in the Rego language. The first step is to define your policy in plain English. For example:
“Customers should be able to view their own payments. Financial department staff should be able to view any customer payment.”
The next step is to convert the policy to Rego. We can use the Rego playground for this: in the main panel, clear the existing code and add the following:
package play
# Customers should be able to view their own payments
allow = true {
    input.method = "GET"
    input.path = ["payments", customer_id]
    input.user = customer_id
}
Let’s review this snippet line by line:
- Any lines that start with the hash sign (#) are comments. It’s always a good practice to write what your policy is supposed to do as a coherent, human-readable comment.
- allow = true { ... } means that the allow decision is true if every expression in the body that follows evaluates to true.
- The request method must be GET. Any other HTTP method (POST, PUT, etc.) will not satisfy this rule.
- The path must be /payments/customer_id. Notice that customer_id is not quoted, which means it’s a variable: when the rule is evaluated, Rego binds it to the second element of input.path.
- The user must be that same customer_id.
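For readers more at home in a general-purpose language, here is the same rule mirrored in Python. This is only an illustration of the logic (OPA evaluates the Rego, not this code), and it returns False in the cases where Rego would simply leave allow undefined.

```python
def customer_can_view(inp: dict) -> bool:
    """Python mirror of the first Rego rule (illustration only)."""
    if inp.get("method") != "GET":
        return False
    path = inp.get("path")
    # Rego's input.path = ["payments", customer_id] checks the shape of the
    # path and binds customer_id to its second element in one step.
    if not (isinstance(path, list) and len(path) == 2 and path[0] == "payments"):
        return False
    customer_id = path[1]
    # input.user = customer_id: the user must equal the bound value.
    return inp.get("user") == customer_id
```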
If we were to translate this code back to plain English, it’d look something like:
“Allow the request if it uses the GET method, the path is /payments/customer_id, and the user is that same customer_id.” This effectively allows a customer to view her own payment data.
The Rego playground also allows you to evaluate your code and make sure that the policy will work as expected. In the INPUT panel, we can fake a legitimate request by adding the following code:
{
"method": "GET",
"path": ["payments","bob"],
"user": "bob"
}
Notice that the INPUT is arbitrary JSON; there is no fixed schema to follow when supplying the request. Now, let’s see how OPA responds to this decision request by pressing the Evaluate button. The OUTPUT panel should display something like the following:
{
"allow": true
}
Below is a screenshot of the playground after performing the above steps:
Now, let’s change the user in the request to alice, which means that one customer is trying to view another customer’s payments. If you press Evaluate, you’ll notice that the output displays an empty JSON object {}. That’s because no rule body matches, so allow is undefined and OPA has nothing to return. To change this behavior, add the following statement before the body of the policy:
default allow = false
So, the whole policy should look like this:
package play
# Customers should be able to view their own payments
default allow = false
allow = true {
    input.method = "GET"
    input.path = ["payments", customer_id]
    input.user = customer_id
}
Now, if you press Evaluate you’ll see the expected output:
{
"allow": false
}
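The difference between an undefined decision and a false one also matters when a service queries OPA over its REST API: for an undefined document, the Data API response has no "result" key at all, while with the default in place it returns {"result": false}. A minimal Python sketch of a client that tells the two apart (the function names are hypothetical):

```python
from typing import Optional


def decision(response: dict) -> Optional[bool]:
    """Interpret OPA's Data API response for a query such as POST /v1/data/play/allow.

    Returns None when the decision is undefined (no "result" key, which is what
    you get when no rule matches and no default is set); otherwise returns
    True only for an explicit true result.
    """
    if "result" not in response:
        return None
    return response["result"] is True


def is_allowed(response: dict) -> bool:
    """Only an explicit true grants access; undefined and false both deny."""
    return decision(response) is True
```

Distinguishing None from False is mostly useful for logging: an undefined decision often points to a missing default or a typo in the query path.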
The playground also lets you select part of a policy and evaluate it independently of the rest. This is very useful when a complex policy evaluates to false when it shouldn’t: you can select portions of the policy and see exactly where the flaw occurs.
Okay, now that we’ve implemented the first part of our policy, let’s move on to the second part: the financial department staff should be able to view any customer payment.
Add the following lines after the policy that we defined earlier:
# Financial department staff can view any customer payments
allow = true {
    input.method = "GET"
    input.path = ["payments", customer_id]
    finance[input.user]
}
finance = {"john","mary","peter","vivian"}
Most of this rule is similar to the previous one, except for the finance[input.user] line. Instead of checking whether the user ID is the same as the customer ID, we check whether the user is a member of the finance set. Rego has many built-in constructs for helpful operations like lookups. Finally, we define the finance set with the usernames of the staff who work in that department. In a real-world scenario, this list wouldn’t be hardcoded; it would more likely be passed as part of the input (for example, in a token) or loaded into OPA as data. Because both rules are named allow, the decision is true if either of them matches. Now, let’s test the policy by setting the user and the customer to the same name (for example, bob); the policy should return true. Change the user to john (who is part of the finance department) and test the policy again; it should also return true. Finally, change the user to any name that does not belong to the finance department (let’s say, jane), and the policy should return false.
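The three manual checks above can also be recorded as a table-driven test. The Python sketch below mirrors the complete policy (both allow rules plus default allow = false) purely to illustrate the logic; in OPA itself, you would write equivalent tests in Rego and run them with opa test.

```python
# Mirrors the finance set defined in the policy.
FINANCE = {"john", "mary", "peter", "vivian"}


def allow(inp: dict) -> bool:
    """Python mirror of both allow rules plus default allow = false."""
    if inp.get("method") != "GET":
        return False
    path = inp.get("path")
    if not (isinstance(path, list) and len(path) == 2 and path[0] == "payments"):
        return False
    customer_id = path[1]
    user = inp.get("user")
    is_owner = user == customer_id                          # rule 1
    is_finance = isinstance(user, str) and user in FINANCE  # rule 2
    # Rules sharing the name allow are OR-ed together.
    return is_owner or is_finance


# The manual playground checks described above, as (input, expected) pairs.
CASES = [
    ({"method": "GET", "path": ["payments", "bob"], "user": "bob"}, True),
    ({"method": "GET", "path": ["payments", "bob"], "user": "john"}, True),
    ({"method": "GET", "path": ["payments", "bob"], "user": "jane"}, False),
]

for inp, expected in CASES:
    assert allow(inp) is expected, (inp, expected)
```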
You can read more about the Rego language and what you can do with it by referring to the official documentation.
Integrating OPA With Other Systems
As mentioned before, OPA can be integrated with many of today’s platforms. Let’s take a look at a few examples of what OPA can do for you:
Kubernetes:
- Ensure that ingress hostnames are only changed by the front-end team.
- Deny pulling any images except the ones coming from the corporate Docker registry.
- Require resource requests and limits on every pod created in the cluster.
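To illustrate the kind of checks these Kubernetes rules describe, here is a Python sketch that inspects a Pod manifest for two of them: images from a corporate registry and declared requests and limits. The registry name is hypothetical, and only spec.containers is checked. In a real cluster, you would express these rules in Rego and enforce them at admission time, for example with OPA Gatekeeper.

```python
from typing import List

# Hypothetical corporate registry prefix; substitute your own.
ALLOWED_REGISTRY = "registry.example.com/"


def pod_violations(pod: dict) -> List[str]:
    """Return human-readable policy violations for a Pod manifest (as a dict)."""
    violations = []
    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        image = container.get("image", "")
        if not image.startswith(ALLOWED_REGISTRY):
            violations.append(f"{name}: image {image!r} is not from {ALLOWED_REGISTRY}")
        resources = container.get("resources", {})
        for kind in ("requests", "limits"):
            if not resources.get(kind):
                violations.append(f"{name}: missing resources.{kind}")
    return violations
```

An empty list means the Pod passes both checks; an admission controller would reject the Pod and return these messages otherwise.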
API Authorization:
- You can use OPA with Envoy, Istio, and other platforms to enforce IAM controls. For example, you can control which staff members are allowed to access sensitive data.
Linux PAM:
- Pluggable Authentication Modules (PAM) have long been used in Linux to provide fine-grained controls to multiple services, including SSH and sudo. OPA has a PAM plugin that lets it integrate with PAM and enforce policies. For example, you can allow SSH access to your production machines only outside working hours, or when the user has an open support ticket.
There are also many other products that can be integrated with OPA, such as Kafka, Elasticsearch, SQLite, and Ceph, to name a few.
TL;DR
- The need for authorization is as old as software itself.
- The absence of a central authorization system that can be used across different systems and platforms has caused many problems. For example, APIs may have their own authorization logic built into code, while other microservices may depend on one or more external authorization systems. This makes rolling out new policies, checking the versions of existing ones, or even introducing minor changes very challenging.
- OPA works by acting as a consultant to any service that needs to make an authorization decision. The service sends a decision query to OPA, which evaluates it against the policies and data already loaded into it and responds with the decision.
- OPA was designed to be general-purpose and platform-agnostic. Accordingly, you don’t have to follow a fixed schema when sending queries or defining what the output looks like; requests and responses are simply JSON.
- Policies are written in Rego, a declarative language designed specifically for OPA and inspired by Datalog. Rego makes it easy to convert plain-English rules into valid OPA policies.
- The Rego playground tool is an excellent way to try your policies before implementing them.
- OPA can be deployed either as a Go library that becomes part of the application binary or as a standalone daemon.
- Since a policy decision is made on every API request, it’s highly recommended that the OPA daemon is placed as close as possible to the application, for example as a sidecar container in a Kubernetes pod or as a daemon running on the node itself. This practice helps reduce latency and network traffic.
- OPA uses a number of APIs that make it easy to inject new policies, check the version and status of the existing ones, or collect audit and log data.
- In this article, we walked through a simple demonstration of using the Rego language to express a policy, building it up rule by rule and testing it with the Rego playground.
- OPA can integrate with many modern systems and platforms like Kubernetes, Kafka, SQLite, Ceph, and Terraform. Through its PAM plugin, it can also integrate with Linux PAM to enforce advanced policy controls on services that use PAM (e.g., sshd and sudo).