Guest post originally published on the ARMO blog by Ben Hirschberg
DevOps and modern engineering practices have enabled us to ship higher-quality code faster by introducing guardrails and checks into our automated continuous integration (CI) and continuous deployment (CD) processes. However, security is becoming a more pressing matter as more critical zero-day threats arise and misconfigurations are introduced to production systems. At the same time, application and development processes are all moving to automated CI/CD, which makes the pipeline a critical point for enforcing security validations and checks.
With DevSecOps coming a long way as a discipline, there are now great frameworks and best practices for applying security gates in your CI, and later CD.
We’d like to provide a practical guide for doing so in cloud-native environments, and particularly Kubernetes environments. These security gates will enable you to detect known security issues early and prevent them from reaching production, including documented vulnerabilities, common misconfigurations, excessive permissions, and others. CI/CD security gates include not only the technical controls that need to be applied but also the way we think about security posture, risk appetite, and how to enforce cloud-native security continuously.
We believe that security can be embedded as early as your first line of YAML, then when pulling your container images, scanning, and deploying, and even in ongoing post-deployment security monitoring, all with open source tooling. In addition, with widely adopted developer tooling such as VSCode, CLIs, and GitHub Actions, all of these checks can run within existing developer workflows and context.
CI Security Hygiene Needs a Hard Reset
CI security receives far less attention than other areas of application and cloud security. Set security aside for a moment and take quality as an example. There is an endless number of linters and compilers that return warnings when there are problems in the code. There are even grammar-checking tools for README files (a lot of them!), probably more than there are tools that check for security. If we as developers are this obsessed with our code and grammar hygiene, why are we not equally concerned with our CI security hygiene?
As we all know, and have heard many times through the “shift left” manifesto, the earlier we apply security in the process, the better the outcome from every perspective: cost, effort, and time. Likely the most critical piece, however, is catching issues early, before they propagate to production, and with sufficient context for non-security experts to actually do something meaningful about them.
There is also a common misconception that security has to be a heavy lift that bogs down development processes. I’d like to bust some of these myths and enable you to adopt security as you write your code, through your CI and CD, rinse and repeat.
Security Gate #1: Prevention During Coding
Security hygiene starts as early as the first lines of code you write – whether it’s JSON, YAML, Helm charts, or anything else, there are well-known misconfigurations and vulnerabilities that can be caught during coding.
Kubescape VSCode Extension
With the open-source Kubescape VSCode extension, you receive in-editor notifications about potential security issues while writing YAML: the extension marks the relevant lines in your manifest files. This saves the extra step of scanning your config files after the work is complete, and lets developers make immediate changes while they are still in context.
In addition, these alerts can enforce security hygiene based on well-known security frameworks such as NSA-CISA and MITRE ATT&CK, as well as other K8s-related compliance frameworks. You can also build an internal or custom security framework containing custom-defined, continuously updated checks for common misconfigurations, such as avoiding privilege escalation through least-privilege practices and setting best-practice resource limits.
By proactively adding security controls while you code, you minimize the number of issues your scanning tools will later find and lower the noise level. This matters because security scanning tools will always find misconfigurations and vulnerabilities, which creates a lot of alert fatigue and what we call “CVE Shock”: developers are overwhelmed and don’t know where to start remediating these known vulnerabilities and issues.
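To make the two example misconfigurations above concrete, here is a minimal Python sketch of what such checks look for in a Pod manifest. It is an illustration only, not how Kubescape implements its controls; the manifest, the `check_pod` function, and the issue wording are all hypothetical.

```python
import json

# Hypothetical manifest, inlined as JSON so the example needs no YAML library.
MANIFEST = json.loads("""
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {"name": "demo"},
  "spec": {
    "containers": [
      {
        "name": "web",
        "image": "nginx:1.25",
        "securityContext": {"allowPrivilegeEscalation": true}
      },
      {
        "name": "sidecar",
        "image": "busybox:1.36",
        "securityContext": {"allowPrivilegeEscalation": false},
        "resources": {"limits": {"cpu": "100m", "memory": "64Mi"}}
      }
    ]
  }
}
""")


def check_pod(manifest):
    """Return a list of (container_name, issue) tuples for a Pod manifest."""
    findings = []
    for container in manifest.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        security = container.get("securityContext", {})
        # Assumption: a missing field is treated as "allowed", the cautious reading.
        if security.get("allowPrivilegeEscalation", True):
            findings.append((name, "privilege escalation is not disabled"))
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                findings.append((name, f"no {resource} limit set"))
    return findings


for name, issue in check_pod(MANIFEST):
    print(f"{name}: {issue}")
```

Here the `web` container is flagged three times (privilege escalation, CPU limit, memory limit) and the `sidecar` container passes. An in-editor tool surfaces the same kind of finding on the offending line while you type.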
Security Gate #2: Detection Through Code Repository Scanning
After we’ve written the configuration code, the natural next step is to push it through our CI (usually via the CLI) and try to get our Pull Request (PR) merged into our codebase. Even so, it is good security practice to scan your public and private code repositories and your container image registries before deploying to production, as there are always additional security considerations with your stack, tools, supply chain, and registries.
Kubescape CI Config & CLI
This is where the Kubescape CLI comes in handy.
The Kubescape CLI can scan any of the common Kubernetes / CI configuration files, including YAML and JSON, and can even produce its output in the XML and JUnit formats. It can scan your config against common and well-known security frameworks like MITRE, NSA, or CIS, or against your own custom-defined security framework.
Because the threat and vulnerability landscapes are constantly growing, be forewarned: it will always find vulnerabilities (sometimes many), which are then categorized by severity. This is when good security practices and processes are important.
We’ll start by saying that you DO NOT need to resolve and remediate every issue immediately, but a security best practice is to acknowledge all of the vulnerabilities and to create exceptions when you choose not to fix them. This way there is a log of the process, as well as a decision owner and full context for the vulnerability.
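Since JUnit-style XML is a format most CI systems understand, a pipeline step can read such a report and list what failed. The sketch below assumes a generic JUnit layout (`testsuite` / `testcase` / `failure`); the report content is made up for illustration, and a real scanner's report may differ in detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical JUnit-style report; names and messages are invented.
REPORT = """
<testsuites>
  <testsuite name="example-framework" tests="3" failures="2">
    <testcase name="Privilege escalation disabled" classname="example">
      <failure message="deployment/web allows privilege escalation"/>
    </testcase>
    <testcase name="Resource limits set" classname="example">
      <failure message="deployment/web has no memory limit"/>
    </testcase>
    <testcase name="No hostPath mounts" classname="example"/>
  </testsuite>
</testsuites>
"""


def failed_cases(report_xml):
    """Return (test case name, failure message) pairs from a JUnit-style report."""
    root = ET.fromstring(report_xml.strip())
    failures = []
    for case in root.iter("testcase"):
        failure = case.find("failure")
        if failure is not None:
            failures.append((case.get("name"), failure.get("message", "")))
    return failures


for name, message in failed_cases(REPORT):
    print(f"FAILED {name}: {message}")
```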
This means that whenever you run a security scan, you should always do the following once there are findings:
- Acknowledge the findings (in a log)
- Fix or Ignore them
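As a rough sketch of this acknowledge-then-fix-or-ignore flow, the Python below splits scan findings into those that still need a fix and those ignored through a recorded exception, so every ignored finding keeps an owner and a reason. The finding IDs, the `ExceptionEntry` fields, and the `triage` function are hypothetical and do not mirror any tool's actual exception format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExceptionEntry:
    """A recorded decision not to fix a finding (hypothetical structure)."""
    finding_id: str
    owner: str
    reason: str


def triage(finding_ids, exceptions):
    """Split findings into those to fix and those ignored with a recorded exception."""
    by_id = {entry.finding_id: entry for entry in exceptions}
    to_fix, ignored = [], []
    for finding_id in finding_ids:
        if finding_id in by_id:
            ignored.append(by_id[finding_id])
        else:
            to_fix.append(finding_id)
    return to_fix, ignored


exceptions = [
    ExceptionEntry("EXAMPLE-2", owner="platform-team", reason="dev-only namespace"),
]
to_fix, ignored = triage(["EXAMPLE-1", "EXAMPLE-2", "EXAMPLE-3"], exceptions)

print("To fix:", to_fix)
for entry in ignored:
    print(f"Ignored {entry.finding_id} (owner: {entry.owner}, reason: {entry.reason})")
```

Keeping the exceptions in a file under version control gives you the log, the decision owner, and the context in one place.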
In addition to this, you can also define many different parameters that enable you to make the right fix / ignore security decisions, such as:
- Risk threshold
- When to fail/pass a build (based on severity or the risk threshold defined)
- Custom frameworks
- Exceptions
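One way these parameters might combine into a pass/fail decision is sketched below. The severity names, the numeric risk score, and the `build_passes` function are assumptions made for illustration; check your scanner's documentation for the thresholds it actually supports.

```python
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def build_passes(findings, exceptions, fail_severity, risk_score, risk_threshold):
    """Decide whether the build passes.

    findings: list of (finding_id, severity) tuples from a scan.
    exceptions: set of finding ids the team has explicitly accepted.
    fail_severity: lowest severity that should fail the build.
    risk_score / risk_threshold: overall score and the maximum the team tolerates.
    """
    blocking = [
        finding_id
        for finding_id, severity in findings
        if finding_id not in exceptions
        and SEVERITY_RANK[severity] >= SEVERITY_RANK[fail_severity]
    ]
    return not blocking and risk_score <= risk_threshold


findings = [("EXAMPLE-1", "high"), ("EXAMPLE-2", "low")]

# The high-severity finding is covered by an exception and the score is in range.
print(build_passes(findings, {"EXAMPLE-1"}, "high", 22.5, 30.0))
# Without the exception, the same finding blocks the build.
print(build_passes(findings, set(), "high", 22.5, 30.0))
```

In a real pipeline, the boolean would typically be turned into a non-zero exit code so the CI job fails.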
This is the GitOps way to apply security from a CI/CD perspective, and it enables ongoing prioritization of security threat remediation based on how much value each fix delivers.
Kubescape & GitHub Actions
GitHub Actions is becoming a popular way to configure simple pipelines and CI processes for GitHub repositories. So, in the same way that you can run Kubescape from the CLI, you can also run it through GitHub Actions.
On top of the same features and capabilities you receive from the CLI, the GitHub Actions support also makes it possible to receive visualizations and graphical representations right inside your GitHub repository. It also lets CI security scanning happen directly inside developer workflows and context, without adding any additional solutions.
Security Gate #3: Detection Through Container Image Registry Scanning
One commonly accepted security best practice is ensuring that any resources we take off the public web are thoroughly scanned, to ensure the end-to-end security of everything in our software supply chain. And this isn’t limited to just our code imports and packages, but also to the containers we use and the registries we employ.
Another thing we have learned along the way is that if security isn’t embedded into existing processes, it very often simply won’t happen. That is why you need to have container image registry scanning that enables you to scan images directly from their registries (e.g. ECR, GCR, quay.io and more) before they are deployed and run in the cluster.
This is an incredibly important security guardrail that can be easily applied and automated. It ensures that vulnerabilities coming from third-party registries are detected during the development process and, like the previous guardrails, puts a mechanism in place to prevent vulnerabilities from reaching deployments and production environments through this vector as well.
This leads us to our next guardrail, as security is never simply a one-off scan.
Security Gate #4: Continuous Security Post-Deployment
We’ve spoken about good practices and tooling when it comes to CI security, but security is a living, breathing, and continuous process, and it doesn’t end once your PR is merged. We need to have the proper guardrails in place, to monitor for ongoing threats as they emerge.
Therefore, while we scanned our registries during our CI security process, what happens if threats emerge for container images and configurations already running in production? This is where good CD security practices are required.
By proactively requiring security scanning upon a compelling event (e.g. if someone changed something in the system), according to a predefined schedule (daily, weekly, monthly), or when a new security vulnerability is discovered, you can detect whether the CVE exists in your production systems or discover potential configuration drift. This means you don’t need already-stretched security engineers continuously monitoring for security threats. Instead, you can create a greater sense of ownership for security across the entire team, through continuous security that alerts you to new threats in your running operations without requiring any specific security domain expertise.
By ensuring CI security starts as early as the code, then adding gates before deployment and more checks post-deployment, you can provide end-to-end security controls throughout your CI/CD processes and minimize the potential of vulnerabilities leaking into production systems. When we apply security controls early, and natively into development processes, we create shared ownership and understanding that security cannot be an afterthought with the complexity of cloud-native systems.
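The three triggers described above (a change event, a schedule, a newly disclosed vulnerability) can be sketched as a small decision function. This is a simplified illustration, not a description of how any particular tool schedules its scans; the function name and reason labels are hypothetical.

```python
from datetime import datetime, timedelta


def rescan_reasons(now, last_scan, interval, config_changed, new_cve_published):
    """Return the list of reasons a running workload should be rescanned now."""
    reasons = []
    if config_changed:
        reasons.append("event")      # someone changed something in the system
    if now - last_scan >= interval:
        reasons.append("schedule")   # the predefined scan interval has elapsed
    if new_cve_published:
        reasons.append("new-cve")    # a vulnerability was disclosed after the last scan
    return reasons


last_scan = datetime(2024, 1, 1, 8, 0)
now = datetime(2024, 1, 2, 9, 0)
print(rescan_reasons(now, last_scan, timedelta(days=1), False, True))
```

An empty list means nothing has changed since the last scan; any non-empty result is a reason to scan the running workloads again.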