Guest post originally published on the Rookout blog by Liran Haimovitch, co-founder and CTO of Rookout
As the world adapts to new and unforeseen circumstances, many of the traditional ways of doing things no longer apply. One significant effect is that the world has gone almost completely virtual. Whether it’s Zoom happy hours, family catch-ups, or virtual conferences, what used to happen in person has gone digital. Before the world seemingly turned upside down a few months ago, I was meant to speak at a conference about our experience at Rookout running Jenkins on Kubernetes these past few years. Yet, alas, it was not meant to be. So I figured I could share what I have learned so far here (digitally! ;)) with you all. The world is going virtual, so what better way to connect over learned experiences, right?
Jenkins and Kubernetes
Why would you go about running Jenkins on top of Kubernetes?
The TL;DR of why we chose Jenkins is that we needed a high degree of control over build processes and the code reusability enabled by Jenkins Pipelines (since we made that choice, CircleCI and GitHub Actions have made great progress in meeting some of our requirements). You can find the full details of that specific journey in this blog post, but here let’s focus on running Jenkins on Kubernetes.
Running Jenkins on top of Kubernetes takes away most of the maintenance work, especially around managing the Jenkins Agents. The Jenkins Kubernetes Plugin is quite mature, and using it to spin up agents on demand reduces the maintenance costs of the agents themselves to virtually nothing.
The Ugly Parts
While we greatly enjoy the day-to-day benefits of this setup, such as fast build times, highly customizable CI/CD processes, and little to no maintenance, getting it up and running was far from a trivial task.
Along the way, we ran into various limitations of both Jenkins and Kubernetes, looked ‘under the hood’ and discovered little known nuggets of knowledge. By sharing them here, with you, I hope your own deployment experience will go much smoother.
The Deployment Process
The easiest way to get Jenkins deployed on a Kubernetes cluster (we use a dedicated cluster for Jenkins, but that’s not necessary) is to build your own Helm chart (if you are not familiar with Helm, check it out) that relies on existing Helm charts as dependencies and adds any additional resources you might need.
The first chart dependency you’ll add is, quite obviously, Jenkins itself, and we chose this Helm chart from the stable Helm repository. The most important configuration options to define are:
- Make sure to pass an existing Persistent Volume Claim as existingClaim in the Jenkins persistence configuration.
- Figure out how much memory your Jenkins Master requires based on the number of jobs you are running, and set the JVM arguments -Xmx, -Xms, and -XX:MaxPermSize in the hidden master.javaOpts argument (we use 8192m, 8192m, and 2048m respectively). Note that PermGen was removed in Java 8, so on modern JVMs -XX:MaxPermSize is obsolete; -XX:MaxMetaspaceSize is the closest equivalent.
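The two options above could look roughly like this in the wrapper chart's values file. This is a minimal sketch, not our exact configuration: it assumes the Jenkins dependency is named jenkins (so its values nest under that key), and the claim name jenkins-home is a hypothetical placeholder for a claim you create yourself.

```yaml
# values.yaml of the wrapper chart (sketch).
# Assumption: the stable Jenkins chart is declared as a dependency named
# "jenkins", so everything under this key is passed through to it.
jenkins:
  persistence:
    enabled: true
    # Hypothetical claim name; the PersistentVolumeClaim must already exist
    existingClaim: jenkins-home
  master:
    # JVM flags and sizes as given in the text; tune them to your job load.
    # On Java 8+ MaxPermSize is obsolete (MaxMetaspaceSize is the equivalent).
    javaOpts: "-Xms8192m -Xmx8192m -XX:MaxPermSize=2048m"
```

Setting -Xms and -Xmx to the same value, as we do, keeps the heap from resizing at runtime; whatever you pick, leave headroom between the heap size and the pod's memory limit.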
The most challenging part of running Jenkins on Kubernetes is setting up the environment for building container images. To do so, follow these three simple steps:
- Add a deployment and a service running docker:dind, the official Docker-in-Docker image for building containers, to your Helm chart.
- Mount a persistent volume to /var/lib/docker to make sure your layers are cached persistently for awesome build performance.
- Configure pod templates to use the remote Docker engine by adding the DOCKER_HOST environment variable, pointing to the relevant service (e.g. tcp://dind-service:2375).
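The three steps above can be sketched as the manifests below. Treat this as an illustration under stated assumptions rather than our production setup: the names dind, dind-storage, and docker-cli are hypothetical, and the DOCKER_TLS_CERTDIR setting is an assumption needed on recent docker:dind images, which otherwise generate TLS certificates and listen on port 2376 instead of 2375.

```yaml
# Step 1: Docker-in-Docker Deployment (sketch).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dind
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dind
  template:
    metadata:
      labels:
        app: dind
    spec:
      containers:
        - name: dind
          image: docker:dind
          securityContext:
            privileged: true            # dind requires a privileged container
          env:
            # Assumption: turn off auto-generated TLS so the daemon serves
            # plain TCP on 2375, matching the URL used in the text
            - name: DOCKER_TLS_CERTDIR
              value: ""
          ports:
            - containerPort: 2375
          volumeMounts:
            # Step 2: keep image layers across restarts
            - name: docker-storage
              mountPath: /var/lib/docker
      volumes:
        - name: docker-storage
          persistentVolumeClaim:
            claimName: dind-storage     # hypothetical PVC name
---
# Service that gives the daemon the stable name used in DOCKER_HOST.
apiVersion: v1
kind: Service
metadata:
  name: dind-service
spec:
  selector:
    app: dind
  ports:
    - port: 2375
      targetPort: 2375
---
# Step 3: fragment of an agent pod spec for a pod template.
spec:
  containers:
    - name: docker-cli                  # hypothetical container name
      image: docker                     # CLI only; builds run on the remote daemon
      command: ["cat"]
      tty: true
      env:
        - name: DOCKER_HOST
          value: tcp://dind-service:2375
```

Because port 2375 is unauthenticated, this only makes sense when the service is reachable from inside the cluster alone.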
Operational Considerations
The next step on your journey is to enable your team to access Jenkins while avoiding exposing it to the world. Jenkins has a multitude of plugins and configuration options, and keeping everything up to date and secure is nearly impossible.
We chose to handle that by having our ingress controller, HAProxy, perform the OAuth2 authentication before passing any incoming requests to Jenkins. Follow this guide to configure the HAProxy OAuth2 plugin to use the OAuth2 Proxy container. If you configure Jenkins to use the same OAuth2 identity provider (for instance, using this plugin for Google Authentication), your team will only have to log in once. Alternatively, you can always get a commercial, off-the-shelf solution such as Odo.
Once you have everything set up, you’ll want to make sure your Jenkins Master is being backed up regularly. The easiest way to achieve this is to use this neat little script.
Resources and Scaling
As I previously mentioned, we found that one of the biggest benefits of this approach is the ability to easily scale your resources on the fly. We use two separate node pools on our cluster: one for long-running pods such as the Jenkins Master, Ingress, and Docker-in-Docker, and a second for the Jenkins Agents and the workloads they are running.
We chose a single-master deployment for our Jenkins, running on a single node with 16 CPUs and 64 GB of RAM. This means that master upgrades and other unexpected events can lead to short downtimes. If you need a multi-master deployment, you are on your own 🙂
The second node pool is running the Jenkins Agents and their workloads and has auto-scaling enabled. To allow Kubernetes to smartly manage resources for that node pool, you have to make sure that you properly define the Kubernetes Resource Requests and Limits.
This has to be done in two separate configurations:
- Set the Jenkins Agents resources in the Helm chart under agent.resources.
- Set the resources for the workloads themselves as part of the Pod Templates in the Jenkins Kubernetes Plugin.
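As a rough sketch of the two configurations above: the first document is a values fragment for the Jenkins chart (again assuming the dependency is named jenkins), and the second is a fragment of an agent pod spec for a pod template. All CPU and memory figures, the container name build, and the python:3.11 image are illustrative assumptions, not our actual numbers.

```yaml
# 1. Wrapper chart values: resources for the Jenkins agent container itself.
jenkins:
  agent:
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
---
# 2. Pod template fragment: resources for the workload container.
spec:
  containers:
    - name: build                 # hypothetical container name
      image: python:3.11          # hypothetical workload image
      command: ["cat"]
      tty: true
      resources:
        requests:                 # what the scheduler and autoscaler plan with
          cpu: "2"
          memory: "4Gi"
        limits:                   # hard cap enforced at runtime
          cpu: "4"
          memory: "8Gi"
```

The requests matter most for auto-scaling: pending pods whose requests do not fit on any existing node are what prompt the node pool to grow.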
Keep in mind that the second node pool is actually a great opportunity for cost savings and is the perfect candidate for Spot Instances, either directly or by leveraging Spot. There is an additional benefit, too: when running on GKE, we found that the nodes’ performance deteriorated over time, probably due to the intensive scheduling of pods. With Google’s Preemptible VMs, which are automatically replaced every 24 hours (or less), we noticed significant improvements in cluster reliability and performance.
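One way to keep agents and their workloads on the second node pool, assumed here rather than taken from our setup, is a node selector in the pod template. On GKE every node carries a label with the name of its node pool; jenkins-agents below is a hypothetical pool name.

```yaml
# Fragment of an agent pod spec: schedule only onto the autoscaled agent pool.
spec:
  nodeSelector:
    # GKE sets this label on each node to its node pool's name;
    # "jenkins-agents" is a hypothetical pool name
    cloud.google.com/gke-nodepool: jenkins-agents
```

With preemptible or spot nodes, a build can lose its node mid-run, so this fits best when jobs are safe to retry.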
It all boils down to…
In my work with both our customers and Rookout’s R&D team, I have found that deployments are often the bottleneck that is slowing down day-to-day operations and engineering velocity. I hope that by sharing with you a few of the lessons we learned running Jenkins on Kubernetes, you’ll now be able to improve your own CI/CD processes.
Having said that, it’s still important to note that adopting the right tools will enable you to do even more. So go forth and get started. I’m looking forward to hearing how your experience went!