Guest post originally published on Lightrun’s blog by Tom Granot, Director of Developer Relations at Lightrun
KoolKits (Kubernetes toolkits) are highly opinionated, language-specific, batteries-included debug container images for Kubernetes. In practice, they’re what you would’ve installed on your production pods if you were stuck during a tough debug session in an unfamiliar shell.
We also created a quick, 2-minute explanation of the project, if you prefer that to the written word.
To briefly give some background, note that these container images are intended for use with the new kubectl debug feature, which spins up ephemeral containers for interactive troubleshooting. A KoolKit will be pulled by kubectl debug, spun up as a container in your pod, and have the ability to access the same process namespace as your original container.
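As a rough illustration of the flow just described, here is a minimal Python sketch that assembles a kubectl debug invocation for attaching a KoolKit to a running pod. The image reference lightruncom/koolkits:jvm, the pod name and the container name are assumptions made for this example, not values taken from this post; check the project's README on GitHub for the real image names. The -it, --image, --target and --namespace flags are standard kubectl flags.

```python
import shlex

# Assumption: illustrative image reference; the post only states that
# KoolKits are hosted publicly on Docker Hub.
DEFAULT_IMAGE = "lightruncom/koolkits:jvm"


def build_debug_command(pod, target_container, image=DEFAULT_IMAGE, namespace=None):
    """Return the argv list for attaching a debug image as an ephemeral container."""
    argv = [
        "kubectl", "debug", "-it", pod,
        f"--image={image}",
        # --target shares the process namespace of the named container
        f"--target={target_container}",
    ]
    if namespace:
        argv.append(f"--namespace={namespace}")
    return argv


if __name__ == "__main__":
    # "my-app-pod" and "my-app" are hypothetical names
    print(shlex.join(build_debug_command("my-app-pod", "my-app")))
```

Building the command as a list (rather than a single string) makes it safe to hand to subprocess.run without shell quoting issues.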
Since production containers are usually rather bare, using a KoolKit enables you to troubleshoot with power tools instead of relying on what was left behind due to the generosity (or carelessness) of whoever originally built the production image.
The tools in each KoolKit were carefully selected, and you can read more about the motivation behind this entire project below.
If you just want to take a look at the good stuff, feel free to check out the full project on GitHub.
Debugging Kubernetes is Hard
It’s not trivial to understand what’s going on inside a Kubernetes pod:
Multi-instance (i.e. a networking hell) – To effectively debug a Kubernetes-based application, you need to be able to access multiple replicas of the same application running in multiple pods/nodes, as well as multiple applications running on different pods/nodes. To get a shell into the specific instance that ran the specific bit of code you want to debug, you’d need to identify it among the crowd, and even then you’re limited to a single pod, since shells are 1-to-1.
Images without tools / infra – The best practice in the cloud-native world is to ship “thin images” – container bases that contain as little bloat as possible, to ensure that pull times are fast. This means, though, that many of these containers lack basic tooling – some of them basically contain nothing, not even a shell – which makes debugging practically impossible on a shell level.
Placing a breakpoint can cause cascading failures – Distributed applications tend to be made up of services that rely on one another: a web service relies on some bootstrap service, another service calls the web service and thus relies on it, and so forth. If one service fails, all the other services that rely on it fail too. When placing a breakpoint inside a service that other services rely on, we risk a cascading failure: the paused service takes down a service that relies on it, which in turn takes down another service that relies on it, and so forth.
Debugging can change state – Using a debugger, you can change various values in the application and modify its state. This can have catastrophic consequences for production applications, which should be shielded from any internal interference: all state modifications should come from outside inputs and the logic applied to them, not from a developer interfering with the application.
Privacy Considerations – Most tooling on the market does not prevent a developer from being exposed to private, sensitive data. Instead, developers can add breakpoints in locations that allow them to view credit card information, personal health data and the like.
The tooling in the space also leaves a lot to be desired. Tools that connect your local environment to the remote cluster for development purposes let you connect to remote services, but often only let you work on one service at any given time. In addition, service mesh troubleshooters rely on, well, a service mesh, which not all clusters have nowadays.
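To make the cascading-failure point above more concrete, here is a small Python sketch that computes which services are transitively affected when one service is paused at a breakpoint. The service names and the dependency map are hypothetical; this is an illustration of the reasoning, not a tool that ships with KoolKits.

```python
from collections import deque


def affected_by_pause(dependencies, paused):
    """Return every service that directly or transitively calls `paused`.

    `dependencies` maps a service name to the list of services it calls.
    """
    # Invert the map: for each service, who calls it?
    callers = {}
    for service, callees in dependencies.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(service)

    # Breadth-first walk up the caller chain
    affected = set()
    queue = deque([paused])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    affected.discard(paused)  # only relevant if the graph has a cycle
    return affected


if __name__ == "__main__":
    # Hypothetical topology: checkout -> web -> bootstrap
    deps = {"web": ["bootstrap"], "checkout": ["web"], "reports": []}
    print(sorted(affected_by_pause(deps, "bootstrap")))
```

Pausing a leaf dependency such as the bootstrap service reaches everything upstream of it, which is why a single breakpoint in a shared service can stall much more than the service being debugged.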
The Motivation Behind KoolKits
We understood early on that packing a punch by using the right tools is a great source of power for the troubleshooting developer. We also wanted to find a way to give back to the community, and that’s how we came up with the idea for KoolKits.
Let’s dive deep for a second to explain why KoolKits can be pretty useful:
As we mentioned before, there’s a well-known Kubernetes best practice that states that one should build small container images. This makes sense for a few different reasons:
- Building the image will consume fewer resources (aka CI hours)
- Pulling the image will take less time (who wants to pay for so much ingress anyways?)
- Less stuff means less surface area exposed to security vulnerabilities, in a world where even no-op logging isn’t safe anymore
There’s also a lot of tooling in existence that helps you get there without doing too much heavy lifting:
- Alpine Linux base images are super small
- Distroless Docker images go a step further and remove everything but the runtime
- Docker multi-stage builds help create thin final production images
The problem starts when you’re trying to debug what’s happening inside those containers. By using a small production image you’re forsaking a large amount of tools that are invaluable when wrapping your head around a problem in your application.
By using a KoolKit, you’re allowing yourself the benefits of a small production image without compromising on quality tools – each KoolKit contains hand-picked tools for the specific runtime it represents, in addition to a more generic set of tooling for Linux-based systems.
P.S. KoolKits was inspired by kubespy and netshoot.
Considerations
There are quite a few decisions we made during the construction of these images; some of the things we took into consideration are listed below.
Size of Images
KoolKits Docker images tend to run, uhm, rather large.
KoolKits are intended to be downloaded once, kept in the cluster’s Docker registry, and then spun up immediately on demand as containers. Since they’re not intended for constant pulling, and since they’re intended to be packed with goodies, this is a side effect we’re willing to endure.
Using Ubuntu base images
Part of the reason it’s hard to create a really slim image is our decision to go with a full Ubuntu 20.04 system as the basis for each KoolKit. This mainly came from our desire to replicate, inside your clusters, the same environment you would debug with locally.
For example, this means no messing around with Alpine alternatives to the normal Ubuntu packages you’re used to working with. It also means we can include tools that have no Alpine versions in each KoolKit.
Using language version managers
Each KoolKit uses (wherever possible) a language version manager instead of relying on language-specific distros. This allows you to install older runtime versions easily and to swap between runtime versions at will (for example, to get specific versions of tooling that only exist for specific runtime versions).
Available KoolKits
Each of the folders in the repo contains the Dockerfile behind the KoolKit and a short explanation of the debug image. All KoolKits are based on the ubuntu:20.04 base image, since real people need real shells.
The list of available KoolKits:
- koolkit-jvm – AdoptOpenJDK 17.0.2 & related tooling (including jabba for easy version management and Maven 3.8.4)
- koolkit-node – Node 16.13.1 & related tooling (including nvm for easy version management)
- koolkit-python – Python 3.10.2 & related tooling (including pyenv for easy version management)
Note that you don’t actually have to build them yourselves – all KoolKits are hosted publicly on Docker Hub and available free of charge.
KoolKits Coming up
- A whole new, Go 1.17.7 KoolKit
- JVM KoolKit – jvm-profiler, jHiccup support
- Node.js KoolKit – llnode, thetool support
- Python KoolKit – vardbg, memprof support
Contribution
We’d be more than happy to add tools we missed to any image – just open a pull request or an issue to suggest one.
KoolKits was created by Lightrun – a Developer Observability Platform. Using Lightrun, developers can add new, read-only logs, metrics and traces in real time to live applications running locally or in production – without redeploying or stopping the application. Lightrun works on Kubernetes out of the box, and allows multi-pod, multi-cluster debugging without port-forwarding or service mesh wizardry. Get Lightrun in your IDE today for free.