Guest post originally published on the Logz.io blog by Dotan Horovits, Logz.io

Kubernetes has become the de-facto industry standard for container orchestration. It provides the required abstraction for efficiently managing large-scale containerized applications with declarative configurations, an easy deployment mechanism, and both scaling and self-healing capabilities.

As with any system, logs help engineers gain observability into containers and the Kubernetes clusters they run on, and their key role is evident in many incidents involving Kubernetes failures. Yet Kubernetes poses a set of unique logging challenges.

Kubernetes is a highly distributed and dynamic environment. In production, you’ll most likely be running dozens of machines with hundreds of containers that can be terminated, restarted, or rescheduled at any point in time. This transient and dynamic nature of the system is a challenge in itself.

Kubernetes clusters also consist of multiple layers that need monitoring, each producing different types of logs.

Worried? Don’t be. Thankfully, there is a lot of literature available on how to gain visibility into Kubernetes. There are also various logging tools that integrate natively with Kubernetes to make the task easier. In this article, we’ll review some of these tools as well as the Kubernetes logging architecture.

A Simple Example: Containerized application logging with Kubelet

Logging to the stdout and stderr standard streams

The first layer of logs that can be collected from a Kubernetes cluster are those being generated by your containerized applications. The best practice is to write your application logs to the standard output (stdout) and standard error (stderr) streams. You shouldn’t worry about losing these logs, as kubelet, Kubernetes’ node agent, will collect these streams and write them to a local file behind the scenes, so you can access them with Kubernetes.

Let’s take a look at an example pod manifest that will result in running one container logging to stdout:

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: example
    image: busybox
    args: [/bin/sh, -c, 'while true; do echo $(date); sleep 1; done']

To apply the manifest, run:

kubectl apply -f example.yaml

To take a look at the logs for this container:

kubectl logs example

This command calls the kubelet service on that node to retrieve the logs. As you can see, the logs are collected and presented by Kubernetes. This is done for each container in a pod, across your cluster. Using kubectl to retrieve logs saves you from needing to access individual nodes in the cluster.

kubectl can only show a single pod’s logs at a time. If you need to aggregate logs from many pods into a single stream, you can use the kubetail command, or the higher-level log aggregation and management tools that we will discuss later in this article.
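
If your pods share a label, kubectl itself can also stream several of them at once. A minimal example, assuming your pods carry a label such as app=example (the label name here is purely illustrative):

kubectl logs -l app=example --all-containers=true --max-log-requests=10 -f # stream logs from all matching pods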

Using a sidecar for logging

If your application does not write to stdout and stderr, you can deploy a sidecar container alongside your application to pick up the application logs and stream them to stdout and stderr.

The sidecar pattern also enables log manipulation, such as aggregating several log streams on the node into one, or splitting one application log stream into several logical streams (each handled by a dedicated sidecar instance).

For persisting container logs, the common approach is to write logs to a log file and then use a sidecar container:

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: example
    image: busybox
    args:
    - /bin/sh
    - -c
    - >
      while true;
      do
        echo "$(date)" >> /var/log/example.log;
        sleep 1;
      done
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  - name: sidecar
    image: busybox
    args: [/bin/sh, -c, 'tail -f /var/log/example.log']
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  volumes:
  - name: varlog
    emptyDir: {}

As seen in the pod configuration above, a sidecar container will run in the same pod along with the application container, mounting the same volume and processing the logs separately.

Kubernetes logging architecture

As mentioned, one main challenge of Kubernetes logging is understanding which logs are generated and how to use them. In the following sections, I’ll look into node-level logging and cluster-level logging.

Kubernetes Node logging

When a container running on Kubernetes writes its logs to stdout or stderr streams, they are picked up by the kubelet service running on that node, and are delegated to the container engine for handling based on the logging driver configured in Kubernetes.

In most cases, Docker container logs will end up in the /var/log/containers directory on your host. Docker supports multiple logging drivers but, unfortunately, the Kubernetes API does not support configuring the driver.

Once a container terminates or restarts, kubelet keeps its logs on the node. To prevent these files from consuming all of the host’s storage, a log rotation mechanism should be set on the node.

Kubernetes doesn’t provide built-in log rotation, but this functionality is available in many tools, such as Docker’s log-opt settings, standard file shippers, or even a simple custom cron job. When a container is evicted from the node, so are its corresponding log files.
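
For example, with the Docker runtime, rotation can be configured through the daemon’s logging options. A minimal sketch of /etc/docker/daemon.json (the size and file-count values are just illustrative):

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}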

Depending on what operating system and additional services you’re running on your host machine, you may need to take a look at additional logs. For example, on Linux, journald logs can be retrieved using the journalctl command:

$ journalctl -u docker

-- Logs begin at Wed 2019-05-29 10:59:24 CEST, end at Mon 2019-07-15 10:55:17 CEST. --

jul 29 10:59:35 thinkpad systemd[1]: Starting Docker Application Container Engine...

jul 29 10:59:35 thinkpad dockerd[2172]: time="2019-05-29T10:59:35.285765854+02:00" level=info msg="libcontainerd: started new docker-containerd process" p

jul 29 10:59:35 thinkpad dockerd[2172]: time="2019-05-29T10:59:35.286021587+02:00" level=info msg="parsed scheme: \"unix\"" module=grpc

As you can see in the above example, the Docker container runtime writes its logs to journald. Other important Kubernetes processes at the node level are kubelet, which also logs to journald, and kube-proxy, the network proxy that runs on each node, which logs to the /var/log directory.
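
On nodes where kubelet runs as a systemd service, its logs can be retrieved the same way:

journalctl -u kubelet -f # follow the kubelet service logs on the node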

Logging kernel events might also be required in some scenarios. You might, for example, use the Unix dmesg command to print the kernel’s message buffer to debug device driver issues:

$ dmesg

[ 0.000000] microcode: microcode updated early to revision 0xb4, date = 2019-04-01

[ 0.000000] Linux version 4.15.0-54-generic (buildd@lgw01-amd64-014) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 (Ubuntu 4.15.0-54.58-generic 4.15.18)

[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.0-54-generic root=UUID=6e228d30-6415-4b41-b992-172d6899693e ro quiet splash vt.handoff=1

[ 0.000000] KERNEL supported cpus:

[ 0.000000] Intel GenuineIntel

[ 0.000000] AMD AuthenticAMD

[ 0.000000] Centaur CentaurHauls

[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'

[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'

[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'

[ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'

[ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'

Kubernetes system components logging

In addition to kubelet and kube-proxy node services we covered earlier, there are control plane components on the level of the Kubernetes cluster itself that can be logged, as well as additional data types that can be used (events, audit logs). Together, these different types of data can give you visibility into how Kubernetes is performing as a system.

The following are the main system components of the Kubernetes control plane:

- kube-apiserver: the API server, the front end through which all cluster operations pass
- etcd: the key-value store holding the cluster state
- kube-scheduler: assigns newly created pods to nodes
- kube-controller-manager: runs the core controllers that reconcile the cluster state
- cloud-controller-manager: integrates the cluster with the underlying cloud provider, where relevant

Some of these components run in a container, and some of them run on the operating system level (in most cases, a systemd service).

The systemd services write to journald, and components running in containers write logs to the /var/log directory, unless the container engine has been configured to stream logs differently.

Kubernetes’ system components use Kubernetes’ logging library, klog, to generate their log messages. Historically, these system logs did not follow a uniform structure, which made them difficult to parse, query, and analyze. However, Kubernetes’ recent v1.19 release introduced a new option in klog for structured logging, in plain text as well as in JSON format.

Structured logging provides a well-defined structure in klog’s native text format, with a list of key-value pairs for the variable payload. Using the --logging-format=json flag enables JSON output instead.
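
For illustration, a structured log call such as klog.InfoS("Pod status updated", "pod", "default/example", "status", "ready") produces, with JSON output enabled, a line roughly like the following (the field values here are illustrative, pretty-printed for readability):

{
  "ts": 1599999999.123,
  "v": 0,
  "msg": "Pod status updated",
  "pod": "default/example",
  "status": "ready"
}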

It’s important to note that structured logging (both the text and JSON options) is still in alpha as of v1.19 and is being adopted incrementally, so you may encounter early-stage issues such as system logs that are still unstructured, log formatting changes, or klog flags that are not yet supported for JSON output. Check the Kubernetes documentation for updated feature status and information.

Kubernetes events

Kubernetes events capture resource state changes and errors, such as an exceeded resource quota or pending pods, as well as informational messages.

The command kubectl get events -n <namespace> returns all events within a specific namespace:

LAST SEEN   TYPE      REASON                  OBJECT                                  MESSAGE

4m22s       Normal    ExternalProvisioning    persistentvolumeclaim/mysql-pv-claim    waiting for a volume to be created, either by external provisioner "docker.io/hostpath" or manually created by system administrator

4m22s       Normal    Provisioning            persistentvolumeclaim/mysql-pv-claim    External provisioner is provisioning volume for claim "default/mysql-pv-claim"

4m22s       Normal    ProvisioningSucceeded   persistentvolumeclaim/mysql-pv-claim    Successfully provisioned volume pvc-b5419197-f122-4263-9c78-e9fb457db630

4m22s       Warning   FailedScheduling        pod/wordpress-57b89f8b5b-gt6bv          pod has unbound immediate PersistentVolumeClaims

4m20s       Normal    Scheduled               pod/wordpress-57b89f8b5b-gt6bv          Successfully assigned default/wordpress-57b89f8b5b-gt6bv to docker-desktop

4m18s       Normal    Pulled                  pod/wordpress-57b89f8b5b-gt6bv          Container image "wordpress:4.8-apache" already present on machine

4m18s       Normal    Created                 pod/wordpress-57b89f8b5b-gt6bv          Created container wordpress

4m18s       Normal    Started                 pod/wordpress-57b89f8b5b-gt6bv          Started container wordpress

4m22s       Normal    SuccessfulCreate        replicaset/wordpress-57b89f8b5b         Created pod: wordpress-57b89f8b5b-gt6bv
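
To narrow this output down to problems only, events can also be filtered by field, for example:

kubectl get events -n <namespace> --field-selector type=Warning # show only Warning events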

Using kubectl describe pod <pod-name> provides a lot of useful information about the pod, including a section listing the latest events:

Events:

Type     Reason            Age                    From                     Message

----     ------            ----                   ----                     -------

Warning  FailedScheduling  9m44s                  default-scheduler        persistentvolumeclaim "mysql-pv-claim" not found

Warning  FailedScheduling  9m44s (x2 over 9m44s)  default-scheduler        pod has unbound immediate PersistentVolumeClaims

Normal   Scheduled         9m42s                  default-scheduler        Successfully assigned default/wordpress-mysql-694777bb76-tqn55 to docker-desktop

Normal   Pulled            9m40s                  kubelet, docker-desktop  Container image "mysql:5.6" already present on machine

Normal   Created           9m40s                  kubelet, docker-desktop  Created container mysql

Normal   Started           9m40s                  kubelet, docker-desktop  Started container mysql

Kubernetes audit logs

Audit logs can be useful for compliance, as they should help you answer the questions of what happened, who did it, and when.

Kubernetes provides flexible auditing of kube-apiserver requests based on policies. These help you track all activities in chronological order.
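
As a minimal sketch, auditing is enabled by pointing the kube-apiserver at a policy file (via the --audit-policy-file flag) and a log destination (via --audit-log-path); the policy below simply records every request at the Metadata level (the file name is up to you):

# audit-policy.yaml: log request metadata for all requests
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata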

Here is an example of an audit log:

{
  "kind":"Event",
  "apiVersion":"audit.k8s.io/v1beta1",
  "metadata":{ "creationTimestamp":"2019-08-22T12:00:00Z" },
  "level":"Metadata",
  "timestamp":"2019-08-22T12:00:00Z",
  "auditID":"23bc44ds-2452-242g-fsf2-4242fe3ggfes",
  "stage":"RequestReceived",
  "requestURI":"/api/v1/namespaces/default/persistentvolumeclaims",
  "verb":"list",
  "user": {
  "username":"user@example.org",
  "groups":[ "system:authenticated" ]
  },
  "sourceIPs":[ "172.12.56.1" ],
  "objectRef": {
  "resource":"persistentvolumeclaims",
  "namespace":"default",
  "apiVersion":"v1"
  },
  "requestReceivedTimestamp":"2019-08-22T12:00:00Z",
  "stageTimestamp":"2019-08-22T12:00:00Z"
}

For more information on monitoring Kubernetes logs for anomalies, as well as for threat detection, check out this post.

Kubernetes logging tools

Hopefully, you’ve now got a better understanding of the different logging layers and log types available in Kubernetes. The logging tools reviewed in this section play an important role in putting all of this together to build a Kubernetes logging pipeline.

Kubernetes doesn’t provide log aggregation of its own. However, the Kubernetes release contains optional logging agents for Elasticsearch and for Stackdriver Logging (for use with Google Cloud Platform), with Fluentd as the node agent. In the following sections, I’ll look into each of them.

The general architecture for cluster log aggregation is to have a local agent (such as Fluentd or Filebeat, discussed below) gather the data and send it to a central log management backend. The agent is usually deployed per node as a DaemonSet to collect all the logs on that node, but it can also be deployed per pod for finer granularity. The agent can also filter and manipulate the logs before sending them, to improve ingestion and analysis or to reduce log volume. I highly recommend enriching the logs with metadata available on the node (which is accessible to the local logging agent), such as the pod name, cluster ID, and region, which greatly helps in analysis and troubleshooting.

Fluentd

Fluentd is a popular open-source log aggregator that allows you to collect various logs from your Kubernetes cluster, process them, and then ship them to a data storage backend of your choice.

Kubernetes-native, Fluentd integrates seamlessly with Kubernetes deployments. The most common method for deploying Fluentd is as a DaemonSet, which ensures a Fluentd pod runs on each node. Similar to other log forwarders and aggregators, Fluentd appends useful metadata fields to logs, such as the pod name and Kubernetes namespace, which helps provide more context.
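
A minimal sketch of such a DaemonSet, mounting the node’s log directory so the agent can read the container log files (the image tag and the absence of any output configuration are assumptions you would adapt to your own setup):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log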

ELK Stack

The ELK Stack (Elasticsearch, Logstash, and Kibana) is another very popular open-source stack used for logging Kubernetes, and it is actually made up of four components:

- Elasticsearch: a scalable search and analytics engine that stores the logs
- Logstash: a log aggregator and processing pipeline that collects and transforms logs before indexing them in Elasticsearch
- Kibana: the visualization layer for querying and dashboarding the data in Elasticsearch
- Beats: lightweight data shippers (such as Filebeat) installed at the source to ship the logs

ELK can be deployed on Kubernetes as well, on-prem or in the cloud. While Beats is Elasticsearch’s native shipper, a common alternative for Kubernetes installations is to use Fluentd to send logs to Elasticsearch (sometimes referred to as the EFK stack).
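
To illustrate that handoff, the Fluentd side of an EFK setup is essentially an Elasticsearch output section in its configuration (using the fluent-plugin-elasticsearch output plugin; the host name below is an assumption for an in-cluster Elasticsearch service):

# fluent.conf: ship everything collected to Elasticsearch
<match **>
  @type elasticsearch
  host elasticsearch.logging.svc.cluster.local
  port 9200
  logstash_format true
</match>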

Together, these components provide Kubernetes users with an end-to-end logging solution. As effective as it is, deploying and managing ELK deployments at scale is a challenge unto itself.

Logz.io offers users a fully managed option for using the stack to log Kubernetes, with built-in integrations and monitoring dashboards. Get more information on logging Kubernetes with Logz.io.

Google Stackdriver

And last but not least…Google Stackdriver.

Stackdriver is another Kubernetes-native logging tool that provides users with a centralized logging solution. If you’re using GKE, Stackdriver can be easily enabled using the following command:

gcloud container clusters create [CLUSTER_NAME] \
 --zone [ZONE] \
 --project [PROJECT_ID] \
 --enable-stackdriver-kubernetes \
 --cluster-version=latest

For more information on using Stackdriver to log Kubernetes, check out Logging Using Stackdriver.

Endnotes

Once a cluster is up and running with logging in place, you can make sure your workloads and underlying infrastructure stay healthy. Logging also helps you prepare for issues that may arise during the deployment of a new production release and stop them before they affect the customer experience.

Kubernetes’ kubectl and kubetail commands provide a useful manual way to inspect logs, but monitoring clusters in production calls for a cluster-wide log aggregation and analysis tool such as the ELK Stack. In production, it’s recommended to keep your logs separate from the Kubernetes cluster running your monitored application, so that they remain accessible for troubleshooting even (and especially) during cluster outages and issues.

It takes time to implement production-ready logging for your services, as well as to set up alerts and tune them appropriately. However, an effective logging solution allows you to focus on monitoring your key business metrics, which, in turn, increases the reliability of your products and your company’s revenue.

To learn more, contact us or visit our blog.

kubectl logs and other useful kubectl commands

Some useful kubectl commands are listed below:

kubectl logs <pod-name> -f # stream the pod's logs
kubectl logs <pod-name> --since=1h # return logs newer than a relative duration
kubectl logs <pod-name> --since-time=2020-08-13T10:46:00.000000000Z # return logs after a specific date (RFC3339)
kubectl logs <pod-name> --previous # print the logs for the previous instance of the container
kubectl logs <pod-name> -c <container-name> # print the logs of a specific container in the pod
kubectl logs -l <label-key>=<label-value> # print logs from all containers in pods matching the label
kubectl get events --sort-by='.metadata.creationTimestamp' # print all events in chronological order
kubectl describe pod <pod-name> # print pod details such as status and recent events

You can find more information on these and other commands in the kubectl reference documentation.

Dotan Horovits (@horovits) is a CNCF speaker, a co-organizer of the Israeli chapter of CNCF, and a developer advocate at Logz.io – Twitter LinkedIn

Logz.io provides a cloud observability platform based on ELK, Prometheus and Jaeger.