How to ace the Prometheus Certified Associate (PCA) exam

Posted on November 7, 2024


Community post originally published on Medium by Giorgi Keratishvili

Introduction

If you have worked on Kubernetes production systems at any point in the last 10 years and needed to check pod or application uptime, resource consumption, or HTTP error rates over a period of time, you have most probably used the Prometheus and Grafana stack. If you want to extend your knowledge of observability and monitoring, this exam is exactly for you, because it covers not only Prometheus but also general concepts such as SLAs, SLOs, and SLIs, how to structure alerting, and best practices for observability.

Besides that, Prometheus was the second project to join the CNCF, after Kubernetes, and has since become one of the most preferred tools for monitoring and observability in containerized environments. It also works alongside other open source projects, such as Grafana for visualization and OpenTelemetry for observability, which greatly impact the whole industry.

Who should take this exam?

As mentioned above, we mostly see Prometheus in Kubernetes or other containerized environments, but it is not limited to containerized scenarios. Any production system needs some kind of monitoring tool. Without observability we are blind, and when things go wrong (and they will; it is just a matter of time), you will appreciate being able to see what is happening, and to see it in time.

As for who would benefit: SysAdmins, Devs, Ops, SREs, managers, platform engineers, or anyone who works on production systems should consider it.

Exam Format and PSI Proctored Exam Tips

So, are you fired up like a torch, eager to spot any degradation in your systems, and wanting to pass the exam? Then we have a long path ahead. First, we need to understand what kind of exam this is compared to the CKAD, CKA, and CKS. Unlike those hands-on exams, this one uses multiple-choice questions, and compared to other multiple-choice exams, I would say it is not easy-peasy. However, it is still classified as pre-professional, on par with the KCNA and KCSA.

This exam is conducted online and proctored, like the other Kubernetes certifications, and is facilitated by PSI. As someone who has taken more than 15 exams with PSI, I can say that every time is a new journey. I HIGHLY ADVISE joining 30 minutes before the exam starts, because your ID is checked first and the room where you take the exam is inspected against the exam criteria. Please check these two links for the exam rules and PSI portal guide.

You’ll have 90 minutes to answer 60 questions, or about 90 seconds per question, which is generally considered sufficient time. Be prepared for some questions that can be quite tricky. I marked a couple of them for review and would advise doing the same, because a later question sometimes contains a hint or a partial answer, and you can then go back to the questions you marked. Regarding pricing, the exam costs $250, but you can often find it at a discount during Black Friday promotions or around CNCF events like KubeCon, Open Source Summit, etc.

The Path of Learning

At this point, we understand what we have signed up for and are ready to dedicate time to training, but where should we start? Before taking this exam, I had solid experience with Kubernetes and its ecosystem, and I had used Prometheus, but only for the things I needed. I had not delved deeper, so I still learned a lot from this exam.

Let’s break down the Domains & Competencies:

**Observability Concepts 18%**
    Metrics
    Understand logs and events
    Tracing and Spans
    Push vs Pull
    Service Discovery
    Basics of SLOs, SLAs, and SLIs

**Prometheus Fundamentals 20%**
    System Architecture
    Configuration and Scraping
    Understanding Prometheus Limitations
    Data Model and Labels
    Exposition Format

**PromQL 28%**
    Selecting Data
    Rates and Derivatives
    Aggregating over time
    Aggregating over dimensions
    Binary operators
    Histograms
    Timestamp Metrics

**Instrumentation and Exporters 16%**
    Client Libraries
    Instrumentation
    Exporters
    Structuring and naming metrics

**Alerting & Dashboarding 18%**
    Dashboarding basics
    Configuring Alerting rules
    Understand and Use Alertmanager
    Alerting basics (when, what, and why)

At first glance, this list might seem too simple and easy; however, we need to learn the fundamentals of observability first in order to understand higher-level concepts.

Understanding Observability

Observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs. In the context of system engineering and IT operations, observability is crucial for diagnosing issues and ensuring that all parts of the system are functioning as expected.

The Pillars of Observability

Logs: These are immutable records that describe discrete events that have happened over time. Logs are useful for understanding what happened in a system after an event.
Metrics: These are numerical values that measure some aspect of a system over intervals of time. Metrics are vital for understanding the performance of a system and for making comparisons over time.
Traces: These are records of the full paths or sequences of events that occur as requests flow through a system. Tracing helps in identifying how requests propagate through a system and where delays or issues arise.
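To make the "metrics" pillar concrete, here is a minimal sketch of how a counter looks when an application exposes it in Prometheus's text exposition format. The metric name `http_requests_total`, its labels, and the helper `render_counter` are hypothetical names chosen for illustration. In practice a client library generates this output for you, and this sketch does not escape special characters in label values.

```python
def render_counter(name, help_text, samples):
    """Render one counter family in Prometheus text exposition format.

    `samples` maps a tuple of (label_name, label_value) pairs to the
    counter's current value. Illustrative only: label values are not escaped.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        # A sample with no labels is written as the bare metric name.
        selector = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{selector} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical request counter, split by HTTP method and status code.
exposition = render_counter(
    "http_requests_total",
    "Total HTTP requests handled.",
    {
        (("method", "GET"), ("code", "200")): 1027,
        (("method", "GET"), ("code", "500")): 3,
    },
)
print(exposition)
```

Each combination of label values is its own time series, which is why the two lines above are stored and queried separately.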

Bubble chart with observability, traces, logs and metrics

Some Key Concepts to Remember:

SLA (Service Level Agreement): An SLA is the agreement you establish with your clients or users, defining the level of service they can expect.
SLO (Service Level Objective): SLOs are specific, measurable goals your team must achieve to meet the SLA. They represent the performance targets you aim to reach.
SLI (Service Level Indicator): SLIs are the actual metrics or measurements that indicate the real-time performance of your system. They are used to assess compliance with SLOs.
Pull: Metric scraping is initiated by Prometheus. Prometheus queries target endpoints to collect metrics at regular intervals.
Push: Metrics are published by the application to an endpoint (e.g., Pushgateway). This method allows applications to push metrics when they are generated.
Trace: A trace represents a sequence of operations that together form a unique transaction handled by an application and its constituent services. Traces help in understanding the flow of requests and identifying bottlenecks or issues in your system.
Span: A span, in the context of tracing, represents a single operation within a trace. It provides detailed information about the duration and context of that operation.
Rule Types: In Prometheus, there are two types of rules. Recording rules precompute frequently needed or computationally expensive expressions, while alerting rules let you define alert conditions using PromQL expressions.
Meta-Monitoring: Meta-monitoring means monitoring the Prometheus instances themselves.
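The relationship between an SLI and an SLO is easiest to see with numbers. The sketch below assumes an availability SLI defined as good requests divided by total requests, and computes how much of the error budget implied by the SLO has been used. The request counts and the 99.9% target are made-up values, and the function names are hypothetical.

```python
def sli_availability(good_requests, total_requests):
    """Availability SLI: the fraction of requests that succeeded."""
    if total_requests == 0:
        return 1.0  # no traffic means nothing failed
    return good_requests / total_requests


def error_budget_consumed(sli, slo):
    """Fraction of the error budget used (1.0 means fully spent)."""
    budget = 1.0 - slo  # allowed failure ratio
    if budget <= 0:
        raise ValueError("an SLO of 100% leaves no error budget")
    return (1.0 - sli) / budget


# Hypothetical month: 1,000,000 requests, 600 of them failed.
sli = sli_availability(999_400, 1_000_000)
consumed = error_budget_consumed(sli, slo=0.999)
print(f"SLI = {sli:.4%}, error budget consumed = {consumed:.0%}")
```

With a 99.9% objective, 1,000 failures per million requests are allowed, so 600 failures use about 60% of the budget. An SLA would typically be set looser than the SLO so that the team has room to react before the agreement is broken.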

Key Learning Objectives:

PromQL: PromQL is pivotal. You can put your skills to the test by exploring metrics here and delving into Prometheus’s basic system architecture here. Additionally, grasp the fundamentals of Prometheus’s basic functions here.
Alertmanager: Understanding Alertmanager and its functionality is crucial. Gain insights into Alertmanager features here and visualize your Alertmanager routes here.
Exporters: Familiarize yourself with exporters like node_exporter and blackbox_exporter.
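Since PromQL carries the largest weight in the exam, it helps to understand what a function like `rate()` does with a counter. The following is a simplified sketch, not Prometheus's actual implementation: it sums the increases between consecutive samples, treats any drop in value as a counter reset, and divides by the elapsed time. The real `rate()` also extrapolates to the edges of the range window, which this sketch leaves out. The function name `simple_rate` and the sample data are hypothetical.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter over the given samples.

    `samples` is a list of (timestamp_seconds, counter_value) in time order.
    Returns None when there are fewer than two samples or no elapsed time.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # The counter went down, so assume the process restarted
            # and the counter began again from zero.
            increase += curr
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed


# Hypothetical scrapes every 15 s, with a restart between t=15 and t=30.
scrapes = [(0, 100), (15, 130), (30, 10), (45, 40)]
print(simple_rate(scrapes))
```

Here the increases are 30, 10 (after the reset), and 30, so the total increase is 70 over 45 seconds. This reset handling is the reason `rate()` should be applied to counters and not to gauges.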

You can learn about the PCA certification and related topics for free through the following GitHub repositories, which I used.

For structured and comprehensive PCA exam preparation, consider investing in this paid course from KodeKloud, which provides a playground. I used it in my preparation and it helped a lot.

Conclusion

The exam is not the easiest among these certs. I would rank them in this order: KCNA/CGOA/CKAD/PCA/KCSA/CKA/CKS. After taking the exam, you will receive your result within 24 hours, and passing it feels pretty satisfying. Overall, I hope this was informative and useful 🚀

Prometheus Certified Associate certificate for Giorgi Keratishvili
