Guest post originally published on the Epsagon blog by Ran Ribenzaft, co-founder and CEO at Epsagon
OpenTelemetry is an exciting new observability ecosystem with a number of leading monitoring companies behind it. It is a provider-agnostic observability solution supported by the CNCF and represents the third evolution of open observability after OpenCensus and OpenTracing. OpenTelemetry supports APIs for both tracing and metrics, and it provides rich auto-instrumentation and SDKs for a number of programming languages. Its instrumentation is provider-agnostic, and together with the OpenTelemetry collector this lets you avoid vendor lock-in.
This article provides a technical overview of OpenTelemetry and its major components: metrics, tracing, SDKs, and its collector agent. It explains why a new approach to telemetry is important, discusses its current state and supported languages, and talks about the reasoning behind some of its implementation details. Finally, it covers some considerations for getting started with OpenTelemetry.
What is OpenTelemetry?
OpenTelemetry is a set of standards, libraries, SDKs, and agents that provide full application-level observability. It uses the same standards-based approach as OpenCensus and OpenTracing, which helps avoid vendor lock-in by decoupling application instrumentation from data export. OpenTelemetry’s ecosystem comprises:
- Standards/Specifications
- APIs
- SDKs (concrete implementations of an API)
- Metrics
- Tracing
- Auto-Instrumentation
- Exporters
- Collector
Standards & Specifications
OpenTelemetry takes a standards-based approach to implementation. The focus on standards is especially important for OpenTelemetry since it demands tracing interoperability across languages. Many languages provide type definitions, such as interfaces, that implementations can use to create reusable components.
API: Language-Specific Types and Interfaces
Each language implements the specification through its API. APIs contain language-specific type and interface definitions, which are abstract classes, types, and interfaces meant to be consumed by concrete language implementations. They also contain no-op implementations to enable local testing and provide tooling for unit testing. The definition of an API is located in each language’s implementation. As stated in the OpenTelemetry Python Client:
“The opentelemetry-api package includes abstract classes and no-op implementations that comprise the OpenTelemetry API following the specification.”
You can see a similar definition in the OpenTelemetry JavaScript Client:
“This package provides everything needed to interact with the OpenTelemetry API, including all TypeScript interfaces, enums, and no-op implementations. It is intended for use both on the server and in the browser.”
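To make the split between abstract types and no-op implementations more concrete, here is a minimal sketch in plain Python. It is not the real opentelemetry-api package: the class and function names (Span, Tracer, NoOpTracer, handle_request) are invented for illustration, and only the overall shape follows the description above.

```python
from abc import ABC, abstractmethod


class Span(ABC):
    """Abstract span: the API layer only declares what a span can do."""

    @abstractmethod
    def set_attribute(self, key: str, value: object) -> None: ...

    @abstractmethod
    def end(self) -> None: ...


class Tracer(ABC):
    """Abstract tracer: application code is written against this type."""

    @abstractmethod
    def start_span(self, name: str) -> Span: ...


class NoOpSpan(Span):
    """Accepts every call and records nothing."""

    def set_attribute(self, key: str, value: object) -> None:
        pass

    def end(self) -> None:
        pass


class NoOpTracer(Tracer):
    """Hypothetical default used when no SDK is installed, e.g. in unit tests."""

    def start_span(self, name: str) -> Span:
        return NoOpSpan()


def handle_request(tracer: Tracer, user_id: int) -> str:
    # Instrumented code depends only on the abstract Tracer type.
    span = tracer.start_span("handle_request")
    span.set_attribute("user.id", user_id)
    try:
        return f"hello user {user_id}"
    finally:
        span.end()


result = handle_request(NoOpTracer(), 42)
```

Because handle_request only sees the abstract Tracer, swapping the no-op tracer for a concrete SDK implementation would not require touching the instrumented function.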
SDK: Exportable Implementation of the Specification
SDKs are the glue that combines exporters with the API. SDKs are concrete, executable implementations of the API. The rest of this section will explore each of the major OpenTelemetry components: exporters, metrics, tracing, auto-instrumentation, and the collector.
Exporters
Exporters let you extract telemetry data from an application and translate it into the format expected by a specific protocol or vendor. The concept of exporters here is the same as with OpenCensus and OpenTracing. Thus, you can instrument the application using OpenTelemetry and then configure an exporter to determine where the OpenTelemetry data is sent. This decouples the instrumentation from any specific vendor or protocol, avoiding vendor lock-in.
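The decoupling is easier to see in code. The sketch below is a toy model written with the standard library only; SpanExporter, InMemoryExporter, JsonLinesExporter, and checkout are hypothetical names, and the JSON lines simply stand in for whatever wire format a real backend would expect.

```python
import json
from typing import Any, Dict, List, Protocol

SpanData = Dict[str, Any]


class SpanExporter(Protocol):
    """Anything with an export() method can receive finished spans."""

    def export(self, spans: List[SpanData]) -> None: ...


class InMemoryExporter:
    """Keeps spans in a list; handy for tests."""

    def __init__(self) -> None:
        self.spans: List[SpanData] = []

    def export(self, spans: List[SpanData]) -> None:
        self.spans.extend(spans)


class JsonLinesExporter:
    """Serializes each span as one JSON line, standing in for a wire format."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def export(self, spans: List[SpanData]) -> None:
        self.lines.extend(json.dumps(span, sort_keys=True) for span in spans)


def checkout(exporter: SpanExporter) -> None:
    # The instrumentation is identical whichever exporter is configured.
    exporter.export([{"name": "checkout", "duration_ms": 12}])


memory, jsonl = InMemoryExporter(), JsonLinesExporter()
checkout(memory)
checkout(jsonl)
```

The checkout function never changes; only the object passed in decides where the data ends up, which is the property that avoids vendor lock-in.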
Metrics
If you’ve already used OpenCensus, you should be very familiar with metrics. The primitive for combining measures (actual metric events) with an exporter is called a Meter in OpenTelemetry. The metric primitives are generic to capture a wide variety of metric events.
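As a rough illustration of the Meter idea, the following sketch models a meter that hands out counters and aggregates the measurements they record. The names (Meter, Counter, create_counter, collect) and the label handling are assumptions made for this example and do not mirror the exact OpenTelemetry metrics API; collect() returns the kind of snapshot an exporter would then ship to a backend.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

LabelSet = Tuple[Tuple[str, str], ...]


class Counter:
    """A value that only increases, created and owned by a Meter."""

    def __init__(self, name: str, totals: Dict[Tuple[str, LabelSet], int]) -> None:
        self._name = name
        self._totals = totals

    def add(self, amount: int, **labels: str) -> None:
        if amount < 0:
            raise ValueError("a counter can only increase")
        # Each distinct label combination gets its own running total.
        self._totals[(self._name, tuple(sorted(labels.items())))] += amount


class Meter:
    """Creates instruments and aggregates the measurements they record."""

    def __init__(self) -> None:
        self._totals: Dict[Tuple[str, LabelSet], int] = defaultdict(int)

    def create_counter(self, name: str) -> Counter:
        return Counter(name, self._totals)

    def collect(self) -> List[Dict[str, Any]]:
        # Snapshot of the aggregated values, sorted for stable output.
        return [
            {"name": name, "labels": dict(labels), "value": value}
            for (name, labels), value in sorted(self._totals.items())
        ]


meter = Meter()
requests = meter.create_counter("http.requests")
requests.add(1, route="/cart")
requests.add(2, route="/cart")
requests.add(1, route="/pay")
snapshot = meter.collect()
```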
Tracing
Tracing in OpenTelemetry is very similar to that in OpenTracing. OpenTelemetry introduces the concept of a TracerProvider, which can model global tracer instances in a singleton pattern, similar to OpenTracing’s global tracer. OpenTelemetry also introduces additional abstractions, such as SpanProcessors, which are how exporters are attached to the OpenTelemetry API calls.
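The relationship between a global provider, tracers, span processors, and exporters can be sketched as follows. This is a simplified stand-in written for this article, not the SDK itself: spans are reduced to their names, and every class and function here (ListExporter, SimpleSpanProcessor, get_tracer_provider, and so on) is a hypothetical miniature of the concept it is named after.

```python
from contextlib import contextmanager
from typing import Iterator, List


class ListExporter:
    """Hypothetical exporter that just remembers span names."""

    def __init__(self) -> None:
        self.exported: List[str] = []

    def export(self, names: List[str]) -> None:
        self.exported.extend(names)


class SimpleSpanProcessor:
    """Hands each span to the exporter as soon as the span ends."""

    def __init__(self, exporter: ListExporter) -> None:
        self._exporter = exporter

    def on_end(self, name: str) -> None:
        self._exporter.export([name])


class Tracer:
    def __init__(self, processors: List[SimpleSpanProcessor]) -> None:
        self._processors = processors

    @contextmanager
    def start_span(self, name: str) -> Iterator[str]:
        try:
            yield name
        finally:
            # Ending a span notifies every registered processor.
            for processor in self._processors:
                processor.on_end(name)


class TracerProvider:
    """Owns the processors and hands out tracers that share them."""

    def __init__(self) -> None:
        self._processors: List[SimpleSpanProcessor] = []

    def add_span_processor(self, processor: SimpleSpanProcessor) -> None:
        self._processors.append(processor)

    def get_tracer(self) -> Tracer:
        return Tracer(self._processors)


# One module-level provider plays the role of the global singleton.
_PROVIDER = TracerProvider()


def get_tracer_provider() -> TracerProvider:
    return _PROVIDER


exporter = ListExporter()
get_tracer_provider().add_span_processor(SimpleSpanProcessor(exporter))
tracer = get_tracer_provider().get_tracer()

with tracer.start_span("load_cart"):
    with tracer.start_span("query_db"):
        pass
```

The inner span ends first, so the exporter sees query_db before load_cart.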
Auto-Instrumentation
Auto-instrumentation is the ability to dynamically instrument language-specific libraries for tracing. Instrumenting libraries for tracing requires propagating a trace context throughout all call sites. Modifying code to propagate this context can be difficult in legacy and large projects, and it is extremely difficult in environments like Node.js, which have historically lacked thread-local storage. Auto-instrumentation patches common libraries (such as HTTP clients/servers, web frameworks, and database clients) to add tracing automatically!
Epsagon is also incorporating its language-specific auto-instrumentation frameworks for Python, Ruby, Java, Go, Node.js, PHP, and .NET, which drastically cuts down on the time it takes to instrument tracing.
Collectors
One of the biggest new features with OpenTelemetry is the concept of an agent. Agents are standalone daemons that collect telemetry such as metrics and traces. To support agents, OpenTelemetry has created its own provider-agnostic component: the collector. The collector decouples the exportation and transformation of telemetry from its collection. OpenTelemetry also offers a new vendor-agnostic protocol to go along with the collector. While the protocol is still in its infancy, the goal is to further decouple observability instrumentation from specific vendors!
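Conceptually, a collector behaves like a small pipeline that receives telemetry, transforms it, and fans it out to one or more backends. The toy model below illustrates that flow under loose assumptions; the Collector class, the two processors, and the record layout are invented for this sketch and say nothing about how the real collector is configured or built.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Processor = Callable[[List[Record]], List[Record]]
Exporter = Callable[[List[Record]], None]


class Collector:
    """Toy pipeline: receive a batch, process it, fan out to exporters."""

    def __init__(self, processors: List[Processor], exporters: List[Exporter]) -> None:
        self._processors = processors
        self._exporters = exporters

    def receive(self, batch: List[Record]) -> None:
        for process in self._processors:
            batch = process(batch)
        for export in self._exporters:
            export(batch)


def drop_health_checks(batch: List[Record]) -> List[Record]:
    # Filtering happens in the collector, not in the application.
    return [record for record in batch if record.get("name") != "GET /healthz"]


def add_environment(batch: List[Record]) -> List[Record]:
    # Enrich every record with deployment metadata.
    return [{**record, "env": "staging"} for record in batch]


# Two lists stand in for two different vendor backends.
backend_a: List[Record] = []
backend_b: List[Record] = []

collector = Collector(
    processors=[drop_health_checks, add_environment],
    exporters=[backend_a.extend, backend_b.extend],
)
collector.receive([{"name": "GET /cart"}, {"name": "GET /healthz"}])
```

Adding or replacing a backend means changing the exporters list of the collector, while the application that sends the telemetry stays as it is.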
Why OpenTelemetry?
Here are a few of the reasons behind the CNCF’s development of OpenTelemetry.
Evolution of Standards
One reason for these new components and abstractions is an evolution of standards. OpenCensus started at Google and reflects Google’s own approach to tracing. OpenTracing, which predates OpenCensus, took a standards-based approach to implementing tracing. Both projects embraced the concept of “exporters” and decoupled instrumentation from exportation. OpenTelemetry is a merger of these two frameworks. Once OpenTelemetry is stable, there shouldn’t be a need to use multiple frameworks, but until it is, it’s important to consider OpenCensus and OpenTracing as well.
Multi-Provider
At the heart of OpenTelemetry is the decoupling of language instrumentation code from vendors. With OpenTelemetry, applications only need to be instrumented once regardless of the provider. This allows companies to choose the best provider for their needs, and they can even change providers with only very minimal changes to their code! And in the case of the OpenTelemetry Collector, no code changes are required!
More Generic APIs
OpenTelemetry also evolves a number of the OpenTracing and OpenCensus APIs, introducing new concepts and abstractions. For example, OpenTracing has the concept of a span “tag,” which is a way to attach key/value data to individual spans. Best practices for choosing tags haven’t changed, but the concept of a tag has been replaced with an “attribute,” which is a more generic form of “tag.” OpenTelemetry has introduced more generic abstractions for several different components.
Additionally, it encodes concepts that were previously only conventions, such as the OpenTracing semantic conventions, into the OpenTelemetry API. In OpenTracing, the span.kind tag was a convention that the API did not enforce, although some tracing providers gave it significance. OpenCensus, by contrast, specifies SpanKind. OpenTelemetry pulls this concept from OpenCensus into the API and makes SpanKind a property of spans. Figure N shows an example of having an explicit kind in OpenTelemetry.
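The difference between a convention and an API-level property can be sketched like this. The enum members follow the span kinds OpenTelemetry defines (internal, server, client, producer, consumer), but the Span dataclass and the kind_from_tag helper are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class SpanKind(Enum):
    INTERNAL = "internal"
    SERVER = "server"
    CLIENT = "client"
    PRODUCER = "producer"
    CONSUMER = "consumer"


@dataclass
class Span:
    name: str
    # The kind is a typed property of the span, not a free-form tag.
    kind: SpanKind = SpanKind.INTERNAL
    attributes: Dict[str, Any] = field(default_factory=dict)


def kind_from_tag(value: str) -> SpanKind:
    """Map a legacy span.kind tag value onto the enum.

    Unknown values raise ValueError instead of passing silently.
    """
    return SpanKind(value)


# Convention only: the kind hides in a string tag, so a typo goes unnoticed.
tagged_span = Span("GET /cart", attributes={"span.kind": "sever"})

# Explicit kind: the value is checked when the span is created.
server_span = Span("GET /cart", kind=SpanKind.SERVER, attributes={"http.method": "GET"})
```

With the tag-based span, the misspelled value is only discovered (if at all) by whichever backend interprets it; with the enum, kind_from_tag("sever") fails immediately.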
Asynchronous Events
OpenTelemetry treats asynchronous events as first-class citizens through its Links API. In OpenTracing, there are two ways to model causal relationships between spans. The relationship is specified when the span is created, in the Tracer.StartSpan() call:
- ChildOf: The parent is dependent on the new span’s results.
- FollowsFrom: The parent is not dependent on the new span’s results.
OpenTelemetry establishes causality explicitly through the Links API, which collapses the distinction between the two. In the Go API, for example, links are specified when a new span is created.
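To show what a link expresses, here is a small data-model sketch in Python rather than Go. It is an assumption-laden illustration, not the OpenTelemetry API: SpanContext, Span, and start_span are simplified stand-ins, and the random hex identifiers merely imitate trace and span IDs.

```python
import secrets
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str


@dataclass
class Span:
    name: str
    context: SpanContext
    parent: Optional[SpanContext] = None
    links: List[SpanContext] = field(default_factory=list)


def start_span(
    name: str,
    parent: Optional[SpanContext] = None,
    links: Sequence[SpanContext] = (),
) -> Span:
    # A child stays in its parent's trace; a root span starts a new trace.
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    context = SpanContext(trace_id, secrets.token_hex(8))
    return Span(name, context, parent, list(links))


# Two independent requests each enqueue a message, in separate traces.
enqueue_a = start_span("enqueue order A")
enqueue_b = start_span("enqueue order B")

# A batch job later handles both messages. It has no single parent, so it
# records a link to each span that causally contributed to it.
batch = start_span("process batch", links=[enqueue_a.context, enqueue_b.context])

# Ordinary synchronous work inside the batch is still modeled as a child.
write = start_span("write results", parent=batch.context)
```

The batch span can point at any number of earlier spans, even across traces, which is something a single ChildOf or FollowsFrom reference cannot express as naturally.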
Supported Languages
All major programming languages are supported by OpenTelemetry. Detailed information on the status of different projects is available on the OpenTelemetry website and on each language’s GitHub page.
Progress is being made quickly, so check back often! A missing feature today could easily be implemented in a couple of days or weeks.
Getting Started with OpenTelemetry
Since this is still a young project, you need to perform some background research before getting started. It’s a good idea to:
- Check the OpenTelemetry language version.
- Check feature support for your target languages.
- Check available exporters for your target languages.
After this, you should check out examples for your chosen language in its given GitHub repo.
Conclusion
OpenTelemetry has all the components necessary to be a one-stop observability solution:
- Standards-first approach
- Language-specific SDKs
- Metrics
- Traces
- Collectors
- Auto-instrumentation
OpenTelemetry aims to cover metrics and tracing, two of the three pillars of observability. But before making the switch, make sure to check if it supports the languages you want to use because each language is in a different phase of implementation and some features may not be available across all languages. OpenTelemetry has made significant progress in the last six months and continues to do so. If you’re looking to adopt it, OpenTelemetry provides backward compatibility for both OpenCensus and OpenTracing, reducing the friction involved in getting started.