Guest post originally published on the Helios blog by Ran Nozik
OpenTelemetry (OTel) is an open source collection of tools, SDKs and APIs that allows developers to collect and export traces, metrics and logs. It’s the second-most active project in the CNCF, and is emerging as the industry standard for system observability and distributed tracing across cloud-native and distributed architectures. OTel supports multiple languages, like JavaScript, Java, Python, Go, Ruby and C++. Many developers, however, aren’t familiar with the actual technology behind OTel: how does this magic actually work behind the scenes? In this blog post, I will explain how OpenTelemetry works under the hood with JavaScript. In the future, we will show OTel in action in additional languages.
Instrumentation Under the Hood: A Technical Explanation
Like all instrumentation libraries, OpenTelemetry operates by wrapping existing function implementations and extracting the necessary pieces of data. These include the function parameters, duration, and results. Sometimes the data itself is also modified, cautiously (e.g., for context propagation purposes).
The specific wrapping and extraction mechanism operates differently in every language. There is a clear difference between how it works in dynamic languages, like JavaScript, Python and Ruby, and non-dynamic languages, like Java, Go and .NET (but more on that later).
Let’s think of a classic example. Say we are trying to collect data from an HTTP client (like axios in JavaScript or requests in Python). For simplicity’s sake, let’s assume we only want to collect the request duration, HTTP method, URL and response status code.
Python’s requests lib, for example, exposes a separate function for each HTTP method (requests.get / requests.post / requests.put, and so on). But each of these functions eventually calls an internal request method, whose parameters are the method, URL and all the kwargs arguments. The function then returns a response object.
A simplified version of instrumenting requests would look something like this:
from datetime import datetime

def request(method, url, **kwargs):
    # Original implementation
    ...

def wrapped_request(method, url, **kwargs):
    before = datetime.now()
    # Call the original implementation
    response = request(method, url, **kwargs)
    # Collect the necessary information, asynchronously of course
    # (collect_data is a placeholder for the instrumentation's reporting function)
    duration = datetime.now() - before
    collect_data(method, url, response.status_code, duration)
    # Return the value from the original call
    return response
To close the loop, the original function implementation only needs to be replaced with the new one, wrapped_request. For dynamic languages like JS and Python, this is done by simply holding a reference to the original implementation and replacing the function by its name. A pseudocode implementation (which isn’t very far from real-life code) looks like this:
import requests

original_request_impl = requests.request

def wrapped_request(method, url, **kwargs):
    # Wrapped implementation as shown above; it reaches the original through the closure
    return original_request_impl(method, url, **kwargs)

requests.request = wrapped_request
Users of requests will not notice a thing: they will continue calling requests.get and requests.post like they did before. But the auto-instrumentation will collect the necessary data for monitoring, troubleshooting and many other use cases.
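The same wrapping idea carries over to a JavaScript HTTP client, with one difference: the call usually returns a promise, so the duration and status code can only be recorded once that promise settles. The following is a minimal sketch under stated assumptions, not how OTel or any real client is implemented. The `client` object, its `request` method and the `collectData` function are all hypothetical stand-ins.

```javascript
// Hypothetical HTTP client standing in for a real library such as axios.
const client = {
  async request(method, url) {
    return { status: 200, method, url };
  },
};

// Hypothetical collector; a real one would export the data to a backend.
const collected = [];
function collectData(record) {
  collected.push(record);
}

// Hold a reference to the original implementation, then replace it by name.
const originalRequest = client.request;

client.request = function wrappedRequest(method, url, ...rest) {
  const startedAt = process.hrtime.bigint();

  // Record one data point once the outcome of the call is known.
  const record = (status) => {
    const durationMs = Number(process.hrtime.bigint() - startedAt) / 1e6;
    collectData({ method, url, status, durationMs });
  };

  // Call the original implementation and pass its outcome through unchanged.
  return originalRequest.call(this, method, url, ...rest).then(
    (response) => {
      record(response.status);
      return response;
    },
    (error) => {
      record(undefined);
      throw error;
    }
  );
};
```

Callers keep using client.request('GET', url) exactly as before. They receive the original response, while one record per call is appended to `collected`.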
How Does Instrumentation Work in JavaScript?
In JS, since everything is an object, patching a method is as easy as reassigning a variable. Let’s look at a simple example:
class Person {
  constructor(name) {
    this.name = name;
  }
  print() {
    console.log(`My name is ${this.name}`);
  }
}

const p = new Person('Johnny');
p.print(); // Prints "My name is Johnny"

const origPrint = p.print;
const newPrint = function() {
  console.log('Hey there!');
  // Call the original implementation
  origPrint.apply(this, arguments);
};
p.print = newPrint;
p.print(); // Prints "Hey there!" and then "My name is Johnny"
What we did was simple: we replaced the print implementation on one specific Person instance and added an extra log line, to show how the instrumentation works. As a side note, notice that this code change only affects that specific instance of the Person class. To patch all instances, we could have simply replaced the print method of Person.prototype instead.
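To make the side note concrete, here is a small sketch of the prototype variant. It reuses the Person class from the example above; the instance names `alice` and `bob` are only illustrative.

```javascript
class Person {
  constructor(name) {
    this.name = name;
  }
  print() {
    console.log(`My name is ${this.name}`);
  }
}

// Created BEFORE the patch, to show that existing instances are affected too.
const alice = new Person('Alice');

// Patch the method on the prototype instead of on a single instance.
const origPrint = Person.prototype.print;
Person.prototype.print = function patchedPrint(...args) {
  console.log('Hey there!');
  // Call the original implementation with the same receiver and arguments
  return origPrint.apply(this, args);
};

// Created AFTER the patch.
const bob = new Person('Bob');

alice.print(); // Prints "Hey there!" and then "My name is Alice"
bob.print();   // Prints "Hey there!" and then "My name is Bob"
```

Because instances look up print through the prototype chain at call time, both the instance created before the patch and the one created after it go through the wrapper.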
How OpenTelemetry Works with JavaScript
In JavaScript, OpenTelemetry works specifically by hooking into the native require function, the function that loads modules by their name. Loading a module is what triggers the instrumentation process. For example, when the developer calls require('kafkajs'), OTel uses the require-in-the-middle module to apply changes to the `kafkajs` module. This change wraps the necessary functions in a manner similar to the one shown above, using the shimmer library, and returns the patched module to the user code. From the end user’s perspective, the change is completely transparent and they are not aware of any changes made.
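The mechanism can be approximated without any third-party packages. The sketch below is not the require-in-the-middle or shimmer implementation; it only illustrates the idea by replacing Module.prototype.require from Node's built-in module package. As an assumption made to keep the example self-contained, the built-in os module plays the role of `kafkajs`, and os.platform plays the role of a function worth instrumenting.

```javascript
const Module = require('module');

// Stand-in target: the built-in 'os' module plays the role of 'kafkajs'.
const TARGET = 'os';
const calls = [];
let patched = false;

// Hold a reference to the original require implementation.
const originalRequire = Module.prototype.require;

Module.prototype.require = function hookedRequire(id) {
  // Let Node load (or fetch from cache) the module as usual.
  const loaded = originalRequire.apply(this, arguments);

  // Patch the target module once, the first time it passes through the hook.
  if (id === TARGET && !patched) {
    patched = true;
    const originalPlatform = loaded.platform;
    loaded.platform = function wrappedPlatform(...args) {
      calls.push('os.platform');
      return originalPlatform.apply(this, args);
    };
  }

  // Hand the (possibly patched) module back to the caller.
  return loaded;
};

// "User code": an ordinary require call now receives the patched module.
const os = require('os');
os.platform();
console.log(calls); // [ 'os.platform' ]
```

The user code at the bottom is unchanged from what it would be without instrumentation, which is the transparency described above.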
You may have noticed that this mechanism implicitly assumes that the `require-in-the-middle` hook was set before the call to require('kafkajs'). If `kafkajs` (or any other module we are trying to instrument) is loaded before the hook is set, the hook simply “misses” its opportunity to patch the necessary functions. This is a big potential pitfall, because it assumes the developer knows exactly where to put the OpenTelemetry initialization code. In many cases this is not trivial, and we have indeed seen many developers “misplace” the OTel initialization code, causing the instrumentation to behave unexpectedly. Data from modules that were required before OTel is missing (typically, HTTP frameworks like express/koa), while data from other modules appears properly.
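One simplified way to see why ordering matters, independent of OTel's internals: any reference to a function that was captured before the patch keeps pointing at the original implementation. In the sketch below, `lib` and its `handle` method are hypothetical stand-ins for a module that was loaded and used before the instrumentation arrived.

```javascript
// Hypothetical library object standing in for an already-loaded module.
const lib = {
  handle(path) {
    return `handled ${path}`;
  },
};

// Code that ran before the instrumentation captured a direct reference.
const earlyHandle = lib.handle;

// The instrumentation arrives late and patches the module object.
const seen = [];
const originalHandle = lib.handle;
lib.handle = function wrappedHandle(path) {
  seen.push(path);
  return originalHandle.call(this, path);
};

earlyHandle('/early'); // Bypasses the wrapper: nothing is recorded
lib.handle('/late');   // Goes through the wrapper: the call is recorded

console.log(seen); // [ '/late' ]
```

The early reference still works, which is why this kind of problem tends to show up as silently missing data rather than as an error.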
How is this problem solved?
Ensuring OTel is Loaded Before Other JavaScript Modules
As described above, using OpenTelemetry in JS requires a good understanding of the application’s initialization flow. A module that is loaded before OTel will not be properly instrumented, and this often happens implicitly through cascading require calls. But how can you be 100% sure the module was loaded after OTel?
In some cases (AWS Lambda, for instance) the developer may not even have control over the loaded modules, as the Lambda runtime comes with preloaded modules and calls a handler function that the developer provides. In this case, adding the initialization code at the top of the handler file just won’t work. There are other similar examples – where the code runs as part of homegrown microservices templates, whose initialization flow isn’t accessible (and perhaps even known) to the developer.
The most reliable way to avoid these problems is to use Node.js’s native --require option to make sure the OTel initialization code runs before anything else. Setting NODE_OPTIONS to require this code ensures no module is loaded before the require-in-the-middle hook is set. The typical way to do this is to create a file with the OpenTelemetry initialization code (let’s call it otel_init.js). Assuming the application’s main file is app.js, you can either:
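- Replace the node app.js command with node --require ./otel_init.js app.js. The leading ./ tells Node.js to resolve the file relative to the working directory instead of searching node_modules.
- If you’re unable to change the command (or prefer not to), setting the NODE_OPTIONS environment variable to --require ./otel_init.js will also do the trick.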
What about cases in which require-in-the-middle cannot work at all? webpack, for example, bundles the entire module with all of its dependencies into a single file, and the modules are not loaded using the native require function (except for modules that are defined externally), but rather by a unique identifier allocated by webpack.
How can OTel work in such conditions? Stay tuned for our next blog posts.
To learn the basic usage of OTel with JavaScript, click here.
To get started with Helios, which leverages OTel’s capabilities to help engineering teams build production-ready cloud-native applications, sign up here.