Member blog post by Abhishek Singh, Christos Kalkanis, Alexander Wert, and Bahubali Shetti of Elastic
In March 2023, OpenTelemetry took a big step toward making profiling a core telemetry signal by merging a profiling data model OTEP and starting work on a stable spec and implementation. Today, OpenTelemetry takes another big step in establishing profiling as its fourth telemetry signal with the donation of Elastic’s continuous profiling agent. SREs can now benefit from these capabilities: quickly identifying performance bottlenecks, maximizing resource utilization, reducing carbon footprint, and optimizing cloud spend.
What is continuous profiling?
Profiling is a technique used to understand the behavior of a software application by collecting information about its execution. This includes tracking the duration of function calls and the use of memory, CPU, and other system resources.
However, traditional profiling solutions have significant drawbacks limiting adoption in production environments:
- Significant cost and performance overhead due to code instrumentation
- Disruptive service restarts
- Inability to get visibility into third-party libraries
Unlike traditional profiling, which is often done only in a specific development phase or under controlled test conditions, continuous profiling runs in the background with minimal overhead. This provides real-time, actionable insights without replicating issues in separate environments. SREs, DevOps, and developers can see how code affects performance and cost, making code and infrastructure improvements easier.
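To make the idea concrete, here is a minimal Python sketch of what a sampling profiler does with the data it collects: it periodically records call stacks and then counts which function was executing most often. The sample data and the `hottest_functions` helper are hypothetical and only illustrate the general technique; they do not describe how Elastic’s agent or any OpenTelemetry component is implemented.

```python
from collections import Counter


def hottest_functions(samples, top=3):
    """Rank functions by the share of samples in which they were executing.

    Each sample is a call stack listed outermost frame first, so the last
    entry is the function that was on-CPU when the sample was taken.
    """
    self_counts = Counter(stack[-1] for stack in samples if stack)
    total = sum(self_counts.values())
    return [(name, count / total) for name, count in self_counts.most_common(top)]


# Hypothetical stack samples, as a profiler might capture at a fixed interval.
samples = [
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "query_db"),
    ("main", "handle_request", "parse_json"),
]

for name, share in hottest_functions(samples):
    print(f"{name}: {share:.0%} of samples")
```

Because the profiler only inspects stacks at intervals rather than wrapping every function call, the application code does not need to be modified, which is one reason sampling-based approaches can keep overhead low.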
Contribution of production-grade features
Elastic’s continuous profiling agent, based on eBPF, is a whole-system, always-on solution that observes your code, third-party libraries, kernel operations, and other code you don’t own. It eliminates the need for code instrumentation (run-time/bytecode), recompilation, or service restarts, and it runs in production environments with low overhead: about 1% CPU and low memory usage.
The profiling agent helps identify non-optimal code paths, uncovers “unknown unknowns”, and provides comprehensive visibility into the runtime behavior of all applications. It supports a wide range of runtimes and languages, such as C/C++, Rust, Zig, Go, Java, Python, Ruby, PHP, Node.js, V8, Perl, and .NET.
Additionally, organizations can meet sustainability objectives by minimizing computational wastage, ensuring seamless alignment with their strategic ESG goals.
Benefits to the OpenTelemetry community
The contribution advances the standardization of continuous profiling for observability and accelerates the practical adoption of profiling as the fourth key signal in OTel. Customers now have a vendor-agnostic method for collecting profiling data and correlating it with existing signals, like tracing, metrics, and logs.
OTel-based continuous profiling unlocks the following possibilities for users:
- Improve customer experience: delivering consistent service quality and performance through continuous profiling ensures customers have an application that performs optimally, remains responsive, and is reliable.
- Maximize gross margins: businesses can optimize their cloud spend and improve profitability by reducing the computational resources needed to run applications. Whole-system continuous profiling is one way of identifying the most expensive functions (down to the lines of code) across diverse environments that may span multiple cloud providers. In the cloud context, every CPU cycle saved translates to money saved.
- Minimize environmental impact: energy consumption associated with computing is a growing concern (source: MIT Energy Initiative). More efficient code translates to lower energy consumption, contributing to a reduction in carbon (CO2) footprint.
- Accelerate engineering workflows: continuous profiling provides detailed insights to help troubleshoot complex issues faster, guide development, and improve overall code quality.
- Improve vendor neutrality and increase efficiency: an OTel eBPF-based profiling agent removes the need to use proprietary APM agents and offers a more efficient way to collect profiling telemetry.
With these benefits, customers can now manage their applications’ overall efficiency in the cloud while ensuring that their engineering teams continue to optimize it.
Moving forward, Elastic will continue collaborating closely with the OTel Profiling and Collector SIGs to ensure seamless integration of the profiling agent within the broader OTel ecosystem.