How Mezmo secures petabytes of data monthly with Linkerd
Challenge
Mezmo is a fast-growing observability company that makes observability data consumable and actionable to customers around the world, from hyper-growth startups to Fortune 500 companies. To provide this critical service to its customers, Mezmo processes a tremendous amount of data — petabytes every month — across a highly scaled, multi-region and multi-cloud Kubernetes platform.
One of Mezmo’s largest customers needed to encrypt its data in transit as well as authenticate workload identities. Not meeting those needs would lead to customer and revenue loss.
Solution
The solution was clear: A service mesh, with its mutual TLS capabilities, would allow Mezmo to meet the requirements in Kubernetes. They needed to find a service mesh that would allow a “fixing the airplane while flying” approach, ensuring application uptime and reliability were not impacted during the rollout.
Impact
The successful deployment of Linkerd was directly linked to revenue preservation. Meeting the critical mTLS requirement from a high-profile customer meant an end to countless hours looking for the right solution and to numerous, sometimes tense, customer meetings.
By the numbers
Petabytes of data processed every month
10+ K8s clusters across 10+ regions
Zero downtime converting to Linkerd
To satisfy critical customer requirements around data security, Mezmo investigated several service mesh options. After a detailed evaluation, Mezmo chose Linkerd for its simplicity and ease of use. Today, Mezmo operates Linkerd at significant scale in production, taking advantage of features such as automatic encryption of data in transit and authentication of workload identities through mutual TLS to provide best-in-class security for its customers’ valuable data.
Mezmo overview
Founded in 2015 in Silicon Valley, Mezmo (formerly known as LogDNA) provides a powerful telemetry data platform for its customers that handles extreme amounts of data. Mezmo processes petabytes of logs and metrics data each month and has been recognized as one of the fastest-growing companies in the US by Inc. 5000 and Deloitte Fast 500.
The Mezmo Telemetry Pipeline allows customers to set up sophisticated rules for ingesting, profiling, transforming and routing all data across the company. For example, it allows optimized security data, such as audit logs and login attempts, to be delivered to a SIEM, while routing compliant marketing data, such as web access logs and form errors, to the marketing team. In addition, integrated log analysis helps enterprises to search, visualize and act on telemetry data easily. With this platform, Mezmo customers get the unique capability to respond quickly to any incident or data change by correlating these different streams. Mezmo’s powerful toolset allows customers to manage all of their log and telemetry data across their entire platform.
The platform engineering team and infrastructure
To handle the extreme scale of data it processes, Mezmo’s platform itself is highly scaled: it comprises 10+ Kubernetes clusters across 10+ regions and runs in both AWS and IBM Cloud. The platform operates in service of teams of product owners and developers who are continuously pushing the limit of what’s possible. “Constant product innovation is business critical because this is a fast-moving space,” stated Rich Prillinger, Senior Director of Platform Engineering at Mezmo. “And the platform we build must support that rapid pace of innovation.”
Introducing Linkerd
In 2022, one of Mezmo’s largest customers presented a new requirement. This customer’s data included sensitive information such as PII, health records, credit card numbers and more, and they needed encryption of this data in transit as well as authentication of workload identities handling this data in order to protect it.
The Mezmo team knew that mutual TLS provided exactly these features and that a service mesh was the best way to tackle this requirement in Kubernetes, so they began an internal evaluation process for the best implementation. The team started by evaluating Kuma and Istio.
However, after over six months, with an engineer assigned full-time to the task, the team could not bring either project to the level of confidence the platform demanded.
The Mezmo platform team then moved on to Linkerd and, within a few days, had the project working and providing mutual TLS for network communication on the clusters. Their service mesh decision was clear: after months of unsuccessfully trying to get other service meshes to function as desired, Linkerd’s ability to “just work” convinced Mezmo platform engineers that they had found their solution.
“After a long period of research and experimentation with service meshes, it was a tremendous relief to finally have a clear path with Linkerd to meet our customer’s demand.”
Rich Prillinger, Senior Director of Platform Engineering at Mezmo
Rolling out Linkerd to production
Mezmo’s rollout of Linkerd took a “fixing the airplane while flying” approach — it was critical that application uptime and reliability were not impacted during the rollout. Prillinger’s team took an incremental approach, starting by implementing the mesh on less critical services, then rolling the mesh out across the sites from smallest to largest, and finally repeating this process across all the services. This was only made possible by Linkerd’s ability to be deployed incrementally on individual components without disrupting the existing communication with un-meshed components.
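The ordering described above (least critical services first and, for each service, sites from smallest to largest) can be sketched as a simple plan generator. This is an illustrative sketch only, not Mezmo's tooling: the `Service` and `Site` types, the `criticality` score, the `node_count` size measure, and all example names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Service:
    name: str
    criticality: int  # hypothetical score: higher means more critical


@dataclass(frozen=True)
class Site:
    name: str
    node_count: int  # hypothetical stand-in for "site size"


def rollout_plan(services: List[Service], sites: List[Site]) -> Iterator[Tuple[str, str]]:
    """Yield (service, site) steps: least critical service first,
    and for each service, smallest site first."""
    for service in sorted(services, key=lambda s: s.criticality):
        for site in sorted(sites, key=lambda s: s.node_count):
            yield service.name, site.name


if __name__ == "__main__":
    # Made-up inventory, for illustration only.
    services = [Service("ingest", 3), Service("internal-dashboard", 1)]
    sites = [Site("region-b", 40), Site("region-a", 12)]
    for step, (service, site) in enumerate(rollout_plan(services, sites), start=1):
        print(f"step {step}: mesh {service} in {site}")
```

Ordering the plan this way means that any problem surfaces first where its impact is smallest, which is the point of an incremental rollout. Each step assumes that meshed and un-meshed workloads can keep communicating while the rollout is in progress.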
Today, Mezmo’s platform is meshed with Linkerd. And while mTLS was the primary driver for adopting a service mesh, other features, such as latency-based load balancing and authorization policy, have become the icing on the cake, further allowing Mezmo to provide a highly available platform with world-class security.
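For readers unfamiliar with latency-based load balancing, the general idea can be illustrated with an exponentially weighted moving average (EWMA) of observed latency per endpoint, which is the approach Linkerd's documentation describes. The sketch below is a toy model and not Linkerd's implementation: the `EwmaBalancer` class, the `alpha` smoothing factor, and the choice to compare two randomly sampled endpoints are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional


class EwmaBalancer:
    """Toy latency-aware balancer (illustrative only)."""

    def __init__(self, endpoints: List[str], alpha: float = 0.3,
                 rng: Optional[random.Random] = None) -> None:
        self.alpha = alpha  # weight given to the newest latency sample
        self.rng = rng or random.Random()
        # None means "no latency observed yet" for that endpoint.
        self.latency_ms: Dict[str, Optional[float]] = {e: None for e in endpoints}

    def observe(self, endpoint: str, latency_ms: float) -> None:
        """Fold a new latency sample into the endpoint's moving average."""
        prev = self.latency_ms[endpoint]
        if prev is None:
            self.latency_ms[endpoint] = latency_ms
        else:
            self.latency_ms[endpoint] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def pick(self) -> str:
        """Sample up to two endpoints and return the one with the lower average."""
        names = list(self.latency_ms)
        candidates = self.rng.sample(names, k=min(2, len(names)))
        # Unmeasured endpoints score 0.0 so they are tried at least once.
        return min(candidates, key=lambda e: self.latency_ms[e] or 0.0)


if __name__ == "__main__":
    balancer = EwmaBalancer(["pod-a", "pod-b"], alpha=0.5)
    balancer.observe("pod-a", 100.0)
    balancer.observe("pod-b", 10.0)
    print(balancer.pick())
```

Because recent samples are weighted more heavily, an endpoint that slows down loses traffic quickly and regains it as its latency recovers.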
Happy customers and a healthy business
The successful deployment of Linkerd was directly linked to revenue preservation at Mezmo. Meeting the critical mTLS requirement from a high-profile customer meant an end to countless hours looking for the right solution, and numerous, sometimes tense, customer meetings — a huge relief for the entire team.
Adopting Linkerd not only helped Mezmo meet their customer’s demands, it also made their platform more secure, benefiting all Mezmo users. Finally, the Linkerd maintainers have also been a part of Mezmo’s success.