The Indian stock brokerage Zerodha handles 8 million trades a day, making it the largest retail stock investment platform in the world. “Our mission is to make trading and investing easy and accessible to the masses,” says CTO Kailash Nadh.
Given its industry and scale, Zerodha requires infrastructure that spans a public cloud (AWS for most in-house applications) and physical machines in multiple data centers, with specific regulatory and technical requirements for capital market connectivity via leased lines and adapters from various stock exchanges.
That complexity, combined with a heavily regulated technology stack and a mix of end-user applications and internal systems with disparate external dependencies, led the company to embrace cloud native technologies.
“We needed a centralized, uniform monitoring infrastructure that worked across a wide variety of environments,” Nadh says. “Prometheus gave us powerful monitoring for critical, low-latency financial systems. It helped us aggregate and monitor metrics infra-wide. The large number of existing exporters and the ease of writing custom exporters enabled us to attain wide coverage in a short period of time.”
Additionally, Zerodha began moving its services from VMs to containers and, in 2020, gradually onto Kubernetes. Because all of its apps had already been built with a service-oriented architecture and the 12-factor methodology, the migration was straightforward. The infrastructure team started by creating CI pipelines in GitLab, since the company already had a well-defined CI/CD process for pushing changes to production. With a focus on infrastructure-as-code practices, Zerodha uses a mix of Terraform, Packer, and eksctl to create its Kubernetes infrastructure on AWS, and hosts container artifacts in an internal registry on Amazon Elastic Container Registry (ECR).
“We have been conscious of not creating an ops vs. dev divide,” says Nadh. “Developers are responsible for the entire lifecycle of their projects, including deployments. We created a standard template for deployments with Kubernetes that allows developers to craft their own deployments with minimal scaffolding or direct involvement of DevOps engineers.”
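A shared deployment template of the kind Nadh describes could look roughly like the following minimal sketch. A platform-owned Kubernetes `apps/v1` Deployment manifest exposes only a few parameters, and application teams fill in the rest. Zerodha's actual template is not described. The `render_deployment` helper, the default values, and the registry URL in the usage line are hypothetical. Production setups more commonly use tools such as Helm or Kustomize than hand-rolled string templates.

```python
from string import Template

# Hypothetical platform-owned template: field names follow the Kubernetes
# apps/v1 Deployment schema; only the $placeholders vary per application.
DEPLOYMENT_TEMPLATE = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $name
  labels:
    app: $name
spec:
  replicas: $replicas
  selector:
    matchLabels:
      app: $name
  template:
    metadata:
      labels:
        app: $name
    spec:
      containers:
        - name: $name
          image: $image
          ports:
            - containerPort: $port
          resources:
            requests:
              cpu: $cpu
              memory: $memory
""")

# Hypothetical defaults a platform team might pick; teams override as needed.
DEFAULTS = {"replicas": 2, "port": 8080, "cpu": "250m", "memory": "256Mi"}


def render_deployment(name: str, image: str, **overrides) -> str:
    """Fill the shared template; unknown keys are rejected to catch typos."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    values = {**DEFAULTS, **overrides, "name": name, "image": image}
    return DEPLOYMENT_TEMPLATE.substitute(values)
```

For example, `render_deployment("orders-api", "registry.example.com/orders-api:1.4.2", replicas=3)` yields a complete manifest. The developer supplies only a name, an image, and any deliberate overrides such as per-workload CPU and memory requests.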
As a result, deployment rollouts are faster and more frequent: Complete environments with all the dependencies can be brought up in minutes, rather than hours, with very little manual intervention. “Kubernetes has helped us standardize the deployment process of applications built on many different kinds of stacks across teams,” he says. “We’ve gained scale and modularity, especially in an environment where sweeping regulatory changes often demand significant changes to systems. Kubernetes also allows us fine-grained resource allocation for workloads, reducing cost of compute instances by at least 50%.”
Zerodha is using a CNCF incubating project, NATS, “for transmitting large volumes of real-time market data across infrastructures across applications at high throughputs,” says Nadh. “Many of our components depend on the ease of subscriptions and instant and ‘automagical’ failovers NATS offers. We had at least three other technologies that we had tried over the years before stumbling upon NATS. It pretty much solved all the issues we had faced with other message streams and PubSub systems.”
To find out more about Zerodha’s cloud native journey, read the full case study.