A fintech startup founded in 2013 in Brazil, Nubank was never weighed down by legacy infrastructure. Early on, the company embraced Docker containers and ran almost all of its infrastructure on AWS.
Which is not to say that Nubank didn’t have its challenges as its customer base reached 23 million and the engineering team grew from 30 to more than 520 over the past few years. “We had an immutable infrastructure, and we were growing very, very fast,” says Renan Capaverde, Director of Engineering at Nubank. “Our deployments were dependent on spinning a whole stack or cloning our whole infrastructure to iterate all the development. So it was getting slower and more painful over time.”
Other pain points included load balancing for applications, and difficulties arising from adding new security group rules in AWS.
“At the end of the day, we were certain that Docker was already really beneficial to us, and we wanted to go more into containerization,” says Capaverde. Finding an orchestrator was the next step. Nubank’s data infrastructure team had already been running a Spark cluster on top of Mesos, and the team looked at several different technologies including Docker Swarm. Ultimately, “it felt like the right choice was Kubernetes,” he says. “Kubernetes had a lot of great abstractions and a lot of support from the community and Cloud Native Computing Foundation, and all the expertise from Google deploying Borg.”
When Nubank began its migration to Kubernetes, “the first thing that we wanted was to empower developers for running their applications,” says Capaverde. “Our applications were already cloud native, so they had a really good architecture for the projects themselves. They were as scalable as we needed.”
The Nubank cloud native platform also includes Prometheus, Thanos, and Grafana for monitoring, and Fluentd for logging.
To help engineers use the new platform, the team held extensive training throughout the company for the first year. They also developed a command line interface, called NuCLI, with 500+ shortcuts for automation such as getting logs from applications or hard deploys. “This abstraction was easier for people, and at the beginning of the migration, mitigated the lack of expertise in Kubernetes,” says Capaverde.
Initially, Nubank, a Clojure shop, had some issues with CPU sets and the JVM, and had to be conservative in its resource allocation to avoid risk and move faster. This prevented the cost savings that the team expected with Kubernetes. Almost a year and a half later, with further tweaking of autoscaling and budgets, the team estimates that Nubank has gained about 30% cost efficiency.
There have been other benefits too. “Kubernetes has really great abstractions like readiness probe and liveness probe,” says Capaverde. “Instead of having to boot, we are using our blue/green strategy for deployments. We have 400+ microservices, so you can imagine deployment when we had to wait for our EC2 instance to boot and then start containers. With Kubernetes we just have to start the containers.”
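To make the probe abstractions Capaverde mentions concrete, here is a minimal sketch of a service exposing separate liveness and readiness endpoints. It is not Nubank's code (their services are written in Clojure); Python is used purely for illustration, and the `/healthz` and `/ready` paths, the `AppState` flags, and the port are all assumptions.

```python
from dataclasses import dataclass
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


@dataclass
class AppState:
    # Hypothetical flags; a real service would derive these from its own checks.
    alive: bool = True    # the process is healthy enough to keep running
    ready: bool = False   # dependencies are up, so it is safe to receive traffic


def probe_response(path: str, state: AppState) -> Tuple[int, str]:
    """Map a probe path to an HTTP status code and response body."""
    if path == "/healthz":
        # Liveness: a failing response tells Kubernetes to restart the container.
        return (200, "ok") if state.alive else (500, "dead")
    if path == "/ready":
        # Readiness: a failing response only takes the pod out of rotation.
        return (200, "ready") if state.ready else (503, "not ready")
    return (404, "not found")


STATE = AppState()


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = probe_response(self.path, STATE)
        payload = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start the probe server; call this once startup work has finished."""
    STATE.ready = True
    HTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

Keeping the two checks separate matters for blue/green and canary rollouts: a new container that is still warming up can report "not ready" without being restarted, and traffic shifts to it only once the readiness check passes.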
As a result, deployment has gone from 90 minutes to 15 minutes for production environments. And that, says Capaverde, was “the main benefit because it helps the developer experience.” Today, Nubank engineers are deploying 700 times a week. “For a bank you would say that’s insane,” Capaverde says with a laugh. “But it’s not insane because with Kubernetes and canary deployments, it’s easier to roll back a change because it’s also faster to deploy. People are shipping more often and with more confidence.”
For more about Nubank’s cloud native journey, read the full case study.