How Cloud Native projects & team culture power Allianz Direct’s CI/CD capabilities
Challenge
Allianz Direct altered much of its environment when it migrated from private to public cloud. Its CI/CD pipeline, however, remained unchanged. Allianz Direct engineers did not own the pipeline, making it difficult to extend, change and troubleshoot. Also, code and configuration changes took too long to enable the agility and rapid customer response central to the company’s mission.
Solution
Through an intensive team effort, Allianz Direct redesigned its pipeline in just three months. This required a three-pronged approach: first, overhauling its infrastructure to prepare for the new pipeline; second, replacing older tools with the latest open source, cloud native software; third, communicating with Allianz Direct’s engineering teams and additional stakeholders – in business, security, etc. – to standardize on new technologies and team tactics.
Impact
Standardizing on technologies and methods has accelerated the software release process. Owning and managing their pipeline reduced the time needed to make code changes and resolve issues, increasing agility and responsiveness. It also effected a transformation in the team itself, tightening collaborative ties and improving communication amongst all. With the project, Allianz Direct has broken ground within Allianz, spreading the cloud native mindset and technologies throughout the company.
By the numbers
Redesigned CI/CD pipeline
In just three months
New release pipeline
Condensed to 10-15 workflows from 200
Configuration changes
Reduced from half an hour to one minute
Allianz disrupts itself with Allianz Direct
Allianz SE is a 132-year-old multinational financial services company headquartered in Munich, Germany. It offers insurance products in areas including life, health and auto, as well as asset management. It is the largest insurance provider in the world, operating in 70 countries, with over $1B in assets.
Disruption has come to the insurance industry, forcing providers to start thinking and behaving like digital-first companies. In 2018, Allianz answered the changing market with Allianz Direct, an innovative European online insurer within Allianz.
“The competition is not another insurance company, but is the Big Four Tech Companies (GAFA). If we want to compete with them, we have to have the same tech DNA,”
Sergiu Petean, Head of DevOps, Allianz Direct
Allianz Direct is “a tech company with an insurance license,” according to CIO Des Field Corbett. It has introduced new digital methods for consumers buying insurance directly without brokers. Unique methods of leveraging both new and legacy systems enabled it to move nimbly. After starting in auto insurance, it brought a home insurance product to market in only four months. In under three years, it brought three insurance products—car, home and travel—to four European countries: Germany, the Netherlands, Italy and Spain.
Allianz Direct began in private cloud but migrated to the AWS public cloud in 2020 for optimal elasticity, scale and uptime. In 2021, it began leaning into cloud native technologies and best practices, like GitOps and infrastructure as code. Despite significant technical evolution, by May 2021, one element in its environment remained unchanged: Its CI/CD pipeline. “Nobody had the courage to approach it,” Petean said.
The reluctance was largely due to high coupling and unmanaged complexity in the pipeline’s design. Also, Allianz Direct did not own the underlying infrastructure. These weak points made the pipeline a black box to Allianz Direct’s DevOps team.
Legacy pipeline hampers startup speed, agility
Because Allianz Direct engineers lacked in-depth knowledge of the pipeline’s design, extending or upgrading it proved challenging. The pipeline was written in the Groovy programming language, and its many conditions made adding new practices awkward or impossible. “You tried to achieve something, you broke something else,” according to Petean.
Also, as a startup serving a multinational audience, Allianz Direct required a level of speed and agility that the pipeline did not enable. “We had great technology for what we needed before, but we are a different kind of animal,” Petean said.
How would Allianz Direct compete with the digital-first giants disrupting the insurance space? At minimum, it would need to release code fast and without issues. Allianz Direct knew that a CI/CD pipeline was central to this, and that its own was not optimal. At its heart was the open source automation server Jenkins, used to build, test and deploy software. One drawback of the setup was that source and configuration code lived in a single repository. Any configuration change forced a full build-deploy cycle starting with source code. Updating a single environment variable meant trudging through the whole pipeline: writing code, then deploying to dev, testing, pre-production and finally production.
In fact, Allianz Direct had to maintain two Jenkins instances: one devoted to production, another to non-production environments. Together, they comprised more than 200 jobs with unknown owners. Due to this complexity, some simple tasks were surprisingly time consuming. For example, adding a new build node or updating Java required a restart, taking 30 minutes or longer.
“Almost every day, we had issues with the infrastructure serving CI/CD and no control over it,” said Gyorgy Hrabovszki, Software Engineer at Allianz Direct. If a single tool went down, Allianz Direct could not deliver software. For example, an OpenShift incident might cause Jenkins to go down, freezing Allianz Direct’s whole delivery pipeline. Its only recourse was to create an incident, escalate it to P1 and wait for Support to fix it.
For engineers striving to innovate and push boundaries in their companies, lacking control of their technology is frustrating, Petean said. They eventually face a crossroads: Either give up or move to a new organization. “We didn’t like either of those scenarios,” he said. The only satisfactory solution was to rewrite the pipeline.
Successful transformation is 20% tech, 80% talk
At the outset of developing a new pipeline, two things were apparent. First, the pipeline would leverage the most advanced cloud native CI/CD tools. Second, the project required more than technology—collaboration within and across engineering teams was vital.
The DevOps team began by examining their own team mechanics, which needed tuning.
“We were not yet the team we wanted to be,” Petean said. After doubling in size with five new hires, they cultivated new team dynamics for improved collaboration. For example, instead of 15 minutes, they began spending 60–90 minutes in meetings. They also broke silos to form functional working groups.
Choosing technologies on which to standardize was also a team endeavor. Migrating to the new pipeline would affect everyone in the company, including business and security. Likewise, effectively operating it for improved release cycles would require constant teamwork.
“It’s not: you build the pipeline and then everyone is forced to work in it. It’s not enough just to have the technical process. You need a lean organization and a DevOps engineering culture with everyone playing their role.”
Sergiu Petean, Head of DevOps, Allianz Direct
Engineering teams often disagree over tools or methods on which to standardize. Team leaders tackled this by communicating with engineers about their preferences in order to reach agreements. They collected feedback from DevOps and software engineers, security specialists and others to design a pipeline backed by all of Allianz Direct.
Cloud native tech & teamwork uplevel pipeline & processes
The team simplified their pipeline by, ironically, swapping out one tool for two new ones. Where they’d used Jenkins to both build and release code, they substituted a pair of CNCF projects: Tekton to build/integrate and Argo CD to deploy. The new release pipeline contains just 10-15 workflows, comprising 40-50 components with clear owners/developers, versus the previous 200 jobs. Nine of the eleven tools in the pipeline are CNCF projects, including Helm, Prometheus, Vault, CoreDNS and Fluentd. Prisma for security is the only proprietary tool used.
Crucially, the move to Tekton and Argo CD placed configuration and source code in different repositories, allowing independent deployment of changes. The power of this arrangement is its simplicity; engineers build an image once and, on deployment, marry it with an environment-specific configuration. The mantra is “build once, deploy everywhere.” With no restart needed, configuration changes now take just a minute. Plus, the new pipeline, with twice as many quality and security tests as the old one, still deploys faster.
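To make the “build once, deploy everywhere” idea concrete, here is a minimal Python sketch. It is not Allianz Direct’s actual implementation, and it does not use the Tekton or Argo CD APIs; the image name, digest, environment settings and the render_deployment helper are all hypothetical, chosen only to illustrate how one immutable image can be paired with per-environment configuration kept apart from source code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """An image built exactly once; the digest identifies its content."""
    name: str
    digest: str


# Hypothetical per-environment settings. In the arrangement described above,
# these would live in a configuration repository separate from source code.
ENV_CONFIG = {
    "dev": {"replicas": 1, "LOG_LEVEL": "debug"},
    "pre-production": {"replicas": 2, "LOG_LEVEL": "info"},
    "production": {"replicas": 4, "LOG_LEVEL": "warn"},
}


def render_deployment(image: Image, env: str, config: dict) -> dict:
    """Pair the already-built image with one environment's configuration."""
    settings = dict(config[env])  # copy so the shared config is not mutated
    replicas = settings.pop("replicas")
    return {
        "environment": env,
        "image": f"{image.name}@{image.digest}",
        "replicas": replicas,
        "env": settings,
    }


# Build once (illustrative name and digest, not real values).
image = Image(name="registry.example.com/quote-service", digest="sha256:0123abcd")

# Deploy everywhere: the same image reference appears in every environment.
manifests = {env: render_deployment(image, env, ENV_CONFIG) for env in ENV_CONFIG}

# A configuration-only change: no rebuild, only the production settings differ.
updated_config = {
    **ENV_CONFIG,
    "production": {**ENV_CONFIG["production"], "LOG_LEVEL": "info"},
}
new_production = render_deployment(image, "production", updated_config)
```

In this sketch, changing LOG_LEVEL touches only the configuration mapping; the image reference in new_production is identical to the one already deployed, which is why such a change does not need to pass through the build stage again.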
Even with these safeguards, Allianz Direct is prepared for issues to arise. “In the fast-paced, ever-evolving cloud native environment, incidents are inevitable; high cost is not,” Petean said.
The new stack leverages cloud native observability tools to nip system issues in the bud. For example, synthetic monitoring simulates the customer experience in real time. Teams can quickly spot and resolve problems, such as slow-loading webpages, preventing adverse customer impact. Here again, team coordination is as vital as technology. “We recover from incidents fast, and it takes huge collaboration from all parties to achieve that,” Petean said.
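As an illustration of what a synthetic check does, the following Python sketch times a simulated page load and flags it when it fails or exceeds a latency budget. It is an assumption-laden sketch, not the tooling Allianz Direct uses: the synthetic_check function, the two-second budget and the fake_page_load stand-in are hypothetical, and a real probe would drive a browser or HTTP client against the live site.

```python
import time
from typing import Callable


def synthetic_check(
    fetch: Callable[[], int],
    max_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Run one simulated customer request and classify the outcome.

    fetch performs the request and returns an HTTP status code.
    clock is injectable so the check can be tested deterministically.
    """
    start = clock()
    try:
        status = fetch()
    except Exception as exc:  # any failure counts as an unhealthy probe
        return {"ok": False, "reason": f"error: {exc}", "seconds": clock() - start}
    elapsed = clock() - start
    if status != 200:
        return {"ok": False, "reason": f"status {status}", "seconds": elapsed}
    if elapsed > max_seconds:
        return {"ok": False, "reason": "slow", "seconds": elapsed}
    return {"ok": True, "reason": "healthy", "seconds": elapsed}


def fake_page_load() -> int:
    """Stand-in for a real page request; always succeeds immediately."""
    return 200


# Hypothetical latency budget of two seconds for the page under test.
result = synthetic_check(fake_page_load, max_seconds=2.0)
```

Run on a schedule, a check like this surfaces slow or failing pages before customers report them; the result dictionary could then feed whatever alerting the team already has in place.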
Allianz Direct leads Allianz Group into disruptive future
These upgrades are helping Allianz Direct meet its goal of unmatched customer focus. “When an organization has the ability to release code fast, with stability and security, customer happiness will improve dramatically,” Petean said.
Through such transformative projects, Allianz Direct is setting trends within Allianz Group. Company engineers outside Allianz Direct now seek its guidance on adopting the tools and practices that have made it a success. The long-term objective is to spread its startup mentality and methods throughout Allianz, disrupting it from within to outpace disruption arising from without.