Guest blog post originally published on PlanetScale’s blog by Deepthi Sigireddi
Over the last three quarters, the team at PlanetScale has focused on the dual goals of making open source Vitess easy to use and easy to contribute to. Part of this effort was migrating all of the integration tests written in Python to Go.
There were several reasons for this project:
- The Python tests were very time-consuming to develop and debug.
- The Python tests added install dependencies for anyone getting started as a contributor.
- Support for the Python version being used (2.7) ended on January 1, 2020.
This was a fairly massive project that took several people almost four months: it started around November 1, 2019 and was completed on February 25, 2020. We had to migrate 197 separate integration tests in 39 files, totaling more than 24,000 lines of Python code.
To accomplish the migration, we first built a test framework in Go (using the command and testing packages) that allowed us to start a Vitess cluster and interact with it programmatically. The framework had to support running multiple tests in parallel without port conflicts, create non-conflicting working directories for all the relevant processes, and log enough information to diagnose failures, among other things. Once that was done, it was a matter of translating the Python tests into equivalent Go code.
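To make the port and working-directory requirements concrete, here is a minimal sketch of one way a Go test helper could satisfy them. It is not the Vitess framework's actual code: the function names `reservePorts` and `newWorkDir` and the process names are hypothetical, chosen only for illustration. The sketch asks the operating system for free ports by listening on port 0, and creates a uniquely named temporary directory per test.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// reservePorts (hypothetical helper) asks the OS for n free TCP ports by
// listening on port 0. All listeners stay open until every port has been
// chosen, so the same port is never returned twice. They are closed on
// return so that the processes under test can bind the ports themselves.
func reservePorts(n int) ([]int, error) {
	listeners := make([]net.Listener, 0, n)
	defer func() {
		for _, l := range listeners {
			l.Close()
		}
	}()

	ports := make([]int, 0, n)
	for i := 0; i < n; i++ {
		l, err := net.Listen("tcp", "127.0.0.1:0")
		if err != nil {
			return nil, err
		}
		listeners = append(listeners, l)
		ports = append(ports, l.Addr().(*net.TCPAddr).Port)
	}
	return ports, nil
}

// newWorkDir (hypothetical helper) creates a unique root directory for one
// test, with one subdirectory per process, so parallel tests never share
// data or log files.
func newWorkDir(testName string, processes []string) (string, error) {
	root, err := os.MkdirTemp("", testName+"-")
	if err != nil {
		return "", err
	}
	for _, p := range processes {
		if err := os.MkdirAll(filepath.Join(root, p), 0o755); err != nil {
			return "", err
		}
	}
	return root, nil
}

func main() {
	ports, err := reservePorts(3)
	if err != nil {
		panic(err)
	}

	// Process names are placeholders for illustration.
	root, err := newWorkDir("example", []string{"mysqld-1", "mysqld-2", "mysqld-3"})
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	fmt.Println("ports:", ports)
	fmt.Println("working directory:", root)
}
```

One caveat with this approach: there is a short window between closing a listener and the real process binding the port, during which another program could take it. A framework could reduce that risk by retrying on bind failure or by assigning each test its own port range.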
Along the way, we were also able to improve the CI pipeline for Vitess. While Travis CI has served us well over the years, we saw an opportunity to switch to GitHub Actions. The advantages?
- Larger instance types, with more compute and memory. While Travis CI (and CircleCI, for that matter) will provide you with larger instances on paid plans, we really wanted to stay within the free tier so that contributors could run with the same technologies and experience as the core project. Larger sizes are important for Vitess, since the test suite can launch 6 or more instances of mysqld.
- No limit of 5 concurrent jobs. We were using Travis matrix builds for a purpose they weren’t designed for: splitting 2 hours and 30 minutes of testing into 5 “shards” of 30 minutes each. Because a single build used all 5 job slots, we could effectively run only one build at a time, and during peak periods there could be a delay of an hour or more before test suite results came back. Our new GitHub Actions configuration still uses shards, but now with more than 14 of them. We are also no longer blocked by other developers running CI tasks at the same time.
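The sharding idea can be illustrated with a short sketch. How Vitess actually assigns tests to shards is not described here, so the code below is an assumption: it uses a simple greedy strategy (longest test first, placed on the least-loaded shard), and the test names and runtimes are made up so that they total 150 minutes, matching the 2 hours and 30 minutes mentioned above.

```go
package main

import (
	"fmt"
	"sort"
)

// testCase is a hypothetical record of a test and its estimated runtime.
type testCase struct {
	Name    string
	Minutes int
}

// assignShards places tests on shardCount shards using a greedy strategy:
// sort by descending runtime, then give each test to the shard with the
// smallest total so far.
func assignShards(tests []testCase, shardCount int) [][]testCase {
	sorted := append([]testCase(nil), tests...) // copy; leave input untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Minutes > sorted[j].Minutes
	})

	shards := make([][]testCase, shardCount)
	loads := make([]int, shardCount)
	for _, t := range sorted {
		least := 0
		for s := 1; s < shardCount; s++ {
			if loads[s] < loads[least] {
				least = s
			}
		}
		shards[least] = append(shards[least], t)
		loads[least] += t.Minutes
	}
	return shards
}

// shardLoad returns the total estimated runtime of one shard.
func shardLoad(shard []testCase) int {
	total := 0
	for _, t := range shard {
		total += t.Minutes
	}
	return total
}

// sampleTests returns made-up tests whose runtimes total 150 minutes.
func sampleTests() []testCase {
	return []testCase{
		{"test_a", 30}, {"test_b", 25}, {"test_c", 20},
		{"test_d", 20}, {"test_e", 15}, {"test_f", 15},
		{"test_g", 10}, {"test_h", 10}, {"test_i", 5},
	}
}

func main() {
	for i, shard := range assignShards(sampleTests(), 5) {
		fmt.Printf("shard %d: %d minutes, %d tests\n", i, shardLoad(shard), len(shard))
	}
}
```

With this sample data, each of the five shards ends up with 30 minutes of work, so the wall-clock time for a build is roughly the time of one shard rather than the full suite. Adding more shards shortens each one further, as long as enough jobs can run concurrently.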
The end result of the project is that it is now much easier and faster to develop new integration tests. It is also easier for someone new to the project to get started. The CI changes give us quicker feedback on pull requests and increase pull request throughput.