Community post by Caleb Woodbine (calebwoodbine.public@gmail.com) (ii.nz)
Optimising the k8s-conformance repo
The Kubernetes conformance project by CNCF ensures consistency in Kubernetes’ stable APIs and core functionalities across different implementations.
A provider will apply for the program through the CNCF and submit their sonobuoy test results as a PR to the cncf/k8s-conformance repo. Then they will be able to use the certified Kubernetes logo with their offering.
The program gives customers of the vendors listed there, and users of certified distributions, confidence in how they use Kubernetes, and that's a good lot of options!
The repo has had contributions from over 694 people across ~110 organisations. It contains 2,658 commits relating to 21 different Kubernetes release versions, all over the course of 6 years. This has led to the need to archive the older releases to make it easier to submit results for new ones.
A look into submissions
Each time there’s a release out, vendors can submit their results via pull requests to the cncf/k8s-conformance repo.
Only four files are required to be included in the submissions:
- README.md
- PRODUCT.yaml
- e2e.log
- junit_01.xml
In the past, other files were sometimes included as well, such as a sonobuoy.tar.gz, which ended up making the submissions unnecessarily large.
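As a rough sketch of that rule, a submission folder could be checked for exactly the four required files before opening a pull request. The helper function and directory layout here are hypothetical, not part of the official tooling:

```shell
#!/usr/bin/env bash
set -euo pipefail

# The four files a conformance submission must contain
REQUIRED_FILES=(README.md PRODUCT.yaml e2e.log junit_01.xml)

# Hypothetical helper: report missing required files and unexpected extras
check_submission() {
  local dir="$1" ok=0
  for f in "${REQUIRED_FILES[@]}"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      ok=1
    fi
  done
  # Flag extra files (e.g. a sonobuoy.tar.gz) that would bloat the repo
  for f in "$dir"/*; do
    local base
    base="$(basename "$f")"
    case " ${REQUIRED_FILES[*]} " in
      *" $base "*) ;;
      *) echo "unexpected: $base"; ok=1 ;;
    esac
  done
  return "$ok"
}

# Demo on a throwaway directory containing one extra file
tmp="$(mktemp -d)"
touch "$tmp"/{README.md,PRODUCT.yaml,e2e.log,junit_01.xml,sonobuoy.tar.gz}
check_submission "$tmp" || echo "submission rejected"
rm -rf "$tmp"
```

Running this prints `unexpected: sonobuoy.tar.gz` followed by `submission rejected`; a directory containing only the four files passes silently.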
Cloning issues
In order to submit a pull request, the repo must first be cloned. However, cloning had come to demand a large amount of time and storage: the repo totalled 7.6GB, and a clone would take somewhere around 10 minutes depending on internet connection. This growing cost was concerning end users.
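As an aside, a common client-side workaround (not a fix for the repo itself) is a shallow clone, which fetches only the most recent history. A minimal illustration on a throwaway local repo, standing in for a large remote:

```shell
set -euo pipefail
src="$(mktemp -d)"
dst="$(mktemp -d)"

# Build a tiny repo with two commits to stand in for a large remote
git -C "$src" init -q
git -C "$src" -c user.email=a@b.c -c user.name=demo commit -q --allow-empty -m 'first'
git -C "$src" -c user.email=a@b.c -c user.name=demo commit -q --allow-empty -m 'second'

# --depth 1 fetches only the tip commit; file:// is needed because git
# ignores --depth for plain local-path clones
git clone -q --depth 1 "file://$src" "$dst/clone"
git -C "$dst/clone" rev-list --count HEAD   # the shallow clone holds 1 commit

rm -rf "$src" "$dst"
```

This trims what each user downloads, but the server-side history stays large, which is why rewriting the repo itself was needed.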
A solution
The repo size needed to be reduced, and after doing some reading, a few options for mitigating it came up. Of these, git filter-repo seemed the best fit for the repo's needs.
The tool supports several ways to rewrite a repo and focus its content; common patterns are to include only selected content, or to exclude selected content.
The repo's folders are named after Kubernetes release versions, with a folder per product inside each.
Given this predictable structure, a list of Kubernetes versions to trim could be generated and iterated over to remove the commits touching those release version folders.
The repo was first filtered to remove any zip files, like so
git filter-repo --force --path-glob '*.zip' --invert-paths
This selects all files matching the path glob *.zip, then inverts the selection so that everything except those files is kept.
Next, a single release folder below v1.24 can be removed, for example
git filter-repo --force --path v1.7 --invert-paths
Now, with some bash and xargs magic, all release versions older than the five most recent can be removed
# The oldest release folder still present in the repo
OLDEST_UNSUPPORTED='v1.7'
# How many of the most recent releases to keep
LAST_KEPT_SUBMISSIONS="5"
# Resolve the latest stable Kubernetes version, e.g. v1.28.2
KUBERNETES_STABLE_VERSION="$(curl -sSL https://storage.googleapis.com/kubernetes-release/release/stable.txt)"
# Major version prefix, e.g. v1
KV_MA="$(echo "${KUBERNETES_STABLE_VERSION}" | grep -Eo '^v[1]')"
# Minor version number, e.g. 28
KV_MI="$(echo "${KUBERNETES_STABLE_VERSION}" | sed 's/v[1].\(.*\)\..*/\1/g')"
# Minor version of the oldest folder, e.g. 7
OLDEST_UNSUPPORTED_MI="$(echo "$OLDEST_UNSUPPORTED" | grep -Eo '[0-9]{1,2}$')"
# The newest minor version to remove (latest minus the releases kept)
LAST_SUPPORTED_MI="$((KV_MI-=LAST_KEPT_SUBMISSIONS))"
# Remove each release folder from the oldest through the newest-to-remove
seq "$OLDEST_UNSUPPORTED_MI" "$LAST_SUPPORTED_MI" \
| xargs -I{} \
git filter-repo --force --path "$KV_MA.{}" --invert-paths
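To sanity-check that arithmetic without the network call or filter-repo itself, the same parsing can be run against a hard-coded stable version; v1.28.2 here is just an assumed example value:

```shell
# Assumed example value; the real script fetches this from stable.txt
KUBERNETES_STABLE_VERSION="v1.28.2"
# Same extraction steps as the real script
KV_MA="$(echo "${KUBERNETES_STABLE_VERSION}" | grep -Eo '^v[1]')"             # v1
KV_MI="$(echo "${KUBERNETES_STABLE_VERSION}" | sed 's/v[1].\(.*\)\..*/\1/g')" # 28
OLDEST_UNSUPPORTED_MI=7
LAST_KEPT_SUBMISSIONS=5
# Newest minor version to remove: 28 - 5 = 23
LAST_REMOVED_MI="$((KV_MI - LAST_KEPT_SUBMISSIONS))"
# Print the folders that would be removed: v1.7 .. v1.23 (keeping v1.24..v1.28)
seq "$OLDEST_UNSUPPORTED_MI" "$LAST_REMOVED_MI" | sed "s/^/${KV_MA}./"
```

With a stable version of v1.28.2, this lists v1.7 through v1.23 for removal, leaving the five most recent release folders in place.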
The repo is now ~2.7GB 🎉, quite an improvement!
A warning
Please note that git filter-repo rewrites history: every retained commit is recreated, not just those touching the removed paths. This may be dangerous or may break things depending on how the repo is consumed, e.g.:
- commits will have different hashes
- commits may not show up as signed
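The first point can be seen on any repo: a commit hash covers the commit's content, parents, and metadata, so any rewrite yields new IDs, and a signature made over the old ID no longer verifies against the new one. A throwaway illustration using a simple amend in place of a full filter-repo run:

```shell
set -euo pipefail
repo="$(mktemp -d)"
git -C "$repo" init -q
# Small wrapper so identity config doesn't clutter each command
g() { git -C "$repo" -c user.email=a@b.c -c user.name=demo "$@"; }

g commit -q --allow-empty -m 'original'
before="$(g rev-parse HEAD)"
# Amending recreates the commit, just as filter-repo recreates whole histories
g commit -q --amend --allow-empty -m 'rewritten'
after="$(g rev-parse HEAD)"

[ "$before" != "$after" ] && echo "hash changed"
rm -rf "$repo"
```

Anyone who pinned the old hash, or who verified a signature over it, would see the rewritten commit as entirely new.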
Prior to tidying up the repo, a backup fork was made to keep the history around. See the archive at cncf/k8s-conformance-archive.
Going forward
A policy, enforced by the verify-conformance bot, was added to ensure that only the aforementioned four files (README.md, PRODUCT.yaml, e2e.log, junit_01.xml) are included in pull requests.
git filter-repo will likely need to be run again in the future, though the repo should be fine for several more releases.
For now, it has resolved the issues, making for quicker clone times and less storage used on disk.
Thank you for reading!