Some of you may not be aware that the CNCF community has access to an incredibly valuable reporting tool – DevStats.
CNCF began developing DevStats in 2017 to provide the Kubernetes community with timely and relevant insights into how Kubernetes was dealing with nearly unprecedented growth. Today it has grown to encompass all CNCF projects and, as it is open source, can be customized for nearly any project or metric.
Beyond tracking the stats to monitor the health of all our hosted projects, we also use DevStats in compiling our Annual and Project Journey Reports.
In monitoring DevStats, we just came across an incredible milestone – all CNCF projects combined have surpassed one billion lines of code. That’s right, one billion!
To mark this achievement, we sat down with DevStats creator Łukasz Gryglicki to learn more about the tool, its history, and how our community can benefit from it.
CNCF: What is DevStats?
Łukasz Gryglicki: DevStats is a service that takes data from git and GitHub and turns it into graphs reporting community activity. It’s a CNCF-funded project, as well as a service for all CNCF-supported projects. It organizes and displays project data using Grafana dashboards. We host it on some beefy servers generously donated by Packet.
The way it works is that it downloads several petabytes of data representing every public GitHub action of the last six years, and throws out nearly all of it except for the ~1,400 repositories of CNCF-hosted projects. It processes the data and stores it in a Postgres database, and downloads updated data every hour.
DevStats is now (as of about 9 months ago) a Kubernetes-native application, and uses many other CNCF projects, including Helm, containerd, CoreDNS, and more. DevStats is a fully open source project. It also uses Linux Foundation projects, including Linux (Ubuntu) and Let’s Encrypt, as well as Zalando’s Patroni for running Postgres databases on Kubernetes.
DevStats also allows users to track custom metrics, not just PRs, issues, or commits. It has many non-standard dashboards, covering bot activity, company affiliation, contributor location, time zone mapping, gender, programming languages, license types, and many more.
CNCF: How did DevStats come to be?
LG: CNCF executive director Dan Kohn proposed the initial architecture for DevStats and recruited me to implement it. We had previously worked together at a healthcare startup, Spreemo. My first implementation was in Ruby, but when I re-implemented in Go I was able to take advantage of concurrency to get a 20x performance improvement.
We created DevStats in 2017 as a way for the Kubernetes community to monitor developer and community data. It was created for the Kubernetes Steering Committee and SIG-Contributor Experience, who needed a tool that would allow in-depth analysis and understanding of what was happening in the community. They were also looking for a way to control the development of such a fast-growing project, with Kubernetes becoming the second largest open source community behind Linux. They needed a tool that understood their workflow (like bot commands, and Kubernetes-specific repository labels). One of the biggest requirements was to allow the analysis of historical data to show how trends evolve.
We first presented the project at KubeCon + CloudNativeCon EU 2018 in Copenhagen. Then, to scale better and meet growing resource demands, it was moved to Kubernetes. It became an example of how a full Kubernetes application should look, conforming to all best practices, as presented at KubeCon + CloudNativeCon EU 2019 in Barcelona. Now, DevStats covers all CNCF projects, some Linux Foundation projects (like Linux and Zephyr), the GraphQL Foundation, the CDF (Continuous Delivery Foundation), the Core Infrastructure Initiative, and more.
CNCF: One Billion lines of code across CNCF projects is an impressive milestone! How did we get here, and what does that mean?
LG: This is a huge milestone for CNCF. First, it means that both CNCF and its projects are growing at an incredible pace. When you think about the fact that “Google Chrome has 6.7 million lines of code, and the operating system Microsoft Windows 10 reportedly has 50 million,” one billion seems all the more impressive. As projects work their way through sandbox and incubation to graduation, they grow and become hardened for enterprise use. The DevStats dashboard shows the number of lines of code by project.
CNCF: Anything else about DevStats the community should know?
LG: DevStats is open source – anyone can fork it and deploy their own instance for their own project(s). We regularly add new dashboards in response to feature requests filed on the DevStats repository, so if you need a special dashboard for your project, file a feature request, and we will review it for you!
CNCF: Any more exciting new features on the horizon?
LG: More recently, we have been iterating on several modified versions of a project status dashboard based on feedback from the TOC and project maintainers.
We are also in the process of creating a RESTful API for DevStats. This means that people will be able to write their own tools, and a DevStats API server will return the data those tools request. For example, they can write something that queries their project usage daily, and DevStats will return that data as JSON.
Łukasz Gryglicki has been a Senior Developer at CNCF since 2017. Before joining CNCF, Łukasz worked remotely for companies based in the US, including Cleverstep, Jamis, and Spreemo Health.
He loves to travel in polar areas, such as polar Norway, Finland, and Russia. From 2011 to 2012, he was a scientist for a Polish polar expedition to the Hornsund fjord on the island of Spitsbergen, in northern Norway. Łukasz graduated from Warsaw University of Technology with a Master of Science in Engineering. He lives in a small town in Poland with his wife and two kids.