Guest post originally published on the Elastisys blog by the Elastisys team
DevOps is a cultural movement that has brought a lot of much-needed agility to software development. But it is also misunderstood: does DevOps really mean that developers should, in addition to development, also be amazing at difficult system administration (“operations”) work?
In this article, we discuss the difference in skill sets between developers and platform operators, and what a realistic approach to DevOps truly means: developers take full ownership of development, release, and life cycle management, and operate their applications, not the underlying platform.
The Rise and Misunderstanding of DevOps
DevOps is a cultural movement in software development, in which developers take on operational responsibility for their applications. This includes not only writing code, as before, but also operational tasks such as releasing new versions, managing the life cycle of their code (upgrades, database migrations, etc.), and observing its behavior over time via monitoring and logging.
This greatly improved productivity and agility. By taking full ownership of the application life cycle, developers closed the feedback loop on how their code behaved in reality. Performance issues were identified directly by the people responsible for fixing them. Application insights were delivered directly to the developers and designers who would optimize the application user experience. And so on.
But DevOps is often misunderstood, and as a result developers have been encouraged to bite off far more than they can chew.
The Tech that Enabled the DevOps Movement and the Hidden Heroes that Manage it
DevOps sparked an organizational evolution and a cultural shift. The reason DevOps could enter the stage when it did, in the form we have seen, is that there is tech that supports this way of working. The ability to work with infrastructure-as-code tools is enabled by technology, be it hypervisors in the VM setting, containers in the Kubernetes setting, or language runtimes in the function-as-a-service setting. Without these technologies, we would not be able to adopt DevOps as we know it.
But somebody still needs to manage that underlying technology.
In this article, let’s call them platform operators.
What makes a platform operator different from a developer?
Developer vs. Platform Operator Skill Sets
Developers develop, and as such, their most visible output is code. But it’s not just code, because a developer is primarily focused on addressing problems from a problem domain on a high level of abstraction, modeling the problem domain, and turning those models and abstractions into code.
In contrast, a platform operator is focused on the inner workings of computer systems, and on making sure that they operate efficiently, securely, and correctly, and are available at all times.
The skill sets needed for software development and platform operations tasks differ considerably.
DevOps, in which developers are also responsible for managing the life cycle of their applications and observing them at runtime, implies more than a shift left in responsibility. It also means developers need to learn a set of tools and supporting technology: observability stacks, such as logging and monitoring systems, and deployment orchestration, such as APIs and tooling for working with the cloud, Kubernetes, or function-as-a-service platforms.
Developers typically embrace these tools and new ways of working, because with their increased responsibility comes greater agility, and they become more efficient at doing their job. The benefits are immediate and can be felt across the entire organization.
The point, though, is that somebody has to make those new tools and platform components work. Not just install them once, but maintain them, upgrade them, troubleshoot them, and secure them.
Crucially, it is unreasonable to expect developers to also manage the underlying systems.
But that is what platform operators do!
Platform operators have the skills needed to understand the way the technology works on a deep enough level to troubleshoot, e.g., why networking fails, why an operating system runs out of file descriptors, or why I/O performance has suddenly dropped.
It is difficult enough to correctly model the “human reality” of the problem domain and turn that model into code. Understanding operating systems on a deep enough level to confidently troubleshoot the “computer’s reality” is an inherently different skill.
Some problems can of course be solved with the “pets vs. cattle” approach to operations, where the runtime (VM, container platform, function runtime) is simply replaced. But the deeply rooted ones, caused by mistakes that make the error recur after every replacement, must actually be fixed.
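As a small illustration of the kind of low-level check this involves, the sketch below reports how close the current process is to its file descriptor limit. It is a minimal example, not a monitoring tool: the function name `fd_usage` and the 80% warning threshold are assumptions made for illustration, and it relies on `/proc/self/fd` (Linux) or `/dev/fd` (macOS/BSD) being available.

```python
import os
import resource


def fd_usage(warn_ratio=0.8):
    """Return (open_fds, soft_limit, near_limit) for the current process.

    warn_ratio is a hypothetical threshold chosen for this example.
    """
    # The soft limit is the one the kernel actually enforces for this process.
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)

    # /proc/self/fd exists on Linux; /dev/fd is the fallback on macOS/BSD.
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"

    # Listing the directory briefly opens one descriptor itself,
    # so the count is slightly pessimistic.
    open_fds = len(os.listdir(fd_dir))

    # An unlimited soft limit can never be "nearly exhausted".
    near_limit = (
        soft_limit != resource.RLIM_INFINITY
        and open_fds >= warn_ratio * soft_limit
    )
    return open_fds, soft_limit, near_limit


open_fds, soft_limit, near_limit = fd_usage()
print(f"open descriptors: {open_fds}, soft limit: {soft_limit}")
if near_limit:
    print("warning: process is close to its file descriptor limit")
```

Knowing that such a limit exists, where to read it, and what to do when a service hits it is exactly the kind of knowledge that sits with platform operators rather than application developers.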
Even cattle need veterinary help if they get sick. Platform operators are those vets.
This is how developers were encouraged to bite off more than they could chew: they were told to do both development and all of the operations. Not the reasonable application operations, but the unreasonable management of the underlying platform, too.
Being great at JavaScript for frontend and backend does not automatically make a person amazing at system administration work as well.
It also doesn’t mean they are efficient at these tasks. Their time is likely better spent doing what they are experts at, rather than frantically googling for answers to system administration questions. Any time spent doing this pulls them away from developing new features and puts them to work (inefficiently) troubleshooting the platform instead.
The Security Risk Angle
Putting people in charge of platform security without proper training is also a serious security risk. The field moves extremely fast, and developers are rarely allocated enough time to keep up with that whole space. That helps explain why, according to the 2020 Container Report by Datadog, the most common Kubernetes version was 17 months old, and therefore unsupported and ineligible for security updates. Unless you specifically make time and resources available for handling security, it will not happen.
The countless security tasks on your backlog are all the proof you need for this. And those are typically about the application’s security. If not even those are addressed, who would take care of the platform-related ones?
So what does real DevOps look like, in practice?
Real DevOps in Practice
Let’s again look at this picture, and focus on the right side of it.
Developers can practice the DevOps way of working by taking on the responsibilities of development and application operations. This allows developers to iterate fast. They can monitor, analyze, plan, and execute code changes in an agile way. It allows them to focus and put all their efforts into building value for the organization via application development.
Enabling them to do so, on a technical level, is the job of the platform operators. This team works in the background, and keeps the platforms and systems humming along nicely. They are the enablers who lay the foundation upon which developers can perform their new duties in a DevOps way.
DevOps was never intended to make developers do both application and platform operations.
Setting realistic expectations for developers reclaims the time lost by having them troubleshoot platform issues. That means they spend more time generating value for your end users, and are more productive and content with their work, because they no longer have to mentally switch between development and platform operations.