Member post by Jatinder Singh Purba, Principal, Infosys; Krishnakumar V, Principal, Infosys; Prabhat Kumar, Senior Industry Principal, Infosys; and Shreshta Shyamsundar, Distinguished Technologist, Infosys
Emerging Trends in the Cloud-native Ecosystem
As the last quarter of 2024 draws to a close, the cloud-native ecosystem remains strong, with end users reaping the benefits of more than 10 years of modernization initiatives. This article covers the trends we expect to steer investment and end-user interest in the cloud-native ecosystem in 2025. Most of these trends are driven by foundations under The Linux Foundation, such as the Cloud Native Computing Foundation (CNCF), the FinOps Foundation, the Open Source Security Foundation (OpenSSF), and the LF AI & Data Foundation.
Cloud Cost Control Gets Cool: FinOps is the New Black
Cloud-native architecture is now the default choice for greenfield projects. Its broad adoption by enterprise customers has been accompanied by an increasing focus on costs over the past few years [1]. Cloud-native architecture is more complex and layered than a typical monolith, and automation with infrastructure-as-code (IaC) lets teams create and destroy infrastructure at scale, producing a highly dynamic environment in which costs must be metered and managed. As more organizations modernize to cloud and cloud-native architectures, measuring and controlling costs becomes crucial.
While FinOps has its own foundation, CNCF projects such as OpenCost are also driving this area forward [2, 3]. OpenCost provides visibility into Kubernetes spend and resource allocation, helping teams accurately measure and apportion cost. Customers are also looking for the right tools and best practices to reduce overall spend without sacrificing performance, and the cloud-native ecosystem is evolving quickly to meet this need.
This trend is closely tied to the ability to observe and measure resource consumption and other parameters of IT estates. Projects such as OpenTelemetry, Jaeger, Prometheus, and OpenSearch (part of the OpenSearch Software Foundation) also support this goal.
Organizations can put this trend into practice by piloting FinOps projects such as OpenCost to obtain a clear picture of where their IT dollars are being spent.
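As an illustration, the sketch below pulls a week of per-namespace cost from a locally port-forwarded OpenCost instance. The service name, port, endpoint path, and response fields are assumptions based on a typical OpenCost deployment; confirm them against the API reference of the version you install.

```python
import requests

# Assumes the OpenCost API is reachable locally, e.g. via:
#   kubectl -n opencost port-forward service/opencost 9003
# The /allocation endpoint, its parameters, and the "totalCost" field are
# assumptions; verify them against your OpenCost version's API docs.
resp = requests.get(
    "http://localhost:9003/allocation",
    params={"window": "7d", "aggregate": "namespace"},
    timeout=30,
)
resp.raise_for_status()

for window in resp.json().get("data", []):
    for namespace, allocation in window.items():
        print(f"{namespace}: {allocation.get('totalCost', 0):.2f} USD over 7d")
```

Feeding this kind of data into showback dashboards is often the quickest way to demonstrate FinOps value to stakeholders.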
Are Your Devs Drowning in Tools? IDPs Are the Life Raft!
It is no secret that the additional tools introduced by cloud-native approaches increase developer friction. While concepts such as containers have revolutionized operations, they have also introduced new concepts that developers must master on top of the application codebase and its tooling. Consequently, 2023 saw the rapid rise of internal developer portals (IDPs) and platform engineering to address this friction.
Backstage is an open framework for building IDPs [4]. It clocked the highest number of end-user contributions in the 2023 CNCF annual report and the fourth-highest velocity across the landscape [5], a testament to the interest that end users have in IDPs. Backstage is fast becoming the de facto standard for building IDPs and accelerating cloud-native productivity.
An IDP is necessary to ensure developer productivity and is a sizeable part of the larger topic of platform engineering. In addition to developer portals, clients are looking at platform engineering capabilities to build developer- and operations-friendly abstractions on top of the Kubernetes core.
A recent Backstage implementation by Infosys for a leading US insurance company has shown promising results [6]. The solution reduced onboarding time for new developers by about 40% and increased code deployment frequency by 35%. It has also significantly reduced the lead time from requirements to production, with a commensurate increase in customer satisfaction.
End users can benefit from evaluating the internal developer portals available in the industry and piloting them within their organizations. Implementing Backstage is a promising first step, as it is a CNCF project with strong momentum and a growing number of adopters.
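One low-effort way to explore a Backstage pilot is to read its software catalog programmatically. The sketch below lists registered components via the Backstage backend's catalog REST API; the base URL and token are hypothetical placeholders, and the exact endpoint and auth scheme depend on how the instance is configured.

```python
import requests

BASE_URL = "https://backstage.example.com"  # hypothetical Backstage instance
TOKEN = "replace-with-a-real-token"         # depends on your auth setup

# /api/catalog/entities is the catalog read endpoint exposed by the Backstage
# backend; the filter below restricts results to Component entities.
resp = requests.get(
    f"{BASE_URL}/api/catalog/entities",
    params={"filter": "kind=component"},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()

for entity in resp.json():
    name = entity["metadata"]["name"]
    owner = entity.get("spec", {}).get("owner", "unknown")
    print(f"{name} (owner: {owner})")
```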
Cloud Native Powers AI
Since 2016, OpenAI, a pioneer in the industry, has been running its training and inference workloads on Kubernetes [7], pushing the limits of platform technology with clusters of up to 2,500 nodes. The advantages of cloud-native technologies and platforms, such as scalability and their dynamic nature, transfer directly to artificial intelligence (AI) workloads. This is especially true for large language models (LLMs), a fast-moving area of AI technology that is transforming every industry it touches. The work within the cloud-native landscape to cater to AI training and serving spans the LF AI & Data and CNCF foundations [8]. CNCF also published a cloud-native AI landscape along with a white paper earlier this year [9, 10].
LF AI & Data and CNCF house open-source projects that are critical building blocks for the AI revolution. These include projects such as:
- OPEA, a collection of cloud-native patterns for GenAI workloads [11]
- Milvus, a high-performance vector database [12]
- Kubeflow, a project to deploy machine-learning workflows on Kubernetes [13]
- KServe, a toolset for serving predictive and generative machine-learning models [14]
Beyond individual projects, foundational improvements are also being made to Kubernetes itself, such as elastic Indexed Jobs, to better handle the demands of AI workloads [15]. Considerable thought leadership in this space is being driven by the members of the cloud-native AI working group under the Technical Advisory Group for Runtime (TAG-Runtime) of CNCF [16].
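To make the elastic Indexed Job idea concrete, the sketch below uses the Kubernetes Python client to resize an already-running Indexed Job, the pattern that elastic Indexed Jobs enable for growing or shrinking a distributed training run. The Job name and namespace are hypothetical, and the Job must have been created with completionMode: Indexed for this to apply.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (use load_incluster_config()
# when running inside a cluster).
config.load_kube_config()
batch = client.BatchV1Api()

# Elastic Indexed Jobs allow spec.completions and spec.parallelism to be
# changed together on a running Job, e.g. to scale a training workload.
# "llm-training" and "ml" are hypothetical names used for illustration.
patch = {"spec": {"completions": 8, "parallelism": 8}}
batch.patch_namespaced_job(name="llm-training", namespace="ml", body=patch)
```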
Companies experimenting with AI typically start with a proprietary SaaS or cloud offering. They can further expand their reach into cloud-native AI by setting up and running open-source projects such as KServe to experiment with open-source LLMs. The ability to curate and self-host LLMs is the first step to addressing privacy, security, and regulatory concerns with proprietary offerings and developing in-house capabilities in this area.
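For teams taking that first step with KServe, a simple smoke test is to call a deployed model over the v1 inference protocol, as in the sketch below. The hostname and model name are hypothetical, and the exact route can differ by serving runtime (generative runtimes often expose OpenAI-compatible endpoints instead).

```python
import requests

# Hypothetical KServe InferenceService named "flan-t5"; the v1 protocol
# expects POST /v1/models/<name>:predict with an "instances" list.
url = "http://flan-t5.llm.example.com/v1/models/flan-t5:predict"
payload = {"instances": [{"text": "Summarize the 2025 cloud-native trends."}]}

resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json().get("predictions"))
```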
Beyond Silos: Open Standards Unify Observability
Observability is critical to the success of cloud-native programs because their architectures are complex and dynamic. With the rise of hybrid and multi-cloud environments, a comprehensive observability solution becomes even more important. Cloud-native observability must go beyond legacy metrics such as CPU, memory, storage, and network throughput. Further, the volume of metrics and associated metadata is orders of magnitude higher than in legacy environments. All of this creates technical and operational challenges in enabling effective observability for cloud-native architectures.
Over the last decade, this area of cloud-native technology has been driven by large, closed-source commercial vendors. Though open-source projects such as Prometheus (a monitoring system with a time-series database [17]) and Fluentd (a unified logging layer [18]) are well adopted, end users typically turn to commercial vendors such as Dynatrace, AppDynamics, and Splunk for more fully featured suites. However, this has led to cost and portability concerns.
Two key CNCF efforts are driving change in this area through initiatives such as the observability query language standardization work [19]: OpenTelemetry, a set of tools, APIs, and SDKs for creating telemetry pipelines for metrics, logs, and traces [20], and the Technical Advisory Group for Observability (TAG-Observability) [21, 22]. This move toward open standards is driving healthy change in observability and offers immense opportunity for both end users and service providers to get involved.
Piloting CNCF projects such as OpenTelemetry and Fluentd can improve observability pipelines, minimize vendor lock-in, and reduce cost.
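As a starting point, instrumenting a single service with the OpenTelemetry SDK shows how quickly a vendor-neutral pipeline can be stood up. The sketch below (Python, assuming the opentelemetry-api and opentelemetry-sdk packages) prints spans to the console; swapping the console exporter for an OTLP exporter pointed at a collector is the usual next step. The service and span names are hypothetical.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire up a tracer provider with a console exporter; in a real pipeline this
# would be an OTLP exporter sending spans to an OpenTelemetry Collector.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # hypothetical service name

with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.id", "demo-123")  # hypothetical attribute
    # ... business logic runs here and is recorded as part of the span ...
```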
Cloud-native Security: A New High-stakes Game
Modern architectures require new and innovative security methods. Concepts such as zero trust and the secure software supply chain are receiving attention at both the end-user organization and nation-state levels [23]. As the number of applications adopting microservices architecture increases, and as organized bad actors (e.g., LockBit 2.0, Conti) grow more sophisticated, security sits at the top of CNCF's priority list [24].
The recently graduated Falco project, a runtime tool for detecting security threats, is a great step in this direction [25]. In addition, cloud-native architectures rely on both CNCF and associated landscapes and foundations such as OpenSSF [26] that drive innovation in secure supply chains. TAG-Security is a key part of CNCF [27]; beyond ensuring security for CNCF projects, it publishes white papers that give the industry direction on security topics [28]. Policy as code, a critical part of security, is being driven by projects such as Open Policy Agent (OPA) and Kyverno [29, 30], both of which let teams define security policies as code.
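To show what policy as code looks like from an application's point of view, the sketch below asks a locally running OPA server for a decision over its Data API. The policy package path and the input document are hypothetical; in practice the policy itself would be written in Rego and loaded into OPA separately.

```python
import requests

# OPA evaluates decisions via POST /v1/data/<package path>.
# "kubernetes/admission/deny" is a hypothetical policy package used purely
# for illustration; the input mimics a pod admission request.
opa_url = "http://localhost:8181/v1/data/kubernetes/admission/deny"
admission_input = {
    "input": {
        "request": {
            "kind": {"kind": "Pod"},
            "object": {"spec": {"hostNetwork": True}},
        }
    }
}

decision = requests.post(opa_url, json=admission_input, timeout=5).json()
print(decision.get("result"))  # e.g., a list of violation messages from the policy
```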
While there are cutting-edge open-source projects covering certain parts of the security landscape, this trend is dominated by product suites offered by established security vendors. In addition, each cloud vendor offers a suite of security products native to its platform, such as Amazon Inspector and Microsoft Defender for Cloud. This pattern of partnerships between open-source projects and established vendors to deliver comprehensive security solutions is likely to continue for the foreseeable future.
End users can work with both vendors and open-source projects to implement security guardrails across all layers of their infrastructure and applications. Organizations can uplift the capability of security teams by educating them on the new challenges created by cloud-native and microservices architecture.
The Future is Green: Why Sustainable IT is No Longer Optional
GreenIT, GreenOps, and sustainability are popular topics today. Several CNCF projects serve this need by measuring the carbon consumption of applications on the Kubernetes platform. Examples include Kepler [31], an operator that collects and exports metrics using the extended Berkeley Packet Filter (eBPF) to estimate energy consumption by pods, and OpenCost [32], a tool that provides visibility into Kubernetes spend and resource allocation. Observability into carbon consumption is a key first step toward reducing carbon emissions and increasing sustainability.
Sustainable IT operations are increasingly driven by legislation, such as the EU sustainability reporting rules that came into force in 2024 [33]. Corporations are currently harvesting low-hanging fruit through programs such as green data centers and reducing the number of end-user devices. In the coming years, however, they will need to drill down further to identify the most sustainable choices for application development.
While tooling for this area is still in its infancy, multiple open-source projects and standards are being developed. End users leading the charge are working to integrate green principles into their tools and applications to reduce their carbon footprint.
As a first step, organizations can pilot Kepler or OpenCost for their container workloads.
For other layers of their installed IT landscape, companies may need to identify and shortlist tools from their cloud provider or third-party vendors to measure carbon cost and sustainability impact.
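As a concrete starting point, Kepler's per-container energy metrics can be read straight out of Prometheus. The sketch below assumes a locally reachable Prometheus instance that scrapes Kepler; the metric and label names reflect a typical Kepler install but should be verified against the deployed version.

```python
import requests

# Query Kepler's per-container energy metric through the Prometheus HTTP API.
# The Prometheus address, metric name, and label names are assumptions; check
# what your Kepler version actually exposes before relying on them.
prom_url = "http://localhost:9090/api/v1/query"
query = "sum by (container_namespace) (rate(kepler_container_joules_total[5m]))"

resp = requests.get(prom_url, params={"query": query}, timeout=10)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    namespace = series["metric"].get("container_namespace", "unknown")
    watts = float(series["value"][1])  # joules per second == watts
    print(f"{namespace}: {watts:.2f} W")
```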
This list would not be complete without the overarching trend that has driven cloud-native over the last decade: the adoption of Kubernetes as the orchestrator of choice for modern technology platforms.
From Niche to Norm: Kubernetes is Now an Essential Enterprise Tool
Google open-sourced Kubernetes in 2014, drawing on its experience with its internal orchestrator, Borg [34]. It was CNCF's first project and the first to graduate, and it has become the de facto platform for modernization projects. Kubernetes continues on a cadence of three releases per year, but at this stage of its maturity most of the features being worked on concern reliability, scaling, and security. This is beneficial, as it enhances the production readiness of Kubernetes, which is likely to remain the platform of choice for both modernization and greenfield development for the foreseeable future.
End users are preparing for the future of cloud-native implementations by standardizing best practices and creating stable blueprints for the implementation of effective platforms around Kubernetes. This trend is also illustrated by the rise and acceptance of platform engineering, which now focuses on the tooling around Kubernetes required to create an effective platform [35].
Platform engineering is the practice of designing, building, and operating reusable software platforms that provide a foundation for multiple applications to run on. It enables faster delivery, improved quality, and increased scalability. In addition to Kubernetes, the core engine of most platforms today, platform engineering focuses on building blocks such as observability, policy as code, internal developer portals, security, continuous integration/continuous delivery (CI/CD), and storage. The end result is a platform that provides capabilities such as reliability, performance, scalability, and availability for any hosted application, along with a declarative, self-service way for users to access those capabilities.
End-user organizations need to build strong platform engineering capabilities. This will have a major impact on how Kubernetes is used to create effective container platforms. A good place to start is to map critical use cases within their organizations that can be standardized and automated through a platform engineering approach.
References:
[1] https://www.cloudkeeper.com/insights/blog/2024-state-finops-report-key-trends-cloud-finops
[2] https://www.finops.org/introduction/what-is-finops/
[5] https://www.cncf.io/reports/cncf-annual-report-2023/
[6] https://www.cncf.io/case-studies/infosysinsurancecustomer/
[7] https://kubernetes.io/case-studies/openai/
[8] https://lfaidata.foundation/
[9] https://landscape.cncf.io/?group=cnai
[10] https://www.cncf.io/reports/cloud-native-artificial-intelligence-whitepaper/
[11] https://opea.dev/
[12] https://milvus.io/
[13] https://www.kubeflow.org/
[14] https://kserve.github.io/website/latest/
[16] https://tag-runtime.cncf.io/wgs/cnaiwg/
[19] https://github.com/cncf/tag-observability/blob/main/working-groups/query-standardization.md
[20] https://opentelemetry.io/
[21] https://github.com/cncf/tag-observability
[22] https://github.com/cncf/tag-observability/blob/whitepaper-v1.0.0/whitepaper.md
[23] https://www.whitehouse.gov/wp-content/uploads/2022/01/M-22-09.pdf
[25] https://falco.org/
[26] https://openssf.org/community/sigstore/
[27] https://github.com/cncf/tag-security
[28] https://tag-security.cncf.io/publications/
[29] https://www.openpolicyagent.org/
[30] https://kyverno.io/
[31] https://sustainable-computing.io/
[34] https://www.sdxcentral.com/articles/news/how-kubernetes-1-29-improves-open-source-cloud-native-production-readiness/2023/12/
[35] https://www.gartner.com/en/articles/what-is-platform-engineering