Source: ERDA official account

A tour of the cloud native technology landscape

If you’ve researched cloud native applications and related technologies, chances are you’ve come across the CNCF Cloud Native Landscape. The sheer number of technologies it contains can be overwhelming, so how do we make sense of it?

If you take it apart and analyze one piece at a time, you will find that the overall picture is not that complicated. The landscape is organized by function, and once you know what each category represents, you can navigate it easily.



In this article, we first break the landscape apart for an overview. Then we focus on each layer (and each column) for a more detailed look at the problems each category solves and how it solves them.



1. The four layers of the cloud native landscape

First, we strip away all the individual technologies and look only at the categories (see figure below). There are different “rows” in the diagram, like the floors of a building, each with its own subcategories. The bottom layer provides the tools to build cloud native infrastructure. From there, you can add the tools needed to run and manage applications, such as the runtime and orchestration layers. At the top are tools for defining and developing applications, such as databases, image builds, and CI/CD tools (which we’ll discuss later).



By now you should remember that the cloud native landscape starts with the infrastructure, and each layer up is closer to the actual application. That is what each layer represents (we will also discuss the two “columns” on the right of the figure). Let’s start at the bottom and work through it layer by layer.

1) Provisioning

Provisioning refers to the tools involved in preparing a standard infrastructure environment for cloud native applications. It covers infrastructure creation, management, and configuration automation, as well as the scanning, signing, and storage of container images. The provisioning layer also extends into security, providing tools to set and enforce policies, build authentication and authorization into applications and platforms, handle secret distribution, and so on. The provisioning layer includes:

  • Automation and deployment tools: help engineers build computing environments without human intervention;
  • Container registries: store the executable files (images) of applications;
  • Security and compliance frameworks: cover the different areas of security;
  • Key management solutions: use encryption to ensure that only authorized users can access a specific application.

These tools enable engineers to codify infrastructure parameters so that systems can build new environments on demand, ensuring consistency and security.

2) Runtime layer (Runtime)

Next comes the runtime layer. The word may confuse you. Like many IT terms, “runtime” is not strictly defined and is used differently depending on context. In a narrow sense, a runtime is the sandbox in which an application runs on a particular machine, that is, the minimum configuration required to keep the application running properly. Broadly speaking, a runtime is all the tools needed to run an application. In the CNCF cloud native landscape, the runtime layer ensures that containerized application components can run and communicate, and it includes:

  • Cloud native storage: virtual disk or persistent storage for containerized applications;
  • Container runtime: Provides isolation, resources, and security for containers;
  • Cloud native networking: connects the nodes (machines or processes) of a distributed system so they can communicate.

3) Orchestration and Management

Once infrastructure provisioning is automated in accordance with security and compliance standards (provisioning layer) and the tools needed to run applications are installed (runtime layer), engineers need to figure out how to orchestrate and manage those applications. Orchestration and management tools handle all the containerized services (application components) as a group; these services need to identify and communicate with each other, and need to be coordinated. This layer provides automation and resilience to cloud native applications, making them naturally scalable. It contains:

  • Orchestration and scheduling: deploys and manages container clusters, ensuring they are resilient, loosely coupled, and scalable. Orchestration tools (most often Kubernetes) form a cluster by managing the containers and their operating environments;
  • Coordination and service discovery: enabling services (application components) to locate and communicate with each other;
  • Remote Procedure Call (RPC): a technique that enables services to communicate across nodes;
  • Service proxy: an intermediary for communication between services. The sole purpose of a service proxy is to give more control over service-to-service communication without adding anything to the communication itself. Service proxies are critical to the service mesh described below;
  • API Gateway: An abstraction layer through which external applications communicate;
  • Service Mesh: Similar in some ways to an API gateway, it is a dedicated infrastructure layer for applications to communicate and provides policy-based internal communication between services. In addition, it may include traffic encryption, service discovery, application monitoring, and so on.

4) Application Definition and Development

Now we’re at the top. The application definition and development layer, as the name implies, gathers the tools that let engineers build and run applications. Everything below it is about building a reliable, secure environment and providing the required application dependencies. This layer consists of:

  • Database: Enables applications to collect data in an orderly fashion;
  • Streaming and messaging: Enables applications to send and receive messages (events and streams). It is not a network layer, but a tool for queuing messages and processing them.
  • Application definition and image build: services for configuring, maintaining, and running container images (the executable files of applications);
  • Continuous Integration and Continuous Delivery (CI/CD) : Enables developers to automatically test code compatibility with the code base (the rest of the application). If the team is mature enough, you can even automate the deployment of code into production.

2. Tools across all layers

Next we turn to the two columns that run through all the layers on the right side of the cloud native landscape. Observability & analysis tools monitor all the layers, while platforms bundle the different technologies of each layer into a single solution.



1) Observability and Analysis

To limit service outages and reduce mean time to resolution (MTTR), you need to monitor and analyze every level of the application stack so that anomalies can be detected and corrected as soon as they occur. In complex environments where failures are common, these tools help identify and resolve failures quickly, reducing their impact. Because this category runs through and monitors all the layers, it sits alongside them rather than inside any one layer. In this category you will find:

  • Logging tools: Collect event logs (information about the process);
  • Monitoring scheme: Collect metrics (numerical system parameters, such as RAM availability);
  • Tracing tools: go a step further than monitoring, tracking how user requests propagate across the services of a distributed system (and through the service mesh);
  • Chaos Engineering: A tool for testing software in a production environment, identifying defects and fixing them to reduce their impact on service delivery.

2) Platform

As you can see, each module in the diagram solves a specific problem. But storage alone does not provide everything an application needs; you also need orchestration tools, a container runtime, service discovery, networking, an API gateway, and more. Platforms span multiple layers, bringing different tools together to solve larger problems.

Configuring and fine-tuning the individual modules so that they are secure and reliable, and making sure the technologies they leverage stay up to date with all vulnerabilities patched, is no easy task. With a platform, users don’t have to worry about these details.

You may notice that all the categories revolve around Kubernetes. Kubernetes is only one piece of the cloud native landscape puzzle, but it is the core of the cloud native stack. (By the way, Kubernetes was the first seed project when CNCF was created; others followed.) Platforms fall into four categories:

  • Kubernetes distributions: take the open source Kubernetes, modify it as needed, and add features the market demands;
  • Hosted Kubernetes: similar to a Kubernetes distribution, but hosted and operated by the provider;
  • Kubernetes installers: automate the installation and configuration of Kubernetes;
  • PaaS/container services: similar to hosted Kubernetes, but include a broader set of application deployment tools (often drawn from the cloud native landscape).

3. Summary

In each category there are different tools to choose from for the same or similar problems. Some are pre-cloud technologies adapted to the new reality; others are entirely new. The differences lie in their implementation and design approach.

There is no perfect technology that meets all your needs. In most cases, technology is constrained by design and architectural choices; there is always a trade-off. When selecting a technology stack, engineers must weigh each capability and its trade-offs to determine the most suitable option. While this makes things more complicated, it is the most practical way to choose the data storage, infrastructure management, messaging systems, and so on that your application needs.

Building a system is easier now than in the pre-cloud-native days, and cloud native technologies, when built upon properly, offer greater flexibility. That may be one of the most important capabilities in today’s rapidly changing technology ecosystem.

Let’s now go through each layer of the cloud native landscape in detail.

The provisioning layer

The first layer of the cloud native landscape is provisioning. This layer contains tools for building cloud native infrastructure: infrastructure creation, management, and configuration automation, plus the scanning, signing, and storage of container images. The provisioning layer is also about security, with tools for setting and enforcing policies, building authentication and authorization into applications and platforms, handling secret distribution, and so on.

Let’s take a look at each category in the provisioning layer, the role it plays, and how these technologies help applications adapt to the new cloud native environment.

1. Automation and configuration

1) What is it

Automation and configuration tools speed up the creation and configuration of computing resources (virtual machines, networks, firewall rules, load balancers, and so on). These tools can handle different parts of the infrastructure build process, and most can be integrated with other projects and products in the space.

2) Solved problems

Traditionally, IT relied on heavily manual release processes with lengthy cycles, typically lasting three to six months. Those cycles, full of manual steps and controls, made changes to the production environment very slow. Such slow release cycles and static environments are a poor match for cloud native development. To shorten development cycles, infrastructure must be configured dynamically and without human intervention.

3) How to solve the problem

These provisioning-layer tools enable engineers to build computing environments without human intervention. With the environment setup expressed as code, an environment can be provisioned at the click of a button. Manual setup is error-prone; once codified, environments are created to match the exact desired state, which is a huge advantage. Although different tools take different approaches, they all use automation to simplify the manual process of provisioning resources.
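The core idea behind codified environments can be sketched in a few lines: a declarative spec lists what should exist, and an apply step creates only what is missing (and removes what shouldn’t be there), so repeated runs converge on the same state. This is a toy illustration of the pattern, not any real tool’s API; the resource kinds and names are made up.

```python
# A minimal sketch of "environment as code": a declarative spec describes
# the resources an environment needs, and apply() computes only the
# actions required to converge the actual state onto the desired one.
# Resource kinds and names below are illustrative.

desired = {
    "servers": {"web-1", "web-2"},
    "firewall_rules": {"allow-80", "allow-443"},
}

def apply(desired, actual):
    """Return the create/delete actions that move `actual` toward `desired`."""
    actions = []
    for kind, wanted in desired.items():
        existing = actual.get(kind, set())
        for name in sorted(wanted - existing):   # missing: create
            actions.append(("create", kind, name))
        for name in sorted(existing - wanted):   # extraneous: delete
            actions.append(("delete", kind, name))
    return actions

actual = {"servers": {"web-1"}, "firewall_rules": set()}
for action in apply(desired, actual):
    print(action)
```

Running apply against an environment that already matches the spec produces no actions at all, which is the idempotency that makes push-button provisioning safe to repeat.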

4) Corresponding tools

As we move from the old, manually driven build approach to the on-demand scaling model the cloud requires, the old models and tools no longer suffice: organizations cannot sustain a 24/7 staff to create, configure, and manage servers. Automated tools such as Terraform reduce the work required to scale the number of servers and the associated network and firewall rules. Tools such as Puppet, Chef, and Ansible can programmatically configure servers and applications as they start up, and make them available to developers.



Some tools interact directly with infrastructure APIs provided by platforms such as AWS or vSphere, while others focus on configuring individual computers to be part of a Kubernetes cluster. Tools such as Chef and Terraform can interoperate to configure the environment. Tools like OpenStack can provide an IaaS environment for other tools to use.



Basically, at this layer, you need one or more tools to set up the computing environment, CPU, memory, storage, and network for the Kubernetes cluster. In addition, you’ll need some of these tools to create and manage the Kubernetes cluster itself.



At the time of this writing, there are three CNCF projects in this area: KubeEdge (a sandbox project), plus Kubespray and kops (both subprojects of Kubernetes and thus also part of CNCF, although not shown in the landscape). Most of the tools in this category are available in open source and paid versions.




2. Container Registry

1) What is it

Before defining the Container Registry, let’s first discuss three closely related concepts:

  • A container is a set of technology constraints for executing a process. Processes started inside a container believe they are running on their own dedicated machine rather than one shared with other processes (similar to a virtual machine). In short, containers let your code run the same way in any environment;
  • An image is the set of archive files needed to run a container and its process. You can think of it as a kind of template from which you can create an unlimited number of containers;
  • A repository is a space where images are stored.

Back to the container registry: it is a specialized web application that organizes and stores repositories.



The image contains the information needed to execute the program (inside a container) and is stored in a repository, which the registry sorts and groups. The tools that build, run, and manage containers access these images by referring to the registry.
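The relationship between the three concepts can be made concrete with a toy model: a registry holds repositories, each repository maps tags to image digests, and tools refer to images through a reference string. The host name, repository path, and digest below are purely illustrative, and real registries implement this as a web API (the OCI distribution API) rather than an in-memory object.

```python
# A toy model of registry -> repository -> image. A real registry is a
# web API; this just mirrors the data relationships described above.

class Registry:
    def __init__(self, host):
        self.host = host
        self.repositories = {}          # repo name -> {tag: image digest}

    def push(self, repo, tag, digest):
        """Store an image digest under a repository and tag."""
        self.repositories.setdefault(repo, {})[tag] = digest

    def pull(self, repo, tag):
        """Look up the image digest a tag points at."""
        return self.repositories[repo][tag]

    def reference(self, repo, tag):
        """The reference string tools use to name an image."""
        return f"{self.host}/{repo}:{tag}"

reg = Registry("registry.example.com")
reg.push("team/app", "1.0", "sha256:abc123")
print(reg.reference("team/app", "1.0"))   # registry.example.com/team/app:1.0
print(reg.pull("team/app", "1.0"))        # sha256:abc123
```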



2) Solved problems

Cloud native applications are packaged and run as containers. The container registry stores and serves these container images.

3) How to solve it

By storing all container images centrally in one place, these container images can be easily accessed by the developer of the application.

4) Corresponding tools

A container registry either stores and distributes images or enhances an existing registry in some way. Essentially, it is a web API that allows container engines to store and retrieve images. Many registries provide interfaces that let container scanning/signing tools enhance the security of stored images, and some can distribute or replicate images in particularly efficient ways. Any environment that uses containers needs one or more registries.



Tools in this space can provide integrated capabilities for scanning, signing, and inspecting the images they store. At the time of this writing, Dragonfly and Harbor are the CNCF projects in this space, and Harbor recently became the first OCI-compliant registry. The major cloud providers offer their own managed registries, and other registries can be deployed standalone or into a Kubernetes cluster through tools such as Helm.



3. Security and compliance

1) What is it

Cloud native applications are designed to iterate quickly. To release code regularly, you must ensure that the code and the operating environment are secure and accessible only to authorized engineers. The tools and projects in this section help you create and run modern applications in a secure way.

2) Solve what problem

These tools and projects help monitor and enforce security for platforms and applications. They enable you to set policies (for compliance) in container and Kubernetes environments, gain insight into existing vulnerabilities, catch misconfigurations, and harden containers and clusters.

3) How to solve it

To run containers securely, they must be scanned for known vulnerabilities and signed to ensure they have not been tampered with. Kubernetes’ default access controls are relatively loose, making a Kubernetes cluster an easy target for anyone who wants to attack the system. The tools and projects in this space help harden the cluster and detect anomalies while the system is running.

4) Corresponding tools

To operate safely in a dynamic, rapidly evolving environment, we must consider security as part of the platform and application development life cycle. There is a wide variety of tools in this section that address different aspects of the security domain. Most tools fall into the following categories:

  • Audit and compliance;
  • Production environment hardening tools:

    • code scanning;
    • vulnerability scanning;
    • image signing;
  • Policy creation and enforcement;
  • Network layer security.

Some of these tools and projects are rarely used directly; Trivy, Clair, and Notary, for example, are leveraged by registries or other scanning tools. Others are key hardening components of modern application platforms, such as Falco or Open Policy Agent (OPA).



Many established vendors offer solutions in this space, and many startups are bringing Kubernetes-native frameworks to market. At the time of this writing, Falco, Notary/TUF, and OPA are the only CNCF projects in this field.





4. Key and identity management

1) What is it

Before we get into key management, let’s first define “key.” A key is a string used to encrypt or sign data. Like a physical key, it locks (encrypts) data so that only someone with the correct key can unlock (decrypt) it. As applications and operations adapt to the new cloud native environment, security tools are evolving to meet new needs. The tools and projects in this category cover the secure storage of passwords and other secrets (API keys, encryption keys, and other sensitive data), the safe distribution of those secrets across microservice environments, and more.

2) Solved problems

Cloud native environments are highly dynamic and call for fully programmatic (unattended), automated, on-demand secret distribution. Applications must also know whether a given request comes from a valid source (authentication) and whether the requester is authorized to perform the action (authorization). These are commonly referred to as AuthN and AuthZ.

3) How to solve it

Each tool or project is implemented in a different way, but they all provide:

  • A method of securely distributing a secret or key.
  • A service or specification for authentication and/or authorization.
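The two capabilities above can be illustrated with nothing but the standard library: a signing secret is distributed once, then used to issue and verify signed tokens (a very reduced form of authentication). Real systems use tools like Vault or Keycloak and standard token formats such as JWT; the token format here is purely illustrative.

```python
# Sketch: a distributed secret plus HMAC-signed tokens for AuthN.
# The "user.signature" token format is made up for illustration.
import hashlib
import hmac
import secrets

signing_key = secrets.token_bytes(32)   # the secret a key manager would distribute

def issue_token(user):
    """Sign the user name with the shared secret."""
    sig = hmac.new(signing_key, user.encode(), hashlib.sha256).hexdigest()
    return f"{user}.{sig}"

def verify_token(token):
    """Return the user name if the signature checks out, else None."""
    user, _, sig = token.partition(".")
    expected = hmac.new(signing_key, user.encode(), hashlib.sha256).hexdigest()
    return user if hmac.compare_digest(sig, expected) else None

token = issue_token("alice")
print(verify_token(token))           # alice
print(verify_token("bob.forged"))    # None
```

Note that whoever holds `signing_key` can mint tokens, which is exactly why key distribution and rotation need dedicated tooling rather than ad hoc handling.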

4) Corresponding tools

Tools in this category can be divided into two groups:

  • Some tools focus on key generation, storage, management, and rotation.
  • Others focus on single sign-on and identity management.

Vault, for example, is a general-purpose key management tool that manages different types of keys; Keycloak is an identity broker that manages access keys for different services.



At the time of this writing, SPIFFE/SPIRE is the only CNCF project in this space.





The provisioning layer focuses on building the foundation for cloud native platforms and applications, including tools for infrastructure provisioning, container registries, and security. This section has covered the lowest layer of the cloud native landscape.

The runtime layer

In this section we’ll look at the runtime layer, which contains everything a container needs in order to run in the cloud native environment: the code that starts containers (the container runtime engine), tools that give containers access to persistent storage, and tools that manage the networking of the container environment.

Be careful, however, not to confuse this layer with the networking and storage of the infrastructure and provisioning layers, whose job is to get the container platform itself up and running. Containers use runtime-layer tools directly to start and stop, store data, and communicate with each other.



1. Cloud native storage

1) What is it

Storage is where an application’s persistent data is stored, also known as a persistent volume. Easy access to persistent volumes is critical for your application to run reliably. Usually, when we say persistent data, we mean databases, messages, etc., or any other information that is not lost when the application is restarted.

2) Solved problems

Cloud native architectures are highly flexible and resilient, which makes persisting data across application restarts challenging. Containerized applications constantly create and delete instances and change physical location as they scale out, scale in, or self-heal, so cloud native storage must be provided in a node-independent manner. Storing data, however, requires hardware (specifically, disks), and disks, like any hardware, are bound to the infrastructure. That is the first big challenge.

The second challenge is the storage interface, which can vary greatly between data centers (in the past, each infrastructure had its own storage solution with its own interface), making portability difficult.

Finally, because the cloud is elastic, storage must be provisioned automatically; manual configuration is incompatible with automatic scaling. Cloud native storage is tailored to this new cloud native reality.

3) How to solve it

Tools for this category can:

  • Provide cloud native storage options for containers
  • Standardize the interface between the container and the storage provider.
  • Provide data protection through backup and restore operations.

Cloud native storage means using a container storage interface compatible with the cloud native environment (the tools in the next category) that can be configured automatically, enabling automatic scaling and self-healing by removing the human bottleneck.

4) Corresponding tools

The Container Storage Interface (CSI) is what makes cloud native storage broadly possible: it allows file and block storage to be offered to containers through a standard API. There are many tools in this space, both open source and vendor-provided, that leverage CSI to provide on-demand storage to containers.
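The value of a standard interface like CSI is that platform code can be written once against the interface and work with any provider. The sketch below compresses that idea drastically (real CSI is a gRPC specification with calls such as `CreateVolume`); the class and method names only echo the concept and are not CSI’s actual API.

```python
# Sketch of the "standard storage interface" idea: the platform calls the
# abstract interface, and any vendor driver can be plugged in behind it.
from abc import ABC, abstractmethod

class StorageDriver(ABC):
    @abstractmethod
    def create_volume(self, name, size_gb): ...

    @abstractmethod
    def delete_volume(self, name): ...

class InMemoryDriver(StorageDriver):
    """A fake driver standing in for a real cloud storage backend."""
    def __init__(self):
        self.volumes = {}

    def create_volume(self, name, size_gb):
        self.volumes[name] = size_gb
        return name

    def delete_volume(self, name):
        self.volumes.pop(name, None)

def provision(driver: StorageDriver, claim):
    # Platform code depends on the interface, never on a specific vendor.
    return driver.create_volume(claim["name"], claim["size_gb"])

drv = InMemoryDriver()
print(provision(drv, {"name": "data-0", "size_gb": 10}))   # data-0
```

Swapping `InMemoryDriver` for any other `StorageDriver` implementation requires no change to `provision`, which is exactly the portability problem the standard interface solves.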



In addition to this extremely important capability, there are a number of other tools and technologies designed to address storage issues in the cloud native space. Minio is a popular project that provides an S3-compliant API for object storage. Tools like Velero can help simplify the backup and restore process of the Kubernetes cluster itself as well as the persistent data used by the application.





2. Container runtime

1) What is it

As mentioned earlier, a container is a set of technology constraints used to execute an application. Containerized applications believe they run on a dedicated computer, unaware that they actually share resources with other processes (similar to virtual machines). A container runtime is the software that executes containerized (isolated) applications. Without a runtime, all you have is a container image: a file specifying what the containerized application should look like. The runtime starts the application inside a container and provides it with the resources it needs.

2) Solved problems

Container images (files with application specifications) must be started in a standardized, secure and isolated manner:

  • Standardization: containers must run by standard rules no matter where they run;
  • Security: access permissions must be carefully controlled;
  • Isolation: an application should not affect, or be affected by, other applications (for example, a co-located application crashing). Isolation is essentially protective.

In addition, resources such as CPU, storage, memory, and so on must be provided to the application.

3) How to solve it

The container runtime does all of this. It starts applications in a standardized manner in any environment and enforces security boundaries. The security boundary is where runtimes differ most; runtimes such as CRI-O or gVisor harden their boundaries in different ways. The runtime also sets resource limits for containers: without them, an application could consume whatever resources it likes and starve other applications, so limits are necessary.

4) Corresponding tools



Not all tools in this category are the same. containerd (part of the Docker product) and CRI-O are standard container runtime implementations. Some tools extend containers to other technologies, such as Kata Containers, which runs containers as lightweight VMs. Others address container-specific problems, such as gVisor, which adds an extra security layer between the container and the OS.



3. Cloud native network

1) What is it

Containers communicate with each other and with the infrastructure layer through the cloud native network. Distributed applications have multiple components that use the network for different purposes. Tools in this category overlay a virtual network on top of an existing network specifically for applications to communicate, known as an overlay network.

2) Solve what problem

Usually we refer to the code that runs in a container as an application, but in reality, most containers contain only a small portion of the specific functionality of a larger application. Modern applications such as Netflix or Gmail are actually made up of many smaller components, each running in its own container. Containers need to communicate with each other in order for all of these separate parts to function properly and form a complete application. The tools in this category provide the private communication network. In addition, the messages exchanged between these containers may be private, sensitive, or very important. This leads to other requirements: for example, the ability to provide isolation for various components and to examine traffic to identify network problems. In some cases, these networks and network policies (such as firewalls and access rules) may also need to be extended so that applications can connect to VMs or services running outside the container network.

3) How to solve it

Projects and products in this category use the Container Network Interface (CNI), a CNCF project, to provide networking for containerized applications. Some tools, such as Flannel, provide only basic connectivity between containers. Others, such as NSX-T, provide a complete software-defined networking layer that creates an isolated virtual network for every Kubernetes namespace. At a minimum, a container network should assign IP addresses to pods (the unit in which containerized applications run in Kubernetes) so that other processes can reach them.
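That bare minimum, handing out pod IP addresses, can be sketched with the standard library’s `ipaddress` module, roughly the way a very simple CNI IPAM plugin would carve addresses out of a per-node pod CIDR. The `10.244.0.0/24` range and pod names are illustrative (10.244.0.0/16 happens to be Flannel’s common default pod network).

```python
# Sketch of minimal pod IP address management (IPAM): allocate the next
# free host address in a pod CIDR to each new pod. Range and pod names
# are illustrative only.
import ipaddress

class PodIPAllocator:
    def __init__(self, cidr):
        self.pool = ipaddress.ip_network(cidr).hosts()   # .1 .. .254 for a /24
        self.assigned = {}                               # pod name -> IP string

    def allocate(self, pod):
        ip = next(self.pool)          # raises StopIteration when the CIDR is full
        self.assigned[pod] = str(ip)
        return self.assigned[pod]

alloc = PodIPAllocator("10.244.0.0/24")
print(alloc.allocate("web-abc12"))    # 10.244.0.1
print(alloc.allocate("db-def34"))     # 10.244.0.2
```

Real CNI plugins also release addresses when pods die and avoid collisions across restarts, which is where most of their actual complexity lives.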

4) Corresponding tools

CNI standardizes how the network layer provides functionality to pods, which has enabled much of the diversity and innovation in this field. Choosing a network for a Kubernetes environment is critical, and there are many tools to choose from: Weave Net, Antrea, Calico, and Flannel all offer solid open source networking layers, varying in functionality, and should be chosen according to your specific needs.



In addition, many vendors are ready to support and extend the Kubernetes network with software-defined networking (SDN) tools that allow you to gain insight into network traffic, enforce network policies, and even extend container networks and policies to a wider range of data centers.





This section has provided an overview of the runtime layer, which provides the tools a container needs to run in the cloud native environment, including:

  • Storage: Make it easy and fast for applications to access the data they need to run.
  • Container runtime: Executes application code;
  • Networking: Ensures communication between containerized applications.

In the next section, we’ll explore orchestration and management, which deals with how to manage all containerized applications as a group.

Orchestration and Management

Orchestration and management form the third layer of the CNCF cloud native landscape. Before using the tools at this layer, engineers will presumably have automated infrastructure provisioning in line with security and compliance standards (provisioning layer) and set up the runtime for the application (runtime layer). Now they must work out how to orchestrate and manage all the application components as a whole. The components must recognize and communicate with each other, and coordinate toward a common goal. The automation and elasticity of orchestration and management tools are what make cloud native applications naturally scalable.

1. Orchestration and scheduling

1) What is it

Orchestration and scheduling refers to running and managing containers across a cluster. (Containers are a new way to package and ship applications; a cluster is a group of machines, physical or virtual, connected over a network.)

A container orchestrator (and scheduler) is somewhat like a computer’s operating system, which manages all your applications, such as Microsoft 365, Zoom, and Slack. The operating system executes the applications you want to use and schedules which application gets the CPU and other hardware resources, and when. Running everything on one machine would be great, but most applications today are far too large for a single machine, and most modern applications are distributed, so some kind of software must manage the components running across different machines. Simply put, you need a “cluster operating system,” and that is what an orchestration tool is.

You may have noticed that containers came up repeatedly in the earlier sections. The ability of containers to run applications in different environments is key, and the same goes for the container orchestrator (in most cases, Kubernetes). Containers and Kubernetes are at the heart of cloud native architectures, which is why we hear about them all the time.

2) Solved problems

In a cloud native architecture, an application is decomposed into many small components or services, each of which is placed in a container. You’ve probably heard of microservices, and that’s exactly what this means. Instead of one large application, you now have many small services, each of which needs resources, needs monitoring, and needs fixing when things go wrong. Doing all that manually is feasible for a single service, but when you have hundreds of containers, you need automated processes.

3) How to solve it

The container orchestrator automates container management. What does this mean in practice? Let’s answer with Kubernetes, since Kubernetes is the de facto standard container orchestrator. Kubernetes performs “desired state reconciliation”: it matches the current state of the containers in the cluster to the desired state. Engineers specify the desired state in a file, for example: ten instances of service A running on three nodes (machines), with access to database B, and so on. Kubernetes continuously compares this desired state with the actual state. If they do not match, Kubernetes reconciles them by creating or destroying objects (for example, if a container crashes, Kubernetes starts a new one). In short, Kubernetes lets you treat a cluster as one computer: you focus only on the desired state, and it handles the implementation details.
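The reconciliation idea above can be sketched in a few lines: compare the declared replica count for each service with what is actually running, and start or stop instances until they match. This is a toy model of the control loop, not the Kubernetes API; the service names and counts are made up.

```python
# Sketch of Kubernetes-style desired state reconciliation: mutate the
# actual state toward the declared state, reporting each action taken.

def reconcile(desired, actual):
    """Move `actual` replica counts toward `desired`; return actions taken."""
    actions = []
    for service, want in desired.items():
        have = actual.get(service, 0)
        for _ in range(want - have):      # too few replicas: start more
            actions.append(f"start {service}")
        for _ in range(have - want):      # too many replicas: stop extras
            actions.append(f"stop {service}")
        actual[service] = want
    return actions

desired = {"service-a": 10, "service-b": 2}
actual = {"service-a": 9, "service-b": 3}   # one crashed, one extra
print(reconcile(desired, actual))           # ['start service-a', 'stop service-b']
print(actual == desired)                    # True
```

A real orchestrator runs this loop continuously, so a crashed container shows up as `have < want` on the next pass and is replaced without anyone intervening.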

4) Corresponding tools



Kubernetes and other container orchestrators (Docker Swarm, Mesos, etc.) are orchestration and scheduling tools whose basic purpose is to let multiple different computers be managed as a single resource pool, and to manage them declaratively: rather than telling Kubernetes step by step what to do, you provide a definition of the work to be done. This allows you to maintain the desired state in one or more YAML files and apply it to any Kubernetes cluster. The orchestrator itself then creates whatever is missing and removes whatever should no longer be there.
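Declarative management boils down to a diff between what you asked for and what actually exists. A minimal sketch, with invented object names standing in for what the YAML files would declare:

```python
# Declarative management sketch: the user supplies the set of desired
# objects (as they would in YAML files); the orchestrator diffs it
# against what exists and creates or deletes the difference.
# Illustrative only; object names are invented.

desired = {"deployment/web", "service/web", "configmap/web"}
actual = {"deployment/web", "service/web", "secret/stale"}

to_create = desired - actual   # missing content the orchestrator creates
to_delete = actual - desired   # leftovers it removes
```

The user never issues "create" or "delete" commands directly; those actions fall out of the comparison.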



Although Kubernetes is not the only orchestrator hosted by the CNCF (Crossplane and Volcano are two other incubating projects in this category), it is the most commonly used, and the project has a large number of active maintainers.





2. Coordination and service discovery

1) What is it

Modern applications consist of multiple separate services that need to collaborate to provide value to the end user. To collaborate, these services communicate over the network (as discussed in the runtime layer). To communicate, services need to be able to locate each other. Service discovery is the solution to this problem.

2) Solved problems

The cloud native architecture is dynamic and always changing. When a container on one node crashes, a new container is started on another node to replace it. Or, as an application scales, its replicas are spread throughout the network. There is no fixed place where a particular service lives; the location of everything is constantly changing. Tools in this category keep track of services in the network so that services can find each other when needed.

3) How to solve it

Service discovery tools provide a common place to find and identify individual services. There are two tools in this category:

  • Service discovery engine: Database-like tool that stores information about what services exist and how to locate them;
  • Name resolution tools such as CoreDNS: Receive service location requests and return network address information.
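A service discovery engine is, at heart, a registry that instances write themselves into and that resolvers read from. A toy sketch (hypothetical API; not how CoreDNS or etcd are actually used):

```python
# Toy service registry: instances register and deregister themselves;
# a resolver answers location requests. Hypothetical API, for
# illustration only.

class Registry:
    def __init__(self):
        self._services = {}

    def register(self, name, address):
        self._services.setdefault(name, []).append(address)

    def deregister(self, name, address):
        self._services[name].remove(address)

    def resolve(self, name):
        """Name resolution: return the known addresses for a service."""
        return self._services.get(name, [])

reg = Registry()
reg.register("payments", "10.0.1.7:8080")
reg.register("payments", "10.0.2.3:8080")
reg.deregister("payments", "10.0.1.7:8080")  # that instance went away
```

Because instances come and go constantly, the registry's answer for "payments" is always a snapshot, never a permanent address.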

Note: In Kubernetes, to make Pods reachable, a new abstraction layer called "Service" was introduced. A Service provides a single, stable address for a dynamically changing group of Pods. Please note that "service" has different meanings in different contexts, which can cause confusion. "Services" usually refers to the services placed inside containers/Pods: application components, or microservices, with a specific function in the actual application (e.g., the iPhone's facial recognition algorithm). A Kubernetes Service, by contrast, is an abstraction that helps Pods find and locate each other; it is an entry point to a group of processes or Pods. In Kubernetes, when you create a Service (the abstraction), you create a group of Pods that together provide a service (the functionality) through a single endpoint (entry point).
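The stable-entry-point idea can be illustrated with a small sketch (invented names; a real Kubernetes Service implements this with virtual IPs and proxy rules rather than application code):

```python
# Sketch of the Kubernetes "Service" idea: one stable name in front
# of a changing set of Pods. Callers always use the service name; the
# endpoint set behind it can change freely. Illustrative only.

class Service:
    """A single stable entry point for a dynamic group of Pods."""

    def __init__(self, name):
        self.name = name
        self.endpoints = []   # the Pods currently backing the Service
        self._next = 0

    def pick_endpoint(self):
        # Round-robin over whatever Pods are behind the Service now.
        ep = self.endpoints[self._next % len(self.endpoints)]
        self._next += 1
        return ep

svc = Service("face-recognition")
svc.endpoints = ["pod-a", "pod-b"]
first, second, third = (svc.pick_endpoint() for _ in range(3))
```

If pod-a crashes and is replaced by pod-c, only `svc.endpoints` changes; callers keep using the same Service name.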

4) Corresponding tools



As distributed systems become more common, traditional DNS processes and load balancers can no longer keep up with changing endpoint information, hence service discovery tools. They can handle individual application instances that rapidly register and deregister themselves. Some service discovery tools (such as etcd and CoreDNS) are Kubernetes native, while others provide custom libraries or tools to make service discovery work efficiently. CoreDNS and etcd are CNCF projects and are built into Kubernetes.





3. Remote procedure calls

1) What is it

Remote Procedure Call (RPC) is a technique that lets one application invoke functionality in another as if it were calling a local function; it is one particular way for applications to structure their communication with each other.

2) Solved problems

Modern applications consist of many individual services that must communicate to collaborate. RPC is a way for applications to communicate with each other.

3) How to solve it

RPC handles communication between services in a tightly coupled and highly opinionated manner. It allows bandwidth-efficient communication, and many languages offer RPC interface implementations. RPC is not the only solution to this problem, nor is it the most common one.

4) Corresponding tools

RPC provides a highly structured and tightly coupled interface for communication between services. gRPC is a very popular RPC implementation and has been adopted by the CNCF.
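The core RPC mechanic — a client stub that serializes a call and a server that dispatches it to a handler — can be sketched without any framework (toy JSON wire format and invented handler names; gRPC itself uses Protocol Buffers over HTTP/2 instead):

```python
# Minimal RPC sketch: the client calls what looks like a local
# function; a stub serializes the call, "sends" it, and the server
# dispatches to the real implementation. Simplified: the wire is an
# in-memory JSON string rather than a network socket.

import json

def server_handle(wire_msg):
    """Server side: decode the request and dispatch to a handler."""
    handlers = {"add": lambda a, b: a + b,
                "upper": lambda s: s.upper()}
    req = json.loads(wire_msg)
    result = handlers[req["method"]](*req["params"])
    return json.dumps({"result": result})

def rpc_call(method, *params):
    """Client stub: serialize the call, transport it, decode the reply."""
    wire_msg = json.dumps({"method": method, "params": list(params)})
    reply = server_handle(wire_msg)   # stands in for the network hop
    return json.loads(reply)["result"]
```

The tight coupling the text mentions is visible here: both sides must agree exactly on method names and parameter shapes, which is why real RPC frameworks generate both stubs from one shared interface definition.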





4. Service proxies

1) What is it

A service proxy is a tool that intercepts traffic going into or out of a service, applies some logic to it, and then forwards that traffic to another service. It is essentially a "middleman" that collects information about network traffic and applies rules to it. It can be as simple as a load balancer that forwards traffic to individual applications, or as complex as a service mesh of sidecar proxies in which each containerized application has its own proxy handling all of its network connections. Service proxies are useful on their own, especially for directing traffic from the wider network into a Kubernetes cluster, but they also provide the foundation for other systems, such as API gateways and service meshes, which we discuss below.

2) Solved problems

Applications should send and receive network traffic in a controlled manner. To track traffic and to transform or redirect it, we need to collect data. Traditionally, the code enabling data collection and traffic management was embedded in each application. Service proxies allow us to "externalize" this functionality: it no longer lives in the application but in the platform layer (where the application runs). This is very powerful, because it lets developers focus entirely on writing application logic while the common work of handling traffic is managed by the platform team (whose primary responsibility this is). By assigning and managing globally required service functions (such as routing or TLS termination) from a single common place, communication between services becomes more reliable, secure, and efficient.

3) How to solve it

Proxies act as gatekeepers between users and services, or between different services. From this unique position they gain insight into what kind of communication is taking place, and based on that insight they determine where to send a particular request, or even reject it entirely. Proxies collect critical data, manage routing (distributing traffic evenly between services, or rerouting around failed services), encrypt connections, and cache content (reducing resource consumption).
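A toy proxy makes those roles concrete: it load-balances, caches, and counts requests on behalf of the services behind it (all names and the handler are invented for this sketch):

```python
# Service proxy sketch: sits between callers and backends, balances
# load, caches responses, and records metrics. Hypothetical names;
# illustrative only.

class Proxy:
    def __init__(self, backends, handler):
        self.backends = backends
        self.handler = handler       # stands in for the real backend call
        self.cache = {}
        self.requests_seen = 0       # metric collected at the proxy
        self._next = 0

    def forward(self, path):
        self.requests_seen += 1
        if path in self.cache:       # caching cuts backend load
            return self.cache[path]
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1              # round-robin load balancing
        response = self.handler(backend, path)
        self.cache[path] = response
        return response

proxy = Proxy(["b1", "b2"], lambda b, p: f"{b}:{p}")
r1 = proxy.forward("/index")   # served by backend b1, then cached
r2 = proxy.forward("/index")   # cache hit: no backend call needed
```

Notice that neither the caller nor the backend had to change: all of the traffic logic lives in the middleman, which is exactly the externalization the previous section described.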

4) Corresponding tools



Service proxies work by intercepting traffic between services, performing some logic on it, and then, possibly, allowing the traffic to move on. By putting a centrally controlled set of capabilities into these proxies, administrators can accomplish several things: they can gather detailed metrics about communication between services, protect services from being overloaded, and apply other common standards to all services. Service proxies are the foundation of other tools, such as service meshes, because they provide a way to enforce higher-level policies on all network traffic.



Note that the CNCF includes load balancers and ingress providers in this category. Envoy, Contour, and BFE are all CNCF projects.





5. API gateway

1) What is it

People usually interact with computer programs through GUIs (graphical user interfaces), such as Web pages or (desktop) applications, and computers interact with each other through APIs (application programming interfaces). However, do not confuse an API with an API gateway. API gateways allow organizations to move critical functions, such as authorizing or limiting the number of requests between applications, to a centrally managed location. It also serves as a common interface for (often external) API users. Through API gateways, organizations can centrally control (restrict or enable) the interactions between applications and track them, enabling functions such as request denial, authentication, and preventing overuse of services (also known as rate limiting).

2) Solved problems

Although most containers and core applications have APIs, an API gateway is more than just an API. An API gateway simplifies how an organization manages rules and applies them to all interactions. API gateways let developers write and maintain less custom code, and they give teams visibility into, and control over, the interactions between users and the application itself.

3) How to solve it

The API gateway sits between users and the application. It acts as a mediator, forwarding messages (requests) from users to the appropriate service. Before handing over a request, it evaluates whether the request is allowed, and it keeps detailed records of who made which requests and how many. In short, an API gateway provides a single entry point with a common interface for an application's users, and it lets tasks that would otherwise have to be implemented inside each application be handed to the gateway instead, saving developers time and money.
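The gateway's evaluation steps — authenticate, rate-limit, then route — can be sketched as follows (hypothetical policies, routes, and service names):

```python
# API gateway sketch: a single entry point that authenticates, rate
# limits, and routes requests before they reach internal services.
# All names and policies are invented for illustration.

class Gateway:
    def __init__(self, rate_limit):
        self.rate_limit = rate_limit
        self.counts = {}                  # requests per user (audit trail)
        self.routes = {"/orders": "order-service",
                       "/users": "user-service"}

    def handle(self, user, token, path):
        if token != "valid":              # authentication
            return (401, "unauthorized")
        self.counts[user] = self.counts.get(user, 0) + 1
        if self.counts[user] > self.rate_limit:   # rate limiting
            return (429, "too many requests")
        service = self.routes.get(path)   # routing
        if service is None:
            return (404, "no route")
        return (200, service)             # forward to the service

gw = Gateway(rate_limit=2)
ok = gw.handle("alice", "valid", "/orders")
bad = gw.handle("bob", "wrong", "/orders")
gw.handle("alice", "valid", "/users")
limited = gw.handle("alice", "valid", "/users")   # third request: over limit
```

None of the internal services had to implement authentication or rate limiting; the gateway applies both policies uniformly at the front door.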

4) Corresponding tools

Like many categories in this layer, the API gateway takes custom code out of applications and brings it into a central system. It works by intercepting calls to back-end services, performing some value-added activity such as validating authorization, collecting metrics, or transforming requests, and then performing whatever action it deems appropriate. The API gateway serves as a common entry point for a set of downstream applications, while giving teams a place to inject business logic for handling authorization, rate limiting, and request denial. It lets application developers shield customers from changes to downstream APIs and offload tasks such as onboarding new customers to the gateway.





6. Service mesh

1) What is it

If you've read a little about cloud native, you have probably come across the term "service mesh." Service meshes have gotten a lot of attention lately. "After Kubernetes, service mesh technology has become the most critical component of the cloud native stack," said Janakiram MSV, a long-time contributor to The New Stack. A service mesh manages the traffic (that is, the communication) between services. It enables platform teams to uniformly add reliability, observability, and security capabilities across all services running in a cluster, without changing any code.

2) Solved problems

In a cloud native environment, we deal with many services that all need to communicate. This means far more traffic is sent back and forth over what is an inherently unreliable and often slow network. To meet these new challenges, engineers must implement additional functionality. Before service meshes, that functionality had to be coded into every single application, and such code often became technical debt and led to failures or vulnerabilities.

3) How to solve it

Service meshes add reliability, observability, and security across all services at the platform level, without touching application code. They are compatible with any programming language, allowing development teams to focus on writing business logic. Note: Traditionally, these capabilities had to be coded into each service, so every time a new service was released or updated, developers had to make sure the capabilities were there too, which invites human error. And in truth, developers prefer to focus on business logic (the features that generate value) rather than building reliability, observability, and security features. For platform owners, however, reliability, observability, and security are core capabilities, central to everything they do. Making developers responsible for adding functionality that platform owners need is inherently difficult. Service meshes and API gateways solve this problem because they are implemented by the platform owner and applied universally across all services.

4) Corresponding tools



A service mesh ties together all the services running on a cluster via service proxies, creating a mesh of services that is managed and controlled from the service mesh control plane. The mesh lets platform owners perform common operations on applications, and collect data about them, without requiring developers to write custom logic. In essence, a service mesh is an infrastructure layer that manages inter-service communication by providing command and control signals to a network, or mesh, of service proxies. This gives it the power to provide critical system functions without modifying the applications.



Some service meshes use a general-purpose service proxy (see above) for their data plane. Others use a dedicated proxy; Linkerd, for example, uses its Linkerd2-proxy "micro-proxy" for performance and resource-consumption advantages. These proxies are attached uniformly to each service as sidecars. Sidecar means that the proxy runs in its own container but lives in the same Pod, just as a motorcycle sidecar is a separate module attached to the motorcycle.



A service mesh provides many useful features, including surfacing detailed metrics, encrypting all traffic, restricting which operations services are authorized to perform, providing hooks for additional tools, and much more.



For more detail, see the Service Mesh Interface (SMI) specification: https://smi-spec.io/.





7. Summary

Orchestration and management tools are designed to manage groups of individual containerized applications. Orchestration and scheduling tools can be thought of as a cluster operating system that manages containerized applications across the cluster. Coordination and service discovery, service proxies, and service meshes ensure that services can find each other and communicate effectively, collaborating to form one smooth application. API gateways are an additional layer that provides even more control over service communication, particularly with external applications. In the next section we discuss the application definition and development layer, the last layer of the CNCF panorama. It covers databases, data flow and messaging, application definition and image build, and continuous integration and delivery.

Application definition and development layer

Now we’re at the top of the cloud native panorama. The Application Definition and Development layer, as the name implies, focuses on the tools that help engineers build applications and get them running. While the previous part of this article is all about building a reliable and secure environment and providing all the necessary application dependencies, the application definition and development layer is all about building software.

1. The database

1) What is it

A database management system is an application that helps other applications store and retrieve data efficiently. Databases store data securely, ensure that only authorized users can access it, and allow users to retrieve it through specific requests. Although there are many different types of databases, their overall goal is the same.

2) Solved problems

Most applications need efficient ways to store and retrieve data, and to keep the data secure. The database uses mature technologies to do this in a structured way.

3) How to solve it

Databases provide a common interface for storing and retrieving application data. Developers use these standard interfaces and, typically, a simple query language to store, query, and retrieve information. At the same time, databases let users continuously back up and save data, as well as encrypt data and manage access rights.
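Python's built-in sqlite3 module shows this standard store-and-query interface in miniature:

```python
# Storing and retrieving data through a standard SQL interface,
# using Python's built-in sqlite3 module with an in-memory database.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("ada",))
conn.execute("INSERT INTO users (name) VALUES (?)", ("grace",))
conn.commit()

# Retrieve data with a specific query.
rows = conn.execute("SELECT name FROM users ORDER BY name").fetchall()
names = [r[0] for r in rows]
```

The application never touches files or byte layouts directly; it speaks the common query language and lets the database engine handle the storage details.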

4) Corresponding tools

We have seen that a database management system is an application for storing and retrieving data. It uses a common language and interface and can be easily used by multiple languages and frameworks.

Two common database types are Structured Query Language (SQL) databases and NoSQL databases. Which database an application uses should be driven by its requirements.



Kubernetes supports stateful applications, and as its use has grown in recent years we have seen a new generation of databases that take advantage of container technology. These new cloud native databases aim to bring the scalability and availability benefits of Kubernetes to databases. Tools such as YugabyteDB and Couchbase are typical cloud native databases, and Vitess and TiKV are the CNCF projects in this area.



Note: When you look at this category, you'll see multiple names ending in DB (for example, MongoDB, CockroachDB, FaunaDB) that, as you might assume, are databases. There are also various names ending in SQL (such as MySQL or MemSQL). Some are "old school" databases adapted to the cloud native environment, while others are SQL-compatible NoSQL databases, such as YugabyteDB and Vitess.





2. Data flow and messaging

1) What is it

Data flow and messaging tools enable service-to-service communication by transporting messages (that is, events) between systems. Individual services connect to the messaging service to publish events, to read messages from other services, or both. This dynamic creates an environment in which an individual application is either a publisher that writes events, a subscriber that reads events, or, more likely, both.

2) Solved problems

As services proliferate, application environments become more complex and orchestration of communication between applications becomes more challenging. The data flow or messaging platform provides a central place to publish and read all the events that occur in the system, allowing applications to work together without having to know each other.

3) How to solve it

When a service does something that other services should know about, it “publishes” events to a data stream or messaging tool. Services that need to be aware of these event types will subscribe to and monitor data flows or messaging tools. This is the essence of publish-subscribe. Services can be decoupled from each other by introducing an “intermediate layer” that manages communication. Services simply monitor events, take action, and publish new events, creating a highly isolated architecture. In this architecture, services can collaborate without knowing each other. This decoupling enables engineers to add new functionality without having to update downstream applications (consumers) or send numerous queries. The more decoupled the system, the more flexible and adaptable it is to change, which is what engineers are looking for in a system.
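The publish-subscribe pattern described above can be sketched in a few lines (an in-memory stand-in; real brokers such as Kafka or NATS add persistence, delivery guarantees, and networking):

```python
# Publish-subscribe sketch: services never call each other directly;
# they publish events to a bus and subscribe to the event types they
# care about. In-memory stand-in for a real broker; names invented.

class MessageBus:
    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, event):
        # Deliver the event to every subscriber of this topic.
        for callback in self._subscribers.get(topic, []):
            callback(event)

bus = MessageBus()
emails, invoices = [], []

# Two decoupled consumers react to the same event independently;
# the publisher knows nothing about either of them.
bus.subscribe("order.created", lambda e: emails.append(e["id"]))
bus.subscribe("order.created", lambda e: invoices.append(e["id"]))

bus.publish("order.created", {"id": 42})
```

Adding a third consumer (say, analytics) requires only another `subscribe` call; the publisher and the existing consumers are untouched, which is exactly the decoupling the text describes.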

4) Corresponding tools



Data streaming and messaging tools existed long before cloud native technology. To centralize the management of critical business events, organizations built large enterprise service buses. But when we talk about data flow and messaging in a cloud native context, we usually mean tools like NATS, RabbitMQ, Kafka, or the message queues offered by cloud providers.



Messaging and data streaming systems provide a central place for the components of a distributed system to communicate. The message bus is a common place that all applications can access, either publishing messages to say what they are doing, or subscribing to messages to see what is happening elsewhere.



The NATS and CloudEvents projects are both incubating projects in this space: NATS provides a mature messaging system, and CloudEvents is an effort to standardize message formats between systems. Strimzi, Pravega, and Tremor are sandbox projects, each tailored to a particular data flow and messaging use case.



3. Application definition and image build

1) What is it

Application definition and image build is a broad category that can be divided into two main subcategories:

  • Development-focused tools: help build application code into containers and/or Kubernetes applications;
  • Operations-focused tools: deploy applications in a standardized way.

Whether it’s speeding up or simplifying the development environment, providing a standardized way to deploy third-party applications, or simplifying the process of writing new Kubernetes extensions, tools like this can optimize the experience of Kubernetes development and operations personnel.

2) Solved problems

Kubernetes (and containerized environments in general) is very flexible and powerful, but that flexibility brings complexity, mainly in the form of numerous configuration options for a variety of use cases. Developers must containerize their code and develop in production-like environments, and, with rapid release cycles, operators need a standardized way to deploy applications into container environments.

3) How to solve it

Tools in this area are designed to address some of the challenges faced by people in development or operations. For developers, there are tools that make it easy to extend Kubernetes to build, deploy, and connect applications. Many projects and products can store or deploy pre-packaged applications, allowing operations personnel to quickly deploy a streaming service like Kafka or install a service grid like Linkerd. Developing cloud native applications presents a whole new set of challenges that require a wide variety of tools to simplify application building and deployment. Check out the tools in this category when you need to solve development and operations problems in your environment.

4) Corresponding tools

Application definition and build tools cover a huge range of functionality, such as extending Kubernetes to virtual machines with KubeVirt, or speeding up application development by porting your development environment into Kubernetes with tools like Telepresence. Broadly, tools in this area address the problems developers face in correctly writing, packaging, testing, or running custom applications, as well as the problems operators face in deploying and managing applications.



Helm, the only graduated project in this category, laid the foundation for many application deployment patterns. Helm lets Kubernetes users deploy and customize many popular third-party applications, and projects such as Artifact Hub (a CNCF sandbox project) and Bitnami have built on Helm to provide curated catalogs of applications. Helm is also flexible enough to let users customize their own application deployments.



The Operator Framework is an incubating project designed to simplify building and deploying operators. Operators are outside the scope of this article, but note that, like Helm, they help deploy and manage applications. Cloud Native Buildpacks is another incubating project, aimed at simplifying the process of building application code into containers.





4. Continuous integration and continuous delivery

1) What is it

Continuous integration (CI) and continuous delivery (CD) tools enable fast, efficient development with quality assurance built in. CI automates code changes by immediately building and testing the code, ensuring that it produces a deployable artifact. CD goes one step further, pushing the artifact through the deployment stages. Mature CI/CD systems watch source code for changes, automatically build and test the code, and then move it through the stages from development to production. Along the way, a change must pass various tests or validations that determine whether the process continues or fails.

2) Solved problems

Building and deploying applications is a difficult and error-prone process, especially when it involves many human interventions and manual steps. The longer a developer works on software without integrating it into the code base, the longer it takes to identify bugs and the harder they are to fix. By integrating code regularly, errors are caught early, while they are still easy to troubleshoot; after all, it is much easier to find a bug in a few lines of code than in hundreds. While tools like Kubernetes offer great flexibility for running and managing applications, they also create new challenges and opportunities for CI/CD tools. Cloud native CI/CD systems can leverage Kubernetes itself to build, run, and manage the CI/CD process (commonly referred to as a pipeline). Kubernetes also provides information about the health of applications, making it easier for cloud native CI/CD tools to determine whether a given change succeeded or needs to be rolled back.

3) How to solve it

CI tools ensure that any code change or update a developer introduces is automatically and continuously built, validated, and integrated with other changes. Each time a developer adds an update, automated tests are triggered, ensuring that only good code makes it into the system. CD extends CI by pushing the output of the CI process into production-like and production environments. Suppose a developer changes the code of a web application. The CI system sees the change, then builds and tests a new version of the application. The CD system takes that new version and deploys it into the development, test, pre-production, and finally production environments, testing the deployed application after each step. Together, these systems form a CI/CD pipeline for the web application.
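The staged, gated flow just described can be sketched as follows (invented stage names and checks; real pipelines run actual builds, test suites, and deployments at each gate):

```python
# CI/CD pipeline sketch: each stage must pass before the change moves
# on; a failure stops the promotion. Stage names and checks are
# invented for illustration.

def run_pipeline(change, stages):
    """Run a change through ordered stages; return how far it got."""
    completed = []
    for name, check in stages:
        if not check(change):
            return completed, f"failed at {name}"
        completed.append(name)
    return completed, "deployed to production"

stages = [
    ("build", lambda c: c["compiles"]),
    ("test", lambda c: c["tests_pass"]),
    ("staging", lambda c: c["healthy_in_staging"]),
]

good = {"compiles": True, "tests_pass": True, "healthy_in_staging": True}
bad = {"compiles": True, "tests_pass": False, "healthy_in_staging": True}

done, status = run_pipeline(good, stages)
_, bad_status = run_pipeline(bad, stages)
```

The key property is that a failing gate halts promotion immediately, so broken changes never reach the environments further down the pipeline.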

4) Corresponding tools

Over time, a number of tools have emerged to help move code from the repository to the production environment where the final application runs. As in most other areas of computing, the arrival of cloud native development has changed CI/CD systems. Traditional tools like Jenkins (probably the most widely used CI tool on the market) have gone through major iterations to fit better into the Kubernetes ecosystem. Projects such as Flux and Argo have pioneered a new approach to continuous delivery called GitOps. In general, projects and products in this area are:

  • CI system;
  • The CD system;
  • Tools to help the CD system determine if code is ready for production;
  • A combination of the first three (such as Spinnaker and Argo).

Argo and Brigade are the only CNCF projects in this area, but you can find many more options hosted by the Continuous Delivery Foundation. Look for tools in this space that can help your organization automate its path to production.





5. Summary

Tools in the application definition and development layer enable engineers to build cloud native applications. Tools for this layer include:

  • Databases: store and retrieve data.
  • Data flow and messaging tools: enable loosely coupled, event-driven architectures.
  • Application definition and image build tools: a variety of technologies that improve the developer and operator experience.
  • CI/CD tools: ensure code is deployable and help engineers catch errors early, improving code quality.

What problems hosted Kubernetes and PaaS solve

In the sections above, we discussed these layers of the CNCF cloud native panorama: provisioning, runtime, orchestration and management, and application definition and development. In this chapter we focus on the platform layer.

As we've seen throughout this article, each category solves a specific problem. Storage alone does not provide everything you need to manage an application; you also need orchestration tools, container runtimes, service discovery, networking, API gateways, and more. Platforms bundle tools from different layers together to solve a larger problem. There is nothing intrinsically new in a platform: you could certainly build one yourself, and many organizations do. However, reliably and securely configuring and fine-tuning the different modules, while keeping every technology up to date and patched, is not easy; you need a dedicated team to build and maintain it. If you don't have the required expertise in-house, you are probably better off using a platform. For some organizations, especially those with small engineering teams, platforms are the only feasible way to adopt cloud native technology. You may have noticed that all platforms revolve around Kubernetes, because Kubernetes is the core of the cloud native stack.

1. Kubernetes distribution

1) What is it

A distribution is created when a vendor takes core Kubernetes (usually the unmodified open source code, although some vendors do modify it) and packages it for redistribution. Usually that involves finding and validating the Kubernetes software and providing a mechanism for installing and upgrading clusters. Many Kubernetes distributions also bundle other closed source or open source applications.

2) Solved problems

Open source Kubernetes does not prescribe a specific installation tool and leaves many setup and configuration decisions to the user. In addition, community resources (community forums, StackOverflow, Slack, Discord, and so on), however valuable, cannot answer every problem. Kubernetes has become easier to use as it has grown in popularity, but finding and using open source installers can still be challenging: users need to know which versions to use, where to get them, and whether particular components are compatible. They also need to decide what software to deploy on the cluster and which settings to use to keep the platform secure, stable, and performant. All of this requires deep Kubernetes expertise, which may not be easy to come by.

3) How to solve it

The Kubernetes distribution provides a reliable way to install Kubernetes and provides reasonable defaults to create a better and more secure operating environment.

The Kubernetes distribution provides vendors and projects with the control and predictability they need to help them support customers in deploying, maintaining, and upgrading Kubernetes clusters. This predictability enables distribution providers to support customers when they encounter production issues. Distributions often provide tested and supported upgrade paths to keep users’ Kubernetes clusters up to date. In addition, distributions often provide software that can be deployed on Kubernetes to make it easier to use.

4) Corresponding tools



If you have ever installed Kubernetes yourself, you may have used a tool like kubeadm to get the cluster up and running. Even then, you probably had to install and configure a CNI (Container Network Interface) plugin. Then you might have added some storage classes, a tool to handle log messages, maybe an ingress controller, and more. A Kubernetes distribution automates some or all of this setup. It also provides configuration settings based on its own interpretation of best practices or intelligent defaults. In addition, most distributions come bundled with extensions or add-ons that have been tested together, so users can get going with their new cluster as quickly as possible.



Let’s take Kublr as an example. It is built around Kubernetes and bundles tools from the provisioning layer, runtime layer, and orchestration management. All modules are pre-configured with some options and work out of the box. Different platforms focus on different functions. In Kublr’s case, the focus is on operations, while other platforms might focus on development tools.



There are many options in this category. As shown in the figure below, companies can work with vendors such as Canonical, VMware, Mirantis, and SUSE from abroad, or NetEase, Volcano Engine, and JD from China, all of which provide excellent open source and commercial tools. When evaluating a distribution, it is recommended to consider your own needs carefully.



2. Hosted Kubernetes

1) What is it

Hosted Kubernetes is a service offered by infrastructure providers (cloud vendors) such as Amazon Web Services (AWS), DigitalOcean, Azure, or Google, which allows customers to start a Kubernetes cluster on demand. The cloud vendor takes responsibility for managing part of the Kubernetes cluster, usually called the control plane. Hosted Kubernetes is similar to a distribution, except that it is managed by the cloud vendor on the vendor's own infrastructure.

2) Solved problems

Hosted Kubernetes allows teams to start using Kubernetes simply by opening an account with a cloud vendor. It answers the "five Ws" of getting started with Kubernetes:

  • Who: the cloud vendor
  • What: its hosted Kubernetes offering
  • When: right now
  • Where: on the cloud vendor's infrastructure
  • Why: that's up to you

3) How to solve it

Since the hosted Kubernetes provider is responsible for managing all the details, a hosted Kubernetes service is the easiest way to start down the cloud native path. All users need to do is develop their applications and deploy them on the hosted Kubernetes service, which is very convenient. The hosted offering lets users start a Kubernetes cluster and get going immediately, while the vendor takes on responsibility for the cluster's availability. It is important to note that the extra convenience of these services comes at the cost of flexibility: a hosted Kubernetes service ties you to the cloud vendor, and users do not have access to the Kubernetes control plane, so certain configuration options are limited. Note: AWS EKS is a partial exception, as it also requires users to perform a number of additional steps to prepare the cluster.

4) Corresponding tools



Hosted Kubernetes consists of on-demand Kubernetes clusters provided by a vendor (typically an infrastructure hosting provider) that is responsible for provisioning the cluster and managing the Kubernetes control plane. Again, the notable exception is EKS, where individual node configuration is left to the customer.



Hosted Kubernetes services enable organizations to outsource the management of infrastructure components, which allows rapid provisioning of new clusters and reduces operational risk. The main trade-offs are paying for control-plane management and the limited administrative authority granted to the user: hosted services impose stricter limits on Kubernetes cluster configuration than you would face running clusters yourself.



There are many vendors and projects in this area, and at the time of this writing, no CNCF projects exist.



3. Kubernetes installer

1) What is it

The Kubernetes installers help you install Kubernetes on your machine. They automate the installation and configuration of Kubernetes, and can even help with upgrades. The Kubernetes installer is usually used in conjunction with or by Kubernetes distributions or hosted Kubernetes products.

2) Solved problems

Similar to Kubernetes distributions, Kubernetes installers simplify getting started with Kubernetes. Installing open source Kubernetes typically relies on an installer such as kubeadm. As of this writing, kubeadm can be used to bring up a Kubernetes cluster as part of the CKA (Certified Kubernetes Administrator) exam.

3) How to solve it

Kubernetes installers simplify the Kubernetes installation process. Like distributions, they provide vetted sources for source code and releases. They also often ship with opinionated Kubernetes environment configurations. An installer like kind (Kubernetes in Docker) can stand up a Kubernetes cluster with a single command.

4) Corresponding tools

Whether it’s installing Kubernetes locally on Docker, starting and configuring a new virtual machine, or preparing a new physical server, you need a tool to handle the preparation of various Kubernetes components.



Kubernetes installers simplify this process. Some of them also provision the nodes themselves, while others only configure nodes that have already been provisioned. They offer varying degrees of automation and suit different use cases; before choosing one, first understand your needs, then pick an installer that meets them. At the time of this writing, kubeadm is a critical tool in the Kubernetes ecosystem and is included in the CKA exam. Minikube, kind, kops, and Kubespray are all Kubernetes installer projects in the CNCF landscape.
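To make the "single command" claim concrete, here is a hedged sketch of the input kind can work from. A minimal cluster definition (the file name and node roles below are illustrative) might look like this:

```yaml
# kind-config.yaml — a minimal multi-node cluster definition for kind
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
```

Running `kind create cluster --config kind-config.yaml` would then stand up a three-node Kubernetes cluster, with each node running as a Docker container.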



4. PaaS/container services

1) What is it

A PaaS (Platform as a Service) is an environment that allows users to run applications without having to understand the underlying compute resources. The PaaS and container services in this category cover both mechanisms for hosting a PaaS yourself and hosted services that developers can consume directly.

2) Solved problems

In this article, we discussed tools and techniques for “cloud native.” PaaS connects many technologies in this space and provides immediate value to developers. It answers the following questions:

  • How do I run the application in a variety of environments?
  • How will my team and users interact with the applications once they are up and running?

3) How to solve it

A PaaS combines the open source and closed source tools needed to run an application. Many PaaS products include tools to handle PaaS installation and upgrades, as well as mechanisms to turn application code into a running application. In addition, a PaaS can address the runtime requirements of application instances, including scaling individual components on demand and visualizing per-application performance and log messages.
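As a small, hedged example of the "code in, running app out" mechanism: on Heroku, a one-line Procfile is all that tells the platform how to run the application's web process (the `gunicorn` command and the `app:app` module path below are placeholders for your own entry point, not a prescribed setup):

```
web: gunicorn app:app
```

Each line declares a process type and the command to run it; the platform handles everything beneath that declaration, which is exactly the trade described above.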

4) Corresponding tools

Organizations adopt cloud native technologies to achieve specific business or technical goals. A PaaS lets an organization realize value faster than building a custom application platform would. Tools such as Heroku or Cloud Foundry Application Runtime can help organizations get new applications up and running quickly, and they provide the tooling needed to run cloud native applications. Every PaaS has its limitations: most support only one language or a subset of application types, and they come with preselected tools that may not suit your needs. Stateless applications typically run well on a PaaS, while stateful applications such as databases are often less suited to it. There are currently no CNCF projects in this area, but most of the products are open source.



5. Summary

As described in this article, several tools are available to help simplify the adoption of Kubernetes. The Kubernetes distribution, the hosted Kubernetes service, the Kubernetes installer, and PaaS all do some of the installation and configuration work and can be pre-packaged. Each solution has its own characteristics. Before adopting any of the above approaches, you need to do some research to determine the best solution for your needs. You may want to consider:

  • Will I encounter situations where I need control of the control plane? If so, a hosted solution may not be a good choice.
  • Do I have a small team running "standard" workloads and need to offload as many operational tasks as possible? If so, a hosted solution may be perfect for you.
  • Is portability important to me?
  • How production-ready does the solution need to be?

There are more questions to consider. There is no one “best tool,” but you can certainly choose one that works for your specific needs. Hopefully this article has helped you narrow your search to the right area.

Observability and analysis

Finally, we come to the last chapter of the detailed cloud native panorama. This section introduces you to the “column” of observability and analysis in a cloud native panorama.

First, let's define observability and analysis. Observability is a system property describing the degree to which a system can be understood from its external outputs. By measuring CPU time, memory, disk space, latency, errors, and other metrics, you can more or less observe the state of a computer system. Analysis is the attempt to make sense of that observable data. To ensure that a service does not break down, we need to observe and analyze every aspect of the application so that anomalies can be detected and fixed immediately. That is what observability and analysis are all about. These concerns cut across every layer, which is why this category sits along the side of the panorama rather than being embedded in a single layer. Tools in this category include logging, monitoring, tracing, and chaos engineering. Although chaos engineering is listed here, it is more of a reliability tool than an observability or analysis tool.

1. Logging

1) What is it

An application outputs a steady stream of log messages describing what it did and when. These log messages capture various events occurring in the system, such as failed or successful operations, audit information, or health status. Logging tools collect, store, and analyze these messages to trace error reports and associated data. Logging, metrics, and tracing are the three pillars of observability.

2) Solved problems

Collecting, storing, and analyzing logs is a critical part of building a modern platform, and logging can help perform any or all of these tasks. Some tools handle everything from collection to analysis, while others focus on a single task (such as collection). All logging tools are designed to help organizations have better control over logging messages.

3) How to solve it

By collecting, storing, and analyzing an application's log messages, we can learn what the application is reporting at any given time. Note, however, that logs represent messages the application intentionally emits, so they do not necessarily pinpoint the root cause of a given problem. Nonetheless, collecting and retaining log messages over time is a powerful capability that helps teams diagnose problems and meet compliance requirements.

4) Common tools



While collecting, storing, and processing log messages is nothing new, the cloud native pattern and the advent of Kubernetes have dramatically changed the way we handle logging. Traditional logging methods that work for virtual and physical machines (such as writing logs to files) are not suitable for containerized applications, where the life cycle of the file system may not be longer than that of the application. In a cloud native environment, a log collection tool such as Fluentd runs with the application container and collects messages directly from the application, then forwards the messages to a central log store for aggregation and analysis.
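As a rough sketch of that pattern (the file paths, tags, and the Elasticsearch destination below are assumptions for illustration, not a drop-in configuration), a Fluentd pipeline that tails container log files and forwards them to a central store might look like:

```
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  <parse>
    @type json
  </parse>
</source>

<match kubernetes.**>
  @type elasticsearch          # requires the fluent-plugin-elasticsearch plugin
  host elasticsearch.logging.svc
  port 9200
</match>
```

The `source` block collects messages where the containers write them, and the `match` block ships everything tagged `kubernetes.*` to the central store for aggregation and analysis.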



The only logging tool in CNCF is Fluentd.





2. Monitoring

1) What is it

Monitoring refers to instrumenting an application and collecting, aggregating, and analyzing logs and metrics to improve our understanding of its behavior. Logs describe specific events, while metrics are measurements of the system at a given point in time. Logs and metrics are two different things, but both are necessary to fully understand the health of a system. Monitoring includes watching disk space, CPU utilization, and memory consumption on individual nodes, as well as executing synthetic transactions to verify that a system or application responds correctly and promptly. There are many different ways to monitor systems and applications.

2) Solved problems

When we run an application or platform, we want it to do what it’s supposed to do and ensure that only authorized users can access it. By monitoring, we can know if the application/platform is running properly, safely, and efficiently, and if only authorized users can access it.

3) How to solve it

Good monitoring methods enable operations personnel to respond quickly and even automatically in the event of an accident. Monitoring gives us insight into the current state of the system and detects problems to fix. It can track application health, user behavior, etc., and is an important part of running an application effectively.

4) Common tools



Monitoring in a cloud native environment is similar to monitoring traditional applications: we need to track metrics, logs, and events to understand the health of the application. The main difference is that some of the managed objects in a cloud native environment are ephemeral and may not persist, so tying the monitoring system to automatically generated resource names is not a good strategy. There are many monitoring tools in CNCF, the main one being Prometheus (a graduated CNCF project).
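Prometheus handles the ephemeral-resource problem with service discovery and relabeling rather than hard-coded target names. A hedged fragment of a `prometheus.yml` (the job name and label choices below are illustrative) might look like:

```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod            # discover pods dynamically instead of listing hosts
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app    # keep a stable label even as pod names churn
```

Targets come and go with the pods themselves; the relabeling step attaches a stable `app` label so dashboards and alerts survive pod restarts.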





3. Tracing

1) What is it

In a microservice architecture, services constantly communicate with each other over a network. Tracing is a specialized use of logging to trace the path of a request as it moves through a distributed system.

2) Solved problems

Understanding the behavior of a microservice application at a point in time can be a challenging task. Although there are many tools that can provide insight into the behavior of a service, it can be difficult to understand how an entire application is performing from the behavior of a single service.

3) How to solve it

Tracing solves this problem by adding a unique identifier to the messages sent by the application. This unique identifier can be used to follow/track the path of individual transactions as they move through the system, and the tracking information can be used to understand the health of the application, and to debug problematic microservices or behaviors.
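The core idea can be sketched in a few lines of Python (the `x-trace-id` header name and the service names are illustrative, not any particular tracing standard): each hop reuses the trace ID it received, or mints one if it is the first service in the path.

```python
import uuid

def handle_request(service, headers, log):
    """Simulate one service hop: reuse the incoming trace ID or mint a new one."""
    headers = dict(headers)
    headers.setdefault("x-trace-id", uuid.uuid4().hex)
    log.append((service, headers["x-trace-id"]))  # what a tracing backend would record
    return headers

log = []
headers = handle_request("frontend", {}, log)    # first hop mints the trace ID
headers = handle_request("cart", headers, log)   # later hops propagate it
headers = handle_request("payments", headers, log)

# every hop in the request path shares the same trace ID,
# so the backend can reassemble the full transaction
assert len({trace_id for _, trace_id in log}) == 1
```

Real systems add per-hop span IDs and timing on top of this, but the propagated identifier is what makes it possible to stitch a single transaction back together across services.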

4) Common tools



Tracing is a powerful debugging tool for troubleshooting and fine-tuning the behavior of a distributed application. That power comes at a cost: application code must be modified to emit trace data, and every infrastructure component in a request's data path must propagate the spans. The tracing tools in CNCF are Jaeger and OpenTracing.





4. Chaos Engineering

1) What is it

Chaos engineering is the practice of deliberately introducing failures into systems to create more resilient applications and engineering teams. Chaos engineering tools introduce failures into the system in a controlled manner and run specific experiments for specific instances of the application.

2) Solved problems

Complex systems fail. Failures have many causes, and in distributed systems their consequences are hard to predict. Some organizations have accepted this: rather than trying to prevent failures, they use chaos engineering techniques to practice recovering from them, thereby optimizing mean time to repair (MTTR).

3) How to solve it

In a cloud native environment, applications must adapt to failure dynamically, a relatively new idea. It means that when a failure occurs, the system does not collapse entirely but degrades gracefully or recovers. Chaos engineering tools can be used to run experiments against systems in production to ensure they can cope when a real failure hits. In short, chaos engineering experiments verify that a system can withstand unexpected conditions. With these tools, instead of waiting for failures to occur and then dealing with them, you inject failures under controlled conditions, find weaknesses, and fix them before they catch you off guard.

4) Common tools

Chaos engineering tools and practices are critical to the high availability of applications. Distributed systems are often too complex for any change process to fully determine its impact on the environment. By deliberately adopting chaos engineering practices, teams can rehearse recovering from failures and automate the process. Chaos engineering tools in CNCF include Chaos Mesh and LitmusChaos. There are several other open and closed source chaos engineering tools as well.
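As a hedged illustration of the "controlled experiment" idea, a Chaos Mesh `PodChaos` resource that kills one pod matching a label (the `demo` namespace and `app: web` selector below are placeholders) might look roughly like this:

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-demo
spec:
  action: pod-kill      # terminate a pod to exercise recovery behavior
  mode: one             # affect a single matching pod per experiment
  selector:
    namespaces:
      - demo
    labelSelectors:
      app: web
```

The experiment is scoped by the selector, so the blast radius stays bounded while the team observes whether the application degrades gracefully and recovers.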



5. Summary

Observability and analysis covers the tools used to understand the health of a system and ensure it keeps performing well even under harsh conditions. Logging tools capture event messages emitted by applications, monitoring tools watch logs and metrics, and tracing tools follow the path of individual requests. Combining these tools should, ideally, give you a 360-degree view of what is happening in your system. Chaos engineering adds a safe way to verify that the system can withstand unexpected events, which helps ensure its healthy operation. Source: K8smeetup community.

One-stop cloud native PaaS platform

As an open source, one-stop cloud native PaaS platform, Erda provides platform-level capabilities such as DevOps, microservice observability and governance, multi-cloud management, and fast-data governance. Click the links below to participate in the open source project, discuss and exchange ideas with other developers, and help build the open source community. Everyone is welcome to follow, contribute code, and star!

  • Erda GitHub address: https://github.com/erda-project/erda
  • Erda Cloud website: https://www.erda.cloud/