Author | Zhang Zhen, Senior Technical Expert at Alibaba

I. Resource metadata

1. Kubernetes resource object

A Kubernetes resource object consists of two parts: Spec and Status. The Spec describes the desired state of the resource, and the Status describes its observed state.

Today we will introduce another part of a K8s resource: its metadata. This mainly includes the Labels used to identify resources, the Annotations used to describe resources, and the OwnerReferences used to describe the relationships between resources. This metadata plays a very important role in how K8s operates.

2. Labels

The first, and most important, piece of metadata is the resource label. Labels are key-value metadata used to identify resources. The following figure shows several common labels.

The first three labels are attached to Pod objects and identify the application's deployment environment, release maturity, and version. As the examples show, a label name can include a domain-name prefix that describes the system or tool which applied it. The last label is placed on a Node object, with a beta string added before the domain name to mark the label's version.

Labels are used to filter and group resources. You can query resources by label, much like the WHERE clause of a SQL SELECT statement.
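For instance, a Pod carrying labels like those just described might look as follows. This is an illustrative sketch; the exact label keys and values in the original figure are not reproduced here:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx1
  labels:
    environment: production            # deployment environment
    release: stable                    # release maturity
    app.kubernetes.io/version: "1.0"   # version, with a domain-name prefix
spec:
  containers:
  - name: nginx
    image: nginx:1.25
```

A Node label with a beta-prefixed domain, for example beta.kubernetes.io/arch, follows the same key/value form.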


3. Selector

The most common selector is the equality selector. Here is a simple example:

Suppose there are four Pods in the system, each labeled with its system tier (tie) and deployment environment (env). The selector tie=front matches the Pods in the left-hand column. A selector can also contain multiple equality conditions, which are combined with a logical AND.

In the previous example, the selector tie=front,env=dev filters out the Pod in the upper-left corner of the figure below. The other kind of selector is the set-based selector; in this example, the selector filters all Pods whose environment is test or gray.

Besides the in operator there is a notin operator: tie notin (front, back) filters all Pods whose tie label is neither front nor back. You can also filter on the mere presence of a label: the selector release, for example, matches every Pod that carries a release label, whatever its value. Set-based and equality selectors can be joined with commas, which again expresses a logical AND, as shown in the sketch below.
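Inside a workload object, these two styles appear as matchLabels and matchExpressions. The following is a minimal sketch, with example keys and values rather than anything from the original figures:

```yaml
selector:
  matchLabels:              # equality conditions, ANDed together
    tie: front
  matchExpressions:         # set-based conditions, ANDed with matchLabels
  - key: env
    operator: In            # env must be test or gray
    values: ["test", "gray"]
  - key: release
    operator: Exists        # a release label must be present, any value
```

On the command line the same logic can be expressed with the -l option, for example -l 'tie=front,env in (test,gray),release'.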

4. Annotations

Another important type of metadata is annotations. Annotations are used by a system or tool to store non-identifying information about a resource. They can be used to extend the description of a resource’s spec/status.

The first example stores the certificate ID of an Alibaba Cloud load balancer. Like labels, annotations can carry a domain-name prefix, and the prefix can include version information. The second annotation stores nginx access-layer configuration; note that it contains "," and other special characters that could not appear in a label. The third annotation is commonly seen on resources after a kubectl apply operation: its value is structured data, a JSON string recording the resource description from the most recent kubectl apply.
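Put together, such annotations might appear in a resource's metadata roughly as follows; the keys and values here are illustrative stand-ins for the ones in the original figure:

```yaml
metadata:
  annotations:
    # certificate ID of a cloud load balancer (value is made up)
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-cert-id: "cert-123456"
    # nginx ingress configuration; note the "," that a label value could not contain
    nginx.ingress.kubernetes.io/cors-allow-methods: "GET, PUT, POST"
    # written by kubectl apply: a JSON snapshot of the last applied configuration
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Pod","metadata":{"name":"nginx1"},"spec":{"containers":[{"name":"nginx","image":"nginx:1.25"}]}}
```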

5. OwnerReference

The last piece of metadata is the OwnerReference. The owner is generally a collection-type resource, such as a ReplicaSet or StatefulSet that manages a set of Pods; these will be discussed in a later lecture.

The controller of a collection resource creates the resources it owns. For example, the ReplicaSet controller creates Pods, and each Pod's OwnerReference points to the ReplicaSet that created it. OwnerReferences let users easily locate the object that created a resource, and they are also the basis of cascading deletion.
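For example, a Pod created by a ReplicaSet carries an OwnerReference in its metadata roughly like the following; the UID and names are illustrative:

```yaml
metadata:
  name: nginx-replicasets-rhd68
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet
    name: nginx-replicasets                     # the owning ReplicaSet
    uid: 7a3d5c9e-1111-2222-3333-444455556666   # illustrative UID
    controller: true                            # this owner manages the Pod
    blockOwnerDeletion: true
```

Deleting the owning ReplicaSet will, by default, cascade to the Pods that reference it in this way.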

II. Operation demonstration

In this demonstration we connect to a K8s cluster with kubectl and show how to view and modify the metadata of a Pod: its labels, annotations, and OwnerReference.

First let’s look at the current configuration of the cluster:

  1. Look at the Pods; at the moment there are none:
  • kubectl get pods
  2. Create two Pods from YAML files prepared in advance (a hypothetical sketch of pod1.yaml appears after step 9):
  • kubectl apply -f pod1.yaml
  • kubectl apply -f pod2.yaml
  3. Now look at the Pods' labels. With --show-labels we can see that both Pods carry a deployment-environment label and a tier label:
  • kubectl get pods --show-labels
  4. We can also view the full resource another way, by requesting the output as YAML with -o yaml:
  • kubectl get pods nginx1 -o yaml | less
  5. Next, let's modify an existing label on a Pod: change its deployment environment from development to test. We specify the Pod name and set env to test, and see whether it succeeds. Here we get an error, which says that this label already has a value:
  • kubectl label pods nginx1 env=test
  6. To change it we must add the --overwrite option; after that, we can see the label was applied successfully:
  • kubectl label pods nginx1 env=test --overwrite
  7. nginx1 now carries the env=test label:
  • kubectl get pods --show-labels
  8. To remove a label from a Pod, the command is the same as adding one, except that there is no value and no equals sign: give only the label name followed by a minus sign to delete that key/value:
  • kubectl label pods nginx1 tie-
  9. Checking the labels again, we can see the label was removed successfully:
  • kubectl get pods --show-labels
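For reference, the pod1.yaml applied in step 2 is not shown in the demo; it might look roughly like this (a hypothetical sketch, consistent with the labels used in the steps above):

```yaml
# Hypothetical pod1.yaml; the actual file used in the demo is not shown
apiVersion: v1
kind: Pod
metadata:
  name: nginx1
  labels:
    env: dev      # deployment environment, overwritten to test in step 6
    tie: front    # tier label, removed in step 8
spec:
  containers:
  - name: nginx
    image: nginx:1.25
```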

  10. The nginx1 Pod no longer has the tie=front label. Now that the Pods are labeled, how do we match them with a label selector? A label selector is specified with the -l option. Let's start with a single equality condition: select the Pods whose deployment environment equals test, which matches one Pod:
  • kubectl get pods --show-labels -l env=test
  11. If we additionally require env to equal dev, no Pod can match, since a Pod cannot have both values for env:
  • kubectl get pods --show-labels -l env=test,env=dev
  12. If instead we ask for env=dev and tie=front, we match the second Pod, nginx2:
  • kubectl get pods --show-labels -l env=dev,tie=front
  13. We can also filter with a set-based label selector. To match all Pods whose deployment environment is test or dev, quote the expression and list the set in parentheses; this time both Pods are selected:
  • kubectl get pods --show-labels -l 'env in (dev,test)'
  14. Next, let's add an annotation to a Pod. Annotating works the same way as labeling, except that the command is annotate instead of label; we still specify the resource type and name, but provide an annotation key/value instead of a label key/value. The value can be an arbitrary string, including spaces and commas:
  • kubectl annotate pods nginx1 my-annotate='My annotate, OK'
  15. Looking at the Pod's metadata again, we can see the annotations section now contains the my-annotate annotation:
  • kubectl get pods nginx1 -o yaml | less

Here we can also see that when kubectl apply was used, the kubectl tool itself added an annotation, whose value is likewise a JSON string.

  16. Finally, let's see the OwnerReference of a Pod. We create a ReplicaSet object, and the ReplicaSet in turn creates Pods. First apply the ReplicaSet and inspect it (a sketch of what rs.yaml might contain follows this step):
  • kubectl apply -f rs.yaml
  • kubectl get replicasets nginx-replicasets -o yaml | less
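The rs.yaml file is likewise not shown in the demo; a hypothetical sketch matching what the demo describes (two replicas, selecting production-environment Pods) could be:

```yaml
# Hypothetical rs.yaml; the actual file used in the demo is not shown
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-replicasets
spec:
  replicas: 2                  # the demo expects two Pods to be created
  selector:
    matchLabels:
      env: prod                # matches Pods in the production environment
  template:
    metadata:
      labels:
        env: prod
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
```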

  17. The ReplicaSet spec says that two Pods should be created, and its selector matches Pods whose deployment environment is production. Let's look at the Pods now in the cluster:
  • kubectl get pods

  18. Each Pod created by the ReplicaSet carries an OwnerReference whose kind is ReplicaSet and whose name is nginx-replicasets:
  • kubectl get pods nginx-replicasets-rhd68 -o yaml | less

III. Controller pattern

1. Control loop

At the heart of the controller pattern is the concept of the control loop. A control loop consists of three logical components: the controller, the controlled system, and a sensor that observes the system.

The controller compares the resource's spec and status to compute a diff. The diff determines what control operations to perform on the system, and those operations produce new observed output. Each component operates independently and continuously drives the system closer to the desired end state expressed in the spec.
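For a ReplicaSet, for example, the diff is simply the gap between the declared and observed replica counts; a minimal sketch with illustrative values:

```yaml
# The control loop compares these two fields of the same object:
spec:
  replicas: 3      # desired state declared by the user
status:
  replicas: 2      # state observed by the sensor
# diff = 3 - 2 = 1, so the controller's next action is to create one more Pod
```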

2. Sensor

The logical sensor in the control loop consists mainly of three components: the Reflector, the Informer, and the Indexer.

The Reflector obtains resource data by calling List and Watch against the K8s API server. List is used to fully resynchronize resources when the controller restarts or the Watch is interrupted, while Watch updates resources incrementally between Lists. When the Reflector receives new resource data, it inserts a Delta record, containing the resource object itself and the event type, into the Delta queue. The Delta queue ensures that there is only one record for a given object in the queue, which avoids duplicates when the Reflector re-runs List and Watch.

The Informer component continuously pops Delta records from the Delta queue and hands the resource object to the Indexer, which stores it in a cache that is indexed by the resource's namespace by default and can be shared by the Controller Manager and multiple controllers. The event is then passed on to the event's callback functions.

The controller component of the control loop consists mainly of event handlers and workers. Event handlers watch for resource add, update, and delete events and decide, according to the controller's logic, whether they need to be handled. For an event that does need handling, the namespace and name of the related resource are put into a work queue, to be processed later by a worker from the worker pool. The work queue deduplicates the stored keys, so that the same resource is never handled by multiple workers at once.

When handling a resource object, a worker generally uses the resource's name to fetch the latest resource data, which it uses to create or update resource objects or to call other external services. If the worker fails, it normally re-adds the resource's name to the work queue so that the work can be retried later.

3. Control loop example – Capacity expansion

Here is a simple example of how a control loop works.

The ReplicaSet controller watches ReplicaSet resources in order to maintain the desired number of replicas of a stateless application; the associated Pods are matched through the ReplicaSet's selector. In this example, the replicas field of ReplicaSet rsA is changed from 2 to 3.

First, the Reflector watches for changes to both ReplicaSet and Pod resources; why Pod changes are also watched will become clear shortly. When the ReplicaSet changes, a Delta record containing the rsA object and an event type of Updated is pushed into the Delta queue.

The Informer, on one hand, updates the cache with the new ReplicaSet, indexed under namespace nsA. On the other hand, it invokes the Update callback; the ReplicaSet controller sees that the ReplicaSet has changed and inserts the string nsA/rsA into the work queue. A worker behind the work queue takes the nsA/rsA key from the queue and reads the latest ReplicaSet data from the cache.

The worker compares the ReplicaSet's spec and status and finds that the ReplicaSet needs to be scaled out, so the ReplicaSet worker creates a Pod whose OwnerReference points to ReplicaSet rsA.


The Reflector then receives the Pod's Added event through its Watch and pushes a Delta record of type Added into the Delta queue. On one hand, the new Pod record is stored in the cache via the Indexer; on the other hand, the Add callback of the ReplicaSet controller is invoked. The Add callback looks up the corresponding ReplicaSet through the Pod's ownerReferences and inserts the ReplicaSet's namespace and name string into the work queue.

After the ReplicaSet worker picks up the new work item, it retrieves the latest ReplicaSet record from the cache and lists all the Pods it has created. Because the ReplicaSet's status is out of date, that is, the recorded number of created Pods is stale, the worker updates the status so that spec and status agree again.

IV. Controller pattern summary

1. Two API design methods

The Kubernetes controller pattern relies on a declarative API. The other common type is the imperative API. Why did Kubernetes build its whole controller design on a declarative API rather than an imperative one?

First, compare how the two kinds of API interact. In daily life, a familiar imperative interaction is the one between parents and children: because children lack a sense of goals and cannot grasp their parents' expectations, parents direct them with explicit commands such as "eat" or "go to sleep". In container orchestration, an imperative API likewise works by issuing explicit operations to the system.

A familiar declarative interaction, by contrast, is the way a boss communicates with employees. A boss usually does not give employees very precise instructions, and in fact may understand the work less well than the employees do; instead, the boss taps the employees' own initiative by setting quantifiable goals. For example, the boss may demand an 80% market share for a product without specifying the concrete operations needed to reach it.

Similarly, in container orchestration we can simply declare that an application should keep three replicas, without explicitly scaling out or deleting existing Pods.
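In Kubernetes this simply means declaring the desired replica count in a manifest and applying it; the controller then decides whether Pods need to be created or deleted. A minimal illustrative Deployment (the names and image are examples, not from the original text):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # illustrative name
spec:
  replicas: 3                  # the goal; no explicit create/delete operations
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: nginx:1.25
```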

2. Imperative API issues

Now that the difference between the two interaction styles is clear, let's examine the problems with imperative APIs.

  • The first, and biggest, problem with imperative APIs is error handling;

In a large-scale distributed system, errors are everywhere. If an issued command receives no response, the caller can only try to recover by retrying it, but blind retries can cause bigger problems.

Suppose the original command actually did execute in the background; after a retry, the operation is executed a second time. To avoid this, the system often has to record the commands to be executed before running them and replay them in scenarios such as a restart, and it must also handle complex logic such as the ordering of multiple commands and how they override one another.

  • In practice, many imperative systems build a background patrol (reconciliation) system to repair the data inconsistencies caused by command timeouts, retries, and similar scenarios;

However, because this patrol logic differs from the everyday operating logic, it usually gets less test coverage and less rigorous error handling, which carries considerable operational risk. For this reason, many patrol systems are only triggered manually.

  • Finally, imperative APIs are also prone to problems with concurrent access;

If multiple operations on the same resource are issued concurrently and one of them fails and is retried, it is hard to know, and impossible to guarantee, which operation ultimately takes effect. Many imperative systems therefore lock the resource before operating on it to keep the final behavior of the system predictable, but locking lowers the overall operating efficiency of the system.

  • By contrast, a declarative API system naturally records the system's current state and desired final state.

No additional operational data is required. In addition, because declaring the state is idempotent, the operation can be repeated at any time. In a declarative system, normal operation is in effect a continuous reconciliation of resource status, so no separate patrol system needs to be built; the operating logic is exercised and hardened in daily operation, which keeps the whole system stable.

Finally, because the desired final state of the resource is explicit, multiple changes to that state can be merged, and concurrent access can be supported without locking.

3. Summary of the controller pattern

Finally, we conclude:

  1. The controller pattern used by Kubernetes is driven by a declarative API, specifically by modifications to Kubernetes resource objects;
  2. Behind each Kubernetes resource sits a controller that cares about that resource, and these controllers asynchronously drive the controlled system toward the declared end state;
  3. These controllers operate autonomously, making automated, unattended operation of the system possible;
  4. Because Kubernetes resources and controllers can be customized, the controller pattern is easy to extend. For stateful applications in particular, we often define custom resources and controllers to automate operations and maintenance; this is the Operator scenario that will be introduced later.


Summary of this article

Here is a brief summary of the main content of this article:

  • The metadata of a Kubernetes resource object mainly includes the Labels used to identify resources, the Annotations used to describe resources, and the OwnerReferences used to describe the relationships between resources; this metadata plays an important role in how K8s operates;
  • The core of the controller pattern is the concept of the control loop;
  • There are two approaches to API design, declarative and imperative; the controller pattern used by Kubernetes is driven by a declarative API.
