Author | Zhao Mingshan (Liheng)  Source | Alibaba Cloud Native official account

Preface

OpenKruise is an open source cloud native application automation suite from Alibaba Cloud, and is currently a Sandbox project hosted by the Cloud Native Computing Foundation (CNCF). It grew out of Alibaba's years of experience with containerization and cloud native technology. It is a set of standard extension components based on Kubernetes that Alibaba runs at scale in its internal production environment, and it reflects technical ideas and best practices that stay close to upstream community standards while being adapted to large-scale Internet scenarios.

On March 4, 2021, OpenKruise released its latest version, v0.8.0 (see the ChangeLog), which enhances SidecarSet, in particular with more complete support for log-management sidecars.

Background

Sidecar is an important container design pattern in cloud native: it moves auxiliary capabilities out of the main container into a separate sidecar container. In a microservice architecture, the sidecar pattern is typically used to separate general capabilities such as configuration management, service discovery, routing, and circuit breaking from the main program, which greatly reduces the complexity of the architecture. With the growing popularity of Service Mesh, the sidecar pattern is becoming more widely adopted, and Alibaba Group uses it extensively to manage common components such as operations tooling, security, and message middleware.

In a Kubernetes cluster, a Pod can host both the main container and sidecar containers, and Kubernetes provides powerful workloads (for example Deployment and StatefulSet) to manage and upgrade Pods. However, as more business runs on Kubernetes, the variety and number of sidecar containers keep growing, and managing and upgrading sidecar containers in production becomes increasingly complicated:

  • A business Pod contains multiple sidecar containers for operations, security, proxying, and so on. Business developers must not only configure their own main container but also be familiar with the configuration of these sidecar containers, which increases their workload and the risk of misconfiguring a sidecar.

  • Upgrading a sidecar container requires restarting the main container along with it (Deployment, StatefulSet, and similar workloads implement Pod rolling upgrades by destroying and recreating Pods). Pushing a sidecar upgrade across hundreds of online businesses is bound to meet strong resistance from business teams.

  • Sidecar container providers have no direct and effective means of upgrading the various configurations and versions of sidecar containers running online, which poses a significant hidden risk for sidecar container management.

Alibaba Group runs millions of containers supporting thousands of businesses, so managing and upgrading sidecar containers became a pressing problem. We therefore summarized the common requirements of many internal sidecar containers, built them into OpenKruise, and abstracted SidecarSet as a powerful tool for managing and upgrading a wide variety of sidecar containers.

OpenKruise SidecarSet

SidecarSet is OpenKruise's abstraction for sidecars. It is responsible for injecting and upgrading sidecar containers in a Kubernetes cluster and is one of OpenKruise's core workloads. It provides a rich set of features that make sidecar containers easy to manage. The main features are as follows:

  • Separate Configuration management: Configure a separate SidecarSet for each Sidecar container for easy management.

  • Automatic injection: Automatic injection of sidecar containers in pod creation, capacity expansion, and reconstruction scenarios.

  • In-place upgrade: The sidecar container can be upgraded in place without recreating Pods. The upgrade does not affect the main business container, and a variety of grayscale release strategies are available.

Note: In a Pod with multiple containers, the container that provides the main business logic is called the main container, and the other containers that provide auxiliary capabilities such as log collection, security, and proxying are called sidecar containers. For example, if a Pod serves web traffic, the Nginx container that provides the web server is the main container, and the Logtail container that collects and reports the Nginx logs is the sidecar container. The SidecarSet resource abstraction described in this article addresses some of the problems of sidecar containers.

1. Sidecar logging architectures

Application logs give you insight into the health of your application and are useful for debugging problems and monitoring cluster activity. With containerization, the simplest and most widely used logging method is to write to standard output and standard error.

However, for distributed systems and large-scale clusters, this approach does not meet production standards. First, in a distributed system the logs are scattered across individual containers, with no single place to aggregate them. Second, logs are lost when containers crash, Pods are evicted, and so on. A more reliable logging solution that is independent of the container life cycle is therefore needed.

The sidecar logging architecture places the logging agent in a separate sidecar container. The agent collects the main container's logs through a shared log directory and ships them to the back-end storage of the log platform.

Alibaba and Ant Group also collect container logs using this architecture. The following sections describe how OpenKruise SidecarSet helps roll out the sidecar logging architecture at scale in Kubernetes clusters.

2. Automatic injection

OpenKruise SidecarSet implements automatic injection of sidecar containers based on the Kubernetes admission webhook mechanism. Once a sidecar is configured in a SidecarSet, the defined sidecar container is injected into newly created Pods regardless of whether the user deploys with CloneSet, Deployment, StatefulSet, or another workload.

Sidecar container owners only need to configure their own SidecarSet to have the sidecar injected without the business being aware of it. This greatly lowers the barrier to using sidecar containers and makes them easier for their owners to manage. To cover the various injection scenarios, SidecarSet provides the following fields in addition to containers:

# sidecarset.yaml
apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: test-sidecarset
spec:
  # Select Pods by label selector
  selector:
    matchLabels:
      app: web-server
  # Restrict the SidecarSet to a single namespace
  namespace: ns-1
  # container definition
  containers:
  - name: logtail
    image: logtail:1.0.0
    # Share specified volume
    volumeMounts:
    - name: web-log
      mountPath: /var/log/web
    # Share all volumes
    shareVolumePolicy: disabled
    # Share environment variables
    transferEnv:
    - sourceContainerName: web-server
      # TZ is the time zone, e.g. web-server has the environment variable TZ=Asia/Shanghai
      envName: TZ
  volumes:
  - name: web-log
    emptyDir: {}

  • Pod selector

    • selector chooses the Pods to inject into. In the example, Pods with labels[app] = web-server are selected and logtail is injected into them. You can also add a label such as labels[inject/logtail] = true to all Pods to implement global sidecar injection.
    • namespace: a SidecarSet takes effect cluster-wide by default. Set this field if the SidecarSet should take effect in only one namespace.
  • Data volume sharing

    • Share specified volumes: use volumeMounts and volumes to share specific volumes with the main container. In the example, the web-log volume is shared so that the sidecar can collect logs.
    • Share all volumes: shareVolumePolicy = enabled | disabled controls whether all volumes mounted by the Pod's main containers are shared with the sidecar, which is common for log-collection sidecars. If set to enabled, every mount point of the application containers is injected into the sidecar at the same path (except for data volumes and mount points that the sidecar declares itself).
  • Environment variable sharing: transferEnv obtains environment variables from other containers. The environment variable named envName in the container sourceContainerName is copied into the sidecar container. In the example, the sidecar shares the main container's time zone TZ, which is especially common in overseas environments.
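As a rough illustration of the sharing rules above, the following Python sketch models how a sidecar's volume mounts and environment variables could be derived from the application containers at injection time. It is a toy model under stated assumptions, not OpenKruise's webhook code: plain dictionaries stand in for Kubernetes container specs, and the function name `inject_sidecar` is hypothetical.

```python
import copy


def inject_sidecar(sidecar, app_containers, share_volume_policy="disabled"):
    """Return a copy of `sidecar` with shared mounts and transferred env applied.

    Hypothetical helper: dicts only mimic the shape of container specs.
    """
    result = copy.deepcopy(sidecar)
    transfer_rules = result.pop("transferEnv", [])
    mounts = result.setdefault("volumeMounts", [])
    env = result.setdefault("env", [])

    if share_volume_policy == "enabled":
        # Volumes and mount points the sidecar declares itself are excluded.
        seen_names = {m["name"] for m in mounts}
        seen_paths = {m["mountPath"] for m in mounts}
        for container in app_containers:
            for mount in container.get("volumeMounts", []):
                if mount["name"] in seen_names or mount["mountPath"] in seen_paths:
                    continue
                mounts.append(dict(mount))
                seen_names.add(mount["name"])
                seen_paths.add(mount["mountPath"])

    # transferEnv: copy the variable envName from sourceContainerName.
    by_name = {c["name"]: c for c in app_containers}
    for rule in transfer_rules:
        source = by_name.get(rule["sourceContainerName"], {})
        for var in source.get("env", []):
            if var["name"] == rule["envName"]:
                env.append(dict(var))
    return result


web_server = {
    "name": "web-server",
    "env": [{"name": "TZ", "value": "Asia/Shanghai"}],
    "volumeMounts": [
        {"name": "web-log", "mountPath": "/var/log/web"},
        {"name": "data", "mountPath": "/data"},
    ],
}
logtail = {
    "name": "logtail",
    "image": "logtail:1.0.0",
    "volumeMounts": [{"name": "web-log", "mountPath": "/var/log/web"}],
    "transferEnv": [{"sourceContainerName": "web-server", "envName": "TZ"}],
}

injected = inject_sidecar(logtail, [web_server], share_volume_policy="enabled")
print([m["mountPath"] for m in injected["volumeMounts"]])
print(injected["env"])
```

With the policy left at disabled, only the mounts declared in the sidecar itself remain, which matches the "share specified volumes" case.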

Note: Kubernetes does not allow changing the number of containers in a Pod that has already been created, so injection can only happen at Pod creation time. Pods that already exist must be recreated to receive the sidecar.

3. In-place upgrade

SidecarSet not only injects sidecar containers but also reuses OpenKruise's in-place upgrade capability, so a sidecar container can be upgraded independently without restarting the Pod or the main container. Because this upgrade is essentially imperceptible to the business, upgrading sidecar containers is no longer a hard problem, which frees up sidecar owners considerably and speeds up sidecar version iteration.

Note: For containers in a Pod that has already been created, Kubernetes only allows the container.image field to be modified. Therefore, if a change to a sidecar container touches fields other than container.image, the Pod must be recreated; it cannot be upgraded in place.
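The rule in the note can be expressed as a small check. The sketch below is illustrative only: it compares two container specs given as plain dictionaries, and the function name `only_image_changed` is hypothetical rather than part of OpenKruise.

```python
def only_image_changed(old_container, new_container):
    """True if the two container specs differ in nothing except `image`."""

    def without_image(container):
        return {k: v for k, v in container.items() if k != "image"}

    same_rest = without_image(old_container) == without_image(new_container)
    image_differs = old_container.get("image") != new_container.get("image")
    return same_rest and image_differs


old = {"name": "logtail", "image": "logtail:1.0.0", "command": ["sh"]}

# Image bump only: eligible for an in-place upgrade.
print(only_image_changed(old, {**old, "image": "logtail:1.0.1"}))

# Command changed as well: the Pod has to be recreated.
print(only_image_changed(old, {**old, "image": "logtail:1.0.1", "command": ["bash"]}))
```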

To accommodate more complex sidecar upgrade scenarios, SidecarSet provides a rich set of grayscale release strategies in addition to in-place upgrade.

4. Grayscale release

Grayscale release is one of the most common techniques in day-to-day releases. It rolls out sidecar containers smoothly and is strongly recommended, especially in large clusters. The following example pauses after the first batch and then continues with a rolling release based on maximum unavailability, assuming 1000 Pods are to be released:

apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: sidecarset
spec:
  # ...
  updateStrategy:
    type: RollingUpdate
    partition: 980
    maxUnavailable: 10%

With the configuration above, the release pauses after the first (1000 - 980) = 20 Pods have been updated. Once those sidecar containers have been verified, update the SidecarSet configuration to continue:

apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: sidecarset
spec:
  # ...
  updateStrategy:
    type: RollingUpdate
    maxUnavailable: 10%

After this adjustment, the remaining 980 Pods are released in a rolling fashion bounded by the maximum number of unavailable Pods (10% * 1000 = 100) until all Pods are updated.

Partition is the number or percentage of Pods kept at the previous version. The default value is 0. Partition does not imply any ordering. If partition is set during a release:

  • If it is a number, the controller updates (replicas - partition) Pods to the latest version.

  • If it is a percentage, the controller updates (replicas * (100% - partition)) Pods to the latest version.
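The two partition rules above reduce to a short calculation. The following Python sketch is a worked example, not the controller's code; the function name `pods_to_update` is hypothetical, and rounding the percentage case down is an assumption, since the article does not say how fractional results are handled.

```python
def pods_to_update(replicas, partition=0):
    """Number of Pods moved to the latest version for a given partition.

    `partition` is either an int (Pods kept on the old version) or a
    percentage string such as "98%".
    """
    if isinstance(partition, str):
        if not partition.endswith("%"):
            raise ValueError("string partition must look like '98%'")
        percent = int(partition[:-1])
        # replicas * (100% - partition); rounding down is an assumption.
        return max(replicas * (100 - percent) // 100, 0)
    return max(replicas - partition, 0)


print(pods_to_update(1000, 980))    # 20, the first batch in the example
print(pods_to_update(1000, "98%"))  # 20, same split expressed as a percentage
print(pods_to_update(1000))         # 1000, default partition of 0
```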

maxUnavailable is the maximum number of Pods that may be unavailable at the same time during a release. The default value is 1. It can be set as an absolute number or as a percentage (the controller converts a percentage into an absolute number based on the number of selected Pods).

Note: The maxUnavailable and partition values are independent of each other. For example:

  • When {matched pod}=100, partition=50, maxUnavailable=10, the controller updates 50 Pods to the new version, but with a release window of 10: at most 10 Pods are being updated at the same time. Each time one Pod finishes updating, another is picked, until all 50 are done.

  • When {matched pod}=100, partition=80, maxUnavailable=30, the controller updates 20 Pods to the new version. Because 20 is within the maxUnavailable limit of 30, these 20 Pods can all be updated at the same time.
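The interaction between the two settings in the examples above can be checked with a small simulation. This is a simplified model under stated assumptions, not the SidecarSet controller: each loop iteration pretends that exactly one in-flight Pod becomes ready, and the function name `simulate_release` is hypothetical.

```python
def simulate_release(matched, partition, max_unavailable):
    """Return (pods_updated, peak_pods_updating_at_once) for one release."""
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    target = max(matched - partition, 0)
    started = updated = in_flight = peak = 0
    while updated < target:
        # Fill the release window while there are Pods left to start.
        while in_flight < max_unavailable and started < target:
            started += 1
            in_flight += 1
        peak = max(peak, in_flight)
        # One in-flight Pod finishes, which frees a slot in the window.
        in_flight -= 1
        updated += 1
    return updated, peak


print(simulate_release(100, 50, 10))  # (50, 10)
print(simulate_release(100, 80, 30))  # (20, 20)
```

In the first case the window of 10 is the limiting factor; in the second, partition leaves only 20 Pods to update, so the window of 30 is never filled.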

5. Canary Release

Businesses that need canary releases can use updateStrategy.selector. First add a fixed label, labels[canary.release] = true, to the Pods that should receive the canary release, then select those Pods through updateStrategy.selector.matchLabels.

apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: sidecarset
spec:
  # ...
  updateStrategy:
    type: RollingUpdate
    selector:
      matchLabels:
        canary.release: "true"
    maxUnavailable: 10%

The above configuration updates only the Pods carrying the canary label. After the canary has been validated, remove the updateStrategy.selector configuration to continue the rolling release based on maxUnavailable.
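The effect of the canary selector can be pictured as a label filter over the matched Pods. The sketch below is illustrative only: Pods are plain dictionaries, and `matches` and `select_for_release` are hypothetical names. It models matchLabels as "every listed key must have exactly the given value".

```python
def matches(match_labels, pod_labels):
    """matchLabels semantics: every listed key must carry the given value."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


def select_for_release(pods, match_labels=None):
    """Pods eligible for the update; no selector means all matched Pods."""
    if not match_labels:
        return list(pods)
    return [pod for pod in pods if matches(match_labels, pod["labels"])]


pods = [
    {"name": "web-0", "labels": {"app": "nginx", "canary.release": "true"}},
    {"name": "web-1", "labels": {"app": "nginx"}},
    {"name": "web-2", "labels": {"app": "nginx"}},
]

canary = select_for_release(pods, {"canary.release": "true"})
print([p["name"] for p in canary])  # ['web-0']

# After removing the selector, every matched Pod is eligible again.
print([p["name"] for p in select_for_release(pods)])  # ['web-0', 'web-1', 'web-2']
```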

6. Scatter release

By default, SidecarSet orders the Pods to upgrade as follows:

  • For a given set of Pods, the upgrade order is guaranteed to be the same across multiple upgrades.

  • The selection priority is (smaller means higher priority): unscheduled < scheduled, pending < unknown < running, not-ready < ready, newer Pods < older Pods.
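One way to read the default ordering above is as a composite sort key. The following Python sketch assumes that "smaller" in each comparison means "upgraded earlier"; the Pod fields (`scheduled`, `phase`, `ready`, `created`) and the name tie-breaker are hypothetical simplifications, not the controller's data model.

```python
PHASE_RANK = {"Pending": 0, "Unknown": 1, "Running": 2}


def upgrade_sort_key(pod):
    """Sort key for the default order; smaller keys are upgraded first."""
    return (
        pod["scheduled"],          # unscheduled (False) before scheduled (True)
        PHASE_RANK[pod["phase"]],  # pending < unknown < running
        pod["ready"],              # not-ready before ready
        -pod["created"],           # newer Pods (larger timestamp) before older
        pod["name"],               # tie-breaker for a repeatable order (assumption)
    )


pods = [
    {"name": "a", "scheduled": True, "phase": "Running", "ready": True, "created": 100},
    {"name": "b", "scheduled": True, "phase": "Running", "ready": False, "created": 100},
    {"name": "c", "scheduled": False, "phase": "Pending", "ready": False, "created": 300},
    {"name": "d", "scheduled": True, "phase": "Running", "ready": True, "created": 200},
    {"name": "e", "scheduled": True, "phase": "Pending", "ready": False, "created": 50},
]

print([p["name"] for p in sorted(pods, key=upgrade_sort_key)])
# ['c', 'e', 'b', 'd', 'a']
```

Because the key is derived only from Pod state, sorting the same set of Pods twice yields the same order, which is the consistency property the first bullet describes.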

In addition to the default release order described above, the scatter strategy lets users spread Pods that carry certain labels across the whole release process. For example, a cluster-wide sidecar container such as logtail may be injected into the Pods of dozens of services in a cluster. Scattering logtail updates by application name achieves a grayscale release across different applications. This approach can be combined with maxUnavailable.

apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: sidecarset
spec:
  # ...
  updateStrategy:
    type: RollingUpdate
    # Configure Pod labels, assuming all Pods carry labels[app_name]
    scatterStrategy:
    - key: app_name
      value: nginx
    - key: app_name
      value: web-server
    - key: app_name
      value: api-gateway
    maxUnavailable: 10%

Note: In the current version, all application names must be listed. The next version will support scattering by label key only.
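To make the idea of scattering concrete, here is one simple way to interleave Pods by a label value so that each application is touched early and evenly. This round-robin is only an illustration of the intent; it is not OpenKruise's scatter algorithm, and the function name `scatter_by_label` is hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def scatter_by_label(pods, key):
    """Interleave Pods round-robin across the values of labels[key]."""
    groups = defaultdict(list)
    for pod in pods:
        groups[pod["labels"].get(key)].append(pod)
    ordered = []
    # Take one Pod from each group in turn until every group is exhausted.
    for row in zip_longest(*groups.values()):
        ordered.extend(pod for pod in row if pod is not None)
    return ordered


pods = [
    {"name": "nginx-0", "labels": {"app_name": "nginx"}},
    {"name": "nginx-1", "labels": {"app_name": "nginx"}},
    {"name": "nginx-2", "labels": {"app_name": "nginx"}},
    {"name": "web-0", "labels": {"app_name": "web-server"}},
    {"name": "gw-0", "labels": {"app_name": "api-gateway"}},
    {"name": "gw-1", "labels": {"app_name": "api-gateway"}},
]

print([p["name"] for p in scatter_by_label(pods, "app_name")])
# ['nginx-0', 'web-0', 'gw-0', 'nginx-1', 'gw-1', 'nginx-2']
```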

7. Practice

Alibaba and Ant Group already use SidecarSet to manage sidecar containers at scale. The following walkthrough uses the log-collection sidecar Logtail as an example.

  1. Create a SidecarSet resource from the sidecarset.yaml configuration file.
# sidecarset.yaml
apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: logtail-sidecarset
spec:
  selector:
    matchLabels:
      app: nginx
  updateStrategy:
    type: RollingUpdate
    maxUnavailable: 10%
  containers:
  - name: logtail
    image: log-service/logtail:0.16.16
    # When receiving SIGTERM, logtail delays 10 seconds and then stops
    command:
    - sh
    - -c
    - /usr/local/ilogtail/run_logtail.sh 10
    livenessProbe:
      exec:
        command:
        - /etc/init.d/ilogtaild
        - status
    resources:
      limits:
        memory: 512Mi
      requests:
        cpu: 10m
        memory: 30Mi
    ##### share this volume
    volumeMounts:
    - name: nginx-log
      mountPath: /var/log/nginx
    transferEnv:
    - sourceContainerName: nginx
      envName: TZ
  volumes:
  - name: nginx-log
    emptyDir: {}

  2. Define a Pod in pod.yaml.
apiVersion: v1
kind: Pod
metadata:
  labels:
    # matches the SidecarSet's selector
    app: nginx
  name: test-pod
spec:
  containers:
  - name: nginx
    image: log-service/docker-log-test:latest
    command: ["/bin/mock_log"]
    args: ["--log-type=nginx", "--stdout=false", "--stderr=true", "--path=/var/log/nginx/access.log", "--total-count=1000000000", "--logs-per-sec=100"]
    volumeMounts:
    - name: nginx-log
      mountPath: /var/log/nginx
    env:
    - name: TZ
      value: Asia/Shanghai
  volumes:
  - name: nginx-log
    emptyDir: {}

  3. Create this Pod and you will find that the logtail container has been injected into it:
$ kubectl get pod
NAME       READY   STATUS    RESTARTS   AGE
test-pod   2/2     Running   0          118s

$ kubectl get pods test-pod -o yaml | grep 'logtail:0.16.16'
    image: log-service/logtail:0.16.16

  4. At this point, the SidecarSet status is updated to:
$ kubectl get sidecarset logtail-sidecarset -o yaml | grep -A4 status
status:
  matchedPods: 1
  observedGeneration: 1
  readyPods: 1
  updatedPods: 1

  5. Update the image of the logtail sidecar container in the SidecarSet to 0.16.18.
$ kubectl edit sidecarsets logtail-sidecarset

# sidecarset.yaml
apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: logtail-sidecarset
spec:
  containers:
  - name: logtail
    image: log-service/logtail:0.16.18

  6. At this point, the logtail container in the Pod has been updated to logtail:0.16.18, and the Pod and its other containers have not been restarted.
$ kubectl get pods | grep test-pod
test-pod   2/2     Running   1          7m34s

$ kubectl get pods test-pod -o yaml | grep 'logtail:0.16.18'
    image: log-service/logtail:0.16.18

$ kubectl describe pods test-pod
Events:
  Type    Reason   Age                 From     Message
  ----    ------   ----                ----     -------
  Normal  Killing  5m47s               kubelet  Container logtail definition changed, will be restarted
  Normal  Pulling  5m17s               kubelet  Pulling image "log-service/logtail:0.16.18"
  Normal  Created  5m5s (x2 over 12m)  kubelet  Created container logtail
  Normal  Started  5m5s (x2 over 12m)  kubelet  Started container logtail
  Normal  Pulled   5m5s                kubelet  Successfully pulled image "log-service/logtail:0.16.18"

Conclusion

In OpenKruise v0.8.0, the SidecarSet improvements focus on log-management sidecar scenarios. Going forward, we will keep improving the stability and performance of SidecarSet while covering more scenarios; for example, the next version will add support for Service Mesh scenarios. We also welcome more developers to join the OpenKruise community and help build richer and more complete Kubernetes application management and delivery extensions for larger-scale, more complex, and more performance-critical scenarios.

If you are interested in the OpenKruise project and have topics you would like to discuss, please visit the OpenKruise website or GitHub, or search for group number 23330762 in DingTalk to join the discussion group!