Introduction

Blueking Container Service (BCS) is the container cloud platform of Tencent's Interactive Entertainment Group (IEG), built on Tencent Kubernetes Engine (TKE). It supports the containerization and microservice transformation of IEG's self-developed game businesses as they move to the cloud. Unlike typical Internet services, Tencent's game services are large in scale, latency-sensitive, and network-sensitive, demand extremely high reliability, and rely heavily on techniques such as shared-memory communication, all of which make cloud-native adoption very challenging. While serving these game businesses on the container cloud, BCS combined business requirements with community solutions to develop two enhanced Kubernetes workload operators, GameStatefulSet and GameDeployment, which fit these business scenarios more closely and meet their complex and diverse requirements.

The complexity of game business characteristics

There are many types of games, from room-based games to MMOs. Whatever the type, they share characteristics such as very large numbers of online players, extreme sensitivity to network latency and jitter, and multi-zone, multi-server deployments. To meet these needs, game backend services naturally pursue real-time, high-speed communication and maximum performance. They make extensive use of techniques such as inter-process shared-memory communication, preloading data into memory, and direct cross-host TCP communication, while minimizing remote data access and RPC, which runs somewhat counter to the principles of microservices. Taken together with the requirements of containerization and cloud adoption, game services generally have the following characteristics:

  • Heavy use of shared memory, which makes the services stateful.
  • Large scale, split into many zones and servers, so releases must be grayscale and batched. To reduce operational complexity, releases should ideally be controlled intelligently, including their scale, speed, and steps.
  • Data relocation is required when an instance is scaled or updated, so the instance cannot be taken out of service immediately.
  • Route changes must be completed before an instance is scaled down. For example, before an instance in a microservice name-based communication mesh is removed, the mesh controller must interact with it to confirm that the routing change has completed, and only then decide whether to delete the instance.
  • For matchmaking games, all matches on an instance must end before the instance is scaled down or updated and exits service.
  • To ensure smooth upgrades, some game backend services use reload technology: during a reload, the new version of the process takes over from the old version, in-memory data is not lost, and players notice nothing during the upgrade.

All of these characteristics pose a huge challenge for Kubernetes and cloud native. Kubernetes natively fits microservice architectures and treats instances as cattle rather than pets. Even StatefulSet (originally named PetSet), introduced to support stateful services, only assigns each instance a stable network identity and storage number: if an instance fails, a new instance with the same number is started to replace it, without handling complex processes such as shared-memory loss, data relocation, or route changes. This is why PetSet was later renamed StatefulSet. To bring complex businesses such as games to the cloud, we need to go a step further and develop workloads better suited to these scenarios, lowering the barrier and cost of onboarding.

New BCS Workloads: GameDeployment & GameStatefulSet

While supporting many kinds of container cloud adoption across Tencent IEG, including but not limited to game businesses, BCS discussed business scenarios with game teams and platforms and abstracted their common needs. We also studied excellent open-source projects from the cloud-native community, such as OpenKruise, Argo Rollouts, and Flagger. Building on native Kubernetes and these projects, we developed two operators, bcs-gamedeployment-operator and bcs-gamestatefulset-operator. The enhanced workloads they manage, GameDeployment and GameStatefulSet, provide a series of features and performance improvements over the native Deployment and StatefulSet to meet the cloud-native requirements of complex services. Although GameDeployment and GameStatefulSet grew out of game scenarios, the features we abstracted meet the needs of most types of business, especially complex ones: they offer stronger controllability, stay closer to how business teams develop, operate, and release their services, and can significantly improve cloud-native adoption capabilities.

GameDeployment

The native Kubernetes Deployment is a workload for stateless services, implemented on top of ReplicaSet: a Deployment performs rolling updates and rollbacks by controlling the replica counts of the underlying ReplicaSets for each version. Although these services are stateless, many applications still need features such as in-place Pod upgrades and hot updates of Pod container images (each described separately below). However, because the native Deployment iterates over multiple ReplicaSet versions, its implementation is complex, and it is difficult to add in-place upgrades and similar features to it. Based on the code of the native Deployment and StatefulSet and on other open-source projects, we developed an enhanced Deployment, GameDeployment, to meet the more advanced requirements of complex stateless applications. Compared with Deployment, GameDeployment has the following core features:

  • Supports RollingUpdate
  • Supports in-place Pod upgrades (InplaceUpdate)
  • Supports hot updates of Pod container images (HotPatchUpdate)
  • Supports partitioned grayscale releases
  • Supports intelligent step-by-step grayscale releases, with hook checks in release steps
  • Supports hook verification before Pods are deleted or updated, so that Pods exit gracefully
  • Supports image prefetching before in-place restarts to speed them up
apiVersion: tkex.tencent.com/v1alpha1
kind: GameDeployment
metadata:
  name: test-gamedeployment
  labels:
    app: test-gamedeployment
spec:
  replicas: 5
  selector:
    matchLabels:
      app: test-gamedeployment
  template:
    metadata:
      labels:
        app: test-gamedeployment
    spec:
      containers:
      - name: python
        image: python:3.5
        imagePullPolicy: IfNotPresent
        command: ["python"]
        args: ["-m", "http.server", "8000" ]
        ports:
        - name: http
          containerPort: 8000
  preDeleteUpdateStrategy:
    hook:
      templateName: test
  updateStrategy:
    type: InplaceUpdate
    partition: 1
    maxUnavailable: 2
    canary:
      steps:
        - partition: 3
        - pause: {}
        - partition: 1
        - pause: {duration: 60}
        - hook:
            templateName: test
        - pause: {}
    inPlaceUpdateStrategy:
      gracePeriodSeconds: 30

The above is an example GameDeployment YAML configuration. It differs little from a Deployment configuration and largely inherits the meaning of the Deployment parameters. The differences and additions are described one by one below:

  • updateStrategy/type: the update type, which supports RollingUpdate (rolling update), InplaceUpdate (in-place update), and HotPatchUpdate (container image hot update). RollingUpdate has the same definition as in Deployment; InplaceUpdate and HotPatchUpdate are described separately below.
  • updateStrategy/partition: a new parameter compared with Deployment, used for grayscale releases, with the same meaning as the StatefulSet partition.
  • updateStrategy/maxUnavailable: the maximum number of instances that may be unavailable during each batch of an update. For example, with 8 instances and maxUnavailable set to 2, each batch rolls or restarts 2 instances and waits for them to finish updating before starting the next batch. It can be an integer or a percentage; the default is 25%.
  • updateStrategy/maxSurge: if each batch of a rolling update deletes old-version Pods before creating the same number of new-version Pods, only replicas - maxUnavailable instances can serve traffic throughout the update, which hurts service capacity when the total number of instances is small. When maxSurge is set, maxSurge extra Pods are created before the rolling update starts, the instances are then updated batch by batch, and the extra maxSurge Pods are deleted after all instances have been updated, which keeps more instances serving during the update. The default value of maxSurge is 0. Because InplaceUpdate and HotPatchUpdate do not rebuild Pods, we recommend not setting maxSurge with these two update types.
  • updateStrategy/inPlaceUpdateStrategy: sets gracePeriodSeconds, the grace period for in-place upgrades; see the InplaceUpdate section below.
  • updateStrategy/canary: defines the steps of a batched grayscale release; see the step-by-step grayscale release section below.
  • preDeleteUpdateStrategy: the hook policy run before a Pod is deleted or updated, so that the Pod exits gracefully; see the PreDeleteHook section below.
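
To make the interplay of partition, maxUnavailable, and maxSurge concrete, here is a minimal Python sketch of the batch arithmetic. It is an illustrative model, not bcs-gamedeployment-operator's code. It assumes that partition counts the Pods kept at the old version (as in StatefulSet), that percentages round down for maxUnavailable and up for maxSurge (the native Deployment convention), and that at least one Pod is updated per batch; the function names are hypothetical.

```python
from typing import Dict, List, Union

IntOrPercent = Union[int, str]


def resolve(value: IntOrPercent, total: int, round_up: bool) -> int:
    """Resolve an int-or-percentage field such as 2 or "25%" against total."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * total  # still multiplied by 100
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def plan_batches(replicas: int, partition: int = 0,
                 max_unavailable: IntOrPercent = "25%",
                 max_surge: IntOrPercent = 0) -> Dict[str, object]:
    """Split an update into batches (simplified, hypothetical model)."""
    # Assumption: partition = number of Pods kept at the old version.
    to_update = max(replicas - partition, 0)
    # Deployment convention: maxUnavailable rounds down, maxSurge rounds up.
    # Assumption: at least one Pod per batch so the update always progresses.
    batch = max(resolve(max_unavailable, replicas, round_up=False), 1)
    surge = resolve(max_surge, replicas, round_up=True)
    batches: List[int] = [min(batch, to_update - start)
                          for start in range(0, to_update, batch)]
    return {
        "to_update": to_update,
        "batches": batches,
        # Pods still able to serve while the largest batch is being updated.
        "min_serving": replicas + surge - max(batches, default=0),
    }
```

For the example YAML above (replicas: 5, partition: 1, maxUnavailable: 2), this model updates 4 Pods in two batches of 2, with at least 3 Pods serving throughout; adding maxSurge raises that floor by the number of surge Pods.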

GameStatefulSet

The native Kubernetes StatefulSet is a workload for stateful applications: each instance has its own stable network identity and storage number, and instances are updated and scaled in order. To meet the needs of the more complex stateful applications described above, we developed an enhanced version of the native StatefulSet, GameStatefulSet. Compared with StatefulSet, GameStatefulSet adds the following features:

  • Supports in-place Pod upgrades
  • Supports hot updates of Pod container images
  • Supports parallel updates to speed up releases (including rolling updates, in-place updates, and image hot updates)
  • Supports intelligent step-by-step grayscale releases, with hook checks in release steps
  • Supports hook verification before Pods are deleted or updated, so that Pods exit gracefully
  • Supports image prefetching before in-place restarts to speed them up
apiVersion: tkex.tencent.com/v1alpha1
kind: GameStatefulSet
metadata:
  name: test-gamestatefulset
spec:
  serviceName: "test"
  podManagementPolicy: Parallel
  replicas: 5
  selector:
    matchLabels:
      app: test
  preDeleteUpdateStrategy:
    hook:
      templateName: test
  updateStrategy:
    type: InplaceUpdate
    rollingUpdate:
      partition: 1
    inPlaceUpdateStrategy:
      gracePeriodSeconds: 30
    canary:
      steps:
      - partition: 3
      - pause: {}
      - partition: 1
      - pause: {duration: 60}
      - hook:
          templateName: test
      - pause: {}
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: python
        image: python:latest
        imagePullPolicy: IfNotPresent
        command: ["python"]
        args: ["-m", "http.server", "8000" ]
        ports:
        - name: http
          containerPort: 8000

The above is an example GameStatefulSet YAML configuration. Its parameters are described below:

  • podManagementPolicy: supports "OrderedReady" and "Parallel", with the same definitions as in StatefulSet; the default is OrderedReady. Unlike StatefulSet, when it is set to Parallel, not only is scaling parallel, but updates are also performed in parallel automatically.
  • updateStrategy/type: supports RollingUpdate, OnDelete, InplaceUpdate, and HotPatchUpdate. InplaceUpdate and HotPatchUpdate are new compared with the native StatefulSet.
  • updateStrategy/rollingUpdate/partition: controls the scope of a grayscale release, with the same meaning as in StatefulSet. For compatibility, the grayscale scope of InplaceUpdate and HotPatchUpdate is also configured with this parameter.
  • updateStrategy/inPlaceUpdateStrategy: sets gracePeriodSeconds, the grace period for in-place upgrades; see the InplaceUpdate section below.
  • updateStrategy/canary: defines the steps of a batched grayscale release; see the step-by-step grayscale release section below.
  • preDeleteUpdateStrategy: the hook policy run before a Pod is deleted or updated, so that the Pod exits gracefully; see the PreDeleteHook section below.
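
Which Pods rollingUpdate/partition and podManagementPolicy select for an update, and in what order, can be sketched in Python as follows. The sketch follows the native StatefulSet rule that only Pods whose ordinal is greater than or equal to partition are updated, highest ordinal first; the single-wave Parallel behavior is a simplification for illustration, not bcs-gamestatefulset-operator's actual scheduling, and the function names are hypothetical.

```python
from typing import List


def ordinals_to_update(replicas: int, partition: int) -> List[int]:
    """StatefulSet rule: only Pods whose ordinal is >= partition are updated."""
    return list(range(max(partition, 0), replicas))


def update_waves(replicas: int, partition: int, policy: str) -> List[List[int]]:
    """Group the Pods to update into waves, highest ordinal first."""
    targets = sorted(ordinals_to_update(replicas, partition), reverse=True)
    if policy == "OrderedReady":
        # One Pod at a time; the next waits until the previous one is Ready again.
        return [[ordinal] for ordinal in targets]
    if policy == "Parallel":
        # Simplification: every eligible Pod in one wave; a real controller
        # may still bound how many Pods are updated concurrently.
        return [targets] if targets else []
    raise ValueError(f"unknown podManagementPolicy: {policy}")
```

With the example above (replicas: 5, partition: 1), Pods 4, 3, 2, and 1 are updated while Pod 0 stays at the old version; OrderedReady visits them one by one, Parallel all at once.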

Features and scenario coverage

InplaceUpdate: in-place upgrade

Both GameDeployment and GameStatefulSet support the InplaceUpdate strategy. An in-place upgrade restarts only one or more containers in a Pod when updating the Pod's version, keeping the Pod's lifecycle unchanged, so that state such as the Pod's shared-memory IPC survives the upgrade. In-place upgrades offer the following benefits:

  • When a Pod has multiple containers that communicate through shared memory, the upgrade is expected to keep the Pod lifecycle intact and update only some of the containers, so IPC shared memory is not lost and the Pod continues to serve after the update completes.
  • The rolling update strategy deletes old-version instances one at a time or in batches and creates new-version instances, which is inefficient. An in-place upgrade does not rebuild Pod instances, which greatly speeds up releases.

The native Kubernetes Deployment and StatefulSet workloads do not support in-place upgrades directly, but kubelet supports this capability implicitly: for a running Pod, we only need to patch the image version in the Pod spec. When kubelet detects the change, it automatically kills the container running the old image and starts a container with the new image, which achieves an in-place upgrade of the Pod. By combining a ReadinessGate with inPlaceUpdateStrategy/gracePeriodSeconds, we make traffic switch over smoothly during an in-place upgrade.
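
The image patch described above can be expressed as a JSON Patch (RFC 6902). This Python sketch builds only the patch operations and is not the operator's code; the function name is hypothetical. It relies on the real Kubernetes behavior that `spec.containers[*].image` may be changed on a running Pod.

```python
from typing import Dict, List


def inplace_image_patch(pod: Dict, new_images: Dict[str, str]) -> List[Dict]:
    """Build JSON Patch ops that change only container images in a Pod spec.

    spec.containers[*].image is one of the few Pod spec fields Kubernetes lets
    you update on a running Pod; kubelet then restarts just that container.
    """
    ops = []
    for index, container in enumerate(pod["spec"]["containers"]):
        new_image = new_images.get(container["name"])
        if new_image is not None and new_image != container["image"]:
            ops.append({"op": "replace",
                        "path": f"/spec/containers/{index}/image",
                        "value": new_image})
    return ops
```

The resulting list can be applied as a JSON-type patch, for example with `kubectl patch pod <pod-name> --type=json -p '<ops as JSON>'`; containers whose image does not change are left untouched and are not restarted.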

With the in-place upgrade strategy, you can configure spec/updateStrategy/inPlaceUpdateStrategy/gracePeriodSeconds. Suppose it is set to 30 seconds: before upgrading a Pod in place, GameStatefulSet/GameDeployment first marks the Pod as unready through the ReadinessGate, and only restarts the Pod's containers 30 seconds later. During those 30 seconds, because the Pod is unready, Kubernetes removes it from the Service endpoints. After the in-place upgrade succeeds, GameStatefulSet/GameDeployment marks the Pod as ready again, and Kubernetes adds it back to the Service endpoints. With this logic, service traffic is not disrupted during the entire in-place upgrade. The default value of gracePeriodSeconds is 0, in which case GameStatefulSet/GameDeployment upgrades the Pod's containers in place immediately, which may cause loss of service traffic. InplaceUpdate also supports the grayscale partition configuration, which controls the scope of the grayscale release.
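
The ReadinessGate sequence can be modeled as a small reconcile loop. This Python sketch is a simplification, not the operator's code: the condition type `example.com/InPlaceUpdateReady` and the dictionary-shaped Pod are hypothetical, and the clock is a plain integer number of seconds. Only the readiness rule itself (a Pod is Ready only when all its containers are ready and every readiness gate condition is True) reflects real Kubernetes behavior.

```python
from typing import Dict

# Hypothetical condition type; the real one is not named in this article.
GATE = "example.com/InPlaceUpdateReady"


def pod_is_ready(pod: Dict) -> bool:
    """Kubernetes rule: Ready needs every container ready AND every readiness gate True."""
    containers_ready = all(c["ready"] for c in pod["containerStatuses"])
    gates_true = all(pod["conditions"].get(g) is True for g in pod["readinessGates"])
    return containers_ready and gates_true


def reconcile(pod: Dict, new_image: str, now: int, grace_period_seconds: int) -> str:
    """One pass of a simplified in-place upgrade loop; returns the action taken."""
    if pod["image"] == new_image:
        containers_ready = all(c["ready"] for c in pod["containerStatuses"])
        if containers_ready and pod["conditions"].get(GATE) is not True:
            pod["conditions"][GATE] = True     # 3. back into Service endpoints
            return "gate-true"
        return "noop"
    if pod["conditions"].get(GATE) is not False:
        pod["conditions"][GATE] = False        # 1. out of Service endpoints
        pod["unreadySince"] = now
        return "gate-false"
    if now - pod["unreadySince"] >= grace_period_seconds:
        pod["image"] = new_image               # 2. patch image; kubelet restarts container
        for status in pod["containerStatuses"]:
            status["ready"] = False
        return "image-patched"
    return "waiting"                           # still draining traffic
```

With gracePeriodSeconds: 30, the image is patched only on a pass at least 30 seconds after the gate was set to False; with 0, it is patched on the very next pass, leaving no time for endpoint removal to take effect.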

See the GameDeployment InplaceUpdate example and the GameStatefulSet InplaceUpdate example.

HotPatchUpdate: container image hot update

The in-place update strategy preserves the Pod lifecycle and IPC shared memory, but it always restarts containers. For a GameServer container serving matches, an in-place upgrade interrupts any players currently in a match. To achieve uninterrupted updates, some businesses use reload technology: during a reload, the new version of the process takes over from the old version, in-memory data is not lost, and players notice nothing during the upgrade. To meet the container cloud requirements of such businesses, we studied the incremental update approach of Docker image-layer merging and modified the Docker source code to add a container image hot-update interface. When this interface is called on a running container to update its image version, the lifecycles of the container and of the processes inside it stay the same, but the container's underlying image is replaced with the new version. With this Docker change, after a running container's image is hot-updated, the container's state is unchanged, while the version and data of its base image have been incrementally updated. If the process in the container implements reload, and the .so files or configuration in the base image have been updated to the new version, we only need to send a reload signal to the process to complete a hot update of the server process and upgrade the service without interruption. To bring container image hot update into Kubernetes, we also modified kubelet. Building on kubelet's in-place upgrade capability, when a specified annotation is added to a Pod, kubelet turns the in-place upgrade of that Pod into a container image hot update and calls Docker's image hot-update interface to complete it.
The detailed implementation of container image hot update in Docker and kubelet will be covered in a separate article.

GameStatefulSet/GameDeployment integrates container image hot update. When spec/updateStrategy/type is set to HotPatchUpdate, the controller updates the container image version in the Pod and adds the annotation, so that kubelet and Docker work together to complete the container image hot update. Throughout the process, the lifecycles of the Pod and its containers remain unchanged. Afterwards, users can complete the reload of the business process by sending a signal to the process in the container, keeping the service uninterrupted. HotPatchUpdate also supports the grayscale partition configuration, which controls the scope of the grayscale release.

The HotPatchUpdate strategy only takes effect together with our customized versions of kubelet and Docker. See the GameDeployment HotPatchUpdate example and the GameStatefulSet HotPatchUpdate example.
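
From the controller's point of view, the three update types differ mainly in how each Pod is mutated. In this Python sketch, the annotation key `example.com/hot-patch` is a hypothetical placeholder (the article does not name the key the customized kubelet watches), and the patches are plain JSON Patch operations. Putting the annotation and the image change into one patch is an illustrative design choice, not a documented requirement; the hot update itself only happens with the customized kubelet and Docker.

```python
from typing import Dict

# Hypothetical key: the annotation the customized kubelet watches is not named here.
HOT_PATCH_ANNOTATION = "example.com/hot-patch"


def escape_pointer(token: str) -> str:
    """Escape a JSON Pointer token (RFC 6901): '~' -> '~0', '/' -> '~1'."""
    return token.replace("~", "~0").replace("/", "~1")


def plan_pod_mutation(update_type: str, container_index: int, new_image: str) -> Dict:
    """What a controller does to one Pod under each update type (simplified)."""
    image_op = {"op": "replace",
                "path": f"/spec/containers/{container_index}/image",
                "value": new_image}
    if update_type == "RollingUpdate":
        return {"action": "recreate"}                  # delete old Pod, create new one
    if update_type == "InplaceUpdate":
        return {"action": "patch", "ops": [image_op]}  # kubelet restarts the container
    if update_type == "HotPatchUpdate":
        # Annotation and image change in one patch, so both land atomically.
        # Assumes metadata.annotations exists; JSON Patch "add" needs the parent map.
        annotation_op = {"op": "add",
                         "path": "/metadata/annotations/" + escape_pointer(HOT_PATCH_ANNOTATION),
                         "value": "true"}
        return {"action": "patch", "ops": [annotation_op, image_op]}
    raise ValueError(f"unsupported update type: {update_type}")
```

After a HotPatchUpdate patch like this is applied, the business process still has to be told to reload (for example, by a signal it handles), as described above.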

Hook-based interactive application release

As mentioned above, many complex applications depend on external systems, or on their own data and indexes, when releasing updates: data must be relocated before an instance is scaled or updated, routes must be changed before an instance is scaled down, and matches must end before an instance is scaled or updated. In addition, during a grayscale release we sometimes need to check whether metrics from Prometheus monitoring meet expectations before deciding whether to extend the release to more instances. All of these can be viewed as hooks in the application release process, whose results determine whether the release can proceed. Both GameDeployment for stateless applications and GameStatefulSet for stateful applications have this requirement. After digging into business requirements and investigating solutions, we abstracted a general-purpose Kubernetes operator: bcs-hook-operator. Its main responsibility is to execute hooks according to hook templates and record their state. GameDeployment or GameStatefulSet watches the final state of each hook and uses the result to decide what to do next.

bcs-hook-operator defines two CRDs:

  • HookTemplate
apiVersion: tkex.tencent.com/v1alpha1
kind: HookTemplate
metadata:
  name: test
spec:
  args:  
    - name: service-name
      value: test-gamedeployment-svc.default.svc.cluster.local
    - name: PodName
  metrics:
  - name: webtest
    count: 2
    interval: 60s
    failureLimit: 0
    successCondition: "asInt(result) < 30"
    provider:
      web:
        url: http://1.1.1.1:9091
        jsonPath: "{$.age}"

HookTemplate defines a hook template. A HookTemplate can contain multiple metrics, each of which is a hook to execute. A metric defines how many times the hook runs, the interval between runs, the success condition, the provider, and so on. The provider defines the hook type; two types are currently supported: webhook and Prometheus.
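
The count, failureLimit, and successCondition fields can be illustrated with a small Python model of a single web metric. This is a sketch, not bcs-hook-operator's code: its semantics (stop and fail as soon as failures exceed failureLimit) are an assumption modeled on Argo Rollouts' analysis runs, which the article lists among its references; the `{$.age}` jsonPath form is the only one handled; the success condition is a Python callable instead of an expression string; and the wait between measurements is omitted.

```python
import re
from typing import Callable, Dict


def extract(doc: Dict, json_path: str):
    """Support only the simple top-level form used above, e.g. "{$.age}"."""
    match = re.fullmatch(r"\{\$\.(\w+)\}", json_path)
    if match is None:
        raise ValueError(f"unsupported jsonPath: {json_path}")
    return doc[match.group(1)]


def run_metric(fetch: Callable[[], Dict], json_path: str,
               success: Callable[[int], bool], count: int, failure_limit: int) -> str:
    """Take up to `count` measurements; fail as soon as failures exceed failureLimit."""
    failures = 0
    for _ in range(count):
        value = int(extract(fetch(), json_path))  # asInt(result)
        if not success(value):
            failures += 1
            if failures > failure_limit:
                return "Failed"
        # A real run would wait `interval` (60s above) before the next measurement.
    return "Successful"
```

Under this model, the HookRun shown next (successCondition `asInt(result) < 30`, failureLimit 0, a measured value of 32) fails on its first measurement, which matches its recorded status.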

  • HookRun
apiVersion: tkex.tencent.com/v1alpha1
kind: HookRun
metadata:
  name: test-gamedeployment-67864c6f65-4-test
  namespace: default
spec:
  metrics:
  - name: webtest
    provider:
      web:
        jsonPath: '{$.age}'
        url: http://1.1.1.1:9091
    successCondition: asInt(result) < 30
  terminate: true
status:
  metricResults:
  - count: 1
    failed: 1
    measurements:
    - finishedAt: "2020-11-09T10:08:49Z"
      phase: Failed
      startedAt: "2020-11-09T10:08:49Z"
      value: "32"
    name: webtest
    phase: Failed
  phase: Failed
  startedAt: "2020-11-09T10:08:49Z"

HookRun is the CRD for an actual hook execution, created from a HookTemplate. bcs-hook-operator watches and controls the state and lifecycle of each HookRun, executes hooks according to the metrics defined in the HookTemplate, and records the results of the hook calls in real time. For more details, see the bcs-hook-operator documentation.

Figure: architecture of GameDeployment/GameStatefulSet working with bcs-hook-operator to run hooks during an application release.

Automated step-by-step grayscale publishing

GameDeployment and GameStatefulSet support intelligent step-by-step grayscale releases, letting users configure the automated steps of a grayscale release. By configuring multiple steps, users can release in batches, have the effect of each release monitored automatically, and control the grayscale release intelligently. Currently, the following four kinds of steps can be configured:

  • partition: sets the grayscale partition, which determines how many instances are released in this step.
  • pause: {}: pauses the grayscale release indefinitely until the user manually resumes it.
  • pause: {duration: ...}: pauses for the specified time, then continues with the next step.
  • hook: templateName specifies the HookTemplate to use, which must already exist in the cluster. GameDeployment/GameStatefulSet creates a HookRun from the HookTemplate, and bcs-hook-operator runs it. GameDeployment/GameStatefulSet watches the status of the HookRun: if the result meets expectations, the subsequent grayscale steps continue; if not, the grayscale release is paused and manual intervention is required to decide whether to continue with the subsequent steps or roll back.

The following example defines six grayscale release steps:
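...
spec:
  ...
  updateStrategy:
    type: InplaceUpdate
    rollingUpdate:
      partition: 1
    inPlaceUpdateStrategy:
      gracePeriodSeconds: 30
    canary:
      steps:
      - partition: 3            # step 1: set the grayscale partition to 3
      - pause: {}               # step 2: pause until the user resumes manually
      - partition: 1            # step 3: set the grayscale partition to 1
      - pause: {duration: 60}   # step 4: pause for duration 60, then continue automatically
      - hook:                   # step 5: run a HookRun created from HookTemplate "test"
          templateName: test
      - pause: {}               # step 6: pause until the user resumes manually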
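
These six steps can be pictured as a small state machine. The Python sketch below is a simplified model, not the operator's implementation. It assumes a virtual clock in seconds for timed pauses, a hypothetical `hook_phase` callback standing in for watching the HookRun, and a hypothetical `resumed` flag standing in for the user's manual resume; it also does not wait for the Pods in each partition to finish updating.

```python
from typing import Callable, Dict, List


def advance(steps: List[Dict], state: Dict, now: int,
            hook_phase: Callable[[str], str]) -> Dict:
    """Run canary steps from state["index"] until blocked or done (simplified model)."""
    while state["index"] < len(steps):
        step = steps[state["index"]]
        if "partition" in step:
            # A real controller would also wait for these Pods to finish updating.
            state["partition"] = step["partition"]
        elif "pause" in step:
            duration = step["pause"].get("duration")
            if duration is None:
                if not state.pop("resumed", False):
                    state["phase"] = "Paused"          # indefinite pause
                    return state
            else:
                started = state.setdefault("pauseStart", now)
                if now - started < duration:
                    state["phase"] = "Paused"          # timed pause (seconds assumed)
                    return state
                del state["pauseStart"]
        elif "hook" in step:
            phase = hook_phase(step["hook"]["templateName"])
            if phase != "Successful":
                # Running: check again later. Failed: a human decides whether
                # to continue or roll back.
                state["phase"] = "WaitingForHook" if phase == "Running" else "HookFailed"
                return state
        state["index"] += 1
    state["phase"] = "Completed"
    return state
```

Calling `advance` on every reconcile with the six steps above stops at step 2 until the user resumes, waits out the 60-unit pause at step 4, blocks at step 5 if the HookRun fails, and completes only after the final manual resume.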

The configuration and usage of step-by-step grayscale releases are essentially the same for GameDeployment and GameStatefulSet. For details, see the step-by-step grayscale release tutorial.

PreDeleteHook: gracefully removing and updating Pods

In the hook-based interactive release section above, we mentioned that many applications depend on external systems, or on their own data and indexes, during releases and updates. In particular, scaling down or upgrading instances requires deleting old-version instances, yet those instances often still carry services that cannot be interrupted, such as players in a match. In such cases, scaling down or updating an instance depends on external conditions and cannot happen immediately: the controller must query whether the conditions are met and proceed only when they are. Based on the bcs-hook-operator abstraction, we developed PreDeleteHook for GameDeployment and GameStatefulSet to gracefully delete and update application Pod instances.

apiVersion: tkex.tencent.com/v1alpha1
...
spec:
  preDeleteUpdateStrategy:
    hook:
      templateName: test
  updateStrategy:
    inPlaceUpdateStrategy:
      gracePeriodSeconds: 30

When a HookTemplate is specified in spec/preDeleteUpdateStrategy of a GameDeployment/GameStatefulSet, then whenever Pods are scaled down or updated, GameDeployment/GameStatefulSet creates a HookRun from the HookTemplate for each Pod to be deleted or updated and watches the HookRun's state. bcs-hook-operator controls the execution of the HookRun and records its state in real time. When the HookRun completes, GameDeployment/GameStatefulSet sees its final state and uses it to decide whether the Pod can be deleted or updated normally.
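
The per-Pod decision described above can be sketched as a simple classification over HookRun phases. This Python model is illustrative only: the phase names (`Pending`, `Running`, `Successful`, `Failed`) and the choice to keep a Pod serving when its hook fails are assumptions, and `hookrun_phase` is a hypothetical lookup standing in for watching HookRuns.

```python
from typing import Callable, Dict, List, Optional


def gate_pods(pods: List[str],
              hookrun_phase: Callable[[str], Optional[str]]) -> Dict[str, List[str]]:
    """Sort Pods awaiting deletion/update by the state of their PreDeleteHook HookRun."""
    buckets: Dict[str, List[str]] = {"create": [], "wait": [], "proceed": [], "keep": []}
    for pod in pods:
        phase = hookrun_phase(pod)
        if phase is None:
            buckets["create"].append(pod)   # no HookRun yet: create one from the template
        elif phase in ("Pending", "Running"):
            buckets["wait"].append(pod)     # hook still executing: keep watching
        elif phase == "Successful":
            buckets["proceed"].append(pod)  # safe to delete or update now
        else:
            buckets["keep"].append(pod)     # hook failed: leave the Pod serving
    return buckets
```

Only Pods in the `proceed` bucket would be deleted or updated on this pass; the others are revisited on later reconciles.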

Furthermore, we support automatic rendering of common parameters such as PodName, PodNamespace, and PodIP in HookTemplates and HookRuns. For example, if the hook to run in a PreDeleteHook is an HTTP endpoint served by the application instance itself on container port 8080, we can define a HookTemplate like this:

apiVersion: tkex.tencent.com/v1alpha1
kind: HookTemplate
metadata:
  name: test
spec:
  args:
    - name: PodIP
  metrics:
    - name: webtest
      count: 3
      interval: 60s
      failureLimit: 2
      successCondition: "asInt(result) > 30"
      provider:
        web:
          url: http://{{ args.PodIP }}:8080
          jsonPath: "{$.age}"

When GameDeployment/GameStatefulSet creates the HookRun for a Pod to be deleted or updated, it renders that Pod's IP into the webhook URL, so the resulting HookRun calls the HTTP endpoint served by the Pod itself.
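
Argument rendering can be illustrated with a few lines of Python. The `{{ args.NAME }}` placeholder syntax comes from the HookTemplate above; the regex-based `render` function and the `pod_args` helper are hypothetical stand-ins, not bcs-hook-operator's implementation.

```python
import re
from typing import Dict

# Matches placeholders such as {{ args.PodIP }} in the HookTemplate above.
PLACEHOLDER = re.compile(r"\{\{\s*args\.(\w+)\s*\}\}")


def render(template: str, args: Dict[str, str]) -> str:
    """Substitute {{ args.NAME }} placeholders; raise KeyError for a missing value."""
    def substitute(match) -> str:
        name = match.group(1)
        if name not in args:
            raise KeyError(f"argument {name!r} has no value")
        return args[name]
    return PLACEHOLDER.sub(substitute, template)


def pod_args(pod: Dict) -> Dict[str, str]:
    """The per-Pod arguments the text says are rendered automatically."""
    return {
        "PodName": pod["metadata"]["name"],
        "PodNamespace": pod["metadata"]["namespace"],
        "PodIP": pod["status"]["podIP"],
    }
```

Failing loudly on a missing argument is a deliberate choice in this sketch: silently leaving `{{ args.PodIP }}` in a URL would produce a hook that can never succeed.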

The configuration and usage of PreDeleteHook are essentially the same for GameDeployment and GameStatefulSet. For a detailed tutorial, see "PreDeleteHook: gracefully removing and updating Pods".

Image warm-up

In-place Pod upgrades aim to maximize release efficiency and minimize service interruption. However, the most time-consuming part of an in-place upgrade is pulling the new image, especially when the image is large. As a result, the most common complaint we received about in-place upgrades was that they were still slower than expected. With this in mind, we worked with the public support team of Happy Game Studios to build an image warm-up solution for in-place upgrades of GameStatefulSet and GameDeployment. Taking GameDeployment as an example, the image warm-up process works as follows:

  1. The user triggers an in-place upgrade of a GameDeployment.
  2. kube-apiserver intercepts the request through an admission webhook and passes it to bcs-webhook-server for processing.
  3. bcs-webhook-server determines that the request triggers an in-place upgrade and mutates the GameDeployment: it changes the image version in the patch back to the original version and records the new image version in annotations.
  4. bcs-webhook-server creates a Job using the new image on every node that runs an instance of the application, and watches the status of these Jobs. Running a Job pulls the new image onto its node.
  5. After all Jobs have finished, bcs-webhook-server mutates the GameDeployment again: it removes the new image version from the annotations and sets the container image to the new version, which triggers the real in-place upgrade. It then cleans up the finished Jobs.
  6. bcs-gamedeployment-operator watches the real in-place upgrade and executes the in-place upgrade strategy.
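
Steps 3 to 5 amount to a two-phase mutation of the workload object. This Python sketch models them on plain dictionaries; it is not bcs-webhook-server's code. The annotation key `example.com/pending-images` is hypothetical, and the object layout (`spec.template.spec.containers`) is taken from the GameDeployment YAML above.

```python
import copy
import json
from typing import Dict, List

# Hypothetical annotation key for stashing target images during warm-up.
PENDING = "example.com/pending-images"


def containers(obj: Dict) -> List[Dict]:
    return obj["spec"]["template"]["spec"]["containers"]


def hold_new_images(old: Dict, new: Dict) -> Dict:
    """Step 3: keep the old images in the spec, stash the new ones in an annotation."""
    mutated = copy.deepcopy(new)
    old_images = {c["name"]: c["image"] for c in containers(old)}
    pending = {}
    for c in containers(mutated):
        previous = old_images.get(c["name"])
        if previous is not None and previous != c["image"]:
            pending[c["name"]] = c["image"]
            c["image"] = previous
    if pending:
        annotations = mutated["metadata"].setdefault("annotations", {})
        annotations[PENDING] = json.dumps(pending, sort_keys=True)
    return mutated


def prepull_nodes(pods: List[Dict]) -> List[str]:
    """Step 4: one pre-pull Job per node that currently runs an instance."""
    return sorted({p["spec"]["nodeName"] for p in pods if p["spec"].get("nodeName")})


def release_new_images(obj: Dict, all_jobs_done: bool) -> Dict:
    """Step 5: after every pre-pull Job finishes, apply the stashed images."""
    annotations = obj["metadata"].get("annotations", {})
    if not all_jobs_done or PENDING not in annotations:
        return obj
    mutated = copy.deepcopy(obj)
    pending = json.loads(mutated["metadata"]["annotations"].pop(PENDING))
    for c in containers(mutated):
        if c["name"] in pending:
            c["image"] = pending[c["name"]]
    return mutated
```

Because the spec keeps the old image until step 5, the operator sees no image change, and therefore starts no restart, until every node already holds the new image.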

This design decouples the Kubernetes workloads GameDeployment and GameStatefulSet from the image warm-up solution: to support image warm-up for more Kubernetes workloads, we only need to add support for that workload's CRD in bcs-webhook-server. On this basis, we refactored bcs-webhook-server so that webhooks can be added as plugins.
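
The plugin idea can be sketched as a registry that maps a workload kind to its mutation handler. This Python sketch is hypothetical and only illustrates the pattern; bcs-webhook-server's real plugin interface is not described here, and the `WebhookServer` class and its methods are invented for illustration.

```python
from typing import Callable, Dict

Handler = Callable[[dict, dict], dict]  # (old_object, new_object) -> mutated object


class WebhookServer:
    """Minimal plugin registry: each workload kind registers its own mutation handler."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Handler] = {}

    def register(self, kind: str, handler: Handler) -> None:
        if kind in self._plugins:
            raise ValueError(f"plugin for {kind} already registered")
        self._plugins[kind] = handler

    def mutate(self, old: dict, new: dict) -> dict:
        handler = self._plugins.get(new.get("kind", ""))
        # Kinds without a registered plugin pass through unchanged.
        return handler(old, new) if handler else new
```

Supporting image warm-up for another workload then means registering one more handler, without touching the existing ones.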

For more details about the image warm-up solution and the implementation of bcs-webhook-server, see the bcs-webhook-server documentation.

Conclusion

While building a container cloud platform on TKE, the BCS team discussed requirements with different business teams, mined their needs, abstracted the commonalities, and combined them with community open-source solutions to develop two Kubernetes workloads: GameDeployment and GameStatefulSet. Although designed for bringing complex game businesses to the cloud, both workloads and their features cover the needs of most Internet businesses and fit closely with the operation and release scenarios of various businesses. Going forward, we will continue to work with business teams to abstract more requirements, iterate, and keep enhancing GameStatefulSet and GameDeployment. Blueking Container Service (BCS) is open source; for more container cloud solutions and details, see our open-source project: BK-BCS.

Thanks to the following co-developers and committers:

  • stonewesley
  • pang1567