Speakers:

  • Xu Di, technical expert at Ant Financial: responsible for building Ant Financial's cloud PaaS platform; a veteran of the Kubernetes community, ranked among the top 50 contributors to the core codebase;
  • Zhang Xiaoyu, technical expert at Alibaba Cloud: responsible for the ecosystem of Alibaba Cloud's cloud-native application container platform, mainly designing and developing solutions for node stability and resource utilization; also an enthusiastic member of and contributor to the Kubernetes community.

This article is based on a talk given by Xu Di and Zhang Xiaoyu at KubeCon NA 2019. It covers the following: first, a brief introduction to sidecar containers; second, several common scenarios at Ant Financial and Alibaba Group and how we address the challenges they raise. Many challenges remain to be solved, and we invite you to work on them with us.

Sidecar overview

Sidecar containers are not new. Sidecar is a design pattern used mainly for auxiliary tasks such as network connectivity and downloading or copying files. If you are familiar with Docker Swarm, the Docker ambassador pattern is essentially a sidecar.

Consider the example above, in which the Service Consumer and the Redis Provider are tightly coupled and deployed on the same node. If the Redis Provider fails, the Service Consumer must be reconfigured to connect to another Redis instance and then restarted.

Once an ambassador is introduced, the problem becomes much simpler: only the Redis Ambassador needs to be restarted, and the Service Consumer needs no changes at all.

Cross-node communication is also possible with this pattern, as shown in the figure below, so the Service Consumer and the Redis Provider can be deployed on different nodes. This decouples the two services to a large extent.

Sidecar case sharing

What can a Sidecar container be used for?

In general, sidecar containers are used for:

  • Log proxying/forwarding, such as Fluentd;
  • Service mesh, such as Istio and Linkerd;
  • Proxying, such as the Docker ambassador;
  • Health checks, verifying that certain components are working properly;
  • Other auxiliary work, such as copying or downloading files.

Is that all?

In fact, sidecars are becoming more and more widely accepted and used. A sidecar container is usually deployed in the same Pod as the business containers (non-sidecar containers), shares their lifecycle, and provides auxiliary functions for them. This model decouples applications to a large extent, supports heterogeneous components, and lowers technical barriers.

However, Kubernetes' management of sidecars is far from perfect and increasingly falls short of our needs, especially in production environments.

A few typical cases

1. Startup order dependencies

Suppose we inject multiple sidecars into a Pod, and there are dependencies among the sidecars, or between the sidecars and the business container.

In the following example, the proxy sidecar container must start first to establish the network connection, so that the MySQL client can connect to the remote MySQL cluster and expose the service locally. Only then can the backend business container work properly.

```
#1 proxy_container (sidecar)
#2 mysql_client
#3 svc_container
```

Some people suggest working around this, for example by changing the image's startup script to delay startup. But such methods are too intrusive, do not scale, and are hard to configure accurately.

2. Sidecar management

Let's look at another example. Sidecar containers and business containers are coupled within the same Pod and share the same lifecycle, which makes it difficult to manage a sidecar container on its own, for example to update its image.

For example, we’ve injected a Sidecar container like Istio Proxy into many pods, and it’s working fine. But what if we want to upgrade the Proxy image at this point?

Following the Istio community documentation, we would need to re-inject these sidecar containers. Specifically, each original Pod has to be deleted and a new one created (Pods managed by a workload are recreated automatically by the corresponding workload controller).

What if we have a large number of such Pods? Doing this from the command line is inconvenient and error-prone, while custom scripts written for the purpose scale poorly and need frequent changes.

There is another problem: we do not upgrade all sidecars at once. Upgrades go through a gray-scale (canary) release process, which means only some of the sidecars are upgraded at a time.

Community development

The upstream community

We are very grateful to Joseph Irving (@Joseph-Irving) for proposing a sidecar KEP, which adds a lifecycle type (LifecycleType) to indicate whether a container is a sidecar.

```go
type Lifecycle struct {
	// The existing PostStart and PreStop handlers are omitted from this excerpt.

	// Type
	// One of Standard, Sidecar.
	// Defaults to Standard
	// +optional
	Type LifecycleType `json:"type,omitempty" protobuf:"bytes,3,opt,name=type,casttype=LifecycleType"`
}

// LifecycleType describes the lifecycle behaviour of the container
type LifecycleType string

const (
	// LifecycleTypeStandard is the default container lifecycle behaviour
	LifecycleTypeStandard LifecycleType = "Standard"
	// LifecycleTypeSidecar means that the container will start up before standard containers and be terminated after
	LifecycleTypeSidecar LifecycleType = "Sidecar"
)
```

Once the KEP is implemented, a container is marked as a sidecar in the Pod spec as follows:

```yaml
containers:
- name: sidecarContainer
  image: foo
  lifecycle:
    type: Sidecar
```

Containers in a Pod are then started in the following order: init containers > sidecar containers > business (standard) containers. Sidecars are also terminated after the standard containers.

The kubelet-side implementation of the KEP is in progress.

To support more sidecar use cases, we built on this proposal with PreSidecar and PostSidecar, which start before and after the business container, respectively. See our PR for specific use cases.

Why do we think sidecars should be split into those that start before the business container and those that start after it?

This is because, in some scenarios, a sidecar container needs to start before the application container to do preparatory work, such as distributing certificates, creating shared volumes, or copying and downloading other files.

In other scenarios, some sidecar containers need to start after the application container. For decoupling and versioning reasons, we split an application into two parts: the application container focuses on the business logic itself, while some data and personalized configuration live in a sidecar container. The two containers typically share a storage volume, and the post-sidecar container updates and replaces some of the default or outdated data.

Looking ahead to more complex scenarios, we may also support DAG-based orchestration of the container startup order, depending on actual production needs.

How Ant Financial and Alibaba respond

To manage sidecars, we needed a more fine-grained workload. We call this workload SidecarSet; it is open source and ready for production use. You can visit OpenKruise and see what we are working on in the roadmap. OpenKruise currently provides three production-ready workloads: Advanced StatefulSet, BroadcastJob, and SidecarSet. Two more workloads (AdvancedHPA and PodHealer) are under active development and will be open-sourced soon, so stay tuned.

OpenKruise: openkruise.io; OpenKruise roadmap: github.com/openkruise/…

For a demo, see Lachlan Evenson's video.

Here is our definition of SidecarSet:

```go
import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// SidecarSetSpec defines the desired state of SidecarSet
type SidecarSetSpec struct {
	// selector is a label query over pods that should be injected
	Selector *metav1.LabelSelector `json:"selector,omitempty"`

	// Containers is the list of sidecar containers to be injected into the selected pod
	Containers []SidecarContainer `json:"containers,omitempty"`

	// List of volumes that can be mounted by sidecar containers
	Volumes []corev1.Volume `json:"volumes,omitempty"`

	// Paused indicates that the sidecarset is paused and will not be processed by the sidecarset controller.
	Paused bool `json:"paused,omitempty"`

	// The sidecarset strategy to use to replace existing pods with new ones.
	Strategy SidecarSetUpdateStrategy `json:"strategy,omitempty"`
}

// SidecarContainer defines the container of Sidecar
type SidecarContainer struct {
	corev1.Container
}

// SidecarSetUpdateStrategy indicates the strategy that the SidecarSet
// controller will use to perform updates. It includes any additional parameters
// necessary to perform the update for the indicated strategy.
type SidecarSetUpdateStrategy struct {
	RollingUpdate *RollingUpdateSidecarSet `json:"rollingUpdate,omitempty"`
}

// RollingUpdateSidecarSet holds the parameters of a rolling update.
type RollingUpdateSidecarSet struct {
	// MaxUnavailable caps the number (or percentage) of pods that may be
	// unavailable during the update.
	MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"`
}
```

SidecarContainer in the spec simply embeds corev1.Container, the container definition from the Kubernetes codebase. The label selector specifies which group of pods the sidecars are injected into. With RollingUpdate support, users can upgrade sidecars incrementally, and the Paused field lets them suspend a sidecar upgrade in an emergency.

If only the sidecar image is upgraded, the SidecarSet controller just patches the existing pods instead of recreating them, which makes one-click image upgrades convenient.
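An image-only upgrade can avoid recreating the pod because spec.containers[*].image is one of the few Pod spec fields Kubernetes allows to be changed on a running pod. The sketch below builds a strategic merge patch body that targets one container by name. The container and image names are made up. Whether the SidecarSet controller sends exactly this patch type is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// For Pod containers, strategic merge patch merges list entries by "name",
// so only the named container changes and the others are left alone.
type containerPatch struct {
	Name  string `json:"name"`
	Image string `json:"image"`
}

type podPatch struct {
	Spec struct {
		Containers []containerPatch `json:"containers"`
	} `json:"spec"`
}

// sidecarImagePatch returns a strategic-merge-patch body that swaps the
// image of one sidecar container in place.
func sidecarImagePatch(container, image string) ([]byte, error) {
	var p podPatch
	p.Spec.Containers = []containerPatch{{Name: container, Image: image}}
	return json.Marshal(p)
}

func main() {
	body, err := sidecarImagePatch("istio-proxy", "example.com/proxy:v2")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	// {"spec":{"containers":[{"name":"istio-proxy","image":"example.com/proxy:v2"}]}}
}
```

The same body could be applied by hand with kubectl patch pod <pod-name> --type strategic -p '<body>'. The kubelet then notices the changed image and restarts only that container, while the business container keeps running.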

Other challenges

We have also run into other challenges in production and are still looking for better solutions. If you have good ideas or suggestions, you are welcome to discuss them and build solutions with us.

1. Manage Sidecar container resources

Sidecar containers generally use relatively few resources. Should those resources be counted toward the whole Pod, or can the sidecar simply share the business container's resources? Moreover, the same sidecar is used alongside different application containers, so how exactly to allocate resources to sidecar containers needs careful consideration.
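As background on what "counted in the whole Pod" means today: a sidecar declared as an ordinary container is treated like any other regular container, so its requests are added to the pod's total. Kubernetes computes a pod's effective request for a resource as the larger of two values: the sum over regular containers, and the largest single init container. Pod overhead, if configured, is added on top and is ignored here. The sketch below applies that rule to CPU millicores; the numbers are illustrative.

```go
package main

import "fmt"

// effectiveRequest applies the Kubernetes rule for a pod's effective request:
// max(sum of regular containers, largest single init container).
// Pod overhead is ignored in this sketch.
func effectiveRequest(initContainers, containers []int64) int64 {
	var sum int64
	for _, r := range containers {
		sum += r
	}
	var maxInit int64
	for _, r := range initContainers {
		if r > maxInit {
			maxInit = r
		}
	}
	if maxInit > sum {
		return maxInit
	}
	return sum
}

func main() {
	app, sidecar := int64(1000), int64(200) // CPU millicores
	fmt.Println(effectiveRequest(nil, []int64{app}))          // 1000
	fmt.Println(effectiveRequest(nil, []int64{app, sidecar})) // 1200
	// The same 200m sidecar adds 20% to a 1000m app but doubles a 200m app,
	// which is why one fixed sidecar size fits different apps poorly.
	fmt.Println(effectiveRequest(nil, []int64{200, sidecar})) // 400
}
```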

2. Fault tolerance of Sidecar containers

Sidecar containers are generally not the primary containers. So when a sidecar container has a problem, for example its liveness probe fails, should that affect the status of the primary container or of the entire Pod? Or, if a sidecar image update goes wrong, should the entire Pod be flagged as failed?

There are other challenges as well; we have listed only a few common ones. We need to put our heads together to find sensible solutions.

Summary

As sidecars are used more and more widely in production, their management deserves more attention. A sidecar is deployed in the same Pod as the business container but is essentially an auxiliary container. This article introduced typical sidecar use cases and the challenges they raise. It also described how we are working with the upstream community to bring the Alibaba economy's technical solutions to the community and help more users.

Financial-Grade Distributed Architecture (Antfin_SOFA)