Background: The Birth of Cube
As cloud native technology matures and reaches production, enterprises are running more and more containers in their production environments. Kubernetes, the de facto standard for container orchestration, is widely adopted for enterprise services. In 2018, the UCloud container team launched UK8S, a Kubernetes product built on the UCloud public cloud that integrates seamlessly with UCloud's IaaS compute, network, and storage services. It lets customers quickly obtain a production-ready Kubernetes cluster while keeping flexible control over it.
However, while promoting UK8S and onboarding customers, the container team also received the following feedback from users:
- Maintaining a Kubernetes cluster is an extra burden. In addition to managing their applications, users must also manage the backend resources, so they cannot achieve application-centric business management.
- Kubernetes is complex and has a steep learning curve, so the customer's team needs a certain level of technical expertise. This also holds for customers who have used containers but have not yet tried Kubernetes: they need to learn the Kubernetes technology stack and also modify their application architecture to fit Kubernetes.
- Users want an out-of-the-box container product that starts applications directly as containers, without waiting for a virtual machine to be ready before deploying, which shortens the time until the application is ready.
To address this feedback, the container team developed a new serverless container product, Cube, which is currently in public beta. Besides lowering the barrier to using containers, the product has the following features:
1. Maintenance-free: there are no resources to maintain and no need to care where the workload runs. The product is application-centric and uses the container image as the application packaging standard.
2. Pay-as-you-go: pay only for the resources your application actually uses.
3. Automatic scaling: backed by a large resource pool, Cube provides an API to start and stop applications on demand and schedules resources automatically.
4. High availability: the product itself is highly available and also provides self-healing for applications.
In implementing the Cube product features, the container team had to address several technical issues:
1. Container runtime
As a public cloud product, Cube must take multi-tenant isolation into account when choosing a container runtime. Unlike UK8S, which relies on cloud hosts for isolation, Cube runs containers directly on physical hosts. Containers implemented with standard Docker cannot provide strong isolation between different users' containers on the same host, so Cube requires a container runtime that combines the strong isolation of virtual machines with the fast startup of containers.
The container team noticed that AWS had open-sourced Firecracker, a lightweight virtual machine monitor (VMM) with low resource consumption, fast startup, and easy maintenance that is already used in production. This matches the Cube scenario well, so the team adopted a container runtime based on Firecracker lightweight virtual machines. As the following two figures show, thanks to its simplification and optimization for cloud computing scenarios, Firecracker has clear advantages in startup speed and memory consumption over the mainstream virtualization component QEMU.
Figure: VMM boot time and memory footprint comparison (image from the referenced source)
2. Container management service
Several open source container management services support virtual machine container runtimes, such as containerd/CRI-O, Kata Containers, and firecracker-containerd. After comparing them, the container team chose the combination of CRI-O and firecracker-containerd. Together, these two components meet the needs of single-host container management. Compared with the other options, their code architecture is clearer and their call chain is simpler, which makes them easier to customize to product requirements.
3. Container scheduling service
Kubernetes has become the de facto standard for container scheduling with rich functionality and good scalability. Therefore, the container team adopted Kubernetes as the basic scheduling framework and made relevant modifications according to product requirements. The final basic service architecture is as follows:
Optimizations and Improvements
Although open source solutions speed up development, several problems still had to be solved to meet the product requirements, mainly in the following areas:
1. Container images
In the standard container image implementation, images are stored on the host as a set of layers. When a container is created, the container runtime creates a writable layer on top of the image layers and mounts it on the host for the container instance to use. A Cube container, however, does not run directly on the host, so there is no need to mount the container root directory on the host. The container team therefore modified the image layer implementation in CRI-O so that the container's writable layer is attached directly to the lightweight virtual machine as a block device rather than mounted on the host, which reduces the host's interference with the Cube container.
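The idea of handing the writable layer to the guest as a block device can be sketched as a Firecracker-style VM configuration. This is only an illustration, not the Cube implementation: the field names follow Firecracker's public configuration file format as best recalled, while the function name, the paths, the sizes, and the choice of a read-only root drive are assumptions made up for the example.

```python
import json


def build_vm_config(kernel_path, rootfs_path, writable_layer_path,
                    vcpus=1, mem_mib=128):
    """Build a Firecracker-style VM configuration as a plain dict.

    All paths and sizes are hypothetical placeholders for this sketch.
    """
    return {
        "boot-source": {
            "kernel_image_path": kernel_path,
            "boot_args": "console=ttyS0 reboot=k panic=1",
        },
        "drives": [
            {
                # Minimal guest root filesystem, attached read-only.
                "drive_id": "rootfs",
                "path_on_host": rootfs_path,
                "is_root_device": True,
                "is_read_only": True,
            },
            {
                # Per-container writable layer: passed to the guest as a
                # block device instead of being mounted on the host.
                "drive_id": "writable_layer",
                "path_on_host": writable_layer_path,
                "is_root_device": False,
                "is_read_only": False,
            },
        ],
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
    }


config = build_vm_config(
    kernel_path="/var/lib/cube/vmlinux",              # hypothetical path
    rootfs_path="/var/lib/cube/rootfs.ext4",          # hypothetical path
    writable_layer_path="/dev/mapper/cube-layer-example",  # hypothetical device
)
print(json.dumps(config, indent=2))
```

Because the host only passes a device path to the VMM, the container's files never appear in the host's mount table, which is the isolation benefit described above.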
In addition, to address slow container startup caused by slow pulls of new images, the container team introduced remote image mounting. The container image is stored as a block device in a cache cluster. When a container instance needs to be created from this image, the image is first attached to the host as a remote mount, and a writable layer is then created on the host to produce the container instance at run time. Meanwhile, a background process synchronizes the remote image to the host's local storage, which further speeds up reads and reduces the risk of depending on the cache cluster. This approach shortens the first acquisition of an image on a host to under 3 s, with room for further optimization. The functionality is currently offered as the image cache product and is gradually being integrated into the general image pull process.
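The combination of a remote mount with background synchronization can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions: the class name, the block granularity, and the batch size are invented for the example, and the "background" sync runs inline rather than in a separate thread.

```python
class LazyImage:
    """Serve reads from a remote image while copying blocks to a local cache."""

    def __init__(self, remote_blocks):
        self.remote = remote_blocks   # stands in for the cache cluster
        self.local = {}               # blocks already synced to the host
        self.remote_reads = 0

    def read(self, index):
        # Prefer the local copy; fall back to the remote mount on a miss.
        if index in self.local:
            return self.local[index]
        self.remote_reads += 1
        data = self.remote[index]
        self.local[index] = data      # keep what was fetched
        return data

    def sync_step(self, batch=2):
        """Copy up to `batch` missing blocks; return True when fully synced."""
        missing = [i for i in range(len(self.remote)) if i not in self.local]
        for i in missing[:batch]:
            self.local[i] = self.remote[i]
        return len(self.local) == len(self.remote)


image = LazyImage([b"blk0", b"blk1", b"blk2", b"blk3"])
first = image.read(2)          # served remotely, so startup need not wait
while not image.sync_step():   # background sync, run inline here for clarity
    pass
again = image.read(0)          # now served from the local copy
```

Once the sync completes, reads no longer touch the remote side, which matches the stated goals of faster reads and less dependence on the cache cluster.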
2. Using public cloud resources
On the network side, the Cube container's networking model is essentially the same as a cloud host's. With the relevant network functions implemented as CNI plug-ins, Cube containers connect directly to the public cloud VPC network.
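For readers unfamiliar with CNI, the sketch below shows the general shape of a plug-in invocation: parameters arrive in `CNI_*` environment variables, the network configuration arrives as JSON on standard input, and an `ADD` call returns a result describing the interface and its address. The `allocate_ip` callback, the plug-in type name `cube-vpc`, and the addresses are hypothetical; how Cube's plug-ins actually obtain VPC addresses is not described here.

```python
import json


def handle_cni(env, stdin_text, allocate_ip):
    """Handle one CNI invocation and return the result as a dict.

    `allocate_ip` is a hypothetical callback standing in for the VPC
    address allocation that a real plug-in would request from the cloud.
    """
    conf = json.loads(stdin_text)
    command = env["CNI_COMMAND"]
    if command == "DEL":
        return {}  # nothing to report on teardown
    if command != "ADD":
        raise ValueError(f"unsupported CNI_COMMAND: {command}")
    address, gateway = allocate_ip(env["CNI_CONTAINERID"])
    return {
        "cniVersion": conf["cniVersion"],
        "interfaces": [
            {"name": env["CNI_IFNAME"], "sandbox": env["CNI_NETNS"]}
        ],
        # "interface" is an index into the "interfaces" list above.
        "ips": [{"address": address, "gateway": gateway, "interface": 0}],
    }


env = {
    "CNI_COMMAND": "ADD",
    "CNI_CONTAINERID": "cube-example",
    "CNI_NETNS": "/var/run/netns/cube-example",
    "CNI_IFNAME": "eth0",
}
network_conf = '{"cniVersion": "1.0.0", "name": "vpc", "type": "cube-vpc"}'
result = handle_cni(env, network_conf,
                    lambda container_id: ("10.0.0.5/24", "10.0.0.1"))
print(json.dumps(result, indent=2))
```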
On the storage side, the Cube container currently supports two types of storage: NFS, a network file system that supports reads and writes from multiple nodes, and UDisk, a cloud disk that supports reads and writes from a single node. For file storage, Cube automatically mounts NFS inside the lightweight virtual machine. Users only need to specify the mount points and mount parameters in the configuration file to use the network file system directly in the container. Both user-built NFS servers in the VPC network and UFS, the UCloud public cloud file storage product, are supported. For block storage, the container team extended Firecracker's device implementation: by adding support for the vhost-user protocol, the Cube lightweight virtual machine can connect directly to the SPDK service, which enables mounting and using high-performance RSSD cloud disks.
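As an illustration of turning a declared mount point and mount parameters into an actual NFS mount inside the guest, the sketch below builds the argument list for the standard `mount` command. The configuration keys (`server`, `export`, `mount_point`, `options`) are hypothetical and do not reflect Cube's real configuration schema.

```python
def nfs_mount_argv(entry):
    """Translate one mount entry into an argv list for mount(8)."""
    source = f'{entry["server"]}:{entry["export"]}'
    argv = ["mount", "-t", "nfs"]
    if entry.get("options"):
        # Mount parameters are passed as one comma-separated -o argument.
        argv += ["-o", ",".join(entry["options"])]
    return argv + [source, entry["mount_point"]]


entry = {
    "server": "10.0.0.20",          # hypothetical NFS server in the VPC
    "export": "/data",
    "mount_point": "/mnt/shared",
    "options": ["vers=4.0", "noresvport"],
}
argv = nfs_mount_argv(entry)
print(" ".join(argv))
```

Building an argument list rather than a shell string avoids quoting problems if a path contains spaces.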
3. Container runtime environment
To reduce additional resource overhead, the container team extensively tuned the container management service and the container runtime.
The container team modified the CRI-O architecture for managing container groups to use one shim per pod. Managing all containers in a pod through a single shim significantly reduces the resources consumed by shims and simplifies container management. For the lightweight virtual machine, the team also streamlined the kernel, rootfs, and init process, keeping only the most basic functions to speed up startup, shrink the attack surface, and reduce resource consumption. In addition, the team built the infra container's functionality into the lightweight virtual machine, so that a Cube instance can run as a pod without starting an additional infra container.
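The one-shim-per-pod model can be illustrated with a toy bookkeeping structure. The class and method names are invented for this sketch, and a real shim is a separate process rather than a Python object; the point is only that the number of shims tracks the number of pods, not the number of containers.

```python
class Shim:
    """Stand-in for a shim process that supervises one pod's containers."""

    def __init__(self, pod_id):
        self.pod_id = pod_id
        self.containers = {}

    def start(self, container_id, image):
        self.containers[container_id] = image


class ShimManager:
    """Keep exactly one shim per pod, created on first use."""

    def __init__(self):
        self.shims = {}

    def start_container(self, pod_id, container_id, image):
        shim = self.shims.get(pod_id)
        if shim is None:
            shim = Shim(pod_id)       # first container of the pod
            self.shims[pod_id] = shim
        shim.start(container_id, image)
        return shim


manager = ShimManager()
manager.start_container("pod-a", "app", "nginx:latest")
manager.start_container("pod-a", "sidecar", "busybox:latest")
manager.start_container("pod-b", "app", "redis:latest")
print(len(manager.shims))
```

With one shim per container, the same workload would need three shims instead of two, and the gap widens as pods carry more containers.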
4. Kubernetes modifications
As a general-purpose container scheduling framework, Kubernetes meets most container management requirements. However, the container team still needed to modify some Kubernetes components for Cube's specific usage scenarios. On the control plane, the team adopted a custom scheduler to better meet the requirements for task priority, scheduling speed, and resource management in multi-tenant scenarios. On the host node, because of the Cube container runtime, the team removed features that the kubelet no longer needs to implement, such as mounting ConfigMap/Volume directories on the host, running CNI plug-ins, and collecting logs from specific directories, which strengthens the isolation between containers and the host.
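The notion of scheduling by task priority under resource constraints can be sketched as follows. This is a toy model, not Cube's scheduler: the class name, the single CPU dimension, and the "most free capacity" placement rule are assumptions chosen to keep the example short.

```python
import heapq
import itertools


class PriorityScheduler:
    """Place tasks on nodes, highest priority first."""

    def __init__(self, nodes):
        self.free = dict(nodes)            # node name -> free CPU cores
        self.queue = []
        self.counter = itertools.count()   # tie-breaker keeps FIFO order

    def submit(self, name, priority, cpu):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self.queue, (-priority, next(self.counter), name, cpu))

    def schedule(self):
        placements = {}
        while self.queue:
            _, _, name, cpu = heapq.heappop(self.queue)
            candidates = [n for n, free in self.free.items() if free >= cpu]
            if not candidates:
                placements[name] = None    # no node can fit this task
                continue
            node = max(candidates, key=lambda n: self.free[n])
            self.free[node] -= cpu
            placements[name] = node
        return placements


scheduler = PriorityScheduler({"node-a": 4, "node-b": 2})
scheduler.submit("batch", priority=1, cpu=3)
scheduler.submit("web", priority=10, cpu=3)
scheduler.submit("cron", priority=5, cpu=2)
placements = scheduler.schedule()
print(placements)
```

In this run the high-priority `web` task takes the only node large enough for it, so the low-priority `batch` task, although submitted first, is left unplaced.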
Future Outlook for Cube
After the development and modifications described above, Cube has been successfully launched and has achieved good results. Cube will continue to be iterated with the goals of helping users improve efficiency, reduce overhead, simplify maintenance, and save costs. In terms of container performance, the container team will continue to optimize the lightweight virtual machine's I/O path, reduce the performance loss from virtualization and management components, and ensure that user container instances run stably and efficiently. In terms of service management, Cube will introduce a variety of container management controllers and allow Cube instances to connect directly to Kubernetes clusters, giving users multi-level resource scheduling so they can manage and maintain workloads according to their actual needs.
If you are interested in UCloud Cube, scan the QR code to join the Cube beta discussion group!
QR code: https://u.wechat.com/ELATXrQmIDI2cCgzioMgl88