
Lecturer: Wang Qiang/Head of Ofo container Cloud R&D

Editor: Little Junjun

In recent years, as the container technology ecosystem has matured, Internet companies and traditional IT enterprises at home and abroad have actively embraced containers, deploying them in both test and production environments. Container technology reduces IT operations and maintenance costs and shortens the business delivery cycle. PaaS platforms built on the mainstream container orchestration stack (Docker/Kubernetes), integrating online release, container management, elastic scaling, resource monitoring, and other functions, are gradually becoming popular.



Today's lecture is divided into two parts. The first part is a brief introduction to ofo's Kubernetes-based PaaS platform; the second part covers business containerization cases.

Containerized platform

At present there are many container cloud platform products in China, but given the urgency of our own needs we passed on OpenShift and chose independent development, mainly to keep the PaaS code under our own control and to be able to optimize and customize the underlying Kubernetes/Docker.

The system resource dashboard shows the overall resource usage of clusters and nodes. The cluster/Namespace module manages Kubernetes clusters and Namespaces; Ruly PaaS can manage multiple Kubernetes clusters and currently manages four, covering domestic and overseas production/test clusters.

Containerized CI/CD is implemented by an underlying API service built on Jenkins JNLP agents: Git code is packaged into a Docker image and automatically deployed to the designated Kubernetes cluster as a container application. Application services and distributed task management are implemented according to the requirements of business containerization. Unified configuration management is an application configuration service based on Kubernetes ConfigMap; it mainly manages configuration files for online services, such as Nginx configuration files and the sidecar Flume agent configuration files used to collect service logs.

At the bottom are resource auditing, account/group management, and operation auditing. Accounts are grouped by role: QA, operations, and the different R&D teams, with members of each department only able to access their designated applications. Operation auditing covers basic operations in the PaaS, and container operations performed through the PaaS WebShell are also logged.

PaaS architecture and main functions

This is the overall architecture of the PaaS, working from the bottom up. At the base is the selection of infrastructure: container host machines, operating system selection and tuning, Kubernetes/Docker selection, and the related stress testing and optimization. It also involves building the underlying infrastructure services, for example an image registry cluster with domestic and overseas replicas.

In parallel, we are adapting the container infrastructure to our own IDC data center. The workflow layer above it, mainly the CI/CD functionality, is the core module of Ruly PaaS. The code audit function integrates with SonarQube for business code scanning. Continuous delivery defaults to the Kubernetes RollingUpdate mode.

The O&M module uses the application and Pod event lists in the PaaS to view events for a specific application, such as deployments, scale-outs, health-check failures, and OOM kills, and raises alerts based on these events. Elastic scaling supports not only container-level scaling via the Kubernetes HPA but also dynamic scaling of cluster host nodes. The top layer mainly covers service governance and APM distributed tracing.

At present the main programming languages of our business applications are Node 8, Go, PHP 7, Java 8, and C++. To help developers quickly get up to speed with writing Dockerfiles, we abstracted Dockerfile templates for these languages. Developers select the corresponding template and fill in its parameters, such as the deployment path and whether service logs should be collected; they can also write a complete Dockerfile by hand. The application types currently supported are Kubernetes Deployment and StatefulSet; by default services are mainly stateless applications, with a small number of stateful ones.

When an application goes online it can be configured with the number of container replicas, CPU and memory resource limits, environment variables, and so on. For Node 8 or Go services, an environment variable can indicate the test or production cluster, and the business looks up the corresponding configuration based on it. Health check policies use the standard Kubernetes HTTP GET and TCP port probes, and custom shell command lines can also be used as container health checks. Some configuration files are required for an application service to go online; Ruly PaaS imports them into the PaaS configuration center and associates the application with the configuration items in CI/CD.
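As a rough sketch of what these probe options look like in a Pod spec (the name, image, port, and path below are illustrative assumptions, not our actual production manifest):

```yaml
# Minimal sketch of the health check styles described above; values are examples.
apiVersion: v1
kind: Pod
metadata:
  name: demo-api
spec:
  containers:
  - name: demo-api
    image: registry.example.com/demo-api:1.0
    readinessProbe:
      httpGet:               # standard Kubernetes HTTP GET probe
        path: /healthz
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:             # TCP port probe
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
    # a custom shell command line can also serve as a health check:
    # livenessProbe:
    #   exec:
    #     command: ["sh", "-c", "/opt/app/healthcheck.sh"]
```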

During online deployment, operations, development, or QA engineers enter the specified commit ID to build and deploy. The image tag constructed underneath consists of an auto-incremented build ID, the cluster ID, and the commit ID. After the image is built it is uploaded to the image registry, and the Ruly PaaS CD API is called to deploy it to the specified Kubernetes cluster.

In addition to the default RollingUpdate mode, online deployment also supports a manual mode: after a new container is deployed, you check its service logs in the PaaS and manually confirm before rolling out the remaining replicas. In canary mode, the same number of replicas is deployed for the new version of the service, and all online traffic is then switched over to the new version's containers.
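For reference, the default rolling update is driven by the Deployment strategy fields, roughly as in the sketch below (replica counts and surge values are illustrative; the manual and canary modes above are orchestrated by the PaaS on top of this rather than by a single Kubernetes field):

```yaml
# Illustrative RollingUpdate settings; the numbers are examples only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-api
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # at most one extra Pod during the rollout
      maxUnavailable: 1      # at most one Pod taken down at a time
  selector:
    matchLabels:
      app: demo-api
  template:
    metadata:
      labels:
        app: demo-api
    spec:
      containers:
      - name: demo-api
        image: registry.example.com/demo-api:1.1
```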

After online services are containerized, an automatic scale-out/scale-in policy is configured. Thresholds are set from empirical CPU and memory values and then adjusted, along with the replica counts, after observing the load for a period of time.
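A sketch of such an HPA policy is shown below; the target utilization numbers and replica bounds are illustrative assumptions, not our production thresholds:

```yaml
# Illustrative HPA based on CPU and memory utilization.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: demo-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 60    # example threshold, tuned after observing load
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 70
```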

Resource usage is reported by cluster, Namespace, project, and other dimensions, providing a reference for scaling containers or hosts up and down.

So far this has mainly covered containerization adoption. There are also some customization requirements, such as the configuration center functions introduced earlier, which are implemented on Kubernetes ConfigMap with version control added to make rollback easy.
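A rough sketch of a configuration item managed this way is below; the name, the version label convention, and the Nginx snippet are hypothetical, since the actual version control lives in Ruly PaaS on top of ConfigMap:

```yaml
# Illustrative only: the version label here is an assumed convention for rollback.
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-api-nginx-conf
  labels:
    config-version: "12"      # hypothetical version marker maintained by the PaaS
data:
  nginx.conf: |
    worker_processes auto;
    events { worker_connections 10240; }
    http {
      server {
        listen 80;
        location / { proxy_pass http://127.0.0.1:8080; }
      }
    }
```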

Service exposure is implemented with Kubernetes Services (SVC). Most HTTP/gRPC services are exported as layer-4 services to ensure optimal performance, and the Service spec uses externalTrafficPolicy: Local so that client IP addresses are preserved instead of being SNATed.

This brings two benefits. First, performance is the best and most stable. Second, the back-end containers can obtain the real client IP address. A Service is defined for each application; once its IP address and port are exported, downstream services only need to access that address and port, regardless of how many back-end containers there are. We had also seen community feedback about stability issues with Service SNAT; let me talk about the solution we adopted.

We found this problem during the basic selection tests last December: in a single-concurrency short-connection stress test against an Nginx container behind an SVC, after about two minutes one or two requests would take 1 to 3 seconds and the client would receive 5xx response codes. Packet captures inside the container and on the host showed that socket 5-tuples overlapped after the host SNATed traffic to the container, causing TCP retransmissions. Our solution is to put a layer-4 LB in front of the Kubernetes SVC and use externalTrafficPolicy: Local for all exposed services.
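A minimal sketch of such a Service is below; the name, ports, and Service type are illustrative assumptions:

```yaml
# Sketch of exposing an HTTP/gRPC application as a layer-4 Service without SNAT.
apiVersion: v1
kind: Service
metadata:
  name: demo-api
spec:
  type: LoadBalancer              # fronted by a layer-4 LB
  externalTrafficPolicy: Local    # no SNAT; back-end containers see the real client IP
  selector:
    app: demo-api
  ports:
  - name: http
    port: 80
    targetPort: 8080
```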

Our QA and security engineers needed a way to take real online traffic and replicate it to the test container cluster; the security team also needs bypass listening and replication of online traffic. Our current internal version is v0.1-alpha, based on Envoy, and it has some drawbacks: it cannot amplify traffic, the extra Envoy proxy layer costs some performance, and it cannot replicate the full traffic, only the traffic of one or a few Pods.

This is a feature available since PaaS 0.1 that lets developers connect to containers and debug in real time. The experience is similar to the original ECS deployment: it is used to view real-time log output and process status online.

Distributed tasks are run on Kubernetes' native CronJobs and Jobs.
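A sketch of a scheduled task of this kind is shown below; the image, schedule, and command are assumptions for illustration:

```yaml
# Illustrative scheduled task run on native Kubernetes CronJobs.
apiVersion: batch/v1beta1       # batch/v1 on newer Kubernetes versions
kind: CronJob
metadata:
  name: log-offline-analysis
spec:
  schedule: "0 2 * * *"          # run once a day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: analyzer
            image: registry.example.com/log-analyzer:1.0
            command: ["/opt/app/run-analysis.sh"]
```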

For example, tuning the TCP listen backlog requires privileged container mode in order to configure system parameters for the container. Privileged containers can also run strace and similar diagnostic tools to analyze the performance of container processes.

Ruly PaaS's support for Kubernetes initContainers is where these security considerations meet performance optimization: a privileged container can modify the container's own system parameters (sysctl) and run strace-like diagnostic tools.

Running a business container in privileged mode is itself a risk, so we follow the principle of least privilege. Our approach is to execute the system parameter optimization script in the initContainers stage of the Pod; once it has run, the business container starts in this already-optimized environment and no longer needs privileged mode.
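A minimal sketch of this approach is below; the image, the specific sysctl values, and the names are illustrative, not our actual optimization script:

```yaml
# Only the short-lived init container is privileged; it tunes kernel parameters
# in the Pod's namespaces, and the business container then runs unprivileged.
apiVersion: v1
kind: Pod
metadata:
  name: demo-api
spec:
  initContainers:
  - name: sysctl-tuning
    image: busybox:1.29
    command: ["sh", "-c", "sysctl -w net.core.somaxconn=65535 net.ipv4.tcp_tw_reuse=1"]
    securityContext:
      privileged: true          # privileged only for the init step
  containers:
  - name: app
    image: registry.example.com/demo-api:1.0
    securityContext:
      privileged: false         # business container keeps least privilege
```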

This is the solution we adopted to accommodate existing self-registering services during containerization migration. These services rely on ZooKeeper or Spring Cloud Eureka for service discovery: when a service starts, it registers its IP address and port in the registry for downstream access. Because downstream businesses are not containerized immediately during the migration and still run on the original ECS hosts, the containers need to start in host network mode so that the business code does not have to change. In this mode, the IP addresses and ports that services register can be reached directly by downstream non-containerized services.
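A rough sketch of running such a self-registering service with the host network is below; the names and image are illustrative:

```yaml
# The Pod shares the host's IP and ports, so the address it registers in
# ZooKeeper/Eureka is directly reachable from non-containerized ECS services.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-rpc
spec:
  replicas: 2
  selector:
    matchLabels:
      app: legacy-rpc
  template:
    metadata:
      labels:
        app: legacy-rpc
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet   # keep cluster DNS resolution with hostNetwork
      containers:
      - name: legacy-rpc
        image: registry.example.com/legacy-rpc:1.0
```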

For hardware selection, most Kubernetes nodes are 16-core/32 GB virtual machines. As more back-end businesses are containerized, we are also using some 32-core/64 GB virtual machines.

The minimum target of the OS-level optimization is for a single machine to support millions of concurrent TCP connections, making full use of the kernel itself. We mainly tune kernel parameters in /etc/sysctl.conf and /etc/security/limits.conf, covering TCP, iptables, the ARP table, and various handle limits (net.nf_conntrack_max, net.core.somaxconn, fs.file-max, net.ipv4.neigh.default.gc_thresh*, etc.). In limits.conf we mainly raise the file handle (nofile) and process (nproc) limits, so that applications running as the www user inside containers (Node.js/Go/Java) can get maximum performance from the base configuration. We packaged these host operating system optimizations into a script and baked it into a unified host OS image.

We also produce base images ourselves (localization, such as setting the UTC+8 time zone). For performance optimization, the in-container optimization script is packaged into the base image so it can be executed easily in the initContainers stage. The base images for running Go/Node/Java businesses are customized from Alpine Linux 3.7 and 3.8.

There are other optimizations as well. For example, a Go program's TLS handshake requires a local CA certificate environment by default. The usual approach is to install the ca-certificates package, but then a TLS handshake in the containerized Go business scans all of the certificate files in that package, more than 30 file I/O operations. In practice only one CA certificate is needed, so we remove all of the other unused certificates from the base image.

CentOS 7.3 is mainly used for complex C++ services and PHP 7 business scenarios. We use the strip tool to trim the compiled binaries to keep the image as small as possible. This base image will be replaced with Alpine Linux later.

Currently, the native kube-dns is used for container DNS resolution. Initially we scaled it with the HPA, but automatic scale-in would increase service response time and cause dependent DNS lookups to time out. The current approach is kube-dns-autoscaler, which scales kube-dns according to the total number of CPU cores and Pods in the cluster. For tuning, we mainly increase the cache-size, set neg-ttl so that negative (failed) DNS results are cached, and increase the CPU/memory of the DNS processes.
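For reference, kube-dns-autoscaler is driven by a ConfigMap like the sketch below; the scaling values here are illustrative, as the real ones depend on cluster size:

```yaml
# Illustrative cluster-proportional autoscaling parameters for kube-dns.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns-autoscaler
  namespace: kube-system
data:
  linear: |-
    {
      "coresPerReplica": 256,
      "nodesPerReplica": 16,
      "min": 2,
      "preventSinglePointFailure": true
    }
```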

Automatic log cleanup is deployed as a DaemonSet. When a containerized service goes online, environment variables configured in the PaaS specify the log paths to clean and how long log files should be retained, and service logs are then cleared periodically.
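A rough sketch of this convention on a business container is below; the environment variable names are hypothetical, since the real convention is internal to Ruly PaaS, and the cleanup DaemonSet (not shown) reads these values to decide what to delete:

```yaml
# Hypothetical variable names used only to illustrate the convention.
apiVersion: v1
kind: Pod
metadata:
  name: demo-api
spec:
  containers:
  - name: demo-api
    image: registry.example.com/demo-api:1.0
    env:
    - name: CLEAN_LOG_PATH          # log directory the cleanup DaemonSet should scan
      value: /data/logs/demo-api
    - name: CLEAN_LOG_KEEP_DAYS     # retention in days
      value: "7"
```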

For Go, the runtime's MPG scheduler starts worker threads according to GOMAXPROCS. After containerization, however, Go sees the host's CPU core count rather than the cores actually allocated to the container, which can cause heavy CPU contention in some business scenarios, for example high iowait. Our fix is to set the GOMAXPROCS environment variable in the PaaS to the number of cores actually allocated to the container. This more than doubled the QPS of the business in question and cut the average response time by a third or more, and CPU iowait on the host dropped back to normal levels. For Java businesses, newer JDK 8 updates and JDK 10 provide related flags (such as -XX:+UnlockExperimentalVMOptions) that make the JVM aware of cgroup CPU/memory limits.
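A sketch of pinning GOMAXPROCS to the container's actual CPU allocation is below; the names and values are examples, not our production manifest:

```yaml
# Keep GOMAXPROCS in step with the CPU limit instead of the host's core count.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-api
spec:
  replicas: 4
  selector:
    matchLabels:
      app: go-api
  template:
    metadata:
      labels:
        app: go-api
    spec:
      containers:
      - name: go-api
        image: registry.example.com/go-api:1.0
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
        env:
        - name: GOMAXPROCS
          value: "4"         # matches the CPU limit above
```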

Business containerization cases

Some of our containerization cases: the API business is basically stateless and faces the app side. It includes the user center, location services, configuration services, and some gateway services, developed in Node/Go, with gRPC used internally to access the underlying dependent services.

The gRPC services sit behind the API services. With the existing ZooKeeper-based service discovery, containers must register with ZK after starting and de-register when they are destroyed. This requires a graceful exit from the business container, so the business code has to handle signals (SIGINT/SIGTERM). The de-registration logic is the same as in the previous ECS deployment, so no code changes are required after containerization.
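On the Kubernetes side, the relevant knob is the termination grace period: the Pod's main process receives SIGTERM on deletion and has that long to de-register from ZK before a SIGKILL. A sketch with illustrative names and values:

```yaml
# Grace period gives the business code time to handle SIGTERM and de-register from ZooKeeper.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grpc-user-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: grpc-user-service
  template:
    metadata:
      labels:
        app: grpc-user-service
    spec:
      terminationGracePeriodSeconds: 30   # time allowed between SIGTERM and SIGKILL
      containers:
      - name: grpc-user-service
        image: registry.example.com/grpc-user-service:1.0
```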

Our long-connection services include the app push service and the IoT lock gateway access-layer service. App push was fully containerized in early June; it includes the TCP access layer, the session layer, and the persistent storage layer at the bottom. Concurrent connections currently number in the millions, and the app push access layer is carried by three 4-core/8 GB containers. The distributed stress-test tool for app push is also deployed as containers.

Scheduled tasks use distributed offline computing to analyze logs: they read task shards from Redis, download the log files, and send the results to Kafka/HBase.

Real-time tasks and real-time log analysis consume from upstream message queues such as Kafka, process the messages in real time, and publish them to similar downstream message queue services. Other services are regular daemon back-end services that pull data, convert its format, and write it to another persistent store.

Conclusion

Since its launch, Ruly PaaS has handled about 90% of daily container management (online deployment, elastic scaling, monitoring, etc.), and the resource utilization of containerized hosts has roughly tripled. To cope with growing stateful businesses and the containerization of the enterprise microservice architecture, we will add stateful storage management, service mesh, and other advanced functions in future Ruly PaaS iterations. Thank you!

Wang Qiang / Head of ofo Container Cloud R&D

Over 14 years of development and architecture experience in the Internet, security, fintech, and other fields, focusing on the design and implementation of infrastructure services.
