China Minmetals and Alibaba have jointly launched Wuage, a professional steel service platform that brings a one-stop purchasing experience to end users by leveraging Alibaba’s strengths in big data, e-commerce platforms, and Internet product technology. This paper describes how we explored and practiced Docker container technology in the continuous delivery process. The operation and maintenance (O&M) technology team of Wuage has now opened release and deployment permissions to the owners of application development, realizing 24/7 “one-stop” continuous delivery and improving the delivery capability of the company’s R&D process as a whole.

Preface

As a DevOps engineer at a startup, I have run into the following problems:

  1. Low hardware resource utilization drives up costs. Service scenarios differ in their demands on compute, I/O, network, and memory, so centralized application deployment easily leads to unreasonable resource usage. For example, if the services deployed on a machine are all memory-intensive, its CPU resources are inevitably wasted.

  2. Applications on the same physical machine cannot be isolated from each other, so they contend for resources and interfere with one another. When one physical machine runs multiple applications, there is no way to limit their CPU, memory, or process usage; if one application preempts resources, it can set off a chain reaction that ultimately makes some site functions unavailable.

  3. Complex environments, weak version management, and the lack of a standardized release and deployment process make troubleshooting harder. Because the internal development process was not standardized, configuration items and system parameters were adjusted ad hoc while code was being tested or released; with incremental releases, once a problem appeared, the code under test and the code running online could be inconsistent, increasing the risk to online services and the difficulty of troubleshooting them.

  4. Unstable environments and high migration costs increase the risk of going online. During development, multiple projects proceed in parallel and services depend on one another; because environments and versions are so complex, an environment cannot be migrated or rebuilt quickly, the release process cannot be rehearsed in a test environment, and many engineers end up testing directly in the online environment, which carries high risk and lowers development efficiency.

  5. Traditional VMs and dedicated servers occupy a lot of space, start slowly, and are complex to manage. Provisioning them takes a long time, and various management problems arise along the way.

Based on Docker container technology, the O&M technology team built the Wuage website’s container cloud platform, on which 95% of application services have been containerized. These applications support on-demand business scaling within seconds, provide a user-friendly interaction process, standardize the release process for testing and production, and free developers and testers from basic environment configuration and release work so that they can focus on their own development and testing.

Drawing on the practice of the Wuage container cloud platform and Docker container technology, this paper introduces how we achieved 24/7 “one-stop” continuous delivery and brought products online. For more on the container cloud platform, see: https://zhuanlan.zhihu.com/idevops

Docker image standardization

Docker images are, as is well known, layered. Our conventions for image layering are:

  • The first layer is the operating system layer, made up of CentOS, Alpine, or another base image, with some general-purpose base components installed;

  • The second layer is the middleware layer, containing the middleware and dependency packages each application needs at runtime, such as Nginx and Tomcat;

  • The third layer is the application layer, which contains only the packaged application code.

    Figure 2: Docker image layering convention

Lesson learned: how do you make images smaller and push them faster?

  • When a Dockerfile builds an application image and middleware-layer software needs to be installed, use a package management tool (such as yum) or source packages fetched via git clone, so that the copying and installation of a package are controlled within the same layer; after the software is deployed successfully, remove the leftover RPM or source packages so the base image stays small.

  • JDK packages are not put into Java application images. The JDK is deployed on every host, and when the image runs, the host’s Java home directory is mounted into a specified directory in the container, because baking the JDK in would make the base image much larger;

  • When application images are built, Docker caches the first two layers and reuses them directly; only the application layer, whose code has changed, is rebuilt. This speeds up image builds and the push to the image repository after a successful build, improving deployment efficiency across the whole process (see the Dockerfile sketch after this list).
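To make the conventions above concrete, here is a minimal Dockerfile sketch, not our production build file; the base image, package, and artifact names are illustrative:

# Layer 1: operating system layer (base image plus general components)
FROM centos:7

# Layer 2: middleware layer. Install and clean up in the SAME layer (one RUN)
# so the removed packages do not linger in a lower layer and bloat the image.
RUN yum install -y nginx \
    && yum clean all \
    && rm -rf /var/cache/yum

# Note: no JDK here. The JDK lives on each host and is mounted at run time,
# e.g. docker run -v /usr/java:/usr/java ..., to keep the base image small.

# Layer 3: application layer, only the packaged code. Since only this layer
# changes between builds, the layers above stay cached and pushes stay fast.
COPY app.war /opt/app/app.war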

Orchestration management of containers

Selection of orchestration tools:

Rancher provides a graphical management interface and is simple and convenient to deploy; it integrates with AD, LDAP, and GitHub for user- or group-based access control; its orchestration engine can be quickly switched to Kubernetes or Swarm; and it is backed by a professional technical support team. All of this lowers the barrier to adopting container technology.

Based on the above advantages, we chose Rancher as the orchestration tool of our container cloud platform. When scheduling application container instances in a unified way, the docker-compose component can operate on multiple hosts at the same time, and when service traffic peaks and troughs, Rancher’s own rancher-compose.yml file invokes the “scale” feature to dynamically expand or shrink the application cluster, so applications can handle different request volumes on demand. https://zhuanlan.zhihu.com/p/29093407
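As a minimal sketch of this setup (the service name, image name, and scale value are illustrative, not our production settings):

# docker-compose.yml — the service definition handed to Rancher
search:
  image: registry.example.com/search:latest
  net: "host"

# rancher-compose.yml — Rancher-specific settings; `scale` is the number of
# container instances, raised or lowered to expand or shrink the cluster
search:
  scale: 4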

Container network model selection:

Because our back-end development is based on Alibaba’s HSF framework, producers and consumers must be reachable to each other on the network. This places high demands on the network and requires real IP addresses for registering and pulling services, so we chose the Host mode for the container network. During container startup, a script checks the host and assigns the container an independent port to avoid conflicts.
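A minimal sketch of such a startup check, assuming an illustrative port range and retry count:

#!/usr/bin/env python
# Pick a free port on the host for a host-network container.
import random
import socket

def port_in_use(port, host='127.0.0.1'):
    # connect_ex returns 0 when something is already listening on the port
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        return s.connect_ex((host, port)) == 0
    finally:
        s.close()

def allocate_port(low=20000, high=40000, attempts=100):
    # Retry random ports until an unused one is found
    for _ in range(attempts):
        port = random.randint(low, high)
        if not port_in_use(port):
            return port
    raise RuntimeError('no free port found in range %d-%d' % (low, high))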

Continuous integration and continuous deployment

  • Continuous integration: we monitor code submissions and integrate the code continuously. During integration we run unit tests, perform static scanning with Sonar and security tools, notify the developers of the results, and deploy to the integration environment; a successful deployment triggers automated testing (automated testing will be covered later at https://zhuanlan.zhihu.com/idevops).

    Figure 7: Continuous integration diagram

Static scanning results:

  • Continuous deployment: the ability to quickly deploy a package wherever it needs to go is very important. The platform is built and deployed in a distributed manner: a master manages multiple slave nodes, each of which belongs to a different environment. On the master we install and update plug-ins, create jobs, and manage permissions for the development teams; the slaves execute the jobs.

    Figure 9: Continuous deployment architecture diagram

Based on the architecture in Figure 9 above, we defined a standardized continuous deployment process:

(1) Developers submit code to GitLab;
(2) the build job pulls the project code and configuration files and compiles the application;
(3) it then pulls the base image, packs the compiled application package into a new application image, and pushes it to the image repository;
(4) a docker-compose.yml file is generated for the current application and its environment, and the rancher-compose command is executed against it to deploy the application image to the pre-release environment (the test environment before production, whose configuration and service dependencies are consistent with production);
(5) after the pre-release tests pass, the application image is deployed to the online environment and the test results are sent to the back-end test engineers.
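A simplified sketch of step (4), assuming an illustrative registry address, working directory, and stack naming convention rather than our actual job code (Rancher credentials are assumed to be set in the environment):

#!/usr/bin/env python
# Render a docker-compose.yml for the app/environment, then let
# rancher-compose deploy (or upgrade) the application image.
import os
import subprocess
from string import Template

COMPOSE_TMPL = Template('''\
${app}:
  image: registry.example.com/${app}:${tag}
  net: "host"
''')

def deploy(app, env, tag):
    workdir = '/tmp/%s_%s' % (app, env)
    if not os.path.isdir(workdir):
        os.makedirs(workdir)
    with open(os.path.join(workdir, 'docker-compose.yml'), 'w') as f:
        f.write(COMPOSE_TMPL.substitute(app=app, tag=tag))
    # -p names the Rancher stack; --upgrade replaces the running containers
    subprocess.check_call(
        ['rancher-compose', '-p', '%s-%s' % (app, env), 'up', '-d', '--upgrade'],
        cwd=workdir)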

Operation management of containers

Now that the application container has been deployed to an online environment, the following two issues need to be addressed throughout the lifecycle of the container:

(1) How to persist the run logs and other business logs generated by the application; (2) how Nginx can automatically detect changes in back-end services and update its configuration.

Log management

At runtime, the container creates a read-write layer on top of the read-only layers, and all of the application’s writes happen there. When the container is recreated, the data (including logs) in its read-write layer is erased as well. Although this can be solved by mounting the container’s log directory to the host, when containers drift frequently among multiple hosts, each host ends up holding only part of an application’s logs, which makes it harder for developers to view them and troubleshoot problems.

For these reasons, the log service platform serves as the log warehouse of the Wuage website, storing the logs generated by running applications in a unified manner and supporting a variety of queries.

On the log management interface you configure the log collection path; an agent deployed inside the container delivers application logs to the LogStore in a unified manner; and full-text indexing and word segmentation are configured in the LogStore so that engineers can search for the log content they need by keyword.

Lesson learned: how do you avoid repeated log collection?

  • In the agent’s JSON configuration file, specify an absolute path for the checkpoint file and mount that path to a directory on the host. This ensures the checkpoint file survives container restarts, so the same logs are not collected twice.
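For illustration, in compose terms the idea looks like the fragment below; the paths are assumptions, and the in-container checkpoint path is what the agent’s JSON configuration must point to:

# Fragment of a service definition: the in-container agent's checkpoint file
# lives on a host path, so a restarted container resumes from the last
# checkpoint instead of collecting old logs again.
search:
  image: registry.example.com/search:latest
  volumes:
    - /data/logagent/checkpoint/search:/var/lib/logagent/checkpoint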

Registration of services

etcd is a highly available, strongly consistent key-value store. It uses a tree structure similar to a file system, and all data starts with a slash. etcd holds two kinds of data: keys, each storing a single string value, and directories, which hold collections of keys or other subdirectories.

In the Wuage environment, each application service registered with etcd has a root directory named:

"/ ${APP_NAME} _ ${ENVIRONMENT}"Copy the code

Under this root directory, the key for each application instance is stored, named ${IP}-${PORT}.

The following figure shows the data structure of an application instance stored in etcd using the above conventions:

You can see that I used the GET method to query etcd for the search service (search) deployed in the pre-release environment (PRE). The root directory /search_PRE stores information for only one application instance: the instance’s key is 172.18.100.31-86 and its value is 172.18.100.31:86. The registration process is as follows:
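The same query can be reproduced with the python-etcd client used by the registration tool below (same etcd address as in that script):

#!/usr/bin/env python
# List every registered instance of the search service in the PRE environment.
import etcd

etcd_clt = etcd.Client(host='172.18.0.7')
for node in etcd_clt.read('/search_PRE', recursive=True).children:
    print node.key, '->', node.value  # /search_PRE/172.18.100.31-86 -> 172.18.100.31:86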

(1) The container application generates a random port in code, compares it against the ports already in use on the host, and writes it into the program’s configuration file once it is confirmed conflict-free;
(2) a service registration tool, written in Python with the etcd module, is integrated into the startup script, and the IP address and the random port obtained in the previous step are passed to it as parameters;
(3) once the application has fully started, the registration tool writes the application instance into the etcd cluster using the agreed data structure, completing service registration;
(4) the container periodically sends heartbeats to etcd to report liveness and refresh the TTL;
(5) the container’s script captures the SIGTERM signal that Rancher sends to the application instance and, on receiving it, sends a DELETE request to etcd to remove the instance’s data.

Note: active cleanup is added on top of the TTL mechanism, so that when a service is shut down normally, its registration information in etcd is cleared immediately rather than lingering until the TTL expires.

Lesson learned: when a container is restarted or unexpectedly destroyed, what happens between the container and the registry?

During registration, the application attaches a TTL timeout property to the key and value it carries. The TTL exists so that when an instance in the service cluster goes down, its registration information in etcd expires; if it were not cleared, the stale information would be kept as garbage data, and the configuration management tool would read it out as if it were normal data and write it into the web server’s configuration file. To ensure that the data stored in etcd is always valid, etcd must actively expire invalid instance information.

#!/usr/bin/env python
# Service registration tool: set, refresh, or delete an instance key in etcd,
# depending on how many arguments are passed on the command line.
import etcd
import sys

arg_l = sys.argv[1:]
etcd_clt = etcd.Client(host='172.18.0.7')

def set_key(key, value, ttl=10):
    try:
        return etcd_clt.write(key, value, ttl=int(ttl))
    except TypeError:
        print 'key or value is null'

def refresh_key(key, ttl=10):
    try:
        return etcd_clt.refresh(key, int(ttl))
    except TypeError:
        print 'key is null'

def del_key(key):
    try:
        return etcd_clt.delete(key)
    except TypeError:
        print 'key is null'

if arg_l:
    if len(arg_l) == 3:        # key value ttl -> register
        key, value, ttl = arg_l
        set_key(key, value, ttl)
    elif len(arg_l) == 2:      # key ttl -> heartbeat (refresh TTL)
        key, ttl = arg_l
        refresh_key(key, ttl)
    elif len(arg_l) == 1:      # key -> deregister
        key = arg_l[0]
        del_key(key)
    else:
        raise TypeError('Only three parameters are needed here')
else:
    raise Exception('args is null')
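Illustrative invocations from the container’s start, heartbeat, and stop scripts; the script path and the instance’s key/value are assumptions matching the example above:

# register the instance with a 10-second TTL (3 args -> set_key)
python /opt/scripts/etcd_tool.py /search_PRE/172.18.100.31-86 172.18.100.31:86 10
# periodic heartbeat: refresh the TTL (2 args -> refresh_key)
python /opt/scripts/etcd_tool.py /search_PRE/172.18.100.31-86 10
# on SIGTERM: deregister immediately (1 arg -> del_key)
python /opt/scripts/etcd_tool.py /search_PRE/172.18.100.31-86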

Discovery of services

confd is a lightweight configuration management tool. It supports etcd as a back-end data source and reads from it to keep local configuration files up to date; beyond that, it can check a configuration file’s syntax for validity after updating it and reload the application so the configuration takes effect. Note that although confd also supports Rancher as a data source, we chose etcd for ease of use and scalability.

As with most deployments, we run confd on the ECS instances that host the web servers, so that confd can update the configuration file and restart the program as soon as it detects a data change. The confd configuration and template files live under the default directory /etc/confd, with the following structure:

/etc/confd
├── conf.d
├── confd.toml
└── templates

confd.toml is confd’s main configuration file, written in TOML format. Because our etcd is deployed as a multi-node cluster and I did not want the confd command line to become long and unwieldy, options such as interval and nodes are written into this configuration file.
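A sketch of such a confd.toml; the node addresses and interval are illustrative:

# /etc/confd/confd.toml
backend = "etcd"
confdir = "/etc/confd"
interval = 5
nodes = [
  "http://172.18.0.7:2379",
  "http://172.18.0.8:2379",
  "http://172.18.0.9:2379",
]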

The conf.d directory stores the web server’s template resource files, also written in TOML. Each specifies the application’s template file path (src), the application configuration file path (dest), and the data source key information (keys).
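For example, a template resource for the search service might look like this (paths and commands are illustrative); check_cmd and reload_cmd are what give confd the syntax check and reload behavior described above:

# /etc/confd/conf.d/search.toml
[template]
src = "search.conf.tmpl"
dest = "/etc/nginx/conf.d/search.conf"
keys = ["/search_PRE"]
check_cmd = "/usr/sbin/nginx -t"
reload_cmd = "/usr/sbin/nginx -s reload"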

The templates directory houses each application’s template configuration file for the web server, written in the text/template language supported by Go. After confd reads the latest application registration information from etcd, it is written into the template configuration file with the following statement:

{{range getvs "/${APP_NAME}/*"}}
server {{.}};
{{end}}
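In context, this statement typically sits inside an nginx upstream block. For the search service in PRE (the upstream name is illustrative), the template would render one server line per registered instance, e.g. server 172.18.100.31:86; for the instance shown earlier:

upstream search {
    {{range getvs "/search_PRE/*"}}
    server {{.}};
    {{end}}
}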

The confd process is managed with Supervisor. Once running, confd polls etcd every 5 seconds. When the key/value of an application service is updated, confd reads the data stored in etcd, writes it into the template configuration file to generate the application’s configuration file, and finally writes that file to the target path and reloads the Nginx process for the configuration to take effect. (For the code, see: https://zhuanlan.zhihu.com/idevops)

Conclusion

This paper has described how we explored and practiced Docker container technology in the continuous delivery process. The O&M technology team of Wuage has now opened release and deployment permissions to the owners of application development, realizing 24/7 “one-stop” continuous delivery and improving the delivery capability of the company’s R&D process as a whole.

Next, we will continue to optimize the various scenarios encountered in the continuous delivery process and gradually improve the container cloud platform. At the same time, we will keep sharing the platform’s features and summarizing our experience and lessons, in the hope of providing some reference for your own work and helping you avoid stepping into the same pits.

About the author: Liu Xiaoming, O&M technical director of Wuage.com, has 10 years of experience in Internet development and operations. He has long been committed to developing O&M tools and promoting O&M expert services, enabling development teams and continuously improving R&D efficiency.

This article originally appeared in Programmer magazine.

