This article is published by NetEase Cloud.

Container technology has matured and been widely adopted in recent years. Docker, its best-known representative, is developing rapidly, and Docker-based applications are becoming common. Docker has also disrupted traditional operations practice: anyone building an operations platform now has to face and solve container-related problems. Operating Docker is a systematic undertaking, and monitoring, as a core part of any operations system, has to be considered as part of it. This article introduces an approach to automated monitoring of Docker containers, with the aim of helping teams build their Docker operations systems.

Containers

When it comes to containers, the first thing that comes to mind is LXC (Linux Containers), a kernel-level virtualization technology that virtualizes resources at the operating-system level. Some companies were already using LXC before Docker appeared, and container technology let them raise resource utilization and cut costs considerably. Using LXC directly is somewhat complicated, however, which set a barrier to entry for enterprises that wanted to adopt containers. Docker changed this. It wraps the complex low-level container machinery and greatly reduces the complexity of using it, which lowers that barrier. Docker provides a set of basic specifications and interfaces, and users who are familiar with them can work with containers easily. Docker has greatly accelerated the adoption of container technology and is even regarded as a de facto industry standard for containers.

Container monitoring

Containers differ from ordinary VMs in both the degree of virtualization and the way they are monitored. A virtual machine can be treated like a physical machine. A container can be treated like a virtual machine too, but doing so runs against the philosophy of containers. For monitoring purposes, we prefer to think of a container as a set of process trees on the host. Mainstream monitoring systems generally require an agent to be deployed on the target machine to collect data, but bundling an agent into a container image is generally discouraged. That does not mean the data cannot be collected: because of the way containers are virtualized, collecting data on the container host is entirely feasible and can be more efficient. Treating the container as a virtual machine and deploying an agent inside it also works, but it is not recommended. The Docker monitoring tools that have emerged in the industry, such as docker stats, cAdvisor, and Scout, also monitor containers from the host. The scheme proposed in this article starts from the host as well.




Common container monitoring problems

As Docker adoption has grown, many monitoring tools have appeared that can monitor Docker containers, and a monitoring system built on them can meet basic needs. A closer look at these tools, however, reveals two main problems.

1. Integration with the operations system

These tools are largely standalone and difficult to integrate with the other systems of an operations platform. As operations automation advances, the integration of the platform as a whole matters more and more, so monitoring needs a well-defined model that makes it easy to share data between systems.

2. Monitoring levels

These tools usually monitor only at the level of a single container: CPU, disk I/O, and so on. Most application architectures, however, tolerate the failure of individual nodes, so a problem on a single node often does not reflect the real state of the application. Monitoring therefore needs to cover more levels.

A model-based container monitoring scheme

Here we propose a model-based monitoring scheme. It makes it easier to integrate with the CMDB that underpins an operations system, and it supports monitoring at more levels. A monitoring system generally involves data collection, data storage, data analysis and alerting, and data display. The scheme described in this article is built on the following models:

1 Monitored object model

We use a product tree to model monitored objects. Monitored objects fall into four categories: products, applications, clusters, and nodes.
○ Product: a high-level concept. A product can be delivered independently and provides a service to the outside.
○ Application: a module of a product. A product is made up of multiple applications.
○ Cluster: a deployment of an application. An application can be deployed as multiple clusters, depending on environment and region.
○ Node: a resource that carries the service within a cluster, including the servers, VMs, and containers mentioned earlier.




With this structure, both data collection and view display can be organized around the hierarchy of monitored objects in the product tree. Each monitored object can define its own monitoring items or inherit those of the level above. The hierarchy keeps the monitoring structure organized and also shows the system's running state and problems at every level. For example, to monitor a Docker-based application named myDocker, we can set up the following monitoring model:
○ Product: my_Docker_product
○ Application: my_Docker_app
○ Cluster: my_Docker_cluster
○ Node: my_Docker_container
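
The product tree and the inheritance of monitoring items can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `MonitoredObject`, its fields, and the item names `cpu` and `disk_io` are hypothetical and do not come from the scheme itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MonitoredObject:
    """One level of the product tree: product, application, cluster, or node."""
    kind: str
    name: str
    parent: Optional["MonitoredObject"] = None
    own_items: List[str] = field(default_factory=list)  # hypothetical item names

    def effective_items(self) -> List[str]:
        """Items inherited from upper levels first, then this object's own."""
        inherited = self.parent.effective_items() if self.parent else []
        return inherited + [i for i in self.own_items if i not in inherited]


product = MonitoredObject("product", "my_Docker_product")
app = MonitoredObject("application", "my_Docker_app", product)
cluster = MonitoredObject("cluster", "my_Docker_cluster", app, own_items=["cpu"])
node = MonitoredObject("node", "my_Docker_container", cluster, own_items=["disk_io"])

print(node.effective_items())
```

Here the node picks up the `cpu` item defined on its cluster and adds its own `disk_io` item, which is the inheritance behaviour described above.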

2 Collector Model

The collector model is responsible for collecting data and must follow a data output specification. JSON is a good choice of format because it is easy to parse and displays structured data well. The data must match the corresponding data model semantically. A collector for a node can be a script: the monitoring system captures the script's output as the data for the corresponding data model. A collector for a higher-level object generally computes over the node-level data, using sum, avg, max, min, and so on, to produce aggregates for the nodes of a whole cluster. For example, a simple collector model looks like this:
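
As a rough illustration of the idea, and not the article's own listing, a node collector could be a small script that prints one JSON document per run. The JSON layout (`item`, `metrics`) and the sample values are assumptions made for this sketch.

```python
import json


def collect_cpu(sample: dict) -> str:
    """Wrap raw indicator values in a JSON document (hypothetical layout)."""
    payload = {
        "item": "cpu",
        "metrics": {name: float(value) for name, value in sample.items()},
    }
    return json.dumps(payload, sort_keys=True)


def parse_collector_output(text: str) -> dict:
    """What an agent might do with the captured output of the script."""
    doc = json.loads(text)
    if "item" not in doc or "metrics" not in doc:
        raise ValueError("collector output does not follow the specification")
    return doc


output = collect_cpu({"usr": 12.5, "sys": 3.0, "idle": 84.5})
print(output)
```

Keeping the collector's contract to "print JSON in the agreed layout" means a collector can be written in any language the host supports.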




3 Data Model

The data model defines the format of monitoring data. It consists of data items and indicators, and a data item generally contains one or more indicators. The data in a data model comes from the corresponding collector.




For example, CPU monitoring can use the following model:
○ Data item: cpu
○ Indicators: usr, sys, idle
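
A data model like this can be represented as a small schema that checks collector output before it is stored. The sketch below is an assumption about how that could look; `DataItem` and `validate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DataItem:
    """A data item and the indicators it is expected to carry."""
    name: str
    indicators: Tuple[str, ...]

    def validate(self, metrics: Dict[str, float]) -> Dict[str, float]:
        """Keep only the declared indicators; fail if one is missing."""
        missing = [i for i in self.indicators if i not in metrics]
        if missing:
            raise ValueError(f"{self.name}: missing indicators {missing}")
        return {i: float(metrics[i]) for i in self.indicators}


CPU = DataItem("cpu", ("usr", "sys", "idle"))
print(CPU.validate({"usr": 12.5, "sys": 3.0, "idle": 84.5}))
```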

4 Alarm rule model

Based on the data model, an alarm rule can be set for each indicator of a data item. For example, to trigger an alarm when CPU idle falls below 50%, we can define the rule: cpu.idle < 50
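
A rule written as `item.indicator operator threshold` is simple enough to parse and evaluate directly. The following sketch assumes that rule grammar and a nested dictionary of the latest values; the supported operators and the function names are choices made for the example.

```python
import operator
import re

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "==": operator.eq}
# item.indicator, a comparison operator, and a numeric threshold
_RULE = re.compile(r"^\s*(\w+)\.(\w+)\s*(<=|>=|==|<|>)\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_rule(text: str):
    """Split a rule such as 'cpu.idle < 50' into its four parts."""
    match = _RULE.match(text)
    if match is None:
        raise ValueError(f"unsupported rule: {text!r}")
    item, indicator, op, threshold = match.groups()
    return item, indicator, _OPS[op], float(threshold)


def is_triggered(rule_text: str, data: dict) -> bool:
    """Evaluate the rule against data shaped like {'cpu': {'idle': 42.0}}."""
    item, indicator, compare, threshold = parse_rule(rule_text)
    return compare(data[item][indicator], threshold)


print(is_triggered("cpu.idle < 50", {"cpu": {"idle": 42.0}}))
```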

5 View Model

The view model associates the data model with views. It contains presentation definitions such as trend charts and tables. By combining data items and indicators from the data model, it describes how specific indicators are displayed. Views attached to different monitored objects display monitoring at different levels. The view model is described in XML format as follows:
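
As a hedged sketch of what such an XML description might contain, the Python snippet below builds one with the standard library. The element and attribute names (`view`, `series`, `type`, `item`, `indicator`) are assumptions for illustration and are not the article's actual schema.

```python
import xml.etree.ElementTree as ET


def build_view(title: str, chart_type: str, item: str, indicators: list) -> str:
    """Serialize a view definition that plots the given indicators."""
    view = ET.Element("view", {"title": title, "type": chart_type})
    for indicator in indicators:
        ET.SubElement(view, "series", {"item": item, "indicator": indicator})
    return ET.tostring(view, encoding="unicode")


xml_text = build_view("CPU usage", "trend", "cpu", ["usr", "sys"])
print(xml_text)
```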



This model describes a CPU trend chart based on the usr and sys indicators. The following is an example:




6 Monitoring item model

A monitoring item bundles a collector model, a data model, an alarm rule model, and a view model. Monitoring items are applied to monitored objects, so that each monitored object is monitored according to its own custom model.

Container monitoring overall architecture

With the models in place, the remaining problems are delivering monitoring items, collecting data, analyzing data and raising alarms, and storing data. Here we introduce a distributed monitoring framework that ties the models together. The framework is shown below:



The basic functions of each module are as follows:
○ Agent: collects monitoring data on nodes.
○ Master: the control center for Agents; it delivers monitoring item configuration to them.
○ Monitor: receives the monitoring data collected by Agents and writes it to the Kafka message queue.
○ Analyser: subscribes to messages from Kafka, then analyzes and processes the data, stores it, and raises alarms.
○ Web: manages the monitoring models and displays views.
○ Kafka: the message queue; it buffers collected data for other modules to subscribe to.
○ DB/HBase: stores model configuration and monitoring data.
This is a common architecture for model-based monitoring and is relatively easy to integrate with an operations system. We can use it to implement container monitoring.
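
The data flow between Agent, Monitor, Kafka, and Analyser can be imitated in a single process to make the roles concrete. In this sketch a `queue.Queue` stands in for the Kafka topic and no Kafka client is involved; the function names, the message layout, and the `cpu.idle` threshold are all assumptions.

```python
import json
import queue

bus = queue.Queue()  # in-memory stand-in for the Kafka topic


def agent_collect(node: str, idle: float) -> dict:
    """Agent: produce one sample for a node (the value is supplied by the caller)."""
    return {"node": node, "item": "cpu", "metrics": {"idle": idle}}


def monitor_receive(sample: dict) -> None:
    """Monitor: accept a sample from an Agent and publish it to the queue."""
    bus.put(json.dumps(sample))


def analyser_run(threshold: float = 50.0) -> list:
    """Analyser: drain the queue and report nodes whose cpu.idle is too low."""
    alarms = []
    while not bus.empty():
        sample = json.loads(bus.get())
        idle = sample["metrics"]["idle"]
        if idle < threshold:
            alarms.append(f'{sample["node"]}: cpu.idle={idle}')
    return alarms


monitor_receive(agent_collect("container_1", 42.0))
monitor_receive(agent_collect("container_2", 90.0))
alarms = analyser_run()
print(alarms)
```

Placing a queue between Monitor and Analyser decouples collection from analysis, which is the role Kafka plays in the architecture above.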

Collect container monitoring data

Data collection is where Docker monitoring differs most from the implementation of an ordinary monitoring system. Because no data collection agent runs inside a Docker container, the data cannot be collected by an agent in the container itself.

1. Node data

On the container host we can obtain a good deal of basic data about containers. The usual methods are as follows.

Using the Docker command

The docker stats command is relatively simple to use, but the data it provides is not comprehensive. Its output looks like this.
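
For a collector, machine-readable output is easier to handle than the default table. The sketch below assumes the Docker CLI is invoked as `docker stats --no-stream --format "{{json .}}"`, which prints one JSON object per container, and that the objects carry `Name`, `CPUPerc`, and `MemPerc` fields; check these field names against the Docker version in use. The sample line is made up for the example.

```python
import json
import subprocess


def parse_stats_line(line: str) -> dict:
    """Convert one JSON line of docker stats output into numeric values."""
    raw = json.loads(line)
    return {
        "name": raw["Name"],
        "cpu_percent": float(raw["CPUPerc"].rstrip("%")),
        "mem_percent": float(raw["MemPerc"].rstrip("%")),
    }


def read_docker_stats() -> list:
    """Run the Docker CLI on the host; defined here but not called below."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{json .}}"],
        check=True, capture_output=True, text=True,
    )
    return [parse_stats_line(l) for l in result.stdout.splitlines() if l.strip()]


# Hypothetical sample line, so the parser can be tried without Docker.
sample = '{"Name": "my_Docker_container", "CPUPerc": "1.25%", "MemPerc": "3.50%"}'
parsed = parse_stats_line(sample)
print(parsed)
```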




Based on the Linux file system

This method is recommended and performs well. On Linux, system directories such as /proc and /sys record useful monitoring data. Most system-level and process-level runtime data, including CPU and disk I/O, can be obtained there. For example, the CPU usage of a process can be calculated in the following way.
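
One common way to do this, sketched here as an assumption and not necessarily the calculation the article had in mind, is to sample /proc/<pid>/stat and /proc/stat twice and divide the process's CPU time delta by the system-wide delta. The functions take file contents as strings so they can be tried without a live process; the sample strings are made up.

```python
def process_ticks(stat_text: str) -> int:
    """utime + stime (fields 14 and 15 of /proc/<pid>/stat), in clock ticks."""
    # Field 2, the command name, is parenthesized and may contain spaces,
    # so only the text after the last ')' is split. Field 3 becomes index 0.
    rest = stat_text.rsplit(")", 1)[1].split()
    return int(rest[11]) + int(rest[12])


def total_ticks(proc_stat_text: str) -> int:
    """Sum of the first eight counters on the aggregate 'cpu' line of /proc/stat."""
    fields = proc_stat_text.splitlines()[0].split()
    return sum(int(value) for value in fields[1:9])


def cpu_percent(proc_before: int, proc_after: int,
                total_before: int, total_after: int) -> float:
    """Share of total CPU time (all cores together) used between two samples."""
    delta_total = total_after - total_before
    if delta_total <= 0:
        return 0.0
    return 100.0 * (proc_after - proc_before) / delta_total


stat_1 = "1234 (my app) S 1 1234 1234 0 -1 4194304 100 0 0 0 50 25 0 0 20 0 1 0 100"
stat_2 = "1234 (my app) S 1 1234 1234 0 -1 4194304 100 0 0 0 80 45 0 0 20 0 1 0 100"
cpu_1 = "cpu  100 0 50 800 10 0 5 0 0 0\ncpu0 100 0 50 800 10 0 5 0 0 0"
cpu_2 = "cpu  300 0 150 1000 10 0 5 0 0 0\ncpu0 300 0 150 1000 10 0 5 0 0 0"

usage = cpu_percent(process_ticks(stat_1), process_ticks(stat_2),
                    total_ticks(cpu_1), total_ticks(cpu_2))
print(usage)
```

For a container, the same calculation can be summed over the processes that belong to it, in line with treating a container as a set of process trees on the host.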




2. Cluster data

Cluster data is computed from the raw data of each node. It is an aggregation that typically uses operations such as SUM and AVG.
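
The aggregation step can be sketched as a function over the latest sample of every node in a cluster. The dictionary layout and the function name `aggregate` are assumptions for this example.

```python
from statistics import mean

AGGREGATES = {"sum": sum, "avg": mean, "max": max, "min": min}


def aggregate(node_samples: dict, indicator: str, how: str) -> float:
    """Combine one indicator across all nodes of a cluster."""
    values = [s[indicator] for s in node_samples.values() if indicator in s]
    if not values:
        raise ValueError(f"no node reported {indicator!r}")
    return AGGREGATES[how](values)


# Hypothetical node-level CPU samples keyed by container name.
nodes = {
    "container_1": {"usr": 10.0, "idle": 80.0},
    "container_2": {"usr": 30.0, "idle": 60.0},
}
print(aggregate(nodes, "usr", "sum"), aggregate(nodes, "idle", "avg"))
```

The same function can be applied one level up, with cluster results as input, to obtain application and product data.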

3. Application and product data

Similarly, application and product data can be computed from the data of their child objects.

Automation of monitoring

Because of the nature of containers, they are created and destroyed frequently. How the monitoring system detects a newly started container, and which data models should be collected for it, are the questions that monitoring automation has to answer.

1. Automatic discovery of containers

The creation, stopping, and destruction of containers can all be detected on the host, from the following directory. Depending on how Docker is installed and configured, or on the file system Docker uses, the exact directories may differ, but the discovery strategy is much the same.
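
One possible discovery strategy is to poll a directory that holds one subdirectory per container and compare successive listings. The sketch assumes such a directory exists; with a default Docker installation, /var/lib/docker/containers is commonly laid out this way, but treat the path as an assumption and pass the right one for the host. The demonstration runs against a temporary directory with made-up IDs.

```python
import os
import tempfile


def list_container_ids(containers_dir: str) -> set:
    """Names of the subdirectories, assumed to be one per container."""
    return {
        entry for entry in os.listdir(containers_dir)
        if os.path.isdir(os.path.join(containers_dir, entry))
    }


def diff_containers(previous: set, current: set) -> tuple:
    """Return (created, removed) container IDs between two polls."""
    return current - previous, previous - current


# Demonstration against a temporary directory instead of a real Docker host.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "aaa111"))
    before = list_container_ids(root)
    os.mkdir(os.path.join(root, "bbb222"))
    os.rmdir(os.path.join(root, "aaa111"))
    created, removed = diff_containers(before, list_container_ids(root))

print(created, removed)
```

Directory polling of this kind notices creation and removal. Detecting a container that has stopped but not been removed would need an additional signal, such as the container's state file or its cgroup.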




2. Automatically associate containers with monitored objects

As nodes, containers must be associated with a cluster before they can be brought into the monitoring system. We can do this automatically by matching the image name against the cluster name. The configuration file in the following container directory holds the details of the container, including Image, the name of the image the container uses.



Once a container is associated with a cluster, the cluster's monitoring items can be configured for it automatically. After the master delivers the configuration to the Agent on the container host, the Agent can collect and report data for the container, and the container is monitored automatically.
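
The association step described above can be sketched as reading the image name from the container's configuration file and looking it up in a mapping. The file name `config.v2.json` and the `Config.Image` key reflect what recent Docker versions write into each container directory, but both are assumptions here, as are the mapping and the function names.

```python
import json
import os
import tempfile


def read_image_name(container_dir: str, filename: str = "config.v2.json") -> str:
    """Read the image name from a container's configuration file."""
    with open(os.path.join(container_dir, filename), encoding="utf-8") as fh:
        return json.load(fh)["Config"]["Image"]


def match_cluster(image: str, image_to_cluster: dict):
    """Map an image name to a cluster, ignoring the tag if one is present."""
    # A ':' in the last path segment separates the tag; a ':' earlier on
    # belongs to a registry port. Digest references are not handled here.
    last_segment = image.rsplit("/", 1)[-1]
    name = image.rsplit(":", 1)[0] if ":" in last_segment else image
    return image_to_cluster.get(name)


# Demonstration with a temporary directory and a hypothetical mapping.
mapping = {"my_docker_app": "my_Docker_cluster"}
with tempfile.TemporaryDirectory() as container_dir:
    with open(os.path.join(container_dir, "config.v2.json"), "w", encoding="utf-8") as fh:
        json.dump({"Config": {"Image": "my_docker_app:1.0"}}, fh)
    cluster_name = match_cluster(read_image_name(container_dir), mapping)

print(cluster_name)
```

A container whose image has no entry in the mapping yields `None`, which the Agent could report as unassociated instead of guessing a cluster.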

Conclusion

This article presents a model-based container monitoring scheme. By modeling both the monitored objects and the monitoring process, it drives the whole monitoring workflow from the models, and the main implementation approach is described. Compared with existing container monitoring implementations, the scheme offers better flexibility and extensibility. By refining and extending the models, Docker container monitoring can be integrated into an existing monitoring and operations system fairly easily. A monitoring system is itself complex. Many details of the scheme have not been fully worked out here, and the models may have limitations and shortcomings that need to be improved step by step. We hope this article offers a useful reference for readers who are developing monitoring systems or building operations platforms.

