Container Cluster Management

  • Orchestration
  • Scheduling
  • Access management
  • Infrastructure management
      • Computing resources
      • Network resources
      • Storage resources

Relying on its good design concepts and abstractions, K8s has attracted more and more developers to its community, and the number of companies running K8s as an infrastructure service is also gradually increasing.

In terms of design philosophy, only the API Server in K8s communicates with etcd (the storage layer); the other components maintain state in memory and persist data through the API Server. Management components are triggered in a level-based rather than edge-based manner, acting on the difference between the current state and the desired state of resources. K8s adopts a layered design: on top of a set of abstract interfaces, different plug-ins can be used to meet different needs.

On the abstraction side, different workloads serve different kinds of applications, for example Deployment for stateless applications and StatefulSet for stateful applications. In terms of access management, Service decouples service consumers and providers within the cluster, while Ingress provides access management from outside the cluster to inside it.

Although K8s has good design concepts and abstractions, its steep learning curve and incomplete documentation greatly increase the difficulty of application development.

Based on the author's development practice, this article takes MySQL on K8s as an example to describe how to develop highly reliable applications on top of K8s, abstracting the best practices as far as possible to reduce the cost of developing highly reliable K8s-based applications.

MySQL on K8s

The design and development of applications cannot be separated from business requirements. The requirements for MySQL applications are as follows:

  1. High data reliability
  2. High service availability
  3. Easy to use
  4. Easy operations

To meet the above requirements, K8s and the application need to work together; in other words, developing a highly reliable application on K8s requires both K8s-related knowledge and knowledge of the application domain.

The solutions are analyzed below based on the above requirements.

1. High data reliability

The reliability of data generally depends on the following aspects:

  • Redundancy
  • Backup/Restore

We use Percona XtraDB Cluster as the MySQL cluster solution. It is a multi-master MySQL architecture that synchronizes data between instances in real time based on Galera replication. This clustering solution avoids the data loss that can occur during master/slave switchover in a master-slave cluster, further improving data reliability.

For backup, we use Xtrabackup as the backup/recovery solution to achieve hot backup of data, so backups do not affect users' normal access to the cluster.

In addition to providing scheduled backup, we also provide manual backup to meet service data backup requirements.

2. High service availability

Here we analyze two links: the data link and the control link.

The "data link" is the link through which users access the MySQL service. We use a MySQL cluster of three master nodes and provide access to users through TLB (Qiniu's self-developed layer-4 load balancing service). TLB not only load-balances access across MySQL instances but also performs health checks on the service, automatically removing abnormal nodes and adding them back once they recover. As shown in the diagram below:

Based on the above MySQL cluster solution and TLB, the failure of one or two nodes does not affect users’ normal access to the MySQL cluster, ensuring the high availability of MySQL services.

The control link is the management link of the MySQL cluster. It is divided into two layers:

  • Global control management
  • Control management of each MySQL cluster

Global control management is responsible for creating/deleting clusters and managing the state of all MySQL clusters; it is implemented based on the Operator concept. Each MySQL cluster has its own controller, which is responsible for "task scheduling", "health checks", and "automatic fault handling" of that cluster.

This decomposition delegates the management of each cluster to the cluster itself, reducing interference between clusters on the control link and relieving the pressure on the global controller. As shown in the diagram below:

Here is a brief introduction to the concept and implementation of Operator.

Operator is a concept developed by CoreOS for creating, configuring, and managing complex applications. It consists of two parts:

Resource

  • A custom resource
  • Provides a simple way for users to describe their expectations of the service

Controller

  • Creates the Resource
  • Listens for Resource changes to fulfill the user's expectations of the service

The working process is as follows:

That is:

  1. Register a CustomResource (CR) type
  2. Listen for changes to CR objects
  3. The user performs CREATE, UPDATE, or DELETE operations on the CR
  4. The corresponding handler is triggered for processing
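
As an illustration of steps 2–4, here is a minimal Go sketch using client-go's dynamic informer; the group/version/resource names (mysql.qiniu.com, v1alpha1, qiniumysqls) and the kubeconfig path are hypothetical, and registration of the CRD itself (step 1) is assumed to have been done separately:

```go
package main

import (
	"log"
	"time"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load cluster access configuration (the path is an assumption for this sketch).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		log.Fatal(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical GVR for the custom resource; the CRD (step 1) is assumed
	// to have been registered already.
	gvr := schema.GroupVersionResource{
		Group:    "mysql.qiniu.com",
		Version:  "v1alpha1",
		Resource: "qiniumysqls",
	}

	// Step 2: listen for changes to CR objects.
	factory := dynamicinformer.NewDynamicSharedInformerFactory(client, 30*time.Second)
	informer := factory.ForResource(gvr).Informer()

	// Steps 3-4: user CREATE/UPDATE/DELETE operations on the CR trigger the
	// corresponding handler.
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			cr := obj.(*unstructured.Unstructured)
			log.Printf("create cluster for CR %s", cr.GetName())
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			cr := newObj.(*unstructured.Unstructured)
			log.Printf("reconcile cluster for CR %s", cr.GetName())
		},
		DeleteFunc: func(obj interface{}) {
			cr := obj.(*unstructured.Unstructured)
			log.Printf("delete cluster for CR %s", cr.GetName())
		},
	})

	stopCh := make(chan struct{})
	factory.Start(stopCh)
	cache.WaitForCacheSync(stopCh, informer.HasSynced)
	select {} // block forever; a real Operator would also run its workers here
}
```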

Based on our practice, we abstract the development of an Operator as follows:

On the basis of the above abstraction, Qiniu provides a simple Operator framework, which transparently creates the CR and listens for CR events, making Operator development much easier.

We developed MySQL Operator and MySQL Data Operator to handle “create/delete cluster” and “manual backup/restore” tasks, respectively.

Because each MySQL cluster has multiple kinds of task logic, such as "data backup", "data recovery", "health checks", and "automatic fault handling", running these logics concurrently may cause exceptions, so a task scheduler is needed to coordinate their execution; the Controller plays this role:

Through the Controller and its various workers, each MySQL cluster achieves self-operation and self-maintenance.
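
To illustrate the idea (this is an assumed structure, not the actual Controller code), the sketch below funnels all per-cluster tasks into a single queue consumed by one goroutine, so that backup, recovery, and fault-handling logic never run concurrently for the same cluster:

```go
package main

import (
	"fmt"
	"time"
)

// Task is one unit of per-cluster work, e.g. "data backup" or "health check".
type Task struct {
	Name string
	Run  func() error
}

// Controller serializes all task logic of a single MySQL cluster so that
// mutually conflicting tasks (backup, recovery, fault handling, ...) never
// execute at the same time.
type Controller struct {
	tasks chan Task
}

func NewController(queueSize int) *Controller {
	c := &Controller{tasks: make(chan Task, queueSize)}
	go c.loop()
	return c
}

// Submit enqueues a task; callers (workers, health checkers) never run the
// task themselves.
func (c *Controller) Submit(t Task) {
	c.tasks <- t
}

// loop is the single scheduling goroutine: tasks run strictly one at a time.
func (c *Controller) loop() {
	for t := range c.tasks {
		if err := t.Run(); err != nil {
			fmt.Printf("task %q failed: %v\n", t.Name, err)
		}
	}
}

func main() {
	c := NewController(16)
	c.Submit(Task{Name: "data backup", Run: func() error {
		fmt.Println("running backup...")
		return nil
	}})
	c.Submit(Task{Name: "health check", Run: func() error {
		fmt.Println("checking instance health...")
		return nil
	}})
	time.Sleep(time.Second) // give the demo tasks time to finish
}
```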

In terms of “health detection”, we have implemented two mechanisms:

  • Passive check
  • Active check

A passive check is when each MySQL instance reports its health status to the Controller; an active check is when the Controller requests the health status of each MySQL instance. The two mechanisms complement each other, improving the reliability and timeliness of health checks.

The health check data is used by both the Controller and the Operator, as shown in the following figure:

The Controller uses health check data to detect and troubleshoot MySQL cluster exceptions in a timely manner. Therefore, the Controller needs accurate and timely health status information. It maintains the state of all MySQL instances in memory, updating the instance state based on the results of “active” and “passive” checks and processing accordingly.

The Operator uses the health check data to report the running status of the MySQL cluster to the outside world and to intervene in the troubleshooting of the MySQL cluster when the Controller is abnormal.

In practice, health checks run at a relatively high frequency, producing a large number of health states. If every health state were persisted, both the Operator and the API Server would come under heavy access pressure. Since only the latest health states matter, the Controller inserts the health states to be reported to the Operator into a fixed-capacity queue; when the queue is full, the oldest health states are discarded.
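
As an illustration only (not the actual Controller code), a fixed-capacity queue that always keeps the newest health states could be sketched in Go as follows:

```go
package main

import "fmt"

// HealthState is a single health report from a MySQL instance.
type HealthState struct {
	Instance string
	Healthy  bool
}

// boundedQueue keeps at most cap(items) of the newest health states;
// when full, the oldest entry is discarded.
type boundedQueue struct {
	items chan HealthState
}

func newBoundedQueue(capacity int) *boundedQueue {
	return &boundedQueue{items: make(chan HealthState, capacity)}
}

// Push inserts a state, evicting the oldest one if the queue is full.
func (q *boundedQueue) Push(s HealthState) {
	for {
		select {
		case q.items <- s:
			return
		default:
			// Queue is full: drop the oldest state and retry.
			select {
			case <-q.items:
			default:
			}
		}
	}
}

// Pop returns the oldest remaining state, or false if the queue is empty.
func (q *boundedQueue) Pop() (HealthState, bool) {
	select {
	case s := <-q.items:
		return s, true
	default:
		return HealthState{}, false
	}
}

func main() {
	q := newBoundedQueue(2)
	q.Push(HealthState{Instance: "mysql-0", Healthy: true})
	q.Push(HealthState{Instance: "mysql-1", Healthy: true})
	q.Push(HealthState{Instance: "mysql-2", Healthy: false}) // evicts mysql-0
	for s, ok := q.Pop(); ok; s, ok = q.Pop() {
		fmt.Println(s.Instance, s.Healthy)
	}
}
```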

When the Controller detects that the MySQL cluster is abnormal, it automatically handles the fault.

We defined the following fault-handling principles:

  • Do not lose data
  • Affect availability as little as possible
  • Automatically handle known faults that can be handled
  • Do not automatically handle unknown faults or faults that cannot be handled; leave them to manual intervention

During fault handling, the key problems are:

  • What types of faults exist
  • How to detect and sense faults in a timely manner
  • Whether a fault has currently occurred, and which type of fault it is

To address the above key issues, we have defined three levels of cluster state:

Green

  • Available for external service
  • The number of running nodes meets expectations

Yellow

  • Available for external service
  • The number of running nodes does not meet expectations

Red

  • No external service

For each mysqld node, the following states are defined:

Green

  • The node is running
  • The node is in the MySQL cluster

Yellow

  • The node is running
  • The node is not in the MySQL cluster

Red-clean

  • The node exited gracefully

Red-unclean

  • The node exited ungracefully

Unknown

  • The node status is unknown

After collecting the status of all MySQL nodes, the Controller calculates the status of the MySQL cluster from these node states. If the cluster status is not Green, the fault-handling logic is triggered: known fault types are handled automatically according to the corresponding remediation plan, while unknown fault types are escalated for manual handling. The whole process is shown as follows:
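
As an illustration of this calculation, the following Go sketch derives the cluster state from the node states defined above; the type names and the three-node expectation are assumptions for the example, not the actual Controller implementation:

```go
package main

import "fmt"

// NodeState follows the per-mysqld-node states defined above.
type NodeState int

const (
	NodeGreen      NodeState = iota // running and in the MySQL cluster
	NodeYellow                      // running but not in the MySQL cluster
	NodeRedClean                    // exited gracefully
	NodeRedUnclean                  // exited ungracefully
	NodeUnknown                     // status unknown
)

// ClusterState follows the three cluster levels defined above.
type ClusterState string

const (
	ClusterGreen  ClusterState = "Green"  // serving, node count as expected
	ClusterYellow ClusterState = "Yellow" // serving, node count below expectation
	ClusterRed    ClusterState = "Red"    // not serving
)

// clusterStateOf derives the cluster state from the states of all nodes.
// expected is the number of nodes the cluster should be running (three in
// the multi-master setup described above).
func clusterStateOf(nodes []NodeState, expected int) ClusterState {
	running := 0
	for _, n := range nodes {
		if n == NodeGreen {
			running++
		}
	}
	switch {
	case running == 0:
		return ClusterRed // no serving node: no external service
	case running < expected:
		return ClusterYellow // serving, but fewer nodes than expected
	default:
		return ClusterGreen
	}
}

func main() {
	nodes := []NodeState{NodeGreen, NodeGreen, NodeRedUnclean}
	state := clusterStateOf(nodes, 3)
	fmt.Println("cluster state:", state)
	if state != ClusterGreen {
		fmt.Println("trigger fault-handling logic")
	}
}
```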

Because each application has different fault scenarios and solutions, the specific troubleshooting methods are not described here.

3. Easy to use

We implemented a highly reliable MySQL service based on the Operator concept and defined two kinds of resources for users: QiniuMySQL and QiniuMySQLData. The former describes the user's configuration of a MySQL cluster, and the latter describes a manual data backup/recovery task. QiniuMySQL is used as an example below.

You can use the following simple YAML files to trigger the creation of MySQL clusters:

After a cluster is created, you can view the status field of the CR Object to obtain the cluster status:

Here’s another concept: Helm.

Helm is a package management tool for K8s that standardizes the delivery, deployment, and use of K8s applications by packaging them as Charts.

A Chart is essentially a collection of K8s YAML files and parameter files, so an application can be delivered as a single Chart. With Helm, a Chart can be used to deploy and upgrade an application with one click.

Because of space constraints, and because Helm operations are fairly generic, the specific usage process is not described here.

4. Easy operations

In addition to the "health check" and "automatic fault handling" mechanisms described above and the Helm-managed delivery and deployment of applications, the following issues also need to be considered during operations and maintenance:

  • Monitoring/alerting
  • Log management

We use Prometheus + Grafana for monitoring and alerting. Services expose metric data through an HTTP API, and the Prometheus Server pulls the metrics periodically. Developers visualize the monitoring data from Prometheus in Grafana and, based on their understanding of the monitoring charts and the application, set alert thresholds on the graphs; the alerts themselves are triggered by Grafana.
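
As a sketch of the metric-exposure side (using the standard Prometheus Go client rather than our actual service code, with an illustrative metric name and port), a service can expose its metrics over HTTP like this:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// backupTotal is an example counter; the metric name is illustrative only.
var backupTotal = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "mysql_backup_total",
	Help: "Total number of MySQL backups performed.",
})

func main() {
	prometheus.MustRegister(backupTotal)

	// Expose all registered metrics over HTTP; the Prometheus Server
	// scrapes this endpoint periodically.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9104", nil))
}
```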

Visualizing the monitoring data before configuring alerts in this way greatly deepens the understanding of the application's runtime characteristics, helps identify the metrics and alert thresholds that deserve attention, and reduces the number of invalid alerts.

During development, our services communicate with each other via gRPC. In the gRPC ecosystem, the open source go-grpc-prometheus project makes it possible to monitor all RPC requests handled by a gRPC server by adding just a few lines of code to the service.
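
A minimal sketch of wiring go-grpc-prometheus into a gRPC server might look like the following (ports are placeholders and service registration is omitted):

```go
package main

import (
	"log"
	"net"
	"net/http"

	grpc_prometheus "github.com/grpc-ecosystem/go-grpc-prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"google.golang.org/grpc"
)

func main() {
	// Attach the go-grpc-prometheus interceptors so every RPC is measured.
	server := grpc.NewServer(
		grpc.UnaryInterceptor(grpc_prometheus.UnaryServerInterceptor),
		grpc.StreamInterceptor(grpc_prometheus.StreamServerInterceptor),
	)

	// Register your gRPC services here, then initialize the metrics.
	grpc_prometheus.Register(server)

	// Expose the collected RPC metrics for Prometheus to scrape.
	go func() {
		http.Handle("/metrics", promhttp.Handler())
		log.Fatal(http.ListenAndServe(":9090", nil))
	}()

	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatal(err)
	}
	log.Fatal(server.Serve(lis))
}
```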

For containerized services, log management includes log collection and log rotation.

We print service logs to syslog and then forward the syslog output to the container's stdout/stderr, so that logs can be collected externally in the usual way. In addition, logrotate is configured for syslog to rotate logs automatically, preventing logs from filling up the container's disk space and causing service exceptions.

To improve development efficiency, we use https://github.com/phusion/baseimage-docker as the base image, which has built-in syslog and logrotate services. The application only needs to write its logs to syslog and does not have to worry about log collection or log rotation.

Summary

Based on the above description, the complete MySQL application architecture is as follows:

In the process of developing highly reliable MySQL applications on K8s, as our understanding of K8s and MySQL deepened, we kept abstracting and gradually implemented the following general logic and best practices as modules:

  • Operator development framework
  • Health check service
  • Automatic fault handling service
  • Task scheduling service
  • Configuration management service
  • Monitoring service
  • Log service
  • Etc.

With these common logics and best practices modularized, developers can quickly "build" the K8s-related interactions when developing new highly reliable K8s-based applications, and those applications are highly reliable from the start because they already apply the best practices. At the same time, developers can shift their attention from the steep K8s learning curve to the application itself, enhancing the reliability of the service from the application side.
