I. Background

Recently, I introduced DevOps (the integration of development, operations, and maintenance) at my company and migrated the original Docker Swarm cluster to a K8S cluster. Because health checks are configured differently in the original cluster and the new one, here is a summary.

II. Principle and Necessity

An application health check, as the name implies, inspects the current state of the application (including its database, cache, and socket connections) so that the cluster scheduler can tell whether the application is alive and act on it, for example by restarting it or rescheduling it to another node. This matters especially for applications with high stability requirements (where business interruption must be avoided) and for flexible deployment mechanisms (such as rolling releases, which do not interrupt the application during deployment and therefore do not affect business usage).

III. Practice

1. The original Docker Swarm cluster's health check worked by specifying the Docker health check address in the Dockerfile, as follows:

# Health check
HEALTHCHECK --interval=120s --timeout=5s CMD curl --fail http://localhost:8080/api/pub… || exit 1

2. Logic of the application's /api/public/health/check endpoint

(1) Controller



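If the check fails, the controller must return an error status code (400 or above, for example 503) in the HttpServletResponse so that the cluster treats the HTTP call as failed. A status that is merely not 200 is not enough: both curl --fail and K8S HTTP probes accept any code from 200 to 399 as success.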
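As a rough illustration of the controller behaviour described above, here is a minimal sketch of an endpoint that answers 200 when the check passes and 503 when it fails. It is not the original controller: to stay self-contained it uses the JDK's built-in com.sun.net.httpserver instead of the servlet API, and the names HealthEndpoint, statusFor, and start are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the health check controller.
final class HealthEndpoint {

    // Maps the check result to an HTTP status: 200 when healthy, 503 otherwise.
    static int statusFor(boolean healthy) {
        return healthy ? 200 : 503;
    }

    // Starts an HTTP server that exposes the check at the given path.
    // Passing port 0 lets the operating system pick a free port.
    static HttpServer start(int port, String path, BooleanSupplier check) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext(path, exchange -> {
            boolean healthy;
            try {
                healthy = check.getAsBoolean();
            } catch (RuntimeException e) {
                healthy = false; // a check that throws is treated as a failure
            }
            byte[] body = (healthy ? "UP" : "DOWN").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(statusFor(healthy), body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

The part that carries over to a servlet-based controller is the status mapping: whatever the check does internally, the response code is the only thing the cluster looks at.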

(2) Service processing
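A minimal sketch of what such a service-layer check could look like, assuming it simply runs one probe per dependency (database, cache, socket connection) and reports the first one that fails. The class HealthCheckService and its methods are hypothetical, and the individual probes are left to the caller because the original implementation is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.BooleanSupplier;

// Hypothetical service that aggregates named dependency checks.
final class HealthCheckService {

    // LinkedHashMap keeps the checks in registration order.
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    // Registers a named check, for example "database" or "cache".
    HealthCheckService register(String name, BooleanSupplier check) {
        checks.put(name, check);
        return this;
    }

    // Returns the name of the first failing check, or empty if all pass.
    Optional<String> firstFailure() {
        for (Map.Entry<String, BooleanSupplier> entry : checks.entrySet()) {
            if (!passes(entry.getValue())) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    // The application is healthy only when every registered check passes.
    boolean isHealthy() {
        return !firstFailure().isPresent();
    }

    // A check that throws is counted as failed rather than propagated.
    private static boolean passes(BooleanSupplier check) {
        try {
            return check.getAsBoolean();
        } catch (RuntimeException e) {
            return false;
        }
    }
}
```

Each probe should be cheap and bounded by its own timeout, since the whole call has to finish within the timeout the cluster allows for the health check request.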



3. Health check configuration in the new K8S cluster



(1) Delete the health check instruction from the original Dockerfile. Kubernetes ignores the Dockerfile HEALTHCHECK instruction, so it has no effect in the new K8S cluster.

(2) Configure the readiness check policy as shown in the figure above.

This policy lets the cluster determine whether the application has finished starting and can handle business traffic normally. Its timings are configured in seconds and should be set according to how long the application takes to start.

(3) Configure the liveness check policy as shown in the figure above. This policy lets the cluster judge whether the application is still alive and able to handle business normally. Its timings are configured in seconds and should be set according to the application's actual behavior.
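For readers who configure the cluster through manifests rather than a console, the two policies correspond to the readinessProbe and livenessProbe fields of a container spec. The fragment below is only a sketch: the container name, image, and probe path are hypothetical placeholders, port 8080 together with the 120-second period and 5-second timeout mirror the old Dockerfile HEALTHCHECK, and the remaining timings are assumed values that need tuning per application.

```yaml
# Fragment of a Pod or Deployment container spec (sketch, values are assumptions).
containers:
  - name: app                      # hypothetical container name
    image: example/app:latest      # hypothetical image
    ports:
      - containerPort: 8080
    readinessProbe:                # gates traffic until the application is ready
      httpGet:
        path: /health              # placeholder; use the application's own check endpoint
        port: 8080
      initialDelaySeconds: 30      # assumed startup time
      periodSeconds: 10            # assumed
      timeoutSeconds: 5
      failureThreshold: 3          # assumed
    livenessProbe:                 # restarts the container when the check keeps failing
      httpGet:
        path: /health              # placeholder; use the application's own check endpoint
        port: 8080
      initialDelaySeconds: 60      # assumed; should exceed the startup time
      periodSeconds: 120           # same interval as the old HEALTHCHECK
      timeoutSeconds: 5            # same timeout as the old HEALTHCHECK
      failureThreshold: 3          # assumed
```

A failing readiness probe only removes the pod from service endpoints, while a failing liveness probe restarts the container, so the liveness timings are usually the more lenient of the two.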