This is the seventh day of my participation in the August More Text Challenge.

1 Introduction

This article describes how to configure liveness, readiness, and startup probes for containers.

The kubelet uses liveness probes to know when to restart a container. For example, a liveness probe can catch a deadlock, where the application is running but cannot make progress. Restarting the container in that state helps keep the application available despite the bug.

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready only when all of its containers are ready. One use of this signal is to control which Pods serve as backends for a Service. A Pod that is not ready is removed from the Service load balancers.

The kubelet uses startup probes to know when the application in a container has started. When a startup probe is configured, liveness and readiness checks are held back until it succeeds, so those probes do not interfere with application startup. This is useful for liveness checks on slow-starting containers, because it prevents the kubelet from killing them before they are up and running.

2 Define a liveness probe

Many long-running applications eventually transition to a broken state from which they cannot recover unless they are restarted. Kubernetes provides liveness probes to detect and remedy this situation.

Create a Pod that runs a container based on the k8s.gcr.io/busybox image. The configuration file is as follows. File name: exec-liveness.yaml

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5

In the configuration file, you can see that the Pod has a single container. The periodSeconds field specifies that the kubelet should perform a liveness probe every 5 seconds. The initialDelaySeconds field tells the kubelet to wait 5 seconds before performing the first probe. To perform a probe, the kubelet runs the command cat /tmp/healthy in the container. If the command succeeds, it returns 0, and the kubelet considers the container alive and healthy. If the command returns a non-zero value, the kubelet kills the container and restarts it. When the container starts, it runs the following command:

/bin/sh -c "touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600"

For the first 30 seconds of the container's life, the file /tmp/healthy exists, so cat /tmp/healthy returns a success code. After 30 seconds, the file is removed and cat /tmp/healthy returns a failure code.

Create the Pod:

# kubectl apply -f /root/k8s-example/probe/exec-liveness.yaml

Within 30 seconds, view the Pod events:

kubectl describe pod liveness-exec

The output shows that no liveness probe has failed yet:

Events:
  Type    Reason     Age        From                 Message
  ----    ------     ----       ----                 -------
  Normal  Scheduled  <unknown>  default-scheduler    Successfully assigned default/liveness-exec to k8s-node04
  Normal  Pulled     22s        kubelet, k8s-node04  Container image "k8s.gcr.io/busybox" already present on machine
  Normal  Created    22s        kubelet, k8s-node04  Created container liveness
  Normal  Started    22s        kubelet, k8s-node04  Started container liveness

After 30 seconds, view the Pod events again:

kubectl describe pod liveness-exec

At the bottom of the output, there are messages indicating that the liveness probe has failed and that the container has been killed and recreated.

Events:
  Type     Reason     Age               From                 Message
  ----     ------     ----              ----                 -------
  Normal   Scheduled  <unknown>         default-scheduler    Successfully assigned default/liveness-exec to k8s-node04
  Normal   Pulled     47s               kubelet, k8s-node04  Container image "k8s.gcr.io/busybox" already present on machine
  Normal   Created    47s               kubelet, k8s-node04  Created container liveness
  Normal   Started    47s               kubelet, k8s-node04  Started container liveness
  Warning  Unhealthy  5s (x3 over 15s)  kubelet, k8s-node04  Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
  Normal   Killing    5s                kubelet, k8s-node04  Container liveness failed liveness probe, will be restarted

Wait another 30 seconds and verify that the container has been restarted:

kubectl get pod liveness-exec
NAME            READY   STATUS    RESTARTS   AGE
liveness-exec   1/1     Running   2          3m10s

Look at the Pod details again:

kubectl describe pod liveness-exec

The following output shows that the container was restarted successfully:

Events:
  Type     Reason     Age                 From                 Message
  ----     ------     ----                ----                 -------
  Normal   Scheduled  <unknown>           default-scheduler    Successfully assigned default/liveness-exec to k8s-node04
  Warning  Unhealthy  35s (x6 over 2m)    kubelet, k8s-node04  Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
  Normal   Killing    35s (x2 over 110s)  kubelet, k8s-node04  Container liveness failed liveness probe, will be restarted
  Normal   Pulled     5s (x3 over 2m32s)  kubelet, k8s-node04  Container image "k8s.gcr.io/busybox" already present on machine
  Normal   Created    5s (x3 over 2m32s)  kubelet, k8s-node04  Created container liveness
  Normal   Started    5s (x3 over 2m32s)  kubelet, k8s-node04  Started container liveness

3 Define a liveness HTTP request

Another kind of liveness probe uses an HTTP GET request. Here is the configuration file for a Pod that runs a container based on the k8s.gcr.io/liveness image.

Create a Pod. File name: http-liveness.yaml

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3

In the configuration file, the Pod has a single container. The periodSeconds field specifies that the kubelet should perform a liveness probe every 3 seconds. The initialDelaySeconds field tells the kubelet to wait 3 seconds before performing the first probe. To perform a probe, the kubelet sends an HTTP GET request to the server that is running in the container and listening on port 8080. If the handler for the server's /healthz path returns a success code, the kubelet considers the container alive and healthy. If the handler returns a failure code, the kubelet kills the container and restarts it.

Any return code greater than or equal to 200 and less than 400 indicates success. Any other return code indicates failure.

The source code for the server is in server.go.

For the first 10 seconds of the container's life, the /healthz handler returns a status code of 200. After that, the handler returns a status code of 500.

http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
    duration := time.Now().Sub(started)
    if duration.Seconds() > 10 {
        w.WriteHeader(500)
        w.Write([]byte(fmt.Sprintf("error: %v", duration.Seconds())))
    } else {
        w.WriteHeader(200)
        w.Write([]byte("ok"))
    }
})

The kubelet starts performing health checks 3 seconds after the container starts, so the first few health checks succeed. After 10 seconds, the health checks fail, and the kubelet kills the container and restarts it.

# kubectl apply -f /root/k8s-example/probe/http-liveness.yaml

After 10 seconds, view the Pod events to verify that the liveness probe has failed and the container has been restarted:

Events:
  Type     Reason     Age              From                 Message
  ----     ------     ----             ----                 -------
  Normal   Scheduled  <unknown>        default-scheduler    Successfully assigned default/liveness-http to k8s-node01
  Normal   Pulled     17s              kubelet, k8s-node01  Container image "k8s.gcr.io/liveness" already present on machine
  Normal   Created    17s              kubelet, k8s-node01  Created container liveness
  Normal   Started    16s              kubelet, k8s-node01  Started container liveness
  Warning  Unhealthy  1s (x2 over 4s)  kubelet, k8s-node01  Liveness probe failed: HTTP probe failed with statuscode: 500

4 Define a TCP liveness probe

A third type of liveness probe uses a TCP socket. With this configuration, the kubelet attempts to open a socket to the container on the specified port. If it can establish a connection, the container is considered healthy; if it cannot, the container is considered to have failed.

Create a Pod. File name: tcp-liveness-readiness.yaml

apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

The configuration of a TCP check is very similar to that of an HTTP check. This example uses both a readiness probe and a liveness probe. The kubelet sends the first readiness probe 5 seconds after the container starts. It attempts to connect to the goproxy container on port 8080. If the probe succeeds, the Pod is marked as ready, and the kubelet continues to run the check every 10 seconds.

In addition to the readiness probe, this configuration includes a liveness probe. The kubelet runs the first liveness probe 15 seconds after the container starts. Like the readiness probe, it attempts to connect to the goproxy container on port 8080. If the liveness probe fails, the container is restarted.

# kubectl apply -f /root/k8s-example/probe/tcp-liveness-readiness.yaml

After 15 seconds, view the Pod events to verify the liveness probe:

# kubectl describe pod goproxy

Using named ports

Named container ports can be used for HTTP or TCP liveness checks.

ports:
- name: liveness-port
  containerPort: 8080
  hostPort: 8080

livenessProbe:
  httpGet:
    path: /healthz
    port: liveness-port

5 Use startup probes to protect slow-starting containers

Sometimes you have to deal with existing applications that need extra time to initialize on startup. In such cases, it is tricky to set liveness probe parameters without compromising the fast response to deadlocks that motivated the probe. The trick is to set up a startup probe with the same command, HTTP, or TCP check, and to make failureThreshold * periodSeconds long enough to cover the worst-case startup time.

So, the previous example becomes:

ports:
- name: liveness-port
  containerPort: 8080
  hostPort: 8080

livenessProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 1
  periodSeconds: 10

startupProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 30
  periodSeconds: 10

Thanks to the startup probe, the application has up to 5 minutes (30 * 10 = 300s) to finish starting. Once the startup probe has succeeded, the liveness probe takes over and provides a fast response to container deadlocks. If the startup probe never succeeds, the container is killed after 300 seconds and is then handled according to the Pod's restartPolicy.
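To show where the fragment above sits in a complete Pod spec, here is a minimal sketch. The Pod name, container name, and image are placeholders that do not come from this article, and the image is assumed to serve /healthz on port 8080.

```yaml
# Hypothetical manifest: the Pod name, container name, and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: slow-start-demo
spec:
  containers:
  - name: app
    image: example.com/slow-app:1.0   # placeholder, assumed to serve /healthz on 8080
    ports:
    - name: liveness-port
      containerPort: 8080
    # Runs first. Liveness checks are held back until this probe succeeds.
    # Startup budget: failureThreshold * periodSeconds = 30 * 10 = 300s.
    startupProbe:
      httpGet:
        path: /healthz
        port: liveness-port
      failureThreshold: 30
      periodSeconds: 10
    # Takes over after startup. A single failed check triggers a restart.
    livenessProbe:
      httpGet:
        path: /healthz
        port: liveness-port
      failureThreshold: 1
      periodSeconds: 10
```

If the application can take longer than 300 seconds to start in the worst case, raising failureThreshold on the startup probe extends the budget without slowing down the liveness probe.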

6 Define a readiness probe

Sometimes an application is temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup, or depend on external services after startup. In such cases, you do not want to kill the application, but you do not want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations. A Pod whose containers report that they are not ready does not receive traffic through Kubernetes Services.

Note: Readiness probes keep running for the entire life of the container.

Readiness probes are configured in the same way as liveness probes. The only difference is that you use the readinessProbe field instead of the livenessProbe field.

readinessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5

The HTTP and TCP configuration for readiness probes is also the same as for liveness probes.

Readiness and liveness probes can be used in parallel on the same container. Using both ensures that traffic is not sent to a container that is not ready, and that containers are restarted when they fail.
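As an illustration of running both probes on one container over HTTP, the following sketch may help. The Pod name, image, and the /ready endpoint are assumptions made for this example and are not taken from the article.

```yaml
# Hypothetical manifest: name, image, and the /ready path are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: web-demo
spec:
  containers:
  - name: web
    image: example.com/web:1.0   # placeholder image
    ports:
    - containerPort: 8080
    # Controls whether the Pod receives traffic from Services.
    readinessProbe:
      httpGet:
        path: /ready             # assumed readiness endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
    # Controls whether the container is restarted.
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10
```

Separate endpoints let the application report "alive but busy": the readiness check can fail while data is loading without the liveness check triggering a restart.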

7 Configure probes

Probes have a number of configuration fields that you can use to control the behavior of liveness and readiness checks precisely:

  • initialDelaySeconds: Number of seconds to wait after the container starts before liveness or readiness probes are initiated. The default is 0 seconds and the minimum is 0.
  • periodSeconds: How often, in seconds, to perform the probe. The default is 10 seconds. The minimum is 1.
  • timeoutSeconds: Number of seconds after which the probe times out. The default is 1 second. The minimum is 1.
  • successThreshold: Minimum number of consecutive successes for the probe to be considered successful after having failed. The default is 1. It must be 1 for liveness probes. The minimum is 1.
  • failureThreshold: When a probe fails, Kubernetes tries failureThreshold times before giving up. For a liveness probe, giving up means restarting the container. For a readiness probe, giving up means the Pod is marked as not ready. The default is 3. The minimum is 1.
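The fields above can be combined in a single probe. The fragment below is only a sketch; the specific values are illustrative choices, not recommendations from this article.

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10  # wait 10s after the container starts
  periodSeconds: 5         # probe every 5s
  timeoutSeconds: 2        # an attempt fails if there is no response within 2s
  successThreshold: 1      # must be 1 for liveness probes
  failureThreshold: 3      # give up (restart) after 3 consecutive failures
```

With these values, a container that stops responding would be restarted on the order of failureThreshold * periodSeconds = 15 seconds after it breaks, so lowering either field trades tolerance of brief hiccups for faster recovery.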

HTTP probes can be configured with additional fields on httpGet:

  • host: Host name to connect to. The default is the Pod IP. In most cases you will want to set "Host" in httpHeaders instead.
  • scheme: Scheme used to connect to the host (HTTP or HTTPS). The default is HTTP.
  • path: Path to access on the HTTP server.
  • httpHeaders: Custom headers to set in the request. HTTP allows repeated headers.
  • port: Name or number of the port to access on the container. A number must be in the range 1 to 65535.

For an HTTP probe, the kubelet sends an HTTP request to the specified path and port to perform the check. The kubelet sends the probe to the Pod's IP address unless the host field in httpGet overrides it. If the scheme field is set to HTTPS, the kubelet sends an HTTPS request and skips certificate verification. In most cases, you do not need to set the host field. Here is one scenario where you would: suppose the container listens on 127.0.0.1 and the Pod's hostNetwork field is set to true. Then the host field in httpGet should be set to 127.0.0.1. More commonly, if the Pod relies on virtual hosts, you should not set the host field, but rather set the Host header in httpHeaders.
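For the virtual-host case, a probe could look like the following sketch. The host name app.example.com is a hypothetical value chosen for illustration.

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    scheme: HTTP
    # host is left unset, so the kubelet connects to the Pod IP.
    httpHeaders:
    - name: Host
      value: app.example.com   # hypothetical virtual host name
```

The connection still goes to the Pod IP; only the Host header in the request changes, which is what a server that routes by virtual host inspects.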

For a probe, the kubelet makes the probe connection from the node, not from inside the Pod. This means you cannot use a Service name in the host parameter, because the kubelet is unable to resolve it.