Author: Ding Yuan, RadonDB test lead

Responsible for quality and performance testing and iterative verification of RadonDB cloud database and containerized database products, with in-depth research on performance and high-availability solutions for cloud and containerized databases.

This article follows the introduction and Node installments of the series on ChaosBlade Operator, a chaos engineering tool. This installment tests application scenarios for Pod resources, including:

  • Resource scenarios, such as removing a Pod
  • Network resource scenarios, such as network latency
  • File system exception scenario
  • Unavailability exception scenarios

| Experimental environment

Test object

The tests target a RadonDB MySQL containerized database running on the KubeSphere platform.

For details about how to deploy RadonDB MySQL, see Deploying the RadonDB MySQL Cluster in KubeSphere.

Environmental parameters

Cluster name    Host type          CPU  Memory  Total disk              Nodes  Replicas  Shards
KubeSphere      High availability  8C   16G     500GB                   4      -         -
RadonDB MySQL   -                  4C   16G     Pod: 50G, DataDir: 10G  3      2         1

After the test environment is deployed, you can perform verification in the following five scenarios.

1. Pod Resource Deletion scenario

1.1 Test Objectives

Delete the Pod labeled chaosblade-tool-nhzds from the chaosblade namespace.

1.2 Starting the Test

Check the Pod status.

$ kubectl get pod chaosblade-tool-nhzds -n chaosblade  -w

View the parameters in delete_pod_by_labels.yaml.

apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: delete-two-pod-by-labels
spec:
  experiments:
  - scope: pod
    target: pod
    action: delete
    desc: "delete pod by labels"
    matchers:
    - name: labels
      value:
      - "demo-radondb-mysql-1"
    - name: namespace
      value:
      - "chaosblade"
    - name: evict-count
      value:
      - "2"

Open a new terminal and apply the experiment to delete the Pod.

$ kubectl apply -f delete_pod_by_labels.yaml

1.3 Test and Verification

View test status.

$ kubectl get blade delete-two-pod-by-labels -o json

View test results.

You can see that the Pod is deleted and restarted, as expected.
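
The check above can also be scripted instead of read by eye. The following is a minimal sketch, not part of the original test: blade_phase and wait_pods_ready are hypothetical helper names, and the .status.phase path is an assumption about the JSON of the blade resource, so compare it with the output of the kubectl get blade command before relying on it.

```shell
# Hypothetical helper: print the phase recorded on a ChaosBlade resource.
# Assumption: the resource's JSON exposes the phase at .status.phase.
blade_phase() {
  kubectl get blade "$1" -o jsonpath='{.status.phase}'
}

# Hypothetical helper: block until the Pods selected by a label are Ready again.
# $1 = namespace, $2 = label selector (key=value), $3 = timeout such as 120s
wait_pods_ready() {
  kubectl wait --for=condition=Ready pod -l "$2" -n "$1" --timeout="$3"
}
```

For example, run blade_phase delete-two-pod-by-labels, then call wait_pods_ready with the chaosblade namespace and a label selector that matches the recreated Pods.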

2. Pod network delay scenario

2.1 Test Objective

Pod network resource scenarios, such as network latency.

Add a 3000 ms access delay, varying by up to 1000 ms in either direction, to local port 3306 of the demo-radondb-mysql-0 Pod in the chaosblade namespace.

2.2 Starting the Test

Set parameter information in delay_pod_network_by_names.yaml.

apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: delay-pod-network-by-names
spec:
  experiments:
  - scope: pod
    target: network
    action: delay
    desc: "delay pod network by names"
    matchers:
    - name: names
      value:
      - "demo-radondb-mysql-0"  # test object pod name
      # - "radondb-g4r992-radondb-postgresql-0"
    - name: namespace
      value:
      - "chaosblade"  # namespace name
    - name: local-port
      value: ["3306"]  # local port
    - name: interface
      value: ["eth0"]  # network interface eth0
    - name: time
      value: ["3000"]  # delay time: 3000 ms
    - name: offset
      value: ["1000"]  # vary the delay time by up to 1000 ms

Save it as a file and deploy the application.

$ kubectl apply -f delay_pod_network_by_names.yaml

View the deployment status.

$ kubectl get blade delay-pod-network-by-names -o json

2.3 Test and Verification

Get the test Pod IP.

$ kubectl get pod -l app=redis,role=master -o jsonpath={.items..status.podIP}
$ kubectl get pod demo-radondb-mysql-0 -o wide

Enter the observation Pod.

$ kubectl exec -ti demo-radondb-mysql-1 -- /bin/bash

Install telnet in the Pod.

$ apt-get update && apt-get install -y telnet

Obtain test times and analyze test results.

$ time echo "" | telnet 10.10.131.182 3306

Access to port 3306 of the experimental Pod takes about 3 s, which is in line with expectations.
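
Because the delay is 3000 ms with a variation of 1000 ms, any measurement between 2000 ms and 4000 ms is consistent with the experiment. The sketch below makes that check explicit. The helper name in_delay_window is hypothetical, and treating the elapsed time reported by time as the injected delay is a simplification.

```shell
# Hypothetical helper: succeed when a measured delay falls inside time +/- offset.
# All three arguments are whole milliseconds.
in_delay_window() {
  measured=$1; base=$2; offset=$3
  [ "$measured" -ge $((base - offset)) ] && [ "$measured" -le $((base + offset)) ]
}

# time=3000 and offset=1000 give an accepted range of 2000..4000 ms.
if in_delay_window 3000 3000 1000; then
  echo "within expected window"
else
  echo "outside expected window"
fi
```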

3. Pod network packet loss scenario

3.1 Test Objective

In the chaosblade namespace, inject a 100% packet loss rate into the demo-radondb-mysql-0 Pod that applies only to the Pod with IP address 192.168.0.18. All Pods other than 192.168.0.18 can still access demo-radondb-mysql-0.

3.2 Starting the Test
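
The contents of loss_pod_network_by_names.yaml are not shown in this article. The sketch below writes one possible version, following the structure of the delay manifest in section 2. The matcher names percent and destination-ip and the eth0 interface are assumptions, and the destination IP reuses the address from the test objective, so verify all of them against the ChaosBlade Operator documentation before use.

```shell
# Sketch only: write a packet-loss experiment for demo-radondb-mysql-0.
# Assumptions: the "percent" and "destination-ip" matcher names, and eth0 as the interface.
cat > loss_pod_network_by_names.yaml <<'EOF'
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: loss-pod-network-by-names
spec:
  experiments:
  - scope: pod
    target: network
    action: loss
    desc: "loss pod network by names"
    matchers:
    - name: names
      value:
      - "demo-radondb-mysql-0"   # test object pod name
    - name: namespace
      value:
      - "chaosblade"
    - name: interface
      value: ["eth0"]
    - name: percent
      value: ["100"]             # drop every matching packet
    - name: destination-ip
      value: ["192.168.0.18"]    # only traffic to this IP is affected
EOF
```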

Run the command to deploy the application.

$ kubectl apply -f loss_pod_network_by_names.yaml

View the deployment status.

$ kubectl get blade loss-pod-network-by-names -o json

3.3 Test and Verification

Get the test Pod IP.

$ kubectl get pod -l app=redis,role=master -o jsonpath={.items..status.podIP}
10.42.69.44

Enter the observation Pod with IP 10.42.69.42, for which the packet loss rate is set to 100%.

$ kubectl exec -it redis-slave-6dd975d4c8-lm8jz bash

Ping the test Pod IP.

$ ping 10.42.69.44
PING 10.42.69.44 (10.42.69.44) 56(84) bytes of data.

No ping response is received.

Enter another observation Pod, one that is not targeted for packet loss.

$ kubectl exec -it redis-slave-6dd975d4c8-2zrkb bash

Ping the Pod IP again.

$ ping 10.42.69.44
PING 10.42.69.44 (10.42.69.44) 56(84) bytes of data.
64 bytes from 10.42.69.44: icmp_seq=1 ttl=63 time=0.128 ms
64 bytes from 10.42.69.44: icmp_seq=2 ttl=63 time=0.128 ms
64 bytes from 10.42.69.44: icmp_seq=3 ttl=63 time=0.092 ms
...

The Ping response is normal. The test results are in line with expectations.

4. Pod file system I/O failure scenario

4.1 Preparing for the Test

  • chaosblade-admission-webhook has been deployed.
  • The volume to inject the fault into has mountPropagation set to HostToContainer.
  • The following annotations have been added to the Pod:
    • chaosblade/inject-volume: "data" (the name of the volume to inject the fault into)
    • chaosblade/inject-volume-subpath: "conf" (the subdirectory under the volume's mount path)

4.2 Test Objectives

Inject a file system I/O fault into a Kubernetes Pod.

Note: This scenario requires the --webhook-enable parameter to be enabled. You can add --webhook-enable to the ChaosBlade Operator parameters, or specify --set webhook.enable=true at deployment time.

The ChaosBlade webhook injects a FUSE sidecar container based on the Pod annotations:

  • chaosblade/inject-volume specifies the name of the volume to inject the fault into, such as data in the example.
  • chaosblade/inject-volume-subpath specifies the subdirectory under the volume's mount path.
    • In the preceding example, the volume is mounted at /data in the Pod, so the I/O exception is injected into /data/conf.
  • The volume to inject the fault into must specify mountPropagation: HostToContainer.
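
The contents of io-test-pod.yaml are not shown in this article. As a minimal sketch, the following writes a Deployment that satisfies the three requirements above: both annotations, a volume named data mounted at /data, and mountPropagation set to HostToContainer. The busybox image, the sleep command, and the hostPath location are assumptions chosen for illustration. The app=test label and the chaosblade namespace match the values used later in pod_io.yaml.

```shell
# Sketch only: write a test Deployment whose Pod meets the webhook's requirements.
# Assumptions: busybox image, sleep loop, and /data/fuse as the hostPath location.
cat > io-test-pod.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  namespace: chaosblade
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
      annotations:
        chaosblade/inject-volume: "data"          # volume to inject the fault into
        chaosblade/inject-volume-subpath: "conf"  # subdirectory under the mount path
    spec:
      containers:
      - name: test
        image: busybox
        command: ["/bin/sh", "-c", "while true; do sleep 10000; done"]
        volumeMounts:
        - name: data
          mountPath: /data
          mountPropagation: HostToContainer
      volumes:
      - name: data
        hostPath:
          path: /data/fuse
EOF
```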

4.3 Starting the Test

Deploy the test Pod.

$ kubectl apply -f io-test-pod.yaml

Check whether sidecar is successfully injected.

$ kubectl get pod test-7c9fc6fd88-7lx6b -n chaosblade
NAME                    READY   STATUS    RESTARTS   AGE
test-7c9fc6fd88-7lx6b   2/2     Running   0          4m8s

View the parameters in pod_io.yaml.

apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: inject-pod-by-labels
spec:
  experiments:
  - scope: pod
    target: pod
    action: IO
    desc: "Pod IO Exception by labels"
    matchers:
    - name: labels
      value:
      - "app=test"
    - name: namespace
      value:
      - "chaosblade"
    - name: method
      value:
      - "read"
    - name: delay
      value:
      - "1000"
    - name: path
      value:
      - ""
    - name: percent
      value:
      - "60"
    - name: errno
      value:
      - "28"

Run the command to deploy the application.

$ kubectl apply -f pod_io.yaml

4.4 Test and Verification

Enter the test Pod.

$ kubectl exec -it test-7c9fc6fd88-7lx6b bash

Read a file in the specified directory inside the Pod.

$ time cat /data/conf/test.yaml
cat: read error: No space left on device
real    0m3.007s
user    0m0.002s
sys     0m0.002s
$ time cat /data/conf/test.yaml
123
real    0m0.004s
user    0m0.002s
sys     0m0.000s

The output shows that reading the file fails intermittently, which is the expected result. The scenario injects two exceptions into read operations, at an exception rate of 60%:

  • A 1 s delay is added to read operations
  • Error 28 is returned for read operations
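
Error 28 is ENOSPC on Linux, which is why cat reports "No space left on device". Because only a share of reads is affected, a single read can succeed or fail by chance. The sketch below repeats the read and counts the failures. The helper name count_read_failures is hypothetical, and since one cat run can issue more than one read call, the share of failed runs is only a rough indicator and need not equal 60%.

```shell
# Hypothetical helper: read a file N times and print how many reads failed.
# $1 = file path, $2 = number of attempts
count_read_failures() {
  file=$1; attempts=$2; failures=0; i=0
  while [ "$i" -lt "$attempts" ]; do
    # cat exits non-zero when a read returns an error such as errno 28 (ENOSPC).
    cat "$file" >/dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures"
}
```

For example, run count_read_failures /data/conf/test.yaml 20 inside the test Pod while the experiment is active, then again after the experiment is removed to confirm that the count returns to 0.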

5. Pod domain name access exception scenario

5.1 Test Objectives

The Pod cannot access the specified domain name.

5.2 Starting the Test

Obtain the Pod name and run the command to deploy the application.

$ kubectl apply -f dns_pod_network_by_names.yaml

View test status.

$ kubectl get blade dns-pod-network-by-names -o json

5.3 Test and Verification

Enter the test Pod.

$ kubectl exec -ti demo-radondb-mysql-0 -- /bin/bash

Ping the domain name www.baidu.com.

$ ping www.baidu.com

View and analyze test results.

No ping response is returned, which shows that access to the specified domain name www.baidu.com is abnormal. The result is as expected.
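
A silent ping does not by itself show whether name resolution or general connectivity is at fault. As a hedged sketch, the helper below prints the first address that a name resolves to inside the Pod, so it can be compared with the address you expect. It assumes getent is available in the container image, and resolved_ip is a hypothetical name.

```shell
# Hypothetical helper: print the first address a name resolves to,
# or nothing when resolution fails. Assumes getent exists in the image.
resolved_ip() {
  getent hosts "$1" | awk 'NR == 1 { print $1 }'
}
```

Running resolved_ip www.baidu.com while the experiment is active and again after it is removed shows whether the resolved address changes. An unexpected address or an empty result points to name resolution rather than the network path.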

| Conclusion

By using ChaosBlade Operator to conduct chaos engineering tests on Kubernetes Pod resources, the following conclusions can be drawn:

For Pod resources, ChaosBlade is both easy to use and capable. By simulating different failures, it lets you check how promptly system monitoring and alerting respond and how the system behaves under failure, which helps improve the system architecture and increase availability.

This article runs only a basic test of each scenario. Each scenario supports more than one test method, and users can adjust the parameters to run different tests.