Author: Ding Yuan, RadonDB test principal

Responsible for quality and performance testing and iterative verification of the RadonDB cloud database and containerized databases. Conducts in-depth research on performance and high-availability solutions for cloud and containerized databases.

Following up on “Chaos Engineering Tool ChaosBlade Operator Series Introduction”, this installment uses ChaosBlade Operator to test application scenarios for Node-class resources, including:

  1. CPU load scenario
  2. Network delay scenario
  3. Network packet loss scenario
  4. Kill the specified process
  5. Stop the specified process

| Experimental environment

Test object

The test object is the RadonDB MySQL container database running on the KubeSphere platform.

For details about how to deploy RadonDB MySQL, see Deploying the RadonDB MySQL Cluster in KubeSphere.

Environment parameters

Cluster        | Host type         | CPU | Memory | Disk                   | Node count | Replica count | Shard count
KubeSphere     | High availability | 8C  | 16G    | 500GB                  | 4          | -             | -
RadonDB MySQL  | -                 | 4C  | 16G    | Pod: 50G, DataDir: 10G | 3          | 2             | 1

After the test environment is deployed, you can perform verification in the following five scenarios.

1. CPU load scenario

1.1 Test Objectives

Run an 80% CPU load verification on a specified node.

1.2 Starting the Test

Set the YAML test parameter values.

apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: cpu-load
spec:
  experiments:
    - scope: node
      target: cpu
      action: fullload
      desc: "increase node cpu load by names"
      matchers:
        - name: names
          value:
            - "worker-s001"      # name of the target node
        - name: cpu-percent
          value:
            - "80"               # node CPU load percentage
        - name: ip
          value:
            - "192.168.0.20"     # IP address of the target node

Select a node and modify the names value in node_cpu_load.yaml.
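
The start and status-check steps mirror the later scenarios; assuming the manifest is saved as node_cpu_load.yaml and keeps the metadata name cpu-load shown above:

kubectl apply -f node_cpu_load.yaml    # start the experiment
kubectl get blade cpu-load -o json     # view the experiment status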

1.3 Test and Verification

On the target node, run the top command to confirm that the node CPU load reaches 80%.
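
For a quick non-interactive check, a command along these lines can be used on the node (the exact output format varies between top versions):

# print one CPU summary sample; combined user + system usage should be close to 80%
top -bn1 | grep -i "cpu(s)"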

2. Network delay scenario

2.1 Test Preparations

Log in to the node and run the ifconfig command to view the NIC information; the default NIC in this environment is eth0.

2.2 Test Objectives

Add a 3000 ms access delay to the specified node, worker-s001, with the delay fluctuating by 1000 ms.

2.3 Starting the Test

Select a node and modify the names value in delay_node_network_by_names.yaml so that the delay is applied to access to the worker-s001 node.
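
For reference, a minimal sketch of what delay_node_network_by_names.yaml might contain, based on the test objective above; the node name, NIC, and values are assumptions to adapt to your environment:

apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: delay-node-network-by-names
spec:
  experiments:
    - scope: node
      target: network
      action: delay
      desc: "delay node network by names"
      matchers:
        - name: names
          value: ["worker-s001"]   # target node name (assumption)
        - name: interface
          value: ["eth0"]          # NIC identified in the preparation step
        - name: time
          value: ["3000"]          # delay in milliseconds
        - name: offset
          value: ["1000"]          # delay fluctuation in milliseconds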

Start testing.

kubectl apply -f delay_node_network_by_names.yaml

View experimental status.

kubectl get blade delay-node-network-by-names -o json

2.4 Test and Verification

Access the Guestbook from the node.

$ time echo "" | telnet 192.168.0.18
echo ""  0.00s user 0.00s system 35% cpu 0.003 total
telnet 192.168.1.129 32436  0.01s user 0.00s system 0% cpu 3.248 total

The total time is about 3.2 seconds, which shows that the injected 3000 ms delay has taken effect.

Stop the test. You can either delete the experiment configuration or delete the blade resource directly.

kubectl delete -f delay_node_network_by_names.yaml

kubectl delete blade delay-node-network-by-names

3. Network packet loss scenario

3.1 Test Objective

Set the packet loss rate on the specified node to 100%.

3.2 Starting a Test

Select a node and change the names value in loss_node_network_by_names.yaml.
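
A minimal sketch of what loss_node_network_by_names.yaml might look like; the node name, NIC, and NodePort are assumptions (the local-port matcher limits the loss to the Guestbook NodePort used for verification below):

apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: loss-node-network-by-names
spec:
  experiments:
    - scope: node
      target: network
      action: loss
      desc: "node network loss by names"
      matchers:
        - name: names
          value: ["worker-s001"]   # target node name (assumption)
        - name: percent
          value: ["100"]           # packet loss rate
        - name: interface
          value: ["eth0"]          # NIC (assumption)
        - name: local-port
          value: ["32436"]         # Guestbook NodePort (assumption)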

Run the following command to start the test.

$ kubectl apply -f loss_node_network_by_names.yaml

Run the following command to view the experiment status.

kubectl get blade loss-node-network-by-names -o json

3.3 Test and Verification

The port used for verification is the Guestbook NodePort. Requests to this port on the experimental node get no response, while access through nodes outside the experiment works normally.

Obtain the node IP address.

$ kubectl get node -o wide

Access the Guestbook through the experimental node: it is inaccessible.

$ telnet 192.168.0.20

Access the Guestbook through a non-experimental node: access is normal.

$ telnet 192.168.0.18

In addition, you can access the address directly from the browser and verify the test results.

Stop the test. You can either delete the experiment configuration or delete the blade resource directly.

kubectl delete -f loss_node_network_by_names.yaml

kubectl delete blade loss-node-network-by-names

4. Kill the specified process

4.1 Test Objective

Kill the MySQL process on the specified node.

4.2 Starting a Test

Select a node and modify the names value in kill_node_process_by_names.yaml.
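
A minimal sketch of what kill_node_process_by_names.yaml might contain; the node name is an assumption, and the process matcher selects the MySQL process by keyword:

apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: kill-node-process-by-names
spec:
  experiments:
    - scope: node
      target: process
      action: kill
      desc: "kill node mysql process"
      matchers:
        - name: names
          value: ["worker-s001"]   # target node name (assumption)
        - name: process
          value: ["mysql"]         # keyword of the process to kill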

Run the following command to start the test.

$ kubectl apply -f kill_node_process_by_names.yaml

Run the following command to view the experiment status.

kubectl get blade kill-node-process-by-names -o json

4.3 Test and Verification

Log in to the experimental node.

$ ssh 192.168.0.18

Check the MySQL process.

$ ps -ef | grep mysql
root     10913 10040  0 14:10 pts/0    00:00:00 grep --color=auto mysql

Only the grep command itself is listed, indicating that the MySQL process has been killed. Query the process again after a moment:

$ ps -ef | grep mysql

The MySQL process reappears with a new process ID, indicating that it was automatically restarted after being killed.

Stop the test. You can either delete the experiment configuration or delete the blade resource directly.

kubectl delete -f kill_node_process_by_names.yaml
kubectl delete blade kill-node-process-by-names

5. Stop the specified process

5.1 Test Objectives

Suspend the MySQL process on the specified node.

5.2 Starting the Test

Select a node and change the names value in stop_node_process_by_names.yaml.
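
The manifest is analogous to the kill scenario, with the action changed to stop; a sketch under the same assumptions:

apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: stop-node-process-by-names
spec:
  experiments:
    - scope: node
      target: process
      action: stop
      desc: "stop node mysql process"
      matchers:
        - name: names
          value: ["worker-s001"]   # target node name (assumption)
        - name: process
          value: ["mysql"]         # keyword of the process to suspend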

Run the following command to start the test.

$ kubectl apply -f stop_node_process_by_names.yaml

Run the following command to view the experiment status.

kubectl get blade stop-node-process-by-names -o json

5.3 Test and Verification

Log in to the experimental node.

$ ssh 192.168.0.18

Check the MySQL process.

$ ps -ef | grep mysql
root     10913 10040  0 14:10 pts/0    00:00:00 grep --color=auto mysql

Check the MySQL process again:

$ ps -ef | grep mysql

The MySQL process state changes, indicating that it has been suspended by the experiment.
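
As an additional check not shown in the original output, the STAT column of ps can confirm the suspension: a process stopped with SIGSTOP is reported in state T.

# list MySQL processes with their state; "T" indicates stopped/suspended
ps -eo pid,stat,comm | grep -i mysql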

Stop the test. You can either delete the experiment configuration or delete the blade resource directly.

kubectl delete -f stop_node_process_by_names.yaml
kubectl delete blade stop-node-process-by-names

| Epilogue

By running chaos engineering experiments on KubeSphere Node resources with the ChaosBlade Operator, the following conclusions can be drawn:

For Node-level resources, ChaosBlade can still complete complex experiments with simple configuration and operation. By freely combining experiments, it can simulate all kinds of Node-level failures to verify the stability and availability of a Kubernetes cluster. At the same time, because these failure scenarios have already been rehearsed, when a real failure occurs you can quickly locate its source and handle it calmly.

Next up

The next installment will use the deployed ChaosBlade Operator tool to test and verify various scenarios for Pod-class resources.