What is ChaosToolkit?

Today we are going to try out ChaosToolkit, an open-source Chaos Engineering tool whose goal is to provide a free, open, community-driven toolset and API.

Official source link: github.com/chaostoolki…

To understand this tool, you need to know the main points of the principles of chaos engineering.

Keep the first point in mind: establishing a steady-state hypothesis, that is, a measurable definition of what "normal" looks like for your system.
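
To make the steady-state idea concrete, here is a minimal Python sketch of the loop it implies: check a set of probes against their tolerances, inject a fault, then check the same probes again. Everything in it (the `ready` counter, the probe and the two fault functions) is a hypothetical stand-in rather than ChaosToolkit code, and the status strings are illustrative.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
# (probe name, probe function, tolerance): the probe's result must equal the tolerance.
Probe = Tuple[str, Callable[[State], object], object]


def hypothesis_met(state: State, probes: List[Probe]) -> bool:
    """The steady state holds only if every probe returns its tolerated value."""
    return all(probe(state) == tolerance for _, probe, tolerance in probes)


def run_experiment(state: State, probes: List[Probe],
                   fault: Callable[[State], None]) -> str:
    # 1. Check that the system is "normal" before touching it.
    if not hypothesis_met(state, probes):
        return "aborted"  # illustrative status name
    # 2. Inject the real-world event (the experimental method).
    fault(state)
    # 3. Re-check the same conditions to detect a deviation.
    return "completed" if hypothesis_met(state, probes) else "deviated"


# Hypothetical probe: are all three replicas ready? Tolerance: True.
PROBES: List[Probe] = [("all_replicas_ready", lambda s: s["ready"] == 3, True)]


def kill_replica(state: State) -> None:
    state["ready"] -= 1  # nothing brings the replica back


def kill_replica_self_healing(state: State) -> None:
    state["ready"] -= 1
    state["ready"] += 1  # a controller (like a K8s Deployment) recreates it


print(run_experiment({"ready": 3}, PROBES, kill_replica))               # deviated
print(run_experiment({"ready": 3}, PROBES, kill_replica_self_healing))  # completed
```

The second run hints at why a self-healing system can still satisfy its hypothesis after a fault; in this simplified sketch the recovery is instantaneous, which real systems are not.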

Before running the tool, let’s take a look at its architecture.

ChaosToolkit operates your system under test via Drivers.
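
A driver is, in practice, an installable Python package (for Kubernetes, chaostoolkit-kubernetes, whose modules live under `chaosk8s`, as the discovery log further down shows), and each action or probe in an experiment names a module, a function and its arguments. The sketch below is a simplified assumption of how such a "python" provider could be dispatched with `importlib`; it is not ChaosToolkit's own implementation, and it uses the standard-library `json` module as a stand-in driver so it runs without a cluster.

```python
import importlib
from typing import Any, Dict


def run_python_provider(provider: Dict[str, Any]) -> Any:
    """Call module.func(**arguments) as described by a provider dict (simplified sketch)."""
    if provider.get("type") != "python":
        raise ValueError("this sketch only handles python providers")
    module = importlib.import_module(provider["module"])  # e.g. "chaosk8s.pod.actions"
    func = getattr(module, provider["func"])               # e.g. "delete_pods"
    return func(**provider.get("arguments", {}))


# Stand-in "driver": the standard-library json module.
demo_provider = {
    "type": "python",
    "module": "json",
    "func": "dumps",
    "arguments": {"obj": {"ns": "chaosnamespace"}},
}
print(run_python_provider(demo_provider))  # {"ns": "chaosnamespace"}
```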

Its main features include the following:

Preparing the experiment

Now let’s set up the tools and try them out.

Environment Description:

  1. CentOS 7.8
  2. K8s 1.19.5
  3. A sample application (Redis: one master and two slaves)

Install Python 3 (on CentOS 7 the venv module ships with the python3 package; python3-venv is a Debian/Ubuntu package name)

sudo yum install python3

Install pipenv

pip3 install pipenv

Install ChaosToolkit together with its Kubernetes extension and reporting module

pip3 install -U chaostoolkit
pip3 install -U chaostoolkit-kubernetes
pip3 install -U chaostoolkit-reporting

If you need to target other platforms, install the corresponding extension (driver) package for that platform as well.
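
After installing, a quick way to confirm which of these packages are present in the current environment is to query their installed versions. This sketch uses `importlib.metadata`, which needs Python 3.8 or newer; the python3 that yum provides on CentOS 7 may be older, in which case `pip3 show chaostoolkit` gives the same information. The `installed_versions` helper is our own name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for pkg, ver in installed_versions(
    ["chaostoolkit", "chaostoolkit-kubernetes", "chaostoolkit-reporting"]
).items():
    print(f"{pkg}: {ver or 'NOT INSTALLED'}")
```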

Creating a Virtual Environment

python3 -m venv .bundler
source .bundler/bin/activate

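We use a Python virtual environment so that this installation does not affect other Python environments. Create and activate it before running the pip3 install commands above, so that the packages are installed inside it.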
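
The same environment can also be created from Python with the standard-library `venv` module. The sketch below wraps that and adds a common check, comparing `sys.prefix` with `sys.base_prefix`, for whether the current interpreter runs inside a virtual environment, which is worth confirming before the pip3 commands. The `create_env` and `in_virtualenv` names are our own.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(path: str, with_pip: bool = True) -> Path:
    """Roughly what `python3 -m venv <path>` does; returns the env directory."""
    venv.EnvBuilder(with_pip=with_pip).create(path)
    return Path(path)


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the env, while sys.base_prefix
    # still points at the interpreter the env was created from.
    return sys.prefix != sys.base_prefix


print("inside a virtual environment:", in_virtualenv())

# Demo in a throwaway directory (with_pip=False keeps it fast and offline).
with tempfile.TemporaryDirectory() as tmp:
    env = create_env(str(Path(tmp) / "demo-env"), with_pip=False)
    print((env / "pyvenv.cfg").exists())  # True
```

For the setup in this article the call would be `create_env(".bundler")`; activation still happens in the shell with `source .bundler/bin/activate`.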

Note: the installation above is performed on the k8s master node. If you are not on a machine where k8s is installed, configure the corresponding k8s context instead; for details, see chaostoolkit.org/drivers/kub… .

The experiment in practice

chaos discover: exploring the available actions

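Run chaos discover chaostoolkit-kubernetes to generate a discovery.json file, which lists all the actions and probes that can be performed on K8s. The Kubernetes driver connects to the cluster through ~/.kube/config. The result is as follows: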

(.bundler) [root@s5 chaostoolkit_scenarios]# chaos discover chaostoolkit-kubernetes
[2021-06-23 12:18:07 INFO] Attempting to download and install package 'chaostoolkit-kubernetes'
[2021-06-23 12:18:08 INFO] Package downloaded and installed in current environment
[2021-06-23 12:18:09 INFO] Discovering capabilities from chaostoolkit-kubernetes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.deployment.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.deployment.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.node.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.node.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.pod.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.pod.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.replicaset.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.service.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.service.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.statefulset.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.statefulset.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.crd.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.crd.probes
[2021-06-23 12:18:09 INFO] Discovery outcome saved in ./discovery.json
(.bundler) [root@s5 chaostoolkit_scenarios]#
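
discovery.json is plain JSON, so it is easy to inspect. The sketch below groups the discovered activities by type; it assumes the file has a top-level `activities` list whose entries carry `type`, `name` and `mod` keys, which is our reading of the format, so check it against your own file. The inline sample is hand-written, not copied from the real output.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


def summarize_discovery(discovery: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group discovered activities ("module.function") by type (action/probe)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for activity in discovery.get("activities", []):
        grouped[activity.get("type", "unknown")].append(
            f'{activity.get("mod", "?")}.{activity.get("name", "?")}'
        )
    return dict(grouped)


# Hand-written sample in the assumed shape; for the real file use
# json.load(open("discovery.json")).
sample = json.loads("""
{"activities": [
  {"type": "action", "name": "delete_pods", "mod": "chaosk8s.pod.actions"},
  {"type": "probe", "name": "count_pods", "mod": "chaosk8s.pod.probes"}
]}
""")
for kind, names in summarize_discovery(sample).items():
    print(kind, len(names), names)
```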

chaos init: generating the experiment

Run the initialization command and follow the prompts to create a chaos experiment.

(.bundler) [root@s5 chaostoolkit_scenarios]# chaos init
You are about to create an experiment.
This wizard will walk you through each step so that you can build
the best experiment for your needs.

An experiment is made up of three elements:
- a steady-state hypothesis [OPTIONAL]
- an experimental method
- a set of rollback activities [OPTIONAL]

Only the method is required. Also your experiment will
not run unless you define at least one activity (probe or action)
within it
Experiment's title: E2
A steady state hypothesis defines what 'normality' looks like in your system
The steady state hypothesis is a collection of conditions that are used,
at the beginning of an experiment, to decide if the system is in a recognised
'normal' state. The steady state conditions are then used again when your experiment
is complete to detect where your system may have deviated in an interesting,
weakness-detecting way

Initially you may not know what your steady state hypothesis is
and so instead you might create an experiment without one
This is why the stead state hypothesis is optional.

Do you want to define a steady state hypothesis now? [y/N]: y    # Create the steady-state hypothesis. This is an important concept in chaos engineering, but most other chaos tools have no such step.
Hypothesis's title: H2

You may now define probes that will determine
the steady-state of your system.
Add an activity
1) all_microservices_healthy
2) deployment_is_fully_available
3) deployment_is_not_fully_available
4) microservice_available_and_healthy
5) microservice_is_not_available
6) read_microservices_logs
7) service_endpoint_is_initialized
8) count_pods
9) pod_is_not_available
10) pods_in_conditions
11) pods_in_phase
12) pods_not_in_phase
13) read_pod_logs
14) statefulset_fully_available
15) statefulset_not_fully_available
16) get_cluster_custom_object
17) get_custom_object
18) list_cluster_custom_objects
19) list_custom_objects
Activity (0 to escape): 1    # Select the steady-state probe. In a nutshell, this defines the expected result
!!!!!!!!! DEPRECATED!!!
Do you want to use this probe? [y/N]: y    # Confirm that we want to use the probe selected above

A steady-state probe requires a tolerance value, within which
your system is in a reognised `normal` state.

What is the tolerance for this probe? : normal
You now need to fill the arguments for this activity. Default
values will be shown between brackets. You may simply press return
to use it or not set any value.
Argument's value for 'ns' [default]: chaosnamespace    # Enter the k8s namespace to operate on
Do you want to select another activity? [y/N]: N    # Whether to add another probe; I won't add one here

An experiment's method contains actions and probes. Actions
vary real-world events in your system to determine if your
steady-state hypothesis is maintained when those events occur.

An experimental method can also contain probes to gather
additional information about your system as your method is
executed.

Do you want to define an experimental method? [y/N]: Y
Add an activity
1) kill_microservice
2) remove_service_endpoint
3) scale_microservice
4) start_microservice
5) all_microservices_healthy
6) deployment_is_fully_available
7) deployment_is_not_fully_available
8) microservice_available_and_healthy
9) microservice_is_not_available
10) read_microservices_logs
11) service_endpoint_is_initialized
12) create_deployment
13) delete_deployment
14) scale_deployment
15) deployment_available_and_healthy
16) deployment_fully_available
17) deployment_not_fully_available
18) cordon_node
19) create_node
20) delete_nodes
21) drain_nodes
22) uncordon_node
23) get_nodes
24) delete_pods
25) exec_in_pods
26) terminate_pods
27) count_pods
28) pod_is_not_available
29) pods_in_conditions
30) pods_in_phase
31) pods_not_in_phase
32) read_pod_logs
33) delete_replica_set
34) create_service_endpoint
35) delete_service
36) service_is_initialized
37) create_statefulset
38) remove_statefulset
39) scale_statefulset
40) statefulset_fully_available
41) statefulset_not_fully_available
42) create_cluster_custom_object
43) create_custom_object
44) delete_cluster_custom_object
45) delete_custom_object
46) patch_cluster_custom_object
47) patch_custom_object
48) replace_cluster_custom_object
49) replace_custom_object
50) get_cluster_custom_object
51) get_custom_object
52) list_cluster_custom_objects
53) list_custom_objects
Activity (0 to escape): 24    # 24 is delete_pods
Do you want to use this action? [y/N]: Y    # Confirm the selection
You now need to fill the arguments for this activity. Default
values will be shown between brackets. You may simply press return
to use it or not set any value.
Argument's value for 'name': DeleteRedisPOD    # Value for the {name} placeholder of the default label selector shown in the next prompt
Argument's value for 'ns' [default]: chaosnamespace    # Specify the k8s namespace
Argument's value for 'label_selector' [name in ({name})]: app=redis    # Label of the objects to operate on, used to find them
Do you want to select another activity? [y/N]: N    # Whether to add another action; I won't add one here

An experiment may optionally define a set of remedial actions
that are used to rollback the system to a given state.

Do you want to add some rollbacks now? [y/N]: N    # We only delete the Redis pods, which K8s recreates automatically, so no rollback is needed

Experiment created and saved in './experiment.json'    # The generated experiment file

(.bundler) [root@s5 chaostoolkit_scenarios]#
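
For reference, the wizard's answers end up in experiment.json. The Python sketch below builds a dictionary in roughly the shape we would expect for this session and prints it as JSON. The field layout, the `version` and `description` values, and the `True` tolerance (chosen because this probe is assumed to report a boolean) are assumptions, so compare it with the file `chaos init` actually generated rather than treating it as the real output.

```python
import json

# Assumed shape of the generated experiment; values follow the wizard answers above.
experiment = {
    "version": "1.0.0",        # assumption
    "title": "E2",
    "description": "N/A",      # placeholder
    "tags": [],
    "steady-state-hypothesis": {
        "title": "H2",
        "probes": [
            {
                "type": "probe",
                "name": "all_microservices_healthy",
                "tolerance": True,  # assumption: the probe reports a boolean
                "provider": {
                    "type": "python",
                    "module": "chaosk8s.probes",
                    "func": "all_microservices_healthy",
                    "arguments": {"ns": "chaosnamespace"},
                },
            }
        ],
    },
    "method": [
        {
            "type": "action",
            "name": "delete_pods",
            "provider": {
                "type": "python",
                "module": "chaosk8s.pod.actions",
                "func": "delete_pods",
                "arguments": {
                    "name": "DeleteRedisPOD",
                    "ns": "chaosnamespace",
                    "label_selector": "app=redis",
                },
            },
        }
    ],
    "rollbacks": [],  # we declined rollbacks in the wizard
}

print(json.dumps(experiment, indent=2))
```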

chaos run: executing the experiment

(.bundler) [root@s5 chaostoolkit_scenarios]# chaos run experiment.json
[2021-06-28 23:03:23 INFO] Validating the experiment's syntax
[2021-06-28 23:03:24 INFO] Experiment looks valid
[2021-06-28 23:03:24 INFO] Running experiment: E2
[2021-06-28 23:03:24 INFO] Steady-state strategy: default
[2021-06-28 23:03:24 INFO] Rollbacks strategy: default
[2021-06-28 23:03:24 INFO] Steady state hypothesis: H2
[2021-06-28 23:03:24 INFO] Probe: all_microservices_healthy
[2021-06-28 23:03:24 WARNING] all_microservices_healthy function is DEPRECATED and will be removed in the next releases, please use all_pods_healthy instead
[2021-06-28 23:03:24 INFO] Steady state hypothesis is met!
[2021-06-28 23:03:24 INFO] Playing your experiment's method now...
[2021-06-28 23:03:24 INFO] Action: delete_pods
[2021-06-28 23:03:24 INFO] Steady state hypothesis: H2
[2021-06-28 23:03:24 INFO] Probe: all_microservices_healthy
[2021-06-28 23:03:24 WARNING] all_microservices_healthy function is DEPRECATED and will be removed in the next releases, please use all_pods_healthy instead
[2021-06-28 23:03:24 INFO] Steady state hypothesis is met!
[2021-06-28 23:03:24 INFO] Let's rollback...
[2021-06-28 23:03:24 INFO] No declared rollbacks, let's move on.
[2021-06-28 23:03:24 INFO] Experiment ended with status: completed
(.bundler) [root@s5 chaostoolkit_scenarios]#
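
Besides the console log, `chaos run` records the run in a journal file (journal.json in the current directory by default), which is also what the chaostoolkit-reporting module installed earlier builds its reports from. The sketch below extracts the headline fields. The key names (`status`, `deviated`, `steady_states`, `run`) reflect our understanding of the journal format and should be verified against a real journal, and the sample dictionary is hand-written to mirror the log above.

```python
import json
from typing import Any, Dict


def summarize_journal(journal: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the headline fields out of a ChaosToolkit journal (assumed layout)."""
    steady = journal.get("steady_states") or {}
    before = steady.get("before") or {}
    after = steady.get("after") or {}
    return {
        "status": journal.get("status"),
        "deviated": journal.get("deviated"),
        "met_before": before.get("steady_state_met"),
        "met_after": after.get("steady_state_met"),
        "activities": [r["activity"]["name"] for r in journal.get("run", [])],
    }


# Hand-written sample mirroring the log above; for a real run use
# summarize_journal(json.load(open("journal.json"))).
sample = {
    "status": "completed",
    "deviated": False,
    "steady_states": {"before": {"steady_state_met": True},
                      "after": {"steady_state_met": True}},
    "run": [{"activity": {"name": "delete_pods"}, "status": "succeeded"}],
}
print(json.dumps(summarize_journal(sample)))
```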

Checking the result

Before the test:

[root@s5 ~]# kubectl get pods -n chaosnamespace -o wide
NAME                           READY   STATUS    RESTARTS   AGE    IP              NODE   NOMINATED NODE   READINESS GATES
...
redis-master-b96c9795b-nqzmr   1/1     Running   0          3d9h   10.100.220.84   s6     <none>           <none>
redis-slave-6b8d456947-6r42k   1/1     Running   0          3d9h   10.100.220.86   s6     <none>           <none>
redis-slave-6b8d456947-z55m5   1/1     Running   0          3d9h   10.100.53.206   s7     <none>           <none>

After the test:

[root@s5 ~]# kubectl get pods -n chaosnamespace -o wide
NAME                           READY   STATUS              RESTARTS   AGE    IP              NODE   NOMINATED NODE   READINESS GATES
...
redis-master-b96c9795b-92rc6   0/1     ContainerCreating   0          3s     <none>          s6     <none>           <none>
redis-master-b96c9795b-nqzmr   0/1     Terminating         0          3d9h   10.100.220.84   s6     <none>           <none>
redis-slave-6b8d456947-5m2xt   0/1     ContainerCreating   0          2s     <none>          s6     <none>           <none>
redis-slave-6b8d456947-6r42k   1/1     Terminating         0          3d9h   10.100.220.86   s6     <none>           <none>
redis-slave-6b8d456947-fj4xc   0/1     ContainerCreating   0          3s     <none>          s7     <none>           <none>
redis-slave-6b8d456947-z55m5   1/1     Terminating         0          3d9h   10.100.53.206   s7     <none>           <none>

Pods fully started:

[root@s5 ~]# kubectl get pods -n chaosnamespace -o wide
NAME                           READY   STATUS    RESTARTS   AGE     IP              NODE   NOMINATED NODE   READINESS GATES
...
redis-master-b96c9795b-92rc6   1/1     Running   0          5m43s   10.100.220.89   s6     <none>           <none>
redis-slave-6b8d456947-5m2xt   1/1     Running   0          5m42s   10.100.220.90   s6     <none>           <none>
redis-slave-6b8d456947-fj4xc   1/1     Running   0          5m43s   10.100.53.11    s7     <none>           <none>
[root@s5 ~]#
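
Checking by eye works for three pods; for a scripted check, the sketch below reads the NAME, READY and STATUS columns of `kubectl get pods` text output and reports whether every pod is Running and fully ready. Only the first three columns are parsed, so output with or without `-o wide` works, and rows with fewer fields (such as the "..." placeholder) are skipped. The sample strings are abridged from the listings above, and the function names are hypothetical.

```python
from typing import Dict, List

AFTER_DELETE = """
NAME                           READY   STATUS              RESTARTS   AGE
redis-master-b96c9795b-92rc6   0/1     ContainerCreating   0          3s
redis-master-b96c9795b-nqzmr   0/1     Terminating         0          3d9h
redis-slave-6b8d456947-fj4xc   0/1     ContainerCreating   0          3s
"""

RECOVERED = """
NAME                           READY   STATUS    RESTARTS   AGE
redis-master-b96c9795b-92rc6   1/1     Running   0          5m43s
redis-slave-6b8d456947-5m2xt   1/1     Running   0          5m42s
redis-slave-6b8d456947-fj4xc   1/1     Running   0          5m43s
"""


def parse_pods(kubectl_output: str) -> List[Dict[str, str]]:
    """Keep NAME, READY and STATUS (the first three columns) of each pod row."""
    rows = [line.split() for line in kubectl_output.strip().splitlines()]
    return [
        {"name": r[0], "ready": r[1], "status": r[2]}
        for r in rows[1:]  # skip the header row
        if len(r) >= 3
    ]


def all_ready(pods: List[Dict[str, str]]) -> bool:
    """True when there is at least one pod and every pod is Running with n/n ready."""
    def ok(pod: Dict[str, str]) -> bool:
        ready, total = pod["ready"].split("/")
        return pod["status"] == "Running" and ready == total
    return bool(pods) and all(ok(p) for p in pods)


print(all_ready(parse_pods(AFTER_DELETE)))  # False
print(all_ready(parse_pods(RECOVERED)))     # True
```

For real automation, `kubectl get pods -o json` avoids text parsing altogether, and `kubectl wait --for=condition=Ready pod -l app=redis -n chaosnamespace` can block until the pods are ready.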

As the results above show, the experiment ran successfully: the label selector app=redis matched the Redis master and both slave pods, so all three were deleted and then recreated by K8s.
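
Why were all three pods deleted? The action's `label_selector` was `app=redis`, and every Redis pod, master and slaves alike, evidently carries that label. The sketch below implements only the equality form of Kubernetes label selectors to show the effect; the pod labels, including `role`, are hypothetical rather than read from this cluster.

```python
from typing import Dict


def matches(selector: str, labels: Dict[str, str]) -> bool:
    """Equality-based label selector, e.g. "app=redis,role=master".

    Every comma-separated term must match. Only `key=value` / `key==value`
    is handled here; Kubernetes also supports `!=`, `in (...)`,
    `notin (...)` and existence checks.
    """
    for term in (t.strip() for t in selector.split(",")):
        if not term:
            continue
        key, _, value = term.partition("=")
        if labels.get(key.strip()) != value.lstrip("=").strip():
            return False
    return True


# Hypothetical labels for the three Redis pods.
pods = {
    "redis-master-b96c9795b-nqzmr": {"app": "redis", "role": "master"},
    "redis-slave-6b8d456947-6r42k": {"app": "redis", "role": "slave"},
    "redis-slave-6b8d456947-z55m5": {"app": "redis", "role": "slave"},
}
print([n for n, l in pods.items() if matches("app=redis", l)])              # all three
print([n for n, l in pods.items() if matches("app=redis,role=master", l)])  # master only
```

Under these assumed labels, a narrower selector such as `app=redis,role=master` would limit the blast radius to the master pod.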

Summary

Today we wrote one experiment; you can follow the same steps to create others.