Today we will try out Chaos Toolkit, an open source tool for chaos engineering.

Its goal is to provide a free, open, community-driven toolset and API.

Official source link: github.com/chaostoolki…

To understand this tool, you need to know the main points of the principles of chaos engineering, which are as follows:

Keep the first point in mind here: establishing the steady-state hypothesis.
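To make the steady-state idea concrete, here is a minimal sketch in Python. It is not Chaos Toolkit's own code: the helper names, the probe dictionaries and the canned values are all made up for illustration. The only idea taken from the tool is that each probe returns a value, and that value is compared against a tolerance.

```python
from typing import Any, Dict, List

Probe = Dict[str, Any]


def within_tolerance(tolerance: Any, observed: Any) -> bool:
    """Accept a value that equals the tolerance, or is one of several allowed values."""
    if isinstance(tolerance, (list, tuple)):
        return observed in tolerance
    return observed == tolerance


def steady_state_is_met(probes: List[Probe]) -> bool:
    """Run every probe and require each result to be within its tolerance."""
    for probe in probes:
        observed = probe["check"]()
        if not within_tolerance(probe["tolerance"], observed):
            print(f"probe {probe['name']!r} deviated: got {observed!r}")
            return False
    return True


# Hypothetical probes that return canned values instead of querying a cluster.
healthy = [
    {"name": "all-pods-ready", "tolerance": True, "check": lambda: True},
    {"name": "redis-pod-count", "tolerance": [3], "check": lambda: 3},
]
degraded = [
    {"name": "redis-pod-count", "tolerance": [3], "check": lambda: 1},
]

print(steady_state_is_met(healthy))
print(steady_state_is_met(degraded))
```

The same check is meant to be run twice, once before the experiment and once after it, so that any deviation can be attributed to what the experiment did.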

Before running the tool, let’s take a look at its architecture.

ChaosToolkit operates your system under test via Drivers.
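One way to picture a driver is as an ordinary Python module whose functions are looked up by name. The sketch below assumes the "provider" layout (type, module, func, arguments) used in Chaos Toolkit experiment files; the `run_activity` helper is hypothetical, and a standard-library function stands in for a real driver function so that the example runs without a cluster.

```python
import importlib
from typing import Any, Dict


def run_activity(activity: Dict[str, Any]) -> Any:
    """Resolve the function named by the activity's provider and call it."""
    provider = activity["provider"]
    if provider["type"] != "python":
        raise ValueError(f"unsupported provider type: {provider['type']}")
    module = importlib.import_module(provider["module"])
    func = getattr(module, provider["func"])
    return func(**provider.get("arguments", {}))


# Stand-in activity: os.path.basename plays the role of a driver function.
# A real experiment would name a driver module such as chaosk8s.pod.actions.
activity = {
    "type": "probe",
    "name": "file-name-of-experiment",
    "provider": {
        "type": "python",
        "module": "os.path",
        "func": "basename",
        "arguments": {"p": "/tmp/experiment.json"},
    },
}

result = run_activity(activity)
print(result)
```

Because the lookup is just a module name plus a function name, adding support for another platform amounts to installing another Python package that exposes such functions.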

Its function points include the following:

Let’s set up the tools and play with them.

Environment: CentOS 7.8, k8s 1.19.5, example application

Install Python 3:

sudo yum install python3 python3-venv

Install pipenv:

gaolou@GaoMacPro ~ % pip3 install pipenv

Install Chaos Toolkit:

pip3 install -U chaostoolkit
pip3 install -U chaostoolkit-kubernetes
pip3 install -U chaostoolkit-reporting

If you need to operate on other platforms, you can also install the corresponding extensions.

Create and activate the virtual environment:

python3 -m venv .bundler
source .bundler/bin/activate
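If you want to confirm that the packages landed in the active environment, a small check like the following may help. It is only a sketch: it assumes Python 3.8 or later (for `importlib.metadata`), and the `installed_versions` helper is a name invented here, not part of Chaos Toolkit.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names as passed to pip in the installation step.
PACKAGES = ("chaostoolkit", "chaostoolkit-kubernetes", "chaostoolkit-reporting")


def installed_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Run it with the interpreter of the activated environment; a package reported as not installed usually means pip3 pointed at a different Python.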

The installation above was performed on the k8s master machine. If you are not installing on a k8s machine, you can configure the corresponding k8s context instead; for the specific steps, see: chaostoolkit.org/drivers/kub…

chaos discover

Testing starts with the discover command. Chaos Toolkit generates a discovery.json file based on the contents of ~/.kube/config; the file contains the collection of all the actions that can be performed on k8s. The result is as follows:

(.bundler) [root@s5 chaostoolkit_scenarios]# chaos discover chaostoolkit-kubernetes
[2021-06-23 12:18:07 INFO] Attempting to download and install package 'chaostoolkit-kubernetes'
[2021-06-23 12:18:08 INFO] Package downloaded and installed in current environment
[2021-06-23 12:18:09 INFO] Discovering capabilities from chaostoolkit-kubernetes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.deployment.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.deployment.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.node.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.node.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.pod.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.pod.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.replicaset.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.service.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.service.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.statefulset.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.statefulset.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.crd.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.crd.probes
[2021-06-23 12:18:09 INFO] Discovery outcome saved in ./discovery.json
(.bundler) [root@s5 chaostoolkit_scenarios]#

chaos init: generating the experiment

Execute the initialization command to create a chaos experiment as prompted.

(.bundler) [root@s5 chaostoolkit_scenarios]# chaos init
You are about to create an experiment.
This wizard will walk you through each step so that you can build the best experiment for your needs.

An experiment is made up of three elements:

  • a steady-state hypothesis [OPTIONAL]
  • an experimental method
  • a set of rollback activities [OPTIONAL]

Only the method is required. Also your experiment will not run unless you define at least one activity (probe or action) within it
Experiment's title: E2

A steady state hypothesis defines what 'normality' looks like in your system.
The steady state hypothesis is a collection of conditions that are used, at the beginning of an experiment, to decide if the system is in a recognised 'normal' state. The steady state conditions are then used again when your experiment is complete to detect where your system may have deviated in an interesting, weakness-detecting way.

Initially you may not know what your steady state hypothesis is and so instead you might create an experiment without one.
This is why the steady state hypothesis is optional.
Do you want to define a steady state hypothesis now? [y/N]: y  # Create the steady-state hypothesis. Note that this is an important concept in chaos engineering, yet most other chaos tools do not expose it.

You may now define probes that will determine the steady-state of your system.
Add an activity

  1. all_microservices_healthy
  2. deployment_is_fully_available
  3. deployment_is_not_fully_available
  4. microservice_available_and_healthy
  5. microservice_is_not_available
  6. read_microservices_logs
  7. service_endpoint_is_initialized
  8. count_pods
  9. pod_is_not_available
  10. pods_in_conditions
  11. pods_in_phase
  12. pods_not_in_phase
  13. read_pod_logs
  14. statefulset_fully_available
  15. statefulset_not_fully_available
  16. get_cluster_custom_object
  17. get_custom_object
  18. list_cluster_custom_objects
  19. list_custom_objects

Activity (0 to escape): 1  # Select the probe for the steady-state hypothesis; in short, this defines the expected result

!!!DEPRECATED!!!

  1. kill_microservice
  2. remove_service_endpoint

Do you want to use this probe? [y/N]: y # Determines whether to use the probe selected above

A steady-state probe requires a tolerance value, within which your system is in a recognised normal state.

What is the tolerance for this probe? : normal

You now need to fill the arguments for this activity. Default values will be shown between brackets. You may simply press return to use it or not set any value.
Argument's value for 'ns' [default]: chaosnamespace
Do you want to select another activity? [y/N]: y  # add another activity
Add an activity

  1. all_microservices_healthy
  2. deployment_is_fully_available
  3. deployment_is_not_fully_available
  4. kill_microservice
  5. microservice_available_and_healthy
  6. microservice_is_not_available
  7. read_microservices_logs
  8. service_endpoint_is_initialized
  9. count_pods
  10. pod_is_not_available
  11. pods_in_conditions
  12. pods_in_phase
  13. pods_not_in_phase
  14. read_pod_logs
  15. statefulset_fully_available
  16. statefulset_not_fully_available
  17. get_cluster_custom_object
  18. get_custom_object
  19. list_cluster_custom_objects
  20. list_custom_objects

Activity (0 to escape): 1  # select the specific activity

!!!DEPRECATED!!!
Do you want to use this probe? [y/N]: y  # confirm the activity selected above

You now need to fill the arguments for this activity. Default values will be shown between brackets. You may simply press return to use it or not set any value.
Argument's value for 'ns' [default]:
Do you want to select another activity? [y/N]: N  # whether to add another activity; I won't add one here

An experiment’s method contains actions and probes. Actions vary real-world events in your system to determine if your steady-state hypothesis is maintained when those events occur.

An experimental method can also contain probes to gather additional information about your system as your method is executed.
Do you want to define an experimental method? [y/N]: y  # define the specific method for this test
Add an activity

  1. kill_microservice
  2. remove_service_endpoint
  3. scale_microservice
  4. start_microservice
  5. all_microservices_healthy
  6. deployment_is_fully_available
  7. deployment_is_not_fully_available
  8. microservice_available_and_healthy
  9. microservice_is_not_available
  10. read_microservices_logs
  11. service_endpoint_is_initialized
  12. create_deployment
  13. delete_deployment
  14. scale_deployment
  15. deployment_available_and_healthy
  16. deployment_fully_available
  17. deployment_not_fully_available
  18. cordon_node
  19. create_node
  20. delete_nodes
  21. drain_nodes
  22. uncordon_node
  23. get_nodes
  24. delete_pods
  25. exec_in_pods
  26. terminate_pods
  27. count_pods
  28. pod_is_not_available
  29. pods_in_conditions
  30. pods_in_phase
  31. pods_not_in_phase
  32. read_pod_logs
  33. delete_replica_set
  34. create_service_endpoint
  35. delete_service
  36. service_is_initialized
  37. create_statefulset
  38. remove_statefulset
  39. scale_statefulset
  40. statefulset_fully_available
  41. statefulset_not_fully_available
  42. create_cluster_custom_object
  43. create_custom_object
  44. delete_cluster_custom_object
  45. delete_custom_object
  46. patch_cluster_custom_object
  47. patch_custom_object
  48. replace_cluster_custom_object
  49. replace_custom_object
  50. get_cluster_custom_object
  51. get_custom_object
  52. list_cluster_custom_objects
  53. list_custom_objects

Activity (0 to escape): 24  # here I select activity 24, delete_pods, which deletes pods

!!!DEPRECATED!!!
Do you want to use this action? [y/N]: y  # confirm the selection

You now need to fill the arguments for this activity. Default values will be shown between brackets. You may simply press return to use it or not set any value.
Argument's value for 'name': DeleteRedisPOD
Argument's value for 'ns' [default]:
Argument's value for 'label_selector' [name in ({name})]: app=redis  # enter the label of the target objects so that they can be found
Do you want to select another activity? [y/N]: N  # whether to add another action; I won't add one here

An experiment may optionally define a set of remedial actions that are used to rollback the system to a given state.
Do you want to add some rollbacks now? [y/N]: N  # I am deleting the Redis pods, and k8s brings them back up automatically, so no rollback is needed

…experiment.json'  # the generated experiment file
(.bundler) [root@s5 chaostoolkit_scenarios]#
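For orientation, the answers given to the wizard map onto an experiment file roughly like the one built below. This is a sketch of the layout, not the file the wizard actually wrote: the field names follow the Chaos Toolkit experiment format as I understand it, while the `description` text, the boolean tolerance, the single probe, and the `ns` value on the action are assumptions made so that the example is complete.

```python
import json

experiment = {
    "version": "1.0.0",
    "title": "E2",
    "description": "Delete the Redis pods and check that the namespace recovers",  # assumed text
    "steady-state-hypothesis": {
        "title": "H2",
        "probes": [
            {
                "type": "probe",
                "name": "all_microservices_healthy",
                "tolerance": True,  # assumption: the probe is expected to return True
                "provider": {
                    "type": "python",
                    "module": "chaosk8s.probes",
                    "func": "all_microservices_healthy",
                    "arguments": {"ns": "chaosnamespace"},
                },
            }
        ],
    },
    "method": [
        {
            "type": "action",
            "name": "delete_pods",
            "provider": {
                "type": "python",
                "module": "chaosk8s.pod.actions",
                "func": "delete_pods",
                "arguments": {
                    "label_selector": "app=redis",
                    "ns": "chaosnamespace",  # assumption: matches the kubectl namespace
                },
            },
        }
    ],
    "rollbacks": [],  # none declared, as in the wizard session
}

text = json.dumps(experiment, indent=4)
print(text)
```

Reading the generated file next to the wizard session is a quick way to see which prompt produced which field, and it is usually easier to edit the JSON directly than to rerun the wizard.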

chaos run: running the experiment

(.bundler) [root@s5 chaostoolkit_scenarios]# chaos run experiment.json
[2021-06-28 23:03:23 INFO] Validating the experiment's syntax
[2021-06-28 23:03:24 INFO] Experiment looks valid
[2021-06-28 23:03:24 INFO] Running experiment: E2
[2021-06-28 23:03:24 INFO] Steady-state strategy: default
[2021-06-28 23:03:24 INFO] Rollbacks strategy: default
[2021-06-28 23:03:24 INFO] Steady state hypothesis: H2
[2021-06-28 23:03:24 INFO] Probe: all_microservices_healthy
[2021-06-28 23:03:24 WARNING] all_microservices_healthy function is DEPRECATED and will be removed in the next releases, please use all_pods_healthy instead
[2021-06-28 23:03:24 INFO] Steady state hypothesis is met!
[2021-06-28 23:03:24 INFO] Playing your experiment's method now...
[2021-06-28 23:03:24 INFO] Action: delete_pods
[2021-06-28 23:03:24 INFO] Steady state hypothesis: H2
[2021-06-28 23:03:24 INFO] Probe: all_microservices_healthy
[2021-06-28 23:03:24 WARNING] all_microservices_healthy function is DEPRECATED and will be removed in the next releases, please use all_pods_healthy instead
[2021-06-28 23:03:24 INFO] Steady state hypothesis is met!
[2021-06-28 23:03:24 INFO] Let's rollback...
[2021-06-28 23:03:24 INFO] No declared rollbacks, let's move on.
[2021-06-28 23:03:24 INFO] Experiment ended with status: completed
(.bundler) [root@s5 chaostoolkit_scenarios]#

Pod status before the test:

[root@s5 ~]# kubectl get pods -n chaosnamespace -o wide

NAME                           READY   STATUS    RESTARTS   AGE    IP              NODE   NOMINATED NODE   READINESS GATES
………………………
redis-master-b96c9795b-nqzmr   1/1     Running   0          3d9h   10.100.220.84   s6
redis-slave-6b8d456947-6r42k   1/1     Running   0          3d9h   10.100.220.86   s6
redis-slave-6b8d456947-z55m5   1/1     Running   0          3d9h   10.100.53.206   s7

After the test:

[root@s5 ~]# kubectl get pods -n chaosnamespace -o wide

NAME                           READY   STATUS              RESTARTS   AGE    IP              NODE   NOMINATED NODE   READINESS GATES
………………………….
redis-master-b96c9795b-92rc6   0/1     ContainerCreating   0          3s                     s6
redis-master-b96c9795b-nqzmr   0/1     Terminating         0          3d9h   10.100.220.84   s6
redis-slave-6b8d456947-5m2xt   0/1     ContainerCreating   0          2s                     s6
redis-slave-6b8d456947-6r42k   1/1     Terminating         0          3d9h   10.100.220.86   s6
redis-slave-6b8d456947-fj4xc   0/1     ContainerCreating   0          3s                     s7
redis-slave-6b8d456947-z55m5   1/1     Terminating         0          3d9h   10.100.53.206   s7

Once the pods have fully started:

[root@s5 ~]# kubectl get pods -n chaosnamespace -o wide

NAME                           READY   STATUS    RESTARTS   AGE     IP              NODE   NOMINATED NODE   READINESS GATES
………………………
redis-master-b96c9795b-92rc6   1/1     Running   0          5m43s   10.100.220.89   s6
redis-slave-6b8d456947-5m2xt   1/1     Running   0          5m42s   10.100.220.90   s6
redis-slave-6b8d456947-fj4xc   1/1     Running   0          5m43s   10.100.53.211   s7

[root@s5 ~]#

As you can see from the results above, the test executed successfully: the Redis pods were killed and then brought back up by k8s.
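The order of events in the run log (hypothesis, method, hypothesis again, rollbacks) can be sketched in a few lines of Python. This is a simplified model with a simulated cluster, not Chaos Toolkit's implementation; `run_experiment`, `make_cluster` and the pod names are hypothetical, and the status strings are simplified stand-ins for the statuses the tool reports.

```python
from typing import Callable, Dict, List, Tuple


def run_experiment(
    hypothesis: Callable[[], bool],
    method: List[Callable[[], None]],
    rollbacks: List[Callable[[], None]],
) -> str:
    """Check the hypothesis, play the method, check again, then roll back."""
    if not hypothesis():
        return "failed"  # the system was not in its normal state to begin with
    try:
        for activity in method:
            activity()
        status = "completed" if hypothesis() else "deviated"
    finally:
        for rollback in rollbacks:
            rollback()
    return status


def make_cluster(self_healing: bool) -> Tuple[Callable[[], bool], Callable[[], None]]:
    """Build a fake probe and a fake delete action over an in-memory pod table."""
    pods: Dict[str, str] = {
        "redis-master": "Running",
        "redis-slave-1": "Running",
        "redis-slave-2": "Running",
    }

    def all_pods_running() -> bool:
        return all(state == "Running" for state in pods.values())

    def delete_redis_pods() -> None:
        # A self-healing cluster replaces deleted pods; otherwise they stay down.
        for name in pods:
            pods[name] = "Running" if self_healing else "Terminating"

    return all_pods_running, delete_redis_pods


probe, action = make_cluster(self_healing=True)
with_recovery = run_experiment(probe, [action], [])

probe, action = make_cluster(self_healing=False)
without_recovery = run_experiment(probe, [action], [])

print(with_recovery, without_recovery)
```

The second case is the interesting one: when nothing brings the pods back, the closing hypothesis check is what surfaces the weakness.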

Today we wrote just this one experiment; you can follow the same steps to generate others.