Source | Alibaba Cloud Native official account

Alibaba’s open source chaos engineering project ChaosBlade has passed the CNCF TOC vote and been accepted as a CNCF Sandbox project. CNCF stands for Cloud Native Computing Foundation, which aims to build a sustainable ecosystem for cloud native software and to support the rapid growth of vendor-neutral open source projects such as Kubernetes, Prometheus, and Envoy.

ChaosBlade GitHub: github.com/chaosblade-…

Project introduction

ChaosBlade is a chaos engineering project that Alibaba open sourced in 2019. It comprises the chaos engineering experiment tool chaosblade and the chaos engineering platform chaosblade-box, and aims to help enterprises solve high-availability problems during cloud native adoption through chaos engineering. The experiment tool chaosblade supports three system platforms and four programming languages, covering more than 200 experiment scenarios and more than 3,000 experiment parameters, which allows fine-grained control over the scope of an experiment. The platform chaosblade-box hosts experiment tools: in addition to chaosblade, it also supports the LitmusChaos experiment tool. More than 40 companies have registered as users, including Industrial and Commercial Bank of China, China Mobile, Xiaomi, and Jingdong (JD.com).

Core capabilities

ChaosBlade has the following features:

  • Rich experiment scenarios: These cover basic resources (CPU, memory, network, disk, process, kernel, and file), multi-language application services (Java, C++, NodeJS, and Golang), and the Kubernetes platform (Container, Pod, and Node resource scenarios, which in turn include the scenarios listed above).
  • Diverse execution methods: Besides point-and-click operation on the platform's graphical interface, experiments can be run with the blade tool, with kubectl, or from code.
  • Easy scenario extension: All experiment scenarios are implemented according to the chaos experiment model, and scenarios at different levels map to different executors, so they are simple to implement and easy to extend.
  • Automatic deployment of experiment tools: Experiment tools can be deployed automatically on hosts or clusters, with no manual deployment required.
  • Hosting of open source experiment tools: The platform can host mainstream experiment tools in the industry, such as its own chaosblade and the external LitmusChaos.
  • Unified chaos experiment user interface: Users do not need to care how the individual tools are used and can run chaos experiments from a single user interface.
  • Multi-dimensional experiments: Experiments can be orchestrated from the host dimension, through Kubernetes resources, up to the application dimension.
  • Integration with the cloud native ecosystem: Deployment is managed with Helm, monitoring is integrated with Prometheus, and cloud native experiment tools can be hosted.
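
To make the idea of an experiment model driving the blade tool more concrete, here is a minimal sketch in Python. It is illustrative only: the Experiment class and its to_command helper are hypothetical names, not part of ChaosBlade, and the target, action, and flag names used in the example (cpu, fullload, cpu-percent, timeout) are assumptions that should be checked against the tool's own help output before use. The sketch only builds the command line and does not run it.

```python
from dataclasses import dataclass, field
import shlex


@dataclass
class Experiment:
    """Hypothetical description of one chaos experiment (not a ChaosBlade API)."""
    target: str                                 # component under test, e.g. "cpu"
    action: str                                 # fault to inject, e.g. "fullload"
    flags: dict = field(default_factory=dict)   # parameters that narrow the experiment

    def to_command(self) -> list:
        """Render the experiment as an argument list for the blade CLI."""
        args = ["blade", "create", self.target, self.action]
        for name, value in self.flags.items():
            args += [f"--{name}", str(value)]
        return args


# Assumed example: load the CPU to 60% and stop after 30 seconds.
exp = Experiment("cpu", "fullload", {"cpu-percent": 60, "timeout": 30})
print(shlex.join(exp.to_command()))
# prints: blade create cpu fullload --cpu-percent 60 --timeout 30
```

Keeping the experiment as data and rendering it only at the last step is one way the same description could be handed to different executors, which is the kind of extension the list above describes.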

Architecture design

The chaosblade-box architecture is as follows:

Through the console page, managed tools such as ChaosBlade and LitmusChaos can be deployed automatically. Experiment scenarios are unified according to the chaos experiment model established by the community. Target resources are divided into hosts, Kubernetes, and applications and are controlled by the target manager, so that a target resource can be selected through the graphical interface on the experiment creation page. The platform calls the chaos experiment executor to run the experiment scenarios of the different tools. Combined with Prometheus monitoring, experiment metrics can be observed, and rich experiment reports will be provided in the future.

chaosblade-box is also easy to deploy: github.com/chaosblade-…

Customer case

Future plans

Building on cloud native, ChaosBlade will provide a chaos engineering platform and experiment tools for multi-cluster, multi-environment, and multi-language chaos engineering. The experiment tool will continue to focus on the richness and stability of experiment scenarios, support more Kubernetes resource scenarios, standardize application service experiment scenarios, and provide standard multi-language implementations of those scenarios. The chaos engineering platform focuses on simplifying the deployment and practice of chaos engineering. In the future it will host more chaos experiment tools, be compatible with mainstream platforms, offer scenario recommendation, integrate business and system monitoring, output experiment reports, and complete the closed loop of chaos engineering operations while remaining easy to use. We welcome you to join the community to jointly advance the field of chaos engineering and to build highly available distributed systems in your enterprise.