• Top 10 Open Source Projects for SREs and DevOps
  • Written by Nir Sharma
  • The Nuggets Translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: keepmovingljzy
  • Proofreader: Kamly, 1autodidact


Building scalable, highly available software systems is the ultimate goal of every SRE, and the way to become a successful SRE is to keep learning. This post supports that path by outlining some of the most sought-after open source projects for monitoring, deployment, and maintenance. The SRE/DevOps space has many outstanding open source projects, each with new and exciting implementations, and many of them address unique challenges. These projects do the heavy lifting so you can get things done more easily.

Here we look at some of the most popular open source projects for monitoring, deployment, and maintenance. They include projects that simulate network traffic and projects that let you model unpredictable (chaotic) events so you can build more reliable systems.

1. Cloudprober

Cloudprober is an active probing and monitoring tool that detects failures before your customers do. It uses an "active" monitoring model: it runs probes to check that components are behaving as expected. For example, it can run probes to verify that your frontend can reach your backend, or that an internal system can actually reach a virtual machine in the cloud. This probing approach lets you track the health of your application independently of its implementation and makes it easy to pinpoint failures in your system.

Features:

  • Native integration with the open source monitoring tools Prometheus and Grafana. Cloudprober can also export probe results.
  • Automatic target discovery for cloud targets, with out-of-the-box support for GCE and Kubernetes; other cloud services can be configured easily.
  • Designed for easy deployment. Cloudprober is written entirely in Go and compiles to a static binary, so it can be deployed quickly as a Docker container. Because targets are discovered automatically, there is usually no need to redeploy or reconfigure Cloudprober in response to most updates.
  • Cloudprober's Docker image is small, containing only a statically compiled binary, and it needs very little CPU and RAM even when running a large number of probes.
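
To make the active model concrete, here is a minimal Python sketch of an HTTP probe. It is not Cloudprober's API (Cloudprober is a Go binary driven by its own configuration file); `ProbeResult`, `http_probe`, and `run_probes` are hypothetical names that only illustrate the idea of sending requests to a target on a schedule and recording success and latency.

```python
import http.client
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Dict, List

# Bypass environment proxy settings so the probe measures the target directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


@dataclass
class ProbeResult:
    target: str
    success: bool
    latency_s: float


def http_probe(url: str, timeout_s: float = 2.0) -> ProbeResult:
    """Send one GET request; success means a 2xx response within the timeout."""
    start = time.monotonic()
    try:
        with _OPENER.open(url, timeout=timeout_s) as resp:
            ok = 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connection, DNS failure, timeout, or a 4xx/5xx HTTPError.
        ok = False
    return ProbeResult(url, ok, time.monotonic() - start)


def run_probes(targets: List[str], rounds: int = 1, interval_s: float = 0.0) -> Dict[str, int]:
    """Probe every target `rounds` times and count the successful probes per target."""
    successes = {t: 0 for t in targets}
    for _ in range(rounds):
        for target in targets:
            if http_probe(target).success:
                successes[target] += 1
        time.sleep(interval_s)
    return successes
```

In a real deployment the success counts and latencies would be exported as metrics (for example to Prometheus) rather than returned, so that alerts fire when the success ratio drops.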

2. Cloud Operations Sandbox (Alpha)

Cloud Operations Sandbox is an open source platform that lets practitioners learn Google's Site Reliability Engineering practices and apply them to their own cloud systems using Cloud Operations (formerly Stackdriver). It is based on Hipster Shop, a cloud-native microservices demo application. Note: this requires a Google Cloud account.

Features:

  • Demo service – an application built on a modern cloud-native microservices architecture.
  • One-click deployment – scripts handle deploying the services to Google Cloud Platform.
  • Load generator – a component that generates simulated traffic against the demo service.

3. Version Checker for Kubernetes

Version Checker is a Kubernetes utility that lets you observe the current versions of the images running in your cluster, as well as the latest available versions. It can also display the current image versions as a table on a Grafana dashboard.

Features:

  • Supports multiple self-hosted registries at once.
  • Exposes version information as Prometheus metrics.
  • Supports registries such as ACR, Docker Hub, and ECR.
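
The core comparison such a tool performs can be sketched in a few lines of Python. This is not version-checker's implementation: the `available` dictionary stands in for real registry queries, only plain `MAJOR.MINOR.PATCH` tags are compared, and all function names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

SEMVER_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def split_image(image: str) -> Tuple[str, str]:
    """Split 'registry/repo:tag' into (repository, tag); a missing tag means 'latest'."""
    name, sep, tag = image.rpartition(":")
    # A colon followed by a '/' belongs to a registry port (host:5000/app), not a tag.
    if not sep or "/" in tag:
        return image, "latest"
    return name, tag


def parse_version(tag: str) -> Optional[Tuple[int, int, int]]:
    """Turn 'v1.2.3' or '1.2.3' into (1, 2, 3); other tags return None."""
    m = SEMVER_TAG.match(tag)
    if m is None:
        return None
    major, minor, patch = (int(p) for p in m.groups())
    return major, minor, patch


def outdated_images(running: List[str], available: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each running image whose semver tag is behind to the newest available tag."""
    report = {}
    for image in running:
        repo, tag = split_image(image)
        current = parse_version(tag)
        if current is None:
            continue  # tags such as 'latest' cannot be compared this way
        candidates = []
        for candidate in available.get(repo, []):
            version = parse_version(candidate)
            if version is not None:
                candidates.append((version, candidate))
        if candidates:
            newest_version, newest_tag = max(candidates)
            if newest_version > current:
                report[image] = newest_tag
    return report
```

Comparing parsed integer tuples rather than raw strings matters here: as strings, "1.9.0" would sort after "1.10.0".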

4. Istio

Istio is an open platform for connecting microservices, managing how traffic flows between them, enforcing policies, and aggregating telemetry data in a uniform way. Istio's control plane provides an abstraction layer over the underlying cluster management platform, such as Kubernetes.

Features:

  • Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic.
  • Fine-grained control of traffic behavior with rich routing rules, retries, failovers, and fault injection.
  • A pluggable policy layer and configuration API supporting access controls, rate limits, and quotas.
  • Automatic metrics, logs, and traces for all traffic within a cluster, including cluster ingress and egress.
  • Secure service-to-service communication in a cluster with strong identity-based authentication and authorization.
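
In Istio, retries, fault injection, and weighted routing are declared in configuration (VirtualService resources) rather than written into application code. The Python sketch below only simulates what such rules do to a request on its way through the mesh; `InjectedAbort`, `with_fault_injection`, `with_retries`, and `pick_subset` are hypothetical helpers, not part of any Istio API.

```python
import random
from typing import Callable, Dict


class InjectedAbort(Exception):
    """Stands in for the HTTP error an abort fault rule would return."""


def with_fault_injection(call: Callable, abort_fraction: float, rng: random.Random) -> Callable:
    """Fail roughly `abort_fraction` of requests before they reach `call`."""
    def wrapped(*args, **kwargs):
        if rng.random() < abort_fraction:
            raise InjectedAbort("fault injected")
        return call(*args, **kwargs)
    return wrapped


def with_retries(call: Callable, attempts: int) -> Callable:
    """Retry aborted requests, giving up after `attempts` tries (attempts >= 1)."""
    def wrapped(*args, **kwargs):
        for attempt in range(1, attempts + 1):
            try:
                return call(*args, **kwargs)
            except InjectedAbort:
                if attempt == attempts:
                    raise
    return wrapped


def pick_subset(weights: Dict[str, int], rng: random.Random) -> str:
    """Pick a destination subset such as 'v1' or 'v2' in proportion to its weight."""
    subsets = list(weights)
    return rng.choices(subsets, weights=[weights[s] for s in subsets], k=1)[0]
```

Wrapping the faulty call in the retry policy shows why the two features are tested together: injected aborts reveal whether retries mask transient failures or simply multiply load on a service that is actually down.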

5. Checkov

Checkov is a static code analysis tool for infrastructure as code. It scans cloud infrastructure defined with Terraform, CloudFormation, Kubernetes, Serverless, or ARM templates and detects security and compliance misconfigurations.

Features:

  • Over 400 built-in policies cover security and compliance best practices for AWS, Azure, and Google Cloud.
  • Evaluates Terraform provider settings to govern the creation, management, and updating of IaaS, PaaS, or SaaS resources managed through Terraform.
  • Detects AWS credentials in EC2 user data, Lambda environment variables, and Terraform providers.
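
The general idea of a static IaC check, inspecting declared resource attributes before anything is deployed, can be sketched as follows. This is not Checkov's API or one of its built-in policies: the pre-parsed resource dictionary and both rules are illustrative assumptions.

```python
import re
from typing import Dict, List, Tuple

# Long-term AWS access key IDs start with "AKIA" followed by 16 uppercase letters or digits.
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def check_resources(resources: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Return (resource address, finding) pairs for two illustrative rules."""
    findings = []
    for address, attrs in sorted(resources.items()):
        rtype = attrs.get("type")
        # Rule 1: storage buckets should declare server-side encryption.
        if rtype == "aws_s3_bucket" and not attrs.get("server_side_encryption_configuration"):
            findings.append((address, "S3 bucket without server-side encryption"))
        # Rule 2: credentials must not be hard-coded in instance user data.
        if rtype == "aws_instance" and AWS_ACCESS_KEY_ID.search(attrs.get("user_data", "")):
            findings.append((address, "AWS access key ID hard-coded in user_data"))
    return findings
```

Because the checks run on the code rather than on live infrastructure, they can fail a pull request before a misconfigured bucket or leaked key ever reaches the cloud account.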

6. Litmus

Litmus is a toolkit for cloud-native chaos engineering. It provides tools to orchestrate chaos on Kubernetes to help SREs find weaknesses in their deployments. SREs typically run chaos experiments with Litmus in a staging environment first and eventually in production to find faults and vulnerabilities. Fixing these weaknesses improves the resilience of the system.

Features:

  • For developers: run chaos experiments during application development as an extension of unit or integration testing.
  • For CI pipeline builders: run chaos as a pipeline stage to find bugs when the application is subjected to failure paths in the pipeline.
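
Litmus runs its experiments against a real Kubernetes cluster, but the developer use case above, chaos as an extension of unit tests, can be illustrated in miniature without one. This plain `unittest` sketch is not Litmus: it injects a failure into a mocked dependency and asserts that the code under test degrades gracefully. `product_page` and its recommendations dependency are hypothetical.

```python
import unittest
from unittest import mock


def product_page(catalog_client):
    """Render a product page; degrade gracefully if the recommendations call fails."""
    try:
        recs = catalog_client.recommendations()
    except ConnectionError:
        recs = []  # fallback: show the page without recommendations
    return {"status": 200, "recommendations": recs}


class ProductPageChaosTest(unittest.TestCase):
    def test_survives_dependency_outage(self):
        client = mock.Mock()
        # Inject the fault: every call to the dependency fails.
        client.recommendations.side_effect = ConnectionError("injected outage")
        page = product_page(client)
        self.assertEqual(page["status"], 200)
        self.assertEqual(page["recommendations"], [])

    def test_happy_path(self):
        client = mock.Mock()
        client.recommendations.return_value = ["socks"]
        self.assertEqual(product_page(client)["recommendations"], ["socks"])
```

Run it with `python -m unittest <file>`. A cluster-level experiment asks the same question (does the service survive losing a dependency?) against real pods and networks instead of a mock.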

7. Locust

Locust is an easy-to-use, scriptable, and scalable performance testing tool. Instead of a clunky UI or a domain-specific language, you define user behavior in plain Python code. This makes Locust highly extensible and developer-friendly.

Features:

  • Distributed and scalable – Locust easily supports hundreds of thousands of simulated users.
  • A web-based UI displays test progress in real time.
  • Can test almost any system with minor modifications.

8. Prometheus

Prometheus, a Cloud Native Computing Foundation project, is a monitoring system for systems and services. It scrapes metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.

Features:

  • A multidimensional data model (time series identified by a metric name and a set of key/value label dimensions).
  • Targets are discovered through service discovery or static configuration.
  • No reliance on distributed storage; single server nodes are autonomous.
  • PromQL, a powerful and flexible query language that takes advantage of the multidimensional data model.
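
The multidimensional data model can be illustrated with a toy Python store. It is not Prometheus's storage engine; `TinySeriesStore` and its methods are hypothetical and only show why a series is keyed by metric name plus label set, and how an equality selector such as `http_requests_total{job="api"}` narrows the series before aggregation.

```python
from typing import Dict, FrozenSet, List, Tuple

# A series is identified by its metric name plus its complete set of label pairs.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


class TinySeriesStore:
    """Toy in-memory model of Prometheus's data model, not its storage engine."""

    def __init__(self) -> None:
        self.series: Dict[SeriesKey, List[Tuple[float, float]]] = {}

    def append(self, name: str, labels: Dict[str, str], timestamp: float, value: float) -> None:
        """Add a sample (assumed to arrive in time order); new label sets create new series."""
        key = (name, frozenset(labels.items()))
        self.series.setdefault(key, []).append((timestamp, value))

    def select(self, name: str, **matchers: str) -> Dict[SeriesKey, List[Tuple[float, float]]]:
        """Equality matching, like the selector name{label="value", ...}."""
        return {
            key: samples
            for key, samples in self.series.items()
            if key[0] == name and all((k, v) in key[1] for k, v in matchers.items())
        }

    def sum_latest(self, name: str, **matchers: str) -> float:
        """Sum the newest sample of each matching series (roughly sum(name{...}))."""
        return sum(samples[-1][1] for samples in self.select(name, **matchers).values())
```

For example, appending `http_requests_total` samples labelled `{job="api", method="GET"}` and `{job="api", method="POST"}` creates two separate series, and `sum_latest("http_requests_total", job="api")` adds their newest values together.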

9. Kube-monkey

Kube-monkey is an implementation of Netflix's Chaos Monkey for Kubernetes clusters. It randomly deletes Kubernetes pods in the cluster, encouraging and validating the development of failure-resilient services.

Features:

  • Kube-monkey runs in opt-in mode and only terminates pods of Kubernetes (k8s) applications that have explicitly agreed to let kube-monkey terminate them.
  • Highly customizable scheduling to fit your requirements.
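
The opt-in behavior can be sketched as follows. This is not kube-monkey's code (kube-monkey is written in Go and works through the Kubernetes API); the label key, the `Deployment` dataclass, `plan_kills`, and `kill_probability` are hypothetical stand-ins for its per-application configuration.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List

OPT_IN_LABEL = "chaos.example/opt-in"  # hypothetical label key, not kube-monkey's


@dataclass
class Deployment:
    name: str
    labels: Dict[str, str] = field(default_factory=dict)
    pods: List[str] = field(default_factory=list)


def plan_kills(deployments: List[Deployment], kill_probability: float,
               rng: random.Random) -> List[str]:
    """Return the pods to terminate this round, touching only opted-in deployments."""
    victims = []
    for dep in deployments:
        if dep.labels.get(OPT_IN_LABEL) != "true" or not dep.pods:
            continue  # never touch workloads that did not opt in
        if rng.random() < kill_probability:
            victims.append(rng.choice(dep.pods))
    return victims
```

Making termination opt-in keeps the blast radius under each team's control: only services whose owners believe they are resilient get exercised.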

10. PowerfulSeal

PowerfulSeal injects failures into Kubernetes clusters to help you detect problems as early as possible. It lets you write scenarios that describe complete chaos experiments.

Features:

  • Works with Kubernetes, OpenStack, AWS, Azure, GCP, and local machines.
  • Integrates with Prometheus and Datadog for metrics collection.
  • Multiple modes for different use cases.

Conclusion

One of the great strengths of open source technology is its extensibility: if needed, you can add features to a tool so that it better fits your own architecture. These open source projects also have extensive documentation and active user communities. As microservices architectures come to dominate the cloud computing landscape, reliable tools for monitoring and troubleshooting these services will surely be part of every developer's toolkit.
