Recently, following FlyFish, Cloudwise open-sourced OMP (Operation Management Platform). Independently designed and developed by Cloudwise, OMP is a lightweight, converged, intelligent operations platform that combines host onboarding, deployment, monitoring, inspection, self-healing, backup, restore, and other functions in one place. It gives users convenient operations and business management capabilities, improves the efficiency of operations staff, and strengthens business continuity and security.

GitHub address: github.com/CloudWise-O…

Gitee address: gitee.com/CloudWise/O…

Why did Cloudwise release an open source operation and maintenance (O&M) management platform with such capabilities? Simon, technical director at Cloudwise and the person in charge of the project, explains: “We want to bring the innovation and practical experience that Cloudwise has accumulated over more than 10 years in intelligent operations to developers in an open source way. OMP addresses real pain points for O&M personnel and makes their work simpler and more efficient. In the future, we look forward to working with everyone in the industry to promote the development of the AIOps community.”

OMP was built to solve operations pain points

With digital transformation in full swing, enterprise projects and products are being upgraded rapidly. This places new demands on software developers and customer engineers: fast installation, fast fault localization, automated analysis, monitoring and alerting, and fault self-healing.

For example, host login methods are not uniform: some customers allow direct SSH connections, some require a jump host, and some only allow operation at a display. After a product goes live, it often lacks a mature safeguard mechanism. Without accurate monitoring, alerting, and self-healing, the team is left reacting to anomalies and faults and struggles to resolve them quickly. Even when a product is deployed according to the initial plan, the lack of regular inspection and analysis makes it hard for O&M personnel to quickly grasp the running status and processing capacity of the current system and to propose optimizations.

Across multiple rounds of research, we found that these are scenarios O&M personnel commonly encounter. We summarize the main O&M pain points as follows:

  • Host environments are diverse and difficult to manage in a unified manner, for example hybrid cloud, private cloud, cross-IDC, virtualization, and containers.
  • Business changes are difficult, and automated orchestration capability is low.
  • Monitoring spans multiple platforms, and their data is difficult to link together.
  • Fault self-healing is difficult when services are abnormal.
  • The running state of the business is difficult to evaluate and analyze.
  • O&M knowledge, expert guidance, and expert solutions are lacking.

To help O&M personnel address these pain points, Cloudwise created OMP (Operation Management Platform), which aims to reduce delivery difficulty and improve the maintainability of products. The platform currently offers host management, application management, application monitoring, status inspection, and other core features.

OMP core features

Host management

It brings all host resources under management, monitors the running status of hosts in real time, and supports online management.

Application management

It provides common basic components, application services, and standards-compliant self-developed products, and supports service management operations such as installation, deployment, change release, elastic scaling, and online configuration tuning.

Application monitoring

It covers various service scenarios, such as standard monitoring, customized monitoring, link monitoring, and intelligent monitoring. It uses intelligent analysis of large volumes of data to predict trends and to contain exceptions before they occur.

Status inspection

It periodically summarizes service metrics and running status, and runs inspections and sends reports automatically as required.

These features draw on the ideas Cloudwise has developed over many years of work in the O&M field and on its practical experience providing professional services to customers. OMP stands out not only for these features and the technology and algorithms behind them, but also for its core technology architecture.

OMP core technology architecture

As shown in the OMP architecture diagram below, the OMP front end is built on React with Ant Design, and the back end uses the Django framework to integrate SaltStack and other components to achieve basic functions. On the agent side, the SaltStack Agent handles the installation and control of services, and a purpose-built Monitor Agent collects metrics.

The monitoring component adopts popular open source products such as Prometheus, Grafana, Alertmanager, and Loki. For data storage, OMP uses MySQL for persistent data and Redis for temporary data, caching, and simple message queues.
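To make the metric collection path more concrete, here is a minimal sketch of how an agent could render collected samples in the Prometheus text exposition format so that Prometheus can scrape them. This is not OMP's Monitor Agent code: the `Sample` class, the `render` function, and the metric name `omp_host_cpu_usage_ratio` are hypothetical names chosen for illustration, and the source does not say how the Monitor Agent exposes its data.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One metric sample. All names here are hypothetical, not OMP's."""
    name: str
    value: float
    labels: dict = field(default_factory=dict)


def escape_label_value(value: str) -> str:
    # The text format requires backslash, double quote, and newline
    # to be escaped inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(samples):
    """Render samples as Prometheus text exposition lines."""
    lines = []
    for s in samples:
        if s.labels:
            pairs = ",".join(
                f'{k}="{escape_label_value(str(v))}"'
                for k, v in sorted(s.labels.items())
            )
            lines.append(f"{s.name}{{{pairs}}} {s.value}")
        else:
            lines.append(f"{s.name} {s.value}")
    # The exposition format expects a trailing newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render([Sample("omp_host_cpu_usage_ratio", 0.42, {"host": "web-01"})]), end="")
```

In a real deployment the official Prometheus client library would normally handle this rendering; the sketch only shows the shape of the data that a scraper reads.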

OMP future open source plan

While refining OMP's technology architecture and core features in practice, we also found that the current features alone are not enough to meet the O&M needs of all developers. We will therefore continue to open-source other OMP functions, such as the modules below. Please stay tuned, and we welcome your suggestions for development.

Self-healing: When an exception or fault occurs in a business system, the platform handles it according to a preset self-healing policy, which reduces the impact of the fault on services and the resulting losses to the enterprise.
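As an illustration of what a "preset self-healing policy" could look like, the sketch below maps fault types to a recovery action and an attempt limit, and escalates to a human when no policy applies or the limit is reached. The module is not yet released, so everything here is an assumption: the `HealingPolicy` class, the `decide` function, and the fault and action names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealingPolicy:
    action: str        # hypothetical action name, e.g. "restart_service"
    max_attempts: int  # escalate once this many attempts have been made


# Hypothetical preset policies keyed by fault type.
POLICIES = {
    "service_down": HealingPolicy(action="restart_service", max_attempts=3),
    "disk_full": HealingPolicy(action="clean_logs", max_attempts=1),
}


def decide(fault_type, attempts_so_far, policies=POLICIES):
    """Return the next action for a fault, or "escalate" when no policy
    matches or the attempt limit has been reached."""
    policy = policies.get(fault_type)
    if policy is None or attempts_so_far >= policy.max_attempts:
        return "escalate"
    return policy.action


if __name__ == "__main__":
    print(decide("service_down", 0))  # first attempt: run the preset action
    print(decide("service_down", 3))  # limit reached: hand over to a human
```

Capping the number of attempts keeps an automated action from looping on a fault it cannot fix, which is one common design choice for this kind of policy.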

Backup/recovery: Back up core data off-site, and run and send backups automatically, so that copies are kept in both off-site and remote storage and user data stays secure.

Simplified tools: You can build in common O&M tools, commands, scripts, and SQL to reduce misoperations and lower the technical barrier. You can maintain and extend the tools as required to support routine O&M.

Knowledge base: Accumulate knowledge of common O&M technologies, solutions, and business functions, and maintain and extend the content as required.

Small wisdom answer: Quickly retrieve operation documents, solutions, common technical knowledge, and other content when you need it, and apply for manual support when you need technical help.

Open source communities accelerate innovation

Since the AIOps community was established in August this year, it has shared FlyFish, the Moore platform, the Hours algorithm, and other products. Among the community's releases, the GAIA data set, the industry's first open source data set for intelligent operations, fills a gap in open source AIOps data sets. FlyFish won the 2021 Excellent Open Source Project award of the China Open Source Cloud Alliance within one week.

In the future, Cloudwise will invest more in the innovation and promotion of OMP within the AIOps community, and will build a harmonious, inclusive, and open OMP developer community together with users, researchers, and developers.

If you want to talk directly with the maintainers of the OMP project and be among the first to hear OMP open source news, scan the QR code below and add the WeChat account of the AIOps Community Assistant (note: OMP).