On April 23, 2017, Tong Huaquan, director of Youyun Software Solution Center, delivered a speech titled “Youyun new generation intelligent Operation and Maintenance Management Solution” in “Operation and Maintenance Management Practice in the Cloud Era”. IT big said as the exclusive video partner, by the organizers and speakers review authorized release.

Video replay and slides of the talk: Suo.im /NM8OI

Abstract

Youyun is a domestic vendor with deep experience in the field of operation and maintenance. In this talk, Tong Huaquan, director of the Youyun Software Solution Center, shares the company’s insights on operation and maintenance management.

O&M challenges

Data centers enter the “two-state” transformation

Data centers are undergoing a “two-state” transformation. With the widespread use of cloud computing, big data, the Internet of Things, microservices, containers and other new technologies, the technology architecture is trending toward “hybrid”. At the level of the operation and maintenance mode, the DevOps concept is spreading rapidly and CI/CD has become widely accepted, driven especially by Internet practices such as Google’s SRE and the DevOps practices of BATJ (Baidu, Alibaba, Tencent, JD.com) in China. The operation and maintenance mode also shows the clear characteristics of what Gartner calls Bimodal IT: a business model that fuses a steady state with an agile state. As the mode changes, operation and maintenance management faces the challenges brought by two-state IT.



Challenges of the software-defined data center (SDDC) for operations

Software-defined data centers (SDDCs) require streamlined, automated operation and maintenance management, as well as support for automated delivery of applications and infrastructure.



Above is a model of a software-defined data center. Such a data center places new requirements on operation and maintenance management: simpler and more automated management capabilities, plus the ability to automate the delivery of applications and infrastructure.

O&M challenges posed by the adoption of Internet technology architectures

As enterprises build out their information systems, the new open technology architectures of the Internet are being widely adopted, which makes operation and maintenance support for these new technologies an urgent need.

DevOps challenges to operations

The rapid spread of the DevOps concept accelerates the integration of business with technology and of development with operation and maintenance. It places higher requirements on operation and maintenance management, especially on automated operation and maintenance.

Challenges of operation and maintenance business model transformation

Against the two-state background, the data center operation and maintenance business model has changed significantly, and the integration of technology with business and of development with operations has accelerated.

At the development level, we need to pay more attention to continuous delivery capability; at the operation and maintenance level, we need a higher level of automated management and more agile operation and maintenance processes.

To better serve the business, we also need better management and support at the user and business levels. More and more attention is paid to analyzing user experience and user behavior, so as to safeguard and promote business development.

Two-state operation and maintenance management concept

Sharing an operation and maintenance management concept for the new environment

We propose a concept called Software-Defined Operations (SDO): operation and maintenance business is defined in software and implemented quickly, so that it can be put straight into the daily operation and maintenance process.

Turning the operation and maintenance software platform into a PaaS is the key to operation and maintenance vitality and the best technical practice for two-state operation and maintenance. Whatever falls within the definable scope is implemented on the platform. On the operation and maintenance PaaS platform, we can sort out the operation and maintenance scenarios, standardize them, automate them over a wider scope, make operation and maintenance visual, and keep improving toward intelligence.

Two-state operations: Software-Defined Operations (SDO) practice strategy

With two-state operations, O&M scenarios can be defined quickly on the O&M PaaS platform and put into operation, giving agile O&M support for changing services. The cycle covers O&M scenario analysis, scenario definition, scenario operation, and continuous optimization.

Data center operation and maintenance services

Sorting out data center O&M services is the basis of O&M scenario analysis, the prerequisite for standardization, and the foundation of automation.



As the figure above shows, everything a data center has to manage can be sorted out and summarized into four aspects.

The first is research and assessment, which covers work such as demand management, risk analysis and capacity analysis. The other three are routine operations, response support, and optimization and improvement.



The O&M scenarios can be roughly divided into several parts. The first is asset archive management, which corresponds to the currently very popular concept of the CMDB: knowing clearly how many IT assets you have forms the basis of operation and maintenance. The others are all-round monitoring management, operation duty and fault disposal, change and delivery management, and inspection and job management.

Construction of two-state operation and maintenance platform

Youyun full-stack “Internet+” operation and maintenance platform

The platform is built on a new-generation Internet technology architecture of microservices and big data, and is positioned as an operation and maintenance PaaS platform. It adopts a “unified platform + product app” model, in which the platform provides a unified collection and operation layer and a resource library.



1. Asset archives management

Common problems in asset configuration management (1)

Asset configuration management based on Excel leaves information scattered and lacks a global view; it consumes a lot of labor and the data is rarely up to date.

Data is prone to arbitrary modification, lacks version control, and has low accuracy.

Common problems in asset configuration management (2)

Traditional O&M tools cover a large scope of resources and have complex maintenance processes, so they add to the workload of O&M personnel, who become less willing to use them.

Teams are always stuck in the data maintenance dilemma and never have time to think about how to realize the value of configuration data.

Data center IT asset archive management solution scenario

The system promotes automatic construction, agile maintenance and scenario-based application of the CMDB through whole-network scanning, automatic collection and social maintenance.



Automated scanning is well known to be very valuable. It helps us find out which IP addresses and resources exist in the network environment, so that nothing is missed.

Fine-grained configuration collection then retrieves the details of each resource in the data center, which we need in order to build complete configuration information.
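The value of scanning comes from comparing what is actually on the network with what the CMDB records. Here is a minimal sketch of that comparison, assuming the scan results and the CMDB records are already available as lists of IP strings. The function name `reconcile` and the sample addresses are hypothetical and are not part of the Youyun product.

```python
import ipaddress


def reconcile(subnet, discovered_ips, cmdb_ips):
    """Compare scan results with CMDB records for one subnet.

    All inputs are illustrative: a real scanner and a real CMDB
    would supply these lists.
    """
    net = ipaddress.ip_network(subnet)

    def in_scope(ips):
        # Keep only the addresses that belong to the scanned subnet.
        return {ip for ip in ips if ipaddress.ip_address(ip) in net}

    discovered = in_scope(discovered_ips)
    recorded = in_scope(cmdb_ips)
    by_address = ipaddress.ip_address  # sort numerically, not as text
    return {
        # On the network but missing from the CMDB.
        "unregistered": sorted(discovered - recorded, key=by_address),
        # In the CMDB but not seen by the scan.
        "stale": sorted(recorded - discovered, key=by_address),
        # Present in both.
        "confirmed": sorted(discovered & recorded, key=by_address),
    }


report = reconcile(
    "10.0.0.0/24",
    discovered_ips=["10.0.0.5", "10.0.0.12", "10.0.0.40"],
    cmdb_ips=["10.0.0.5", "10.0.0.40", "10.0.0.77", "192.168.1.9"],
)
for category, ips in report.items():
    print(category, ips)
```

The “unregistered” addresses are candidates for configuration collection, and the “stale” ones are records that the owning team should confirm or retire.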

Configuration data should be maintained by the teams that own it. The people most familiar with the data are the right ones to be responsible for it, which forms a team-oriented maintenance circle.

Data maintenance also needs a feedback mechanism, and letting users respond to the data while they use it is a very good one. So we borrowed social concepts such as comments, likes and subscriptions, so that the CMDB feels like a modern product rather than a throwback to the last century.

Finally, we build a consumption circle for configuration data, in which people share the same data, and the data can be displayed graphically and applied to a variety of analysis scenarios.

2. All-round monitoring and management

Common monitoring problems (1)

Business applications and user experience have degraded, yet the IT infrastructure looks fine.

What we see is the routine maintenance view; what users see is the cumulative result of all our problems.

Common monitoring problems (2)

We don’t have a sense of what the end user is really experiencing, we don’t have a sense of what we need to improve, and we don’t have a sense of what they want us to do.

According to IDC, about 40 percent of failures are first discovered by end users and reported to the service desk.

Common monitoring problems (3)

Troubleshooting and locating faults takes a lot of effort, requires the participation of the network, system, application and development teams, and incurs high labor costs.

Data center comprehensive monitoring solution scenario

To solve the above problems, we provide monitoring capabilities for basic resources, the application back end and the application front end. This gives end-to-end application performance and failure monitoring, from the application user experience down to the application code, and supports monitoring of business transactions and user experience.

Large-scale cloud monitoring

The system supports monitoring of both traditional and Internet architectures. It also supports agentless resource monitoring.

It supports second-level monitoring of resources across ten thousand nodes; supports more than 6000 metrics and script-level extension; supports label-based management and display of monitored resources; and supports customization of monitoring dashboards.



3. Operation duty and fault disposal

Common problems in O&M duty and troubleshooting

Operation duty is the guardian of IT operation. Do we really have the ability to clearly grasp the operation situation, quickly analyze and locate faults, and trigger fault disposal measures?

Common problems are: the operation situation display is not intuitive and clear, so senior management, middle management and front-line staff cannot each get what they need; monitoring alarms cannot be displayed and handled centrally; there is no capability for fault analysis and location; and the fault handling process is not continuous.



Streaming alarm handling and root cause analysis

The system supports centralized alarm management, a high-performance alarm association engine, and automatic alarm handling, ensuring that alarms are reported in full, reported accurately, and located correctly.
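To illustrate what centralized alarm handling does with a stream of raw alarms, here is a minimal sketch that folds repeated alarms for the same resource and metric into a single incident when they arrive within a time window. It is a simple stand-in for an association engine, not Youyun’s implementation. The `Alarm` fields, the 300-second window and the sample data are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    timestamp: int   # seconds since some reference point (assumed)
    resource: str
    metric: str
    severity: int    # higher means more severe (assumed convention)


def correlate(alarms, window=300):
    """Group alarms with the same (resource, metric) into incidents.

    An alarm joins the open incident for its key if it arrives within
    `window` seconds of that incident's latest alarm; otherwise it
    opens a new incident.
    """
    incidents = []
    open_by_key = {}
    for alarm in sorted(alarms, key=lambda a: a.timestamp):
        key = (alarm.resource, alarm.metric)
        incident = open_by_key.get(key)
        if incident and alarm.timestamp - incident["last_seen"] <= window:
            incident["count"] += 1
            incident["last_seen"] = alarm.timestamp
            incident["severity"] = max(incident["severity"], alarm.severity)
        else:
            incident = {
                "key": key,
                "first_seen": alarm.timestamp,
                "last_seen": alarm.timestamp,
                "count": 1,
                "severity": alarm.severity,
            }
            open_by_key[key] = incident
            incidents.append(incident)
    return incidents


raw = [
    Alarm(0, "db01", "cpu", 2),
    Alarm(120, "db01", "cpu", 3),
    Alarm(130, "web01", "http_5xx", 2),
    Alarm(1000, "db01", "cpu", 1),
]
incidents = correlate(raw)
for incident in incidents:
    print(incident)
```

Four raw alarms become three incidents here, which is the kind of reduction that lets the duty team see each problem once. Real root cause analysis would also use topology or dependency data, which this sketch leaves out.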

ECC large screen visualization display

“Clearly visible and well managed” is the essence of management. Visualization of operation and maintenance is a safe and reliable dashboard for data center operation, supporting on-demand design.

The ECC large-screen design has three modes: operation situation display (visit mode), operation and maintenance command and scheduling (command mode), and operation duty (duty mode).

4. Inspection and job management

Common problems during O&M operations

Data centers are becoming software-defined, Internet technology architectures are spreading, business is developing rapidly and applications are delivered continuously. With the DevOps concept in particular being promoted across the field of operation and maintenance, automated operation and maintenance has become the “touchstone” for improving operation and maintenance management capability.

Common problems are: operation and maintenance pressure is high and efficiency is low; operation and maintenance standards are hard to put into practice; manual operations carry safety risks, and by “Murphy’s Law” mistakes will happen and someone will have to take the blame; business changes bring continuous delivery pressure; and the level of automated delivery of IT services is low.

Inspection and Job management scenario analysis

Standardizing and automating jobs is the key to operation and maintenance standardization, the key to improving operation and maintenance efficiency and reducing operational risk, and an important means of rapid fault handling and emergency response.



Automatic inspection management

Inspection capability: automatic inspection of Windows, Linux and AIX indicators, with the flexibility to add system inspection items.

Inspection items include system parameters, service status, error logs, abnormal logins, key processes, and compliance checks.
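Inspection items that can be added flexibly are easiest to picture as a list of named checks that run against facts collected from a host. The sketch below assumes the facts have already been collected into a dictionary. The fact names, thresholds and the `run_inspection` helper are hypothetical.

```python
def run_inspection(host_facts, items):
    """Run each inspection item against the collected facts."""
    results = []
    for name, check in items:
        try:
            passed = bool(check(host_facts))
        except KeyError:
            # A fact that was not collected counts as a failed item,
            # so that gaps in collection are visible in the report.
            passed = False
        results.append((name, "OK" if passed else "FAIL"))
    return results


# Adding an inspection item means appending one (name, check) pair.
ITEMS = [
    ("key process running", lambda f: "sshd" in f["processes"]),
    ("disk usage below 90%", lambda f: f["disk_used_pct"] < 90),
    ("no failed logins", lambda f: f["failed_logins"] == 0),
    ("error log is empty", lambda f: f["error_log_lines"] == 0),
]

# Sample facts; "error_log_lines" is deliberately missing.
facts = {
    "processes": ["sshd", "nginx"],
    "disk_used_pct": 93,
    "failed_logins": 0,
}

results = run_inspection(facts, ITEMS)
for name, outcome in results:
    print(f"{outcome:4} {name}")
```

Keeping the checks as data rather than hard-coded logic is one way to let each team add items for its own systems without touching the inspection runner.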

Automatic operation and maintenance

The system supports automatic operation scenarios such as environment preparation, system patch upgrade, system parameter modification, compliance check, service start and stop, data backup, and emergency switchover.

Continuous application delivery with DevOps

Rapid application deployment, including environment preparation, basic software deployment, application deployment, and parameter configuration, supports continuous application delivery.

Job scenario orchestration and job scheduling management

The system provides a script library of operation and maintenance best practices, together with flexible job orchestration and job scheduling capabilities, to automate data center operations. Operations personnel go from cannon fodder to battlefield commanders.
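Job orchestration comes down to running steps in an order that respects their dependencies. Here is a minimal sketch using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The step names are taken from the scenarios listed in this section, while the dependency graph and the `run_job` helper are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def run_job(steps, dependencies):
    """Run steps so that every step follows its prerequisites.

    `dependencies` maps a step name to the set of steps that must
    finish before it starts.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    log = [steps[name]() for name in order]
    return order, log


def make_step(name):
    # Stand-in for a script from the script library; a real step
    # would run a script on target hosts.
    return lambda: f"{name}: done (simulated)"


STEP_NAMES = [
    "env_prep", "data_backup", "base_software",
    "app_deploy", "param_config", "service_start",
]
steps = {name: make_step(name) for name in STEP_NAMES}

# Hypothetical deployment flow.
dependencies = {
    "base_software": {"env_prep"},
    "app_deploy": {"base_software", "data_backup"},
    "param_config": {"app_deploy"},
    "service_start": {"param_config"},
}

order, log = run_job(steps, dependencies)
for line in log:
    print(line)
```

Steps with no dependency on each other, such as `env_prep` and `data_backup` here, could run in parallel in a real scheduler. This sketch runs them one after another for simplicity.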

5. Change and delivery management

Common problems in change and delivery

The operation and maintenance department clearly has the characteristics of a service provider, but can we satisfy users in terms of service convenience, service efficiency and service level?

Common problems are: it is not clear which services are offered externally and how the operation and maintenance team needs to support them; the channels for serving end users are limited and often rely on telephone and mail; internal operating efficiency and coordination are low; and there is a lack of tools and methods to automate the flow of external services.



Operation and maintenance service process ITSM

ITIL/DevOps processes are fully supported

The system supports the operation and maintenance processes of ITIL V3 and ISO 20000; supports the operation and maintenance business through the service catalog and uses the service catalog to drive the processes; supports flexible drag-and-drop design of forms and processes; and adopts a social, agile mode of process interaction, with work order comments, activity updates, etc.

Social support

The system lets users follow a work order; provides work order comments, with replies to comments; supports @-mentioning people in comments; and sends instant notifications of comments in-site, by mail, or by other means.

Real-time O&M collaboration with ChatOps

ChatOps concept definition

ChatOps is a real-time, chat-driven operation and maintenance model. By embedding automation bots into chat sessions, it links people, machines and data in an automated and transparent way, enabling operation and maintenance teams to communicate, collaborate and execute tasks effectively. ChatOps is a practical way of evolving toward DevOps.
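The core of a ChatOps bot is a dispatcher that turns a chat message into an operation and posts the result back into the same conversation, where everyone can see it. Here is a minimal sketch with no real chat platform attached. The `!ops` prefix, the command names and the replies are hypothetical.

```python
import shlex

HANDLERS = {}


def command(name):
    """Register a function as the handler for a chat command."""
    def register(func):
        HANDLERS[name] = func
        return func
    return register


@command("restart")
def restart(service, env="staging"):
    # A real bot would call an automation job here.
    return f"restarting {service} in {env} (simulated)"


@command("status")
def status(service):
    return f"{service}: status lookup not wired up in this sketch"


def handle_message(user, text, prefix="!ops"):
    """Return the bot's reply, or None if the message is not for the bot."""
    if not text.startswith(prefix + " "):
        return None
    parts = shlex.split(text[len(prefix):])
    if not parts or parts[0] not in HANDLERS:
        known = ", ".join(sorted(HANDLERS))
        return f"@{user} unknown command; try: {known}"
    try:
        result = HANDLERS[parts[0]](*parts[1:])
    except TypeError:
        # Too few or too many arguments for the handler.
        return f"@{user} wrong arguments for '{parts[0]}'"
    return f"@{user} {result}"


print(handle_message("alice", "!ops restart web prod"))
print(handle_message("bob", "!ops reboot web"))
```

Because the request and the reply both appear in the chat session, the team gets an audit trail of who ran what without any extra tooling.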



ChatOps helps smooth the organization’s evolution to DevOps

ChatOps is a unified O&M portal for both internal and external O&M users. It makes the organization’s O&M mode transparent to users and helps smooth the evolution to a higher stage of DevOps.

The origin and application of ChatOps at GitHub

Managing GitHub’s internal servers: start and stop, upgrade, patch; email management: sending and receiving personal emails; code submission notifications; code builds and deployment to production; database management: deleting data, backing up data…

Rules for construction of intelligent operation and maintenance platform

The Youyun intelligent operation and maintenance management platform can help all kinds of enterprise users move step by step to a higher level of “integrated”, “automated” and “intelligent” operation and maintenance management.

1. Integration: Traditional operation and maintenance tools are scattered, and the tools lack effective integration of operation and maintenance data and scenarios. An operation and maintenance PaaS platform based on a new technology architecture has therefore become the mainstream choice.

2. Automation: Shift from manual operation and maintenance to automated operation and maintenance, gradually automating scenarios such as daily operation and maintenance jobs and continuous application delivery, to improve operational efficiency and process standardization and reduce the risk of manual operation.

3. Intelligence: Use big data analysis and intelligent operation and maintenance robot technology to achieve intelligent operation and maintenance management, supporting fault self-healing, capacity expansion, emergency support, etc.



I hope Youyun can help you move from integrated operation and maintenance to automated operation and maintenance, and see the dawn of automation in the future. Thank you.

That’s all for today’s sharing, thank you!