Abstract:This paper attempts to derive the possible application scenarios of database intelligent monitoring system based on user role from concept and logic.

This article is shared from the Huawei cloud community “GausSDB (DWS) database intelligent monitoring system application scenario analysis”, the original author: Master Lu.

Similar to the project approval mode of Internet products, when we define and design a new product, we first need to analyze the needs of users, summarize, organize and comprehensively analyze the needs of users, and define our product positioning, functions, business logic, user interface and so on. Therefore, in order to design a good database intelligent monitoring system, we need to do demand analysis on the target users of the database monitoring system, collect user demands, excavate user potential needs, and draw typical user portraits. Finally, the implementation architecture of the database monitoring system is designed, and the various needs of typical users are incorporated into the design architecture pipeline of the product.

Database intelligent monitoring system for users

In a practical application scenario, users of a database monitoring system may have many different roles. Different companies may have more segmented or aggregated user roles due to different organizational structures. But in general, it can be summarized into the following three user types:

  • App Dev
  • Operations Engineer (SRE)
  • Database Administrator (DBA)

Role of Application Development Engineer: Mainly responsible for developing business SQL in cloud applications, responsible for the functionality and performance of cloud services. At the same time, you need to ensure that the SQL written is efficient and good, and does not cause additional resource consumption and time consumption for the cluster. Therefore, application development engineers need to be able to monitor the newly developed SQL and understand the execution efficiency and resource consumption of the new query.

Role of operation and maintenance engineer: mainly responsible for ensuring the long-term stable operation of database cluster. The database system needs to be evaluated in terms of resource consumption and system load. You need to be able to configure alarm scenarios for your database, see real-time or predictive database alarm information, and report problems found to the database administrator role for further processing. In general, the role of Operations Engineer will monitor a large number of database clusters, and he will not do a very in-depth analysis of each cluster, but will more often act as a problem finder.

The role of database administrator: It is primarily responsible for locating the root causes of database problems and providing solutions. Database administrators need to be experts in the field of databases, familiar with all aspects of databases, who can analyze database monitoring data from multiple dimensions, locate database failures, and provide solutions.

It should be noted that the above three roles do not refer to positions in the actual production environment, but are typical role symbols summarized to facilitate the analysis of user needs. In a real production environment, there may be a scenario in which the three roles are the same person, or a scenario in which the SRE position doubles as the SRE and DBA roles. Here, we divide users into three roles, mainly to facilitate us to do demand analysis and build the corresponding character portrait, so as to further identify the tools needed by the corresponding roles and people. Finally, present to everyone a clear idea of database monitoring system development concept context.

Database intelligent monitoring system tools and application scenarios

Through the above abstraction and carding, we find that in the process of database monitoring operation and maintenance, the three roles respectively correspond to different needs, and different needs will lead to different tools or different emphases of the same tool. Here’s a detailed description of the tools that will be used by each of the three roles:

Application development roles, who only care about whether the SQL they write is efficient, uses the cluster’s various optimization features, and consumes too much of the cluster’s resources? Therefore, he needed a tool that would allow him to evaluate the execution efficiency of the SQL he was writing, namely WebSQL tools that would allow users to simply connect to the database and execute SQL statements. WebSQL can return the results of the execution of the SQL statements, as well as their execution plans, to help the application development role understand how efficiently their SQL statements are executed. At the same time, the user’s SQL statements are not simply executed in a single statement, but need to be executed in the entire job flow. A baseline that measures its execution time and resource consumption in the job flow becomes very important. Therefore, we need query monitoring to record the execution time and resource consumption of the SQL for the feature, and to calculate the maximum, minimum, and average values as a baseline to further help users assess their SQL execution efficiency. In the user site, because of resource isolation requirements, the user’s job needs to be bound to a workload queue for execution, so the resource allocation of the work in the queue, as well as the load level of the workload queue and other data become very important. Will the newly added SQL statement cause workload queue overload? It is important for the application development role to have a visual understanding of the resources currently on the queue before the newly developed application goes live.

System Operations Roles (SRE), who are concerned with the long-term stability of a large number of database systems in the cloud, are going to provide three facets of tools to solve this problem.

The health index index is a composite index, which is mainly supported by two indexes, resource consumption index and database system load index. And these two kinds of indexes have their further atomic index and extension index support. The calculation of cluster health index needs to design a set of corresponding mathematical model, based on which we can quantify the health index of the system, so that the system administrator can find the problem database quickly from hundreds of databases on the cloud very easily.

In addition to passive metrics such as health, which require the system administrator to look at in person, DMS further provides a comprehensive alerting capability. DMS will provide the warning capability of the database from three levels. (1) At the DMS-Agent end, through the means of log analysis, the logs of the operating system and database on the node of DMS-Agent are analyzed in real time. When the threat keywords are found, the alarm will be triggered immediately and reported to the warning platform through the corresponding channels. (2) In the DMS server, because DMS has all the monitoring data of the database cluster, we can design the corresponding alarm rules through data analysis methods and professional knowledge of the database, periodically check the database cluster, and directly trigger the alarm when problems are found; (3) As for the database cluster index data collected by DMS, it can be used as the index of threshold alarm, and all of them can be connected to CES, and threshold alarm can be done through CES service. All three of these alarms need to be configured and displayed on the front page of the DMS.

There is a natural connection between artificial intelligence and cloud computing. When the database is in the cloud, AIOPS, the intersection node between artificial intelligence and database operation and maintenance, naturally appears. Since the DMS owns all the monitoring data of the data cluster, the historical monitoring data is used to distinguish the working mode of the cluster and the optimal configuration parameters are recommended. Forecast the growth trend of database disk space, inform the user in advance of capacity expansion or operation requirements, etc. All this is possible with the help of artificial intelligence.

The role of a database administrator (DBA) is that of a database administrator. In a traditional data center, a database administrator is responsible for optimizing the performance of the database, maintaining the long-term stability of the database, and sometimes even helping application development engineers optimize the SQL. However, in the cloud era, the division of work of database administrators will become more detailed. Application development and system administrators share part of the work of database management, which makes the role of database administrators become more pure. As an expert in the database domain, the database administrator is responsible for locating the root cause of a database problem and providing solutions to the problem. The two roles of system administrator and database administrator eventually form a task loop of problem discovery, problem analysis and problem solving. Therefore, on the cloud, SRE positions tend to include the responsibilities of both SRE and DBA roles.

A DBA is a database expert and a master of using database tools to locate various database problems. To locate the root cause of the problem, he will need two types of tools: fault analysis tools and fault self-healing tools. Among them, the fault analysis tool, will provide a variety of monitoring data and data visualization forms, for the database administrator to quickly locate the root cause of the problem to provide assistance. Fault self – healing tools, is the database administrator in the past to locate problems, solve the problem of the experience of solidification. As we learn more about how DBAs work, there will be more and more self-healing tools.

Another kind of important responsibility of the database administrator is to provide solutions to problems, which is a very important part of the operation system. No matter how good the troubleshooting tool is, if there is no solution to the problem, it will not really help users in the end. Therefore, we need to build a professional search engine for root causes and solutions to help users and help us speed up the problem solving process and ease the workload of front-line customer support staff.

This paper is the second of three articles that introduce the core concepts of database monitoring operation and maintenance system design on the cloud. It tries to deduce the possible application scenarios of database intelligent monitoring system based on user roles from the concept and logic. With this basic framework in place, it becomes clear what we need to do next and what tools we need to use. We hope that our expectations will soon become a display, so that the cloud database operation and maintenance work will become easier and more intelligent.

Want to know more information about GUASSDB (DWS), welcome WeChat search “GAUSSDB DWS” pay attention to WeChat public number, and you share the latest and most complete PB series silo black technology, background can also obtain many learning materials Oh ~

Click on the attention, the first time to understand Huawei cloud fresh technology ~