Brief introduction:DAS introduces the global Workload optimization function, which can detect database load changes in a timely manner, identify new SQL, SQL executing changes, and SQL with poor performance, and make optimization suggestions based on the SQL execution frequency and relevant SQL information.


In daily database optimization, creating an appropriate index on the table of the database is a very important and common solution to the problem of slow SQL query. In the process, DBAs or developers will often optimize based on slow SQL information on the instance, and the DAS automatic SQL optimization feature has implemented automatic diagnosis based on slow SQL and the creation of appropriate indexes. However, the plan will face the following challenges:

  1. Data collection problem: some business SQL does not reach the threshold of slow SQL collection (such as 1s), and these SQL queries themselves do not make good use of indexes, the query efficiency is not high, and there is still a lot of room for optimization. In the case of increased concurrency or table data, these queries can easily cause a sudden deterioration in the performance of the instance and cause a failure.
  2. Write cost problem: When creating indexes, we usually pay more attention to improving the efficiency of reading data, but ignore the impact of index maintenance on write performance and the cost of space consumption. For a table with many writes and few reads, creating too many indexes will affect system throughput instead.
  3. Problem: Once an index is created, it usually changes little, whereas the business is always changing dynamically. As the business iterates and changes, some indexes may no longer be used in SQL, or may be used infrequently, and a better index design may need to be introduced to improve the database’s processing performance.

To solve this problem, DAS introduced the Global Workload Optimization function, which can detect database load changes in a timely manner, identify new SQL, SQL executing changes, and SQL performing poorly, and make optimization recommendations based on SQL execution frequency and relevant SQL information.

Solution Introduction

The overall Workload optimization consists of three parts.

Workload detection: Train the data model based on the Workload performance metrics on the database instance (RT, CPU, etc.) and the full SQL metrics (execution times, execution time, scan rows, etc.) to monitor the execution of the Workload SQL in real time to identify new SQL and execute changed SQL. And the entire cycle of load changes.

As shown in the figure below, the full-volume SQL execution health indicator is in a periodic state between PERIOD1 and PERIOD2, and at PERIOD3, the execution health changes. Global Workload Optimization, based on the data training model, enables easy identification of time intervals for load changes.

Global diagnosis: Global diagnosis optimization recommends the optimal index combination according to all SQL execution of the database in a certain time range, considering SQL query and write performance and space occupation, so as to maximize the performance of the database from the perspective of SQL and reduce the probability of problems caused by the database.

Intelligent pressure measurement: intelligent pressure measurement can playback example in a certain period of time all SQL (the function will be detailed in the relevant article reading), combines global diagnosis and intelligent pressure measurement, the system can automatically according to diagnosis suggestion on test case create indexes, playback history flow and compared before and after the advice of SQL execution, generate test report.

The specific implementation


Global Workload Diagnostic supports user-defined triggers and system auto-detection triggers. User triggers a time interval based on business requirements and triggers global diagnosis to obtain optimization suggestions. The automatic check will monitor the Workload of the instance in real time and trigger a global Workload Diagnosis when abnormal SQL is detected in the database or if the overall Workload trend changes are detected. Exception SQL includes :(1) New SQL; (2) SQL whose execution times account for more than 20% of the fluctuation; (3) Perform average RT floating more than 20% SQL, etc.

Through the automatic detection mechanism, it can help users find the scene when the structural design lags behind the business change in time, and reduce the probability of failure and resource waste.

The data source

The data source for the global Workload Diagnosis is SQL audits, including SQL type, SQL template, number of executions, and SQL performance information. SQL audits record all SQL executed during diagnostic time, so you can detect SQL problems that are not slow SQL but are performing poorly.

Relational SQL analysis

By parsing SQL templates and metadata, you can analyze the access relationships between SQL, tables, and columns, resulting in a set of SQL that may affect each other. The correlation analysis can effectively reduce the complexity of subsequent solving problems and provide basic data support for performance tracking service after the index goes online.

Candidate index generation and cost evaluation

This module and subsequent optimization is the core module of global Workload Optimization. You can recommend indexes based on rules or experience in a single SQL index recommendation and get some results, but a rule-based approach based on global Workload optimization is almost impossible and must be able to quantify the cost. Our external optimizer, based on the DAS implementation, can parse the syntax tree quickly and accurately, sample and collect statistics, generate candidate indexes, and calculate the cost of using an index.

Optimal solution

The process of selecting the most indexed set can be equivalent to a variant of the knapsack problem under the condition of determining the candidate index set and the index cost. The benefit of selecting an index is equivalent to the value of the item in the backpack. Since creating an index gives both a positive benefit to the query and a negative benefit to the write and space costs, the value can be either positive or negative. The capacity of the backpack is the maximum indexed threshold on a table (user set or system default, not a database storage constraint). Our goal is to maximize the value of the items in the backpack. Also note that when an index is selected, it affects the value of the other indexes, so each iteration of the item selection needs to update the value of the remaining selected indexes based on the existing indexes.

Index I cost = number of executions * (a* read cost – b* write cost – c* space usage)

Effect of validation

In order to ensure the effectiveness of the optimization recommendations, we have integrated with the intelligent pressure test function to provide a quick and convenient verification solution. The intelligent pressure test system will automatically set up test instances and synchronize real data, then automatically adopt optimization suggestions on the test, playback the full amount of SQL within the diagnosis period, collect the performance data of SQL execution, and finally generate the test report. The advantage of this approach is that it ensures the consistency of the test scenario with the online business without impacting the online business, and it also allows you to predict the impact of the proposal.

The sample

For example, there are 6 SQL indexes in Table 1. If you look at each SQL separately, you may get 4 indexes in Table 2. From the Workload dimension, the indexes can be merged into the two indexes in Table 3. Comparing the two results, overall RT decreased by 14.45% and index space was saved by 50%.


SQL4 :  idx\_name(name)  

SQL5:   idx\_name\_id\_birth\_date (name, id, birth\_date)   

SQL6: idx\_name\_nick\_name (name, Nick \_name)

idx\_is\_deleted\_gmt\_modified (is\_deleted, gmt\_modified)

idx\_name\_id\_birth\_date (name, id, birth\_date) 

Future plans

In the future, we will create a closed loop of automated optimization that includes Workload Exception Detection, Global Workload Diagnosis, Intelligent Performance Assessment, Automated Recommendation Adoption, Performance Tracking, and Exception Handling. In addition, the current global workload optimization takes into account the impact of SQL execution frequency, SQL queries, and writes, but not fixed parameters or parameter skew issues, which can be further factored into the business attributes.

** Related reading: **

DAS, the database autonomy service, has released the new version of the year: 1-5000, which marks the entry of “database autonomous driving” into the era of scale

Technology revelation | large stimulating behind, how to effectively assess computing resources and planning database?

Blockbuster | database autonomous service will DAS papers to be included in the global top SIGMOD, pilot “database automated driving” a new era

| SQL request behavior recognition new functionality online to help solve the problem of finding a needle in a haystack in abnormal SQL detection

How does Autoscaling work in Aliyun

Copyright Notice:The content of this article is contributed by Aliyun real-name registered users, and the copyright belongs to the original author. Aliyun developer community does not own the copyright and does not bear the corresponding legal liability. For specific rules, please refer to User Service Agreement of Alibaba Cloud Developer Community and Guidance on Intellectual Property Protection of Alibaba Cloud Developer Community. If you find any suspected plagiarism in the community, fill in the infringement complaint form to report, once verified, the community will immediately delete the suspected infringing content.