SQLAdvisor is an SQL optimization tool developed and maintained by Beijing DBA team of Meituan-Dianping: Input SQL and output index optimization suggestions. It is now open source. It is based on MySQL native lexical analysis, combined with SQL where conditions, field selection, aggregation conditions, multi-table Join relationship and other final output optimal index optimization recommendations. According to the development team, SQLAdvisor is widely used within Meituan. it is mature and stable, and the open source version and internal use version are completely consistent. We hope to build an excellent SQL optimization product together with teams with similar needs in the industry.
SQLAdvisor architecture flowchart:
Example of SQLAdvisor
sql: SELECT id FROM crm_loan WHERE id_card = '1234567' cmd: ./sqladvisor -h xx -pxx -u xx -pxx -d xx -q "SELECT ID FROM crM_loan WHERE ID_card = '1234567'" alter table crm_loan add index idx_id_card(id_card)Copy the code
SQLAdvisor quick start tutorial
The advantages of SQLAdvisor
-
Based on MySQL native lexical analysis, the performance, quasi-determination and stability of lexical analysis are fully guaranteed.
-
Support common SQL (Insert/Delete/Update/Select);
-
Support multi-table Join and automatically select the driver table logically;
-
Support aggregation conditions Order by and Group BY;
-
Filter existing indexes in the table.
Introduction to SQLAdvisor
The Join processing
-
Join syntax is divided into two types: Join on and Join using, and Join ON sometimes exists in where conditions.
-
By analyzing Join conditions, a table list of nested_join will be obtained. Join on and Join using can be distinguished by judging whether its join_using_fields field is empty.
-
The generated table list is stored in the form of a binary tree, and then the binary tree is traversed in sequence.
-
When the internal parse tree is generated, the right Join is converted to the left Join.
-
The Join condition will exist on the leaf node of the same layer. If the left and right nodes are leaf nodes, the right leaf node will exist.
-
Each non-leaf node represents the result of a Join.
Mysql_sql_parse_join (TABLE_LIST join_TABLE) mysQL_SQL_PARse_JOIN (Item join_condition). The main flow chart is as follows:
Where processing
-
This is mainly to extract where conditions of SQL statements. Where conditions are usually joined by AND AND OR connectors. Since OR is more difficult to process, only the AND connectors are processed.
-
Because a Join condition can exist in a WHERE condition, the distinction needs to be made.
-
Gets the WHERE condition in turn, discarding the condition if the operator in the condition is like and not a prefix match.
-
The discriminations of fields calculated based on conditions are sorted in reverse order. If the discriminations are smaller than 30, the fields are discarded. Also use the leftmost principle to order where conditions.
Computational distinction
-
Run “show table status like” to obtain the total number of rows table_count.
-
Select best_index (Primary key > Unique key > general index) by computing.
-
Obtain the starting value of data sampling offset and sampling range rand_rows by calculating:
-
offset = (table_count / 2) > 10W ? 10W : (table_count / 2)
-
rand_rows =(table_count / 2) > 1W ? 1W : (table_count / 2)
-
Select count(1) from (select field from table force index(best_index) order by cl.. Desc limit rand_rows) where field_print obtains the rows that meet the condition.
-
cardinality = rows == 0 ? rand_rows : rand_rows / rows;
-
After the selectivity is calculated, the condition is added to the alternative index in the table based on the selectivity.
The main function involved is mysql_SQL_parse_field_cardinality_new () to calculate the selectivity.
Add alternate indexes
-
Mysql_sql_parse_index () adds criteria to the alternative indexed list by degree of selection.
-
The flow chart of the above two functions is as follows:
Group and Order processing
-
Whether the Group and Order fields can be indexed must meet the following conditions:
-
The fields involved must come from the same table, and the table must be the identified driver table.
-
Group by is better than Order BY. Only one Group by can exist at the same time.
-
The Order BY field must be sorted in exactly the same direction, otherwise the entire Order BY field column is discarded.
-
When the Order BY condition contains a primary key, if the primary key field is Order by. At the end of the field column, ignore the primary key or discard the entire Order by field column.
The indexed column sorting priority: the equivalent > (group by | order by) > contour.
The functions designed in this process mainly include:
-
Mysql_sql_parse_group () checks whether the fields after the Group are all from the same table.
-
Mysql_sql_parse_order () checks whether the condition after Order can be used.
-
Mysql_sql_parse_group_order_add () adds the fields in order to the alternate indexed list.
Driver table selection
-
After the previous WHERE and Join analysis, the table association relationship in SQL has been stored, and the candidate driver table has been determined according to certain logic.
-
In the candidate driver table, the result set size in the table is calculated according to the first field in the candidate index field of each table.
-
Use Explain SELECT * from table where field to calculate the result set in the table.
-
The smallest result set is identified as the driver table.
-
The function involved in this step is final_table_drived(), where the get_join_table_result_set() function is called to get the number of rows for each driver candidate table.
Add the driven table alternative index
-
Through the above procedure, the driver table has been selected and the conditions in the statement have been saved by parsing.
-
Since the driven table is selected, the index of the driven table needs to be added according to the Join condition.
-
Mysql_index_add_condition_field () mysql_index_add_condition_field()
Output advice
-
Through the previous steps, all alternative index keys for each table have been saved. At this point, you just need to determine whether the candidate index key in each table already exists in the actual table. If there is no index, add the corresponding index.
-
The function involved in this step is print_index(). The main flow chart is as follows:
Source: Technical team of Meituan Dianping