preface

Case analysis

Let’s take a quick look at the difference between a non-relational database and a relational database.

MongoDB is one of the NoSQL types. NoSQL stands for Not only SQL, non-relational database. It is characterized by high performance, strong expansibility and flexible mode, especially in high concurrency scenarios. But at present it is only the supplement of the relational database, it in the consistency of data, data security, query complexity and relational database still have a certain gap. MySQL > MySQL > MySQL > MySQL > MySQL > MySQL

MySQL is a relational database, strong query function, high data consistency, high data security, support secondary index. However, the performance is not as good as that of MongoDB, especially for data with a level of more than one million, which is prone to slow queries. At this time need to analyze the reasons for the slow query, generally is the programmer SQL write bad, or there is no key index, or index failure and other reasons.

The company’s ERP system database is mainly MongoDB (NoSQL, which is closest to relational data), followed by Redis, and MySQL only occupies a small part. MySQL is now being used again thanks to Alibaba’s Qimen and Jushita systems. Considering that the number of orders has been more than a million levels, the performance analysis of MySQL is particularly important.

Let’s start with two simple examples. The functions and meanings of each parameter are described in details later. Note: the SQL you need to use has been put on Github, like students can click star, haha. Github.com/ITDragonBlo…

Scenario 1: Order import, using the transaction number to avoid repeated orders

Business logic: When importing an order, in order to avoid repeated lead orders, the transaction number is generally used to query the database to determine whether the order already exists.

The most basic SQL statement

mysql> select * from itdragon_order_list where transaction_id = "81X97310V32236260E"; +-------+--------------------+-------+------+----------+--------------+----------+------------------+-------------+----- --------+------------+---------------------+ | id | transaction_id | gross | net | stock_id | order_status | descript | finance_descript | create_type | order_level | input_user | input_date | +-------+--------------------+-------+------+----------+--------------+----------+------------------+-------------+----- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | 10000 | 81 x97310v32236260e 10 | | | | | 1 6.13 6.6 ok ok | | auto | | 1 itdragon | 2017-08-18 17:01:49 | +-------+--------------------+-------+------+----------+--------------+----------+------------------+-------------+----- --------+------------+---------------------+ mysql> explain select * from itdragon_order_list where transaction_id = "81X97310V32236260E"; +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ -------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ -- -- -- -- -- -- -- -- -- -- -- -- -- + | | 1 SIMPLE | itdragon_order_list | NULL | | NULL ALL | NULL | NULL | NULL | 3 | | 33.33 Using the where | +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ -------------+Copy the code

There was nothing wrong with the query itself, nor was there anything wrong with the offline test environment. However, once the function is online, the problem of slow query is oncoming. Millions of orders, full table scan? Ah? Hum! How do I know that this SQL is a full table scan? The explain command shows how MySQL processes SQL statements. The printed contents respectively represent:

id

Query the serial number 1.

select_type

The query type is simple, and simple SELECT statements do not have union and subqueries.

table

Itdragon_order_list are listed in the table.

partitions

No partitions.

type

Connection type: all indicates that full table scan is adopted.

possible_keys

Index null may be used.

key

The actual index is null.

key_len

The index length is also null.

ref

No columns or arguments are used with the key.

Extra

The WHERE query was used.

Because there are only three pieces of data in the database, the rows and filtered information doesn’t matter much. If type is ALL, the performance of the full table scan is the worst. If there are millions of data in the database, it will be unusually slow without the help of the index.

Preliminary optimization: Create an index for transaction_id

mysql> create unique index idx_order_transaID on itdragon_order_list (transaction_id); mysql> explain select * from itdragon_order_list where transaction_id = "81X97310V32236260E"; +----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+----- --+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+----- --+------+----------+-------+ | 1 | SIMPLE | itdragon_order_list | NULL | const | idx_order_transaID | idx_order_transaID | 453 | const | 1 | 100 | NULL | +----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+----- --+------+----------+-------+Copy the code

The index created here is a unique index, not a normal index.

The only index that prints the type value is const. Indicates that it can be found by indexing once. That is to find the value of the end of the scan to return the query results.

A normal index prints a type value of ref. Represents a non-unique index scan. Find the value and continue scanning until the index file has been scanned. (No code posted here)

Obviously, const performs much better than ref. And based on the business logic, it makes sense to create a unique index.

Optimize again: overwrite indexes

mysql> explain select transaction_id from itdragon_order_list where transaction_id = "81X97310V32236260E"; +----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+----- --+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref  | rows | filtered | Extra | +----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+----- --+------+----------+-------------+ | 1 | SIMPLE | itdragon_order_list | NULL | const | idx_order_transaID | idx_order_transaID | 453 | const | 1 | 100 | Using index | +----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+----- --+------+----------+-------------+Copy the code

Select * from select transaction_id from Extra (select transaction_id from Extra) Using index (select transaction_id from Extra) Using filesort(Using internal sort) and Using temporary(Using temporary table) indicate that the SQL needs to be optimized immediately. According to the business logic, the query structure return transaction_id can satisfy the business logic requirements.

Scenario 2: Order management page, sorted by order level and order entry time

Business logic: Priority processing orders with high order level and long input time. Order by, order by, order by, order by, order by, order by

The most basic SQL statement

mysql> explain select * from itdragon_order_list order by order_level,input_date; +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ ----------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered  | Extra | +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ ----------------+ | 1 | SIMPLE | itdragon_order_list | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100 | Using filesort  | +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ ----------------+Copy the code

First of all, Using full table scan is not reasonable, and Using file sorting Using filesort further slows down performance. MySQL before version 4.1 uses the dual-path sorting algorithm to sort files. As a result, it takes a long time to scan disks twice. Finally, a single sorting algorithm is optimized. The essence of this is to trade space for time, but if the amount of data is too large, the buffer space is insufficient, resulting in multiple I/ OS. The effect is worse. Rather than ask your operations colleagues to change the MySQL configuration, build the index yourself.

Preliminary optimization: create composite index for order_level,input_date

mysql> create index idx_order_levelDate on itdragon_order_list (order_level,input_date); mysql> explain select * from itdragon_order_list order by order_level,input_date; +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ ----------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered  | Extra | +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ ----------------+ | 1 | SIMPLE | itdragon_order_list | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100 | Using filesort  | +----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+ ----------------+Copy the code

After creating a composite index, you will be surprised to find that it is the same as if you did not create the index. Both are full table scans, both use file sort. Index failure? Failed index creation? Let’s try to see what the print looks like

mysql> explain select order_level,input_date from itdragon_order_list order by order_level,input_date; +----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+-- ----+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+-- ----+----------+-------------+ | 1 | SIMPLE | itdragon_order_list | NULL | index | NULL | idx_order_levelDate | 68 | NULL | 3 | 100 | Using index | +----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+-- ----+----------+-------------+Copy the code

Select order_level,input_date from select * from Type changed from all to index, indicating full index scan. Extra also indicates that overwrite index is used. But wrong !!!! Order_level and input_date are the only two fields returned. Should we create a composite index for each field? MySQL is not so stupid as to use force index to force an index. Change force index(idx_order_levelDate) on the original SQL statement.

mysql> explain select * from itdragon_order_list force index(idx_order_levelDate) order by order_level,input_date; +----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+-- ----+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+-- ----+----------+-------+ | 1 | SIMPLE | itdragon_order_list | NULL | index | NULL | idx_order_levelDate | 68 | NULL | 3 | 100 | NULL | +----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+-- ----+----------+-------+Copy the code

Optimize again: Should order levels really be sorted?

It doesn’t make much sense to sort an order level, nor does it make much sense to index an order level. Because order_level can only have low, medium, high, and rush. Sorting and indexing are of little use for fields that are repetitive and evenly distributed. Can we fix order_level first and then sort input_date? If the query is effective, you can recommend this query method to your colleagues.

mysql> explain select * from itdragon_order_list where order_level=3 order by input_date; +----+-------------+---------------------+------------+------+---------------------+---------------------+---------+---- ---+------+----------+-----------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------------------+------------+------+---------------------+---------------------+---------+---- ---+------+----------+-----------------------+ | 1 | SIMPLE | itdragon_order_list | NULL | ref | idx_order_levelDate | idx_order_levelDate | 5 | const | 1 | 100 | Using index condition | +----+-------------+---------------------+------------+------+---------------------+---------------------+---------+---- ---+------+----------+-----------------------+Copy the code

Type is upgraded from index to ref(non-unique index scan) compared to the previous SQL. Index length changed from 68 to 5, indicating that only one index was used. Ref is also a constant. Extra: Using index condition: automatically selects index scan or full table scan based on the critical value. Overall, the performance is much better than the previous SQL.

The above two examples are just a quick start, but it’s important to remember that optimization is based on business logic. Business logic should never be modified for optimization. It would be best if it could be modified.

The index profile

Official definition: An Index is a data structure that helps MySQL efficiently retrieve data. Why is an index a data structure and how does it speed up queries? We take the most common binary tree to analyze how indexing works. Take a look at the picture below:

Advantages of creating indexes

1. Improve the speed of data retrieval and reduce database IO costs: the meaning of using indexes is to speed up the search by reducing the number of records that need to be queried in the table.

2. Reduce the cost of sorting data, reduce CPU consumption: the index is fast, because the data is sorted first, if the field exactly needs to be sorted, it really reduces the cost of sorting.

Disadvantages of creating indexes

1. Storage space: An index is actually a table that records primary keys and index fields. It is usually stored on disk as an index file.

2. Slow down the update speed of the table: when the data of the table changes, the corresponding index also needs to be changed, thus reducing the update speed. Otherwise, the index may point to the wrong physical data, which is one of the causes of index failure.

3. Creating good indexes is difficult: Creating indexes doesn’t happen overnight and doesn’t always work. The best indexes need to be created frequently based on user behavior and specific business logic.

The index classification

When we say index, we usually refer to the index of the BTree structure. There are aggregated index, secondary index, composite index, prefix index, unique index, general index, of course, in addition to B+ tree, there are hash index, etc.

Single value index

An index contains only a single column, and a table can have multiple single-column indexes

The only index

The value of the indexed column must be unique, but null values are allowed

The composite index

An index contains multiple columns, which is recommended for development

In practice, compound indexes are recommended and the number of indexes created for a single table should not exceed five

Basic syntax:

Create:

create [unique] index indexName on tableName (columnName...) alter tableName add [unique] index [indexName] on (columnName...)

Delete:

drop index [indexName] on tableName

To view:

show index from tableName

Which cases need to be indexed:

1. Primary key, unique index

2. Create indexes for frequently used fields

3. Create indexes for fields that need to be sorted, grouped, and counted frequently

4. Query the fields associated with other tables and create indexes based on the foreign key relationship

When not to index:

1. There are too few records in the table, so there is no need to create indexes for data below millions

2. Do not create indexes for tables that are frequently added or deleted

3. Do not create indexes for duplicate and evenly distributed fields, such as true and false.

4. Frequently updated fields are not suitable for creating indexes

5. Do not create indexes for columns that are not used in the WHERE condition

Performance analysis

MySQL bottlenecks

The disk space is insufficient, the disk I/O is too large, and the server hardware performance is low.

1.CPU: The CPU is saturated when data is loaded into memory or read from disk

IO: Disk I/O bottleneck occurs when much more data is loaded than memory capacity

3. Performance bottlenecks of server hardware: Top,free,iostat, and vmstat You can view the system performance status

Explain analyze SQL statements

Use the Explain keyword to simulate the optimizer’s execution of SQL queries to see how MySQL processes SQL statements.

+----+-------------+-------+------------+------+---------------+-----+---------+------+------+----------+-------+ | id |  select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+-----+---------+------+------+----------+-------+Copy the code

id

Select The serial number of a query, containing a repeatable set of numbers, indicating the order in which SQL statements are executed in the query. Generally, there are three cases: first, all ids are the same, SQL execution order is from top to bottom; Second: all IDS are different, SQL execution order is based on the id of the first execution; The third type: id exists both identical and different. The command is executed based on a large ID first, and then the command is executed based on the same ID from top to bottom.

select_type

Select query type, mainly used to distinguish ordinary query, federated query, nested complex query

Simple: A simple select query that does not contain subqueries or unions

Primary: If the query contains any complex subqueries, the outermost query is marked as primary

Subquery: Contains subqueries in a SELECT or WHERE list

Derived: Subqueries contained in the FROM list are labeled as derived, and MySQL performs these subqueries recursively, putting the results in temporary tables.

Union: If the second select appears after union, it is marked as union, and if union is included in the subquery of the FROM clause, the outer select is marked as: Derived

Union result: Select the result from the union table

partitions

The table uses partitions, if you want to count the amount of company orders for ten years, you can divide the data into ten zones, each year represents a zone. This can greatly improve query efficiency.

type

This is a very important parameter, connection type, common are: all, index, range, ref, eq_ref, const, system, null eight levels.

System > const > eq_ref > ref > range > index > all

It is a good and responsible Java programmer to ensure that the query is at least range level or preferably ref level.

All (Full table scan) is the worst. If there are millions of data, the full table scan will be very slow.

A full index file scan is much better than an ALL file scan. After all, it is faster to find data from an index tree than from a full table.

Range: Retrieves only rows in a given range, using indexes to match rows. The scope is narrowed, certainly faster than full table and full index file scans. SQL statements generally contain between, in, >, and < queries.

Ref: Non-unique index scan, which is essentially an index access, returns all rows that match a single value. For example, if you query all the colleagues belonging to the r&d team in the company, the matching result will be multiple values, not unique values.

Eq_ref: Unique index scan, for each index key, there is one record in the table that matches it. If you look up the CEO of a company, for example, you can only get one record.

Const: indicates that it can be found through the index once, and const is used to compare primary key or unique indexes. Because only one row of data is matched, MySQL will soon be able to convert the query to a constant if the primary key is placed in the WHERE list.

System: The table has only one record (equal to the system table), which is a const column

possible_keys

Displays indexes (one, more, or null) that may be used by the query but may not be actually used by the query. For reference only.

key

Displays the actual index used by the query statement. If null, no index is used.

key_len

Displays the number of bytes used in the index. The length of the index used in the query can be calculated by key_len. The shorter the index length, the better, without sacrificing accuracy. The value key_len displays is the most likely length of the index field, not the actual length, i.e. key_len is calculated from the table definition, not retrieved from the table.

ref

Shows which column or constant of the index is used to find the value on the index column.

rows

Based on table statistics and index selection, the number of rows that need to be read to find the desired record is roughly estimated. The larger the value is, the worse it is.

extra

Using filesort: indicates that MySQL uses an external index sort for data, instead of reading data in the order of the indexes in the table. A sort operation in MySQL that cannot be done with an index is called “file sort”. Optimize SQL immediately when this occurs.

Using temporary: A temporary table is used to hold intermediate results. MySQL uses temporary tables when sorting query results. Common in sort order by and group by queries. This makes it even more important to optimize SQL immediately.

Using index: the corresponding select operation Using index (Covering index), avoid access to the table row, effect is good! Using WHERE indicates that the index is used to perform a lookup of the index key value. If Using WHERE is not present at the same time, the index is used to read data rather than perform lookup actions.

Covering Index (Covering Index) : also called Index Covering, is the select data column only from the Index can be obtained, do not have to read the data row, MySQL can use the Index to return the select list of fields, and do not have to read the data file again according to the Index.

Using Index condition: A new feature added since version 5.6 allows the optimizer to select whether to use an index or perform a full table traversal based on the ratio of the number of items in the RANGE to the total number in the presence of an index.

Using WHERE: Indicates that where filtering is used

Using join buffer: Indicates that the join buffer is used

Impossible WHERE: The value of the where statement is always false and cannot be used to get any elements

Distinct: Optimizes the DISTINCT operation to stop finding the same value once the first matching tuple is found.

filtered

A value of a percentage, used with the value of the Rows column, estimates the result set of the previous table in the query execution plan (QEP) to determine the number of join loops. Small tables drive large tables, reducing the number of connections.

Through the parameter introduction of EXPLAIN, we can know that:

1. Table read order (ID)

2. Operation type of data read operation (type)

3. Which indexes are actually used (key)

4. References between tables (ref)

5. How many rows per table are queried by the optimizer

Causes of performance degradation

From a programmer’s point of view

1. The query statement is not written properly

2. No index is created. The index is improperly created or invalid

3. There are too many joins in associated query

From the server point of view

1. The disk space of the server is insufficient

2. The server tuning parameters are incorrectly set

conclusion

1. An index is a well-ordered and quickly searchable data structure. Its purpose is to improve the efficiency of query.

2. After indexes are created, data is queried faster but updated slower.

3. The performance degradation is most likely caused by index failure.

4. Principles for creating indexes. Frequently queried fields are suitable for creating indexes, but frequently updated data is not suitable for creating indexes.

5. Frequent update of index fields or physical deletion of table data may cause index failure.

6. Use Explain to analyze SQL statements

7. In addition to optimizing SQL statements, you can also optimize table design. As far as possible to make a single table query, reduce the association between tables. Design archiving forms, etc.

This is the end of the MySQL index optimization analysis, if there are any errors, you can point out. Click on the recommendation if you think it’s good.

The last

I here organized a MySQL database related information documents, Spring series of family barrel, Java systematic information (including Java core knowledge points, interview topics and the latest 20 years of the Internet, e-books, etc.) friends who need to pay attention to the public number can be obtained.