MySQL use optimization mainly from the optimization of library table structure, the use of appropriate indexes, optimization of query and other aspects.

Optimize the library table structure

Talk about data types

String type

There are many different types of strings represented in MySQL. Common ones are Char and VarChar,BLOB and TEXT, and ENUM with some optimizations. We need to recognize the differences in storage and use of different data types, and then use them wisely to achieve optimization.

For an introduction to data types in MySQL, see my other blog post: Common Operations in MySQL

Char or VarChar

Char is a fixed-length type. MySQL allocates sufficient space based on the defined length of the string, and MySQL removes all trailing Spaces when storing Char values. For data that changes frequently, a CHAR is used to store the data because the CHAR type is less likely to fragment when it changes.

VARCHAR is a variable length type, which is more space-efficient than CHAR, but VARCHAR is prone to fragmentation when data changes, so it is generally used as a data type for data that does not change often. VARCHAR uses one or two extra bytes to record the length of the string, one extra byte if the maximum column length is 255 bytes or less, and two bytes otherwise. For example,VARCHAR(10) columns require 11 bytes of storage space, and VARCHAR(1000) columns require 1002 bytes.

Note that in versions 5.0 and higher,MySQL stores or retrieves VARCHAR data types with trailing Spaces, but in versions 4.1 and older,MySQL stores or retrieves VARCHAR data types with trailing Spaces as well as CHAR

BLOB or TEXT

BLOB is a SMALLBLOB, and TEXT is a SMALLTEXT. BLOB and TEXT are string data types designed to store very large amounts of data, stored in binary and string formats, respectively.

Use ENUM instead of string type

MySQL is very compact when storing enumerations, condensing them into one or two bytes depending on the number of list values. MySQL internally stores each value’s position in the list as an integer, and stores a “lookup table” of the “digital-string” mapping in the table’s.frm file.

For example :create table enum_test(e enum(‘apple’,’banana’,’pear’)); insert into enum_test(e) values(‘apple’),(‘banana’),(‘pear’); And then the data that we insert is actually stored as integers in the table.

Numeric types

MySQL can specify widths for integer types, such as INT(1),INT(20), but this doesn’t make sense for most applications. The width of the integer type specified by MySQL is only used to set the number of characters that some MySQL clients can display. For storage and computation,INT(1) and INT(20) do not limit the legal range of values. Both types are the same.

Date and time types

DateTime and TimeStamp

DATETIME and TIMESTAMP are two date types that store data in exactly the same format in MySQL (both yyyY-MM-DD HH: MM: SS), but there are some differences.

DATETIME can hold a wide range of values, from 1001 to 9999 years, with an accuracy of seconds. MySQL uses 8 bytes to store values contained in the DATETIME data type. By default,MySQL displays DATETIME values in a sortable, unambiguous format.

TIMESTAMP holds the number of seconds since midnight (GMT) on January 1, 1970, and is the same as the UNIX TIMESTAMP. TIMESTAMP uses only four bytes of storage space, so it can represent a smaller time range than DATETIME, only from 1970 to 2038. TIMESTAMP also has special properties that DATETIME does not have. By default,MySQL sets the value of the first TIMESTAMP column to the current time if no value is specified at insert time.

Use appropriate indexes

The index optimization

Index optimization is a big area, and this is a brief introduction to some basic uses, followed by a topic on index optimization and design.

The index basis

In MySQL, indexing is implemented at the storage engine layer rather than the server layer. The Mysql index structure includes b-tree index, hash index, R-tree index, full-text index, etc. Different storage engines implement the above index structure differently, and not all storage engines have these five index structure types.

Index type:

There are 5 types of indexes in MySQL:

  1. Plain index: The most basic index, without any restrictions. Default index of type BTREE in MyIASM. Such asALTER TABLE article ADD INDEX index_title_name ON title(100);,CREATE INDEX index_name ON table(column(100)), or simply define the index when the table is createdindex index_title_name(title(100))
  2. Unique index: The value of the index column can be null. Similar to a normal index, except that the value of the index column must be unique. Such asALTER TABLE article ADD UNIQU index_title_name ON title(100);,CREATE UNIQUE INDEX index_name ON table(column(100))Or you can define indexes directly at table creation timeUNIQUE index_title_name(title(100))
  3. Full-text indexes: Mainly used to find keywords in text, not to compare directly with values in the index. Can only be used for MyISAM tables, generating full-text indexes for large data is time-consuming and space-consuming. Such asALTER TABLE article ADD FULLTEXT index_content(content),CREATE FULLTEXT INDEX index_content ON article(content)andFULLTEXT (content).
  4. Primary key index: this is a special unique index that does not allow null values.
  5. Left-most index (composite index): Composite indexes improve MySQL efficiency. Left-most indexes follow the “left-most index” principle. When creating a composite index, place the most frequently used (frequency) column as a constraint on the leftmost column in descending order.

Index method:

Queries that can use b-tree indexes:

  1. Full value matching query
  2. The query that matches the left-most prefix
  3. A query that matches the column prefix
  4. Queries that match range values
  5. Matches the left front row exactly and the range matches the other column
  6. Overwrite the index (just access the index without querying the row)

Limitations on using B-tree indexes

  1. In a multi-column index, the search must start at the left-most column of the index; otherwise, the index cannot be used
  2. In a multi-column index, columns in the index cannot be skipped. For example, a three-column joint index cannot be queried using only the first and third columns and skip the second column.
  3. Indexes cannot be used for not in and <> operations
  4. If there is a range query for a column in the query, all columns to the right of it cannot use the index.

Features of Hash indexes:

A Hash index is implemented based on a Hash table. A Hash index can be used only when the query condition exactly matches all the columns in the Hash index. A Hash index is only applicable to equivalent query and is not applicable to fuzzy query and range query.

For all columns in the Hash index, the storage engine computes a Hash code for each row of the column. The Hash code is stored in the Hash index.

Limitations on using Hash indexes:

  1. The Hash index must be searched twice.
  2. The Hash index cannot be sorted.
  3. Hash indexes do not support partial index lookup or range lookup.
  4. Hash The Hash calculation in the Hash index may have Hash conflicts.

B-tree indexes differ from Hash indexes in many ways. B-tree index can not only speed up the search speed of data, but also achieve sorting and grouping. The leaf node of b-tree index stores the value of the key word of the index, which can directly search the information of the key word through the index to avoid accessing the data row. However, the leaf node of the Hash index stores the Hash code of the keyword information. We need to convert the query information into the Hash and find the corresponding data row in the table to find the data information. Therefore, Hash indexes cannot be used as override indexes.

Overwrite index:

If an index contains the values of all the fields to be queried (the parameters of the WHERE statement, the parameters of the Order BY, the parameters of the group by), then the index is usually referred to as an overwriting index. Overwrite indexes cannot be used by Memory storage engines, nor are they suitable for queries that contain too many columns (such as SELECT *). There are also many advantages to using overridden indexes.

  • Optimize caching to reduce disk I/O operations.
  • Reduce random I/ OS and change random I/ OS to sequential I/ OS.
  • Secondary queries on Innodb primary key indexes can be avoided.

InnoDB stores indexes in the engine

InnoDB is the most famous storage engine of MySQL, so here is a special introduction. InnoDB storage engine supports B+ tree index, full-text index and hash index. The Hash engine supported by InnoDB storage engine is adaptive. InnoDB storage engine automatically generates hash indexes for tables based on their usage. You cannot manually generate hash indexes for a table.

The index in the traditional sense is the B+ tree index, which is the most common and effective index in the current relational database system. Its construction is to use the idea of binary tree, according to the key value pair to quickly find the data. Through the B+ tree index to find the page where the data row is located, and then the database reads the page into memory, and then searches in memory to find the corresponding data.

InnoDB uses row locks and locks rows only when they are modified. Using indexes enables data to lock fewer rows in the query process, increasing the concurrency of data processing and improving database performance.

Considerations for index use

  1. Ensure that the number of columns in the table is independent when searching for data in MySQL. Independent columns are indexes that cannot be part of an expression or arguments to a function. That does not allowselect id from article where id+1=5Otherwise, the index cannot be used.
  2. Indexing a long character column makes the index large and slow. In this case, the prefix index should be used, that is, the first part of the column is selected as the index, and the selection of the prefix index should ensure reasonable index selectivity (as close as possible to 1).
  3. If sorting and grouping are not a concern, in a federated index, you should place the most selective indexes at the top of the index, the most frequently used columns at the top of the index, and the narrowest columns at the top of the index.
  4. usePt - duplicate - key - the checker h = 127.0.0.1Find duplicate and redundant indexes, and then delete duplicate and redundant indexes.

Modify SQL query statements

MySQL connection process and status

MySQL connection status

The communication protocol between the MySQL client and server is “half duplex”. At any time, either the server sends data to the client or the client sends data to the server. The two actions cannot occur simultaneously. For each moment, you can check the current connection status of mysql by running the show full processList Command (the Command column represents the current state).

MySQL is in the following state:

Sleep: The thread is waiting for the client to send a new request

Query: The thread is executing a Query or sending the result to the client.

Locked: At the MySQL server layer, the thread is waiting for a table lock.

Sorting result: Threads are Sorting result sets.

Copying to TMP table [on disk]: Threads are executing queries and Copying their result sets into a temporary table, either doing GROUP BY, file sorting, or UNION operations. If the status is marked on Disk, then MySQL is placing a temporary table in memory on disk.

Analyzing and Statistics: The thread is collecting storage engine statistics and generating query execution plans.

Sending Data: This indicates that the thread is either Sending data between states, generating a result set, or Sending data back to the client.

MySQL > connect to MySQL

  1. The MySQL client sends a query to the server
  2. MySQL server checks query cache first if query cache is enabled. This is done through a size-sensitive hash lookup. If a cache hit, MySQL checks the user’s permissions before returning the query result. If the permissions are appropriate, MySQL returns the result directly in the cache.
  3. The server parses the SQL, preprocesses it, and the optimizer generates the corresponding execution plan.
  4. MySQL invokes the storage engine API to execute the query based on the execution plan generated by the optimizer.
  5. Returns the results of the query to the client.

Query optimization

In the above MySQL query execution process, the server has provided some SQL optimization measures, we also need to understand and use these optimization measures

  1. Use the Explain + SQL query statement to see the efficiency of SQL queries.
  2. Mysql uses a cost-based optimizer. useshow status like 'last_query_cost'You can query the current session’s last_query_cost value to get the cost of the current query calculated by mysql. Mysql will evaluate and get the lowest cost execution plan.
  3. MySQL comes with a “nested loop” that optimizes most of our queries, adjusting the correlation of associated tables for efficient queries.

Optimize data access

The key to optimizing data access is to reduce the number of visits to the data, to retrieve only the data that is needed, and to ensure that the amount of query data issued to the database is only the amount of data that is actually needed.

To optimize data access, you can use:

  1. Use limit to control the number of rows in SQL queries.
  2. In SQL queries with multiple table associations, only the columns of the required table are queried. Try not to use “SELECT *”
  3. Use third-party caching systems to cache frequently queried data.
  4. If the query is scanning a large amount of data but only returns a small number of rows, you can use an index override scan to place the rows that need data in the index.

Refactoring query

  1. Break down large queries into smaller ones. This is especially true for deleting unwanted data, which is typically done in batches with small amounts of data, which can greatly reduce the time that database locks are held.
  2. Decompose associated query reasonably. The decomposition of associated query into single table query can reduce lock competition. At the same time, the results of single table query are associated in the application layer, which can realize the splitting of the database and achieve high performance and scalability. In addition, caching repeated queries can improve efficiency.
  3. Some of MySQL’s subqueries are inefficient (such as subqueries using IN), we should use explain statements to test the cost of the current query, and then decide whether we should use inner join or left (right) outer join to rewrite MySQL’s IN () subquery. However, when we need to return some columns in a table, we can use the EXISTS keyword subquery, which is also more efficient. Note before MySQL5.6
  4. When using the UNION keyword for SQL queries, if you have operations such as limiting the amount of data and sorting, you should use these restrictions in each SQL statement.
  5. Use the sort effects and limit keywords of the primary key instead of the Max and min keywords for maximum and minimum values.
  6. MySQL can make full use of indexed columns by using the limit and offset clause when pagination is required. In addition, in paging, if the offset is too large, an index override scan should be used.
  7. Add indexes to the columns of the ON and using clauses in the associative query, and pay attention to the association order. Add indexes to the table in chapter 2 to improve efficiency.
  8. Ensure that only one table column is involved on the expression of the Group BY and Order BY clauses so that it is possible to optimize the process using indexes.