Author: IT Wang Xiaoer

Blog: itwxe.com

The last article to partners about SQL query performance optimization related skills, a good query SQL is inseparable from a reasonable index design. This small two to Lao Lao how reasonable design of an index to optimize our query speed, if there is unreasonable place… HMM..

Of course, just kidding, you are welcome to correct!

The foundation of index design

What are the cornerstones of index design?

  • Xiao Ji believes that the foundation of index design is the reasonable design of data table field type, that is, choosing the appropriate field type and setting the appropriate length.
  • If the correct data type is selected, more indexes can be stored in a data page during field index creation, and more indexes can be loaded into memory at a time. In addition, the height of B+tree and disk I/O are reduced, which is of great significance for improving MySQL performance.

In general, the choice of field type is determined by the business and generally follows the following points.

  • Determine the appropriate large types: numbers, strings, dates and times, binary, and so on.
  • Determine the specific type: unsigned, value range, variable length fixed length, etc.
  • Try to choose smaller data types because they generally have better performance and take up less hardware resources.
  • Try to define fields asNOT NULL, avoid usingNULL.

The following types of tables are taken from the beginner’s tutorial as a reminder.

1. Value type

type Size (bytes) Range (signed) Scope (unsigned) use
TINYINT 1 (128, 127) (0, 255) A small integer value
SMALLINT 2 (32768, 32767) (0, 65535) Large integer value
MEDIUMINT 3 (8388-8388608, 607) (0, 16777215) Large integer value
INT or an INTEGER 4 (2147483648, 2147483647) (0, 4294967295) Large integer value
BIGINT 8 (9233372036854775808, 9223372036854775807) (0, 18446744073, 709551615) Maximum integer value
FLOAT 4 (-3.402823466e +38, + 1.175494351E-38), 0, (1.175494351E-38, + 3.402823466351e) E-38 0, (1.175494351, 3.402823466 e) Single-precision floating point value
DOUBLE 8 (1.7976931348623157E+308, 2.2250738585072014E-308), 0, (1.7976931348623157E+308) E-308 0, (2.2250738585072014, 1.7976931348623157 e+308) Double – precision floating – point value
DECIMAL For DECIMAL(M,D), M+2 if M>D otherwise D+2 Depends on the values of M and D Depends on the values of M and D Small numerical

Optimization suggestions:

  • If integer data does not have negative numbers, such as id numbers, it is recommended to specify the type as UNSIGNED and double the capacity.
  • Integers are usually the best data type because they are fast and can use AUTO_INCREMENT.
  • You are advised to use TINYINT instead of ENUM, BITENUM, and SET.
  • It is recommended to use integer types to calculate and store real numbers. One way is to multiply the real number by the corresponding multiple before operating on it. Another approach is to use two fields to store integers and decimal places respectively.
  • DECIMAL is best for data that requires high accuracy and is used for computation, such as price, amount, and so on, but be careful with length Settings when using DECIMAL types.
  • Avoid using integer display widths, that is, do not specify field display widths like INT(5), use INT instead.

Note: INT(2) is set to display width, not integer length, and needs to be used with ZEROFILL.

CREATE TABLE user(
   id TINYINT(2) UNSIGNED
);
Copy the code

For example, if the id is set to TINYINT(2) UNSIGNED, which means UNSIGNED, the maximum value that can be stored is 255, where TINYINT(2) does not make any sense without ZEROFILL. For example, if the number 200 is longer than two digits, it can be successfully inserted. The query result is also 200. If 5 is inserted, the query result is also 5.

CREATE TABLE user(
   id TINYINT(2) UNSIGNED ZEROFILL
);
Copy the code

When TINYINT(2) is used with ZEROFILL, when inserting 5, the actual storage is still 5, but MySQL will add 0 in front of the query, that is, the actual query is 05.

2. The value is a string

type Size (bytes) use
CHAR 0-255. Fixed-length character string, char(n) If the number of characters inserted is less than N (n indicates the number of characters), space is inserted for supplementary saving. Trailing Spaces are removed during retrieval.
VARCHAR 0-65535. Variable length character string. N in varchar(n) indicates the maximum number of characters. If the number of characters is less than N, no space is added
TINYBLOB 0-255. A binary string of up to 255 characters
TINYTEXT 0-255. Short text string
BLOB 0-65535. Long text data in binary form
TEXT 0-65535. Long text data
MEDIUMBLOB 0-16777215. Medium length text data in binary form
MEDIUMTEXT 0-16777215. Medium length text data
LONGBLOB 0-4294967295. Very large text data in binary form
LONGTEXT 0-4294967295. Maximal text data

Optimization suggestions:

  • CHAR is used when the string is short and all values are of a fixed length or close to a length. VARCHAR is used when the string lengths differ greatly.
  • CHAR and VARCHAR are suitable for any combination of letters and numbers that cannot exceed 255 characters, such as the name of a person, telephone number, or code. Do not use the VARCHAR type to save the numbers used in the calculation, as this may cause computation-related problems and may affect the accuracy and completeness of the calculation.
  • VARCHAR(255) takes up a lot of storage space during index building, so prefix indexes can be used when data is not required to be completely accurate. For example,idx_name_age_position(name(20), age, position), take the first 20 characters as the index, but in this case because it is an incomplete field, soorder by name ascorgroup by nameThe sort process cannot use index sort. Of course, it is necessary to ensure the accuracy of data and search speed, and the optimal scheme is to use the full-text search engine ES.
  • Try not to use BLOB and TEXT. If you do, consider storing BLOB and TEXT fields in a separate table with primary key ids.
  • Neither BLOB nor TEXT can have default values. The BLOB series stores binary strings, independent of the character set; The TEXT series stores non-binary strings, which are associated with the character set.

3. Time type

type Size (bytes) The scope of format use
DATE 3 The 1000-01-01 to 1000-01-01 YYYY-MM-DD Date value
TIME 3 ‘- 838:59:59’ to ‘838:59:59’ HH:MM:SS Time value or duration
YEAR 1 From 1901 to 2155 YYYY Year value
DATETIME 8 1000-01-01 00:00:00 to 9999-12-31 23:59:59 YYYY-MM-DD HH:MM:SS Mixes date and time values
TIMESTAMP 4 1970-01-01 00:00:00 to 2038-01-19 03:14:07 (Greene Time) YYYYMMDD HHMMSS Mix date and time values, time stamps

Optimization suggestions:

  • The minimum time granularity that MySQL can store is seconds. Of course, if you need to be precise to the millisecond level, of course, you can do it by adding a new column and keeping the millisecond value in another column.
  • It is recommended to use MySQL’s built-in DATE, TIME, and DATETIME types to store TIME, rather than strings or timestamps, so that the logic can be handled through MySQL’s DATE function.
  • When you do not need a specific time, you are advised to use the DATE data type to save the DATE. The default DATE format in MySQL isYYYY-MM-DD.
  • This parameter is available when the data format is TIMESTAMP and DATETIMECURRENT_TIMESTAMPAs (after MySQL5.6), MySQL automatically returns the current exact time when the record was inserted. However, it is important to calibrate the time and time zone of the MySQL runtime environment, such as Linux time or docker container time and time zone.
  • TIMESTAMP is the UTC TIMESTAMP, time zone dependent; DATETIME is stored as an integer YYYYMMDD HH:MM:SS, regardless of the time zone.
  • TIMESTAMP is recommended for short term projects or small business projects, since such projects tend not to last until 2038 and DATETIME is also more space-efficient. But if it is Tencent, Ali, JINGdong will generally use DATETIME, because there is no need to consider the future time limit problem of TIMESTAMP.

Second, index design principles

1. Index not built, code first

Generally speaking, consider what type and length should be used for each field in the table. After building a table, you need to do not immediately build an index, but first complete the relevant subject business development, and then take out the SQL related to the table and then build an index.

2. Joint indexes cover conditions as far as possible

One or two or three joint indexes should be designed. Each joint index should contain where, ORDER BY, and group BY fields in SQL statements as much as possible. At the same time, the order of the columns in the joint index should meet the left-most prefix principle of SQL queries as much as possible.

3. Do not build indexes on small cardinality fields

The index cardinality refers to the number of different values in the table for this field. For example, if a table has a total of 1 million rows, and there is a gender field, the gender field has three values: male, female, and secret, the cardinality of this field is 3.

If you create an index for such a small cardinality field, because there are only three values in the index tree, male, female and secret, it is impossible to perform a fast binary search, and also need to query back table, rather than full table scan.

Generally build indexes, try to use those large cardinality of the field, so as to give play to the advantages of B+ tree fast binary search.

4. Where conflicts with order by

When there is a conflict between where and order by, which index should be designed for where first? Or should you design the index for Order BY first?

Usually, the index is designed for WHERE first, because the index is used to quickly filter out the data that meets the criteria, and then the filtered data is sorted and grouped. However, the data that is quickly filtered out by the WHERE criteria is usually not very much.

5. Optimize slow query SQL

Specific index optimization and code optimization strategies are carried out for slow query SQL found in the actual production operation process or in the testing environment with large data volume.

Three, index design combat

Finally, it’s time for real combat. Little Two loves real combat.

Write here have to make fun of, this golden three silver four job-hopping season, the year before leaving, the results have not finished on the closure of the village for two weeks, wu Wu…

In the last section, Xiao Er mentioned that there would be a very interesting small case, so in the current epidemic situation, the door can not go out, I feel this example is more interesting, let’s discuss how various social platforms do user information search.

Social media platforms have a favorite function, search for friends information, such as a skilled tap province… City.. Gender.. Age.. Height…

Cough cough cough… How could dick do something like this? Dick only cares about code. Yes, that’s it.

This can be said to be for the user information query screening, usually this kind of table is very large amount of data, in the case of not considering the sub-database sub-table, how to optimize through the index with SQL?

Usually when we write SQL is like the following SQL to execute, there are where, ORDER by, limit and other conditions to query.

select xx from user where xx=xx and xx=xx order by xx asc limit xx,xx;
Copy the code

Then xiao 2 slowly increase the fields one by one to analyze how to design the index according to the business scenario.

For example, people usually select their city and gender first, so how to design the index?

where province = xx and city = xx and sex = xx
Copy the code

In this case, it is simple to design a joint index (provice, city, sex).

So this time small partner affirmation should Chou Chou age group again, hey hey 😊

where province = xx and city = xx and sex = xx and age > = 18 and age < = 28
Copy the code

(provice, city, sex, age) (provice, city, sex, age) (provice, city, sex, age)

Well, yes, there’s nothing wrong with that, but have you ever thought of someone who likes both handsome men and beautiful women? Many girls like both handsome boys and pretty girls.

So this time the little sister will not search for gender, so this time the joint index can only use the first two fields, so does not meet our professional standards ah, how to do? At this time there is a way, we just need to move a small head to change the SQL on the line, in no choice of gender judgment, change to the following can be.

province=xx and city=xx and sex in ('male'.'female') and age > = 18 and age < = 28
Copy the code

So there are other equivalent fields like hobbies.

province=xx and city=xx and sex in ('male'.'female') and hobby = 1 and xx = xx and age > = 18 and age < = 28
Copy the code

(provice, city, sex, hobby, xx, age).

So if there are range queries, such as height, weight range, last logged in time, etc.

For this type of multi-range query, to make good use of the index, you can use a fixed range if the business allows, and then the database field stores the range identifier, which translates into equivalent matching, which makes good use of the index.

For example, the last login time field does not record the last login time. Instead, it records the setting field is_LOGin_WITHIN_seven_days. The final index design is (provice, city, sex, hobby, xx, is_LOGin_WITHin_seven_days, age).

So according to the final design of the scene of the index may have covered most of the query traffic, so if there are other parts of the heat of the query how to do, the way is very simple ah, and then add one or two indexes.

For example, you can query the popular city (score) by adding a synindex (provice, city, sex, score).

It can be seen that the index must be combined with the scene to design, the idea is to try not to use more than 3 complex joint index to resist most of the more than 80% of the common query traffic, and then use one or two secondary indexes to resist some of the very use query traffic.

The above is the index design to share with you, if you can move your rich hands to small two point a free praise is better ~

MySQL transaction and lock mechanism

Copy the code