Last time we covered what an index is. Today we are going to talk about how indexes are used and optimized.

Index types

Clustered index

  1. PRIMARY KEY index: The primary key is usually specified when the table is created. If none is specified and the table has no non-null unique index, InnoDB generates a hidden row_id and builds the clustered index on it by default.

  2. Non-null unique index: If the table has no primary key but has a unique index whose columns are all NOT NULL, InnoDB uses that index as the clustered index and no row_id is generated.

Non-clustered index (secondary index)

  1. UNIQUE index: The indexed values must be unique. It can be specified at table creation time or created after the table is created.

  2. Normal index (INDEX): The only function of a normal index is to speed up queries.

  3. Composite index: A composite index is an index on multiple columns. Unlike the single-column indexes above, queries on a composite index follow the "leftmost prefix rule".
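
The index types listed above could be declared roughly as follows. This is only a syntax sketch: the table users, its columns, and all index names are hypothetical and do not come from the article.

```sql
-- Hypothetical table, used only to illustrate the syntax of each index type.
CREATE TABLE users (
    id    INT NOT NULL AUTO_INCREMENT,
    email VARCHAR(128) NOT NULL,
    phone VARCHAR(32),
    city  VARCHAR(64),
    age   INT,
    PRIMARY KEY (id),              -- primary key, used as the clustered index
    UNIQUE KEY uk_email (email)    -- unique index specified at table creation
) ENGINE = InnoDB;

-- Unique index created after the table exists
CREATE UNIQUE INDEX uk_phone ON users (phone);

-- Normal index: no uniqueness constraint, only speeds up lookups on city
CREATE INDEX idx_city ON users (city);

-- Composite index on multiple columns (leftmost column first)
ALTER TABLE users ADD INDEX idx_city_age (city, age);
```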

Full-text index

Underlying data structure: A full-text index is implemented with an inverted index, which, like a B+ tree index, is an index structure. It creates an auxiliary table that stores the mapping between each word and the one or more documents in which the word appears. This auxiliary table is usually implemented with an associative array.
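
As a small illustration of how a full-text index is declared and queried, here is a sketch that assumes MySQL 5.6 or later with InnoDB. The table articles, its columns, and the sample rows are made up for the example.

```sql
-- Hypothetical table with a full-text index over two text columns.
CREATE TABLE articles (
    id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(200),
    body  TEXT,
    FULLTEXT KEY ft_title_body (title, body)
) ENGINE = InnoDB;

INSERT INTO articles (title, body) VALUES
    ('MySQL index basics', 'A B+ tree index keeps keys sorted'),
    ('Full-text search',   'An inverted index maps each word to documents');

-- MATCH ... AGAINST looks the word up in the inverted index
-- instead of scanning every row with LIKE '%inverted%'.
SELECT id, title
FROM articles
WHERE MATCH (title, body) AGAINST ('inverted' IN NATURAL LANGUAGE MODE);
```

The column list in MATCH has to be the same as the column list of the FULLTEXT index.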

Let's say we create a table t with the following columns:

Column    Description
id        Serial number (auto-increment primary key)
age       Age
name      Name


Primary key index: Leaf nodes hold the complete row data. Non-leaf nodes hold only the primary key and page number information.

Secondary index created on name: Leaf nodes contain name and id. Non-leaf nodes contain only name and page number.

Composite index created on name and age: Leaf nodes contain name, age, and id. Non-leaf nodes contain only name, age, and page number.
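
The table and the three indexes described above could be defined like this. The column types and the index names idx_name and idx_name_age are assumptions, since the article does not give the DDL.

```sql
-- Sketch of the example table; column types and index names are assumed.
CREATE TABLE t (
    id   INT NOT NULL AUTO_INCREMENT,  -- serial number, auto-increment primary key
    age  INT,
    name VARCHAR(64),
    PRIMARY KEY (id),                  -- clustered index: leaf nodes hold full rows
    KEY idx_name (name),               -- secondary index: leaf nodes hold (name, id)
    KEY idx_name_age (name, age)       -- composite index: leaf nodes hold (name, age, id)
) ENGINE = InnoDB;
```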

Back-to-table lookups

What is a back-to-table query? In simple terms, “when you query data through a secondary index, you cannot get the complete row from it, so you have to look up the primary key index again to fetch the row.”

For example, query:

select * from t where name = 'aa';

This query goes through the secondary index on name, which has two characteristics:

  • Two B+ tree indexes are used, one secondary index and one clustered index.

  • Sequential I/O is used to access secondary indexes, and random I/O is used to access clustered indexes.

The more records that need a back-to-table lookup, the worse the secondary index performs, to the point that the optimizer may choose a full table scan over the secondary index for some queries.

Index optimization

  • Use composite (covering) indexes
  1. This reduces the number of back-to-table lookups, each of which adds random I/O. In the example above, if we use the secondary index on name to query only id and name, no back-to-table lookup is needed, because the leaf nodes already hold the data we want.
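
The difference between a query that goes back to the table and one that is covered by the index can be checked with EXPLAIN. The sketch below recreates the example table in a scratch database. The column types, the index name idx_name, and the sample rows are assumptions.

```sql
-- Run in a scratch database: recreates the example table so the snippet runs on its own.
DROP TABLE IF EXISTS t;
CREATE TABLE t (
    id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    age  INT,
    name VARCHAR(64),
    KEY idx_name (name)
) ENGINE = InnoDB;
INSERT INTO t (age, name) VALUES (20, 'aa'), (31, 'bb'), (45, 'cc');

-- Needs age, which idx_name does not store:
-- every match requires a second lookup in the clustered index.
EXPLAIN SELECT * FROM t WHERE name = 'aa';

-- Needs only id and name, both stored in the idx_name leaf nodes:
-- the Extra column typically shows "Using index" (covering index).
EXPLAIN SELECT id, name FROM t WHERE name = 'aa';
```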
  • Leftmost matching principle
  1. If we use the composite index on (name, age), the index cannot be used for the lookup when the WHERE condition contains only age. The condition must include name.
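
A minimal way to see the leftmost matching principle is to compare EXPLAIN output for conditions with and without the leftmost column. In this sketch the index name idx_name_age is assumed, and a hypothetical extra column city is added so that SELECT * cannot be answered from the index alone.

```sql
-- Run in a scratch database: recreates the example table so the snippet runs on its own.
DROP TABLE IF EXISTS t;
CREATE TABLE t (
    id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    age  INT,
    name VARCHAR(64),
    city VARCHAR(64),                  -- hypothetical column, not in the article
    KEY idx_name_age (name, age)
) ENGINE = InnoDB;
INSERT INTO t (age, name, city) VALUES
    (20, 'aa', 'x'), (31, 'bb', 'y'), (45, 'cc', 'z');

-- Conditions that start from the leftmost column can seek in idx_name_age.
EXPLAIN SELECT * FROM t WHERE name = 'aa';
EXPLAIN SELECT * FROM t WHERE name = 'aa' AND age = 20;

-- name is missing: entries are sorted by name first, so the B+ tree
-- cannot be searched by age alone. Expect a full scan here.
EXPLAIN SELECT * FROM t WHERE age = 20;
```

What EXPLAIN reports depends on the MySQL version and on the data. Newer versions also have strategies such as skip scan that can apply in some covering cases.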
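  • Matching column prefix
  1. To index a column is to sort the records in the corresponding B+ tree by that column's value, and strings are sorted too. A condition that fixes the beginning of the string, such as LIKE 'aa%', can therefore use the index, while one that starts with a wildcard, such as LIKE '%aa', cannot.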
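
Because the strings in the index are sorted, only a condition with a known prefix can narrow the search to a range. The sketch below contrasts the two cases. The column types, the index name idx_name, and the sample rows are assumptions.

```sql
-- Run in a scratch database: recreates the example table so the snippet runs on its own.
DROP TABLE IF EXISTS t;
CREATE TABLE t (
    id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    age  INT,
    name VARCHAR(64),
    KEY idx_name (name)
) ENGINE = InnoDB;
INSERT INTO t (age, name) VALUES
    (20, 'aab'), (31, 'aac'), (45, 'baa'), (52, 'caa');

-- Known prefix: all matches are adjacent in the sorted index,
-- so a range scan on idx_name is possible.
EXPLAIN SELECT * FROM t WHERE name LIKE 'aa%';

-- Leading wildcard: matches can be anywhere in the sort order,
-- so the index cannot narrow the search.
EXPLAIN SELECT * FROM t WHERE name LIKE '%aa';
```

On a table this small the optimizer may still prefer a full scan for the first query. The difference becomes visible with more rows.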
  • Do not use expressions on indexed columns
  1. In this case the index is not used, and the storage engine iterates through all the records and evaluates the expression for each one.

  2. For example: select * from t where age/2 > 22;
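
The usual fix is to move the arithmetic to the constant side so that the indexed column stands alone. The sketch below assumes an index named idx_age on age, which the article does not define.

```sql
-- Run in a scratch database: recreates the example table so the snippet runs on its own.
DROP TABLE IF EXISTS t;
CREATE TABLE t (
    id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    age  INT,
    name VARCHAR(64),
    KEY idx_age (age)                  -- assumed index on age
) ENGINE = InnoDB;
INSERT INTO t (age, name) VALUES (20, 'aa'), (44, 'bb'), (45, 'cc'), (60, 'dd');

-- Expression on the column: idx_age stores age, not age / 2,
-- so every row has to be read and evaluated.
EXPLAIN SELECT * FROM t WHERE age / 2 > 22;

-- Same condition with the column left bare: a range scan on idx_age is possible.
EXPLAIN SELECT * FROM t WHERE age > 44;
```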

  • Keep index columns small
  1. Where the data allows, use a char instead of a varchar and an INT instead of a BIGINT, for two reasons:
    1. The smaller the data type, the faster comparisons are at query time (this happens at the CPU level).
    2. The smaller the data type, the less storage space the index occupies and the more records fit into one data page, which reduces disk I/O. It also means more records can be cached in memory, which improves read and write efficiency.

  • Index columns should be highly selective
  1. If all the values in a column are the same, sorting gives no benefit and quick lookups are impossible. And if a column with a secondary index has a very high proportion of duplicate values, the many records retrieved through that index may each need a back-to-table lookup, which can cost even more performance.
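
One rough way to judge how selective a column is before indexing it is to compare the number of distinct values with the number of rows. The column types and sample rows below are made up for illustration.

```sql
-- Run in a scratch database: recreates the example table so the snippet runs on its own.
DROP TABLE IF EXISTS t;
CREATE TABLE t (
    id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    age  INT,
    name VARCHAR(64),
    KEY idx_name (name)
) ENGINE = InnoDB;
INSERT INTO t (age, name) VALUES
    (20, 'aa'), (20, 'bb'), (20, 'cc'), (31, 'dd');

-- Ratio of distinct values to rows: close to 1 means highly selective,
-- close to 0 means many duplicates.
SELECT COUNT(DISTINCT name) / COUNT(*) AS name_selectivity,
       COUNT(DISTINCT age)  / COUNT(*) AS age_selectivity
FROM t;

-- For existing indexes, the Cardinality column gives the optimizer's
-- estimate of the number of distinct values.
SHOW INDEX FROM t;
```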
