Clustered index, also known as clustered index

Definition: The physical order of the data rows is the same as the logical order of the column values (typically the primary key column). Only one clustered index can exist in a table.

SQL > select * from table where id is the primary key, and select * from table where id is the primary key.

The physical order of the rows is the same as the order of the column values. If we query for data with lower ids, the physical address of the row is lower on disk. And because the physical arrangement is in the same order as the clustered index, only one clustered index can be created.

A clustered index can be used to query all columns of a clustered index (except for MyISAM, which has a unique constraint). Non-clustered indexes require a secondary query if the index does not cover the corresponding column, as discussed later. As a result, the speed of clustered indexes tends to be more advantageous in queries.

Create clustered indexes

If no index is created, the system automatically creates an implied column as the clustered index of the table.

  • MySQL > create table with primary key (*); SQL server create table with primary key (*);
create table t1(
    id int primary key.name nvarchar(255))Copy the code
  • Add the clustered index after the table is created

MySQL

alter table table_name add primary key(colum_name)
Copy the code

It is worth noting that it is best to add the clustered index when creating the table. Because of the special physical order of the clustered index, if the index is created again, the order of all the data rows will be moved according to the order of the index column, which will be very time consuming and performance consuming.

Unclustered index

Definition: The logical order of the indexes in this index is different from the physical storage order of the disk. A table can have multiple non-clustered indexes.

In fact, all indexes except clustered indexes are by definition non-clustered indexes, but people want to subdivide non-clustered indexes into normal indexes, unique indexes, and full-text indexes. If you have to compare the non-clustered index to something in real life, then the non-clustered index is like the dictionary of Xinhua dictionary, his structure order and the actual storage order may not be consistent.

Secondary query problems for non-clustered indexes

A non-clustered index leaf node is still an index node, but it has a pointer to the corresponding data block. Therefore, if a non-clustered index query is used, and the query column contains other columns not covered by the index, it needs to conduct a second query to query the data of the corresponding data row on the node.

Have the t1 table:

Clustered index clustered index(ID) and non-clustered index index(username).

Use the following statement to retrieve the column data from non-clustered index nodes without the need for a second query.

select id, username from t1 where username = 'Ming'

select username from t1 where username = 'Ming'
Copy the code

However, when using the following statement, a second query is required to obtain the score of the original data row:

select username, score from t1 where username = 'Ming'
Copy the code

The query efficiency in SQL Server is shown as follows: Index Seek is the time taken by Index Seek, and Key Lookup is the time taken by secondary query. It can be seen that the secondary query cost accounts for a large proportion, up to 50%.

This blog is a simple example: blog.csdn.net/jiadajing26…

The summary is as follows:

Action description Use clustered indexes Use non-clustered indexes
Columns are often sorted by group Should be Should be
Returns data in a range Should be Should not be
One or very few different values Should not be Should not be
Different values of decimals Should be Should not be
Different values of large numbers Should not be Should be
Frequently updated columns Should not be Should be
A foreign key column Should be Should be
The primary key column Should be Should be
Modify index columns frequently Should not be Should be

We need to clarify the following questions

First: the clustered index constraint is unique, does it require that the field also be unique? Don’t ask for uniqueness!

Analysis: SQL > select * from table where primary key (s) are unique; select * from table where primary key (s) are unique; select * from table where primary key (s) are unique; select * from table where primary key (s) are unique; So it is natural to assume that the fields that create the clustered index also need to be unique.

Conclusion: Clustered indexes can be created on any column you want. This is theoretically impossible to specify, or it would be a performance nightmare.

Second: why can a clustered index be created on any column, if the table has no primary key constraint, that is, duplicate row data is possible?

At first glance, this does contradict the clustered index constraint, but it is possible to create a clustered index.

If the UNIQUE attribute is not used to create a clustered index, the database engine automatically adds a 4-byte UniqueiFIER column to the table. If necessary, the database engine automatically adds a Uniqueifier value to the row, making each key unique. This column and column values are for internal use and cannot be viewed or accessed by users.

Third: is a clustered index necessarily better than a non-clustered index?

If you want to query the credits and names of students with credits between 60 and 90, is it optimal to create a clustered index on credits?

A: no. Since only two columns are output, we can create a joint non-clustered index on credits and students’ names. The index at this time forms a covered index, that is, the content stored in the index is the final output data. This index has better query performance than the clustered index based on credits.

Fourth: what describes clustered and non-clustered indexes in a database?

Indexes are described in the form of binary trees, and we can distinguish between clustered and non-clustered indexes by saying that the leaf node of a clustered index is the final data node, while the leaf node of a non-clustered index is still the index node, but it has a pointer to the final data.

Fifth: why is it slower to insert data into a table with a clustered index on the primary key than to create a non-clustered index on the primary key?

Insert rows into a table with a primary key. The constraint of primary key uniqueness makes it necessary to ensure that the inserted rows are not duplicated. Let’s compare the lookup of a clustered index with a non-clustered index: Clustered index because of the leaf nodes is data pages indexed, so if you want to check the uniqueness of the primary key, need to traverse all the data node, but non clustered index is different, because the clustered index on already contains the primary key, so to find the primary key and uniqueness, just need to traverse all the index page (index of the storage space is less than the actual data), This is a lot less IO than iterating through all the rows. This is the real reason why creating a non-clustered index on a primary key is faster than creating a clustered index on a primary key when inserting data.

Refer to the address

  • Blog.csdn.net/qq_29373285…

If you like my article, you can follow the individual subscription number. Welcome to leave messages and communicate at any time. If you want to join the wechat group to discuss with us, please add the administrator to simplify the stack culture – little Assistant (lastpass4U), he will pull you into the group.