This article is shared from the Huawei Cloud community post "What is the difference between a normal index and a unique index in MySQL?", by JavaEdge.

1 Concept Differentiation

Normal index vs. unique index

A column with a normal index can contain duplicate values; a column with a unique index cannot, just like a primary key.

A unique index can serve as a legitimate means of data validation. For example, the ID-number field of a student table is required by the business never to repeat, so a unique index fits it. (The student-number field is generally set as the primary key.)

Primary key vs. unique index

The primary key ensures that each row in the database is unique and non-duplicated, e.g. an ID number or student number. A unique index behaves much like a primary key. However, a table can have only one primary key, which cannot be NULL, while it can have multiple unique indexes, and a unique index column may contain NULL values.

For example, the student table:

  • In a school, the student number is generally the primary key and the ID number a unique index
  • In an education bureau, the ID number becomes the primary key and the student number serves as a unique index

So which field you choose as the primary key depends on your business needs.
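
To make this concrete, here is a minimal sketch of the school scenario; the table and column names are illustrative assumptions, not from the original article:

-- Hypothetical student table: student number as the primary key,
-- ID number as a unique index
create table student (
  student_no bigint      not null,
  id_card    varchar(18) not null,
  name       varchar(64) not null,
  primary key (student_no),
  unique key uk_id_card (id_card)
) engine=InnoDB;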

2 Case Study

Consider a resident system in which everyone has a unique ID number, and the system needs to look up a name by ID number:

select name from CUser where id_card = 'ooxx';

The id_card field is large, so it is not recommended as the primary key. That leaves two options (see the DDL sketch after this list):

  1. Create a unique index on id_card
  2. Create a normal index on id_card
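
Expressed as DDL, the two options might look like this (the index names are illustrative):

-- Option 1: a unique index on id_card
alter table CUser add unique index uk_id_card (id_card);

-- Option 2: a normal index on id_card
alter table CUser add index idx_id_card (id_card);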

Assuming the business code guarantees that no duplicate ID numbers are written, both choices are logically correct. But from a performance perspective, which is better?

Assume that none of the values in field k are duplicated.

(Figure: InnoDB index structure)

3 Querying Performance

select id from T where k=4;

The query traverses the B+ tree from the root down to a leaf node, then performs a binary search inside the data page:

  • A normal index, after finding the first record that meets the condition (4,400), continues scanning until it hits the first record that does not satisfy k=4
  • A unique index stops searching as soon as it finds the first record that meets the condition

It looks like the performance difference is small.

InnoDB reads and writes data in units of data pages. That is, when a record is read, it is not fetched from disk on its own; the entire page containing it is read into memory.

Therefore, the extra cost of a normal index is just one "find and check the next record" operation: a pointer move and a comparison.

If k=4 happens to be the last record on the data page, then fetching the next record requires reading the next data page.

For integer fields, one data page can hold nearly a thousand keys, so this case is rare. When estimating the average performance difference, the CPU cost of this extra operation can be considered negligible.
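
As a rough sanity check on the "nearly a thousand keys" figure, assume the default 16 KB InnoDB page and a secondary-index entry of about 16 bytes (an 8-byte integer key plus an 8-byte primary-key value, ignoring page headers and per-record overhead):

\[
\frac{16 \times 1024 \ \text{bytes/page}}{\approx 16 \ \text{bytes/entry}} \approx 1000 \ \text{entries per page}
\]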

4 Updating Performance

What happens when InnoDB inserts a new record (4,400) into a table? This depends on whether the target page of the record to be updated is in memory:

In memory

  • The normal index finds the position between 3 and 5, inserts the value, and is done.
  • The unique index finds the position between 3 and 5, checks that there is no conflict, inserts the value, and is done.

The only difference is one conflict check, which costs a tiny amount of CPU time.

Not in memory

  • The unique index must read the data page into memory, check that there is no conflict, insert the value, and finish.
  • The normal index simply records the update in the change buffer and finishes.

Reading data from disk into memory involves random I/O, one of the most expensive operations in a database. Because the change buffer reduces random disk reads, it improves update performance significantly.
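
You can observe the change buffer on a running server: the INSERT BUFFER AND ADAPTIVE HASH INDEX section of the standard InnoDB status output reports its size and merge activity:

-- inspect change buffer size and merge counts
show engine innodb status\G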

5 Index Best Practices

Normal and unique indexes make no practical difference to query performance; update performance is the main consideration. Therefore, choosing normal indexes is recommended whenever possible.

If every update is immediately followed by a query for the same record, you should turn off the change buffer (see the sketch below). In all other cases, the change buffer improves update performance. The combination of normal indexes and the change buffer is a clear win for updating large tables.
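
Turning the change buffer off is a one-line setting; innodb_change_buffering is a standard InnoDB variable (default 'all'):

-- disable change buffering when updates are always read back immediately
set global innodb_change_buffering = 'none';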

The change buffer also helps greatly on mechanical hard drives. So if you keep a "historical data" database on mechanical disks for cost reasons, pay attention to the indexes in those tables: use normal indexes where possible, and increase the change buffer size to guarantee the write performance of the "historical" tables.
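
The size limit is also tunable; innodb_change_buffer_max_size is a standard variable expressed as a percentage of the buffer pool (default 25, maximum 50):

-- allow the change buffer to use up to half of the buffer pool
set global innodb_change_buffer_max_size = 50;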

6 Change Buffer and Redo Log

The core way WAL improves performance is by minimizing random reads and writes, which sounds like what the change buffer does. So what is the difference between the two?

6.1 Insertion Process

insert into t(id,k)
values (id1,k1),(id2,k2);

Assume the current state of the k index tree; after finding the insertion positions:

  • The data page where k1 is located is in the buffer pool
  • The data page where k2 is located is not in memory

Look at the following flow:

(Figure: Update process with the change buffer)

The arrows in the figure are background operations and do not affect the response to the update request.

This update does the following:

  1. Page1 is in memory: update it directly in memory
  2. Page2 is not in memory: cache an "insert a row into Page2" message in the change buffer
  3. Record both of the above actions in the redo log

At this point the transaction completes. This update statement is very cheap: it writes two places in memory, then writes to disk once (the two actions together make one disk write), and that write is sequential.

6.2 Processing Subsequent Read Requests

select * from t where k in (k1, k2);

This read statement follows soon after the update, while the data pages involved are still in memory, so neither of these reads touches the system tablespace or the redo log.

(Figure: Read process with the change buffer)

When Page1 is read, it is returned directly from memory.

Does reading data under WAL always require a disk read? Must the data first be updated from the redo log? Not really. Look at the state in the figure above: although the data on disk is still the old version, the result is returned directly from memory, and it is correct.

To read Page2, it must be read from disk into memory, and then the operation log in the change buffer is applied to generate the correct version before returning the result. So the data page is not read from disk into memory until Page2 actually needs to be read.

To sum up, on update performance:

  • The redo log saves the I/O of random disk writes (converting them to sequential writes)
  • The change buffer saves the I/O of random disk reads

7 Summary

Since unique indexes cannot use the change buffer, prefer non-unique (normal) indexes from a performance perspective if the business can accept them.

When should you use a unique index?

The problem is that the business code may not be able to guarantee that duplicate data is never written.

  • If the business cannot guarantee this, or the business requires the database to enforce the constraint

    Then there is no way around it: you must create a unique index. The value of this article in that case is that, if you encounter large numbers of slow inserts and a low memory hit ratio, it gives you a troubleshooting direction.

  • In the archive database scenario, consider using normal indexes

    For example, online data is kept for only half a year, and historical data is then moved into an archive database. At that point, the archived data is already guaranteed to have no unique-key conflicts. To improve archiving efficiency, consider changing the table's unique indexes to normal indexes (see the sketch below).
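
A minimal sketch of that conversion, with hypothetical table and index names:

-- uniqueness is already guaranteed upstream, so swap the unique
-- index for a normal one before bulk-loading the archive table
alter table t_archive drop index uk_id_card, add index idx_id_card (id_card);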

Will change buffer data be lost if the host restarts unexpectedly?

No! Although only memory is updated at the time, the change buffer operations are recorded in the redo log when the transaction commits, so the change buffer can be recovered during crash recovery.

Is data written directly back to disk during a merge?

The merge process:

  1. Read the data page from disk into memory (the old version of the page);
  2. Find the change buffer records for this data page (there may be several) and apply them in order, producing the new version of the page;
  3. Write the redo log, covering both the change to the data page and the change to the change buffer.

At this point the merge ends.

At this moment, neither the data page in memory nor the change buffer's corresponding disk location has been written back, so both are dirty pages. They will be flushed back to their own physical locations later, which is a separate process.

While constructing the first example, session A had session B delete all the data and then re-insert it with call idata(), and the rows field in the explain result rose from 10001 to about 37000.

If you execute delete from t, call idata(), and explain on their own, without session A, the rows field stays around 10000. Why?

If it does not reproduce, check:

  • Whether the isolation level is RR (Repeatable Read)
  • Whether table t was created with a non-InnoDB engine

Why is the explain result incorrect after this sequence of operations?

The delete statement removes all the data, and call idata() then inserts 100,000 fresh rows, which seemingly overwrite the original 100,000 rows.

But session A had opened a transaction and not committed it, so the 100,000 rows inserted earlier cannot actually be removed. Each of those earlier rows therefore has two versions: the old version is the pre-delete data, and the new version is the data marked as deleted. In effect, index a holds two copies of the data.

Then why, when force index is not used, does explain still report about 100,000 scanned rows? (Subtext: if this number had also doubled, perhaps the optimizer would still have judged index a the more appropriate choice.)

Yes, but that estimate is for the primary key, and primary-key rows are estimated directly from the table's row count; for the table's row count, the optimizer simply uses the show table status value.

If your machine's I/O capability is limited, you can set innodb_flush_log_at_trx_commit and sync_binlog to 0, trading crash durability for write throughput.
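
For reference, both are real server variables, and both settings weaken crash durability; a sketch:

-- flush the redo log roughly once per second instead of at every commit
set global innodb_flush_log_at_trx_commit = 0;
-- let the operating system decide when to sync the binlog
set global sync_binlog = 0;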

