Continuing to answer questions from planet Water:

Shen, I heard on the Internet that the primary key of a MySQL table should not be too long when the data volume is relatively large. Is that true? Why?

Well, it depends on the storage engine:

(1) With the InnoDB storage engine, the primary key should not be too long;

(2) With the MyISAM storage engine, it makes little difference;

First, a simple example to cover the background knowledge.

Suppose there is a data table:

t(id PK, name KEY, sex, flag);

Among them:

(1) id is the primary key;

(2) name has an ordinary (non-primary) index;

Suppose there are four records in the table:

1, shenjian, m, A

3, zhangsan, m, A

5, lisi, m, A

9, wangwu, f, B

If the storage engine is MyISAM, its index and record structure looks like this:

(1) There is a separate area that stores the records;

(2) The primary key index has the same structure as an ordinary index: both store pointers to records.

Voice-over:

(1) The primary key index is not stored together with the records, so it is a non-clustered index;

(2) A MyISAM table can exist without a PK;

When MyISAM uses an index for retrieval, it first walks the index tree to find the record pointer, and then follows the pointer to the record itself.

Voice-over: The process is the same for the primary key index and for an ordinary index.
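The two-step MyISAM lookup can be sketched with a toy model. This is only an illustration of the layout described above, not real MyISAM code: a Python list stands in for the record area, dictionaries stand in for the index trees, and the names records, pk_index, name_index and myisam_lookup are made up for this sketch.

```python
# Toy model of the MyISAM layout (an illustration, not real MyISAM internals).
# Records live in their own area; here a list position plays the role of a pointer.
records = [
    (1, "shenjian", "m", "A"),
    (3, "zhangsan", "m", "A"),
    (5, "lisi", "m", "A"),
    (9, "wangwu", "f", "B"),
]

# Both indexes have the same shape: key -> pointer into the record area.
# Real indexes are trees; a dict is used only to keep the sketch short.
pk_index = {row[0]: pos for pos, row in enumerate(records)}
name_index = {row[1]: pos for pos, row in enumerate(records)}


def myisam_lookup(index, key):
    """Step 1: index -> record pointer. Step 2: record pointer -> record."""
    pointer = index[key]
    return records[pointer]


by_pk = myisam_lookup(pk_index, 5)
by_name = myisam_lookup(name_index, "lisi")
```

Both calls go through the same function, which mirrors the point that the primary key index gets no special treatment in MyISAM.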

InnoDB is different. Its indexes and records are structured like this:

(1) The primary key index is stored together with the records;

(2) Ordinary indexes store the primary key value;

Voice-over:

(1) The primary key index is stored together with the records, so it is called a clustered index;

(2) An InnoDB table must have a clustered index;

When querying through the primary key index, InnoDB can locate the row record directly.

However, a query through an ordinary index first finds the primary key value, and then traverses the primary key index tree to reach the row.
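The difference between the two InnoDB lookup paths can be sketched in the same toy style. Again this is an assumption-laden illustration rather than InnoDB internals: dictionaries replace the index trees, and clustered_index, name_index, lookup_by_pk and lookup_by_name are hypothetical names.

```python
# Toy model of the InnoDB layout (an illustration, not real InnoDB internals).
# The clustered index holds the rows themselves, keyed by primary key.
clustered_index = {
    1: ("shenjian", "m", "A"),
    3: ("zhangsan", "m", "A"),
    5: ("lisi", "m", "A"),
    9: ("wangwu", "f", "B"),
}

# The ordinary index on name stores the primary key value, not a row pointer.
name_index = {name: pk for pk, (name, _sex, _flag) in clustered_index.items()}


def lookup_by_pk(pk):
    """One traversal: the clustered index leads straight to the row."""
    return (pk,) + clustered_index[pk]


def lookup_by_name(name):
    """Two traversals: ordinary index -> primary key -> clustered index -> row."""
    pk = name_index[name]
    return lookup_by_pk(pk)


row = lookup_by_name("wangwu")
```

Note that every entry of name_index carries a full copy of the primary key, which is why the size of the primary key matters in what follows.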

So, why shouldn't InnoDB's primary key be too long?

Consider a user center scenario with business attributes such as ID card number, MD5 of the ID card number, name, and date of birth, all of which need to be queryable.

The most obvious design is:

  • The ID card number as the primary key

  • Indexes on the other attributes

user(id_code PK,

id_md5(index),

name(index),

birthday(index));

The index tree and row record structure are as follows:

  • id_code is the clustered index and is stored with the row records

  • The other indexes store the id_code value

The ID card number id_code is a relatively long string, and every index stores a copy of it. With a large data volume, memory is precious and MySQL's buffer is limited, so less index and row data fits in the buffer and the probability of disk I/O goes up.

Voice-over: At the same time, the disk space taken up by the index also increases.

In this case, add an auto-increment id column with no business meaning:

  • The auto-increment id column is the clustered index and is stored with the row records

  • The other indexes store the id value

user(id PK auto inc,

id_code(index),

id_md5(index),

name(index),

birthday(index));

As a result, the limited buffer can hold more index and row data, which reduces the frequency of disk I/O and improves overall performance.
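A rough back-of-the-envelope calculation gives a feel for the saving. The numbers are assumptions chosen for illustration, not measurements: a table of 100 million rows, an 18-character id_code stored at 1 byte per character, and an 8-byte BIGINT id. The estimate counts only the primary key copies kept inside ordinary index entries and ignores page overhead and everything else.

```python
ROWS = 100_000_000   # hypothetical table size
ID_CODE_BYTES = 18   # assumption: 18-character ID number, 1 byte per character
BIGINT_BYTES = 8     # a MySQL BIGINT occupies 8 bytes


def pk_copy_bytes(rows, pk_bytes, ordinary_indexes):
    """Bytes spent only on the primary key copies held in ordinary index entries."""
    return rows * pk_bytes * ordinary_indexes


# The three indexes present in both designs: id_md5, name, birthday.
long_pk = pk_copy_bytes(ROWS, ID_CODE_BYTES, 3)
short_pk = pk_copy_bytes(ROWS, BIGINT_BYTES, 3)
saved = long_pk - short_pk

print(f"id_code as PK: {long_pk / 1e9:.1f} GB of primary key copies")
print(f"id as PK:      {short_pk / 1e9:.1f} GB of primary key copies")
print(f"difference:    {saved / 1e9:.1f} GB")
```

This is not the whole picture: the second design also adds an index on id_code, whose entries hold both id_code and id, so the net saving is smaller than the difference printed here. The benefit grows with the number of ordinary indexes and with the length of the original key.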

Conclusion

(1) MyISAM stores indexes and data separately; index leaves store pointers to records, and the primary key index is not much different from an ordinary index;

(2) InnoDB stores the clustered index and the data rows together; the clustered index holds the data rows themselves, and ordinary indexes store the primary key;

(3) With InnoDB, avoid using an overly long column as the PK (add an auto-increment PK instead in this case); with MyISAM it does not matter;

I hope that answers the question of our water friend.