A new project kicked off, and after a flurry of work mapping out the process and drawing diagrams, it was time to review the process and the table structure.


Me: blah, blah, blah…

Boss: Why does this table also have an auto-increment ID? Wouldn't it be fine to use the serial number (user number / product number) directly as the primary key?

Me: Our table conventions require id, create_time, and update_time. The Java Development Specification says the same.

Colleague: (agreeing) Yes, that's the rule!

Boss: Isn't the serial number your only index here? Make it the primary key, and you no longer need the ID or a trip back to the table. Right?

Me: … (That sounds very reasonable. I don't dare argue.)

Boss: Since that's what the rules say, go back and find out why an auto-increment ID is part of the design.

Me: (takes out a small notebook, writes this down, and goes off to do the research.)

[liuzhirichard] Notes on technology, development, and source code from work and study, plus the occasional story from everyday life. Feedback is welcome!


Table creation specification

At work, whenever a table is created, the DBA reviews the CREATE TABLE statement to check that it follows the specification and that commonly queried fields are indexed.

CREATE TABLE `xxxx` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Increment primary key',
  `create_time` datetime(3) NOT NULL DEFAULT current_timestamp(3) COMMENT 'Creation time',
  `update_time` datetime(3) NOT NULL DEFAULT current_timestamp(3) ON UPDATE current_timestamp(3) COMMENT 'Update time',
  PRIMARY KEY (`id`) USING BTREE,
  KEY `idx_create_time` (`create_time`) USING BTREE,
  KEY `idx_update_time` (`update_time`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COMMENT='Table comment';

In my own tables, the serial number has always been a separate field, such as trans_NO. This time, though, I was asked a question I could not answer: since trans_NO is unique, why not use trans_NO directly as the primary key instead of id?

Let's work through the reasons step by step, using the reference material.

Primary key

What is a primary key

Dev.mysql.com/doc/refman/…

Let’s focus on the last sentence:

When choosing primary key values, consider using arbitrary values (a synthetic key) rather than relying on values derived from some other source (a natural key).

In other words, prefer a value generated by MySQL itself (such as an auto-increment ID) as the primary key over a value produced by the business.

Characteristics of a primary key

In short:

Non-null, unique, and rarely or never changed.

How do I add primary keys

You can declare the primary key in the CREATE TABLE statement, or add it afterwards with ALTER TABLE, but the official documentation recommends declaring it when the table is created.

Why add a primary key

  1. The primary key uniquely identifies a row, which ensures that an update or delete touches only that row.
  2. An index needs it. Each InnoDB table has a special index, called the clustered index, that stores the row data. In general, the clustered index and the primary key are synonymous.
    1. If a primary key is declared, InnoDB uses it as the clustered index.
    2. If no primary key is declared, InnoDB uses the first UNIQUE index whose key columns are all NOT NULL as the clustered index.
    3. If there is no primary key and no suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic row ID column. The hidden row ID is 6 bytes long and increases monotonically.
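The selection order above can be sketched as a small function. This is only a toy model of the three rules, not InnoDB's actual code; the `Column`, `Index`, and `Table` classes and the index name `uk_trans_no` are hypothetical names made up for the illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    nullable: bool = True


@dataclass
class Index:
    name: str
    columns: List[str]
    unique: bool = False


@dataclass
class Table:
    columns: List[Column]
    primary_key: Optional[List[str]] = None
    indexes: List[Index] = field(default_factory=list)  # in declaration order


def choose_clustered_index(table: Table) -> str:
    """Return the name of the index the table would be clustered on."""
    # Rule 1: a declared primary key always wins.
    if table.primary_key:
        return "PRIMARY"
    nullable = {col.name: col.nullable for col in table.columns}
    # Rule 2: the first UNIQUE index whose key columns are all NOT NULL.
    for idx in table.indexes:
        if idx.unique and all(not nullable[col] for col in idx.columns):
            return idx.name
    # Rule 3: fall back to the hidden index on the 6-byte row ID.
    return "GEN_CLUST_INDEX"


columns = [Column("id", nullable=False), Column("trans_NO", nullable=False)]
unique_trans_no = Index("uk_trans_no", ["trans_NO"], unique=True)

with_pk = Table(columns, primary_key=["id"], indexes=[unique_trans_no])
without_pk = Table(columns, indexes=[unique_trans_no])
bare = Table(columns)

print(choose_clustered_index(with_pk))     # PRIMARY
print(choose_clustered_index(without_pk))  # uk_trans_no
print(choose_clustered_index(bare))        # GEN_CLUST_INDEX
```

Whichever rule applies, the table always ends up with exactly one clustered index, so the real question is only which key it is built on.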

Indexes

Only the InnoDB engine is covered here, and only briefly; please refer to the official documentation for details.

Index types

  1. Clustered index: the table's storage is organized by the value of the primary key column, which speeds up queries and sorts that involve the primary key. This is the index described above together with the primary key.
  2. Secondary index: also called an auxiliary index. Each entry records the secondary index columns together with the corresponding primary key columns. A search on a secondary index first finds the primary key value in the secondary index, then searches the clustered index by that primary key. A long primary key is generally not recommended, because every secondary index stores a copy of it and therefore takes more space.

Supplement:

Table lookup (going back to the table): the query first finds the primary key value in the secondary index, then fetches the row from the clustered index by that primary key.

Covering index: a secondary index entry holds the secondary index columns and the primary key columns. If a query only needs the values of those columns, there is no need to go back to the table.
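A toy model can make the two cases concrete. The sketch below uses plain dictionaries in place of B+ trees; the sample rows and the `amount` column are invented for the illustration, and `select_by_trans_no` is a hypothetical helper, not a MySQL API.

```python
# Clustered index: primary key -> full row (leaf entries hold the row itself).
clustered_index = {
    1: {"id": 1, "trans_NO": "T0003", "amount": 30},
    2: {"id": 2, "trans_NO": "T0001", "amount": 10},
    3: {"id": 3, "trans_NO": "T0002", "amount": 20},
}

# Secondary index on trans_NO: indexed value -> primary key, nothing else.
secondary_index = {row["trans_NO"]: pk for pk, row in clustered_index.items()}


def select_by_trans_no(columns, trans_no):
    """Return (result, number_of_index_searches) for a lookup by trans_NO."""
    pk = secondary_index[trans_no]            # search 1: the secondary index
    entry = {"trans_NO": trans_no, "id": pk}  # all that a secondary entry holds
    if all(col in entry for col in columns):
        # Covering index: the entry already contains every requested column.
        return {col: entry[col] for col in columns}, 1
    row = clustered_index[pk]                 # search 2: back to the table
    return {col: row[col] for col in columns}, 2


print(select_by_trans_no(["id"], "T0001"))      # ({'id': 2}, 1)
print(select_by_trans_no(["amount"], "T0001"))  # ({'amount': 10}, 2)
```

Asking only for `id` is answered from the secondary index alone, while asking for `amount` costs a second search in the clustered index.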

Physical structure of an index

InnoDB uses a B+ tree data structure, built on the clustered index key (the primary key, a UNIQUE index, or the internally generated row ID). The leaf nodes hold the row records, so each leaf node can also be called a data page. The default page size is 16KB, and it is configurable.

Data insertion

When data is inserted, InnoDB tries to leave 1/16 of each page free for future inserts and updates of index records.

  1. Sequential insertion (ascending or descending): the resulting index pages are about 15/16 full.
  2. Random insertion: the resulting pages are only between 1/2 and 15/16 full.

With random inserts, records are moved and pages are split frequently, which causes a lot of fragmentation and leaves the index tree less compact. Sequential inserts keep the data more compact and use space more efficiently.
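The difference can be reproduced with a rough simulation. This is a simplified sketch under stated assumptions, not InnoDB's real split logic: a page holds 16 records (a toy value), an append at the right edge of the index opens a new page once the current one is 15/16 full, and any other insert into a full page splits it in half. Only ascending order is modelled for the sequential case.

```python
import bisect
import random

PAGE_CAPACITY = 16                     # records per page (toy value)
FILL_LIMIT = PAGE_CAPACITY * 15 // 16  # appends stop once a page is 15/16 full


def insert(pages, key):
    """Insert a unique key into a list of sorted pages, splitting full pages."""
    if not pages:
        pages.append([key])
        return
    # Locate the page whose key range should contain the new key.
    first_keys = [page[0] for page in pages]
    i = max(bisect.bisect_right(first_keys, key) - 1, 0)
    page = pages[i]
    if i == len(pages) - 1 and key > page[-1]:
        # Append at the right edge: open a new page once the current one
        # reaches the fill limit, instead of splitting it in half.
        if len(page) >= FILL_LIMIT:
            pages.append([key])
        else:
            page.append(key)
        return
    # Insert in the middle: keep the page sorted, split in half on overflow.
    bisect.insort(page, key)
    if len(page) > PAGE_CAPACITY:
        mid = len(page) // 2
        pages.insert(i + 1, page[mid:])
        del page[mid:]


def fill_factor(keys):
    """Load the keys in the given order and return the average page fill."""
    pages = []
    for key in keys:
        insert(pages, key)
    return len(keys) / (len(pages) * PAGE_CAPACITY)


n = 4800
sequential = list(range(n))
shuffled = sequential[:]
random.Random(42).shuffle(shuffled)

print(f"sequential: {fill_factor(sequential):.3f}")
print(f"random:     {fill_factor(shuffled):.3f}")
```

In this model the sequential load ends with every page exactly 15/16 full, while the shuffled load lands somewhere between 1/2 and 15/16 because half-empty pages are left behind by each split.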

Summary

Q&A

Q: What are table lookups (going back to the table) and covering indexes?

A:

  1. Table lookup: the query first finds the primary key value in the secondary index, then fetches the row from the clustered index by that primary key.
  2. Covering index: a secondary index entry holds the secondary index columns and the primary key columns. If a query only needs the values of those columns, there is no need to go back to the table.

Q: Why make the primary key an auto-increment ID?

A:

  1. It uniquely identifies a row, and InnoDB uses the primary key when building the index tree.
  2. Auto-increment IDs are sequential, which keeps the data in the index tree compact, uses space more efficiently, and reduces page splits and merges, so efficiency improves.
  3. Natural keys such as mobile phone numbers or national ID numbers do not arrive in order, so using them as the primary key cannot guarantee sequential insertion.
  4. A serial number is usually fairly long, for example 28 or 32 characters, and a long primary key makes every secondary index take more space. Serial numbers also tend to contain some randomness for business reasons.
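The space argument in the last point can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic only: it assumes a 32-character serial number stored as single-byte characters, a row count of 10 million, and two secondary indexes (like idx_create_time and idx_update_time in the table definition earlier). It ignores record headers and page overhead, and it also ignores the unique index on the serial number that the auto-increment design still needs.

```python
BIGINT_BYTES = 8   # storage size of a MySQL BIGINT
SERIAL_BYTES = 32  # assumption: 32-character serial number, 1 byte per character


def pk_bytes_in_secondary_indexes(rows, secondary_indexes, pk_bytes):
    """Bytes spent on primary key copies across all secondary index entries."""
    return rows * secondary_indexes * pk_bytes


rows = 10_000_000      # hypothetical table size
secondary_indexes = 2  # e.g. idx_create_time and idx_update_time

with_bigint = pk_bytes_in_secondary_indexes(rows, secondary_indexes, BIGINT_BYTES)
with_serial = pk_bytes_in_secondary_indexes(rows, secondary_indexes, SERIAL_BYTES)
extra_bytes = with_serial - with_bigint

print(f"bigint primary key: {with_bigint / 2**20:.0f} MiB of key copies")
print(f"serial primary key: {with_serial / 2**20:.0f} MiB of key copies")
print(f"extra space:        {extra_bytes / 2**20:.0f} MiB")
```

Under these assumptions the primary key copies alone take four times as much space with the serial number as the key, and the gap grows with every additional secondary index.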

Conclusion

This article draws on reference material to explain why a business-independent auto-increment ID is used as the primary key. Much of it is basic, and topics such as the InnoDB B+ tree, page splits and merges, and the insert process are not covered in depth; interested readers can dig further.

Besides using an auto-increment ID as the primary key, have you also run into this convention in business development: account cancellation and data deletion are implemented as logical deletes rather than physical deletes?

This article is fairly basic and certainly incomplete; if you spot anything wrong or missing, please point it out.