preface

Last the VARCHAR (M) exactly how many bytes? | mysql series (2) share the VARCHAR (M) how many bytes, the largest VARCHAR can save how much a character? And how does understanding these help us in our daily development work? So we need to understand how the data is processed in the storage engine. Let’s use InnoDB as an example here.

InnoDB data record structure

What happened in the executing SQL statements? | mysql series (1) “the article says that the mysql server is responsible for the read data in the table and trading (i.e., written to disk) work is done by the storage engine. InnoDB is a storage engine that stores data from tables to disk. This is also the guarantee of mysql persistence. The way in which the inserted data records are stored on disk is called row format or record format. Mysql has four row formats: Redundant, Compact, Dynamic, and Compressed.

Among them:

  • Redundant seems to be pretty rare these days.

  • Version 5.6 uses Compact by default

  • Version 5.7 uses Dynamic by default

You can view the line format with the following statement.

The following uses Compact, Dynamic, and Compressed formats as examples. A complete record can be divided into two parts: the additional information recorded and the actual data recorded. The Compact, Dynamic, and Compressed row formats differ in the real data section of the record. Where we look for answers is in the additional information recorded.

Recorded additional information

Additional information for a record includes a list of variable-length fields, a list of NULL values, and record header information.

Lists of variable-length fields: VARCHAR(M), VARBINARY(M), various TEXT types, various BLOB types. Columns with these data types are called variable-length fields.

The storage space occupied by variable-length fields is divided into two parts:

  1. Real data content

  2. Number of bytes occupied.

The actual data content is the specific value stored. So what about the number of bytes occupied?

If the strings in the columns are all short, that is, the content is small in bytes, which can be represented by one byte, but if the content of the variable-length columns is large in bytes, it may be represented by two bytes. One byte or two bytes to represent the number of bytes used by the real data.

For VARCHAR(M) :

  • The maximum number of bytes that this type can store for a string is M×L, where L= sets the value of Maxlen in the character set.

  • Assume that the number of bytes it actually stores is Z.

Varchar (100) field name. Assume that our character set represents a character in one byte. Set the value of name to douglea. Then the actual stored bytes of name is 7.Copy the code

If M×L <= 255, then 1 byte is used to indicate the number of bytes occupied by the real string.

If M×L > 255, there are two cases:

  • If Z <= 127, 1 byte is used to indicate the number of bytes occupied by the real string.

  • If Z > 127, 2 bytes is used to indicate the number of bytes occupied by the real string.

So, if you have at most two bytes to represent the number of bytes that the real string takes. The maximum number of two words for energy saving is: 16 binary digits, which is 2 to the power of 16 = 65535. If a VARCHAR(M) type column uses a non-ASCII character set, the maximum value of M depends on the maximum number of bytes that the character set requires to represent a character.

Here’s an example:

  • GBK character set indicates that a character needs a maximum of 2 bytes, so in this character set, the maximum value of M is 32766 (65532/2), that is, it can store a maximum of 32766 characters;

  • The utF8 character set indicates that the maximum value of M in this character set is 21844, which means that a maximum of 21844 (65532/3) characters are stored.

In addition, when varchar(100) and varchar(1000) hold the “Douglea” string, the actual number of bytes used is the same, but the memory footprint is different, which is the specified size.

That’s all for today’s share, welcome to clap bricks!