In my years of experience, the maximum length of a VARCHAR, the choice of string type, is unclear to anyone who uses MySQL. Most articles on the net are mistaken.

This article not only introduces the principle, but also provides a case to teach you to analyze yourself and thoroughly solve your doubts.

Suppose we have a VARCHAR(64) CHARSET UTf8MB4 column that stores the Chinese cn string.

Can you guess how many bytes MySQL uses to store data?

  • A: 4 Bytes

  • B: 5 Bytes

  • C: 8 Bytes

  • D: 9 Bytes

  • E: 10 Bytes

  • F: 10.125 Bytes

  • G: 11 Bytes

  • H: 12 Bytes

  • I: 12.125 Bytes

  • K: 13 Bytes

The correct answer is F and G.

If you didn’t get it right, take 7 to 10 minutes to read this article to solve the puzzle. Happy growth is easy.

The article directories

  • The definition of a VARCHAR

  • The maximum length of a VARCHAR

  • Maximum row size

  • Null column identifies bits

  • Maximum number of bytes in a character set

  • Length identifier bit of a VARCHAR

  • The sample

This article applies to MySQL 5.5/5.6/5.7/8.x

The definition of a VARCHAR

VARCHAR is a variable length string.

Considering that there are many elements in the variable length principle, it is necessary to review the official definition together before breaking it down.

To make this easier to understand, I’ll use the CHAR fixed-length type for comparison. Let’s start with two small examples:

  • VARCHAR(4), stores up to 4 characters, and stores several characters. Number of bytes stored = bytes of data value plus + 1 byte (length identifier, covered later)

  • CHAR(4), stores up to four characters, less than four with trailing Spaces. Number of bytes stored = number of bytes of data value and + complement space

In general, VARCHAR and CHAR are both MySQL string types. They can store multiple characters and the maximum number of characters that can be set. The storage overhead is dependent on the data length and character set. Is the most commonly used string type in MySQL.

CHAR and VARCHAR:

If the PAD_CHAR_TO_FULL_LENGTH mode is enabled, trailing Spaces are not removed during retrieval

CHAR exceeds 255 characters, prompting TEXT or BLOB:

ERROR 1074 (42000): Column length too big for column ''long_char''  (max = 255); use BLOB or TEXT instead

Copy the code

The maximum length of a VARCHAR

In the COMPACT and DYNAMIC rows, the maximum length is affected by several factors:

  • The maximum number of bytes stored in a row

  • Storage costs other than data. The official definition includes the NULL identifier and length identifier

  • A character set that stores characters

The algorithm is as follows:

Maximum length (characters) = (Maximum number of bytes in a row – Number of bytes in a column – Number of bytes in a column)/Maximum number of bytes in a character set. We round down if we have a remainder.

The following step-by-step example demonstrates how to calculate the maximum length.

Maximum row size

MySQL rows default to a maximum of 65535 bytes and are shared by all columns, so the maximum value of VARCHAR is limited by this.

Next, we will create a VARCHAR 65536 bytes to validate this boundary value.

As mentioned earlier, the length of a VARCHAR declaration refers to the number of characters. To convert to 65536 bytes, it is best that a character is only one byte.

The latin1 character set is used.

mysql> create table test_varchar_length(v varchar(65536) not null);
ERROR 1074 (42000): Column length too big for column 'v' (max = 65535); use BLOB or TEXT instead

Copy the code

We can see an error indicating that the maximum line length is 65535 bytes.

If we want to insert a non-empty VARCHAR, its maximum length cannot exceed 65535(line maximum) -2 (length identifier bit) = 65533 bytes (length identifier bit requires 2 bytes to represent 2^16=65536 digits) :

/** Test boundary value 65534, confirm still too large; Mysql > create table test_varCHAR_length (v varchar(65534) not null); ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, Check the manual. You have to change some columns to TEXT or BLOBs /** test boundary value 65533, Mysql > create table test_VARCHAR_length (v vARCHar (65533) not NULL); Query OK, 0 rows affected (0.02sec) /** Query OK, 0 rows affected (0.02sec) /** Query OK, 0 rows affected (0.02sec) /** Mysql > show create table test_varchar_length; +----------------------+------------------------------------------------------------------------------------------------ ------------+ | Table | Create Table | +----------------------+------------------------------------------------------------------------------------------------ ------------+ | test_varchar_length | CREATE TABLE `test_varchar_length` ( `v` varchar(65533) NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 | +----------------------+------------------------------------------------------------------------------------------------ ------------+ 1 row in set (0.00 SEC)Copy the code

Null column identifies bits

In COMPACT and DYNAMIC row formats, the row size includes not only the data column length, but also the nullable identifier, that is, the NULL identifier bit.

If a column is allowed to be null, one bit is required to identify it, and each 8-bit identifier forms a field that is stored at the beginning of each row.

Notice that this identifier bit is not placed on every column, but is shared on every row.

If N nullable fields exist in a table, the NULL identifier requires ⌈N / 8 ⌉ (rounded up) bytes. In this case, the space available for data storage is only 65535 − ⌈ N / 8 ⌉ bytes.

Talk is Cheep

In the row size example, we know that a maximum of 65533 bytes of non-empty VARCHAR columns can be created. MySQL > create VARCHAR (VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR);

Mysql > drop table test_varchar_length; Query OK, 0 rows affected (0.01 SEC) /** Test boundary value 65533, still too large; Mysql > create table test_varCHAR_length (v varchar(65533)); ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, Check the manual. You have to change some columns to TEXT or BLOBs /** check the manual. You have to change some columns to TEXT or BLOBs. Mysql > create table test_varCHAR_length (v varchar(65532)); Query OK, 0 rows affected (0.03 SEC)Copy the code

To calculate the maximum length of a VARCHAR, the nullable identifier bit is the easiest to ignore.

Maximum number of bytes in a character set

The maximum number of bytes in a character set is not difficult to understand.

  • GBK: a single character can contain a maximum of 2 bytes.

  • UTF8: contains a maximum of 3 bytes.

  • UTF8MB4: A single character contains a maximum of 4 bytes.

Suppose there are 6 bytes left for storing characters. Based on the maximum number of bytes occupied by a single character, 3 GBK, 2 UTF8, and 1 UTF8MB4 can be stored.

Length identifier bit of a VARCHAR

Length identifier bit is relatively complex, the introduction of a lot of mistakes on the Internet, also easy to calculate wrong.

It records the number of bytes of data.

The storage cost is 1 byte for smaller than 255 and 2 bytes for larger than 255. The data is divided into 0 to 255(28) and 256 to 65535(216) according to the possible data size, which corresponds to 1 byte and 2 byte.

Note, however, that the calculation is based on the character length of the field declaration, the number of possible bytes, and then the number of bytes for the length flag. For example, VARCHAR(100), the character set is UTF8, the number of possible bytes is 300, and the length is 2 bytes. This is the most inaccurate description on the Internet.

In addition, the length flag bit is the underlying storage overhead and does not occupy the character length of the field declaration. The declared character length is the number of characters in the data. The number of bytes in the data is related to the character set.

For example, VARCHAR(1) can store 1 character, MySQL will find an extra byte to store the length identifier

The sample

The maximum length of a VARCHAR = (maximum row size – NULL identifies the number of bytes in a column – Length identifies the number of bytes)/Maximum number of bytes in a character set. We round down if we have a remainder.

The next step is to verify it through experiments. Some adjustments have been made to the example to make the calculation easier to understand:

  • Do not set nullable columns so that NULL identification columns can be removed

  • In order to reflect the gap of length identifier bits, multiple columns are used to magnify its existence

  • To reflect the possible number of bytes of length, the multi-byte character set GBK is used

Create a table containing two non-empty VARCHars (127) with storage overhead of 127*2 per column (maximum possible number of bytes, GBK characters are 2 bytes)+ length identifier bit 1=255 bytes:

  • The remaining space is 65025 bytes

  • VARCHAR(32511) NOT NULL column (32511 *2(GBK characters in 2 bytes)+2(length identifier in 2 bytes)=65024)

mysql> drop table test_varchar_length; Query OK, 0 rows affected (0.01sec) /** Mysql > create table test_varCHAR_length (v1 varchar(127) not null,v2 varchar(127) not null, VM varchar(32512) not null) CHARSET=GBK; ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, Check the manual. You have to change some columns to TEXT or BLOBs /** */ mysql> create table test_VARCHAR_length (v1 vARCHAR (127) not NULL,v2 varchar(127) not null, VM varchar(32511) not null) CHARSET=GBK; Query OK, 0 rows affected (0.02sec)Copy the code

Next, scale up the two fields to 128 characters, and store each column for a maximum of 256 bytes + length identifier bit 2=258 bytes

  • Remaining space 655-258 x 2 = 65019 bytes

  • VARCHAR(32508) NOT NULL column (32508 *2(GBK characters in 2 bytes)+2(length identifier in 2 bytes)=65018) :

mysql> drop table test_varchar_length; Query OK, 0 rows affected (0.01sec) /** Mysql > create table test_varCHAR_length (v1 varchar(128) not null,v2 varchar(128) not null, VM varchar(32509) not null) CHARSET=GBK; ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, Check the manual. You have to change some columns to TEXT or BLOBs /** Mysql > create table test_VARCHAR_length (v1 vARCHAR (128) not NULL,v2 vARCHAR (128) not null, VM varchar(32508) not null) CHARSET=GBK; Query OK, 0 rows affected (0.02sec)Copy the code

Congratulations, there are not many people who can see this, but you have been promoted.

So let’s solve the original problem again:

  • In UTF8MB4 characters, Chinese characters need 3 bytes (most Chinese characters need only 3 bytes, and 4 bytes are mainly auxiliary plane characters such as emoji), so “CN” needs 3+3+1+1, a total of 8 bytes

  • VARCHAR(64) CHARSET UTf8MB4, the maximum possible number of bytes is 64*4=256, so 2 bytes are required as the length identifier bit;

  • MySQL generates a 1-byte NULL identifier column to record the NULL identifier.

  • So to store “Cn”, the column needs 8 + 2 bytes, plus 1 byte as the NULL identification column; Because the column is shared by multiple columns, if the table has only one field, the storage overhead should be 11 bytes, otherwise it counts as 10.125 bytes (1/8 equals 0.125)

So the answer is 10.125 or 11 bytes.