MySQL's string types are CHAR, VARCHAR, BINARY, BLOB, TEXT, ENUM, and SET. The type you choose affects both how well the schema fits the business and how the database performs, and CHAR and VARCHAR are the most commonly used. Today I'm going to take you through the string types CHAR and VARCHAR.

CHAR and VARCHAR definitions

CHAR(N) stores fixed-length strings, where N ranges from 0 to 255. Remember that N counts characters, not bytes. VARCHAR(N) stores variable-length strings, where N ranges from 0 to 65535. Here too, N counts characters.

For values longer than 65535 bytes, consider the larger types TEXT or BLOB, whose largest variants (LONGTEXT and LONGBLOB) store up to 4 GB. The difference is that BLOB types have no character set attribute and are stored purely as binary.

Unlike traditional relational databases such as Oracle and SQL Server, MySQL's VARCHAR type can store up to 65535 bytes. This limit is counted in bytes and is shared with the rest of the row, so the largest N you can declare depends on the character set. Even so, VARCHAR is enough for most scenarios in MySQL.
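To make the byte limit concrete, here is a rough sketch in Python of how the largest usable N shrinks as the character set gets wider. It assumes a simplified model that is not in the original text: the VARCHAR column is the only column in the table, it needs a 2-byte length prefix, and a nullable column costs one extra byte. The function name `max_varchar_chars` is made up for this illustration.

```python
# Simplified model (an assumption): the VARCHAR column is the only column in the table.
ROW_SIZE_LIMIT = 65535  # MySQL's maximum row size in bytes


def max_varchar_chars(max_bytes_per_char: int, nullable: bool = False) -> int:
    """Largest N for VARCHAR(N) under the simplified single-column model."""
    length_prefix = 2                  # values longer than 255 bytes need a 2-byte length prefix
    null_flag = 1 if nullable else 0   # a nullable column costs one extra byte
    return (ROW_SIZE_LIMIT - length_prefix - null_flag) // max_bytes_per_char


# Maximum bytes per character for a few character sets.
for charset, width in [("latin1", 1), ("gbk", 2), ("utf8", 3), ("utf8mb4", 4)]:
    print(charset, max_varchar_chars(width))
```

In a real table the other columns eat into the same 65535 bytes, so treat these numbers as upper bounds rather than exact limits.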

Character set

When designing a table, besides declaring columns as CHAR or VARCHAR, we also need to define their character set, because the same character maps to different binary values under different encodings. Common character sets include GBK and UTF8, and a UTF-8 character set is recommended as the default.

More specifically, with the rapid growth of the mobile Internet, it is recommended to set the default MySQL character set to UTF8MB4. MySQL's UTF8 character set stores at most three bytes per character, so it cannot hold four-byte characters such as emoji. For example, the smiley-face emoji is encoded as 0xF09F988E:

# Insert an emoji character into a UTF8 column
mysql> SHOW CREATE TABLE emoji_test\G
*************************** 1. row ***************************
       Table: emoji_test
Create Table: CREATE TABLE `emoji_test` (
  `a` varchar(100) CHARACTER SET utf8,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.01 sec)

mysql> INSERT INTO emoji_test VALUES (0xF09F988E);
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x8E' for column 'a' at row 1
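Why the INSERT fails can be checked outside the database. The sketch below, in Python, decodes the four bytes from the failing statement and tests whether a string would fit in a character set limited to three bytes per character. The helper name `fits_in_utf8mb3` is an assumption for this illustration, not a MySQL function.

```python
def fits_in_utf8mb3(text: str) -> bool:
    """True if every character encodes to at most 3 bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


# The four bytes from the failing INSERT, decoded as UTF-8.
smiley = bytes.fromhex("F09F988E").decode("utf-8")

print(hex(ord(smiley)))              # the character's Unicode code point
print(len(smiley.encode("utf-8")))   # number of bytes it needs in UTF-8
print(fits_in_utf8mb3("hello"), fits_in_utf8mb3(smiley))
```

Any character whose code point lies above U+FFFF needs four bytes in UTF-8, which is exactly what the three-byte UTF8 character set cannot store.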

As of MySQL 8.0, the default character set is UTF8MB4; before 8.0, it was Latin1. Because the default differs between versions, set the character set explicitly in the configuration file:

[mysqld]
character-set-server = utf8mb4
...

In addition, the maximum number of bytes that CHAR(N) and VARCHAR(N) can occupy differs by character set. For example, a character takes at most two bytes in GBK and at most four bytes in UTF8MB4. So from the point of view of the underlying storage engine, CHAR and VARCHAR are implemented in exactly the same way under a multi-byte character set: both are variable-length storage!

With UTF8MB4, CHAR(1) can hold either the one-byte character 'A' or a four-byte emoji smiley face, so CHAR is inherently variable-length too.

Since UTF8MB4 is the currently recommended default character set, you can simply use VARCHAR instead of CHAR when designing tables, because the underlying storage is the same.
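The per-character byte counts are easy to check with Python's built-in codecs. This is only an illustration of the encodings themselves, not of MySQL's storage format, and the helper `byte_len` is a made-up name.

```python
from typing import Optional


def byte_len(ch: str, encoding: str) -> Optional[int]:
    """Bytes needed to store ch in the given encoding, or None if it cannot be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


for ch in ["A", "中", "😎"]:
    print(repr(ch), "utf-8:", byte_len(ch, "utf-8"), "gbk:", byte_len(ch, "gbk"))
```

A single "character" therefore occupies anywhere from one to four bytes, which is why a CHAR(1) column cannot be a fixed number of bytes wide under a multi-byte character set.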

Collation

A collation is the set of rules for comparing and sorting strings. Each character set has a default collation, which you can check with the SHOW CHARSET command:

mysql> SHOW CHARSET LIKE 'utf8%';
+---------+---------------+--------------------+--------+
| Charset | Description   | Default collation  | Maxlen |
+---------+---------------+--------------------+--------+
| utf8    | UTF-8 Unicode | utf8_general_ci    |      3 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_0900_ai_ci |      4 |
+---------+---------------+--------------------+--------+
2 rows in set (0.01 sec)

mysql> SHOW COLLATION LIKE 'utf8mb4%';
+----------------------------+---------+-----+---------+----------+---------+---------------+
| Collation                  | Charset | Id  | Default | Compiled | Sortlen | Pad_attribute |
+----------------------------+---------+-----+---------+----------+---------+---------------+
| utf8mb4_0900_ai_ci         | utf8mb4 | 255 | Yes     | Yes      |       0 | NO PAD        |
| utf8mb4_0900_as_ci         | utf8mb4 | 305 |         | Yes      |       0 | NO PAD        |
| utf8mb4_0900_as_cs         | utf8mb4 | 278 |         | Yes      |       0 | NO PAD        |
| utf8mb4_0900_bin           | utf8mb4 | 309 |         | Yes      |       1 | NO PAD        |
| utf8mb4_bin                | utf8mb4 |  46 |         | Yes      |       1 | PAD SPACE     |
......

A collation name ending in _ci is case-insensitive, one ending in _cs is case-sensitive, and one ending in _bin compares strings by their stored binary values. Note that by default, MySQL compares strings without regard to case:

mysql> SELECT 'a' = 'A';
+-----------+
| 'a' = 'A' |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

mysql> SELECT CAST('a' AS CHAR) COLLATE utf8mb4_0900_as_cs = CAST('A' AS CHAR) COLLATE utf8mb4_0900_as_cs AS result;
+--------+
| result |
+--------+
|      0 |
+--------+
1 row in set (0.00 sec)
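As a loose analogy only, the difference between a _ci and a _bin comparison can be imitated in Python. MySQL's collations use their own weight tables, so the two helper functions below (hypothetical names) are stand-ins for the idea and not a reimplementation of any real collation.

```python
def equal_ci(a: str, b: str) -> bool:
    """Rough stand-in for a _ci collation: ignore case when comparing."""
    return a.casefold() == b.casefold()


def equal_bin(a: str, b: str) -> bool:
    """Rough stand-in for a _bin collation: compare the encoded bytes."""
    return a.encode("utf-8") == b.encode("utf-8")


print(equal_ci("a", "A"))    # case is ignored
print(equal_bin("a", "A"))   # the underlying bytes differ
```

The practical consequence is the same as in the SQL above: whether 'a' and 'A' count as the same value, for example in a unique index, depends entirely on the collation of the column.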

Keep in mind that in most business table designs you should not set the collation to be case-sensitive, unless you are sure your business really needs it.

Modify the character set correctly

Of course, I believe many systems were designed without considering how the character set affects stored data, so the character set has to be converted later. However, many readers will find that UTF8MB4 characters such as emoji still cannot be inserted after running the following statement:

ALTER TABLE emoji_test CHARSET utf8mb4;

That is because this statement only changes the table's default character set: the next time you add a column without explicitly specifying a character set, the new column will use UTF8MB4. The character set of existing columns does not change, which you can confirm with SHOW CREATE TABLE:

mysql> SHOW CREATE TABLE emoji_test\G
*************************** 1. row ***************************
       Table: emoji_test
Create Table: CREATE TABLE `emoji_test` (
  `a` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

As you can see, the character set of column a is still UTF8, not UTF8MB4. To change the character set of existing columns correctly, use ALTER TABLE ... CONVERT TO ..., which converts column a from UTF8 to UTF8MB4:

mysql> ALTER TABLE emoji_test CONVERT TO CHARSET utf8mb4;
Query OK, 0 rows affected (0.94 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SHOW CREATE TABLE emoji_test\G
*************************** 1. row ***************************
       Table: emoji_test
Create Table: CREATE TABLE `emoji_test` (
  `a` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

Business table structure design practice

User gender design

When designing a table, you will run into fields with a fixed set of valid values. For example, a gender field (sex) that is only male or female, or a state field whose valid values are a finite set of states such as run, stop, or restart.

I have observed that most developers prefer to store the gender field in an integer type, such as:

CREATE TABLE `User` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `sex` tinyint DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

Here the tinyint column sex represents the user's gender, but the problems with this design are obvious.

  • Unclear meaning: in the stored data, does 0 mean female, or does 1? Each business may follow its own unwritten convention;

  • Dirty data: because the column is a tinyint, users can insert 2, 3, 4, and so on besides 0 and 1. Invalid data can end up in the table, and cleaning it up later is very expensive.

Before MySQL 8.0, you could use the ENUM string type, which only allows a limited set of predefined values to be inserted. If SQL_MODE is set to strict mode, inserting an undefined value raises an error:

mysql> SHOW CREATE TABLE User\G
*************************** 1. row ***************************
       Table: User
Create Table: CREATE TABLE `User` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `sex` enum('M','F') COLLATE utf8mb4_general_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB
1 row in set (0.00 sec)

mysql> SET sql_mode = 'STRICT_TRANS_TABLES';
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO User VALUES (NULL,'F');
Query OK, 1 row affected (0.08 sec)

mysql> INSERT INTO User VALUES (NULL,'A');
ERROR 1265 (01000): Data truncated for column 'sex' at row 1

The ENUM type is not a standard SQL data type but a string type specific to MySQL, and the error it raises is not intuitive. This approach always felt like a compromise, mainly because MySQL did not provide CHECK constraints before 8.0. Since MySQL 8.0.16, the database natively supports CHECK constraints, which makes it easy to design columns with a finite set of states:

mysql> SHOW CREATE TABLE User\G
*************************** 1. row ***************************
       Table: User
Create Table: CREATE TABLE `User` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `sex` char(1) COLLATE utf8mb4_general_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  CONSTRAINT `user_chk_1` CHECK (((`sex` = _utf8mb4'M') or (`sex` = _utf8mb4'F')))
) ENGINE=InnoDB
1 row in set (0.00 sec)

mysql> INSERT INTO User VALUES (NULL,'M');
Query OK, 1 row affected (0.07 sec)

mysql> INSERT INTO User VALUES (NULL,'Z');
ERROR 3819 (HY000): Check constraint 'user_chk_1' is violated.

As you can see, the constraint user_chk_1 restricts the values of column sex to M or F, and when the illegal value Z is inserted, MySQL explicitly reports that the check constraint has been violated.
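If you want to try the idea without a MySQL 8.0.16+ server at hand, the same kind of CHECK constraint can be exercised with SQLite through Python's standard library. This is a stand-in and an assumption of this sketch: SQLite is a different database with different error messages, and the helper `try_insert` is a made-up name.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE User (
        id  INTEGER PRIMARY KEY AUTOINCREMENT,
        sex CHAR(1),
        CONSTRAINT user_chk_1 CHECK (sex = 'M' OR sex = 'F')
    )
""")


def try_insert(sex):
    """Return True if the row is accepted, False if the CHECK constraint rejects it."""
    try:
        conn.execute("INSERT INTO User (sex) VALUES (?)", (sex,))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("M"))  # a value allowed by the constraint
print(try_insert("Z"))  # a value outside the allowed set
```

The point carries over directly: the rule about valid values lives in the table definition, so no application can slip invalid data past it.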

Account password storage design

Remember: when designing tables, never store passwords directly in the database. Once a malicious user gets into the system, you face a serious risk of leaking user data. In the financial industry, for example, compliance requires every private user field to be encrypted, so that even the business itself cannot read what users have stored (private data such as login passwords, mobile phone numbers, and credit card information).

I believe many developers use the MD5 function to protect private data. That is not wrong in itself, because MD5 is not reversible. However, the MD5 value of a given input is fixed: the password 12345678, for example, always has the MD5 value 25d55ad283aa400af464c76d713c07ad.

As a result, MD5 can be cracked by brute force: an attacker computes the MD5 values of all possible strings. If enumerating every combination is not feasible, computing the values of common passwords such as 111111 and 12345678 is enough. The site I included in the document can look up MD5 values online in exactly this way.
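The lookup attack described above takes only a few lines. This sketch uses Python's hashlib; the list of common passwords is a tiny made-up sample, and `md5_hex` is a helper name introduced here.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Precompute digests for a tiny, hypothetical list of common passwords.
common_passwords = ["111111", "12345678", "password"]
lookup = {md5_hex(p): p for p in common_passwords}

leaked_digest = "25d55ad283aa400af464c76d713c07ad"  # the value quoted in the text
print(lookup.get(leaked_digest))
```

Because the digest of a given input never changes, the table only has to be built once and can then be reused against every leaked database.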

Therefore, password storage also needs a salt. Each company uses a different salt value, so the computed values differ as well. If the salt is psalt, the password 12345678 is stored in the database as:

password = MD5('psalt12345678')

This design is a fixed-salt scheme, and it has three main problems:

  • If the salt value is leaked by a (former) employee, outside attackers can still crack the passwords by brute force.

  • If one user's password is leaked, the passwords of all other users with the same password are leaked as well.

  • The MD5 algorithm is always used. Once MD5 is broken, the impact is serious.
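The second problem is easy to see in code. In this sketch the fixed salt psalt comes from the text above, while the helper `md5_with_salt` and the random per-user salts are assumptions added for contrast.

```python
import hashlib
import secrets


def md5_with_salt(salt: str, password: str) -> str:
    """MD5 of salt + password, as in the fixed-salt scheme described above."""
    return hashlib.md5((salt + password).encode("utf-8")).hexdigest()


# Fixed salt: two users who pick the same password get the same stored value.
fixed = "psalt"
user_a = md5_with_salt(fixed, "12345678")
user_b = md5_with_salt(fixed, "12345678")
print(user_a == user_b)

# Per-user random salt: the stored values differ even for the same password.
salt_a, salt_b = secrets.token_hex(8), secrets.token_hex(8)
print(md5_with_salt(salt_a, "12345678") == md5_with_salt(salt_b, "12345678"))
```

With a fixed salt, identical stored values reveal which users share a password, so cracking one account exposes all of them.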

So a really good password storage design should be: a dynamic salt plus a non-fixed algorithm.

I prefer to design the password in the following format:

$salt$cryption_algorithm$value

Where:

  • $salt: the dynamic salt. Each time a user registers, the service generates a different salt value and stores it in the database. To go further, you can combine the dynamic salt with the user's registration date to make the salt even more dynamic.

  • $cryption_algorithm: the algorithm version. For example, v1 indicates MD5 and v2 indicates AES-256; later versions can indicate whichever stronger algorithm is adopted.

  • $value: the resulting encrypted string.

The structural design of the table User is as follows:

CREATE TABLE User (
    id BIGINT NOT NULL AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL,
    sex CHAR(1) NOT NULL,
    password VARCHAR(1024) NOT NULL,
    regDate DATETIME NOT NULL,
    CHECK (sex = 'M' OR sex = 'F'),
    PRIMARY KEY(id)
);

SELECT * FROM User\G
*************************** 1. row ***************************
      id: 1
    name: David
     sex: M
password: $fgfaef$v1$2198687f6db06c9d1b31a030ba1ef074
 regDate: 2020-09-07 15:30:00
*************************** 2. row ***************************
      id: 2
    name: Amy
     sex: F
password: $zpelf$v2$0x860E4E3B2AA4005D8EE9B7653409C4B133AF77AEF53B815D31426EC6EF78D882
 regDate: 2020-09-07 17:28:00

In the example above, the passwords of users David and Amy are both 12345678, but because of the dynamic salt and the different algorithms, the stored values are completely different.

Even if someone with ulterior motives learns the current password algorithm, the $cryption_algorithm version makes it possible to upgrade the stored passwords to a stronger algorithm, further defending against malicious attacks on the data.
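To show how the $salt$cryption_algorithm$value format supports such an upgrade, here is a minimal sketch in Python. Several things in it are assumptions and not the article's implementation: v1 follows the MD5 example above, but v2 is PBKDF2-HMAC-SHA256 from Python's hashlib, swapped in as a purpose-built password hashing function; the function names, the salt length, and the iteration count are all chosen only for illustration.

```python
import hashlib
import hmac
import secrets


def _v1(salt: str, password: str) -> str:
    """v1: MD5 over salt + password, as in the article's example."""
    return hashlib.md5((salt + password).encode("utf-8")).hexdigest()


def _v2(salt: str, password: str) -> str:
    """v2: PBKDF2-HMAC-SHA256, a swapped-in choice for this sketch."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt.encode("utf-8"), 100_000
    )
    return digest.hex()


ALGORITHMS = {"v1": _v1, "v2": _v2}  # hypothetical version table
CURRENT_VERSION = "v2"


def make_record(password: str, version: str = CURRENT_VERSION) -> str:
    """Build a $salt$version$value string with a fresh random salt."""
    salt = secrets.token_hex(8)
    return f"${salt}${version}${ALGORITHMS[version](salt, password)}"


def verify(record: str, password: str) -> bool:
    """Recompute the value with the stored salt and version, then compare."""
    _, salt, version, value = record.split("$")
    expected = ALGORITHMS[version](salt, password)
    return hmac.compare_digest(expected, value)


def upgrade_on_login(record: str, password: str) -> str:
    """After a successful login, re-hash old records with the current version."""
    if not verify(record, password):
        raise ValueError("wrong password")
    version = record.split("$")[2]
    return record if version == CURRENT_VERSION else make_record(password)


old = make_record("12345678", version="v1")
new = upgrade_on_login(old, "12345678")
print(old.split("$")[2], "->", new.split("$")[2])
```

Since the version travels with each stored value, old and new records can coexist in the same column, and each user is migrated the next time they log in successfully.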