Character sets and collation rules

Each database has a character set and collation, the character set used to specify storage way of string, and ordering rules compare strings used to define the way, they are a one-to-many relationship, also is the rule of a character set may correspond to multiple sort, MySql support more than 30 character set, as well as more than 70 kinds of collation, ALTER DATABASE; ALTER DATABASE; ALTER DATABASE; ALTER DATABASE; ALTER DATABASE; MySql is more flexible than other databases, which use the same character set.

The default collation for the utF-8 CHARACTER SET is UTF8_general_CI.

mysql> SHOW CHARACTER SET;
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1|... | utf16le | UTF-16LE Unicode | utf16le_general_ci |4 |
| utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_0900_ai_ci  |      4 |
+----------+---------------------------------+---------------------+--------+
41 rows in set (0.00 sec)

Copy the code

The last bits of the collation are as follows:

  • _ciIndicates case insensitive
  • _csCase sensitive
  • _binbinary

If we set the collation to _ci, then we should note that the collation is not case-sensitive, i.e. A and A are the same character, as follows:

CREATE TABLE `tb_test` (
  `id` int NOT NULL AUTO_INCREMENT,
  `test` varchar(100) NOT NULL.PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Copy the code
INSERT INTO test.tb_test(test)VALUES('a');
INSERT INTO test.tb_test(test)VALUES('A');
Copy the code

Guess what you get from the following query?

SELECT  * from test.tb_test where test ='A'
Copy the code

That’s right, it’s all the values.

mysql> SELECT  * from test.tb_test where test ='A';
+----+------+
| id | test |
+----+------+
|  1 | a    |
|  2 | A    |
+----+------+
2 rows in set (0.00 sec)
Copy the code

What could go wrong?

So the problem is that Base64 is just case sensitive, the characters in Base64 include letters A-Z, A-Z, numbers 0-9, A total of 62 characters, plus symbols “+”, “/”, and as A cushion word “=”, is actually 65 characters.

In Base64, the character A is encoded as QQ== and the character A is encoded as YQ==. If the Base64 values of two data sets are equal in case insensitive, the query will get an incorrect result.

There are two solutions, one is to change the character sorting rules, the second is to change the data type.

Modify the character sorting rule

For the test table above, we can just change the collation of the fields, and this time the query will be case-sensitive.

mysql> ALTER TABLE `tb_test` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_cs_0900_as_cs;
Query OK, 0 rows affected (0.41 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SELECT  * from test.tb_test where test ='A';
+----+------+
| id | test |
+----+------+
|  2 | A    |
+----+------+
1 row in set (0.00 sec)

Copy the code

Or modify the table directly.

mysql> ALTER TABLE tb_test CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_cs_0900_as_cs;
Query OK, 0 rows affected (0.44 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SELECT  * from test.tb_test where test ='A';
+----+------+
| id | test |
+----+------+
|  2 | A    |
+----+------+
1 row in set (0.00 sec)

Copy the code

Modifying data Types

The VARBINARY type is a binary string, that is, contains a byte string, not a character string, and compares the binary of the byte. The maximum width that he specifies is in bytes; to use this data type, you must specify a size.

So you can use this feature to solve the case problem.

mysql> ALTER TABLE tb_test MODIFY test VARBINARY(200);
Query OK, 2 rows affected (0.41 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> SELECT  * from test.tb_test where test ='A';
+----+------------+
| id | test       |
+----+------------+
|  2 | 0x41       |
+----+------------+
1 row in set (0.00 sec)

Copy the code