This is the fourth and final article in the series “Java and MySQL Character Set and Encoding”. This series was originally intended to understand MySQL encoding problems. In the process of sorting out, Java byte encoding problems came to mind, so this series was born.

Character encoding and comparison rules

Character encodings have been covered in previous sections. Here is just a little bit, MySQL castrate UTF-8.

1.1 UTF-8 in MySQL

Normally, utF-8 is 1 to 4 bytes long in various parts, such as the system and the Java language itself. MySQL utF-8 is utF8MB3 (UTF-8 most bytes 3), which supports a maximum of 3 bytes. As a result, some UTF-8 characters that are supported by other systems cannot be represented in MySQL, such as emojis.

To be compatible with utF-8, MySQL uses UTF8MB4 encoding scheme to support it. Let’s take a look at what character sets a MySQL system supports:

show charset;
Copy the code

1.2 Comparison Rules

The comparison rule is to compare the size of two characters. The most common way to do this is to use binary comparison sizes. Some may involve case-insensitive comparisons, which can lead to a variety of comparison rules. Each encoding scheme has a number of comparison rules. Common comparison rules (utF-8 as an example) :

show collation like 'utf8%';
Copy the code

2 character encoding and setting of comparison rules

There are four levels of character sets and comparison rules in MySQL:

Server Level Database level Table Level Column levelCopy the code

2.1 Server Level

SHOW VARIABLES LIKE 'character_set_server';
SHOW VARIABLES LIKE 'collation_server';
Copy the code

Of course, we can set this up when the service starts.

[server]
character_set_server=gbk
collation_server=gbk_chinese_ci
Copy the code

2.2 Database Level

SHOW VARIABLES LIKE 'character_set_database';
SHOW VARIABLES LIKE 'collation_database';
Copy the code

You can run the preceding two commands to view the information. Note: Database-level encodings can only be specified at creation time and cannot be changed after creation.

2.3 table level

You can specify when creating a table. If not specified, the database’s encoding and comparison rules are used.

2.4 column level

Columns are generally not encoded in a specific way, but can be changed by command if desired. Take the following example:

ALTER TABLE TABLE_name [[DEFAULT] CHARACTER SET CHARACTER SET [COLLATE]Copy the code

2.5 to modify

For changing character encoding and comparison rules, follow the following rules, and simply say that the two are linked:

  • If you modify only the character set, the comparison rule changes to the default comparison rule of the modified character set.
  • If only the comparison rule is modified, the character set is changed to the character set corresponding to the modified comparison rule.

3 Character encoding transcoding in MySQL communication

3.1 Processes involved

In the process of MySQL communication, it involves the whole process from client -> server -> client, so the coding conversion will be involved in the middle. As shown below (the picture is from the nuggets booklet “How MySQL works: Understanding MySQL from the root”).

3.2 Related Configurations

The MySQL transcoding configuration is as follows:

character_set_client
character_set_connection
character_set_results
Copy the code

Where character_set_client determines what encoding is used to decode the encoding from the client. Character_set_connection is used to convert the character set received by the Character_set_client to the specific encoding set used. Character_set_results refers to the coding set given to the client when the result is returned.

Let’s take a look at some of the database configurations:

3.3 SETTING demo for JDBC Coding

Change the spring JDBC encoding to GBK:

UncategorizedSQLException is the coding problem.

So let’s go back to UTF-8, and we’ll find it’s normal.

4 summary

This article first talked about MySQL character encoding, the theoretical knowledge of comparison rules, and looked at the specific data configuration. And then talk about the character encoding transcoding in MySQL communication. Finally, we addressed the previous article about the interaction between JDBC configuration and Character_set_client.

5 References

How MySQL Works: Understanding MySQL from the Root