Author: spermwhale0 Jane books: www.jianshu.com/p/94d6b75bd…

MySQL coding process

There are many reasons for garbled characters in MySQL, usually related to the character_set parameter. Let’s take a look at the parameters:

SHOW VARIABLES LIKE "character%";
Copy the code
Variable_name   Value
character_set_client    utf8
character_set_connection    utf8
character_set_database  utf8
character_set_filesystem    binary
character_set_results   utf8
character_set_server    utf8
character_set_system    utf8
character_sets_dir  /usr/local/ Cellar/[email protected]/5.7.24 / share/mysql/charsets /Copy the code

The most important ones are character_set_client and Character_set_Results. What is the use of these two parameters?

When the client enters a command into MySQL, MySQL only knows that the command is the byte stream 0101, but does not know the encoding. The first parameter character_set_client tells MySQL that the command is utF-8 encoded, so MySQL will decode the byte stream using UTF-8. When MySQL successfully decodes, it converts the command content to the encoding of the target table.

The encoding of the table can be viewed with the following command:

SHOW FULL COLUMNS FROM student;
Copy the code

MySQL character_set_client is set to UTF-8 and the table encoding is GBK. INSERT INTO student VALUES (‘ xiaomin ‘, 12) on utF-8, MySQL will decode the command using UTF-8, convert the “xiaomin” character INTO the GBK encoding, and save the table.

The other parameter character_set_results refers to the encoding of the output of the query result. If the encoding of the table is GBK and character_set_Results is set to UTF-8, then the content queried in the table is first converted to UTF-8 encoding and then output to the terminal.

The process of reading and writing MySQL data can be shown as follows:

As can be seen from the figure, garbled characters occur when the decoding/encoding process for storing in the table does not correspond to the decoding/encoding process for reading the table.

To change character_set_client and character_set_results, you can easily run the following command:

SET names gbk;
Copy the code
Variable_name   Value
character_set_client    gbk
character_set_connection    gbk
character_set_database  utf8
character_set_filesystem    binary
character_set_results   gbk
character_set_server    utf8
character_set_system    utf8
character_sets_dir  /usr/local/ Cellar/[email protected]/5.7.24 / share/mysql/charsets /Copy the code

In this way, character_set_client and Character_set_Results are changed to GBK.

Utf-8, GBK, and Latin-1

Utf-8, GBK, and Latin-1 are the three most common encodings in MySQL.

  • They are both backwards compatible with ASCII. The same ASCII encoded string has the same result when converted to UTF-8, GBK, and Latin-1. Therefore, if the client passes the SET NAMES latin1 directive, character_set_client can be decoded and executed regardless of whether the CHARACTER_set_client is SET to UTF-8, GBK, or Latin-1.
  • Latin-1 is a single-byte encoding in the range 0x00-0xFF. This means that any 8-bit binary byte can correspond to a character in Latin-1.
  • The representation range of UTF-8 is much larger than that of GBK. All Latin-1 characters can be converted to UTF-8 characters, but not necessarily GBK characters.

The above points provide conditions for MySQL to “error in and error out”. The so-called error-in and error-out refers to the phenomenon that the character encoding of the client is different from that of the final table, but the output without garbled characters can be obtained as long as the encoding of the saved and fetched character sets is the same.

Wrong in the wrong

Let’s first consider this command:

INSERT INTO table VALUE("Ah");
Copy the code

Assuming the terminal encoding is GBK, the binary representation of “ah” is 10110000 10100001. MySQL takes the command and decodes it using the character_set_client encoding.

  • ifcharacter_set_clientGBK, which MySQL will recognize as an “ah” character;
  • ifcharacter_set_clientisLatin-1, MySQL will treat it as two separate Latin-1 characters (10110000) (10100001), which will be decoded to °¡ .
  • ifcharacter_set_clientIt is UTF-8, and since 10110000 10100001 is not a valid UTF-8 encoding, either an error is reported or an error identifier is replaced. In this case, if the direct deposit table, you can not achieve “error in, error out”.

Therefore, a necessary condition for error in and error out is to set character_set_client to Latin-1. If you set it to GBK or UTF-8, it cannot be guaranteed to decode correctly.

This is the decoding process. Once the decoding is complete with Latin-1, the data is stored in the target table.

  • If the target table is Latin-1 encoded, the decoded data can be stored directly into the table.
  • If the target table is utF-8 encoded, the decoded data is converted to UTF-8 encoded before being stored in the table.
  • If the target table is GBK encoded, not every Latin-1 encoded character can be found in GBK, so errors may be reported during transcoding.

Therefore, another condition for error in and error out is that the target table must be latin-1 or UTF-8 encoded.

When read, MySQL converts the data in the target table to the encoding specified by Character_set_Results. Because of the Latin-1 we used for writing, we also need to specify character_set_results as Latin-1 for reading. So you end up with “error in, error out.”

For example

Suppose we have a student table like this:

| name | age | | -- - | -- - | | xiaoming 12 | | | | | 10 small redCopy the code

The name column is encoded as Latin-1, and the data stored there is encoded as GBK.

This means that the person storing data into the table may execute the following statement using GBK terminal:

SET NAMES latin1;
INSERT INTO student VALUES ('Ming', 12);
Copy the code

So, if the terminal code we are using is UTF-8, how do we query the information about Ming from the table?

  1. MySQL > select * from ‘MySQL’;
SELECT * FROM student WHERE name = "Xiao Ming";
Copy the code

But doing so led to a mistake:

ERROR 1267 (HY000): Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
Copy the code

MySQL’s default user terminal uses utF-8 encoding, which is inconsistent with latin-1 for tables, so MySQL first tries to convert queries to Latin-1. However, latin-1 does not have an encoding for “xiao Ming”, so an error will be reported.

  1. If I add a changecharacter_set_client“, what happens?
SET NAMES latin1;
SELECT * FROM students WHERE name = "Xiao Ming";
Copy the code

This time MySQL will assume that the user’s terminal is latin-1, so no conversion is done. But the result of the final query is empty.

This is because the encoding of the user terminal is UTF-8, so the encoding of the “Xiaoming” passed in is ALSO UTF-8, while the data in the table is GBK encoding, and they are stored in different forms in memory. Therefore, even if MySQL treats them both as latin-1, they are not considered equal.

  1. MySQL > convert query statement to GBK code in Shell and pass it to MySQL:
echo " SET names latin1; SELECT * FROM student WHERE name = 'xiaowei ';\
| iconv -f utf8 -t gbk\
| mysql -uroot -p123 -Dtest
Copy the code

The role of iconv is to convert standard input to the specified encoding format (GBK in this case), and then pass the standard output to MySQL. We get:

The name age С � � 12Copy the code

The result is available, but the name part is garbled. This is because the data stored in the table is encoded in GBK, while the terminal code is UTF-8. So you need to add a final step: convert the results of the query to UTF-8.

echo " SET names latin1; SELECT * FROM student WHERE name = 'xiaowei ';\
| iconv -f utf8 -t gbk\
| mysql -uroot -p123 -Dtest\
| iconv -f gbk -t utf8
Copy the code

The output is:

Name Age Xiaoming 12Copy the code

In this way, we finally got the right information.

If the tables themselves were GBK encoded, rather than Latin-1, would such a tedious process be necessary?

The answer is no. Because as long as the character_set_client and character_set_results are set correctly, MySQL will automatically convert the table to character_set_results, despite the GBK encoding.

The resources

  • Remember, never use “UTF8” in MySQL
  • Mysql modifies character set

Learning Materials Sharing

12 sets of core technology data of microservice, Spring Boot and Spring Cloud have been collected and prepared. This is part of the data directory:

  • Spring Security authentication and authorization
  • Spring Boot Project practice (background service architecture and operation and maintenance architecture of small and medium-sized Internet companies)
  • Spring Boot Project (Enterprise rights Management project)
  • Spring Cloud Micro-service Architecture Project (Distributed transaction solution)
  • Spring Cloud + Spring Boot + Docker
  • Spring Cloud website project practice (real estate sales)
  • Practice of Spring Cloud micro-service Project (large-scale e-commerce architecture system)
  • Single point landing base to combat
  • Spring Boot Project actual practice (Enterprise wechat ordering system) (primary practice)
  • Spring Cloud Internet Application Project (Weather forecast system)
  • Spring source depth analysis + annotation development full set of video tutorials
  • Practice of Spring Boot Project (Financial product system)

Directory screenshot:

Public account backstage to collect information: