Abstract:

In the process of using MySQL, it is important to understand the concept of character set, character order, and the impact of different Settings on data storage and comparison. The problem of “garbled characters” encountered by many students in their daily work is probably caused by poor understanding of character set and character order and wrong setting. This paper introduces the following contents from shallow to deep: 1.

I. Content Overview

In the process of using MySQL, it is important to understand the concept of character set, character order, and the impact of different Settings on data storage and comparison. The problem of “garbled characters” encountered by many students in their daily work is probably caused by poor understanding of character set and character order and wrong setting.

This paper introduces the following contents from the shallow to the deep:

  1. The basic concept and relation of character set and character order
  2. MySQL supports character set and character sequence setting levels
  3. View and set server, Database, table, and column character sets and character sequences
  4. When to set character set, character order

The concept and relation of character set and character order

MySQL provides different character set support for data storage. In the data comparison operation, it provides different character order support.

MySQL provides different levels of Settings, including server level, database level, table level, and column level, which can provide very precise Settings.

What is character set, character order? To put it simply:

  1. Character set: Defines a character and its encoding.
  2. Collation: Defines the rules for comparing characters.

Here’s an example:

There are four characters: A, B, A, and B. The encoding of these four characters is A = 0, B = 1, A = 2, and B = 3. The character + encoding here constitutes a character set.

What if we want to compare the size of two characters? For example, A, B, or A, B, and the most intuitive way to compare them is to use their codes, like 0 < 1, so A < B.

In addition, for A and A, although they are coded differently, we expect the upper and lower case characters to be equal, i.e., A == A.

Two comparison rules are defined here, and the set of comparison rules is called collation.

  1. The same is uppercase characters, lowercase characters, then compare their encoding size;
  2. If two characters are case – sensitive, they are equal.

MySQL supports character sets and character sequences

MySQL supports multiple character sets and character sequences.

  1. A character set corresponds to at least one character sequence (usually 1 to many).
  2. Two different character sets cannot have the same character order.
  3. Each character set has a default character order.

The above is rather abstract, and we’ll see what’s going on in the next few sections.

1. View the supported character set

You can view the character set supported by MYSQL in the following ways.

A:

mysql> SHOW CHARACTER SET; +----------+-----------------------------+---------------------+--------+ | Charset | Description | Default collation | Maxlen | +----------+-----------------------------+---------------------+--------+ | big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 | | dec8 | DEC West European | dec8_swedish_ci | 1 | ... omitCopy the code

Method 2:

mysql> use information_schema; mysql> select * from CHARACTER_SETS; +--------------------+----------------------+-----------------------------+--------+ | CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION | MAXLEN | +--------------------+----------------------+-----------------------------+--------+ | big5 | big5_chinese_ci | Big5 Traditional Chinese | 2 | | dec8 | dec8_swedish_ci | DEC West European | 1 | ... omitCopy the code

When usingSHOW CHARACTER SETYou can also add it when viewingWHEREorLIKEQualifying conditions.

Example 1: UseWHEREQualifying conditions.

mysql> SHOW CHARACTER SET WHERE Charset="utf8";
+---------+---------------+-------------------+--------+
| Charset | Description   | Default collation | Maxlen |
+---------+---------------+-------------------+--------+
| utf8    | UTF-8 Unicode | utf8_general_ci   |      3 |
+---------+---------------+-------------------+--------+
1 row in set (0.00 sec)Copy the code

Example 2: UseLIKEQualifying conditions.

mysql> SHOW CHARACTER SET LIKE "utf8%";
+---------+---------------+--------------------+--------+
| Charset | Description   | Default collation  | Maxlen |
+---------+---------------+--------------------+--------+
| utf8    | UTF-8 Unicode | utf8_general_ci    |      3 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci |      4 |
+---------+---------------+--------------------+--------+
2 rows in set (0.00 sec)Copy the code

2. View the supported character sequence

Similarly, you can view the character sequence supported by MYSQL in the following way.

Method 1: PassSHOW COLLATIONTake a look.

As you can see,utf8There are more than 10 character sequences in the character set. throughDefaultIs the value ofYesTo determine whether the character order is the default.

mysql> SHOW COLLATION WHERE Charset = 'utf8'; +--------------------------+---------+-----+---------+----------+---------+ | Collation | Charset | Id | Default | Compiled | Sortlen | +--------------------------+---------+-----+---------+----------+---------+ | utf8_general_ci | utf8 | 33 | Yes | Yes | 1 | | utf8_bin | utf8 | 83 | | Yes | 1 | ... slightlyCopy the code

Method 2: Queryinformation_schema.COLLATIONS.

mysql> USE information_schema;
mysql> SELECT * FROM COLLATIONS WHERE CHARACTER_SET_NAME="utf8";
+--------------------------+--------------------+-----+------------+-------------+---------+
| COLLATION_NAME           | CHARACTER_SET_NAME | ID  | IS_DEFAULT | IS_COMPILED | SORTLEN |
+--------------------------+--------------------+-----+------------+-------------+---------+
| utf8_general_ci          | utf8               |  33 | Yes        | Yes         |       1 |
| utf8_bin                 | utf8               |  83 |            | Yes         |       1 |
| utf8_unicode_ci          | utf8               | 192 |            | Yes         |       8 |Copy the code

3. Naming conventions for character sequences

The character order is named, prefixed by its corresponding character set, as shown below. Like character orderutf8_general_ciTo indicate that it is a character setutf8The character sequence of.

More rules can be found in the official documentation.

MariaDB [information_schema]> SELECT CHARACTER_SET_NAME, COLLATION_NAME FROM COLLATIONS WHERE CHARACTER_SET_NAME="utf8" limit 2; 
+--------------------+-----------------+
| CHARACTER_SET_NAME | COLLATION_NAME  |
+--------------------+-----------------+
| utf8               | utf8_general_ci |
| utf8               | utf8_bin        |
+--------------------+-----------------+
2 rows in set (0.00 sec)Copy the code

4, Server character set, character sequence

Use: If you create a database and do not specify a character set or sequence, the server character set and sequence will be used as the default character set and collation rules for the database.

How to specify: When the MySQL service is started, you can specify this parameter using the command line. It can also be specified as a variable in the configuration file.

Server default character set and character sequence: specified by the compilation parameter when MySQL is compiled.

character_set_server,collation_serverThey correspond to the server character set and server character sequence.

1. Check the server character set and character sequence

correspondingcharacter_set_server,collation_serverTwo system variables.

mysql> SHOW VARIABLES LIKE "character_set_server";
mysql> SHOW VARIABLES LIKE "collation_server";Copy the code

2. Specify when starting the service

You can specify the server character set and character sequence when the MySQL service is started. If not specified, the default character sequences are respectivelylatin1,latin1_swedish_ci

mysqld --character-set-server=latin1 \
       --collation-server=latin1_swedish_ciCopy the code

Specify the server character set separately. In this case, the server character sequence islatin1The default character order oflatin1_swedish_ci.

mysqld --character-set-server=latin1Copy the code

3. Specify the configuration file

In addition to the command line argument, you can also specify it in the configuration file, as shown below.

[client]
default-character-set=utf8

[mysql]
default-character-set=utf8

[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8Copy the code

4. Runtime modification

Example: Runtime changes (invalid after restart, need to be written into configuration file if you want to keep the same after restart)

mysql> SET character_set_server = utf8 ;Copy the code

5. Specify the default character set and character sequence at compile time

character_set_server,collation_serverCan be specified at MySQL compile time with the compile option:

cmake . -DDEFAULT_CHARSET=latin1 \
           -DDEFAULT_COLLATION=latin1_german1_ciCopy the code

The character set and character sequence of database

Purpose: Specify database – level character set, character sequence. You can specify different character sets/character sequences for the same MySQL database.

1. Set the character set/character sequence of the data

When creating or modifying a database, run theCHARACTER SET,COLLATESpecifies the character set and collation rules for the database.

Create database:

CREATE DATABASE db_name
    [[DEFAULT] CHARACTER SET charset_name]
    [[DEFAULT] COLLATE collation_name]Copy the code

Modify database:

ALTER DATABASE db_name
    [[DEFAULT] CHARACTER SET charset_name]
    [[DEFAULT] COLLATE collation_name]Copy the code

Example: Creating a databasetest_schema, character set toutf8, the default sorting rule isutf8_general_ci.

CREATE DATABASE `test_schema` DEFAULT CHARACTER SET utf8;Copy the code

2, Check the database character set/character sequence

There are three ways to view the character set/character order of a database.

Example 1: Viewtest_schemaCharacter set, collation rules. (Need to switch the default database)

mysql> use test_schema;
Database changed
mysql> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8                     | utf8_general_ci      |
+--------------------------+----------------------+
1 row in set (0.00 sec)Copy the code

Example 2: You can also run the following command to view the informationtest_schemaCharacter set, database (no need to switch default database)

mysql> SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME  FROM information_schema.SCHEMATA WHERE schema_name="test_schema";
+-------------+----------------------------+------------------------+
| SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+-------------+----------------------------+------------------------+
| test_schema | utf8                       | utf8_general_ci        |
+-------------+----------------------------+------------------------+
1 row in set (0.00 sec)Copy the code

Example 3: You can also view the character set by looking at the statement that created the database.

mysql> SHOW CREATE DATABASE test_schema; +-------------+----------------------------------------------------------------------+ | Database | Create Database | +-------------+----------------------------------------------------------------------+ | test_schema | CREATE DATABASE `test_schema` /*! 40100 DEFAULT CHARACTER SET utf8 */ | +-------------+----------------------------------------------------------------------+ 1 rowin set (0.00 sec)Copy the code

3, How to determine the database character set and character sequence

  • Specified when creating the databaseCHARACTER SETorCOLLATE“, use the corresponding character set and sorting rule.
  • If no character set or collation rule is specified when creating a database, run thecharacter_set_server,collation_serverShall prevail.

Table character set, character sequence

The syntax for creating and modifying a table is as followsCHARACTER SET,COLLATESet the character set and character sequence.

CREATE TABLE tbl_name (column_list)
    [[DEFAULT] CHARACTER SET charset_name]
    [COLLATE collation_name]]

ALTER TABLE tbl_name
    [[DEFAULT] CHARACTER SET charset_name]
    [COLLATE collation_name]Copy the code

Create table and specify character set/character sequence

In the following example, specify character setutf8The default character order is used.

CREATE TABLE `test_schema`.`test_table` (
  `id` INT NOT NULL COMMENT ' ',
  PRIMARY KEY (`id`)  COMMENT ' ')
DEFAULT CHARACTER SET = utf8;Copy the code

2. Check the character set/sequence of table

Similarly, there are three ways to view the character set/character order of a table.

Method 1: PassSHOW TABLE STATUSCheck the table statusCollationforutf8_general_ciThe corresponding character set isutf8.

MariaDB [blog]> SHOW TABLE STATUS FROM test_schema \G;
*************************** 1. row ***************************
           Name: test_table
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 0
 Avg_row_length: 0
    Data_length: 16384
Max_data_length: 0
   Index_length: 0
      Data_free: 11534336
 Auto_increment: NULL
    Create_time: 2018-01-09 16:10:42
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)Copy the code

Method 2: View informationinformation_schema.TABLESThe information.

mysql> USE test_schema;
mysql> SELECT TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA = "test_schema" AND TABLE_NAME = "test_table";
+-----------------+
| TABLE_COLLATION |
+-----------------+
| utf8_general_ci |
+-----------------+Copy the code

Method 3: PassSHOW CREATE TABLEConfirmation.

mysql> SHOW CREATE TABLE test_table; +------------+---------------------------------------------------------------------------------------------------------- ------+ | Table | Create Table | +------------+---------------------------------------------------------------------------------------------------------- ------+ | test_table | CREATE TABLE `test_table` ( `id` int(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 | +------------+---------------------------------------------------------------------------------------------------------- ------+ 1 rowin set (0.00 sec)Copy the code

3, how to determine the table character set and character sequence

Assuming thatCHARACTER SET,COLLATEThe values of are respectivelycharset_name,collation_name. Create table;

  • Has been clear about thecharset_name,collation_name,charset_name,collation_name.
  • Only clear thecharset_name, butcollation_nameIf it is not clear, the character set is adoptedcharset_name, the character sequence is adoptedcharset_nameThe corresponding default character order.
  • Only clear thecollation_name, butcharset_nameIf no, the character sequence is adoptedcollation_name, character set usedcollation_nameAssociated character set.
  • charset_name,collation_nameIf no, use the character set and character sequence of the database.

7, column character set, sort

CHAR, VARCHAR, and TEXT columns can specify character set/character sequence. The syntax is as follows:

col_name {CHAR | VARCHAR | TEXT} (col_length)
    [CHARACTER SET charset_name]
    [COLLATE collation_name]Copy the code

Add column and specify character set/collation

Example :(similar to creating a table)

mysql> ALTER TABLE test_table ADD COLUMN char_column VARCHAR(25) CHARACTER SET utf8;Copy the code

2, Check column character set/character sequence

Examples are as follows:

mysql> SELECT CHARACTER_SET_NAME, COLLATION_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA="test_schema" AND TABLE_NAME="test_table" AND COLUMN_NAME="char_column";
+--------------------+-----------------+
| CHARACTER_SET_NAME | COLLATION_NAME  |
+--------------------+-----------------+
| utf8               | utf8_general_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)Copy the code

3, column character set/collation

Assuming thatCHARACTER SET,COLLATEThe values of are respectivelycharset_name,collation_name:

  • ifcharset_name,collation_nameAre clear, character set, character ordercharset_name,collation_nameShall prevail.
  • Only clear thecharset_name.collation_nameIf this parameter is not specified, the character set ischarset_name, the character sequence ischarset_nameThe default character order of.
  • Only clear thecollation_name.charset_nameIf no, the character sequence iscollation_name, character set iscollation_nameAssociated character set.
  • charset_name,collation_nameIf no, use the character set and character sequence of table.

Select when to set character set and character sequence

In general, you can configure it in three places:

  1. This is done when the database is created.
  2. The mysql Server is configured when it is started.
  3. When compiling mysql from source, configure it by compiling parameters

1. Method 1: Configure the database when creating the database

This approach is flexible and safe in that it does not rely on the default character set/character order. If you specify a character set/sequence when creating a database, you will inherit the character set/sequence of the database if you do not specify it when creating a table or column.

CREATE DATABASE mydb
  DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;Copy the code

2. Mode 2: Configure the mysql server when it is started

You can add the following configuration so that when mysql Server is started, character-set-server and collation server will be configured.

When you create a database/table/column using mysql Client and do not specify a character set/sequence, use character-set-server/collation-server as the default character set/sequence.

In addition, the character SET/character sequence for client and server connections must be SET to SET NAMES.

[mysqld]
character-set-server=utf8
collation-server=utf8_general_ciCopy the code

3. Method 3: When compiling mysql from source code, set the compilation parameters

If you specify it at compile time-DDEFAULT_CHARSETand-DDEFAULT_COLLATION, then:

  • When creating database and table, they are used as the default character set/sequence.
  • When the client connects to the server, it takes this character set/character sequence as the default character set/character sequence. (Do not SET NAMES separately)
shell> cmake . -DDEFAULT_CHARSET=utf8 \
           -DDEFAULT_COLLATION=utf8_general_ciCopy the code

Write in the back

This paper introduces in detail the MySQL character set, character order related content, this part of the content is mainly aimed at data storage and comparison. There is a very important part that is not covered: character set and character order Settings for connections.

Due to the connection of the character set, character sequence Settings caused by improper garble problems are also very much, this part of the content is not a lot of content to expand the content, in the next article to explain.

Space is limited, some content is not detailed, interested students welcome to exchange, or check the official documents. Please point out any mistakes or omissions.

X. Related links

10.1 Character Set Support

https://dev.mysql.com/doc/refman/5.7/en/charset.html

Copyright Notice: The content of this article is contributed by Internet users, copyright belongs to the author, the community does not have the ownership, also do not assume the relevant legal responsibility. If you find any content suspected of plagiarism in our community, you are welcome to send an email to [email protected] to report and provide relevant evidence. Once verified, our community will immediately delete the content suspected of infringement.

The original link