If The partition contains Chinese characters when a dynamic partition is created using Hive, the following error message is displayed.

COERCIBLE: Illegal mix of collations (latin1_bin,IMPLICIT) and (UTf8_general_ci,COERCIBLE) for operation '='Copy the code

The reason is that the partition table encoding is different from the global encoding

Two solutions are provided:

  • Solution 1: Modify the mysql configuration

Temporary change: Enter mysql and set the following parameters

set character_set_client = utf8;
set character_set_connection = utf8;
set character_set_results = utf8;
 
SET collation_server = utf8_general_ci
SET collation_database = utf8_general_ci
Copy the code

Permanent change: Modify the mysql configuration file and restart it

[root@ambari03 etc] vim /etc/my.cnf
#Add the following information under [client]
[client]
default-character-set=utf8
#Add the following information to [mysqld]
[mysqld]
default-character-set=utf8
init_connect='SET NAMES utf8'
#Add the following information to [mysql]
[mysql]
default-character-set=utf8
Copy the code

Then systemctl restart mysqld restarts the mysql service

The problem still does not resolve after using this method!

  • Solution 2: Change the encoding of each table in the Hive metadata database in mysql

After logging in to mysql, run the following statement to change the encoding information of the Hive metadata table

alter database hive_meta default character set utf8;
alter table BUCKETING_COLS default character set utf8;
alter table CDS default character set utf8;
alter table COLUMNS_V2 default character set utf8;
alter table DATABASE_PARAMS default character set utf8;
alter table DBS default character set utf8;
alter table FUNCS default character set utf8;
alter table FUNC_RU default character set utf8;
alter table GLOBAL_PRIVS default character set utf8;
alter table PARTITIONS default character set utf8;
alter table PARTITION_KEYS default character set utf8;
alter table PARTITION_KEY_VALS default character set utf8;
alter table PARTITION_PARAMS default character set utf8;
-- alter table PART_COL_STATS default character set utf8;
alter table ROLES default character set utf8;
alter table SDS default character set utf8;
alter table SD_PARAMS default character set utf8;
alter table SEQUENCE_TABLE default character set utf8;
alter table SERDES default character set utf8;
alter table SERDE_PARAMS default character set utf8;
alter table SKEWED_COL_NAMES default character set utf8;
alter table SKEWED_COL_VALUE_LOC_MAP default character set utf8;
alter table SKEWED_STRING_LIST default character set utf8;
alter table SKEWED_STRING_LIST_VALUES default character set utf8;
alter table SKEWED_VALUES default character set utf8;
alter table SORT_COLS default character set utf8;
alter table TABLE_PARAMS default character set utf8;
alter table TAB_COL_STATS default character set utf8;
alter table TBLS default character set utf8;
alter table VERSION default character set utf8;
alter table BUCKETING_COLS convert to character set utf8;
alter table CDS convert to character set utf8;
alter table COLUMNS_V2 convert to character set utf8;
alter table DATABASE_PARAMS convert to character set utf8;
alter table DBS convert to character set utf8;
alter table FUNCS convert to character set utf8;
alter table FUNC_RU convert to character set utf8;
alter table GLOBAL_PRIVS convert to character set utf8;
alter table PARTITIONS convert to character set utf8;
alter table PARTITION_KEYS convert to character set utf8;
alter table PARTITION_KEY_VALS convert to character set utf8;
alter table PARTITION_PARAMS convert to character set utf8;
-- alter table PART_COL_STATS convert to character set utf8;
alter table ROLES convert to character set utf8;
alter table SDS convert to character set utf8;
alter table SD_PARAMS convert to character set utf8;
alter table SEQUENCE_TABLE convert to character set utf8;
alter table SERDES convert to character set utf8;
alter table SERDE_PARAMS convert to character set utf8;
alter table SKEWED_COL_NAMES convert to character set utf8;
alter table SKEWED_COL_VALUE_LOC_MAP convert to character set utf8;
alter table SKEWED_STRING_LIST convert to character set utf8;
alter table SKEWED_STRING_LIST_VALUES convert to character set utf8;
alter table SKEWED_VALUES convert to character set utf8;
alter table SORT_COLS convert to character set utf8;
alter table TABLE_PARAMS convert to character set utf8;
alter table TAB_COL_STATS convert to character set utf8;
alter table TBLS convert to character set utf8;
alter table VERSION convert to character set utf8;
-- alter table PART_COL_STATS convert to character set utf8;
SET character_set_client = utf8 ;
-- SET character_set_connection = utf8 ;
 
-- alter table PART_COL_STATS convert to character set utf8;
SET character_set_database = utf8 ;
SET character_set_results = utf8 ;
SET character_set_server = utf8 ;
-- SET collation_connection = utf8 ;
-- SET collation_database = utf8 ;
-- SET collation_server = utf8 ;
SET NAMES 'utf8';
Copy the code

After the modification, the table with Chinese partition can be inserted. However, Hive will still report an error even if the data has been inserted

The following error is found in the log

2021-02-23T10:54:264,249 ERROR [hiveserver2-BACKground-pool: thread-248] exec.StatsTask: Failed to run stats task org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Insert of object "org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics@710c136f" using statement "INSERT INTO `PART_COL_STATS` (`CS_ID`,`AVG_COL_LEN`,`BIT_VECTOR`,`CAT_NAME`,`COLUMN_NAME`,`COLUMN_TYPE`,`DB_NAME`,`BIG_DECIMAL_HIGH_VALUE`,`BIG_DECIM AL_LOW_VALUE`,`DOUBLE_HIGH_VALUE`,`DOUBLE_LOW_VALUE`,`LAST_ANALYZED`,`LONG_HIGH_VALUE`,`LONG_LOW_VALUE`,`MAX_COL_LEN`,`N UM_DISTINCTS`,`NUM_FALSES`,`NUM_NULLS`,`NUM_TRUES`,`PART_ID`,`PARTITION_NAME`,`TABLE_NAME`) VALUES (? ,? ,? ,? ,? ,? ,? ,? ,? ,? ,? ,? ,? ,? ,? ,? ,? ,? ,? ,? ,? ,?) " failed : Incorrect string value: '\xE6\xB9\x96\xE5\x8C\x97' for column 'PARTITION_NAME' at row 1)Copy the code

Query ART_COL_STATS from metastore in Mysql

use metastore;
-- show create table PART_COL_STATS;
show full columns from PART_COL_STATS;
Copy the code

The results of some fields are as follows

Field Type Collation
CS_ID bigint(20)
CAT_NAME varchar(256) latin1_bin
DB_NAME varchar(128) latin1_bin
TABLE_NAME varchar(256) latin1_bin
PARTITION_NAME varchar(500) latin1_bin
COLUMN_NAME varchar(767) latin1_bin
COLUMN_TYPE varchar(128) latin1_bin
PART_ID bigint(20)
LONG_LOW_VALUE bigint(20)
LONG_HIGH_VALUE bigint(20)
DOUBLE_HIGH_VALUE Double (53, 4)
DOUBLE_LOW_VALUE Double (53, 4)
BIG_DECIMAL_LOW_VALUE varchar(4000) latin1_bin
BIG_DECIMAL_HIGH_VALUE varchar(4000) latin1_bin
NUM_NULLS bigint(20)
NUM_DISTINCTS bigint(20)
BIT_VECTOR blob
AVG_COL_LEN Double (53, 4)
MAX_COL_LEN bigint(20)
NUM_TRUES bigint(20)
NUM_FALSES bigint(20)
LAST_ANALYZED bigint(20)

You can see that the encoding of the PARTITION_NAME field is still latin1, so you need to change the encoding of the PARTITION_NAME field to UTF8

use metastore;
alter table PART_COL_STATS modify column PARTITION_NAME varchar(500) character set utf8;
Copy the code

Re-insert the table with Chinese partition after modification, it can work normally

Reference cases are as follows

create database mydb;
use mydb;
Create table stU without partition
create table stu
(
    name string,
    age  int
) row format delimited fields terminated by '\t';

-- (2) Upload data to the HDFS directory

Create table EMP
create table emp
(
    name string,
    age  int
)
    partitioned by (provice string)
    row format delimited fields terminated by '\t';

-- ④ Insert data into emP table with Chinese partition
insert into emp partition(provice ="Hubei")select * from stu;
Copy the code

References:

Illegal mix of collations reported hive error

Illegal mix of collations

Garbled characters displayed in Hive Chinese partition Comments Garbled characters are displayed