Today I came across an interesting problem online and helped someone solve it.

Requirement

Chinese text has to be inserted into a MySQL database whose encoding is Latin1. Another system then reads the stored Latin1 string, transcodes it into GBK, and sends the result as SMS content.

Simple solution

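import (
	"golang.org/x/text/encoding/charmap"
	"golang.org/x/text/encoding/simplifiedchinese"
)

// Convert encodes src (UTF-8) as GBK, then reinterprets the GBK bytes as
// ISO-8859-1 so that the database driver can store them in a Latin1 column.
func Convert(src string) (string, error) {
	// Step 1: UTF-8 -> GBK byte stream.
	gbk, err := simplifiedchinese.GBK.NewEncoder().Bytes([]byte(src))
	if err != nil {
		return "", err
	}
	// Steps 2 and 3: treat each GBK byte as one ISO-8859-1 character and
	// re-express the result as UTF-8.
	latin1, err := charmap.ISO8859_1.NewDecoder().Bytes(gbk)
	if err != nil {
		return "", err
	}
	return string(latin1), nil
}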

Analysis

Latin1 is ISO-8859-1. Here is an introductory paragraph quoted from Baidu Encyclopedia:

Because the ISO-8859-1 encoding range uses all the space within a single byte, a system that supports ISO-8859-1 can transmit and store byte streams of any other encoding without discarding anything. In other words, it’s okay to treat byte streams of any other encoding as ISO-8859-1 encoding. This is an important feature, and the MySQL database defaults to Latin1 to take advantage of this feature. ASCII encoding is a 7-bit container, and ISO-8859-1 encoding is an 8-bit container.
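
The "8-bit container" property can be checked with a small sketch that uses only the standard library. The helpers latin1ToUTF8 and utf8ToLatin1 are hypothetical names written for this illustration. They rely on the fact that ISO-8859-1 byte values coincide with Unicode code points U+0000 to U+00FF, which is the same job charmap.ISO8859_1 does in the post's code.

```go
package main

import "fmt"

// latin1ToUTF8 decodes ISO-8859-1: each byte value is the code point itself.
func latin1ToUTF8(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

// utf8ToLatin1 is the inverse. ok is false if some rune does not fit into a
// single byte, which means it has no ISO-8859-1 representation.
func utf8ToLatin1(s string) (out []byte, ok bool) {
	for _, r := range s {
		if r > 0xFF {
			return nil, false
		}
		out = append(out, byte(r))
	}
	return out, true
}

func main() {
	// Build every possible byte value, 0x00 through 0xFF.
	all := make([]byte, 256)
	for i := range all {
		all[i] = byte(i)
	}
	back, ok := utf8ToLatin1(latin1ToUTF8(all))
	// Every byte survives decoding and re-encoding unchanged.
	fmt.Println(ok && string(back) == string(all))
}
```

Because no byte value is rejected or altered, any byte stream, GBK included, can pass through an ISO-8859-1 layer intact.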

First, a rule for dealing with encodings: make sure writing and reading use the same set of rules.

Following this rule, the way to handle a database in the middle of the pipeline that does not support Chinese depends first on how the other system reads the data. It transcodes the Latin1-encoded string into GBK and uses the result as the SMS content. Our task is therefore to encode the SMS content as GBK, forcibly reinterpret those bytes as ISO-8859-1, and store the result in the database. With the task clear, here is how to implement it.

  1. UTF8 -> GBK. Go strings are UTF8, so first transcode to GBK. Note that Encoder.Bytes() is the better choice here than Encoder.String(): the result is a raw GBK byte stream, not UTF8 text. A Go string can hold those bytes, but anything downstream that treats it as UTF8 will garble it, and the original GBK stream cannot be restored from the garbled result.

  2. GBK byte stream -> ISO-8859-1 byte stream, by force. How? By doing nothing: the same bytes are simply treated as ISO-8859-1.

  3. ISO-8859-1 byte stream -> UTF8 string. I am not sure how to submit a []byte through SQL, so the conservative approach is to transcode from ISO-8859-1 to UTF8 and let the database driver convert the UTF8 back to ISO-8859-1 when it submits the data.

One more point worth mentioning: ISO-8859-1 does not support Chinese, so if you submit UTF8 Chinese text directly, the database driver will replace the Chinese characters with ?.