Transcoding problems in front-end and back-end data transmission

Data transfer transcoding

When data is transmitted, the browser encodes it. Suppose we have the data {"name": "test"}. If we send it with the GET method, the data is appended to the request URL, for example: localhost:8080/src/text.html?name=test.
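As a rough illustration of how a GET request carries the data, the sketch below appends a flat object to a URL as a query string. The helper name `buildGetUrl` and the `http://` scheme are assumptions made for the example; `URLSearchParams` is the standard API available in browsers and Node.js.

```javascript
// Hypothetical helper: append a flat object to a URL as a query string.
function buildGetUrl(baseUrl, data) {
  const query = new URLSearchParams(data).toString();
  return query ? baseUrl + '?' + query : baseUrl;
}

const url = buildGetUrl('http://localhost:8080/src/text.html', { name: 'test' });
console.log(url); // http://localhost:8080/src/text.html?name=test
```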

URIs themselves are restricted to ASCII, so characters outside the ASCII set are encoded before transmission. The encoding follows the same rules as encodeURI, but the charset is chosen by the browser, and different browsers may use different charsets (UTF-8, GBK). The encoded data is sent to the server, and the server decodes it with ISO-8859-1 by default. Back-end developers read the parameter with request.getParameter("name"); the value they get is already decoded, and the program cannot specify the charset used in that decoding. For GET request data, specifying the decoding rule with request.setCharacterEncoding("charset") has no effect.
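The mismatch described above can be simulated in Node.js. This is only a sketch: `percentDecodeToBytes` is a hypothetical helper, and the "server" here is just two different decodings of the same bytes, not a real servlet container.

```javascript
// Turn a percent-encoded string into the raw bytes it represents.
// Assumes the input contains only ASCII characters and valid %XX escapes.
function percentDecodeToBytes(encoded) {
  const bytes = [];
  for (let i = 0; i < encoded.length; i++) {
    if (encoded[i] === '%') {
      bytes.push(parseInt(encoded.slice(i + 1, i + 3), 16));
      i += 2;
    } else {
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return Buffer.from(bytes);
}

const sent = encodeURI('测试');          // browser side: UTF-8 percent-escapes
const raw = percentDecodeToBytes(sent);  // bytes the server obtains after percent-decoding

console.log(sent);                   // %E6%B5%8B%E8%AF%95
console.log(raw.toString('utf8'));   // 测试
console.log(raw.toString('latin1')); // garbled: six unrelated ISO-8859-1 characters
```

The six bytes are identical in both cases; only the charset used to interpret them differs.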

If we send the data with POST, the browser also encodes it. If we call setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"), the browser encodes the data with the given charset value. If the header is not set, the charset attribute of the page's meta tag decides. The encoded data is sent to the server, and the server decodes it with ISO-8859-1 by default. For POST request data, back-end developers can specify the decoding rule with request.setCharacterEncoding("charset").
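A minimal sketch of preparing such a POST request is shown here. `buildFormPost` is a hypothetical helper that only builds the request description and does not send anything; the body is encoded explicitly with encodeURIComponent so that it matches the declared charset.

```javascript
// Hypothetical helper: prepare a form-encoded POST request without sending it.
function buildFormPost(data) {
  const body = Object.keys(data)
    .map((key) => encodeURIComponent(key) + '=' + encodeURIComponent(data[key]))
    .join('&');
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8' },
    body: body,
  };
}

const request = buildFormPost({ name: '测试' });
console.log(request.body); // name=%E6%B5%8B%E8%AF%95
```

The returned object has the shape accepted as the options argument of fetch(); with XMLHttpRequest, the same header value would be passed to setRequestHeader and the body to send().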

By now you have probably spotted the cause of the garbled text: for data sent with GET, the encoding rules of the browser do not match the decoding rules of the server. If the data sent with GET contains Chinese or special characters, the front end should encode it with encodeURI() first, so that every character in the URL belongs to the ASCII set. This spares the browser from encoding it, and the rules of encodeURI() are controllable (it always produces UTF-8 escapes). The result is still affected by the charset attribute in the page's meta tag, because that attribute decides how the browser reads the string literals in the page source before encodeURI() sees them:

  1. When the charset attribute of the meta tag is utf-8:

        var data = '百度&%$#@baidu';
        console.log(encodeURI(data)); // %E7%99%BE%E5%BA%A6&%25$#@baidu
        console.log(encodeURIComponent(data)); // %E7%99%BE%E5%BA%A6%26%25%24%23%40baidu

  2. When the charset attribute of the meta tag is GBK:

        // The UTF-8 source is read as GBK, so the literal is already garbled before it is encoded.
        var data = '百度&%$#@baidu';
        console.log(encodeURI(data)); // %E9%90%A7%E6%83%A7%E5%AE%B3&%25$#@baidu
        console.log(encodeURIComponent(data)); // %E9%90%A7%E6%83%A7%E5%AE%B3%26%25%24%23%40baidu
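The two functions differ in which characters they leave alone, and that matters when the value goes into a query string. The sketch below assumes the value '百度&%$#@baidu' and parses the resulting URLs with the standard `URL` class to show what a receiver would see.

```javascript
const base = 'http://localhost:8080/src/text.html?name=';
const data = '百度&%$#@baidu';

// encodeURI leaves & $ # @ untouched, so & ends the parameter and # starts a fragment.
const loose = new URL(base + encodeURI(data));
console.log(loose.searchParams.get('name')); // 百度
console.log(loose.hash);                     // #@baidu

// encodeURIComponent escapes those characters, so the whole value survives.
const strict = new URL(base + encodeURIComponent(data));
console.log(strict.searchParams.get('name')); // 百度&%$#@baidu
```

For a single parameter value, encodeURIComponent is therefore the safer choice; encodeURI is meant for a complete URI whose separators must stay intact.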

On the back end, data that was decoded with ISO-8859-1 is generally converted back to bytes first and then decoded with the charset agreed on by the front end and back end. The decoding rule can also be set in the server's configuration file. For POST request data, request.setCharacterEncoding("charset") can specify the decoding rule, which keeps encoding and decoding consistent between the front end and back end.
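The "back to bytes, then decode again" step can be sketched as follows in Node.js. `recoverUtf8` is a hypothetical helper, and UTF-8 is assumed to be the agreed charset; in a Java servlet the usual counterpart is new String(value.getBytes("ISO-8859-1"), "UTF-8").

```javascript
// Hypothetical helper: undo a wrong ISO-8859-1 decode of UTF-8 bytes.
function recoverUtf8(garbled) {
  // Step 1: map each character back to the single byte it came from.
  const bytes = Buffer.from(garbled, 'latin1');
  // Step 2: decode those bytes with the charset the front end actually used.
  return bytes.toString('utf8');
}

// Simulate what the server sees: UTF-8 bytes read as ISO-8859-1.
const garbled = Buffer.from('测试', 'utf8').toString('latin1');
console.log(recoverUtf8(garbled)); // 测试
```

This works because ISO-8859-1 maps every byte to exactly one character, so the wrong decode loses no information.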

When the data is large, the structure is complex, or the business scenario or technical implementation demands it, we find that garbled text still occurs, for example:

  • JSON data fails to parse because of special characters.
  • XML data fails to parse because special characters break the XML format.
  • Some special characters are encoded inconsistently in some languages, and not every special character can be encoded.

If we encode with encodeURI or encodeURIComponent before sending to the back end, the inconsistent handling of some special characters means the data decoded on the back end can differ from what was sent. If MD5 verification is added, the data from the front end fails the check because the MD5 values differ, so it is neither parsed nor stored in the database.
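The MD5 mismatch is easy to reproduce, since MD5 is computed over bytes rather than characters. The sketch below uses Node.js's built-in `crypto` module; `md5Hex` is a hypothetical helper, and the "back end" is simulated by misreading UTF-8 bytes as ISO-8859-1.

```javascript
const nodeCrypto = require('crypto');

// MD5 works on bytes, so the digest depends on how the text became bytes.
function md5Hex(bytes) {
  return nodeCrypto.createHash('md5').update(bytes).digest('hex');
}

const text = '测试';
const frontEnd = md5Hex(Buffer.from(text, 'utf8'));

// Simulated back end: the bytes were decoded as ISO-8859-1, then hashed as UTF-8.
const misread = Buffer.from(text, 'utf8').toString('latin1');
const backEnd = md5Hex(Buffer.from(misread, 'utf8'));

console.log(frontEnd === backEnd); // false
```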

So is there an encoding that solves all of the problems above? Base64 is worth a look.

Base64 transcoding

Base64 encoding is a conversion from binary data to characters, and the result is affected by the charset attribute of the meta tag in the HTML page head. With a different charset, the text is converted to different binary data, so the final Base64 string is different too.

  1. When the charset attribute of the meta tag is utf-8:

        <!DOCTYPE html>
        <html lang="en">
        <head>
          <meta charset="utf-8">
          <title>base64</title>
        </head>
        <body>
          <script src="base64.min.js"></script>
          <script>
            var data = '百度&%$#@baidu';
            console.log(base64encode(data)); // fqYmJSQjQGJhaWR1
          </script>
        </body>
        </html>

  2. When the charset attribute of the meta tag is GBK:

        <!DOCTYPE html>
        <html lang="en">
        <head>
          <meta charset="GBK">
          <title>base64</title>
        </head>
        <body>
          <script src="base64.min.js"></script>
          <script>
            var data = '百度&%$#@baidu';
            console.log(base64encode(data)); // J+ezJiUkI0BiYWlkdQ==
          </script>
        </body>
        </html>
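To keep the Base64 result independent of how characters are mapped to bytes, one option is to fix the byte encoding explicitly before encoding. The sketch below is not the base64encode function from base64.min.js, whose internals are not shown in this article; it assumes Node.js's `Buffer` and UTF-8 as the agreed charset, and the helper names are hypothetical.

```javascript
// Hypothetical helpers: Base64 over explicit UTF-8 bytes (Node.js Buffer).
function toBase64Utf8(text) {
  return Buffer.from(text, 'utf8').toString('base64');
}

function fromBase64Utf8(encoded) {
  return Buffer.from(encoded, 'base64').toString('utf8');
}

const data = '百度&%$#@baidu';
const encoded = toBase64Utf8(data);
console.log(encoded);                 // 55m+5bqmJiUkI0BiYWlkdQ==
console.log(fromBase64Utf8(encoded)); // 百度&%$#@baidu
```

Note that Base64 output can contain +, / and =, so it still needs encodeURIComponent if it is placed in a URL.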

If you are interested in how Base64 works, an encyclopedia entry covers the principle.

Conclusion

Therefore, if your work involves transmitting complex content such as text box input, use Base64 to avoid the trouble caused by garbled Chinese characters and assorted special symbols. If you are only passing simple parameters in the URL, encodeURI and encodeURIComponent are enough.