• pre-notify
  • The encoding supported by Node.js
  • gb2312
  • Utf8 and the murderer hidden in the night
  • BOM
    • The real face of BOM
    • To kill BOM.
  • Make Node support GBK: iconv-lite
  • string_decoder

pre-notify

This article will keep being improved, and you are welcome to help polish it.

The encoding supported by Node.js

When we read a file, we can specify which encoding to read it with:

const fs = require('fs');
fs.readFile('./test.txt', 'utf8', function (err, data) {
  console.log(data); // prints the decoded string
});

Note:

  • Node.js supports encodings such as UTF-8 and Base64 out of the box (along with hex, ASCII, Latin-1 and UTF-16LE), but not the GB family (GB2312, GBK, GB18030)
  • When no encoding is given, the data is returned as a Buffer
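The two notes above can be checked with a small sketch. The scratch file name `encoding-demo.txt` is made up for this illustration; `Buffer.isEncoding` reports whether Node.js accepts an encoding name.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Ask Node.js which of these encoding names it accepts.
const supported = {};
for (const name of ['utf8', 'base64', 'hex', 'gb2312', 'gbk']) {
  supported[name] = Buffer.isEncoding(name);
}
console.log(supported);

// Hypothetical scratch file, used only for this demonstration.
const file = path.join(os.tmpdir(), 'encoding-demo.txt');
fs.writeFileSync(file, 'hello', 'utf8');

const asString = fs.readFileSync(file, 'utf8'); // encoding given: a string
const asBuffer = fs.readFileSync(file);         // no encoding: a Buffer
console.log(typeof asString, Buffer.isBuffer(asBuffer));

fs.unlinkSync(file); // clean up the scratch file
```

The GB names come back as unsupported, which is why an extra package is needed for them.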

gb2312

When we create a .txt file on the desktop, its default encoding is GB2312. Since Node.js does not support that encoding out of the box, the Buffer we read through fs cannot be converted into the Chinese characters we expect.

[warning] The same is true even if you change the file's extension from .txt to .js.
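To see the failure without needing a real file, here is a minimal sketch. The bytes for "中文" in GB2312 are written out by hand (an assumption made for the demo, not something read from disk) and then decoded as UTF-8.

```javascript
// "中文" encoded as GB2312/GBK is the four bytes D6 D0 CE C4.
const gbBytes = Buffer.from([0xd6, 0xd0, 0xce, 0xc4]);

// Node.js has no GB decoder built into Buffer, so the closest we can do
// is decode as UTF-8, which does not match these bytes.
const text = gbBytes.toString('utf8');

console.log(text);                    // garbled output
console.log(text === '中文');          // false
console.log(text.includes('\uFFFD')); // true: U+FFFD marks undecodable bytes
```

The replacement character U+FFFD is what usually shows up as garbled text in the console.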

Utf8 and the murderer hidden in the night

We can save the text as UTF-8 instead.

Now the Chinese characters display correctly, but we find garbled characters at the very beginning.

Why is that?

Unicode has several encoding forms (UTF-8, UTF-16 and so on), so when the file is saved a signature is written at the start to mark it as UTF-8. That way, when we open the file in the editor again, the editor knows the encoding and the text is not garbled.

What?! The editor opens the file without any garbling, so why is the output garbled when Node.js reads the file and calls toString?

Hmm, let's try a different editor: WebStorm.

What?! No more gibberish? Why is that?!

Well… don't panic! Don't panic! There is only one truth!!

Hey! Look carefully: there is no garbled text, but a dot has appeared in its place.

So here is a reasonable inference, and WebStorm has cleared Node.js's name!

The mess was caused by VS Code, and Node.js should not take the blame for it.

Well… there is no garbled text in WebStorm, but there is still a stray dot at the start for no reason. Isn't that a problem too?

BOM

The real face of BOM

So what is this dot? As mentioned above, a file saved in a Unicode encoding gets a signature at the start, and this character is that signature: the BOM (byte order mark). In hexadecimal, its code point looks like this

0xFEFF

Read into a Buffer as UTF-8 bytes, it looks like this

<Buffer ef bb bf>
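The link between the code point 0xFEFF and the three bytes ef bb bf can be confirmed with a short sketch. The sample text "hi" is arbitrary.

```javascript
// The BOM is the single character U+FEFF.
const bom = '\uFEFF';

// Encoded as UTF-8 it becomes three bytes.
const bomBytes = Buffer.from(bom, 'utf8');
console.log(bomBytes); // <Buffer ef bb bf>

// A file saved "with signature" is those three bytes followed by the text.
const fileBytes = Buffer.concat([bomBytes, Buffer.from('hi', 'utf8')]);

// toString('utf8') keeps the BOM as the first character of the string.
const decoded = fileBytes.toString('utf8');
console.log(decoded.length);                   // 3: BOM + 'h' + 'i'
console.log(decoded.charCodeAt(0) === 0xfeff); // true
```

This is why the check differs by input type: three bytes for a Buffer, one character for a string.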

Well… now that we know what it is, getting rid of it is just a matter of a few lines of code.

To kill BOM.

const fs = require('fs');
const path = require('path');

let bf2 = fs.readFileSync(path.join(__dirname, 'bom.txt'), 'utf8');
let result = stripBOM(bf2);
console.log(result.toString());
// console.log(result);

function stripBOM(content) {
  if (Buffer.isBuffer(content)) {
    // Buffer: the UTF-8 BOM is the three bytes EF BB BF
    if (content[0] === 0xEF && content[1] === 0xBB && content[2] === 0xBF) {
      return content.slice(3);
    }
    return content;
  } else { // it is a string: the BOM is the single character U+FEFF
    if (content.charCodeAt(0) === 0xFEFF) {
      return content.slice(1);
    }
    return content;
  }
}
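As an aside, and not the approach used in this article: the WHATWG TextDecoder that ships with Node.js drops a leading BOM by default (its ignoreBOM option defaults to false). A minimal sketch comparing it with Buffer's toString, using hand-written bytes:

```javascript
// BOM (EF BB BF) followed by "hi", written out by hand for the demo.
const withBom = Buffer.from([0xef, 0xbb, 0xbf, 0x68, 0x69]);

// Buffer#toString keeps the BOM as a leading U+FEFF character.
const viaToString = withBom.toString('utf8');

// TextDecoder removes the BOM unless { ignoreBOM: true } is passed.
const viaTextDecoder = new TextDecoder('utf-8').decode(withBom);

console.log(viaToString.length);    // 3
console.log(viaTextDecoder.length); // 2
console.log(viaTextDecoder);        // hi
```

Writing stripBOM by hand still has the advantage of working on strings that have already been decoded.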

Make Node support GBK: iconv-lite

Well, this one is really easy: a package takes care of it. It is installed with npm install iconv-lite.

let iconv = require('iconv-lite');
// buffer: the Buffer to convert; 'gbk': the encoding its bytes are in
let string = iconv.decode(buffer, 'gbk');
console.log(string);

string_decoder

We usually encode text as UTF-8, where a common Chinese character takes 3 bytes. But we do not always receive the complete data in one go: we might get 2 bytes first and then 4 more, in which case decoding the first chunk on its own produces garbled output.

So we need something to keep watch for us. Only when all 3 bytes of a Chinese character have arrived is it output; otherwise the incomplete bytes are held back until the rest arrives. That is one of the things string_decoder does.

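let buffer = Buffer.from('阿弥陀佛'); // "Amitabha": 4 Chinese characters, 12 bytes in UTF-8
let buff1 = buffer.slice(0, 5); // one whole character plus 2 bytes of the next
let buff2 = buffer.slice(5);
let {StringDecoder} = require('string_decoder');
let sd = new StringDecoder(); // defaults to 'utf8'
console.log(sd.write(buff1).toString()); // 阿 (the 2 leftover bytes are held back)
console.log(sd.write(buff2).toString()); // 弥陀佛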
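For contrast, here is a rough sketch of the "2 bytes first, then 4" case described above, decoded once chunk by chunk with toString and once through StringDecoder. The sample text "中文" and the split point are chosen just for this illustration.

```javascript
const { StringDecoder } = require('string_decoder');

const whole = Buffer.from('中文', 'utf8');           // 6 bytes: 3 per character
const chunks = [whole.slice(0, 2), whole.slice(2)]; // 2 bytes, then 4 bytes

// Naive: decode each chunk on its own. The split character is lost.
const naive = chunks.map((chunk) => chunk.toString('utf8')).join('');

// StringDecoder: incomplete bytes are held until the rest arrives.
const decoder = new StringDecoder('utf8');
const pieces = chunks.map((chunk) => decoder.write(chunk));
const decoded = pieces.join('') + decoder.end(); // end() flushes any leftovers

console.log(naive.includes('\uFFFD')); // true: garbled
console.log(pieces);                   // [ '', '中文' ]
console.log(decoded === '中文');        // true
```

The first write returns an empty string because the decoder is still waiting for the third byte of "中".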