As we all know, files can be stored in different encodings. For Chinese text, for example, the commonly used encodings are UTF-8, GB2312, and so on.

On Windows, new files may be prefixed with a few bytes at the beginning that identify the encoding.

For example, create a new text file, write the word Hello, and save it as UTF-8. Hello is five bytes, but the file size is eight bytes: the three extra bytes (EF BB BF) are the encoding prefix, known as the byte order mark (BOM). Windows 7 still behaves this way; Windows 10 has removed the encoding prefix, so on Windows 10 the file size is still 5 bytes. (It seems Microsoft itself has changed its approach.)
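To see whether a given file carries the UTF-8 prefix (the byte order mark, bytes EF BB BF), one can simply inspect its first three bytes. The following is a minimal sketch; the class name `BomCheck`, the helper `HasUtf8Bom`, and the fallback path `a.txt` are illustrative assumptions, not part of any library.

```csharp
using System;
using System.IO;

static class BomCheck
{
    // Returns true if the data starts with the UTF-8 byte order mark EF BB BF.
    // (Hypothetical helper, written for this example.)
    public static bool HasUtf8Bom(byte[] data)
    {
        return data.Length >= 3
            && data[0] == 0xEF
            && data[1] == 0xBB
            && data[2] == 0xBF;
    }

    public static void Main(string[] args)
    {
        // "a.txt" is only a placeholder path; pass a real path as the first argument.
        string path = args.Length > 0 ? args[0] : "a.txt";
        if (!File.Exists(path))
        {
            Console.WriteLine("File not found: " + path);
            return;
        }

        byte[] bytes = File.ReadAllBytes(path);
        Console.WriteLine(path + ": " + bytes.Length + " bytes, UTF-8 prefix: " + HasUtf8Bom(bytes));
    }
}
```

Reading the whole file is fine for small text files; for large files it would be enough to read only the first three bytes.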

We use StreamWriter to generate files.

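using System.IO;
using System.Text;

// Default encoding: no encoding prefix is written
using (StreamWriter sw = new StreamWriter("a.txt"))
{
    sw.Write("Hello");  // file is 5 bytes
}

// Encoding.UTF8: a 3-byte encoding prefix is written first
using (StreamWriter sw = new StreamWriter("b.txt", false, Encoding.UTF8))
{
    sw.Write("Hello");  // file is 8 bytes
}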

StreamWriter’s default encoding is also UTF-8, but without the encoding prefix, whereas Encoding.UTF8 adds it, so the file sizes are different.

UTF8Encoding has two private fields, emitUTF8Identifier and isThrowException, which are set from the constructor arguments when the object is initialized.

  • emitUTF8Identifier: indicates whether to add an encoding prefix
  • isThrowException: indicates whether an error is reported when an encoding error is encountered

Thus, the constructor arguments make it possible to control whether or not the encoding prefix is added.
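The effect of the first constructor argument can be observed through the public GetPreamble method, which returns the bytes the encoding would write as its prefix. This is a small sketch; the class name `PreambleDemo` and the helper `PreambleLength` are made up for the example.

```csharp
using System;
using System.Text;

static class PreambleDemo
{
    // Length of the prefix a UTF8Encoding would emit for the given flag.
    // (Hypothetical helper, written for this example.)
    public static int PreambleLength(bool emitIdentifier)
    {
        return new UTF8Encoding(emitIdentifier).GetPreamble().Length;
    }

    public static void Main()
    {
        Console.WriteLine(PreambleLength(true));   // prints 3
        Console.WriteLine(PreambleLength(false));  // prints 0

        // The three prefix bytes themselves
        byte[] preamble = new UTF8Encoding(true).GetPreamble();
        Console.WriteLine(BitConverter.ToString(preamble));  // prints EF-BB-BF
    }
}
```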

Encoding.UTF8 is defined as follows:

public static Encoding UTF8 {
    get {
        if (utf8Encoding == null) utf8Encoding = new UTF8Encoding(true);
        return utf8Encoding;
    }
}

The default encoding used by StreamWriter, in contrast, is created with emitUTF8Identifier = false:

internal static Encoding UTF8NoBOM {
    get {
        if (_UTF8NoBOM == null) {
            UTF8Encoding noBOM = new UTF8Encoding(false, true);
            _UTF8NoBOM = noBOM;
        }
        return _UTF8NoBOM;
    }
}

This is why the two files in the initial code are of different sizes.
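The difference can also be checked without touching the disk, by writing through a StreamWriter into a MemoryStream and counting the bytes. The sketch below makes the choice explicit by passing a UTF8Encoding instance instead of relying on the default; the class name `SizeDemo` and the helper `WrittenLength` are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class SizeDemo
{
    // Number of bytes a StreamWriter produces for the text with the given encoding.
    // (Hypothetical helper, written for this example.)
    public static int WrittenLength(Encoding encoding, string text)
    {
        using (var ms = new MemoryStream())
        {
            using (var sw = new StreamWriter(ms, encoding))
            {
                sw.Write(text);
            }
            // Disposing the writer closes the stream, but ToArray still works.
            return ms.ToArray().Length;
        }
    }

    public static void Main()
    {
        Console.WriteLine(WrittenLength(Encoding.UTF8, "Hello"));            // prints 8
        Console.WriteLine(WrittenLength(new UTF8Encoding(false), "Hello"));  // prints 5
    }
}
```

Passing `new UTF8Encoding(false)` explicitly gives the same prefix-free output as the StreamWriter default, while making the intent visible in the code.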