A character set assigns each character a unique number by which that character can be looked up. When we program, we work with characters constantly, and to use a character we must first put it into memory. What actually goes into memory, of course, is only the character's number, not the character itself.
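For example, in C a character variable holds nothing but this number; a minimal sketch, assuming the ASCII mapping (where 'A' is assigned the number 65):

```c
#include <stdio.h>

int main(void) {
    char c = 'A';             /* memory holds the number 65, not a glyph */
    printf("%c %d\n", c, c);  /* prints: A 65 */
    return 0;
}
```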

This raises a question: how do we get character numbers into memory?

For the ASCII character set, this is easy. ASCII contains only 128 characters, whose numbers fit in just seven bits; but since computers use the byte as their basic unit, for convenience we store each ASCII character in one byte (that is, eight bits). This wastes one bit, but makes reading and writing more efficient.
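A small sketch of this in C: every ASCII number lies between 0 and 127, so the high bit of the byte is always zero; that is the wasted bit mentioned above:

```c
#include <stdio.h>

int main(void) {
    unsigned char byte = 'z';                 /* 'z' = 122, fits in 7 bits */
    printf("high bit: %d\n", byte >> 7);      /* always 0 for ASCII (0..127) */
    printf("storage: %zu byte(s)\n", sizeof byte);  /* one byte per character */
    return 0;
}
```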

But with Unicode, the problem is not so simple. Unicode's code space holds more than a million code points (up to U+10FFFF), so the lowest-numbered characters fit in a single byte while the highest-numbered ones need three bytes. This leaves two options: allocate three bytes for every character (fixed width), or allocate one or two bytes for low-numbered characters and three bytes only for high-numbered ones (variable width).
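A minimal sketch of the variable-width idea, using a hypothetical bytes_needed helper. Note that this naive scheme only counts bytes; a real encoding such as UTF-8 must also make it possible to tell where each character's bytes begin and end:

```c
#include <stdio.h>

/* Hypothetical variable-length scheme (not real UTF-8): store a code
 * point in the fewest whole bytes that can hold its number. */
static int bytes_needed(unsigned long codepoint) {
    if (codepoint <= 0xFF)   return 1;  /* low-numbered: one byte   */
    if (codepoint <= 0xFFFF) return 2;  /* mid-range: two bytes     */
    return 3;                           /* up to U+10FFFF: three    */
}

int main(void) {
    printf("U+0041 'A':      %d byte(s)\n", bytes_needed(0x0041));   /* 1 */
    printf("U+4E2D (CJK):    %d byte(s)\n", bytes_needed(0x4E2D));   /* 2 */
    printf("U+1F600 (emoji): %d byte(s)\n", bytes_needed(0x1F600));  /* 3 */
    return 0;
}
```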