I am Three Diamonds, one of your companions in “Technology Galaxy”, here to keep learning alongside you for a lifetime. A like is motivation, a follow is recognition, a comment is love! See you next time 👋!

This note is based on the summary of teacher Winter’s “Relearning front end”.

What is the smallest structure in JavaScript? You might think of a keyword, the number 123, or a string. We’ll start with these smallest units: literals and runtime types.

Atom

There are two parts here: Grammar and Runtime.

Grammar

  • Literal
  • Variable
  • Keywords
  • Whitespace
  • Line Terminator

These are the smallest units that make up JavaScript source code: literals such as 123, 1.1, and 2.2 for numbers; variables; keywords such as if and else; and symbols, whitespace, and line terminators. Whitespace and line terminators have no effect on meaning, but they make the code easier to read.

Runtime

  • Types
  • Execution Context

Each element of the grammar is reflected at runtime. Literals can be written in five or six different ways, corresponding to several of JavaScript’s seven basic types. Variables correspond to storage in the Execution Context at runtime. Ultimately, all of this syntax produces changes at runtime.

Types in JavaScript

  • Number
    • A concept we have known since elementary school
    • But a Number in JavaScript is not quite the number you learned about in elementary school
  • String
    • A concept you meet as soon as you learn to program
  • Boolean
    • Represents a truth value
    • true and false in computing are abstractions of truth and falsehood in everyday life
  • Object
    • Objects have a long history
  • Null
    • It is a value, but the value is empty
    • There is a design bug: typeof null returns "object". We have to live with it because it will not be fixed.
  • Undefined
    • The value has never been defined
  • Symbol
    • The newest basic type
    • It partly takes over the role of String
    • Can be used as a property key in an Object
    • The biggest difference from String: two strings with the same content are the same key everywhere. Anyone who can guess the content of a string key can always retrieve that property of the object, no matter what prefixes or suffixes were added to make it hard to guess.
    • A Symbol is different: without the Symbol value itself, there is no way to reach the property by guessing. This is a distinctive feature of JavaScript.
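The difference between String keys and Symbol keys described above can be sketched as follows. This is a minimal illustration; the object and key names (`user`, `account`, `secret`) are made up for the example.

```javascript
// A string key can be reached by anyone who knows (or guesses) its text.
const user = { name: 'winter' };
const guessedKey = 'na' + 'me';
const viaGuess = user[guessedKey]; // an equal string reaches the same property

// A Symbol key can only be reached through the Symbol value itself.
const secret = Symbol('name'); // hypothetical key for this example
const account = { [secret]: 'hidden value' };

// A second Symbol with the same description is still a different key.
const sameDescription = Symbol('name');
const viaOriginal = account[secret];
const viaLookalike = account[sameDescription];

console.log(viaGuess, viaOriginal, viaLookalike);
console.log(Object.keys(account).length); // Symbol keys are skipped by Object.keys
```

Note that code holding a reference to the object can still list its Symbol keys with Object.getOwnPropertySymbols, so this is protection against guessing rather than true privacy.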

Front-end engineers often mix up Null and Undefined. As a convention, we do not assign undefined to anything; we only check whether a variable’s value is undefined. JavaScript does allow assigning undefined, but we recommend using null rather than undefined whenever you need to assign an empty value.
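A small sketch of this convention, together with the typeof quirk mentioned in the type list. The variable names are hypothetical.

```javascript
// typeof reports 'object' for null, which is the long-standing design bug.
const typeOfNull = typeof null;
const typeOfUndefined = typeof undefined;

let notAssigned;               // declared but never assigned: stays undefined
let intentionallyEmpty = null; // explicit "empty" assignment uses null

// We only *check* for undefined; we do not assign it.
const isUndefined = notAssigned === undefined;
const isNull = intentionallyEmpty === null;

console.log(typeOfNull, typeOfUndefined, isUndefined, isNull);
```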

There are five basic types used in real programming: Number, String, Boolean, Object, and Null.

The Number type

In everyday usage, a “Number” is just a number. In JavaScript, a Number corresponds to a decimal with a finite number of digits.

By definition, Number is a double-precision floating-point number. Our understanding of Number is often superficial, so we need to understand the floating-point standard defined by IEEE 754 before we can really understand Number in JavaScript.

A float is a floating-point number, meaning its decimal point can float back and forth. The basic idea is to split a number into an exponent and significant digits. The significant digits determine the precision of the representation, and the exponent determines the range it can represent.

A floating-point number also has a sign, which takes one bit: 0 is positive, 1 is negative.

A floating-point number as defined by IEEE 754 consists of:

  • Sign (1 bit): whether the number is positive or negative
  • Exponent (11 bits)
  • Fraction (52 bits): the precision bits
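To see these three fields for a concrete value, here is a minimal sketch that reads the raw 64 bits of a Number through a DataView. The helper name `doubleToBits` and the sample value -5.75 are assumptions for illustration. The stored exponent is offset by a bias of 1023, which is part of IEEE 754 but not discussed in this note.

```javascript
// Split a Number into its IEEE 754 sign, exponent, and fraction bit strings.
function doubleToBits(value) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value); // big-endian by default, so byte 0 holds the sign bit
  let bits = '';
  for (let i = 0; i < 8; i++) {
    bits += view.getUint8(i).toString(2).padStart(8, '0');
  }
  return {
    sign: bits.slice(0, 1),      // 1 bit
    exponent: bits.slice(1, 12), // 11 bits, stored with a bias of 1023
    fraction: bits.slice(12),    // 52 bits
  };
}

// -5.75 is -1.0111 (binary) times 2^2, so the stored exponent is 1023 + 2.
const parts = doubleToBits(-5.75);
console.log(parts);
```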

The syntax for Number in the 2018 standard has four forms:

  • Decimal Literal
    • 0, 0., .2, 1e3
  • Binary Integer Literal
    • 0b111: after the 0b prefix, only the digits 0 and 1 are allowed
  • Octal Integer Literal
    • 0o10: after the 0o prefix, only the digits 0 to 7 are allowed
  • Hex Integer Literal
    • 0xFF: after the 0x prefix, the digits 0 to 9 and the letters A to F are allowed
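The four literal forms can be tried directly. This is a short sketch; the variable names are only for illustration.

```javascript
// Decimal literals: integer, trailing dot, leading dot, and exponent form.
const decimals = [0, 0., .2, 1e3];

// Integer literals in other bases all produce ordinary Number values.
const binary = 0b111; // digits 0 and 1 only
const octal = 0o10;   // digits 0 to 7 only
const hex = 0xFF;     // digits 0 to 9 and A to F

console.log(decimals, binary, octal, hex);
```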

Decimal case

Suppose we have the floating-point number 205.75. How is it written out in decimal?

In mathematical notation: $205.75 = 2 \times 100 + 0 \times 10 + 5 \times 1 + 7 \times 0.1 + 5 \times 0.01$

In positional (power-of-ten) form: $205.75 = 2 \times 10^2 + 0 \times 10^1 + 5 \times 10^0 + 7 \times 10^{-1} + 5 \times 10^{-2}$

Both formulas give 200 + 0 + 5 + 0.7 + 0.05 = 205.75.

Binary case

Every digit in the IEEE 754 layout above is binary, 64 bits in total. How is 5.75 written in binary?

Binary representation in a computer: $5.75 = (1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0) + (1 \times 2^{-1} + 1 \times 2^{-2})$
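The same expansion can be checked in JavaScript. This is a minimal sketch using Number.prototype.toString with radix 2.

```javascript
// Rebuild 5.75 from its binary place values.
const fromPowers =
  (1 * 2 ** 2 + 0 * 2 ** 1 + 1 * 2 ** 0) + (1 * 2 ** -1 + 1 * 2 ** -2);

// Ask the engine for the binary digits of 5.75.
const binaryText = (5.75).toString(2);

console.log(fromPowers, binaryText);
```

Because 5.75 is a sum of a few powers of two, it is stored exactly, unlike 0.1.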

There is a classic floating-point problem with 0.1. What is going on? Let’s write it in binary and see.

First, let’s list some binary fraction place values, which we’ll need when representing 0.1.

$$
\begin{aligned}
\frac12 &= 0.5 \\
\frac14 &= 0.25 \\
\frac18 &= 0.125 \\
\frac1{16} &= 0.0625 \\
\frac1{32} &= 0.03125 \\
\frac1{64} &= 0.015625 \\
\frac1{128} &= 0.0078125 \\
\frac1{256} &= 0.00390625 \\
\frac1{512} &= 0.001953125 \\
\frac1{1024} &= 0.0009765625
\end{aligned}
$$

When we write a fraction in binary, we are adding up powers of two to build the number. Since we start right after the decimal point, the first place is $2^{-1}$ ($\frac12$). We then use the table above to pick the values whose sum is equal to, or as close as possible to, 0.1.

  • The first three values ($\frac12$, $\frac14$, $\frac18$) are all greater than 0.1, so each gets a coefficient of 0. The first value we can use is $\frac1{16}$. Since $0.1 - \frac1{16} = 0.0375$, we are still 0.0375 short of 0.1.
  • So we keep looking for values that keep the running sum less than or equal to 0.1.
  • The next one, $\frac1{32}$, fits, and the sum becomes 0.09375.

$$0.1 \approx 0 \times \frac12 + 0 \times \frac14 + 0 \times \frac18 + 1 \times \frac1{16} + 1 \times \frac1{32} = 0.09375$$

  • To get closer to 0.1 we keep going. First, $0.1 - 0.09375 = 0.00625$, so we need the next power of two that is less than or equal to this remainder.
  • The next two values, $\frac1{64}$ and $\frac1{128}$, are both greater than 0.00625; $\frac1{256}$ fits, and so does $\frac1{512}$ after it.

$$0.1 \approx 0 \times \frac12 + 0 \times \frac14 + 0 \times \frac18 + 1 \times \frac1{16} + 1 \times \frac1{32} + 0 \times \frac1{64} + 0 \times \frac1{128} + 1 \times \frac1{256} + 1 \times \frac1{512} = 0.099609375$$

Written as binary digits, the coefficients in the formula above give 0.000110011.

If we keep going and adding terms, we find that the binary digits repeat the pattern 0011 forever. But IEEE 754 double precision has only 52 fraction bits, so even if we fill all 52 we can never add up to exactly 0.1; we can only get closer and closer. That is why 0.1 inevitably loses some precision in binary, on the order of one epsilon. (This explanation follows a video by the uploader “Code speaks”; those who want more detail can watch that video.)
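The greedy search described above can be sketched in a few lines. The function name `binaryFractionDigits` is hypothetical; it simply walks the powers of two and keeps each one that still fits. The second half shows the usual consequence, where comparing with Number.EPSILON is one common workaround rather than the only one.

```javascript
// Greedily pick powers of two (1/2, 1/4, 1/8, ...) that fit into the target.
function binaryFractionDigits(target, digitCount) {
  let remaining = target;
  let digits = '';
  for (let i = 1; i <= digitCount; i++) {
    const power = 2 ** -i;
    if (power <= remaining) {
      digits += '1';
      remaining -= power;
    } else {
      digits += '0';
    }
  }
  return digits;
}

// The first nine digits after the binary point of 0.1.
const firstNine = binaryFractionDigits(0.1, 9);

// Because 0.1 and 0.2 are both approximations, their sum misses 0.3 slightly.
const sum = 0.1 + 0.2;
const strictlyEqual = sum === 0.3;
const closeEnough = Math.abs(sum - 0.3) < Number.EPSILON;

console.log(firstNine, sum, strictlyEqual, closeEnough);
```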

The String type

To most people a string is just text: words you can read and write. In code we wrap it in quotation marks to make it a String. But there are a few things about String worth understanding in depth.

First, some background on characters. A string is a sequence of characters.

A character is an abstract concept, and a computer has no direct way to represent it. When we see the letter a or a Chinese character on screen, what we see is a shape, which is a glyph. The character is the abstract notion behind it; combining it with a font produces the visible image.

How does a computer represent a character? It uses a code point. A code point is nothing complicated, it is just a number. For example, if we decide that 97 stands for a, then with the font information we can look up 97, find the glyph for a, and draw it on the screen.
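The mapping between characters and code points can be inspected with the built-in codePointAt and String.fromCodePoint. This is a short sketch; the sample characters are arbitrary.

```javascript
// From character to code point, and back again.
const codePointOfA = 'a'.codePointAt(0);      // lowercase a
const backToChar = String.fromCodePoint(97);

// A Chinese character has a much larger code point.
const codePointOfZhong = '中'.codePointAt(0);

console.log(codePointOfA, backToChar, codePointOfZhong);
```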

So how does the computer store the number 97? The basic unit of storage is the byte. Digits and English letters need only one byte each, but one byte is not enough for a Chinese character. To understand String, we need to understand characters, code points, and encodings.

Character Set (String)

Here we discuss the origin and characteristics of these character sets.

  • ASCII
    • Most of you have heard of ASCII
    • In the early days there were few characters, so people loosely referred to any character encoding as ASCII
    • But that is not accurate. ASCII has only 128 characters
    • These are the 128 characters most commonly used in computing: 26 uppercase letters, 26 lowercase letters, the digits 0-9, punctuation and other special characters, tabs, newlines, and control characters. They occupy code points 0-127
    • ASCII was developed in the United States, where computers were first invented, so it only covers English
  • Unicode
    • Unicode is the later standard that gathers the characters of the whole world into one large set
    • That is why it is also called the unified code
    • The number of Unicode characters is huge, so it is divided into blocks, each holding the characters of a particular script or country
    • Early on, people thought the range 0000 to FFFF, that is, two bytes, would be enough, but it turned out not to be
    • This led to some design problems later
  • UCS
    • UCS came about when Unicode was combined with another standardization organization
    • UCS also has a character set ranging from 0000 to FFFF
  • GB (Guobiao, the Chinese national standard): it has gone through several generations
    • The versions are GB2312, GBK (GB13000), and GB18030
    • GB2312 was the first version and is still widely used
    • GBK is a later extension; at the time it too was thought to be enough
    • Later came a more complete version, GB18030, which fills in all the missing characters
    • Code points in GB do not match the code points in Unicode
    • But almost every encoding in the world is compatible with ASCII
    • GB covers a smaller range than Unicode, so the same Chinese text takes less space in a GB encoding than in Unicode
  • ISO-8859
    • Like GB, some Eastern European countries designed ASCII-compatible extensions for their own languages
    • The 8859 parts are each compatible with ASCII but not with one another, so it is not one unified standard
    • China did not submit GB to ISO, so there is no Chinese part in ISO-8859
  • BIG5
    • BIG5 is similar to GB; it is the encoding commonly used in Taiwan, known as “Big Five”
    • When we were kids, before Unicode was widespread, Taiwanese games were unplayable because all the text showed up garbled
    • That is because they used Big5 to represent characters
    • The ISO-8859 series and Big5 are alike in that each is an encoding for a specific national language
    • Their code points overlap, so they are incompatible with each other; text shows up garbled until you switch to the right encoding

Character Encoding

Because every ASCII code point fits in one byte, its encoding is identical to its code point, and a code unit cannot be smaller than one byte. So ASCII has no encoding problem, but GB and Unicode do. Because Unicode gathers characters from every country, it has several different encodings.

UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode, and it is also a prefix code. The 8 means that its code unit is 8 bits, that is, one byte.

Let’s look at the principles behind UTF-8 by working through how a string is converted to UTF-8.

Before converting a string, we need to know the length of its UTF-8 encoding, which is determined by the size of each character’s code point.

In JavaScript, we can use charCodeAt to get a character’s code. We will see that the codes of English letters fit in 1 byte, while the codes of Chinese characters need 2 bytes.

In the original UTF-8 design, a single character can take up to 6 bytes (the current standard, RFC 3629, caps it at 4 bytes because Unicode code points stop at 0x10FFFF). The number of bytes for each code point range is:

  • 1 byte: Unicode code point 0-127
  • 2 bytes: Unicode code point 128-2047
  • 3 bytes: Unicode code point 2048-0xFFFF
  • 4 bytes: Unicode code point 65536-0x1FFFFF
  • 5 bytes: Unicode code point 0x200000-0x3FFFFFF
  • 6 bytes: Unicode code point 0x4000000-0x7FFFFFFF

English letters and other ASCII characters have Unicode code points 0-127, so they take the same space in Unicode and in UTF-8: one byte each. The code points of commonly used Chinese characters fall in 0x2E80-0x9FFF, so a Chinese character in this range takes 3 bytes in UTF-8.
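The range table above translates directly into a lookup function. This is a sketch; `utf8ByteLength` is a hypothetical helper that follows the table as written, including the 5- and 6-byte rows that do not occur for valid Unicode code points today. The result is compared with the built-in TextEncoder.

```javascript
// Number of UTF-8 bytes for a code point, following the range table.
function utf8ByteLength(codePoint) {
  if (codePoint <= 0x7f) return 1;
  if (codePoint <= 0x7ff) return 2;
  if (codePoint <= 0xffff) return 3;
  if (codePoint <= 0x1fffff) return 4;
  if (codePoint <= 0x3ffffff) return 5; // historical range only
  return 6;                             // historical range only
}

const samples = ['a', '\u00e9', '中']; // ASCII letter, accented e, Chinese character
const fromTable = samples.map((ch) => utf8ByteLength(ch.codePointAt(0)));
const fromEncoder = samples.map((ch) => new TextEncoder().encode(ch).length);

console.log(fromTable, fromEncoder);
```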

Converting a character to UTF-8

1. Get the Unicode value of the character

let str = '中';
let charCode = str.charCodeAt(0); // UTF-16 code unit, equal to the code point for this character
console.log(charCode); // => 20013

So the code point of the Chinese character 中 is 20013.

2. Determine the UTF-8 length

In the previous step we got the character’s code point; now we look up how many bytes it needs in the range table above. 20013 falls in the range 2048-0xFFFF, so it takes 3 bytes.

3. Fill in the bits

To convert to UTF-8, we fill the bits of the code point into a fixed byte template. First, the templates for each length:

  • 1 byte: 0xxxxxxx
  • 2 bytes: 110xxxxx 10xxxxxx
  • 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
  • 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  • 5 bytes: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
  • 6 bytes: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

Each x marks a position to be filled with a bit of the code point. The one-byte form is special: it starts with a single 0 control bit followed by seven payload bits. Every other form follows a pattern: the first byte starts with n ones followed by a zero, where n is the total number of bytes. With two bytes the first byte starts with 110, with three bytes it starts with 1110, and so on. Every following byte starts with 10, followed by six payload bits.

Now that we know the templates, how do we fill in the bits for the character 中 to get its UTF-8 encoding?

  1. First, the charCode of 中 is 20013
  2. 20013 falls in the range 2048-0xFFFF, so 中 takes three bytes
  3. The 3-byte UTF-8 template is 1110xxxx 10xxxxxx 10xxxxxx
  4. Convert 20013 to binary: 01001110 00101101
  5. Place this bit sequence, in order, into the x positions of the three-byte template
  6. The result is 11100100 10111000 10101101, where the bits after each prefix (1110, 10, 10) are the binary digits of 20013
  7. Finally, convert the three bytes 11100100 10111000 10101101 to hexadecimal: 0xE4 0xB8 0xAD
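The steps above can be sketched with bit operations. `encodeThreeByteUtf8` is a hypothetical helper that handles only the three-byte case (code points 2048-0xFFFF); it is not a general encoder.

```javascript
// Fill a code point in the range 2048-0xFFFF into 1110xxxx 10xxxxxx 10xxxxxx.
function encodeThreeByteUtf8(codePoint) {
  const byte1 = 0b11100000 | (codePoint >> 12);             // top 4 bits
  const byte2 = 0b10000000 | ((codePoint >> 6) & 0b111111); // middle 6 bits
  const byte3 = 0b10000000 | (codePoint & 0b111111);        // low 6 bits
  return [byte1, byte2, byte3];
}

const bytes = encodeThreeByteUtf8('中'.codePointAt(0));
const hexBytes = bytes.map((b) => b.toString(16));

// Cross-check against the platform's own UTF-8 encoder.
const expected = Array.from(new TextEncoder().encode('中'));

console.log(hexBytes, expected);
```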

To check that our conversion is correct, we can use Buffer in Node.js.

const buffer = Buffer.from('中'); // new Buffer() is deprecated
console.log(buffer.length); // => 3
console.log(buffer); // => <Buffer e4 b8 ad>
// We get the three bytes 0xe4 0xb8 0xad

This section draws on “UTF-8 Encoding through JavaScript” by Zhang Yatao.

String syntax (Grammar)

Early JavaScript supported two ways of writing strings:

  • Double-quoted string: "abc"
  • Single-quoted string: 'abc'

There is no real difference between the two. The only difference is that inside a double-quoted string a single quote is an ordinary character, and inside a single-quoted string a double quote is an ordinary character.

Some special characters need an escape inside quotation marks: a line break is written \n and a tab is written \t. To use a double quote inside a double-quoted string, put a backslash in front of it: \". A backslash in front of a character strips its special meaning, so the backslash itself is written \\.

These rules are the “micro-grammar” of strings.

Later versions of JavaScript added the backquote (backtick) form: `abc`. The backquote is the key to the left of 1 on the keyboard. Because the backquote rarely appears in ordinary text, it works very well as a delimiter.

Backquoted strings are more powerful than quoted ones: they can contain raw line breaks, spaces, and tabs. In particular, you can write ${expression} directly in the string to interpolate a value. As long as we do not use a backquote inside, we can put anything we want in there.
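A short sketch of these properties; the variable names are only for illustration.

```javascript
const language = 'JavaScript';

// A raw line break, both kinds of quotes, and an interpolated value.
const message = `Hello, ${language}!
It's fine to mix 'single' and "double" quotes here.`;

console.log(message);
```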

So how does the JavaScript engine tokenize a backquoted string and pick out the variables inside?

`ab${x}abc${y}abc`

The JavaScript engine splits this into three pieces: `ab${ and }abc${ and }abc`

  • On the surface, the backquoted string looks like a single unit
  • But to the JavaScript engine, a backquote followed by some text and then ${ forms one token; it works like an opening bracket that introduces the string
  • The middle piece has the same structure: a } followed by some text and then ${, which again behaves like a pair of brackets
  • So the backquote syntax actually produces four different kinds of token: a start, a middle, an end, and a plain backquoted string with no inserted variables. The template string syntax is built from these four tokens.
  • From the engine’s point of view it is the other way around: these tokens enclose bare JavaScript expressions, and everything else is the string itself.
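The tokenizer itself is not observable from script, but a tagged template gives a runtime view of the same split: the tag function receives the literal text pieces and the expression values separately. This is only an analogy, and `showPieces` is a hypothetical tag written for this example.

```javascript
// A tag function receives the text pieces and the interpolated values apart.
function showPieces(strings, ...values) {
  return { strings: [...strings], values };
}

const x = 1;
const y = 2;
const pieces = showPieces`ab${x}abc${y}abc`;

console.log(pieces.strings); // the three text pieces between the expressions
console.log(pieces.values);  // the evaluated expressions
```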

Example: here we try to use regular expressions to match single- and double-quoted strings:

// Regular expression for a double-quoted string
"(?:[^"\n\\\r\u2028\u2029]|\\(?:['"\\bfnrtv\n\r\u2028\u2029]|\r\n)|\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}|\\[^0-9ux'"\\bfnrtv\n\\\r\u2028\u2029])*"
// Regular expression for a single-quoted string
'(?:[^'\n\\\r\u2028\u2029]|\\(?:['"\\bfnrtv\n\r\u2028\u2029]|\r\n)|\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}|\\[^0-9ux'"\\bfnrtv\n\\\r\u2028\u2029])*'
  • First, the characters excluded from the string body include the line feed, the backslash, and the carriage return (\n, \\, \r)
  • \u2028 and \u2029 are the line separator and the paragraph separator
  • \x and \u are two ways of writing escapes
  • There is no need to memorize all of this; just remember the special escape characters b, f, n, r, t, v and the considerations above.
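As a quick check, the double-quoted pattern can be written as a JavaScript regular expression literal and tried on a few inputs. The ^ and $ anchors and the sample strings are additions for this sketch, so that the whole input has to be one string literal.

```javascript
// The double-quoted string pattern, anchored to match a whole input.
const doubleQuoted =
  /^"(?:[^"\n\\\r\u2028\u2029]|\\(?:['"\\bfnrtv\n\r\u2028\u2029]|\r\n)|\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}|\\[^0-9ux'"\\bfnrtv\n\\\r\u2028\u2029])*"$/;

const plain = doubleQuoted.test('"abc"');          // ordinary characters
const escapedQuote = doubleQuoted.test('"a\\"b"'); // contains an escaped quote: "a\"b"
const unterminated = doubleQuoted.test('"abc');    // missing the closing quote
const strayQuote = doubleQuoted.test('"a"b"');     // an unescaped quote in the middle

console.log(plain, escapedQuote, unterminated, strayQuote);
```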

The Boolean type

It is just true and false, a really simple type, at least when it is not combined with computation.

Null and Undefined

Both types are common, and both express an empty value. The difference is:

  • Null means there is a value, but the value is empty
  • Undefined means, semantically, that no one has ever set this value, so it is not defined

Note that null is a keyword, but undefined is not.

undefined is a global variable, and in early versions of JavaScript it could be reassigned. If we set undefined to true, for example, a whole series of things would go wrong. People are rarely that mischievous, and anyone who is tends to get fired.

Newer versions of JavaScript do not allow the global undefined to be changed, but it can still be redefined inside a function scope. Here is an example:

function foo() {
  var undefined = 1;
  console.log(undefined);
}

null, on the other hand, is a keyword, so it has no such problem: trying to declare or assign to null is an error.

function foo() {
  var null = 0; // SyntaxError: null is a keyword and cannot be used as a variable name
  console.log(null);
}

As an aside, what is the safest way to write undefined? In development we avoid the global variable and use void 0 instead. void is an operator and a keyword, and it turns the value of whatever expression follows it into undefined. So void 0, void 1, or void anything all work; by convention we write void 0, because everyone else does and a shared idiom is easier for everyone to read.
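A minimal sketch of why void 0 is the safer spelling. The function name `shadowingDemo` is made up for the example; inside it, the identifier undefined is shadowed by a local variable, while void 0 still produces the real undefined value.

```javascript
function shadowingDemo() {
  var undefined = 1; // legal in function scope: shadows the global undefined
  return {
    shadowed: undefined,       // the local variable, not the real undefined
    viaVoid: void 0,           // always the real undefined value
    typeOfVoid: typeof void 0,
  };
}

const result = shadowingDemo();
console.log(result.shadowed, result.viaVoid, result.typeOfVoid);
```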

Summary

We have not yet covered Symbol and Object; they will be covered together in a separate article. Stay tuned!

You can follow the WeChat official account “Technology Galaxy” to keep learning for life.