Translator's note: this translation attempts to render the core terms of the ECMAScript specification into consistent wording, and is offered for peer review.

Source: v8.dev/blog/unders…

Meanwhile, elsewhere on the web

Jason Orendorff of Mozilla has published an in-depth analysis of the syntactic quirks of JS. Implementation details differ, but every JS engine faces the same problems with these quirks.

Cover grammars

This article takes a closer look at cover grammars. A cover grammar is a way of specifying the grammar for syntactic constructs that at first glance appear to be ambiguous.

For simplicity, we omit the subscripts [In, Yield, Await], because they are not important to this article. See Part 3 for their meaning and usage.

Finite lookaheads

Typically, a parser decides which production to use based on a finite lookahead (a fixed number of upcoming tokens).

In some cases, the next token alone determines unambiguously which production to use. For example:

<pre>
<i>UpdateExpression</i> :
  <i>LeftHandSideExpression</i>
  <i>LeftHandSideExpression</i> ++
  <i>LeftHandSideExpression</i> --
  ++ <i>UnaryExpression</i>
  -- <i>UnaryExpression</i>
</pre>

If we are parsing an UpdateExpression and the next token is ++ or --, we immediately know which production to use. If the next token is neither, that is still fine: we can parse a LeftHandSideExpression starting from the current position and decide what to do afterwards.

If the token following the LeftHandSideExpression is ++, the production to use is UpdateExpression : LeftHandSideExpression ++. The case for -- is similar. If the token following the LeftHandSideExpression is neither ++ nor --, the production UpdateExpression : LeftHandSideExpression is used.
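The decision procedure described above can be sketched as a small function. This is only an illustration under simplifying assumptions: tokens are plain strings, any token other than ++ or -- stands in for a complete LeftHandSideExpression, and the function name `chooseUpdateExpressionProduction` is hypothetical (it does not come from the specification or from any engine).

```javascript
// Hypothetical sketch: tokens are plain strings, and any token other than
// "++" or "--" is treated as a complete LeftHandSideExpression.
function chooseUpdateExpressionProduction(tokens) {
  const [first, second] = tokens;
  // A single token of lookahead is enough to pick the prefix forms.
  if (first === "++") return "++ UnaryExpression";
  if (first === "--") return "-- UnaryExpression";
  // Otherwise "parse" the LeftHandSideExpression, then inspect one more token.
  if (second === "++") return "LeftHandSideExpression ++";
  if (second === "--") return "LeftHandSideExpression --";
  return "LeftHandSideExpression";
}

console.log(chooseUpdateExpressionProduction(["++", "x"]));
console.log(chooseUpdateExpressionProduction(["x", "--"]));
console.log(chooseUpdateExpressionProduction(["x", ";"]));
```

In every branch, the choice depends on at most one token beyond what has already been parsed, which is what makes a finite lookahead sufficient here.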

Arrow function argument list, or parenthesized expression?

Distinguishing an arrow function parameter list from a parenthesized expression is more complicated. For example:

let x = (a,

Is this the beginning of an arrow function, as in:

let x = (a, b) => { return a + b };

Or is it a parenthesized expression, as in:

let x = (a, 3);

The parenthesized construct, whatever it turns out to be, can be arbitrarily long. So we cannot determine what it is based on a finite number of tokens.

Imagine that we had the following straightforward productions:

<pre>
<i>AssignmentExpression</i> :
  ...
  <i>ArrowFunction</i>
  <i>ParenthesizedExpression</i>

<i>ArrowFunction</i> :
  <i>ArrowParameterList</i> => <i>ConciseBody</i>
</pre>

With these productions, we cannot select the production to use with a finite lookahead. If we have to parse an AssignmentExpression and the next token is (, how do we decide what to parse next? We could parse either an ArrowParameterList or a ParenthesizedExpression, but our guess could well be wrong.

The very permissive new symbol: CPEAAPL

The specification solves this problem by introducing a symbol: CoverParenthesizedExpressionAndArrowParameterList, abbreviated as CPEAAPL. A CPEAAPL is something that may be either a ParenthesizedExpression or an ArrowParameterList, but we do not yet know which one.

The productions for CPEAAPL are very permissive, allowing every construct that can occur in a ParenthesizedExpression or in an ArrowParameterList:

<pre>
<i>CPEAAPL</i> :
  ( <i>Expression</i> )
  ( <i>Expression</i> , )
  ( )
  ( ... <i>BindingIdentifier</i> )
  ( ... <i>BindingPattern</i> )
  ( <i>Expression</i> , ... <i>BindingIdentifier</i> )
  ( <i>Expression</i> , ... <i>BindingPattern</i> )
</pre>

For example, the following expressions are valid CPEAAPL:

// Valid ParenthesizedExpression and ArrowParameterList:
(a, b)
(a, b = 1)

// Valid ParenthesizedExpression:
(1, 2, 3)
(function foo() { })

// Valid ArrowParameterList:
()
(a, b,)
(a, ...b)
(a = 1, ...b)

// Valid as neither, but still a CPEAAPL:
(1, ...b)
(1, )

Trailing commas and ... can only occur in an ArrowParameterList. Some constructs (such as b = 1) can occur in both, but with different meanings: inside a ParenthesizedExpression it is an assignment, and inside an ArrowParameterList it is a parameter with a default value. Numbers and other PrimaryExpressions that are not valid parameter names (or parameter destructuring patterns) can only occur in a ParenthesizedExpression. All of them, however, can occur in a CPEAAPL.

Using CPEAAPL in productions

Now we can use the very permissive CPEAAPL in the AssignmentExpression productions. (Note: ConditionalExpression leads to PrimaryExpression through a long chain of productions, which is not shown here.)

<pre>
<i>AssignmentExpression</i> :
  <i>ConditionalExpression</i>
  <i>ArrowFunction</i>
  ...

<i>ArrowFunction</i> :
  <i>ArrowParameters</i> => <i>ConciseBody</i>

<i>ArrowParameters</i> :
  <i>BindingIdentifier</i>
  <i>CPEAAPL</i>

<i>PrimaryExpression</i> :
  ...
  <i>CPEAAPL</i>
</pre>

Imagine that we are in the same situation again: we need to parse an AssignmentExpression and the next token is (. This time we can parse a CPEAAPL and work out later which production to use. It does not matter whether we are parsing an ArrowFunction or a ConditionalExpression: either way, the next symbol to parse is a CPEAAPL.

After parsing the CPEAAPL, we can decide which production to use for the original AssignmentExpression (the one containing the CPEAAPL). The decision is based on the token that follows the CPEAAPL.

If that token is =>, the following production is used:

<pre>
<i>AssignmentExpression</i> :
  <i>ArrowFunction</i>
</pre>

If the token is anything else, this production is used:

<pre>
<i>AssignmentExpression</i> :
  <i>ConditionalExpression</i>
</pre>

For example:

let x = (a, b) => { return a + b; };
//      ^^^^^^
//     CPEAAPL
//             ^^
//             The token following the CPEAAPL

let x = (a, 3);
//      ^^^^^^
//     CPEAAPL
//            ^
//            The token following the CPEAAPL

At this point, we can leave the CPEAAPL as it is and continue parsing the rest of the program. For example, if the CPEAAPL is inside an ArrowFunction, we do not yet need to check whether it is a valid arrow function parameter list; that check can happen later. (Real-world parsers may choose to validate right away, but from the specification's point of view this is not required.)
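The "parse the cover symbol first, decide afterwards" strategy can be sketched as follows. This is a minimal illustration, not how any real engine is implemented: tokens are plain strings, only parentheses are tracked while consuming the CPEAAPL, and the name `classifyAfterCover` is hypothetical.

```javascript
// Hypothetical sketch: tokens are plain strings; only parentheses are tracked.
function classifyAfterCover(tokens) {
  if (tokens[0] !== "(") throw new SyntaxError("expected (");
  let depth = 0;
  let i = 0;
  // Consume the whole CPEAAPL, however long it is, without deciding what it is.
  do {
    if (tokens[i] === "(") depth++;
    else if (tokens[i] === ")") depth--;
    i++;
  } while (depth > 0 && i < tokens.length);
  if (depth !== 0) throw new SyntaxError("unbalanced parentheses");
  // The single token right after the CPEAAPL selects the production.
  return tokens[i] === "=>" ? "ArrowFunction" : "ConditionalExpression";
}

console.log(classifyAfterCover(["(", "a", ",", "b", ")", "=>", "{", "}"]));
console.log(classifyAfterCover(["(", "a", ",", "3", ")", ";"]));
```

Note that the sketch never inspects the contents of the parentheses. Whether they form a valid parameter list or a valid expression is a separate check that can run once the production has been chosen.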

Restricting CPEAAPLs

As shown earlier, the productions for CPEAAPL are very permissive and allow constructs that are never valid (for example, (1, ...a)). After the program has been parsed according to the grammar, any such invalid constructs must be rejected.

The specification adds the following restrictions:

Static Semantics: Early Errors

PrimaryExpression : CPEAAPL

  • It is a Syntax Error if CPEAAPL is not covering a ParenthesizedExpression.

Supplemental Syntax

When processing an instance of the production

PrimaryExpression : CPEAAPL

the interpretation of the CPEAAPL is refined using the following grammar:

ParenthesizedExpression : ( Expression )

This means: if CPEAAPL appears in PrimaryExpression in the syntax tree, it is actually ParenthesizedExpression, which is its only valid production.

Expression can never be empty, so () is not a valid ParenthesizedExpression. Comma-separated lists such as (1, 2, 3) are created with the comma operator:

<pre>
<i>Expression</i> :
  <i>AssignmentExpression</i>
  <i>Expression</i> , <i>AssignmentExpression</i>
</pre>
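As a quick illustration of the comma operator at work inside a ParenthesizedExpression, the snippet below evaluates each operand from left to right and yields the value of the last one. The variable names are only for illustration.

```javascript
// Each operand is evaluated in order; the result is the last operand's value.
const last = (1, 2, 3);

// Side effects of earlier operands still happen.
let counter = 0;
const result = (counter++, counter++, counter);

console.log(last);
console.log(result);
```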

Similarly, if CPEAAPL appears in ArrowParameters, the following restrictions apply:

Static Semantics: Early Errors

ArrowParameters : CPEAAPL

  • It is a Syntax Error if CPEAAPL is not covering an ArrowFormalParameters.

Supplemental Syntax

When processing an instance of the production

ArrowParameters : CPEAAPL

the interpretation of the CPEAAPL is refined using the following grammar:

ArrowFormalParameters : ( UniqueFormalParameters )
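The effect of these restrictions can be observed from JavaScript itself. The sketch below uses the `Function` constructor to ask the engine whether a piece of source text parses, without running it. The helper name `parses` is hypothetical, and the approach assumes an environment where dynamic code creation is allowed (for example, Node.js without a restrictive content security policy).

```javascript
// Returns true if `source` parses as a function body. The code is never run.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// A trailing comma is fine in arrow parameters...
console.log(parses("let x = (a, b,) => a;"));
// ...but not in a ParenthesizedExpression.
console.log(parses("let x = (a, b,);"));
// A rest parameter is fine in arrow parameters.
console.log(parses("let x = (a, ...b) => a;"));
// 1 is not a valid parameter, so this CPEAAPL covers nothing valid.
console.log(parses("let x = (1, ...b) => 1;"));
// Expression can never be empty.
console.log(parses("let x = ();"));
```

The same parenthesized text, such as (a, b,), is accepted or rejected depending on which production the surrounding tokens select.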

Other cover grammars

In addition to CPEAAPL, the specification uses cover grammars for other constructs that look ambiguous.

The specification uses ObjectLiteral as a cover grammar for ObjectAssignmentPattern, which occurs inside arrow function parameter lists. This means that the ObjectLiteral grammar allows constructs that cannot occur in an actual ObjectLiteral.

<pre>
<i>ObjectLiteral</i> :
  ...
  { <i>PropertyDefinitionList</i> }

<i>PropertyDefinition</i> :
  ...
  <i>CoverInitializedName</i>

<i>CoverInitializedName</i> :
  <i>IdentifierReference Initializer</i>

<i>Initializer</i> :
  = <i>AssignmentExpression</i>
</pre>

For example:

let o = { a = 1 }; // syntax error

// Arrow function with a destructuring parameter that has a default value:
let f = ({ a = 1 }) => { return a; };
f({}); // returns 1
f({a: 6}); // returns 6
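The same { a = 1 } text is also valid on the left-hand side of a destructuring assignment, which is where the object literal is reinterpreted as an ObjectAssignmentPattern. A small runnable sketch (the variable names `first` and `second` are only for illustration):

```javascript
let a;

// The parentheses keep the statement from being parsed as a block.
({ a = 1 } = {});       // property missing, so the default applies
const first = a;

({ a = 1 } = { a: 6 }); // property present, so the default is ignored
const second = a;

console.log(first, second);
```

In both statements the parser first reads { a = 1 } using the ObjectLiteral grammar, and only the = that follows reveals that it has to be treated as a pattern.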

Async arrow functions also look ambiguous with a finite lookahead:

let x = async(a,

Is this a call to a function named async, or an async arrow function?

let x1 = async(a, b);
let x2 = async();
function async() { }

let x3 = async(a, b) => {};
let x4 = async();

To this end, the grammar defines a cover grammar symbol, CoverCallExpressionAndAsyncArrowHead, which works similarly to CPEAAPL.
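Both readings can coexist in one program, because async is not a reserved word. The following sketch defines an ordinary function that happens to be named async and then uses the same prefix in both ways; the names `call` and `arrow` are only for illustration.

```javascript
// An ordinary function that happens to be named `async`.
function async(...args) {
  return args.length;
}

// No `=>` follows the closing parenthesis, so this is a plain call.
const call = async(1, 2);

// `=>` follows the closing parenthesis, so this is an async arrow function.
const arrow = async (a, b) => a + b;

console.log(call);
console.log(typeof arrow);
console.log(arrow(1, 2) instanceof Promise);
```

As with CPEAAPL, the two statements share the prefix async( and are only told apart by the token that follows the closing parenthesis.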

Summary

This article showed how the specification defines cover grammars and uses them in cases where the current syntactic construct cannot be identified with a finite lookahead.

In particular, we looked at how arrow function parameter lists are distinguished from parenthesized expressions, and at how the specification first parses ambiguous-looking constructs permissively with a cover grammar and then restricts them with static semantic rules.