
Vue template compilation principle

Vue's internals have several important parts: change detection, template compilation, the virtual DOM, and the overall runtime flow.

I have previously written about how change detection is implemented, in an article called “Simple and Deep – VUE Change Detection Principles”.

Today I will focus on how template compilation is implemented.

I won't go into every detail here, but I will explain the overall principle of Vue's template compilation, so that after reading this article you have a clear picture of how it works as a whole.

Vue's template compilation is divided into three parts, or three steps, each feeding into the next:

  • The first step converts the template string to element ASTs (parser)
  • The second step marks static nodes in the AST, mainly to optimize virtual DOM rendering (optimizer)
  • The third step uses the element ASTs to generate the render function code string (code generator)

The parser

The main thing the parser does is convert template strings to Element ASTs, such as:

<div>
  <p>{{name}}</p>
</div>

This is what the simple template above looks like when converted to an Element AST:

{
  tag: "div",
  type: 1,
  staticRoot: false,
  static: false,
  plain: true,
  parent: undefined,
  attrsList: [],
  attrsMap: {},
  children: [
    {
      tag: "p",
      type: 1,
      staticRoot: false,
      static: false,
      plain: true,
      parent: {tag: "div", ...},
      attrsList: [],
      attrsMap: {},
      children: [{
        type: 2,
        text: "{{name}}",
        static: false,
        expression: "_s(name)"
      }]
    }
  ]
}

Let’s use this simple example to show what happens inside the parser.

The template string is fed into a while loop, which cuts it off piece by piece and parses each piece as it is cut, until nothing is left. At that point parsing is complete.

The above simple template interception process looks like this:

<div>
  <p>{{name}}</p>
</div>

  <p>{{name}}</p>
</div>

<p>{{name}}</p>
</div>

{{name}}</p>
</div>

</p>
</div>

</div>

</div>

What is the cutting based on? In other words, are there rules for intercepting the string?

Of course there are.

Just by checking whether the template string begins with <, we can tell whether the piece we are about to cut off is a tag or text.
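The loop described above can be sketched roughly as follows. This is a simplified illustration, not Vue's actual parser: chopTemplate is a hypothetical helper, and it assumes every tag ends at the next > character, which the real parser does not rely on.

```javascript
// Hypothetical helper: cuts a template into pieces the way the parser's loop consumes it.
function chopTemplate(template) {
  let html = template
  const chunks = []
  while (html) {
    const textEnd = html.indexOf('<')
    if (textEnd === 0) {
      // Starts with "<": treat everything up to the next ">" as one tag.
      const close = html.indexOf('>')
      const end = close < 0 ? html.length : close + 1
      chunks.push({ kind: 'tag', value: html.slice(0, end) })
      html = html.slice(end)
    } else {
      // Does not start with "<": everything before the next "<" is text.
      const end = textEnd < 0 ? html.length : textEnd
      chunks.push({ kind: 'text', value: html.slice(0, end) })
      html = html.slice(end)
    }
  }
  return chunks
}

const chunks = chopTemplate('<div>\n  <p>{{name}}</p>\n</div>')
console.log(chunks.map(c => `${c.kind}: ${JSON.stringify(c.value)}`).join('\n'))
```

Each pass through the loop removes one piece from the front of the string, so the loop ends exactly when the whole template has been consumed.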

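For example, matching the start tag at the beginning of <div></div> with regular expressions yields data like this: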

{
  tagName: 'div',
  attrs: [],
  unarySlash: '',
  start: 0,
  end: 5
}

If you are curious about how to parse tagName and attrs using regular expressions, see the following demo code:

const ncname = '[a-zA-Z_][\\w\\-\\.]*'
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)
const startTagClose = /^\s*(\/?)>/
// Matches a single attribute such as id="app" (modeled on the attribute pattern in the Vue 2 source)
const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/

let html = `<div></div>`
let index = 0
const start = html.match(startTagOpen)

const match = {
  tagName: start[1],
  attrs: [],
  start: 0
}
html = html.substring(start[0].length)
index += start[0].length
let end, attr
// Collect attributes until the end of the start tag (> or />) is reached
while (!(end = html.match(startTagClose)) && (attr = html.match(attribute))) {
  html = html.substring(attr[0].length)
  index += attr[0].length
  match.attrs.push(attr)
}
if (end) {
  match.unarySlash = end[1]
  html = html.substring(end[0].length)
  index += end[0].length
  match.end = index
}
console.log(match)

Stack

After the data in the start tag (attrs, tagName, etc.) has been parsed with regular expressions, there is one more important thing to do: maintain a stack.

So what does this stack do?

The stack records the hierarchy, that is, the depth of the DOM.

More precisely, whenever a start tag or a piece of text is parsed, the last item in the stack is always the parent node of the node being parsed.

With the stack, the parser can push the node currently being parsed into the children of its parent node.

It can also set the parent property of the node currently being parsed to that parent node.

And that is exactly what it does.

But you can't simply push every start tag onto the stack as soon as you parse it.

That is because HTML has self-closing tags, such as input.

A self-closing tag has no children and no closing tag, so it can never be the parent of a later node and would never be popped off the stack.

Therefore, when a start tag is parsed, the parser checks whether it is a self-closing tag. If it is not, the tag is pushed onto the stack.

if (!unary) {
  currentParent = element
  stack.push(element)
}

Now that we have the DOM hierarchy and can parse the DOM's start tags, each parsed start tag produces an ASTElement (an object that stores information about the current tag, such as attrs and tagName).

The current ASTElement is then pushed into the children of its parent node, and its parent property is set to the last item in the stack:

currentParent.children.push(element)
element.parent = currentParent
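To see the stack at work, here is a minimal sketch that builds a tree from a list of tokens that have already been cut out of the template. The token format and the buildTree name are assumptions made for this illustration; Vue's parser works on the raw string and tracks much more state.

```javascript
// Hypothetical token format: { type: 'start' | 'end' | 'text', tag?, unary?, text? }
function buildTree(tokens) {
  const stack = []
  let root
  let currentParent
  for (const token of tokens) {
    if (token.type === 'start') {
      const element = { type: 1, tag: token.tag, parent: currentParent, children: [] }
      if (!root) root = element
      if (currentParent) currentParent.children.push(element)
      // Only tags that can have children become the new parent.
      if (!token.unary) {
        currentParent = element
        stack.push(element)
      }
    } else if (token.type === 'end') {
      // Leaving an element: its parent is the current parent again.
      stack.pop()
      currentParent = stack[stack.length - 1]
    } else if (currentParent) {
      currentParent.children.push({ type: 3, text: token.text })
    }
  }
  return root
}

const tree = buildTree([
  { type: 'start', tag: 'div' },
  { type: 'start', tag: 'input', unary: true },
  { type: 'start', tag: 'p' },
  { type: 'text', text: 'hello' },
  { type: 'end', tag: 'p' },
  { type: 'end', tag: 'div' }
])
console.log(tree.children.map(child => child.tag)) // [ 'input', 'p' ]
```

Because input is never pushed onto the stack, p ends up as a sibling of input rather than its child.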

Cases where the string starts with <

But not every string beginning with < is a start tag. A string beginning with < can be:

  • A start tag <div>
  • An end tag </div>
  • An HTML comment <!-- I'm a comment -->
  • A doctype <!DOCTYPE html>
  • A conditional comment (downlevel-revealed conditional comment)

Of course, what the parser encounters most often during parsing are start tags, end tags, and comments.
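The cases listed above can be told apart with a handful of regular expressions. The patterns below are simplified ones written for this sketch (the real parser's patterns are stricter), and classify is a hypothetical helper name.

```javascript
// Simplified patterns, written for this sketch only.
const comment = /^<!--/
const conditionalComment = /^<!\[/
const doctype = /^<!DOCTYPE [^>]+>/i
const endTag = /^<\/[a-zA-Z_][\w\-.]*[^>]*>/
const startTagOpen = /^<[a-zA-Z_][\w\-.]*/

// Returns which kind of construct the string begins with.
function classify(html) {
  if (comment.test(html)) return 'comment'
  if (conditionalComment.test(html)) return 'conditional comment'
  if (doctype.test(html)) return 'doctype'
  if (endTag.test(html)) return 'end tag'
  if (startTagOpen.test(html)) return 'start tag'
  return 'text'
}

console.log(classify('<div>')) // start tag
console.log(classify('</div>')) // end tag
console.log(classify("<!-- I'm a comment -->")) // comment
```

Anything that starts with < but matches none of the patterns falls through to 'text', which is the situation the text interception logic has to deal with.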

Intercepting text

Continuing with the example above, the remaining template string after the opening tag of the div is parsed looks like this:

  <p>{{name}}</p>
</div>

This time the template string does not start with <: it starts with the line break and indentation in front of <p>.

What if the template string doesn't start with <?

In fact, if the string does not start with <, there are a couple of possible cases:

I am text<div></div>

Or:

I am text</p>

In either case, the text in front of the tag is what gets parsed out. Intercepting the text is not difficult. Here's an example:

// This demo can be run directly in the browser console
const html = "I'm text</p>"
let textEnd = html.indexOf('<')
const text = html.substring(0, textEnd)
console.log(text) // I'm text

But if < is part of the text itself, the demo above does not do what we want. For example:

a < b </p>

In this case the demo breaks: the intercepted text is cut short at the first <. Vue handles this case. Look at the following code:

let textEnd = html.indexOf('<')
let text, rest, next
if (textEnd >= 0) {
  rest = html.slice(textEnd)
  // The rest of the HTML that doesn't fit the tag must be text
  // And the text starts with <
  while (
    !endTag.test(rest) &&
    !startTagOpen.test(rest) &&
    !comment.test(rest) &&
    !conditionalComment.test(rest)
  ) {
    // < in plain text, be forgiving and treat it as text
    next = rest.indexOf('<', 1)
    if (next < 0) break
    textEnd += next
    rest = html.slice(textEnd)
  }
  text = html.substring(0, textEnd)
  // Drop the text that was just extracted and keep the rest
  html = html.substring(textEnd)
}

The logic of this code is: if, after the text is cut off, the rest of the template string does not match any tag pattern, then the text has not been fully captured yet.

So the loop keeps moving textEnd forward until the rest of the template string does match a tag pattern, and then the text is extracted from the template string in one go.
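To try this logic on the a < b </p> example, here is a self-contained sketch. The tag patterns are simplified stand-ins written for this example, and takeText is a hypothetical wrapper around the loop.

```javascript
// Simplified tag patterns, written for this sketch only.
const endTag = /^<\/[a-zA-Z_][\w\-.]*[^>]*>/
const startTagOpen = /^<[a-zA-Z_][\w\-.]*/
const comment = /^<!--/
const conditionalComment = /^<!\[/

// Hypothetical helper: returns the leading text and the remaining template string.
function takeText(html) {
  let textEnd = html.indexOf('<')
  if (textEnd < 0) return { text: html, rest: '' }
  let rest = html.slice(textEnd)
  // Keep extending the text while the remainder does not look like a tag.
  while (
    !endTag.test(rest) &&
    !startTagOpen.test(rest) &&
    !comment.test(rest) &&
    !conditionalComment.test(rest)
  ) {
    const next = rest.indexOf('<', 1)
    if (next < 0) break
    textEnd += next
    rest = html.slice(textEnd)
  }
  return { text: html.substring(0, textEnd), rest: html.substring(textEnd) }
}

console.log(takeText('a < b </p>')) // { text: 'a < b ', rest: '</p>' }
```

The first < is followed by a space, so it matches none of the patterns and the search moves on to the next <, which does start an end tag.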

Continuing with the example above, the remaining template string currently looks like this:

  <p>{{name}}</p>
</div>

After the interception, the remaining template string looks like this:

<p>{{name}}</p>
</div>

The intercepted text contains only whitespace (the line break and indentation):

"\n  "

After the text is intercepted it needs to be parsed, but before that it is preprocessed, that is, given some simple processing first. Vue does this:

const children = currentParent.children
text = inPre || text.trim()
  ? isTextTag(currentParent) ? text : decodeHTMLCached(text)
  // only preserve whitespace if its not right after a starting tag
  : preserveWhitespace && children.length ? ' ' : ''

This code means:

  • If the text is not empty, determine whether the parent tag is script or style:
    1. If it is, the text is left as it is.
    2. If it is not, the text needs to be decoded, using the decodeHTML method of the he library on GitHub.
  • If the text is empty, determine whether there are any sibling nodes, that is, whether parent.children.length is zero:
    1. If it is greater than zero, return ' '
    2. If it is zero, return ''

It turns out that this time the text hits the last case, '', so there is nothing to do and the parser simply moves on to the next round of parsing.
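The preprocessing rule can be tried out with the sketch below. decodeEntities is a tiny stand-in for the real decoder and handles only four entities, and preprocessText and its options object are hypothetical names used for this illustration.

```javascript
// Stand-in for the real decoder: handles only a few common entities.
const entities = { lt: '<', gt: '>', amp: '&', quot: '"' }
function decodeEntities(text) {
  return text.replace(/&(lt|gt|amp|quot);/g, (_, name) => entities[name])
}

function isTextTag(el) {
  return el.tag === 'script' || el.tag === 'style'
}

// Hypothetical wrapper around the preprocessing expression shown above.
function preprocessText(text, parent, { inPre = false, preserveWhitespace = true } = {}) {
  if (inPre || text.trim()) {
    // Non-empty text: keep it raw inside script/style, decode it elsewhere.
    return isTextTag(parent) ? text : decodeEntities(text)
  }
  // Whitespace-only text: keep a single space only if it follows a sibling.
  return preserveWhitespace && parent.children.length ? ' ' : ''
}

const div = { tag: 'div', children: [] }
console.log(JSON.stringify(preprocessText('\n  ', div))) // ""
console.log(JSON.stringify(preprocessText('a &lt; b', div))) // "a < b"
```

In the running example the whitespace comes right after the opening div tag, so div has no children yet and the result is the empty string.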

Continuing with the example above, the template string now changes like this:

<p>{{name}}</p>
</div>

{{name}}</p>
</div>

Following the text interception logic described above, the text cut off this time is "{{name}}".

Parsing the text

Parsing a text node is actually not difficult: the text node just needs to be pushed into the parent's children with currentParent.children.push(ast).

But text with variables is handled differently from plain text without variables.

Text with variables looks like Hello {{name}}, where name is the variable.

Text without variables looks like Hello Berwin: plain text that doesn't access any data.

For plain text, the AST of the text node is pushed straight into the children of the parent node. For example:

children.push({
  type: 3,
  text: 'I am plain text'
})

Text with variables requires one more operation to parse the text variable:

const expression = parseText(text, delimiters) // {{name}} => _s(name)
children.push({
  type: 2,
  expression,
  text
})

In the example the expression is _s(name), so the node finally pushed into currentParent.children looks like this:

{
  expression: "_s(name)",
  text: "{{name}}",
  type: 2
}
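How might parseText turn {{name}} into _s(name)? Here is a minimal sketch of one way to do it. It assumes the default {{ }} delimiters and ignores the delimiters parameter and filters, so treat it as an illustration rather than Vue's implementation.

```javascript
// Simplified parseText: default {{ }} delimiters only, no filters.
function parseText(text) {
  const tagRE = /\{\{((?:.|\n)+?)\}\}/g
  if (!tagRE.test(text)) return undefined // plain text, no variables
  tagRE.lastIndex = 0
  const tokens = []
  let lastIndex = 0
  let match
  while ((match = tagRE.exec(text))) {
    // Static text in front of the variable is kept as a string literal.
    if (match.index > lastIndex) {
      tokens.push(JSON.stringify(text.slice(lastIndex, match.index)))
    }
    tokens.push(`_s(${match[1].trim()})`)
    lastIndex = match.index + match[0].length
  }
  // Static text after the last variable.
  if (lastIndex < text.length) {
    tokens.push(JSON.stringify(text.slice(lastIndex)))
  }
  return tokens.join('+')
}

console.log(parseText('{{name}}')) // _s(name)
console.log(parseText('Hello {{name}}')) // "Hello "+_s(name)
console.log(parseText('Hello Berwin')) // undefined
```

A return value of undefined is what tells the caller that the text is plain and should be stored as a type 3 node.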

End tag processing

Now that the text is parsed, the remaining template string looks like this:

</p>
</div>

Checking html.indexOf('<') === 0 shows that the string starts with <. The parser then uses a regular expression to match the end tag and cuts it off.

It also does one more thing: it takes the tag name that was just closed and searches the stack backwards for it, then removes that position and everything after it from the stack. The reasoning is that once the end tag of the current tag has been parsed, all of its descendants must already have been parsed; if the current tag is closed, its descendants must be closed too. So the stack is cleared from the current tag's position onward.

The end tag itself doesn't need to be parsed further; the current tag just needs to be removed from the stack.
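The backward search through the stack can be sketched like this. closeTag is a hypothetical helper, and the stack entries are reduced to objects with only a tag property.

```javascript
// Hypothetical helper: handles an end tag by trimming the stack.
function closeTag(stack, tagName) {
  // Search from the top of the stack for the nearest open tag with this name.
  let pos = stack.length - 1
  while (pos >= 0 && stack[pos].tag !== tagName) pos--
  // Found: remove it and everything opened after it.
  if (pos >= 0) stack.length = pos
  // The new current parent is whatever is now on top of the stack.
  return stack[stack.length - 1]
}

const stack = [{ tag: 'div' }, { tag: 'ul' }, { tag: 'li' }]
const parent = closeTag(stack, 'ul')
console.log(stack.map(el => el.tag), parent.tag) // [ 'div' ] div
```

Closing ul also removes the still-open li above it, which is the "descendants must be closed too" rule from the text.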

At this point, if the last item in the element's children is a blank ' ', it is removed:

if (lastNode && lastNode.type === 3 && lastNode.text === ' ' && !inPre) {
  element.children.pop()
}

Because the last space is useless, for example:

<ul>
  <li></li>
</ul>

After the example above is parsed into element ASTs, there is a space between li's closing tag and ul's closing tag, and that space becomes a text node. Removing it means one fewer text node to create each time the DOM is rendered, which saves some performance overhead.

Now there is not much of the template string left. It looks like this:

</div>

First the text is parsed: it is a text node that is really just whitespace.

Then the closing tag is parsed:

</div>

When parsing is complete, the while loop exits.

After parsing, we have our element ASTs.

To summarize

The principle of such a template parser is not particularly difficult. It has two main parts: one intercepts the string, the other parses each intercepted piece.

Every start tag that is cut off is pushed onto the stack, and when its end tag is parsed it is popped off. When the whole string has been cut off, parsing is finished.

The example above is relatively simple: it does not involve things like loops or comment handling. But the point of this article is not to dig into details; doing so would take a small book, and one article only has room to make the general logic clear. If you are interested in the details, please comment below. Let's discuss and learn together.

The optimizer

The goal of the optimizer is to identify and label static nodes, which are nodes where the DOM does not need to change. For example:

<p>I'm a static node, I don't need to change</p>

Marking a static node has two benefits:

  1. There is no need to create new nodes for static nodes each time you re-render
  2. Patching in the Virtual DOM can be skipped

The optimizer works in two steps:

  • Step 1: Recursively add a static property to every node to indicate whether the node is static
  • Step 2: Mark all static root nodes

What is a static root node? A node whose children are all static nodes is a static root node. For example:

<ul>
  <li>I'm a static node, I don't need to change</li>
  <li>I'm static node 2, I don't need to change</li>
  <li>I'm static node 3</li>
</ul>

Here ul is the static root node.

How do we add the static property to every node?

It is not difficult for Vue to determine whether a node is static:

  1. First mark the node according to whether it is itself static: node.static = isStatic(node)
  2. Then loop over its children; if any child is not a static node, change the current node's flag to false: node.static = false

How to tell if a node is static?

So how does the isStatic function decide whether a node is static?

function isStatic (node: ASTNode): boolean {
  if (node.type === 2) { // expression
    return false
  }
  if (node.type === 3) { // text
    return true
  }
  return !!(node.pre || (
    !node.hasBindings && // no dynamic bindings
    !node.if && !node.for && // not v-if or v-for or v-else
    !isBuiltInTag(node.tag) && // not a built-in
    isPlatformReservedTag(node.tag) && // not a component
    !isDirectChildOfTemplateFor(node) &&
    Object.keys(node).every(isStaticKey)
  ))
}

To explain: when the parser parses a template string into an AST, it sets a type on each node according to the kind of node:

type  description
1     Element node
2     Dynamic text node with variables
3     Plain text node with no variables

So the logic in isStatic above is clear: if type === 2, the node is definitely not static, so return false; if type === 3, it is a static node, so return true.

If type === 1, things are a little more complicated. There are quite a few conditions for deciding whether an element node is static, so let's look at them one by one.

First, if node.pre is true, the current node is static. node.pre is set on elements that use the v-pre directive.

Second, node.hasBindings cannot be true.

The node.hasBindings property is set when the parser converts the template to an AST. If the current node's attrs contain an attribute starting with v-, @ or :, node.hasBindings is set to true.

const dirRE = /^v-|^@|^:/
if (dirRE.test(attr)) {
  // mark element as dynamic
  el.hasBindings = true
}
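As a quick check of that regular expression, the sketch below applies it to a list of attributes. The hasBindings function and the { name, value } attribute shape are assumptions made for this example.

```javascript
const dirRE = /^v-|^@|^:/

// Hypothetical helper: true if any attribute name looks like a directive or binding.
function hasBindings(attrsList) {
  return attrsList.some(attr => dirRE.test(attr.name))
}

console.log(hasBindings([{ name: 'title', value: 'Berwin' }])) // false
console.log(hasBindings([{ name: '@click', value: 'c' }])) // true
console.log(hasBindings([{ name: ':title', value: 'name' }])) // true
```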

Also, element nodes cannot have the if or for properties.

node.if and node.for are also set when the parser converts the template to an AST.

When a node is found to use v-if during parsing, an if property is set on the current node.

This means that element nodes cannot use the v-if, v-for or v-else directives.

Also, element nodes cannot be the built-in slot or component tags.

Also, element nodes cannot be components.

For example:

<List></List>

It cannot be a custom component like the one above.

Also, the parent node of an element node cannot be a template with v-for.

Finally, no additional properties may appear on element nodes.

The allowed properties are type, tag, attrsList, attrsMap, plain, parent, children, attrs, staticClass and staticStyle. If any other property is present, the current node is not considered static.

Only nodes that meet all of the above conditions are considered static.

How do I mark all nodes?

The AST is a tree. How do we mark every node in it as static or not?

Another issue is that for an element node to be static, it is not enough for the node itself to pass the checks. If any of its children is not static, then it is not a static node either, even if it meets the conditions above.

So Vue has this piece of code:

for (let i = 0, l = node.children.length; i < l; i++) {
  const child = node.children[i]
  markStatic(child)
  if (!child.static) {
    node.static = false
  }
}

markStatic marks a single node using the rules just described. Vue.js loops over the node's children and calls markStatic on each one, and each child in turn loops over its own children with the same logic, so every node ends up marked.

Then, inside the loop, if one of the children is not a static node, the current node's flag is changed to false.

After this, all the nodes in the AST are correctly marked.
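Put together, the marking pass might look like the sketch below. isStaticSketch is a deliberately reduced version of isStatic that only checks hasBindings, if and for, so it is an assumption for illustration and not the full set of conditions.

```javascript
// Reduced static check: only a few of the conditions described above.
function isStaticSketch(node) {
  if (node.type === 2) return false // text with variables
  if (node.type === 3) return true // plain text
  return !node.hasBindings && !node.if && !node.for
}

function markStatic(node) {
  node.static = isStaticSketch(node)
  if (node.type === 1) {
    for (const child of node.children) {
      markStatic(child)
      // One dynamic child makes the whole parent dynamic.
      if (!child.static) node.static = false
    }
  }
}

const ast = {
  type: 1,
  tag: 'div',
  children: [
    { type: 1, tag: 'p', children: [{ type: 3, text: 'static text' }] },
    { type: 1, tag: 'p', children: [{ type: 2, text: '{{name}}', expression: '_s(name)' }] }
  ]
}
markStatic(ast)
console.log(ast.static, ast.children[0].static, ast.children[1].static) // false true false
```

The second p is dynamic because of its {{name}} child, and that in turn flips div to false even though div itself passes the checks.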

How do I mark the static root node?

Marking the static root node is also a recursive process.

The implementation in Vue looks something like this:

function markStaticRoots (node: ASTNode, isInFor: boolean) {
  if (node.type === 1) {
    // For a node to qualify as a static root, it should have children that
    // are not just static text. Otherwise the cost of hoisting out will
    // outweigh the benefits and it's better off to just always render it fresh.
    if (node.static && node.children.length && !(
      node.children.length === 1 &&
      node.children[0].type === 3
    )) {
      node.staticRoot = true
      return
    } else {
      node.staticRoot = false
    }
    if (node.children) {
      for (let i = 0, l = node.children.length; i < l; i++) {
        markStaticRoots(node.children[i], isInFor || !!node.for)
      }
    }
  }
}

This code means the following:

If the current node is static and has children, and those children are not just a single static text node, the current node is marked as a static root node.

That may be a little convoluted, so let me explain again.

When we marked static nodes above, part of the logic was that a node is truly static only if all of its children are static.

So when we find that a node is static, we know that all of its children are static too, and it can be marked as a static root. The exception is a static node that contains only a single text node: it is not marked as a static root.

This is done for performance reasons. As the comment in the Vue source explains, if a node containing only static text were marked as a static root, the cost of doing so would outweigh the benefits.
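Here is a small runnable version of that rule, without the type annotations and the isInFor bookkeeping. The static flags in the sample trees are set by hand, which is an assumption standing in for the earlier marking pass.

```javascript
function markStaticRoots(node) {
  if (node.type !== 1) return
  const onlyOneTextChild = node.children.length === 1 && node.children[0].type === 3
  if (node.static && node.children.length && !onlyOneTextChild) {
    node.staticRoot = true
    return // everything below a static root is static, no need to go deeper
  }
  node.staticRoot = false
  for (const child of node.children) markStaticRoots(child)
}

// Sample trees with static flags already filled in by hand.
const li = text => ({ type: 1, tag: 'li', static: true, children: [{ type: 3, text, static: true }] })
const ul = { type: 1, tag: 'ul', static: true, children: [li('a'), li('b')] }
const p = { type: 1, tag: 'p', static: true, children: [{ type: 3, text: 'only text', static: true }] }

markStaticRoots(ul)
markStaticRoots(p)
console.log(ul.staticRoot, p.staticRoot) // true false
```

ul qualifies because its children are elements, while p does not because its only child is a single text node.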

To summarize

The whole logic is to recurse the AST tree and find and label static nodes and static roots.

Code generator

The code generator uses element ASTs to generate the render function code string.

Generating render from the AST of the example at the beginning of this article gives this:

{
  render: `with(this){return _c('div',[_c('p',[_v(_s(name))])])}`
}

After formatting, it looks like this:

with(this){
  return _c(
    'div',
    [
      _c(
        'p',
        [
          _v(_s(name))
        ]
      )
    ]
  )
}

The generated code string shows several function calls _c, _v, and _s.

_c corresponds to createElement, which creates an element.

  1. The first argument is an HTML tag name
  2. The second parameter, which is optional, is the data object corresponding to the attribute used on the element
  3. The third parameter is children

Such as:

A simple template:

<p title="Berwin" @click="c">1</p>

The generated code string is:

`with(this){return _c('p',{attrs:{"title":"Berwin"},on:{"click":c}},[_v("1")])}`

After formatting:

with(this){
  return _c(
    'p',
    {
      attrs:{"title":"Berwin"},
      on:{"click":c}
    },
    [_v("1")]
  )
}

See the Vue documentation on render functions to learn more about createElement.

_v creates a text node.

_s returns its argument converted to a string.
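To get a feel for how the generated string is used, the sketch below runs it against an object with toy versions of _c, _v and _s. These three helpers are simplified stand-ins that return plain objects, not Vue's real implementations, and vm is a hypothetical instance.

```javascript
// Toy stand-ins for the runtime helpers; they return plain objects, not real VNodes.
const vm = {
  name: 'Berwin',
  _c(tag, data, children) {
    // The data object is optional: _c('div', [...]) passes children second.
    if (Array.isArray(data)) {
      children = data
      data = undefined
    }
    return { tag, data, children: children || [] }
  },
  _v: text => ({ text }),
  _s: value => (value == null ? '' : String(value))
}

const code = `with(this){return _c('div',[_c('p',[_v(_s(name))])])}`
// Turn the code string into a function and call it with vm as `this`.
const render = new Function(code)
const vnode = render.call(vm)
console.log(vnode.children[0].children[0].text) // Berwin
```

The with(this) wrapper is what lets the bare names _c, _v, _s and name resolve to properties of vm.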

The overall logic of the code generator is to recurse over the element ASTs and piece together a string like _c('div',[_c('p',[_v(_s(name))])]).

So how is this string pieced together?

Look at the following code:

function genElement (el: ASTElement, state: CodegenState) {
  const data = el.plain ? undefined : genData(el, state)
  const children = el.inlineTemplate ? null : genChildren(el, state, true)

  let code = `_c('${el.tag}'${
    data ? `,${data}` : '' // data
  }${
    children ? `,${children}` : '' // children
  })`
  return code
}

_c takes tagName, data, and children as arguments.

So the main logic of this code is to get data and children using genData and genChildren, piece them together into _c, and return the string _c(tagName, data, children).

So there are two questions we care about:

  1. How is data generated (implementation logic of genData)?
  2. How are children generated (the implementation logic of genChildren)?

Let’s first look at the implementation logic of genData:

function genData (el: ASTElement, state: CodegenState): string {
  let data = '{'
  // key
  if (el.key) {
    data += `key:${el.key},`
  }
  // ref
  if (el.ref) {
    data += `ref:${el.ref},`
  }
  if (el.refInFor) {
    data += `refInFor:true,`
  }
  // pre
  if (el.pre) {
    data += `pre:true,`
  }
  // ... there are many more cases like these
  data = data.replace(/,$/, '') + '}'
  return data
}

As you can see, genData looks at which properties are present on the current AST node, handles each property in its own way, and finally pieces everything together into a string.
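The same pattern extends to attributes and events. The sketch below is a guess at how the attrs and on parts of the earlier p example could be pieced together; genDataSketch and the el.attrs / el.events shapes are assumptions made for this illustration.

```javascript
// Hypothetical shapes: el.attrs = [{ name, value }], el.events = { name: { value } },
// where every value is already a JavaScript expression string.
function genDataSketch(el) {
  let data = '{'
  if (el.key) {
    data += `key:${el.key},`
  }
  if (el.attrs) {
    data += `attrs:{${el.attrs.map(a => `"${a.name}":${a.value}`).join(',')}},`
  }
  if (el.events) {
    const handlers = Object.keys(el.events).map(name => `"${name}":${el.events[name].value}`)
    data += `on:{${handlers.join(',')}},`
  }
  // Drop the trailing comma and close the object literal.
  return data.replace(/,$/, '') + '}'
}

const el = {
  attrs: [{ name: 'title', value: '"Berwin"' }],
  events: { click: { value: 'c' } }
}
console.log(genDataSketch(el)) // {attrs:{"title":"Berwin"},on:{"click":c}}
```

Note that "Berwin" stays quoted because it is a string literal, while c is emitted bare so that it resolves to a property of the instance at render time.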

Then let’s look at how genChildren is implemented:

function genChildren (
  el: ASTElement,
  state: CodegenState
): string | void {
  const children = el.children
  if (children.length) {
    return `[${children.map(c => genNode(c, state)).join(',')}]`
  }
}

function genNode (node: ASTNode, state: CodegenState): string {
  if (node.type === 1) {
    return genElement(node, state)
  }
  if (node.type === 3 && node.isComment) {
    return genComment(node)
  } else {
    return genText(node)
  }
}

From the above code we can see that generating children means looping over the children of the current AST node and running genElement, genComment or genText on each item according to its node type. If an element has children of its own, genElement recurses in the same way. Once the whole tree has been walked, we get a complete render function code string, something like the following:

"_c('div',[_c('p',[_v(_s(name))])])"
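The recursion can be reproduced end to end with a stripped-down generator. This sketch leaves out data, comments and the state argument, and its genText is an assumed implementation (the article does not show one), so it only covers the div/p example.

```javascript
// Assumed genText: expressions are emitted as-is, plain text as a string literal.
function genText(node) {
  return `_v(${node.type === 2 ? node.expression : JSON.stringify(node.text)})`
}

function genNode(node) {
  return node.type === 1 ? genElement(node) : genText(node)
}

function genChildren(el) {
  if (el.children.length) {
    return `[${el.children.map(child => genNode(child)).join(',')}]`
  }
}

function genElement(el) {
  const children = genChildren(el)
  return `_c('${el.tag}'${children ? `,${children}` : ''})`
}

const ast = {
  type: 1,
  tag: 'div',
  children: [
    { type: 1, tag: 'p', children: [{ type: 2, text: '{{name}}', expression: '_s(name)' }] }
  ]
}
console.log(genElement(ast)) // _c('div',[_c('p',[_v(_s(name))])])
```

genElement and genChildren call each other through genNode, which is where the recursion over the tree happens.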

Finally, wrap the generated code in with:

export function generate (
  ast: ASTElement | void,
  options: CompilerOptions
): CodegenResult {
  const state = new CodegenState(options)
  // If ast is empty, create an empty div
  const code = ast ? genElement(ast, state) : '_c("div")'
  return {
    render: `with(this){return ${code}}`
  }
}

That's all for the code generator. The actual source code is far from this simple and there are many details I have left out; I have only described the general process. If you are interested in the specifics, read the source code.

Conclusion

In this article, we showed that Vue's overall template compilation process is divided into three parts: a parser, an optimizer, and a code generator.

The parser converts template strings to element ASTs.

The optimizer finds static nodes and static root nodes and marks them.

The code generator generates render function code from the element ASTs.


The parser works by cutting the string off in small pieces while maintaining a stack that records the DOM depth. Every time a start tag is cut off it is pushed onto the stack, and once the whole string has been cut off, a complete AST has been parsed.

The optimizer recursively marks every node as static or not, and then recursively marks the static root nodes.

The code generator works by recursively piecing together a string of function-call code, calling a different generation method depending on the node type. When it finds an element node, it pieces together a _c(tagName, data, children) call string, where data and children are themselves pieced together from properties in the AST.

If the children have children of their own, the same thing is done recursively.

In the end, a complete render function code string is pieced together.