1. Introduction

1.1 Blog Generator

VuePress

VuePress is a minimalist static site generator. During development, a VuePress site is a single-page application powered by Vue, Vue Router, and webpack. At build time, a server-side rendered (SSR) version of the application is created, and the HTML for each route is rendered by virtually visiting that path.

Advantages: 1. Server-side rendering, which is good for SEO; 2. Vue architecture, which gives a good plug-in development experience.

Disadvantages: 1. When using Vue syntax in MD files, you must follow the server-side rendering constraints.

1.2 How Does a Browser Display Markdown

Markdown files are first parsed into HTML, and the browser then renders that HTML directly.

markdown-it: one of the most widely used Markdown parsers today.

It is simple to use:

const md = require("markdown-it")(); // optionally pass a preset name or an options object
const htmlStr = md.render("# test");
// htmlStr is "<h1>test</h1>\n"

markdown-it conversion effect preview

markdown-it official website

Let’s see how Markdown-it completes the conversion from # test to <h1>test</h1>.

2. Markdown-it conversion principle

Conversion flow chart:

It can be seen that the conversion process is mainly divided into two steps:

  • Parsing: convert the MD document into tokens.
  • Rendering: convert the tokens into HTML.

2.1 Base Classes: Token & Ruler

To understand how markdown-it works, you first need to understand its two base classes: Ruler and Token.

Token class

The MD source passes through a series of parsing rules and comes out as tokens.

Definition of Token:

// lib/token.js
function Token(type, tag, nesting) {
  // Token type, for example paragraph_open, paragraph_close, and hr,
  // which render as <p>, </p>, and <hr>, respectively.
  this.type = type;

  // Tag name, such as p or strong; '' (empty string) means plain text.
  this.tag = tag;

  // HTML attributes, if any, as an array of [name, value] pairs,
  // for example [["href", "http://dev.nodeca.com"]].
  this.attrs = null;

  // Source position: a two-element array [startLine, endLine].
  this.map = null;

  // Tag kind: 1 is an opening tag, 0 is a self-closing tag, -1 is a closing tag.
  // For example <p>, <hr>, </p>.
  this.nesting = nesting;

  // Nesting level.
  this.level = 0;

  // Child tokens. Only tokens of type inline or image have children,
  // because their content is parsed again into finer-grained tokens.
  this.children = null;

  // Content between the tags.
  this.content = '';

  // Syntax-specific markup, for example three backticks for a fenced code block or "-" for a list.
  this.markup = '';

  // Only tokens of type fence use info.
  // For a fence opened with three backticks followed by js, info is "js".
  this.info = '';

  // Arbitrary data stored by plugins.
  this.meta = null;

  // true for block-level tokens, false for inline tokens.
  this.block = false;

  // If true, the token is not rendered.
  this.hidden = false;
}

Ruler class

markdown-it is driven by rule functions, and a Ruler manages an ordered list of them. Rules fall into two kinds:

  • Parse rules, which turn the string passed in by the user into tokens.
  • Render rules, which run after the tokens are produced: a different render rule is invoked depending on each token's type, eventually producing the HTML string. (In markdown-it these live in a plain object on the renderer, md.renderer.rules, rather than in a Ruler.)

The constructor of Ruler:

function Ruler() {
  this.__rules__ = [];
  this.__cache__ = null;
}

__rules__ holds all rule objects and has the following structure:

[{
  name: XXX,
  enabled: Boolean,     // whether the rule is enabled
  fn: Function(),       // the rule's handler function
  alt: [ name2, name3 ] // names of the alternative chains the rule belongs to
}]

__cache__ stores the compiled rule chains, which determine the order in which rules are called. Its structure is as follows:

{ chainName: [rule1.fn, rule2.fn, ...] }

For example:

let ruler = new Ruler()
// alt must be an array of chain names
ruler.push('rule1', rule1Fn, {
  alt: ['chainA']
})
ruler.push('rule2', rule2Fn, {
  alt: ['chainB']
})
ruler.push('rule3', rule3Fn, {
  alt: ['chainB']
})
ruler.__compile__()

// After compiling, ruler.__cache__ has the following structure:
// {
//   '': [rule1Fn, rule2Fn, rule3Fn],
//   'chainA': [rule1Fn],
//   'chainB': [rule2Fn, rule3Fn],
// }
// That is three rule chains: '' (the default chain), 'chainA', and 'chainB'.
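The caching behaviour described above can be tried without markdown-it installed. The following is a minimal sketch, not the library's real code: MiniRuler and its plain `rules`/`cache` properties are hypothetical names that stand in for Ruler, `__rules__`, and `__cache__`, and features such as enabling, disabling, or inserting rules at a position are left out.

```javascript
// Simplified stand-in for markdown-it's Ruler; MiniRuler is a hypothetical name.
class MiniRuler {
  constructor() {
    this.rules = [];   // ordered list of { name, enabled, fn, alt }
    this.cache = null; // chain name -> array of rule functions
  }

  // Append a rule to the end of the list and invalidate the cache.
  push(name, fn, options = {}) {
    this.rules.push({ name, enabled: true, fn, alt: options.alt || [] });
    this.cache = null;
  }

  // Build one function list per chain; '' is the default chain with every enabled rule.
  compile() {
    const chains = [''];
    for (const rule of this.rules) {
      if (!rule.enabled) continue;
      for (const altName of rule.alt) {
        if (!chains.includes(altName)) chains.push(altName);
      }
    }
    this.cache = {};
    for (const chain of chains) {
      this.cache[chain] = this.rules
        .filter((rule) => rule.enabled && (chain === '' || rule.alt.includes(chain)))
        .map((rule) => rule.fn);
    }
  }

  // Return the functions for a chain, compiling lazily on first use.
  getRules(chainName) {
    if (this.cache === null) this.compile();
    return this.cache[chainName] || [];
  }
}

const rule1Fn = () => 'rule1';
const rule2Fn = () => 'rule2';
const rule3Fn = () => 'rule3';

const miniRuler = new MiniRuler();
miniRuler.push('rule1', rule1Fn, { alt: ['chainA'] });
miniRuler.push('rule2', rule2Fn, { alt: ['chainB'] });
miniRuler.push('rule3', rule3Fn, { alt: ['chainB'] });

const chainBNames = miniRuler.getRules('chainB').map((fn) => fn());
console.log(chainBNames);
```

Keeping a precomputed list per chain means the parser can fetch the functions for a chain with a single lookup instead of filtering the rule list on every call.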

2.2 The Parsing Process

The main logic is in the ParserCore class.

ParserCore class

The main logic of the ParserCore class:

// lib/parser_core.js
var Ruler = require('./ruler');

var _rules = [
  [ 'normalize',    require('./rules_core/normalize')    ],
  [ 'block',        require('./rules_core/block')        ],
  [ 'inline',       require('./rules_core/inline')       ],
  [ 'linkify',      require('./rules_core/linkify')      ],
  [ 'replacements', require('./rules_core/replacements') ],
  [ 'smartquotes',  require('./rules_core/smartquotes')  ]
];

function Core() {
  this.ruler = new Ruler();

  for (var i = 0; i < _rules.length; i++) {
    this.ruler.push(_rules[i][0], _rules[i][1]);
  }
}

Core.prototype.process = function (state) {
  var i, l, rules;

  // Read the default rule chain from __cache__
  rules = this.ruler.getRules('');

  for (i = 0, l = rules.length; i < l; i++) {
    rules[i](state);
  }
};

Core.prototype.State = require('./rules_core/state_core');

ParserCore's prototype has a process method. Inside it, this.ruler.getRules('') returns the default chain from the ruler's __cache__, so the method simply runs every core rule in chain order.

The rules are where the work happens: each rule either adds new tokens or modifies existing ones. Here are the core rules:

  • normalize: normalizes line breaks in the MD document and converts the null character \u0000 to \uFFFD.
  • block: identifies block-level tokens (table, blockquote, code, fence, etc.) by running the block rule chain, and emits inline tokens that hold the raw text still to be parsed.
  • inline: processes the tokens of type 'inline' produced by the block rule.
  • linkify: checks whether text tokens contain URLs (HTTP or mailto). If so, the original text token is split into three parts: text, link, and text. (There are actually more than three tokens, because the link itself becomes link_open, text, and link_close.)
  • replacements: performs typographic replacements such as (c), (C) → © and +- → ±, while skipping text inside links.
  • smartquotes: converts straight quotes into typographic quotes.
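To show how a chain of core rules cooperates through a shared state object, here is a small self-contained sketch. The rule names mirror markdown-it, but the bodies are simplified stand-ins written for illustration: the block rule here only makes one paragraph per non-empty line, and processCore and the token shapes are assumptions, not the library's API.

```javascript
// Toy core chain; the rule names mirror markdown-it, the bodies are simplified stand-ins.
function normalize(state) {
  // Unify line endings, then replace NULL characters.
  state.src = state.src.replace(/\r\n?|\n/g, '\n').replace(/\0/g, '\uFFFD');
}

function block(state) {
  // Stand-in for the block rule: one paragraph per non-empty line.
  for (const line of state.src.split('\n')) {
    if (line.trim() === '') continue;
    state.tokens.push({ type: 'paragraph_open', tag: 'p', nesting: 1 });
    state.tokens.push({ type: 'inline', tag: '', nesting: 0, content: line, children: [] });
    state.tokens.push({ type: 'paragraph_close', tag: 'p', nesting: -1 });
  }
}

function inline(state) {
  // Stand-in for the inline rule: turn each inline token's content into one text child.
  for (const token of state.tokens) {
    if (token.type === 'inline') {
      token.children = [{ type: 'text', tag: '', nesting: 0, content: token.content }];
    }
  }
}

const coreChain = [normalize, block, inline];

function processCore(src) {
  const state = { src, tokens: [] };
  for (const rule of coreChain) rule(state); // same loop shape as Core.prototype.process
  return state;
}

const coreState = processCore('first\r\nsecond\u0000');
console.log(coreState.tokens.map((t) => t.type));
```

Each rule reads and mutates the same state, which is why the order of the chain matters: inline can only fill in children after block has created the inline tokens.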

The heading rule, one of the block rules, parses headings (h1 to h6), which are written as #, ##, ###, and so on.

// lib/rules_block/heading.js
var isSpace = require('../common/utils').isSpace;

module.exports = function heading(state, startLine, endLine, silent) {
  var ch, level, tmp, token,
      pos = state.bMarks[startLine] + state.tShift[startLine],
      max = state.eMarks[startLine];

  // A line indented by 4 or more spaces is a code block
  if (state.sCount[startLine] - state.blkIndent >= 4) { return false; }

  // Character code at the current position
  ch  = state.src.charCodeAt(pos);

  // Not a heading if the line does not start with #
  if (ch !== 0x23/* # */ || pos >= max) { return false; }

  // Heading level
  level = 1;

  ch = state.src.charCodeAt(++pos);

  // Count consecutive # characters to get the heading level
  while (ch === 0x23/* # */ && pos < max && level <= 6) {
    level++;
    ch = state.src.charCodeAt(++pos);
  }

  // Not a heading if there are more than 6 # characters or no space follows them
  if (level > 6 || (pos < max && !isSpace(ch))) { return false; }

  // In silent (validation) mode, report a match without emitting tokens
  if (silent) { return true; }

  // Trim trailing spaces and an optional closing sequence (e.g. '  ###  ')
  max = state.skipSpacesBack(max, pos);
  tmp = state.skipCharsBack(max, 0x23, pos); // #
  if (tmp > pos && isSpace(state.src.charCodeAt(tmp - 1))) {
    max = tmp;
  }

  state.line = startLine + 1;

  // Emit the tokens
  token        = state.push('heading_open', 'h' + String(level), 1);
  token.markup = '########'.slice(0, level);
  token.map    = [ startLine, state.line ];

  token          = state.push('inline', '', 0);
  token.content  = state.src.slice(pos, max).trim();
  token.map      = [ startLine, state.line ];
  token.children = [];

  token        = state.push('heading_close', 'h' + String(level), -1);
  token.markup = '########'.slice(0, level);

  return true;
};

Results after conversion:

[
  {
    "type": "heading_open",
    "tag": "h1",
    "attrs": null,
    "map": [0, 1],
    "nesting": 1,
    "level": 0,
    "children": null,
    "content": "",
    "markup": "#",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  },
  {
    "type": "inline",
    "tag": "",
    "attrs": null,
    "map": [0, 1],
    "nesting": 0,
    "level": 1,
    "children": [
      {
        "type": "text",
        "tag": "",
        "attrs": null,
        "map": null,
        "nesting": 0,
        "level": 0,
        "children": null,
        "content": "test",
        "markup": "",
        "info": "",
        "meta": null,
        "block": false,
        "hidden": false
      }
    ],
    "content": "test",
    "markup": "",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  },
  {
    "type": "heading_close",
    "tag": "h1",
    "attrs": null,
    "map": null,
    "nesting": -1,
    "level": 0,
    "children": null,
    "content": "",
    "markup": "#",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  }
]

It can be simplified as a graph:

At this point we have an array that plays the role of an AST, which markdown-it calls a token stream, and it is passed to the renderer.
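To make the token stream concrete, the sketch below produces the same three-token shape for a single heading line. parseHeadingLine is a hypothetical helper written for this article, not part of markdown-it; it ignores indentation, closing # sequences, and most token fields.

```javascript
// Simplified heading tokenizer; parseHeadingLine is a hypothetical helper, not markdown-it API.
function parseHeadingLine(line) {
  // Count the leading '#' characters to get the heading level.
  let pos = 0;
  while (line[pos] === '#') pos++;
  const level = pos;

  // The '#' run must be followed by whitespace or the end of the line.
  const next = line[pos];
  const followedBySpace = next === undefined || next === ' ' || next === '\t';
  if (level < 1 || level > 6 || !followedBySpace) return null;

  const tag = 'h' + level;
  const markup = '#'.repeat(level);
  const content = line.slice(pos).trim();
  return [
    { type: 'heading_open', tag, nesting: 1, markup },
    { type: 'inline', tag: '', nesting: 0, content, children: [{ type: 'text', content }] },
    { type: 'heading_close', tag, nesting: -1, markup },
  ];
}

const headingTokens = parseHeadingLine('# test');
console.log(headingTokens.map((t) => t.type));
```

Returning null for a non-match plays the same role as `return false` in the real rule: the line is left for another block rule to handle.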

2.3 The Rendering Process

Rendering is the process of converting the token stream into concrete HTML.

The Renderer's main logic:

Renderer.prototype.render = function (tokens, options, env) {
  var i, len, type,
      result = '',
      rules = this.rules;

  for (i = 0, len = tokens.length; i < len; i++) {
    type = tokens[i].type;

    if (type === 'inline') {
      result += this.renderInline(tokens[i].children, options, env);
    } else if (typeof rules[type] !== 'undefined') {
      result += rules[tokens[i].type](tokens, i, options, env, this);
    } else {
      result += this.renderToken(tokens, i, options, env);
    }
  }

  return result;
};

The render function iterates over all tokens and hands each one to renderInline, to a matching method in the rules object (which has nine built-in rules), or to renderToken as the fallback.

Back to our example: parsing # test yields three tokens, of these types:

  • heading_open
  • inline
  • heading_close

rules has no heading_open or heading_close method, so the renderer falls back to renderToken for these two tokens.

Renderer.prototype.renderToken = function renderToken(tokens, idx, options) {
  var nextToken,
      result = '',
      needLf = false,
      token = tokens[idx];

  if (token.hidden) {
    return '';
  }

  if (token.block && token.nesting !== -1 && idx && tokens[idx - 1].hidden) {
    result += '\n';
  }

  // Add the opening or closing tag
  result += (token.nesting === -1 ? '</' : '<') + token.tag;

  // Add the tag attributes
  result += this.renderAttrs(token);

  // Self-closing tag handling
  if (token.nesting === 0 && options.xhtmlOut) {
    result += ' /';
  }

  if (token.block) {
    // Determine whether to break a line
    needLf = true;

    if (token.nesting === 1) {
      if (idx + 1 < tokens.length) {
        nextToken = tokens[idx + 1];

        if (nextToken.type === 'inline' || nextToken.hidden) {
          needLf = false;

        } else if (nextToken.nesting === -1 && nextToken.tag === token.tag) {
          needLf = false;
        }
      }
    }
  }

  result += needLf ? '>\n' : '>';

  return result;
};

renderToken turns heading_open and heading_close into <h1> and </h1>.

The text token under the inline token is handled by default_rules.text, one of the nine built-in rules, which yields the text test.

default_rules.text = function (tokens, idx /*, options, env */) {
  // Escape special characters
  return escapeHtml(tokens[idx].content);
};

The rendering rules thus turn the token stream into the final HTML snippet <h1>test</h1>, and markdown-it's work is done.
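The dispatch order (inline children first, then a per-type rule, then the generic fallback) can be reproduced in a few lines. This is a toy renderer under stated assumptions: renderSketch, renderTokenSketch, and renderRules are hypothetical names, and attributes, xhtmlOut, and the newline handling of the real renderToken are omitted.

```javascript
// Toy renderer; mirrors the dispatch order described above, not markdown-it's real code.
function escapeHtml(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

const renderRules = {
  // Per-type rule, analogous to default_rules.text.
  text: (tokens, idx) => escapeHtml(tokens[idx].content),
};

// Fallback: build an opening or closing tag from the token itself.
function renderTokenSketch(tokens, idx) {
  const token = tokens[idx];
  return (token.nesting === -1 ? '</' : '<') + token.tag + '>';
}

function renderSketch(tokens) {
  let result = '';
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    if (token.type === 'inline') {
      result += renderSketch(token.children); // descend into inline children
    } else if (renderRules[token.type] !== undefined) {
      result += renderRules[token.type](tokens, i);
    } else {
      result += renderTokenSketch(tokens, i);
    }
  }
  return result;
}

const sampleTokens = [
  { type: 'heading_open', tag: 'h1', nesting: 1 },
  { type: 'inline', tag: '', nesting: 0, children: [{ type: 'text', tag: '', nesting: 0, content: 'a < b' }] },
  { type: 'heading_close', tag: 'h1', nesting: -1 },
];

console.log(renderSketch(sampleTokens)); // <h1>a &lt; b</h1>
```

Because the rule table is consulted before the fallback, adding an entry for a token type is enough to change how that type is rendered.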

2.4 Summary

The whole markdown-it workflow resembles a factory assembly line. We feed MD source into the first machine (the parse rules), the resulting tokens move along the line into the next machine (the render rules), and the final product (HTML) comes out at the end.

In a factory, when the assembly line needs to be adjusted, the usual approach is to add machines with extra functions to the line to process the components and semi-finished products.

Consider: 🤔 how can we modify markdown-it's conversion result?

3. markdown-it Plug-ins

markdown-it plug-ins are used to modify the conversion result.

markdown-it has a wide variety of plug-ins: generating tables of contents, generating anchor links, highlighting code, recognizing emoji, and more.

An example of using the markdown-it-anchor plug-in to generate heading anchors automatically:

const md = require("markdown-it")({});
md.use(require("markdown-it-anchor"), {
  // Option names as used in this article; newer plug-in releases may configure permalinks differently
  permalink: true,
  permalinkBefore: true,
  permalinkSymbol: "§"
});

The effect

3.1 Understanding Plug-ins

The logic of MarkdownIt.prototype.use is simple: the first argument is the plug-in function, which is called with the MarkdownIt instance followed by all remaining arguments passed to use.

MarkdownIt.prototype.use = function (plugin /*, params, ... */) {
  var args = [ this ].concat(Array.prototype.slice.call(arguments, 1));
  plugin.apply(plugin, args);
  return this;
};

So a plug-in is simply a function, and what it does in practice is change how tokens are produced or rendered.
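The calling convention of use can be illustrated with a toy host object. MiniMd, its renderRules property, and shoutPlugin are hypothetical names invented for this sketch; only the shape of the call (instance first, then the extra arguments, returning the instance for chaining) follows the source shown above.

```javascript
// Toy host object showing the calling convention of use(); MiniMd is hypothetical.
class MiniMd {
  constructor() {
    this.renderRules = {};
  }

  // Call the plug-in with this instance first, then the extra arguments; return this for chaining.
  use(plugin, ...params) {
    plugin(this, ...params);
    return this;
  }
}

// A plug-in is just a function that receives the instance and its options.
function shoutPlugin(md, options) {
  md.renderRules.text = (content) => content.toUpperCase() + options.suffix;
}

const miniMd = new MiniMd().use(shoutPlugin, { suffix: '!' });
console.log(miniMd.renderRules.text('test')); // TEST!
```

Returning the instance from use is what allows several plug-ins to be chained in one expression.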

3.2 How to Write a Plug-in

There are two main ways to write plug-ins:

  • Add or modify parsing rules
  • Add or modify renderer rules

Consider an example that adds a target="_blank" attribute to every link.

Implementation 1: Modify renderer rules

// Remember the old renderer if one is overridden or proxied; otherwise fall back to the default.
var defaultRender = md.renderer.rules.link_open || function (tokens, idx, options, env, self) {
  return self.renderToken(tokens, idx, options);
};

md.renderer.rules.link_open = function (tokens, idx, options, env, self) {
  // If you are sure that other plugins cannot add `target`, drop the check below.
  var aIndex = tokens[idx].attrIndex('target');

  if (aIndex < 0) {
    tokens[idx].attrPush(['target', '_blank']); // add a new attribute
  } else {
    tokens[idx].attrs[aIndex][1] = '_blank';    // replace the value of the existing attribute
  }

  // Pass the token to the default renderer.
  return defaultRender(tokens, idx, options, env, self);
};

Implementation 2: Modify parsing rules

// markdown-it-for-inline adds a rule that runs a function on inline tokens of a given type.
// Parameters:
//   - rule name (should be unique)
//   - token type
//   - function
var iterator = require("markdown-it-for-inline");

md.use(iterator, "url_new_win", "link_open", function (tokens, idx) {
  var aIndex = tokens[idx].attrIndex("target");

  if (aIndex < 0) {
    tokens[idx].attrPush(["target", "_blank"]);
  } else {
    tokens[idx].attrs[aIndex][1] = "_blank";
  }
});

Either way, the result is a new entry in the attrs array of the link_open token:

[
  ...
  {
    type: 'link_open',
    tag: 'a',
    attrs: [["href", "http://dev.nodeca.com"], ["target", "_blank"]],
    ...
  },
  ...
]

After rendering, you end up with output similar to this:

<a href="http://dev.nodeca.com" target="_blank">xxx</a>
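What both implementations do to the token can be shown on plain objects, without markdown-it. In this sketch setAttr and openLinksInNewWindow are hypothetical helpers that imitate attrIndex/attrPush and a token walk; they are not the library's or the plug-in's API.

```javascript
// Plain-object sketch of what both implementations do to a link_open token.
// setAttr and openLinksInNewWindow are hypothetical helpers, not markdown-it API.
function setAttr(token, name, value) {
  token.attrs = token.attrs || [];
  const existing = token.attrs.find((pair) => pair[0] === name);
  if (existing) {
    existing[1] = value;             // replace the existing value
  } else {
    token.attrs.push([name, value]); // add a new attribute
  }
}

function openLinksInNewWindow(tokens) {
  for (const token of tokens) {
    if (token.type === 'link_open') setAttr(token, 'target', '_blank');
    if (token.children) openLinksInNewWindow(token.children); // inline tokens carry children
  }
  return tokens;
}

const linkTokens = openLinksInNewWindow([
  {
    type: 'inline',
    children: [
      { type: 'link_open', tag: 'a', attrs: [['href', 'http://dev.nodeca.com']] },
      { type: 'text', content: 'xxx' },
      { type: 'link_close', tag: 'a' },
    ],
  },
]);

console.log(JSON.stringify(linkTokens[0].children[0].attrs));
```

The walk has to descend into children because link tokens are produced by the inline parser and therefore live under inline tokens rather than at the top level of the stream.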

3.3 Other Plug-ins

3.3.1 Emoji recognition

markdown-it-emoji (GitHub repository)

  • Adds a parsing rule that matches tokens of type inline (emoji appear only under that token type);
  • Takes token.content, matches every shortcut in it with a regular expression, and replaces each one with the emoji in defs;
  • The renderer rule returns token.content.

// Emoji mapping
defs = {
  "angry": "😦",
  "blush": "😊",
  "broken_heart": "💔",
  ...
};

// Shortcut mapping
shortcuts = {
  angry:            [ '>:(', '>:-(' ],
  blush:            [ ':")', ':-")' ],
  broken_heart:     [ '</3', '<\\3' ],
  ...
};
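The replacement step can be sketched on a plain string. This is an illustration under stated assumptions, not the plug-in's real code: the data is a three-entry sample, replaceEmoji and the other names are hypothetical, and the sketch replaces text directly instead of creating emoji tokens.

```javascript
// Sketch of shortcut-to-emoji replacement; the data is a tiny sample and
// replaceEmoji is a hypothetical helper, not the plug-in's real code.
const emojiDefs = { angry: '😦', blush: '😊', broken_heart: '💔' };
const emojiShortcuts = {
  angry: ['>:(', '>:-('],
  blush: [':")', ':-")'],
  broken_heart: ['</3', '<\\3'],
};

// Map every textual form (":name:" and each shortcut) to its emoji name.
const nameByText = new Map();
for (const name of Object.keys(emojiDefs)) {
  nameByText.set(':' + name + ':', name);
  for (const shortcut of emojiShortcuts[name] || []) nameByText.set(shortcut, name);
}

// Build one alternation regex, longest forms first, with regex metacharacters escaped.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
const emojiPattern = new RegExp(
  [...nameByText.keys()].sort((a, b) => b.length - a.length).map(escapeRegExp).join('|'),
  'g'
);

function replaceEmoji(content) {
  return content.replace(emojiPattern, (match) => emojiDefs[nameByText.get(match)]);
}

console.log(replaceEmoji('I am :blush: but </3 and >:('));
```

Sorting the alternatives by length keeps a longer form from being shadowed by a shorter one that happens to be its prefix.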

3.3.2 Automatically Generating Heading Anchor Links

The markdown-it-anchor plug-in generates anchors from headings so that readers can jump quickly to a section of the document.

Effect reference: Element UI

Implementation principle: insert tokens between the heading_open and heading_close tokens. Because the anchor is an <a> link, the inserted tokens are link_open, the inline content, and link_close.

Create the anchor link tokens:

export const headerLink = makePermalink((slug, opts, anchorOpts, state, idx) => {
  const linkTokens = [
    Object.assign(new state.Token('link_open', 'a', 1), {
      attrs: [
        ...(opts.class ? [['class', opts.class]] : []),
        ['href', opts.renderHref(slug, state)],
        ...Object.entries(opts.renderAttrs(slug, state))
      ]
    }),
    ...(opts.safariReaderFix ? [new state.Token('span_open', 'span', 1)] : []),
    ...state.tokens[idx + 1].children,
    ...(opts.safariReaderFix ? [new state.Token('span_close', 'span', -1)] : []),
    new state.Token('link_close', 'a', -1)
  ]

  state.tokens[idx + 1] = Object.assign(new state.Token('inline', '', 0), {
    children: linkTokens
  })
})
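The same idea can be shown on plain token objects. The sketch below is an assumption-laden illustration rather than the plug-in's implementation: slugifySketch and addHeaderLinks are hypothetical helpers, and the slug format (lowercase, whitespace to hyphens, URI-encoded) is assumed for the example.

```javascript
// Plain-object sketch of wrapping a heading's inline children in a link.
// slugifySketch and addHeaderLinks are hypothetical helpers; the slug format is an assumption.
function slugifySketch(text) {
  return encodeURIComponent(text.trim().toLowerCase().replace(/\s+/g, '-'));
}

function addHeaderLinks(tokens) {
  for (let idx = 0; idx < tokens.length; idx++) {
    if (tokens[idx].type !== 'heading_open') continue;

    const inlineToken = tokens[idx + 1]; // the inline token follows heading_open
    const slug = slugifySketch(inlineToken.content);

    tokens[idx].attrs = [['id', slug]];  // the anchor target lives on the heading
    inlineToken.children = [
      { type: 'link_open', tag: 'a', nesting: 1, attrs: [['href', '#' + slug]] },
      ...inlineToken.children,
      { type: 'link_close', tag: 'a', nesting: -1 },
    ];
  }
  return tokens;
}

const anchoredTokens = addHeaderLinks([
  { type: 'heading_open', tag: 'h1', nesting: 1, attrs: null },
  { type: 'inline', tag: '', nesting: 0, content: 'Getting Started', children: [{ type: 'text', content: 'Getting Started' }] },
  { type: 'heading_close', tag: 'h1', nesting: -1 },
]);

console.log(anchoredTokens[1].children.map((t) => t.type));
```

Because only the token stream is changed, the renderer needs no modification: it emits the extra link tokens like any others.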

3.3.3 Others

  • How code highlighting works
  • Automatic table of contents generation ([[]])
  • ...

4. Summary

This article used a simple example, the conversion of # test into <h1>test</h1>, together with the markdown-it source code, to explain step by step how Markdown syntax is translated into HTML. It also showed how to customize the resulting HTML by changing markdown-it's conversion steps through plug-ins.

If this helped you, please give it a 👍🏻!

markdown-it usage demo

Reference:

markdown-it source code analysis series