Preface

I have been using markdown-it a lot recently and have written a few plugins for it. Along the way I studied the source code, and this article is the result. Readers who need more detail can consult the official documentation.

This article has two parts: how markdown-it works, and how to apply that knowledge when writing plugins.

How markdown-it works

Markdown-it takes Markdown source as input and produces an HTML string as output.

Let's walk through the process with a simple example: # I'm an example -> <h1>I'm an example</h1>

First, the parser turns the source into a token stream by applying its parsing rules. The renderer then takes the token stream and assembles an HTML string by applying its rendering rules.

The parser

Markdown-it has seven core rules built in. In the figure above, the parsing rules are drawn with dashed lines because they can be enabled or disabled. This article covers only two of the core rules: block and inline.

The specification states:

We can think of a Markdown document as a sequence of blocks: structural elements such as paragraphs, block quotes, lists, headings, rules, and code blocks. Some blocks (such as block quotes and list items) can contain other blocks; others (such as headings and paragraphs) contain inline content such as text, links, emphasized text, images, and code spans.

Block structure always takes precedence over inline structure when parsing. This means parsing can proceed in two steps: 1. Identify the block structure of the Markdown document; 2. Parse the text inside paragraphs, headings, and other block elements as inline structure.

Note that the first step must process lines sequentially, but the second step can be parallelized, because inline parsing of one block element does not affect inline parsing of any other.

There are two types of blocks: container blocks and leaf blocks. Container blocks can contain other blocks, but leaf blocks cannot.

Parsing works at two levels: lines and characters.

Each line can be interpreted in three ways:

  1. It closes one or more open block structures.

  2. It creates one or more new block structures as children of the last open block structure.

  3. It adds text to the last (deepest) open block structure remaining on the tree.

For our example, we create a heading block and then add the text content to it. The next line has no content, so the block is closed.

Characters are either non-whitespace characters or whitespace characters: space (U+0020), tab (U+0009), line feed (U+000A), line tabulation (U+000B), form feed (U+000C), and carriage return (U+000D). We won't go into more detail here.

Parsing our example involves four rules: block, inline, heading, and text.

  1. The block rule, which parses # I'm an example
  • Enter the tokenize function, which contains eleven block rules
  • The heading rule matches
  • It produces heading_open, inline, and heading_close tokens

  2. The inline rule, which parses I'm an example
  • Enter the parse function, which contains four inline rules
  • The text rule matches
  • It produces a text token

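After parsing, we get 3 + 1 tokens: heading_open, inline, and heading_close from the block rule, plus a text token nested inside inline.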
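The two phases can be sketched in a few lines of JavaScript. This is a toy illustration with no dependencies, not markdown-it's implementation: the names parseInline, parseBlocks, and parse are hypothetical, only ATX headings and paragraphs are recognized, and the inline phase emits a single text token.

```javascript
// Toy two-phase parser (hypothetical helper names, not markdown-it's API).

// Phase 2: turn the raw text of a block into inline child tokens.
// This sketch only ever produces one text token.
function parseInline(content) {
  return [{ type: 'text', content: content }];
}

// Phase 1: scan the source line by line and build block-level tokens.
function parseBlocks(src) {
  const tokens = [];
  for (const line of src.split('\n')) {
    if (line.trim() === '') continue; // a blank line only ends the current block
    const m = /^(#{1,6})\s+(.*)$/.exec(line);
    const kind = m ? 'heading' : 'paragraph';
    const tag = m ? 'h' + m[1].length : 'p';
    const content = m ? m[2] : line;
    tokens.push({ type: kind + '_open', tag: tag });
    tokens.push({ type: 'inline', content: content, children: [] });
    tokens.push({ type: kind + '_close', tag: tag });
  }
  return tokens;
}

// Run the block phase first, then fill in the children of every inline token.
function parse(src) {
  const tokens = parseBlocks(src);
  for (const token of tokens) {
    if (token.type === 'inline') token.children = parseInline(token.content);
  }
  return tokens;
}

const exampleTokens = parse("# I'm an example");
console.log(exampleTokens.map((t) => t.type).join(', '));
// prints "heading_open, inline, heading_close"
```

The result is a flat array of three block-level tokens, and the text token lives in the children array of the inline token, which is where the "3 + 1" comes from.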

Token stream

The result here is not an AST but an array, which markdown-it calls a token stream. Why is that?

The official explanation:

  • Tokens are a simple array. (An AST is a nested object.)

  • Opening and closing tags are separate tokens.

  • An inline container is a special block token that holds nested tokens such as bold, italic, and text.

What is the benefit? It allows block and inline tokens to be processed in parallel.

Once the token stream has been generated, it is passed to the renderer.

The renderer

The renderer iterates over all tokens and passes each one to the rule whose name matches the token's type attribute. Markdown-it has nine rendering rules built in: fences (fence), inline code (code_inline), code blocks (code_block), HTML blocks (html_block), inline HTML (html_inline), images (image), hard line breaks (hardbreak), soft line breaks (softbreak), and text (text).

Tokens whose type has no built-in rule are passed to renderToken and rendered as ordinary tokens; we won't go into that here.
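The dispatch idea can be sketched as follows. This is a simplified toy and not markdown-it's source: the rules table holds only two entries, the renderToken fallback just prints the token's tag, and the token objects are plain literals written by hand for the heading example.

```javascript
// Toy renderer (a sketch of rule dispatch, not markdown-it's source).

function escapeHtml(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Rendering rules are looked up by token type.
const rules = {
  text: (tokens, idx) => escapeHtml(tokens[idx].content),
  softbreak: () => '\n',
};

// Fallback for token types without a rule: emit the token's tag.
function renderToken(tokens, idx) {
  const token = tokens[idx];
  if (!token.tag) return '';
  const isClose = token.type.endsWith('_close');
  return (isClose ? '</' : '<') + token.tag + '>';
}

function render(tokens) {
  let html = '';
  tokens.forEach((token, idx) => {
    if (token.type === 'inline') {
      html += render(token.children); // descend into the inline container
    } else if (rules[token.type]) {
      html += rules[token.type](tokens, idx);
    } else {
      html += renderToken(tokens, idx);
    }
  });
  return html;
}

const headingTokens = [
  { type: 'heading_open', tag: 'h1' },
  { type: 'inline', children: [{ type: 'text', content: "I'm an example" }] },
  { type: 'heading_close', tag: 'h1' },
];

console.log(render(headingTokens));
// prints "<h1>I'm an example</h1>"
```

Overriding an entry in the rules table is all it takes to change how one token type is rendered, which is the hook the plugins later in this article rely on.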

Back to our example:

heading_open is rendered as <h1>

The text inside inline is rendered as I'm an example

heading_close is rendered as </h1>

Markdown-it plugins

Some Markdown-it plug-ins take advantage of this principle.

markdown-it-container

This plugin adds support for content blocks, most commonly tip, warning, and danger blocks, which are used to emphasize a particular piece of content.

How does this work? From the introduction above we can infer the token stream of a three-line content block (an opening ::: tip line, one line of content, and a closing ::: line).

The first and third lines produce block tokens, one opening and one closing. The second line produces an inline token that holds the content.

Since the content is parsed as inline, fences, inline code, code blocks, HTML blocks, inline HTML, images, hard line breaks, soft line breaks, and text are all supported inside it.

In practice, we scan line by line, look for a line that matches the content block syntax ::: tip, and parse what follows as a block structure until we reach the closing ::: line. Each line in between is parsed into paragraph_open, inline, and paragraph_close.

The parsed token stream is rendered as an opening <div> tag, several <p> tags, and a closing </div> tag, respectively.
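Here is a rough sketch of that line scan. It is a toy and not the markdown-it-container source: parseContainer is a hypothetical helper, nesting is not handled, and every non-blank line inside or outside the block becomes its own paragraph.

```javascript
// Toy line scanner for "::: name" content blocks (hypothetical helper,
// not the markdown-it-container source).
function parseContainer(src) {
  const tokens = [];
  let open = null; // name of the container that is currently open, if any

  for (const line of src.split('\n')) {
    const opener = /^:::\s*(\w+)\s*$/.exec(line);
    if (!open && opener) {
      // "::: tip" opens a block structure
      open = opener[1];
      tokens.push({ type: 'container_' + open + '_open', tag: 'div' });
    } else if (open && /^:::\s*$/.test(line)) {
      // a bare ":::" closes it
      tokens.push({ type: 'container_' + open + '_close', tag: 'div' });
      open = null;
    } else if (line.trim() !== '') {
      // any other non-blank line becomes a paragraph
      tokens.push({ type: 'paragraph_open', tag: 'p' });
      tokens.push({ type: 'inline', content: line.trim() });
      tokens.push({ type: 'paragraph_close', tag: 'p' });
    }
  }
  return tokens;
}

const tipTokens = parseContainer('::: tip\nRemember to save.\n:::');
console.log(tipTokens.map((t) => t.type).join(', '));
```

For the three-line input, the scan yields an opening container token, the three paragraph tokens, and a closing container token, matching the rendering described above.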

markdown-it-anchor

This plugin generates anchors from headings so that readers can jump straight to a section of a document.

The basic idea is to insert an extra token next to each heading_open token and render it as an anchor.

In practice, more than one token is inserted: the anchor is clickable, so it is really an a link made of link_open, inline content, and link_close tokens. And rather than being inserted before heading_open, these tokens are inserted among the children of the inline token between heading_open and heading_close, so that the anchor symbol (#) sits at the same level as the heading text.

Note: 1. A title may contain special characters such as @#$ that would invalidate the URL hash, so the hash value of the anchor must be escaped. 2. Titles with the same name may appear more than once, so the hash needs a distinguishing mark to keep it unique.
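Both notes can be illustrated with a small sketch. The helpers slugify and createUniqueSlugger are hypothetical names chosen for this example, not the markdown-it-anchor API, and the numeric suffix is just one possible way to mark duplicates.

```javascript
// Sketch of anchor hash generation (hypothetical helpers, not the
// markdown-it-anchor source).

// Note 1: escape the heading text so it is safe to use as a URL hash.
function slugify(title) {
  return encodeURIComponent(title.trim().toLowerCase().replace(/\s+/g, '-'));
}

// Note 2: return a function that appends a numeric suffix to repeated slugs.
function createUniqueSlugger() {
  const seen = new Map(); // slug -> number of times it has been used
  return function uniqueSlug(title) {
    const base = slugify(title);
    const count = seen.get(base) || 0;
    seen.set(base, count + 1);
    return count === 0 ? base : base + '-' + count;
  };
}

const slug = createUniqueSlugger();
console.log(slug('Getting Started')); // prints "getting-started"
console.log(slug('Getting Started')); // prints "getting-started-1"
console.log(slug('Q&A @ #1'));        // prints "q%26a-%40-%231"
```

A real plugin would also need to handle a heading whose own text already collides with a generated suffix, which this sketch ignores.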

Add attributes to links

Here's an example of how to write a plugin: add the target="_blank" attribute to all links.

There are two ways:

  1. Modify renderer rules
var md = require('markdown-it')();

// Remember the old renderer if one is already set; otherwise fall back to the default.
var defaultRender = md.renderer.rules.link_open || function (tokens, idx, options, env, self) {
  return self.renderToken(tokens, idx, options);
};

md.renderer.rules.link_open = function (tokens, idx, options, env, self) {
  var aIndex = tokens[idx].attrIndex('target');

  if (aIndex < 0) {
    tokens[idx].attrPush(['target', '_blank']); // Add a new attribute
  } else {
    tokens[idx].attrs[aIndex][1] = '_blank'; // Replace the existing attribute value
  }

  // Pass the token to the default renderer.
  return defaultRender(tokens, idx, options, env, self);
};
  2. Modify the token
var iterator = require('markdown-it-for-inline');

var md = require('markdown-it')()
            .use(iterator, 'url_new_win', 'link_open', function (tokens, idx) {
              var aIndex = tokens[idx].attrIndex('target');

              if (aIndex < 0) {
                tokens[idx].attrPush(['target', '_blank']);
              } else {
                tokens[idx].attrs[aIndex][1] = '_blank';
              }
            });
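To make the second approach less of a black box, here is a dependency-free sketch of what such an iterator roughly does: walk the children of every inline token and hand each matching child to a handler. ToyToken, forEachInline, and openInNewWindow are hypothetical names for this illustration, and ToyToken only imitates the two attribute helpers used above; it is not markdown-it's Token class.

```javascript
// Toy stand-in for a token with the two attribute helpers used above.
// Attributes are stored as [name, value] pairs.
class ToyToken {
  constructor(type, attrs = [], children = null) {
    this.type = type;
    this.attrs = attrs;
    this.children = children;
  }
  attrIndex(name) {
    return this.attrs.findIndex((pair) => pair[0] === name);
  }
  attrPush(pair) {
    this.attrs.push(pair);
  }
}

// Hypothetical helper: call handler for every inline child of the given type.
function forEachInline(tokens, type, handler) {
  for (const token of tokens) {
    if (token.type !== 'inline' || !token.children) continue;
    token.children.forEach((child, idx) => {
      if (child.type === type) handler(token.children, idx);
    });
  }
}

// Same logic as the plugin callback: add target, or overwrite it if present.
function openInNewWindow(children, idx) {
  const aIndex = children[idx].attrIndex('target');
  if (aIndex < 0) {
    children[idx].attrPush(['target', '_blank']);
  } else {
    children[idx].attrs[aIndex][1] = '_blank';
  }
}

const linkTokens = [
  new ToyToken('paragraph_open'),
  new ToyToken('inline', [], [
    new ToyToken('link_open', [['href', 'https://example.com']]),
    new ToyToken('text'),
    new ToyToken('link_close'),
    new ToyToken('link_open', [['href', '/docs'], ['target', '_self']]),
    new ToyToken('text'),
    new ToyToken('link_close'),
  ]),
  new ToyToken('paragraph_close'),
];

forEachInline(linkTokens, 'link_open', openInNewWindow);
const links = linkTokens[1].children.filter((t) => t.type === 'link_open');
console.log(JSON.stringify(links.map((t) => t.attrs)));
```

The first link gains a new target pair and the second has its existing value replaced, which are the two branches of the if statement in the plugin code.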

Syntax highlighting

The official documentation uses highlight.js as its example. Highlighting involves complex techniques (mainly compiling syntax trees for various languages), so it will not be explained here.

Conclusion

Markdown-it is a classic JavaScript Markdown parsing library. Its ideas and design are worth studying carefully and reward repeated reading.