Interviewer: How does vue-router parse the parameters in URL routes?

Xiao Ming:……

vue-router parses path parameters into regular expressions using the path-to-regexp library. Below, based on version 6.2.0, we introduce the basic usage and implementation principles of path-to-regexp. What makes path-to-regexp interesting is that, small as it is, it contains a simple, non-Turing-complete lexical analyzer. Pretty slick, I love it……

Enough talk, let's get going!!

First, the basic usage of path-to-regexp: it converts a string path (such as /user/:name) into a regular expression, as in the following example:

const { pathToRegexp, parse, compile } = require('path-to-regexp');

const url = '/user/:id';

const keys = [];
const regexp = pathToRegexp(url, keys);

// /^\/user(?:\/([^\/#\?]+?))[\/#\?]?$/i
console.log(regexp);

/**
 * [
 *   {
 *     name: 'id',
 *     prefix: '/',
 *     suffix: '',
 *     pattern: '[^\\/#\\?]+?',
 *     modifier: ''
 *   }
 * ]
 */
console.log(keys);

// ['/user/10086', '10086', index: 0, input: '/user/10086', groups: undefined]
console.log(regexp.exec('/user/10086'));

// null
console.log(regexp.exec('/notuser/10086'));

const tokens = parse(url);

/**
 * [
 *   '/user',
 *   {
 *     name: 'id',
 *     prefix: '/',
 *     suffix: '',
 *     pattern: '[^\\/#\\?]+?',
 *     modifier: ''
 *   }
 * ]
 */
console.log(tokens);

const toPath = compile(url, {
  encode: encodeURIComponent
});

const path1 = toPath({
  id: 123
});

// /user/123
console.log(path1);
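Each entry in keys describes one capture group of the regexp, in the same order. Pairing keys with the exec result therefore turns a match into a params object.

The sketch below is a minimal illustration of that idea. It assumes the regexp and keys printed above. The Key interface and extractParams helper are hypothetical names, not exports of path-to-regexp. The library's own match function works along these lines but also handles decoding and repeated parameters.

```typescript
// Hypothetical minimal key shape: only the name matters for this sketch.
interface Key {
  name: string | number;
}

// The regexp and keys that pathToRegexp("/user/:id") produced above.
const userRegexp = /^\/user(?:\/([^\/#\?]+?))[\/#\?]?$/i;
const userKeys: Key[] = [{ name: "id" }];

// Pair capture group i + 1 with keys[i] to build a params object.
function extractParams(
  re: RegExp,
  keys: Key[],
  pathname: string
): Record<string, string> | null {
  const m = re.exec(pathname);
  if (!m) return null; // the path does not match this route

  const params: Record<string, string> = {};
  keys.forEach((key, i) => {
    const value = m[i + 1];
    // Optional parameters that did not match leave their group undefined.
    if (value !== undefined) params[String(key.name)] = value;
  });
  return params;
}

console.log(extractParams(userRegexp, userKeys, "/user/10086")); // { id: '10086' }
console.log(extractParams(userRegexp, userKeys, "/notuser/10086")); // null
```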

Basic structure overview

Open the source and you'll find that the whole library lives in a single file, src/index.ts.

index.ts mainly exposes the following functions as its API:

export function parse() {}

export function compile<P extends object = object> () {}

export function tokensToFunction<P extends object = object> () {}

export function match<P extends object = object> () {}

export function regexpToFunction<P extends object = object> () {}

export function tokensToRegexp() {}

export function pathToRegexp() {}

Let’s take a look at the implementation of the pathToRegexp function.

pathToRegexp implementation

pathToRegexp is the main function we use to convert a path string into a RegExp object:

/**
 * Normalize the given path string, returning a regular expression.
 *
 * An empty array can be passed in for the keys, which will hold the
 * placeholder key descriptions. For example, using `/user/:id`, `keys` will
 * contain `[{ name: 'id', delimiter: '/', optional: false, repeat: false }]`.
 */
export function pathToRegexp(path: Path, keys?: Key[], options?: TokensToRegexpOptions & ParseOptions) {
  // A RegExp is handled by regexpToRegexp, an array by arrayToRegexp,
  // and anything else (a string) by stringToRegexp
  if (path instanceof RegExp) return regexpToRegexp(path, keys);
  if (Array.isArray(path)) return arrayToRegexp(path, keys, options);
  return stringToRegexp(path, keys, options);
}

Here is stringToRegexp, which handles the path-string case:

/**
 * Create a path regexp from string input.
 */
function stringToRegexp(path: string, keys?: Key[], options?: TokensToRegexpOptions & ParseOptions) {
  // First use parse to turn the path string into tokens
  // Then call tokensToRegexp to turn the tokens into a RegExp object
  return tokensToRegexp(parse(path, options), keys, options);
}
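The article doesn't show tokensToRegexp itself, but a much smaller stand-in can reproduce its output for /user/:id.

The sketch below assumes the default options: non-strict, end: true, delimiter "/#?", case-insensitive. It supports only the "" and "?" modifiers. tokensToRegexpSketch and TokenKey are hypothetical names. The real function also handles + and *, strict, start, end and endsWith. escapeString uses the same escaping regex as the library's helper of that name, which parse also relies on to build defaultPattern.

```typescript
interface TokenKey {
  name: string | number;
  prefix: string;
  suffix: string;
  pattern: string;
  modifier: string;
}
type Token = string | TokenKey;

// Backslash-escape every character that is special in a regexp.
function escapeString(str: string): string {
  return str.replace(/([.+*?=^!:${}()[\]|/\\])/g, "\\$1");
}

// Simplified stand-in for tokensToRegexp: default options, "" and "?" modifiers only.
function tokensToRegexpSketch(tokens: Token[], keys: TokenKey[] = []): RegExp {
  const delimiter = `[${escapeString("/#?")}]`;
  let route = "^";

  for (const token of tokens) {
    if (typeof token === "string") {
      route += escapeString(token); // static text is matched literally
      continue;
    }

    keys.push(token); // one key per capture group, in order
    const prefix = escapeString(token.prefix);
    const suffix = escapeString(token.suffix);

    if (prefix || suffix) {
      // The non-capturing group lets the modifier apply to prefix + value + suffix.
      route += `(?:${prefix}(${token.pattern})${suffix})${token.modifier}`;
    } else {
      route += `(${token.pattern})${token.modifier}`;
    }
  }

  // Non-strict mode: allow one optional trailing delimiter, then end of input.
  route += `${delimiter}?$`;
  return new RegExp(route, "i");
}

const keys: TokenKey[] = [];
const re = tokensToRegexpSketch(
  ["/user", { name: "id", prefix: "/", suffix: "", pattern: "[^\\/#\\?]+?", modifier: "" }],
  keys
);
console.log(re); // /^\/user(?:\/([^\/#\?]+?))[\/#\?]?$/i
```

Wrapping prefix, group and suffix in (?: ) is what lets a modifier such as ? make the whole /segment optional, not just the captured value.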

The parse implementation

parse calls lexer to split the string into tokens, then parses those tokens and returns the result:

/**
 * Parse a string for the raw tokens.
 */
export function parse(str: string, options: ParseOptions = {}): Token[] {
  // Split the string into lexical tokens (lexical analysis)
  const tokens = lexer(str);
  const { prefixes = "./" } = options;
  const defaultPattern = `[^${escapeString(options.delimiter || "/#?")}]+?`;
  const result: Token[] = [];
  let key = 0;
  let i = 0;
  let path = "";

  const tryConsume = (type: LexToken["type"]): string | undefined => {
    if (i < tokens.length && tokens[i].type === type) return tokens[i++].value;
  };

  const mustConsume = (type: LexToken["type"]): string => {
    const value = tryConsume(type);
    if (value !== undefined) return value;
    const { type: nextType, index } = tokens[i];
    throw new TypeError(`Unexpected ${nextType} at ${index}, expected ${type}`);
  };

  const consumeText = (): string => {
    let result = "";
    let value: string | undefined;
    // Combine consecutive CHAR or ESCAPED_CHAR tokens into one string
    // tslint:disable-next-line
    while ((value = tryConsume("CHAR") || tryConsume("ESCAPED_CHAR"))) {
      result += value;
    }
    return result;
  };

  // Consume the lexical tokens one step at a time
  while (i < tokens.length) {
    const char = tryConsume("CHAR");
    const name = tryConsume("NAME");
    const pattern = tryConsume("PATTERN");

    // Handle tokens of type NAME or PATTERN
    if (name || pattern) {
      let prefix = char || "";

      if (prefixes.indexOf(prefix) === -1) {
        path += prefix;
        prefix = "";
      }

      if (path) {
        result.push(path);
        path = "";
      }

      // Add to the parse result
      result.push({
        name: name || key++,
        prefix,
        suffix: "",
        pattern: pattern || defaultPattern,
        modifier: tryConsume("MODIFIER") || ""
      });
      continue;
    }

    // Handle CHAR or ESCAPED_CHAR tokens
    const value = char || tryConsume("ESCAPED_CHAR");
    // Keep accumulating until a token that is neither CHAR nor ESCAPED_CHAR is reached
    if (value) {
      path += value;
      continue;
    }
    // Flush the accumulated text into the parse result and reset it
    if (path) {
      result.push(path);
      path = "";
    }

    // Handle OPEN ... CLOSE groups,
    // e.g. the {-:attr2} part of pathToRegexp("/:attr1?{-:attr2}?")
    const open = tryConsume("OPEN");
    if (open) {
      const prefix = consumeText();
      const name = tryConsume("NAME") || "";
      const pattern = tryConsume("PATTERN") || "";
      const suffix = consumeText();

      mustConsume("CLOSE");

      result.push({
        name: name || (pattern ? key++ : ""),
        pattern: name && !pattern ? defaultPattern : pattern,
        prefix,
        suffix,
        modifier: tryConsume("MODIFIER") || ""
      });
      continue;
    }

    mustConsume("END");
  }

  return result;
}
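The consumption pattern in parse boils down to a cursor over the token array. tryConsume advances only when the next token has the expected type, and mustConsume turns a mismatch into an error.

Below is a minimal sketch of that loop. It assumes only CHAR, NAME, MODIFIER and END tokens, so there are no curly-brace groups, custom patterns or escapes. parseSketch, LexToken and ParamToken are hypothetical names, and the tokens for "/user/:id?" are written out by hand.

```typescript
type LexType = "CHAR" | "NAME" | "MODIFIER" | "END";

interface LexToken {
  type: LexType;
  index: number;
  value: string;
}

interface ParamToken {
  name: string;
  prefix: string;
  modifier: string;
}

// Simplified parse: only CHAR, NAME, MODIFIER and END tokens are understood.
function parseSketch(tokens: LexToken[], prefixes = "./"): Array<string | ParamToken> {
  const result: Array<string | ParamToken> = [];
  let i = 0;
  let path = "";

  // Advance only if the next token has the expected type.
  const tryConsume = (type: LexType): string | undefined => {
    if (i < tokens.length && tokens[i].type === type) return tokens[i++].value;
    return undefined;
  };

  // Like tryConsume, but a token of any other type is a syntax error.
  const mustConsume = (type: LexType): string => {
    const value = tryConsume(type);
    if (value !== undefined) return value;
    const { type: nextType, index } = tokens[i];
    throw new TypeError(`Unexpected ${nextType} at ${index}, expected ${type}`);
  };

  while (i < tokens.length) {
    const char = tryConsume("CHAR");
    const name = tryConsume("NAME");

    if (name) {
      // A "." or "/" right before the name becomes its prefix; anything else stays text.
      let prefix = char || "";
      if (prefixes.indexOf(prefix) === -1) {
        path += prefix;
        prefix = "";
      }
      if (path) {
        result.push(path); // flush the static text collected so far
        path = "";
      }
      result.push({ name, prefix, modifier: tryConsume("MODIFIER") || "" });
      continue;
    }

    if (char) {
      path += char; // accumulate consecutive characters into one string
      continue;
    }

    if (path) {
      result.push(path);
      path = "";
    }
    mustConsume("END");
  }

  return result;
}

// Tokens for "/user/:id?", written out by hand.
const lexed: LexToken[] = [
  ..."/user/".split("").map((value, index) => ({ type: "CHAR" as const, index, value })),
  { type: "NAME", index: 6, value: "id" },
  { type: "MODIFIER", index: 9, value: "?" },
  { type: "END", index: 10, value: "" },
];

console.log(parseSketch(lexed));
```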

As you can see, parse first gets all the tokens from lexer and then consumes them. Consuming means walking through the tokens in order. What gets consumed at each step is dictated by the grammar the library exposes: static text, :name parameters, (pattern) groups, curly-brace groups and modifiers.

lexer: a simple lexical analyzer

lexer's job is lexical analysis: splitting the input string into tokens according to the following rules:

/**
 * Tokenize input string.
 */
function lexer(str: string): LexToken[] {
  const tokens: LexToken[] = [];
  let i = 0;

  // Iterate over each character in turn
  while (i < str.length) {
    // Get the current character
    const char = str[i];

    // An asterisk (*), plus sign (+) or question mark (?) becomes a MODIFIER token
    if (char === "*" || char === "+" || char === "?") {
      tokens.push({ type: "MODIFIER", index: i, value: str[i++] });
      continue;
    }

    // A backslash escapes the next character, which becomes an ESCAPED_CHAR token
    if (char === "\\") {
      tokens.push({ type: "ESCAPED_CHAR", index: i++, value: str[i++] });
      continue;
    }

    // An opening curly brace becomes an OPEN token
    if (char === "{") {
      tokens.push({ type: "OPEN", index: i, value: str[i++] });
      continue;
    }

    // A closing curly brace becomes a CLOSE token
    if (char === "}") {
      tokens.push({ type: "CLOSE", index: i, value: str[i++] });
      continue;
    }

    // A colon starts a parameter name: read the characters that follow it
    if (char === ":") {
      let name = "";
      let j = i + 1;

      // Accept digits, letters and underscores by comparing character codes,
      // which is equivalent to testing each character against /^[0-9A-Za-z_]$/
      while (j < str.length) {
        const code = str.charCodeAt(j);

        if (
          // `0-9`
          (code >= 48 && code <= 57) ||
          // `A-Z`
          (code >= 65 && code <= 90) ||
          // `a-z`
          (code >= 97 && code <= 122) ||
          // `_`
          code === 95
        ) {
          name += str[j++];
          continue;
        }

        break;
      }

      if (!name) throw new TypeError(`Missing parameter name at ${i}`);

      // The name read after the colon becomes a NAME token
      tokens.push({ type: "NAME", index: i, value: name });
      i = j;
      continue;
    }

    // An opening parenthesis starts a custom (pattern)
    if (char === "(") {
      // count tracks unmatched parentheses; a simple counter stands in for a stack
      let count = 1;
      let pattern = "";
      let j = i + 1;

      if (str[j] === "?") {
        throw new TypeError(`Pattern cannot start with "?" at ${j}`);
      }

      while (j < str.length) {
        // A backslash is copied together with the character it escapes, e.g. `\)`
        if (str[j] === "\\") {
          pattern += str[j++] + str[j++];
          continue;
        }

        if (str[j] === ")") {
          // One fewer parenthesis left to close
          count--;
          // Stop once every opening parenthesis has been closed
          if (count === 0) {
            j++;
            break;
          }
        } else if (str[j] === "(") {
          count++;
          // Nested groups must be non-capturing, i.e. start with "(?", as in (user(?:xxx))
          if (str[j + 1] !== "?") {
            throw new TypeError(`Capturing groups are not allowed at ${j}`);
          }
        }

        // Append the current character to the pattern
        pattern += str[j++];
      }

      if (count) throw new TypeError(`Unbalanced pattern at ${i}`);
      if (!pattern) throw new TypeError(`Missing pattern at ${i}`);

      // The text inside (pattern) becomes a PATTERN token
      tokens.push({ type: "PATTERN", index: i, value: pattern });
      i = j;
      continue;
    }

    // Any other character becomes a CHAR token
    tokens.push({ type: "CHAR", index: i, value: str[i++] });
  }

  // Finally, append an END token
  tokens.push({ type: "END", index: i, value: "" });

  return tokens;
}
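Here is a cut-down version of the same loop, so the token stream lexer produces can be inspected directly.

It is a sketch that drops the (pattern) branch and most error checks, so a "(" is simply emitted as CHAR. lexSketch is a hypothetical name. The name-character test uses the regex /^[0-9A-Za-z_]$/, which accepts the same characters as the library's character-code comparison.

```typescript
type LexTokenType = "MODIFIER" | "ESCAPED_CHAR" | "OPEN" | "CLOSE" | "NAME" | "CHAR" | "END";

interface LexToken {
  type: LexTokenType;
  index: number;
  value: string;
}

// Same characters as the library's char-code checks: 0-9, A-Z, a-z and _.
const NAME_CHAR = /^[0-9A-Za-z_]$/;

// Simplified lexer: no "(pattern)" support, so "(" falls through to CHAR.
function lexSketch(str: string): LexToken[] {
  const tokens: LexToken[] = [];
  let i = 0;

  while (i < str.length) {
    const char = str[i];

    if (char === "*" || char === "+" || char === "?") {
      tokens.push({ type: "MODIFIER", index: i, value: str[i++] });
    } else if (char === "\\") {
      // index records the backslash; value is the character after it.
      tokens.push({ type: "ESCAPED_CHAR", index: i++, value: str[i++] });
    } else if (char === "{") {
      tokens.push({ type: "OPEN", index: i, value: str[i++] });
    } else if (char === "}") {
      tokens.push({ type: "CLOSE", index: i, value: str[i++] });
    } else if (char === ":") {
      // Read the parameter name that follows the colon.
      let j = i + 1;
      let name = "";
      while (j < str.length && NAME_CHAR.test(str[j])) name += str[j++];
      if (!name) throw new TypeError(`Missing parameter name at ${i}`);
      tokens.push({ type: "NAME", index: i, value: name });
      i = j;
    } else {
      tokens.push({ type: "CHAR", index: i, value: str[i++] });
    }
  }

  tokens.push({ type: "END", index: i, value: "" });
  return tokens;
}

console.log(lexSketch("/user/:id?").map(t => `${t.type}:${t.value}`).join(" "));
// CHAR:/ CHAR:u CHAR:s CHAR:e CHAR:r CHAR:/ NAME:id MODIFIER:? END:
```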

lexer is the core of this library. Its main logic is:

  • Walk through the characters of the string one by one
  • Classify each character into a token type, such as MODIFIER or CHAR
  • Match characters according to the rule for each type
  • Store the data matched by each rule in the tokens array
  • Return the result

A few things worth noting:

  • There is no explicit finite state machine here. lexer simply walks the string once and consumes characters as it goes.
  • The name after a colon is matched by comparing character codes (Unicode values). Replacing this with an equivalent regex such as /^[0-9A-Za-z_]$/ also works; in my own tests its performance was no worse than the character-code comparison.
  • To check that parentheses are balanced, lexer simply uses a count counter. A stack-based approach is another way to implement the same check.
  • Mind the difference between prefix and postfix increment. On their own, ++i and i++ behave the same. Inside an expression, however, ++i increments first and then yields the new value, while i++ yields the current value and then increments. lexer relies on this in { type: "ESCAPED_CHAR", index: i++, value: str[i++] }: index records the position of the backslash, and value is the character after it.
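Two of these notes can be checked directly.

The sketch below shows a stack-based balance check and the postfix-increment behavior the ESCAPED_CHAR branch relies on. isBalanced is a hypothetical helper; it assumes (), {} and [] pairs and backslash escapes. The stack is what lets it reject crossed pairs like "({)}". A counter that only tracks ( and ) cannot tell that such a string is mis-nested.

```typescript
// Stack-based check that (), {} and [] are properly nested; "\" escapes the next character.
function isBalanced(str: string): boolean {
  const openerFor: Record<string, string> = { ")": "(", "}": "{", "]": "[" };
  const stack: string[] = [];

  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    if (ch === "\\") {
      i++; // skip the escaped character
      continue;
    }
    if (ch === "(" || ch === "{" || ch === "[") {
      stack.push(ch);
    } else if (ch === ")" || ch === "}" || ch === "]") {
      // A closer must match the most recent unmatched opener.
      if (stack.pop() !== openerFor[ch]) return false;
    }
  }

  // Anything left on the stack was never closed.
  return stack.length === 0;
}

console.log(isBalanced("(a(?:b))")); // true
console.log(isBalanced("({)}")); // false

// Postfix increment: pos++ yields the current value, then increments.
// Object literal properties are evaluated left to right, as in lexer's ESCAPED_CHAR branch.
const src = "\\:";
let pos = 0;
const token = { index: pos++, value: src[pos++] };
console.log(token, pos); // { index: 0, value: ':' } 2
```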

arrayToRegexp implementation

When the path is an array, pathToRegexp delegates to arrayToRegexp, which converts the whole array into one regular expression:

/**
 * Convert an array of paths into a single RegExp.
 */
function arrayToRegexp(
  paths: Array<string | RegExp>,
  keys?: Key[],
  options?: TokensToRegexpOptions & ParseOptions
): RegExp {
  const parts = paths.map(path => pathToRegexp(path, keys, options).source);
  return new RegExp(`(?:${parts.join("|")})`, flags(options));
}
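The joining step is easy to try in isolation.

This sketch takes RegExp objects directly instead of calling pathToRegexp on each path, so it stays self-contained. combineRegexps is a hypothetical name. It assumes the "i" flag, which is what flags(options) returns when sensitive is not set.

```typescript
// Combine several route regexps into one by joining their sources with "|".
function combineRegexps(regexps: RegExp[], flags = "i"): RegExp {
  const parts = regexps.map(re => re.source);
  // (?: ... ) groups the alternatives without adding a capture group.
  return new RegExp(`(?:${parts.join("|")})`, flags);
}

const combined = combineRegexps([/^\/user\/(\d+)$/, /^\/admin$/]);
console.log(combined.source); // (?:^\/user\/(\d+)$|^\/admin$)
console.log(combined.exec("/user/42")?.[1]); // 42
console.log(combined.test("/admin")); // true
```

Because | has the lowest precedence, each alternative keeps its own ^ and $ anchors inside the (?: ) wrapper.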

arrayToRegexp works as follows:

  • Iterate over every path and call pathToRegexp on it to get the source text of its regular expression
  • Join all the source texts with |
  • Wrap the result in a non-capturing group and call new RegExp to create a new regular expression object

Some things to note:

  • In a regular expression, parentheses such as (a) create a capturing group, which stores the matched text for later use
  • (?:) creates a non-capturing group, which only groups and matches without storing the result
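A quick way to see the difference: the exec result contains an entry for each capturing group and nothing for non-capturing ones.

That is why arrayToRegexp wraps the joined sources in (?: ). The wrapper adds no group of its own, so the capture groups still line up, in order, with the entries pushed into keys.

```typescript
// Capturing groups show up in the exec result; non-capturing groups do not.
const capturing = /(\d+)-(\d+)/.exec("10-20");
const nonCapturing = /(?:\d+)-(\d+)/.exec("10-20");

console.log(capturing?.slice(1)); // [ '10', '20' ]
console.log(nonCapturing?.slice(1)); // [ '20' ]
```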

How compile works

compile turns a path string into a function that fills data into the path, for example:

const toPath = compile("/user/:id", { encode: encodeURIComponent });

toPath({ id: 123 }); //=> "/user/123"

Its implementation is as follows:

/**
 * Compile a string to a template function for the path.
 */
export function compile<P extends object = object>(str: string, options?: ParseOptions & TokensToFunctionOptions) {
  // First call parse to turn the path string into tokens
  // Then call tokensToFunction to build the function that fills in the data
  return tokensToFunction<P>(parse(str, options), options);
}

tokensToFunction implementation:

/**
 * Expose a method for transforming tokens into the path function.
 */
export function tokensToFunction<P extends object = object>(tokens: Token[], options: TokensToFunctionOptions = {}): PathFunction<P> {
  const reFlags = flags(options);
  const { encode = (x: string) => x, validate = true } = options;

  // Compile all the tokens into regexps.
  // Each parameter token gets a regexp built from its pattern,
  // used to validate the user's data when validate is true
  const matches = tokens.map(token => {
    if (typeof token === "object") {
      // Anchored, non-capturing regexp for this parameter's value
      return new RegExp(`^(?:${token.pattern})$`, reFlags);
    }
  });

  // Return a function that, when called, fills the user's data into the path string
  return (data: Record<string, any> | null | undefined) => {
    let path = "";

    // Walk through the tokens in order, filling in the value for each parameter's key
    for (let i = 0; i < tokens.length; i++) {
      const token = tokens[i];

      if (typeof token === "string") {
        path += token;
        continue;
      }

      const value = data ? data[token.name] : undefined;
      const optional = token.modifier === "?" || token.modifier === "*";
      const repeat = token.modifier === "*" || token.modifier === "+";

      // An array value is expanded into one segment per element
      if (Array.isArray(value)) {
        if (!repeat) {
          throw new TypeError(
            `Expected "${token.name}" to not repeat, but got an array`
          );
        }

        if (value.length === 0) {
          if (optional) continue;

          throw new TypeError(`Expected "${token.name}" to not be empty`);
        }

        for (let j = 0; j < value.length; j++) {
          const segment = encode(value[j], token);

          // When validation is enabled, check each segment against the token's pattern
          if (validate && !(matches[i] as RegExp).test(segment)) {
            throw new TypeError(
              `Expected all "${token.name}" to match "${token.pattern}", but got "${segment}"`
            );
          }

          // Append the prefix, the encoded segment and the suffix to the path
          path += token.prefix + segment + token.suffix;
        }

        continue;
      }

      if (typeof value === "string" || typeof value === "number") {
        const segment = encode(String(value), token);

        if (validate && !(matches[i] as RegExp).test(segment)) {
          throw new TypeError(
            `Expected "${token.name}" to match "${token.pattern}", but got "${segment}"`
          );
        }

        // Append the prefix, the encoded value and the suffix to the path
        path += token.prefix + segment + token.suffix;
        continue;
      }

      if (optional) continue;

      const typeOfMessage = repeat ? "an array" : "a string";
      throw new TypeError(`Expected "${token.name}" to be ${typeOfMessage}`);
    }

    return path;
  };
}

Stripping away all the validation logic, the core logic is:

  • Walk through the tokens in order
  • Append static string tokens as they are. For a parameter token, if the user's data has a value under the same key, encode it and append it along with the token's prefix and suffix
  • Return the concatenated string
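Those three steps fit in a few lines.

This is a simplified stand-in, not the library's code. toPathSketch, ParamToken and Token are hypothetical names. The sketch always validates and leaves out array (repeat) values. The token shape follows the parse output shown at the start of the article.

```typescript
// Hypothetical token shape matching the parse output shown earlier.
interface ParamToken {
  name: string;
  prefix: string;
  suffix: string;
  pattern: string;
  modifier: string;
}
type Token = string | ParamToken;

// Simplified tokensToFunction: no array (repeat) values, always validates.
function toPathSketch(
  tokens: Token[],
  encode: (value: string) => string = x => x
): (data?: Record<string, string | number>) => string {
  // Pre-compile one anchored validator per parameter token.
  const matches = tokens.map(t =>
    typeof t === "string" ? null : new RegExp(`^(?:${t.pattern})$`, "i")
  );

  return (data = {}) => {
    let path = "";
    tokens.forEach((token, i) => {
      if (typeof token === "string") {
        path += token; // static text is copied as-is
        return;
      }
      const value = data[token.name];
      if (value === undefined) {
        // "?" and "*" mark the parameter as optional.
        if (token.modifier === "?" || token.modifier === "*") return;
        throw new TypeError(`Expected "${token.name}" to be a string`);
      }
      const segment = encode(String(value));
      if (!matches[i]!.test(segment)) {
        throw new TypeError(`Expected "${token.name}" to match "${token.pattern}", but got "${segment}"`);
      }
      path += token.prefix + segment + token.suffix;
    });
    return path;
  };
}

const toUserPath = toPathSketch(
  ["/user", { name: "id", prefix: "/", suffix: "", pattern: "[^\\/#\\?]+?", modifier: "" }],
  encodeURIComponent
);
console.log(toUserPath({ id: 123 })); // /user/123
```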

Conclusion

I'm Leng Hammer, and my story with the front end continues……