preface

When doing a Node monitoring system to do a mail alarm requirements, at this time you need to define a set of rules to write triggering alarm expression, this article will introduce how to write a simple expression parser. Attached is a screenshot of the interface showing an example of an expression.

Reference Book: Programming Language Implementation Patterns

Identify some ground rules

Before you start writing, you need to determine what capabilities your expression needs to have. The expression in this paper is the condition to judge whether to trigger the alarm, and the rule of the expression should not be too complicated. At the same time, since the user who writes the expression is front-end developer or Node developer, so the syntax is close to the JS expression syntax is most appropriate:

  • Supports variables, which start with @ and consist of letters and digits, for example, @load15
  • Supports constants: Boolean values, numbers, and strings
  • Support addition (+), subtraction (-), multiplication (*), division (/), mod (%) operations
  • Support congruence (===), unequal (! ==), greater than (>), less than (<), greater than or equal to (>=), less than or equal to (<=) and other comparison operations
  • Support and (&), or (| |), not (!) Equal logical operation
  • The extension string contains the include operation
  • Support for parentheses ()

Building an Abstract Syntax Tree (AST)

The purpose of constructing an abstract syntax tree is to design a data structure that illustrates the meaning of an expression. For example, @var > 5, as a string in the code, does not specify how it is evaluated.

If parsed into the following abstract syntax tree:

The root node is the operator and the child node is the operator, so we abstract an operation with a binary tree.

Of course, the above example is a relatively simple operation expression. Once the expression becomes complicated, the priority of the operation cannot be avoided. How does the binary tree represent the priority of the operation?

Let’s use two expressions to illustrate:

As can be seen from the figure, the operation order of the second expression is changed after the parentheses are added, and the corresponding abstract syntax tree is also changed. It is easy to see that: the higher the priority of the operation, the lower the position in the syntax tree.

An example of an AST data structure is the following expression

@load > 1 + 5
Copy the code

The parsed token is:

{
  "type": "LOGICAL_EXP"."operator": ">"."left": {
    "type": "VARIABLE"."value": "load"."raw": "@load"
  },
  "right": {
    "type": "BINARY_EXP"."operator": "+"."left": {
      "type": "NUMBER"."value": 1,
      "raw": "1"
    },
    "right": {
      "type": "NUMBER"."value": 5,
      "raw": "5"}}}Copy the code

Now that we’ve determined the structure of the abstract syntax tree, we can start thinking about how to parse expression strings.

Use the LL(1) recursively descending lexical parser

Tokenizing is the grammar of characters, similar to the way we use letters to form a word in a certain grammar.

The LL(1) recursive descending lexical parser is a top-down parser that looks forward to a lexical unit. Both L’s in LL are “left-to-right”. The first L means that the parser parses the input in “right-to-right” order, and the second L means that descending parsing also traverses child nodes from left to right.

This is the simplest parser, and it is easier to use this pattern when it meets your needs.

List of lexical units

Since we are looking ahead to a lexical unit, we should first list the lexical units that this expression can have. The expression in this article might have the following lexical units:

  • Variable lexical unit, flag: begins with “@”
  • Numeric lexical unit, symbol: beginning with 0-9 or “.”
  • String lexical unit, flag: begins with single or double quotation marks
  • Parenthesis lexical unit, symbol: begins with an open parenthesis
  • Unary operator lexical unit, marked with “!” “, “+”, “-“

Now we’re ready to start writing code.

Look for recursion points

An expression can be decomposed into multiple expressions without parentheses, such as the following expression:

5 * (3 + 2 * (5 + 6))
Copy the code

We can break it down into the following expressions:

5 * expression_a
3 + 2 * expression_b // expression_a 
5 + 6 // expression_b
Copy the code

The entire string represents an expression, and that’s the recursion point we’re looking for. Define a function eatExpression to parse an expression, and recurse eatExpression when parentheses are encountered until no lexical unit matches.

So let’s start coding.

class ExpressionParser

class ExpressionParser {
  constructor(expr) {
    if (typeofexpr ! = ='string') {
      throw new Error(`[expression-parser] constructor need a string parameter, but get [The ${typeof expr}] `);
    }
    this.expr = expr;
    this.parse();
  }

  parse() {
    this.index = 0;
    this.tokens = this.eatExpression();
    if (this.index < this.expr.length) {
      this.throwError('illegal character'The ${this.charAt()}"`); }}}const expression
const parser = new ExpressionParser(expression)
Copy the code

This is the parser class. The initial job is to save expression, and then use eatExpression to parse it. When we start parsing, we look at it bit by bit.

this.index // Marks the subscript of the character currently traversed

// Gets the current traversal character
charAt(index = this.index) {
  return this.expr.charAt(index);
}

// Get the Unicode encoding of the current character
charCodeAt(index = this.index) {
  return this.expr.charCodeAt(index);
}
Copy the code

ExpressionParser.prototype.eatExpression

This function is the entry point to the entire parser, and the idea of this function is important. An expression can be divided into two types:

  • Expressions with binary operators
  • Expressions without binary operators

Expressions without binary operators are easy to iterate over the set of lexical units, whereas expressions with binary operators and more than one are more complicated because the order of parsing is left to right, and the order of the operators is uncertain.

So how to solve it? Here is an example of a process flow to illustrate the core priority calculation idea:

The main idea is to use a stack to store the parsed token on the stack. When the parsed operator (unary operator parsed into token, here refers to binary operator) is parsed to the operator closest to the top of the stack. If the newly parsed operator is found to have a higher priority, it is directly pushed into the stack. Therefore, in the stack, the operators are guaranteed to be of low to high priority. Once the newly parsed operators are of lower priority, the tokens in the stack can be combined into a node until all the operators in the stack are of low to high priority. Finally, synthesize node from right to left to get a complete expression binary tree.

The core idea is to ensure that the stack priority from low to high, the following code to consolidate understanding:

eatExpression() {
    let left = this.eatToken();
    let operator = this.eatBinaryOperator();
    // Indicates that the operation tree has only the left side
    if(! operator) {return left;
    }
    let operatorInfo = {
        precedence: this.getOperatorPrecedence(operator), // Get operator priority
        value: operator,
    };
    let right = this.eatToken();
    if(! right) {this.throwError(`"${operator}"Operator should be followed by expression ');
    }
    const stack = [left, operatorInfo, right];
    // Get the next operator
    while (operator = this.eatBinaryOperator()) {
        const precedence = this.getOperatorPrecedence(operator);
        // If there is an illegal yuan FA
        if (precedence === 0) {
            break;
        }
        operatorInfo = {
            precedence,
            value: operator,
        };
        while (stack.length > 2 && precedence < stack[stack.length - 2].precedence) {
            right = stack.pop();
            operator = stack.pop().value;
            left = stack.pop();
            const node = this.createNode(operator, left, right);
            stack.push(node);
        }
        const node = this.eatToken();
        if(! node) {this.throwError(`"${operator}"Operator should be followed by expression ');
        }
        stack.push(operatorInfo, node);
    }
    let i = stack.length - 1;
    let node = stack[i];
    while (i > 1) {
        node = this.createNode(stack[i - 1].value, stack[i - 2], node);
        i -= 2;
    }
    return node;
}
Copy the code

createNode:

const LOGICAL_OPERATORS = ['| |'.'&'.'= = ='.'! = = '.'>'.'<'.'> ='.'< ='.'include']; . createNode(operator, left, right) {consttype = LOGICAL_OPERATORS.indexOf(operator) ! = =- 1 ? 'LOGICAL_EXP' : 'BINARY_EXP';
    return {
        type,
        operator,
        left,
        right,
    };
}
Copy the code

getOperatorPrecedence:

const BINARY_OPERATORS = {
    '| |': 1.'&': 2.'= = =': 6.'! = = ': 6.'<': 7.'>': 7.'< =': 7.'> =': 7.'+': 9.The '-': 9.The '*': 10.'/': 10.The '%': 10.include: 11}; . getOperatorPrecedence(operator) {return BINARY_OPERATORS[operator] || 0;
}
Copy the code

I believe that now you have a clear idea of the whole idea. Next, there is another important thing to talk about, which is the resolution of token

Token parsing

The process of token parsing can be imagined as having a subscript moving from left to right character by character, encountering a recognizable token opening flag, parsing the token, and then continuing to move the subscript until the entire string ends or no recognizable flag is encountered.

See the complete code to see how each type of token is matched.

Calculated value

After the token is resolved, we need to calculate the value of the expression. Since the variable exists, we need to provide a context for the variable to obtain the value, such as:

const expr = new ExpressionParser('@load > 5');
console.log(expr.valueOf({ load: 8 })); // true
Copy the code

Since we’ve already generated the expression into a binary tree, we can simply recursively iterate over each binary tree, and since it’s recursive, the lower the tree is computed earlier, which is consistent with the way we started designing priorities.

Complete code:

const OPEN_PAREN_CODE = 40; // ( const CLOSE_PAREN_CODE = 41; // ) const VARIABLE_BEGIN_CODE = 64; // @, start of variable const PERIOD_CODE = 46; //'. 'const SINGLE_QUOTE_CODE = 39; // single quote const DOUBLE_QUOTE_CODE = 34; // double quotes const SPACE_CODES = [32, 9, 10, 13]; // space // operator const UNARY_OPERATORS = {The '-': true.'! ': true.'+': true}; // const LOGICAL_OPERATORS = ['| |'.'&'.'= = ='.'! = = '.'>'.'<'.'> ='.'< ='.'include'];
const BINARY_OPERATORS = {
  '| |': 1,
  '&': 2.'= = =': 6, '! = = ': 6,
  '<': 7, '>': 7, '< =': 7, '> =': 7,
  '+': 9, The '-': 9,
  The '*': 10, '/': 10, The '%': 10, include: 11, }; Const getMaxKeyLen = const getMaxKeyLen =function getMaxKeyLen(obj) {
  let max = 0;
  Object.keys(obj).forEach((key) => {
    if(key.length > max && obj.hasOwnProperty(key)) { max = key.length; }});return max;
};
const maxBinaryOperatorLength = getMaxKeyLen(BINARY_OPERATORS);
const maxUnaryOperatorLength = getMaxKeyLen(UNARY_OPERATORS);

class ExpressionParser {
  constructor(expr) {
    if(typeof expr ! = ='string') {
      throw new Error(`[expression-parser] constructor need a string parameter, but get [${typeof expr}] `); } this.expr = expr; }parse() {
    this.index = 0;
    try {
      this.tokens = this.eatExpression();
      if(this.index < this.expr. Length) {throw new Error(' invalid character"${this.charAt()}"`);
      }
    } catch (error) {
      this.tokens = undefined;
      if (typeof this.onErrorCallback === 'function') {
        this.onErrorCallback(error.message, this.index, this.charAt());
      } else{ throw new Error(error.message); }}return this;
  }

  eatExpression() {
    let left = this.eatToken();
    letoperator = this.eatBinaryOperator(); // Indicates that the operation tree has only the left sideif(! operator) {return left;
    }
    letOperatorInfo = {precedence: enclosing getOperatorPrecedence (operator), / / to get the operator priority value: operator,};let right = this.eatToken();
    if(! right) { throw new Error(`"${operator}"Operator should be followed by expression '); } const stack = [left, operatorInfo, right]; // Get the next operatorwhile(operator = this.eatBinaryOperator()) { const precedence = this.getOperatorPrecedence(operator); // If there is an illegal yuan FAif (precedence === 0) {
        break;
      }
      operatorInfo = {
        precedence,
        value: operator,
      };
      while (stack.length > 2 && precedence < stack[stack.length - 2].precedence) {
        right = stack.pop();
        operator = stack.pop().value;
        left = stack.pop();
        const node = this.createNode(operator, left, right);
        stack.push(node);
      }
      const node = this.eatToken();
      if(! node) { throw new Error(`"${operator}"Operator should be followed by expression '); } stack.push(operatorInfo, node); }let i = stack.length - 1;
    let node = stack[i];
    while (i > 1) {
      node = this.createNode(stack[i - 1].value, stack[i - 2], node);
      i -= 2;
    }
    return node;
  }

  eatToken() {
    this.eatSpaces();
    const ch = this.charCodeAt();
    if(ch === VARIABLE_BEGIN_CODE) {// variablereturn this.eatVariable();
    } else if(this isDigit (ch) | | ch = = = PERIOD_CODE) {/ / numberreturn this.eatNumber();
    } else if(ch = = = SINGLE_QUOTE_CODE | | ch = = = DOUBLE_QUOTE_CODE) {/ / stringreturn this.eatString();
    } else if(ch === OPEN_PAREN_CODE) {// parenthesesreturn this.eatGroup();
    } else{// Check the list operator! + -let toCheck = this.expr.substr(this.index, maxUnaryOperatorLength);
      let toCheckLength;
      while (toCheckLength = toCheck.length) {
        if (
          UNARY_OPERATORS.hasOwnProperty(toCheck) &&
          this.index + toCheckLength <= this.expr.length
        ) {
          this.index += toCheckLength;
          return {
            type: 'UNARY_EXP', operator: toCheck, argument: this.eatToken(), }; } toCheck = toCheck.substr(0, --toCheckLength); }}}eatGroup() {
    this.index++; // eat "("
    const node = this.eatExpression();
    this.eatSpaces();
    const ch = this.charCodeAt();
    if(ch ! == CLOSE_PAREN_CODE) { throw new Error('Parentheses are not closed');
    } else {
      this.index++; // eat ")"
      returnnode; }}eatVariable() {
    const ch = this.charAt();
    this.index++; // eat "@"
    const start = this.index;
    while (this.index < this.expr.length) {
      const ch = this.charCodeAt();
      if (this.isVariablePart(ch)) {
        this.index++;
      } else {
        break;
      }
    }
    const identifier = this.expr.slice(start, this.index);
    return {
      type: 'VARIABLE',
      value: identifier,
      raw: ch + identifier,
    };
  }

  eatNumber() {
    let number = ' '; // Start with a numberwhile(this.isDigit(this.charCodeAt())) { number += this.charAt(this.index++); } / /'. 'At the beginningif (this.charCodeAt() === PERIOD_CODE) {
      number += this.charAt(this.index++);
      while(this.isDigit(this.charCodeAt())) { number += this.charAt(this.index++); }} // Scientific notationlet ch = this.charAt();
    if (ch === 'e' || ch === 'E') {
      number += this.charAt(this.index++);
      ch = this.charAt();
      if (ch === '+' || ch === The '-') {
        number += this.charAt(this.index++);
      }
      while(this.isDigit(this.charCodeAt())) { number += this.charAt(this.index++); } // If e + - is not followed by a number, an error is reportedif(! This.isdigit (this.charcodeat (this.index - 1))) {throw new Error(' Error ')${number}${this.charAt()}), lack of index '); }}return {
      type: 'NUMBER',
      value: parseFloat(number),
      raw: number,
    };
  }

  eatString() {
    let str = ' ';
    const quote = this.charAt(this.index++);
    let closed = false;
    while (this.index < this.expr.length) {
      let ch = this.charAt(this.index++);
      if (ch === quote) {
        closed = true;
        break;
      } else if (ch === '\ \') {
        // Check for all of the common escape codes
        ch = this.charAt(this.index++);
        switch (ch) {
          case 'n':
            str += '\n';
            break;
          case 'r':
            str += '\r';
            break;
          case 't':
            str += '\t';
            break;
          case 'b':
            str += '\b';
            break;
          case 'f':
            str += '\f';
            break;
          case 'v':
            str += '\x0B';
            break; default: str += ch; }}else{ str += ch; }}if(! Closed) {throw new Error(' character"${str}"Missing closing parenthesis'); }return {
      type: 'STRING',
      value: str,
      raw: quote + str + quote,
    };
  }

  eatBinaryOperator() {
    this.eatSpaces();
    let toCheck = this.expr.substr(this.index, maxBinaryOperatorLength);
    let toCheckLength = toCheck.length;
    while (toCheckLength) {
      if (
        BINARY_OPERATORS.hasOwnProperty(toCheck) &&
        this.index + toCheckLength <= this.expr.length
      ) {
        this.index += toCheckLength;
        return toCheck;
      }
      toCheck = toCheck.substr(0, --toCheckLength);
    }
    return false;
  }

  getOperatorPrecedence(operator) {
    return BINARY_OPERATORS[operator] || 0;
  }

  createNode(operator, left, right) {
    const type = LOGICAL_OPERATORS.indexOf(operator) !== -1
      ? 'LOGICAL_EXP'
      : 'BINARY_EXP';
    return {
      type,
      operator,
      left,
      right,
    };
  }

  isVariablePart(ch) {
    return(ch >= 65 && ch <= 90) || // A... Z (ch >= 97 && ch <= 122) || // a... z (ch >= 48 && ch <= 57); / / 0... 9 } isDigit(ch) {returnch >= 48 && ch <= 57; / / 0... 9}eatSpaces() {
    let ch = this.charCodeAt();
    while(SPACE_CODES.indexOf(ch) ! == -1) { ch = this.charCodeAt(++this.index); } } onError(callback) { this.onErrorCallback = callback;return this;
  }

  charAt(index = this.index) {
    return this.expr.charAt(index);
  }

  charCodeAt(index = this.index) {
    return this.expr.charCodeAt(index);
  }

  valueOf(scope = {}) {
    if (this.tokens == null) {
      return undefined;
    }
    const value = this.getNodeValue(this.tokens, scope);
    return!!!!! value; } getNodeValue(node, scope = {}) { const {type, value, left, right, operator } = node;
    if (type= = ='VARIABLE') {
      return scope[value];
    } else if (type= = ='NUMBER' || type= = ='STRING') {
      return value;
    } else if (type= = ='LOGICAL_EXP') { const leftValue = this.getNodeValue(left, scope); / / if it's logical operation && and | |, you may not need to parse the value on the rightif (operator === '&' && !leftValue) {
        return false;
      }
      if (operator === '| |'&&!!!!! leftValue) {return true;
      }
      const rightValue = this.getNodeValue(right, scope);
      switch (node.operator) {
        case '&': return leftValue && rightValue;
        case '| |': return leftValue || rightValue;
        case '>': return leftValue > rightValue;
        case '> =': return leftValue >= rightValue;
        case '<': return leftValue < rightValue;
        case '< =': return leftValue <= rightValue;
        case '= = =': return leftValue === rightValue;
        case '! = = ': returnleftValue ! == rightValue;case 'include': returnleftValue.toString && rightValue.toString && leftValue.toString().indexOf(rightValue.toString()) ! = = 1; // skip defaultcase}}else if (type= = ='BINARY_EXP') {
      const leftValue = this.getNodeValue(left, scope);
      const rightValue = this.getNodeValue(right, scope);
      switch (node.operator) {
        case '+': return leftValue + rightValue;
        case The '-': return leftValue - rightValue;
        case The '*': return leftValue * rightValue;
        case '/': return leftValue - rightValue;
        case The '%': return leftValue % rightValue;
        // skip default case}}else if (type= = ='UNARY_EXP') {
      switch (node.operator) {
        case '! ': return! this.getNodeValue(node.argument, scope);case '+': return +this.getNodeValue(node.argument, scope);
        case The '-': return -this.getNodeValue(node.argument, scope);
        // skip default case
      }
    }
  }
}

const expression = new ExpressionParser('@load + 3');
expression.onError((message, index, ch) => {
  console.log(message, index, ch);
}).parse();
console.log(expression.valueOf({ load: 5 }));
Copy the code