0 x000 overview

Again soon started learning basic computer things, such as computer composition principle, network principle, compiling principle, things like that, at present just in learning compiling principle, began to interested in a piece of this stuff, but the study of the theory of a bit dull, decided to change my ways, that is the first practice, encounter problems trying to solve, promote theory with practice. Originally intended to write a Chinese JS analysis, but it seems a little difficult, first to find a simple to do it, that is to parse the four operations, there is this project, declaration: this is a very simple project, this is a very simple project, this is a very simple project. Which used lexical analysis, grammar analysis, automata are simple way to achieve, after all, more vegetables.

0 x001 effect

  • Source address: Github

  • Functions:

    • Four in any order+ - * /Positive integer operation
    • support(a)
    • Universal front-end and back-end
    • Provides direct calculation functions
    • Provides four operational expressions to invert polish AST functions
    • Provide syntax analysis functions (temporarily only support up and down character determination)
  • Effect demonstration:

0 x002 implementation

Since it is very simple, no matter the theory used and the way of implementation must be very simple. To achieve this effect, there are three problems to overcome:

  1. How to implement priority calculation, for example* / ()The priority of is greater than+-.
  2. How to split strings, such as how to recognize numbers, symbols, and error characters, is known as morpheme.
  3. How to implement syntax checking, that is, to make the rules of an expression meet the requirements, for example+Following like followingdigitalor((there will be-As an operation, not a symbol).

0x003 Problem Solved 1: How to implement priority arithmetic

1. Ignore priorities for now

If there is no priority problem, it is very easy to implement a calculation, such as the following code can implement a simple addition and subtraction or multiplication and division (up to 10, more than one digit will cause problem 2, here is a simpler, avoid problem 2) :

        let calc = (input) => {
            let calMap = {
                '+': (num1, num2) => num1 + num2,
                The '-': (num1, num2) => num1 - num2,
                The '*': (num1, num2) => num1 * num2,
                '/': (num1, num2) => num1 / num2,
            }
            input = [...input].reverse()
            while (input.length >= 2) {
                let num1 = +input.pop()
                let op = input.pop()
                let num2 = +input.pop()
                input.push(calMap[op](num1, num2))
            }
            return input[0]
        }

        expect(calc('1 + 2 + 3 + 4 + 5-1')).toEqual(14)
        expect(calc('1 * 2 * 3/3')).toEqual(2)
Copy the code

Algorithm steps:

  • To break the input into a stack, each number has only one digit since it is less than 10:
    input = [...input].reverse()
    Copy the code
  • The first digit is the number, the second is the operator, and the third digit is the number:
    let num1 = +input.pop()
    let op = input.pop()
    let num2 = +input.pop()
    Copy the code
  • The operator pushes the result back onto the stack, creating a process until there is only one number left in the stack, or three at a time, so if the stack depth is <=2, that is the final result:
    while (input.length >= 2) {
        // ......
        input.push(calMap[op](num1, num2))
    }
    Copy the code

Animation demo:

2. Prioritize

But now we need to worry about the precedence, for example, */ has the highest precedence than +-, (), so how do we solve this, in fact, there is a solution, I use suffix expression, also called inverse Polish

  • Postfix expressions: postfix expressions place operators at the end of expressions, such as1 + 1Represented as11 +.
  • Infix expression: the so-called infix expression, in fact, we usually use the writing method, here do not do in-depth.
  • Prefix expressions are postfix expressions that place operators at the beginning of the expression, such as1 + 1Represented as+ 11I’m not going to go into that

The inverse Polish form can be seen in the following articles

  • Wiki- Inverse Polish notation
  • What is an inverse Polish expression

3. Reverse Polish priorities

In the inverse Polish formula, 1+1*2 can be converted to 112*+ :

 let calc = (input) => {
    let calMap = {
        '+': (num1, num2) => num1 + num2,
        The '-': (num1, num2) => num1 - num2,
        The '*': (num1, num2) => num1 * num2,
        '/': (num1, num2) => num1 / num2,
    }
    input = [...input].reverse()
    let resultStack = []
    while (input.length) {
        let token = input.pop()
        if (/[0-9]/.test(token)) {
            resultStack.push(token)
            continue
        }
        if (/[+\-*/]/.test(token)) {
            let num1 = +resultStack.pop()
            let num2 = +resultStack.pop()
            resultStack.push(calMap[token](num1, num2))
            continue}}return resultStack[0]
}
expect(calc(123 * '+')).toEqual(7)
Copy the code

After transformation, the calculation steps are as follows:

  1. Initialize a stack
        let resultStack = []
    Copy the code
  2. We take one bit at a time from the expression
    let token = input.pop()
    Copy the code
  3. If it is a number, it is pushed onto the stack
    if (/[0-9]/.test(token)) {
        resultStack.push(token)
        continue
    }
    Copy the code
  4. If it is an operator, it takes two numbers from the stack, performs the corresponding operation, and pushes the result onto the stack
    if (/[+\-*/]/.test(token)) {
        let num1 = +resultStack.pop()
        let num2 = +resultStack.pop()
        resultStack.push(calMap[token](num1, num2))
        continue
    }
    Copy the code
  5. If the expression is not empty, go to step 2. If the expression is empty, the number on the stack is the final result and the calculation is complete
    while (input.length) {
        // ...
    }
    return resultStack[0]
    Copy the code

Animation demo:

The inverse Polish form has two advantages:

  • You don’t care about operator priority
  • Remove the parentheses, for example(1 + 2) * (3 + 4)Can be converted to12 + 34 + *, according to the inverse Polish operation method can be completed

4. Suffix to suffix

This is the last minor problem for Question 1, which is implemented as follows:

let parse = (input) => {
            input = [...input].reverse()
            let resultStack = [], opStack = []
            while (input.length) {
                let token = input.pop()
                if (/[0-9]/.test(token)) {
                    resultStack.push(token)
                    continue
                }
                if (/[+\-*/]/.test(token)) {
                    opStack.push(token)
                    continue}}return [...resultStack, ...opStack.reverse()].join(' ')
        }

        expect(parse(`1+2-3+4-5`)).toEqual('12 + 3-4 + 5 -)
Copy the code

Prepare two stacks, one for the result, one for the stack operator, and finally concatenate the two stacks. The above implementation can convert 1+2-3+4-5 to 12+3-4+5-, but if priority is involved, it can’t do anything, for example

        expect(parse(`1+2*3`)).toEqual(123 * '+')
Copy the code

The conversion result of 1+2*3 should be 123*+, but in fact it is 123+*, */ has a higher priority than +, so the following changes should be made

 let parse = (input) => {
            input = [...input].reverse()
            let resultStack = [], opStack = []
            while (input.length) {
                let token = input.pop()
                if (/[0-9]/.test(token)) {
                    resultStack.push(token)
                    continue} / /if (/[+\-*/]/.test(token)) {
//                    opStack.push(token)
//                    continue
//                }
                if (/[*/]/.test(token)) {
                    while (opStack.length) {
                        let preOp = opStack.pop()
                        if (/[+\-]/.test(preOp)) {
                            opStack.push(preOp)
                            opStack.push(token)
                            token = null
                            break
                        } else {
                            resultStack.push(preOp)
                            continue
                        }
                    }
                    token && opStack.push(token)
                    continue
                }
                if (/[+\-]/.test(token)) {
                    while (opStack.length) {
                        resultStack.push(opStack.pop())
                    }
                    opStack.push(token)
                    continue}}return [...resultStack, ...opStack.reverse()].join(' ')
        }

        expect(parse(`1+2`)).toEqual('12 +')
        expect(parse(`1+2*3`)).toEqual(123 * '+')
Copy the code
  1. When the operator is* /, take out the top element of the stack to determine whether the priority of the elements in the stack is lower than* /If so, just push the operator inopStack, and exits, otherwise pushing the element out of the stack all the way inresultStack.
if(/ + \ [-] /. The test (preOp)) {opStack. Push (preOp) / / use the stack to make judgment here, so after judgment also have to go back... opStack.push(token) token = nullbreak
}else {
    resultStack.push(preOp)
    continue
}
Copy the code
  1. Note also that the stack is empty, requiring the operator to be pushed directly onto the stack.
    token && opStack.push(token)
    continue
Copy the code
  1. When the operator is+-Is the lowest priority, so simply push all operators off the stack
if (/[+\-]/.test(token)) {
    while (opStack.length) {
        resultStack.push(opStack.pop())
    }
    opStack.push(token)
    continue
}
Copy the code

The +-*/ priority problem has been solved so far, only the () priority problem is left, which is the highest priority, so it can be modified as follows:

if (/[+\-]/.test(token)) {
    while (opStack.length) {
        let op=opStack.pop()
        if (/\(/.test(op)){
            opStack.push(op)
            break
        }
        resultStack.push(op)
    }
    opStack.push(token)
    continue
}
if (/\(/.test(token)) {
    opStack.push(token)
    continue
}
if (/\)/.test(token)) {
    let preOp = opStack.pop()
    while(preOp ! = ='('&&opStack.length) {
        resultStack.push(preOp)
        preOp = opStack.pop()
    }
    continue
}
Copy the code
  1. When the operator is+-When no longer brainless pop up if it is(I’m not going to pop it
while (opStack.length) {
        let op=opStack.pop()
        if (/\(/.test(op)){
            opStack.push(op)
            break
        }
        resultStack.push(op)
    }
    opStack.push(token)
Copy the code
  1. When the operator is(And then push it inopStack
if (/\(/.test(token)) {
    opStack.push(token)
    continue
}
Copy the code
  1. When the operator is)And keep poppingopStacktoresultStackUntil you come across(.(Don’t pushresultStack
if (/\)/.test(token)) {
    let preOp = opStack.pop()
    while(preOp ! = ='('&&opStack.length) {
        resultStack.push(preOp)
        preOp = opStack.pop()
    }
    continue
}
Copy the code

Animation examples:

calc(parse(input))
Infix => suffix => calculation

0x004 Solve Problem 2: Split string

Although the big problem of infix => suffix => calculation has been solved above, the most basic problem has not been solved, that is, the input problem. In the process of solving problem 1 above, the input is just a simple cut, but also limited to 10. And then, the next thing to solve is this input problem, how do you split the input to meet the requirements?

  • Solution 1: re, although the re can be done as follows, do a simpledemoIt worked, but it wasn’t very good for grammar checking and things like that, so it wasn’t good, so I abandoned it
    (1 + 22) * (333 + 4444) `. Match (/ ([0-9] +) | | ([+ / - * /]) (\ () | (\))/g) / / / / output (11) ["("."1"."+"."22".")"."*"."("."333"."+"."4444".")"]
    Copy the code
  • Solution 2: Character by character analysis
    while(input.length){
        let token = input.pop()
        if(/[0-9]/.test(token)) // Enter the number analysisif(/[+\-*/\(\)]/.test(token))Copy the code

Next try solution 2 to solve this problem:

1 Define the node structure

When we split, we don’t just store values, but store each node into a similar structure, which can be represented by objects:

{
    type:' ',
    value:' '
}
Copy the code

Where, type is the node type, and all possible types in the four operations can be summarized as follows:

    TYPE_NUMBER: 'TYPE_NUMBER'// Number TYPE_LEFT_BRACKET:'TYPE_LEFT_BRACKET', // (
    TYPE_RIGHT_BRACKET: 'TYPE_RIGHT_BRACKET', // )
    TYPE_OPERATION_ADD: 'TYPE_OPERATION_ADD', // +
    TYPE_OPERATION_SUB: 'TYPE_OPERATION_SUB', // -
    TYPE_OPERATION_MUL: 'TYPE_OPERATION_MUL', // *
    TYPE_OPERATION_DIV: 'TYPE_OPERATION_DIV'/ / /Copy the code

Value is the corresponding real value, such as 123, +, -, *, and /.

2 Digital processing

If it is a number, continue reading until it is not a number, place all the read results in value, and join the queue.

if (token.match(/[0-9]/)) {
    let next = tokens.pop()
    while(next ! == undefined) {if(! next.match(/[0-9]/))break
        token += next
        next = tokens.pop()
    }
    result.push({
        type: type.TYPE_NUMBER,
        value: +token
    })
    token = next
}
Copy the code

3 Symbol processing

First, define a symbol and type mapping table. If it is not in the table, it indicates that the input is abnormal and an exception is thrown. If it is, it indicates that the input is normal and you can join the queue.

const opMap = {
    '(': type.TYPE_LEFT_BRACKET,
    ') ': type.TYPE_RIGHT_BRACKET,
    '+': type.TYPE_OPERATION_ADD,
    The '-': type.TYPE_OPERATION_SUB,
    The '*': type.TYPE_OPERATION_MUL,
    '/': type.TYPE_OPERATION_DIV
}
let type = opMap[token]
if (!type) throw `error input: ${token}`
result.push({
    type,
    value: token,
})
Copy the code

4 summarizes

Parse (tokenize())) calc(parse(tokenize()))) can now parse(tokenize()).

0x005 Resolve Problem 3: Syntax check

Syntax to detect the problem to be solved is to judge the correctness of the input, whether meet the rules of the arithmetic, the thought of here with a similar machine, but simple to explode, and can only do one step judge grammar ~ ~ to define a table, the table can appear behind defines a node type, for example, + can only appear Numbers or behind (and so on.

let syntax = {
    [type.TYPE_NUMBER]: [
        type.TYPE_OPERATION_ADD,
        type.TYPE_OPERATION_SUB,
        type.TYPE_OPERATION_MUL,
        type.TYPE_OPERATION_DIV,
        type.TYPE_RIGHT_BRACKET
    ],
    [type.TYPE_OPERATION_ADD]: [
        type.TYPE_NUMBER,
        type.TYPE_LEFT_BRACKET
    ],
    //...
}
Copy the code

Then we can easily use the following syntax determination methods:

 while (tokens.length) {
    // ...
    let next = tokens.pop()
    if(! syntax[token.type].includes(next.type)) throw `syntax error:${token.value} -> ${next.value}`
    // ...
 }
Copy the code

For (), the reference count is used, if (, then +1, if), then -1. When detecting the end, it is good to determine the count:

    // ...
    if (token.type === type.TYPE_LEFT_BRACKET) {
        bracketCount++
    }
    // ...
    if (next.type === type.TYPE_RIGHT_BRACKET) {
        bracketCount--
    }
    // ...
    if (bracketCount < 0) {
        throw `syntax error: toooooo much ) -> )`
    }
    // ...
Copy the code

0 x006 summary

  • There are some problems with this article:
    1. I don’t know why I use the inverse Polish form, I just know that there is a solution, and I just use it, not derive the solution from the problem.
    2. Not enough writing skills, not enough cool.
  • There are also some problems with this implementation:
    1. Not completely with the idea of compiler principles to achieve, but their own solutions, first practice, then understand the problem.
    2. It doesn’t reference other people’s implementations too much, it’s kind of a closed-door feeling.
  • Think about:
    1. for(a)Maybe you can use a recursive approach to enter(a)Then start over with a new expression parsing
    2. Not enough thinking, not enough unit testing coverage, not enough holes to know where

In short: the article ends here, there are a lot of not detailed places please forgive me, more exchanges, grow together.

0 x007 resources

  • Principles of Compilation
  • The source code
  • Principle, an animation software