This is the 27th day of my participation in the August More Text Challenge
1. Decrypt the arithmetic verification code
In the first two years of my work, I wrote some crawler programs, including the commodity data of JINGdong and the video resources of Today’s Film and Television. Some resources are easy to climb, just send an HTTP request and the server will return data to you without any processing. But for a few more precious data, the server will do “crawler” processing, I used the crawl third-party website article when they met, fortunately the somebody else’s crawler mechanism is simple: give a picture, picture is inside a “arithmetic”, you must enter the correct answer to arithmetic, the server will respond to the entire contents of the article. The arithmetic problems are very simple four operations, the kind elementary school students can do, so they are easy to crack.
Cracking of thinking is very simple, it is my first call the baidu “OCR” image recognition into text, so that I can get an expression string, such as “25 + 32”, analytical expressions for the next step is to write code calculation value, then the result value as a parameter to the request interface, data will be able to get to the article.
Call Baidu OCR recognition here will not speak, that is Baidu do work, I am only responsible for the tuning interface. This article focuses on how to parse an “arithmetic expression” in “interpreter mode” and evaluate the resulting value.
Now suppose we get the following expression:
1 + 2 + 3Copy the code
If we examine this expression, it has two types of elements: operands and operators. Operands are symbols such as: 1, 2, and 3. They represent only one value and do not require any processing. Therefore, they are also called terminal symbols. Operators are symbols like +, and we need to write algorithms to deal with them. We need two operands for each operator, otherwise the formula won’t work. Operators are also called “non-terminal symbols.” The common point of both types of elements is that they are parsed. The difference is that all operands have the same function. They represent only one value and can therefore be represented by a class. But different operators need to be interpreted by different algorithms, so different classes must be defined, addition requires an addition parser, and subtraction requires a subtraction parser.
After analysis, we tried to describe the process in code. The class diagram design is as follows:
Expression is an abstraction of lexical elements. VarExpression resolves operands, SymbolExpression resolves operators, AddExpression resolves addition and SubExpression resolves subtraction.
Once the parsing is done, we also need to arrange the sequence of operations from left to right, multiplying and dividing first, then adding and subtracting (for space reasons, multiplying and subtracting are not considered), and we also need to save the resulting value of the expression calculation, so we also need a wrapper class Calculator.
Parser abstract Expression:
abstract class Expression {
// Interpret the expression and get the result
abstract int interpreter(a);
}
Copy the code
The operand parser VarExpression, which is quite simple, transforms parsed operand characters into numbers:
class VarExpression extends Expression{
String key;
@Override
int interpreter(a) {
returnInteger.valueOf(key); }}Copy the code
The abstract operator parser SymbolExpression, where each operator must correspond to the left and right operands or else the formula cannot be computed:
abstract class SymbolExpression extends Expression {
Expression left;// Left expression
Expression right;// Right expression
}
Copy the code
Addition resolver AddExpression:
class AddExpression extends SymbolExpression{
@Override
public int interpreter(a) {
returnleft.interpreter() + right.interpreter(); }}Copy the code
Subtraction resolver SubExpression:
class SubExpression extends SymbolExpression{
@Override
public int interpreter(a) {
returnleft.interpreter() - right.interpreter(); }}Copy the code
Now that you’ve parsed the code, it’s time to prioritize the operations. So if we look at this expression again,
1 + 2 + 3Copy the code
How should a computer perform this operation? What kind of data structure should you use? The author draws a schematic diagram to illustrate the implementation process:When it encounters an operand, it pushes the top element off the stack, evaluates with the next operand, and pushes the result onto the stack. Repeat the process until the final element on the stack is the final result. As a result,Calculator
Classes undoubtedly use stack structures to maintain order of execution:
class Calculator {
Expression expression;
String exp;/ / expression
Calculator(String exp) {
this.exp = exp;
Stack<Expression> stack = new Stack<>();
char[] chars = exp.toCharArray();// Consider only the units digit
for (int i = 0; i < chars.length; i++) {
VarExpression varExpression = new VarExpression(String.valueOf(chars[i]));
switch (chars[i]) {
// When the operation symbol is encountered, the first number is removed from the stack, and then the next number is added to the stack
case '+':
stack.push(new AddExpression(stack.pop(), new VarExpression(String.valueOf(chars[++i]))));
break;
case The '-':
stack.push(new SubExpression(stack.pop(), new VarExpression(String.valueOf(chars[++i]))));
break;
default:
// If a number is encountered, it is pushed directly
stack.push(varExpression);
}
}
expression = stack.pop();
}
/ / calculate
public void exec(a){
print("(" + exp + ") ="+ expression.interpreter()); }}Copy the code
The client calls like this:
new Calculator("1 + 2 + 3").exec();
Copy the code
If the third party website added “multiply and divide”, I could simply subclass SymbolExpression and implement the parse algorithm for multiplication and division. It was very nice, so I could safely crawl the data.
This is the interpreter mode!
2. Definition of interpreter schema
Given a language, define a representation of its grammar and define an interpreter that uses that representation to interpret sentences in the language.
- AbstractExpression: Abstract interpreter, in which the specific interpretation algorithm is done by subclasses.
- TerminalExpression: A TerminalExpression that implements interpretive operations associated with elements ina grammar. Usually an interpreter pattern has only one TerminalExpression, but multiple instances.
- NonTerminalExpression: A non-terminal expression that corresponds to each rule ina grammar.
- Context: environment role.
The interpreter pattern is rarely used in practice, at least by developers who rarely write an interpreter by hand. The interpreter is complex to write, debug, and maintain. For the above example, it is perfectly possible to use a mature and powerful third-party library such as JEP, or to use a scripting language such as shell to replace the interpreter mode.
3. Advantages and disadvantages of the interpreter pattern
The interpreter is a simple syntax analysis tool. The most significant advantage of the interpreter is that it is very extensible. To modify the grammar rules, you only need to modify the algorithm of the corresponding non-terminal expression.
disadvantages
- Each syntax produces a non-terminal expression, and too many syntax results in a bloated number of classes, which can be difficult to maintain.
- The interpreter mode uses recursive invocation methods that are cumbersome to debug.
- The interpreter pattern requires a lot of looping and recursion and is less efficient to execute.
4. To summarize
In practice, you will rarely need to write an interpreter by hand, because it raises efficiency, performance, and maintenance issues. If the syntax is slightly more complex, the interpreter will be difficult to write. If you do use an interpreter, give priority to mature three-way libraries on the market, such as Expression4J, MESP, and JEP, which are powerful and reasonably efficient and can perform most mathematical operations.