A few days ago I finished wrapping up the lexer and named it BKLexer. BKLexer currently has Go, C++, and Python implementations.

The code is available on the project's GitHub page.

Let's study the try_lexer code shipped with each version, taking Go as the example:

package main

import (
    "fmt"
    "strconv"

    "./bklexer"
)

func main() {
    fmt.Println("Test Code:")
    code := "声明 变量 = PI * 100 - fda\n1024 * 4 * 3.14 ### \n123"
    fmt.Println(code)
    fmt.Println("--------------------------------")

    lexer := BKLexer.NewLexer()
    lexer.AddRule("\\d+\\.\\d*", "FLOAT")
    lexer.AddRule("\\d+", "INT")
    lexer.AddRule("[\\p{L}\\d_]+", "NAME")
    lexer.AddRule("\\+", "PLUS")
    lexer.AddRule("\\-", "MINUS")
    lexer.AddRule("\\*", "MUL")
    lexer.AddRule("/", "DIV")
    lexer.AddRule("=", "ASSIGN")
    lexer.AddRule("#[^\\r\\n]*", "COMMENT")
    lexer.AddIgnores("[ \\f\\t]+")

    lexer.Build(code)
    for true {
        token := lexer.NextToken()
        if token.TType != BKLexer.TOKEN_TYPE_EOF {
            fmt.Printf("%s\t%s\tt%d\t%d\t%d,%d\n", token.Name, strconv.Quote(token.Source), token.TType, token.Position, token.Row, token.Col)
        }
        if token.TType == BKLexer.TOKEN_TYPE_EOF || token.TType == BKLexer.TOKEN_TYPE_ERROR {
            break
        }
    }
}

First, import the packages, including the bklexer package:

import (
    "fmt"
    "strconv"
    "./bklexer"
)
  • fmt is used for printing output
  • strconv is used to quote token source strings so that escape characters display clearly
  • ./bklexer imports the bklexer package

Next, instantiate the lexer and set up its rules:

lexer := BKLexer.NewLexer()
lexer.AddRule("\\d+\\.\\d*", "FLOAT")
lexer.AddRule("\\d+", "INT")
lexer.AddRule("[\\p{L}\\d_]+", "NAME")
lexer.AddRule("\\+", "PLUS")
lexer.AddRule("\\-", "MINUS")
lexer.AddRule("\\*", "MUL")
lexer.AddRule("/", "DIV")
lexer.AddRule("=", "ASSIGN")
lexer.AddRule("#[^\\r\\n]*", "COMMENT")
lexer.AddIgnores("[ \\f\\t]+")
  • NewLexer instantiates the lexer
  • AddRule adds a matching rule; its parameters are a regular expression and the corresponding token type name
  • AddIgnores sets the pattern of characters to be ignored

Finally, build and match in a loop:

lexer.Build(code)
for true {
    token := lexer.NextToken()
    if token.TType != BKLexer.TOKEN_TYPE_EOF {
        fmt.Printf("%s\t%s\tt%d\t%d\t%d,%d\n", token.Name, strconv.Quote(token.Source), token.TType, token.Position, token.Row, token.Col)
    }
    if token.TType == BKLexer.TOKEN_TYPE_EOF || token.TType == BKLexer.TOKEN_TYPE_ERROR {
        break
    }
}

Call the Build method with code as the argument, then loop over the NextToken method to fetch each token and print its details. It is important to check the token's type: the loop must terminate on either EOF or ERROR.
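The shape of this consumer loop can be exercised in isolation. In the sketch below, the Token struct, the sentinel constants, and fakeLexer are all hypothetical stand-ins for BKLexer's types, used only to show the print-then-check-for-termination pattern:

```go
package main

import "fmt"

// Hypothetical sentinels standing in for BKLexer's
// TOKEN_TYPE_EOF and TOKEN_TYPE_ERROR constants.
const (
	tokenEOF   = -1
	tokenError = -2
)

// Token is a simplified stand-in for BKLexer's token type.
type Token struct {
	TType  int
	Source string
}

// fakeLexer replays a fixed token stream, then reports EOF,
// mimicking the NextToken contract.
type fakeLexer struct {
	tokens []Token
	pos    int
}

func (l *fakeLexer) NextToken() Token {
	if l.pos >= len(l.tokens) {
		return Token{TType: tokenEOF}
	}
	t := l.tokens[l.pos]
	l.pos++
	return t
}

func main() {
	lexer := &fakeLexer{tokens: []Token{{1, "3.14"}, {2, "123"}}}
	// Same loop shape as the article: print every non-EOF token,
	// and break on EOF or ERROR.
	for {
		token := lexer.NextToken()
		if token.TType != tokenEOF {
			fmt.Println(token.TType, token.Source)
		}
		if token.TType == tokenEOF || token.TType == tokenError {
			break
		}
	}
}
```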

The output is as follows:

Test Code:
声明 变量 = PI * 100 - fda
1024 * 4 * 3.14 ### 
123
--------------------------------
NAME	"声明"	t3	0	0,0
NAME	"变量"	t3	7	0,3
ASSIGN	"="	t8	14	0,6
NAME	"PI"	t3	16	0,8
MUL	"*"	t6	19	0,11
INT	"100"	t2	21	0,13
MINUS	"-"	t5	25	0,17
NAME	"fda"	t3	27	0,19
NEWLINE	"\n"	t0	30	0,22
INT	"1024"	t2	31	1,0
MUL	"*"	t6	36	1,5
INT	"4"	t2	38	1,7
MUL	"*"	t6	40	1,9
FLOAT	"3.14"	t1	42	1,11
COMMENT	"### "	t9	47	1,16
NEWLINE	"\n"	t0	51	1,20
INT	"123"	t2	52	2,0

The next section, “Implementing CALC with Recursive Descent”, is coming up; stay tuned.