
Introduction

Lex stands for Lexical Analyzer and is a tool for generating a scanner that recognizes lexical patterns in text. Lex is strongly coupled to C: a .l file is passed through the lex program, which generates a C output file, and that file is compiled into an executable lexical analyzer. Lex calls each scanned word a token, and tokens fall into classes. By analogy with natural language, every word in English is a token, and there are many classes of tokens: for example, noun is a class of tokens, and apple is a specific token belonging to that class. For a programming language, the number of token classes is very limited, unlike the hundreds of thousands of words in a natural language such as English. Lex generates a yylex function that yacc calls to find out what type of token it has received; the token types themselves are defined in yacc. The input file for lex is usually named xx.l, and running lex on xx.l produces lex.yy.c.
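
As a minimal, illustrative sketch (the rule set and messages here are my own, not part of any standard), a tiny .l file might look like the following. Running it through lex/flex produces lex.yy.c, which is then compiled into a scanner that prints the kind of token it sees:

        %{
        #include <stdio.h>
        %}
        %%
        [0-9]+      { printf("NUMBER token: %s\n", yytext); }
        [A-Za-z]+   { printf("WORD token: %s\n", yytext); }
        [ \t\n]     { /* skip whitespace */ }
        .           { printf("other: %s\n", yytext); }
        %%
        int main() {
            yylex();                     /* start scanning standard input */
            return 0;
        }
        int yywrap() { return 1; }       /* no more input after end of file */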

Regular expressions (RE)

Define regular expressions with Lex

character   meaning
A-Z, a-z, 0-9   Characters and digits that form part of a pattern
.   Matches any character except \n
-   Used to specify a range
[]   A character class: matches any single character inside the brackets; if the first character is ^, the class is negated
*   Matches zero or more occurrences of the preceding pattern
+   Matches one or more occurrences of the preceding pattern
?   Matches zero or one occurrence of the preceding pattern
$   Matches the end of a line when it is the last character of the pattern
{x,y}   Indicates how many times a pattern may occur; for example, A{1,3} means A may occur 1 to 3 times
\   Used to escape metacharacters
^   Used to indicate negation
"<characters>"   The characters inside the quotes are taken literally; metacharacters lose their special meaning
/   Trailing context (lookahead): only the part of the pattern before the / is consumed as the match. For example, on the input A01, the pattern A0/1 matches A0 (see the example after this table)
()   Groups a series of regular expressions
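
The "/" (trailing context) operator is the least intuitive entry above, so here is a small, hypothetical rules file that demonstrates it: on the input A01, yytext contains only A0, and the 1 is left in the input for the following rule.

        %{
        #include <stdio.h>
        %}
        %%
        A0/1    { printf("matched: %s\n", yytext); /* prints "A0" on input A01 */ }
        .|\n    { /* ignore everything else, including the trailing 1 */ }
        %%
        int main() { yylex(); return 0; }
        int yywrap() { return 1; }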

For example,

expression meaning
joke[rs]   Matches jokes or joker.
A{1,2}shis+   Matches Ashis, AAshis, Ashiss, AAshiss, Ashisss, and so on.
(A[b-e])+   Matches one or more occurrences of A followed by a single character from b to e, such as Ab or AcAd.

Common regular expression definitions

number [0-9]+

id [A-Za-z]+[A-Za-z0-9_]*
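
As a hedged sketch of how such definitions are typically used: declare them in the definition section and refer to them as {number} and {id} in the rules, returning a token code for each match. The NUMBER and ID codes below are made up for illustration; in a real lex + yacc project they would come from the yacc-generated header (y.tab.h).

        %{
        #include <stdio.h>
        /* Illustrative token codes; with yacc they would come from y.tab.h */
        #define NUMBER 258
        #define ID     259
        %}
        number [0-9]+
        id     [A-Za-z]+[A-Za-z0-9_]*
        %%
        {number}  { printf("NUMBER: %s\n", yytext); return NUMBER; }
        {id}      { printf("ID: %s\n", yytext); return ID; }
        [ \t\n]   { /* skip whitespace */ }
        %%
        int main() {
            while (yylex() != 0)        /* yylex returns 0 at end of input */
                ;
            return 0;
        }
        int yywrap() { return 1; }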

Lex programming

  1. Specify the patterns and their associated actions in a .l file, in a format that Lex understands.
  2. Run Lex on this file to generate the C code for the scanner.
  3. Compile and link the C code to produce an executable scanner.

A Lex program is divided into three sections: the first contains global declarations for C and Lex, the second contains the patterns (rules) together with their C code actions, and the third contains supplementary C functions; for example, the third section usually contains the main() function. The sections are separated by %%.

        Definition section
        %%
        Rules section
        %%
        C code section
  • The Definition Section can contain C include, define, and other statements, but these must be enclosed in %{ %}. In the .l file you can also put predefined regular expression (token) definitions; the format is: name regular-expression. In the Rules Section you can then refer to such a regular expression as {name}.
        %{
        int wordCount = 0;
        %}
        chars [A-Za-z_'."]
        numbers ([0-9])+
        delim [ \t\n]
        whitespace {delim}+
        words {chars}+
        %%
  • In the Rules Section of the .l file, each rule pairs a regular expression with an action, which usually returns a token. Actions are enclosed in {} and written in C; the code can do anything you want.
        {words}      { wordCount++; /* increase the word count by one */ }
        {whitespace} { /* do nothing */ }
        {numbers}    { /* one may want to add some processing here */ }
        %%
  • The C Code Section contains the main() function and other supplementary C functions.
        int main() {
            yylex();   /* This function starts the analysis. It is generated by Lex. */
            printf("No of words: %d\n", wordCount);
            return 0;
        }

        /* yywrap() is called at the end of the file (or input). If it returns 1,
           parsing stops, so it can also be used to parse multiple files: the code
           in this third section can point the yyin file pointer at different files
           until all of them have been parsed, and finally return 1 to indicate the
           end of the analysis. */
        int yywrap() {
            return 1;
        }
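
To make the multi-file idea above concrete, here is a hedged sketch of a yywrap() that could replace the simple one shown above (the file names and count are hypothetical): it points yyin at the next file and returns 0 to keep scanning, and returns 1 only when nothing is left.

        /* Sketch only: the file list is illustrative.
           Assumes <stdio.h> is included in the definitions section. */
        static const char *files[] = { "first.txt", "second.txt" };
        static int next_file = 0;

        int yywrap() {
            if (next_file < 2) {
                FILE *f = fopen(files[next_file++], "r");
                if (f != NULL) {
                    yyin = f;    /* continue the analysis with the next file */
                    return 0;    /* 0 means: there is more input, keep scanning */
                }
            }
            return 1;            /* 1 means: no more input, stop */
        }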

Compiling and running

Runtime environment

Ubuntu 15.10. lex/flex: flex is preinstalled on the machine, so the flex command can be used directly. gcc: the ordinary system GCC is used for compiling.

Compile

        gcc lex.yy.c -o yourname -ll   # create the executable; -ll links the lex library
        ./yourname < xxx.c             # run the program, redirecting the input stream to the file to be analyzed