Looking at the previous section, we have learned that the parse table and drive algorithms are at the heart of the LR parser.

In the process of analysis, the parser is always based on the current status on the top of the stack, the rest of the input end of the first character table query analysis, to determine the change pattern of action and implement, modify the contents of the stack and the rest of the input, from one pattern to another pattern, so until completion analysis (or an error).

Let’s examine how to construct the DFA from the grammar — this is the first step in constructing the LR(0), SLR(1) analysis tables.

DFA is constructed by subset method from NFA

The previous blog explained how to construct an NFA, and with an NFA you can construct a DFA using a subset method.

The construction method is similar to the NFA to DFA of lexical analysis, which will not be repeated here. The following are some characteristics of the DFA.

Each state in the DFA is a set of states of the original NFA (because the DFA is constructed by the NFA using the subset method). Since each state of NFA corresponds to a project, each state in DFA is also a set of projects.

The initial state of DFA is obtained by the ε- closure of the original NFA initial state, and all states in DFA are its final states.

(Constructing lexical analysis of DFA by analogy subset method — the final states of DFA are those states containing the final states of the original NFA, and the original NFA already has all states as final states, that is, the links marked by the path from the initial state to any state are the live prefixes recognized by the DFA. Similarly, the original handle identification state is still applicable here, namely 1, 8, 10, 6, 7, 11, 9 in the figure)

The NFA that recognizes live prefixes differs from the original lexical ANALYSIS NFA in one important way: the lexical analysis NFA status number is just a number, whereas the status number here corresponds to the item — each number corresponds to an item, and the state transition relationship is based on these items.

So.. It is also possible to construct a DFA directly from a project

The DFA that recognizes live prefixes is constructed directly from the LR(0) project

LR(0) itemset specification family: The totality of the itemset (states) that make up a DFA that recognizes a grammar live prefix, called the LR(0) itemset specification family for this grammar. This family of specifications provides the basis for building a class of LR(0) and SLR parsers.

In fact, the specification family is the total number of states we need to construct for the live prefixed DFA — the sum of all the rectangle states in the DFA above. Because each state of the DFA is an LR(0) item set, it is called the LR(0) item set specification family.

If the problem asks you to compute an LR(0) itemset specification family, that is to compute a DFA that recognizes live prefixes. With DFA, we can construct LR(0) and SLR(1) analysis tables.

We will classify and name the project first.

  • The item with the dot at the right end (such as A → α., A is not the opening symbol of the extension grammar) is called “specification item”, and the specification action can be carried out in the state of the specification item.
  • The specification items of the grammar opening symbol S’ are called “acceptance items” (e.g. S’ → α.);
  • Items of the form A → α. A β are called “move forward items”, where A is the terminator. In this state, if the next input is terminal A, the move can be performed;
  • Items of the form A → α.Bβ are called “items to be excluded”, where B is A non-terminal. The meaning of the item to be contracted is [waiting for B to be stipulated] : if you want to carry out the stipulation according to the production mode, you have to find a way to get B out first, and then you can carry out the stipulation after seeing B. B is a non-terminal, but not a terminal in the input sequence. Therefore, if you want to use this production for specification, you need to complete the right part of the production — you need to reduce the symbol sequence derived from B to B in the input sequence.

Construct them

Find the extension generalized grammar G’

G’ = G ∪ {S’ → S}

In fact, a new non-terminal S’ is introduced as a new opening symbol of the extension grammar, and a new production S’ → S is introduced

CLOSURE & GO

  1. CLOSURE(I) : The entire item reached from item set I without any grammatical notation (the same meaning as the ε- CLOSURE(I) in lexical analysis)
  2. GO(I, X) : all items that can be reached from the dharma symbol X of I (just reach, more than one step is ok. Therefore, we need to calculate the directly arrived first, and then find the closure on the basis of being able to directly arrive. Therefore, it is clear that GO and smove of lexical analysis NFA are functionally unequal: GO contains closure calculation. X is just a grammatical symbol, terminal or non-terminal.
Ex. :

For the following grammar G, if I={S →.e}

S' → E
E → aA | bB
A → cA | d
B → cB | d
Copy the code
  • Strives for the CLOSURE (I)

    CLOSURE(I) = {S’ →.e, E →.aa, E →.bb} CLOSURE(I) = {S’ →.e, E →.aa, E →.bb}

    First, S’ →.e is an element of the item set itself, which naturally can be reached without any grammatical notation. The last two items are reachable from the initial state through the ε edge, which also satisfies the requirement that items arrive from item set I without passing any grammatical symbols. (Imagine the NFA diagram here.)

  • Please GO (I, a)

    GO(I, a) = { E→a.A, A→.cA, A→.d }

    First, E→a.A is a project that can be reached directly from the project in I through a. The next two items are reachable by the E→a.A ε edge. This is also consistent with the definition of “all items that [can] reach from I the dharma symbol A”.

Construct them

Construct the DFA of the following grammar G’

E '- E E - E - T | T T - T F * F | F - - F | idCopy the code

Firstly, CLOSURE({E’ →.e}) is solved and taken as the initial state of the whole DFA, i.e. :

I0 = CLOSURE (- >. E} {E) = {E ‘- >. E, E – E – T, E -, T, T – T * F, T – > F, F -. – F, F -. Id}

Take I0 as the initial state, which is essentially the picture on the left, and draw a big rectangle and put the items in I0. So this whole box is the initial state of our DFA. Then DFA can be constructed step by step from the initial state. This step by step construction process is nothing more than [calculate one by one which grammar symbols can be used to obtain which new items from each item in the initial state], and the new item can naturally form a new state, which is connected with the original state by transferring the grammar symbols used.

Repeat the operation using the same rules (spreading the pie), and finally get the DFA as shown in the figure below

This DFA can then be used to identify live prefixes. Unlike the lexical analysis DFA, any state of this DFA is the final state. This means that the marks on the path formed from the initial state to any state are all connected by a live prefix that CAN be recognized by DFA.

For example, for state 3 (I3), the recognized live prefixes are F and e-f; The only live prefixes recognized by state 6 are E-

Now, let’s go back and look at the process of moving into the specification and find something interesting

Indeed, at any given moment, the contents of the stack are associated with analysis tables and automata. We can try to read elements in the stack at any time (from the bottom of the stack to the top of the stack), and the sequence of state numbers and grammar symbols can match a path in the DFA on the left side of the figure above.

The DFA guides the next analysis

  1. DFA recognizes different live prefixes for each state.
  2. At any time the parser is in a state I of the DFA when parsing a syntactically structured input sequence, the live prefix recognized by state I appears on the stack and state I is at the top of the stack. It is the LR(0) item contained in state I that the parser guides the next step of analysis;
  3. LR(0) items present in state I are valid for live prefixes recognized by state I.

Ok, DFA looks like a good thing, so let’s assume that the top of the stack is 5, how does the DFA guide the next analysis?

Effective project

If there is the most right derived: S ‘= * > alpha A omega = > alpha beta 1 beta 2 omega, says the project and A beta 1. Beta 2 effective for live prefix alpha beta 1.

αAω => αβ1β2ω => αβ1β2ω => αβ1β2ω => αβ1β2ω

I think so: everything that has been read and pushed is all live prefixes. Live prefixes are prefixes that are still active — the live prefix may become a handle, depending on what symbol is added to it — that is, live prefixes can become the right part of a production after some new symbol is added, and can be reduced by this production.

For the symbol string αβ1β2ω, it looks like it could be specified using A → β1β2. The read/write head reads the input sequence symbol by symbol, and then pushes each symbol to the next. When the stack is αβ1 (i.e. the active prefix is αβ1), if we already know that there is an item: A → β1.β2, then if the next person reads β2, the top of the stack will form A handle to the production, and we can use the next item of the item, A → β1β2. Protocol is in place.

This actually reflects the guiding role of DFA in analysis.

All live prefixes of the item set are valid for all items in the same item set. That is, each item in the project set has equal rights to direct the next steps.

The significance of effective projects:

  1. The analysis of the project so far is correct;
  2. Can guide the next analysis:
    1. A β (moveable input) : if the current remaining input is terminal A, move to A;
    2. B → β. (Optional) : according to the production formula B → β specification.