yacc

Parser operation

The algorithm yacc uses to go from the specification to the C code for the parser is complex and is not discussed here. The parser itself is relatively simple and an understanding of how it works, while not essential, makes treatment of error recovery and ambiguities easier.

The parser produced by yacc consists of a finite-state machine with a state stack. The parser is also capable of reading and remembering the next input token, called the ``look-ahead token''. The current state is always the one on the top of the stack. The states of the finite-state machine are given small integer labels. In addition to the state stack, a parallel stack, the value stack, holds the values returned by the lexical analyzer and the actions. Initially, the machine is in state 0 (that is, the state stack contains only state 0) and no look-ahead token has been read.

The machine has only five actions available: shift, reduce, goto, accept, and error. The goto is always performed as a component of a reduce action. The parser operates in the following manner:

Based on its current state, the parser decides whether it needs a look-ahead token to choose the action to be taken. If so, it calls yylex() to obtain the next token.
Using the current state and the look-ahead token if needed, the parser decides on its next action and carries it out. This may result in states being pushed onto or popped off the stack, and in the look-ahead token being processed or left alone.

When yacc is invoked with the -v option, a file called y.output is produced with a human-readable description of the parser. The actions referred to in the discussion that follows are taken from such a description file.

Shift action

The shift action is the most common one the parser takes. A shift occurs whenever a token is recognized. Whenever a shift action is taken, there is always a look-ahead token. A shift is represented in y.output as follows (assume that the current state is 56):

   LOOP   shift 34

This says that in state 56, if the look-ahead token is LOOP, then the current state (56) is pushed down on the stack, and state 34 becomes the current state, that is, it is pushed on the stack. In addition, the look-ahead token is cleared, and the variable yylval is pushed on to the value stack.

Reduce and goto actions

A reduce action takes place when the parser determines that all the items on the right-hand side of some grammar rule have been seen. In a reduce action, all the states that were put on the stack while the right-hand side of the rule was being recognized are popped from the stack. Then a new state is put on the stack, based on the symbol to the left of the rule, the state that is currently at the top of the stack, and sometimes the look-ahead token. Suppose the following rule is being reduced:

   A   :   x  y  z    ;

The reduce action depends on the symbol to the left of the colon and the number of symbols on the right-hand side (in this case, three). This reduction first involves popping three states off the top of the stack. (In general, the number of states popped equals the number of symbols on the right side of the rule.) In effect, these states were the ones put on the stack while recognizing x, y, and z and no longer serve any useful purpose. After these states have been popped, the state that is on top of the stack is the one the parser was in before it recognized any of the symbols on the right side of the rule. Next, something similar to a shift of 'A' using this uncovered state is performed. A new state is obtained and pushed onto the stack, and parsing continues. This action is called a goto. There are differences between a goto and an ordinary shift of a token. In particular, the look-ahead token is cleared by a shift but is not affected by a goto. Sometimes, but not usually, it is necessary for the parser to refer to the look-ahead token to decide whether or not to reduce.

In effect, the reduce action turns back the clock in the parse, popping the states off the stack so that the stack has the same contents as before the symbols on the right side of the rule were recognized. The parser then treats the symbol on the left side of the rule as if it were an input token and performs an action accordingly. Note that if the rule has an empty right-hand side, no states are popped off the stack.

The reduce actions are associated with individual grammar rules. In the y.output file, these rules are given small integer numbers, which could lead to some confusion. The following action refers to ``grammar rule'' 18:

   .   reduce 18

This action refers to state 34:

   LOOP   shift 34

In any case, the state, which is uncovered when symbols are popped off the stack on a reduce, will contain an entry such as the following:

   A   goto 20

If the left side of the current rule consists of the symbol ``A'', this action causes state 20 to be pushed onto the stack.

The reduce action is also important in the treatment of user-supplied actions and values. When a rule is reduced, the code supplied with the rule is executed before the stack is adjusted. When a shift takes place, the external variable yylval is copied onto the value stack. A reduction takes place after the action code associated with the rule is carried out. When the goto action is done, the external variable yyval is copied onto the value stack. The pseudo-variables $1, $2, and so on, refer to the value stack.

Accept action

The other two parser actions are conceptually much simpler. The accept action indicates that the entire input has been seen and that it matches the specification. This action appears only when the look-ahead token is the end-marker, and indicates that the parser has done its job.

The occurrence of accept can be simulated in an action by use of the macro YYACCEPT. The YYACCEPT macro causes yyparse() to return the value 0, indicating a successful parse. Here is an example of its use:

   quest   :       wealth
           |       love
           |       holygrail
                   {
                           YYACCEPT;
                   }
           ;

Error action

The error action, on the other hand, represents a place where the parser can no longer continue parsing according to the specification. The input tokens it has seen (together with the look-ahead token) cannot be followed by anything that would result in a legal input. The parser reports an error and attempts to recover the situation and resume parsing. Error recovery is discussed later.

The error parser action can be simulated in a action code by use of the macro YYERROR. YYERROR causes the parser to behave as if the current input symbol had been a syntax error. The function yyerror() is called and error recovery takes place. Here is an example of such usage:

   seq:    first second third
           {
                   if ($1<$2 || $2<$3) {
                           printf("Values out of order!\n");
                           YYERROR;
                   }
           }

Interpreting the y.output file

Consider the following yacc specification:

   %token  DING  DONG  DELL
   %%
   rhyme   :   sound  place
           ;
   sound   :   DING  DONG
           ;
   place   :   DELL
           ;

The y.output file corresponding to the preceding grammar (with some statistics stripped off the end) is:

state 0 $accept : \_rhyme $end

DING shift 3 . error

rhyme goto 1 sound goto 2

state 1 $accept : rhyme\_$end

$end accept . error

state 2 rhyme : sound\_place

DELL shift 5 . error

place goto 4

state 3 sound : DING\_DONG

DONG shift 6 . error

state 4 rhyme : sound place\_ (1)

. reduce 1

state 5 place : DELL\_ (3)

. reduce 3

state 6 sound : DING DONG\_ (2)

. reduce 2

The actions for each state are specified, with a description of the parsing rules processed in each state. The underscore (_) character in a rule separates the symbols that have been seen from those that are expected to follow. The following input can be used to track the operations of the parser:

   DING  DONG  DELL

The input is processed in the following steps:

Initially, the current state is state 0. The parser needs to refer to the input to decide between the actions available in state 0, so the first token, DING, is read and becomes the look-ahead token. The action in state 0 on DING is shift 3, so state 3 is pushed onto the stack and the look-ahead token is cleared.
The next token, DONG, is read and becomes the look-ahead token. The action in state 3 on the token DONG is shift 6, so state 6 is pushed onto the stack and the look-ahead is cleared. The stack now contains 0, 3, and 6.
In state 6, without even consulting the look-ahead, the parser reduces by rule 2:
```
   sound  :   DING  DONG
```
In the reduction, two states, 6 and 3, are popped off the stack, uncovering state 0.
Now sound, the left side of rule 2, has just been recognized. Consulting the description of state 0, we see that there is a goto on sound:
```
   sound   goto 2
```
This causes state 2 to be pushed onto the stack and become the current state.
In state 2, the next token, DELL, is read. The action is shift 5, so state 5 is pushed onto the stack, which now has 0, 2, and 5 on it. The look-ahead token is cleared.
In state 5, the only action is to reduce by rule 3:
```
   place   :    DELL
```
This rule has one symbol on the right-hand side, so one state, 5, is popped off, and state 2 is uncovered.
There is a goto on place in state 2; this causes the state to become 4. Now, the stack contains 0, 2, and 4.
In state 4, the only action is to reduce by rule 1:
```
   rhyme   :    sound  place
```
There are two symbols on the right, so the top two states are popped off, uncovering state 0.
In state 0, there is a goto on rhyme, causing the parser to enter state 1.
In state 1, the end-marker, indicated by $end, is obtained when the input is read. The accept action in state 1 (when the end-marker is seen) successfully ends the parse.