The rules section is made up of one or more grammar rules, each of which has the form:
lname : definition ;
The purpose of the rule is to define lname in terms of other symbols. In the example, lname is a non-terminal symbol, as are all symbols that appear to the left of the colon in a rule. The definition of lname can consist of a sequence of terminal symbols, a sequence of other non-terminal symbols, or a sequence of both terminal and non-terminal symbols. Non-terminal symbols that appear in the definition of another symbol are still regarded as non-terminal symbols in that context. The colon and the semicolon are yacc delimiters: the colon separates the non-terminal symbol on the left from its definition, and the semicolon must be the last character in the rule.
If actions are associated with the rule, they can appear between the colon and the semicolon.
Symbols can be any length and are composed of letters, dots, underscores, and digits, although a digit cannot be the first character of a symbol. Uppercase and lowercase letters are distinct. The NUL character (\0 or 0) should never be used in a grammar rule.
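The naming rule above can be sketched as a small C check. This is only an illustration: is_valid_symbol is a hypothetical helper, not part of yacc, and it assumes the default C locale so that isalnum accepts only ASCII letters and digits.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if `name` follows the naming rule described above:
 * letters, dots, underscores, and digits, with no leading digit.
 * Hypothetical helper for illustration; it is not part of yacc. */
int is_valid_symbol(const char *name)
{
    size_t i;

    if (name == NULL || name[0] == '\0')
        return 0;                       /* empty names are rejected */
    if (isdigit((unsigned char)name[0]))
        return 0;                       /* a digit cannot come first */

    for (i = 0; name[i] != '\0'; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalnum(c) && c != '.' && c != '_')
            return 0;                   /* character outside the allowed set */
    }
    return 1;
}
```

Under these assumptions, names such as lname and expr_2.a pass the check, while 2expr and a-b do not. Because uppercase and lowercase letters are distinct, two names would be compared with an exact, case-sensitive comparison such as strcmp.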
A literal in a definition consists of a character enclosed in single quotes ('). Literal characters must be passed to the parser by the lexical analyzer, and are considered to be tokens. As in C, the backslash (\) is an escape character within literals; all the C escape sequences are recognized. The following escape sequences are understood by yacc:
'\n' | newline
'\r' | return
'\'' | single quote (')
'\\' | backslash (\)
'\t' | tab
'\b' | backspace
'\f' | form feed
'\nnn' | a character in octal notation
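As a rough illustration of how the escape sequences in the table map to character values, the following C sketch decodes the text between the single quotes of a literal. The function decode_literal is hypothetical and is not taken from yacc. It assumes that an octal escape may have one to three digits, as in C, and it rejects a value of zero because the NUL character should not appear in a grammar rule.

```c
#include <stddef.h>

/* Decodes the body of a single-character literal (the text between the
 * single quotes) into its character value. Returns the value, or -1 if
 * the body is not a plain character or one of the escapes listed above.
 * Hypothetical helper for illustration; yacc's own code may differ. */
int decode_literal(const char *body)
{
    int value = 0;
    int digits = 0;

    if (body == NULL || body[0] == '\0')
        return -1;
    if (body[0] != '\\')                    /* plain character, such as a */
        return body[1] == '\0' ? (unsigned char)body[0] : -1;

    switch (body[1]) {
    case 'n':  value = '\n'; break;
    case 'r':  value = '\r'; break;
    case '\'': value = '\''; break;
    case '\\': value = '\\'; break;
    case 't':  value = '\t'; break;
    case 'b':  value = '\b'; break;
    case 'f':  value = '\f'; break;
    default:
        /* \nnn: assumed to be one to three octal digits, as in C */
        body++;                             /* skip the backslash */
        while (digits < 3 && body[digits] >= '0' && body[digits] <= '7') {
            value = value * 8 + (body[digits] - '0');
            digits++;
        }
        /* reject non-octal text, trailing characters, NUL, and
         * values that do not fit in one 8-bit character */
        if (digits == 0 || body[digits] != '\0' || value == 0 || value > 255)
            return -1;
        return value;
    }
    return body[2] == '\0' ? value : -1;    /* nothing may follow the escape */
}
```

The caller passes only the body of the literal, so the literal '\n' is passed as the two characters backslash and n.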
If there are several grammar rules with the same left-hand side, the vertical bar (|) can be used to combine them, thereby eliminating the need to rewrite the left-hand side. The semicolon at the end of a rule is dropped before a vertical bar. The following constructions are equivalent:
A : B C D ;
A : E F ;
A : G ;
and
A : B C D | E F | G ;
It is not necessary for all grammar rules with the same left side to appear together in the grammar rules section, although if they do, the input is more readable and easier to change.
If a non-terminal symbol matches the empty string, this can be indicated by an empty definition, such as the following:
epsilon : ;
Depending on the situation, you can recognize constructs using either the lexical analyzer or grammar rules. For example, the following rules can be used to define the symbol month:
month : 'J' 'a' 'n' ;
month : 'F' 'e' 'b' ;
. . .
month : 'D' 'e' 'c' ;
In this example, month is a non-terminal symbol. The lexical analyzer only needs to recognize individual letters, and therefore may be very simple. However, such low-level rules are wasteful and result in a complicated specification. To avoid this problem, have the lexical analyzer recognize strings such as "January" and return an indication that a "month" token was seen. In that case, "month" is a terminal symbol and the detailed rules are not needed.
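The second approach, in which the lexical analyzer recognizes whole month names, might look roughly like the following C sketch. All names here are assumptions made for illustration: next_token stands in for the lexical analyzer and reads from a string so that it can be run on its own, MONTH is given the placeholder value 257 where a real parser would take the token code from yacc, and month_value stands in for whatever semantic value the parser would receive.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token code for the terminal symbol month. In a real parser this value
 * would come from yacc; 257 is an assumed placeholder. */
#define MONTH 257

static const char *const month_names[] = {
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December"
};

/* Number (1-12) of the month most recently matched; a stand-in for the
 * semantic value a real lexical analyzer would pass to the parser. */
int month_value = 0;

/* Hypothetical scanner step: reads one token from *input and advances
 * the pointer. Returns MONTH for a month name, 0 at end of input, and
 * otherwise the character itself, as for a literal token. */
int next_token(const char **input)
{
    const char *p = *input;
    size_t len = 0;
    size_t i;

    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;                                /* skip white space */
    if (*p == '\0') {
        *input = p;
        return 0;                           /* end of input */
    }

    while (isalpha((unsigned char)p[len]))
        len++;                              /* measure the word, if any */

    for (i = 0; len > 0 && i < 12; i++) {
        if (strlen(month_names[i]) == len &&
            strncmp(p, month_names[i], len) == 0) {
            month_value = (int)i + 1;
            *input = p + len;
            return MONTH;                   /* whole name is one token */
        }
    }

    *input = p + 1;                         /* not a month: one character */
    return (unsigned char)*p;
}
```

With this scanner, the grammar can refer to month directly as a terminal symbol, and none of the letter-by-letter rules are required. Only full names are matched in this sketch; handling abbreviations such as Jan would need additional table entries.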
In some cases, yacc fails to produce a parser when given a set of specifications. The specifications may be self-contradictory, or they may require a more powerful recognition mechanism than that available to yacc. The former cases represent design errors; the latter can often be corrected by making the lexical analyzer more powerful or by rewriting some of the grammar rules.