The lex(CP) utility is a program generation tool for constructing lexical analyzers. Lexical analyzers produced by lex are designed to work with yacc parsers. The analyzer that lex generates is invoked by calling the function yylex.
The following lex specification defines a lexical analyzer that can be used with either of the date-recognizing parsers generated from the yacc specifications given earlier in this chapter.

%{
#include <stdlib.h>
#include <string.h>     /* strlen, strcpy */
#include "y.tab.h"
char *p;
%}
mon     January|February|March|April|May|June|July|August|September|October|November|December
%%
{mon}       { p=(char *)calloc(strlen(yytext)+1,sizeof(char));
              strcpy(p,yytext);
              yylval.text=p;
              return(t_MONTH); }
[0-9]{1,2}  { yylval.ival=atoi(yytext);
              return(t_DAY); }
[0-9]{4}    { yylval.ival=atoi(yytext);
              return(t_YEAR); }
\,          { return ','; }

The analyzer returns the token t_MONTH, t_DAY, or t_YEAR when it recognizes the corresponding sequence of characters. The lexical analyzer associates a t_DAY or t_YEAR token with an integer value and a t_MONTH token with a character string. The tokens are declared in the yacc specification and subsequently defined in the file y.tab.h, which yacc generates when run with the -d option.
The previous example illustrates the use of yylval. This variable is defined as a C union with a member called text, which points to a character string, and a member called ival, which holds an integer value. The union is declared in the yacc specification.
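
The union described above can be sketched in C as follows. This is only an illustration: with yacc the type comes from the yacc specification and appears in y.tab.h, and the helpers make_int_value and make_text_value are hypothetical names that do not appear in the original example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed shape of the value type. With yacc this would come from the
   yacc specification and appear in y.tab.h, not be written by hand. */
typedef union {
    char *text;   /* string value carried by a t_MONTH token */
    int   ival;   /* integer value carried by a t_DAY or t_YEAR token */
} YYSTYPE;

/* Hypothetical helper: build the value for a t_DAY or t_YEAR token. */
YYSTYPE make_int_value(int n)
{
    YYSTYPE v;
    v.ival = n;
    return v;
}

/* Hypothetical helper: build the value for a t_MONTH token. */
YYSTYPE make_text_value(char *s)
{
    YYSTYPE v;
    assert(s != NULL);
    v.text = s;
    return v;
}
```

Because the members of a union share storage, only the member that matches the returned token is meaningful at any given time.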
When the first pattern matches, the lexical analyzer places the matched string in the array yytext. The action makes a copy of yytext and assigns a pointer to that copy to yylval.text. If a pointer to yytext itself had been assigned to yylval.text, a problem could arise, because the lexical analyzer might overwrite the contents of yytext before the parser uses yylval.
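
The following sketch illustrates why the copy matters. It does not use the real yytext; match_buf is a stand-in buffer, and simulate_match, alias_match, and copy_match are hypothetical helpers invented for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the scanner's match buffer; this is not the real yytext. */
static char match_buf[64];

/* Hypothetical: pretend the scanner has just matched a new lexeme,
   which overwrites whatever the buffer held before. */
void simulate_match(const char *lexeme)
{
    assert(lexeme != NULL && strlen(lexeme) < sizeof match_buf);
    strcpy(match_buf, lexeme);
}

/* Unsafe choice: hand out a pointer to the buffer itself. */
const char *alias_match(void)
{
    return match_buf;
}

/* Safe choice, as in the {mon} action: hand out a private copy.
   Returns NULL if memory cannot be allocated; the caller frees it. */
char *copy_match(void)
{
    char *p = calloc(strlen(match_buf) + 1, sizeof(char));
    if (p != NULL)
        strcpy(p, match_buf);
    return p;
}
```

If simulate_match("March") is followed by a call to each accessor and then by simulate_match("17"), the pointer from alias_match now reads "17", while the string from copy_match still reads "March".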
The actions for the second and third patterns convert the matched string to an integer and assign yylval.ival this value.
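
The conversion step can be pictured with the small sketch below. The function lexeme_to_int is a hypothetical wrapper introduced here for illustration; the actions in the specification call atoi on yytext directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper mirroring the action body. The pattern has already
   guaranteed that the lexeme consists only of digits, so atoi is not
   given malformed input here. */
int lexeme_to_int(const char *lexeme)
{
    assert(lexeme != NULL);
    return atoi(lexeme);
}
```

A leading zero does not affect the result: the lexeme "07" converts to 7, and "1999" converts to 1999.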