#include <sys/types.h>
#include <regex.h>

int regcomp(regex_t *preg, const char *pattern, int cflags);
int regexec(const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags);
size_t regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size);
void regfree(regex_t *preg);
regcomp(S) compiles pattern and stores the compiled results in the structure pointed to by preg.
regexec(S) compares the null-terminated string with the compiled regular expression preg, and stores the information about matches in the array pmatch.
regerror(S) maps the error return values from regcomp( ) and regexec( ) to more meaningful error messages.
regfree(S) frees any memory allocated by regcomp( ) for preg, but does not free the preg structure itself.
Two structure types are used in regular expression matching: the structure type regex_t and the structure type regmatch_t. The structure type regex_t is used to hold compiled regular expressions and must have the following members:
Member Type | Member Name | Description
size_t | re_nsub | Number of parenthesized subexpressions in pattern.

The structure type regmatch_t is used to record the location of a matched substring and must have the following members:

Member Type | Member Name | Description
regoff_t | rm_so | Byte offset from the start of string to the start of the substring.
regoff_t | rm_eo | Byte offset, from the start of string, of the character immediately after the end of the substring.
Any pattern to be matched must first be compiled by regcomp( ). regcomp( ) compiles pattern and stores the compiled regular expression in the structure pointed to by preg. regcomp( ) also sets re_nsub of the structure to the number of parenthesized subexpressions found in pattern.
Subexpressions in pattern are enclosed by escaped parentheses as \( ... \) in basic regular expressions or by parentheses as ( ... ) in extended regular expressions. The i-th subexpression begins at the i-th matched open parenthesis from the left, with i counting from 1. The pattern itself as a whole is also a subexpression and is labeled as the 0-th subexpression.
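As an illustration of how re_nsub reflects the two parenthesis syntaxes, the following is a minimal sketch. The helper name count_subexpressions is hypothetical and is not part of <regex.h>; passing 0 as cflags is assumed to select basic regular expressions.

```c
#include <sys/types.h>
#include <regex.h>

/*
 * Hypothetical helper (not part of <regex.h>): compile pattern with the
 * given cflags and return the number of parenthesized subexpressions
 * reported in re_nsub, or -1 if regcomp() fails.
 */
long count_subexpressions(const char *pattern, int cflags)
{
    regex_t preg;
    long nsub;

    if (regcomp(&preg, pattern, cflags) != 0)
        return -1;              /* compilation failed; nothing to free */

    nsub = (long)preg.re_nsub;  /* does not count the 0-th subexpression */
    regfree(&preg);
    return nsub;
}
```

For instance, count_subexpressions("\\(a\\)\\(b\\)", 0) treats the pattern as a basic regular expression with two subexpressions, while count_subexpressions("(a)(b(c))", REG_EXTENDED) sees three in the extended syntax.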
regexec( ) compares the null-terminated string pointed to by string with preg, the compiled regular expression generated by regcomp( ) from pattern. If there is a match, regexec( ) returns zero and records the match in pmatch, an array of structures. If there is no match or if an error occurred, regexec( ) returns non-zero.
The array pointed to by pmatch consists of at least nmatch elements. Each element of the array is a structure of the type regmatch_t. If nmatch is zero, pmatch will be ignored by regexec( ) entirely. The flag REG_NOSUB (see below) also causes regexec( ) to ignore its pmatch argument.
The routine regexec( ) fills in the i-th element of the array with the offsets of the substring that corresponds to the i-th parenthesized subexpression of pattern. (Only one matched substring for each subexpression is recorded. See rule 1 below.) Offsets in pmatch[0] identify the substring matching the entire regular expression, which is strictly also a subexpression. If there are fewer matched substrings than nmatch, regexec( ) fills the unused elements of the array, up to pmatch[nmatch-1], with -1 (pmatch may point to an array larger than nmatch entries). If pattern contains more than nmatch subexpressions, only the first nmatch matched substrings are recorded.
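The way pmatch is filled can be sketched as follows. The helper name match_groups and the constant MAX_GROUPS are hypothetical and chosen only for illustration; the sketch assumes an extended regular expression.

```c
#include <sys/types.h>
#include <regex.h>

#define MAX_GROUPS 4    /* illustrative: whole match plus up to 3 groups */

/*
 * Hypothetical helper: match string against the extended regular
 * expression in pattern and record up to MAX_GROUPS matches in out[].
 * Returns 0 for a match, non-zero for no match or error.
 */
int match_groups(const char *pattern, const char *string,
                 regmatch_t out[MAX_GROUPS])
{
    regex_t preg;
    int status;

    status = regcomp(&preg, pattern, REG_EXTENDED);
    if (status != 0)
        return status;          /* compilation error */

    /* nmatch tells regexec() how many elements of out[] it may fill */
    status = regexec(&preg, string, (size_t)MAX_GROUPS, out, 0);
    regfree(&preg);
    return status;
}
```

With the pattern ([a-z]+)=([0-9]+) and the string "key=42", out[0] covers offsets 0 to 6, out[1] covers 0 to 3, out[2] covers 4 to 6, and the unused element out[3] has both offsets set to -1.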
A subexpression is said not to participate in the match when
When matching a basic regular expression or extended regular expression against string, any particular parenthesized subexpression of pattern might participate in the match of more than one substring, or it might not match any substring at all, even though the pattern as a whole still matches.
In order to determine which substring's byte offsets are to be recorded in pmatch when information is stored about matches, the rules below are followed by regexec( ) when matching regular expressions:
For example, for the pattern ((a)|(c))* matched against the string "aa", the subexpression (a) participates in two matches, but only the second match is recorded. In addition, although the pattern matches the whole string, the subexpression (c) does not match anything. The contents of pmatch would therefore be:

    /* the whole expression */
    pmatch[0].rm_so = 0;  pmatch[0].rm_eo = 2;
    /* the outermost parentheses */
    pmatch[1].rm_so = 1;  pmatch[1].rm_eo = 2;
    /* the (a) subexpression */
    pmatch[2].rm_so = 1;  pmatch[2].rm_eo = 2;
    /* the (c) subexpression */
    pmatch[3].rm_so = -1; pmatch[3].rm_eo = -1;

As another example, consider matching the pattern (a*)b against the string "b". The subexpression (a*) matches an empty string at offset 0 of the string "b" and thus:

    pmatch[1].rm_so = 0;  pmatch[1].rm_eo = 0;

If the pattern were b(a*) instead, the subexpression (a*) would match the empty string at the end of "b" and the contents of pmatch would be:

    pmatch[1].rm_so = 1;  pmatch[1].rm_eo = 1;
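The empty-match cases can be checked with a small sketch that uses the extended patterns (a*)b and b(a*), in which the subexpression is allowed to match the empty string. The helper name group1_offsets is hypothetical.

```c
#include <sys/types.h>
#include <regex.h>

/*
 * Hypothetical helper: match string against the extended regular
 * expression in pattern and, on success, store the offsets recorded
 * for the first parenthesized subexpression in *so and *eo.
 * Returns 0 for a match, non-zero for no match or error.
 */
int group1_offsets(const char *pattern, const char *string,
                   regoff_t *so, regoff_t *eo)
{
    regex_t preg;
    regmatch_t pmatch[2];       /* [0] whole match, [1] first group */
    int status;

    status = regcomp(&preg, pattern, REG_EXTENDED);
    if (status != 0)
        return status;

    status = regexec(&preg, string, (size_t)2, pmatch, 0);
    regfree(&preg);

    if (status == 0) {
        *so = pmatch[1].rm_so;
        *eo = pmatch[1].rm_eo;
    }
    return status;
}
```

An empty match is reported with rm_so equal to rm_eo, which distinguishes it from a subexpression that did not participate at all, where both offsets are -1.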
The default behavior for compiling regular expressions and for the subsequent matching against string can be modified by two groups of flags: cflags, the argument to regcomp( ), and eflags, the argument to regexec( ).
The value of cflags is formed from a bitwise inclusive OR of zero or more of the following flags, defined in <regex.h>:
The value of eflags is formed from the bitwise inclusive OR of zero or more of the following flags, also defined in <regex.h>:
When the REG_NOSUB flag is set, regexec( ) reports only the success or failure of a match and ignores its pmatch argument. This flag also causes regcomp( ) to set re_nsub to an implementation-defined value when the regular expression is compiled. This value should not be used, and it must never be changed through any other means.
If REG_NEWLINE is not set in cflags, a newline character in pattern or string will be treated as any other ordinary character. If REG_NEWLINE is set, the newline character (in pattern or string) will still be treated as an ordinary character except that it is also given the properties of a line delimiter. Specifically, the newline character will have some additional special properties in the following three situations when REG_NEWLINE is set:
The meanings of the error return values from regcomp( ) and regexec( ) may not be obvious. regerror( ) maps these error codes to more meaningful error message strings. The message string produced by regerror( ) corresponds to errcode, the first argument to regerror( ). The value of errcode must be the last non-zero return value of either regcomp( ) or regexec( ) with the given value of preg, which is also passed to regerror( ). If any other value of errcode is passed to regerror( ), the contents of the generated string are undefined.
If preg is null but errcode is a value returned by a previous call to regcomp( ) or regexec( ), the routine regerror( ) still generates an error message string corresponding to the value of errcode, though the content might not be as detailed.
The buffer pointed to by errbuf is used to hold the generated string. The size of the buffer is errbuf_size bytes. If the string, including the terminating null, is longer than errbuf_size bytes, regerror( ) will truncate the string and null-terminate the result.
If errbuf_size is zero, regerror( ) returns the size of the entire error message string. Nothing is placed into the buffer and the errbuf argument is ignored entirely.
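The truncation and size-query behavior described above can be sketched as follows. The helper name describe_error is hypothetical; it deliberately compiles a pattern so that a failing regcomp( ) supplies a valid errcode.

```c
#include <sys/types.h>
#include <regex.h>

/*
 * Hypothetical helper: try to compile pattern as an extended regular
 * expression. If compilation succeeds, return 0. If it fails, ask
 * regerror() to place the message in errbuf (at most errbuf_size bytes,
 * including the terminating null) and return the number of bytes the
 * complete message requires.
 */
size_t describe_error(const char *pattern, char *errbuf, size_t errbuf_size)
{
    regex_t preg;
    int status;

    status = regcomp(&preg, pattern, REG_EXTENDED);
    if (status == 0) {
        regfree(&preg);
        return 0;               /* no error to describe */
    }

    /* With errbuf_size == 0, errbuf is ignored and only the size is returned */
    return regerror(status, &preg, errbuf, errbuf_size);
}
```

Calling the helper with a deliberately small buffer shows that the stored string is truncated and null-terminated, while the return value still reports the size needed for the whole message, so a caller can compare the two to detect truncation.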
If regexec( ) can find a match, it returns zero. Otherwise, it returns REG_NOMATCH when no match can be found (or REG_ENOSYS if the function is not supported). As an SCO OpenServer specific extension, regexec( ) may also return REG_BADPAT. This may happen when, for example, the data in the compiled regular expression is corrupted and passed to regexec( ).
The routine regerror( ) returns the number of bytes required to hold the entire corresponding error message string upon successful completion. Otherwise, it returns zero to indicate that the routine is not supported.
regfree( ) does not return any value.
If the routine regexec( ) or regfree( ) is given a preg not returned by regcomp( ), the result is undefined. This includes the situations when a preg has been freed by regfree( ) or returned by a failed regcomp( ).
The return value REG_BADPAT from regexec( ) is SCO OpenServer specific. To retain XPG4 compliance in application code that uses this feature, compile the code conditionally, as in:
    switch (regexec(&preg, string, nmatch, pmatch, eflags)) {
    case 0:
        /* code for match */
        break;
    case REG_NOMATCH:
        /* code for no match */
        break;
    . . .
    #ifdef _POSIX_SOURCE
    case REG_BADPAT:
        /* handle invalid regular expression error */
        break;
    #endif /* REG_BADPAT */
    . . .
    }
The routine regexec(S) is conformant with:

X/Open CAE Specification, System Interfaces and Headers, Issue 4, 1992, with SCO OpenServer specific extensions that are maintained by The SCO Group.
    #include <sys/types.h>
    #include <stddef.h>
    #include <regex.h>

    /*
     * match string against the extended regular expressions in pattern
     *
     * return 0 for match, non zero for no match or error
     */
    int match(const char *string, char *pattern)
    {
        int status;
        regex_t preg;

        if ((status = regcomp(&preg, pattern, REG_EXTENDED|REG_NOSUB)) != 0) {
            return (status);    /* flag error */
        }
        status = regexec(&preg, string, (size_t) 0, NULL, 0);
        regfree(&preg);
        return (status);
    }

To produce a more descriptive error message, an application could call regerror(S) twice: the first time to determine the buffer size for the error message string, and the second time to put the string into the buffer created for it, as in:

    extern char *errbuf;
    . . .
    if ((status = regcomp(&preg, pattern, REG_EXTENDED|REG_NOSUB)) != 0) {
        /* there is an error: status != 0
           first, find out the size of the error message string */
        buf_size = regerror(status, &preg, (char *) NULL, (size_t) 0);
        /* then create a buffer for the message string */
        errbuf = (char *)malloc(buf_size);
        /* and put the string into buffer before return,
           provided the allocation succeeded */
        if (errbuf != NULL)
            regerror(status, &preg, errbuf, buf_size);
        /* flag the error */
        return (status);
    }
    . . .