awk provides regular expressions for pattern matching; the syntax of UNIX system expressions is described in ``Regular expressions''.
The simplest regular expression is a string of characters matching only itself: that is, the string is a literal. In awk, a regular expression is typically enclosed within slashes in order to label it as a regular expression as opposed to an awk command, as follows:
    /Asia/
This program prints all input records that contain the substring ``Asia''; if a record contains ``Asia'' as part of a larger string like ``Asian'' or ``Pan-Asiatic'', it is also printed.
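As a minimal sketch of this substring behavior, the pattern can be run against a few made-up records. The sample data and the function name show_asia are assumptions for illustration and do not come from the original text.

```sh
# Hypothetical sample records; not taken from the original document.
show_asia() {
    printf '%s\n' \
        'Asia' \
        'Asian markets' \
        'Pan-Asiatic league' \
        'Europe' |
    awk '/Asia/'    # a pattern with no action prints each matching record
}

show_asia
```

Only the ``Europe'' record is dropped, because it is the only one that does not contain the substring.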
awk provides the full range of UNIX system regular expression metacharacters; see ``Regular expressions'' for a detailed explanation. (In addition, awk recognizes the escape sequences listed in ``The echo command''.) awk also provides the regular expression operators shown in ``awk regular expression operators''.
awk regular expression operators
Operator | Meaning |
---|---|
~ | matches |
!~ | does not match |
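The following program prints the first field of all lines in which the fourth field matches ``Asia'':
    $4 ~ /Asia/ { print $1 }
This program prints the first field of all lines in which the fourth field does not match ``Asia'':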
    $4 !~ /Asia/ { print $1 }
awk interprets any string or variable on the right side of a ~ or !~ as a regular expression. For example:
    $2 !~ /^[0-9]+$/
This sample program can be rewritten as follows:
    BEGIN { digits = "^[0-9]+$" }
    $2 !~ digits
Suppose you wanted to search for a string of characters such as ^[0-9]+$. When a literal quoted string like "^[0-9]+$" is used as a regular expression, one extra level of backslashes is needed to protect regular expression metacharacters. This is because one level of backslashes is removed when a string is originally parsed. If a backslash is needed in front of a character to turn off its special meaning in a regular expression, then that backslash needs a preceding backslash to protect it in a string.
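The field tests with ~ and !~, and the string-variable form of the digits pattern, can be tried on a few made-up records. This is only a sketch: the four-column sample data and the shell function names (sample, asia_names, other_names, nondigit_literal, nondigit_string) are hypothetical.

```sh
# Hypothetical records: name, area, population, continent.
sample() {
    printf '%s\n' \
        'Russia 8650 262 Asia' \
        'Canada 3852 24 America' \
        'China 3692 866 Asia' \
        'Brazil n/a 116 America'
}

# First field of records whose fourth field matches Asia.
asia_names()  { sample | awk '$4 ~ /Asia/ { print $1 }'; }

# First field of records whose fourth field does not match Asia.
other_names() { sample | awk '$4 !~ /Asia/ { print $1 }'; }

# Records whose second field is not all digits, written two ways:
# with a regular expression literal and with a string variable.
nondigit_literal() { sample | awk '$2 !~ /^[0-9]+$/'; }
nondigit_string()  { sample | awk 'BEGIN { digits = "^[0-9]+$" } $2 !~ digits'; }

asia_names
other_names
nondigit_literal
nondigit_string
```

With this data, the last two commands select the same single record, the one whose second field is ``n/a''.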
For example, suppose we want to match strings containing ``b'' followed by a dollar sign. The regular expression for this pattern is b\$. To create a string to represent this regular expression, add one more backslash, as follows:
    "b\\$"
The two regular expressions on each of the following lines are equivalent:
    x ~ "b\\$"    x ~ /b\$/
    x ~ "b\$"     x ~ /b$/
    x ~ "b$"      x ~ /b$/
    x ~ "\\t"     x ~ /\t/
A summary of the regular expressions and the substrings they match is given in ``awk regular expressions''. The unary operators *, +, and ? have the highest precedence, with concatenation next, and then alternation (|). All operators are left-associative. The r stands for any regular expression.
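Returning to the string and slash forms of the ``b'' followed by a dollar sign pattern, the following sketch checks that the two spellings agree on two made-up inputs. The function name escape_demo, the variable names, and the test strings are assumptions for illustration only.

```sh
escape_demo() {
    printf '%s\n' 'ab$c' 'tab' | awk '
        {
            # Literal dollar sign: two backslashes in a string, one between slashes.
            str_lit = ($0 ~ "b\\$")
            re_lit  = ($0 ~ /b\$/)
            # Unescaped dollar sign: end-of-string anchor in both forms.
            str_end = ($0 ~ "b$")
            re_end  = ($0 ~ /b$/)
            print $0 ": " str_lit, re_lit, str_end, re_end
        }
    '
}

escape_demo
```

Each line of output shows the input followed by four 0 or 1 results. The first two always agree with each other, as do the last two.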
awk regular expressions
Expression | Matches |
---|---|
char | any non-metacharacter char |
\char | character char literally |
^ | beginning of string |
$ | end of string |
. | any character but newline |
[s] | any character in set s |
[^s] | any character not in set s |
r* | zero or more rs |
r+ | one or more rs |
r? | zero or one r |
(r) | r |
r1 r2 | r1 then r2 (concatenation) |
r1|r2 | r1 or r2 (alternation) |
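The precedence rules (unary operators first, then concatenation, then alternation) can be seen in a small sketch. The input strings and the names precedence_demo, plus, group, and alt are hypothetical and chosen only for this illustration.

```sh
precedence_demo() {
    printf '%s\n' 'abb' 'abab' 'abxx' 'xxcd' 'xx' | awk '
        {
            plus  = ($0 ~ /^ab+$/)      # + applies to b only: a, then one or more b
            group = ($0 ~ /^(ab)+$/)    # parentheses make + apply to ab as a unit
            alt   = ($0 ~ /^ab|cd$/)    # alternation binds last: (^ab) or (cd$)
            print $0, plus, group, alt
        }
    '
}

precedence_demo
```

Each output line shows the input followed by the result (1 or 0) of the three tests, so the effect of grouping and of the low precedence of | can be compared side by side.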