Regular expressions

Precedence in regular expressions

Occasionally a regular expression can match two or more strings in a target. In general, the leftmost match is selected and, among matches that start at the same position, the longest. That is, if two matches overlap, the one that starts further to the left is selected, and if two matches start at the same character position, the longer one is selected.
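The leftmost-then-longest rule can be sketched as a brute-force search. The helper name leftmost_longest is hypothetical, and the sketch assumes a pattern without ``^'' or ``$'' anchors. It uses Python's re.fullmatch only to test candidate substrings, so it illustrates the rule rather than how a regular expression library is actually implemented.

```python
import re

def leftmost_longest(pattern, text):
    """Return (start, end) of the leftmost, then longest, match, or None.

    Hypothetical helper for illustration only. It assumes the pattern has
    no anchors, tries every start position from left to right and, for
    each start, every end position from longest to shortest.
    """
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):               # leftmost start first
        for end in range(len(text), start - 1, -1):  # longest end first
            if compiled.fullmatch(text, start, end):
                return (start, end)
    return None

# Both "a" and "ab" match starting at position 1; the longer one is chosen.
span_alt = leftmost_longest("a|ab", "xaby")
print(span_alt)   # (1, 3)

# Two candidates at different positions; the one further left is chosen.
span_star = leftmost_longest("ab*", "xabyabbbz")
print(span_star)  # (1, 3)
```

Python's own re.search would report only ``a'' for the first example, because that engine prefers the first alternative that succeeds rather than the longest one; the helper above follows the rule as this section states it.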

The order in which the parts of a regular expression are resolved can be forced by using the () grouping operator. For example,

John( Dixon)?

matches the regular expression ``John'' followed by zero or one instances of the regular expression `` Dixon'' (including the leading space), so it matches both ``John'' and ``John Dixon''.

You can use parentheses in conjunction with the vertical bar to group alternatives. For example, factor(ies|y) matches the words ``factory'' and ``factories'' slightly more economically than the equivalent regular expression factories|factory.

When a regular expression contains no ``|'' and only one ``*'', ``+'', or ``?'', the longest possible match is chosen. So ``ab*'', presented with ``xabbbby'', will match ``abbbb''. Note that if ``ab*'' is tried against ``xabyabbbz'', it will match ``ab'' just after ``x'', due to the begins-earliest rule.
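The examples above can be tried out with a short script. This is a minimal sketch using Python's re module, chosen here only for illustration; the sample sentences and variable names are assumptions, and for these particular patterns Python's engine gives the same results as described in this section.

```python
import re

# Optional group: "John" alone, or "John" followed by " Dixon".
optional = re.compile(r"John( Dixon)?")
full_name = optional.search("Call John Dixon today").group()
first_name = optional.search("Call John today").group()
print(full_name)    # John Dixon
print(first_name)   # John

# Grouped alternatives: both spellings share the "factor" prefix.
plural = re.compile(r"factor(ies|y)")
words = [m.group() for m in plural.finditer("one factory, two factories")]
print(words)        # ['factory', 'factories']

# A single "*" takes the longest match available at the earliest start.
longest = re.search(r"ab*", "xabbbby").group()
earliest = re.search(r"ab*", "xabyabbbz").group()
print(longest)      # abbbb
print(earliest)     # ab
```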

The decision on where to start the match is the first choice to be made; subsequent choices must therefore respect it, even if this leads them to less-preferred alternatives.
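One consequence of choosing the start position first is that an earlier, shorter match beats a later, longer one. The sketch below illustrates this with Python's re module; the pattern and target string are assumptions picked for the demonstration.

```python
import re

# "x*" can match the empty string, so the earliest start (position 0) wins
# with a zero-length match, even though a longer match "xx" exists later.
early = re.search(r"x*", "abxx")
print(early.span())                 # (0, 0)

# Requiring at least one "x" rules out the empty match, so the earliest
# possible start moves to position 2.
later = re.search(r"x+", "abxx")
print(later.span(), later.group())  # (2, 4) xx
```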

© 2003 Caldera International, Inc. All rights reserved.
SCO OpenServer Release 5.0.7 -- 11 February 2003