Alternation and Grouping
Alternation uses the | character to allow a choice between two or more alternatives. For example, you can expand the chapter heading regular expression to match more than just chapter headings. However, it is not as straightforward as you might think. The | character has the lowest precedence of the regular expression operators, so alternation matches the largest possible expression on either side of it.
You might think that the following expression matches either Chapter or Section followed by one or two digits, anchored at the beginning and end of a line:
/^Chapter|Section [1-9][0-9]{0,1}$/
Unfortunately, the above regular expression matches either the word Chapter at the beginning of a line, or the word Section and whatever numbers follow Section at the end of the line. If the input string is Chapter 22, the above expression only matches the word Chapter. If the input string is Section 22, the expression matches Section 22.
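A short JavaScript sketch can confirm this behavior. It assumes the pattern /^Chapter|Section [1-9][0-9]{0,1}$/, which fits the description above; the helper name firstMatch is hypothetical and exists only for this illustration.

```javascript
// Assumed pattern. Without parentheses, | splits the whole expression in two:
//   ^Chapter                    (anchored at the start only)
//   Section [1-9][0-9]{0,1}$    (anchored at the end only)
const ungrouped = /^Chapter|Section [1-9][0-9]{0,1}$/;

// Hypothetical helper: return the matched text, or null when nothing matches.
function firstMatch(pattern, input) {
  const result = pattern.exec(input);
  return result === null ? null : result[0];
}

console.log(firstMatch(ungrouped, "Chapter 22")); // "Chapter"
console.log(firstMatch(ungrouped, "Section 22")); // "Section 22"
console.log(firstMatch(ungrouped, "Chapter of accidents")); // "Chapter"
```

The last call shows why the ungrouped form is a problem: the left-hand alternative never looks at the digits, so any line that merely starts with Chapter is accepted.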
To make the regular expression behave as intended, you can use parentheses to limit the scope of the alternation, that is, to make sure that it applies only to the two words Chapter and Section. However, parentheses are also used to create subexpressions and possibly capture them for later use, something that is covered in the section on backreferences. By adding parentheses in the appropriate places of the above regular expression, you can make it match either Chapter 1 or Section 3.
The following regular expression uses parentheses to group Chapter and Section so that the alternation applies only to those two words:
/^(Chapter|Section) [1-9][0-9]{0,1}$/
Although this expression works properly, the parentheses around Chapter|Section also cause whichever of the two words matched to be captured for future use. Since there is only one set of parentheses in the above expression, there is only one captured submatch. This submatch is available through the $1 property of the RegExp object; $1 through $9 hold the first nine captured submatches.
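The effect of grouping, and the capture that comes with it, can be sketched in JavaScript. The pattern below is assumed to be /^(Chapter|Section) [1-9][0-9]{0,1}$/, in line with the description above. Rather than the RegExp object properties, this sketch reads the submatch from index 1 of the array returned by exec, which holds the same captured text.

```javascript
// Assumed grouped pattern: the parentheses confine | to the two words,
// so both anchors and the digits apply to either alternative.
const grouped = /^(Chapter|Section) [1-9][0-9]{0,1}$/;

const chapter = grouped.exec("Chapter 1");
const section = grouped.exec("Section 3");
const tooLong = grouped.exec("Chapter 123");

console.log(chapter[0]); // "Chapter 1"  (the whole match)
console.log(chapter[1]); // "Chapter"    (the captured submatch)
console.log(section[1]); // "Section"
console.log(tooLong);    // null, because at most two digits are allowed
```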
In the above example, you merely want to use the parentheses to group a choice between the words Chapter and Section. To prevent the match from being saved for possible later use, place ?: before the regular expression pattern inside the parentheses. The following modification provides the same capability without saving the submatch:
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/
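As a rough comparison, the sketch below runs a capturing and a non-capturing version of the pattern against the same input. Both patterns are assumptions consistent with the text above. The array returned by exec has one element for the whole match plus one per capturing group, so its length shows whether a submatch was saved.

```javascript
// Assumed patterns: identical except for the ?: inside the parentheses.
const capturing = /^(Chapter|Section) [1-9][0-9]{0,1}$/;
const nonCapturing = /^(?:Chapter|Section) [1-9][0-9]{0,1}$/;

const withCapture = capturing.exec("Section 3");
const withoutCapture = nonCapturing.exec("Section 3");

console.log(withCapture.length);    // 2: whole match plus one submatch
console.log(withoutCapture.length); // 1: whole match only
console.log(withoutCapture[0]);     // "Section 3"
```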
In addition to the ?: metacharacters, two other non-capturing metacharacters create something called lookahead matches. A positive lookahead, which is specified using ?=, succeeds at any point in the search string where the text that follows matches the regular expression pattern in parentheses. A negative lookahead, which is specified using ?!, succeeds at any point where the text that follows does not match that pattern. In both cases the text examined by the lookahead is tested but does not become part of the match.
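The difference between the two kinds of lookahead can be sketched with a small JavaScript example. The sample text, both patterns, and the helper name matchIndexes are hypothetical and chosen only for illustration.

```javascript
const text = "Chapter 1, Chapter A, Chapter 2";

// Hypothetical helper: list the start index of every match of a global pattern.
function matchIndexes(pattern, input) {
  return Array.from(input.matchAll(pattern), (m) => m.index);
}

// Positive lookahead: "Chapter " only where a digit follows.
const positive = /Chapter (?=[0-9])/g;
// Negative lookahead: "Chapter " only where a digit does not follow.
const negative = /Chapter (?![0-9])/g;

console.log(matchIndexes(positive, text)); // [0, 22]
console.log(matchIndexes(negative, text)); // [11]

// The character examined by the lookahead is not part of the match itself.
console.log(text.match(positive)[0].length); // 8, the length of "Chapter "
```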
For example, suppose you have a document that contains references to Windows 3.1, Windows 95, Windows 98, and Windows NT. Suppose further that you need to update the document by changing all references to Windows 95, Windows 98, and Windows NT to Windows 2000. The following regular expression, which is an example of a positive lookahead, matches the word Windows and the space after it wherever they are followed by 95, 98, or NT and another space. Because the lookahead text is not part of the match, only Windows and the space after it are matched, not the version that follows:
/Windows (?=95 |98 |NT )/
Once a match is found, the search for the next match begins immediately following the matched text, without including the characters in the lookahead. For example, if the above expression matched Windows 98, the search resumes after Windows, not after 98.
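Where the search resumes can be observed through the lastIndex property of a global pattern. The sketch below is a minimal illustration with hypothetical sample text. It assumes a space between Windows and the lookahead so that the pattern can match text such as Windows 98.

```javascript
const text = "Windows 3.1 and Windows 98 and Windows NT ";

// Assumed pattern: "Windows " only where 95, 98, or NT and a space follow.
const pattern = /Windows (?=95 |98 |NT )/g;

const first = pattern.exec(text);
console.log(first[0]);          // "Windows "
console.log(first.index);       // 16, so the Windows 3.1 reference was skipped
console.log(pattern.lastIndex); // 24, just after "Windows "

// The next search starts at the version itself, which was never consumed.
console.log(text.slice(pattern.lastIndex, pattern.lastIndex + 2)); // "98"

const second = pattern.exec(text);
console.log(second.index);      // 31, the Windows NT reference
```

Because the version is left unconsumed, a replacement based on this match alone would affect only Windows and the space after it.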