Quantifiers in JScript
When you cannot specify in advance exactly how many characters make up a match, you can use quantifiers. A quantifier specifies how many times a given component of a regular expression must occur for a match to succeed.
Character | Description |
---|---|
* | Matches the preceding character or subexpression zero or more times. For example, zo* matches z and zoo. * is equivalent to {0,}. |
+ | Matches the preceding character or subexpression one or more times. For example, zo+ matches zo and zoo, but not z. + is equivalent to {1,}. |
? | Matches the preceding character or subexpression zero or one time. For example, do(es)? matches do and does. ? is equivalent to {0,1}. |
{n} | n is a nonnegative integer. Matches exactly n times. For example, o{2} does not match the o in Bob but matches the two o's in food. |
{n,} | n is a nonnegative integer. Matches at least n times. For example, o{2,} does not match the o in Bob but matches all the o's in foooood. o{1,} is equivalent to o+. o{0,} is equivalent to o*. |
{n,m} | m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, o{1,3} matches the first three o's in fooooood. o{0,1} is equivalent to o?. Note that you cannot put a space between the comma and the numbers. |
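The examples in the table can be checked directly. The following is a minimal sketch, assuming a modern JavaScript engine; firstMatch is a hypothetical helper introduced only for this illustration, not a built-in function.

```javascript
// Return the first substring matched by pattern, or null when nothing matches.
// firstMatch is a hypothetical helper, not part of the language.
function firstMatch(pattern, text) {
  const result = text.match(pattern);
  return result === null ? null : result[0];
}

console.log(firstMatch(/zo*/, "z"));           // "z": zero o's are allowed
console.log(firstMatch(/zo+/, "z"));           // null: at least one o is required
console.log(firstMatch(/zo+/, "zoo"));         // "zoo"
console.log(firstMatch(/do(es)?/, "do"));      // "do"
console.log(firstMatch(/do(es)?/, "does"));    // "does"
console.log(firstMatch(/o{2}/, "Bob"));        // null: only one o is present
console.log(firstMatch(/o{2}/, "food"));       // "oo"
console.log(firstMatch(/o{1,3}/, "fooooood")); // "ooo": at most three o's
```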
Because chapter numbers could easily exceed nine in a large input document, you need a way to handle two- or three-digit chapter numbers. Quantifiers give you that capability. The following regular expression matches chapter headings with any number of digits:
/Chapter [1-9][0-9]*/
Notice that the quantifier appears after the range expression [0-9]. It therefore applies to that entire range expression, which in this case matches any single digit from 0 through 9, inclusive.
The + quantifier is not used here because a digit is not required in the second or subsequent position. The ? quantifier is not used either, because it would limit chapter numbers to two digits. You want to match at least one digit following Chapter and a space character.
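The choice of * over + or ? can be seen by running all three variants against sample headings. This is a sketch only; the variable names and the headingMatch helper are hypothetical, and the sample headings are made up for illustration.

```javascript
const anyDigits = /Chapter [1-9][0-9]*/;  // zero or more digits after the first
const plusDigits = /Chapter [1-9][0-9]+/; // one or more digits after the first
const optDigit = /Chapter [1-9][0-9]?/;   // at most one digit after the first

// Return the matched text, or null when the pattern does not match.
function headingMatch(pattern, text) {
  const result = pattern.exec(text);
  return result === null ? null : result[0];
}

console.log(headingMatch(anyDigits, "Chapter 7"));    // "Chapter 7"
console.log(headingMatch(anyDigits, "Chapter 128"));  // "Chapter 128"
console.log(headingMatch(plusDigits, "Chapter 7"));   // null: + demands a second digit
console.log(headingMatch(optDigit, "Chapter 128"));   // "Chapter 12": ? stops after two digits
```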
If you know that chapter numbers never exceed 99, you can use the following expression to specify at least one but not more than two digits.
/Chapter [0-9]{1,2}/
One disadvantage of the above expression is that, for a chapter number greater than 99, it still matches, but only through the first two digits. Another disadvantage is that Chapter 0 would match. The following expressions avoid the Chapter 0 problem while still matching one or two digits:
/Chapter [1-9][0-9]?/
or
/Chapter [1-9][0-9]{0,1}/
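To compare the three expressions side by side, the sketch below runs each of them against a few made-up headings. The names loose, strictA, strictB, and matchedText are hypothetical and exist only for this illustration.

```javascript
const loose = /Chapter [0-9]{1,2}/;        // accepts a leading 0
const strictA = /Chapter [1-9][0-9]?/;     // first digit must be 1 through 9
const strictB = /Chapter [1-9][0-9]{0,1}/; // same as strictA, written with {0,1}

// Return the matched text, or null when the pattern does not match.
function matchedText(pattern, text) {
  const result = text.match(pattern);
  return result === null ? null : result[0];
}

const titles = ["Chapter 0", "Chapter 5", "Chapter 42", "Chapter 100"];
const report = titles.map(function (title) {
  return {
    title: title,
    loose: matchedText(loose, title),
    strictA: matchedText(strictA, title),
    strictB: matchedText(strictB, title)
  };
});
console.log(report);
```

In this sketch, only loose matches "Chapter 0", and strictA and strictB return the same result for every heading. All three return "Chapter 10" for "Chapter 100", so none of them rejects a three-digit chapter number outright.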
The *, +, and ? quantifiers are all referred to as greedy because they match as much text as possible. However, sometimes you just want a minimal match.
For example, you may be searching an HTML document for an occurrence of a chapter title enclosed in an H1 tag. That text appears in your document as:
<H1>Chapter 1: Introduction to Regular Expressions</H1>
The following expression matches everything from the opening less-than symbol (<) through the greater-than symbol (>) at the end of the closing H1 tag, that is, the entire line.
/<.*>/
If you want to match only the opening H1 tag, the following non-greedy expression matches only <H1>.
/<.*?>/
Placing a ? after a *, +, or ? quantifier changes the match from greedy to non-greedy, or minimal.
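The difference between the greedy and non-greedy forms can be observed on the H1 line from the example. This is a minimal sketch; the variable names are hypothetical, and the g flag in the last call is an addition used here only to collect every non-greedy match.

```javascript
const heading = "<H1>Chapter 1: Introduction to Regular Expressions</H1>";

// Greedy: .* consumes as much as possible, so the match ends at the last >.
const greedy = heading.match(/<.*>/)[0];

// Non-greedy: .*? consumes as little as possible, so the match ends at the first >.
const lazy = heading.match(/<.*?>/)[0];

// With the g flag, the non-greedy pattern finds each tag separately.
const allTags = heading.match(/<.*?>/g);

console.log(greedy === heading); // true: the whole line is matched
console.log(lazy);               // "<H1>"
console.log(allTags);            // [ "<H1>", "</H1>" ]
```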