Character Matching

The period (.) matches any single printing or non-printing character in a string except a newline character (\n) or another line terminator. The following regular expression matches aac, abc, acc, adc, and so on, as well as a1c, a2c, a-c, and a#c:

/a.c/
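A minimal sketch, assuming a JavaScript runtime such as Node.js or a browser console, shows the period matching a letter, a digit, or punctuation but not a newline. The sample strings are illustrative only.

```javascript
// Assumes a JavaScript runtime such as Node.js or a browser console.
// The period matches any one character except a line terminator such as \n.
const anyMiddle = /a.c/;

const samples = ["abc", "a1c", "a-c", "a#c", "a\nc", "ac"];
for (const s of samples) {
  // JSON.stringify makes the embedded newline visible in the output.
  console.log(JSON.stringify(s), anyMiddle.test(s));
}
// "abc" true, "a1c" true, "a-c" true, "a#c" true, "a\nc" false, "ac" false
```

The last sample fails because the period requires exactly one character between a and c.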

To match a literal period (.), such as the one in a file name, precede the period in the regular expression with a backslash (\) character. For example, the following regular expression matches filename.ext:

/filename\.ext/
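The difference between the escaped and unescaped period can be seen in a short JavaScript sketch; the file names are hypothetical examples.

```javascript
// Unescaped, the period matches any character, so "filenameXext" also matches.
const loose = /filename.ext/;
// Escaped with a backslash, the period matches only a literal period.
const strict = /filename\.ext/;
// In a string passed to the RegExp constructor, the backslash itself must be doubled.
const strictFromString = new RegExp("filename\\.ext");

console.log(loose.test("filename.ext"));                // true
console.log(loose.test("filenameXext"));                // true
console.log(strict.test("filename.ext"));               // true
console.log(strict.test("filenameXext"));               // false
console.log(strictFromString.source === strict.source); // true
```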

The period matches any single character, and the escaped period matches only a literal period. Often, though, you want to match one character from a specific list. For example, you might want to find chapter headings that are expressed numerically (Chapter 1, Chapter 2, and so on).

Bracket Expressions

To create a list of matching characters, place one or more individual characters within square brackets ([ and ]). When characters are enclosed in brackets, the list is called a bracket expression. Within brackets, as anywhere else, an ordinary character represents itself; that is, it matches an occurrence of itself in the input text. Most special characters lose their meaning when they occur inside a bracket expression. Here are some exceptions:

  • The ] character ends a list, even when it is the first item ([] is an empty list that matches nothing). To match the ] character in a list, escape it with a backslash: \].

  • The \ character continues to be the escape character. To match the \ character, use \\.
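The bracket-expression rules above can be checked with a small sketch. It assumes an ECMAScript-conforming JavaScript engine, where ] is escaped inside a list rather than placed first; other regular expression engines may behave differently.

```javascript
// ECMAScript behavior; other regular expression engines may differ.
// A backslash escapes ] inside brackets so it is matched literally.
const closingBracket = /[\]]/;
// A doubled backslash inside brackets matches one literal backslash.
const backslash = /[\\]/;
// Most other special characters, such as . * + ?, are ordinary inside brackets.
const literals = /[.*+?]/;
// An empty bracket expression is legal but matches nothing.
const empty = /[]/;

console.log(closingBracket.test("a]b")); // true
console.log(backslash.test("C:\\temp")); // true: the string holds one backslash
console.log(literals.test("a*b"));       // true
console.log(literals.test("abc"));       // false
console.log(empty.test("anything"));     // false
```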

Characters enclosed in a bracket expression match only a single character for the position in the regular expression. The following regular expression matches Chapter 1, Chapter 2, Chapter 3, Chapter 4, and Chapter 5:

/Chapter [12345]/

Notice that the word Chapter and the space that follows are fixed in position relative to the characters within brackets. The bracket expression specifies only the set of characters that can occupy the single character position immediately following the word Chapter and a space, which is the ninth character position of the match.
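A minimal JavaScript sketch of /Chapter [12345]/ follows; the headings are made-up inputs. It also shows that, because the pattern is not anchored, a longer heading such as Chapter 12 still contains a match.

```javascript
const heading = /Chapter [12345]/;

console.log(heading.test("Chapter 3"));  // true
console.log(heading.test("Chapter 6"));  // false
// The pattern is not anchored, so "Chapter 1" inside "Chapter 12" still matches.
console.log(heading.test("Chapter 12")); // true

const found = "See Chapter 4 for details".match(heading);
console.log(found[0]);           // Chapter 4
// Index 8, the ninth character of the match, holds the bracketed digit.
console.log(found[0].charAt(8)); // 4
```

To reject Chapter 12, the pattern would need something after the bracket expression, such as a word boundary or an end anchor.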

To express the matching characters using a range instead of the characters themselves, use the hyphen (-) character to separate the beginning and ending characters in the range. The character value of the individual characters determines the relative order within a range. The following regular expression contains a range expression that is equivalent to the bracketed list shown above.

/Chapter [1-5]/

When a range is specified in this manner, both the starting and ending values are included in the range. The starting value must not come after the ending value in Unicode order; a range such as [5-1] is an error.
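The following JavaScript sketch compares the list and range forms and shows what happens when a range is written backward. The rejected pattern [5-1] is a hypothetical example.

```javascript
const listForm = /Chapter [12345]/;
const rangeForm = /Chapter [1-5]/;

// Both forms accept and reject the same headings; the endpoints 1 and 5 are included.
for (const s of ["Chapter 1", "Chapter 5", "Chapter 6"]) {
  console.log(s, listForm.test(s), rangeForm.test(s));
}
// Chapter 1 true true
// Chapter 5 true true
// Chapter 6 false false

// A range whose start comes after its end is rejected when the pattern is compiled.
let rangeError = null;
try {
  new RegExp("[5-1]");
} catch (e) {
  rangeError = e;
}
console.log(rangeError instanceof SyntaxError); // true
```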

To include the hyphen character in a bracket expression, do one of the following:

  • Escape it with a backslash:

    [\-]
    
  • Put the hyphen character at the beginning or the end of the bracketed list. The following expressions match all lowercase letters and the hyphen:

    [-a-z]
    [a-z-]
    
  • Create a range in which the beginning character value is lower than the hyphen character and the ending character value is equal to or greater than the hyphen. Both of the following regular expressions satisfy this requirement:

    [!--]
    [!-~]
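Each of the hyphen techniques above can be tested against a lone hyphen in JavaScript. This is a sketch; the extra test strings are illustrative.

```javascript
// Each bracket expression below includes a literal hyphen.
const hyphenPatterns = [/[\-]/, /[-a-z]/, /[a-z-]/, /[!--]/, /[!-~]/];

for (const p of hyphenPatterns) {
  console.log(String(p), p.test("-")); // every line ends in true
}

// [!--] is a range from ! (U+0021) to - (U+002D), so it also matches +, #, and so on.
console.log(/[!--]/.test("+")); // true
// [-a-z] covers the hyphen and lowercase letters only.
console.log(/[-a-z]/.test("A")); // false
```

The range techniques match more than the hyphen alone, so escaping it or placing it first or last is usually the clearer choice.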
    

To match any character that is not in the list or range, place the caret (^) character at the beginning of the list. If the caret character appears in any other position within the list, it matches itself. The following regular expression matches Chapter, followed by a space and any single character other than 1, 2, 3, 4, or 5:

/Chapter [^12345]/

In this example, the expression matches any character in the ninth position except 1, 2, 3, 4, or 5. So, for example, Chapter 7 and Chapter 9 are matches, but so are Chapter 0 and Chapter A.

The same expression can be written more compactly with a range:

/Chapter [^1-5]/
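A short JavaScript sketch confirms that the list and range forms of the negated expression behave the same, and that a negated list accepts letters and 0 as well as 6 through 9. The headings are hypothetical inputs.

```javascript
const notOneToFive = /Chapter [^12345]/;
const notOneToFiveRange = /Chapter [^1-5]/;

for (const s of ["Chapter 3", "Chapter 7", "Chapter 0", "Chapter A"]) {
  console.log(s, notOneToFive.test(s), notOneToFiveRange.test(s));
}
// Chapter 3 false false
// Chapter 7 true true
// Chapter 0 true true
// Chapter A true true

// A caret anywhere but first is an ordinary character.
console.log(/[a^]/.test("^")); // true

// To accept only the digits 6 through 9, a positive range is more precise.
console.log(/Chapter [6-9]/.test("Chapter A")); // false
```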

A typical use of a bracket expression is to match any single uppercase or lowercase ASCII letter or any digit. The following expression specifies such a match:

/[A-Za-z0-9]/
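As a sketch in JavaScript, the expression can test for a single letter or digit or, with the g flag, collect every such character from a string. The input strings are illustrative.

```javascript
const alnum = /[A-Za-z0-9]/;

console.log(alnum.test("x"));   // true
console.log(alnum.test("-_-")); // false: no letter or digit
// The ranges cover ASCII letters only, so an accented letter does not match.
console.log(alnum.test("é"));   // false

// With the g flag, match() returns every single-character match.
const alnumChars = "R2-D2!".match(/[A-Za-z0-9]/g);
console.log(alnumChars); // [ 'R', '2', 'D', '2' ]
```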

See Also

Other Resources

Introduction to Regular Expressions