Regular Expressions as a Language

The regular expression language is designed and optimized to manipulate text. The language comprises two basic character types: literal (normal) text characters and metacharacters. The set of metacharacters gives regular expressions their processing power.

You are probably familiar with the ? and * wildcard characters used in the DOS file system, where ? stands for any single character and * stands for any sequence of characters. The DOS command COPY *.DOC A: tells the file system to copy every file that has a .DOC file name extension to the disk in drive A; the * stands in for any file name in front of the .DOC extension. Regular expressions extend this basic idea many times over. They provide a large set of metacharacters that make it possible to describe very complex text-matching patterns with relatively few characters.
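The following sketch translates a simple wildcard pattern into an equivalent regular expression, which makes the comparison concrete. It is written in TypeScript, so it runs on JavaScript's regular expression engine rather than the .NET one. wildcardToRegExp is a hypothetical helper with these limits:

- It handles only * and ?.
- It matches case-insensitively, as DOS file names do.
- It ignores all other DOS-specific file-name rules.

Every character other than * and ? is escaped, so the . in *.DOC is matched literally.

```typescript
// Hypothetical helper: converts a wildcard pattern that uses only * and ?
// into an anchored, case-insensitive regular expression.
function wildcardToRegExp(pattern: string): RegExp {
  const body = pattern
    .split("")
    .map((ch) => {
      if (ch === "*") return ".*"; // * : any sequence of characters
      if (ch === "?") return ".";  // ? : any single character
      // Escape regex metacharacters so they are matched literally (e.g. ".").
      return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    })
    .join("");
  return new RegExp("^" + body + "$", "i");
}

const docFiles = wildcardToRegExp("*.DOC");
console.log(docFiles.source);             // ^.*\.DOC$
console.log(docFiles.test("REPORT.DOC")); // true
console.log(docFiles.test("report.doc")); // true
console.log(docFiles.test("REPORT.TXT")); // false
console.log(docFiles.test("REPORTXDOC")); // false: "." was escaped
```

The wildcard * becomes the regular expression .* (any character, repeated any number of times). A wildcard language has no counterpart for most of the other metacharacters regular expressions offer.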

For example, the regular expression \s2000, when applied to a body of text, matches every occurrence of the string "2000" that is immediately preceded by a white-space character, such as a space or a tab. The white-space character is part of each match.
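A minimal TypeScript sketch of the \s2000 match follows. It uses JavaScript's regular expression engine rather than the .NET engine, but \s means the same thing for ordinary spaces and tabs. The sample text is made up for illustration.

```typescript
// \s matches one white-space character; 2000 matches those four digits literally.
const yearPattern = /\s2000/g;
const sampleText = "born 2000,\t2000 again, but not Y2000";

// match() with the g flag returns every match, or null if there are none.
const yearMatches: string[] = sampleText.match(yearPattern) ?? [];

console.log(yearMatches.length);         // 2
console.log(JSON.stringify(yearMatches)); // [" 2000","\t2000"]
```

"Y2000" is not matched because the character before "2000" is Y, not white space. Each match that is found begins with its white-space character.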

Note   If you are using C++, C#, or JScript and write the pattern as an ordinary string literal, escape sequences such as \s must be preceded by an additional backslash (for example, "\\s2000"). The extra backslash signals that the backslash is a literal character in the string. Otherwise, the compiler or interpreter processes \s as a string escape sequence, which it may reject or reduce to a plain s, before the regular expression engine ever receives the pattern. You do not have to add the backslash if you are using Visual Basic .NET, because Visual Basic string literals do not treat the backslash as an escape character. If you are using C#, you can instead use C# verbatim string literals, which are prefixed with @ and disable escaping (for example, @"\s2000").
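The escaping rule is easiest to see side by side. JScript is Microsoft's implementation of ECMAScript, the same standard TypeScript compiles to. The TypeScript sketch below builds the \s2000 pattern two ways:

- from an ordinary string literal with a doubled backslash;
- from a regular expression literal, which involves no string escaping at all.

It then shows what happens when the backslash is lost.

```typescript
// The regular expression engine must receive the six characters \s2000.
const escapedSource: string = "\\s2000"; // ordinary string literal: backslash doubled
const literalPattern: RegExp = /\s2000/; // regex literal: no string escaping involved

console.log(escapedSource.length); // 6: "\\" is a single backslash character

const fromString = new RegExp(escapedSource);
console.log(fromString.source === literalPattern.source); // true
console.log(fromString.test("the year 2000"));            // true

// With a single backslash, the string "\s2000" would be read as "s2000",
// a pattern that only matches the letter s followed by 2000.
const unescaped = new RegExp("s2000");
console.log(unescaped.test("the year 2000")); // false
```

In JavaScript and TypeScript, String.raw`\s2000` is another way to keep the backslash as typed. It plays roughly the role that the @ prefix plays in C#.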

Regular expressions can also perform more complex searches. For example, the regular expression (?<char>\w)\k<char> uses a named group and a backreference to search for pairs of identical adjacent word characters. When applied to the string "I'll have a small coffee", it finds "ll" in "I'll", "ll" in "small", and both "ff" and "ee" in "coffee". (For details on this regular expression, see Backreferences.)
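The sketch below runs the same pattern against the same sentence in TypeScript. JavaScript (ES2018 and later) supports the (?<name>...) named-group and \k<name> backreference syntax used here, though its engine is not the .NET one. findDoubledCharacters is a hypothetical helper that collects each match and its position.

```typescript
interface DoubledMatch {
  pair: string;  // the two matched characters, e.g. "ll"
  index: number; // zero-based position of the first character
}

// Hypothetical helper: finds every pair of identical adjacent word characters.
function findDoubledCharacters(input: string): DoubledMatch[] {
  // (?<char>\w) captures one word character into the group named "char";
  // \k<char> then requires the very next character to be the same one.
  const pattern = new RegExp("(?<char>\\w)\\k<char>", "g");
  const results: DoubledMatch[] = [];
  let m: RegExpExecArray | null;
  // With the g flag, exec() resumes after the previous match on each call.
  while ((m = pattern.exec(input)) !== null) {
    results.push({ pair: m[0], index: m.index });
  }
  return results;
}

const doubled = findDoubledCharacters("I'll have a small coffee");
console.log(doubled.map((d) => d.pair).join(", ")); // ll, ll, ff, ee
```

The pattern always consumes two characters, so the exec() loop can never stall on an empty match. The pattern is built from a string with doubled backslashes, following the escaping rule described in the note above.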

The following sections detail the set of metacharacters that define the .NET Framework regular expression language and show how to use the regular expression classes to implement regular expressions in your applications.