Export (0) Print
Expand All

Character Escapes in Regular Expressions

The backslash (\) in a regular expression indicates one of the following:

  • The character that follows it is a special character, as shown in the table in the following section. For example, \b is an anchor that indicates that a regular expression match should begin on a word boundary, \t represents a tab, and \x020 represents a space.

  • A character that otherwise would be interpreted as an unescaped language construct should be interpreted literally. For example, a brace ({) begins the definition of a quantifier, but a backslash followed by a brace (\{) indicates that the regular expression engine should match the brace. Similarly, a single backslash marks the beginning of an escaped language construct, but two backslashes (\\) indicate that the regular expression engine should match the backslash.

Note Note

Character escapes are recognized in regular expression patterns but not in replacement patterns.

The following table lists the character escapes supported by regular expressions in the .NET Framework.

Character or sequence

Description

All characters except for the following:

. $ ^ { [ ( | ) * + ? \

These characters have no special meanings in regular expressions; they match themselves.

\a

Matches a bell (alarm) character, \u0007.

\b

In a [character_group] character class, matches a backspace, \u0008. (See Character Classes in Regular Expressions.) Outside a character class, \b is an anchor that matches a word boundary. (See Anchors in Regular Expressions.)

\t

Matches a tab, \u0009.

\r

Matches a carriage return, \u000D. Note that \r is not equivalent to the newline character, \n.

\v

Matches a vertical tab, \u000B.

\f

Matches a form feed, \u000C.

\n

Matches a new line, \u000A.

\e

Matches an escape, \u001B.

\ nnn

Matches an ASCII character, where nnn consists of two or three digits that represent the octal character code. For example, \040 represents a space character. This construct is interpreted as a backreference if it has only one digit (for example, \2) or if it corresponds to the number of a capturing group. (See Backreference Constructs in Regular Expressions.)

\x nn

Matches an ASCII character, where nn is a two-digit hexadecimal character code.

\c X

Matches an ASCII control character, where X is the letter of the control character. For example, \cC is CTRL-C.

\u nnnn

Matches a UTF-16 code unit whose value is nnnn hexadecimal.

Note Note

The Perl 5 character escape that is used to specify Unicode is not supported by the .NET Framework. The Perl 5 character escape has the form \x{####…}, where #### is a series of hexadecimal digits. Instead, use \unnnn.

\

When followed by a character that is not recognized as an escaped character, matches that character. For example, \* matches an asterisk (*) and is the same as \x2A.

The following example illustrates the use of character escapes in a regular expression. It parses a string that contains the names of the world's largest cities and their populations in 2009. Each city name is separated from its population by a tab (\t) or a vertical bar (| or \u007c). Individual cities and their populations are separated from each other by a carriage return and line feed.

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string delimited = @"\G(.+)[\t\u007c](.+)\r?\n";
      string input = "Mumbai, India|13,922,125\t\n" + 
                            "Shanghai, China\t13,831,900\n" + 
                            "Karachi, Pakistan|12,991,000\n" + 
                            "Dehli, India\t12,259,230\n" + 
                            "Istanbul, Turkey|11,372,613\n";
      Console.WriteLine("Population of the World's Largest Cities, 2009");
      Console.WriteLine();
      Console.WriteLine("{0,-20} {1,10}", "City", "Population");
      Console.WriteLine();
      foreach (Match match in Regex.Matches(input, delimited))
         Console.WriteLine("{0,-20} {1,10}", match.Groups[1].Value, 
                                            match.Groups[2].Value);
   }
}
// The example displyas the following output: 
//       Population of the World's Largest Cities, 2009 
//        
//       City                 Population 
//        
//       Mumbai, India        13,922,125 
//       Shanghai, China      13,831,900 
//       Karachi, Pakistan    12,991,000 
//       Dehli, India         12,259,230 
//       Istanbul, Turkey     11,372,613

The regular expression \G(.+)[\t|\u007c](.+)\r?\n is interpreted as shown in the following table.

Pattern

Description

\G

Begin the match where the last match ended.

(.+)

Match any character one or more times. This is the first capturing group.

[\t\u007c]

Match a tab (\t) or a vertical bar (|).

(.+)

Match any character one or more times. This is the second capturing group.

\r?\n

Match zero or one occurrence of a carriage return followed by a new line.

Show:
© 2014 Microsoft