Character Escapes

The backslash (\) in a regular expression indicates one of the following:

  • The character that follows it is a special character, as shown in the table in the following section. For example, \b is an anchor that indicates that a regular expression match should begin on a word boundary, \t represents a tab, and \x20 represents a space.

  • A character that otherwise would be interpreted as an unescaped language construct should be interpreted literally. For example, a brace ({) begins the definition of a quantifier, but a backslash followed by a brace (\{) indicates that the regular expression engine should match the brace. Similarly, a single backslash marks the beginning of an escaped language construct, but two backslashes (\\) indicate that the regular expression engine should match the backslash.
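The second case can be illustrated with a minimal sketch. The class name EscapeDemo and its Matches helper are hypothetical names introduced only for this illustration; Regex.IsMatch and Regex.Escape are the standard System.Text.RegularExpressions methods.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: true if the pattern matches anywhere in the input.
    public static bool Matches(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        // \{ matches a literal brace instead of starting a quantifier.
        Console.WriteLine(Matches("a{b", @"a\{b"));           // True
        // \\ matches a single literal backslash.
        Console.WriteLine(Matches(@"C:\temp", @"C:\\temp"));  // True
        // Regex.Escape inserts the backslashes for metacharacters.
        Console.WriteLine(Regex.Escape("1+1=2?"));            // 1\+1=2\?
    }
}
```

Verbatim strings (@"...") are used so that the C# compiler does not process the backslashes before the regular expression engine sees them.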

Note

Character escapes are recognized in regular expression patterns but not in replacement patterns.
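The following sketch shows what this note means in practice. The class ReplacementDemo and its ReplaceCommas helper are hypothetical names used only here; the behavior shown relies on Regex.Replace treating a backslash in the replacement string as an ordinary character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplacementDemo
{
    // Hypothetical helper: replaces every comma in the input with the
    // given replacement pattern.
    public static string ReplaceCommas(string input, string replacement)
    {
        return Regex.Replace(input, ",", replacement);
    }

    public static void Main()
    {
        // @"\t" holds two characters, a backslash and the letter t.
        // In a replacement pattern they are inserted literally.
        Console.WriteLine(ReplaceCommas("a,b", @"\t"));   // a\tb
        // To insert a real tab, let the C# compiler produce the tab
        // character by using a regular (non-verbatim) string literal.
        Console.WriteLine(ReplaceCommas("a,b", "\t") == "a\tb");   // True
    }
}
```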

Character Escapes in the .NET Framework

The following table lists the character escapes supported by regular expressions in the .NET Framework.

Character or sequence

Description

All characters except for the following:

. $ ^ { [ ( | ) * + ? \

Characters other than those listed above have no special meaning in regular expressions; they match themselves.

\a

Matches a bell (alarm) character, \u0007.

\b

In a [character_group] character class, matches a backspace, \u0008. (See Character Classes.) Outside a character class, \b is an anchor that matches a word boundary. (See Anchors in Regular Expressions.)

\t

Matches a tab, \u0009.

\r

Matches a carriage return, \u000D. Note that \r is not equivalent to the newline character, \n.

\v

Matches a vertical tab, \u000B.

\f

Matches a form feed, \u000C.

\n

Matches a new line, \u000A.

\e

Matches an escape, \u001B.

\nnn

Matches an ASCII character, where nnn consists of two or three digits that represent the octal character code. For example, \040 represents a space character. However, this construct is interpreted as a backreference if it has only one digit (for example, \2) or if it corresponds to the number of a capturing group. (See Backreferences.)

\xnn

Matches an ASCII character, where nn is a two-digit hexadecimal character code.

\cX

Matches an ASCII control character, where X is the letter of the control character. For example, \cC is CTRL-C.

\unnnn

Matches a Unicode character, where nnnn is a four-digit hexadecimal code point.

Note

The Perl 5 character escape that is used to specify Unicode is not supported by the .NET Framework. The Perl 5 character escape has the form \x{####…}, where ####… is a series of hexadecimal digits. Instead, use \unnnn.

\

When followed by a character that is not recognized as an escaped character, matches that character. For example, \* matches an asterisk (*) and is the same as \x2A.
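Several rows of the table can be checked with a short program. This is a minimal sketch, not an exhaustive test; the class EscapeTableDemo and its IsMatch helper are hypothetical names introduced for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeTableDemo
{
    // Hypothetical helper that wraps Regex.IsMatch.
    public static bool IsMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        // Octal \040, hexadecimal \x20, and Unicode \u0020 all name a space.
        Console.WriteLine(IsMatch(" ", @"^\040$"));      // True
        Console.WriteLine(IsMatch(" ", @"^\x20$"));      // True
        Console.WriteLine(IsMatch(" ", @"^\u0020$"));    // True
        // \cC is the control character CTRL-C (U+0003).
        Console.WriteLine(IsMatch("\u0003", @"^\cC$"));  // True
        // Inside a character class, \b is a backspace (U+0008).
        Console.WriteLine(IsMatch("\u0008", @"[\b]"));   // True
        // Outside a character class, \b is a word-boundary anchor.
        Console.WriteLine(IsMatch("cat", @"\bcat\b"));   // True
        // \* and \x2A both match a literal asterisk.
        Console.WriteLine(IsMatch("*", @"^\*$") && IsMatch("*", @"^\x2A$"));  // True
        // A single digit after a backslash is a backreference, not octal.
        Console.WriteLine(IsMatch("aa", @"(a)\1"));      // True
    }
}
```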

An Example

The following example illustrates the use of character escapes in a regular expression. It parses a string that contains the names of the world's largest cities and their populations in 2009. Each city name is separated from its population by a tab (\t) or a vertical bar (| or \u007c). Each city and its population end with a new line (\n), optionally preceded by a carriage return (\r).

Imports System.Text.RegularExpressions

Module Example
   Public Sub Main()
      ' Two capturing groups separated by a tab (\t) or a vertical bar (\u007c).
      Dim delimited As String = "\G(.+)[\t\u007c](.+)\r?\n"
      Dim input As String = "Mumbai, India|13,922,125" + vbCrLf + _
                            "Shanghai, China" + vbTab + "13,831,900" + vbCrLf + _
                            "Karachi, Pakistan|12,991,000" + vbCrLf + _
                            "Delhi, India" + vbTab + "12,259,230" + vbCrLf + _
                            "Istanbul, Turkey|11,372,613" + vbCrLf
      Console.WriteLine("Population of the World's Largest Cities, 2009")
      Console.WriteLine()
      Console.WriteLine("{0,-20} {1,10}", "City", "Population")
      Console.WriteLine()
      ' Group 1 is the city name; group 2 is the population.
      For Each match As Match In Regex.Matches(input, delimited)
         Console.WriteLine("{0,-20} {1,10}", match.Groups(1).Value, _
                                            match.Groups(2).Value)
      Next
   End Sub
End Module
' The example displays the following output:
'       Population of the World's Largest Cities, 2009
'
'       City                 Population
'
'       Mumbai, India        13,922,125
'       Shanghai, China      13,831,900
'       Karachi, Pakistan    12,991,000
'       Delhi, India         12,259,230
'       Istanbul, Turkey     11,372,613
using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      // Two capturing groups separated by a tab (\t) or a vertical bar (\u007c).
      string delimited = @"\G(.+)[\t\u007c](.+)\r?\n";
      string input = "Mumbai, India|13,922,125\n" +
                     "Shanghai, China\t13,831,900\n" +
                     "Karachi, Pakistan|12,991,000\n" +
                     "Delhi, India\t12,259,230\n" +
                     "Istanbul, Turkey|11,372,613\n";
      Console.WriteLine("Population of the World's Largest Cities, 2009");
      Console.WriteLine();
      Console.WriteLine("{0,-20} {1,10}", "City", "Population");
      Console.WriteLine();
      // Group 1 is the city name; group 2 is the population.
      foreach (Match match in Regex.Matches(input, delimited))
         Console.WriteLine("{0,-20} {1,10}", match.Groups[1].Value,
                                            match.Groups[2].Value);
   }
}
// The example displays the following output:
//       Population of the World's Largest Cities, 2009
//
//       City                 Population
//
//       Mumbai, India        13,922,125
//       Shanghai, China      13,831,900
//       Karachi, Pakistan    12,991,000
//       Delhi, India         12,259,230
//       Istanbul, Turkey     11,372,613

The regular expression \G(.+)[\t\u007c](.+)\r?\n is interpreted as shown in the following table.

Pattern

Description

\G

Begin the match where the last match ended.

(.+)

Match any character one or more times. This is the first capturing group.

[\t\u007c]

Match a tab (\t) or a vertical bar (|).

(.+)

Match any character one or more times. This is the second capturing group.

\r?\n

Match zero or one occurrence of a carriage return followed by a new line.
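One detail of this pattern is worth checking when the input uses carriage return and line feed pairs. The period matches every character except \n, so a greedy second group can absorb the \r before the optional \r? is tried. The sketch below assumes the pattern uses the + quantifier that the table describes; the class CityParserDemo, the Lazy variant, and the Population helper are hypothetical and are not part of the original example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CityParserDemo
{
    // Pattern as described in the table, with greedy groups.
    public const string Greedy = @"\G(.+)[\t\u007c](.+)\r?\n";
    // Hypothetical variant: a lazy second group stops before the \r,
    // which leaves the carriage return for \r? to consume.
    public const string Lazy = @"\G(.+)[\t\u007c](.+?)\r?\n";

    // Returns the text captured by the second group of the first match.
    public static string Population(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups[2].Value;
    }

    public static void Main()
    {
        string line = "Mumbai, India|13,922,125\r\n";
        Console.WriteLine(Population(line, Greedy).EndsWith("\r"));  // True
        Console.WriteLine(Population(line, Lazy).EndsWith("\r"));    // False
    }
}
```

On a console the trailing carriage return is usually invisible, but it can matter if the captured value is compared with another string or stored.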

See Also

Concepts

Regular Expression Language - Quick Reference

Change History

Date

History

Reason

May 2010

Corrected the regular expression pattern in the example.

Customer feedback.

January 2010

Revised extensively.

Information enhancement.