Regex Class

Represents an immutable regular expression.

Namespace:  System.Text.RegularExpressions
Assembly:  System (in System.dll)

[SerializableAttribute]
public ref class Regex : ISerializable

The Regex class represents the .NET Framework's regular expression engine. It can be used to quickly parse large amounts of text to find specific character patterns; to extract, edit, replace, or delete text substrings; or to add the extracted strings to a collection to generate a report.

Note:

If your primary interest is to validate a string by determining whether it conforms to a particular pattern, you can use the System::Configuration::RegexStringValidator class.

To use regular expressions, you define the pattern that you want to identify in a text stream by using the syntax documented in Regular Expression Language - Quick Reference. Next, you can optionally instantiate a Regex object. Finally, you perform some operation, such as replacing text that matches the regular expression pattern, or identifying a pattern match.
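The three steps above can be sketched in a few lines. This is a minimal illustration, not part of the original reference: the pattern and the sample text are assumptions chosen only to keep the example short.

```cpp
#using <System.dll>

using namespace System;
using namespace System::Text::RegularExpressions;

int main()
{
   // Step 1: define the pattern (here, a run of one or more decimal digits).
   String^ pattern = "\\d+";

   // Step 2 (optional): instantiate a Regex object for that pattern.
   Regex^ rx = gcnew Regex( pattern );

   // Hypothetical sample text.
   String^ input = "Order 66 shipped in 3 boxes.";

   // Step 3a: identify a pattern match.
   Match^ first = rx->Match( input );
   if ( first->Success )
      Console::WriteLine( "First match '{0}' at position {1}", first->Value, first->Index );

   // Step 3b: replace the text that matches the pattern.
   Console::WriteLine( rx->Replace( input, "#" ) );
}
```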

Regex vs. String Methods

The System::String class includes several search and comparison methods that you can use to perform pattern matching with text. For example, the String::Contains, String::EndsWith, and String::StartsWith methods determine whether a string instance contains a specified substring; and the String::IndexOf, String::IndexOfAny, String::LastIndexOf, and String::LastIndexOfAny methods return the starting position of a specified substring in a string. Use the methods of the System::String class when you are searching for a specific string. Use the Regex class when you are searching for a specific pattern in a string. For more information and examples, see .NET Framework Regular Expressions.
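As a rough illustration of that distinction, the sketch below searches for one specific substring with String methods and for a family of strings with Regex. The sample text and the "INV-" number format are hypothetical.

```cpp
#using <System.dll>

using namespace System;
using namespace System::Text::RegularExpressions;

int main()
{
   // Hypothetical sample text.
   String^ input = "Invoice INV-2041 was paid; invoice INV-7 is overdue.";

   // Searching for a specific string: String methods are sufficient.
   Console::WriteLine( input->Contains( "INV-2041" ) );
   Console::WriteLine( input->IndexOf( "paid" ) );

   // Searching for a pattern ("INV-" followed by one or more digits):
   // no single substring describes every match, so use Regex.
   for each ( Match^ m in Regex::Matches( input, "INV-\\d+" ) )
      Console::WriteLine( "{0} found at position {1}", m->Value, m->Index );
}
```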

Static vs. Instance Methods

After you define a regular expression pattern, you can provide it to the regular expression engine in either of two ways.

  • By instantiating a Regex object that represents the regular expression. To do this, you pass the regular expression pattern to a Regex constructor. A Regex object is immutable; when you instantiate a Regex object with a regular expression, that object's regular expression cannot be changed.

  • By supplying both the regular expression and the text to search to a static (Shared in Visual Basic) Regex method. This enables you to use a regular expression without explicitly creating a Regex object.

All Regex pattern identification methods include both static and instance overloads.

The regular expression engine must compile a particular pattern before the pattern can be used. Because Regex objects are immutable, this is a one-time procedure that occurs when a Regex class constructor or a static method is called. To eliminate the need to repeatedly compile a single regular expression, the regular expression engine caches the compiled regular expressions used in static method calls. As a result, regular expression pattern-matching methods offer comparable performance for static and instance methods.
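The two ways of supplying a pattern might look like the following. The pattern (two uppercase letters followed by four digits) and the input string are assumptions made for illustration.

```cpp
#using <System.dll>

using namespace System;
using namespace System::Text::RegularExpressions;

int main()
{
   // Hypothetical part-number format: two uppercase letters, then four digits.
   String^ pattern = "^[A-Z]{2}\\d{4}$";
   String^ input = "AB1234";

   // Instance method: the pattern is passed to the constructor once,
   // and the resulting Regex object can be reused for many inputs.
   Regex^ rx = gcnew Regex( pattern );
   bool viaInstance = rx->IsMatch( input );

   // Static method: the input and the pattern are supplied together,
   // and no Regex object is created explicitly.
   bool viaStatic = Regex::IsMatch( input, pattern );

   Console::WriteLine( "Instance: {0}, static: {1}", viaInstance, viaStatic );
}
```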

Important Note:

In the .NET Framework versions 1.0 and 1.1, all compiled regular expressions, whether they were used in instance or static method calls, were cached. Starting with the .NET Framework 2.0, only regular expressions used in static method calls are cached.

However, the system of caching implemented by the regular expression engine can adversely affect performance in the following two cases:

  • When you use static method calls with a large number of regular expressions. By default, the regular expression engine caches the 15 most recently used static regular expressions. If your application uses more than 15 static regular expressions, some regular expressions must be recompiled. To prevent this recompilation, you can increase the Regex::CacheSize property to an appropriate value.

  • When your application instantiates new Regex objects with regular expressions that have previously been compiled. For example, consider code that uses a regular expression to locate duplicated words in individual lines of a text stream. If the code uses a single regular expression but instantiates a new Regex object to process each line of text, the regular expression is recompiled with each iteration of the loop.

    To prevent recompilation, the application should instantiate a single Regex object that is accessible to all code that requires it.
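Both cases can be sketched as follows. This is an illustrative reconstruction rather than the original listing: the cache size of 50, the duplicated-word pattern, and the in-memory sample text (read through a StringReader instead of a file) are all assumptions.

```cpp
#using <System.dll>

using namespace System;
using namespace System::IO;
using namespace System::Text::RegularExpressions;

int main()
{
   // Case 1: an application that uses many static regular expressions can
   // raise the cache limit. The value 50 is arbitrary.
   Regex::CacheSize = 50;

   // Case 2: process a text stream line by line with ONE Regex object.
   String^ text = "This is is a test.\nNothing repeated here.\nIt was the the end.";
   String^ pattern = "\\b(\\w+)\\s\\1\\b";   // a word, white space, the same word

   // Instantiated once, outside the loop, so the pattern is compiled once.
   Regex^ rx = gcnew Regex( pattern, RegexOptions::IgnoreCase );

   StringReader^ reader = gcnew StringReader( text );
   String^ line;
   int lineNumber = 0;
   while ( (line = reader->ReadLine()) != nullptr )
   {
      lineNumber++;
      // Creating the Regex here instead would recompile the pattern
      // on every iteration of the loop.
      for each ( Match^ m in rx->Matches( line ) )
         Console::WriteLine( "Line {0}: duplicated word '{1}'", lineNumber, m->Groups[1]->Value );
   }
}
```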

Performing Regular Expression Operations

Whether you decide to instantiate a Regex object and call its methods or call static methods, the Regex class offers the following pattern-matching functionality:

  • Validation of a match. You call the IsMatch method to determine whether a match is present.

  • Retrieval of a single match. You call the Match method to retrieve a Match object that represents the first match in a string or in part of a string. Subsequent matches can be retrieved by calling the Match::NextMatch method.

  • Retrieval of all matches. You call the Matches method to retrieve a System::Text::RegularExpressions::MatchCollection object that represents all the matches found in a string or in part of a string.

  • Replacement of matched text. You call the Replace method to replace matched text. The replacement text can also be defined by a regular expression. In addition, some of the Replace methods include a MatchEvaluator parameter that enables you to programmatically define the replacement text.

  • Creation of a string array that is formed from parts of an input string. You call the Split method to split an input string at positions that are defined by the regular expression.
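The operations listed above can be exercised together in one short program. The input string and both patterns are assumptions chosen for illustration. The Replace call uses the $0 substitution, which stands for the entire match.

```cpp
#using <System.dll>

using namespace System;
using namespace System::Text::RegularExpressions;

int main()
{
   String^ input = "one, two,three ,  four";   // hypothetical sample text
   Regex^ word = gcnew Regex( "\\w+" );        // one or more word characters

   // Validation of a match.
   Console::WriteLine( word->IsMatch( input ) );

   // Retrieval of a single match, then each subsequent match.
   Match^ m = word->Match( input );
   while ( m->Success )
   {
      Console::WriteLine( "'{0}' at position {1}", m->Value, m->Index );
      m = m->NextMatch();
   }

   // Retrieval of all matches at once.
   MatchCollection^ all = word->Matches( input );
   Console::WriteLine( "{0} matches found.", all->Count );

   // Replacement of matched text: wrap every word in brackets.
   Console::WriteLine( word->Replace( input, "[$0]" ) );

   // Splitting at commas together with any surrounding white space.
   array<String^>^ parts = Regex::Split( input, "\\s*,\\s*" );
   for each ( String^ part in parts )
      Console::WriteLine( part );
}
```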

In addition to its pattern-matching methods, the Regex class includes several special-purpose methods:

  • The Escape method escapes any characters that may be interpreted as regular expression operators in a regular expression or input string.

  • The Unescape method removes these escape characters.

  • The CompileToAssembly method creates an assembly that contains predefined regular expressions. The .NET Framework contains examples of these special-purpose assemblies in the System::Web::RegularExpressions namespace.
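A small sketch of Escape and Unescape follows. The literal text is hypothetical. It contains a period and parentheses, which the regular expression engine would otherwise treat as operators.

```cpp
#using <System.dll>

using namespace System;
using namespace System::Text::RegularExpressions;

int main()
{
   // Hypothetical literal text that contains regular expression operators.
   String^ literal = "1.5 (approx.)";

   // Escape produces a pattern that matches the text literally.
   String^ escaped = Regex::Escape( literal );
   Console::WriteLine( escaped );

   // Unescape reverses the transformation.
   Console::WriteLine( Regex::Unescape( escaped ) );

   // Used unescaped, the literal also matches unintended input, because
   // '.' matches any character and '(' ')' delimit a capturing group.
   Console::WriteLine( Regex::IsMatch( "1x5 approx!", literal ) );

   // The escaped pattern does not match that input.
   Console::WriteLine( Regex::IsMatch( "1x5 approx!", escaped ) );
}
```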

The following example uses a regular expression to check for repeated occurrences of words in a string. The regular expression \b(?<word>\w+)\s+(\k<word>)\b can be interpreted as shown in the following table.

Pattern          Description
\b               Start the match at a word boundary.
(?<word>\w+)     Match one or more word characters up to a word boundary. Name this captured group word.
\s+              Match one or more white-space characters.
(\k<word>)       Match the captured group that is named word.
\b               Match a word boundary.

#using <System.dll>

using namespace System;
using namespace System::Text::RegularExpressions;
int main()
{
   // Define a regular expression for repeated words.
   Regex^ rx = gcnew Regex( "\\b(?<word>\\w+)\\s+(\\k<word>)\\b",static_cast<RegexOptions>(RegexOptions::Compiled | RegexOptions::IgnoreCase) );

   // Define a test string.        
   String^ text = "The the quick brown fox  fox jumped over the lazy dog dog.";

   // Find matches.
   MatchCollection^ matches = rx->Matches( text );

   // Report the number of matches found.
   Console::WriteLine( "{0} matches found.", matches->Count );

   // Report on each match. 
   for each (Match^ match in matches)
   {
      String^ word = match->Groups["word"]->Value;
      int index = match->Index;
      Console::WriteLine("{0} repeated at position {1}", word, index);   
   }
}

The following example illustrates the use of a regular expression to check whether a string either represents a currency value or has the correct format to represent a currency value. In this case, the regular expression is built dynamically from the NumberFormatInfo::CurrencyDecimalSeparator, CurrencyDecimalDigits, NumberFormatInfo::CurrencySymbol, NumberFormatInfo::NegativeSign, and NumberFormatInfo::PositiveSign properties for the user's current culture. If the system's current culture is en-US, the resulting regular expression is ^\s*[\+-]?\s?\$?\s?(\d*\.?\d{2}?){1}$. This regular expression can be interpreted as shown in the following table.

Pattern              Description
^                    Start at the beginning of the string.
\s*                  Match zero or more white-space characters.
[\+-]?               Match zero or one occurrence of either the positive sign or the negative sign.
\s?                  Match zero or one white-space character.
\$?                  Match zero or one occurrence of the dollar sign.
\s?                  Match zero or one white-space character.
\d*                  Match zero or more decimal digits.
\.?                  Match zero or one decimal point symbol.
\d{2}?               Match exactly two decimal digits. The trailing ? makes the {2} quantifier lazy; it does not make the digits optional.
(\d*\.?\d{2}?){1}    Match the pattern of integral and fractional digits, optionally separated by a decimal point symbol, exactly one time.
$                    Match the end of the string.

In this case, the regular expression assumes that a valid currency string does not contain group separator symbols, and that it has either no fractional digits or the number of fractional digits defined by the current culture's CurrencyDecimalDigits property.

Because the regular expression in this example is built dynamically, we do not know at design time whether the current culture's currency symbol, decimal sign, or positive and negative signs might be misinterpreted by the regular expression engine as regular expression language operators. To prevent any misinterpretation, the example passes each dynamically generated string to the Escape method.
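One way the pattern described above might be assembled is sketched below. It is a reconstruction under stated assumptions, not the original listing: the concatenation order is inferred from the en-US pattern shown earlier, and the test strings are hypothetical. Because the pattern depends on the current culture, the results vary from system to system.

```cpp
#using <System.dll>

using namespace System;
using namespace System::Globalization;
using namespace System::Text::RegularExpressions;

int main()
{
   // Formatting information for the user's current culture.
   NumberFormatInfo^ nfi = NumberFormatInfo::CurrentInfo;

   // Escape every culture-specific string before it enters the pattern,
   // so that symbols such as "+", "$", or "." are matched literally.
   String^ signs = Regex::Escape( nfi->PositiveSign + nfi->NegativeSign );
   String^ symbol = Regex::Escape( nfi->CurrencySymbol );
   String^ separator = Regex::Escape( nfi->CurrencyDecimalSeparator );
   String^ digits = nfi->CurrencyDecimalDigits.ToString();

   // Assemble the pattern piece by piece.
   String^ pattern = "^\\s*[";
   pattern += signs + "]?\\s?";
   pattern += symbol + "?\\s?";
   pattern += "(\\d*";
   pattern += separator + "?";
   pattern += "\\d{";
   pattern += digits + "}?){1}$";
   Console::WriteLine( "Pattern: {0}", pattern );

   // Hypothetical test strings.
   Regex^ rx = gcnew Regex( pattern );
   array<String^>^ tests = { "-42", "19.99", "0.001", "100 USD" };
   for each ( String^ test in tests )
   {
      if ( rx->IsMatch( test ) )
         Console::WriteLine( "{0} is a currency value.", test );
      else
         Console::WriteLine( "{0} is not a currency value.", test );
   }
}
```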

The Regex class is immutable (read-only) and is inherently thread safe. Regex objects can be created on any thread and shared between threads. For more information, see Thread Safety.

Windows 7, Windows Vista, Windows XP SP2, Windows XP Media Center Edition, Windows XP Professional x64 Edition, Windows XP Starter Edition, Windows Server 2008 R2, Windows Server 2008, Windows Server 2003, Windows Server 2000 SP4, Windows Millennium Edition, Windows 98, Windows CE, Windows Mobile for Smartphone, Windows Mobile for Pocket PC, Xbox 360, Zune

The .NET Framework and .NET Compact Framework do not support all versions of every platform. For a list of the supported versions, see .NET Framework System Requirements.

.NET Framework

Supported in: 3.5, 3.0, 2.0, 1.1, 1.0

.NET Compact Framework

Supported in: 3.5, 2.0, 1.0

XNA Framework

Supported in: 3.0, 2.0, 1.0

Date            History                                                                              Reason
orc_cpub16      Corrected regular expression pattern in second example to handle white space.        Customer feedback.
August 2009     Extensively revised the Remarks section and added examples.                          Information enhancement.
October 2008    Added a note that the RegexStringValidator class can be used to validate strings.    Customer feedback.

© 2014 Microsoft