
Regular Expression Language - Quick Reference

Regular expressions provide a powerful, flexible, and efficient method for processing text. The extensive pattern-matching notation of regular expressions allows you to quickly parse large amounts of text to find specific character patterns; to validate text to ensure that it matches a predefined pattern (such as an e-mail address); to extract, edit, replace, or delete text substrings; or to add the extracted strings to a collection in order to generate a report. For many applications that deal with strings or that parse large blocks of text, regular expressions are an indispensable tool.

The centerpiece of text processing with regular expressions is the regular expression engine, which is represented by the System.Text.RegularExpressions.Regex object in the .NET Framework for Silverlight. At a minimum, processing text using regular expressions requires that the regular expression engine be provided with the following two items of information:

  • The regular expression pattern to identify in the text.

    In the .NET Framework for Silverlight, regular expression patterns are defined by a special syntax or language, which is compatible with Perl 5 regular expressions and adds some additional features such as right-to-left matching. For details, see Regular Expressions as a Language and Regular Expression Language Elements in the .NET Framework documentation.

  • The text to parse for the regular expression pattern.

    Note:

    The implementation of the regular expression engine in the .NET Framework for Silverlight is identical to that in the .NET Framework. The single exception is that the .NET Framework for Silverlight does not support compiled regular expressions, which are predefined regular expression patterns that are stored in stand-alone assemblies together with dedicated regular expression engines that process text using those regular expression patterns.
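The right-to-left matching mentioned in the first item above can be sketched as follows. This is a minimal illustration, not part of the original page: the module name, input string, and pattern are hypothetical, and the `Demo(outputBlock)` signature simply mirrors the convention used by the example later on this page.

```vb
Imports System.Text.RegularExpressions

Public Module RightToLeftExample
   Public Sub Demo(outputBlock As System.Windows.Controls.TextBlock)
      ' Hypothetical input and pattern, chosen only for illustration.
      Dim input As String = "Pens $4.49, Erasers $2.19"
      Dim pattern As String = "\$[0-9]+\.[0-9]+"

      ' Default: the engine scans from the start of the string.
      Dim first As Match = Regex.Match(input, pattern)
      ' RightToLeft: the engine scans from the end of the string.
      Dim last As Match = Regex.Match(input, pattern, RegexOptions.RightToLeft)

      outputBlock.Text += "Left to right: " + first.Value + vbCrLf
      outputBlock.Text += "Right to left: " + last.Value + vbCrLf
   End Sub
End Module
```

With this input, the default search returns the first price in the string ($4.49), and the right-to-left search returns the last one ($2.19).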

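The methods of the Regex class allow you to perform the following operations:

  • Determine whether the regular expression pattern occurs in the input text by calling the IsMatch method.

  • Retrieve one or all occurrences of text that matches the regular expression pattern by calling the Match or Matches method.

  • Replace text that matches the regular expression pattern by calling the Replace method.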

For an overview of the regular expression object model, see Regular Expression Classes in the .NET Framework documentation.
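As a rough sketch of how the Regex methods are typically called, the following code uses the static IsMatch, Match, Matches, and Replace methods on one string. The module name, input text, pattern, and replacement string are assumptions made for illustration only.

```vb
Imports System.Text.RegularExpressions

Public Module RegexOperationsExample
   Public Sub Demo(outputBlock As System.Windows.Controls.TextBlock)
      ' Hypothetical input and pattern, chosen only for illustration.
      Dim input As String = "Pencils $1.00, Pens $4.49"
      Dim pattern As String = "\$([0-9]+\.[0-9]{2})"

      ' IsMatch: does the pattern occur anywhere in the input?
      outputBlock.Text += String.Format("IsMatch: {0}", Regex.IsMatch(input, pattern)) + vbCrLf

      ' Match: retrieve the first occurrence only.
      Dim first As Match = Regex.Match(input, pattern)
      outputBlock.Text += "First match: " + first.Value + vbCrLf

      ' Matches: retrieve every occurrence.
      For Each m As Match In Regex.Matches(input, pattern)
         outputBlock.Text += "Amount: " + m.Groups(1).Value + vbCrLf
      Next

      ' Replace: substitute a fixed string for each occurrence.
      outputBlock.Text += Regex.Replace(input, pattern, "[price]") + vbCrLf
   End Sub
End Module
```

The static methods are convenient for one-off calls. When the same pattern is applied repeatedly, instantiating a Regex object once and calling its instance methods is a common alternative.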

The following example illustrates the power of regular expressions combined with the flexibility offered by the .NET Framework's globalization features. It uses the NumberFormatInfo object to determine the format of currency values in the system's current culture. It then uses that information to dynamically construct a regular expression that extracts currency values from the text. For each match, it extracts the subgroup that contains the numeric string only, converts it to a Decimal value, and calculates a running total.


Imports System.Collections.Generic
Imports System.Globalization
Imports System.Text.RegularExpressions

Public Module Example
   Public Sub Demo(outputBlock As System.Windows.Controls.TextBlock)
      ' Define text to be parsed.
      Dim input As String = "Office expenses on 2/13/2008:" + vbCrLf + _
                            "Paper (500 sheets)                      $3.95" + vbCrLf + _
                            "Pencils (box of 10)                     $1.00" + vbCrLf + _
                            "Pens (box of 10)                        $4.49" + vbCrLf + _
                            "Erasers                                 $2.19" + vbCrLf + _
                            "Ink jet printer                        $69.95" + vbCrLf + vbCrLf + _
                            "Total Expenses                        $ 81.58" + vbCrLf
      ' Get current culture's NumberFormatInfo object.
      Dim nfi As NumberFormatInfo = CultureInfo.CurrentCulture.NumberFormat
      ' Assign needed property values to variables.
      Dim currencySymbol As String = nfi.CurrencySymbol
      Dim symbolPrecedesIfPositive As Boolean = CBool(nfi.CurrencyPositivePattern Mod 2 = 0)
      Dim groupSeparator As String = nfi.CurrencyGroupSeparator
      Dim decimalSeparator As String = nfi.CurrencyDecimalSeparator

      ' Form regular expression pattern. Every culture-specific string is escaped
      ' so that characters such as "." or "$" are matched literally.
      Dim pattern As String = Regex.Escape(CStr(IIf(symbolPrecedesIfPositive, currencySymbol, ""))) + _
                              "\s*[-+]?" + "([0-9]{0,3}(" + Regex.Escape(groupSeparator) + "[0-9]{3})*(" + _
                              Regex.Escape(decimalSeparator) + "[0-9]+)?)" + _
                              Regex.Escape(CStr(IIf(Not symbolPrecedesIfPositive, currencySymbol, "")))
      outputBlock.Text += "The regular expression pattern is: " + vbCrLf
      outputBlock.Text += "   " + pattern + vbCrLf

      ' Get text that matches regular expression pattern.
      Dim matches As MatchCollection = Regex.Matches(input, pattern, RegexOptions.IgnorePatternWhitespace)               
      outputBlock.Text += String.Format("Found {0} matches. ", matches.Count) + vbCrLf

      ' Get numeric string, convert it to a value, and add it to List object.
      Dim expenses As New List(Of Decimal)

      For Each match As Match In matches
         expenses.Add(Decimal.Parse(match.Groups.Item(1).Value))      
      Next

      ' Determine whether total is present and if present, whether it is correct.
      Dim total As Decimal
      For Each value As Decimal In expenses
         total += value
      Next

      If total / 2 = expenses(expenses.Count - 1) Then
         outputBlock.Text += String.Format("The expenses total {0:C2}.", expenses(expenses.Count - 1))
      Else
         outputBlock.Text += String.Format("The expenses total {0:C2}.", total)
      End If   
   End Sub
End Module
' The example displays the following output:
'       The regular expression pattern is:
'          \$\s*[-+]?([0-9]{0,3}(,[0-9]{3})*(\.[0-9]+)?)
'       Found 6 matches.
'       The expenses total $81.58.


On a computer whose current culture is en-US, the example dynamically builds the regular expression \$\s*[-+]?([0-9]{0,3}(,[0-9]{3})*(\.[0-9]+)?). This regular expression pattern can be interpreted as follows:

\$

Look for a single occurrence of the dollar symbol ($) in the input string. The regular expression pattern string includes a backslash to indicate that the dollar symbol is to be interpreted literally rather than as a regular expression anchor. (An unescaped $ symbol is an anchor that requires the match to occur at the end of the string or before a final newline.) To ensure that the current culture's currency symbol is not misinterpreted as a regular expression symbol, the example calls the Regex.Escape method to escape the character.

\s*

Look for zero or more occurrences of a white-space character.

[-+]?

Look for zero or one occurrence of either a positive sign or a negative sign.

([0-9]{0,3}(,[0-9]{3})*(\.[0-9]+)?)

The outer parentheses around this expression define it as a capturing group or a subexpression. If a match is found, information about this part of the matching string can be retrieved from the second Group object in the GroupCollection object returned by the Match.Groups property. (The first element in the collection represents the entire match.)

[0-9]{0,3}

Look for zero to three occurrences of the decimal digits 0 through 9.

(,[0-9]{3})*

Look for zero or more occurrences of a group separator followed by three decimal digits.

\.

Look for a single occurrence of the decimal separator.

[0-9]+

Look for one or more decimal digits.

(\.[0-9]+)?

Look for zero or one occurrence of the decimal separator followed by at least one decimal digit.
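The first entry above explains why the currency symbol is escaped. The difference between the anchor and the literal character can be sketched as follows; the module name and input string are hypothetical and serve only as an illustration.

```vb
Imports System.Text.RegularExpressions

Public Module EscapeExample
   Public Sub Demo(outputBlock As System.Windows.Controls.TextBlock)
      ' Hypothetical input, chosen only for illustration.
      Dim input As String = "Total $81.58"

      ' Unescaped, "$" is the end-of-string anchor, so "$81" cannot match:
      ' no digits can follow the end of the string.
      outputBlock.Text += String.Format("Unescaped: {0}", Regex.IsMatch(input, "$81")) + vbCrLf

      ' Regex.Escape turns "$" into "\$", which matches a literal dollar sign.
      Dim escaped As String = Regex.Escape("$") + "81"
      outputBlock.Text += "Escaped pattern: " + escaped + vbCrLf
      outputBlock.Text += String.Format("Escaped: {0}", Regex.IsMatch(input, escaped)) + vbCrLf
   End Sub
End Module
```

The unescaped pattern reports False and the escaped pattern reports True for this input.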

If each of these subpatterns is found in the input string, the match succeeds, and a Match object that contains information about the match is added to the MatchCollection object.
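Because every numeric subpattern is optional, the pattern can also match a currency symbol that has no number after it, in which case the captured group is empty and Decimal.Parse would throw a FormatException. The following sketch, which assumes the en-US pattern and a hypothetical input string, reads both the entire match and the captured group and uses Decimal.TryParse as a guard.

```vb
Imports System.Globalization
Imports System.Text.RegularExpressions

Public Module GroupsExample
   Public Sub Demo(outputBlock As System.Windows.Controls.TextBlock)
      ' The en-US pattern from the walkthrough, hard-coded for illustration.
      Dim pattern As String = "\$\s*[-+]?([0-9]{0,3}(,[0-9]{3})*(\.[0-9]+)?)"
      ' Hypothetical input; the middle "$" has no number after it.
      Dim input As String = "Printer $1,269.95, paid in $ only, Erasers $2.19"

      For Each m As Match In Regex.Matches(input, pattern)
         Dim amount As Decimal
         ' Groups(0) is the entire match; Groups(1) is the numeric string only.
         ' The pattern is en-US specific, so parse with the invariant culture,
         ' whose separators match it.
         If Decimal.TryParse(m.Groups(1).Value, NumberStyles.Number, _
                             CultureInfo.InvariantCulture, amount) Then
            outputBlock.Text += String.Format("'{0}' -> {1}", m.Groups(0).Value, amount) + vbCrLf
         Else
            outputBlock.Text += String.Format("'{0}' has no numeric part; skipped.", m.Groups(0).Value) + vbCrLf
         End If
      Next
   End Sub
End Module
```

This input produces three matches. The first and third yield numeric values, and the second is skipped because its captured group is empty.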

© 2014 Microsoft