正则表达式类

更新:2007 年 11 月

以下各节介绍 .NET Framework 正则表达式类。

Regex

Regex 类表示不可变(只读)的正则表达式。它还包含各种静态方法,允许在不显式创建其他类的实例的情况下使用其他正则表达式类。

下面的代码示例创建了 Regex 类的实例并在初始化对象时定义一个简单的正则表达式。请注意,使用了附加的反斜杠作为转义字符,它将 \s 匹配字符类中的反斜杠指定为原义字符。

' Declare object variable of type Regex.
Dim r As Regex 
' Create a Regex object and define its regular expression.
r = New Regex("\s2000")
// Declare object variable of type Regex.
Regex r; 
// Create a Regex object and define its regular expression.
r = new Regex("\\s2000"); 

Match

Match 类表示正则表达式匹配操作的结果。下面的示例使用 Regex 类的 Match 方法返回 Match 类型的对象,以便找到输入字符串中的第一个匹配项。此示例使用 Match 类的 Match.Success 属性来指示是否已找到匹配。

' Create a new Regex object.
Dim r As New Regex("abc") 
' Find a single match in the input string.
Dim m As Match = r.Match("123abc456") 
If m.Success Then
    ' Print out the character position where a match was found. 
    Console.WriteLine("Found match at position " & m.Index.ToString())
End If
  ' The example displays the following output:
  '       Found match at position 3      
 // Create a new Regex object.
 Regex r = new Regex("abc"); 
 // Find a single match in the string.
 Match m = r.Match("123abc456"); 
 if (m.Success) 
 {
     // Print out the character position where a match was found. 
     Console.WriteLine("Found match at position " + m.Index);
 }
// The example displays the following output:
//       Found match at position 3      

MatchCollection

MatchCollection 类表示成功的非重叠匹配项的序列。该集合为不可变(只读)的,并且没有公共构造函数。MatchCollection 的实例是由 Regex.Matches 方法返回的。

下面的示例通过 Regex 类的 Matches 方法,使用在输入字符串中找到的所有匹配项填充 MatchCollection。该示例将此集合复制到一个字符串数组和一个整数数组中,其中字符串数组用以保存每个匹配项,整数数组用以指示每个匹配项的位置。

 Dim mc As MatchCollection
 Dim results As New List(Of String)
 Dim matchposition As New List(Of Integer)

 ' Create a new Regex object and define the regular expression.
 Dim r As New Regex("abc")
 ' Use the Matches method to find all matches in the input string.
 mc = r.Matches("123abc4abcd")
 ' Loop through the match collection to retrieve all 
 ' matches and positions.
 For i As Integer = 0 To mc.Count - 1
     ' Add the match string to the string array.
     results.Add(mc(i).Value)
     ' Record the character position where the match was found.
     matchposition.Add(mc(i).Index)
 Next i
 ' List the results.
 For ctr As Integer = 0 To Results.Count - 1
   Console.WriteLine("'{0}' found at position {1}.", _
                     results(ctr), matchposition(ctr))  
 Next
' The example displays the following output:
'       'abc' found at position 3.
'       'abc' found at position 7.
MatchCollection mc;
List<string> results = new List<string>();
List<int> matchposition = new List<int>();

// Create a new Regex object and define the regular expression.
Regex r = new Regex("abc"); 
// Use the Matches method to find all matches in the input string.
mc = r.Matches("123abc4abcd");
// Loop through the match collection to retrieve all 
// matches and positions.
for (int i = 0; i < mc.Count; i++) 
{
   // Add the match string to the string array.   
   results.Add(mc[i].Value);
   // Record the character position where the match was found.
   matchposition.Add(mc[i].Index);   
}
// List the results.
for(int ctr = 0; ctr <= results.Count - 1; ctr++)
   Console.WriteLine("'{0}' found at position {1}.", results[ctr], matchposition[ctr]);   

// The example displays the following output:
//       'abc' found at position 3.
//       'abc' found at position 7.

GroupCollection

GroupCollection 类表示被捕获的组的集合,并在单个匹配项中返回该捕获组的集合。该集合为不可变(只读)的,并且没有公共构造函数。GroupCollection 的实例在 Match.Groups 属性返回的集合中返回。

下面的示例查找并输出由正则表达式捕获的组的数目。有关如何提取组集合的每一成员中的各个捕获项的示例,请参见下面一节的 Capture Collection 示例。

' Define groups "abc", "ab", and "b".
Dim r As New Regex("(a(b))c") 
Dim m As Match = r.Match("abdabc")
Console.WriteLine("Number of groups found = " _
                  & m.Groups.Count)
' The example displays the following output:
'       Number of groups found = 3
// Define groups "abc", "ab", and "b".
Regex r = new Regex("(a(b))c"); 
Match m = r.Match("abdabc");
Console.WriteLine("Number of groups found = " + m.Groups.Count);
// The example displays the following output:
//       Number of groups found = 3

CaptureCollection

CaptureCollection 类表示捕获的子字符串的序列,并返回由单个捕获组所执行的捕获集。由于限定符,捕获组可以在单个匹配中捕获多个字符串。Captures 属性(CaptureCollection 类的对象)作为 MatchGroup 类的成员提供,目的是便于对捕获的子字符串的集合进行访问。

例如,如果使用正则表达式 ((a(b))c)+(其中 + 限定符指定一个或多个匹配)从字符串“abcabcabc”中捕获匹配,则子字符串的每一匹配的 GroupCaptureCollection 将包含三个成员。

下面的示例使用正则表达式 (Abc)+ 来查找字符串“XYZAbcAbcAbcXYZAbcAb”中的一个或多个匹配项。该示例阐释了使用 Captures 属性来返回多组捕获的子字符串。

Dim counter As Integer
Dim m As Match
Dim cc As CaptureCollection
Dim gc As GroupCollection

' Look for groupings of "Abc".
Dim r As New Regex("(Abc)+") 
' Define the string to search.
m = r.Match("XYZAbcAbcAbcXYZAbcAb")
gc = m.Groups

' Display the number of groups.
Console.WriteLine("Captured groups = " & gc.Count.ToString())

' Loop through each group.
Dim i, ii As Integer
For i = 0 To gc.Count - 1
    cc = gc(i).Captures
    counter = cc.Count

    ' Display the number of captures in this group.
    Console.WriteLine("Captures count = " & counter.ToString())

    ' Loop through each capture in the group.            
    For ii = 0 To counter - 1
        ' Display the capture and its position.
        Console.WriteLine(cc(ii).ToString() _
            & "   Starts at character " & cc(ii).Index.ToString())
    Next ii
Next i
' The example displays the following output:
'       Captured groups = 2
'       Captures count = 1
'       AbcAbcAbc   Starts at character 3
'       Captures count = 3
'       Abc   Starts at character 3
'       Abc   Starts at character 6
'       Abc   Starts at character 9  
   int counter;
   Match m;
   CaptureCollection cc;
   GroupCollection gc;

   // Look for groupings of "Abc".
   Regex r = new Regex("(Abc)+"); 
   // Define the string to search.
   m = r.Match("XYZAbcAbcAbcXYZAbcAb"); 
   gc = m.Groups;

   // Display the number of groups.
   Console.WriteLine("Captured groups = " + gc.Count.ToString());

   // Loop through each group.
   for (int i=0; i < gc.Count; i++) 
   {
      cc = gc[i].Captures;
      counter = cc.Count;

      // Display the number of captures in this group.
      Console.WriteLine("Captures count = " + counter.ToString());

      // Loop through each capture in the group.
      for (int ii = 0; ii < counter; ii++) 
      {
         // Display the capture and its position.
         Console.WriteLine(cc[ii] + "   Starts at character " + 
              cc[ii].Index);
      }
   }
}
// The example displays the following output:
//       Captured groups = 2
//       Captures count = 1
//       AbcAbcAbc   Starts at character 3
//       Captures count = 3
//       Abc   Starts at character 3
//       Abc   Starts at character 6
//       Abc   Starts at character 9  

Group

Group 类表示来自单个捕获组的结果。因为 Group 可以在单个匹配中捕获零个、一个或更多的字符串(使用限定符),所以它包含 Capture 对象的集合。因为 Group 继承自 Capture,所以可以直接访问最后捕获的子字符串(Group 实例本身等价于 Captures 属性返回的集合的最后一项)。

通过对由 Groups 属性返回的 GroupCollection 对象进行索引返回 Group 的实例。索引器可以是组号,在使用 "(?<groupname>)" 分组构造时则是捕获组的名称。例如,可以在 C# 代码中使用 Match.Groups[groupnum] 或 Match.Groups["groupname"],或者在 Visual Basic代码中使用 Match.Groups(groupnum) 或 Match.Groups("groupname")。

下面的代码示例使用嵌套的分组构造来将子字符串捕获到组中。

 Dim matchposition As New List(Of Integer)
 Dim results As New List(Of String)
 ' Define substrings abc, ab, b.
 Dim r As New Regex("(a(b))c") 
 Dim m As Match = r.Match("abdabc")
 Dim i As Integer = 0
 While Not (m.Groups(i).Value = "")    
    ' Add groups to string array.
    results.Add(m.Groups(i).Value)     
    ' Record character position. 
    matchposition.Add(m.Groups(i).Index) 
     i += 1
 End While

 ' Display the capture groups.
 For ctr As Integer = 0 to results.Count - 1
    Console.WriteLine("{0} at position {1}", _ 
                      results(ctr), matchposition(ctr))
 Next                     
' The example displays the following output:
'       abc at position 3
'       ab at position 3
'       b at position 4
List<int> matchposition = new List<int>();
List<string> results = new List<string>();
// Define substrings abc, ab, b.
Regex r = new Regex("(a(b))c"); 
Match m = r.Match("abdabc");
for (int i = 0; m.Groups[i].Value != ""; i++) 
{
   // Add groups to string array.
   results.Add(m.Groups[i].Value); 
   // Record character position.
   matchposition.Add(m.Groups[i].Index); 
}

// Display the capture groups.
for (int ctr = 0; ctr < results.Count; ctr++)
   Console.WriteLine("{0} at position {1}", 
                     results[ctr], matchposition[ctr]);
// The example displays the following output:
//       abc at position 3
//       ab at position 3
//       b at position 4

下面的代码示例使用命名的分组构造,从包含“DATANAME:VALUE”格式的数据的字符串中捕获子字符串,正则表达式通过冒号“:”拆分数据。

Dim r As New Regex("^(?<name>\w+):(?<value>\w+)")
Dim m As Match = r.Match("Section1:119900")
Console.WriteLine(m.Groups("name").Value)
Console.WriteLine(m.Groups("value").Value)
' The example displays the following output:
'       Section1
'       119900
Regex r = new Regex("^(?<name>\\w+):(?<value>\\w+)");
Match m = r.Match("Section1:119900");
Console.WriteLine(m.Groups["name"].Value);
Console.WriteLine(m.Groups["value"].Value);
// The example displays the following output:
//       Section1
//       119900

Capture

Capture 类包含来自单个子表达式捕获的结果。

下面的示例在 Group 集合中循环,从 Group 的每一成员中提取 Capture 集合,并且将变量 posnlength 分别分配给找到每一字符串的初始字符串中的字符位置,以及每一字符串的长度。

 Dim r As Regex
 Dim m As Match
 Dim cc As CaptureCollection
 Dim posn, length As Integer

 r = New Regex("(abc)+")
 m = r.Match("bcabcabc")
 Dim i, j As Integer
 i = 0
 Do While m.Groups(i).Value <> ""
    Console.WriteLine(m.Groups(i).Value)
    ' Grab the Collection for Group(i).
    cc = m.Groups(i).Captures
    For j = 0 To cc.Count - 1

       Console.WriteLine("   Capture at position {0} for {1} characters.", _ 
                         cc(j).Length, cc(j).Index)
       ' Position of Capture object.
       posn = cc(j).Index
       ' Length of Capture object.
       length = cc(j).Length
    Next j
    i += 1
 Loop
' The example displays the following output:
'       abcabc
'          Capture at position 6 for 2 characters.
'       abc
'          Capture at position 3 for 2 characters.
'          Capture at position 3 for 5 characters.
Regex r;
Match m;
CaptureCollection cc;
int posn, length;

r = new Regex("(abc)+");
m = r.Match("bcabcabc");
for (int i=0; m.Groups[i].Value != ""; i++) 
{
   Console.WriteLine(m.Groups[i].Value);
   // Capture the Collection for Group(i).
   cc = m.Groups[i].Captures; 
   for (int j = 0; j < cc.Count; j++) 
   {
      Console.WriteLine("   Capture at position {0} for {1} characters.", 
                        cc[j].Length, cc[j].Index);
      // Position of Capture object.
      posn = cc[j].Index; 
      // Length of Capture object.
      length = cc[j].Length; 
   }
}
// The example displays the following output:
//       abcabc
//          Capture at position 6 for 2 characters.
//       abc
//          Capture at position 3 for 2 characters.
//          Capture at position 3 for 5 characters.

请参见

参考

System.Text.RegularExpressions

其他资源

.NET Framework 正则表达式