Example: Scanning for HREFs

The following example searches an input string and prints out all the href="…" values and their locations in the string.

The Regex Object

Because the Regex object is used in the DumpHRefs method, which can be called multiple times from user code, the static (Shared in Visual Basic) Regex.Match(String, String, RegexOptions) method is used. This enables the regular expression engine to cache regular expressions and avoids the overhead of instantiating a new Regex object each time the method is called. A Match object is then used to iterate through all matches in the string. In this example, the metacharacter \s matches any space character, and \S matches any non-space character.

Private Sub DumpHRefs(inputString As String) 
   Dim m As Match
   Dim HRefPattern As String = "href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S))"

   m = Regex.Match(inputString, HRefPattern, _ 
                   RegexOptions.IgnoreCase Or RegexOptions.Compiled)
   Do While m.Success
      Console.WriteLine("Found href {0} at {1}.", _
                        m.Groups(1), m.Groups(1).Index)
      m = m.NextMatch()
   Loop    
End Sub
private static void DumpHRefs(string inputString) 
{
   Match m;
   string HRefPattern = "href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S))";

   m = Regex.Match(inputString, HRefPattern, 
                   RegexOptions.IgnoreCase | RegexOptions.Compiled);
   while (m.Success)
   {
      Console.WriteLine("Found href " + m.Groups[1] + " at " 
         + m.Groups[1].Index);
      m = m.NextMatch();
   }   
}

The following example then illustrates a call to the DumpHRefs method.

Public Sub Main()
   Dim inputString As String = "My favorite web sites include:</P>" & _
                               "<A HREF=""https://msdn2.microsoft.com"">" & _
                               "MSDN Home Page</A></P>" & _
                               "<A HREF=""https://www.microsoft.com"">" & _
                               "Microsoft Corporation Home Page</A></P>" & _
                               "<A HREF=""https://blogs.msdn.com/bclteam"">" & _
                               ".NET Base Class Library blog</A></P>"
   DumpHRefs(inputString)                     
End Sub 
' The example displays the following output: 
'       Found href https://msdn2.microsoft.com at 43 
'       Found href https://www.microsoft.com at 102 
'       Found href https://blogs.msdn.com/bclteam/) at 176
public static void Main()
{
   string inputString = "My favorite web sites include:</P>" +
                        "<A HREF=\"https://msdn2.microsoft.com\">" +
                        "MSDN Home Page</A></P>" +
                        "<A HREF=\"https://www.microsoft.com\">" +
                        "Microsoft Corporation Home Page</A></P>" +
                        "<A HREF=\"https://blogs.msdn.com/bclteam\">" +
                        ".NET Base Class Library blog</A></P>";
   DumpHRefs(inputString);                     

}
// The example displays the following output: 
//       Found href https://msdn2.microsoft.com at 43 
//       Found href https://www.microsoft.com at 102 
//       Found href https://blogs.msdn.com/bclteam at 176

Match Result Class

The results of a search are stored in the Match class, which provides access to all the substrings extracted by the search. It also remembers the string being searched and the regular expression being used, so it can use them to perform another search starting where the last one ended.

Explicitly Named Captures

In traditional regular expressions, capturing parentheses are automatically numbered sequentially. This leads to two problems. First, if a regular expression is modified by inserting or removing a set of parentheses, all code that refers to the numbered captures must be rewritten to reflect the new numbering. Second, because different sets of parentheses often are used to provide two alternative expressions for an acceptable match, it might be difficult to determine which of the two expressions actually returned a result.

To address these problems,the Regex class supports the syntax (?<name>…) for capturing a match into a specified slot (the slot can be named using a string or an integer; integers can be recalled more quickly). Thus, alternative matches for the same string all can be directed to the same place. In case of a conflict, the last match dropped into a slot is the successful match. (However, a complete list of multiple matches for a single slot is available. See the Group.Captures collection for details.)

See Also

Other Resources

.NET Framework Regular Expressions