How to: Identify Hyperlinks in an HTML String in Visual Basic

This example demonstrates a simple regular expression for identifying hyperlinks in an HTML document.

Example

This example uses the regular expression <A[^>]*?HREF\s*=\s*"([^"]+)"[^>]*?>([\s\S]*?)<\/A>, which means:

  1. The string "<A", followed by

  2. The smallest set of zero or more characters that does not include the character ">", followed by

  3. The string "HREF", followed by

  4. Zero or more white-space characters, followed by

  5. The character "=", followed by

  6. Zero or more white-space characters, followed by

  7. The quotation-mark character, followed by

  8. One or more characters other than the quotation-mark character (captured), followed by

  9. The quotation-mark character, followed by

  10. The smallest set of zero or more characters that does not include the character ">", followed by

  11. The character ">", followed by

  12. The smallest set of zero or more characters of any kind, including line breaks (captured), followed by

  13. The string "</A>".
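
The breakdown above can be checked against a single tag. The following is a minimal sketch, not part of the original example: the module name and the sample string are made up for illustration, and the pattern is applied without options, so the tag is written in uppercase.

    Imports System
    Imports System.Text.RegularExpressions

    Module PatternBreakdownSketch
        Sub Main()
            ' Hypothetical sample input, chosen to exercise steps 2, 4, and 6.
            Dim sample As String =
                "<A CLASS=""nav"" HREF = ""http://example.com/docs"">Read the <B>docs</B></A>"
            Dim m As Match = Regex.Match(sample,
                "<A[^>]*?HREF\s*=\s*""([^""]+)""[^>]*?>([\s\S]*?)<\/A>")
            If m.Success Then
                ' Group 1 is the capture from step 8 (the destination).
                Console.WriteLine(m.Groups(1).Value) ' http://example.com/docs
                ' Group 2 is the capture from step 12 (the label).
                Console.WriteLine(m.Groups(2).Value) ' Read the <B>docs</B>
            End If
        End Sub
    End Module

Note that the label capture keeps any markup nested inside the link, and that the pattern matches only destinations enclosed in double quotation marks.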

The Regex object is initialized with the regular expression and the RegexOptions.IgnoreCase option, so tags such as "<a href=...>" match as well as "<A HREF=...>".
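
The effect of the case-insensitive option can be seen by testing a lowercase tag with and without it. This is a small sketch under assumed names: the module name and the sample string are hypothetical.

    Imports System
    Imports System.Text.RegularExpressions

    Module IgnoreCaseSketch
        Sub Main()
            Const pattern As String =
                "<A[^>]*?HREF\s*=\s*""([^""]+)""[^>]*?>([\s\S]*?)<\/A>"
            ' Hypothetical sample input written in lowercase.
            Dim sample As String = "<a href=""page.html"">Next page</a>"

            ' Without options, the uppercase literals in the pattern do not match.
            Console.WriteLine(Regex.IsMatch(sample, pattern)) ' False
            ' With IgnoreCase, the same pattern matches.
            Console.WriteLine(Regex.IsMatch(sample, pattern, RegexOptions.IgnoreCase)) ' True
        End Sub
    End Module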

The Regex object's Matches method returns a MatchCollection object that contains information about all the parts of the input string that the regular expression matches.

    ''' <summary>Identifies hyperlinks in HTML text.</summary>
    ''' <param name="htmlText">HTML text to parse.</param>
    ''' <remarks>This method displays the label and destination for
    ''' each link in the input text.</remarks>
    Sub IdentifyLinks(ByVal htmlText As String)
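        ' Group 1 captures the HREF value; group 2 captures the link label.
        Dim hrefRegex As New Regex(
            "<A[^>]*?HREF\s*=\s*""([^""]+)""[^>]*?>([\s\S]*?)<\/A>",
            RegexOptions.IgnoreCase)
        Dim output As String = ""
        ' Append the label and destination of every link found.
        For Each m As Match In hrefRegex.Matches(htmlText)
            output &= "Link label: " & m.Groups(2).Value & vbCrLf
            output &= "Link destination: " & m.Groups(1).Value & vbCrLf
        Next
        ' Display the collected results in a message box.
        MsgBox(output)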
    End Sub
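
Besides being enumerated, the MatchCollection can be inspected directly: its Count property gives the number of matches, and each Match reports the position (Index) and text (Value) of the matched part of the input. The following sketch illustrates this with a hypothetical module name and sample string, and writes to the console rather than a message box.

    Imports System
    Imports System.Text.RegularExpressions

    Module MatchCollectionSketch
        Sub Main()
            Dim hrefRegex As New Regex(
                "<A[^>]*?HREF\s*=\s*""([^""]+)""[^>]*?>([\s\S]*?)<\/A>",
                RegexOptions.IgnoreCase)
            ' Hypothetical sample input containing two links.
            Dim html As String =
                "<p><a href=""a.html"">First</a> and <a href=""b.html"">Second</a></p>"

            Dim matches As MatchCollection = hrefRegex.Matches(html)
            Console.WriteLine(matches.Count) ' 2

            ' Each Match records where in the input it was found.
            For Each m As Match In matches
                Console.WriteLine("{0}: {1}", m.Index, m.Value)
            Next
        End Sub
    End Module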

This example requires that you use the Imports statement to import the System.Text.RegularExpressions namespace. For more information, see Imports Statement (.NET Namespace and Type).
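
As a rough illustration of that requirement, the sketch below shows where the Imports statement sits in a source file and what it saves. The module name and the sample string are assumptions made for this illustration, and the pattern is shortened to keep the lines readable.

    ' Imports statements go at the top of the file, before any declarations.
    Imports System
    Imports System.Text.RegularExpressions

    Module ImportsSketch
        Sub Main()
            ' Hypothetical sample input.
            Dim html As String = "<A HREF=""http://example.com"">Example</A>"

            ' With the Imports statement, the short type name Regex resolves.
            Console.WriteLine(Regex.IsMatch(html, "<A[^>]*?HREF")) ' True

            ' Without it, the fully qualified name would be needed instead.
            Console.WriteLine(
                System.Text.RegularExpressions.Regex.IsMatch(html, "<A[^>]*?HREF")) ' True
        End Sub
    End Module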

See Also

Concepts

Example: Scanning for HREFs

Other Resources

Parsing Strings in Visual Basic