How to: Identify Text in an HTML String in Visual Basic

This example demonstrates how to use a simple regular expression to remove the tags from an HTML document, leaving only its text.

Example

HTML tags can be matched with the regular expression \<[^\>]+\>, which means:

  1. The character "<", followed by

  2. A sequence of one or more characters, none of which is the ">" character, followed by

  3. The character ">".
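
To see what the pattern matches before anything is removed, the following minimal sketch lists each match in a short string. The module name TagPatternDemo, the helper FindTags, and the sample input are hypothetical and are used only for illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TagPatternDemo
    ''' <summary>Returns every substring that the tag pattern matches.</summary>
    Function FindTags(ByVal htmlText As String) As List(Of String)
        Dim tags As New List(Of String)
        ' Each match is one complete tag, from "<" through the next ">".
        For Each m As Match In Regex.Matches(htmlText, "\<[^\>]+\>")
            tags.Add(m.Value)
        Next
        Return tags
    End Function

    Sub Main()
        ' Hypothetical sample input, chosen only for illustration.
        For Each tag As String In FindTags("<p>Hello, <b>world</b>!</p>")
            Console.WriteLine(tag)
        Next
    End Sub
End Module
```

For this input the sketch prints the four tags <p>, <b>, </b>, and </p>, one per line. The text between the tags is never part of a match.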

The following function uses the shared Regex.Replace method to replace every match of the tag regular expression with an empty string.

    ''' <summary>Removes the tags from an HTML document.</summary>
    ''' <param name="htmlText">HTML text to parse.</param>
    ''' <returns>The text of an HTML document without tags.</returns>
    ''' <remarks>Requires Imports System.Text.RegularExpressions.</remarks>
    Function GetTextFromHtml(ByVal htmlText As String) As String
        Dim output As String = Regex.Replace(htmlText, "\<[^\>]+\>", "")
        Return output
    End Function

This example requires that you use the Imports statement to import the System.Text.RegularExpressions namespace. For more information, see Imports Statement (.NET Namespace and Type).
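
As a rough illustration of how the pieces fit together, the sketch below places the Imports statement and the function in one console program and calls it. The module name HtmlTextDemo and the sample markup are assumptions made for this illustration.

```vb
Imports System
Imports System.Text.RegularExpressions

Module HtmlTextDemo
    ''' <summary>Removes the tags from an HTML document.</summary>
    Function GetTextFromHtml(ByVal htmlText As String) As String
        ' Replace every tag match with the empty string.
        Return Regex.Replace(htmlText, "\<[^\>]+\>", "")
    End Function

    Sub Main()
        ' Hypothetical sample markup, chosen only for illustration.
        Dim page As String = "<html><body><h1>Title</h1><p>Some <i>text</i>.</p></body></html>"
        Console.WriteLine(GetTextFromHtml(page))
    End Sub
End Module
```

For this input the program prints TitleSome text. because the tags are replaced with nothing, not with whitespace. The pattern only removes tags: it does not decode character entities such as &amp;, and it leaves the contents of script and style elements in the output.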

See Also

Tasks

How to: Identify Hyperlinks in an HTML String in Visual Basic

How to: Strip Invalid Characters from a String

Other Resources

Parsing Strings in Visual Basic