Enhanced Split Function for Text Parsing

This content is no longer actively maintained. It is provided as is, for anyone who may still be using these technologies, with no warranties or claims of accuracy with regard to the most recent product version or service release.

 

Wade Tai
Microsoft Corporation

Created: November 1999

Applies to: Microsoft Office 2000

Summary: The Split function is an exciting new feature of Microsoft Office 2000. However, it doesn't handle consecutive delimiters, punctuation, and other characters that should not be included with words. This article demonstrates one solution to this problem. (4 printed pages)

The Split function is a feature of Visual Basic® for Applications (VBA) in Microsoft® Office 2000. By default, it takes a string argument, divides the string into elements by using the space character as the delimiter, and returns a string array. For example, if you pass in the following sentence:

"This is awesome"

each element of the resulting array contains a word:

"This"
"is"
"awesome"

VBA programmers have long awaited such a function because it makes text parsing so much easier. Now you can use the Split function to parse user input, command-line arguments, and delimited text files. However, there are a few things to be aware of before you use the Split function. For instance, the Split function doesn't recognize consecutive delimiters and punctuation. Assume we pass in the following string (and note the extra spaces in the text):

" This  is awesome!"

The resulting elements are:

""
"This"
""
"is"
"awesome!"

The good news is that if you are only interested in the words, it takes only a few more lines of VBA code to parse the string correctly. In the following example, we create a new public function, also called Split. You don't need to overload Split or make it public, but if you already have code that uses the Split function, this is a good way to minimize code changes in your existing application.

To demonstrate how the function works, create a new module in your document and paste the following code into it. The new Split function replaces each tab, punctuation, and grouping symbol with a space character. Next, consecutive space characters are replaced with a single space character, and the loop continues until no more consecutive space characters are left in the string. Finally, the VBA Split function is called to parse the string. An optional delimiter can be specified.

Public Function Split(ByVal InputText As String, _
         Optional ByVal Delimiter As String) As Variant

' This function splits the sentence in InputText into
' words and returns a string array of the words. Each
' element of the array contains one word.

    ' This constant contains punctuation and characters
    ' that should be filtered from the input string.
    Const CHARS = ".!?,;:""'()[]{}"
    Dim strReplacedText As String
    Dim intIndex As Integer

    ' Replace tab characters with space characters.
    strReplacedText = Trim(Replace(InputText, _
         vbTab, " "))

    ' Filter all specified characters from the string.
    For intIndex = 1 To Len(CHARS)
        strReplacedText = Trim(Replace(strReplacedText, _
            Mid(CHARS, intIndex, 1), " "))
    Next intIndex

    ' Loop until all consecutive space characters are
    ' replaced by a single space character.
    Do While InStr(strReplacedText, "  ")
        strReplacedText = Replace(strReplacedText, _
            "  ", " ")
    Loop

    ' Split the sentence into an array of words and return
    ' the array. If a delimiter is specified, use it.
    'MsgBox "String:" & strReplacedText
    If Len(Delimiter) = 0 Then
        Split = VBA.Split(strReplacedText)
    Else
        Split = VBA.Split(strReplacedText, Delimiter)
    End If
End Function

To see how our Split function works, we'll pass in various strings to see how they're parsed. First, add the following function to the module you created earlier.

Public Function CountWords(InputText As String) As Long
' This function returns the long integer
' number of words in InputText.
    Dim astrWords() As String

    ' Split the input string into an array of words.
    astrWords = Split(InputText)

    ' Return the number of elements in the array.
    CountWords = UBound(astrWords) - _
        LBound(astrWords) + 1
End Function

We will use this function to count the words after the string is parsed. Next, add the following subroutine to the module you created earlier.

Private Sub SplitTest()
    Dim strTest As String
    strTest = Chr(9) & "   Is this    a   " _
        & String(2, 9) _
        & "([long] and   'boring')" & vbTab _
        & "   sentence?  " & String(3, Asc(vbTab))
    MsgBox """" & strTest & """" & vbCr & vbLf & "Words:" _
        & CountWords(strTest)
        
    strTest = vbTab & " word " & vbTab
    MsgBox """" & strTest & """" & vbCr & vbLf & "Words:" _
        & CountWords(strTest)
        
    strTest = ""
    MsgBox """" & strTest & """" & vbCr & vbLf & "Words:" _
        & CountWords(strTest)
        
    strTest = " "
    MsgBox """" & strTest & """" & vbCr & vbLf & "Words:" _
        & CountWords(strTest)
        
    strTest = String(5, Asc(vbTab))
    MsgBox """" & strTest & """" & vbCr & vbLf & "Words:" _
        & CountWords(strTest)
End Sub

Run the SplitTest subroutine. Notice how punctuation characters, brackets, and parentheses are ignored. Consecutive tab and space characters are treated as one. You can uncomment the line:

'MsgBox "String:" & strReplacedText

in the Split function to see the resulting string that is passed to the VBA Split function. You can also use the companion VBA function, Join, to reassemble the string from the substrings in the array. With your new Split function, parsing text is easier than ever!