Lightning Strings
VBA Tech
Fast, Undocumented String-handling Techniques
By Steven Roman, Ph.D.
I recently finished writing a book entitled Win32 API
Programming with Visual Basic (O'Reilly & Assoc., 1999). The most
frequently asked question in connection with this book is "Why would I, as
a VB/VBA programmer, want to use the Win32 API?"
There are many ways to answer this question. The
following are a few:
- Using the Win32 API, a VB programmer can manipulate the user
interface more completely than with VB alone. For instance, the Win32
API makes it relatively easy to add tab stops or horizontal scroll bars
to a list box, or use a bitmap as a menu item.
- The Win32 API allows the VB/VBA programmer to get more information
about the state of the system - information such as the version of the
operating system, a list of installed printers and fonts, or the number
of buttons on the mouse. It can also be used to get a list of all open
windows or all running applications.
- The Win32 API can be used to dig deeply into the operating system.
For instance, it can be used to subclass a control to change its
behavior, to hook the operating system in order to watch for keystrokes
or mouse actions and possibly alter their behavior, or to extract data
from controls in foreign processes. You can even force one application
to run code written in another application.
The book delves into all these aspects of the Win32 API
and more. In this article, however, I want to show you a very simple, but
very important, use of the Win32 API: sorting VB strings.
FIGURE 1 shows the results of a simple program that
sorts two string arrays: an array consisting of 100 short strings, each of
length 10,000 characters, and an array consisting of 100 long strings,
each of length 100,000 characters.
FIGURE 1: A sorting program.
The application uses two methods for sorting. Both
methods (slow and quick) use the same simple sorting algorithm, which puts
the smallest string in the first position, then puts the second smallest
string in the second position, and so on. The pseudocode is:
For i = 1 To NumStrings - 1
    For j = i + 1 To NumStrings
        If strings(i) > strings(j) Then
            swap strings(i) and strings(j)
        End If
    Next j
Next i
This sorting algorithm isn't very efficient, but the
algorithm isn't the important issue here. Rather, it's the method used to
swap two strings when they are out of order. Indeed, even more efficient
algorithms, such as the venerable quicksort method, require swapping.
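For readers who know C, here is a rough analogue of the pseudocode above. It is a sketch, not the article's VB program: the function name exchange_sort is hypothetical, and the array holds C char pointers, which play roughly the role that BSTRs play in a VB string array. The swap in the inner loop is the operation the rest of this article is about.

```c
#include <string.h>

/* Exchange sort from the pseudocode: after pass i, strings[i] holds the
   smallest of strings[i..n-1]. Each element is a pointer, so a swap
   exchanges two pointers and never touches the character data.
   Returns the number of swaps performed. */
int exchange_sort(const char *strings[], int n)
{
    int swaps = 0;
    for (int i = 0; i < n - 1; i++) {
        for (int j = i + 1; j < n; j++) {
            if (strcmp(strings[i], strings[j]) > 0) {
                const char *temp = strings[i];
                strings[i] = strings[j];
                strings[j] = temp;
                swaps++;
            }
        }
    }
    return swaps;
}
```

Sorting {"pear", "apple", "fig", "banana"} this way takes four swaps. Each swap costs the same regardless of string length.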
The swapping is done in two ways: the slow way, using VB
assignments:
' Swap strings s and t.
temp = s
s = t
t = temp
and the quick way using the Win32 API function
CopyMemory:
' Swap strings s and t (lng is a Long).
CopyMemory lng, ByVal VarPtr(s), 4
CopyMemory ByVal VarPtr(s), ByVal VarPtr(t), 4
CopyMemory ByVal VarPtr(t), lng, 4
As you can see from FIGURE 1, for the long string array,
the quick method (using CopyMemory) is 500 times faster
than the slow method, even on a rather small 100-item array. Moreover, the
time it takes the quick method does not depend upon the length of the
strings. Wow!
To understand how the quick method works, we need to
take a look at the internal nature of VB strings. Before doing that,
however, let's take a look at the Win32 API function
CopyMemory.
CopyMemory - A VB Hacker's Dream
The purpose of CopyMemory is simply to copy a
block of memory byte-by-byte from one memory address to another. This
opens up a whole new set of possibilities for VB programmers, because VB
doesn't have this sort of capability, except in the rather restricted form
of LSet. Even then, the
documentation recommends against using LSet for this purpose.
The simplest VB declaration for CopyMemory is:
Declare Sub CopyMemory Lib "kernel32" _
    Alias "RtlMoveMemory" (lpDest As Any, _
    lpSource As Any, ByVal cbCopy As Long)
In this case, lpDest is the address of the first
byte of the destination memory, lpSource is the address of the
first byte of the source memory, and cbCopy is the number of bytes
to copy.
This VB declaration is a bit dangerous, because the
As Any form tells VB to skip any type checking, and an invalid type
can lead to the dreaded General Protection Fault. Thus, great care must be
taken when using this declaration. (Be sure to save all programs before
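running code containing this function.) We can (and will) override the
default ByRef setting by including ByVal in the call to this
function, as in:
CopyMemory lng, ByVal AnAddress, 4
With ByVal, VB passes the value stored in AnAddress (the source address)
rather than the address of the variable AnAddress itself.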
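The ByRef/ByVal distinction can be modeled in C, where it is always explicit. In the sketch below, copy_memory is a hypothetical portable stand-in for RtlMoveMemory built on memmove, because RtlMoveMemory, like memmove, permits overlapping blocks. The helpers read_byval and read_byref are also hypothetical. Passing the address held in a variable corresponds to ByVal AnAddress. Passing the address of the variable itself corresponds to the default ByRef.

```c
#include <stdint.h>
#include <string.h>

/* Portable stand-in for RtlMoveMemory (CopyMemory): copy cb bytes
   from src to dest; overlapping blocks are allowed, as with memmove. */
void copy_memory(void *dest, const void *src, size_t cb)
{
    memmove(dest, src, cb);
}

/* Like "CopyMemory lng, ByVal AnAddress, 4": the VALUE held in
   an_address is used as the source address. */
int32_t read_byval(const void *an_address)
{
    int32_t lng = 0;
    copy_memory(&lng, an_address, sizeof lng);
    return lng;
}

/* Like the default ByRef: the address OF the variable an_address is
   passed, so the bytes copied are those of the pointer itself. */
uintptr_t read_byref(const void *an_address)
{
    uintptr_t raw = 0;
    copy_memory(&raw, &an_address, sizeof raw);
    return raw;
}
```

Given a 4-byte integer holding 12345, read_byval returns 12345. The ByRef-style call instead returns the numeric value of the address, which is the kind of mix-up that leads to reading or writing the wrong memory.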
VB Strings
Let's now turn to a discussion of VB strings. We'll also
discuss the very useful, but undocumented, VB functions VarPtr and
StrPtr. I devote a 40-page chapter in my book to VB strings. Here's
a very abbreviated version.
The VB string data type, BSTR, is shown in FIGURE 2.
FIGURE 2: The BSTR data type.
The string in this figure corresponds to the following
VB code:
Dim str As String
str = "help"
There are several important things to note about the
BSTR data type:
- A BSTR is actually a pointer variable. It has a size of 32 bits,
like all pointers in 32-bit Windows, and points to the first byte in a Unicode character
array. Thus, a Unicode character array and a BSTR are not the same
thing. This can cause great confusion, because the term string
sometimes refers to the BSTR and sometimes to the character array. To be
absolutely clear, we'll use the term VB string to refer to the
BSTR, not the character array.
- The Unicode character array that is pointed to by a BSTR must be
preceded by a 4-byte length field and terminated by a single 2-byte
null character (character code 0).
- There may be additional null (2-byte) characters anywhere within the
Unicode character array, so we cannot rely on a null character to signal
the end of the character array. This is why the length field is vital.
- The length field contains the number of bytes (not the number of
characters) in the character array, excluding the terminating null
bytes. Because the array is Unicode, the character count is one-half the
byte count.
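The layout in the list above can be modeled in C. This is only a model under stated assumptions. Real BSTRs are allocated with the COM function SysAllocString and released with SysFreeString. Here the hypothetical helpers make_bstr_like, bstr_like_bytelen, and free_bstr_like use plain malloc, and they widen ASCII input to UTF-16, which is correct only for ASCII characters. The returned pointer addresses the first character, not the length prefix, just as a BSTR does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Model of a BSTR block: [4-byte byte count][UTF-16 chars][2-byte null].
   Returns a pointer to the first character, not to the prefix. */
uint16_t *make_bstr_like(const char *ascii)
{
    uint32_t nchars = (uint32_t)strlen(ascii);
    uint32_t nbytes = nchars * 2;          /* byte count, excluding the null */
    unsigned char *block = malloc(4 + nbytes + 2);
    assert(block != NULL);
    memcpy(block, &nbytes, 4);             /* the 4-byte length field */
    uint16_t *chars = (uint16_t *)(block + 4);
    for (uint32_t i = 0; i < nchars; i++)
        chars[i] = (uint16_t)(unsigned char)ascii[i];  /* ASCII -> UTF-16 */
    chars[nchars] = 0;                     /* terminating 2-byte null */
    return chars;
}

/* The length field sits in the 4 bytes just before the first character. */
uint32_t bstr_like_bytelen(const uint16_t *chars)
{
    uint32_t nbytes;
    memcpy(&nbytes, (const unsigned char *)chars - 4, 4);
    return nbytes;
}

/* Free the whole block, starting from the length prefix. */
void free_bstr_like(uint16_t *chars)
{
    free((unsigned char *)chars - 4);
}
```

For "help", the length field holds 8 (bytes), so the character count is 8 / 2 = 4. The slot after the last character holds the 2-byte null.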
Let's emphasize that code such as:
Dim str As String
str = "help"
means that str is the name of a BSTR, not a
Unicode character array. In other words, str is the name of the
variable that holds the address xxxx, as shown in FIGURE 2. (Of course,
the variable str has its own address, denoted by aaaa in FIGURE 2.)
Here is a brief experiment we can do to test the fact
that a VB string is a pointer to a character array and not a character
array. Consider the following code, which defines a structure whose
members are strings:
Private Type utTest
    astring As String
    bstring As String
End Type
Sub TestLen()
    Dim uTest As utTest
    Dim s As String
    s = "testing"
    uTest.astring = "testing"
    uTest.bstring = "testing"
    Debug.Print Len(s)
    Debug.Print Len(uTest)
End Sub
The output from this code is:
7
8
In the case of the string variable s, the Len function reports the number
of characters in the character array; there are seven characters in
"testing". In the case of the structure variable uTest, however, the Len
function reports the size of the structure in bytes. The return value 8
indicates that each of the two BSTRs occupies 4 bytes, no matter how long
the string it points to. This is because a BSTR is a (32-bit) pointer.
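The same experiment can be sketched in C. The struct ut_test and the helper ut_test_size below are hypothetical analogues of utTest. Each member is a pointer, so the struct's size is two pointer sizes, whatever the strings contain. On a 32-bit build that is 8 bytes, matching VB's answer. On a 64-bit build it is 16.

```c
#include <stddef.h>

/* C analogue of utTest: each member only points at character data,
   so the struct's size does not depend on the strings' lengths. */
struct ut_test {
    const char *astring;
    const char *bstring;
};

size_t ut_test_size(void)
{
    return sizeof(struct ut_test);
}
```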
VarPtr and StrPtr
The functions VarPtr and StrPtr aren't documented by
Microsoft, but they can be very useful in manipulating BSTRs.
If var is
any variable, then:
VarPtr(var)
is the address of that variable, returned as a long. If
str is a BSTR variable then:
StrPtr(str)
is the contents of the BSTR, which, as we've seen, is
the address of the Unicode character array pointed to by the BSTR.
Let's verify these statements using the string in FIGURE
2. Note that the variable str
has address aaaa, and the character array begins at address xxxx, which is
the contents of the pointer variable str.
To see that:
VarPtr(str) = aaaa
StrPtr(str) = xxxx
run the code in FIGURE 3.
Sub VerifyPointers()
    Dim lng As Long, i As Integer, s As String
    Dim b(1 To 10) As Byte
    Dim sp As Long, vp As Long
    s = "help"
    sp = StrPtr(s)
    Debug.Print "StrPtr:" & sp
    vp = VarPtr(s)
    Debug.Print "VarPtr:" & vp
    ' Verify that sp = xxxx and vp = aaaa
    ' by moving the long at address vp
    ' to the variable lng and then comparing it to sp.
    CopyMemory lng, ByVal vp, 4
    Debug.Print lng = sp
    ' To see that sp contains the address of the char array,
    ' copy from that address to a byte array and print
    ' the byte array. We should get "help" in Unicode.
    CopyMemory b(1), ByVal sp, 10
    For i = 1 To 10
        Debug.Print b(i);
    Next
End Sub
FIGURE 3: Verify that sp = xxxx and vp = aaaa.
A sample of the output is:
StrPtr:1836612
VarPtr:1243988
True
104 0 101 0 108 0 112 0 0 0
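The check in FIGURE 3 has a direct C counterpart, sketched below under the assumption that a C char pointer stands in for the BSTR variable. The names var_ptr, str_ptr, and read_pointer_at are hypothetical. var_ptr plays the part of VarPtr (the address aaaa of the variable). str_ptr plays the part of StrPtr (the value xxxx stored in it). read_pointer_at repeats the figure's CopyMemory step by copying the pointer-sized value found at aaaa.

```c
#include <stdint.h>
#include <string.h>

/* Analogue of VarPtr: the address of the variable itself (aaaa). */
uintptr_t var_ptr(const void *variable)
{
    return (uintptr_t)variable;
}

/* Analogue of StrPtr: the address stored in the variable (xxxx). */
uintptr_t str_ptr(const char *str)
{
    return (uintptr_t)str;
}

/* Copy the pointer-sized value found at `address`, as FIGURE 3 does
   with CopyMemory lng, ByVal vp, 4. */
uintptr_t read_pointer_at(uintptr_t address)
{
    const char *value;
    memcpy(&value, (const void *)address, sizeof value);
    return (uintptr_t)value;
}
```

Reading the value stored at aaaa yields xxxx, which is the C counterpart of the True printed above. One difference: a C string literal is stored as single-byte characters, not as the Unicode array a BSTR points to.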
Swapping Strings Using CopyMemory
Now we have the necessary background to understand our
string sorting application. As mentioned earlier, the only difference
between the slow and quick sorting methods lies in how they handle string
swapping. The slow method uses the obvious approach:
' Swap strings s and t.
temp = s
s = t
t = temp
Unfortunately, for each string assignment:
str1 = str2
VB must make a copy of the entire Unicode array pointed
to by the BSTR str2 and set the BSTR str1 to point to the
copied array. Because every character must be copied, this process is
time consuming, and the time it takes grows with the length of the strings.
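A C model of this copying assignment makes the cost visible. It is a sketch, assuming NUL-terminated char buffers on the heap rather than real BSTRs. The helpers assign_by_copy and swap_by_copy are hypothetical. Each assignment allocates a fresh buffer, copies every byte of the source, and frees the buffer the target previously owned. The three-assignment swap therefore copies both strings, one of them twice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Model of "str1 = str2": allocate, copy every byte, release the old
   buffer. Returns the number of bytes copied (including the NUL). */
size_t assign_by_copy(char **str1, const char *str2)
{
    size_t n = strlen(str2) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);
    memcpy(copy, str2, n);
    free(*str1);
    *str1 = copy;
    return n;
}

/* The slow swap: temp = s, s = t, t = temp, each a full copy.
   Returns the total number of bytes copied. */
size_t swap_by_copy(char **s, char **t)
{
    char *temp = NULL;
    size_t copied = assign_by_copy(&temp, *s);
    copied += assign_by_copy(s, *t);
    copied += assign_by_copy(t, temp);
    free(temp);
    return copied;
}
```

Swapping "hello" and "abc" copies 6 + 4 + 6 = 16 bytes. For 100,000-character strings the same swap copies hundreds of kilobytes.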
On the other hand, the quick method uses the swapping
code:
' Swap strings s and t (lng is a Long).
CopyMemory lng, ByVal VarPtr(s), 4
CopyMemory ByVal VarPtr(s), ByVal VarPtr(t), 4
CopyMemory ByVal VarPtr(t), lng, 4
This code simply swaps the contents of the BSTRs
s and t. That is, it swaps the addresses of the
corresponding Unicode arrays. In this way, we only need to swap 4-byte
addresses, no matter how long the Unicode arrays may be.
In the first line of code, the long variable lng
will receive the address of the first Unicode array. Because this address
is stored in the BSTR s, we pass the address of s, VarPtr(s), by value,
so that CopyMemory copies the 4 bytes stored in s.
Actually, you might think that the code:
CopyMemory lng, s, 4
would also work, but it doesn't. In brief, the reason is
that when VB sees that a string is being passed to an API function, it
makes a copy of the array in ANSI format (rather than Unicode) and passes
the ANSI version to the function. (For a more detailed discussion of this
issue, please see my book.)
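The remaining two lines of code complete the swap: the second copies the
address stored in t into s, and the third copies the saved address from
lng into t.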
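For comparison with the copying model, here is the quick swap sketched in C. It assumes char pointers stand in for BSTRs, and the function name swap_by_pointer is hypothetical. The three memcpy calls mirror the three CopyMemory calls one for one. Only sizeof(char *) bytes move per call: 4 on 32-bit Windows, matching the article's 4-byte count, and 8 on a 64-bit build.

```c
#include <string.h>

/* Quick swap, mirroring the three CopyMemory calls:
     CopyMemory lng, ByVal VarPtr(s), 4               save s's pointer
     CopyMemory ByVal VarPtr(s), ByVal VarPtr(t), 4   s takes t's pointer
     CopyMemory ByVal VarPtr(t), lng, 4               t takes the saved one
   The character data never moves, however long the strings are. */
void swap_by_pointer(char **s, char **t)
{
    char *lng;                       /* plays the role of the VB Long */
    memcpy(&lng, s, sizeof lng);
    memcpy(s, t, sizeof lng);
    memcpy(t, &lng, sizeof lng);
}
```

After the call, s points at the buffer t used to point at and vice versa. The cost does not depend on the strings' lengths, which is consistent with the timing behavior shown in FIGURE 1.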
Conclusion
It is usually said that the Win32 API can be useful to
VB/VBA programmers who want to delve more deeply into the Windows
operating system and do things that cannot be done with VB alone. Here is
an example, however, of a case where VB can do the job, but the Win32 API
can do it much, much better.
Dr Steven Roman is an Emeritus Professor at the
California State University, Fullerton. He has written 35 books, including
Access Database Design & Programming [1999], Writing Word Macros
[1999], Writing Excel Macros [1999], Developing Visual Basic Add-Ins
[1999], and Win32 API Programming with Visual Basic [1999], all published
by O'Reilly & Associates, and Concepts of Object-Oriented Programming
with Visual Basic [1997] and Understanding Personal Computer Hardware
[1998], both published by Springer-Verlag. He has written a special object
library browser, called Object Model Browser, that displays a structured
view of object libraries, rather than the usual flat view. For more
information about Dr Roman and his books, articles, and software, please
visit his Web site at http://www.romanpress.com/.