Chapter 6: Strings

 

Steven Roman

O'Reilly & Associates, Inc.

Reproduced from Win32 API Programming with Visual Basic, by Steven Roman, by permission of O'Reilly & Associates, Inc. ISBN 1-56592-631-5. Copyright 1999, O'Reilly & Associates. All rights reserved. For further information, please contact nuts@oreilly.com, or call 1-800-998-9938, or visit their Web site at http://www.oreilly.com.

Table of Contents

The BSTR
C-Style LPSTR and LPWSTR Strings
String Terminology
Tools for Exploring Strings
Preparing the BSTR
The Returned BSTR
What to Call
The Whole String Trip
A Unicode Entry Point Example
Passing Strings to the Win32 API
Dealing with IN Parameters
Dealing with OUT Parameters
What Happened to My Pointer?
Strings and Byte Arrays
Getting the Address of a Variable of User-Defined Type

The subject of strings can be quite confusing, but this confusion tends to disappear with some careful attention to detail (as is usually the case). The main problem is that the term string is used in at least two different ways in Microsoft® Visual Basic® ("VB")!

Just what is a string in Visual Basic? According to the VB documentation, it is:

A data type consisting of a sequence of contiguous characters that represent the characters themselves rather than their numeric values.

Huh?

It seems to me that Microsoft is trying to say that the underlying set for the String data type is the set of finite-length sequences of characters. For Visual Basic, all characters are represented by 2-byte Unicode integers. Put another way, VB uses Unicode to represent the characters in a string. For instance, the ASCII representation for the character h is &H68, so the Unicode representation is &H0068, appearing in memory as 68 00.

Thus, the string "help" is represented as:

00 68 00 65 00 6C 00 70

Note, however, that because words are written with their bytes reversed in memory, the string "help" appears in memory as:

68 00 65 00 6C 00 70 00
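The two byte orders are easy to check outside of VB. The following is a minimal sketch in Python (not a language used elsewhere in this chapter), assuming only that the 2-byte Unicode encoding in question is what Python calls UTF-16:

```python
# Encode "help" as 2-byte Unicode (UTF-16) in both byte orders.
text = "help"

# Each 16-bit code written most-significant byte first (how we read the numbers).
as_written = text.encode("utf-16-be").hex(" ").upper()

# Little-endian layout: the order in which the bytes sit in memory on Intel hardware.
in_memory = text.encode("utf-16-le").hex(" ").upper()

print(as_written)  # 00 68 00 65 00 6C 00 70
print(in_memory)   # 68 00 65 00 6C 00 70 00
```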

This is fine, but it is definitely not how we should think of strings in VB programming. To avoid any possibility of ambiguity, we will refer to this type of object as a Unicode character array, which is, after all, precisely what it is! This also helps distinguish it from an ANSI character array, that is, an array of characters represented using single-byte ANSI character codes.

Here is the key to understanding strings: when we write the code:

Dim str As String
str = "help"

we are not defining a Unicode character array per se. We are defining a member of a data type called BSTR, which is short for Basic String. A BSTR is, in fact, a pointer to a null-terminated Unicode character array that is preceded by a 4-byte length field. We had better elaborate on this.

The BSTR

Actually, the VB string data type defined by:

Dim str As String

underwent a radical change between versions 3 and 4 of Visual Basic, due in part to an effort to make the type more compatible with the Win32 operating system.

Just for comparison (and to show that we are more fortunate now), Figure 6-1 shows the format for the VB string data type under Visual Basic 3, called an HLSTR (High-Level String).

Figure 6-1. The high-level string format (HLSTR) used by VB3

The rather complex HLSTR format starts with a pointer to a string descriptor, which contains the 2-byte length of the string along with another pointer to the character array, which is in ANSI format (one byte per character).

With respect to the Win32 API, this string format is a nightmare. Beginning with Visual Basic 4, the VB string data type changed. The new data type, called a BSTR, is shown in Figure 6-2.

Figure 6-2. A BSTR

This data type is actually defined in the OLE 2.0 specifications; that is, it is part of Microsoft's ActiveX specification.

There are several important things to note about the BSTR data type.

  • The BSTR is the actual pointer variable. It has size 32 bits, like all pointers, and points to a Unicode character array. Thus, a Unicode character array and a BSTR are not the same thing. It is correct to refer to a BSTR as a string (or VB string) but, unfortunately, the Unicode character array is also often called a string! Hence, we will not refer to a BSTR simply as a string--we will refer to it by its unequivocal name--BSTR.
  • The Unicode character array that is pointed to by a BSTR must be preceded by a 4-byte length field and terminated by a single null 2-byte character (that is, a character whose code is 0).
  • There may be additional null characters anywhere within the Unicode character array, so we cannot rely on a null character to signal the end of the character array. This is why the length field is vital.
  • Again, the pointer points to the beginning of the character array, not to the 4-byte length field that precedes the array. As we will see, this is critical to interpreting a BSTR as a VC++-style string.
  • The length field contains the number of bytes (not the number of characters) in the character array, excluding the terminating null bytes. Since the array is Unicode, the character count is one-half the byte count.

We should emphasize that an embedded null Unicode character is a 16-bit 0, not an 8-bit 0. Watch out for this when testing for null characters in Unicode arrays.
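To make the layout concrete, here is a minimal Python sketch that builds the bytes of such an allocation by hand. The helper name make_bstr_block is hypothetical, and the sketch only models the layout described above; it does not call any Windows allocation routine.

```python
import struct

def make_bstr_block(text):
    """Return (block, offset): the raw bytes of a BSTR-style allocation and
    the offset within it that the BSTR pointer would refer to."""
    chars = text.encode("utf-16-le")               # the Unicode character array
    length_field = struct.pack("<I", len(chars))   # byte count, terminator excluded
    terminator = b"\x00\x00"                       # one 16-bit null character
    return length_field + chars + terminator, len(length_field)

block, offset = make_bstr_block("help")

# The pointer refers to the first character, so the length field sits 4 bytes before it.
byte_count = struct.unpack_from("<I", block, offset - 4)[0]

print(block.hex(" "))               # 08 00 00 00 68 00 65 00 6c 00 70 00 00 00
print(byte_count, byte_count // 2)  # 8 4  (bytes, then characters)

# An embedded null character is counted like any other character.
embedded, _ = make_bstr_block("a\x00b")
print(struct.unpack_from("<I", embedded, 0)[0])  # 6
```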

Note that it is common practice to speak of "the BSTR 'help'" or to say that a BSTR may contain embedded null characters when what is really being referred to is the character array pointed to by the BSTR.

Because a BSTR may contain embedded null characters, the terminating null is not of much use, at least as far as VB is concerned. However, its presence is extremely important for Win32. The reason is that the Unicode version of a Win32 string (denoted by LPWSTR) is defined as a pointer to a null-terminated Unicode character array (which, by the way, is not allowed to contain embedded null characters).

This makes it clear why BSTRs are null terminated. A BSTR with no embedded nulls is also an LPWSTR. We will discuss C++ strings in a moment.

Let us emphasize that code such as:

Dim str As String
str = "help"

means that str is the name of a BSTR, not a Unicode character array. In other words, str is the name of the variable that holds the address xxxx, as shown in Figure 6-2.

Here is a brief experiment we can do to test the fact that a VB string is a pointer to a character array and not a character array. Consider the following code, which defines a structure whose members are strings:

Private Type utTest
   astring As String
   bstring As String
End Type

Dim uTest As utTest
Dim s As String

s = "testing"
uTest.astring = "testing"
uTest.bstring = "testing"

Debug.Print Len(s)
Debug.Print Len(uTest)

The output from this code is:

7
8

In the case of the string variable s, the Len function reports the length of the character array; in this case there are 7 characters in the character array "testing". However, in the case of the structure variable uTest, the Len function actually reports the length of the structure (in bytes). The return value of 8 clearly indicates that each of the two BSTRs has length 4. This is because a BSTR is a pointer!
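The same arithmetic can be reproduced with a structure whose members are plain 32-bit values. This Python sketch uses the standard ctypes module; modelling each BSTR as a c_uint32 is an assumption that mirrors the 4-byte pointers of 32-bit Windows, and the class name UtTest simply echoes the VB type above.

```python
import ctypes

class UtTest(ctypes.Structure):
    # Each member stands for a BSTR, that is, a 4-byte pointer value.
    # The characters themselves live elsewhere and do not affect the size.
    _fields_ = [("astring", ctypes.c_uint32),
                ("bstring", ctypes.c_uint32)]

print(len("testing"))         # 7  characters in the character array
print(ctypes.sizeof(UtTest))  # 8  bytes: two 4-byte pointers
```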

C-Style LPSTR and LPWSTR Strings

VC++ and Win32 use the string data types LPSTR and LPWSTR.

An LPSTR string is defined as a pointer to a null-terminated ANSI character array. However, because the only way that we can tell when an LPSTR string ends is by the location of the terminating null, LPSTRs are not allowed to contain embedded null characters. Similarly, an LPWSTR is a pointer to a null-terminated Unicode character array with no embedded nulls. (The W in LPWSTR stands for Wide, which is Microsoft's way of saying Unicode.) These string data types are pictured in Figure 6-3.

Figure 6-3. LPSTR and LPWSTR data types

We will also encounter the data types LPCSTR and LPCWSTR. The embedded C stands for constant and simply means that an instance of this data type cannot (and will not) be changed by any API function that uses this type. Otherwise, an LPCSTR is identical to an LPSTR, and, similarly, an LPCWSTR is identical to an LPWSTR.

Finally, the generic LPTSTR data type is used in conditional compilation, just like the TCHAR data type, to cover both ANSI and Unicode in a single source code. Here are the declarations:

#ifdef  UNICODE

typedef LPWSTR LPTSTR;      // LPTSTR is synonym for LPWSTR under Unicode
typedef LPCWSTR LPCTSTR;    // LPCTSTR is synonym for LPCWSTR under Unicode

#else   

typedef LPSTR LPTSTR;       // LPTSTR is synonym for LPSTR under ANSI
typedef LPCSTR LPCTSTR;     // LPCTSTR is synonym for LPCSTR under ANSI

#endif

Figure 6-4 summarizes the possibilities.

Figure 6-4. The LP... STR mess.

Thus, for instance, LPCTSTR is read long pointer to a constant generic string.

String Terminology

To avoid any possible confusion, we will use the terms BSTR, Unicode character array, and ANSI character array. When we do use the term string, we will modify it by writing VB string (meaning BSTR) or VC++ string (meaning LP??STR). We will avoid using the term string without some modification.

However, in translating VB documentation, you will see the unqualified term string used quite often. It falls to you to determine whether the reference is to a BSTR or a character array.

Tools for Exploring Strings

If we are going to do some exploring, then we will need some tools. We have already discussed the CopyMemory API function. Let us take a look at some additional tools for dealing with strings.

The Visual Basic StrConv Function

The StrConv function is used to convert character arrays from one format to another. Its syntax is:

StrConv(string, conversion, LCID)

where string is a BSTR, conversion is a constant (described later), and LCID is an optional locale identifier (which we will ignore).

Among the possible constants, and the only ones that interest us, are:

  • vbUnicode (which should have been vbToUnicode)
  • vbFromUnicode

These constants convert the character array of the BSTR between Unicode and ANSI.

But now we have a problem (which really should have been addressed by the official documentation). There is no such thing as an ANSI BSTR. By definition, the character array pointed to by a BSTR is a Unicode array.

However, we can imagine what an ANSI BSTR would be--just replace the Unicode character array in Figure 6-2 with an ANSI array. We will use the term ABSTR to stand for ANSI BSTR, but you should keep in mind that this term will not be officially recognized outside of this book.

We can now say that there are two legal forms for StrConv:

StrConv(aBSTR, vbFromUnicode)    ' returns an ABSTR
StrConv(anABSTR, vbUnicode)      ' returns a BSTR

The irony is that, in the first case, VB doesn't understand the return value of its own function! To see this, consider the following code:

s = "help"
Debug.Print s
Debug.Print StrConv(s, vbFromUnicode)

The result is:

help
??

because VB tries to interpret the ABSTR as a BSTR. Look at the following code:

s = "h" & vbNullChar & "e" & vbNullChar & "l" & vbNullChar & "p" & _
   vbNullChar
Debug.Print s
Debug.Print StrConv(s, vbFromUnicode)

The output is:

h e l p 
help

Here we have tricked VB by padding the original Unicode character array so that when StrConv does its conversion, the result is an ABSTR that happens to have a legitimate interpretation as a BSTR!

This shows that the StrConv function doesn't really understand or care about BSTRs and ABSTRs. It assumes that whatever you feed it is a pointer to a character array and it blindly does its conversion on that array. As we will see, many other string functions behave similarly. That is, they can take a BSTR or an ABSTR--to them it is just a pointer to some null-terminated array of bytes.
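Both experiments can be modelled at the byte level. The Python sketch below is only a model of the behavior described above, not VB's actual implementation: the function names are hypothetical, Latin-1 stands in for the ANSI code page (the two agree for the ASCII characters used here), and only arrays with an even byte count are handled.

```python
ANSI = "latin-1"  # assumption: stands in for the ANSI code page

def strconv_from_unicode(s):
    """Model StrConv(s, vbFromUnicode): squeeze each character into one ANSI
    byte, then read the resulting bytes back as 2-byte characters, which is
    how VB will interpret the returned ABSTR."""
    return s.encode(ANSI).decode("utf-16-le")

def strconv_to_unicode(s):
    """Model StrConv(s, vbUnicode): treat every byte of the character array
    as an ANSI character and widen it to two bytes."""
    return s.encode("utf-16-le").decode(ANSI)

# "help" becomes 4 ANSI bytes, which VB misreads as 2 unrelated characters.
mangled = strconv_from_unicode("help")
print(len(mangled), [hex(ord(c)) for c in mangled])  # 2 ['0x6568', '0x706c']

# Padding with nulls first makes the converted bytes a valid Unicode "help".
padded = "h\x00e\x00l\x00p\x00"
print(strconv_from_unicode(padded))  # help
```

The two code values 0x6568 and 0x706C have no glyph in the Debug window's font, which is consistent with the ?? in the output shown earlier.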

The Len and LenB Functions

Visual Basic has two string-length functions: Len and LenB. Each takes a BSTR or ABSTR and returns a long. The following code tells all.

s = "help"
Debug.Print Len(s), LenB(s)
Debug.Print Len(StrConv(s, vbFromUnicode)), LenB(StrConv(s, vbFromUnicode))

The output is:

 4             8 
 2             4 

showing that Len returns the number of characters and LenB returns the number of bytes in the BSTR.
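The four numbers follow directly from the byte counts of the two character arrays. Here is a small Python sketch, with hypothetical helper names and Latin-1 standing in for the ANSI code page, that reproduces them under the assumption that Len is simply the byte count divided by two:

```python
def len_b(char_array):
    """Model LenB: the number of bytes in the character array."""
    return len(char_array)

def len_chars(char_array):
    """Model Len: VB assumes 2 bytes per character, whatever the array holds."""
    return len(char_array) // 2

bstr_array = "help".encode("utf-16-le")   # Unicode character array, 8 bytes
abstr_array = "help".encode("latin-1")    # ANSI character array, 4 bytes

print(len_chars(bstr_array), len_b(bstr_array))    # 4 8
print(len_chars(abstr_array), len_b(abstr_array))  # 2 4
```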

The Chr, ChrB, and ChrW Functions

These three functions have different input ranges and produce different outputs. These differences can seem confusing at first--you may have to read the definitions a few times:

  • Chr takes a long value x in the range 0 to 255 and returns a BSTR of length 1. This one character pointed to by the BSTR has Unicode code equal to x. (In this case, the Unicode and ANSI values are actually equal.) Note that, according to the latest documentation, there is no difference between Chr and Chr$.
  • ChrB takes a long value x in the range 0 to 255 and returns an ABSTR of length 1 (byte). This one byte pointed to by the ABSTR has ANSI code equal to x.
  • ChrW takes a long value x in the range 0 to 65535 and returns a BSTR of length 1. This one character pointed to by the BSTR has Unicode code equal to x.

The Asc, AscB, and AscW Functions

These functions are the inverses of the Chr functions. For instance, AscB takes a single character (byte) ABSTR and returns a byte equal to the character's ANSI code. To see that the return type is a byte, try running the code:

Debug.Print VarType(AscB("h")) = vbByte

(The output is True.) It may appear that AscB will accept a BSTR as input, but in reality, it just takes the first byte in the BSTR.

The Asc function takes a BSTR (but not an ABSTR) and returns an integer equal to the character's Unicode code.

Null Strings and Null Characters

To its credit, VB does allow null BSTRs. The code:

Dim s As String
s = vbNullString
Debug.Print VarPtr(s)
Debug.Print StrPtr(s)

produces the following output (your address may vary, of course):

 1243948 
 0 

This shows that a null BSTR is simply a pointer whose contents are 0. (We will discuss the meaning of StrPtr in a moment.) In Win32 and VC++, this is called a null pointer. You can probably see the difference between vbNullString and vbNullChar at this point. vbNullChar is not a pointer--it is a Unicode character whose value is 0. Thus, at the bit level, the values vbNullString and vbNullChar are identical. However, they are interpreted differently, so they are in fact different.

It is also important not to confuse a null BSTR with an empty BSTR, usually denoted by a pair of adjacent quotation marks:

Dim s As String
Dim t As String
s = vbNullString
t = ""

Unlike a null string, the empty BSTR t is a pointer that points to some nonzero memory address. At that address resides the terminating null character for the empty BSTR, and the preceding length field also contains a 0.
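The distinction is easy to see if we write out the bytes involved. The following Python sketch models the three cases; the helper bstr_allocation is hypothetical and simply follows the BSTR layout described earlier in the chapter.

```python
import struct

def bstr_allocation(text):
    """Bytes of a BSTR-style allocation: length field, characters, terminator."""
    chars = text.encode("utf-16-le")
    return struct.pack("<I", len(chars)) + chars + b"\x00\x00"

# A null BSTR is a pointer whose value is 0. Nothing is allocated at all.
null_bstr = 0

# An empty BSTR points into a real allocation: a zero length field
# followed by the terminating null character.
empty = bstr_allocation("")
print(empty.hex(" "))      # 00 00 00 00 00 00

# A BSTR holding vbNullChar has one real character, whose code is 0.
null_char = bstr_allocation("\x00")
print(null_char.hex(" "))  # 02 00 00 00 00 00 00 00
```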

VarPtr and StrPtr

We have discussed the function VarPtr already, but not in connection with strings. The functions VarPtr and StrPtr are not documented by Microsoft, but they can be very useful, so we will use them often, particularly the VarPtr function.

If var is a variable, we have seen that:

VarPtr(var)

is the address of that variable, returned as a long. If str is a BSTR variable, then:

StrPtr(str)

gives the contents of the BSTR! These contents are the address of the Unicode character array pointed to by the BSTR.

Let us elaborate. Figure 6-5 shows a BSTR.

Figure 6-5. A BSTR

The code for this figure is simply:

Dim str As String
str = "help"

Note that the variable str is located at address aaaa and the character array begins at address xxxx, which is the contents of the pointer variable str.

To see that:

VarPtr = aaaa
StrPtr = xxxx

just run the following code:

Dim lng As Long
Dim i As Integer
Dim s As String
Dim b(1 To 10) As Byte
Dim sp As Long, vp As Long

s = "help"

sp = StrPtr(s)
Debug.Print "StrPtr:" & sp

vp = VarPtr(s)
Debug.Print "VarPtr:" & vp

' Verify that sp = xxxx and vp = aaaa
' by moving the long pointed to by vp (which is xxxx)
' to the variable lng and then comparing it to sp
CopyMemory lng, ByVal vp, 4
Debug.Print lng = sp

' To see that sp contains address of char array,
' copy from that address to a byte array and print
' the byte array. We should get "help".
CopyMemory b(1), ByVal sp, 10
For i = 1 To 10
   Debug.Print b(i);
Next

The output is:

StrPtr:1836612
VarPtr:1243988
True
 104  0  101  0  108  0  112  0  0  0 

This shows again that the character array in a BSTR is indeed in Unicode format. Also, by adding the following lines:

Dim ct As Long
CopyMemory ct, ByVal sp - 4, 4
Debug.Print "Length field: " & ct

just after the lines:

sp = StrPtr(s)
Debug.Print "StrPtr:" & sp

we get the output:

Length field: 8

which shows that the length field does indeed hold the byte count and not the character count.

As mentioned earlier, if you do not like to use undocumented functions (and who can blame you for that?), you can use the function rpiVarPtr in the rpiAPI.dll library on the accompanying CD. You can also simulate StrPtr as follows:

' Simulate StrPtr
Dim lng As Long
CopyMemory lng, ByVal VarPtr(s), 4
' lng = StrPtr(s)

As we have seen, this code copies the contents of the BSTR pointer, which is the value of StrPtr, to a long variable lng.
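For readers who want to repeat the experiment outside VB, the same three reads can be imitated with Python's standard ctypes module. This is a sketch under stated assumptions: the buffer is laid out by hand to look like a BSTR allocation, and the names var_ptr and str_ptr merely mirror VarPtr and StrPtr; no VB runtime is involved.

```python
import ctypes
import struct

# Lay out a BSTR-style allocation: length field, characters, terminator.
chars = "help".encode("utf-16-le")
raw = struct.pack("<I", len(chars)) + chars + b"\x00\x00"
allocation = (ctypes.c_char * len(raw)).from_buffer_copy(raw)

# The "BSTR" is a pointer variable holding the address of the first character.
bstr = ctypes.c_void_p(ctypes.addressof(allocation) + 4)

var_ptr = ctypes.addressof(bstr)  # address of the pointer variable (aaaa)
str_ptr = bstr.value              # contents of the pointer variable (xxxx)

# Like CopyMemory lng, ByVal vp, 4: read the pointer stored at var_ptr.
stored = ctypes.c_void_p.from_address(var_ptr).value
print(stored == str_ptr)          # True

# Like CopyMemory b(1), ByVal sp, 10: read ten bytes starting at str_ptr.
first_ten = list(ctypes.string_at(str_ptr, 10))
print(first_ten)                  # [104, 0, 101, 0, 108, 0, 112, 0, 0, 0]

# Like CopyMemory ct, ByVal sp - 4, 4: read the length field.
(length_field,) = struct.unpack("<I", ctypes.string_at(str_ptr - 4, 4))
print(length_field)               # 8
```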

String Conversion by VB

Now we come to the strange story of how VB handles passing BSTRs to external DLL functions. It doesn't.

As we have seen, VB uses Unicode internally; that is, BSTRs use the Unicode format. Windows NT also uses Unicode as its native character code. However, Windows 9x does not support Unicode (with some exceptions). Let's examine the path that is taken by a BSTR argument to an external DLL function (Win32 API or otherwise).

In an effort to be compatible with Windows 95, VB always (even when running under Windows NT) creates an ABSTR, converts the BSTR's Unicode character array to ANSI, and places the converted characters in the ABSTR's character array. VB then passes the ABSTR to the external function. As we will see, this is true even when calling the Unicode entry points under Windows NT.

Preparing the BSTR

Before sending a BSTR to an external DLL function, VB creates a new ABSTR string at a location different from the original BSTR. It then passes that ABSTR to the DLL function. This duplication/translation process is pictured in Figure 6-6.

Figure 6-6. Translating a BSTR to an ABSTR

When we first introduced the CopyMemory function, we used it to demonstrate this Unicode-to-ANSI translation process. But let's do that again in a different way. The rpiAPI.dll library includes a function called rpiBSTRtoByteArray, whose purpose is to return the values of VarPtr and StrPtr on the string that is actually passed to a DLL function. The VB declaration is as follows.

Public Declare Function rpiBSTRtoByteArray Lib "???\rpiAPI.dll" ( _
   ByRef pBSTR As String, _
   ByRef bArray As Byte, _
   pVarPtr As Long, _
   pStrPtr As Long _
) As Long

For its first parameter, this function takes as input a BSTR, which is passed by reference. Hence, the address of the BSTR is passed, not the address of the character array. (Thus, we are passing a pointer to a pointer to the character array.)

The second parameter should be set to the first byte of a byte array that the caller must allocate with enough space to accommodate all of the bytes of the BSTR. Failing to do so will definitely crash the application.

The last two parameters are OUT parameters, meaning that the caller just declares a pair of long variables, which the function will fill in. The pVarPtr variable will be filled by the address of the BSTR, and the pStrPtr will be filled by the contents of the BSTR (which, as we know, is the address of the character array) as the DLL function sees it. Thus, we will be able to get a glimpse of what the DLL is actually passed by VB!

The function returns the length (in bytes) of the original string. Finally, in order to convince ourselves that everything is working as it should, the function changes the first character of the original string to an X.

Here is a test run (the function VBGetTarget was discussed in Chapter 3, API Declarations, under the section "Implementing Indirection in Visual Basic"):

Sub BSTRTest()

Dim i As Integer
Dim sString As String
Dim bBuf(1 To 10) As Byte
Dim pVarPtr As Long
Dim pStrPtr As Long
Dim bTarget As Byte
Dim lTarget As Long

sString = "help"

' Print the BSTR's initial address and contents
Debug.Print "VarPtr:" & VarPtr(sString)
Debug.Print "StrPtr:" & StrPtr(sString)

' Call the external function
Debug.Print "Function called. Return value:" & _
   rpiBSTRToByteArray(sString, bBuf(1), pVarPtr, pStrPtr)

' Print what the DLL sees, which is the temp ABSTR
' Its address and contents are:
Debug.Print "Address of temp ABSTR as DLL sees it: " & pVarPtr
Debug.Print "Contents of temp ABSTR as DLL sees it: " & pStrPtr

' Print the buffer pointed to by temp ABSTR
Debug.Print "Temp character array: ";
For i = 1 To 10
   Debug.Print bBuf(i);
Next
Debug.Print

' Now that we have returned from the DLL function call
' check status of the passed string buffer -- it has been deallocated
VBGetTarget lTarget, pVarPtr, 4
Debug.Print "Contents of temp ABSTR after DLL returns: " & lTarget

' Check the string for altered character
Debug.Print "BSTR is now: " & sString

End Sub

Here is the output:

VarPtr:1242736
StrPtr:2307556
Function called. Return value:4
Address of temp ABSTR as DLL sees it: 1242688
Contents of temp ABSTR as DLL sees it: 1850860
Temp character array:  104  101  108  112  0  0  0  0  0  0 
Contents of temp ABSTR after DLL returns: 0
BSTR is now: Xelp

This code first prints the address (VarPtr) and the contents (StrPtr) of the original BSTR as VB sees it. It then calls the function, which fills in the byte buffer and the OUT parameters. Next, the buffer and OUT parameters are printed. The important point to note is that the address and contents of the "string," as returned by the DLL function, are different from the original values, which indicates that VB has passed a different object to the DLL. In fact, the buffer is in ANSI format; that is, the object is an ABSTR.

Next, we print the contents of the passed ABSTR, when the DLL has returned. This is 0, indicating that the temporary ABSTR has been deallocated. (It is tempting but not correct to say that the ABSTR is now the null string--in fact the ABSTR no longer exists!)

Finally, note that I am running this code under Windows NT--the translation still takes place even though Windows NT supports Unicode.
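The behavior observed in this test can be summarized in a small model. The Python sketch below is an illustration only: call_dll_with_string and fake_rpi_function are hypothetical names, Latin-1 stands in for the ANSI code page, and the real work done by VB and by rpiAPI.dll is of course not visible to us.

```python
ANSI = "latin-1"  # assumption: stands in for the ANSI code page

def call_dll_with_string(dll_function, vb_string):
    """Model VB's handling of a String argument to an external function."""
    # VB builds a temporary ANSI copy (the ABSTR) of the character array.
    temp = bytearray(vb_string.encode(ANSI) + b"\x00")
    # The DLL sees only the temporary copy, never the original.
    result = dll_function(temp)
    # VB translates the (possibly modified) copy back to Unicode.
    # The temporary copy is then discarded.
    return result, bytes(temp[:-1]).decode(ANSI)

seen = []

def fake_rpi_function(buffer):
    """Hypothetical stand-in for the DLL function: record what was passed,
    overwrite the first character with X, and return the length in bytes."""
    seen.append(bytes(buffer))
    length = buffer.index(0)
    buffer[0] = ord("X")
    return length

ret, s = call_dll_with_string(fake_rpi_function, "help")
print(list(seen[0]))  # [104, 101, 108, 112, 0]  one byte per character
print(ret, s)         # 4 Xelp
```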

The Returned BSTR

It is not uncommon for a BSTR that is passed to a DLL function to be altered and returned to the caller. In fact, this may be the whole purpose of the function.

Figure 6-7 shows the situation. After the ABSTR is altered by the DLL function, the translation process is reversed. Thus, the original BSTR str will now point to a Unicode character array with the output of the API function. Note, however, that the character array may not be returned to its original location. For instance, as we will see, the API function GetWindowText seems to move the array. The point is that we cannot rely on the contents of the BSTR to remain unchanged, only its address. This will prove to be an important issue in our discussions later in the chapter.

Figure 6-7. The return translation

What to Call

Since Windows 9x does not implement Unicode API entry points, for compatibility reasons you will probably want to call only ANSI API entry points in your applications. For instance, you should call SendMessageA, not SendMessageW. (Nonetheless, we will do a Unicode entry point example a little later.)

The Whole String Trip

Let's take a look at the entire round trip that a BSTR takes when passed to an external DLL.

Assume that we call a DLL function that takes a string parameter and modifies that string for return. The CharUpper API function is a good example. This function does an in-place conversion of each character in the string to uppercase. The VB declaration for the ANSI version is as follows.

Declare Function CharUpperA Lib "user32" ( _
   ByVal lpsz As String _
) As Long

Under Windows 9x

Under Windows 9x, the following happens to the string argument. Remember that it is the character array pointers that are being passed back and forth, not the actual character arrays:

  • The BSTR lpsz is duplicated as an ABSTR by VB, and the duplicate is passed to the function CharUpperA, which treats it as an LPSTR.
  • This function processes the LPSTR and passes the result to VB.
  • VB translates the LPSTR back to a BSTR.

Note that since most API functions (in this case CharUpper) treat BSTRs as LPSTRs, that is, they ignore the length field, we cannot be certain that this field will always be accurate. For CharUpper, the length is not changed, so it should remain correct, but other API functions could conceivably change the length of the character array. Unless written specifically for the BSTR format, the function will just null-terminate the new character array, without updating the length field. Thus, we cannot rely on the length field to be valid.
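A short model shows why the length field cannot be trusted after such a call. In this Python sketch, fake_get_text is a hypothetical C-style function that null-terminates whatever it writes and knows nothing about length fields; the 256-byte buffer and the text "Untitled" are likewise just illustrative choices.

```python
def fake_get_text(buffer, max_count):
    """Hypothetical C-style API: copy a null-terminated string into the
    caller's buffer and return the number of characters copied."""
    title = b"Untitled"[: max_count - 1]
    buffer[: len(title) + 1] = title + b"\x00"
    return len(title)

buffer = bytearray(256)  # a preallocated buffer of 256 null characters
copied = fake_get_text(buffer, len(buffer))

# The length the caller started with is unchanged: still 256 characters.
whole = buffer.decode("latin-1")
print(len(whole))        # 256

# Trust the return value, or cut at the first null, rather than the length.
trimmed = whole[:copied]
also_trimmed = whole.split("\x00", 1)[0]
print(trimmed, also_trimmed)  # Untitled Untitled
```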

Under Windows NT

Under Windows NT, our string argument will go through the following machinations:

  1. The string is translated from a BSTR to an ABSTR by VB and passed to the function CharUpperA, which treats it as an LPSTR.
  2. This function translates the LPSTR to an LPWSTR and passes the LPWSTR to the Unicode entry point CharUpperW.
  3. The Unicode function CharUpperW processes the LPWSTR and produces an LPWSTR for output, returning it to CharUpperA.
  4. The function CharUpperA translates the LPWSTR back to an LPSTR and passes it to VB, which thinks of it as an ABSTR.
  5. VB translates the ABSTR back to a BSTR!
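The five steps can be traced in a few lines of Python. This is a model of the sequence just described, not of the actual Windows code: the three function names are hypothetical, Python's upper method stands in for the real case conversion, and Latin-1 stands in for the ANSI code page.

```python
ANSI = "latin-1"  # assumption: stands in for the ANSI code page

def char_upper_w(wide):
    """Model of the Unicode entry point: uppercase a Unicode character array."""
    return wide.decode("utf-16-le").upper().encode("utf-16-le")

def char_upper_a(ansi):
    """Model of the ANSI entry point under Windows NT (steps 2 through 4)."""
    wide = ansi.decode(ANSI).encode("utf-16-le")  # LPSTR -> LPWSTR
    wide = char_upper_w(wide)                     # the real work happens here
    return wide.decode("utf-16-le").encode(ANSI)  # LPWSTR -> LPSTR

def vb_call(s):
    """Model of VB's side of the call (steps 1 and 5)."""
    abstr = s.encode(ANSI)                        # BSTR -> ABSTR
    abstr = char_upper_a(abstr)
    return abstr.decode(ANSI)                     # ABSTR -> BSTR

print(vb_call("d:\\temp"))  # D:\TEMP
```

Counting the encode and decode pairs shows four translations in all, two of which merely undo the other two.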

A Unicode Entry Point Example

Under Windows NT, we can call the Unicode entry points and expect to get something meaningful in return. However, VB still makes the BSTR-to-ABSTR translations, and we must counteract this translation. Here is a call to the ANSI version, CharUpperA:

s = "d:\temp"
Debug.Print s
CharUpperA s
Debug.Print s

Under both Windows 9x and Windows NT, the outcome is as expected:

d:\temp
D:\TEMP

Under Windows NT, we might first attempt the Unicode version thusly:

s = "d:\temp"
Debug.Print s
CharUpperW s
Debug.Print s

but the result is:

d:\temp
d:\temp

Clearly, something is wrong. Incidentally, here is what the documentation says about errors in the CharUpper function.

"There is no indication of success or failure. Failure is rare. There is no extended error information for this function; do no [sic] call GetLastError."

Nonetheless, we know that the problem is that VB is making the BSTR-to-ABSTR translation. So let us try the following code:

s = "d:\temp"
Debug.Print s
s = StrConv(s, vbUnicode)
Debug.Print s
CharUpperW s
Debug.Print s
s = StrConv(s, vbFromUnicode)
Debug.Print s

The output is:

d:\temp
d : \ t e m p
D : \ T E M P 
D:\TEMP

What we are doing here is compensating for the shrinking of our BSTR to an ABSTR by expanding it first. Indeed, the first call to the StrConv function simply takes each byte in its operand and expands it to Unicode format. It doesn't know or care that the string is already in Unicode format.

Consider, for instance, the first Unicode character d. Its Unicode code is 0064 (in hex), which appears in memory as 64 00. Each byte is translated by StrConv to Unicode, which results in 0064 0000 (appearing in memory as 64 00 00 00). The effect is to put a null character between each Unicode character in the original Unicode string.

Now, in preparation for passing the string to CharUpperW, VB takes this expanded string and converts it from Unicode to ANSI, thus returning it to its original Unicode state. At this point, CharUpperW can make sense of it and do the conversion to uppercase. Once the converted string returns from CharUpperW, VB "translates" the result to Unicode, thus expanding it with embedded null characters. We must convert the result to ANSI to remove the supererogatory padding.
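The expand-then-shrink trick can be followed step by step in a byte-level model. The Python sketch below is only an illustration of the reasoning above: all four function names are hypothetical, Python's upper method stands in for CharUpperW, and Latin-1 stands in for the ANSI code page (adequate for the ASCII path used here).

```python
ANSI = "latin-1"  # assumption: stands in for the ANSI code page

def strconv_to_unicode(s):
    """Model StrConv(s, vbUnicode): widen every byte of the array."""
    return s.encode("utf-16-le").decode(ANSI)

def strconv_from_unicode(s):
    """Model StrConv(s, vbFromUnicode): narrow every character to one byte."""
    return s.encode(ANSI).decode("utf-16-le")

def char_upper_w(wide):
    """Model of the Unicode entry point."""
    return wide.decode("utf-16-le").upper().encode("utf-16-le")

def vb_call_w(s):
    """Model of VB calling the Unicode entry point: VB still converts the
    argument to ANSI on the way in and back to Unicode on the way out."""
    passed = s.encode(ANSI)
    returned = char_upper_w(passed)
    return returned.decode(ANSI)

s = "d:\\temp"
expanded = strconv_to_unicode(s)      # a null after every original character
upper = vb_call_w(expanded)           # VB's shrinking cancels our expansion
final = strconv_from_unicode(upper)   # remove the padding VB added on return

print(len(expanded))  # 14
print(final)          # D:\TEMP
```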

Passing Strings to the Win32 API

We can now discuss some of the practical aspects of string passing.

ByVal Versus ByRef

Some authors like to say that the ByVal keyword is overloaded for strings, meaning that it takes on a different meaning when applied to strings than when applied to other variables. Frankly, I don't see it. Writing:

ByVal str As String

tells VB to pass the contents of the BSTR (actually the ABSTR), which is the pointer to the character array. Thus, ByVal is acting normally--it just happens that the content of the BSTR is a pointer to another object, so this simulates a pass by reference. Similarly:

ByRef str As String

passes the address of the BSTR, as expected.

IN and OUT String Parameters

There are many API functions that require and/or return strings. Almost all of these functions deal with C-style strings, that is, LPSTRs or LPWSTRs. Some OLE-related functions do require BSTRs. By way of example, the following function is part of the Microsoft Web Publishing API. Note that it uses BSTRs. (Note also that the declaration is kind enough to tell us which parameters are IN parameters and which are OUT parameters. This is all too rare.)

HRESULT WpPostFile(
    [in]        LONG      hWnd,
    [in]        BSTR      bstrLocalPath,
    [in, out]   LONG *    plSiteNameBufLen,
    [in, out]   BSTR      bstrSiteName,
    [in, out]   LONG *    plDestURLBufLen,
    [in, out]   BSTR      bstrDestURL,
    [in]        LONG      lFlags,
    [out, retval]   LONG *   plRetCode
    );

In general, API functions that use strings can do so in three ways:

  • They can require a string as input in an IN parameter
  • They can return a string as output in an OUT parameter
  • They can do both, either in the same parameter or in separate parameters

To illustrate, Example 6-1 shows three API declarations.

Example 6-1: Three Example Declarations

// IN parameter example
HWND FindWindow(
  LPCTSTR lpClassName,  // pointer to class name
  LPCTSTR lpWindowName  // pointer to window name
);

// OUT parameter example
int GetWindowText(
   HWND hWnd,           // handle to window or control with text 
   LPTSTR lpString,     // address of buffer for text 
   int nMaxCount        // maximum number of characters to copy
);

// IN/OUT parameter example
LPTSTR CharUpper(
  LPTSTR lpsz           // single character or pointer to string
);

The FindWindow function returns a handle to a top-level window whose class name and/or window name matches specified strings. In this case, both parameters are IN parameters.

The GetWindowText function returns the text of a window's title bar in an OUT parameter lpString. It also returns the number of characters in the title as its return value.

The CharUpper function converts either a string or a single character to uppercase. When the argument is a string, the function converts the characters in the character array in place, that is, the parameter is IN/OUT.

How shall we convert these function declarations to VB?

We could simply replace each C-style string with a VB-style:

ByVal str As String 

declaration, which, as we know, is a BSTR data type. However, there are some caveats.

Dealing with IN Parameters

The first declaration in Example 6-1:

HWND FindWindow(
  LPCTSTR lpClassName,  // pointer to class name
  LPCTSTR lpWindowName  // pointer to window name
);

might be translated as follows:

Declare Function FindWindow Lib "user32" Alias "FindWindowA" ( _
   ByVal lpClassName As String, _
   ByVal lpWindowName As String _
) As Long

This works just fine. Since the FindWindow function does not alter the contents of the parameters (note the C in LPCTSTR), the BSTRs will be treated by Win32 as LPSTRs, which they are. In general, when dealing with a constant LPSTR, we can use a BSTR.

We should also note that FindWindow allows either of these string parameters to be omitted, in which case that parameter is set to a null. In Win32, a parameter that the programmer chooses not to supply is represented by a null pointer--that is, a pointer that contains the value 0. Of course, 0 is not a valid address, so a null pointer is a very special type of pointer and is treated in this way by Win32.

Fortunately, VB has the vbNullString keyword, which is a null BSTR (and so also a null LPWSTR). It can be used whenever a null string is desired (or required). Actually, this is not as trivial an issue as it might seem at first. Before the introduction of vbNullString into Visual Basic (I think with VB 4), we would need to do something like:

FindWindow(0&,. . .)

to simulate a null string for the first parameter. The problem is that VB would issue a type mismatch error, because a long 0 is not a string. The solution was to declare three separate aliases just to handle the two extra cases of null parameters. With the introduction of vbNullString, this annoyance went away.

To illustrate, in order to get the handle of the window with title "Microsoft Word - API.doc", we can write:

Dim sTitle As String
Dim hnd As Long
sTitle = "Microsoft Word - API.doc"
hnd = FindWindow(vbNullString, sTitle)

or more simply:

Dim hnd As Long
hnd = FindWindow(vbNullString, "Microsoft Word - API.doc")

Dealing with OUT Parameters

Now consider the second declaration in Example 6-1:

int GetWindowText(
   HWND hWnd,       // handle to window or control with text 
   LPTSTR lpString, // address of buffer for text 
   int nMaxCount    // maximum number of characters to copy
);

This might be translated to VB as follows:

Declare Function GetWindowText Lib "user32" Alias "GetWindowTextA" ( _
   ByVal hwnd As Long, _
   ByVal lpString As String, _
   ByVal cch As Long _
) As Long

An HWND is a long value, as is a C-style int (integer). In this case, the string parameter is an OUT parameter, meaning that the function is going to fill this string with something useful--in this case, the title of the window whose handle is in the hwnd parameter.

Here is an example of a call to this function:

Sub GetWindowTitle()

Dim sText As String
Dim hnd As Long
Dim cTitle As Integer
Dim lngS As Long, lngV As Long

' Allocate string buffer
sText = String$(256, vbNullChar)

' Save the BSTR and Unicode character array locations
lngV = VarPtr(sText)
lngS = StrPtr(sText)

' Search for window with a given class
hnd = FindWindow("ThunderRT5Form", vbNullString)

' If window found, get title
If hnd <> 0 Then
   cTitle = GetWindowText(hnd, sText, 255)
   sText = Left$(sText, cTitle)
   Debug.Print sText
   ' Compare the BSTR and character array locations
   ' to look for changes
   Debug.Print VarPtr(sText), lngV
   Debug.Print StrPtr(sText), lngS
Else
   Debug.Print "No window with this class name."
End If

End Sub

The output of one run is:

RunHelp - Unregistered Copy  -  Monday, December 7, 1998     10:11:53 AM
 1243480       1243480
 2165764       2012076

(Don't worry--this unregistered program is my own.)

We first allocate a string buffer for the window title. We will discuss this important point further in a moment. Then we use FindWindow to search for a window with class name ThunderRT5Form--a VB5 runtime form. If such a window is found, its handle is returned in the variable hnd. We can then call GetWindowText, passing it hnd as well as our text buffer sText and its size. Since the GetWindowText function returns the number of characters placed in the buffer, not including the terminating null, that is, the number of characters in the window title, we can use the Left function to extract just the title from the string buffer.

Note also that we have saved both the BSTR address (in lngV) and the character array address (in lngS ), so that we can compare these values to the same values after calling GetWindowText. Lo and behold, the BSTR has not moved, but its contents have changed, that is, the character array has moved, as we discussed earlier.

Incidentally, since the returned string is null terminated and contains no embedded nulls, the following function also extracts the portion of the buffer that contains the title. This little utility is generic, and I use it often (in this book as well as in my programs).

Public Function Trim0(sName As String) As String
   ' Right trim string at first null.
   Dim x As Long   ' InStr returns a Long
   x = InStr(sName, vbNullChar)
   If x > 0 Then Trim0 = Left$(sName, x - 1) Else Trim0 = sName
End Function

Getting back to the issue at hand, it is important to understand that, when OUT string parameters are involved, it is almost always our responsibility to set up a string buffer, that is, a BSTR that has enough space allocated to hold the data that will be placed in it by the API function. Most Win32 API functions do not create strings--they merely fill strings created by the caller. It is not enough simply to declare:

Dim sText As String

We must allocate space, as in:

sText = String$(256, vbNullChar)

Thus, it is important to remember:

When dealing with OUT string parameters, be sure to allocate a string buffer of sufficient size.

Note that in some cases, such as GetWindowText, the function provides an IN parameter for specifying the size of the buffer. This is actually a courtesy to us, in the sense that the function agrees not to place more characters in the buffer than we specify as the size of the buffer. (I often give the buffer an extra character that the function doesn't know about. Usually, the function includes the terminating null in its reckoning, but why take chances?)
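The courtesy just described can be sketched in C. The function bounded_copy below is a hypothetical stand-in, not a Win32 function; it merely mimics the contract we have been discussing, assuming that the size parameter counts the terminating null and that the return value does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an API such as GetWindowText: copies at most
   nMaxCount - 1 characters of src into buf, always null-terminates, and
   returns the number of characters copied (terminating null excluded). */
int bounded_copy(const char *src, char *buf, int nMaxCount)
{
    assert(src != NULL && buf != NULL);
    if (nMaxCount <= 0)
        return 0;                      /* no room, buffer left untouched */

    size_t n = strlen(src);
    if (n > (size_t)(nMaxCount - 1))
        n = (size_t)(nMaxCount - 1);   /* truncate to fit the caller's buffer */

    memcpy(buf, src, n);
    buf[n] = '\0';
    return (int)n;
}
```

With an 8-byte buffer, copying "Microsoft Word" under this contract stores the 7 characters "Microso" plus the null and returns 7, so the caller's memory beyond the buffer is never touched.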

Note that there are other cases in which no such courtesy is extended, so we must be careful.

Consider the case of SendMessage, for example. Here is part of what the Win32 documentation says about the LB_GETTEXT message, which can be used to retrieve the text of an item in a list box.

An application sends an LB_GETTEXT message to retrieve a string from a listbox.
wParam = (WPARAM) index;                // item index [0-based]
lParam = (LPARAM) (LPCTSTR) lpszBuffer; // address of buffer 

[The parameter lpszBuffer is a] pointer to the buffer that will receive the string. The buffer must have sufficient space for the string and a terminating null character. An LB_GETTEXTLEN message can be sent before the LB_GETTEXT message to retrieve the length, in characters, of the string.

Thus, in this case, there is no IN parameter to act as a safety net. If we fail to allocate sufficient space in the buffer, the function will write over the end of our buffer, into unknown memory. If we are lucky, this will crash the program. If we are not lucky, it will overwrite some other data, possibly resulting in logical errors in our program, or crashing a client's program!

However, in this case Windows is not completely devoid of compassion. It does provide the LB_GETTEXTLEN message for us to use to first retrieve the length of the item in question. With this value, we can allocate a sufficiently capacious buffer. Example 6-2 shows some sample code. This code extracts the items from a listbox (which might belong to some other application) and places them in our listbox lstMain. We will expand this example considerably in Chapter 16, Windows Messages. Note the use of two different forms of the SendMessage function.

Example 6-2: Using LB_GETTEXT

Public Sub ExtractFromListBox(hControl As Long)

Dim cItems As Integer
Dim i As Integer
Dim sBuf As String
Dim cBuf As Long
Dim lResp As Long

' Get item count from control
cItems = SendMessageByLong(hControl, LB_GETCOUNT, 0&, 0&)

If cItems <= 0 Then Exit Sub

' Put items into list box
For i = 0 To cItems - 1
   
   ' Get length of item
   cBuf = SendMessageByString(hControl, LB_GETTEXTLEN, CLng(i), vbNullString)
   
   ' Allocate buffer to hold item
   sBuf = String$(cBuf + 1, " ")
   
   ' Send message to get item
   lResp = SendMessageByString(hControl, LB_GETTEXT, CLng(i), sBuf)
   
   ' Add item to local list box
   If lResp > 0 Then
      Form1.lstMain.AddItem Left$(sBuf, lResp)
   End If

Next i

Form1.lstMain.Refresh
 
End Sub

An IN/OUT Parameter Example--Watching Out for As Any

Consider now the third and final function in Example 6-1:

LPTSTR CharUpper(
  LPTSTR lpsz   // single character or pointer to string
);

One problem here is that, despite the declaration of lpsz as an LPTSTR, the function allows the parameter to be filled with a non-LPTSTR. To wit, the documentation states that the lpsz parameter is a:

Pointer to a null-terminated string or specifies a single character. If the high-order word of this parameter is zero, the low-order word must contain a single character to be converted.

For use with string input, we can translate this into VB as:

Declare Function CharUpperForString Lib "user32" Alias "CharUpperA" ( _
   ByVal lpsz As String _
) As Long

This will generally work, as in:

' Convert string
str = "help"
Debug.Print StrPtr(str)
Debug.Print CharUpperForString(str)
Debug.Print str

whose output is:

 1896580 
 1980916 
HELP

Let us pause for a moment to inspect this output. The CharUpper documentation also states:

"If the operand is a character string, the function returns a pointer to the converted string. Since the string is converted in place, the return value is equal to lpsz."

On the other hand, the two addresses StrPtr(str) (which is the address of the character array) and CharUpperForString(str) seem to be different. But remember the BSTR-to-ABSTR translation issue. Our string str undergoes a translation to a temporary ABSTR string at another location. This string is passed to the CharUpper function, which then changes the string (uppercases it) and also returns the location of the ABSTR string. Now, VB translates the ABSTR back to our BSTR, but it knows nothing about the fact that the return value represents the location of the temporary ABSTR, so it returns the address of that string!

We can confirm this further by calling the Unicode entry point, just as we did in an earlier example. The following declaration and code:

Declare Function CharUpperWide Lib "user32" Alias "CharUpperW" ( _
   ByVal lpsz As Long _
) As Long
 
' Construct an LPWSTR
s = "help"
lng = StrPtr(s)
Debug.Print lng
Debug.Print CharUpperWide(lng)
Debug.Print s

returns:

 1980916 
 1980916 
HELP

Now the two addresses are the same, since no translation occurs!

For dealing with characters, we can make the following declaration:

Declare Function CharUpperForChar Lib "user32" Alias "CharUpperA" ( _
   ByVal lpsz As Long _
) As Long

For instance, calling:

Debug.Print Chr(CharUpperForChar(CLng(Asc("a"))))

returns an uppercase A.

You might think we could combine the two declarations by using As Any:

Declare Function CharUpperAsAny Lib "user32" Alias "CharUpperA" ( _
   ByVal lpsz As Any _
) As Long

The following code works:

s = "help"
Debug.Print StrPtr(s)
Debug.Print CharUpperAsAny(s)
Debug.Print s

as does:

Debug.Print Chr(CharUpperAsAny(CLng(Asc("a"))))

and:

Debug.Print Chr(CharUpperAsAny(97&))

(which returns the uppercase letter A.) However, the following code crashes my computer:

Debug.Print CharUpperAsAny(&H11000)

The problem is that the CharUpper function sees that the upper word of &H11000 is nonzero, so it assumes that the value is an address. But this is fatal. Who knows what is at address &H11000? In my case, it is protected memory.
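The rule quoted from the documentation can be written down in a few lines of C. This is only a sketch of the documented test on a 32-bit argument; the function names are hypothetical, and we are not claiming that this is how CharUpper is actually implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the documented rule for a 32-bit lpsz argument: a zero
   high-order word means "single character"; anything else is taken
   to be an address. The function name is hypothetical. */
int is_single_character(uint32_t lpsz)
{
    return (lpsz >> 16) == 0;
}

/* Character code carried in the low-order word. Only meaningful when
   the rule above says the argument is a single character. */
uint16_t char_code(uint32_t lpsz)
{
    assert(is_single_character(lpsz));
    return (uint16_t)(lpsz & 0xFFFFu);
}
```

For the value 97 the high-order word is 0, so the argument is read as the character a. For &H11000 the high-order word is 1, so the argument is read as an address, which is exactly the case that gets us into trouble.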

What Happened to My Pointer?

There is another, much more insidious problem that can arise in connection with passing strings to API functions. As we can see from the CharUpper case, the API occasionally uses a single parameter to hold multiple data types (at different times, of course). Imagine the following hypothetical circumstance.

A certain API function has declaration:

LPTSTR WatchOut(
   int nFlags,   // flags
   LPTSTR lpsz   // pointer to string or length as a long
);

The documentation says that if nFlags has value WO_TEXT (a symbolic constant defined somewhere), then lpsz will receive an LPTSTR string (pointer to a character array), but if nFlags has value WO_LENGTH, then lpsz gets the length of the string, as a long.

Now, if we make the VB declaration:

Declare Function WatchOut Lib "whatever" ( _
   ByVal nFlags As Long, _
   ByVal lpsz As String _
) As Long

we can get into real trouble. In particular, if we set nFlags equal to WO_LENGTH, then the following events take place under Windows 9x:

  1. We create an initial BSTR string buffer for lpsz (see Figure 6-8), say:
    Dim str As String
    str = String$(256, vbNullChar)

    Figure 6-8. The initial BSTR

  2. VB creates a temporary ABSTR to pass to WatchOut, as shown in Figure 6-9.

    Figure 6-9. Creating a temporary ABSTR

  3. As Figure 6-10 shows, because nFlags = WO_LENGTH, WatchOut changes the pointer, not the character array!

    Figure 6-10. The resulting broken pointer

  4. VB tries to translate what it thinks is an ANSI character array at address zzzz of length ????. This is a disaster.

Under Windows NT, the WatchOut function changes the original BSTR pointer (instead of an ANSI copy), but this will have the same disastrous effects. Note that even if we somehow are unlucky enough to escape a crash when VB tries to translate the fraudulent ABSTR, the result will be garbage, the program may crash after we send it to our customers, and there is still the matter of the dangling string, whose memory will not be recovered until the program terminates. This is called a memory leak.

The problem can be summarized quite simply: occasionally an API function will change a string pointer (not the string itself) to a numeric value. But VB still thinks it has a pointer. This spells disaster. In addition, testing to see whether the contents of the BSTR pointer variable have changed doesn't solve the problem, because as we have seen (Figure 6-8), VB sometimes changes the pointer to point to a legitimate character array!

As it happens, the situation described earlier can occur. Here is an important example, which we will play with at the end of the chapter.

The GetMenuItemInfo function retrieves information about a Windows menu item. Its declaration is:

BOOL GetMenuItemInfo(
  HMENU hMenu,            // handle of menu
  UINT uItem,             // indicates which item to look at
  BOOL fByPosition,       // used with uItem
  MENUITEMINFO *lpmii     // pointer to structure (see discussion)
);

where, in particular, the parameter lpmii is a pointer to a MENUITEMINFO structure that will be filled in by GetMenuItemInfo. This structure is:

typedef struct tagMENUITEMINFO {
   UINT cbSize; 
   UINT fMask; 
   UINT fType; 
   UINT fState; 
   UINT wID; 
   HMENU hSubMenu; 
   HBITMAP hbmpChecked; 
   HBITMAP hbmpUnchecked; 
   DWORD dwItemData; 
   LPTSTR dwTypeData; 
   UINT cch; 
} MENUITEMINFO;

Note that the penultimate member is an LPTSTR.

Now, the rpiAPIData application on the accompanying CD will automatically translate this to a VB user-defined type, replacing all C data types in this case by VB longs:

Public Type MENUITEMINFO  
   cbSize  As Long           '//UINT
   fMask  As Long            '//UINT
   fType  As Long            '//UINT
   fState  As Long           '//UINT
   wID  As Long              '//UINT
   hSubMenu  As Long         '//HMENU
   hbmpChecked  As Long      '//HBITMAP
   hbmpUnchecked  As Long    '//HBITMAP
   dwItemData  As Long       '//DWORD
   dwTypeData  As Long       '//LPTSTR
   cch  As Long              '//UINT
End Type

Suppose instead that the LPTSTR was translated into a VB string:

dwTypeData  As String        '//LPTSTR

According to the documentation for MENUITEMINFO, if we set the fMask parameter to MIIM_TYPE, allocate a suitable string buffer in dwTypeData, and place its length in cch, then the GetMenuItemInfo function will retrieve the type of the menu item into fType (and adjust the value of cch). If this type is MFT_TEXT, then the string buffer will be filled with the text of that menu item. However, and this is the problem, if the type is MFT_BITMAP, then the low-order word of dwTypeData gets the bitmap's handle (and cch is ignored).

Thus, GetMenuItemInfo may change dwTypeData from an LPTSTR to a bitmap handle! This is exactly the problem we described earlier. We will consider an actual example of this later in the chapter. Keep in mind also that even if the type is MFT_TEXT, the dwTypeData pointer may be changed to point to a different character buffer.

So if we shouldn't use a string variable for dwTypeData, what should we do?

The answer is that we should create our own character array by declaring a byte array and pass a pointer to that array. In other words, we create our own LPSTR. Then VB will not try to interpret it as a VB string.

This even solves the orphaned array problem, for if the API function changes our LPSTR to a numeric value (like a bitmap handle), we still retain a reference to the byte array (we had to create it somehow), so we can deallocate the memory ourselves (or it will be deallocated when the byte array variable goes out of scope).
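The idea of keeping our own reference to the buffer can be modeled in portable C. Everything in this sketch is hypothetical: item_info is only loosely modeled on the dwTypeData situation, and fill_item is a stand-in for an API function that either fills the buffer or overwrites the pointer field with a handle.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical structure with a dual-purpose field: on input a buffer
   address; on output either unchanged (text case) or replaced by a
   handle (bitmap case). */
struct item_info {
    int       is_text;     /* set by fill_item: 1 = text, 0 = bitmap */
    uintptr_t type_data;   /* buffer address in, address or handle out */
    unsigned  cch;         /* buffer size in characters */
};

/* Hypothetical API stand-in, not a Win32 function. */
void fill_item(struct item_info *info, const char *text, uintptr_t bitmap)
{
    assert(info != NULL && info->cch > 0);
    if (text != NULL) {
        char *buf = (char *)info->type_data;
        strncpy(buf, text, info->cch - 1);
        buf[info->cch - 1] = '\0';
        info->is_text = 1;
    } else {
        info->type_data = bitmap;   /* the pointer is gone from the struct */
        info->is_text = 0;
    }
}

/* Caller pattern: keep a separate reference to the buffer, compare after
   the call, and restore the field if it was overwritten.
   Returns 1 if buf holds text, 0 if the field came back as a handle. */
int query_item(struct item_info *info, char *buf, unsigned size,
               const char *text, uintptr_t bitmap)
{
    uintptr_t saved = (uintptr_t)buf;
    info->type_data = saved;
    info->cch = size;
    fill_item(info, text, bitmap);
    if (info->type_data != saved) {
        info->type_data = saved;    /* restore for the next call */
        return 0;
    }
    return 1;
}
```

Because the caller owns buf independently of the structure, nothing is orphaned when the field is overwritten, and the same buffer can be reused for the next call.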

Before getting into a discussion of byte arrays and looking at an example, let us summarize:

Occasionally an API function will change an LPSTR to a numeric value. But VB will still think it has a string. This spells disaster. Moreover, testing to see whether the contents of the BSTR pointer variable have changed doesn't help because VB sometimes changes the original BSTR to point to a legitimate character array. Hence, if there is a chance that this might happen, you should create your own LPSTR using a byte array and use it in place of the BSTR. For safety, you may want to do this routinely when the string is embedded within a structure.

The last point made in the caveat is worth elaborating. Oftentimes an API function parameter refers to a structure, whose members may be other structures, whose members may, in turn, be other structures. This structure nesting can get quite involved. We will see an example when we create our DLL Export Table application. This makes it very difficult to keep track of what the API function might be doing to all of the structure members. The safest thing to do is to always use pointers to byte arrays (that is, LPSTRs) and avoid BSTRs completely when dealing with strings embedded in structures.

Strings and Byte Arrays

Of course, a byte array is just an array whose members have type byte, for instance:

Dim b(1 To 100) As Byte

To get a pointer to this byte array, we can use VarPtr:

Dim lpsz As Long

lpsz = VarPtr(b(1))   ' or rpiVarPtr(b(1))

(Even though it doesn't seem so, the letters lpsz stand for long pointer to null-terminated string.) Note that the address of the first member of the array is the address of the array.

Remembering that an LPSTR is a pointer to a null-terminated character array, we should initialize the array to nulls:

For i = 1 To 100
   b(i) = 0
Next

(It is true that VB does its own initialization, but it is not good programming practice to rely on this.)

Translating Between Byte Arrays and BSTRs

To copy a BSTR:

Dim s As String

to a byte array, we can proceed in a couple of different ways. For a strictly VB solution, we have:

s = "help"
Dim b(1 To 8) As Byte
For i = 1 To 8
   b(i) = AscB(MidB(s, i))
Next

Another approach is:

s = "help"
Dim b(1 To 8) As Byte
CopyMemory b(1), ByVal StrPtr(s), LenB(s)

Note that (in both cases) we get:

104  0  101  0  108  0  112  0 

showing that the bytes are reversed in each Unicode integer.
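The same byte layout can be reproduced in C, which makes the ordering explicit. This is a minimal sketch that assumes plain ASCII input (so that every high-order byte is 0); the function name ascii_to_utf16le is our own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes the little-endian 2-byte Unicode form of an ASCII string into out
   (low-order byte first, then a zero high-order byte), without a terminator.
   Returns the number of bytes written; out must hold 2 * strlen(ascii) bytes. */
size_t ascii_to_utf16le(const char *ascii, unsigned char *out)
{
    assert(ascii != NULL && out != NULL);
    size_t n = strlen(ascii);
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (unsigned char)ascii[i];  /* low-order byte */
        out[2 * i + 1] = 0;                        /* high-order byte */
    }
    return 2 * n;
}
```

For the string "help" this writes the eight bytes 104 0 101 0 108 0 112 0, matching the contents of the VB byte array.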

In the other direction, to copy a byte array into a BSTR, VB gives us some help. If b is a Unicode byte array, we can just write:

Dim t As String
t = b

For an ANSI byte array b, we write:

Dim t As String
t = StrConv(b, vbUnicode)

Note, however, that the StrConv function does not recognize a null terminator in the byte array--it will translate the entire array. Any nulls that are encountered in the array become embedded nulls in the BSTR.

Translating Between BSTRs and LPTSTRs

Let us consider how to translate back and forth between BSTRs and LPTSTRs.

From BSTR to LPWSTR

Getting a BSTR into a Unicode byte array is conceptually easy, because the character array of the BSTR is a Unicode byte array, so all we need to do is copy the bytes one by one. Here is a function to translate BSTRs to LPWSTRs:

Function BSTRtoLPWSTR(sBSTR As String, b() As Byte, lpwsz As Long) As Long

' Input: a nonempty BSTR string
' Input: **undimensioned** byte array b()
' Output: Fills byte array b() with Unicode char string from sBSTR
' Output: Fills lpwsz with a pointer to b() array
' Returns byte count, not including terminating 2-byte Unicode null character
' Original BSTR is not affected

Dim cBytes As Long

cBytes = LenB(sBSTR)

' ReDim array, with space for terminating null
ReDim b(1 To cBytes + 2) As Byte

' Point to BSTR char array
lpwsz = StrPtr(sBSTR)

' Copy the array
CopyMemory b(1), ByVal lpwsz, cBytes + 2

' Point lpwsz to new array
lpwsz = VarPtr(b(1))

' Return byte count
BSTRtoLPWSTR = cBytes

End Function

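This function takes a BSTR, an undimensioned byte array, and a long variable lpwsz. It fills the byte array with a copy of the BSTR's character array and sets lpwsz to point to that copy, thereby making it an LPWSTR. It returns the byte count as the return value of the function. Here is an example: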

Dim b() As Byte
Dim lpsz As Long, lng As Long
lng = BSTRToLPWSTR("here", b, lpsz)

It might have occurred to you to simply copy the contents of the BSTR to the contents of lpsz:

lpsz = StrPtr(sBSTR)

The problem is that now we have two pointers to the same character array--a dangerous situation because VB does not realize this and might deallocate the array.

From BSTR to LPSTR

The function to convert a BSTR to an LPSTR is similar, but requires a translation from Unicode to ANSI first:

Function BSTRtoLPSTR(sBSTR As String, b() As Byte, lpsz As Long) As Long

' Input: a nonempty BSTR string
' Input: **undimensioned** byte array b()
' Output: Fills byte array b() with ANSI char string
' Output: Fills lpsz with a pointer to b() array
' Returns byte count, not including terminating null
' Original BSTR is not affected

Dim cBytes As Long
Dim sABSTR As String

' Convert to ANSI first, so that the byte count is the ANSI count
sABSTR = StrConv(sBSTR, vbFromUnicode)
cBytes = LenB(sABSTR)

' ReDim array, with space for terminating null
ReDim b(1 To cBytes + 1) As Byte

' Point to ABSTR char array
lpsz = StrPtr(sABSTR)

' Copy the array, including the terminating null
CopyMemory b(1), ByVal lpsz, cBytes + 1

' Point lpsz to new array
lpsz = VarPtr(b(1))

' Return byte count
BSTRtoLPSTR = cBytes

End Function

From LPWSTR to BSTR

On return from an API call, you may have an LPWSTR, that is, a pointer to a null-terminated Unicode character array. Visual Basic makes it easy to get a BSTR from a byte array--just make an assignment using the equal sign. However, VB doesn't know how to handle a pointer to a byte array.

Here is a little utility:

Function LPWSTRtoBSTR(ByVal lpwsz As Long) As String

' Input: a valid LPWSTR pointer lpwsz
' Return: a sBSTR with the same character array

Dim cChars As Long

' Get number of characters in lpwsz
cChars = lstrlenW(lpwsz)

' Initialize string
LPWSTRtoBSTR = String$(cChars, 0)

' Copy string
CopyMemory ByVal StrPtr(LPWSTRtoBSTR), ByVal lpwsz, cChars * 2

End Function

From LPSTR to BSTR

We can modify the previous utility to return a BSTR from an LPSTR as follows (recall that Trim0 just truncates a string at the first null character):

Function LPSTRtoBSTR(ByVal lpsz As Long) As String

' Input: a valid LPSTR pointer lpsz
' Output: a sBSTR with the same character array

Dim cChars As Long

' Get number of characters in lpsz
cChars = lstrlenA(lpsz)

' Initialize string
LPSTRtoBSTR = String$(cChars, 0)

' Copy string
CopyMemory ByVal StrPtr(LPSTRtoBSTR), ByVal lpsz, cChars

' Convert to Unicode
LPSTRtoBSTR = Trim0(StrConv(LPSTRtoBSTR, vbUnicode))

End Function

Example: Using Byte Arrays

Let us demonstrate the use of byte arrays with a simple example using CharUpper. We have seen that this function is declared as:

LPTSTR CharUpper(
  LPTSTR lpsz         // single character or pointer to string
);

This leads to two reasonable translations into VB:

Declare Function CharUpperByBSTR Lib "user32" Alias "CharUpperA" ( _
   ByVal s As String _
) As Long

or:

Declare Function CharUpperByLPSTR Lib "user32" Alias "CharUpperA" ( _
   ByVal lpsz As Long _
) As Long

We have seen the first form in action, so let us try the second form.

The following code first converts a BSTR to an LPSTR. Note that we should not convert to LPWSTR, since LPWSTRs are passed to CharUpperA without translation by VB; and if we passed an LPWSTR, then as soon as CharUpperA encountered the null byte that is part of the first Unicode character in the LPWSTR, it would think the string had ended. Thus, it would capitalize only the first character in the string.
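We can see the effect of that first null byte without calling the API at all. The following C sketch uses strlen as a stand-in for any byte-oriented (ANSI) routine that stops at the first zero byte; the two arrays are simply the Unicode and ANSI forms of "help" written out by hand, and the names are our own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "help" as a null-terminated Unicode (2 bytes per character) array
   and as a null-terminated ANSI array. */
static const unsigned char WIDE_HELP[] =
    {0x68, 0, 0x65, 0, 0x6C, 0, 0x70, 0, 0, 0};
static const unsigned char ANSI_HELP[] =
    {0x68, 0x65, 0x6C, 0x70, 0};

/* Length that a byte-oriented (ANSI) routine would see: it stops at the
   first zero byte, exactly as strlen does. */
size_t ansi_view_length(const unsigned char *bytes)
{
    assert(bytes != NULL);
    return strlen((const char *)bytes);
}
```

Here ansi_view_length(WIDE_HELP) is 1, whereas ansi_view_length(ANSI_HELP) is 4, which is why an ANSI entry point handed a Unicode array would process only the first character.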

The LPSTR is then passed to CharUpperA, which converts it to uppercase. Having saved the LPSTR pointer, we can check to see if it has been changed. If not, we translate the LPSTR back to a BSTR and print it. If the pointer is changed, then we must deallocate the byte array ourselves (or just let the array variable pass out of scope).

Of course, in this simple example, the pointer should not be changed by CharUpper. Nevertheless, this same procedure will deal with API functions that may change the pointer:

Public Sub CharUpperText

Dim lpsz As Long
Dim lpszOrg As Long
Dim sBSTR As String
Dim b() As Byte

sBSTR = "help"

' Convert BSTR to LPSTR.
BSTRtoLPSTR sBSTR, b, lpsz

' Save LPSTR to check for modification by API function
lpszOrg = lpsz

' Convert to upper case
CharUpperByLPSTR lpsz

' If pointer not modified, then convert back to BSTR
' and print
If lpszOrg = lpsz Then
   Debug.Print LPSTRtoBSTR(lpsz)
Else
   Erase b
   ' Use new value of lpsz if desired...
End If

End Sub

Example: Windows Menus

Let us turn to the example involving GetMenuItemInfo that we promised earlier. Recall that the GetMenuItemInfo function retrieves information about a Windows menu item. Its VB declaration is:

Declare Function GetMenuItemInfo Lib "user32" Alias "GetMenuItemInfoA" ( _
   ByVal hMenu As Long, _
   ByVal uItem As Long, _
   ByVal lByPos As Long, _
   ByRef lpMenuItemInfo As MENUITEMINFO _
) As Long

where, in particular, the parameter lpMenuItemInfo is a pointer to a MENUITEMINFO structure that will be filled in by GetMenuItemInfo. This structure is:

Public Type MENUITEMINFO  
   cbSize  As Long
   fMask  As Long
   fType  As Long
   fState  As Long
   wID  As Long
   hSubMenu  As Long
   hbmpChecked  As Long
   hbmpUnchecked  As Long 
   dwItemData  As Long
   dwTypeData  As Long
   cch  As Long
End Type

According to the documentation, if we set the fMask parameter to MIIM_TYPE, allocate a suitable string buffer in dwTypeData, and place its length in cch, then the GetMenuItemInfo function will retrieve the type of the menu item into fType. If this type is MFT_TEXT, then the string buffer will be filled with the text of that menu item. However, and this is the problem, if the type is MFT_BITMAP, then the low-order word of dwTypeData gets the bitmap's handle. Thus, in this case, GetMenuItemInfo will change dwTypeData from an LPTSTR to a bitmap handle!

Figure 6-11 shows a menu with a bitmap. We will discuss how to create such a menu in VB using the Win32 API in Chapter 21, Bitmaps.

Figure 6-11. A menu with a bitmap

Example 6-3 shows the code used to get the text for each of the items in this menu.

Example 6-3: Getting Menu Text

Public Sub GetMenuInfoExample

Const MIIM_TYPE = &H10      ' from WINUSER.H
Dim uMenuItemInfo As MENUITEMINFO
Dim bBuf(1 To 50) As Byte
Dim sText As String

' Initialize structure
uMenuItemInfo.cbSize = LenB(uMenuItemInfo)
uMenuItemInfo.fMask = MIIM_TYPE
uMenuItemInfo.dwTypeData = VarPtr(bBuf(1))
uMenuItemInfo.cch = 49

' Get menu text
For i = 0 To 2
   ' Must reset count each time
   uMenuItemInfo.cch = 49  

   ' Get TypeData before
   Debug.Print "Before:" & uMenuItemInfo.dwTypeData

   ' Call API
   lng = GetMenuItemInfo(hSubMenu, CLng(i), -1, uMenuItemInfo)

   ' Get TypeData after
   Debug.Print "After:" & uMenuItemInfo.dwTypeData

   ' Print text -- CAREFUL HERE
'   sText = StrConv(bBuf, vbUnicode)
'   Debug.Print sText

Next

End Sub

Here is what happens as this code executes.

The first loop (i = 0) presents no problems, and the output is:

Before:1479560
After:1479560

Observe that the buffer pointer has not changed. Hence, the commented code that prints the menu text would run without error.

The second loop also runs without error (as long as the statements involving sText are commented out). The output, however, is:

Before:1479560
After:3137668829

As the documentation suggests, GetMenuItemInfo returns the bitmap's handle in uMenuItemInfo.dwTypeData. Thus, we have lost the pointer to the buffer bBuf. On the third loop, the program will crash, because the third call to GetMenuItemInfo will try to write the menu text for the third item to an imaginary buffer at address 3137668829 = &HBB0506DD. If this memory is protected (as it probably is), you will get a message similar to the one I got in Figure 6-12.

Figure 6-12. Whoops

Note that if we uncomment the lines of code that print the menu text, the code will probably crash when we come to these lines during the second loop.

To fix this code, we need to pay attention to when the pointer changes and correct the problem, as in Example 6-4.

Example 6-4: A Corrected Version of Example 6-3

Public Sub GetMenuInfoExample

Dim uMenuItemInfo As MENUITEMINFO
Dim bBuf(1 To 50) As Byte
Dim sText As String
Dim lPointer As Long

' Initialize structure
uMenuItemInfo.cbSize = LenB(uMenuItemInfo)
uMenuItemInfo.fMask = MIIM_TYPE
uMenuItemInfo.dwTypeData = VarPtr(bBuf(1))
uMenuItemInfo.cch = 49

' Get menu text
For i = 0 To 2
   ' Must reset count each time
   uMenuItemInfo.cch = 49
   
   ' Save buffer pointer
   lPointer = uMenuItemInfo.dwTypeData

   Debug.Print "Before:" & uMenuItemInfo.dwTypeData

   ' Call API
   lng = GetMenuItemInfo(hSubMenu, CLng(i), -1, uMenuItemInfo)

   Debug.Print "After:" & uMenuItemInfo.dwTypeData
   
   ' Check for pointer change
   If lPointer <> uMenuItemInfo.dwTypeData Then
      Debug.Print "Bitmap!"
      ' Restore pointer
      uMenuItemInfo.dwTypeData = lPointer
   Else
      ' Print text
      sText = StrConv(bBuf, vbUnicode)
      Debug.Print sText
   End If
   
Next

End Sub

The output is:

Before:1760168
After:1760168
Test1
Before:1760168
After:1443168935
Bitmap!
Before:1760168
After:1760168
Test3

Note that if we had declared uMenuItemInfo.dwTypeData of type String, then as soon as GetMenuItemInfo changed the pointer to the bitmap handle, VB would think it had a character array at that location. We can't even watch out for this and reset the pointer, because the change might have been legitimate.

The previous discussion and the previous example have shown that we need to be very careful about BSTRs. In short, there are two issues that must be addressed:

  • A BSTR undergoes a BSTR-to-ABSTR translation when passed to an external function.
  • A BSTR may have its value changed to a non-BSTR value (such as a handle or length) by an external function.

Note that these issues must be addressed even when a BSTR is embedded in a structure.

In any case, the translation issue is generally not a problem, since VB does the reverse translation on the return value. However, the other issue can be a fatal problem. The only way to avoid it completely is to manually replace any BSTRs by LPSTRs, using a byte array.

Getting the Address of a Variable of User-Defined Type

An API programmer often needs to get the address of a variable of user-defined type. Consider, for example, the structure:

Type utExample
   sString As String
   iInteger As Integer
End Type

Dim uEx As utExample

Suppose we want to find the address of the variable uEx. First, note that the address of a structure variable is the same as the address of its first member.

Now consider the following code:

Debug.Print VarPtr(uEx)
Debug.Print VarPtr(uEx.sString)
Debug.Print VarPtr(uEx.iInteger)
Debug.Print
Debug.Print rpiVarPtr(uEx)
Debug.Print rpiVarPtr(uEx.sString)
Debug.Print rpiVarPtr(uEx.iInteger)

whose output is as follows.

 1243836 
 1243836 
 1243840 

 1243824 
 1243820 
 1243840
 

As you can see, VarPtr reports the address as you would expect: the address of uEx is the same as the address of uEx.sString, and the address of uEx.iInteger is 4 bytes larger, to account for the 4-byte BSTR.

On the other hand, the rpiVarPtr is susceptible to BSTR-to-ABSTR translation, which occurs on the member of the structure that is a BSTR.

The relationship between the first and second address in the second group may look strange until we remember that each call to rpiVarPtr produces a translation, so we cannot compare addresses from two separate calls, both of which involve translations!

On the other hand, the third address is the address of the original integer member. There is no translation in the call:

Debug.Print rpiVarPtr(uEx.iInteger)

because there are no BSTR parameters. Thus, we can use an external function such as rpiVarPtr to compute the address of a structure provided the structure has at least one non-BSTR member. In this event, we get the address of one such member and count backwards to the beginning of the structure.
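Counting backwards from a member is ordinary offset arithmetic, and a short C sketch may make it concrete. The structure below is only a model of utExample: we assume the 32-bit VB layout, in which the BSTR member is a 4-byte pointer, and represent it by a uint32_t so that the layout is the same on any host. The names are our own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C model of utExample on 32-bit VB: the BSTR member is a 4-byte pointer,
   represented here by a uint32_t. */
struct ut_example {
    uint32_t sString;   /* stands in for the 4-byte BSTR */
    int16_t  iInteger;
};

/* Given the address of the iInteger member, count backwards to the
   beginning of the structure. */
struct ut_example *struct_from_integer_member(int16_t *pInteger)
{
    assert(pInteger != NULL);
    return (struct ut_example *)
        ((char *)pInteger - offsetof(struct ut_example, iInteger));
}
```

In this model the offset of iInteger is 4, which agrees with the output above: the address reported for uEx.iInteger, 1243840, less 4 gives 1243836, the address that VarPtr reports for uEx.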
