Visual Basic Concepts

DBCS String Manipulation Functions

Although a double-byte character consists of a lead byte and a trail byte and requires two consecutive storage bytes, it must be treated as a single unit in any operation involving characters and strings. Several string manipulation functions properly handle all strings, including DBCS characters, on a character basis.

Each of these functions has an ANSI/DBCS version plus a binary version, a Unicode version, or both, as shown in the following table. Choose the version that matches the purpose of the string manipulation.

The "B" versions of the functions in the following table are intended especially for use with strings of binary data. The "W" versions are intended for use with Unicode strings.

Function        Description
Asc             Returns the ANSI or DBCS character code for the first character of a string.
AscB            Returns the value of the first byte in the given string containing binary data.
AscW            Returns the Unicode character code for the first character of a string.
Chr             Returns a string containing a specific ANSI or DBCS character code.
ChrB            Returns a binary string containing a specific byte.
ChrW            Returns a string containing a specific Unicode character code.
Input           Returns a specified number of ANSI or DBCS characters from a file.
InputB          Returns a specified number of bytes from a file.
InStr           Returns the position of the first occurrence of one string within another.
InStrB          Returns the position of the first occurrence of a byte in a binary string.
Left, Right     Returns a specified number of characters from the left or right side of a string.
LeftB, RightB   Returns a specified number of bytes from the left or right side of a binary string.
Len             Returns the length of the string in number of characters.
LenB            Returns the length of the string in number of bytes.
Mid             Returns a specified number of characters from a string.
MidB            Returns a specified number of bytes from a binary string.

The functions without a "B" or "W" suffix in this table handle both DBCS and ANSI characters correctly. In addition to the functions above, the String function handles DBCS characters. All of these functions treat a DBCS character as one character, even though it occupies two bytes.
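As a rough illustration of how the three variants differ, the following sketch reads the first character of a string with AscW, AscB, and Asc. The procedure name ShowCharCodes is hypothetical, and the sketch assumes 32-bit Visual Basic, where strings are held internally as Unicode with the low byte first. The Asc result depends on the system code page, so no value is shown for it.

```vb
' Hypothetical helper, for illustration only.
Sub ShowCharCodes()
    Dim s As String

    ' Build the string from code points so the source file stays ASCII.
    ' &H3042 is HIRAGANA LETTER A.
    s = ChrW$(&H3042) & "A"

    ' Unicode code of the first character.
    Debug.Print Hex$(AscW(s))      ' 3042

    ' First byte of the internal representation (low byte first).
    Debug.Print Hex$(AscB(s))      ' 42

    ' ANSI/DBCS code of the first character; the value depends on
    ' the system code page.
    Debug.Print Hex$(Asc(s))
End Sub
```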

The distinction between characters and bytes matters when a string can contain both SBCS and DBCS characters. For instance, the Mid function returns a specified number of characters from a string. In locales that use DBCS, the number of characters and the number of bytes in a string are not necessarily the same, and Mid counts characters, not bytes.
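The difference between counting characters and counting bytes can be sketched as follows. The procedure name ShowCharsVersusBytes is hypothetical. Note one assumption: in 32-bit Visual Basic the byte-based functions operate on the internal Unicode representation, so every character here occupies two bytes regardless of whether it is SBCS or DBCS in the ANSI code page.

```vb
' Hypothetical helper, for illustration only.
Sub ShowCharsVersusBytes()
    Dim s As String

    ' "A", HIRAGANA LETTER A (&H3042), "B"
    s = "A" & ChrW$(&H3042) & "B"

    ' Character count versus byte count of the in-memory string.
    Debug.Print Len(s)             ' 3
    Debug.Print LenB(s)            ' 6

    ' Mid counts in characters: the second character is the hiragana.
    Debug.Print Hex$(AscW(Mid$(s, 2, 1)))      ' 3042

    ' MidB counts in bytes: the same character starts at byte 3
    ' and is two bytes long.
    Debug.Print Hex$(AscW(MidB$(s, 3, 2)))     ' 3042
End Sub
```

Starting MidB at an even byte position would split a character in two, which is one reason the character-based functions are usually the safer choice for text.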

In most cases, use the character-based functions when you handle string data because these functions can properly handle ANSI strings, DBCS strings, and Unicode strings.

The byte-based string manipulation functions, such as LenB and LeftB, are provided to handle string data as binary data. When you store characters in a String variable or retrieve characters from a String variable, Visual Basic automatically converts between Unicode and ANSI characters. For this reason, when you handle binary data, use a Byte array rather than a String variable with the byte-based string manipulation functions.

For More Information   See the Language Reference for the appropriate function.

If you want to handle strings of binary data, you can map the characters in a string to a Byte array by using the following code:

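Dim MyByteString() As Byte
Dim i As Long
' Map the string to a Byte array. Each character is stored as
' two bytes (Unicode), so "ABC" yields six bytes.
MyByteString = "ABC"
' Display the binary data as two-digit hexadecimal values.
For i = LBound(MyByteString) To UBound(MyByteString)
   Print Right$("0" & Hex$(MyByteString(i)), 2) & ", ";
Next i
Print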

DBCS String Conversion

Visual Basic provides several string conversion functions that are useful for DBCS characters: StrConv, UCase, and LCase.

StrConv Function

The general-purpose options of the StrConv function convert uppercase to lowercase and vice versa. In addition to those options, the function has several DBCS-specific options. For example, you can convert narrow letters to wide letters by specifying vbWide as the second argument of this function. You can also convert one character type to another, such as hiragana to katakana in Japanese. StrConv also lets you specify a LocaleID for the string if it differs from the system's LocaleID.
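A minimal sketch of the DBCS-specific options might look like the following. The procedure name ShowWidthAndKanaConversion and the constant name LCID_JAPANESE are hypothetical; 1041 (&H411) is the locale identifier for Japanese. The sketch assumes a system with East Asian language support. On other systems the call may fail or leave the string unchanged, so no output values are shown.

```vb
' Hypothetical helper, for illustration only.
Sub ShowWidthAndKanaConversion()
    Const LCID_JAPANESE As Long = 1041   ' &H411, Japanese

    Dim wide As String
    Dim kata As String

    ' Narrow letters to wide (full-width) letters.
    wide = StrConv("ABC", vbWide, LCID_JAPANESE)

    ' Hiragana to katakana. &H3042 is HIRAGANA LETTER A.
    kata = StrConv(ChrW$(&H3042), vbKatakana, LCID_JAPANESE)

    ' Print the Unicode code of the first character of each result
    ' so the output is readable in any Immediate window font.
    Debug.Print Hex$(AscW(wide))
    Debug.Print Hex$(AscW(kata))
End Sub
```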

You can also use the StrConv function to convert Unicode characters to ANSI/DBCS characters, and vice versa. Usually, a string in Visual Basic consists of Unicode characters. When you need to handle strings in ANSI/DBCS (for example, to calculate the number of bytes in a string before writing the string into a file), you can use this functionality of the StrConv function.
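The byte-count case mentioned above could be sketched like this. The function name AnsiByteCount is hypothetical. It converts the Unicode string to ANSI/DBCS with vbFromUnicode and then measures the result with LenB; the count for non-ASCII text depends on the system code page.

```vb
' Hypothetical helper, for illustration only.
' Returns the number of bytes the string occupies once converted
' from Unicode to the system ANSI/DBCS code page.
Function AnsiByteCount(ByVal source As String) As Long
    AnsiByteCount = LenB(StrConv(source, vbFromUnicode))
End Function
```

For example, AnsiByteCount("ABC") returns 3, whereas LenB("ABC") returns 6 because the in-memory string is Unicode. To get the converted bytes themselves, the result of StrConv can be assigned to a Byte array.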

Case Conversion in Wide-Width Letters

You can convert the case of letters by using the StrConv function with vbUpperCase or vbLowerCase, or by using the UCase or LCase functions. When you use these functions, wide-width English letters in DBCS are converted along with ANSI characters.
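One way to try this out is sketched below. The procedure name ShowWideCaseConversion is hypothetical. &HFF41 is the Unicode code point for the wide-width lowercase "a", and its uppercase counterpart is &HFF21. The result is printed rather than asserted because the behavior described above applies to DBCS-enabled systems.

```vb
' Hypothetical helper, for illustration only.
Sub ShowWideCaseConversion()
    Dim wideLowerA As String

    ' Wide-width (full-width) lowercase "a".
    wideLowerA = ChrW$(&HFF41)

    ' Both routes should convert the wide letter's case on a
    ' DBCS-enabled system; print the resulting Unicode code.
    Debug.Print Hex$(AscW(UCase$(wideLowerA)))
    Debug.Print Hex$(AscW(StrConv(wideLowerA, vbUpperCase)))
End Sub
```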