CharSet Property

Identifies the character set of the data file.

Syntax

[ sValue = ] TDC.CharSet

Possible Values

sValue: String expression that describes the character set used for the data file. If no value is supplied, the input file is interpreted using codepage 1252 (Western Alphabet). See the table below for the full listing of possible values for this parameter.

The property is read/write. The property has no default value.

Remarks

The following table lists the possible values for the character set codes.

Code            Value (Codepage)  Alphabet
DIN_66003       20106             IA5 (German)
NS_4551-1       20108             IA5 (Norwegian)
SEN_850200_B    20107             IA5 (Swedish)
_autodetect     50932             Japanese (Auto Select)
_autodetect_kr  50949             Korean (Auto Select)
big5            950               Chinese Traditional (Big5)
csISO2022JP     50221             Japanese (JIS-Allow 1 byte Kana)
euc-kr          51949             Korean (EUC)
gb2312          936               Chinese Simplified (GB2312)
hz-gb-2312      52936             Chinese Simplified (HZ)
ibm852          852               Central European (DOS)
ibm866          866               Cyrillic Alphabet (DOS)
irv             20105             IA5 (IRV)
iso-2022-jp     50220             Japanese (JIS)
iso-2022-jp     50222             Japanese (JIS-Allow 1 byte Kana)
iso-2022-kr     50225             Korean (ISO)
iso-8859-1      1252              Western Alphabet
iso-8859-1      28591             Western Alphabet (ISO)
iso-8859-2      28592             Central European Alphabet (ISO)
iso-8859-3      28593             Latin 3 Alphabet (ISO)
iso-8859-4      28594             Baltic Alphabet (ISO)
iso-8859-5      28595             Cyrillic Alphabet (ISO)
iso-8859-6      28596             Arabic Alphabet (ISO)
iso-8859-7      28597             Greek Alphabet (ISO)
iso-8859-8      28598             Hebrew Alphabet (ISO)
koi8-r          20866             Cyrillic Alphabet (KOI8-R)
ks_c_5601       949               Korean
shift-jis       932               Japanese (Shift-JIS)
unicode         1200              Universal Alphabet
unicodeFEFF     1201              Universal Alphabet (Big-Endian)
utf-7           65000             Universal Alphabet (UTF-7)
utf-8           65001             Universal Alphabet (UTF-8)
windows-1250    1250              Central European Alphabet (Windows)
windows-1251    1251              Cyrillic Alphabet (Windows)
windows-1252    1252              Western Alphabet (Windows)
windows-1253    1253              Greek Alphabet (Windows)
windows-1254    1254              Turkish Alphabet
windows-1255    1255              Hebrew Alphabet (Windows)
windows-1256    1256              Arabic Alphabet (Windows)
windows-1257    1257              Baltic Alphabet (Windows)
windows-1258    1258              Vietnamese Alphabet (Windows)
windows-874     874               Thai (Windows)
x-euc           51932             Japanese (EUC)
x-user-defined  50000             User Defined
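
Because a mistyped code is easy to overlook, it can help to check a candidate string against the table before assigning it to CharSet. The following script is a minimal sketch of such a check. The names CHARSET_CODEPAGES and codepagesFor are hypothetical and are not part of the TDC. The lookup is an exact, case-sensitive match, which is an assumption made for this sketch and not documented behavior of the control.

```javascript
// Hypothetical lookup table built from the codes listed above.
// Two codes (iso-2022-jp and iso-8859-1) appear twice in the table,
// so each entry holds an array of codepages.
var CHARSET_CODEPAGES = {
  "DIN_66003": [20106],
  "NS_4551-1": [20108],
  "SEN_850200_B": [20107],
  "_autodetect": [50932],
  "_autodetect_kr": [50949],
  "big5": [950],
  "csISO2022JP": [50221],
  "euc-kr": [51949],
  "gb2312": [936],
  "hz-gb-2312": [52936],
  "ibm852": [852],
  "ibm866": [866],
  "irv": [20105],
  "iso-2022-jp": [50220, 50222],
  "iso-2022-kr": [50225],
  "iso-8859-1": [1252, 28591],
  "iso-8859-2": [28592],
  "iso-8859-3": [28593],
  "iso-8859-4": [28594],
  "iso-8859-5": [28595],
  "iso-8859-6": [28596],
  "iso-8859-7": [28597],
  "iso-8859-8": [28598],
  "koi8-r": [20866],
  "ks_c_5601": [949],
  "shift-jis": [932],
  "unicode": [1200],
  "unicodeFEFF": [1201],
  "utf-7": [65000],
  "utf-8": [65001],
  "windows-1250": [1250],
  "windows-1251": [1251],
  "windows-1252": [1252],
  "windows-1253": [1253],
  "windows-1254": [1254],
  "windows-1255": [1255],
  "windows-1256": [1256],
  "windows-1257": [1257],
  "windows-1258": [1258],
  "windows-874": [874],
  "x-euc": [51932],
  "x-user-defined": [50000]
};

// Returns the codepages listed for a code, or an empty array if the
// code does not appear in the table (exact, case-sensitive match).
function codepagesFor(code) {
  if (Object.prototype.hasOwnProperty.call(CHARSET_CODEPAGES, code)) {
    return CHARSET_CODEPAGES[code];
  }
  return [];
}
```

An empty result from codepagesFor suggests that the string is not one of the listed codes and should be checked before it is used as a CharSet value.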

In normal use, the CharSet property is set in the Web page (or is left unset) and never referenced again. Although you can set the property after the data has been loaded, doing so does not change how the already loaded data is interpreted. The only exception is when the DataURL property is set to a new value, which forces all properties to be reevaluated.
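
The exception described above can be sketched in script as follows. The function name reloadWithCharSet and its parameters are hypothetical. The tdc argument is assumed to be a reference to the TDC object on the page. The sketch relies only on the behavior stated above, namely that assigning a new DataURL causes the properties, including CharSet, to be reevaluated.

```javascript
// Hypothetical helper: applies a new character set to a TDC by
// pairing the CharSet change with a change of DataURL.
function reloadWithCharSet(tdc, dataUrl, charSet) {
  // Set CharSet first so that the new value is already in place
  // when the DataURL assignment triggers reevaluation.
  tdc.CharSet = charSet;
  tdc.DataURL = dataUrl;
}
```

Assigning CharSet alone, without a new DataURL, would leave the interpretation of the loaded data unchanged. Depending on the host page, a call to the control's Reset method may also be needed before the new data appears. That detail is outside the scope of this topic and is stated here as an assumption.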

The Tabular Data Control (TDC) determines the codepage of the source data file incorrectly in certain scenarios. If the ambient codepage is Unicode, the TDC assumes that the bound data is also Unicode, which is not necessarily true. When the TDC attempts to identify the Unicode signature in the byte-reversed case, it compares the value incorrectly. If the TDC reads an uninitialized variable, it waits for an excessive period of time while attempting to identify the ambient codepage. If the TDC changes its codepage because it sees a Unicode signature, it fails to update its CharSet property. To avoid these problems, set the CharSet property explicitly when declaring the TDC, as shown in the following example.

<OBJECT classid=CLSID:333C7BC4-460F-11D0-BC04-0080C7055A83>
<PARAM NAME="CharSet" VALUE="iso-8859-1" />
</OBJECT>
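
When choosing an explicit value for a Unicode data file, it can help to know which signature, if any, the file starts with. The following script is a minimal sketch that maps the standard byte order marks to codes from the table above. It is not the logic the TDC uses internally. The function name charSetFromSignature is hypothetical, and the bytes argument is assumed to be an array holding the first few byte values of the file, obtained by some means outside this sketch.

```javascript
// Hypothetical helper: inspects the leading bytes of a data file and
// returns the matching code from the table, or null if the file
// carries no recognized Unicode signature.
function charSetFromSignature(bytes) {
  // UTF-8 signature: EF BB BF
  if (bytes.length >= 3 &&
      bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return "utf-8";
  }
  if (bytes.length >= 2) {
    // Little-endian UTF-16 signature: FF FE (codepage 1200)
    if (bytes[0] === 0xFF && bytes[1] === 0xFE) {
      return "unicode";
    }
    // Byte-reversed (big-endian) signature: FE FF (codepage 1201)
    if (bytes[0] === 0xFE && bytes[1] === 0xFF) {
      return "unicodeFEFF";
    }
  }
  return null;
}
```

A null result means only that no signature was found. The file may still use any of the other character sets in the table, so the value must then be chosen from knowledge of how the file was produced.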

Applies To

TDC