CharSet Property

07/27/2012

Identifies the character set of the data file.

Syntax

[ sValue = ] TDC.CharSet

Possible Values

sValue	String expression that specifies the character set of the data file. If no value is supplied, the input file is interpreted using codepage 1252 (Western Alphabet). See the table in the Remarks section for the full list of possible values.

The property is read/write. The property has no default value.

Remarks

The following table defines the list of possible values for the character set codes.

Code	Value (Codepage)	Alphabet
DIN_66003	20106	IA5 (German)
NS_4551-1	20108	IA5 (Norwegian)
SEN_850200_B	20107	IA5 (Swedish)
_autodetect	50932	Japanese (Auto Select)
_autodetect_kr	50949	Korean (Auto Select)
big5	950	Chinese Traditional (Big5)
csISO2022JP	50221	Japanese (JIS-Allow 1 byte Kana)
euc-kr	51949	Korean (EUC)
gb2312	936	Chinese Simplified (GB2312)
hz-gb-2312	52936	Chinese Simplified (HZ)
ibm852	852	Central European (DOS)
ibm866	866	Cyrillic Alphabet (DOS)
irv	20105	IA5 (IRV)
iso-2022-jp	50220	Japanese (JIS)
iso-2022-jp	50222	Japanese (JIS-Allow 1 byte Kana)
iso-2022-kr	50225	Korean (ISO)
iso-8859-1	1252	Western Alphabet
iso-8859-1	28591	Western Alphabet (ISO)
iso-8859-2	28592	Central European Alphabet (ISO)
iso-8859-3	28593	Latin 3 Alphabet (ISO)
iso-8859-4	28594	Baltic Alphabet (ISO)
iso-8859-5	28595	Cyrillic Alphabet (ISO)
iso-8859-6	28596	Arabic Alphabet (ISO)
iso-8859-7	28597	Greek Alphabet (ISO)
iso-8859-8	28598	Hebrew Alphabet (ISO)
koi8-r	20866	Cyrillic Alphabet (KOI8-R)
ks_c_5601	949	Korean
shift-jis	932	Japanese (Shift-JIS)
unicode	1200	Universal Alphabet
unicodeFEFF	1201	Universal Alphabet (Big-Endian)
utf-7	65000	Universal Alphabet (UTF-7)
utf-8	65001	Universal Alphabet (UTF-8)
windows-1250	1250	Central European Alphabet (Windows)
windows-1251	1251	Cyrillic Alphabet (Windows)
windows-1252	1252	Western Alphabet (Windows)
windows-1253	1253	Greek Alphabet (Windows)
windows-1254	1254	Turkish Alphabet
windows-1255	1255	Hebrew Alphabet (Windows)
windows-1256	1256	Arabic Alphabet (Windows)
windows-1257	1257	Baltic Alphabet (Windows)
windows-1258	1258	Vietnamese Alphabet (Windows)
windows-874	874	Thai (Windows)
x-euc	51932	Japanese (EUC)
x-user-defined	50000	User Defined
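
To see why the declared code matters, it can help to decode the same bytes under two different rows of the table. The following is a minimal sketch in Python, used here only for illustration and not part of the TDC. The names CODEPAGE_BY_CHARSET, PYTHON_CODEC_BY_CODEPAGE, and decode_as are hypothetical, only a few rows are reproduced, and the Python codec names are assumed to be close stand-ins for the corresponding Windows codepages rather than exact equivalents.

```python
# Excerpt of the table above: CharSet code -> codepage number.
# Only a few rows are reproduced for illustration.
CODEPAGE_BY_CHARSET = {
    "windows-1250": 1250,
    "windows-1251": 1251,
    "windows-1252": 1252,
    "koi8-r": 20866,
    "utf-8": 65001,
}

# Assumption: these Python codecs approximate the Windows codepages.
PYTHON_CODEC_BY_CODEPAGE = {
    1250: "cp1250",
    1251: "cp1251",
    1252: "cp1252",
    20866: "koi8_r",
    65001: "utf_8",
}


def decode_as(data: bytes, charset: str) -> str:
    """Decode raw bytes using the codepage that the CharSet code maps to."""
    codepage = CODEPAGE_BY_CHARSET[charset]  # KeyError for codes not in the excerpt
    return data.decode(PYTHON_CODEC_BY_CODEPAGE[codepage])


raw = b"caf\xe9"                              # one byte sequence, two readings
western = decode_as(raw, "windows-1252")      # 0xE9 is read as a Latin letter
cyrillic = decode_as(raw, "windows-1251")     # 0xE9 is read as a Cyrillic letter
```

The same four bytes yield different text depending on the code chosen, which is the kind of mismatch an incorrect CharSet value produces in bound data.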

In normal use, the CharSet property is set in the Web page (or is left at its default value) and never referenced again. Although you can set the property after the data has been loaded, doing so does not change how the already-loaded data is interpreted. The one exception is setting the DataURL property to a new value that forces all properties to be reevaluated.

The Tabular Data Control (TDC) determines the codepage of the source data file incorrectly in certain scenarios. If the ambient codepage is Unicode, the TDC assumes that the bound data is also Unicode, which is not necessarily true. When the TDC attempts to identify the Unicode signature in the byte-reversed case, it compares the value incorrectly. If the TDC reads an uninitialized variable, it waits for an excessive period of time while attempting to identify the ambient codepage. If the TDC changes its codepage because it detects a Unicode signature, it fails to update its CharSet property.

To avoid problems caused by incorrect codepage identification, set the CharSet property explicitly when declaring the TDC, as shown in the following example.
<OBJECT classid=CLSID:333C7BC4-460F-11D0-BC04-0080C7055A83>
<PARAM NAME="CharSet" VALUE="iso-8859-1">
</OBJECT>
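
Before choosing an explicit value, it can be useful to check whether the data file itself starts with a Unicode signature (byte-order mark). The following is a minimal sketch in Python, run outside the browser; it does not reproduce the TDC's own detection logic. The names SIGNATURES and charset_from_signature are hypothetical, and the pairing of each signature with a CharSet code is an assumption based on the codepage numbers in the table (65001, 1200, and 1201).

```python
import codecs

# Unicode signatures paired with the CharSet code whose codepage matches.
# Assumption: "unicode" (1200) is little-endian and "unicodeFEFF" (1201)
# is big-endian, as the table's "Big-Endian" label suggests.
SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),            # bytes EF BB BF
    (codecs.BOM_UTF16_LE, "unicode"),      # bytes FF FE
    (codecs.BOM_UTF16_BE, "unicodeFEFF"),  # bytes FE FF
]


def charset_from_signature(head: bytes):
    """Return the CharSet code implied by a leading signature, or None."""
    for bom, charset in SIGNATURES:
        if head.startswith(bom):
            return charset
    return None  # no signature: the file needs an explicitly chosen code
```

For example, passing the first few bytes of a data file (read in binary mode) returns "unicodeFEFF" for a file that begins with FE FF, and None for a plain Western text file, in which case a code such as "iso-8859-1" has to be chosen from knowledge of how the file was produced.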


Applies To

TDC

