10/04/2012

Supplementary Characters

The nchar and nvarchar data types store each character as a 16-bit value in an encoding called UCS-2. This encoding, defined by versions of Unicode prior to 1996, supports characters in the range U+0000 to U+FFFF. Newer versions of Unicode define additional characters in the range U+10000 to U+10FFFF, called supplementary characters. Each supplementary character is stored as a pair of 16-bit values, called a surrogate pair, in an encoding called UTF-16. All new _100 level collations support linguistic sorting with supplementary characters.
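The surrogate-pair arithmetic can be sketched outside SQL Server. The following Python snippet is only an illustration of the UTF-16 encoding form, not of SQL Server internals. The helper names `to_surrogate_pair` and `utf16_units` are hypothetical and chosen for this example.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary character")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go into the low surrogate
    return high, low


def utf16_units(text):
    """Return the 16-bit code units produced when text is encoded as UTF-16."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


# U+20000 is a supplementary character; U+0041 ('A') is not.
print([hex(u) for u in to_surrogate_pair(0x20000)])   # ['0xd840', '0xdc00']
print([hex(u) for u in utf16_units(chr(0x20000))])    # ['0xd840', '0xdc00']
print([hex(u) for u in utf16_units("A")])             # ['0x41']
```

A character in the range U+0000 to U+FFFF occupies one 16-bit value, while a supplementary character always occupies two.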

If you use supplementary characters, consider the following limitations:

Supplementary characters can only be used in ordering and comparison operations in collation versions 90 or greater.
Because supplementary characters are stored as two 16-bit values, the LEN function returns the value 2 for each supplementary character that is contained in the argument string. Similarly, the CHARINDEX and PATINDEX functions count positions in 16-bit values, so they misreport the position of any match that follows a supplementary character.
The LEFT, RIGHT, SUBSTRING, STUFF, and REVERSE functions may split surrogate pairs and lead to unexpected results.
Supplementary characters are not supported for use with the underscore (_), percent (%), and caret (^) wildcard characters.
Supplementary characters are not supported for use in metadata, such as in names of database objects.
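The length, position, and splitting limitations above all follow from counting 16-bit values instead of characters. The following Python sketch imitates that behavior on a list of UTF-16 code units. It is an analogy under stated assumptions, not SQL Server code, and the helpers `utf16_units` and `has_unpaired_surrogate` are hypothetical names introduced for this example.

```python
def utf16_units(text):
    """Return the 16-bit code units produced when text is encoded as UTF-16."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def has_unpaired_surrogate(units):
    """Report whether a code-unit sequence contains a broken surrogate pair."""
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed directly by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return True
        if 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return True
        i += 1
    return False


text = "a" + chr(0x20000) + "b"        # three characters, one supplementary
units = utf16_units(text)

unit_length = len(units)                # counts 16-bit values, like LEN
position_of_b = units.index(ord("b")) + 1   # 1-based position in 16-bit values

left_two = units[:2]                    # like LEFT(..., 2): 'a' plus half a pair
reversed_units = units[::-1]            # like REVERSE: pair order is swapped

print(len(text), unit_length)           # 3 4
print(position_of_b)                    # 4, although 'b' is the third character
print(has_unpaired_surrogate(units))            # False
print(has_unpaired_surrogate(left_two))         # True
print(has_unpaired_surrogate(reversed_units))   # True
```

A supplementary-aware routine would have to treat each high and low surrogate as one unit before counting, slicing, or reversing.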

For a Transact-SQL script related to this scenario, see the Supplementary-Aware String Manipulation sample. For information about samples, see Considerations for Installing SQL Server Samples and Sample Databases.
