Supplementary Characters

Updated: 12 December 2006

SQL Server 2005 uses a UCS-2 encoding scheme to encode Unicode characters in binary format. Some languages, such as Chinese, define additional characters that are less frequently used and are not included in the Unicode standard. UCS-2 stores these characters, supplementary characters, as two undefined Unicode characters that define a supplementary character in storage when they are paired together. This way, SQL Server stores supplementary characters without risk of loss or corruption. Also, in SQL Server 2005, you can use and display supplementary characters in any SQL Server-based application, including tools-based applications such as Business Intelligence.

When you are using a Microsoft input method editor (IME) for Windows Vista or Microsoft Office 2007, supplementary characters are represented by UCS code points U+10000 or higher.

When you use supplementary characters, consider the following limitations:

  • Supplementary characters can be used in ordering and comparison operations on 90 collation versions only. These comparisons are based only on code points, and not on other linguistically meaningful ways. Be careful when you use supplementary characters in operations such as ORDER BY, GROUP BY, and DISTINCT, and especially when supplementary and non-supplementary characters are included in the same operation. The 90 collation versions are signified by the 90 suffix added to their names. For example, instead of the Japanese collation, use Japanese_90. For more information, see Collation Settings in Setup.
  • Because supplementary characters are stored as pairs of two double-byte characters, the LEN() function returns the value 2 for each supplementary character that is contained in the argument string. Similarly, the functions CHARINDEX and PATINDEX misrepresent the occurrence of supplementary characters inside character strings, and the NCHAR function returns a character that represents only one-half of the supplementary character pair. Converting a binary or varbinary value to a supplementary character also produces only one-half of a supplementary character pair.
  • The LEFT, RIGHT, SUBSTRING, STUFF, and REVERSE functions may split any supplementary character pairs and lead to unexpected results.
  • Supplementary characters are not supported for use with the underscore (_), space ( ), and caret (^) wildcard characters.
  • Supplementary characters are not supported for use in metadata, such as in names of database objects.
  • Because of their unique storage mechanism, the presence of supplementary characters can consume additional resources during server and database upgrade operations.

SQL Server 2005 provides a sample that shows supplementary character-aware string processing inside a common language runtime (CLR) function. For more information, see Supplementary-Aware String Manipulation.

Release History

12 December 2006

New content:
  • Clarified how to determine whether a character is a supplementary character.

Community Additions