String.Normalize Method
Returns a new string whose textual value is the same as this string, but whose binary representation is in Unicode normalization form C.
Assembly: mscorlib (in mscorlib.dll)
Return Value
Type: System.String
A new, normalized string whose textual value is the same as this string, but whose binary representation is in normalization form C.
| Exception | Condition |
|---|---|
| ArgumentException | The current instance contains invalid Unicode characters. |
Some Unicode characters have multiple equivalent binary representations consisting of sets of combining and/or composite Unicode characters. The existence of multiple representations for a single character complicates searching, sorting, matching, and other operations.
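As a small illustration of the problem described above, the following sketch builds the letter "é" in two equivalent ways and shows that an ordinal comparison and an ordinal search treat them as different. The class name EquivalentFormsDemo and the helper CodeUnits are hypothetical names chosen for this illustration; they are not part of the .NET Framework.

```csharp
using System;

static class EquivalentFormsDemo
{
    // Hypothetical helper: formats each UTF-16 code unit as four hex digits.
    public static string CodeUnits(string s)
    {
        string[] parts = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            parts[i] = ((int)s[i]).ToString("X4");
        }
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        string precomposed = "\u00E9";  // LATIN SMALL LETTER E WITH ACUTE
        string combining = "e\u0301";   // LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

        Console.WriteLine(CodeUnits(precomposed));  // 00E9
        Console.WriteLine(CodeUnits(combining));    // 0065 0301

        // Same text to a reader, different code units to an ordinal comparison.
        Console.WriteLine(string.Equals(precomposed, combining, StringComparison.Ordinal));  // False

        // An ordinal search for one representation does not find the other.
        Console.WriteLine("caf\u00E9".IndexOf(combining, StringComparison.Ordinal));  // -1
    }
}
```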
The Unicode standard defines a process called normalization that returns one binary representation when given any of the equivalent binary representations of a character. Normalization can be performed with several algorithms, called normalization forms, that obey different rules. The .NET Framework currently supports normalization forms C, D, KC, and KD.
For a description of supported Unicode normalization forms, see System.Text.NormalizationForm.
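The effect of normalization described above can be sketched as follows: two equivalent representations of "é" become identical once both are passed through the parameterless Normalize method (form C). The class name NormalizeBeforeCompare and the helper EqualsNormalized are hypothetical, introduced only for this sketch.

```csharp
using System;

static class NormalizeBeforeCompare
{
    // Hypothetical helper: normalizes both strings to form C,
    // then compares them code unit by code unit.
    public static bool EqualsNormalized(string a, string b)
    {
        return string.Equals(a.Normalize(), b.Normalize(), StringComparison.Ordinal);
    }

    public static void Main()
    {
        string precomposed = "\u00E9";  // LATIN SMALL LETTER E WITH ACUTE
        string combining = "e\u0301";   // LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

        // Without normalization the two strings differ.
        Console.WriteLine(string.Equals(precomposed, combining, StringComparison.Ordinal));  // False

        // After normalization to form C they are the same string.
        Console.WriteLine(EqualsNormalized(precomposed, combining));  // True

        // Form C composes the two code units into one.
        Console.WriteLine(combining.Normalize().Length);  // 1
    }
}
```

Normalizing both operands, rather than only one, keeps the comparison correct when neither input is known to be in form C already.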
Notes to Callers
The IsNormalized method returns false as soon as it encounters the first non-normalized character in a string. Therefore, if a string contains non-normalized characters followed by invalid Unicode characters, the Normalize method throws an ArgumentException even though IsNormalized returns false.
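The following sketch illustrates the caller note. It assumes that an unpaired high surrogate (U+D800) counts as an invalid Unicode character for this purpose. The class name InvalidInputDemo and the helper TryNormalize are hypothetical. The call to IsNormalized is wrapped defensively because the note describes .NET Framework behavior only, and no output is shown because results may vary by runtime.

```csharp
using System;

static class InvalidInputDemo
{
    // Hypothetical helper: returns the form C string,
    // or null if Normalize rejects the input as invalid Unicode.
    public static string TryNormalize(string s)
    {
        try
        {
            return s.Normalize();
        }
        catch (ArgumentException)
        {
            return null;
        }
    }

    public static void Main()
    {
        // "e" + COMBINING ACUTE ACCENT is not in form C; the trailing
        // lone high surrogate is assumed here to be the invalid character.
        string s = "e\u0301\uD800";

        try
        {
            Console.WriteLine("IsNormalized returned {0}", s.IsNormalized());
        }
        catch (ArgumentException)
        {
            // Defensive: a runtime other than the one described above
            // might validate the whole string before checking it.
            Console.WriteLine("IsNormalized threw ArgumentException");
        }

        Console.WriteLine(TryNormalize(s) == null
            ? "Normalize threw ArgumentException"
            : "Normalize succeeded");
    }
}
```

A wrapper such as TryNormalize is one way to avoid relying on IsNormalized as a validity check before calling Normalize.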
The following example normalizes a string to each of four normalization forms, confirms the string was normalized to the specified normalization form, then lists the code points in the normalized string.
// This example demonstrates the String.Normalize method
// and the String.IsNormalized method

using System;
using System.Text;

class Sample
{
    public static void Main()
    {
        // Character c; combining characters acute and cedilla; character 3/4
        string s1 = new String(new char[] {'\u0063', '\u0301', '\u0327', '\u00BE'});
        string s2 = null;

        // A row of 80 hyphens surrounded by blank lines
        string divider = new String('-', 80);
        divider = String.Concat(Environment.NewLine, divider, Environment.NewLine);

        try
        {
            Show("s1", s1);
            Console.WriteLine();
            Console.WriteLine("U+0063 = LATIN SMALL LETTER C");
            Console.WriteLine("U+0301 = COMBINING ACUTE ACCENT");
            Console.WriteLine("U+0327 = COMBINING CEDILLA");
            Console.WriteLine("U+00BE = VULGAR FRACTION THREE QUARTERS");
            Console.WriteLine(divider);

            // Part A: test the original string against each normalization form
            Console.WriteLine("A1) Is s1 normalized to the default form (Form C)?: {0}",
                              s1.IsNormalized());
            Console.WriteLine("A2) Is s1 normalized to Form C?: {0}",
                              s1.IsNormalized(NormalizationForm.FormC));
            Console.WriteLine("A3) Is s1 normalized to Form D?: {0}",
                              s1.IsNormalized(NormalizationForm.FormD));
            Console.WriteLine("A4) Is s1 normalized to Form KC?: {0}",
                              s1.IsNormalized(NormalizationForm.FormKC));
            Console.WriteLine("A5) Is s1 normalized to Form KD?: {0}",
                              s1.IsNormalized(NormalizationForm.FormKD));
            Console.WriteLine(divider);

            Console.WriteLine("Set string s2 to each normalized form of string s1.");
            Console.WriteLine();
            Console.WriteLine("U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE");
            Console.WriteLine("U+0033 = DIGIT THREE");
            Console.WriteLine("U+2044 = FRACTION SLASH");
            Console.WriteLine("U+0034 = DIGIT FOUR");
            Console.WriteLine(divider);

            // Part B: normalize to each form, verify, and list the code points
            s2 = s1.Normalize();
            Console.Write("B1) Is s2 normalized to the default form (Form C)?: ");
            Console.WriteLine(s2.IsNormalized());
            Show("s2", s2);
            Console.WriteLine();

            s2 = s1.Normalize(NormalizationForm.FormC);
            Console.Write("B2) Is s2 normalized to Form C?: ");
            Console.WriteLine(s2.IsNormalized(NormalizationForm.FormC));
            Show("s2", s2);
            Console.WriteLine();

            s2 = s1.Normalize(NormalizationForm.FormD);
            Console.Write("B3) Is s2 normalized to Form D?: ");
            Console.WriteLine(s2.IsNormalized(NormalizationForm.FormD));
            Show("s2", s2);
            Console.WriteLine();

            s2 = s1.Normalize(NormalizationForm.FormKC);
            Console.Write("B4) Is s2 normalized to Form KC?: ");
            Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKC));
            Show("s2", s2);
            Console.WriteLine();

            s2 = s1.Normalize(NormalizationForm.FormKD);
            Console.Write("B5) Is s2 normalized to Form KD?: ");
            Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKD));
            Show("s2", s2);
            Console.WriteLine();
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }
    }

    // Writes each UTF-16 code unit of the string as four hexadecimal digits.
    private static void Show(string title, string s)
    {
        Console.Write("Characters in string {0} = ", title);
        foreach (char c in s)
        {
            // Cast to int rather than short so values above 0x7FFF cannot overflow.
            Console.Write("{0:X4} ", (int)c);
        }
        Console.WriteLine();
    }
}
/*
This example produces the following results:

Characters in string s1 = 0063 0301 0327 00BE

U+0063 = LATIN SMALL LETTER C
U+0301 = COMBINING ACUTE ACCENT
U+0327 = COMBINING CEDILLA
U+00BE = VULGAR FRACTION THREE QUARTERS

--------------------------------------------------------------------------------

A1) Is s1 normalized to the default form (Form C)?: False
A2) Is s1 normalized to Form C?: False
A3) Is s1 normalized to Form D?: False
A4) Is s1 normalized to Form KC?: False
A5) Is s1 normalized to Form KD?: False

--------------------------------------------------------------------------------

Set string s2 to each normalized form of string s1.

U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
U+0033 = DIGIT THREE
U+2044 = FRACTION SLASH
U+0034 = DIGIT FOUR

--------------------------------------------------------------------------------

B1) Is s2 normalized to the default form (Form C)?: True
Characters in string s2 = 1E09 00BE

B2) Is s2 normalized to Form C?: True
Characters in string s2 = 1E09 00BE

B3) Is s2 normalized to Form D?: True
Characters in string s2 = 0063 0327 0301 00BE

B4) Is s2 normalized to Form KC?: True
Characters in string s2 = 1E09 0033 2044 0034

B5) Is s2 normalized to Form KD?: True
Characters in string s2 = 0063 0327 0301 0033 2044 0034

*/
Windows 8, Windows Server 2012, Windows 7, Windows Vista SP2, Windows Server 2008 (Server Core Role not supported), Windows Server 2008 R2 (Server Core Role supported with SP1 or later; Itanium not supported)
The .NET Framework does not support all versions of every platform. For a list of the supported versions, see .NET Framework System Requirements.