NormalizationForm Enumeration
Namespace: System.Text
Assembly: mscorlib (in mscorlib.dll)
| Member name | Description |
|---|---|
| FormC | Indicates that a Unicode string is normalized using full canonical decomposition, followed by the replacement of sequences with their primary composites, if possible. |
| FormD | Indicates that a Unicode string is normalized using full canonical decomposition. |
| FormKC | Indicates that a Unicode string is normalized using full compatibility decomposition, followed by the replacement of sequences with their primary composites, if possible. |
| FormKD | Indicates that a Unicode string is normalized using full compatibility decomposition. |
Some Unicode sequences are considered equivalent because they represent the same character. For example, the following are considered equivalent because any of these can be used to represent "ắ":
- "\u1EAF"
- "\u0103\u0301"
- "\u0061\u0306\u0301"
However, ordinal comparisons (that is, binary comparisons) consider these sequences different because they contain different Unicode code values. Before performing ordinal comparisons, applications must first normalize these strings; that is, convert them to the same normalization form so that equivalent sequences have the same binary representation.
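The effect of normalizing before an ordinal comparison can be sketched outside .NET. The following example is an illustration only: it uses Python's standard `unicodedata` module as a stand-in rather than the .NET API, and relies on the Unicode form names NFC and NFD, which correspond to FormC and FormD. The variable names are hypothetical.

```python
import unicodedata

# The three spellings of the same character listed above.
precomposed = "\u1EAF"              # one precomposed code point
partial = "\u0103\u0301"            # a-with-breve + combining acute
decomposed = "\u0061\u0306\u0301"   # a + combining breve + combining acute

spellings = [precomposed, partial, decomposed]

# An ordinal comparison works on code values, so the raw strings all differ.
distinct_raw = len(set(spellings))

# Once every string is in the same normalization form, they compare equal.
nfc_forms = {unicodedata.normalize("NFC", s) for s in spellings}
nfd_forms = {unicodedata.normalize("NFD", s) for s in spellings}

print(distinct_raw)                      # number of distinct raw strings
print(len(nfc_forms), len(nfd_forms))    # distinct strings after normalizing
```

Either form works for comparison purposes, provided that both operands are normalized to the same form.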
Each composite Unicode character is mapped to a more basic sequence of one or more characters. The process of decomposition replaces composite characters in a string with their more basic mapping. A full decomposition recursively performs this replacement until none of the characters in the string can be decomposed further.
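The recursive replacement described above can be sketched as follows. This is a simplified illustration in Python, not the .NET implementation: `full_canonical_decomposition` is a hypothetical helper that follows only the canonical mappings exposed by `unicodedata.decomposition`. It does not reorder combining marks and does not handle Hangul syllables, both of which a complete implementation must do.

```python
import unicodedata

def full_canonical_decomposition(text):
    """Recursively expand canonical mappings (hypothetical, simplified helper)."""
    pieces = []
    for ch in text:
        mapping = unicodedata.decomposition(ch)
        # Compatibility mappings carry a tag such as "<compat>"; skip those here.
        if mapping and not mapping.startswith("<"):
            expanded = "".join(chr(int(cp, 16)) for cp in mapping.split())
            # The expansion may itself contain composite characters, so recurse.
            pieces.append(full_canonical_decomposition(expanded))
        else:
            pieces.append(ch)
    return "".join(pieces)

one_step = unicodedata.decomposition("\u1EAF")   # a single mapping step
full = full_canonical_decomposition("\u1EAF")    # repeated until nothing is left

print(one_step)
print(" ".join(f"U+{ord(c):04X}" for c in full))
```

For this input, the first step yields U+0103 followed by U+0301, and the second step expands U+0103 into U+0061 followed by U+0306, which matches the result of the library's own NFD normalization.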
Unicode defines two types of decompositions: compatibility decomposition and canonical decomposition. In compatibility decomposition, formatting information might be lost. In canonical decomposition, which is a subset of compatibility decomposition, formatting information is preserved.
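The difference between the two decomposition types can be seen on characters that carry formatting information. The following sketch again uses Python's `unicodedata` module as a stand-in for the .NET API, with NFD and NFKD corresponding to FormD and FormKD. The sample characters and the `code_points` helper are chosen for illustration only.

```python
import unicodedata

def code_points(s):
    """Render a string as U+XXXX code points for unambiguous display."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

samples = {
    "superscript two": "\u00B2",
    "fi ligature": "\uFB01",
    "a with breve and acute": "\u1EAF",
}

results = {}
for name, s in samples.items():
    canonical = unicodedata.normalize("NFD", s)        # canonical decomposition
    compatibility = unicodedata.normalize("NFKD", s)   # compatibility decomposition
    results[name] = (canonical, compatibility)
    print(f"{name}: NFD = {code_points(canonical)}, NFKD = {code_points(compatibility)}")
```

Canonical decomposition leaves the superscript and the ligature untouched, whereas compatibility decomposition replaces them with the plain digit "2" and the letters "fi", discarding the formatting. The third sample has only canonical mappings, so both decompositions give the same result.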
Two sets of characters are considered to have canonical equivalence if their full canonical decompositions are identical. Likewise, two sets of characters are considered to have compatibility equivalence if their full compatibility decompositions are identical.
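These two definitions translate directly into a pair of predicates. The following is a minimal sketch in Python, assuming that the `unicodedata` module's NFD and NFKD forms stand in for full canonical and full compatibility decomposition. The function names are hypothetical.

```python
import unicodedata

def canonically_equivalent(a, b):
    """True if the full canonical decompositions (NFD) are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def compatibility_equivalent(a, b):
    """True if the full compatibility decompositions (NFKD) are identical."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)

# Two spellings of the same accented letter are canonically equivalent.
print(canonically_equivalent("\u1EAF", "\u0061\u0306\u0301"))

# Superscript two and the digit two are equivalent only under compatibility.
print(canonically_equivalent("\u00B2", "2"))
print(compatibility_equivalent("\u00B2", "2"))
```

Because canonical decomposition is a subset of compatibility decomposition, any pair that is canonically equivalent is also compatibility equivalent, but not the other way around.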
For more information on normalization, decompositions and equivalence, see The Unicode Standard at www.unicode.org.
The following code example determines if an encoding is always normalized using the different normalization forms.
Imports System
Imports System.Text

Public Class SamplesASCIIEncoding

    Public Shared Sub Main()

        ' Display the value of IsAlwaysNormalized for every normalization form.
        Console.WriteLine("{0,30} FormC   FormKC  FormD   FormKD", "")
        PrintNormalization(New UTF32Encoding(True, True, True))
        PrintNormalization(New UnicodeEncoding(True, True, True))
        PrintNormalization(New UTF8Encoding(True, True))
        PrintNormalization(New UTF7Encoding(True))
        PrintNormalization(New ASCIIEncoding())

    End Sub 'Main

    Public Shared Sub PrintNormalization(enc As Encoding)
        Console.Write("{0,-30} ", enc.ToString())
        Console.Write("{0,-8}", enc.IsAlwaysNormalized(NormalizationForm.FormC))
        Console.Write("{0,-8}", enc.IsAlwaysNormalized(NormalizationForm.FormKC))
        Console.Write("{0,-8}", enc.IsAlwaysNormalized(NormalizationForm.FormD))
        Console.WriteLine("{0,-8}", enc.IsAlwaysNormalized(NormalizationForm.FormKD))
    End Sub 'PrintNormalization

End Class 'SamplesASCIIEncoding

'This code produces the following output.
'
'                               FormC   FormKC  FormD   FormKD
'System.Text.UTF32Encoding      False   False   False   False
'System.Text.UnicodeEncoding    False   False   False   False
'System.Text.UTF8Encoding       False   False   False   False
'System.Text.UTF7Encoding       False   False   False   False
'System.Text.ASCIIEncoding      True    True    True    True
import System.*;
import System.Text.*;

public class SamplesASCIIEncoding
{
    public static void main(String[] args)
    {
        // Display the value of IsAlwaysNormalized
        // for every normalization form.
        Console.WriteLine("{0,30} FormC   FormKC  FormD   FormKD", "");
        PrintNormalization(new UTF32Encoding(true, true, true));
        PrintNormalization(new UnicodeEncoding(true, true, true));
        PrintNormalization(new UTF8Encoding(true, true));
        PrintNormalization(new UTF7Encoding(true));
        PrintNormalization(new ASCIIEncoding());
    } //main

    public static void PrintNormalization(Encoding enc)
    {
        Console.Write("{0,-30} ", enc.ToString());
        Console.Write("{0,-8}",
            System.Convert.ToString(enc.IsAlwaysNormalized(NormalizationForm.FormC)));
        Console.Write("{0,-8}",
            System.Convert.ToString(enc.IsAlwaysNormalized(NormalizationForm.FormKC)));
        Console.Write("{0,-8}",
            System.Convert.ToString(enc.IsAlwaysNormalized(NormalizationForm.FormD)));
        Console.WriteLine("{0,-8}",
            System.Convert.ToString(enc.IsAlwaysNormalized(NormalizationForm.FormKD)));
    } //PrintNormalization
} //SamplesASCIIEncoding

/*
This code produces the following output.

                               FormC   FormKC  FormD   FormKD
System.Text.UTF32Encoding      False   False   False   False
System.Text.UnicodeEncoding    False   False   False   False
System.Text.UTF8Encoding       False   False   False   False
System.Text.UTF7Encoding       False   False   False   False
System.Text.ASCIIEncoding      True    True    True    True
*/
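The True row for ASCIIEncoding in the output above is consistent with a property of Unicode normalization: text consisting only of ASCII characters is unchanged by all four forms. The following check is an illustration in Python using the `unicodedata` module, not the .NET API, and it is not a description of how IsAlwaysNormalized is implemented. The variable names are hypothetical.

```python
import unicodedata

# Unicode names for FormC, FormKC, FormD and FormKD respectively.
FORMS = ("NFC", "NFKC", "NFD", "NFKD")

# Every character in the 7-bit ASCII range, U+0000 through U+007F.
ascii_text = "".join(chr(cp) for cp in range(128))

# ASCII text comes back unchanged from every normalization form.
ascii_stable = all(
    unicodedata.normalize(form, ascii_text) == ascii_text for form in FORMS
)

# A non-ASCII string has no such guarantee: this one changes under NFD.
non_ascii = "\u1EAF"
non_ascii_stable = all(
    unicodedata.normalize(form, non_ascii) == non_ascii for form in FORMS
)

print(ascii_stable, non_ascii_stable)
```

Encodings that can represent the full Unicode range can carry text in any form, so no single form can be assumed for their output.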
Windows 98, Windows 2000 SP4, Windows Millennium Edition, Windows Server 2003, Windows XP Media Center Edition, Windows XP Professional x64 Edition, Windows XP SP2, Windows XP Starter Edition
The .NET Framework does not support all versions of every platform. For a list of the supported versions, see System Requirements.