This page is specific to Microsoft Visual Studio 2008/.NET Framework 3.5.

.NET Framework Class Library
NormalizationForm Enumeration

Defines the type of normalization to perform.

Namespace:  System.Text
Assembly:  mscorlib (in mscorlib.dll)
Visual Basic (Declaration)
<ComVisibleAttribute(True)> _
Public Enumeration NormalizationForm
Visual Basic (Usage)
Dim instance As NormalizationForm
C#
[ComVisibleAttribute(true)]
public enum NormalizationForm
Visual C++
[ComVisibleAttribute(true)]
public enum class NormalizationForm
JScript
public enum NormalizationForm
Member name  Description
FormC        Indicates that a Unicode string is normalized using full canonical decomposition, followed by the replacement of sequences with their primary composites, if possible.
FormD        Indicates that a Unicode string is normalized using full canonical decomposition.
FormKC       Indicates that a Unicode string is normalized using full compatibility decomposition, followed by the replacement of sequences with their primary composites, if possible.
FormKD       Indicates that a Unicode string is normalized using full compatibility decomposition.
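
As a minimal sketch of how the four members differ, the following C# example passes each one to String.Normalize and prints the resulting UTF-16 code units. The class name NormalizationFormSketch, the helper ToCodeUnits, and the sample string are illustrative assumptions and are not part of the original sample.

```csharp
using System;
using System.Text;

public class NormalizationFormSketch
{
    // Hypothetical helper: formats each UTF-16 code unit as four hex digits.
    public static string ToCodeUnits(string s)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Assumed sample: U+00E9 is a precomposed "e with acute";
        // U+FB01 is the "fi" ligature, which has only a compatibility mapping.
        string sample = "\u00E9\uFB01";

        // Normalize the same input with every member of the enumeration.
        foreach (NormalizationForm form in Enum.GetValues(typeof(NormalizationForm)))
        {
            Console.WriteLine("{0,-7} {1}", form, ToCodeUnits(sample.Normalize(form)));
        }
    }
}
```

For this input, FormC should leave the string unchanged (00E9 FB01), FormD should split only the accented letter (0065 0301 FB01), FormKC should split only the ligature (00E9 0066 0069), and FormKD should split both (0065 0301 0066 0069).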

Some Unicode sequences are considered equivalent because they represent the same character. For example, the following are considered equivalent because any of these can be used to represent "ắ":

  • "\u1EAF"

  • "\u0103\u0301"

  • "\u0061\u0306\u0301"

However, ordinal (binary) comparisons consider these sequences different because they contain different Unicode code values. Before performing ordinal comparisons, applications must normalize these strings to the same normalization form so that equivalent sequences are reduced to identical code values.
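
The effect on the three sequences listed above can be sketched in C# as follows. The class name OrdinalComparisonSketch and the helper OrdinalEqualsAfterFormC are hypothetical, and the choice of FormC is an assumption made for illustration; any single form applied to both operands would serve.

```csharp
using System;
using System.Text;

public class OrdinalComparisonSketch
{
    // Hypothetical helper: normalizes both strings to FormC, then compares ordinally.
    public static bool OrdinalEqualsAfterFormC(string left, string right)
    {
        return String.Equals(
            left.Normalize(NormalizationForm.FormC),
            right.Normalize(NormalizationForm.FormC),
            StringComparison.Ordinal);
    }

    public static void Main()
    {
        // The three equivalent representations from the list above.
        string[] forms = { "\u1EAF", "\u0103\u0301", "\u0061\u0306\u0301" };

        for (int i = 1; i < forms.Length; i++)
        {
            // Raw ordinal comparison: the code values differ, so this is False.
            bool raw = String.Equals(forms[0], forms[i], StringComparison.Ordinal);

            // Normalized ordinal comparison: both sides become U+1EAF, so this is True.
            bool normalized = OrdinalEqualsAfterFormC(forms[0], forms[i]);

            Console.WriteLine("raw: {0,-6} normalized: {1}", raw, normalized);
        }
    }
}
```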

Each composite Unicode character is mapped to a more basic sequence of one or more characters. The process of decomposition replaces composite characters in a string with their more basic mappings. A full decomposition recursively performs this replacement until none of the characters in the string can be decomposed further.
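
The recursive nature of a full decomposition can be observed with a short C# sketch. FullDecompositionSketch and Decompose are hypothetical names; the example assumes FormD as the decomposition to demonstrate and reuses the code values from the list above.

```csharp
using System;
using System.Text;

public class FullDecompositionSketch
{
    // Hypothetical helper: returns the full canonical decomposition (FormD).
    public static string Decompose(string s)
    {
        return s.Normalize(NormalizationForm.FormD);
    }

    public static void Main()
    {
        string composite = "\u1EAF";
        string decomposed = Decompose(composite);

        Console.WriteLine(composite.Length);   // one UTF-16 code unit
        Console.WriteLine(decomposed.Length);  // three: U+0061, U+0306, U+0301

        // The intermediate form U+0103 U+0301 decomposes to the same sequence,
        // because U+0103 is itself replaced by U+0061 U+0306.
        Console.WriteLine(Decompose("\u0103\u0301") == decomposed);

        // A fully decomposed string is already in FormD.
        Console.WriteLine(decomposed.IsNormalized(NormalizationForm.FormD));
    }
}
```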

Unicode defines two types of decompositions: compatibility decomposition and canonical decomposition. In compatibility decomposition, formatting information might be lost. In canonical decomposition, which is a subset of compatibility decomposition, formatting information is preserved.
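
To illustrate the loss of formatting information, the following C# sketch normalizes a string that contains U+00B2 (SUPERSCRIPT TWO), a character that has a compatibility mapping but no canonical one. The class name DecompositionKindsSketch and the sample string are assumptions chosen for this example.

```csharp
using System;
using System.Text;

public class DecompositionKindsSketch
{
    public static void Main()
    {
        // Assumed sample: "x" followed by SUPERSCRIPT TWO.
        string text = "x\u00B2";

        // Canonical decomposition keeps the superscript character as it is.
        string canonical = text.Normalize(NormalizationForm.FormD);

        // Compatibility decomposition replaces it with the plain digit "2",
        // so the superscript formatting is lost.
        string compatibility = text.Normalize(NormalizationForm.FormKD);

        Console.WriteLine(canonical == text);      // True
        Console.WriteLine(compatibility == "x2");  // True
    }
}
```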

Two sets of characters are considered to have canonical equivalence if their full canonical decompositions are identical. Likewise, two sets of characters are considered to have compatibility equivalence if their full compatibility decompositions are identical.
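
These two definitions translate fairly directly into code. The following C# sketch compares full decompositions to test each kind of equivalence; EquivalenceSketch, AreCanonicallyEquivalent, and AreCompatibilityEquivalent are hypothetical names and are not part of the .NET Framework.

```csharp
using System;
using System.Text;

public class EquivalenceSketch
{
    // Canonical equivalence: the full canonical decompositions (FormD) are identical.
    public static bool AreCanonicallyEquivalent(string a, string b)
    {
        return String.Equals(
            a.Normalize(NormalizationForm.FormD),
            b.Normalize(NormalizationForm.FormD),
            StringComparison.Ordinal);
    }

    // Compatibility equivalence: the full compatibility decompositions (FormKD) are identical.
    public static bool AreCompatibilityEquivalent(string a, string b)
    {
        return String.Equals(
            a.Normalize(NormalizationForm.FormKD),
            b.Normalize(NormalizationForm.FormKD),
            StringComparison.Ordinal);
    }

    public static void Main()
    {
        // Precomposed and fully decomposed forms are canonically equivalent.
        Console.WriteLine(AreCanonicallyEquivalent("\u1EAF", "\u0061\u0306\u0301"));  // True

        // SUPERSCRIPT TWO and the digit "2" are only compatibility equivalent.
        Console.WriteLine(AreCanonicallyEquivalent("\u00B2", "2"));                   // False
        Console.WriteLine(AreCompatibilityEquivalent("\u00B2", "2"));                 // True
    }
}
```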

For more information on normalization, decompositions and equivalence, see The Unicode Standard at the Unicode home page.

The following code example determines whether each of several encodings is always normalized for each of the normalization forms.

Visual Basic
Imports System
Imports System.Text

Public Class SamplesASCIIEncoding

   Public Shared Sub Main()

      ' Display the value of IsAlwaysNormalized for every normalization form.
      Console.WriteLine("{0,30} FormC   FormKC  FormD   FormKD", "")
      PrintNormalization(New UTF32Encoding(True, True, True))
      PrintNormalization(New UnicodeEncoding(True, True, True))
      PrintNormalization(New UTF8Encoding(True, True))
      PrintNormalization(New UTF7Encoding(True))
      PrintNormalization(New ASCIIEncoding())

   End Sub 'Main

   Public Shared Sub PrintNormalization(enc As Encoding)
      Console.Write("{0,-30} ", enc.ToString())
      Console.Write("{0,-8}", enc.IsAlwaysNormalized(NormalizationForm.FormC))
      Console.Write("{0,-8}", enc.IsAlwaysNormalized(NormalizationForm.FormKC))
      Console.Write("{0,-8}", enc.IsAlwaysNormalized(NormalizationForm.FormD))
      Console.WriteLine("{0,-8}", enc.IsAlwaysNormalized(NormalizationForm.FormKD))
   End Sub 'PrintNormalization

End Class 'SamplesASCIIEncoding 


'This code produces the following output.
'
'                               FormC   FormKC  FormD   FormKD
'System.Text.UTF32Encoding      False   False   False   False
'System.Text.UnicodeEncoding    False   False   False   False
'System.Text.UTF8Encoding       False   False   False   False
'System.Text.UTF7Encoding       False   False   False   False
'System.Text.ASCIIEncoding      True    True    True    True

C#
using System;
using System.Text;

public class SamplesASCIIEncoding  {

   public static void Main()  {

      // Display the value of IsAlwaysNormalized for every normalization form.
      Console.WriteLine( "{0,30} FormC   FormKC  FormD   FormKD", "" );
      PrintNormalization( new UTF32Encoding( true, true, true ) );
      PrintNormalization( new UnicodeEncoding( true, true, true ) );
      PrintNormalization( new UTF8Encoding( true, true ) );
      PrintNormalization( new UTF7Encoding( true ) );
      PrintNormalization( new ASCIIEncoding() );

   }

   public static void PrintNormalization( Encoding enc )  {
      Console.Write( "{0,-30} ", enc.ToString() );
      Console.Write( "{0,-8}", enc.IsAlwaysNormalized( NormalizationForm.FormC ) );
      Console.Write( "{0,-8}", enc.IsAlwaysNormalized( NormalizationForm.FormKC ) );
      Console.Write( "{0,-8}", enc.IsAlwaysNormalized( NormalizationForm.FormD ) );
      Console.WriteLine( "{0,-8}", enc.IsAlwaysNormalized( NormalizationForm.FormKD ) );
   }


}


/* 
This code produces the following output.

                               FormC   FormKC  FormD   FormKD
System.Text.UTF32Encoding      False   False   False   False
System.Text.UnicodeEncoding    False   False   False   False
System.Text.UTF8Encoding       False   False   False   False
System.Text.UTF7Encoding       False   False   False   False
System.Text.ASCIIEncoding      True    True    True    True

*/

Visual C++
using namespace System;
using namespace System::Text;
void PrintNormalization( Encoding^ enc );
int main()
{

   // Display the value of IsAlwaysNormalized for every normalization form.
   Console::WriteLine( "{0,30} FormC   FormKC  FormD   FormKD", "" );
   PrintNormalization( gcnew UTF32Encoding( true,true,true ) );
   PrintNormalization( gcnew UnicodeEncoding( true,true,true ) );
   PrintNormalization( gcnew UTF8Encoding( true,true ) );
   PrintNormalization( gcnew UTF7Encoding( true ) );
   PrintNormalization( gcnew ASCIIEncoding );
}

void PrintNormalization( Encoding^ enc )
{
   Console::Write( "{0,-30} ", enc );
   Console::Write( "{0,-8}", enc->IsAlwaysNormalized( NormalizationForm::FormC ) );
   Console::Write( "{0,-8}", enc->IsAlwaysNormalized( NormalizationForm::FormKC ) );
   Console::Write( "{0,-8}", enc->IsAlwaysNormalized( NormalizationForm::FormD ) );
   Console::WriteLine( "{0,-8}", enc->IsAlwaysNormalized( NormalizationForm::FormKD ) );
}

/* 
This code produces the following output.

                               FormC   FormKC  FormD   FormKD
System.Text.UTF32Encoding      False   False   False   False
System.Text.UnicodeEncoding    False   False   False   False
System.Text.UTF8Encoding       False   False   False   False
System.Text.UTF7Encoding       False   False   False   False
System.Text.ASCIIEncoding      True    True    True    True

*/

Windows 7, Windows Vista, Windows XP SP2, Windows XP Media Center Edition, Windows XP Professional x64 Edition, Windows XP Starter Edition, Windows Server 2008 R2, Windows Server 2008, Windows Server 2003, Windows Server 2000 SP4, Windows Millennium Edition, Windows 98

The .NET Framework and .NET Compact Framework do not support all versions of every platform. For a list of the supported versions, see .NET Framework System Requirements.

.NET Framework

Supported in: 3.5, 3.0, 2.0
© 2009 Microsoft Corporation. All rights reserved.