Normalization

08/21/2008

Updated: November 2007

Some Unicode characters have multiple equivalent binary representations, consisting either of a sequence of combining Unicode characters or of a single composite Unicode character. Having several representations for the same character complicates searching, sorting, matching, and other operations.
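As an illustration, the character é can be stored either as the single precomposed code point U+00E9 or as U+0065 followed by the combining acute accent U+0301. The minimal sketch below is written in Java rather than a .NET language, only so that it runs standalone; the class name EquivalentForms and its hex helper are hypothetical. It shows that the two representations differ code unit by code unit, which is what breaks naive comparison and searching.

```java
// Sketch: the same visible character, "é", stored two different ways.
class EquivalentForms {
    // Formats the UTF-16 code units of s as space-separated 4-digit hex values.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String composed = "\u00E9";    // U+00E9 LATIN SMALL LETTER E WITH ACUTE
        String decomposed = "e\u0301"; // U+0065 followed by U+0301 COMBINING ACUTE ACCENT

        System.out.println(hex(composed));               // 00E9
        System.out.println(hex(decomposed));             // 0065 0301
        // A binary (ordinal) comparison reports two different strings...
        System.out.println(composed.equals(decomposed)); // false
        // ...and a search for one representation misses the other.
        System.out.println(("caf" + composed).contains(decomposed)); // false
    }
}
```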

The Unicode Standard defines a process called normalization, which returns a single binary representation when given any of the equivalent binary representations of a character. Normalization can be performed with several algorithms, called normalization forms, that obey different rules. The .NET Framework currently supports Unicode normalization forms C, D, KC, and KD.
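To see how the forms differ, the following sketch normalizes a string containing a precomposed é and the ligature U+FB01 to each of the four forms. It is written in Java and assumes java.text.Normalizer as a stand-in for the .NET API; FourForms and normalizeToHex are hypothetical names. Forms C and D are canonical: they only compose or decompose é. Forms KC and KD additionally apply compatibility decompositions, which replace the ligature with the two letters f and i.

```java
import java.text.Normalizer;

// Sketch: apply each of the four Unicode normalization forms to one string.
// Java's Normalizer.Form.NFC, NFD, NFKC and NFKD correspond to the .NET
// NormalizationForm values FormC, FormD, FormKC and FormKD.
class FourForms {
    // Formats the UTF-16 code units of s as space-separated 4-digit hex values.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    // Normalizes s to the given form and returns the result as hex code units.
    static String normalizeToHex(String s, Normalizer.Form form) {
        return hex(Normalizer.normalize(s, form));
    }

    public static void main(String[] args) {
        // U+00E9 (precomposed é) followed by U+FB01 LATIN SMALL LIGATURE FI
        String input = "\u00E9\uFB01";

        // Canonical forms: the ligature is kept; only é is composed or decomposed.
        System.out.println("C:  " + normalizeToHex(input, Normalizer.Form.NFC));  // C:  00E9 FB01
        System.out.println("D:  " + normalizeToHex(input, Normalizer.Form.NFD));  // D:  0065 0301 FB01
        // Compatibility forms: the ligature is also replaced by "f" + "i".
        System.out.println("KC: " + normalizeToHex(input, Normalizer.Form.NFKC)); // KC: 00E9 0066 0069
        System.out.println("KD: " + normalizeToHex(input, Normalizer.Form.NFKD)); // KD: 0065 0301 0066 0069
    }
}
```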

Note:
Two strings that are normalized to the same normalization form can be compared using an ordinal comparison, that is, a binary, character-by-character comparison. Once both are in the same form, equivalent strings consist of identical code units.
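A minimal Java sketch of this pattern, assuming java.text.Normalizer in place of the .NET String.Normalize method; normalizedEquals is a hypothetical helper name. Both strings are normalized to the same form and then compared with String.equals, which is an ordinal comparison.

```java
import java.text.Normalizer;

// Sketch: normalize both strings to the same form, then compare ordinally.
// normalizedEquals is a hypothetical helper, not a library method.
class OrdinalAfterNormalization {
    static boolean normalizedEquals(String a, String b, Normalizer.Form form) {
        // After normalization to a common form, equivalent strings have
        // identical code units, so String.equals (a char-by-char comparison)
        // is enough.
        return Normalizer.normalize(a, form).equals(Normalizer.normalize(b, form));
    }

    public static void main(String[] args) {
        String composed = "\u00E9";    // precomposed é
        String decomposed = "e\u0301"; // e + COMBINING ACUTE ACCENT

        System.out.println(composed.equals(decomposed));                                 // false
        System.out.println(normalizedEquals(composed, decomposed, Normalizer.Form.NFC)); // true
        System.out.println(normalizedEquals(composed, decomposed, Normalizer.Form.NFD)); // true

        // The choice of form matters: the ligature U+FB01 matches "fi"
        // only under a compatibility form.
        System.out.println(normalizedEquals("\uFB01", "fi", Normalizer.Form.NFC));  // false
        System.out.println(normalizedEquals("\uFB01", "fi", Normalizer.Form.NFKC)); // true
    }
}
```

Both strings must be normalized to the same form: an ordinal comparison of an NFC string against an NFD string can still report a mismatch for equivalent text. The ligature case also shows that canonical and compatibility forms can give different answers.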

For more information about the normalization forms supported by the .NET Framework, see NormalizationForm. For more information about normalization, equivalence, and character decomposition, see Unicode Standard Annex #15, "Unicode Normalization Forms," on the Unicode Web site.

Normalizing a String

An application can call the String.Normalize method of a String object with no arguments to return a new string normalized to the default form, normalization form C. Alternatively, it can call the String.Normalize overload that takes a NormalizationForm value to return a new string normalized specifically to form C, D, KC, or KD.
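For comparison outside .NET, the two overloads can be mirrored roughly as follows. This Java sketch assumes java.text.Normalizer, whose normalize method always requires an explicit form; the one-argument normalize helper in the hypothetical NormalizeOverloads class therefore passes NFC to imitate the .NET default. The input is the same string used in this article's sample.

```java
import java.text.Normalizer;

// Sketch mirroring the two String.Normalize overloads. Java's
// Normalizer.normalize always takes an explicit form, so the one-argument
// helper supplies NFC to imitate the .NET default (form C).
class NormalizeOverloads {
    // Analogous to String.Normalize(): form C by default.
    static String normalize(String s) {
        return normalize(s, Normalizer.Form.NFC);
    }

    // Analogous to String.Normalize(NormalizationForm). The input string is
    // not modified; the normalized result is returned.
    static String normalize(String s, Normalizer.Form form) {
        return Normalizer.normalize(s, form);
    }

    // Formats the UTF-16 code units of s as space-separated 4-digit hex values.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // c, COMBINING ACUTE ACCENT, COMBINING CEDILLA, VULGAR FRACTION THREE QUARTERS
        String s1 = "c\u0301\u0327\u00BE";

        System.out.println(hex(normalize(s1)));                       // 1E09 00BE
        System.out.println(hex(normalize(s1, Normalizer.Form.NFD)));  // 0063 0327 0301 00BE
        System.out.println(hex(normalize(s1, Normalizer.Form.NFKC))); // 1E09 0033 2044 0034
        System.out.println(hex(normalize(s1, Normalizer.Form.NFKD))); // 0063 0327 0301 0033 2044 0034
    }
}
```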

Testing Whether a String Is Normalized

An application can call the String.IsNormalized method of a String object with no arguments to determine whether the string's value is normalized to form C. Alternatively, it can call the String.IsNormalized overload that takes a NormalizationForm value to determine whether the value is normalized specifically to form C, D, KC, or KD.
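A comparable Java sketch of the check, again assuming java.text.Normalizer, whose isNormalized method takes the form explicitly. The class IsNormalizedCheck and its ensureNormalized helper are hypothetical; the helper illustrates a common pattern of normalizing only when the check fails, so an already-normalized string is returned without creating a copy.

```java
import java.text.Normalizer;

class IsNormalizedCheck {
    // Analogous to String.IsNormalized(): tests form C.
    static boolean isNormalized(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFC);
    }

    // Hypothetical helper: returns s unchanged when it already conforms to
    // the requested form, and a normalized copy otherwise.
    static String ensureNormalized(String s, Normalizer.Form form) {
        return Normalizer.isNormalized(s, form) ? s : Normalizer.normalize(s, form);
    }

    public static void main(String[] args) {
        String s1 = "c\u0301\u0327\u00BE";

        // false: c followed by combining marks can be composed into U+1E09
        System.out.println(isNormalized(s1));
        // false: the two combining marks are not in canonical order
        System.out.println(Normalizer.isNormalized(s1, Normalizer.Form.NFD));
        // false: U+00BE has a compatibility decomposition
        System.out.println(Normalizer.isNormalized(s1, Normalizer.Form.NFKC));

        String s2 = ensureNormalized(s1, Normalizer.Form.NFD);
        System.out.println(Normalizer.isNormalized(s2, Normalizer.Form.NFD)); // true
        // An already-normalized string is returned as the same object.
        System.out.println(ensureNormalized(s2, Normalizer.Form.NFD) == s2);  // true
    }
}
```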

Example

The following code example demonstrates the IsNormalized and Normalize methods. It checks whether an original string is in each of the four normalization forms, creates a version of the original string in each form, verifies that each normalized string is in the intended form, and then displays the hexadecimal code point of each character in each normalized string.

' This example demonstrates the String.Normalize method
'                       and the String.IsNormalized method
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Class Sample
   Public Shared Sub Main()
      ' Character c; combining characters acute and cedilla; character 3/4
      Dim s1 = New [String](New Char() {ChrW(&H0063), ChrW(&H0301), ChrW(&H0327), ChrW(&H00BE)})
      Dim s2 As String = Nothing
      Dim divider = New [String]("-"c, 80)
      divider = [String].Concat(Environment.NewLine, divider, Environment.NewLine)

      Try
         Show("s1", s1)
         Console.WriteLine()
         Console.WriteLine("U+0063 = LATIN SMALL LETTER C")
         Console.WriteLine("U+0301 = COMBINING ACUTE ACCENT")
         Console.WriteLine("U+0327 = COMBINING CEDILLA")
         Console.WriteLine("U+00BE = VULGAR FRACTION THREE QUARTERS")

         Console.WriteLine(divider)

         Console.WriteLine("A1) Is s1 normalized to the default form (Form C)?: {0}", s1.IsNormalized())
         Console.WriteLine("A2) Is s1 normalized to Form C?:  {0}", s1.IsNormalized(NormalizationForm.FormC))
         Console.WriteLine("A3) Is s1 normalized to Form D?:  {0}", s1.IsNormalized(NormalizationForm.FormD))
         Console.WriteLine("A4) Is s1 normalized to Form KC?: {0}", s1.IsNormalized(NormalizationForm.FormKC))
         Console.WriteLine("A5) Is s1 normalized to Form KD?: {0}", s1.IsNormalized(NormalizationForm.FormKD))

         Console.WriteLine(divider)

         Console.WriteLine("Set string s2 to each normalized form of string s1.")
         Console.WriteLine()
         Console.WriteLine("U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE")
         Console.WriteLine("U+0033 = DIGIT THREE")
         Console.WriteLine("U+2044 = FRACTION SLASH")
         Console.WriteLine("U+0034 = DIGIT FOUR")
         Console.WriteLine(divider)

         s2 = s1.Normalize()
         Console.Write("B1) Is s2 normalized to the default form (Form C)?: ")
         Console.WriteLine(s2.IsNormalized())
         Show("s2", s2)
         Console.WriteLine()

         s2 = s1.Normalize(NormalizationForm.FormC)
         Console.Write("B2) Is s2 normalized to Form C?: ")
         Console.WriteLine(s2.IsNormalized(NormalizationForm.FormC))
         Show("s2", s2)
         Console.WriteLine()

         s2 = s1.Normalize(NormalizationForm.FormD)
         Console.Write("B3) Is s2 normalized to Form D?: ")
         Console.WriteLine(s2.IsNormalized(NormalizationForm.FormD))
         Show("s2", s2)
         Console.WriteLine()

         s2 = s1.Normalize(NormalizationForm.FormKC)
         Console.Write("B4) Is s2 normalized to Form KC?: ")
         Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKC))
         Show("s2", s2)
         Console.WriteLine()

         s2 = s1.Normalize(NormalizationForm.FormKD)
         Console.Write("B5) Is s2 normalized to Form KD?: ")
         Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKD))
         Show("s2", s2)
         Console.WriteLine()

      Catch e As Exception
         Console.WriteLine(e.Message)
      End Try
   End Sub 'Main

   Private Shared Sub Show(title As String, s As String)
      Console.Write("Characters in string {0} = ", title)
      Dim x As Char
      For Each x In s.ToCharArray()
         Console.Write("{0:X4} ", AscW(x))
      Next x
      Console.WriteLine()
   End Sub 'Show
End Class 'Sample
'
'This example produces the following results:
'
'Characters in string s1 = 0063 0301 0327 00BE
'
'U+0063 = LATIN SMALL LETTER C
'U+0301 = COMBINING ACUTE ACCENT
'U+0327 = COMBINING CEDILLA
'U+00BE = VULGAR FRACTION THREE QUARTERS
'
'--------------------------------------------------------------------------------
'
'A1) Is s1 normalized to the default form (Form C)?: False
'A2) Is s1 normalized to Form C?:  False
'A3) Is s1 normalized to Form D?:  False
'A4) Is s1 normalized to Form KC?: False
'A5) Is s1 normalized to Form KD?: False
'
'--------------------------------------------------------------------------------
'
'Set string s2 to each normalized form of string s1.
'
'U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
'U+0033 = DIGIT THREE
'U+2044 = FRACTION SLASH
'U+0034 = DIGIT FOUR
'
'--------------------------------------------------------------------------------
'
'B1) Is s2 normalized to the default form (Form C)?: True
'Characters in string s2 = 1E09 00BE
'
'B2) Is s2 normalized to Form C?: True
'Characters in string s2 = 1E09 00BE
'
'B3) Is s2 normalized to Form D?: True
'Characters in string s2 = 0063 0327 0301 00BE
'
'B4) Is s2 normalized to Form KC?: True
'Characters in string s2 = 1E09 0033 2044 0034
'
'B5) Is s2 normalized to Form KD?: True
'Characters in string s2 = 0063 0327 0301 0033 2044 0034
'

// This example demonstrates the String.Normalize method
//                       and the String.IsNormalized method

using System;
using System.Text;

class Sample 
{
    public static void Main()
    {
        // Character c; combining characters acute and cedilla; character 3/4
        string s1 = new String(new char[] {'\u0063', '\u0301', '\u0327', '\u00BE'});
        string s2 = null;
        string divider = new String('-', 80);
        divider = String.Concat(Environment.NewLine, divider, Environment.NewLine);

        try
        {
            Show("s1", s1);
            Console.WriteLine();
            Console.WriteLine("U+0063 = LATIN SMALL LETTER C");
            Console.WriteLine("U+0301 = COMBINING ACUTE ACCENT");
            Console.WriteLine("U+0327 = COMBINING CEDILLA");
            Console.WriteLine("U+00BE = VULGAR FRACTION THREE QUARTERS");
            Console.WriteLine(divider);

            Console.WriteLine("A1) Is s1 normalized to the default form (Form C)?: {0}",
                              s1.IsNormalized());
            Console.WriteLine("A2) Is s1 normalized to Form C?:  {0}",
                              s1.IsNormalized(NormalizationForm.FormC));
            Console.WriteLine("A3) Is s1 normalized to Form D?:  {0}",
                              s1.IsNormalized(NormalizationForm.FormD));
            Console.WriteLine("A4) Is s1 normalized to Form KC?: {0}",
                              s1.IsNormalized(NormalizationForm.FormKC));
            Console.WriteLine("A5) Is s1 normalized to Form KD?: {0}",
                              s1.IsNormalized(NormalizationForm.FormKD));

            Console.WriteLine(divider);

            Console.WriteLine("Set string s2 to each normalized form of string s1.");
            Console.WriteLine();
            Console.WriteLine("U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE");
            Console.WriteLine("U+0033 = DIGIT THREE");
            Console.WriteLine("U+2044 = FRACTION SLASH");
            Console.WriteLine("U+0034 = DIGIT FOUR");
            Console.WriteLine(divider);

            s2 = s1.Normalize();
            Console.Write("B1) Is s2 normalized to the default form (Form C)?: ");
            Console.WriteLine(s2.IsNormalized());
            Show("s2", s2);
            Console.WriteLine();

            s2 = s1.Normalize(NormalizationForm.FormC);
            Console.Write("B2) Is s2 normalized to Form C?: ");
            Console.WriteLine(s2.IsNormalized(NormalizationForm.FormC));
            Show("s2", s2);
            Console.WriteLine();

            s2 = s1.Normalize(NormalizationForm.FormD);
            Console.Write("B3) Is s2 normalized to Form D?: ");
            Console.WriteLine(s2.IsNormalized(NormalizationForm.FormD));
            Show("s2", s2);
            Console.WriteLine();

            s2 = s1.Normalize(NormalizationForm.FormKC);
            Console.Write("B4) Is s2 normalized to Form KC?: ");
            Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKC));
            Show("s2", s2);
            Console.WriteLine();

            s2 = s1.Normalize(NormalizationForm.FormKD);
            Console.Write("B5) Is s2 normalized to Form KD?: ");
            Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKD));
            Show("s2", s2);
            Console.WriteLine();
        }
        catch (Exception e)
        {
            Console.WriteLine(e.Message);
        }
    }

    private static void Show(string title, string s)
    {
        Console.Write("Characters in string {0} = ", title);
        foreach (char c in s)
        {
            Console.Write("{0:X4} ", (int)c);
        }
        Console.WriteLine();
    }
}
/*
This example produces the following results:

Characters in string s1 = 0063 0301 0327 00BE

U+0063 = LATIN SMALL LETTER C
U+0301 = COMBINING ACUTE ACCENT
U+0327 = COMBINING CEDILLA
U+00BE = VULGAR FRACTION THREE QUARTERS

--------------------------------------------------------------------------------

A1) Is s1 normalized to the default form (Form C)?: False
A2) Is s1 normalized to Form C?:  False
A3) Is s1 normalized to Form D?:  False
A4) Is s1 normalized to Form KC?: False
A5) Is s1 normalized to Form KD?: False

--------------------------------------------------------------------------------

Set string s2 to each normalized form of string s1.

U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
U+0033 = DIGIT THREE
U+2044 = FRACTION SLASH
U+0034 = DIGIT FOUR

--------------------------------------------------------------------------------

B1) Is s2 normalized to the default form (Form C)?: True
Characters in string s2 = 1E09 00BE

B2) Is s2 normalized to Form C?: True
Characters in string s2 = 1E09 00BE

B3) Is s2 normalized to Form D?: True
Characters in string s2 = 0063 0327 0301 00BE

B4) Is s2 normalized to Form KC?: True
Characters in string s2 = 1E09 0033 2044 0034

B5) Is s2 normalized to Form KD?: True
Characters in string s2 = 0063 0327 0301 0033 2044 0034

*/

// This example demonstrates the String.Normalize method
//                       and the String.IsNormalized method
using namespace System;
using namespace System::Text;
void Show( String^ title, String^ s )
{
   Console::Write( "Characters in string {0} = ", title );
   for each ( Char c in s )
   {
      // Print each UTF-16 code unit as a 4-digit hexadecimal value.
      Console::Write( "{0:X4} ", static_cast<int>( c ) );
   }

   Console::WriteLine();
}

int main()
{

   // Character c; combining characters acute and cedilla; character 3/4
   array<Char>^temp0 = {L'c',L'\u0301',L'\u0327',L'\u00BE'};
   String^ s1 = gcnew String( temp0 );
   String^ s2 = nullptr;
   String^ divider = gcnew String( '-',80 );
   divider = String::Concat( Environment::NewLine, divider, Environment::NewLine );
   try
   {
      Show( "s1", s1 );
      Console::WriteLine();
      Console::WriteLine( "U+0063 = LATIN SMALL LETTER C" );
      Console::WriteLine( "U+0301 = COMBINING ACUTE ACCENT" );
      Console::WriteLine( "U+0327 = COMBINING CEDILLA" );
      Console::WriteLine( "U+00BE = VULGAR FRACTION THREE QUARTERS" );
      Console::WriteLine( divider );
      Console::WriteLine( "A1) Is s1 normalized to the default form (Form C)?: {0}", s1->IsNormalized() );
      Console::WriteLine( "A2) Is s1 normalized to Form C?:  {0}", s1->IsNormalized( NormalizationForm::FormC ) );
      Console::WriteLine( "A3) Is s1 normalized to Form D?:  {0}", s1->IsNormalized( NormalizationForm::FormD ) );
      Console::WriteLine( "A4) Is s1 normalized to Form KC?: {0}", s1->IsNormalized( NormalizationForm::FormKC ) );
      Console::WriteLine( "A5) Is s1 normalized to Form KD?: {0}", s1->IsNormalized( NormalizationForm::FormKD ) );
      Console::WriteLine( divider );
      Console::WriteLine( "Set string s2 to each normalized form of string s1." );
      Console::WriteLine();
      Console::WriteLine( "U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE" );
      Console::WriteLine( "U+0033 = DIGIT THREE" );
      Console::WriteLine( "U+2044 = FRACTION SLASH" );
      Console::WriteLine( "U+0034 = DIGIT FOUR" );
      Console::WriteLine( divider );
      s2 = s1->Normalize();
      Console::Write( "B1) Is s2 normalized to the default form (Form C)?: " );
      Console::WriteLine( s2->IsNormalized() );
      Show( "s2", s2 );
      Console::WriteLine();
      s2 = s1->Normalize( NormalizationForm::FormC );
      Console::Write( "B2) Is s2 normalized to Form C?: " );
      Console::WriteLine( s2->IsNormalized( NormalizationForm::FormC ) );
      Show( "s2", s2 );
      Console::WriteLine();
      s2 = s1->Normalize( NormalizationForm::FormD );
      Console::Write( "B3) Is s2 normalized to Form D?: " );
      Console::WriteLine( s2->IsNormalized( NormalizationForm::FormD ) );
      Show( "s2", s2 );
      Console::WriteLine();
      s2 = s1->Normalize( NormalizationForm::FormKC );
      Console::Write( "B4) Is s2 normalized to Form KC?: " );
      Console::WriteLine( s2->IsNormalized( NormalizationForm::FormKC ) );
      Show( "s2", s2 );
      Console::WriteLine();
      s2 = s1->Normalize( NormalizationForm::FormKD );
      Console::Write( "B5) Is s2 normalized to Form KD?: " );
      Console::WriteLine( s2->IsNormalized( NormalizationForm::FormKD ) );
      Show( "s2", s2 );
      Console::WriteLine();
   }
   catch ( Exception^ e ) 
   {
      Console::WriteLine( e->Message );
   }

}

/*
This example produces the following results:

Characters in string s1 = 0063 0301 0327 00BE

U+0063 = LATIN SMALL LETTER C
U+0301 = COMBINING ACUTE ACCENT
U+0327 = COMBINING CEDILLA
U+00BE = VULGAR FRACTION THREE QUARTERS

--------------------------------------------------------------------------------

A1) Is s1 normalized to the default form (Form C)?: False
A2) Is s1 normalized to Form C?:  False
A3) Is s1 normalized to Form D?:  False
A4) Is s1 normalized to Form KC?: False
A5) Is s1 normalized to Form KD?: False

--------------------------------------------------------------------------------

Set string s2 to each normalized form of string s1.

U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
U+0033 = DIGIT THREE
U+2044 = FRACTION SLASH
U+0034 = DIGIT FOUR

--------------------------------------------------------------------------------

B1) Is s2 normalized to the default form (Form C)?: True
Characters in string s2 = 1E09 00BE

B2) Is s2 normalized to Form C?: True
Characters in string s2 = 1E09 00BE

B3) Is s2 normalized to Form D?: True
Characters in string s2 = 0063 0327 0301 00BE

B4) Is s2 normalized to Form KC?: True
Characters in string s2 = 1E09 0033 2044 0034

B5) Is s2 normalized to Form KD?: True
Characters in string s2 = 0063 0327 0301 0033 2044 0034

*/

// This example demonstrates the String.Normalize method
//                       and the String.IsNormalized method
import System.*;
import System.Text.*;

class Sample
{
    public static void main(String[] args)
    {
        // Character c; combining characters acute and cedilla; character 3/4
        String s1 = new String(new char[] { '\u0063', '\u0301', '\u0327', 
            '\u00BE' });
        String s2 = null;
        String divider = new String('-', 80);
        divider = String.Concat(Environment.get_NewLine(), divider, 
            Environment.get_NewLine());

        try {
            Show("s1", s1);
            Console.WriteLine();
            Console.WriteLine("U+0063 = LATIN SMALL LETTER C");
            Console.WriteLine("U+0301 = COMBINING ACUTE ACCENT");
            Console.WriteLine("U+0327 = COMBINING CEDILLA");
            Console.WriteLine("U+00BE = VULGAR FRACTION THREE QUARTERS");
            Console.WriteLine(divider);

            Console.WriteLine("A1) Is s1 normalized to the default form " 
                + "(Form C)?: {0}", System.Convert.ToString(s1.IsNormalized()));
            Console.WriteLine("A2) Is s1 normalized to Form C?:  {0}", 
                System.Convert.ToString(s1.
                IsNormalized(NormalizationForm.FormC)));
            Console.WriteLine("A3) Is s1 normalized to Form D?:  {0}", 
                System.Convert.ToString(s1.
                IsNormalized(NormalizationForm.FormD)));
            Console.WriteLine("A4) Is s1 normalized to Form KC?: {0}", 
                System.Convert.ToString(s1.
                IsNormalized(NormalizationForm.FormKC)));
            Console.WriteLine("A5) Is s1 normalized to Form KD?: {0}", 
                System.Convert.ToString(s1.
                IsNormalized(NormalizationForm.FormKD)));

            Console.WriteLine(divider);

            Console.WriteLine("Set string s2 to each normalized form of " 
                + "string s1.");
            Console.WriteLine();
            Console.WriteLine("U+1E09 = LATIN SMALL LETTER C WITH CEDILLA " 
                + "AND ACUTE");
            Console.WriteLine("U+0033 = DIGIT THREE");
            Console.WriteLine("U+2044 = FRACTION SLASH");
            Console.WriteLine("U+0034 = DIGIT FOUR");
            Console.WriteLine(divider);

            s2 = s1.Normalize();
            Console.Write("B1) Is s2 normalized to the default form " 
                + "(Form C)?: ");
            Console.WriteLine(s2.IsNormalized());
            Show("s2", s2);
            Console.WriteLine();

            s2 = s1.Normalize(NormalizationForm.FormC);
            Console.Write("B2) Is s2 normalized to Form C?: ");
            Console.WriteLine(s2.IsNormalized(NormalizationForm.FormC));
            Show("s2", s2);
            Console.WriteLine();

            s2 = s1.Normalize(NormalizationForm.FormD);
            Console.Write("B3) Is s2 normalized to Form D?: ");
            Console.WriteLine(s2.IsNormalized(NormalizationForm.FormD));
            Show("s2", s2);
            Console.WriteLine();

            s2 = s1.Normalize(NormalizationForm.FormKC);
            Console.Write("B4) Is s2 normalized to Form KC?: ");
            Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKC));
            Show("s2", s2);
            Console.WriteLine();

            s2 = s1.Normalize(NormalizationForm.FormKD);
            Console.Write("B5) Is s2 normalized to Form KD?: ");
            Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKD));
            Show("s2", s2);
            Console.WriteLine();
        }
        catch (System.Exception e) {
            Console.WriteLine(e.get_Message());
        }
    } //main

    private static void Show(String title, String s)
    {
        Console.Write("Characters in string {0} = ", title);
        char myCharArray[] = s.ToCharArray();
        for (int iCtr = 0; iCtr < myCharArray.length; iCtr++) {
            char c = myCharArray[iCtr];
            Console.Write(((System.Int32)c).ToString("X4") + " ");
        }
        Console.WriteLine();
    } //Show
} //Sample
/*
This example produces the following results:

Characters in string s1 = 0063 0301 0327 00BE

U+0063 = LATIN SMALL LETTER C
U+0301 = COMBINING ACUTE ACCENT
U+0327 = COMBINING CEDILLA
U+00BE = VULGAR FRACTION THREE QUARTERS

--------------------------------------------------------------------------------

A1) Is s1 normalized to the default form (Form C)?: False
A2) Is s1 normalized to Form C?:  False
A3) Is s1 normalized to Form D?:  False
A4) Is s1 normalized to Form KC?: False
A5) Is s1 normalized to Form KD?: False

--------------------------------------------------------------------------------

Set string s2 to each normalized form of string s1.

U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
U+0033 = DIGIT THREE
U+2044 = FRACTION SLASH
U+0034 = DIGIT FOUR

--------------------------------------------------------------------------------

B1) Is s2 normalized to the default form (Form C)?: True
Characters in string s2 = 1E09 00BE

B2) Is s2 normalized to Form C?: True
Characters in string s2 = 1E09 00BE

B3) Is s2 normalized to Form D?: True
Characters in string s2 = 0063 0327 0301 00BE

B4) Is s2 normalized to Form KC?: True
Characters in string s2 = 1E09 0033 2044 0034

B5) Is s2 normalized to Form KD?: True
Characters in string s2 = 0063 0327 0301 0033 2044 0034

*/

See Also

Concepts

Normalization and Sorting
