String.Normalize Method ()

 

Returns a new string whose textual value is the same as this string, but whose binary representation is in Unicode normalization form C.

Namespace:   System
Assembly:  mscorlib (in mscorlib.dll)

public string Normalize()

Return Value

Type: System.String

A new, normalized string whose textual value is the same as this string, but whose binary representation is in normalization form C.

Exception Condition
ArgumentException

The current instance contains invalid Unicode characters.

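Some Unicode characters have multiple equivalent binary representations consisting of sets of combining and/or composite Unicode characters. For example, any of the following code point sequences can represent the letter "ắ":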

  • U+1EAF

  • U+0103 U+0301

  • U+0061 U+0306 U+0301
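
As a small sketch of the equivalence described above, the following program builds the three sequences and normalizes each with Normalize(). The class name EquivalentForms and the helper CodeUnits are illustrative names, not part of the .NET Framework. Under normal globalization settings, all three inputs are expected to normalize to the single code point U+1EAF.

```csharp
using System;

class EquivalentForms
{
    // Illustrative helper: formats each UTF-16 code unit of s as four hex digits.
    public static string CodeUnits(string s)
    {
        string[] parts = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            parts[i] = ((int)s[i]).ToString("X4");
        }
        return String.Join(" ", parts);
    }

    public static void Main()
    {
        string precomposed = "\u1EAF";              // single composite character
        string partial     = "\u0103\u0301";        // a with breve + combining acute
        string decomposed  = "\u0061\u0306\u0301";  // a + combining breve + combining acute

        foreach (string s in new[] { precomposed, partial, decomposed })
        {
            // Normalize() with no argument uses normalization form C.
            Console.WriteLine("{0} -> {1}", CodeUnits(s), CodeUnits(s.Normalize()));
        }
    }
}
```

The three strings differ under ordinal comparison before normalization and are identical afterwards.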

The existence of multiple representations for a single character complicates searching, sorting, matching, and other operations.

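The Unicode standard defines a process called normalization that returns one binary representation when given any of the equivalent binary representations of a character. Normalization can be performed with several algorithms, called normalization forms, that obey different rules. The .NET Framework supports the four normalization forms (C, D, KC, and KD) that are defined by the Unicode standard. When two strings are represented in the same normalization form, they can be compared by using ordinal comparison.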

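To normalize and compare two strings, do the following:

  1. Obtain the strings to be compared from an input source, such as a file or a user input device.

  2. Call the Normalize() method to normalize the strings to normalization form C.

  3. To compare two strings, call a method that supports ordinal string comparison, such as the Compare(String, String, StringComparison) method, and supply a value of StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase as the StringComparison argument. To sort an array of normalized strings, pass a comparer value of StringComparer.Ordinal or StringComparer.OrdinalIgnoreCase to an appropriate overload of Array.Sort.

  4. Emit the strings in the sorted output based on the order indicated by the previous step.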
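
The steps above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the class name NormalizeAndCompare and the helpers EqualsNormalized and NormalizeAndSort are hypothetical, and the input strings are hard-coded here in place of a real input source such as a file.

```csharp
using System;

class NormalizeAndCompare
{
    // Hypothetical helper: true if a and b are ordinally equal after
    // both have been normalized to form C (steps 2 and 3).
    public static bool EqualsNormalized(string a, string b)
    {
        return String.Compare(a.Normalize(), b.Normalize(),
                              StringComparison.Ordinal) == 0;
    }

    // Hypothetical helper: returns the form C version of each input string,
    // sorted with an ordinal comparer (steps 2 and 3 for arrays).
    public static string[] NormalizeAndSort(string[] input)
    {
        string[] result = new string[input.Length];
        for (int i = 0; i < input.Length; i++)
        {
            result[i] = input[i].Normalize();
        }
        Array.Sort(result, StringComparer.Ordinal);
        return result;
    }

    public static void Main()
    {
        // Step 1: sample input (assumed; a real program would read it).
        string composed   = "\u1EAF";
        string decomposed = "\u0061\u0306\u0301";

        // Without normalization the two strings are ordinally different.
        Console.WriteLine(String.Equals(composed, decomposed, StringComparison.Ordinal));
        // After normalization they compare as equal.
        Console.WriteLine(EqualsNormalized(composed, decomposed));

        // Step 4: emit the strings in sorted order.
        foreach (string s in NormalizeAndSort(new[] { "b", decomposed, "a" }))
        {
            Console.WriteLine(s);
        }
    }
}
```

Normalizing before sorting means equivalent inputs end up adjacent in the output, because ordinal comparison only looks at the code units.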

For a description of the supported Unicode normalization forms, see System.Text.NormalizationForm.

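Notes to Callers:

The IsNormalized method returns false as soon as it encounters the first non-normalized character in a string. Therefore, if a string contains non-normalized characters followed by invalid Unicode characters, the Normalize method throws an ArgumentException even though IsNormalized returns false.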
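
Because a false result from IsNormalized does not guarantee that Normalize will succeed, a caller handling untrusted input may want to guard the call. The sketch below wraps Normalize() in a hypothetical TryNormalize helper (not a framework method) that catches ArgumentException. The sample string, a non-normalized sequence followed by a lone high surrogate, is an assumption chosen for illustration, and the exact behavior for invalid input may vary with the runtime and its globalization settings.

```csharp
using System;

class SafeNormalize
{
    // Hypothetical helper: attempts form C normalization and reports
    // failure instead of letting ArgumentException propagate.
    public static bool TryNormalize(string input, out string normalized)
    {
        try
        {
            normalized = input.Normalize();
            return true;
        }
        catch (ArgumentException)
        {
            // Normalize rejected the input as invalid Unicode.
            normalized = null;
            return false;
        }
    }

    public static void Main()
    {
        // "a" + combining acute is valid but not in form C;
        // \uD800 is a high surrogate with no matching low surrogate.
        string suspect = "a\u0301\uD800";

        string result;
        if (TryNormalize(suspect, out result))
        {
            Console.WriteLine("Normalized length: {0}", result.Length);
        }
        else
        {
            Console.WriteLine("Input contains invalid Unicode and was not normalized.");
        }
    }
}
```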

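The following example normalizes a string to each of the four normalization forms, confirms that the string was normalized to the specified form, and then lists the code points in the normalized string.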

using System;
using System.Text;

class Example
{
    public static void Main() 
    {
       // Character c; combining characters acute and cedilla; character 3/4
       string s1 = new String( new char[] {'\u0063', '\u0301', '\u0327', '\u00BE'});
       string s2 = null;
       string divider = new String('-', 80);
       divider = String.Concat(Environment.NewLine, divider, Environment.NewLine);

       Show("s1", s1);
       Console.WriteLine();
       Console.WriteLine("U+0063 = LATIN SMALL LETTER C");
       Console.WriteLine("U+0301 = COMBINING ACUTE ACCENT");
       Console.WriteLine("U+0327 = COMBINING CEDILLA");
       Console.WriteLine("U+00BE = VULGAR FRACTION THREE QUARTERS");
       Console.WriteLine(divider);

       Console.WriteLine("A1) Is s1 normalized to the default form (Form C)?: {0}", 
                                    s1.IsNormalized());
       Console.WriteLine("A2) Is s1 normalized to Form C?:  {0}", 
                                    s1.IsNormalized(NormalizationForm.FormC));
       Console.WriteLine("A3) Is s1 normalized to Form D?:  {0}", 
                                    s1.IsNormalized(NormalizationForm.FormD));
       Console.WriteLine("A4) Is s1 normalized to Form KC?: {0}", 
                                    s1.IsNormalized(NormalizationForm.FormKC));
       Console.WriteLine("A5) Is s1 normalized to Form KD?: {0}", 
                                    s1.IsNormalized(NormalizationForm.FormKD));

       Console.WriteLine(divider);

       Console.WriteLine("Set string s2 to each normalized form of string s1.");
       Console.WriteLine();
       Console.WriteLine("U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE");
       Console.WriteLine("U+0033 = DIGIT THREE");
       Console.WriteLine("U+2044 = FRACTION SLASH");
       Console.WriteLine("U+0034 = DIGIT FOUR");
       Console.WriteLine(divider);

       s2 = s1.Normalize();
       Console.Write("B1) Is s2 normalized to the default form (Form C)?: ");
       Console.WriteLine(s2.IsNormalized());
       Show("s2", s2);
       Console.WriteLine();

       s2 = s1.Normalize(NormalizationForm.FormC);
       Console.Write("B2) Is s2 normalized to Form C?: ");
       Console.WriteLine(s2.IsNormalized(NormalizationForm.FormC));
       Show("s2", s2);
       Console.WriteLine();

       s2 = s1.Normalize(NormalizationForm.FormD);
       Console.Write("B3) Is s2 normalized to Form D?: ");
       Console.WriteLine(s2.IsNormalized(NormalizationForm.FormD));
       Show("s2", s2);
       Console.WriteLine();

       s2 = s1.Normalize(NormalizationForm.FormKC);
       Console.Write("B4) Is s2 normalized to Form KC?: ");
       Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKC));
       Show("s2", s2);
       Console.WriteLine();

       s2 = s1.Normalize(NormalizationForm.FormKD);
       Console.Write("B5) Is s2 normalized to Form KD?: ");
       Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKD));
       Show("s2", s2);
       Console.WriteLine();
    }

    private static void Show(string title, string s)
    {
       Console.Write("Characters in string {0} = ", title);
       foreach(short x in s) {
           Console.Write("{0:X4} ", x);
       }
       Console.WriteLine();
    }
}
/*
This example produces the following results:

Characters in string s1 = 0063 0301 0327 00BE

U+0063 = LATIN SMALL LETTER C
U+0301 = COMBINING ACUTE ACCENT
U+0327 = COMBINING CEDILLA
U+00BE = VULGAR FRACTION THREE QUARTERS

--------------------------------------------------------------------------------

A1) Is s1 normalized to the default form (Form C)?: False
A2) Is s1 normalized to Form C?:  False
A3) Is s1 normalized to Form D?:  False
A4) Is s1 normalized to Form KC?: False
A5) Is s1 normalized to Form KD?: False

--------------------------------------------------------------------------------

Set string s2 to each normalized form of string s1.

U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
U+0033 = DIGIT THREE
U+2044 = FRACTION SLASH
U+0034 = DIGIT FOUR

--------------------------------------------------------------------------------

B1) Is s2 normalized to the default form (Form C)?: True
Characters in string s2 = 1E09 00BE

B2) Is s2 normalized to Form C?: True
Characters in string s2 = 1E09 00BE

B3) Is s2 normalized to Form D?: True
Characters in string s2 = 0063 0327 0301 00BE

B4) Is s2 normalized to Form KC?: True
Characters in string s2 = 1E09 0033 2044 0034

B5) Is s2 normalized to Form KD?: True
Characters in string s2 = 0063 0327 0301 0033 2044 0034

*/

.NET Framework
Available since 2.0