
String.Normalize Method (NormalizationForm)

 

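Returns a new string whose textual value is the same as this string, but whose binary representation is in the specified Unicode normalization form.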

Namespace:   System
Assembly:  mscorlib (in mscorlib.dll)

public string Normalize(
	NormalizationForm normalizationForm
)

Parameters

normalizationForm
Type: System.Text.NormalizationForm

A Unicode normalization form.

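Return Value

Type: System.String

A new string whose textual value is the same as this string, but whose binary representation is in the normalization form specified by the normalizationForm parameter.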

Exception            Condition
ArgumentException    The current instance contains invalid Unicode characters.

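Some Unicode characters have multiple equivalent binary representations that consist of sets of combining and/or composite Unicode characters. The existence of multiple representations for a single character complicates searching, sorting, matching, and other operations.

The Unicode standard defines a process called normalization that returns one binary representation when given any of the equivalent binary representations of a character. Normalization can be performed with several algorithms, called normalization forms, that follow different rules. The .NET Framework supports the four normalization forms (C, D, KC, and KD) that are defined by the Unicode standard. When two strings are represented in the same normalization form, they can be compared by using ordinal comparison.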
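To make the idea of equivalent binary representations concrete, here is a minimal sketch. The class name EquivalenceSketch and the helper EqualAfterNormalization are hypothetical names chosen for this illustration; only String.Normalize, String.Equals, and NormalizationForm come from the .NET Framework.

```csharp
using System;
using System.Text;

static class EquivalenceSketch
{
    // Hypothetical helper: true when the two strings match code unit by
    // code unit after both are normalized to the same form.
    public static bool EqualAfterNormalization(string a, string b, NormalizationForm form)
    {
        return String.Equals(a.Normalize(form), b.Normalize(form), StringComparison.Ordinal);
    }

    static void Main()
    {
        string precomposed = "\u00E9";  // U+00E9 LATIN SMALL LETTER E WITH ACUTE
        string decomposed = "e\u0301";  // U+0065 followed by U+0301 COMBINING ACUTE ACCENT

        // The raw strings hold different code units, so ordinal comparison fails.
        Console.WriteLine(String.Equals(precomposed, decomposed, StringComparison.Ordinal));

        // Once both are in the same form (composed or decomposed), they match.
        Console.WriteLine(EqualAfterNormalization(precomposed, decomposed, NormalizationForm.FormC));
        Console.WriteLine(EqualAfterNormalization(precomposed, decomposed, NormalizationForm.FormD));
    }
}
```

The first line printed should be False and the next two True, because both spellings denote the same character but only agree at the binary level after normalization.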

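To normalize and compare two strings, do the following:

  1. Obtain the strings to be compared from an input source, such as a file or a user input device.

  2. Call the Normalize(NormalizationForm) method to normalize the strings to a specified normalization form.

  3. To compare two strings, call a method that supports ordinal string comparison, such as the Compare(String, String, StringComparison) method, and supply a value of StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase as the StringComparison argument. To sort an array of normalized strings, pass a comparer value of StringComparer.Ordinal or StringComparer.OrdinalIgnoreCase to an appropriate overload of Array.Sort.

  4. Emit the strings in the sorted output based on the order indicated by the previous step.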
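The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a prescribed implementation: the class NormalizedSortSketch and the helper NormalizeAll are hypothetical, the input strings are hard-coded stand-ins for data read from a file or from the user, and Form C is an arbitrary choice of form.

```csharp
using System;
using System.Text;

static class NormalizedSortSketch
{
    // Step 2 (hypothetical helper): bring every input to one normalization form.
    public static string[] NormalizeAll(string[] inputs, NormalizationForm form)
    {
        string[] result = new string[inputs.Length];
        for (int i = 0; i < inputs.Length; i++)
        {
            result[i] = inputs[i].Normalize(form);
        }
        return result;
    }

    static void Main()
    {
        // Step 1: stand-ins for externally supplied strings. The first and
        // third spell the same word with decomposed and precomposed accents.
        string[] inputs = { "re\u0301sume\u0301", "resume", "r\u00E9sum\u00E9" };

        string[] normalized = NormalizeAll(inputs, NormalizationForm.FormC);

        // Step 3: ordinal comparison is meaningful once the forms agree.
        int order = String.Compare(normalized[0], normalized[2], StringComparison.Ordinal);
        Console.WriteLine("Accented spellings equal: {0}", order == 0);

        // Step 3 (sorting): pass an ordinal comparer to Array.Sort.
        Array.Sort(normalized, StringComparer.Ordinal);

        // Step 4: emit the strings in sorted order.
        foreach (string s in normalized)
        {
            Console.WriteLine(s);
        }
    }
}
```

StringComparer.OrdinalIgnoreCase and StringComparison.OrdinalIgnoreCase can be substituted when case should not affect the result.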

For a description of the supported Unicode normalization forms, see System.Text.NormalizationForm.

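Notes to Callers:

The IsNormalized method returns false as soon as it encounters the first non-normalized character in a string. Therefore, if a string contains non-normalized characters followed by invalid Unicode characters, the Normalize method may throw an ArgumentException even though IsNormalized returns false.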
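Because a false result from IsNormalized does not guarantee that Normalize will succeed, callers that handle untrusted input may want to guard the call. The following is a minimal sketch of one way to do that; SafeNormalizeSketch and TryNormalize are hypothetical names and are not part of the .NET Framework.

```csharp
using System;
using System.Text;

static class SafeNormalizeSketch
{
    // Hypothetical helper: reports failure instead of letting the
    // ArgumentException raised for invalid Unicode reach the caller.
    public static bool TryNormalize(string input, NormalizationForm form, out string result)
    {
        try
        {
            result = input.Normalize(form);
            return true;
        }
        catch (ArgumentException)
        {
            result = null;
            return false;
        }
    }

    static void Main()
    {
        // "c" + U+0301 is not in Form C; U+D800 is a lone high surrogate,
        // which is not valid Unicode text on its own.
        string suspect = "c\u0301\uD800";

        string normalized;
        if (TryNormalize(suspect, NormalizationForm.FormC, out normalized))
        {
            Console.WriteLine("Normalized to {0} UTF-16 code units.", normalized.Length);
        }
        else
        {
            Console.WriteLine("The input contains invalid Unicode and was not normalized.");
        }
    }
}
```

Which branch runs for the sample string depends on how the runtime treats the lone surrogate, so the sketch makes no claim about its output. The point is only that the exception is handled in one place.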

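The following example normalizes a string to each of the four normalization forms, confirms that the string was normalized to the specified form, and then lists the code points in the normalized string.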

using System;
using System.Text;

class Example
{
    public static void Main() 
    {
       // Character c; combining characters acute and cedilla; character 3/4
       string s1 = new String( new char[] {'\u0063', '\u0301', '\u0327', '\u00BE'});
       string s2 = null;
       string divider = new String('-', 80);
       divider = String.Concat(Environment.NewLine, divider, Environment.NewLine);

       Show("s1", s1);
       Console.WriteLine();
       Console.WriteLine("U+0063 = LATIN SMALL LETTER C");
       Console.WriteLine("U+0301 = COMBINING ACUTE ACCENT");
       Console.WriteLine("U+0327 = COMBINING CEDILLA");
       Console.WriteLine("U+00BE = VULGAR FRACTION THREE QUARTERS");
       Console.WriteLine(divider);

       Console.WriteLine("A1) Is s1 normalized to the default form (Form C)?: {0}", 
                                    s1.IsNormalized());
       Console.WriteLine("A2) Is s1 normalized to Form C?:  {0}", 
                                    s1.IsNormalized(NormalizationForm.FormC));
       Console.WriteLine("A3) Is s1 normalized to Form D?:  {0}", 
                                    s1.IsNormalized(NormalizationForm.FormD));
       Console.WriteLine("A4) Is s1 normalized to Form KC?: {0}", 
                                    s1.IsNormalized(NormalizationForm.FormKC));
       Console.WriteLine("A5) Is s1 normalized to Form KD?: {0}", 
                                    s1.IsNormalized(NormalizationForm.FormKD));

       Console.WriteLine(divider);

       Console.WriteLine("Set string s2 to each normalized form of string s1.");
       Console.WriteLine();
       Console.WriteLine("U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE");
       Console.WriteLine("U+0033 = DIGIT THREE");
       Console.WriteLine("U+2044 = FRACTION SLASH");
       Console.WriteLine("U+0034 = DIGIT FOUR");
       Console.WriteLine(divider);

       s2 = s1.Normalize();
       Console.Write("B1) Is s2 normalized to the default form (Form C)?: ");
       Console.WriteLine(s2.IsNormalized());
       Show("s2", s2);
       Console.WriteLine();

       s2 = s1.Normalize(NormalizationForm.FormC);
       Console.Write("B2) Is s2 normalized to Form C?: ");
       Console.WriteLine(s2.IsNormalized(NormalizationForm.FormC));
       Show("s2", s2);
       Console.WriteLine();

       s2 = s1.Normalize(NormalizationForm.FormD);
       Console.Write("B3) Is s2 normalized to Form D?: ");
       Console.WriteLine(s2.IsNormalized(NormalizationForm.FormD));
       Show("s2", s2);
       Console.WriteLine();

       s2 = s1.Normalize(NormalizationForm.FormKC);
       Console.Write("B4) Is s2 normalized to Form KC?: ");
       Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKC));
       Show("s2", s2);
       Console.WriteLine();

       s2 = s1.Normalize(NormalizationForm.FormKD);
       Console.Write("B5) Is s2 normalized to Form KD?: ");
       Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKD));
       Show("s2", s2);
       Console.WriteLine();
    }

    // Writes each UTF-16 code unit of s as a four-digit hexadecimal value.
    private static void Show(string title, string s)
    {
       Console.Write("Characters in string {0} = ", title);
       foreach (char c in s) {
           Console.Write("{0:X4} ", (int)c);
       }
       Console.WriteLine();
    }
}
/*
This example produces the following results:

Characters in string s1 = 0063 0301 0327 00BE

U+0063 = LATIN SMALL LETTER C
U+0301 = COMBINING ACUTE ACCENT
U+0327 = COMBINING CEDILLA
U+00BE = VULGAR FRACTION THREE QUARTERS

--------------------------------------------------------------------------------

A1) Is s1 normalized to the default form (Form C)?: False
A2) Is s1 normalized to Form C?:  False
A3) Is s1 normalized to Form D?:  False
A4) Is s1 normalized to Form KC?: False
A5) Is s1 normalized to Form KD?: False

--------------------------------------------------------------------------------

Set string s2 to each normalized form of string s1.

U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
U+0033 = DIGIT THREE
U+2044 = FRACTION SLASH
U+0034 = DIGIT FOUR

--------------------------------------------------------------------------------

B1) Is s2 normalized to the default form (Form C)?: True
Characters in string s2 = 1E09 00BE

B2) Is s2 normalized to Form C?: True
Characters in string s2 = 1E09 00BE

B3) Is s2 normalized to Form D?: True
Characters in string s2 = 0063 0327 0301 00BE

B4) Is s2 normalized to Form KC?: True
Characters in string s2 = 1E09 0033 2044 0034

B5) Is s2 normalized to Form KD?: True
Characters in string s2 = 0063 0327 0301 0033 2044 0034

*/

.NET Framework
Available since 2.0