StringInfo Class

 

Provides functionality to split a string into text elements and to iterate through those text elements.

Namespace:   System.Globalization
Assembly:  mscorlib (in mscorlib.dll)

System.Object
  System.Globalization.StringInfo

[SerializableAttribute]
[ComVisibleAttribute(true)]
public class StringInfo

Name / Description

StringInfo()

Initializes a new instance of the StringInfo class.

StringInfo(String)

Initializes a new instance of the StringInfo class to a specified string.

Name / Description

LengthInTextElements

Gets the number of text elements in the current StringInfo object.

String

Gets or sets the value of the current StringInfo object.

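Name / Description

Equals(Object)

Indicates whether the current StringInfo object is equal to a specified object. (Overrides Object.Equals(Object).)

protected Finalize()

Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)

GetHashCode()

Calculates a hash code for the value of the current StringInfo object. (Overrides Object.GetHashCode().)

static GetNextTextElement(String)

Gets the first text element in a specified string.

static GetNextTextElement(String, Int32)

Gets the text element at the specified index of the specified string.

static GetTextElementEnumerator(String)

Returns an enumerator that iterates through the text elements of the entire string.

static GetTextElementEnumerator(String, Int32)

Returns an enumerator that iterates through the text elements of the string, starting at the specified index.

GetType()

Gets the Type of the current instance. (Inherited from Object.)

protected MemberwiseClone()

Creates a shallow copy of the current Object. (Inherited from Object.)

static ParseCombiningCharacters(String)

Returns the indexes of each base character, high surrogate, or control character within the specified string.

SubstringByTextElements(Int32)

Retrieves a substring of text elements from the current StringInfo object, starting from a specified text element and continuing through the last text element.

SubstringByTextElements(Int32, Int32)

Retrieves a substring of text elements from the current StringInfo object, starting from a specified text element and continuing through the specified number of text elements.

ToString()

Returns a string that represents the current object. (Inherited from Object.)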

The .NET Framework defines a text element as a unit of text that is displayed as a single character, that is, a grapheme. A text element can be a base character, a surrogate pair, or a combining character sequence. The Unicode Standard (http://go.microsoft.com/fwlink/?linkid=37123) defines a surrogate pair as a coded character representation for a single abstract character that consists of a sequence of two code units, where the first unit of the pair is a high surrogate and the second is a low surrogate. The Unicode Standard defines a combining character sequence as a combination of a base character and one or more combining characters. A surrogate pair can represent a base character or a combining character.

The StringInfo class enables you to work with a string as a series of text elements rather than as individual Char objects.
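The difference between counting Char objects and counting text elements can be sketched as follows. This is a minimal illustration rather than part of the original documentation: the class name TextElementCountSketch and its helper methods are hypothetical, and only String.Length, the StringInfo(String) constructor, and the LengthInTextElements property come from the API described on this page.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class, introduced only for this illustration.
public static class TextElementCountSketch
{
    // Returns the number of Char objects (UTF-16 code units) in the string.
    public static int CountChars(string s)
    {
        return s.Length;
    }

    // Returns the number of text elements, as counted by StringInfo.
    public static int CountTextElements(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }

    public static void Main()
    {
        // 'a' followed by two combining marks, then 'b', then one surrogate pair.
        string sample = "a\u0304\u0308b" + Char.ConvertFromUtf32(0x10148);

        Console.WriteLine("Char objects:  {0}", CountChars(sample));
        Console.WriteLine("Text elements: {0}", CountTextElements(sample));
    }
}
```

For this sample the two counts differ because the combining marks and the low surrogate each occupy a Char object without starting a new text element.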

To instantiate a StringInfo object that represents a specified string, you can do either of the following:

  • Call the StringInfo(String) constructor and pass it the string that the StringInfo object is to represent as an argument.

  • Call the default StringInfo() constructor, and assign the string that the StringInfo object is to represent to the StringInfo.String property.
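The two instantiation options can be sketched as follows. The class StringInfoCreationSketch and its method names are hypothetical and exist only for this illustration; the constructors, the String property, and the Equals override are the members listed on this page.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class, introduced only for this illustration.
public static class StringInfoCreationSketch
{
    // Option 1: pass the string to the StringInfo(String) constructor.
    public static StringInfo CreateWithConstructor(string value)
    {
        return new StringInfo(value);
    }

    // Option 2: call the default constructor, then assign the String property.
    public static StringInfo CreateWithProperty(string value)
    {
        StringInfo info = new StringInfo();
        info.String = value;
        return info;
    }

    public static void Main()
    {
        string sample = "a\u0304\u0308bc\u0327";
        StringInfo first = CreateWithConstructor(sample);
        StringInfo second = CreateWithProperty(sample);

        // StringInfo overrides Equals, so the two objects can be compared directly.
        Console.WriteLine(first.Equals(second));
        Console.WriteLine(first.LengthInTextElements == second.LengthInTextElements);
    }
}
```

Both routes should yield objects that represent the same string; the constructor form is simply shorter when the string is known up front.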

You can work with the individual text elements in a string in two ways:

  • By enumerating each text element. To do this, you call the StringInfo.GetTextElementEnumerator(String) method, and then repeatedly call the TextElementEnumerator.MoveNext method on the returned TextElementEnumerator object until the method returns false.

  • By calling the StringInfo.ParseCombiningCharacters(String) method to retrieve an array that contains the starting index of each text element. These are indexes of Char objects in the string, so you can retrieve an individual text element by passing an index, together with the distance to the next index as the length, to the String.Substring method. (The StringInfo.SubstringByTextElements method, by contrast, takes the zero-based position of a text element, not a Char index.)
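ParseCombiningCharacters reports positions measured in Char objects, whereas SubstringByTextElements counts text elements. The following sketch puts the two side by side. The class TextElementSliceSketch and its helper methods are hypothetical names used only here; the StringInfo members they call are the ones listed on this page.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class, introduced only for this illustration.
public static class TextElementSliceSketch
{
    // Returns the text element at the given zero-based text-element position.
    public static string ElementAt(string s, int elementPosition)
    {
        return new StringInfo(s).SubstringByTextElements(elementPosition, 1);
    }

    // Returns the Char index at which the given text element starts.
    public static int CharIndexOf(string s, int elementPosition)
    {
        return StringInfo.ParseCombiningCharacters(s)[elementPosition];
    }

    public static void Main()
    {
        string sample = "a\u0304\u0308bc\u0327";
        StringInfo info = new StringInfo(sample);

        for (int i = 0; i < info.LengthInTextElements; i++)
        {
            Console.WriteLine(
                "Text element {0} starts at Char index {1} and has {2} Char object(s)",
                i, CharIndexOf(sample, i), ElementAt(sample, i).Length);
        }
    }
}
```

Re-creating the StringInfo object and re-parsing the string on every call keeps the sketch short; code that processes long strings would more likely parse once and reuse the result.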

The following example illustrates both ways of working with the text elements in a string. It creates two strings:

  • strCombining, which is a string of Arabic characters that includes three text elements with multiple Char objects. The first text element is the base character ARABIC LETTER ALEF (U+0627) followed by ARABIC HAMZA BELOW (U+0655) and ARABIC KASRA (U+0650). The second text element is ARABIC LETTER HEH (U+0647) followed by ARABIC FATHA (U+064E). The third text element is ARABIC LETTER BEH (U+0628) followed by ARABIC DAMMATAN (U+064C).

  • strSurrogates, which is a string that includes three surrogate pairs: GREEK ACROPHONIC FIVE TALENTS (U+10148) from the Supplementary Multilingual Plane, U+20026 from the Supplementary Ideographic Plane, and U+F1001 from the private use area. The UTF-16 encoding of each character is a surrogate pair that consists of a high surrogate followed by a low surrogate.

Each string is parsed once by the StringInfo.ParseCombiningCharacters(String) method and then by the StringInfo.GetTextElementEnumerator(String) method. Both methods correctly parse the text elements in the two strings and display the results of the parsing operation.

using System;
using System.Globalization;

public class Example
{
   public static void Main()
   {
      // The Unicode code points specify Arabic base characters and 
      // combining character sequences.
      string strCombining = "\u0627\u0655\u0650\u064A\u0647\u064E" +
                            "\u0627\u0628\u064C";

      // The Unicode code points specify supplementary characters (each encoded
      // as a surrogate pair), with a single BMP character, "a", between them.
      string strSurrogates = Char.ConvertFromUtf32(0x10148) +
                             Char.ConvertFromUtf32(0x20026) + "a" +
                             Char.ConvertFromUtf32(0xF1001);

      EnumerateTextElements(strCombining);
      EnumerateTextElements(strSurrogates);
   }

   public static void EnumerateTextElements(string str)
   {
      // Declare the enumerator; it is assigned further below.
      TextElementEnumerator teEnum = null;      

      // Parse the string using the ParseCombiningCharacters method.
      Console.WriteLine("\nParsing with ParseCombiningCharacters:");
      int[] teIndices = StringInfo.ParseCombiningCharacters(str);

      for (int i = 0; i < teIndices.Length; i++) {
         if (i < teIndices.Length - 1)
            Console.WriteLine("Text Element {0} ({1}..{2})= {3}", i, 
               teIndices[i], teIndices[i + 1] - 1, 
               ShowHexValues(str.Substring(teIndices[i], teIndices[i + 1] - 
                             teIndices[i])));
         else
            Console.WriteLine("Text Element {0} ({1}..{2})= {3}", i, 
               teIndices[i], str.Length - 1, 
               ShowHexValues(str.Substring(teIndices[i])));
      }
      Console.WriteLine();

      // Parse the string with the GetTextElementEnumerator method.
      Console.WriteLine("Parsing with TextElementEnumerator:");
      teEnum = StringInfo.GetTextElementEnumerator(str);

      int teCount = - 1;

      while (teEnum.MoveNext()) {
         // Displays the current element.
         // Both GetTextElement() and Current retrieve the current
         // text element. The latter returns it as an Object.
         teCount++;
         Console.WriteLine("Text Element {0} ({1}..{2})= {3}", teCount, 
            teEnum.ElementIndex, teEnum.ElementIndex + 
            teEnum.GetTextElement().Length - 1, ShowHexValues((string)(teEnum.Current)));
      }
   }

   private static string ShowHexValues(string s)
   {
      string hexString = "";
      foreach (var ch in s)
         hexString += String.Format("{0:X4} ", Convert.ToUInt16(ch));

      return hexString;
   }
}
// The example displays the following output:
//       Parsing with ParseCombiningCharacters:
//       Text Element 0 (0..2)= 0627 0655 0650
//       Text Element 1 (3..3)= 064A
//       Text Element 2 (4..5)= 0647 064E
//       Text Element 3 (6..6)= 0627
//       Text Element 4 (7..8)= 0628 064C
//       
//       Parsing with TextElementEnumerator:
//       Text Element 0 (0..2)= 0627 0655 0650
//       Text Element 1 (3..3)= 064A
//       Text Element 2 (4..5)= 0647 064E
//       Text Element 3 (6..6)= 0627
//       Text Element 4 (7..8)= 0628 064C
//       
//       Parsing with ParseCombiningCharacters:
//       Text Element 0 (0..1)= D800 DD48
//       Text Element 1 (2..3)= D840 DC26
//       Text Element 2 (4..4)= 0061
//       Text Element 3 (5..6)= DB84 DC01
//       
//       Parsing with TextElementEnumerator:
//       Text Element 0 (0..1)= D800 DD48
//       Text Element 1 (2..3)= D840 DC26
//       Text Element 2 (4..4)= 0061
//       Text Element 3 (5..6)= DB84 DC01

Notes to Callers:

Internally, the methods of the StringInfo class call the methods of the CharUnicodeInfo class to determine character categories. Starting with the .NET Framework 4.6.2, character classification is based on The Unicode Standard, Version 8.0.0 (http://unicode.org/versions/Unicode8.0.0). For the .NET Framework 4 through the .NET Framework 4.6.1, it is based on The Unicode Standard, Version 6.3.0 (http://www.unicode.org/versions/Unicode6.3.0/).
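To see the kind of category information that CharUnicodeInfo exposes, the following sketch prints the Unicode category of each Char in a short combining character sequence. It only inspects categories through the public CharUnicodeInfo.GetUnicodeCategory(Char) method and does not show how StringInfo uses them internally. The class CategorySketch and the method CategoryOf are hypothetical names used only here.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class, introduced only for this illustration.
public static class CategorySketch
{
    // Returns the Unicode general category that CharUnicodeInfo reports for a Char.
    public static UnicodeCategory CategoryOf(char ch)
    {
        return CharUnicodeInfo.GetUnicodeCategory(ch);
    }

    public static void Main()
    {
        // A base letter followed by two combining marks.
        string sample = "a\u0304\u0308";
        foreach (char ch in sample)
        {
            Console.WriteLine("U+{0:X4}: {1}", (int)ch, CategoryOf(ch));
        }
    }
}
```

The base letter is reported as a letter category and the combining marks as non-spacing marks, which is the distinction that lets a parser attach marks to the preceding base character.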

This example shows how to use the StringInfo.GetTextElementEnumerator and StringInfo.ParseCombiningCharacters(String) methods of the StringInfo class to work with a string that contains combining characters. The same calls also handle surrogate pairs.

using System;
using System.Text;
using System.Globalization;

public sealed class App {
   static void Main() {
      // The string below contains combining characters.
      String s = "a\u0304\u0308bc\u0327";

      // Show each 'character' in the string.
      EnumTextElements(s);

      // Show the index in the string where each 'character' starts.
      EnumTextElementIndexes(s);
   }

   // Show how to enumerate each real character (honoring surrogates) in a string.
   static void EnumTextElements(String s) {
      // This StringBuilder holds the output results.
      StringBuilder sb = new StringBuilder();

      // Use the enumerator returned from GetTextElementEnumerator 
      // method to examine each real character.
      TextElementEnumerator charEnum = StringInfo.GetTextElementEnumerator(s);
      while (charEnum.MoveNext()) {
         sb.AppendFormat(
           "Character at index {0} is '{1}'{2}",
           charEnum.ElementIndex, charEnum.GetTextElement(),
           Environment.NewLine);
      }

      // Show the results.
      Console.WriteLine("Result of GetTextElementEnumerator:");
      Console.WriteLine(sb);
   }

   // Show how to discover the index of each real character (honoring surrogates) in a string.
   static void EnumTextElementIndexes(String s) {
      // This StringBuilder holds the output results.
      StringBuilder sb = new StringBuilder();

      // Use the ParseCombiningCharacters method to 
      // get the index of each real character in the string.
      Int32[] textElemIndex = StringInfo.ParseCombiningCharacters(s);

      // Iterate through each real character showing the character and the index where it was found.
      for (Int32 i = 0; i < textElemIndex.Length; i++) {
         sb.AppendFormat(
            "Character {0} starts at index {1}{2}",
            i, textElemIndex[i], Environment.NewLine);
      }

      // Show the results.
      Console.WriteLine("Result of ParseCombiningCharacters:");
      Console.WriteLine(sb);
   }
}

// This code produces the following output.
//
// Result of GetTextElementEnumerator:
// Character at index 0 is 'ā̈'
// Character at index 3 is 'b'
// Character at index 4 is 'ç'
// 
// Result of ParseCombiningCharacters:
// Character 0 starts at index 0
// Character 1 starts at index 3
// Character 2 starts at index 4

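Universal Windows Platform
Available since 8
.NET Framework
Available since 1.1
Portable Class Library
Supported in: portable .NET platforms
Silverlight
Available since 2.0
Windows Phone Silverlight
Available since 7.0
Windows Phone
Available since 8.1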

Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
