
CharUnicodeInfo.GetUnicodeCategory Method (String, Int32)

Updated: December 2010

Gets the Unicode category of the character at the specified index of the specified string.

Namespace:  System.Globalization
Assembly:  mscorlib (in mscorlib.dll)

public static UnicodeCategory GetUnicodeCategory(
	string s,
	int index
)

Parameters

s
Type: System.String

The String containing the Unicode character for which to get the Unicode category.

index
Type: System.Int32

The index of the Unicode character for which to get the Unicode category.

Return Value

Type: System.Globalization.UnicodeCategory
A UnicodeCategory value indicating the category of the character at the specified index of the specified string.

Exceptions

ArgumentNullException

s is null.

ArgumentOutOfRangeException

index is outside the range of valid indexes in s.
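
The two exception conditions above can be checked before the call. The following is a minimal sketch, not part of the original reference: TryGetCategory and the CategoryLookup class are hypothetical names introduced here for illustration. The helper returns null in the cases where GetUnicodeCategory(String, Int32) is documented to throw.

```csharp
using System;
using System.Globalization;

public static class CategoryLookup
{
    // Hypothetical helper (not a framework API): returns null instead of
    // throwing when s is null or index is outside the valid indexes of s.
    public static UnicodeCategory? TryGetCategory(string s, int index)
    {
        if (s == null || index < 0 || index >= s.Length)
            return null;
        return CharUnicodeInfo.GetUnicodeCategory(s, index);
    }

    public static void Main()
    {
        // Valid index: the category of 'a' is returned.
        Console.WriteLine(TryGetCategory("a9", 0));

        // Index 2 is past the end of "a9", so the helper yields no value.
        Console.WriteLine(TryGetCategory("a9", 2).HasValue);

        // Calling the method directly with the same arguments throws.
        try
        {
            CharUnicodeInfo.GetUnicodeCategory("a9", 2);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("index is outside the range of valid indexes in s.");
        }
    }
}
```

Whether to validate up front or catch the exception is a design choice; the sketch shows both only to make the documented conditions concrete.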

Unicode characters are divided into categories, and a character's category is one of its properties. For example, a character might be an uppercase letter, a lowercase letter, a decimal digit number, a letter number, connector punctuation, a math symbol, or a currency symbol. The CharUnicodeInfo class returns the category of a Unicode character as a member of the UnicodeCategory enumeration. For more information on Unicode characters, see the Unicode Standard.

If the Char object at position index is the first character of a valid surrogate pair, the GetUnicodeCategory method returns the Unicode category of the surrogate pair instead of UnicodeCategory.Surrogate. For example, the Ugaritic alphabet occupies code points U+10380 to U+1039F. The following example uses the ConvertFromUtf32 method to instantiate a string that represents UGARITIC LETTER ALPA (U+10380), which is the first letter of the Ugaritic alphabet. As the output from the example shows, the GetUnicodeCategory method returns UnicodeCategory.OtherLetter if it is passed the high surrogate of this character, which indicates that it considers the surrogate pair. However, if it is passed the low surrogate, it considers only the low surrogate in isolation and returns UnicodeCategory.Surrogate.

int utf32 = 0x10380;       // UGARITIC LETTER ALPA
string surrogate = Char.ConvertFromUtf32(utf32);
for (int ctr = 0; ctr < surrogate.Length; ctr++)
   Console.WriteLine("U+{0:X4}: {1:G}", 
                     Convert.ToUInt16(surrogate[ctr]), 
                     System.Globalization.CharUnicodeInfo.GetUnicodeCategory(surrogate, ctr));
// The example displays the following output: 
//       U+D800: OtherLetter 
//       U+DF80: Surrogate      
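
Because the method reports UnicodeCategory.Surrogate for a low surrogate taken on its own, code that wants one category per code point rather than one per Char has to skip the second half of each pair. The following is a sketch of one way to do that, assuming the behavior described above; GetCategories and the CodePointCategories class are hypothetical names, not framework members.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CodePointCategories
{
    // Hypothetical helper: returns one category per code point in s.
    public static List<UnicodeCategory> GetCategories(string s)
    {
        List<UnicodeCategory> result = new List<UnicodeCategory>();
        for (int i = 0; i < s.Length; i++)
        {
            // At a high surrogate of a valid pair, this returns the
            // category of the whole pair.
            result.Add(CharUnicodeInfo.GetUnicodeCategory(s, i));

            // Step over the low surrogate so it is not reported
            // separately as UnicodeCategory.Surrogate.
            if (Char.IsSurrogatePair(s, i))
                i++;
        }
        return result;
    }

    public static void Main()
    {
        // 'a', UGARITIC LETTER ALPA (a surrogate pair), and '9'.
        string s = "a" + Char.ConvertFromUtf32(0x10380) + "9";
        foreach (UnicodeCategory category in GetCategories(s))
            Console.WriteLine(category);
    }
}
```

The string in Main contains four Char values but three code points, so the helper returns three categories.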

Note that the CharUnicodeInfo.GetUnicodeCategory method does not always return the same UnicodeCategory value as the Char.GetUnicodeCategory method when passed the same character. The CharUnicodeInfo.GetUnicodeCategory method is designed to reflect the current version of the Unicode standard. In contrast, although the Char.GetUnicodeCategory method usually reflects the current version of the Unicode standard, it might return a character's category based on a previous version of the standard, or it might return a category that differs from the current standard to preserve backward compatibility.
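
To see whether the two methods disagree on a given runtime, the categories can be compared for every Char value. The following is an illustrative sketch only: CountDifferences and the CategoryComparison class are hypothetical names, and no particular count is implied, because the result depends on the framework version and the Unicode data it ships with.

```csharp
using System;
using System.Globalization;

public static class CategoryComparison
{
    // Hypothetical helper: counts the Char values for which
    // Char.GetUnicodeCategory and CharUnicodeInfo.GetUnicodeCategory differ.
    public static int CountDifferences()
    {
        int differences = 0;
        for (int codeUnit = 0; codeUnit <= Char.MaxValue; codeUnit++)
        {
            char ch = (char)codeUnit;
            if (Char.GetUnicodeCategory(ch) != CharUnicodeInfo.GetUnicodeCategory(ch))
                differences++;
        }
        return differences;
    }

    public static void Main()
    {
        Console.WriteLine("Char values with differing categories: {0}",
                          CountDifferences());
    }
}
```

The loop covers only single Char values, so characters outside the Basic Multilingual Plane are not compared.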

The following code example shows the values returned by each method for different types of characters.

using System;
using System.Globalization;

public class SamplesCharUnicodeInfo  {

   public static void Main()  {

      // The String to get information for.
      String s = "a9\u0393\u00B2\u00BC\u0BEF\u0BF0\u2788";
      Console.WriteLine( "String: {0}", s );

      // Print the values for each of the characters in the string.
      Console.WriteLine( "index c  Num   Dig   Dec   UnicodeCategory" );
      for ( int i = 0; i < s.Length; i++ )  {
         Console.Write( "{0,-5} {1,-3}", i, s[i] );
         Console.Write( " {0,-5}", CharUnicodeInfo.GetNumericValue( s, i ) );
         Console.Write( " {0,-5}", CharUnicodeInfo.GetDigitValue( s, i ) );
         Console.Write( " {0,-5}", CharUnicodeInfo.GetDecimalDigitValue( s, i ) );
         Console.WriteLine( "{0}", CharUnicodeInfo.GetUnicodeCategory( s, i ) );
      }

   }

}


/*
This code produces the following output.  Some characters might not display at the console.

String: a9\u0393\u00B2\u00BC\u0BEF\u0BF0\u2788
index c  Num   Dig   Dec   UnicodeCategory
0     a   -1    -1    -1   LowercaseLetter
1     9   9     9     9    DecimalDigitNumber
2     \u0393   -1    -1    -1   UppercaseLetter
3     \u00B2   2     2     2    OtherNumber
4     \u00BC   0.25  -1    -1   OtherNumber
5     \u0BEF   9     9     9    DecimalDigitNumber
6     \u0BF0   10    -1    -1   OtherNumber
7     \u2788   9     9     -1   OtherNumber

*/

Windows 7, Windows Vista, Windows XP SP2, Windows XP Media Center Edition, Windows XP Professional x64 Edition, Windows XP Starter Edition, Windows Server 2008 R2, Windows Server 2008, Windows Server 2003, Windows Server 2000 SP4, Windows Millennium Edition, Windows 98, Windows CE, Windows Mobile for Smartphone, Windows Mobile for Pocket PC, Xbox 360, Zune

The .NET Framework and .NET Compact Framework do not support all versions of every platform. For a list of the supported versions, see .NET Framework System Requirements.

.NET Framework

Supported in: 3.5, 3.0, 2.0

.NET Compact Framework

Supported in: 3.5, 2.0

XNA Framework

Supported in: 3.0, 2.0, 1.0

Date: December 2010

History: Added information about how the method handles surrogate pairs.

Reason: Information enhancement.

© 2014 Microsoft