Export (0) Print
Expand All

CharUnicodeInfo.GetUnicodeCategory Method (Char)

Gets the Unicode category of the specified character.

Namespace:  System.Globalization
Assemblies:   System.Globalization (in System.Globalization.dll)
  mscorlib (in mscorlib.dll)

public static UnicodeCategory GetUnicodeCategory(
	char ch
)

Parameters

ch
Type: System.Char

The Unicode character for which to get the Unicode category.

Return Value

Type: System.Globalization.UnicodeCategory
A UnicodeCategory value indicating the category of the specified character.

The Unicode characters are divided into categories. A character's category is one of its properties. For example, a character might be an uppercase letter, a lowercase letter, a decimal digit number, a letter number, a connector punctuation, a math symbol, or a currency symbol. The UnicodeCategory class returns the category of a Unicode character. For more information on Unicode characters, see the Unicode Standard.

The GetUnicodeCategory method assumes that ch corresponds to a single linguistic character and returns its category. This means that, for surrogate pairs, it returns UnicodeCategory.Surrogate instead of the category to which the surrogate belongs. For example, the Ugaritic alphabet occupies code points U+10380 to U+1039F. The following example uses the ConvertFromUtf32 method to instantiate a string that represents UGARITIC LETTER ALPA (U+10380), which is the first letter of the Ugaritic alphabet. As the output from the example shows, the IsNumber(Char) method returns false if it is passed either the high surrogate or the low surrogate of this character.

int utf32 = 0x10380;       // UGARITIC LETTER ALPA 
string surrogate = Char.ConvertFromUtf32(utf32);
foreach (var ch in surrogate)
   Console.WriteLine("U+{0:X4}: {1:G}", 
                     Convert.ToUInt16(ch), 
                     System.Globalization.CharUnicodeInfo.GetUnicodeCategory(ch));
// The example displays the following output: 
//       U+D800: Surrogate 
//       U+DF80: Surrogate

Note that CharUnicodeInfo.GetUnicodeCategory does not always return the same UnicodeCategory value as the Char.GetUnicodeCategory method when passed a particular character as a parameter. The CharUnicodeInfo.GetUnicodeCategory method is designed to reflect the current version of the Unicode standard. In contrast, although the Char.GetUnicodeCategory method usually reflects the current version of the Unicode standard, it might return a character's category based on a previous version of the standard, or it might return a category that differs from the current standard to preserve backward compatibility.

The following code example shows the values returned by each method for different types of characters.

using System;
using System.Globalization;

public class SamplesCharUnicodeInfo  {

   public static void Main()  {

      Console.WriteLine( "                                        c  Num   Dig   Dec   UnicodeCategory" );

      Console.Write( "U+0061 LATIN SMALL LETTER A            " );
      PrintProperties( 'a' );

      Console.Write( "U+0393 GREEK CAPITAL LETTER GAMMA      " );
      PrintProperties( '\u0393' );

      Console.Write( "U+0039 DIGIT NINE                      " );
      PrintProperties( '9' );

      Console.Write( "U+00B2 SUPERSCRIPT TWO                 " );
      PrintProperties( '\u00B2' );

      Console.Write( "U+00BC VULGAR FRACTION ONE QUARTER     " );
      PrintProperties( '\u00BC' );

      Console.Write( "U+0BEF TAMIL DIGIT NINE                " );
      PrintProperties( '\u0BEF' );

      Console.Write( "U+0BF0 TAMIL NUMBER TEN                " );
      PrintProperties( '\u0BF0' );

      Console.Write( "U+0F33 TIBETAN DIGIT HALF ZERO         " );
      PrintProperties( '\u0F33' );

      Console.Write( "U+2788 CIRCLED SANS-SERIF DIGIT NINE   " );
      PrintProperties( '\u2788' );

   }

   public static void PrintProperties( char c )  {
      Console.Write( " {0,-3}", c );
      Console.Write( " {0,-5}", CharUnicodeInfo.GetNumericValue( c ) );
      Console.Write( " {0,-5}", CharUnicodeInfo.GetDigitValue( c ) );
      Console.Write( " {0,-5}", CharUnicodeInfo.GetDecimalDigitValue( c ) );
      Console.WriteLine( "{0}", CharUnicodeInfo.GetUnicodeCategory( c ) );
   }

}


/*
This code produces the following output.  Some characters might not display at the console.

                                        c  Num   Dig   Dec   UnicodeCategory
U+0061 LATIN SMALL LETTER A             a   -1    -1    -1   LowercaseLetter
U+0393 GREEK CAPITAL LETTER GAMMA       \u0393   -1    -1    -1   UppercaseLetter
U+0039 DIGIT NINE                       9   9     9     9    DecimalDigitNumber
U+00B2 SUPERSCRIPT TWO                  \u00B2   2     2     2    OtherNumber
U+00BC VULGAR FRACTION ONE QUARTER      \u00BC   0.25  -1    -1   OtherNumber
U+0BEF TAMIL DIGIT NINE                 \u0BEF   9     9     9    DecimalDigitNumber
U+0BF0 TAMIL NUMBER TEN                 \u0BF0   10    -1    -1   OtherNumber
U+0F33 TIBETAN DIGIT HALF ZERO          \u0F33   -0.5  -1    -1   OtherNumber
U+2788 CIRCLED SANS-SERIF DIGIT NINE    \u2788   9     9     -1   OtherNumber

*/

.NET Framework

Supported in: 4.6, 4.5, 4, 3.5, 3.0, 2.0

.NET Framework Client Profile

Supported in: 4, 3.5 SP1

Portable Class Library

Supported in: Portable Class Library

.NET for Windows Store apps

Supported in: Windows 8

Supported in: Windows Phone 8.1

Supported in: Windows Phone Silverlight 8.1

Supported in: Windows Phone Silverlight 8

Windows Phone 8.1, Windows Phone 8, Windows 8.1, Windows Server 2012 R2, Windows 8, Windows Server 2012, Windows 7, Windows Vista SP2, Windows Server 2008 (Server Core Role not supported), Windows Server 2008 R2 (Server Core Role supported with SP1 or later; Itanium not supported)

The .NET Framework does not support all versions of every platform. For a list of the supported versions, see .NET Framework System Requirements.

Show:
© 2014 Microsoft