CharUnicodeInfo.GetUnicodeCategory Method (Char)

Updated: December 2010

Gets the Unicode category of the specified character.

Namespace:  System.Globalization
Assembly:  mscorlib (in mscorlib.dll)

Syntax

'Declaration
Public Shared Function GetUnicodeCategory ( _
    ch As Char _
) As UnicodeCategory

public static UnicodeCategory GetUnicodeCategory(
    char ch
)

Parameters

  • ch
    Type: System.Char
    The Unicode character for which to get the Unicode category.

Return Value

Type: System.Globalization.UnicodeCategory
A UnicodeCategory value indicating the category of the specified character.

Remarks

Unicode characters are divided into categories. For example, a character might be an uppercase letter, a lowercase letter, a decimal digit number, a letter number, a connector punctuation, a math symbol, or a currency symbol. The CharUnicodeInfo class returns the category of a Unicode character as a member of the UnicodeCategory enumeration. For more information on Unicode characters, see the Unicode Standard.

The GetUnicodeCategory method assumes that ch corresponds to a single linguistic character and returns its category. This means that, when it is passed either half of a surrogate pair, it returns UnicodeCategory.Surrogate instead of the category of the character that the pair represents. For example, the Ugaritic alphabet occupies code points U+10380 to U+1039F. UGARITIC LETTER ALPA (U+10380), the first letter of the Ugaritic alphabet, is encoded in UTF-16 as the surrogate pair U+D800 U+DF80. If the method is passed either the high surrogate or the low surrogate of this character, it returns UnicodeCategory.Surrogate rather than UnicodeCategory.OtherLetter.
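
The surrogate behavior described above can be sketched as follows. This is a minimal illustration, not part of the original sample set. It is written as a console program for brevity (an assumption; in a Silverlight application the text would be appended to a TextBlock, as in the Examples section), and the SurrogateCategoryDemo class and its CategoryOfCodeUnit helper are hypothetical names introduced here. The last call uses the GetUnicodeCategory(String, Int32) overload, which receives the whole string and so is expected to classify the pair as a unit; its result depends on the Unicode data that ships with the runtime, so no output is shown for it.

```csharp
using System;
using System.Globalization;

public static class SurrogateCategoryDemo
{
   // Hypothetical helper: returns the category reported for the single
   // UTF-16 code unit at the given index.
   public static UnicodeCategory CategoryOfCodeUnit(string s, int index)
   {
      return CharUnicodeInfo.GetUnicodeCategory(s[index]);
   }

   public static void Main()
   {
      // UGARITIC LETTER ALPA (U+10380) written as its UTF-16 surrogate pair.
      string alpa = "\uD800\uDF80";

      // Each half is classified on its own, so both calls report Surrogate.
      Console.WriteLine("High surrogate U+{0:X4}: {1}",
                        (int)alpa[0], CategoryOfCodeUnit(alpa, 0));
      Console.WriteLine("Low surrogate  U+{0:X4}: {1}",
                        (int)alpa[1], CategoryOfCodeUnit(alpa, 1));

      // The String overload sees both code units (assumption: the runtime's
      // Unicode data covers the Ugaritic block).
      Console.WriteLine("Pair at index 0:       {0}",
                        CharUnicodeInfo.GetUnicodeCategory(alpa, 0));
   }
}
```

When a string may contain supplementary characters, checking Char.IsSurrogate before calling the Char overload is one way to decide whether the String overload is the better fit.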

Note that CharUnicodeInfo.GetUnicodeCategory(Char) does not always return the same UnicodeCategory value as the Char.GetUnicodeCategory method when passed a particular character as a parameter. The CharUnicodeInfo.GetUnicodeCategory method is designed to reflect the current version of the Unicode standard. In contrast, although the Char.GetUnicodeCategory method usually reflects the current version of the Unicode standard, it might return a character's category based on a previous version of the standard, or it might return a category that differs from the current standard to preserve backward compatibility.
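
To see whether the two methods disagree on a particular runtime, every UTF-16 code unit can be passed to both and the mismatches listed. The following is a rough sketch under the same console-program assumption; the CategoryComparison class and its FindDifferences method are hypothetical names. Which characters appear, if any, depends on the runtime version, so no output is shown.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CategoryComparison
{
   // Collects every UTF-16 code unit for which Char.GetUnicodeCategory and
   // CharUnicodeInfo.GetUnicodeCategory return different values.
   public static List<char> FindDifferences()
   {
      var differences = new List<char>();
      for (int codeUnit = Char.MinValue; codeUnit <= Char.MaxValue; codeUnit++)
      {
         char ch = (char)codeUnit;
         if (Char.GetUnicodeCategory(ch) != CharUnicodeInfo.GetUnicodeCategory(ch))
            differences.Add(ch);
      }
      return differences;
   }

   public static void Main()
   {
      // Print one line per mismatch: code point, then each method's answer.
      foreach (char ch in FindDifferences())
      {
         Console.WriteLine("U+{0:X4}  Char: {1,-22} CharUnicodeInfo: {2}",
                           (int)ch,
                           Char.GetUnicodeCategory(ch),
                           CharUnicodeInfo.GetUnicodeCategory(ch));
      }
   }
}
```

An empty result means only that the two methods agree for every code unit on the runtime that executed the loop; it says nothing about other versions.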

Examples

The following code example shows the values returned by the GetNumericValue and GetUnicodeCategory methods for different types of characters.

Imports System.Globalization

Public Class Example

   Public Shared Sub Demo(ByVal outputBlock As System.Windows.Controls.TextBlock)

      outputBlock.Text &= "                                        c  Num   UnicodeCategory" & vbCrLf

      outputBlock.Text &= "U+0061 LATIN SMALL LETTER A            "
      PrintProperties(outputBlock, "a"c)

      outputBlock.Text &= "U+0393 GREEK CAPITAL LETTER GAMMA      "
      PrintProperties(outputBlock, ChrW(&H393))

      outputBlock.Text &= "U+0039 DIGIT NINE                      "
      PrintProperties(outputBlock, "9"c)

      outputBlock.Text &= "U+00B2 SUPERSCRIPT TWO                 "
      PrintProperties(outputBlock, ChrW(&HB2))

      outputBlock.Text &= "U+00BC VULGAR FRACTION ONE QUARTER     "
      PrintProperties(outputBlock, ChrW(&HBC))

      outputBlock.Text &= "U+0BEF TAMIL DIGIT NINE                "
      PrintProperties(outputBlock, ChrW(&HBEF))

      outputBlock.Text &= "U+0BF0 TAMIL NUMBER TEN                "
      PrintProperties(outputBlock, ChrW(&HBF0))

      outputBlock.Text &= "U+0F33 TIBETAN DIGIT HALF ZERO         "
      PrintProperties(outputBlock, ChrW(&HF33))

      outputBlock.Text &= "U+2788 CIRCLED SANS-SERIF DIGIT NINE   "
      PrintProperties(outputBlock, ChrW(&H2788))

   End Sub

   Public Shared Sub PrintProperties(ByVal outputBlock As System.Windows.Controls.TextBlock, ByVal c As Char)
      outputBlock.Text += String.Format(" {0,-3}", c)
      outputBlock.Text += String.Format(" {0,-5}", CharUnicodeInfo.GetNumericValue(c))
      outputBlock.Text += String.Format("{0}", CharUnicodeInfo.GetUnicodeCategory(c)) & vbCrLf
   End Sub
End Class 
' This example produces the following output.
'       U+0061 LATIN SMALL LETTER A             a   -1   LowercaseLetter
'       U+0393 GREEK CAPITAL LETTER GAMMA       G   -1   UppercaseLetter
'       U+0039 DIGIT NINE                       9   9    DecimalDigitNumber
'       U+00B2 SUPERSCRIPT TWO                  ²   2    OtherNumber
'       U+00BC VULGAR FRACTION ONE QUARTER      ¼   0.25 OtherNumber
'       U+0BEF TAMIL DIGIT NINE                 ?  9    DecimalDigitNumber
'       U+0BF0 TAMIL NUMBER TEN                 ?   10   OtherNumber
'       U+0F33 TIBETAN DIGIT HALF ZERO          ? -0.5   OtherNumber
'       U+2788 CIRCLED SANS-SERIF DIGIT NINE    ?    9   OtherNumber

using System;
using System.Globalization;

public class Example
{
   public static void Demo(System.Windows.Controls.TextBlock outputBlock)
   {
      outputBlock.Text += "                                        c  Num   UnicodeCategory" + "\n";

      outputBlock.Text += "U+0061 LATIN SMALL LETTER A            ";
      PrintProperties(outputBlock, 'a');

      outputBlock.Text += "U+0393 GREEK CAPITAL LETTER GAMMA      ";
      PrintProperties(outputBlock, '\u0393');

      outputBlock.Text += "U+0039 DIGIT NINE                      ";
      PrintProperties(outputBlock, '9');

      outputBlock.Text += "U+00B2 SUPERSCRIPT TWO                 ";
      PrintProperties(outputBlock, '\u00B2');

      outputBlock.Text += "U+00BC VULGAR FRACTION ONE QUARTER     ";
      PrintProperties(outputBlock, '\u00BC');

      outputBlock.Text += "U+0BEF TAMIL DIGIT NINE                ";
      PrintProperties(outputBlock, '\u0BEF');

      outputBlock.Text += "U+0BF0 TAMIL NUMBER TEN                ";
      PrintProperties(outputBlock, '\u0BF0');

      outputBlock.Text += "U+0F33 TIBETAN DIGIT HALF ZERO         ";
      PrintProperties(outputBlock, '\u0F33');

      outputBlock.Text += "U+2788 CIRCLED SANS-SERIF DIGIT NINE   ";
      PrintProperties(outputBlock, '\u2788');
   }

   public static void PrintProperties(System.Windows.Controls.TextBlock outputBlock, char c)
   {
      outputBlock.Text += String.Format(" {0,-3}", c);
      outputBlock.Text += String.Format(" {0,-5}", CharUnicodeInfo.GetNumericValue(c));
      outputBlock.Text += String.Format("{0}", CharUnicodeInfo.GetUnicodeCategory(c)) + "\n";
   }
}
/*
This example produces the following output. 
   U+0061 LATIN SMALL LETTER A             a   -1   LowercaseLetter
   U+0393 GREEK CAPITAL LETTER GAMMA       G   -1   UppercaseLetter
   U+0039 DIGIT NINE                       9   9    DecimalDigitNumber
   U+00B2 SUPERSCRIPT TWO                  ²   2    OtherNumber
   U+00BC VULGAR FRACTION ONE QUARTER      ¼   0.25 OtherNumber
   U+0BEF TAMIL DIGIT NINE                 ?  9    DecimalDigitNumber
   U+0BF0 TAMIL NUMBER TEN                 ?   10   OtherNumber
   U+0F33 TIBETAN DIGIT HALF ZERO          ? -0.5   OtherNumber
   U+2788 CIRCLED SANS-SERIF DIGIT NINE    ?    9   OtherNumber
*/

Version Information

Silverlight

Supported in: 5, 4, 3

Silverlight for Windows Phone

Supported in: Windows Phone OS 7.1, Windows Phone OS 7.0

XNA Framework

Supported in: Xbox 360, Windows Phone OS 7.0

Platforms

For a list of the operating systems and browsers that are supported by Silverlight, see Supported Operating Systems and Browsers.

Change History

Date            History                                                            Reason
December 2010   Added information about how the method handles surrogate pairs.   Information enhancement.