
CharUnicodeInfo Class

Updated: August 2010

Retrieves information about a Unicode character. This class cannot be inherited.

System.Object
  System.Globalization.CharUnicodeInfo

Namespace: System.Globalization
Assembly: mscorlib (in mscorlib.dll)

public static class CharUnicodeInfo

The CharUnicodeInfo type exposes the following members.

  Name                                 Description
  GetDecimalDigitValue(Char)           Gets the decimal digit value of the specified numeric character.
  GetDecimalDigitValue(String, Int32)  Gets the decimal digit value of the numeric character at the specified index of the specified string.
  GetDigitValue(Char)                  Gets the digit value of the specified numeric character.
  GetDigitValue(String, Int32)         Gets the digit value of the numeric character at the specified index of the specified string.
  GetNumericValue(Char) *              Gets the numeric value associated with the specified character.
  GetNumericValue(String, Int32) *     Gets the numeric value associated with the character at the specified index of the specified string.
  GetUnicodeCategory(Char) *           Gets the Unicode category of the specified character.
  GetUnicodeCategory(String, Int32) *  Gets the Unicode category of the character at the specified index of the specified string.

All members are public static methods. * Supported by the XNA Framework and by the Portable Class Library.
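As a sketch of how the (String, Int32) overloads listed above are called, the following C# program queries every index of a short string instead of extracting each Char first. It assumes C# 9 or later because it uses top-level statements, and the helper NumericValues is a hypothetical name introduced for illustration, not a member of CharUnicodeInfo.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of CharUnicodeInfo): returns
// GetNumericValue(s, i) for every index i of s.
double[] NumericValues(string s)
{
    var values = new double[s.Length];
    for (int i = 0; i < s.Length; i++)
    {
        values[i] = CharUnicodeInfo.GetNumericValue(s, i);
    }
    return values;
}

string text = "3\u00BD x\u00B2";   // "3½ x²"
double[] numbers = NumericValues(text);

for (int i = 0; i < text.Length; i++)
{
    // The string overloads take the string and an index directly.
    UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(text, i);
    Console.WriteLine($"index {i}: U+{(int)text[i]:X4}  numeric = {numbers[i]}  category = {category}");
}
```

Characters without a numeric value, such as the space and the letter x, report -1.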

The Unicode Standard defines a number of Unicode character categories. For example, a character might be categorized as an uppercase letter, a lowercase letter, a decimal digit number, a letter number, a paragraph separator, a math symbol, or a currency symbol. Your application can use the character category to govern string-based operations, such as parsing. The UnicodeCategory enumeration defines the possible character categories.
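A minimal sketch of category-driven parsing follows. It assumes that a "word" is any run of characters in the five letter categories or in DecimalDigitNumber; that rule, the helper names IsWordCategory and Tokenize, and the sample string are illustrative assumptions, not part of the class. It uses top-level statements and a switch expression with `or` patterns, so it needs C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical rule: letters of any kind and decimal digits form words;
// every other category (spaces, punctuation, symbols) separates them.
bool IsWordCategory(UnicodeCategory category) => category switch
{
    UnicodeCategory.UppercaseLetter or UnicodeCategory.LowercaseLetter or
    UnicodeCategory.TitlecaseLetter or UnicodeCategory.ModifierLetter or
    UnicodeCategory.OtherLetter or UnicodeCategory.DecimalDigitNumber => true,
    _ => false,
};

// Hypothetical helper: splits text into words by using the rule above.
List<string> Tokenize(string text)
{
    var tokens = new List<string>();
    var current = new StringBuilder();
    foreach (char c in text)
    {
        if (IsWordCategory(CharUnicodeInfo.GetUnicodeCategory(c)))
        {
            current.Append(c);                  // still inside a word
        }
        else if (current.Length > 0)
        {
            tokens.Add(current.ToString());     // a separator ends the current word
            current.Clear();
        }
    }
    if (current.Length > 0)
        tokens.Add(current.ToString());         // flush the last word
    return tokens;
}

string sample = "\u0393\u03B5\u03B9\u03B1 \u03C3\u03BF\u03C5, world 42\u20AC";   // "Γεια σου, world 42€"
Console.WriteLine(string.Join(" | ", Tokenize(sample)));
```

Because this sketch looks at one Char at a time, each half of a surrogate pair is classified as Surrogate and treated as a separator; handling such characters would require the (String, Int32) overload.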

Your application can use the CharUnicodeInfo class to obtain the UnicodeCategory value for a specific character. The class also defines methods that return the following numeric values of a Unicode character:

  • Numeric value. Applies only to numeric characters, including fractions, subscripts, superscripts, Roman numerals, currency numerators, encircled numbers, and script-specific digits.

  • Digit value. Applies to numeric characters that can be combined with other numeric characters to represent a whole number in a numbering system.

  • Decimal digit value. Applies only to decimal digits in the decimal (base-10) system. A decimal digit can be one of ten digits, from 0 through 9.
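Because GetDecimalDigitValue reports a value only for decimal digits, it can be used to read digit strings written in scripts other than Latin. The sketch below shows one way to do this, assuming the input contains nothing but decimal digits; the helper TryParseDecimalDigits is hypothetical, it does not check that all digits come from the same script, and it does not handle signs or group separators. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Globalization;

// Hypothetical helper: reads a string made only of decimal digits, in any
// script, by asking GetDecimalDigitValue for the value of each character.
bool TryParseDecimalDigits(string s, out long value)
{
    value = 0;
    if (string.IsNullOrEmpty(s))
        return false;

    foreach (char c in s)
    {
        int digit = CharUnicodeInfo.GetDecimalDigitValue(c);
        if (digit < 0)
        {
            value = 0;          // not a decimal digit, for example U+00BC or 'a'
            return false;
        }
        value = checked(value * 10 + digit);   // OverflowException if the number is too large
    }
    return true;
}

// ASCII "2010", Arabic-Indic one and zero, Tamil one and nine, and "1" followed by U+00BC
foreach (string input in new[] { "2010", "\u0661\u0660", "\u0BE7\u0BEF", "1\u00BC" })
{
    bool ok = TryParseDecimalDigits(input, out long n);
    Console.WriteLine(ok ? $"parsed {n}" : "rejected: not all characters are decimal digits");
}
```

The last input is rejected because U+00BC VULGAR FRACTION ONE QUARTER has a numeric value (0.25) but no decimal digit value.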

When using this class in your applications, keep in mind the following programming considerations for using the Char type. The type can be difficult to use, and strings are generally preferable for representing linguistic content.

  • A Char object does not always correspond to a single character. Although the Char type represents a single 16-bit value, Unicode characters outside the Basic Multilingual Plane are encoded as two UTF-16 code units, called a surrogate pair. For more information, see "Char Objects and Unicode Characters" in the String class topic.

  • The notion of a "character" is also flexible. A character is often thought of as a glyph, but many glyphs require multiple code points. For example, ä can be represented either by two code points ("a" plus U+0308, the combining diaeresis) or by a single code point ("ä", U+00E4). Some languages have many letters, characters, and glyphs that require multiple code points, which can cause confusion when representing linguistic content. Casing has similar gaps: there is a ΰ (U+03B0, Greek small letter upsilon with dialytika and tonos), but there is no equivalent single-character capital letter, so uppercasing it simply returns the original value.
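The following sketch illustrates both considerations with a character outside the Basic Multilingual Plane (U+1D11E MUSICAL SYMBOL G CLEF), the two representations of ä, and U+03B0. The sample characters are chosen for illustration, and the program uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Globalization;

// A character outside the Basic Multilingual Plane occupies two Char values.
string clef = char.ConvertFromUtf32(0x1D11E);                         // U+1D11E MUSICAL SYMBOL G CLEF
Console.WriteLine(clef.Length);                                       // 2
Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(clef[0]));       // Surrogate
// The (String, Int32) overload reads the whole surrogate pair at index 0.
Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(clef, 0));       // OtherSymbol

// "ä" as one precomposed code point versus "a" plus a combining diaeresis.
string precomposed = "\u00E4";
string decomposed = "a\u0308";
Console.WriteLine(precomposed.Length + " vs " + decomposed.Length);  // 1 vs 2
Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(decomposed, 1)); // NonSpacingMark
Console.WriteLine(string.Equals(precomposed, decomposed, StringComparison.Ordinal));   // False
// StringInfo groups the base letter and its combining mark into one text element.
Console.WriteLine(new StringInfo(decomposed).LengthInTextElements);   // 1

// U+03B0 has no single-character uppercase form, so it is returned unchanged.
Console.WriteLine(char.ToUpperInvariant('\u03B0') == '\u03B0');       // True
```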

The following code example shows the values returned by each method for different types of characters.


using System;
using System.Globalization;

public class SamplesCharUnicodeInfo  {

   public static void Main()  {

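      // Column headings: the character, then the values returned by GetNumericValue,
      // GetDigitValue, GetDecimalDigitValue, and GetUnicodeCategory.
      Console.WriteLine( "                                        c  Num   Dig   Dec   UnicodeCategory" );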

      Console.Write( "U+0061 LATIN SMALL LETTER A            " );
      PrintProperties( 'a' );

      Console.Write( "U+0393 GREEK CAPITAL LETTER GAMMA      " );
      PrintProperties( '\u0393' );

      Console.Write( "U+0039 DIGIT NINE                      " );
      PrintProperties( '9' );

      Console.Write( "U+00B2 SUPERSCRIPT TWO                 " );
      PrintProperties( '\u00B2' );

      Console.Write( "U+00BC VULGAR FRACTION ONE QUARTER     " );
      PrintProperties( '\u00BC' );

      Console.Write( "U+0BEF TAMIL DIGIT NINE                " );
      PrintProperties( '\u0BEF' );

      Console.Write( "U+0BF0 TAMIL NUMBER TEN                " );
      PrintProperties( '\u0BF0' );

      Console.Write( "U+0F33 TIBETAN DIGIT HALF ZERO         " );
      PrintProperties( '\u0F33' );

      Console.Write( "U+2788 CIRCLED SANS-SERIF DIGIT NINE   " );
      PrintProperties( '\u2788' );

   }

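   // Writes the character and the value returned by each CharUnicodeInfo method.
   // The three numeric methods return -1 when the character has no value of that kind.
   public static void PrintProperties( char c )  {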
      Console.Write( " {0,-3}", c );
      Console.Write( " {0,-5}", CharUnicodeInfo.GetNumericValue( c ) );
      Console.Write( " {0,-5}", CharUnicodeInfo.GetDigitValue( c ) );
      Console.Write( " {0,-5}", CharUnicodeInfo.GetDecimalDigitValue( c ) );
      Console.WriteLine( "{0}", CharUnicodeInfo.GetUnicodeCategory( c ) );
   }

}


/*
This code produces the following output.  Some characters might not display at the console.

                                        c  Num   Dig   Dec   UnicodeCategory
U+0061 LATIN SMALL LETTER A             a   -1    -1    -1   LowercaseLetter
U+0393 GREEK CAPITAL LETTER GAMMA       Γ   -1    -1    -1   UppercaseLetter
U+0039 DIGIT NINE                       9   9     9     9    DecimalDigitNumber
U+00B2 SUPERSCRIPT TWO                  ²   2     2     2    OtherNumber
U+00BC VULGAR FRACTION ONE QUARTER      ¼   0.25  -1    -1   OtherNumber
U+0BEF TAMIL DIGIT NINE                 ௯   9     9     9    DecimalDigitNumber
U+0BF0 TAMIL NUMBER TEN                 ௰   10    -1    -1   OtherNumber
U+0F33 TIBETAN DIGIT HALF ZERO          ༳   -0.5  -1    -1   OtherNumber
U+2788 CIRCLED SANS-SERIF DIGIT NINE    ➈   9     9     -1   OtherNumber

*/



.NET Framework

Supported in: 4, 3.5, 3.0, 2.0

.NET Framework Client Profile

Supported in: 4, 3.5 SP1

Portable Class Library

Supported in: Portable Class Library

Windows 7, Windows Vista SP1 or later, Windows XP SP3, Windows XP SP2 x64 Edition, Windows Server 2008 (Server Core not supported), Windows Server 2008 R2 (Server Core supported with SP1 or later), Windows Server 2003 SP2

The .NET Framework does not support all versions of every platform. For a list of the supported versions, see .NET Framework System Requirements.

Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.

Date: August 2010
History: Revised information about the relationship of Char objects and Unicode characters.
Reason: Customer feedback.

Community Content
CharUnicodeInfo Example - recoded into PowerShell
<#
.SYNOPSIS
This script, a re-implementation of an MSDN sample, shows the
Unicode details of a Unicode character, using PowerShell.
.DESCRIPTION
This script re-implements a simple MSDN C# sample that takes a Unicode character
and uses the CharUnicodeInfo class to get details of that character, which are then
displayed on the console.
.NOTES
File Name : Show-UnicodeCharacters.ps1
Author : Thomas Lee - tfl@psp.co.uk
Requires : PowerShell Version 2.0
.LINK
This script posted to:
http://www.pshscripts.blogspot.com
MSDN sample posted to:
http://msdn.microsoft.com/en-us/library/system.globalization.charunicodeinfo.aspx
.EXAMPLE
Psh > .\show-unicodecharacters.ps1
                                        c  Num   Dig   Dec   UnicodeCategory
U+0061 LATIN SMALL LETTER A             a   -1    -1    -1   LowercaseLetter
U+0393 GREEK CAPITAL LETTER GAMMA       Γ   -1    -1    -1   UppercaseLetter
U+0039 DIGIT NINE                       9   9     9     9    DecimalDigitNumber
U+00B2 SUPERSCRIPT TWO                  ²   2     2     -1   OtherNumber
U+00BC VULGAR FRACTION ONE QUARTER      ¼   0.25  -1    -1   OtherNumber
U+0BEF TAMIL DIGIT NINE                 ௯   9     9     9    DecimalDigitNumber
U+0BF0 TAMIL NUMBER TEN                 ௰   10    -1    -1   OtherNumber
U+0F33 TIBETAN DIGIT HALF ZERO          ༳   -0.5  -1    -1   OtherNumber
U+2788 CIRCLED SANS-SERIF DIGIT NINE    ➈   9     9     -1   OtherNumber

#>

# Helper function: returns one formatted row of Unicode properties for a character.
# The column widths mirror the C# sample.
Function PrintProperties {
    param ([char] $char)
    $fmtstring = " {0,-3} {1,-5} {2,-5} {3,-5}{4}"
    $num = [System.Globalization.CharUnicodeInfo]::GetNumericValue( $char )
    $dig = [System.Globalization.CharUnicodeInfo]::GetDigitValue( $char )
    $dec = [System.Globalization.CharUnicodeInfo]::GetDecimalDigitValue( $char )
    $cat = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory( $char )
    $fmtstring -f $char, $num, $dig, $dec, $cat
}

"                                        c  Num   Dig   Dec   UnicodeCategory"
"U+0061 LATIN SMALL LETTER A            " + (PrintProperties 'a')
# Use a hexadecimal literal: [char] 0393 is decimal 393, which is U+0189 (Ɖ), not U+0393 (Γ).
"U+0393 GREEK CAPITAL LETTER GAMMA      " + (PrintProperties ([char] 0x0393))
"U+0039 DIGIT NINE                      " + (PrintProperties '9')
"U+00B2 SUPERSCRIPT TWO                 " + (PrintProperties ([char] 0x00B2))
"U+00BC VULGAR FRACTION ONE QUARTER     " + (PrintProperties ([char] 0x00BC))
"U+0BEF TAMIL DIGIT NINE                " + (PrintProperties ([char] 0x0BEF))
"U+0BF0 TAMIL NUMBER TEN                " + (PrintProperties ([char] 0x0BF0))
"U+0F33 TIBETAN DIGIT HALF ZERO         " + (PrintProperties ([char] 0x0F33))
"U+2788 CIRCLED SANS-SERIF DIGIT NINE   " + (PrintProperties ([char] 0x2788))