Retrieves information about a Unicode character. This class cannot be inherited.
Namespace:
System.Globalization
Assembly:
mscorlib (in mscorlib.dll)
Visual Basic (Declaration)
Public NotInheritable Class CharUnicodeInfo
Dim instance As CharUnicodeInfo
public sealed class CharUnicodeInfo
public ref class CharUnicodeInfo sealed
public final class CharUnicodeInfo
The Unicode Standard defines a number of Unicode character categories. For example, a character might be categorized as an uppercase letter, a lowercase letter, a decimal digit number, a letter number, a connector punctuation, a math symbol, or a currency symbol. Your application can use the character category to govern string-based operations, such as parsing. The UnicodeCategory enumeration defines the possible character categories.
Your application uses the CharUnicodeInfo class to obtain the UnicodeCategory value for a specific character. The CharUnicodeInfo class defines methods that return the following Unicode character values:
Numeric value. Applies only to numeric characters, including fractions, subscripts, superscripts, Roman numerals, currency numerators, encircled numbers, and script-specific digits.
Digit value. Applies to numeric characters that can be combined with other numeric characters to represent a whole number in a numbering system.
Decimal digit value. Applies only to decimal digits in the decimal (base-10) system. A decimal digit can be one of ten digits, from 0 through 9.
When using this class in your applications, keep in mind the following programming considerations for using the "char" type. The type can be difficult to use and strings are generally preferable for representing linguistic content.
The "char" type represents a single 16-bit value, while some Unicode characters consist of two UTF-16 code points so that the "char" type cannot represent a complete character. This situation does not apply to English-language characters, but does apply to many Chinese and other characters that require two "char" types to represent the appropriate Unicode code point.
The notion of a "character" is also flexible. A character is often thought of as a "glyph," but many glyphs require multiple code points. For example, ä can be represented by "a" + U+0308 (combining diaresis) or "ä" (U+00A4). Some languages have many letters, characters, and glyphs that require multiple code points, which can cause confusion in linguistic content representation. For example, there is a ΰ (U+03B0 Greek small letter upsilon with dialytika and tonos), but there is no equivalent capital letter. Uppercasing such a value simply retrieves the original value.
The following code example shows the values returned by each method for different types of characters.
Imports System
Imports System.Globalization
Imports Microsoft.VisualBasic
Public Class SamplesCharUnicodeInfo
Public Shared Sub Main()
Console.WriteLine(" c Num Dig Dec UnicodeCategory")
Console.Write("U+0061 LATIN SMALL LETTER A ")
PrintProperties("a"c)
Console.Write("U+0393 GREEK CAPITAL LETTER GAMMA ")
PrintProperties(ChrW(&H0393))
Console.Write("U+0039 DIGIT NINE ")
PrintProperties("9"c)
Console.Write("U+00B2 SUPERSCRIPT TWO ")
PrintProperties(ChrW(&H00B2))
Console.Write("U+00BC VULGAR FRACTION ONE QUARTER ")
PrintProperties(ChrW(&H00BC))
Console.Write("U+0BEF TAMIL DIGIT NINE ")
PrintProperties(ChrW(&H0BEF))
Console.Write("U+0BF0 TAMIL NUMBER TEN ")
PrintProperties(ChrW(&H0BF0))
Console.Write("U+0F33 TIBETAN DIGIT HALF ZERO ")
PrintProperties(ChrW(&H0F33))
Console.Write("U+2788 CIRCLED SANS-SERIF DIGIT NINE ")
PrintProperties(ChrW(&H2788))
End Sub 'Main
Public Shared Sub PrintProperties(c As Char)
Console.Write(" {0,-3}", c)
Console.Write(" {0,-5}", CharUnicodeInfo.GetNumericValue(c))
Console.Write(" {0,-5}", CharUnicodeInfo.GetDigitValue(c))
Console.Write(" {0,-5}", CharUnicodeInfo.GetDecimalDigitValue(c))
Console.WriteLine("{0}", CharUnicodeInfo.GetUnicodeCategory(c))
End Sub 'PrintProperties
End Class 'SamplesCharUnicodeInfo
'This code produces the following output. Some characters might not display at the console.
'
' c Num Dig Dec UnicodeCategory
'U+0061 LATIN SMALL LETTER A a -1 -1 -1 LowercaseLetter
'U+0393 GREEK CAPITAL LETTER GAMMA \u0393 -1 -1 -1 UppercaseLetter
'U+0039 DIGIT NINE 9 9 9 9 DecimalDigitNumber
'U+00B2 SUPERSCRIPT TWO \u00B2 2 2 2 OtherNumber
'U+00BC VULGAR FRACTION ONE QUARTER \u00BC 0.25 -1 -1 OtherNumber
'U+0BEF TAMIL DIGIT NINE \u0BEF 9 9 9 DecimalDigitNumber
'U+0BF0 TAMIL NUMBER TEN \u0BF0 10 -1 -1 OtherNumber
'U+0F33 TIBETAN DIGIT HALF ZERO \u0F33 -0.5 -1 -1 OtherNumber
'U+2788 CIRCLED SANS-SERIF DIGIT NINE \u2788 9 9 -1 OtherNumber
using System;
using System.Globalization;
public class SamplesCharUnicodeInfo {
public static void Main() {
Console.WriteLine( " c Num Dig Dec UnicodeCategory" );
Console.Write( "U+0061 LATIN SMALL LETTER A " );
PrintProperties( 'a' );
Console.Write( "U+0393 GREEK CAPITAL LETTER GAMMA " );
PrintProperties( '\u0393' );
Console.Write( "U+0039 DIGIT NINE " );
PrintProperties( '9' );
Console.Write( "U+00B2 SUPERSCRIPT TWO " );
PrintProperties( '\u00B2' );
Console.Write( "U+00BC VULGAR FRACTION ONE QUARTER " );
PrintProperties( '\u00BC' );
Console.Write( "U+0BEF TAMIL DIGIT NINE " );
PrintProperties( '\u0BEF' );
Console.Write( "U+0BF0 TAMIL NUMBER TEN " );
PrintProperties( '\u0BF0' );
Console.Write( "U+0F33 TIBETAN DIGIT HALF ZERO " );
PrintProperties( '\u0F33' );
Console.Write( "U+2788 CIRCLED SANS-SERIF DIGIT NINE " );
PrintProperties( '\u2788' );
}
public static void PrintProperties( char c ) {
Console.Write( " {0,-3}", c );
Console.Write( " {0,-5}", CharUnicodeInfo.GetNumericValue( c ) );
Console.Write( " {0,-5}", CharUnicodeInfo.GetDigitValue( c ) );
Console.Write( " {0,-5}", CharUnicodeInfo.GetDecimalDigitValue( c ) );
Console.WriteLine( "{0}", CharUnicodeInfo.GetUnicodeCategory( c ) );
}
}
/*
This code produces the following output. Some characters might not display at the console.
c Num Dig Dec UnicodeCategory
U+0061 LATIN SMALL LETTER A a -1 -1 -1 LowercaseLetter
U+0393 GREEK CAPITAL LETTER GAMMA \u0393 -1 -1 -1 UppercaseLetter
U+0039 DIGIT NINE 9 9 9 9 DecimalDigitNumber
U+00B2 SUPERSCRIPT TWO \u00B2 2 2 2 OtherNumber
U+00BC VULGAR FRACTION ONE QUARTER \u00BC 0.25 -1 -1 OtherNumber
U+0BEF TAMIL DIGIT NINE \u0BEF 9 9 9 DecimalDigitNumber
U+0BF0 TAMIL NUMBER TEN \u0BF0 10 -1 -1 OtherNumber
U+0F33 TIBETAN DIGIT HALF ZERO \u0F33 -0.5 -1 -1 OtherNumber
U+2788 CIRCLED SANS-SERIF DIGIT NINE \u2788 9 9 -1 OtherNumber
*/
using namespace System;
using namespace System::Globalization;
void PrintProperties( Char c );
int main()
{
Console::WriteLine( " c Num Dig Dec UnicodeCategory" );
Console::Write( "U+0061 LATIN SMALL LETTER A " );
PrintProperties( L'a' );
Console::Write( "U+0393 GREEK CAPITAL LETTER GAMMA " );
PrintProperties( L'\u0393' );
Console::Write( "U+0039 DIGIT NINE " );
PrintProperties( L'9' );
Console::Write( "U+00B2 SUPERSCRIPT TWO " );
PrintProperties( L'\u00B2' );
Console::Write( "U+00BC VULGAR FRACTION ONE QUARTER " );
PrintProperties( L'\u00BC' );
Console::Write( "U+0BEF TAMIL DIGIT NINE " );
PrintProperties( L'\u0BEF' );
Console::Write( "U+0BF0 TAMIL NUMBER TEN " );
PrintProperties( L'\u0BF0' );
Console::Write( "U+0F33 TIBETAN DIGIT HALF ZERO " );
PrintProperties( L'\u0F33' );
Console::Write( "U+2788 CIRCLED SANS-SERIF DIGIT NINE " );
PrintProperties( L'\u2788' );
}
void PrintProperties( Char c )
{
Console::Write( " {0,-3}", c );
Console::Write( " {0,-5}", CharUnicodeInfo::GetNumericValue( c ) );
Console::Write( " {0,-5}", CharUnicodeInfo::GetDigitValue( c ) );
Console::Write( " {0,-5}", CharUnicodeInfo::GetDecimalDigitValue( c ) );
Console::WriteLine( "{0}", CharUnicodeInfo::GetUnicodeCategory( c ) );
}
/*
This code produces the following output. Some characters might not display at the console.
c Num Dig Dec UnicodeCategory
U+0061 LATIN SMALL LETTER A a -1 -1 -1 LowercaseLetter
U+0393 GREEK CAPITAL LETTER GAMMA \u0393 -1 -1 -1 UppercaseLetter
U+0039 DIGIT NINE 9 9 9 9 DecimalDigitNumber
U+00B2 SUPERSCRIPT TWO \u00B2 2 2 2 OtherNumber
U+00BC VULGAR FRACTION ONE QUARTER \u00BC 0.25 -1 -1 OtherNumber
U+0BEF TAMIL DIGIT NINE \u0BEF 9 9 9 DecimalDigitNumber
U+0BF0 TAMIL NUMBER TEN \u0BF0 10 -1 -1 OtherNumber
U+0F33 TIBETAN DIGIT HALF ZERO \u0F33 -0.5 -1 -1 OtherNumber
U+2788 CIRCLED SANS-SERIF DIGIT NINE \u2788 9 9 -1 OtherNumber
*/
System..::.Object
System.Globalization..::.CharUnicodeInfo
Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
Windows 7, Windows Vista, Windows XP SP2, Windows XP Media Center Edition, Windows XP Professional x64 Edition, Windows XP Starter Edition, Windows Server 2008 R2, Windows Server 2008, Windows Server 2003, Windows Server 2000 SP4, Windows Millennium Edition, Windows 98, Windows CE, Windows Mobile for Smartphone, Windows Mobile for Pocket PC, Xbox 360, Zune
The .NET Framework and .NET Compact Framework do not support all versions of every platform. For a list of the supported versions, see .NET Framework System Requirements.
.NET Framework
Supported in: 3.5, 3.0, 2.0
.NET Compact Framework
Supported in: 3.5, 2.0
XNA Framework
Supported in: 3.0, 2.0, 1.0
Reference