
CharUnicodeInfo Class

Published: July 2016

Retrieves information about a Unicode character. This class cannot be inherited.

Namespace:   System.Globalization
Assembly:  mscorlib (in mscorlib.dll)

System.Object
  System.Globalization.CharUnicodeInfo

public static class CharUnicodeInfo

Methods (all public and static)

Name                                Description
GetDecimalDigitValue(Char)          Gets the decimal digit value of the specified numeric character.
GetDecimalDigitValue(String, Int32) Gets the decimal digit value of the numeric character at the specified index of the specified string.
GetDigitValue(Char)                 Gets the digit value of the specified numeric character.
GetDigitValue(String, Int32)        Gets the digit value of the numeric character at the specified index of the specified string.
GetNumericValue(Char)               Gets the numeric value associated with the specified character.
GetNumericValue(String, Int32)      Gets the numeric value associated with the character at the specified index of the specified string.
GetUnicodeCategory(Char)            Gets the Unicode category of the specified character.
GetUnicodeCategory(String, Int32)   Gets the Unicode category of the character at the specified index of the specified string.
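The difference between the Char and the (String, Int32) overloads is easiest to see with a character outside the Basic Multilingual Plane, which a string stores as a surrogate pair. The following is a minimal sketch, not part of the original sample; the class name SurrogatePairSketch and its constants are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Globalization;

public static class SurrogatePairSketch
{
    // U+10400 DESERET CAPITAL LETTER LONG I, stored as two UTF-16 code units.
    public const string Deseret = "\U00010400";

    // U+10107 AEGEAN NUMBER ONE, also stored as a surrogate pair.
    public const string AegeanOne = "\U00010107";

    public static void Main()
    {
        // The Char overload sees only the first code unit of the pair.
        Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(Deseret[0]));

        // The (String, Int32) overload classifies the whole surrogate pair
        // that starts at the given index.
        Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(Deseret, 0));

        // The same applies to the numeric value of a supplementary character.
        Console.WriteLine(CharUnicodeInfo.GetNumericValue(AegeanOne, 0));
    }
}
```

When text may contain supplementary characters, passing the string and an index is therefore usually the safer choice than indexing into the string and passing a single Char.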

The Unicode Standard (http://go.microsoft.com/fwlink/?linkid=37123) defines a number of Unicode character categories. For example, a character might be categorized as an uppercase letter, a lowercase letter, a decimal digit number, a letter number, a paragraph separator, a math symbol, or a currency symbol. Your application can use the character category to govern string-based operations, such as parsing or extracting substrings with regular expressions. The UnicodeCategory enumeration defines the possible character categories.

You use the CharUnicodeInfo class to obtain the UnicodeCategory value for a specific character. The CharUnicodeInfo class defines methods that return the following Unicode character values:

  • The specific category to which a character or surrogate pair belongs. The value returned is a member of the UnicodeCategory enumeration.

  • Numeric value. Applies only to numeric characters, including fractions, subscripts, superscripts, Roman numerals, currency numerators, encircled numbers, and script-specific digits.

  • Digit value. Applies to numeric characters that can be combined with other numeric characters to represent a whole number in a numbering system.

  • Decimal digit value. Applies only to characters that represent decimal digits in the decimal (base 10) system. A decimal digit can be one of ten digits, from zero through nine. These characters are members of the UnicodeCategory.DecimalDigitNumber category.
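As an illustration of how the category and the decimal digit value can work together, the sketch below rewrites decimal digits from any script as ASCII digits. The helper name ToAsciiDigits is hypothetical, and the loop works on individual Char values, so it assumes the input contains no digits encoded as surrogate pairs.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DigitSketch
{
    // Hypothetical helper: replaces every character in the DecimalDigitNumber
    // category with the ASCII digit that has the same decimal digit value.
    // All other characters are copied unchanged.
    public static string ToAsciiDigits(string text)
    {
        var result = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.DecimalDigitNumber)
            {
                int digit = CharUnicodeInfo.GetDecimalDigitValue(c);
                result.Append((char)('0' + digit));
            }
            else
            {
                result.Append(c);
            }
        }
        return result.ToString();
    }

    public static void Main()
    {
        // U+0663, U+0664, U+0665 are ARABIC-INDIC DIGIT THREE, FOUR, and FIVE.
        Console.WriteLine(ToAsciiDigits("\u0663\u0664\u0665 items"));  // 345 items
    }
}
```

Checking the category first keeps characters such as U+00B2 SUPERSCRIPT TWO, which belongs to OtherNumber, out of the conversion.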

In addition, the CharUnicodeInfo class is used internally by a number of other .NET Framework types and methods that rely on character classification. These include:

  • The StringInfo class, which works with textual elements instead of single characters in a string.

  • The overloads of the Char.GetUnicodeCategory method, which determine the category to which a character or surrogate pair belongs.

  • The character classes recognized by Regex, the .NET Framework's regular expression engine.
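The types listed above can be exercised side by side. The following sketch is illustrative only; the class and helper names (RelatedTypesSketch, IsUppercaseByRegex, TextElementCount) are hypothetical and not part of the .NET Framework.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class RelatedTypesSketch
{
    // True if the whole string is a single uppercase letter (category Lu)
    // according to the regular expression engine.
    public static bool IsUppercaseByRegex(string s)
    {
        return Regex.IsMatch(s, @"^\p{Lu}$");
    }

    // Number of text elements in the string, as counted by StringInfo.
    public static int TextElementCount(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }

    public static void Main()
    {
        char gamma = '\u0393';  // GREEK CAPITAL LETTER GAMMA

        // Both calls report UppercaseLetter for this character.
        Console.WriteLine(Char.GetUnicodeCategory(gamma));
        Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(gamma));

        Console.WriteLine(IsUppercaseByRegex(gamma.ToString()));  // True

        // "a" followed by U+0308 COMBINING DIAERESIS: two Char values,
        // one text element.
        Console.WriteLine(TextElementCount("a\u0308"));           // 1
    }
}
```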

When using this class in your applications, keep in mind the following programming considerations for using the Char type. The type can be difficult to use, and strings are generally preferable for representing linguistic content.

  • A Char object does not always correspond to a single character. Although the Char type represents a single 16-bit value, some characters (such as grapheme clusters and surrogate pairs) consist of two or more UTF-16 code units. For more information, see "Char Objects and Unicode Characters" in the String class.

  • The notion of a "character" is also flexible. A character is often thought of as a glyph, but many glyphs require multiple code points. For example, ä can be represented either by two code points ("a" plus U+0308, which is the combining diaeresis), or by a single code point ("ä" or U+00E4). Some languages have many letters, characters, and glyphs that require multiple code points, which can cause confusion in linguistic content representation. For example, there is a ΰ (U+03B0, Greek small letter upsilon with dialytika and tonos), but there is no equivalent capital letter. Uppercasing such a value simply retrieves the original value.
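The two representations of ä and the length of a surrogate pair can be checked directly. This is a minimal sketch; the class name CharPitfallsSketch and the helper SameAfterNormalization are hypothetical, and comparing strings after normalizing both to Form C is just one reasonable way to treat the two spellings as equal.

```csharp
using System;
using System.Text;

public static class CharPitfallsSketch
{
    public const string Decomposed = "a\u0308";   // "a" + COMBINING DIAERESIS
    public const string Precomposed = "\u00E4";   // LATIN SMALL LETTER A WITH DIAERESIS

    // Hypothetical helper: compares two strings after normalizing both to Form C.
    public static bool SameAfterNormalization(string x, string y)
    {
        return x.Normalize(NormalizationForm.FormC) == y.Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        Console.WriteLine(Decomposed.Length);                                // 2
        Console.WriteLine(Precomposed.Length);                               // 1
        Console.WriteLine(Decomposed == Precomposed);                        // False
        Console.WriteLine(SameAfterNormalization(Decomposed, Precomposed));  // True

        // One supplementary character (U+10400) occupies two Char values.
        Console.WriteLine("\U00010400".Length);                              // 2

        // Compare U+03B0 with its uppercased form.
        string upsilon = "\u03B0";
        Console.WriteLine(upsilon.ToUpperInvariant() == upsilon);
    }
}
```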

Notes to Callers:

Recognized characters and the specific categories to which they belong are defined by the Unicode Standard and can change from one version of the Unicode Standard to another. Categorization of characters in a particular version of the .NET Framework is based on a single version of the Unicode Standard, regardless of the underlying operating system on which the .NET Framework is running. The following table lists versions of the .NET Framework since the .NET Framework 4 and the versions of the Unicode Standard used to classify characters.

.NET Framework version    Version of the Unicode Standard
.NET Framework 4          The Unicode Standard, Version 5.0.0
.NET Framework 4.5        The Unicode Standard, Version 5.0.0
.NET Framework 4.5.1      The Unicode Standard, Version 5.0.0
.NET Framework 4.5.2      The Unicode Standard, Version 5.0.0
.NET Framework 4.6        The Unicode Standard, Version 6.3.0
.NET Framework 4.6.1      The Unicode Standard, Version 6.3.0
.NET Framework 4.6.2      The Unicode Standard, Version 8.0.0

Each version of the Unicode Standard includes information on changes to the Unicode character database since the previous version. The Unicode character database is used by the CharUnicodeInfo class for categorizing characters.
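Because the character database is tied to the framework version, a code point that the runtime's Unicode version does not define is reported as UnicodeCategory.OtherNotAssigned. The sketch below shows one possible way to detect this; the helper IsKnownToRuntime is hypothetical, and the example assumes U+0378, which is unassigned in the Unicode versions listed in the table above.

```csharp
using System;
using System.Globalization;

public static class UnicodeVersionSketch
{
    // Hypothetical helper: true if the runtime's Unicode character database
    // assigns the character to any category other than OtherNotAssigned.
    public static bool IsKnownToRuntime(char c)
    {
        return CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.OtherNotAssigned;
    }

    public static void Main()
    {
        Console.WriteLine(IsKnownToRuntime('a'));        // True

        // U+0378 is an unassigned code point in the Greek and Coptic block.
        Console.WriteLine(IsKnownToRuntime('\u0378'));   // False
    }
}
```

A result of OtherNotAssigned can mean either that the code point is unassigned in every version of the standard or that it was assigned in a version newer than the one the framework uses, so the check cannot distinguish the two cases.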

The following code example shows the values returned by each method for different types of characters.

using System;
using System.Globalization;

public class SamplesCharUnicodeInfo  {

   public static void Main()  {

      Console.WriteLine( "                                        c  Num   Dig   Dec   UnicodeCategory" );

      Console.Write( "U+0061 LATIN SMALL LETTER A            " );
      PrintProperties( 'a' );

      Console.Write( "U+0393 GREEK CAPITAL LETTER GAMMA      " );
      PrintProperties( '\u0393' );

      Console.Write( "U+0039 DIGIT NINE                      " );
      PrintProperties( '9' );

      Console.Write( "U+00B2 SUPERSCRIPT TWO                 " );
      PrintProperties( '\u00B2' );

      Console.Write( "U+00BC VULGAR FRACTION ONE QUARTER     " );
      PrintProperties( '\u00BC' );

      Console.Write( "U+0BEF TAMIL DIGIT NINE                " );
      PrintProperties( '\u0BEF' );

      Console.Write( "U+0BF0 TAMIL NUMBER TEN                " );
      PrintProperties( '\u0BF0' );

      Console.Write( "U+0F33 TIBETAN DIGIT HALF ZERO         " );
      PrintProperties( '\u0F33' );

      Console.Write( "U+2788 CIRCLED SANS-SERIF DIGIT NINE   " );
      PrintProperties( '\u2788' );

   }

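   // Prints the character, then its numeric, digit, and decimal digit values
   // (-1 when a value does not apply), followed by its Unicode category.
   public static void PrintProperties( char c )  {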
      Console.Write( " {0,-3}", c );
      Console.Write( " {0,-5}", CharUnicodeInfo.GetNumericValue( c ) );
      Console.Write( " {0,-5}", CharUnicodeInfo.GetDigitValue( c ) );
      Console.Write( " {0,-5}", CharUnicodeInfo.GetDecimalDigitValue( c ) );
      Console.WriteLine( "{0}", CharUnicodeInfo.GetUnicodeCategory( c ) );
   }

}


/*
This code produces the following output.  Some characters might not display at the console.

                                        c  Num   Dig   Dec   UnicodeCategory
U+0061 LATIN SMALL LETTER A             a   -1    -1    -1   LowercaseLetter
U+0393 GREEK CAPITAL LETTER GAMMA       \u0393   -1    -1    -1   UppercaseLetter
U+0039 DIGIT NINE                       9   9     9     9    DecimalDigitNumber
U+00B2 SUPERSCRIPT TWO                  \u00B2   2     2     2    OtherNumber
U+00BC VULGAR FRACTION ONE QUARTER      \u00BC   0.25  -1    -1   OtherNumber
U+0BEF TAMIL DIGIT NINE                 \u0BEF   9     9     9    DecimalDigitNumber
U+0BF0 TAMIL NUMBER TEN                 \u0BF0   10    -1    -1   OtherNumber
U+0F33 TIBETAN DIGIT HALF ZERO          \u0F33   -0.5  -1    -1   OtherNumber
U+2788 CIRCLED SANS-SERIF DIGIT NINE    \u2788   9     9     -1   OtherNumber

*/

Universal Windows Platform
Available since 8
.NET Framework
Available since 2.0
Portable Class Library
Supported in: portable .NET platforms
Silverlight
Available since 2.0
Windows Phone Silverlight
Available since 7.0
Windows Phone
Available since 8.1

Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
