UnicodeCategory Enumeration

Updated: June 2011

Defines the Unicode category of a character.

Namespace:  System.Globalization
Assembly:  mscorlib (in mscorlib.dll)

[SerializableAttribute]
[ComVisibleAttribute(true)]
public enum UnicodeCategory

All members are supported by the .NET Compact Framework and the XNA Framework.

Member name | Description
UppercaseLetter | Indicates that the character is an uppercase letter. Signified by the Unicode designation "Lu" (letter, uppercase). The value is 0.
LowercaseLetter | Indicates that the character is a lowercase letter. Signified by the Unicode designation "Ll" (letter, lowercase). The value is 1.
TitlecaseLetter | Indicates that the character is a titlecase letter. Signified by the Unicode designation "Lt" (letter, titlecase). The value is 2.
ModifierLetter | Indicates that the character is a modifier letter, which is a free-standing spacing character that indicates modifications of a preceding letter. Signified by the Unicode designation "Lm" (letter, modifier). The value is 3.
OtherLetter | Indicates that the character is a letter that is not an uppercase letter, a lowercase letter, a titlecase letter, or a modifier letter. Signified by the Unicode designation "Lo" (letter, other). The value is 4.
NonSpacingMark | Indicates that the character is a nonspacing character, which indicates modifications of a base character. Signified by the Unicode designation "Mn" (mark, nonspacing). The value is 5.
SpacingCombiningMark | Indicates that the character is a spacing character, which indicates modifications of a base character and affects the width of the glyph for that base character. Signified by the Unicode designation "Mc" (mark, spacing combining). The value is 6.
EnclosingMark | Indicates that the character is an enclosing mark, which is a nonspacing combining character that surrounds all previous characters up to and including a base character. Signified by the Unicode designation "Me" (mark, enclosing). The value is 7.
DecimalDigitNumber | Indicates that the character is a decimal digit, that is, in the range 0 through 9. Signified by the Unicode designation "Nd" (number, decimal digit). The value is 8.
LetterNumber | Indicates that the character is a number represented by a letter instead of a decimal digit, for example, the Roman numeral for five, which is "V". Signified by the Unicode designation "Nl" (number, letter). The value is 9.
OtherNumber | Indicates that the character is a number that is neither a decimal digit nor a letter number, for example, the fraction 1/2. Signified by the Unicode designation "No" (number, other). The value is 10.
SpaceSeparator | Indicates that the character is a space character, which has no glyph but is not a control or format character. Signified by the Unicode designation "Zs" (separator, space). The value is 11.
LineSeparator | Indicates that the character is used to separate lines of text. Signified by the Unicode designation "Zl" (separator, line). The value is 12.
ParagraphSeparator | Indicates that the character is used to separate paragraphs. Signified by the Unicode designation "Zp" (separator, paragraph). The value is 13.
Control | Indicates that the character is a control code, with a Unicode value of U+007F or in the range U+0000 through U+001F or U+0080 through U+009F. Signified by the Unicode designation "Cc" (other, control). The value is 14.
Format | Indicates that the character is a format character, which is not normally rendered but affects the layout of text or the operation of text processes. Signified by the Unicode designation "Cf" (other, format). The value is 15.
Surrogate | Indicates that the character is a high surrogate or a low surrogate. Surrogate code values are in the range U+D800 through U+DFFF. Signified by the Unicode designation "Cs" (other, surrogate). The value is 16.
PrivateUse | Indicates that the character is a private-use character, with a Unicode value in the range U+E000 through U+F8FF. Signified by the Unicode designation "Co" (other, private use). The value is 17.
ConnectorPunctuation | Indicates that the character is a connector punctuation mark, which connects two characters. Signified by the Unicode designation "Pc" (punctuation, connector). The value is 18.
DashPunctuation | Indicates that the character is a dash or a hyphen. Signified by the Unicode designation "Pd" (punctuation, dash). The value is 19.
OpenPunctuation | Indicates that the character is the opening character of one of the paired punctuation marks, such as parentheses, square brackets, and braces. Signified by the Unicode designation "Ps" (punctuation, open). The value is 20.
ClosePunctuation | Indicates that the character is the closing character of one of the paired punctuation marks, such as parentheses, square brackets, and braces. Signified by the Unicode designation "Pe" (punctuation, close). The value is 21.
InitialQuotePunctuation | Indicates that the character is an opening or initial quotation mark. Signified by the Unicode designation "Pi" (punctuation, initial quote). The value is 22.
FinalQuotePunctuation | Indicates that the character is a closing or final quotation mark. Signified by the Unicode designation "Pf" (punctuation, final quote). The value is 23.
OtherPunctuation | Indicates that the character is a punctuation mark that is not a connector punctuation, a dash punctuation, an open punctuation, a close punctuation, an initial quote punctuation, or a final quote punctuation. Signified by the Unicode designation "Po" (punctuation, other). The value is 24.
MathSymbol | Indicates that the character is a mathematical symbol, such as "+" or "=". Signified by the Unicode designation "Sm" (symbol, math). The value is 25.
CurrencySymbol | Indicates that the character is a currency symbol. Signified by the Unicode designation "Sc" (symbol, currency). The value is 26.
ModifierSymbol | Indicates that the character is a modifier symbol, which indicates modifications of surrounding characters. For example, the circumflex accent (U+005E) and the grave accent (U+0060) are modifier symbols. Signified by the Unicode designation "Sk" (symbol, modifier). The value is 27.
OtherSymbol | Indicates that the character is a symbol that is not a mathematical symbol, a currency symbol, or a modifier symbol. Signified by the Unicode designation "So" (symbol, other). The value is 28.
OtherNotAssigned | Indicates that the character is not assigned to any Unicode category. Signified by the Unicode designation "Cn" (other, not assigned). The value is 29.
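
Because the member values run from 0 through 29, a category can be mapped back to its two-letter Unicode designation with an array lookup. The following is a minimal sketch; the class UnicodeCategoryCodes and the method ToDesignation are hypothetical names introduced for illustration and are not part of the .NET Framework.

```csharp
using System;
using System.Globalization;

public static class UnicodeCategoryCodes
{
    // Two-letter designations in enumeration-value order (0 through 29),
    // copied from the member descriptions above.
    private static readonly string[] Designations =
    {
        "Lu", "Ll", "Lt", "Lm", "Lo",
        "Mn", "Mc", "Me",
        "Nd", "Nl", "No",
        "Zs", "Zl", "Zp",
        "Cc", "Cf", "Cs", "Co",
        "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po",
        "Sm", "Sc", "Sk", "So",
        "Cn"
    };

    // Hypothetical helper: returns the designation for a category
    // by using the member's integer value as the array index.
    public static string ToDesignation(UnicodeCategory category)
    {
        return Designations[(int)category];
    }

    public static void Main()
    {
        Console.WriteLine(ToDesignation(UnicodeCategory.UppercaseLetter));
        Console.WriteLine(ToDesignation(char.GetUnicodeCategory('7')));
    }
}
```

The lookup relies only on the integer values listed in the member descriptions, so it would need revisiting if members were ever added to the enumeration.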

A member of the UnicodeCategory enumeration is returned by the Char.GetUnicodeCategory and CharUnicodeInfo.GetUnicodeCategory methods. The UnicodeCategory enumeration is also used to support Char methods, such as IsUpper(Char). Such methods determine whether a specified character is a member of a particular Unicode general category. A Unicode general category defines the broad classification of a character, that is, designation as a type of letter, decimal digit, separator, mathematical symbol, punctuation, and so on.
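
As an illustration of the methods named above, the following sketch retrieves the category of a few ASCII characters through both Char.GetUnicodeCategory and CharUnicodeInfo.GetUnicodeCategory, and compares the result with Char.IsUpper. The class CategoryLookup and the helper IsUppercaseByCategory are hypothetical names; the helper only approximates the kind of check that IsUpper performs and is not the framework's implementation.

```csharp
using System;
using System.Globalization;

public static class CategoryLookup
{
    // Hypothetical helper: tests category membership directly.
    public static bool IsUppercaseByCategory(char c)
    {
        return char.GetUnicodeCategory(c) == UnicodeCategory.UppercaseLetter;
    }

    public static void Main()
    {
        char[] samples = { 'A', 'a', '5', ' ', '+', '$', '(' };
        foreach (char c in samples)
        {
            UnicodeCategory viaChar = char.GetUnicodeCategory(c);
            UnicodeCategory viaInfo = CharUnicodeInfo.GetUnicodeCategory(c);

            // Print the character, its code point, the category name and
            // integer value, and whether the two lookups agree.
            Console.WriteLine("'{0}' U+{1:X4} {2} ({3}) same={4} IsUpper={5}",
                c, (int)c, viaChar, (int)viaChar,
                viaChar == viaInfo, char.IsUpper(c));
        }
    }
}
```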

This enumeration is based on The Unicode Standard, version 5.0. For more information, see the "UCD File Format" and "General Category Values" subtopics at the Unicode Character Database.

The Unicode Standard defines the following:

A surrogate pair is a coded character representation for a single abstract character that consists of a sequence of two code units, where the first unit of the pair is a high surrogate and the second is a low surrogate. A high surrogate is a Unicode code point in the range U+D800 through U+DBFF and a low surrogate is a Unicode code point in the range U+DC00 through U+DFFF.

A combining character sequence is a combination of a base character and one or more combining characters. A surrogate pair represents a base character or a combining character. A combining character is either spacing or nonspacing. A spacing combining character takes up a spacing position by itself when rendered, while a nonspacing combining character does not. Diacritics are an example of nonspacing combining characters.

A modifier letter is a free-standing spacing character that, like a combining character, indicates modifications of a preceding letter.

An enclosing mark is a nonspacing combining character that surrounds all previous characters up to and including a base character.

A format character is a character that is not normally rendered but that affects the layout of text or the operation of text processes.
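
The definitions above can be explored with a short sketch. It assumes a runtime whose CharUnicodeInfo.GetUnicodeCategory(String, Int32) overload resolves surrogate pairs, and the sample characters (U+1D11E, U+0301, U+0903, U+20DD, U+02B0, U+200D) were chosen for illustration. The class SurrogateAndMarkDemo and the helper CategoryAt are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class SurrogateAndMarkDemo
{
    // Hypothetical helper: category of the character (or surrogate pair)
    // that starts at the given index of the string.
    public static UnicodeCategory CategoryAt(string s, int index)
    {
        return CharUnicodeInfo.GetUnicodeCategory(s, index);
    }

    public static void Main()
    {
        // U+1D11E (musical symbol G clef) stored as a surrogate pair.
        string clef = "\uD834\uDD1E";
        Console.WriteLine(char.IsHighSurrogate(clef[0]));
        Console.WriteLine(char.IsLowSurrogate(clef[1]));
        // Each half on its own is classified as Surrogate.
        Console.WriteLine(char.GetUnicodeCategory(clef[0]));
        // The pair as a whole is classified by its code point.
        Console.WriteLine("U+{0:X} {1}",
            char.ConvertToUtf32(clef, 0), CategoryAt(clef, 0));

        // A combining character sequence: base letter plus nonspacing mark.
        string accented = "e\u0301";
        Console.WriteLine(CategoryAt(accented, 0));
        Console.WriteLine(CategoryAt(accented, 1));

        // One sample each: spacing combining mark, enclosing mark,
        // modifier letter, and format character.
        string others = "\u0903\u20DD\u02B0\u200D";
        for (int i = 0; i < others.Length; i++)
        {
            Console.WriteLine("U+{0:X4} {1}", (int)others[i], CategoryAt(others, i));
        }
    }
}
```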

The Unicode Standard defines several variants of some punctuation marks. For example, a hyphen can be represented by any of several code points, such as U+002D (hyphen-minus), U+00AD (soft hyphen), U+2010 (hyphen), or U+2011 (nonbreaking hyphen). The same is true for dashes, space characters, and quotation marks.

The Unicode Standard also assigns codes to representations of decimal digits that are specific to a given script or language, for example, U+0030 (digit zero) and U+0660 (Arabic-Indic digit zero).
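
One practical consequence is that comparing against a single literal such as '-' or '0' misses the other variants, whereas a category test does not. The following sketch prints the category of each hyphen variant listed above and checks both digit zeros; the class VariantsDemo and the helper IsDecimalDigit are hypothetical names. The sketch makes no assumption about the category reported for U+00AD and simply prints whatever the runtime returns.

```csharp
using System;
using System.Globalization;

public static class VariantsDemo
{
    // Hypothetical helper: true for any character in the Nd category,
    // regardless of script.
    public static bool IsDecimalDigit(char c)
    {
        return char.GetUnicodeCategory(c) == UnicodeCategory.DecimalDigitNumber;
    }

    public static void Main()
    {
        // Hyphen variants named in the text.
        char[] hyphens = { '\u002D', '\u00AD', '\u2010', '\u2011' };
        foreach (char c in hyphens)
        {
            Console.WriteLine("U+{0:X4} {1}", (int)c, char.GetUnicodeCategory(c));
        }

        // Digit zero in two scripts: both are decimal digits with value 0.
        char[] zeros = { '\u0030', '\u0660' };
        foreach (char c in zeros)
        {
            Console.WriteLine("U+{0:X4} digit={1} value={2}",
                (int)c, IsDecimalDigit(c), char.GetNumericValue(c));
        }
    }
}
```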

Windows 7, Windows Vista, Windows XP SP2, Windows XP Media Center Edition, Windows XP Professional x64 Edition, Windows XP Starter Edition, Windows Server 2008 R2, Windows Server 2008, Windows Server 2003, Windows Server 2000 SP4, Windows Millennium Edition, Windows 98, Windows CE, Windows Mobile for Smartphone, Windows Mobile for Pocket PC, Xbox 360, Zune

The .NET Framework and .NET Compact Framework do not support all versions of every platform. For a list of the supported versions, see .NET Framework System Requirements.

.NET Framework

Supported in: 3.5, 3.0, 2.0, 1.1, 1.0

.NET Compact Framework

Supported in: 3.5, 2.0, 1.0

XNA Framework

Supported in: 3.0, 2.0, 1.0

Date | History | Reason
June 2011 | Expanded the Remarks section. | Information enhancement.
