
Char Structure

Represents a Unicode character.

Namespace: System
Assembly: mscorlib (in mscorlib.dll)

[SerializableAttribute] 
[ComVisibleAttribute(true)] 
public struct Char : IComparable, IConvertible, IComparable<char>, 
	IEquatable<char>
/** @attribute SerializableAttribute() */ 
/** @attribute ComVisibleAttribute(true) */ 
public final class Char extends ValueType implements IComparable, IConvertible, 
	IComparable<char>, IEquatable<char>
JScript supports the use of structures, but not the declaration of new ones.

The .NET Framework uses the Char structure to represent Unicode characters. The Unicode Standard identifies each Unicode character with a unique 21-bit scalar number called a code point, and defines the UTF-16 encoding form that specifies how a code point is encoded into a sequence of one or more 16-bit values. Each 16-bit value ranges from hexadecimal 0x0000 through 0xFFFF and is stored in a Char structure. The value of a Char object is its 16-bit numeric (ordinal) value.
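As a rough illustration of the encoding described above, the following sketch converts a code point to its UTF-16 code units with Char.ConvertFromUtf32. The class name CodeUnitSketch and the helper ToCodeUnits are illustrative names, not part of the .NET Framework, and U+1D11E (musical symbol G clef) is chosen only as an example of a code point that does not fit in one 16-bit value.

```csharp
using System;

public static class CodeUnitSketch
{
	// Hypothetical helper: returns the UTF-16 code units (Char values)
	// that encode the given Unicode code point.
	public static char[] ToCodeUnits(int codePoint)
	{
		return Char.ConvertFromUtf32(codePoint).ToCharArray();
	}

	public static void Main()
	{
		// U+0041 fits in one Char; U+1D11E needs a surrogate pair.
		foreach (int codePoint in new int[] { 0x0041, 0x1D11E })
		{
			Console.Write("U+{0:X4} ->", codePoint);
			foreach (char unit in ToCodeUnits(codePoint))
			{
				Console.Write(" 0x{0:X4}", (int)unit);
			}
			Console.WriteLine();
		}
		// Output:
		// U+0041 -> 0x0041
		// U+1D11E -> 0xD834 0xDD1E
	}
}
```

Casting a Char to int, as in the inner loop, yields the 16-bit ordinal value stored in the structure.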

A String object is a sequential collection of Char structures that represents a string of text. Most Unicode characters can be represented by a single Char object, but a character that is encoded as a base character, surrogate pair, and/or combining character sequence is represented by multiple Char objects. For this reason, a Char structure in a String object is not necessarily equivalent to a single Unicode character.
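The difference between Char count and character count can be seen in a short sketch. It assumes that the StringInfo class in System.Globalization is an acceptable way to count text elements; the class name TextElementSketch and the helper CountTextElements are illustrative only.

```csharp
using System;
using System.Globalization;

public static class TextElementSketch
{
	// Hypothetical helper: counts text elements (a base character plus its
	// combining marks, or a surrogate pair) rather than individual Chars.
	public static int CountTextElements(string text)
	{
		return new StringInfo(text).LengthInTextElements;
	}

	public static void Main()
	{
		// 'A', then 'a' + combining diaeresis, then a surrogate pair (U+1D11E).
		string text = "A" + "a\u0308" + "\uD834\uDD1E";

		Console.WriteLine(text.Length);					// Output: "5"
		Console.WriteLine(CountTextElements(text));		// Output: "3"
	}
}
```

String.Length reports Char structures, so it returns 5 here even though a reader would see three characters.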

For more information about the Unicode Standard, see the Unicode home page.

Functionality

The Char structure provides methods to compare Char objects, convert the value of the current Char object to an object of another type, and determine the Unicode category of a Char object.
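For the category lookup mentioned above, a minimal sketch using Char.GetUnicodeCategory might look like the following. The class name CategorySketch and the helper CategoryOf are illustrative names, not part of the framework.

```csharp
using System;
using System.Globalization;

public static class CategorySketch
{
	// Hypothetical helper: thin wrapper over Char.GetUnicodeCategory.
	public static UnicodeCategory CategoryOf(char c)
	{
		return Char.GetUnicodeCategory(c);
	}

	public static void Main()
	{
		Console.WriteLine(CategoryOf('A'));		// Output: "UppercaseLetter"
		Console.WriteLine(CategoryOf('1'));		// Output: "DecimalDigitNumber"
		Console.WriteLine(CategoryOf(' '));		// Output: "SpaceSeparator"
		Console.WriteLine(CategoryOf('+'));		// Output: "MathSymbol"
	}
}
```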

Interface Implementations

This type implements the IConvertible, IComparable, IComparable<char>, and IEquatable<char> interfaces. Use the Convert class for conversions instead of this type's explicit interface member implementation of IConvertible.
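A brief sketch of the recommended approach, going through the Convert class rather than casting to IConvertible, is shown here. The class name ConvertSketch and the helpers ToOrdinal and FromOrdinal are illustrative names only.

```csharp
using System;

public static class ConvertSketch
{
	// Hypothetical helper: the 16-bit ordinal value of a Char, as an int.
	public static int ToOrdinal(char c)
	{
		return Convert.ToInt32(c);
	}

	// Hypothetical helper: the Char with the given ordinal value.
	// Convert.ToChar throws OverflowException outside 0 through 65535.
	public static char FromOrdinal(int value)
	{
		return Convert.ToChar(value);
	}

	public static void Main()
	{
		Console.WriteLine(ToOrdinal('A'));				// Output: "65"
		Console.WriteLine(FromOrdinal(66));				// Output: "B"
		Console.WriteLine(Convert.ToString('x'));		// Output: "x"
	}
}
```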

The following code example demonstrates some of the methods in Char.

using System;

public class CharStructureSample {
	public static void Main() {
		char chA = 'A';
		char ch1 = '1';
		string str = "test string"; 

		Console.WriteLine(chA.CompareTo('B'));			// Output: "-1" (meaning 'A' is 1 less than 'B')
		Console.WriteLine(chA.Equals('A'));				// Output: "True"
		Console.WriteLine(Char.GetNumericValue(ch1));	// Output: "1"
		Console.WriteLine(Char.IsControl('\t'));		// Output: "True"
		Console.WriteLine(Char.IsDigit(ch1));			// Output: "True"
		Console.WriteLine(Char.IsLetter(','));			// Output: "False"
		Console.WriteLine(Char.IsLower('u'));			// Output: "True"
		Console.WriteLine(Char.IsNumber(ch1));			// Output: "True"
		Console.WriteLine(Char.IsPunctuation('.'));		// Output: "True"
		Console.WriteLine(Char.IsSeparator(str, 4));	// Output: "True"
		Console.WriteLine(Char.IsSymbol('+'));			// Output: "True"
		Console.WriteLine(Char.IsWhiteSpace(str, 4));	// Output: "True"
		Console.WriteLine(Char.Parse("S"));				// Output: "S"
		Console.WriteLine(Char.ToLower('M'));			// Output: "m"
		Console.WriteLine('x'.ToString());				// Output: "x"
	}
}

import System.* ;

public class CharStructureSample
{
    public static void main(String[] args)
    {
        Character chA = new Character('A');
        char ch1 = '1';
        String str = "test string";

        // Output: "-1" (meaning 'A' is 1 less than 'B')        
        Console.WriteLine(chA.compareTo(new Character('B')));        
        // Output: "True"
        Console.WriteLine(chA.equals(new Character('A')));            
        // Output: "1"
        Console.WriteLine(System.Char.GetNumericValue(ch1));        
        // Output: "True"
        Console.WriteLine(Char.IsControl('\t'));                    
        // Output: "True"
        Console.WriteLine(System.Char.IsDigit(ch1));                
        // Output: "False"
        Console.WriteLine(Char.IsLetter(','));                        
        // Output: "True"
        Console.WriteLine(Char.IsLower('u'));                        
        // Output: "True"
        Console.WriteLine(System.Char.IsNumber(ch1));                
        // Output: "True"
        Console.WriteLine(Char.IsPunctuation('.'));                    
        // Output: "True"
        Console.WriteLine(Char.IsSeparator(str, 4));                
        // Output: "True"
        Console.WriteLine(Char.IsSymbol('+'));                        
        // Output: "True"
        Console.WriteLine(Char.IsWhiteSpace(str, 4));                
        // Output: "S"
        Console.WriteLine(Char.Parse("S"));                            
        // Output: "m"
        Console.WriteLine(Char.ToLower('M'));                        
        // Output: "x"
        Console.WriteLine(System.Convert.ToString('x'));                
    } //main
} //CharStructureSample

This type is safe for multithreaded operations.

Windows 98, Windows 2000 SP4, Windows CE, Windows Millennium Edition, Windows Mobile for Pocket PC, Windows Mobile for Smartphone, Windows Server 2003, Windows XP Media Center Edition, Windows XP Professional x64 Edition, Windows XP SP2, Windows XP Starter Edition

The .NET Framework does not support all versions of every platform. For a list of the supported versions, see System Requirements.

.NET Framework

Supported in: 2.0, 1.1, 1.0

.NET Compact Framework

Supported in: 2.0, 1.0
WARNING: Chars don't make sense in many languages

It is worth mentioning that the "char" type represents a single 16-bit value. In Unicode, some characters are encoded as two UTF-16 code units, so in that case a single "char" cannot represent a complete "character". This doesn't happen in English, but many Chinese and other characters exist outside of the BMP (i.e., they require 2 chars to represent one Unicode code point).

Also note that the notion of a "character" is itself flexible. Many people think of characters as "glyphs", but many "glyphs" require multiple code points. For example, ä can be "a" + U+0308 (combining diaeresis) or the single code point "ä" (U+00E4). In some languages, not all "letters/characters/glyphs" can be represented correctly by a single Unicode code point; some require multiple code points.
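To make the two spellings of ä concrete, here is a small sketch that compares "a" + U+0308 with U+00E4 before and after normalization. It assumes String.Normalize with NormalizationForm.FormC is available (.NET Framework 2.0 or later); the class name NormalizationSketch and the helper SameAfterComposition are illustrative names.

```csharp
using System;
using System.Text;

public static class NormalizationSketch
{
	// Hypothetical helper: compares two strings ordinally after composing
	// both to Unicode normalization form C.
	public static bool SameAfterComposition(string left, string right)
	{
		return String.Equals(
			left.Normalize(NormalizationForm.FormC),
			right.Normalize(NormalizationForm.FormC),
			StringComparison.Ordinal);
	}

	public static void Main()
	{
		string decomposed = "a\u0308";		// 'a' + combining diaeresis
		string precomposed = "\u00E4";		// single code point

		Console.WriteLine(decomposed.Length);		// Output: "2"
		Console.WriteLine(precomposed.Length);		// Output: "1"
		Console.WriteLine(String.Equals(decomposed, precomposed,
			StringComparison.Ordinal));				// Output: "False"
		Console.WriteLine(SameAfterComposition(decomposed, precomposed));	// Output: "True"
	}
}
```

Comparing Char by Char treats the two forms as different text, which is one reason to work with whole strings.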

Additionally, some concepts get confused by this behavior. For example, there is a ΰ (U+03B0, Greek small letter upsilon with dialytika and tonos), but there is no equivalent single capital letter. Calling ToUpper() on it returns the same value, although you could perhaps argue for Ϋ́ (U+03AB + U+0301, Greek capital letter upsilon with dialytika, followed by a combining tonos). Some other operating systems/environments choose that as the ToUpper() value for U+03B0, so a single "char" ends up with a 2-"char" uppercase form.
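The U+03B0 case can be checked with a few lines. This sketch assumes Char.ToUpperInvariant, which maps one Char to one Char and so cannot return a two-Char result; the class name UpperCaseSketch and the helper HasSingleCharUpperCase are illustrative names.

```csharp
using System;

public static class UpperCaseSketch
{
	// Hypothetical helper: true when a lowercase Char maps to a different,
	// single uppercase Char under the invariant culture.
	public static bool HasSingleCharUpperCase(char c)
	{
		return Char.IsLower(c) && Char.ToUpperInvariant(c) != c;
	}

	public static void Main()
	{
		Console.WriteLine(HasSingleCharUpperCase('a'));			// Output: "True"
		Console.WriteLine(HasSingleCharUpperCase('\u03B0'));	// Output: "False"
	}
}
```

A False result for a lowercase letter means only that no one-Char uppercase form is returned, not that the letter has no uppercase form in the language.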

Another example is when combinations of characters cause their form to change. This isn't common in the "Latin" characters, but it's kind of like æ (U+00E6) looking like a and e crammed together or, in German, ß being the equivalent of ss. In some scripts the form changes a lot depending on the subsequent letters. An oversimplification would be to describe it as a kind of hyperactive cursive where the letters connect in different ways depending on the following letters.

There are many other cases where the "character" concept breaks down, so use caution. Strings are generally preferable for representing linguistic content.