C++ Character Constants
Updated: October 2008
Character constants are one or more members of the basic source character set, the character set in which a program is written, enclosed in single quotation marks ('). They are used to represent characters in the basic execution character set, the character set on the computer where the program executes.
Microsoft Specific
For the Microsoft C/C++ compiler, the source and execution character sets are both ASCII.
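The basic source character set consists of 96 characters: the space character; the control characters that represent horizontal tab, vertical tab, formfeed, and newline; and the following 91 characters:
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '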
The basic execution character set consists of the characters in the basic source character set, plus the control characters that represent alert, backspace, carriage return, and null.
END Microsoft Specific
There are three kinds of character constants:
Normal character constants
Multicharacter constants
Wide-character constants
Use wide-character constants in place of multicharacter constants to ensure portability.
Character constants are specified as one or more characters enclosed in single quotation marks. For example:
char ch = 'x';          // Specify normal character constant.
int mbch = 'ab';        // Specify system-dependent
                        //   multicharacter constant.
wchar_t wcch = L'ab';   // Specify wide-character constant.
Note that mbch is of type int. If it were declared as type char, the second byte would not be retained. A multicharacter constant can contain at most four meaningful characters; specifying more than four generates an error message.
Characters within a character constant may be any graphic character in the source character set except the backslash (\) and the single quotation mark ('); a newline is also not allowed. These and other characters may instead be specified using an escape sequence. There are three types of escape sequences: simple, octal, and hexadecimal.
Simple escape sequences may be any of the following:
\' \" \? \\ \a \b \f \n \r \t \v
An octal escape sequence is a backslash followed by a sequence of up to 3 octal digits.
A hexadecimal escape sequence is a backslash, followed by the character x, followed by a sequence of hex digits.
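As a rough illustration of the three forms, the sketch below spells the same character as a simple, an octal, and a hexadecimal escape sequence. It assumes an ASCII execution character set (as stated above for the Microsoft compiler) and a compiler that supports C++11 constexpr and static_assert; the function name spellings_of_A_agree is hypothetical.

```cpp
// Sketch only: assumes an ASCII execution character set and C++11.

// Newline written three ways.
constexpr char simple_newline = '\n';    // simple escape sequence
constexpr char octal_newline  = '\012';  // octal escape: 012 (octal) is 10
constexpr char hex_newline    = '\x0a';  // hexadecimal escape: 0x0a is 10

static_assert(simple_newline == octal_newline, "octal form names the same character");
static_assert(simple_newline == hex_newline, "hexadecimal form names the same character");

// Hypothetical helper: true when the octal and hexadecimal spellings of 'A'
// both match the plain character constant.
constexpr bool spellings_of_A_agree() {
    return 'A' == '\101' && 'A' == '\x41';
}

static_assert(spellings_of_A_agree(), "101 (octal) and 41 (hex) are both 65");
```

Because every check is a static_assert, a mismatch would show up as a compile-time error rather than at run time.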
Microsoft C++ supports normal, multicharacter, and wide-character constants. Use wide-character constants to specify members of the extended execution character set (for example, to support an international application). Normal character constants have type char, multicharacter constants have type int, and wide-character constants have type wchar_t. (The type wchar_t is defined in the standard include files STDDEF.H, STDLIB.H, and STRING.H. The wide-character functions, however, are prototyped only in STDLIB.H.)
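The types of normal and wide-character constants can be checked at compile time. The following is a minimal sketch, assuming a C++11 compiler; the helper names normal_is_char and wide_is_wchar_t are hypothetical. In standard C++, wchar_t is a built-in type, so the sketch includes no header for it. A multicharacter constant is left out because some compilers warn about it by default.

```cpp
#include <type_traits>

// Hypothetical helpers that report the type of each kind of constant.
constexpr bool normal_is_char() {
    return std::is_same<decltype('x'), char>::value;
}

constexpr bool wide_is_wchar_t() {
    return std::is_same<decltype(L'x'), wchar_t>::value;
}

static_assert(normal_is_char(), "a normal character constant has type char");
static_assert(wide_is_wchar_t(), "a wide-character constant has type wchar_t");
static_assert(sizeof('x') == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(L'x') == sizeof(wchar_t), "size depends on the platform");
```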
The only difference in specification between normal and wide-character constants is that wide-character constants are preceded by the letter L. For example:
char schar = 'x';               // Normal character constant
wchar_t wchar = L'\x8119';      // Wide-character constant
The following table shows reserved or nongraphic characters that are system dependent or not allowed within character constants. These characters should be represented with escape sequences.
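Character               ASCII value    Escape sequence
Newline                 10 or 0x0a     \n
Vertical tab            11 or 0x0b     \v
Carriage return         13 or 0x0d     \r
Formfeed                12 or 0x0c     \f
Backslash               92 or 0x5c     \\
Question mark           63 or 0x3f     \?
Single quotation mark   39 or 0x27     \'
Double quotation mark   34 or 0x22     \"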
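The values listed above for the control characters and the question mark can be confirmed with a few compile-time checks. This is only a sketch under the assumption of an ASCII execution character set and a C++11 compiler; code_of is a hypothetical helper, not a library function.

```cpp
// Sketch only: assumes an ASCII execution character set and C++11.

// Hypothetical helper: the numeric value of a character constant.
constexpr int code_of(char c) {
    return static_cast<int>(c);
}

static_assert(code_of('\n') == 10 && code_of('\n') == 0x0a, "newline");
static_assert(code_of('\v') == 11 && code_of('\v') == 0x0b, "vertical tab");
static_assert(code_of('\r') == 13 && code_of('\r') == 0x0d, "carriage return");
static_assert(code_of('\f') == 12 && code_of('\f') == 0x0c, "formfeed");
static_assert(code_of('\?') == 63 && code_of('\?') == 0x3f, "question mark");
```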
If the character following the backslash does not specify a legal escape sequence, the result is implementation defined. In Microsoft C++, the character following the backslash is taken literally, as though the escape were not present, and a level 1 warning ("unrecognized character escape sequence") is issued.
Octal escape sequences, specified in the form \ooo, consist of a backslash and one, two, or three octal digits. Hexadecimal escape sequences, specified in the form \xhhh, consist of the characters \x followed by a sequence of hexadecimal digits. Unlike octal escape sequences, hexadecimal escape sequences have no limit on the number of digits.
Octal escape sequences are terminated by the first character that is not an octal digit, or after three octal digits have been read. For example:
wchar_t och = L'\076a';   // Sequence terminates at a
char ch = '\233';         // Sequence terminates after 3 characters
Similarly, hexadecimal escape sequences terminate at the first character that is not a hexadecimal digit. Because hexadecimal digits include the letters a through f (and A through F), make sure the escape sequence terminates at the intended digit.
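The termination rules are easiest to observe by looking at lengths, so the sketch below uses short string literals alongside one wide-character constant. It assumes an ASCII execution character set and a C++11 compiler; the array names are hypothetical. Splitting a literal into two adjacent pieces is one common way to end a hexadecimal escape at the intended digit, because each piece has its escape sequences resolved before the pieces are joined.

```cpp
// Sketch only: assumes an ASCII execution character set and C++11.

// Octal: at most three digits are consumed, so the fourth digit
// is an ordinary character.
constexpr char octal_then_digit[] = "\1011";   // 'A' followed by '1'
static_assert(sizeof(octal_then_digit) == 3, "two characters plus the terminating null");
static_assert(octal_then_digit[0] == 'A' && octal_then_digit[1] == '1', "split after three digits");

// Hexadecimal: every following hexadecimal digit is consumed.
// L'\x41b' is therefore one wide character with value 0x41b,
// not 'A' followed by 'b'.
static_assert(L'\x41b' == 0x41b, "the letter b is read as a hexadecimal digit");

// Ending the escape on purpose: two adjacent literals are concatenated
// after their escape sequences have been resolved.
constexpr char hex_then_letter[] = "\x41" "b"; // 'A' followed by 'b'
static_assert(sizeof(hex_then_letter) == 3, "two characters plus the terminating null");
static_assert(hex_then_letter[0] == 'A' && hex_then_letter[1] == 'b', "escape ended at 41");
```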
Because the single quotation mark (') encloses character constants, use the escape sequence \' to represent enclosed single quotation marks. The double quotation mark (") can be represented without an escape sequence. The backslash character (\) is a line-continuation character when placed at the end of a line. If you want a backslash character to appear within a character constant, you must type two backslashes in a row (\\). (See Phases of Translation in the Preprocessor Reference for more information about line continuation.)
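The quoting rules in the preceding paragraph can be sketched as follows, again assuming an ASCII execution character set and a C++11 compiler. The constant names are hypothetical.

```cpp
// Sketch only: assumes an ASCII execution character set and C++11.

constexpr char single_quote = '\'';  // must be escaped inside single quotes
constexpr char double_quote = '"';   // needs no escape inside single quotes
constexpr char backslash    = '\\';  // two backslashes in the source, one character

static_assert(double_quote == '\"', "escaped and unescaped forms are the same character");
static_assert(sizeof("\\") == 2, "one backslash plus the terminating null");
```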