
String and Character Literals (C++)

C++ supports various string and character types, and provides ways to express literal values of each of these types. A raw string literal enables you to avoid using escape characters, and can be used to express all types of string literals.

// Character literals
auto c  = 'A';   // char
auto c1 = L'A';  // wchar_t
auto c2 = u'A';  // char16_t
auto c3 = U'A';  // char32_t

// String literals
auto s  = "hello";   // const char*
auto s2 = L"hello";  // const wchar_t*
auto s3 = u"hello";  // const char16_t*
auto s4 = U"hello";  // const char32_t*
auto s5 = R"("Hello world")";  // raw string literal, const char*
auto s6 = "hello"s;  // std::string (requires <string> and a using directive such as using namespace std::string_literals)

A raw string literal can also have u8R, uR, LR, and UR prefixes for other encodings. See below.

A character literal is composed of a constant character. It is represented by the character surrounded by single quotation marks. There are four kinds of character literals:

  • Narrow-character literals of type char, for example 'a'

  • Wide-character literals of type wchar_t, for example L'a'

  • UTF-16 character literals of type char16_t, for example u'a'

  • UTF-32 character literals of type char32_t, for example U'a'
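
The four kinds of character literal listed above can be checked at compile time. The following is a minimal sketch; literal_size is a hypothetical helper introduced only for this illustration and is not part of any library.

```cpp
#include <cstddef>
#include <type_traits>

// The prefix alone determines the type of a character literal.
static_assert(std::is_same<decltype('a'),  char>::value,     "no prefix gives char");
static_assert(std::is_same<decltype(L'a'), wchar_t>::value,  "L gives wchar_t");
static_assert(std::is_same<decltype(u'a'), char16_t>::value, "u gives char16_t");
static_assert(std::is_same<decltype(U'a'), char32_t>::value, "U gives char32_t");

// Hypothetical helper: size in bytes of the type that a literal has.
template <typename CharT>
constexpr std::size_t literal_size(CharT) { return sizeof(CharT); }
```

For instance, literal_size('a') is always 1, while literal_size(L'a') depends on the platform's wchar_t.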

The character used for a character literal may be any character, except for the reserved characters newline, backslash (\), and single quotation mark ('). Reserved characters can be specified with an escape sequence. The double quotation mark (") may be written either directly or as an escape sequence.

There are five kinds of escape sequences: simple, octal, hexadecimal, Unicode (UTF-8), and Unicode (UTF-16). Escape sequences may be any of the following:

Value               Escape sequence     Value               Escape sequence
newline             \n                  question mark       ? or \?
horizontal tab      \t                  single quote        \'
vertical tab        \v                  double quote        \"
backspace           \b                  the null character  \0
carriage return     \r                  octal               \ooo
form feed           \f                  hexadecimal         \xhhh
alert (bell)        \a                  Unicode (UTF-8)     \uxxxx
backslash           \\                  Unicode (UTF-16)    \Uxxxxxxxx
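
An escape sequence is just another spelling of a character value. The sketch below checks a few of them at compile time. The numeric values assume an ASCII-compatible execution character set, and the constant names are invented for this example.

```cpp
// Simple escape sequences (values assume an ASCII-compatible character set).
static_assert('\n' == 10, "newline is line feed");
static_assert('\t' == 9,  "horizontal tab");
static_assert('\a' == 7,  "alert (bell)");
static_assert('\\' == 92, "backslash");
static_assert('\0' == 0,  "null character");
static_assert('\?' == '?', "escaped and plain question mark are the same");

// The same character written three ways (hypothetical constant names).
constexpr char plain_A = 'A';
constexpr char octal_A = '\101';  // octal 101 is decimal 65
constexpr char hex_A   = '\x41';  // hexadecimal 41 is decimal 65

// \u and \U give the code point directly.
constexpr char16_t e_acute = u'\u00E9';      // U+00E9
constexpr char32_t halo    = U'\U0001F607';  // U+1F607
```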

The following code shows some examples of escaped characters using ordinary character literals. The same escape sequence syntax is valid for the other character literal types and in string literals.

#include <iostream>
using namespace std;

int main() {
    char newline = '\n';
    char tab = '\t';
    char backspace = '\b';
    char backslash = '\\';
    char nullChar = '\0';

    cout << "Newline character: " << newline << "ending" << endl;     // Newline character:
                                                                      // ending
    cout << "Tab character: " << tab << "ending" << endl;             // Tab character: <tab>ending
    cout << "Backspace character: " << backspace << "ending" << endl; // Backspace character:ending (on most terminals)
    cout << "Backslash character: " << backslash << "ending" << endl; // Backslash character: \ending
    cout << "Null character: " << nullChar << "ending" << endl;       // Null character: ending (a NUL byte precedes "ending")
}

Microsoft Specific

An octal escape sequence is a backslash followed by a sequence of up to 3 octal digits. An octal escape sequence that appears to contain more than three digits is read as a 3-digit sequence followed by ordinary characters; the value of the resulting multicharacter literal is implementation-defined and can be surprising. For example:

char c1 = '\100';     // char '@'
char c2 = '\1000';   // char '0' 

Escape sequences that contain non-octal characters are evaluated as the last non-octal character. For example:

char c3 = '\009';    // char '9'
char c4 = '\089';    // char '9'
char c5 = '\qrs';    // char 's'

A hexadecimal escape sequence is a backslash followed by the character x, followed by a sequence of hexadecimal digits. An escape sequence that contains no hexadecimal digits causes compiler error C2153: "hex literals must have at least one hex digit". An escape sequence that has hexadecimal and non-hexadecimal characters is evaluated as the last non-hexadecimal character. The highest hexadecimal value is 0xff.

char c1 = '\x0050';  // char 'P'
char c2 = '\x0pqr';  // char 'r'

End Microsoft Specific

The backslash character (\) is a line-continuation character when it is placed at the end of a line. If you want a backslash character to appear as a character literal, you must type two backslashes in a row (\\). For more information about the line continuation character, see Phases of Translation.

A string literal represents a sequence of characters that together form a null-terminated string. The characters must be enclosed between double quotation marks. The kinds of string literals are described in the following paragraphs.

A narrow string literal is a null-terminated array of constant char that contains any graphic character except the double quotation mark ("), backslash (\), or newline character. A narrow string literal may contain the escape sequences listed in C++ Character Literals.

const char *narrow = "abcd";

// represents the string: yes\no
const char *escaped = "yes\\no";

UTF-8 encoded strings

A UTF-8 encoded string literal is also a narrow string: a null-terminated array of const char whose contents are encoded as UTF-8. A UTF-8 string literal is prefixed by u8:

const char* str = u8"Hello World";
const char* str2 = u8"😇 = \U0001F607 is O:-)";
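
To make the variable-length encoding concrete, the following sketch counts the array elements of a few u8 literals. element_count is a hypothetical helper written for this example. It uses only array sizes, so it does not depend on whether the element type is char or, since C++20, char8_t.

```cpp
#include <cstddef>

// Hypothetical helper: number of array elements in a string literal,
// including the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t element_count(const CharT (&)[N]) { return N; }

static_assert(element_count(u8"A") == 2,          "U+0041 takes one code unit");
static_assert(element_count(u8"\u00E9") == 3,     "U+00E9 takes two code units");
static_assert(element_count(u8"\u20AC") == 4,     "U+20AC takes three code units");
static_assert(element_count(u8"\U0001F607") == 5, "U+1F607 takes four code units");
```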

A wide string literal is a null-terminated array of constant wchar_t that is prefixed by 'L' and contains any graphic character except the double quotation mark ("), backslash (\), or newline character. A wide string literal may contain the escape sequences listed in C++ Character Literals.

const wchar_t* wide = L"zyxw";
const wchar_t* newline = L"hello\ngoodbye";

char16_t and char32_t (C++11)

C++11 introduces the portable char16_t (16-bit Unicode) and char32_t (32-bit Unicode) character types:

auto s3 = u"hello"; // const char16_t*
auto s4 = U"hello"; // const char32_t*

A raw string literal is a null-terminated array—of any character type—that contains any graphic character, including the double quotation mark ("), backslash (\), or newline character. Raw string literals are often used in regular expressions that use character classes, and in HTML strings and XML strings. For examples, see the following article: Bjarne Stroustrup's FAQ on C++11.

// represents the string: An unescaped \ character
const char* raw_narrow = R"(An unescaped \ character)";
const wchar_t* raw_wide = LR"(An unescaped \ character)";
const char*       raw_utf8  = u8R"(An unescaped \ character)";
const char16_t* raw_utf16 = uR"(An unescaped \ character)";
const char32_t* raw_utf32 = UR"(An unescaped \ character)";
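
As an illustration of the regular expression use case, here is a minimal sketch using std::regex. The helper names and the NNN-NNNN pattern are made up for this example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if text has the form NNN-NNNN.
inline bool looks_like_phone(const std::string& text) {
    // The raw literal holds the regex exactly as written: \d{3}-\d{4}
    static const std::regex pattern(R"(\d{3}-\d{4})");
    return std::regex_match(text, pattern);
}

// The raw and the escaped spellings denote the same characters.
inline void check_raw_equals_escaped() {
    assert(std::string(R"(\d{3}-\d{4})") == "\\d{3}-\\d{4}");
}
```

Without the raw literal, every backslash in the pattern has to be doubled, which is easy to get wrong in longer expressions.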

A delimiter is a user-defined sequence of up to 16 characters that immediately precedes the opening parenthesis of a raw string literal and immediately follows its closing parenthesis. You can use a delimiter to disambiguate strings that contain both double quotation marks and parentheses. For example, the following literal causes a compiler error:

// meant to represent the string: )"
const char* bad_parens = R"()")";

But a delimiter resolves it:

const char* good_parens = R"xyz()")xyz";
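
A compile-time check of what the delimited literal contains (a small sketch; the static_assert messages are only descriptive):

```cpp
// The delimiter xyz lets the sequence )" appear inside the literal.
constexpr const char* good_parens = R"xyz()")xyz";

static_assert(good_parens[0] == ')',  "first character is the parenthesis");
static_assert(good_parens[1] == '"',  "second character is the quotation mark");
static_assert(good_parens[2] == '\0', "then the terminating null");
static_assert(sizeof(R"xyz()")xyz") == 3, "two characters plus the null");
```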

You can construct a raw string literal in which there is a newline (not the escaped character) in the source:

// represents the string: hello
//goodbye
const wchar_t* newline = LR"(hello
goodbye)";
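
The line break in the source ends up in the string as a single newline character, which this sketch verifies for a narrow literal (two_lines is a name chosen for the example):

```cpp
// The line break inside the raw literal is stored as '\n'.
constexpr const char* two_lines = R"(hello
goodbye)";

static_assert(two_lines[5] == '\n', "the source line break becomes a newline");
static_assert(sizeof(R"(hello
goodbye)") == sizeof("hello\ngoodbye"), "same size as the escaped spelling");
```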

std::string literals are Standard Library implementations of user-defined literals (see below) that are represented as "xyz"s (with an s suffix). This kind of string literal produces a temporary object of type std::string, std::wstring, std::u32string or std::u16string depending on the prefix that is specified. When no prefix is used, a std::string is produced. L"xyz"s produces a std::wstring, u"xyz"s produces a std::u16string, and U"xyz"s produces a std::u32string.

#include <string>
using namespace std; // also makes the s suffix visible

string str{ "hello"s };
string str2{ u8"Hello World" };
wstring str3{ L"hello"s };
u16string str4{ u"hello"s };
u32string str5{ U"hello"s };

The s suffix may also be used on raw string literals:

  u32string str6{ UR"(She said "hello.")"s };

std::string literals are defined in the namespace std::literals::string_literals in the <string> header file. Because std::literals::string_literals and std::literals are both declared as inline namespaces, std::literals::string_literals is automatically treated as if it belonged directly in namespace std. A using directive for any of these namespaces, for example using namespace std::string_literals;, therefore makes the s suffix available.
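
One practical difference between the s suffix and constructing a std::string from a plain literal is how an embedded null is treated. The following sketch shows it; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// With the s suffix, the whole literal becomes the string, embedded null included.
inline std::size_t length_with_suffix() {
    using namespace std::string_literals;  // brings operator""s into scope
    return "abc\0def"s.size();
}

// Without it, the const char* constructor stops at the first null.
inline std::size_t length_without_suffix() {
    return std::string("abc\0def").size();
}

// Hypothetical self-check showing the two results.
inline void check_suffix_lengths() {
    assert(length_with_suffix() == 7);
    assert(length_without_suffix() == 3);
}
```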

For ANSI char* strings (not UTF-8), the size (in bytes) of a string literal is the number of characters plus 1 (for the terminating null character). For all other string types, the size is not strictly related to the number of characters. UTF-8 uses up to four char elements to encode a single code point, and char16_t or wchar_t encoded as UTF-16 may use two elements (for a total of four bytes) to encode a single code point. The following example computes the size in bytes of a wide string literal:

const wchar_t* str = L"Hello!";
const size_t byteSize = (wcslen(str) + 1) * sizeof(wchar_t);

Notice that strlen() and wcslen() do not include the size of the terminating null character, whose size is equal to the element size of the string type: one byte on a char* string, two bytes on wchar_t* or char16_t* strings, and four bytes on char32_t* strings.
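
Because a string literal is an array, its size including the terminator can also be obtained at compile time. This is a sketch under that assumption; literal_bytes is a hypothetical helper, not a library function.

```cpp
#include <cstddef>

// Hypothetical helper: total size in bytes of a string literal,
// including its terminating null element.
template <typename CharT, std::size_t N>
constexpr std::size_t literal_bytes(const CharT (&)[N]) { return N * sizeof(CharT); }

static_assert(literal_bytes("Hello!")  == 7,                    "6 chars plus null");
static_assert(literal_bytes(L"Hello!") == 7 * sizeof(wchar_t),  "element size is sizeof(wchar_t)");
static_assert(literal_bytes(u"Hello!") == 7 * sizeof(char16_t), "element size is sizeof(char16_t)");
static_assert(literal_bytes(U"Hello!") == 7 * sizeof(char32_t), "element size is sizeof(char32_t)");

// One code point, but a different number of elements in each encoding.
static_assert(literal_bytes(u"\U0001F607") == 3 * sizeof(char16_t), "surrogate pair plus null");
static_assert(literal_bytes(U"\U0001F607") == 2 * sizeof(char32_t), "one element plus null");
```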

The maximum length of a string literal is 65535 bytes. This limit applies to both narrow string literals and wide string literals.

Because string literals (not including std::string literals) are constants, trying to modify them (for example, str[2] = 'A') causes a compiler error.

Microsoft Specific

In Visual C++ you can use a string literal to initialize a pointer to non-const char or wchar_t. This is allowed in C code, but is deprecated in C++98 and removed in C++11. An attempt to modify the string causes an access violation, as in this example:

wchar_t* str = L"hello";
str[2] = L'a'; // run-time error: access violation

You can cause the compiler to emit an error when a string literal is converted to a non-const character pointer by setting the /Zc:strictStrings (Disable string literal type conversion) compiler option. It is a good practice to use the auto keyword to declare string literal-initialized pointers, because it resolves to the correct (const) type. For example, this code catches an attempt to write to a string literal at compile time:

auto str = L"hello";
str[2] = L'a'; // Compiler error: you cannot assign to a variable that is const.
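
The type that auto deduces here can be confirmed with standard type traits. In this sketch, deduced_type is an alias invented for the example; std::decay models the array-to-pointer conversion that auto applies.

```cpp
#include <type_traits>
#include <utility>

// auto applies array-to-pointer decay to the literal's type, const wchar_t[6].
using deduced_type = std::decay<decltype(L"hello")>::type;

static_assert(std::is_same<deduced_type, const wchar_t*>::value,
              "auto str = L\"hello\"; declares a const wchar_t*");

// Assigning through such a pointer is rejected at compile time.
static_assert(!std::is_assignable<decltype(*std::declval<deduced_type>()), wchar_t>::value,
              "elements reached through the pointer are read-only");
```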

In some cases, identical string literals may be pooled to save space in the executable file. In string-literal pooling, the compiler causes all references to a particular string literal to point to the same location in memory, instead of having each reference point to a separate instance of the string literal. To enable string pooling, use the /GF compiler option.

End Microsoft Specific

Adjacent wide or narrow string literals are concatenated. This declaration:

char str[] = "12" "34";

is identical to this declaration:

char str[] = "1234";

and to this declaration:

char str[] =  "12\
34";
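
The equivalence of the three spellings can be checked at compile time. same_text is a hypothetical constexpr helper written for this sketch.

```cpp
// Hypothetical helper: compares two null-terminated strings at compile time.
constexpr bool same_text(const char* a, const char* b) {
    return *a == *b && (*a == '\0' || same_text(a + 1, b + 1));
}

// Adjacent literals are concatenated by the compiler.
static_assert(same_text("12" "34", "1234"), "adjacent literals are joined");

// A backslash at the end of the line splices the next line onto it.
static_assert(same_text("12\
34", "1234"), "line continuation joins the two source lines");

static_assert(sizeof("12" "34") == 5, "four characters plus one null");
```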

Using embedded hexadecimal escape codes to specify string literals can cause unexpected results. The following example seeks to create a string literal that contains the ASCII 5 character, followed by the characters f, i, v, and e:

"\x05five"

The actual result is a hexadecimal 5F, which is the ASCII code for an underscore, followed by the characters i, v, and e. To get the correct result, you can use one of these:

"\005five"     // Use octal literal.
"\x05" "five"  // Use string splicing.

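The difference between the three spellings can be verified at compile time. This is a minimal sketch that assumes an ASCII-compatible execution character set; the variable names are chosen for the example.

```cpp
// The hexadecimal escape consumes every following hexadecimal digit, so \x05f is 0x5F.
constexpr const char* greedy  = "\x05five";
// Splitting the literal ends the escape sequence after \x05.
constexpr const char* spliced = "\x05" "five";
// An octal escape stops after at most three digits.
constexpr const char* octal   = "\005five";

static_assert(greedy[0] == 0x5F && greedy[1] == 'i', "underscore followed by ive");
static_assert(sizeof("\x05five") == 5,               "_ i v e plus the null");
static_assert(spliced[0] == 5 && spliced[1] == 'f',  "ASCII 5 followed by five");
static_assert(octal[0] == 5 && octal[1] == 'f',      "ASCII 5 followed by five");
static_assert(sizeof("\x05" "five") == 6,            "five characters plus the null");
```
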
std::string literals, because they are std::string types, can be concatenated with the + operator that is defined for basic_string types. They can also be concatenated in the same way as adjacent string literals. In both cases, the string encoding and the suffix must match:

auto x1 = "hello" " " " world"; // OK
auto x2 = U"hello" " " L"world"; // C2308: disagree on prefix
auto x3 = u8"hello" " "s u8"world"s; // OK, agree on prefixes and suffixes
auto x4 = u8"hello" " "s u8"world"z; // C3688, disagree on suffixes
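
Both ways of joining std::string literals are shown in the sketch below. The function names are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <string>

// Joins the pieces at run time with operator+.
inline std::string greet_with_plus() {
    using namespace std::string_literals;
    // operator+ applies because at least one operand is a std::string.
    return "hello"s + " " + "world"s;
}

// Joins the pieces at compile time by adjacency.
inline std::string greet_adjacent() {
    using namespace std::string_literals;
    // The pieces are concatenated first, then the s suffix applies to the result.
    return "hello" " " "world"s;
}

// Hypothetical self-check: both forms produce the same string.
inline void check_greetings() {
    assert(greet_with_plus() == greet_adjacent());
}
```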

In all native string types and encodings, universal character names are represented with a prefix \U or \u followed by the code point.

u8"\U00000410 is same as \u0410" // UCNs inside UTF-8 encoded string
u"\U00000410 is same as \u0410"  // UCNs inside UTF-16 encoded string
U"\U00000410 is same as \u0410"  // UCNs inside UTF-32 encoded string
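
A compile-time sketch confirming that the short and long forms name the same code point (the constant names are invented for the example):

```cpp
// \u takes four hexadecimal digits, \U takes eight; both name U+0410 here.
constexpr char32_t short_form = U"\u0410"[0];
constexpr char32_t long_form  = U"\U00000410"[0];

static_assert(short_form == 0x0410 && long_form == 0x0410, "both forms give U+0410");
static_assert(u"\u0410"[0] == 0x0410, "a single UTF-16 code unit");
static_assert(sizeof(u8"\u0410") == 3, "two UTF-8 code units plus the null");
```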

In C++11, universal character name support is extended to the char16_t* and char32_t* string types:

// ASCII smiling face
const char*     s1 = ":-)";  

// UTF-16 (on Windows) encoded WINKING FACE (U+1F609)
const wchar_t*  s2 = L"😉 = \U0001F609 is ;-)";  

// UTF-8  encoded SMILING FACE WITH HALO (U+1F607)
const char*     s3 = u8"😇 = \U0001F607 is O:-)";

// UTF-16 encoded SMILING FACE WITH OPEN MOUTH (U+1F603)
const char16_t* s4 = u"😃 = \U0001F603 is :-D";

// UTF-32 encoded SMILING FACE WITH SUNGLASSES (U+1F60E)
const char32_t* s5 = U"😎 = \U0001F60E is B-)";

In C++03, the language only allowed a subset of characters to be represented by their code points, and allowed some code points that didn’t actually represent any valid characters. This was fixed in the C++11 standard. In C++11, control characters (00-1F, 7F-9F) and the basic source characters (20-23  !"#, 25-3F %&'()*+,-./0123456789:;<=>?, 41-5F ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_, 61-7E abcdefghijklmnopqrstuvwxyz{|}~) can be represented by code points. Code points can no longer be used to represent characters within the range D800 through DFFF inclusive.

Surrogate Pairs

For Unicode surrogate pairs, just specify the code point and the compiler will generate a surrogate pair if required.
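
To see the generated pair, the following sketch inspects the elements of a UTF-16 literal for U+1F609. combine_surrogates is a hypothetical helper that applies the standard UTF-16 decoding arithmetic.

```cpp
// U+1F609 lies outside the Basic Multilingual Plane.
constexpr const char16_t* wink16 = u"\U0001F609";
constexpr const char32_t* wink32 = U"\U0001F609";

static_assert(wink16[0] == 0xD83D, "high surrogate generated by the compiler");
static_assert(wink16[1] == 0xDE09, "low surrogate generated by the compiler");
static_assert(wink16[2] == 0,      "terminating null");
static_assert(wink32[0] == 0x1F609 && wink32[1] == 0, "UTF-32 needs a single element");

// Hypothetical helper: recombines a surrogate pair into its code point.
constexpr char32_t combine_surrogates(char16_t high, char16_t low) {
    return 0x10000
         + ((static_cast<char32_t>(high) - 0xD800) << 10)
         + (static_cast<char32_t>(low) - 0xDC00);
}
```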

For more information about Unicode, see Unicode. For more information about surrogate pairs, see Surrogate Pairs and Supplementary Characters.

© 2015 Microsoft