2.2.1.1 Character Sequences
In all dialects prior to NT LAN Manager, all character sequences were encoded using the OEM character set (extended ASCII). The NT LAN Manager dialect introduced support for Unicode, which is negotiated during protocol negotiation and session setup. The use of Unicode characters is indicated on a per-message basis by setting the SMB_FLAGS2_UNICODE flag in the SMB_Header.Flags2 field. All Unicode characters MUST be in UTF-16LE encoding.
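The per-message choice between OEM and Unicode can be sketched as a check of the SMB_FLAGS2_UNICODE bit (0x8000) in SMB_Header.Flags2. This is a minimal illustration, not a reference implementation. The helper names `encode_chars` and `decode_chars` are hypothetical. The OEM code page is locale-dependent, and cp437 is assumed here only as an example.

```python
# Minimal sketch: choose the wire encoding of a character sequence from
# the SMB_FLAGS2_UNICODE bit of SMB_Header.Flags2.
SMB_FLAGS2_UNICODE = 0x8000

# Assumption: the real OEM code page depends on the system locale;
# cp437 is used purely for illustration.
OEM_CODEPAGE = "cp437"


def encode_chars(text: str, flags2: int) -> bytes:
    """Return the wire bytes for text, without any null terminator."""
    if flags2 & SMB_FLAGS2_UNICODE:
        return text.encode("utf-16-le")  # WCHAR array, UTF-16LE
    return text.encode(OEM_CODEPAGE)     # UCHAR array, OEM character set


def decode_chars(data: bytes, flags2: int) -> str:
    """Inverse of encode_chars for the same Flags2 value."""
    if flags2 & SMB_FLAGS2_UNICODE:
        return data.decode("utf-16-le")
    return data.decode(OEM_CODEPAGE)


print(encode_chars("A:", SMB_FLAGS2_UNICODE).hex())  # 41003a00
print(encode_chars("A:", 0).hex())                   # 413a
```

The same two characters occupy four bytes when the flag is set and two bytes when it is clear, which is why every string-bearing field has to be interpreted against the Flags2 value of its own message.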
In CIFS, character sequences are transmitted over the wire as arrays of either UCHAR (for OEM characters) or WCHAR (for Unicode characters). Throughout this document, null-terminated character sequence fields that may be encoded in either Unicode or OEM characters (depending on the result of Unicode capability negotiation) are labeled as SMB_STRING fields.
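Parsing an SMB_STRING field can be sketched as follows, assuming any alignment padding has already been skipped. The function name `read_smb_string` is hypothetical, and cp437 again stands in for the locale-dependent OEM code page. Note that the Unicode case scans whole 2-byte WCHAR units. A plain search for two consecutive zero bytes could match across a character boundary (for example, "A" followed by U+0100 is `41 00 00 01`).

```python
# Minimal sketch: read a null-terminated SMB_STRING from a message buffer.
SMB_FLAGS2_UNICODE = 0x8000
OEM_CODEPAGE = "cp437"  # assumption: actual OEM code page is locale-dependent


def read_smb_string(buf: bytes, offset: int, flags2: int) -> tuple[str, int]:
    """Return (decoded string, offset just past the terminator).

    Assumes offset already points at the first character, i.e. any
    16-bit alignment padding has been skipped by the caller.
    """
    if flags2 & SMB_FLAGS2_UNICODE:
        # WCHAR array: step in 2-byte units and stop at a 16-bit zero.
        end = offset
        while end + 1 < len(buf):
            if buf[end] == 0 and buf[end + 1] == 0:
                return buf[offset:end].decode("utf-16-le"), end + 2
            end += 2
        raise ValueError("unterminated Unicode string")
    # UCHAR array: the terminator is a single zero byte.
    end = buf.find(b"\x00", offset)
    if end < 0:
        raise ValueError("unterminated OEM string")
    return buf[offset:end].decode(OEM_CODEPAGE), end + 1


print(read_smb_string(b"IPC$\x00xx", 0, 0))                       # ('IPC$', 5)
print(read_smb_string(b"A\x00:\x00\x00\x00", 0, SMB_FLAGS2_UNICODE))  # ('A:', 6)
```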
Unless otherwise noted, when a Unicode string is passed it MUST be aligned to a 16-bit boundary with respect to the beginning of the SMB Header (section 2.2.3.1). If the string does not naturally fall on a 16-bit boundary, a null padding byte MUST be inserted, and the string MUST begin immediately after that padding byte. For Core Protocol messages in which a buffer format byte precedes a Unicode string, the padding byte is found after the buffer format byte.
String fields that restrict character encoding to OEM characters only, even if Unicode support has been negotiated, are labeled as OEM_STRING. Some examples of strings that are never passed in Unicode are:
- The dialect strings in the SMB_COM_NEGOTIATE (section 2.2.4.52) command.
- The service name string in the SMB_COM_TREE_CONNECT_ANDX (section 2.2.4.55) command.
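As an OEM_STRING example, the dialect list of SMB_COM_NEGOTIATE can be sketched as a sequence of entries, each a buffer format byte of 0x02 followed by a null-terminated OEM dialect string. The encoding does not consult SMB_FLAGS2_UNICODE. The helper name `build_dialect_array` is hypothetical, and cp437 is assumed as the OEM code page for illustration only.

```python
# Minimal sketch: build the Dialects bytes of an SMB_COM_NEGOTIATE request.
# Dialect strings are OEM_STRING fields, so Flags2 is deliberately ignored.
OEM_CODEPAGE = "cp437"        # assumption: locale-dependent in practice
DIALECT_BUFFER_FORMAT = 0x02  # buffer format byte preceding each dialect


def build_dialect_array(dialects: list[str]) -> bytes:
    out = bytearray()
    for name in dialects:
        out.append(DIALECT_BUFFER_FORMAT)
        # Single-byte OEM characters with a single null terminator,
        # even if the client supports Unicode.
        out += name.encode(OEM_CODEPAGE) + b"\x00"
    return bytes(out)


print(build_dialect_array(["NT LM 0.12"]))  # b'\x02NT LM 0.12\x00'
```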