Byte Classification

Each of these routines tests whether a specified byte of a multibyte character satisfies a condition. Except where specified otherwise, the test result depends on the multibyte code page currently in use.

Note: By definition, the ASCII character set is a subset of all multibyte-character sets. For example, the Japanese katakana character set includes ASCII as well as non-ASCII characters.

The following table lists the byte-classification routines and the condition that each one tests:

Multibyte-Character Byte-Classification Routines

Routine       Byte Test Condition
isleadbyte    Lead byte; test result depends on the LC_CTYPE category setting of the current locale
_ismbbalnum   isalnum || _ismbbkalnum
_ismbbalpha   isalpha || _ismbbkalnum
_ismbbgraph   Same as _ismbbprint, but excludes the space character (0x20)
_ismbbkalnum  Non-ASCII text symbol other than punctuation. For example, in code page 932 only, _ismbbkalnum tests for katakana alphanumeric.
_ismbbkana    Katakana (0xA1 – 0xDF), code page 932 only
_ismbbkprint  Non-ASCII text or non-ASCII punctuation symbol. For example, in code page 932 only, _ismbbkprint tests for katakana alphanumeric or katakana punctuation (range: 0xA1 – 0xDF).
_ismbbkpunct  Non-ASCII punctuation. For example, in code page 932 only, _ismbbkpunct tests for katakana punctuation.
_ismbblead    First byte of multibyte character. For example, in code page 932 only, valid ranges are 0x81 – 0x9F, 0xE0 – 0xFC.
_ismbbprint   isprint || _ismbbkprint; includes the space character (0x20)
_ismbbpunct   ispunct || _ismbbkpunct
_ismbbtrail   Second byte of multibyte character. For example, in code page 932 only, valid ranges are 0x40 – 0x7E, 0x80 – 0xFC.
_ismbslead    Lead byte (in string context)
_ismbstrail   Trail byte (in string context)
_mbbtype      Returns byte type based on previous byte
_mbsbtype     Returns type of byte within string

The MB_LEN_MAX macro, defined in LIMITS.H, expands to the maximum length in bytes that any multibyte character can have. MB_CUR_MAX, defined in STDLIB.H, expands to the maximum length in bytes of any multibyte character in the current locale.