
2.2.3 [ISO10646] Section D.7, Incorrect sequences of octets: Interpretation by receiving devices


The specification states:

According to D.2 an octet in the range 00 to 7F or C0 to FD is the first octet of a 
UTF-8 sequence, and is followed by the appropriate number (from 0 to 5) of 
continuing octets in the range 80 to BF. Furthermore, octets whose value is FE or 
FF are not used; thus they are invalid in UTF-8.

If a CC-data-element includes either:
* a first octet that is not immediately followed by the correct number of 
continuing octets, or
* one or more continuing octets that are not required to complete a sequence of 
first and continuing octets, or
* an invalid octet,
then according to D.2 such a sequence of octets is not in conformance with the 
requirements of UTF-8. It is known as a malformed sequence. If a receiving device 
that has adopted the UTF-8 form receives a malformed sequence, because of error 
conditions either:
* in an originating device, or
* in the interchange between an originating and a receiving device, or
* in the receiving device itself,
then it shall interpret that malformed sequence in the same way that it interprets 
a character that is outside the adopted subset that has been identified for the 
device (see sub-clause 2.3c).
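The structural rules quoted above can be sketched as a small checker. This is a minimal illustration, assuming the multi-octet ranges of the original Annex D form of UTF-8 (first octets FC and FD introduce the form with five continuing octets); the helper names `expected_continuations` and `find_malformed` are hypothetical and are not taken from the specification or from any browser. It checks only what the clause describes (first octet, number of continuing octets, stray continuing octets, invalid FE/FF) and does not reject overlong forms, surrogates, or values above U+10FFFF, which RFC 3629 decoders also treat as malformed. It is not a description of how Internet Explorer or Edge implements the check.

```python
from typing import List, Optional, Tuple


def expected_continuations(first: int) -> Optional[int]:
    """Number of continuing octets (80..BF) that must follow `first`,
    per the Annex D ranges (assumed), or None if `first` cannot start a sequence."""
    if first <= 0x7F:
        return 0
    if 0xC0 <= first <= 0xDF:
        return 1
    if 0xE0 <= first <= 0xEF:
        return 2
    if 0xF0 <= first <= 0xF7:
        return 3
    if 0xF8 <= first <= 0xFB:
        return 4
    if 0xFC <= first <= 0xFD:
        return 5
    # 80..BF is a continuing octet with no first octet; FE and FF are invalid.
    return None


def find_malformed(data: bytes) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of malformed sequences in `data` (hypothetical helper)."""
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(data):
        need = expected_continuations(data[i])
        if need is None:
            # Unneeded continuing octet or invalid octet: reported one octet at a time.
            spans.append((i, i + 1))
            i += 1
            continue
        # Consume up to `need` continuing octets after the first octet.
        j = i + 1
        while j < len(data) and j - i - 1 < need and 0x80 <= data[j] <= 0xBF:
            j += 1
        if j - i - 1 < need:
            # First octet not followed by the correct number of continuing octets.
            spans.append((i, j))
        i = j
    return spans


if __name__ == "__main__":
    print(find_malformed(b"caf\xc3 ok"))  # [(3, 4)]
```

Reporting each stray continuing octet as its own span is a simplification; a receiver could equally merge a run of them into one malformed sequence, as the clause's wording ("one or more continuing octets") allows.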

All Document Modes (All Versions)

Incorrect octets are replaced with the Unicode replacement character, U+FFFD.
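The effect of this substitution can be illustrated with Python's built-in UTF-8 decoder and its `errors="replace"` handler, which substitutes U+FFFD for malformed input. This is an analogy, not the browser's implementation: Python follows RFC 3629 and the Unicode "maximal subpart" practice, so it also rejects the five- and six-octet forms allowed by the older Annex D text, and the number of U+FFFD characters it emits for a given malformed sequence is not guaranteed to match Internet Explorer or Edge. The wrapper name `decode_with_replacement` is hypothetical.

```python
def decode_with_replacement(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for malformed input (hypothetical wrapper)."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # C3 lacks its continuing octet and FF is an invalid octet; both become U+FFFD.
    print(repr(decode_with_replacement(b"caf\xc3 \xff ok")))
```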

© 2015 Microsoft