2.2.3 [ISO10646] Section D.7, Incorrect sequences of octets: Interpretation by receiving devices


The specification states:

 According to D.2 an octet in the range 00 to 7F or C0 to FD is the first octet of a 
 UTF-8 sequence, and is followed by the appropriate number (from 0 to 5) of 
 continuing octets in the range 80 to BF. Furthermore, octets whose value is FE or 
 FF are not used; thus they are invalid in UTF-8.
 If a CC-data-element includes either:
 * a first octet that is not immediately followed by the correct number of 
 continuing octets, or
 * one or more continuing octets that are not required to complete a sequence of 
 first and continuing octets, or
 * an invalid octet,
 then according to D.2 such a sequence of octets is not in conformance with the 
 requirements of UTF-8. It is known as a malformed sequence. If a receiving device 
 that has adopted the UTF-8 form receives a malformed sequence, because of error 
 conditions either:
 * in an originating device, or
 * in the interchange between an originating and a receiving device, or
 * in the receiving device itself,
 then it shall interpret that malformed sequence in the same way that it interprets 
 a character that is outside the adopted subset that has been identified for the 
 device (see sub-clause 2.3c).
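The classification rules in the quoted text can be sketched in Python as follows. This is a minimal illustration, not part of the specification or of any browser implementation. The helper names `continuing_count`, `is_continuing`, and `split_sequences` are hypothetical. First octets FC and FD are assumed to start six-octet sequences, since that is the only way a first octet can be followed by five continuing octets. Checks that the quoted text does not mention, such as rejecting overlong forms, are left out.

```python
from typing import List, Optional, Tuple

# Sketch of the octet rules quoted from ISO/IEC 10646 Annex D.7
# (the original UTF-8 form with sequences of up to six octets).
# Assumption: first octets FC and FD start six-octet sequences.


def continuing_count(octet: int) -> Optional[int]:
    """Return how many continuing octets must follow a first octet,
    or None if the octet cannot start a sequence."""
    if octet <= 0x7F:
        return 0
    if 0xC0 <= octet <= 0xDF:
        return 1
    if 0xE0 <= octet <= 0xEF:
        return 2
    if 0xF0 <= octet <= 0xF7:
        return 3
    if 0xF8 <= octet <= 0xFB:
        return 4
    if 0xFC <= octet <= 0xFD:
        return 5
    return None  # 80 to BF (continuing octet) or FE/FF (invalid)


def is_continuing(octet: int) -> bool:
    """True for octets in the range 80 to BF."""
    return 0x80 <= octet <= 0xBF


def split_sequences(data: bytes) -> List[Tuple[bool, bytes]]:
    """Split data into (well_formed, octets) pieces.

    A piece is malformed if it is a first octet without enough
    continuing octets, a continuing octet that no sequence needs,
    or an invalid FE/FF octet.
    """
    pieces: List[Tuple[bool, bytes]] = []
    i = 0
    while i < len(data):
        need = continuing_count(data[i])
        if need is None:
            # Stray continuing octet or invalid FE/FF: malformed on its own.
            pieces.append((False, data[i:i + 1]))
            i += 1
            continue
        # Consume up to `need` continuing octets after the first octet.
        j = i + 1
        while j < len(data) and (j - i - 1) < need and is_continuing(data[j]):
            j += 1
        well_formed = (j - i - 1) == need
        pieces.append((well_formed, data[i:j]))
        i = j
    return pieces
```

For example, `split_sequences(b"\xc3\xa9\xa9")` yields one well-formed two-octet sequence followed by a malformed piece holding the stray continuing octet A9.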

All Document Modes (All Versions)

Incorrect octets are replaced with the character U+FFFD (REPLACEMENT CHARACTER).
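The replacement behavior can be illustrated with Python's built-in UTF-8 codec, whose `errors="replace"` handler substitutes U+FFFD for malformed input. This is an analogy only, not the implementation described here. Python's codec follows the RFC 3629 definition of UTF-8 (at most four octets per character), so it also rejects some sequences that the six-octet ISO/IEC 10646 form accepts. How many malformed octets a single U+FFFD covers is the codec's own convention. The function name `decode_with_replacement` is hypothetical.

```python
# Illustration only: Python's codec implements RFC 3629 UTF-8
# (at most four octets), not the six-octet ISO/IEC 10646 form.

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_with_replacement(data: bytes) -> str:
    # errors="replace" substitutes U+FFFD for malformed input.
    return data.decode("utf-8", errors="replace")


examples = [
    b"A\xc3\xa9B",  # well formed: A, e-acute, B
    b"A\xc3(B",     # first octet C3 not followed by a continuing octet
    b"A\x80B",      # stray continuing octet
    b"A\xffB",      # FF is never valid
]
for raw in examples:
    # ascii() shows U+FFFD as the escape \ufffd
    print(raw, "->", ascii(decode_with_replacement(raw)))
```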