2.2.14 [XML] Section 4.2.2, External Entities


The specification states:

 It is an error for a fragment identifier (beginning with a # character) to be part 
 of a system identifier. Unless otherwise provided by information outside the scope 
 of this specification (e.g. a special XML element type defined by a particular DTD, 
 or a processing instruction defined by a particular application specification), 
 relative URIs are relative to the location of the resource within which the entity 
 declaration occurs. This is defined to be the external entity containing the '<' 
 which starts the declaration, at the point when it is parsed as a declaration. A 
 URI might thus be relative to the document entity, to the entity containing the 
 external DTD subset, or to some other external parameter entity. Attempts to 
 retrieve the resource identified by a URI may be redirected at the parser level 
 (for example, in an entity resolver) or below (at the protocol level, for example, 
 via an HTTP Location: header). In the absence of additional information outside the
 scope of this specification within the resource, the base URI of a resource is 
 always the URI of the actual resource returned. In other words, it is the URI of 
 the resource retrieved after all redirection has occurred.


A fragment identifier (beginning with a number sign, #) is allowed as part of a system identifier that refers to an external entity.
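The difference between the specification and this behavior can be sketched as a single check. The helper name `split_system_id` and its `allow_fragment` switch are hypothetical and used for illustration only. With `allow_fragment=False` it follows the XML 1.0 rule and rejects a '#'. With `allow_fragment=True` it accepts the identifier and separates the fragment from the retrievable URI. The note above does not say what is done with the fragment afterwards, so this sketch only splits it off.

```python
from urllib.parse import urldefrag


def split_system_id(system_id: str, allow_fragment: bool) -> tuple[str, str]:
    """Hypothetical helper: return (resource_uri, fragment) for a system identifier.

    allow_fragment=False mirrors XML 1.0 Section 4.2.2 (a '#' is an error);
    allow_fragment=True mirrors the implementation variation, which accepts it.
    """
    if "#" in system_id and not allow_fragment:
        raise ValueError(f"fragment identifier not allowed in system identifier: {system_id!r}")
    # urldefrag separates everything after the first '#' from the part
    # that would actually be retrieved.
    result = urldefrag(system_id)
    return result.url, result.fragment
```

For example, `split_system_id("http://example.com/ent.xml#part1", allow_fragment=True)` returns `("http://example.com/ent.xml", "part1")`, while the same call with `allow_fragment=False` raises `ValueError`.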


When a system identifier contains a relative URI, the base URI used to resolve it is the original URI of the referring document, before any redirection.
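A minimal sketch of the two choices of base URI, assuming the processor knows both the URI it originally requested and the URI it finally received after redirection. The function `resolve_system_id`, its parameters, and the example.com / example.net URIs are hypothetical. The specification resolves against the final URI; the behavior described above resolves against the original one.

```python
from urllib.parse import urljoin


def resolve_system_id(relative: str, original_uri: str, final_uri: str,
                      use_original: bool) -> str:
    """Hypothetical helper: resolve a relative system identifier.

    use_original=False follows XML 1.0 (base URI is the URI after redirection);
    use_original=True follows the implementation variation (base URI is the
    URI of the referring document before redirection).
    """
    base = original_uri if use_original else final_uri
    # urljoin applies RFC 3986 reference resolution against the chosen base.
    return urljoin(base, relative)


original = "http://example.com/docs/main.xml"    # URI that was requested
final = "http://cdn.example.net/v2/main.xml"     # URI after an HTTP redirect
```

With these values, `resolve_system_id("entities/chap1.ent", original, final, use_original=True)` yields `http://example.com/docs/entities/chap1.ent`, whereas `use_original=False` yields `http://cdn.example.net/v2/entities/chap1.ent`; the same document can therefore load different external entities depending on which rule is applied.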


The specification states:

 System identifiers (and other XML strings meant to be used as URI references) may 
 contain characters that, according to [IETF RFC 3986], must be escaped before a URI 
 can be used to retrieve the referenced resource. The characters to be escaped are 
 the control characters #x0 to #x1F and #x7F (most of which cannot appear in XML), 
 space #x20, the delimiters '<' #x3C, '>' #x3E and '"' #x22, the unwise characters 
 '{' #x7B, '}' #x7D, '|' #x7C, '\' #x5C, '^' #x5E and '`' #x60, as well as all 
 characters above #x7F. Since escaping is not always a fully reversible process, it 
 MUST be performed only when absolutely necessary and as late as possible in a 
 processing chain.
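The escaping rule quoted above can be sketched as a character test followed by percent-encoding. This sketch assumes the usual convention of encoding each character that must be escaped as UTF-8 and writing every byte as %HH; the function names are hypothetical. Characters not in the quoted list (including '%' itself) are left untouched, in keeping with "only when absolutely necessary".

```python
def must_escape(ch: str) -> bool:
    """True if ch is in the escape set listed in XML 1.0 Section 4.2.2."""
    cp = ord(ch)
    return (
        cp <= 0x1F or cp == 0x7F      # control characters
        or cp == 0x20                  # space
        or ch in '<>"'                 # delimiters
        or ch in '{}|\\^`'             # "unwise" characters
        or cp > 0x7F                   # everything above #x7F
    )


def escape_system_id(system_id: str) -> str:
    """Hypothetical helper: percent-encode only the characters that must be escaped."""
    out = []
    for ch in system_id:
        if must_escape(ch):
            # Encode as UTF-8 and write each byte as %HH.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)
```

For instance, `escape_system_id("my file{1}.xml")` gives `my%20file%7B1%7D.xml`, and a non-ASCII name such as `café.xml` becomes `caf%C3%A9.xml`.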


The unwise characters '{', '}', '|', '\', '^', and '`' (code points #x7B, #x7D, #x7C, #x5C, #x5E, and #x60, respectively) are not escaped in a URI reference. They are treated as literal characters and passed along as is.
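A sketch of this variation, assuming it differs from the specification only in leaving the six unwise characters unescaped; the note does not describe how the remaining characters are handled, so this sketch assumes they are escaped as the specification requires (UTF-8 bytes written as %HH). The function name `escape_system_id_lenient` is hypothetical.

```python
UNWISE = set('{}|\\^`')


def escape_system_id_lenient(system_id: str) -> str:
    """Hypothetical helper: escape per XML 1.0 except for the unwise characters."""
    out = []
    for ch in system_id:
        cp = ord(ch)
        if ch in UNWISE:
            # Implementation variation: unwise characters are passed along as is.
            out.append(ch)
        elif cp <= 0x20 or cp == 0x7F or ch in '<>"' or cp > 0x7F:
            # Assumed to follow the specification: UTF-8 bytes written as %HH.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)
```

Compared with strict escaping, `escape_system_id_lenient("my file{1}.xml")` gives `my%20file{1}.xml`: the space is still escaped, but the braces reach the resolver unchanged.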