W3C XML Schema Design Patterns: Dealing With Change

Article
08/30/2006

Dare Obasanjo
Microsoft Corporation

January 2003

Originally published on https://www.xml.com.

Applies to:
W3C XML Schema

Summary: W3C XML Schema provides a mechanism for specifying the structure and constraints on XML documents. In support of common usage patterns, this article focuses on techniques for building schemas that are flexible and allow evolution of the underlying data and schema in a modular manner. (15 printed pages)

Contents

Introduction
Using Wildcards to Create Open Content Models
Gaining Flexibility from Substitution Groups and Abstract Elements
Runtime Polymorphism via xsi:type and Abstract Types
Using xs:redefine to Update Type Definitions
Further Reading
Acknowledgements

Introduction

W3C XML Schema provides a mechanism for specifying the structure and constraints on XML documents. As usage of W3C XML Schema has grown, certain usage patterns have become common. This article focuses on techniques for building schemas that are flexible and allow evolution of the underlying data, schema, or both, in a modular manner.

Designing schemas that support data evolution is beneficial when the structure of the XML documents being processed may change as the application is updated, yet the documents still need to be validated against the original schema. This is particularly important when multiple entities share XML documents whose format may change over time, but not every entity receives the updated schemas.

There are also situations where dealing with change in the schema is important. One such case is ensuring that older versions of the XML document can be validated by newer versions of the schema. Another is where multiple entities share XML documents that have similar structure but significant domain-specific differences. An example of this is the address.xsd example in the W3C XML Schema Primer, which describes a situation where a generic address format exists that can be extended to encompass regional address formats.

Using Wildcards to Create Open Content Models

W3C XML Schema provides the wildcards xs:any and xs:anyAttribute, which can be used to allow elements and attributes from specified namespaces to occur in a content model. Wildcards allow schema authors to enable extensibility of the content model while maintaining a degree of control over the occurrence of elements and attributes in the XML document.

The most important attributes for wildcards are the namespace and processContents attributes. The namespace attribute is used to specify what namespace the elements or attributes the wildcard matches can come from. The possible values for the namespace attribute are described in the Namespace Attribute In Any table in the XML Schema Primer. The processContents attribute is used to specify if and how the XML content matched by the wildcard should be validated. The possible values of the processContents attribute are described in the WildCard Schema Component section of the W3C XML Schema Recommendation.

The following is an example of a schema that uses wildcards to allow valid instances to add elements and attributes not specified in the schema:

      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer"
          targetNamespace="urn:xmlns:25hoursaday-com:customer"
          elementFormDefault="qualified">
        <xs:element name="Customer">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="FirstName" type="xs:string" />
              <xs:element name="LastName" type="xs:string" />
              <!-- extra elements from the target namespace; must be declared -->
              <xs:any namespace="##targetNamespace" processContents="strict"
                  minOccurs="0" maxOccurs="unbounded" />
              <!-- extra elements from other namespaces; validated if declared -->
              <xs:any namespace="##other" processContents="lax" minOccurs="0"
                  maxOccurs="unbounded" />
            </xs:sequence>
            <xs:attribute name="customerID" type="xs:integer" />
            <!-- extra attributes from any namespace; not validated -->
            <xs:anyAttribute namespace="##any" processContents="skip" />
          </xs:complexType>
        </xs:element>
        <xs:element name="PhoneNumber" type="xs:string" />
        <xs:element name="FrequentShopper" type="xs:boolean" />
      </xs:schema>

The schema describes a Customer element that contains a FirstName and a LastName element in sequence and has a customerID attribute. Additionally, two wildcards (xs:any elements) specify that zero or more elements from the urn:xmlns:25hoursaday-com:customer namespace can appear after the customer's name elements, followed by zero or more elements from any other namespace. The attribute wildcard (xs:anyAttribute element) specifies that the Customer element can have attributes from any namespace. The wildcards give authors of instance documents the leeway to tailor their XML documents to their specific needs, while keeping the content model rigid enough to satisfy a set of minimal constraints. The following are examples of documents that are valid with the above schema.

     <Customer  customerID="12345" xmlns="urn:xmlns:25hoursaday-com:customer">
      <FirstName>Dare</FirstName>
      <LastName>Obasanjo</LastName>
     </Customer>

     EXAMPLE 1
     
     <cust:Customer  customerID="12345" numPurchases="17" 
            xmlns:cust="urn:xmlns:25hoursaday-com:customer">
      <cust:FirstName>Dare</cust:FirstName>
      <cust:LastName>Obasanjo</cust:LastName>
      <cust:PhoneNumber>425-555-1234</cust:PhoneNumber>
     </cust:Customer>

     EXAMPLE 2

     <cust:Customer  customerID="12345" numPurchases="17" 
       xmlns:cust="urn:xmlns:25hoursaday-com:customer" 
       xmlns:addr="urn:xmlns:25hoursaday-com:address" >
      <cust:FirstName>Dare</cust:FirstName>
      <cust:LastName>Obasanjo</cust:LastName>
      <cust:PhoneNumber>425-555-1234</cust:PhoneNumber>
      <addr:Address>2001 Beagle Drive</addr:Address>
      <addr:City>Redmond</addr:City>
      <addr:State>WA</addr:State>
      <addr:Zip>98052</addr:Zip>
     </cust:Customer>

     EXAMPLE 3

The third example is particularly interesting in that it combines elements from multiple vocabularies and allows users to validate the XML instance using different schemas, none of which complains about elements from a namespace it does not know about. Thus, applications that can process only parts of the document can validate and manipulate the parts they know while ignoring the rest, which is very important for extensibility. In addition, if the format of the instance document changes and more customer information appears in later documents, those documents are still valid against the original schema, and any subsequent schemas, as long as the elements and attributes that were originally declared (in this case FirstName, LastName, and customerID) are not removed from the content model.

However, there are some caveats with using the xs:any wildcard. The first is that xs:any makes it easier to create non-deterministic content models by accident, which may be tricky to find in the schema. The following is an example of such a schema:

     
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer"
          targetNamespace="urn:xmlns:25hoursaday-com:customer"
          elementFormDefault="qualified">
        <xs:element name="Customer">
          <xs:complexType>
            <xs:sequence>
              <xs:element ref="cust:FirstName" />
              <!-- optional, so a cust:LastName could also match the wildcard -->
              <xs:element ref="cust:LastName" minOccurs="0" />
              <xs:any namespace="##targetNamespace" processContents="strict" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="FirstName" type="xs:string" />
        <xs:element name="LastName" type="xs:string" />
        <xs:element name="PhoneNumber" type="xs:string" />
      </xs:schema>

The previous schema is non-deterministic because when a LastName element follows the FirstName element, the validator cannot tell which part of the content model it matches: the element may be validated as the optional LastName element that follows a FirstName, or against the wildcard, which allows any element from the urn:xmlns:25hoursaday-com:customer namespace to appear.
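
One way to remove the ambiguity, sketched here under the assumption that extension elements are expected to come from other vocabularies, is to narrow the wildcard to ##other so that it can never match cust:LastName. Only the content model changes; the rest of the schema stays as shown above:

            <xs:sequence>
              <xs:element ref="cust:FirstName" />
              <xs:element ref="cust:LastName" minOccurs="0" />
              <!-- ##other excludes the target namespace, so a cust:LastName
                   element can only match the declaration above -->
              <xs:any namespace="##other" processContents="strict" />
            </xs:sequence>

If extension elements from the target namespace must remain possible, making LastName mandatory (removing minOccurs="0") would also make the model deterministic, since the wildcard could then be reached only after a LastName element has been consumed.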

Another caveat when dealing with wildcards concerns the namespace attribute of an xs:any or an xs:anyAttribute. In particular, one should take care with the ##other value for this attribute, which the Namespace Attribute In Any table in the XML Schema Primer describes as meaning, "Any well-formed XML that is not from the target namespace of the type being defined," which is not an entirely accurate description. In fact, ##other means, "Any well-formed XML that is not from the target namespace of the type being defined, excluding elements with no namespace."

Creating a wildcard that allows elements from any namespace besides the target namespace, including elements with no namespace, involves using an xs:choice. The following is a schema that demonstrates this:

      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer"
          targetNamespace="urn:xmlns:25hoursaday-com:customer"
          elementFormDefault="qualified">
        <xs:element name="Customer">
          <xs:complexType>
            <xs:sequence>
              <xs:element ref="cust:FirstName" />
              <xs:element ref="cust:LastName" />
              <!-- allow any element except those from target namespace -->
              <xs:choice minOccurs="0" maxOccurs="unbounded">
                <xs:any namespace="##other" processContents="strict" />
                <xs:any namespace="##local" processContents="strict" />
              </xs:choice>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="FirstName" type="xs:string" />
        <xs:element name="LastName" type="xs:string" />
      </xs:schema>

Note that a choice is used because the ##other value for the namespace attribute of a wildcard cannot be combined with other values. This is described in the XML Representation Summary for the xs:any Element Information Item section of the W3C XML Schema Recommendation.
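
As an illustration, the following instance sketch exercises both branches of the choice. The addr:City element and the unqualified Notes element are hypothetical names that do not appear in the schema above. Because both wildcards use processContents="strict", the sketch also assumes that the validator has been given schemas containing global declarations for them:

     <cust:Customer xmlns:cust="urn:xmlns:25hoursaday-com:customer"
            xmlns:addr="urn:xmlns:25hoursaday-com:address">
      <cust:FirstName>Dare</cust:FirstName>
      <cust:LastName>Obasanjo</cust:LastName>
      <!-- from another namespace: matched by the ##other wildcard -->
      <addr:City>Redmond</addr:City>
      <!-- no namespace: matched by the ##local wildcard -->
      <Notes>Prefers e-mail contact</Notes>
     </cust:Customer>

With only the ##other wildcard in the content model, the Notes element would be rejected. An additional element from the cust namespace would match neither wildcard.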

Gaining Flexibility from Substitution Groups and Abstract Elements

W3C XML Schema borrows a number of concepts from object-oriented programming, including the notions of abstract types, type substitutability, and polymorphism. Abstract elements and substitution groups allow schema authors to create or utilize schemas that define generic base types and extend these types to be more domain-specific without affecting the original schema.

A substitution group contains elements that can appear interchangeably in an XML instance document in a manner reminiscent of subtype polymorphism in object-oriented programming languages. Elements in a substitution group must be of the same type or have types that are members of the same type hierarchy. An element declaration that is marked abstract cannot itself appear in an instance document; a member of its substitution group must appear in its place. The following is an example of a schema that defines an abstract element, followed by another schema that defines an element that is substitutable for the abstract element and whose type is derived from that of the abstract element.

      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer" 
          targetNamespace="urn:xmlns:25hoursaday-com:customer" 
          elementFormDefault="qualified"> 

       <xs:element name="Customers">
         <xs:complexType>
          <xs:sequence>
           <xs:element ref="cust:Customer" maxOccurs="unbounded" />
          </xs:sequence>
         </xs:complexType>
       </xs:element>

       <xs:element name="Customer" type="cust:CustomerType" abstract="true" /> 

        <xs:complexType name="CustomerType" > 
          <xs:sequence>
            <xs:element ref="cust:FirstName" />
            <xs:element ref="cust:LastName" />   
          </xs:sequence>         
          <xs:attribute name="customerID" type="xs:integer" />
         </xs:complexType>

       <xs:element name="FirstName" type="xs:string" />
       <xs:element name="LastName" type="xs:string"  />  
       <xs:element name="PhoneNumber" type="xs:string" />

      </xs:schema> 
     cust.xsd

      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer"
          xmlns:addr="urn:xmlns:25hoursaday-com:address"
          targetNamespace="urn:xmlns:25hoursaday-com:address"
          elementFormDefault="qualified">

        <xs:import namespace="urn:xmlns:25hoursaday-com:customer"
            schemaLocation="cust.xsd"/>

        <!-- can appear wherever cust:Customer is allowed -->
        <xs:element name="MyCustomer" substitutionGroup="cust:Customer"
            type="addr:MyCustomerType" />

        <xs:complexType name="MyCustomerType">
          <xs:complexContent>
            <xs:extension base="cust:CustomerType">
              <xs:sequence>
                <xs:element ref="cust:PhoneNumber" />
                <xs:element ref="addr:Address" />
                <xs:element ref="addr:City" />
                <xs:element ref="addr:State" />
                <xs:element ref="addr:Zip" />
              </xs:sequence>
            </xs:extension>
          </xs:complexContent>
        </xs:complexType>

        <xs:element name="Address" type="xs:string" />
        <xs:element name="City" type="xs:string" />
        <xs:element name="State" type="xs:string" fixed="WA" />

        <xs:element name="Zip">
          <xs:simpleType>
            <xs:restriction base="xs:token">
              <xs:pattern value="[0-9]{5}(-[0-9]{4})?"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>

      </xs:schema>
     my_cust.xsd

The my_cust.xsd schema contains an addr:MyCustomer element declaration that can appear in instance documents in place of cust:Customer elements. Thus, the cust:Customers element can have addr:MyCustomer elements as children, but not cust:Customer elements, since they are abstract. The my_cust.xsd schema can validate the following XML instance document:

     <cust:Customers xmlns:cust="urn:xmlns:25hoursaday-com:customer" 
              xmlns:addr="urn:xmlns:25hoursaday-com:address">
      <addr:MyCustomer customerID="12345" >
       <cust:FirstName>Dare</cust:FirstName>
       <cust:LastName>Obasanjo</cust:LastName>
       <cust:PhoneNumber>425-555-1234</cust:PhoneNumber>
       <addr:Address>2001</addr:Address>
       <addr:City>Redmond</addr:City>
       <addr:State>WA</addr:State>
       <addr:Zip>98052</addr:Zip>
       </addr:MyCustomer>
      </cust:Customers>
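
For contrast, the following sketch shows an instance that the same schemas would be expected to reject. It uses cust:Customer directly, and because that element is declared abstract in cust.xsd, only members of its substitution group may appear in its place:

     <cust:Customers xmlns:cust="urn:xmlns:25hoursaday-com:customer">
      <!-- not valid: cust:Customer is abstract -->
      <cust:Customer customerID="12345" >
       <cust:FirstName>Dare</cust:FirstName>
       <cust:LastName>Obasanjo</cust:LastName>
      </cust:Customer>
     </cust:Customers>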

Note that substitution groups allow the mixing of vocabularies, similar to wildcards, but without the original schema author having to specifically plan for them. The only consideration a schema author has to make is to globally declare elements that can participate in substitution groups. However, content models derived by restriction or extension are not as open as content models that use wildcards. Although this seems like a disadvantage, it actually is an advantage because it gives the schema author more control over the appearance and structure of additional content that may appear in valid XML instance documents.

Runtime Polymorphism via xsi:type and Abstract Types

Abstract types are complex type definitions that have true as the value of their abstract attribute, indicating that elements in an instance document cannot be of that type but must instead use a type derived from it by either restriction or extension. The xsi:type attribute can be placed on an element in an XML instance document to change its type, as long as the new type is derived from the element's declared type. Although it is not necessary to use abstract types in conjunction with xsi:type, they provide some benefit in situations where a generic format is being created for which most users will create domain-specific extensions. The following is an example of a schema that declares an abstract type and an element that uses it as its type definition, followed by a schema that defines two types that derive from the abstract type.

       <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer" 
          targetNamespace="urn:xmlns:25hoursaday-com:customer" 
           elementFormDefault="qualified"> 

       <xs:element name="Customers">
        <xs:complexType>
         <xs:sequence>
          <xs:element ref="cust:Customer" maxOccurs="unbounded" />
         </xs:sequence>
        </xs:complexType>
       </xs:element>

       <xs:element name="Customer" type="cust:CustomerType" /> 

        <xs:complexType name="CustomerType" abstract="true" > 
         <xs:sequence>
           <xs:element ref="cust:FirstName" />
           <xs:element ref="cust:LastName" />
           <xs:element ref="cust:PhoneNumber" minOccurs="0"/>
         </xs:sequence>         
         <xs:attribute name="customerID" type="xs:integer" />
        </xs:complexType>

        <xs:element name="FirstName" type="xs:string" />
        <xs:element name="LastName" type="xs:string"  />  
        <xs:element name="PhoneNumber" type="xs:string" />

       </xs:schema> 
      cust.xsd

      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer"
          targetNamespace="urn:xmlns:25hoursaday-com:customer"
          elementFormDefault="qualified">

        <xs:include schemaLocation="cust.xsd"/>

        <!-- restriction: PhoneNumber becomes mandatory -->
        <xs:complexType name="MandatoryPhoneCustomerType">
          <xs:complexContent>
            <xs:restriction base="cust:CustomerType">
              <xs:sequence>
                <xs:element ref="cust:FirstName" />
                <xs:element ref="cust:LastName" />
                <xs:element ref="cust:PhoneNumber" minOccurs="1" />
              </xs:sequence>
            </xs:restriction>
          </xs:complexContent>
        </xs:complexType>

        <!-- extension: address elements are appended -->
        <xs:complexType name="AddressableCustomerType">
          <xs:complexContent>
            <xs:extension base="cust:CustomerType">
              <xs:sequence>
                <xs:element ref="cust:Address" />
                <xs:element ref="cust:City" />
                <xs:element ref="cust:State" />
                <xs:element ref="cust:Zip" />
              </xs:sequence>
            </xs:extension>
          </xs:complexContent>
        </xs:complexType>

        <xs:element name="Address" type="xs:string" />
        <xs:element name="City" type="xs:string" />
        <xs:element name="State" type="xs:string" fixed="WA" />

        <xs:element name="Zip">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="\d{5}(-\d{4})?"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>

      </xs:schema>
     
     derived_cust.xsd

The Customer elements in the following instance document, which is valid against these schemas, use xsi:type to assert their types, even though they are declared as being of the abstract CustomerType in the original schema. Note that both restrictions and extensions of the base type can be the targets of the xsi:type attribute.

     <cust:Customers xmlns:cust="urn:xmlns:25hoursaday-com:customer" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
      <cust:Customer customerID="12345" xsi:type="cust:MandatoryPhoneCustomerType" >
       <cust:FirstName>Dare</cust:FirstName>
       <cust:LastName>Obasanjo</cust:LastName>
       <cust:PhoneNumber>425-555-1234</cust:PhoneNumber>
      </cust:Customer>
      <cust:Customer customerID="67890" xsi:type="cust:AddressableCustomerType" >
       <cust:FirstName>John</cust:FirstName>
       <cust:LastName>Smith</cust:LastName>
       <cust:Address>2001</cust:Address>
       <cust:City>Redmond</cust:City>
       <cust:State>WA</cust:State>
       <cust:Zip>98052</cust:Zip>
       </cust:Customer>
      </cust:Customers>
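
For contrast, a Customer element that omits xsi:type would be expected to fail validation against these schemas, because its declared type, CustomerType, is abstract and cannot be used directly in an instance. A minimal sketch:

     <cust:Customers xmlns:cust="urn:xmlns:25hoursaday-com:customer">
      <!-- not valid: no xsi:type, so the abstract CustomerType would apply -->
      <cust:Customer customerID="12345" >
       <cust:FirstName>Dare</cust:FirstName>
       <cust:LastName>Obasanjo</cust:LastName>
      </cust:Customer>
     </cust:Customers>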

Type substitutability and polymorphism will become even more beneficial once type-aware XML processing becomes commonplace, which should occur soon after XQuery 1.0 and XSLT 2.0 become standardized. To further engender extensibility, applications can combine both abstract types and abstract elements in a type hierarchy by creating abstract elements whose type definitions are abstract.
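
A minimal sketch of that combination, reusing the names from the cust.xsd examples above, might look as follows. Setting both abstract attributes at once is an illustrative assumption; none of the schemas shown earlier does so:

       <!-- cannot appear in instances; a substitution group member must be used -->
       <xs:element name="Customer" type="cust:CustomerType" abstract="true" />

       <!-- cannot be used as an instance type; a derived type must be used -->
       <xs:complexType name="CustomerType" abstract="true">
         <xs:sequence>
           <xs:element ref="cust:FirstName" />
           <xs:element ref="cust:LastName" />
         </xs:sequence>
         <xs:attribute name="customerID" type="xs:integer" />
       </xs:complexType>

An extending schema would then supply both a concrete type derived from cust:CustomerType and a concrete element of that type that names cust:Customer in its substitutionGroup attribute.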

Using xs:redefine to Update Type Definitions

W3C XML Schema provides a mechanism for updating a type definition in a process whereby the type effectively derives from itself. The xs:redefine element used for redefinition performs two tasks. First, it acts as an xs:include element by bringing in declarations and definitions from another schema document and making them available as part of the current target namespace. The included declarations and types must come from a schema that has the same target namespace or no target namespace. Second, types can be redefined in a manner similar to type derivation, with the new definition replacing the old one.

The following is an example of type redefinition showing the included and including schemas as well as a valid instance document for the schemas.

      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer" 
          targetNamespace="urn:xmlns:25hoursaday-com:customer" 
          elementFormDefault="qualified"> 

       <xs:element name="Customers">
        <xs:complexType>
         <xs:sequence>
          <xs:element ref="cust:Customer" maxOccurs="unbounded" />
         </xs:sequence>
        </xs:complexType>
       </xs:element>

       <xs:element name="Customer" type="cust:CustomerType" /> 

        <xs:complexType name="CustomerType"> 
          <xs:sequence>
           <xs:element ref="cust:FirstName" />
           <xs:element ref="cust:LastName" />
          </xs:sequence>         
          <xs:attribute name="customerID" type="xs:integer" />
         </xs:complexType>

       <xs:element name="FirstName" type="xs:string" />
       <xs:element name="LastName" type="xs:string"  />  
       <xs:element name="PhoneNumber" type="xs:string" />

      </xs:schema> 
     cust.xsd


      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
          xmlns:cust="urn:xmlns:25hoursaday-com:customer"
          targetNamespace="urn:xmlns:25hoursaday-com:customer"
          elementFormDefault="qualified">

        <xs:redefine schemaLocation="cust.xsd">

          <!-- the new CustomerType extends the original definition of itself -->
          <xs:complexType name="CustomerType">
            <xs:complexContent>
              <xs:extension base="cust:CustomerType">
                <xs:sequence>
                  <xs:element ref="cust:PhoneNumber" />
                </xs:sequence>
              </xs:extension>
            </xs:complexContent>
          </xs:complexType>

        </xs:redefine>
      </xs:schema>
     redefined_cust.xsd
     
    <cust:Customers xmlns:cust="urn:xmlns:25hoursaday-com:customer" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
      <cust:Customer customerID="12345" >
       <cust:FirstName>Dare</cust:FirstName>
       <cust:LastName>Obasanjo</cust:LastName>
       <cust:PhoneNumber>425-555-1234</cust:PhoneNumber>
      </cust:Customer>
      <cust:Customer customerID="67890" >
       <cust:FirstName>John</cust:FirstName>
       <cust:LastName>Smith</cust:LastName>
        <cust:PhoneNumber>425-555-5555</cust:PhoneNumber>
       </cust:Customer>
     </cust:Customers>
     cust.xml

Type redefinition is pervasive because it affects not only elements in the including schema but also those in the included schema. Thus, all references to the original type in both schemas refer to the redefined type, while the original type definition is overshadowed. This causes a certain degree of fragility because redefined types can adversely interact with derived types and generate conflicts. A common conflict is when a derived type uses extension to add an element or attribute to a type's content model and a redefinition also adds a similarly named element or attribute to the content model. Such a conflict would have occurred if either of the schemas shown had a type derived from CustomerType via an extension that added a PhoneNumber element of a different type than that in the redefinition.
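
As a sketch of such a conflict, suppose cust.xsd also contained the following derived type. The name PagerCustomerType and the choice of xs:integer are hypothetical and appear nowhere in the schemas above:

        <xs:complexType name="PagerCustomerType">
          <xs:complexContent>
            <xs:extension base="cust:CustomerType">
              <xs:sequence>
                <!-- local declaration; qualified because of elementFormDefault -->
                <xs:element name="PhoneNumber" type="xs:integer" />
              </xs:sequence>
            </xs:extension>
          </xs:complexContent>
        </xs:complexType>

On its own this type is fine. Once redefined_cust.xsd redefines CustomerType, however, the content model of PagerCustomerType contains a PhoneNumber element from the customer namespace twice: once as the global xs:string declaration added by the redefinition and once as the local xs:integer declaration. W3C XML Schema requires element declarations with the same name in one content model to have the same type, so a validator would be expected to reject the combined schema.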

Acknowledgements

I would like to thank Priya Lakshminarayanan and Jeni Tennison for their help in proofreading and suggesting content for this article.