Export (0) Print
Expand All
Expand Minimize

Death, Taxes, and Relational Databases, Part 1

 

Andrew Conrad
Microsoft Corporation

April 15, 2003

Summary: Showcases the XML functionality of SQL Server 2000 by turning SQL Server into an object repository using XML as the bridge between relational and object-oriented data. (23 printed pages)


Download the ObjectStorePart1.exe sample file.

Requirements   This download requires Microsoft.NET Framework 1.0, Microsoft SQL Server 2000, and SQLXML 3.0 Service Pack 1
Editor's Note   This month's installment of Extreme XML is written by guest columnist Andrew Conrad. Andy is a Software Design Engineer with the WebData XML team. He has been working on SQL Server 2000 XML support and the SQLXML Web releases for the last four years. He can be reached at aconrad@microsoft.com.

Introduction

In the world of XML enterprise development, nothing is certain but death, taxes, and having to work with a relational back-end data store. And just like death and taxes, it can be a very painful endeavor. The problem is translating between the hierarchical, semi-structured XML data model and the normalized relational data model. The former is well suited for data exchange and modeling business data, the latter for persistence storage. One can only speculate that if Ben Franklin were alive today he would be developing loosely-coupled architectures that leveraged the extensible nature of the XML data model—and loathing the one certainty in enterprise development, having to integrate with the antiquated relational database (in addition to life's other "certainties"—death and taxes). However, there is a solution—SQLXML. SQLXML 3.0 is add-on product for SQL Server 2000 that enhances SQL Server 2000 XML support. It enables the developers to bridge the gap between XML and relational databases by providing a number of features that translate between the two data models.

Note   This column is based on the SQL Server XML functionality that is available in SQLXML 3.0. Since SQL Server 2000 was released in fall 2000, the WebData group at Microsoft has provided three fully supported upgrades to enhance and improve the XML features. These frequent releases are a result of customer demand and the fast-changing world of XML standards. For example, Web Release 3 adds support for W3C XML Schema, SOAP, and the .NET Framework. To download the latest Web release or to find more information on the new features offered in the XML for SQL Server Web Releases, see http://www.msdn.microsoft.com/sqlxml.

In general, the SQLXML feature set can be divided into two major areas:

  1. Server side functionality which can be accessed using Transact-SQL extensions
  2. Client side/ Middle tier functionality accessible through ADO, HTTP, or .NET data provider.

This article will concentrate on the server side functionality, and show how to leverage that functionality to turn SQL Server into an object repository. In part two of this article, the server-side will be replaced with client-side features that provide equivalent functionality. Both articles will also discuss the merits and pitfalls of each solution and how the functionality can be leveraged to solve real world use cases.

Note   This article assumes the reader has some knowledge of SQLXML functionality. For an introduction to the product, please see A Survey of Microsoft SQL Server 2000 XML Features.

XML Serialization

One of the features provided by the Microsoft .NET Framework is the ability to serialize common language runtime objects to XML and vice versa. Serialization is the process of converting an object's state into a format that can be transported, persisted, and/ or reconstructed. The .NET Framework provides several classes in the System.Xml.Serialization namespace for serializing the state of an object as XML. Through the use of XML serialization, it is possible to serialize the state of an entire graph of business objects. In other words, create an XML document that represents the logical state of a given application. This is useful for a variety of purposes, for example, persisting the state of an application at a checkpoint so that the user can back out of changes or recover from errors, or even transporting the state of a application across the network. This article will develop a sample object repository that leverages the XML extensions of SQL Server for transforming between XML and relational data structures.

Although this article highlights a few features of XML serialization in the .NET Framework it is not intended to be an XML Serialization primer. For a far more in depth look at XML serialization, see XML Serialization in the .NET Framework .

The Sample

Imagine the task of developing an employee tracking application that contains employee information. For version 1, the project team wants to make sure it can deliver on time so the business layer contains only two classes:

public class Employee
{
private int _employeeID = -1;
   private string _lastName = null;
   private string _firstName = null;
   private string _title = null;
   public ArrayList _addressList = new ArrayList();        
 
   public Employee() {}
public Employee(int employeeID, string firstName, string lastName, string title)
   {
        _employeeID = employeeID;
      _firstName = firstName;
       _lastName = lastName;
       _title = title;
   }
    public string FName
     {
       get { return _firstName;}
       set { _firstName = value;}
    }  
    public string LName
{
       get { return _lastName;}
       set { _lastName = value;}
   }  
   public int ID
     {
        get { return _employeeID;}
        set { _employeeID = value;}
     }  
   public string Title
   {
      get { return _title;}
      set { _title = value;}
   }  
}
 
public class EmployeeAddress
{
    private string _streetAddress = null;
     private string _city = null;
     private string _state = null;
     private string _zip = null;
    private AddressType _type;

     public EmployeeAddress(){}
   public EmployeeAddress(AddressType type, string streetAddress, string 
   city, string state, string zip)
   {
      _type = type;
        _streetAddress = streetAddress;
       _city = city;
      _state = state;
      _zip = zip;
   }
   public string Street
   {         
get { return _streetAddress; }
      set {  _streetAddress = value; }
   }
   public string City
   {
       get { return _city; }
        set {  _city = value; }
     }     
    public string State
     {
        get { return _state; }
        set {  _state = value; }
   }
     public string Zip
   {         
get { return _zip; }
      set {  _zip = value; }
   }
   public AddressType Type
   {
        get { return _type; }
      set {  _type = value; }
   }
}

An Employee class that contains an employee ID, first and last name, and title for the employee—fairly standard stuff. In addition, each employee contains zero to many address objects (stored in the Employee's _addressList ArrayList) with all the standard address information. Again, fairly simple stuff but don't worry, the customer is asking for a lot in version 2, so the project will be extending the business logic object model soon enough.

Following standard database design, the following relational tables have been designed for persistence storage of the business objects:

CREATE TABLE [dbo].[Employee] (
   [EmployeeID] [int] NOT NULL ,
   [LastName] [nvarchar] (50) NULL ,
   [FirstName] [nvarchar] (50) NULL ,
   [Title] [nvarchar] (50) NULL ,
   [Overflow] [ntext] NULL 
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

CREATE TABLE [dbo].[Address] (
   [EmployeeID] [int] NOT NULL ,
   [AddressType] [nvarchar] (10) NOT NULL ,
   [Street] [nvarchar] (200) NULL ,
   [City] [nvarchar] (100) NULL ,
   [State] [nvarchar] (2) NULL ,
   [Zip] [nvarchar] (20) NULL 
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[Address] ADD 
   CONSTRAINT [PK_Address] PRIMARY KEY  CLUSTERED 
   (
      [EmployeeID],
      [AddressType]
   )  ON [PRIMARY] 
GO

ALTER TABLE [dbo].[Employee] ADD 
   CONSTRAINT [PK_Employee] PRIMARY KEY  CLUSTERED 
   (
      [EmployeeID]
   )  ON [PRIMARY] 
GO

ALTER TABLE [dbo].[Address] ADD 
   CONSTRAINT [FK_Address_Employee] FOREIGN KEY 
   (
      [EmployeeID]
   ) REFERENCES [dbo].[Employee] (
      [EmployeeID]
   )
GO

In general, for this version of the application, the business logic classes and the database tables are fairly similar. Really the only issues here are that some of the property and the column names are different, and that the Address table contains EmployeeID as a foreign key. Standard problems when mapping objects to relational objects and overall, not particularly challenging. In fact, it is possible to use any of a number of standard database access mechanisms for persisting changes to the relational database. However, the goal here is to introduce a course-grained data persistence layer that the application can use to persist the entire business object hierarchy as needed. This design is necessary because the application's most common use case pattern will be that:

  1. The users will select out a particular set of data.
  2. View it for an indefinite amount of time.
  3. Persist changes to the business objects in batch.

In other words, a classic disconnected client scenario and one in which a coarse-grained persistence interface will work best. Further, by abstracting out the fine grain control needed to persist the business objects to the relational database, the consumer code can worry about domain specific business logic and is not coupled to either the data source or the process needed to update the data source.

One other (currently) innocuous detail of the relational metadata is the Overflow field in the Employee table. From the initial design, it is not clear why it is included. The truth is that it was added for extensibility, another goal of the design. How it accomplishes this is a different question and will be answered later in the article.

An Object Repository

In an effort to isolate the business logic from the database access code, an object repository is going to be created. This repository will be responsible for persisting and creating business objects. Basically, the consumer of the repository will be provided with an interface for the CRUD operations (create, retrieve, update, and delete) and will look like an in-memory collection of business objects. The API for my object repository is as follows:

public class ObjectRepository
{    
   public ObjectRepository(String connectionString);
public Employee[] GetEmployee(string lastName);
public void PersistEmployee(Employee employee); 
public void PersistEmployees(Employee[] employee);
public void RemoveEmployee(Employee employee); 
}

This class includes methods for retrieving Employee objects based on the employee's last name, persisting changes to an array of Employee object, and removing Employee objects. Note that a helper method is provided for persisting a single Employee object. Under the covers, this method just wraps the single Employee object in an array and uses the same code as the method for persisting an array of employee.

From the object repository API, the consumer is not coupled at all with the underlying data source; they just query for objects, run their business logic, and then persist the changes. This allows object repository to change the underlying storage mechanism without affecting the consumer's code and provides a very object oriented view of the persistence layer for the business logic developer to execute the CRUD operations against. In fact, the ObjectRespository class could implement the IList or IDictionary interfaces and behave like the collections classes in the frameworks. One further benefit of this abstraction is that the developer could cache data and reduce the trips to the database without changing the interface.

So, let's walk through creating a new employee and persisting to the object repository. First, the code simply creates an Employee object and passes it to the object repository through the PersistEmployee method:

ObjectRepository  repository = new ObjectRepository(_connectionString);
Employee emp = new Employee(1, "Jeffrey", "Jones", "SYSTEMS ANALYST");
emp._addressList.Add((object)new EmployeeAddress(AddressType.Home, "111 
Oak", "Redmond", "WA", "98052"));
emp._addressList.Add((object)new EmployeeAddress(AddressType.Work, 
"1111 North Corporate Road", "Redmond", "WA", "98052"));
repository.PersistEmployee(emp);

Next the object repository code wraps the employee in an Employee array (so that persisting a single employee and an array of employees can share the same code), serializes the Employee array to XML, and sends the XML to the PersistEmployees stored procedure (described below in the "Heavy Lifting" section) in the database:

private void PersistEmployee(Employee employee)
{
Employee[]employees = new Employee[1] {employee};
      MemoryStream stream = new MemoryStream();
      XmlRootAttribute xmlRoot = new XmlRootAttribute();
      xmlRoot.ElementName = "EmployeeList";
      Type type = employees.GetType();
      XmlSerializer serializer = new XmlSerializer(type, xmlRoot);
      serializer.Serialize(stream, (object) employees);
      stream.Position = 0;
      PersistChanges( new StreamReader(stream).ReadToEnd(), false); 
}
 
private void PersistChanges(string serializedEmployees, bool delete)
{
SqlXmlCommand command = new SqlXmlCommand(_SqlXmlconnectionString);
      command.CommandType = SqlXmlCommandType.Sql;
      command.CommandText = "exec PersistEmployees ?, ?";
      SqlXmlParameter param = command.CreateParameter();
      param.Value = serializedEmployees;            
      SqlXmlParameter param2 = command.CreateParameter();
      param2.Value = delete ? 1 : 0;  
      command.ExecuteNonQuery();
}

Notice when creating the XMLSerializer, an XmlRootAttribute is included. This is so the name of the root element can be explicitly specified. If an XmlRootAttribute is not specified, then XML Serialization uses the name ArrayOfEmployee and thereby means the type of collection that the Employee objects are stored in can't be changed without changing the serialization format.

Also note that when passing the serialized object hierarchy onto the database, a bit value is included to indicate whether the objects being serialized should be removed from the database or if the list should be interpreted as inserts or updates (depending on whether the object already exists in the database). It would be possible to remove this flag and have multiple stored procedures in the database, but that would have meant different command text would have had to been specified depending on the operation, so it wouldn't have made the design any better.

A quick look at the Employee class shows how the design is handling the difference member variable names and the column names in the database. Basically by adding the XmlElement attribute, one can control the names that are used in the XML that is generated for the object state:

public class Employee
{
   [XmlArrayItem(ElementName= "Address", Type=typeof(EmployeeAddress))]
     [XmlArray("AddressList")]
     public ArrayList _addressList = new ArrayList();

[XmlElement(ElementName = "FirstName")]
     public string FName
     {
      get { return _firstName;}
      set { _firstName = value;}
    }     
     [XmlElement(ElementName = "LastName")]
     public string LName
    {
        get { return _lastName;}
       set { _lastName = value;}
     }  
    [XmlElement(ElementName = "EmployeeID")]
     public int ID
    {
      get { return _employeeID;}
       set { _employeeID = value;}
     }     
     public string Title
     {
      get { return _title;}
      set { _title = value;}
     }     
   ...
}
 

Note that these names specified by the XmlElement attribute don't necessarily need to match the names in the database. The database layer will be able to deal with the name variation. However, by providing different names than the property names, one can introduce a degree of freedom in that the property names could change without having to change the object repository or the database layer.

One other detail of interest is that the XmlArrayItem attribute has been explicitly included for the _addressList member. If it is not, an error is raised from the XmlSerializer object saying that the type EmployeeAddress was unexpected when trying to serialize the Employees. Although there are several ways to solve this problem, it is easiest to include an XmlArrayItem attribute to let XML Serialization know about the EmployeeAddress type.

The object retrieval code is fairly similar to the persistence code, however, basically in reverse. In this case, the data layer sends an array of employee objects in the same XML format that was used to persist the changes:

public Employee[] GetEmployee(string lastName) 
{
SqlXmlCommand command = new SqlXmlCommand(_SqlXmlconnectionString);
      command.CommandType = SqlXmlCommandType.Sql;
      command.CommandText = "exec FetchEmployee ?";
        SqlXmlParameter param = command.CreateParameter();
      param.Value = lastName;
      return DeserializeEmployee(command.ExecuteXmlReader());
}
 
private Employee[] DeserializeEmployee(XmlReader reader)
{
      XmlRootAttribute xmlRoot = new XmlRootAttribute();
      xmlRoot.ElementName = "EmployeeList";
      XmlSerializer serializer = new XmlSerializer(typeof(Employee[]),xmlRoot);
      Employee[] emps = (Employee[]) serializer.Deserialize(reader);
      reader.Close();
      return emps;
}
 

The slick thing here is that it is not necessary to write directional code for deserializing the XML as business objects. The same code works in both directions. In general, the object repository is a rather simple design. It merely serializes the objects to and from XML and passes them onto to the database, thereby showing the first of the two advantages of choosing XML as the transport mechanism for this scenario—that XML Serialization can provide a majority of the client side plumbing code without much programming effort.

A Quick Look at OpenXML

To transform the serialized objects to relational data structures, the object repository is going to utilize the OpenXML feature of SQL Server 2000 XML support. OpenXML provides the ability to represent XML documents as in-memory rowsets inside the SQL Server process. Because OPENXML is a rowset provider, it can be used in Transact- Sql statements wherever rowset providers (table, view, OPENROWSET function) can be used. Before getting back to the object repository example, let's take a closer look at OpenXML. The best way to illustrate the functionality is through a few simple examples:

DECLARE @idoc int
DECLARE @doc varchar(1000)
SET @doc ='
<ROOT>
<Customer CustomerID="VINET" ContactName="Paul Henriot">
   <Order OrderID="10248" CustomerID="VINET" EmployeeID="5" 
           OrderDate="1996-07-04T00:00:00">
      <OrderDetail ProductID="11" Quantity="12"/>
      <OrderDetail ProductID="42" Quantity="10"/>
   </Order>
</Customer>
</ROOT>'
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

--selecting order details from document
SELECT *
FROM OPENXML (@idoc, '/ROOT/Customer/Order/OrderDetail')
      WITH (ProdID      int         '@ProductID',
                Qty         int         '@Quantity')

EXEC sp_xml_removedocument  @idoc

Selects Order Details rows from XML:

ProdID      Qty         
----------- ----------- 
11            12
42            10

(2 row(s) affected)

The XML document in this example consists of the Root, Customer, Order, and OrderDetail elements. The OPENXML statement retrieves OrderDetail elements in to a two-column rowset (ProdID and Qty). First, the sp_xml_preparedocument stored procedure is called to obtain a document handle. This document handle is then passed to the OPENXML statement. In the OPENXML statement, the rowpattern (/ROOT/Customer/Order/OrderDetail) identifies the OrderDetail elements to process. The WITH clause contains the mapping between the XML nodes and the rowset columns.

Note   This article is not intended to be a complete tutorial of OpenXML. Instead, I am going to highlight a few of the features that are going to help develop the object repository for the article's sample employee application. For further information on OpenXML, please see Writing XML Using OPENXML .

One key feature of OpenXML is the ability to shred XML into relational tables. By providing a mapping between the XML and the relational metadata, one can easily transform the tree represented by the XML into relational table(s) that can be manipulated and queried by standard relational mechanisms. In the case of the object repository, this functionality can be leverage to shred the serialized view of our business objects into standard relational format. This is accomplished by providing the ColPattern parameter in the WITH clause of the OPENXML statement. The ColPattern is specified by providing a ColName (name) and ColType (database type) for the relational column and an XPath pattern to indicate the elements or attributes to map to in the XML document. From the example above, the ProdID and Qty columns in the rowset map to the ProductID and Quantity attributes of the Order Detail element.

Notice that because of the ability to specify an XPath pattern to identify the nodes in the XML tree, one can easily flatten the XML tree into relational rowsets. This is possible because the XPath language is specifically designed for navigation of hierarchical, semi-structured data. To illustrate the flexibility of this feature, let's extend the example above to select Customer and Order rows:

DECLARE @idoc int
DECLARE @doc varchar(1000)
SET @doc ='
<ROOT>
<Customer CustomerID="VINET" ContactName="Paul Henriot">
   <Order OrderID="10248" EmployeeID="5" 
           OrderDate="1996-07-04T00:00:00">
      <OrderDetail ProductID="11" Quantity="12"/>
      <OrderDetail ProductID="42" Quantity="10"/>
   </Order>
</Customer>
</ROOT>'
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

-- select customer rows
SELECT *
FROM OPENXML (@idoc, '/ROOT/Customer')
      WITH (CustomerID  varchar(10) '@CustomerID',
            Quantity       int           
            'Order/OrderDetail[@ProductID=''11'']/@Quantity',
            Contact    varchar(50) '@ContactName')

-- select order details 
SELECT *
FROM OPENXML (@idoc, '/ROOT/Customer/Order/OrderDetail')
      WITH (OrderID     int         '../@OrderID',
            OrderDate   datetime    '../@OrderDate',
            ProdID      int         '@ProductID',
            Qty         int         '@Quantity')

EXEC sp_xml_removedocument  @idoc

Query Results:

CustomerID Quantity      Contact                                            
---------- -----------  -------------------------------------------------- 
VINET      12           Paul Henriot

(1 row(s) affected)

OrderID     OrderDate                    ProdID      Qty         
----------- -----------------------    ----------- ----------- 
10248       1996-07-04 00:00:00.000     11          12
10248       1996-07-04 00:00:00.000     42          10

(2 row(s) affected)

Two key things to note here. First of all, several fields were added into the Order Details rows, specifically the OrderID, CustomerID, and OrderDate columns in the rowset map to the attributes of the parent Order element. Most likely the OrderID field is a foreign key in the Order Details rows, so the ability to navigate into the parent is leveraged to translate the hierarchical XML relationship into a relation PK-FK relationship. Second, for the Customer rows, the OpenXML statement has navigated down do the Order Details elements and ran a predicate against the ProductID attribute to filter the Order Details elements, and return the Quantity attribute for the a single Order Detail. Although, it probably doesn't make a whole lot of sense with the provided sample data, it is easy to see the power here.

One other significant feature of OpenXML is the ability to consume open content, specifically elements and attributes of the XML that are not explicitly mapped in the OpenXML statement. Here is an example of doing so:

DECLARE @idoc int
DECLARE @doc varchar(1000)
-- sample XML document
SET @doc ='
<root>
   <Customer cid="C2" name="Ursula" city="Oelde" >
      <Order oid="O3" date="7/14/1999" amount="100" note="Wrap it blue white red">
          <Urgency>Important</Urgency>
      </Order>
      <Order oid="O4" date="1/20/1996" amount="10000"/>
   </Customer>
</root>'

EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

-- selecting orders with open content
SELECT *
FROM OPENXML (@idoc, '/root/Customer/Order', 8)
      WITH (oid char(5), 
            date datetime,
            comment ntext '@mp:xmltext')
EXEC sp_xml_removedocument @idoc

Query results:

oid     date                  comment                                     
-----  ----------------------   --------------------------------------------
O3    1999-07-14 00:00:00.000 <Order amount="100" note="Wrap it blue 
white red">
<Urgency>Important</Urgency></Order>
O4    1996-01-20 00:00:00.000 <Order amount="10000"/>

Here the code is selecting Order rows and has specified that the XML content of the Order element should be put in the comment field by mapping the comment field to the special @mp:xmltext attribute. Further, by passing the value '8' to the flags parameter of the OpenXML command, the code has specified that only unconsumed XML content of the Order element should to be sent to the comment field. This @mp:xmltext is called a metaproperty and has special meaning to the OpenXML command that allows it to process XML documents with open content. In other words, content that is not known about during development of the database persistence code thereby allows the XML to be modified or extend without having to update the database code. This is a feature that will be leveraged by this article's sample application to deal with extension of the business objects.

The Heavy Lifting

So let's take a look at the stored procedure used by the object repository to persist changes to the database:

CREATE PROCEDURE PersistEmployees 
   @employeeXML nvarchar(4000),
   @remove bit = 0 
AS

DECLARE @hdoc int

exec sp_xmlxml_preparedocument @hdoc OUTPUT, @employeeXML

DELETE Address where EmployeeID in
(SELECT EmployeeID
FROM OpenXML(@hdoc,'/EmployeeList/Employee')
     WITH (EmployeeID     int 'EmployeeID'))

DELETE Employee where EmployeeID in
(SELECT EmployeeID
FROM OpenXML(@hdoc,'/EmployeeList/Employee')
     WITH (EmployeeID     int 'EmployeeID'))

if (@remove = 0)
BEGIN
    INSERT INTO Employee(EmployeeID, LastName, FirstName, Title, Overflow)
    SELECT EmployeeID, LastName, FirstName, Title, Overflow 
    FROM OpenXML(@hdoc,'/EmployeeList/Employee', 8)
    WITH (EmployeeID         int                'EmployeeID',
            LastName        nvarchar(100)       'LastName',
              FirstName       nvarchar(100)       'FirstName',
              Title           nvarchar(100)       'Title',
              Address         nvarchar(1)       'AddressList',
            Overflow        ntext               '@mp:xmltext')

    INSERT INTO Address(EmployeeID, AddressType, Street, City, State, Zip)
    SELECT EmployeeID, AddressType, Street, City, State, Zip
    FROM OpenXML(@hdoc, '/EmployeeList/Employee/AddressList/Address', 8)
    WITH (EmployeeID   int       '../../EmployeeID',
              AddressType     nvarchar(10)      'Type',
              Street          nvarchar(200)      'Street',
              City            nvarchar(100)      'City',
              State           nvarchar(4)         'State', 
              Zip             nvarchar(20)      'Zip')
END

The first thing to note is that there is a single stored procedure to perform insert, update, and delete operations. The stored procedure contains two parameters; the serialized list of employees as XML (@employeeXML) and a bit that specifies whether to delete the employees or not (@remove). Independent of whether the @remove is true or not, the first thing the stored procedure does is remove rows from the Employee and Address tables based on the EmployeeID. Note that these delete statements are set based, and may remove more then a single employee. Then if @remove is false, rows are inserted into the Employee and Address tables. This is a fairly crude way to handle update cases (delete then insert), but should suffice for now. Like the delete statement, multiple Employee and Address rows could be added thereby allowing the entire stored procedure to handle changes to multiple Employees all at once.

Notice that handling the primary key/foreign key relationship between the Employee and Address table is actually quite straightforward. All that is needed is to insert the Employee first, and then in the insert statement for the Address table navigate to the parent of the Address element to retrieve the primary key EmployeeID value. So although the Address element does not contain an explicit foreign key value, the hierarchy preserves the relationship in the OpenXML statement and makes it possible to persist the relationship into the relational tables.

Another interesting aspect of the stored procedure is that the @mp:xmltext property is specified so that all unconsumed content of the Employee element is put into the overflow column of the Employee table. For now, this doesn't really do much since all of the Employee element's content is consumed by the OpenXML statement. In other words, all the child elements and attributes of Employee are mapped in the WITH clause. (Note that a mapping for the AddressList element is included even though it is not used by the insert statement. That way the addresses are kept out of the overflow since those entities will be inserted those into the Address table.) In general, by providing this overflow mapping, this will allow the object repository to reuse the same stored procedure when the Employee business object is extended.

So at this point, the object repository is capable of shredding the serialized business objects into relation tables. But what is the real advantage here? Couldn't the entire object hierarchy be stored as XML into a text column in the database? Well, unless one is ready to write XML parsing code, there answer is no. The reason is because by shredding the data into relational structures, it is now much easier to query with traditional transact sql code.

Now let's examine the query that generates the XML serialization format from the shredded data:

CREATE PROCEDURE FetchEmployee
   @LastName nvarchar(100)
AS

SELECT 1   as Tag, 
       null       as Parent,
       null       as [EmployeeList!1!],
       null       as [Employee!2!EmployeeID!element],
       null       as [Employee!2!LastName!element],
       null       as [Employee!2!FirstName!element],
       null       as [Employee!2!Title!element],
       null       as [Employee!2!!xmltextxmltext],
       null       as [AddressList!3!],
       null       as [Address!4!Street!element],
       null       as [Address!4!City!element],
       null       as [Address!4!State!element],
       null       as [Address!4!Zip!element],
       null       as [Address!4!Type!element]
UNION ALL
SELECT 2, 1, null, EmployeeID, LastName, FirstName, Title, Overflow ,null, 
null, null, null, null, null
FROM Employee
where LastName = @LastName
UNION ALL
SELECT 3, 2, null, EmployeeID, null, null, null, null, null, null, null, 
null, null, null
FROM Employee
where LastName = @LastName
UNION ALL
SELECT 4, 3, null, Address.EmployeeID, null, null, null, null, null, 
Street, City, State, Zip, AddressType        
FROM Address, Employee
where Address.EmployeeID = Employee.EmployeeID and Employee.LastName = @LastName
order by [Employee!2!EmployeeId!element], Tag, Parent
FOR XML EXPLICIT

This stored procedure is a FOR XML Explicit query that selects Employee and Address rows based a certain last name by filtering on the LastName column in the Employee table. If the persistence code hadn't shredded the Employee and Address elements, this task would have been exponentially harder because the retrieval code would have had to query into the serialized XML to find the LastName for filtering, and the EmployeeID for joining.

If you are new to the FOR XML Explicit mode support of SQL Server 2000, please see Using Explicit Mode. In general, explicit mode is the most powerful version of the FOR XML support that allows the user to shape the query results in virtually any reasonable XML shape. In this case, since the Employee fetching stored procedure needs to fetch the overflow (unconsumed) content, the explicit mode was required since that is the only mode that supports merging the overflow into the query results. In effect, this stored procedure reverses the OpenXML shredding process, and allows the round-tripping of Employee business objects.

Note also that by shredding the XML, non-XML relational clients can more easily access the data. In other words, one could write an Employee reporting tool that uses the Employee table through ADO.NET and SQLClient. In effect, this design has chosen the relational storage as the common persistence data model. Other applications and clients are then free to choose either relation or XML API's for accessing the data.

Extending the Business Objects

So, shortly after releasing version 1.0 of the product, the work on the second version of the application begins, and the customer is waiting with a list of new requirements for the Employee objects. In particular, each employee needs to include phone number information and needs to be mapped to the project they are working on. To handle these additions, the decision has been made to introduce two new classes—Employee2 which is an sub-class of the original Employee class, and a Project class:

    public class Employee2 : Employee
    {   
        private string _workPhone = null;
        private string _homePhone = null;
        private string _cellPhone = null;
        private Project _project = null;       
 
        public Employee2(int employeeID, string firstName, string 
        lastName, string title) 
            : base(employeeID, firstName, lastName, title) {}
 
        public Employee2() : base() {}
        
        public Project Project
        {
            get {return _project;}
            set {_project = value;}
        }
 
        public string WorkPhone
        {
            get {return _workPhone;}
            set {_workPhone = value;}
        }
        
        public string HomePhone
        {
            get {return _homePhone;}
            set {_homePhone = value;}
        }
        
        public string CellPhone
        {
            get {return _cellPhone;}
            set {_cellPhone = value;}
        }         
    }
 
        public class Project
    {
        [XmlElement(ElementName = "ProjectName")]
        public string _projectName = null;
        [XmlElement(ElementName = "Manager")]
        public int _manager = -1;
        
        public Project(){}
        public Project(string projectName, int manager)
        {
            _projectName = projectName;
            _manager = manager;
        }          
    }

In traditional object to relational database access code, these changes would have meant making changes to the structure of the relational storage. Perhaps adding a couple of fields to the Employee table and adding a Project table with a PK-FK relationship to the Employee table. (And everyone knows that with some DBA's, this can be like pulling teeth). However, because XML was used as the transport to the relational database, and the database designer planned ahead by including an overflow field into the initial Employee table, the object repository can support the new business objects without changing the relational database structures, stored procedures, or object repository code. For example, the code persisting Employee2 objects now looks like this:

Employee2 emp = new Employee2(3, "Don", "Johnson", "Developer");
emp._addressList.Add((object)new EmployeeAddress(AddressType.Other, 
"7878 Fleet Street", "Redmond", "WA", "98052"));
emp._cellPhone = "555-555-5151";
emp._homePhone = "111-111-9090";
emp.Project = new Project("Code Refactoring",1);
 _repository.PersistEmployee(emp);

And fetching Employee2 objects:

Employee[] employees = _repository.GetEmployee("Johnson");
Employee2 emp = (Employee2) employees[0];

Observe that this was basically the same code that was used for the original business objects, the only difference being that Employee2 objects were explicitly created. Note that even this could have been abstracted out by adding an EmployeeFactory, which hid the type creation from the object repository consumer. In general, the object repository has proven to be a very extensible design. In fact, unless the shape of the XML document that is generated from the business object hierarchy is dramatically changed, the business object developer has the freedom to extend or tweak the objects as much as they like. For example, this tweaking could be as drastic as removing fields from the Employee objects, something that will break most code. In that case, the OpenXML code will not find the element in the serialization format and will insert a Null value instead.

So there are a couple of technical details worth noting here. First of all, when the Employee2 was added as a subclass of the Employee class, it was necessary to add the XmlInclude attribute to the Employee class:

[XmlInclude(typeof(Employee2))]
public class Employee
{
   …
}

By including this attribute and specifying the new Employee2 type, the XmlSerializer can then recognize both the base and derived types when serializing and deserializing. For serializing this is not so complicated because the XMLSerializer can dynamically figure out the type of the instance to be serialized. However, when deserializing, the XMLSerializer actually creates Employee2 types when fetching the XML from the database for persisted Employee2 objects. How does the XmlSerializer figure this out? The key to that question lies in examining the overflow content in the database.

So what does the relational data look like for the Employee2 objects after the database shredding code has taken place? For the standard relational fields in the Employee and Address tables, everything is the same as it was for Employee objects. However, the following XML is contained in the overflow field in the Employee table:

<Employee xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance 
xsi:type="Employee2">
<HomePhone>111-111-9090</HomePhone>
<CellPhone>555-555-5151</CellPhone>
   <Project>
      <ProjectName>CodeCleanUp</ProjectName>
      <Manager>1</Manager>
   </Project>
</Employee>

Basically the OpenXML shredding code has taken all the unmapped content of the Employee element and stored that in the overflow field. Another way to think about it is that every time the OpenXML code found a mapping (through the WITH clause) for a given node in the XML, it was removed from the initial XML tree. Once all of the mapping was completed, the remainder of the tree was serialized to XML and placed in the overflow column. The real interesting thing here is the xsi:type attribute that was added by the XmlSerialization code. This is precisely how the deserialization code can dynamically determine what type was persisted to the database when fetching. Depending on how this feature was leveraged to use or manipulate the type at the relational database level, particularly if it was shredded into a standard relational column. For example, one could upgrade legacy Employee data to the new Employee2 type or run queries based on the type of object as part of the query predicate.

Possible Enhancements

As illustrated, the object repository design is quite flexible and loosely coupled with the business logic layer, which was the original design goal. There are a few enhancements that could be made to the design of the object repository to further its overall usefulness though:

  • Currently, the entire serialized image of the business objects is sent to the database for update and delete operations. That could definitely be reduced by only including the updated values for updates, and values used for concurrency for updates and deletes. However, that would require the object repository to keep track of the difference between new and updated business objects, and would couple it with the database layout since it would have to know which values are used for concurrency control. So basically, it would be a trade off between design and performance.
  • The concurrency control mechanism could be better. The current design only uses the EmployeeID for checking for concurrency problems. In fact, since the update code deletes first and then inserts, with no error checking for concurrency problems, this in effect could allow the object repository to update an object that has been deleted from the backend database. However, because the database layer uses OpenXML to shred the XML into relational structures, this problem can be handled in the same way traditional relational database code handles concurrency issues, so it is not a significantly difficult problem to fix.
  • One feature of OpenXML that was not utilized in the example object repository was the ability to specify "//" or XPath wildcards as part of the column and/or row patterns, thereby allowing the database code to be further decoupled from the business logic layer. This would allow the shape of the object serialization XML to change (as long as entity names remained the same) without modifying the database code. Keep in mind though that one should only introduce imprecision into the column and row patterns only if the application requires that kind of flexibility.
  • Storing overflow as XML is somewhat clumsy for non-XML API access. One possible option for solving this problem is to use OpenXML to shred the data into an edge table format. The edge table format represent the fine-grained XML document structure (for example, element/attribute names, the document hierarchy, the namespaces, and so on) in a single table. However, writing the querying code for the edge table is still not as simple as transact SQL statements as in the developer of the query still needs to understand the hierarchy represented by the table. In general, there is no clear way to solve this issue presently. It really comes down to what is important to the application developer: the flexibility of overflow versus access to the data.

Conclusion

As with any design, there are definitely some tradeoffs. XML as a transport is clearly not the most performant mechanism for database access. There is a lot of markup that is being transported over the wire. But as shown, by using XML as the transport to and from the database and OpenXML as the persistence mechanism, one can significantly decouple the database layer from the business logic layer and thereby develop a very flexible and extensible design. In general, it comes down to the particular requirements of a given application.

It is possible to move the XML shredding/ generation code off the database server by doing this client side with the SQLXML mapping technologies. Part 2 of this article will examine those technologies and will refactor the object repository to use those technologies.

Show:
© 2014 Microsoft