The Nordic Object/Relational Database Design
by Paul Nielsen
Summary: The New Object/Relational Database Design (Nordic), like many tools, is not the most fitting or expedient solution for every kind of database problem. However, the object/relational hybrid model can provide more power, greater flexibility, better performance, and even higher data integrity than traditional relational models, particularly for databases that benefit from inheritance, creative data mining, flexible class interactions, or workflow constraints. Discover some of the innovations that are possible when object-oriented technology is modeled using today's mature relational databases.
The differences between object-oriented development and the relational database model create a tension often called the object-relational impedance mismatch. Inheritance just does not translate well into a relational schema. The technical impedance mismatch is aggravated by the cultural disconnect between application coders and database administrators (DBAs). Often neither side fully understands nor respects the other's lexicon. A sure way to get under a DBA's skin is to refer to the database as the "object persistence utility." That relationship is unfortunate because each side brings a unique set of advantages to the data architecture problem.
In many ways object-oriented design is superior to the relational model. For example, designing inherence between classes can be accomplished using the relational supertype/subtype pattern, but the object-oriented design is a more elegant solution. Also, nearly all application code is object oriented and an object-oriented database would interface with the application more easily than would a relational database.
As sophisticated as object-oriented technology is at modeling reality, the relational side is not without significant advantages. Relational database engines offer performance, scalability, and high-availability options, and the financial muscle to ensure the database platform will still be here in a few decades. Relational technology is well understood, and relational databases offer more powerful query and reporting tools than object databases. The few pure object database companies available simply do not have the resources to compete with Microsoft, Oracle, or IBM.
The conundrum of the object-relational impedance mismatch then is how to best embrace the elegance of object-oriented technologies while retaining the power, flexibility, and long-term stability of a mature relational database engine. Since application programmers are generally the most interested in solving this problem, and human nature tends to solve problems using the most comfortable skill set, it is not surprising that most solutions are implemented in a mapping layer between the database and the application code that translates objects into relational tables for persisting objects.
I propose that a relational model, designed to emulate object-oriented features, can perform extremely well within today's relational database engines, and that manipulating the class inheritance and complex associations directly in the database, close to the data, is in fact very efficient. This efficiency was not always the case. I was hired to optimize a Transact-SQL (T-SQL)-intensive, object/relational (O/R) database design implemented with SQL Server 6.5 and failed. Development of the Nordic Object/Relational Database Design involved a year's worth of iterations, a simplified metadata schema, and SQL Server's maturing T-SQL.
As with any database project, a strictly enforced data abstraction layer that encapsulates the database is necessary for long-term extensibility. For an O/R hybrid database, the data abstraction layer also provides the facade for the object-oriented features. Behind the facade's code are the metadata schema and code generation for the classes, objects, and associations (see Figure 1). When implementing this solution there are a few key design decisions.
Class management. Within the relational schema, class and attribute metadata are easily modeled using a common one-to-many relationship. The superclass/subclass relationship is modeled as a hierarchical tree, using either the more common adjacency list pattern or the more efficient materialized path pattern.
Navigating up and down the class hierarchy with user-defined functions enables SQL queries to join easily with any class's decedent or ancestor classes. These user-defined functions are leveraged throughout the facade layer. For instance, when selecting all the properties for a class, joining with the superclasses() user-defined function returns all superclasses, and the query can then select all of a given class's properties including properties inherited from superclasses.
In the context of working with persisted objects, polymorphism refers to the select method's ability to retrieve not only the current class's objects but all subclass objects as well. For example, selecting all contacts should select not only objects of the contact class but also objects of the customer and major customersubclasses. A user-defined function that returns a table variable of all subclasses of a given class makes writing this query and stored procedure efficient and reusable.
Object management. Objects are best modeled using a single object table that stores the object's common data such as the unique objectid, object's class, audit data, and a few search attributes common to nearly every class, like name, a date attribute, and so forth. Additional attributes are stored in custom class tables that use an objectid foreign key to relate the custom attributes to the object table. The createclass and other class-management, facade-stored procedures execute the Data Definition Language (DDL) code to create or modify the custom class tables and generate the custom facade code for selecting, inserting, and updating objects.
The major design decision for modeling how objects are stored is how to represent the custom attribute data. There are three possible methods: the value-pair pattern, concrete custom class tables, and cascading custom class tables. The value-pair pattern, also called the generic pattern, uses a diamond-shaped pattern consisting of class, property, object, and value. The value table uses a single column to store all values. This long, narrow table uses one row for every attribute. Ten million objects with 15 attributes would use 150 million rows in the value table. SQL Server is more than capable of working with large tables; that's not a problem. This model appears to offer the most flexibility because attributes can be added without modifying the relational schema; however, this model suffers from nonexistent, or at best awkward, data typing and is difficult to query using SQL.
The concrete custom class model uses a table for each class with columns for every custom attribute, including nonabstract inherited attributes. An object exists in only two tables—the object table and the concrete custom class table—while attributes are replicated in every subclass's concrete custom class tables. Therefore, if the animal class has a birthdate attribute, and the mammal subclass has a gender attribute (since some animals do not have a gender), then the mammal custom class table includes objectid, birthdate, and gender columns.
This pattern has the advantage that selecting all attributes for a given class requires, only joining the object metadata table with a single custom class table. The disadvantage is implementing polymorphism; selecting all animals requires performing a union of every subclass and either eliminating subclass attributes or adding surrogate superclass attributes so all selects in the union have compatible columns.
Figure 1. The O/R hybrid design uses a facade to encapsulate the object-oriented functionality executed within a relational database schema. (Click on the picture for a larger image.)
An improvement over the value-pair pattern, the concrete custom class tables use a relational column for each attribute so attribute data typing can easily implement the native data types of the host relational database.
The third option implemented in the Nordic Object/Relational Database Design, cascading custom class tables, uses a table for each class like the concrete class solution. However, instead of replicating attributes, each attribute is represented only once in its own class, and every object is represented once in every cascading class. Using the mammal and animal example, the animal table contains objectid and birthdate, and the mammal table consists of objectid and gender. An instance of the mammal object is stored in the object metadata, the animal table, and the mammal table. Polymorphism is very easy in this option, however, more joins are required to select all attributes from subclasses. Since SQL Server is optimized for joins, cascading custom class tables perform very well. As with concrete custom class tables, strong data typing is supported by the host database.
Association management. Object-oriented technology's associations are very similar to relational database technology's foreign key constraints, and it is certainly possible to define associations within an O/R hybrid model by adding foreign key attributes and assigning relational declarative referential integrity constraints to the custom class tables.
However, storing every object in a single table offers some exciting alternatives for modeling associations. Where a normalized relational database might contain dozens of foreign keys, each with a different foreign key table and column, and each references a different primary key, an O/R hybrid association table needs to reference only a single table. Every foreign key relationship in the database can be generalized into a single association.objectid to object.objectid foreign key.
Figure 2. A generic three-join query locates all associated objects, regardless of the class. (Click on the picture for a larger image.)
The association table can be designed as a paired list (ObjectA_id, ObjectB_id), but this design is too limiting for complex collections, and queries must identify objectA and objectB. The more flexible alternative uses an object-association list consisting of a single objectid to reference the object, and an associationid to group associated objects. An associationtypeid column can reference association metadata that describes the association and might provide association constraints.
A single table for every object and another table for every association sounds radically different than a normalized schema, and it is. The physical structure is not a problem though; SQL Server excels at long narrow tables, and this design lends itself to clustered and nonclustered covering index tuning, which yields high performance.
For the database architect the object-association list pattern provides amazing possibilities. First, joining an object with n number of other objects of any class always uses the same query (see Figure 2). Adding additional tables to a relational query adds n-1 joins, but the object-association list consistently uses the same three joins regardless of the number of classes involved. Depending on the application, this style of relating objects can scale considerably better than a normalized relational model. As new classes are added to the data model or to the association they are automatically included in "find all associations" queries without modifying any existing code.
Finding all associations between classes, or all objects not participating in any association with another class, or other creative yet powerful data mining queries are all trivial and reusable set-based queries. SQL Server user-defined functions can encapsulate working with associations and the many logical combinations of objects that do or do not participate in associations.
Complex collections are also possible with an object-association list. For instance, a classroom collection might include one classroom, one or more instructors, one or more desks, curriculum, and one or more students as defined in the association metadata.
Generalized associations open up more possibilities. Web pages are also linked using a generalized method; every hyperlink uses an anchor tag and a URL. It is essentially an object-association list embedded within HTML code. It is trivial to graphically map a Web page and display the navigation between Web pages regardless of the content of the Web page. Likewise, it is trivial to bounce between the object and association tables and instantly locate objects associated by several degrees of separation regardless of class.
I am currently developing a child-sponsorship management database to help organizations fight poverty. Using the object-association list, a single user-defined function that finds associations for any object can find that Joe sponsors a child in Peru, is scheduled to attend a meeting about poverty, wrote three letters to the child, sent a gift last year, and inquired about a child in Russia.
Spidering the association-object list for multiple degrees of separation also reveals that Greg is scheduled to visit the town in Peru where Joe's child lives, 14 others attended the same poverty meeting as Joe, and 3 of those sponsor children in Peru. This flexibility for any object with one query is impossible with a relational design.
The generality of the object-association list lends itself to another database innovation—integrating object workflow state into the database. While workflow state does not apply to all classes, for some classes, workflow is a dimension of data integrity missing from the relational database model.
A typical workflow for an order might be shopping cart, order confirmed, payment confirmed, inventory allocated, in process, ready to ship, and shipped. A relational foreign key only constrains the secondary table to referencing a valid primary key value. Using a relational database, a shipdetail row can be created that references the order, regardless of the workflow state of the order. Custom code must validate that the order has completed certain steps prior to shipping.
With inheritable workflow states defined as part of the class metadata, the association metadata can restrict the objects to class and workflow state integrating workflow into the object data.
I have highlighted some of the innovations possible when object-oriented technology is modeled using today’s mature relational databases. As with any tool, the Nordic Object/Relational Database Design is not the best solution for every database problem; however, for databases that benefit from inheritance, creative data mining, flexible class interactions, or workflow constraints, the O/R hybrid model can provide more power, flexibility, performance, and even data integrity than traditional relational models.
Paul Nielsen is a SQL Server MVP, author of the SQL Server Bible series (Wiley, 2002), and is writing Nordic Object/Relational Design in Action (Manning), which is due to be published later this year. He offers workshops on database design and optimization, and may be contacted though his Web site, www.SQLServerBible.com.
This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal website.