October 2011

Volume 26 Number 10

Cutting Edge - Objects and the Art of Data Modeling

By Dino Esposito | October 2011

Many of today’s apps are built around a single data model, typically persisted to a data store via an object-relational mapper (ORM) tool. But sometimes, for several different reasons, you may need more flexibility, which requires multiple models. In this article, I’ll discuss some strategies you can use to handle these situations and develop more layered and robust applications.

Using different models clearly makes the whole application more complex, but it’s a sort of necessary, positive complexity that makes the whole project more manageable. Understanding when one model of data just doesn’t fit all use cases is the challenge for the architect.

Different parts of a software application may have their own model of data. The way in which you represent data at the UI level is different from the way in which you organize data in the middle tier, and it may be even different from how the data is physically persisted in some data store.

But, for many years, developers used just one model of data, regardless of the part of the application involved. Many of us grew up with relational databases and their related modeling techniques. It was natural to put a lot of effort into elaborating a model based on relations and normalization rules. Any data stored in a relational database was then extracted and moved around using memory structures similar to an in-memory database. Record Set is the name of this generic data pattern (see bit.ly/nQnyaf); the ADO.NET DataSet was an excellent implementation of it.

The table-based approach was progressively pushed to the corner by the growing complexity of applications. ORM tools emerged for two main pragmatic reasons. First, working with objects is easier than dealing with generic super-arrays of data such as a recordset. The second reason is productivity. The goal of an ORM is taking an object model and mapping it to a relational schema. By using an ORM, you essentially make the object model look like the real database to the application’s eyes.

How do you design this model? And should you really use just one model for each application? Let’s find out.

One Model Doesn’t Always Fit All

The recent release of the Entity Framework (EF) 4.1 included support for “Code First” programming, which, as the name implies, lets Microsoft .NET Framework developers take a code-first approach to data modeling. This basically means that you begin by writing the Plain Old CLR Object (POCO) classes that model the domain of the application. The second step consists of mapping these classes to some persistent store and having the ORM tool (the EF) take care of persistence details. The ORM exposes a query language (LINQ to Entities), transactional semantics (the xxxContext objects in the EF) and a basic create, read, update and delete (CRUD) API that supports concurrency, lazy loading and fetch plans.
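As a rough illustration of that first step, here is a minimal sketch of persistence-ignorant domain classes. It's written in Python rather than C# purely to keep it short and self-contained. The Customer and Order names, their fields and the add_order method are hypothetical and not taken from the EF or from any real schema. The only point is that the classes reference nothing from a persistence library.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List


@dataclass
class Order:
    # Plain data: no special base class, no mapping attributes, no ORM import.
    order_id: int
    total: Decimal


@dataclass
class Customer:
    customer_id: int
    name: str
    orders: List[Order] = field(default_factory=list)

    def add_order(self, order: Order) -> None:
        # Navigation between entities is an ordinary object reference.
        self.orders.append(order)


customer = Customer(customer_id=1, name="Contoso")
customer.add_order(Order(order_id=100, total=Decimal("49.90")))
```

Mapping classes like these to a store is then a separate concern that lives with the ORM configuration, not in the classes themselves.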

So you have the model, and you know how to persist it and how to query information from it. Is this a model you can use everywhere in your application? More specifically, is this a model you can effectively bring up to the presentation layer? This is a really sore point where theory and practice differ substantially, context is king and uncertainty (if not outright confusion) reigns. Let me offer a concrete example bound to the popular ASP.NET MVC technology. The only reason I’m using ASP.NET MVC instead of, say, Silverlight, is that ASP.NET MVC has the word Model in the name (the M in MVC), and I’ve been asked too many times in classes and at conferences about the exact location of the “model” in an ASP.NET MVC application.

Today, even after three versions, too many tutorials on ASP.NET MVC insist on using just one model for queries, updates and presentation. Many tutorials also keep the definition of the model inside the main project. It works, so where’s the problem? The problem isn’t with the solution, which works, is effective, and is something I’ve often used myself and plan to keep using. There’s actually a set of real-world but simple and relatively short-lived applications (not just demos and tutorials) that can be developed with simple patterns. The problem with using just one model is the message it sends to the primary consumers of tutorials: developers looking to learn a technology. Having just one model in just one place suggests it’s the preferred (if not recommended) way of doing things. It is, instead, just a special case: a very simple and favorable scenario. If the real scenario you’ll actually work in matches that, you’re more than fine. Otherwise, you’re stuck and ready to grow your first “big ball of mud” bigger and bigger.

A More General Architecture

Let’s begin by using slightly different names. Let’s call the model of data that you persist the domain model and call the data you manage in the view the view model. I should mention that domain model isn’t exactly a neutral term in software, as it refers to an object model designed according to a number of additional rules. In the scope of this article, I’m not using the meaning that results from the Domain-Driven Design (DDD) methodology. For me, here, the domain model is simply the object model you persist—entity model might be another equivalent—and less confusing—term.

You use the classes in the entity model in the back end of the application; you use the classes in the view model in the presentation layer. Note, however, that in ASP.NET MVC, the presentation layer is the controller. The controller should receive data ready for the UI. The middle-tier components receive and return view model objects and internally use entity objects. Figure 1 shows the web of dependencies in a typical multilayered project.

Figure 1 Connections Among Layers

The presentation (that is, the codebehind or controller class) references the application logic, namely a component that implements the application’s use cases. The application logic technically belongs to the business layer or tier, and in very simple cases may be merged with the presentation. This is what happens in some tutorials that are either too simple to really need application logic isolated in its own layer, or poorly designed, with application logic split between the presentation and the data access layer (DAL).

The application logic assembly implements the service layer pattern and decouples two interfacing layers: presentation and data. The application logic references both the entity model (your domain classes) and the DAL. The application logic orchestrates the DAL, domain classes and services to pursue the expected behavior. The application logic communicates with the presentation layer through view model objects and communicates with the DAL through domain objects. The DAL, in turn, references the model and the ORM assembly.
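The flow just described can be sketched as follows, in Python for brevity. Everything here is an assumption made for illustration: the InMemoryDal class stands in for a real ORM-backed DAL, and CustomerService, CustomerSummaryViewModel and their members are hypothetical names. The sketch only shows the direction of the dependencies: domain objects go between the application logic and the DAL, and view model objects go up to the presentation.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional


@dataclass
class Customer:  # domain (entity) class
    customer_id: int
    name: str


@dataclass
class Order:  # domain (entity) class
    order_id: int
    customer_id: int
    total: Decimal


@dataclass
class CustomerSummaryViewModel:  # shaped for one specific screen
    display_name: str
    order_count: int
    total_spent: str


class InMemoryDal:
    """Hypothetical stand-in for the DAL; a real one would delegate to an ORM."""

    def __init__(self, customers: List[Customer], orders: List[Order]) -> None:
        self._customers = {c.customer_id: c for c in customers}
        self._orders = list(orders)

    def find_customer(self, customer_id: int) -> Optional[Customer]:
        return self._customers.get(customer_id)

    def orders_for(self, customer_id: int) -> List[Order]:
        return [o for o in self._orders if o.customer_id == customer_id]


class CustomerService:
    """Application logic: domain objects toward the DAL, view models toward the UI."""

    def __init__(self, dal: InMemoryDal) -> None:
        self._dal = dal

    def get_summary(self, customer_id: int) -> CustomerSummaryViewModel:
        customer = self._dal.find_customer(customer_id)
        if customer is None:
            raise LookupError(f"No customer with id {customer_id}")
        orders = self._dal.orders_for(customer_id)
        spent = sum((o.total for o in orders), Decimal("0"))
        return CustomerSummaryViewModel(
            display_name=customer.name,
            order_count=len(orders),
            total_spent=f"{spent:.2f}",
        )


dal = InMemoryDal(
    customers=[Customer(1, "Contoso")],
    orders=[Order(100, 1, Decimal("10.50")), Order(101, 1, Decimal("4.25"))],
)
summary = CustomerService(dal).get_summary(1)
```

A controller calling get_summary never sees Customer or Order. It receives data that is already aggregated and formatted for the view.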

A Few Words on the Entity Framework

Let’s examine this architecture, assuming the EF as the ORM. The EF isn’t simply an ORM; as its name suggests, it does the typical work of an ORM and also offers a framework for creating a model. Nice idea, but it shouldn’t be forgotten that we’re straddling two distinct layers: business and the DAL. The classes are business; the persistence engine is the DAL. For the EF, the persistence engine (the ORM assembly) is System.Data.Entity and its dependencies, including the ObjectContext class. Put another way, unless you use POCO classes and Code First, you’ll likely end up with a domain model that has a dependency on the ORM. A domain model should use POCO; that is, among other things, it should be a self-contained assembly. An interesting thread on this topic exists on Stack Overflow at bit.ly/mUs6cv. For more details on how to split entities from context in the EF, see the ADO.NET team blog entries, “Walkthrough: POCO Template for the Entity Framework” (bit.ly/bDcUoN) and “EF 4.1 Model & Database First Walkthrough” (bit.ly/hufcWN).

From this analysis, it seems that the POCO attribute is essential for domain classes. POCO, of course, indicates a class with no dependencies outside its own assembly. POCO is about simplicity and is never wrong. Not going the POCO way means having forms of tight coupling between layers. Tight coupling isn’t poisonous and doesn’t kill you instantly, but it can take your project to a slow death. Check out my column about software disasters in last month’s issue (msdn.microsoft.com/magazine/hh394145).

What’s the Model?

When you create an object model, you create a library of classes. You can organize your object model according to a number of patterns, but essentially it boils down to choosing between a table-oriented approach and an object-oriented approach.

How do you devise your classes? Which capabilities should they feature? In particular, should your classes be aware of the database? Should they include logic (that is, methods) or be limited to just exposing properties? There are two main patterns you can refer to: Active Record and Domain Model.

In the Active Record pattern, your classes are closely modeled after database tables. You mostly have one class per table and one property per column. More importantly, classes are responsible for their own persistence and their own simple, minimal domain logic.
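A minimal sketch of the Active Record shape, in Python with the standard sqlite3 module, might look like the following. The customers table, its columns and the is_local rule are all invented for the example. What matters is that the class mirrors the table and carries its own save and find logic.

```python
import sqlite3
from typing import Optional


class Customer:
    """One class per table, one attribute per column (hypothetical schema)."""

    def __init__(self, name: str, city: str, customer_id: Optional[int] = None):
        self.customer_id = customer_id
        self.name = name
        self.city = city

    def save(self, conn: sqlite3.Connection) -> None:
        # The object persists itself: insert when new, update otherwise.
        if self.customer_id is None:
            cur = conn.execute(
                "INSERT INTO customers (name, city) VALUES (?, ?)",
                (self.name, self.city),
            )
            self.customer_id = cur.lastrowid
        else:
            conn.execute(
                "UPDATE customers SET name = ?, city = ? WHERE id = ?",
                (self.name, self.city, self.customer_id),
            )

    @classmethod
    def find(cls, conn: sqlite3.Connection, customer_id: int) -> Optional["Customer"]:
        row = conn.execute(
            "SELECT id, name, city FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return None if row is None else cls(row[1], row[2], customer_id=row[0])

    def is_local(self, home_city: str) -> bool:
        # Simple, minimal domain logic sits next to the persistence code.
        return self.city == home_city


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
customer = Customer("Contoso", "Rome")
customer.save(conn)
loaded = Customer.find(conn, customer.customer_id)
```

The convenience is obvious, and so is the coupling: the class can't be used or tested without a database connection.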

According to the Domain Model pattern, your classes are aimed at providing a conceptual view of the problem’s domain. These classes have no relationships with the database and have both properties and methods. Finally, these classes aren’t responsible for their own persistence. If you opt for a Domain Model approach, persistence has to be delegated to a distinct layer—the DAL. You can write this layer yourself, but it wouldn’t be much fun. A well-done DAL for a library designed according to the Domain Model pattern is nearly the same as an ORM tool. So why not use one of the existing ORM tools?
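By contrast, a class written in the Domain Model style could look like this sketch (again Python, with hypothetical names and invented business rules). It has both properties and methods, enforces its own rules, and contains no reference to tables, SQL or an ORM. Saving it would be the job of a separate layer.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class OrderLine:
    product: str
    unit_price: Decimal
    quantity: int

    @property
    def amount(self) -> Decimal:
        return self.unit_price * self.quantity


@dataclass
class Order:
    """Models the concept of an order; knows nothing about tables or SQL."""

    customer_name: str
    lines: List[OrderLine] = field(default_factory=list)
    shipped: bool = False

    def add_line(self, product: str, unit_price: Decimal, quantity: int) -> None:
        # Business rules live in the class, not in the caller.
        if self.shipped:
            raise ValueError("Cannot change an order that has shipped")
        if quantity <= 0:
            raise ValueError("Quantity must be positive")
        self.lines.append(OrderLine(product, unit_price, quantity))

    @property
    def total(self) -> Decimal:
        return sum((line.amount for line in self.lines), Decimal("0"))

    def ship(self) -> None:
        if not self.lines:
            raise ValueError("Cannot ship an empty order")
        self.shipped = True


order = Order("Contoso")
order.add_line("Keyboard", Decimal("25.00"), 2)
order.add_line("Mouse", Decimal("9.50"), 1)
order.ship()
```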

Especially if you use the automatic code generator, the EF gets you an object model made of classes that have only properties. The pattern used certainly isn’t Active Record, but classes have no methods by default. Fortunately, classes are marked as partial, which makes it possible for you to shape up a much richer domain model by adding logic through partial classes. If you choose the Code First approach, you’re then entirely responsible for writing the source code of your classes from start to finish.

Classes in an object model that only feature properties are often referred to as an anemic domain model.

Repositories

A good practice that’s gaining well-deserved popularity is wrapping data access code in façade classes known as repositories. A repository consists of an interface and an implementation class. You typically have a repository for each significant class in the model. A significant class in an object model is a class that controls its own persistence and stands on its own in the domain without depending on other classes. In general, for example, Customer is a significant class, whereas OrderDetails is not, because you’ll only ever use order details in the context of an order. In DDD, a significant class is said to be an aggregate root.

To improve the design, you can establish a dependency between application logic and the DAL through repository interfaces. Actual repositories can then be injected in the application logic as appropriate. Figure 2 shows the resulting architecture where the DAL is based on repositories.

Figure 2 Using Repositories in the DAL
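The arrangement of interface, implementation and injection can be sketched as follows, in Python for brevity. The CustomerRepository contract, the in-memory implementation and the CustomerService class are hypothetical. An ORM-backed repository would expose the same methods and could be passed in without touching the service.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Customer:
    customer_id: int
    name: str


class CustomerRepository(ABC):
    """The contract the application logic depends on."""

    @abstractmethod
    def find_by_id(self, customer_id: int) -> Optional[Customer]: ...

    @abstractmethod
    def save(self, customer: Customer) -> None: ...


class InMemoryCustomerRepository(CustomerRepository):
    """One possible implementation, used here only to keep the sketch runnable."""

    def __init__(self) -> None:
        self._rows: Dict[int, Customer] = {}

    def find_by_id(self, customer_id: int) -> Optional[Customer]:
        return self._rows.get(customer_id)

    def save(self, customer: Customer) -> None:
        self._rows[customer.customer_id] = customer


class CustomerService:
    """Application logic receives its repository from outside (constructor injection)."""

    def __init__(self, customers: CustomerRepository) -> None:
        self._customers = customers

    def rename(self, customer_id: int, new_name: str) -> None:
        customer = self._customers.find_by_id(customer_id)
        if customer is None:
            raise LookupError(f"No customer with id {customer_id}")
        customer.name = new_name
        self._customers.save(customer)


repository = InMemoryCustomerRepository()
repository.save(Customer(1, "Contoso"))
CustomerService(repository).rename(1, "Contoso Ltd")
```

Because CustomerService names only the abstract contract, the choice of concrete repository is made by whoever constructs the service.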

Repositories enable dependency injection in the DAL because you can easily unplug the current module that provides persistence logic and replace it with your own. This is certainly beneficial for testability, because an architecture based on repositories allows mocking the DAL and testing the application logic in isolation, but the benefit isn’t limited to that.
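As a small illustration of the testability point, the sketch below uses Python’s standard unittest.mock module to stand in for a repository. DiscountService, the count_for_customer method and the discount rule are all invented for the example. No database is involved at any point.

```python
from unittest.mock import Mock


class DiscountService:
    """Hypothetical application logic that needs order history from the DAL."""

    def __init__(self, order_repository) -> None:
        self._orders = order_repository

    def discount_for(self, customer_id: int) -> int:
        # Rule invented for the example: 10 percent from five orders up, otherwise none.
        count = self._orders.count_for_customer(customer_id)
        return 10 if count >= 5 else 0


# The mock plays the role of the repository and returns a canned answer.
fake_orders = Mock()
fake_orders.count_for_customer.return_value = 7

service = DiscountService(fake_orders)
discount = service.discount_for(42)

# The test can also verify how the application logic used the DAL.
fake_orders.count_for_customer.assert_called_once_with(42)
```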

Repositories also represent an excellent extensibility point for your applications. By replacing the repository, you can replace the persistence engine in a way that’s transparent to the rest of the layers. The point here isn’t so much about switching from, say, Oracle to SQL Server, because this level of flexibility is already provided by ORM tools. The point here is being able to switch from the current implementation of the DAL to something different such as a cloud API, Dynamics CRM, a NoSQL solution and the like.

What Repositories Should Not Be

A repository is part of the DAL; as such, it isn’t supposed to know about the application logic. This may be too obvious a statement, but a type of architecture diagram I see quite often these days is based on presentation and repositories.

If your actual logic is thin and simple—or is essentially CRUD—then this diagram is more than OK. But what if that’s not the case? If so, you definitely need a place in the middle where you deploy your application logic. And, trust me, I’ve seen only a few applications that are just CRUD in my career. But probably I’m just not a lucky man …

Clarifying Obscurity

Wrapping up, applications nowadays tend to be designed around a model of data that an ORM tool then persists to some data store. This is fine for the back end of the application, but don’t be too surprised if the UI requires you to deal with significantly different aggregates of data that just don’t exist in the original model. These new data types, which exist for the sole purpose of the UI, must be created, and they end up forming an entirely parallel object model: the view model. If you can reduce these two models to just one, then by all means stick to that and be happy. If not, hopefully this article clarified some obscure points.


Dino Esposito is the author of “Programming Microsoft ASP.NET MVC3” (Microsoft Press, 2011) and coauthor of “Microsoft .NET: Architecting Applications for the Enterprise” (Microsoft Press, 2008). Based in Italy, Esposito is a frequent speaker at industry events worldwide. You can follow him on Twitter at twitter.com/despos.

Thanks to the following technical experts for reviewing this article: Andrea Saltarello and Diego Vega