02/01/2019

November 2016

Volume 31 Number 11

[Data Points]

CQRS and EF Data Models

By Julie Lerman

Command Query Responsibility Segregation (CQRS) is a pattern that essentially provides guidance around separating the responsibility of reading data from that of causing a change in a system’s state (for example, sending a confirmation message or writing to a database), and designing objects and architecture accordingly. It was initially devised to help with highly transactional systems such as banking. Greg Young evolved CQRS from Bertrand Meyer’s command-query separation (CQS) strategy, whose most valuable idea, according to Martin Fowler, “is that it’s extremely handy if you can clearly separate methods that change state from those that don’t” (bit.ly/2cuoVeX). What CQRS adds is the idea of creating entirely separate models for commands and queries.

CQRS has often been put into buckets incorrectly, as a particular type of architecture, or as part of Domain-Driven Design, or as messaging or eventing. In a 2010 blog post, “CQRS, Task-Based UIs, Event Sourcing, Agh!” (bit.ly/1fZwJ0L), Young explains that CQRS is none of those things, but just a pattern that can help with architecture decisions. CQRS is really about “having two objects where there was previously only one.” It’s not specific to data models or service boundaries, though it can certainly be applied to those parts of your software. In fact, he states that “the largest possible benefit though is that it recognizes that their (sic) are different architectural properties when dealing with commands and queries.”

When defining data models (most often with Entity Framework [EF]), I’ve become a fan of leveraging this pattern—in particular scenarios. As always, my ideas are meant as guidance, not rules, and just as I’ve chosen to apply CQRS in a way that helps me achieve my architecture, I hope you’ll take them and shape them to suit your own particular needs.

Benefits of Relationship Handling with EF

Entity Framework makes working with relationships at design time so easy. When querying, this is a huge benefit. Relationships that exist between entities allow you to navigate those relationships when expressing queries. Retrieving related data from the database is easy and efficient. You can choose from eager loading with the Include method or with projections, after-the-fact lazy loading or after-the-fact explicit loading. These features haven’t changed much since the original version of EF, nor since I wrote about them back in June 2011, “Demystifying Entity Framework Strategies: Loading Related Data” (msdn.com/magazine/hh205756).

The canonical model in Figure 1 makes it easy to query for everything needed to display a customer’s orders, their line items and the product names on a page. You can write an efficient query like this:

var customersWithOrders = context.Customers
  .Include(c => c.Orders.Select(
  o => o.LineItems.Select(p => p.Product)))
  .ToList();

Figure 1 Entity Framework Data Model with Tightly Coupled Relationships

EF will transform this into SQL that will retrieve all of the relevant data in one database command. Then, from the results, EF will materialize the full graphs of customers, their orders, the line items for the orders and even the product details for each line item.

It certainly makes populating a page like the Windows Presentation Foundation (WPF) window in Figure 2 easy. I can do it in a single line of code:

customerViewSource.Source = customersWithOrders;

Figure 2 Data Controls Bound to a Single Object Graph

Here’s another benefit that developers love: When creating graphs, EF will work out the back and forth to the database to insert the parent, return the new primary key value, and then apply that as the foreign key value to the children before building and executing their insert commands.

It’s all quite magical. But magic has its downsides, and in the case of EF data models, the magic that comes from having tightly bound relationships can result in side effects when it’s time to perform updates, and sometimes even with queries. A notable side effect can happen when you attach reference data to a new record using a navigation property and then call SaveChanges. As an example, you might create a new line item and set its Product property to an instance of an existing product that came from the database. In a connected app, such as a WPF app, where EF may be tracking every change to its objects, EF will recognize that the product already existed. But in disconnected scenarios where EF begins tracking the objects only after the changes have been made, EF will assume that the product, like the line item, is new and will insert it into the database again. There are workarounds for these problems, of course. For this problem, I always recommend setting the foreign key value (ProductId) instead of the instance. There are also ways to track the state and sort things out with EF prior to saving the data. In fact, my recent column, “Handling the State of Disconnected Entities in EF” (msdn.com/magazine/mt694083), shows a pattern for doing that.
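The foreign key recommendation can be sketched as follows. This is only an illustration, assuming EF6 and a hypothetical StoreContext with pared-down Product and LineItem classes; none of these types come from the article’s sample solution.

```csharp
using System.Data.Entity; // EF6 (EntityFramework NuGet package)

// Hypothetical minimal classes for illustration only.
public class Product {
  public int ProductId { get; set; }
  public string Name { get; set; }
}

public class LineItem {
  public int LineItemId { get; set; }
  public int Quantity { get; set; }
  public int ProductId { get; set; }    // foreign key property
  public Product Product { get; set; }  // navigation property
}

public class StoreContext : DbContext {
  public DbSet<Product> Products { get; set; }
  public DbSet<LineItem> LineItems { get; set; }
}

public static class LineItemWriter {
  // Builds the new line item the way a disconnected client would:
  // only the key of the existing product is set, and the
  // navigation property is left null.
  public static LineItem BuildLineItem(int existingProductId, int quantity) {
    return new LineItem { ProductId = existingProductId, Quantity = quantity };
  }

  public static void Save(LineItem item) {
    using (var context = new StoreContext()) {
      // Add marks every untracked object in the graph as Added.
      // Because item.Product is null, the graph holds only the line
      // item, so no Product row is inserted.
      context.LineItems.Add(item);
      context.SaveChanges();
    }
  }
}
```

Had the code assigned item.Product to a Product instance that the new context was not already tracking, the same Add call would have marked that product as Added too, which is the duplicate insert described above.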

Here’s another common pitfall: navigation properties that are required. Depending on how you’re interacting with an object, you may not care about the navigation property—but EF will certainly notice if it’s missing. I wrote about this type of problem in another column, “Making Do with Absent Foreign Keys” (msdn.com/magazine/hh708747).

So, yes, there are workarounds. But you can also leverage the CQRS pattern to create cleaner and more explicit APIs that don’t require workarounds. This also means that they will be more maintainable and less prone to additional side effects.

Applying CQRS Pattern for DbContext and Domain Classes

I’ve often used the CQRS pattern to help me get around this problem. Granted, it does mean that whatever models you’re breaking up will result in twice as many classes (although not necessarily twice as much code). Not only do I create two separate DbContexts, but quite often I’ll end up with pairs of domain classes, each focused on the relevant tasks around reading or writing.

I’ll use as my example a model that’s slightly different from the simpler one I already presented. This example comes from a sizable solution I built for a recent Pluralsight course. In the model, there’s a SalesOrder class that acts as the aggregate root in the domain. In other words, the SalesOrder type controls what happens to any of the other related types in the aggregate: it controls how LineItems are created, how discounts are calculated, how a shipping address is derived and so forth. If you think about the tasks I just mentioned, they’re focused more on order creation. You don’t really need to worry about the rules around creating a new line item for an order when you’re simply reading the order information from the database.

On the other hand, when viewing data, there may be a lot more interesting information to see than I care about when I’m just pushing data into the database.

A Model for Queried Data

Figure 3 shows the SalesOrder type in the Order.Read.Domain project of my solution. There are a lot of properties here and only a single method for creating better display data. You don’t see business rules in here because I don’t have to worry about data validation.

Figure 3 The SalesOrder Type Defined To Be Used for Data Reads

using System;
using System.Collections.Generic;

// Entity, LineItem, Address and CustomerStatus are defined elsewhere
// in the solution and are not shown here.
namespace Order.Read.Domain {
  public class SalesOrder : Entity {
    protected SalesOrder() {
      LineItems = new List<LineItem>();
    }
    public DateTime OrderDate { get; set; }
    public DateTime? DueDate { get; set; }
    public bool OnlineOrder { get; set; }
    public string PurchaseOrderNumber { get; set; }
    public string Comment { get; set; }
    public int PromotionId { get; set; }
    public Address ShippingAddress { get; set; }
    public CustomerStatus CurrentCustomerStatus { get; set; }
    public double Discount {
      get { return CustomerDiscount + PromoDiscount; }
    }
    public double CustomerDiscount { get; set; }
    public double PromoDiscount { get; set; }
    public string SalesOrderNumber { get; set; }
    public int CustomerId { get; set; }
    public double SubTotal { get; set; }
    public ICollection<LineItem> LineItems { get; set; }
    public decimal CalculateShippingCost() {
      // Items, quantity, price, discounts, total weight of item
      // This is the job of a microservice we can call out to
      throw new NotImplementedException();
    }
  }
}

Compare this to the SalesOrder in Figure 4, which I’ve defined for scenarios where I’ll store SalesOrder data to the database—whether it’s a new order or one that I’m editing. There’s a lot more business logic in this version. There’s a factory method along with a private and protected constructor that ensure that an order can’t be created without particular data being available. There are methods with logic and rules for how a new line item can be created for an order, as well as how to apply a shipping address. There’s a method to control how and when a particular set of order details can be modified.

Figure 4 The SalesOrder Type for Creating and Updating Data

using System;
using System.Collections.Generic;
using System.Linq;

namespace Order.Write.Domain {
  public class SalesOrder : Entity   {
    private readonly Customer _customer;
    private readonly List<LineItem> _lineItems;
    public static SalesOrder Create(IEnumerable<CartItem>
      cartItems, Customer customer) {
      var order = new SalesOrder(cartItems, customer);
      return order;
    }
    private SalesOrder(IEnumerable<CartItem> cartItems, Customer customer) : this() {
      Id = Guid.NewGuid();
      _customer = customer;
      CustomerId = customer.CustomerId;
      SetShippingAddress(customer.PrimaryAddress);
      ApplyCustomerStatusDiscount();
      foreach (var item in cartItems)
      {
        CreateLineItem(item.ProductId, (double) item.Price, item.Quantity);
      }
    }
    protected SalesOrder() {
      _lineItems = new List<LineItem>();
      Id = Guid.NewGuid();
      OrderDate = DateTime.Now;
    }
    public DateTime OrderDate { get; private set; }
    public DateTime? DueDate { get; private set; }
    public bool OnlineOrder { get; private set; }
    public string PurchaseOrderNumber { get; private set; }
    public string Comment { get; private set; }
    public int PromotionId { get; private set; }
    public Address ShippingAddress { get; private set; }
    public CustomerStatus CurrentCustomerStatus { get; private set; }
    public double Discount{
      get { return CustomerDiscount + PromoDiscount; }
    }
    public double CustomerDiscount { get; private set; }
    public double PromoDiscount { get; private set; }
    public string SalesOrderNumber { get; private set; }
    public int CustomerId { get; private set; }
    public double SubTotal { get; private set; }
    public ICollection<LineItem> LineItems  {
      get { return _lineItems; }
    }
    public void CreateLineItem(int productId, double listPrice, int quantity)
    {
      // NOTE: more rules to be implemented here
      var item = LineItem.Create(Id, productId, quantity, listPrice,
        CustomerDiscount + PromoDiscount);
      _lineItems.Add(item);
    }
    public void SetShippingAddress(Address address) {
      ShippingAddress = Address.Create(address.Street, address.City,
        address.StateProvince, address.PostalCode);
    }
    public bool HasLineItems(){
      return LineItems.Any();
    }
    public decimal CalculateShippingCost() {
      // Items, quantity, price, discounts, total weight of item
      // This is the job of a microservice we can call out to
      throw new NotImplementedException();
    }
    public void ApplyCustomerStatusDiscount() {
      // The guts of this method are in the sample
    }
    public void SetOrderDetails(bool onLineOrder,
      string PONumber, string comment, int promotionId, double promoDiscount){
      OnlineOrder = onLineOrder;
      PurchaseOrderNumber = PONumber;
      Comment = comment;
      PromotionId = promotionId;
      PromoDiscount = promoDiscount;
    }
  }
}

The write version of SalesOrder is more complex. But if I ever need to work on the read version, I won't have all of that extraneous write logic in my way. If you’re a fan of the guidance that readable code is code that’s less prone to errors, you may, like me, have yet another reason to prefer this separation. And surely someone like Young would think even this class has way too much logic in it. But for our purposes, this will do.

When defining the classes, the CQRS pattern lets me focus separately on the problems of populating a SalesOrder (which, in this case, are few) and the problems of building a SalesOrder. These classes do have some things in common. For example, both versions of the SalesOrder class define a relationship to the LineItem type with an ICollection<LineItem> property.

Now let’s take a look at their data models; that is, the DbContext classes I use for data access.

The OrderReadContext defines a single DbSet, which is for the SalesOrder entity:

public DbSet<SalesOrder> Orders { get; set; }

EF discovers the related LineItem type and builds the model shown in Figure 5. However, as EF requires the DbSet to be exposed, it also makes it possible for anyone to call OrderReadContext.SaveChanges. This is where layers are your friend. Andrea Saltarello provides a great way to encapsulate the DbContext so that only the DbSet is exposed and developers (or future you) using this class don’t have direct access to the OrderReadContext. This can help to avoid accidentally calling SaveChanges on the read model.

Figure 5 The Data Model Based on the OrderReadContext

A simple example of such a class is:

public class ReadModel {
  private OrderReadContext readContext = null;
  public ReadModel() {
    readContext = new OrderReadContext();
  }
  public IQueryable<SalesOrder> Orders {
    get {
      return readContext.Orders;
    }
  }
}

Another protection you can add to this implementation is to take advantage of the fact that SaveChanges is virtual. You can override SaveChanges so that it never calls the internal DbContext.SaveChanges method.
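A minimal sketch of that override might look like the following. It assumes EF6 and uses a pared-down stand-in for the read-side SalesOrder so the snippet is self-contained. Throwing an exception is my choice for illustration; quietly returning 0 would satisfy the same rule.

```csharp
using System;
using System.Data.Entity; // EF6 (EntityFramework NuGet package)

// Minimal stand-in for the read-side SalesOrder in Figure 3.
public class SalesOrder {
  public Guid Id { get; set; }
  public DateTime OrderDate { get; set; }
}

public class OrderReadContext : DbContext {
  public DbSet<SalesOrder> Orders { get; set; }

  // Never delegates to base.SaveChanges, so nothing is written
  // through this context even if a caller gets hold of it.
  public override int SaveChanges() {
    throw new InvalidOperationException(
      "OrderReadContext is read-only. Use the write context to persist changes.");
  }
}
```

EF6 also exposes SaveChangesAsync overloads, which this sketch leaves untouched. A stricter version would override those as well.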

The OrderWriteContext defines two DbSets: not just one for SalesOrder, but another for the LineItem entity:

public DbSet<SalesOrder> Orders { get; set; }
public DbSet<LineItem> LineItems { get; set; }

Already that’s interesting, as I didn’t bother exposing a DbSet for LineItems in the other DbContext. In the OrderReadContext, I’ll query only through the SalesOrders. I won’t ever query directly against the LineItems, so there’s no need to expose a DbSet for that type. Remember that in the query to populate the WPF window in Figure 2, I eager-loaded the LineItems via the Orders DbSet.

The other important logic in the OrderWriteContext is that I’ve explicitly told EF to ignore the relationship between SalesOrder and LineItem using the fluent API:

protected override void OnModelCreating(DbModelBuilder modelBuilder) {
  modelBuilder.Entity<SalesOrder>().Ignore(s => s.LineItems);
}

The resulting model looks like Figure 6.

Figure 6 The Data Model Based on the OrderWriteContext

That means I can’t use EF to navigate from SalesOrder to LineItem. It doesn’t prevent me from doing that in my business logic; as you’ve seen, I have lots of code in the SalesOrder class that interacts with LineItems. But I won’t be able to write a query that navigates through to LineItems, like context.Orders.Include(s => s.LineItems). That may raise a moment of panic until I remind you that this is the model for writing data, not for reading it. EF can retrieve related data with no problem using the OrderReadContext.

Pros and Cons of a Relationship-Free DbContext for Writes

So what have I gained by separating the writing responsibilities from the querying responsibilities? It’s easy for me to see the downsides. I have more code to maintain. More important, EF won’t magically update graphs for me. I’ll have to do more work manually to ensure that when I’m inserting, updating or deleting data, the relationships are handled properly. For example, if you have code that adds a new LineItem into a SalesOrder, simply writing myOrder.LineItems.Add(someItem) won’t trigger EF to push the orderId into the LineItem when it’s time to persist the LineItem into the database. You’ll have to explicitly set that orderId value. If you look back at the CreateLineItem method of the SalesOrder in Figure 4, you’ll see I’ve got that covered. In my system, the only way to create a new line item for an order is through that very method, which means I can’t write code elsewhere that misses that critical step of applying the orderId.

Another question you may ask is: “What if I want to change the orderId of a particular line item?” In my system, that’s an action that doesn’t make a lot of sense. I can see removing line items from orders. I can see adding line items to orders. But there’s no business rule that allows for changing the orderId. However, I can’t help thinking of these “what ifs” because I’m so used to just building these capabilities into my data model.
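To make the explicit key handling concrete, here is a rough sketch of what a write-side LineItem with a Create factory could look like. The article doesn’t show this class, so everything below is an assumption apart from the argument order, which mirrors the LineItem.Create call in Figure 4. The property names and the guard against an empty order ID are hypothetical.

```csharp
using System;

// Hypothetical write-side LineItem; only the Create argument order
// is taken from the call in Figure 4.
public class LineItem {
  // Private constructor: the factory is the only way in.
  private LineItem() { }

  public Guid Id { get; private set; }
  public Guid SalesOrderId { get; private set; } // set explicitly, never by EF
  public int ProductId { get; private set; }
  public int Quantity { get; private set; }
  public double ListPrice { get; private set; }
  public double Discount { get; private set; }

  public static LineItem Create(Guid salesOrderId, int productId,
    int quantity, double listPrice, double discount) {
    // With the relationship ignored in the write model, EF will not
    // fill in the order key, so a missing one is rejected here.
    if (salesOrderId == Guid.Empty) {
      throw new ArgumentException(
        "A line item must belong to an order.", nameof(salesOrderId));
    }
    return new LineItem {
      Id = Guid.NewGuid(),
      SalesOrderId = salesOrderId,
      ProductId = productId,
      Quantity = quantity,
      ListPrice = listPrice,
      Discount = discount
    };
  }
}
```

Because the setters are private and there’s no method for reassigning SalesOrderId, this shape also reflects the rule that a line item never moves to a different order.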

In addition to the explicit control I have over the relationships, breaking up the read and write logic also gets me thinking about all the logic I add to my data models by default, when some of that logic will never be used. And that extraneous logic may be forcing me to write workarounds to avoid its side effects.

The problems I brought up earlier about reference data being re-added to the database accidentally or null values being introduced when you’re reading data you don’t intend to update—these problems will also disappear. A class defined for reading may include values that you want to see but not update. My SalesOrder example doesn’t have this particular problem. But a write class could avoid including properties you may want to view but not update and, therefore, avoid overwriting ignored properties with null values.

Make Sure It’s Worth the Effort

CQRS can add a lot of work to your system development. Be sure to take a look at articles that provide guidance on when CQRS might just be overkill for the problem you’re solving, such as the one by Udi Dahan at bit.ly/2bIbd7i. Dino Esposito’s “CQRS for the Common Application” (msdn.com/magazine/mt147237) also provides insight. My particular use of this pattern isn’t what you might think of as full-blown CQRS, but being given “permission” to split up the reads and writes by CQRS has helped me reduce the complexity of solutions where an overreaching data model had been getting in the way. Finding a balance between writing extra code to get around side effects and writing extra code to provide cleaner, more direct paths to solving the problem takes some experience and confidence. But sometimes your instinct is the best guide.

Julie Lerman is a Microsoft MVP, .NET mentor and consultant who lives in the hills of Vermont. You can find her presenting on data access and other .NET topics at user groups and conferences around the world. She blogs at thedatafarm.com/blog and is the author of “Programming Entity Framework,” as well as a Code First and a DbContext edition, all from O’Reilly Media. Follow her on Twitter: @julielerman and see her Pluralsight courses at juliel.me/PS-Videos.

Thanks to the following technical expert for reviewing this article: Andrea Saltarello (Managed Designs) (andrea.saltarello@manageddesigns.it)
Andrea Saltarello is an entrepreneur and software architect from Milan, Italy, who still loves writing code for real projects to get feedback about his design decisions. As a trainer and speaker, he has had several speaking engagements for courses and conferences across Europe, such as TechEd Europe, DevWeek and Software Architect. He has been a Microsoft MVP since 2003 and was recently appointed a Microsoft Regional Director. He is passionate about music, and is devoted to Depeche Mode, with whom he has been in love ever since listening to “Everything Counts” for the first time.
