08/10/2015

July 2012

Volume 27 Number 07

ASP.NET - How to Handle Relational Data in a Distributed Cache

By Iqbal Khan | July 2012

The Microsoft .NET Framework has become popular for developing an array of high-transaction applications including Web, service-oriented architecture (SOA), high-performance computing or grid computing, and cloud computing. All these application architectures are scalable. But a major bottleneck occurs in the data storage—normally a relational database—which isn’t able to scale and handle the increased transaction load that the application tier can.

As a result, distributed caching is becoming quite popular because it allows you to cache data in an in-memory cache that’s much faster to access than any database. Also, distributed caching provides linear scalability across multiple cache servers. With linear scalability, you can add more cache servers to the distributed cache cluster if you need to handle a bigger transaction load. The incremental gain in transactional capacity doesn’t diminish as you add more servers (it’s a straight line in an X-Y graph where X is the number of cache servers and Y is the transaction capacity). Figure 1 shows how distributed caching fits into a typical ASP.NET or Windows Communication Foundation (WCF) application and how it provides linear scalability.

Figure 1 Distributed Cache Used in ASP.NET or Windows Communication Foundation Applications

In this article, I’ll discuss how developers should handle data relationships while caching their data.

Although distributed caching is great, one challenge it presents is how to cache relational data that has various relationships between data elements. This is because a distributed cache provides you a simple, Hashtable-like (key, value) interface where the “key” uniquely identifies a cached item and “value” is your data or object. However, your application probably has a complex domain object model with inheritance, aggregation and other types of relationships.

Additionally, a distributed cache stores individual cached items separately, which is different from the database, where you have relationships between different tables. Usually, there are no relationships between the different cached items in a typical distributed cache. That leaves .NET application developers to track much of the relationship information inside the application, so that relationships are handled correctly in the cache while the application still takes advantage of distributed caching.

I’ll explain how these relationships can be mapped and provide source code examples. The net effect is that the applications don’t have to keep track of these relationships themselves. Rather, the cache can be made aware of them and then handle them independently.

Object Relation Mapping Is Good

First, make sure you transform your relational data into a domain object model. Without this step, you’d have a difficult time handling all of the relationships in a distributed cache.

In any .NET application, you’re most likely using DataReader (SqlDataReader, OracleDataReader or OleDbDataReader) or DataTable to retrieve data from the database, which is fine. But many developers then directly access data from these objects (especially DataTable) throughout their application. Some do it out of laziness, because they don’t want to create custom domain objects. Others do it because they believe they can use intelligent filtering capabilities in the DataTable.

I strongly recommend you transform your DataReader or DataTable into a domain object model. This will greatly simplify your application design and also allow you to use distributed caching effectively. And a good distributed cache usually provides you with a SQL-like or LINQ querying capability so you won’t miss the filtering features of DataTable.

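When you use a DataTable, you’re forced to cache it as one cached item. A DataTable with a large number of rows slows down your application when you cache it in a distributed cache, because the whole table has to be serialized and sent over the network every time it’s stored or fetched.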

You can either transform DataReader and DataTable into domain objects manually or use one of the leading Object Relation Mapping (ORM) engines. The Entity Framework from Microsoft is one such engine. Another popular one is NHibernate, which is open source. An ORM tool greatly simplifies your task of transforming relational data into a domain object model.
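As a rough illustration of the manual approach, the following sketch turns the rows of a DataTable into Customer objects. The column names (CustomerId, Name, City), the int key type and the CustomerMapper helper are assumptions made for this example; they aren't taken from a real schema.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Domain object. CustomerId follows the naming used in the figures;
// Name, City and the int key type are assumptions for this sketch.
public class Customer
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
    public string City { get; set; }
}

public static class CustomerMapper
{
    // Converts each DataRow into a Customer so the rest of the
    // application never has to touch the DataTable directly.
    public static List<Customer> FromDataTable(DataTable table)
    {
        var customers = new List<Customer>(table.Rows.Count);
        foreach (DataRow row in table.Rows)
        {
            customers.Add(new Customer
            {
                CustomerId = (int)row["CustomerId"],
                Name = (string)row["Name"],
                City = (string)row["City"]
            });
        }
        return customers;
    }
}
```

Once the data lives in objects like these, each Customer can be cached under its own key instead of the whole table being cached as one item.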

CacheDependency Helps Manage Relationships

With a domain object model, the first question is how to handle all those relationships in the cache. And the answer is CacheDependency, which is part of ASP.NET Cache and is now also found in some commercial distributed caching solutions.

CacheDependency allows you to inform the cache about different types of relationships between cached items and then let the distributed cache manage data integrity for them. Basically, a CacheDependency allows you to tell the cache that one cached item depends on another cached item. Then the distributed cache can track any changes in the target item and invalidate the source item that depends on the target item.

Say data item A depends on B. If B is ever updated or removed from the cache, the distributed cache automatically removes A as well. CacheDependency also provides cascading capability: if A depends on B and B depends on C, then updating or removing C causes the distributed cache to remove B automatically, and removing B in turn causes A to be removed automatically. This is called cascading CacheDependency.
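To make the cascading behavior concrete, here is a minimal sketch of a toy in-memory cache with key-based dependencies. TinyCache and its Put, Contains and Remove methods are hypothetical names invented for this illustration. They aren't the ASP.NET Cache or any commercial product's API, and a real cache tracks dependencies far more efficiently than this linear scan.

```csharp
using System;
using System.Collections.Generic;

// Toy cache for illustration only (hypothetical, not a real product API).
public class TinyCache
{
    private class Entry
    {
        public object Value;
        public string[] DependsOn;   // keys this item depends on
    }

    private readonly Dictionary<string, Entry> items = new Dictionary<string, Entry>();

    // Adds or updates an item. An update counts as a change, so anything
    // that depended on the old value is invalidated first.
    public void Put(string key, object value, params string[] dependsOn)
    {
        Remove(key);
        items[key] = new Entry { Value = value, DependsOn = dependsOn };
    }

    public bool Contains(string key)
    {
        return items.ContainsKey(key);
    }

    // Removes an item and, recursively, every item that depends on it.
    public void Remove(string key)
    {
        if (!items.Remove(key)) return;

        // Collect the dependents first, then remove them, so the
        // dictionary isn't modified while it's being enumerated.
        var dependents = new List<string>();
        foreach (KeyValuePair<string, Entry> pair in items)
        {
            if (Array.IndexOf(pair.Value.DependsOn, key) >= 0)
                dependents.Add(pair.Key);
        }
        foreach (string dependent in dependents)
            Remove(dependent);
    }
}
```

With items cached as C, then B depending on C, then A depending on B, a later update of C removes B and, through B, also A.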

I use CacheDependency later in this article to demonstrate how to handle relationships in a distributed cache.

Three Different Types of Relationships

First of all, let’s use an example data model for discussion purposes, shown in Figure 2.

Figure 2 Example Data Model for Relationships

As you can see, in this data model, Customer has a one-to-many relationship with Order; Product has a one-to-many relationship with Order; and Customer and Product have a many-to-many relationship with each other through the Order table. For our example data model, Figure 3 shows the equivalent object model that represents the same relationships.

Figure 3 Example Object Model Against the Data Model

Before I go into the details of different relationship types, I want to explain one thing. Unlike the database, an application domain object model always has a primary object that the application has fetched from the database and that is therefore the starting point for the application during a particular transaction. All other objects are seen as related to this primary object. This concept is valid for all types of relationships and affects how you see relationships in a domain object model. Now, let’s proceed.

One-to-One and Many-to-One Relationships

One-to-one and many-to-one relationships are similar. In a one-to-one relationship between table A and table B, one row in table A is related to only one row in table B. You keep a foreign key in either table A or B that is the primary key of the other table.

In the case of a many-to-one relationship between A and B, you must keep the foreign key in table A. This foreign key is the primary key of table B. In a one-to-one relationship, the foreign key has a unique constraint to make sure there are no duplicates; in a many-to-one relationship it doesn’t, because many rows in table A can reference the same row in table B.

The same relationship can easily be transformed into a domain object model. The primary object (explained earlier) keeps a reference to the related object. Figure 4 shows an example of the primary object keeping relationship information. Note that the Order class contains a reference to the Customer class to indicate a many-to-one relationship. The same would happen even in a one-to-one relationship.

Figure 4 Caching a Related Object Separately

// Requires: using System; using System.Web; using System.Web.Caching;
public void CacheOrder(Order order)
{
  Cache cache = HttpRuntime.Cache;
  DateTime absoluteExpiration = Cache.NoAbsoluteExpiration;
  TimeSpan slidingExpiration = Cache.NoSlidingExpiration;
  CacheItemPriority priority = CacheItemPriority.Default;
  if (order != null) {
    // Hold on to the Customer so it can be cached as its own item
    Customer cust = order.OrderingCustomer;
    // Set orders to null so it doesn't get cached with Customer
    cust.Orders = null;
    string custKey = "Customer:CustomerId:" + cust.CustomerId;
    cache.Add(custKey, cust, null,
              absoluteExpiration,
              slidingExpiration,
              priority, null);
    // Dependency ensures order is removed if Cust updated/removed
    string[] keys = new string[1];
    keys[0] = custKey;
    CacheDependency dep = new CacheDependency(null, keys);
    string orderKey = "Order:CustomerId:" + order.CustomerId
      + ":ProductId:" + order.ProductId;
    // Detach the Customer so it isn't cached as part of the Order
    order.OrderingCustomer = null;
    // This will only cache Order object
    cache.Add(orderKey, order, dep,
              absoluteExpiration,
              slidingExpiration,
              priority, null);
  }
}

One-to-Many Relationships (Reverse of Many-to-One)

In the case of a one-to-many relationship in the database between tables A and B, table B (meaning the “many side”) keeps the foreign key, which is actually the primary key of table A, but without a unique constraint on the foreign key.

In the case of a one-to-many domain object model, the primary object is the Customer and the related object is the Order. So the Customer object contains a collection of Order objects. Figure 4 also shows an example of this relationship between the Customer and Order objects.

Many-to-Many Relationships

In the case of a many-to-many relationship between tables A and B, there’s always an intermediary table AB. In our case, Order is the intermediary table and has two foreign keys. One is against the Customer table to represent a many-to-one relationship and the other is against the Product table to again represent a many-to-one relationship.

In the case of a many-to-many relationship, the object model is usually one-to-many from the perspective of a primary object (defined earlier) that’s either Customer or Product. The object model also now contains a many-to-one relationship between the intermediary object (Order, in our case) and the other object (Product, in our case). Figure 4 also shows an example of a many-to-many relationship, which in this case is between Customer and Product objects, with the Order object being an intermediary object.

As you can see, the object model is from the perspective of the Customer object that’s the primary object, and the application fetched it from the database. All related objects are from the perspective of this primary object. So a Customer object will have a collection of Order objects, and each Order object will contain a reference to the Product object. The Product object probably won’t contain a collection of all Order objects belonging to the Product object because that isn’t needed here. If Product were the primary object, it would have a collection of Order objects—but then the Customer object wouldn’t have a collection of Order objects.
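The object model described above, seen from the Customer as the primary object, could be sketched roughly as follows. CustomerId, ProductId, Orders and OrderingCustomer follow the names used in the figures. The OrderedProduct property, the Name property, the OrderedProducts method and the int key types are assumptions added for this illustration.

```csharp
using System;
using System.Collections.Generic;

public class Product
{
    public int ProductId { get; set; }
    public string Name { get; set; }
    // No collection of Orders here: Product isn't the primary object.
}

// Intermediary object between Customer and Product
public class Order
{
    public int CustomerId { get; set; }
    public int ProductId { get; set; }
    public Customer OrderingCustomer { get; set; }  // many-to-one
    public Product OrderedProduct { get; set; }     // many-to-one (assumed name)
}

// Primary object
public class Customer
{
    public Customer()
    {
        Orders = new List<Order>();
    }

    public int CustomerId { get; set; }
    public IList<Order> Orders { get; set; }        // one-to-many

    // The many-to-many side is reached through the intermediary Orders.
    public IEnumerable<Product> OrderedProducts()
    {
        foreach (Order order in Orders)
            yield return order.OrderedProduct;
    }
}
```

If Product were the primary object instead, the Orders collection would move to Product and Customer would only be reachable through each Order.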

Caching Strategies for Different Relationship Types

So far, I’ve discussed how to fetch data from the database, transform it into a domain object model and keep the same relationships in the domain object model as in the database—although from the perspective of the primary object. But if your application wants to cache data in a distributed cache, you must understand how to handle all these relationships in the cache. I’ll go through each case.

In all cases, you must ensure that relationship information in the cache isn’t lost, impacting either data integrity or your ability to fetch related objects from the cache later.

Caching One-to-One and Many-to-One Relationships

You have two options here:

Cache related objects with the primary object: This option assumes that the related object isn’t going to be modified by another user while it’s in the cache, so it’s safe to cache it as part of the primary object as one cached item. If you fear this isn’t the case, then don’t use this option.
Cache related objects separately: This option assumes that the related object might be updated by another user while it’s in the cache, so it’s best to cache the primary and related objects as separate cached items. You should specify unique cache keys for each of the two objects. Additionally, you can use the tag feature of a distributed cache to tag the related object as having a relationship to the primary object. Then you could fetch it later through the tag.
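When related objects are cached separately, the primary object only needs enough information to rebuild the related object’s cache key. The sketch below assumes the key formats used in Figure 4 and uses a plain IDictionary as a stand-in for the cache client. CacheKeys and RelatedLookup are hypothetical helper names.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public int CustomerId { get; set; }
}

public class Order
{
    public int CustomerId { get; set; }
    public int ProductId { get; set; }
}

// Keeps key construction in one place so every caller builds identical keys.
public static class CacheKeys
{
    public static string ForCustomer(int customerId)
    {
        return "Customer:CustomerId:" + customerId;
    }

    public static string ForOrder(int customerId, int productId)
    {
        return "Order:CustomerId:" + customerId + ":ProductId:" + productId;
    }
}

public static class RelatedLookup
{
    // Finds the separately cached Customer for an Order.
    // Returns null on a cache miss; the caller then goes to the database.
    public static Customer FindCustomer(IDictionary<string, object> cache, Order order)
    {
        object cached;
        cache.TryGetValue(CacheKeys.ForCustomer(order.CustomerId), out cached);
        return cached as Customer;
    }
}
```

Centralizing the key format this way also guards against subtle mismatches, such as a missing colon, that would turn every lookup into a cache miss.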

Caching One-to-Many Relationships

In the case of one-to-many relationships, your primary object is always on the “one side” (in our example it’s the Customer object). The primary object contains a collection of Order objects. Each collection of related objects represents a one-to-many relationship. Here you have three caching options:

Cache related objects collections with the primary object: This of course assumes that related objects won’t be updated or fetched independently by another user while in the cache, so it’s safe to cache them as part of the primary object. Doing so improves your performance because you can fetch everything in one cache call. However, if the collection is large (meaning tens of thousands of objects and reaching into megabytes of data), you won’t get the performance gain.
Cache related objects collections separately: In this situation, you believe other users might want to fetch the same collections from the cache; therefore, it makes sense to cache the collection of related objects separately. You should structure your cache key in such a way that you’ll be able to find this collection based on some information about your primary object. I’ll discuss the issue of caching collections in much more detail later in this article.
Cache all individual related objects from collections separately: In this situation, you believe each individual object in the related collection might be updated by other users; therefore, you can’t keep an object in the collection and must cache it separately. You can use the tag feature of a distributed cache to identify all objects as related to your primary object so you’ll be able to fetch them quickly later on.
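The tag feature mentioned in the last option can be pictured as a second index from tag to cache keys. The following is only a sketch of that idea: TaggedCache and its methods are hypothetical, and real distributed caches expose tagging through their own APIs.

```csharp
using System;
using System.Collections.Generic;

// Toy cache with tags, for illustration only (hypothetical API).
public class TaggedCache
{
    private readonly Dictionary<string, object> items =
        new Dictionary<string, object>();
    private readonly Dictionary<string, HashSet<string>> keysByTag =
        new Dictionary<string, HashSet<string>>();

    // Stores an item and records its key under each tag.
    public void Add(string key, object value, params string[] tags)
    {
        Remove(key);   // drop any old value together with its old tags
        items[key] = value;
        foreach (string tag in tags)
        {
            HashSet<string> keys;
            if (!keysByTag.TryGetValue(tag, out keys))
            {
                keys = new HashSet<string>();
                keysByTag[tag] = keys;
            }
            keys.Add(key);
        }
    }

    public void Remove(string key)
    {
        items.Remove(key);
        foreach (HashSet<string> keys in keysByTag.Values)
            keys.Remove(key);
    }

    // Returns every cached item carrying the given tag.
    public List<object> GetByTag(string tag)
    {
        var result = new List<object>();
        HashSet<string> keys;
        if (keysByTag.TryGetValue(tag, out keys))
        {
            foreach (string key in keys)
                result.Add(items[key]);
        }
        return result;
    }
}
```

Tagging each Order with something like "customerId42" lets the application fetch all of a customer’s orders in one call while each Order can still be updated or removed on its own.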

Caching Many-to-Many Relationships

Many-to-many relationships in a domain object model don’t really exist. Instead, they’re represented by one-to-many relationships, with the exception that the intermediary object (Order, in our case) contains a reference to the other side object (Product, in our case). In a pure one-to-many, this reference wouldn’t exist.

Handling Collections

The topic of how to handle collections is an interesting one, because you often fetch a collection of objects from the database and you want to be able to cache collections in an efficient way.

Let’s say you request all your New York-based customers and you don’t expect any new New York customers to be added in the next few minutes, or even in the next day. Keep in mind that you usually cache data for a few minutes or a few hours, not for weeks or months, although in some situations you might cache it for many days.

There are different caching strategies for caching collections, as I’ll explain.

Cache an Entire Collection as One Item

In this case, you know you won’t be adding any more customers from New York and that other users aren’t accessing and modifying any New York customer data during the period in which the data will be cached. Therefore, you can cache the entire collection of New York customers as one cached item. Here, you can make the search criteria or the SQL query part of the cache key. Any time you want to fetch customers who are from New York, you just go to the cache and say, “Give me the collection that contains New York customers.”
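A minimal sketch of making the search criteria part of the cache key might look like this. The "Customers:City:" key format, the CustomerCollectionCache class and the loadFromDatabase delegate are assumptions for illustration, and a Dictionary stands in for the distributed cache client.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public int CustomerId { get; set; }
    public string City { get; set; }
}

public class CustomerCollectionCache
{
    // Stand-in for a distributed cache client (assumption for this sketch)
    private readonly Dictionary<string, IList<Customer>> cache =
        new Dictionary<string, IList<Customer>>();

    // The search criterion (the city) becomes part of the cache key.
    public static string KeyForCity(string city)
    {
        return "Customers:City:" + city;
    }

    // Returns the cached collection, loading it from the database
    // only when it isn't in the cache yet.
    public IList<Customer> GetByCity(
        string city, Func<string, IList<Customer>> loadFromDatabase)
    {
        string key = KeyForCity(city);
        IList<Customer> cached;
        if (cache.TryGetValue(key, out cached))
            return cached;

        IList<Customer> customers = loadFromDatabase(city);
        cache[key] = customers;   // the whole collection is one cached item
        return customers;
    }
}
```

Every request for the same city resolves to the same key, so only the first request pays for the database query.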

Figure 5 shows how an entire collection of related Order objects is cached.

Figure 5 Caching a Related Collection Separately

// Requires: using System; using System.Collections.Generic;
//           using System.Web; using System.Web.Caching;
public void CacheCustomer(Customer cust)
{
  Cache cache = HttpRuntime.Cache;
  DateTime absoluteExpiration = Cache.NoAbsoluteExpiration;
  TimeSpan slidingExpiration = Cache.NoSlidingExpiration;
  CacheItemPriority priority = CacheItemPriority.Default;
  if (cust != null)
  {
    string key = "Customer:CustomerId:" + cust.CustomerId;
    // Let's preserve it to cache separately
    IList<Order> orderList = cust.Orders;
    // So it doesn't get cached as part of Customer
    cust.Orders = null;
    // This will only cache Customer object
    cache.Add(key, cust, null,
              absoluteExpiration,
              slidingExpiration,
              priority, null);
    // See that this key is also Customer based
    key = "Customer:CustomerId:" + cust.CustomerId + ":Orders";
    cache.Add(key, orderList, null,
              absoluteExpiration,
              slidingExpiration,
              priority, null);
  }
}

Cache Each Collection Item Separately

Now let’s move on to the second strategy. In this scenario, you or other users want to individually fetch and modify a New York customer. But the previous strategy would require everybody to fetch the entire collection from the cache, modify this one customer, put it back in the collection and cache the collection again. If you do this frequently enough, it will become impractical for performance reasons.

So, in this instance, you don’t keep the collection of all the New York customers as one cached item. You break up the collection and store each Customer object separately in the cache. You need to group all these Customer objects so at a later point you can just fetch them all back in one call as either a collection or an IDictionary. The benefit of this approach is being able to fetch and modify individual Customer objects. Figure 6 shows an example of how to cache each of the related objects in the collection separately.

Figure 6 Caching Each Collection Item Separately

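// Requires: using System; using System.Collections.Generic;
//           using System.Web; using System.Web.Caching;
// Note: Tag and the Add overload that takes a tag list are not part of
// System.Web.Caching.Cache. This listing assumes a distributed cache
// client that offers an ASP.NET-style API extended with tags.
public void CacheOrdersListItems(IList<Order> ordersList)
{
  Cache cache = HttpRuntime.Cache;
  DateTime absoluteExpiration = Cache.NoAbsoluteExpiration;
  TimeSpan slidingExpiration = Cache.NoSlidingExpiration;
  CacheItemPriority priority = CacheItemPriority.Default;
  foreach (Order order in ordersList)
  {
    string key = "Order:CustomerId:" + order.CustomerId
      + ":ProductId:" + order.ProductId;
    // Dependency ensures Order is removed if Cust updated/removed,
    // so it has to point at the Customer's cache key
    string[] keys = new string[1];
    keys[0] = "Customer:CustomerId:" + order.CustomerId;
    CacheDependency dep = new CacheDependency(null, keys);
    Tag[] tagList = new Tag[1];
    tagList[0] = new Tag("customerId" + order.CustomerId);
    // Tag allows us to find order with a customerId alone
    cache.Add(key, order, dep,
              absoluteExpiration,
              slidingExpiration,
              priority, null, tagList);
  }
}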

Keep in mind, however, that this strategy assumes no New York customers have been added to the database in the period in which they’ve been cached. Otherwise, when you fetch all the New York customers from the cache, you’ll get only a partial list. Similarly, if a customer is deleted from the database but not from the cache, you’ll get a stale list of customers.

Handling Collections Where Objects Are Added or Deleted

The third option for handling collections is where you think new customers may be added from New York or some existing customer may be deleted. In this case, whatever you cache is only partial data or old data. Perhaps the collection had only 100 customers and you added two more today. Those two won’t be part of the cache. However, when you fetch all the customers from New York, you want the correct data: You want 102 results, not 100.

The way to ensure this happens is to make a call to the database. Get all the IDs for the customers and do a comparison to see which ones are in the cache and which ones aren’t. Individually fetch from the database those that aren’t in the cache and add them to the cache. As you can see, this isn’t a fast process; you’re making multiple database calls and multiple cache calls. With a distributed cache that provides basic features, if you have a collection of 1,000 customers and 50 new ones are added, you’ll end up making 51 database calls and 101 distributed cache calls. In this case, it might be faster just to fetch the collection from the database in one call.

But if the distributed cache provides bulk operations, you’d make one database call to fetch IDs, one distributed cache call to see which IDs exist in the cache, one call to add all the new customers to the cache and one call to fetch the entire collection of customers from the cache. This would be a total of one database call and three distributed cache calls (plus one bulk database query to load the new customers themselves, when there are any), and that isn’t bad at all. And, if no new customers were added (which would be the case 90 percent of the time), this would be reduced to one database call and two distributed cache calls.
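The bulk approach can be sketched as follows. CountingBulkCache is an in-memory stand-in whose ContainsBulk, AddBulk and GetBulk methods are hypothetical names, not a specific product’s API, and the two delegates represent database queries that the application would supply. The sketch counts cache calls so the totals can be checked. Note that it treats the bulk query that loads the missing customers as a database call of its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int CustomerId { get; set; }
    public string City { get; set; }
}

// In-memory stand-in for a cache client with bulk operations.
// Method names are hypothetical; Calls counts cache round trips.
public class CountingBulkCache
{
    private readonly Dictionary<string, Customer> items =
        new Dictionary<string, Customer>();

    public int Calls { get; private set; }

    public ISet<string> ContainsBulk(IEnumerable<string> keys)
    {
        Calls++;
        return new HashSet<string>(keys.Where(k => items.ContainsKey(k)));
    }

    public void AddBulk(IEnumerable<Customer> customers)
    {
        Calls++;
        foreach (Customer c in customers)
            items[CollectionRefresher.Key(c.CustomerId)] = c;
    }

    public IList<Customer> GetBulk(IEnumerable<string> keys)
    {
        Calls++;
        return keys.Where(k => items.ContainsKey(k))
                   .Select(k => items[k]).ToList();
    }
}

public static class CollectionRefresher
{
    public static string Key(int customerId)
    {
        return "Customer:CustomerId:" + customerId;
    }

    public static IList<Customer> GetCustomers(
        string city,
        Func<string, IList<int>> getIdsFromDatabase,
        Func<IList<int>, IList<Customer>> getCustomersFromDatabase,
        CountingBulkCache cache)
    {
        // Database call: the current list of IDs is the source of truth.
        IList<int> ids = getIdsFromDatabase(city);
        List<string> keys = ids.Select(id => Key(id)).ToList();

        // Cache call: which of those IDs are already cached?
        ISet<string> present = cache.ContainsBulk(keys);
        List<int> missing = ids.Where(id => !present.Contains(Key(id))).ToList();

        // Only when something is missing: one bulk database query
        // and one bulk cache add.
        if (missing.Count > 0)
            cache.AddBulk(getCustomersFromDatabase(missing));

        // Cache call: fetch the whole collection in one round trip.
        return cache.GetBulk(keys);
    }
}
```

Because the result is driven by the IDs that come back from the database, customers deleted from the database drop out of the returned collection even if their cached copies haven’t expired yet.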

Performance and Scalability

I’ve covered various situations that arise when your application fetches relational data from the database, transforms it into a domain object model and then wants to cache the objects. The purpose of this article is to highlight the areas where object relationships affect how you cache the objects and how you later modify or remove them from the cache.

Distributed caching is a great way to improve your application’s performance and scalability. And, because applications deal with relational data most of the time, I hope this article has provided you with some insight into how you should handle relational data in a distributed cache.

Iqbal Khan is the president and technology evangelist of Alachisoft, which provides NCache and StorageEdge (alachisoft.com). NCache is a distributed cache for .NET and Java and improves application performance and scalability. StorageEdge is an RBS provider for SharePoint and helps optimize storage and performance of SharePoint. Khan received his master’s degree in computer science from Indiana University, Bloomington, in 1990. Reach him at iqbal@alachisoft.com.

Thanks to the following technical expert for reviewing this article: Damian Edwards