Use Cache, Save Cash
Telligent Systems, Inc.
October 24, 2004
Summay: Learn how to use database cache invalidation in ASP.NET 1.1 with a similar technique as the one used by ASP.NET 2.0. (8 printed pages)
Ok, the title is a little tongue-in-cheek, but it got your attention right? The ASP.NET Cache is by far one of my favorite ASP.NET features. Why? Because by using the Cache you can gain some fantastic performance and scalability results, which can be easily measured and translated to real dollar savings for your application. This makes you the favorite code-monkey of your CTO because you impacted the all-important return on investment (ROI) of the application. In other words, the Cache does save cash!
At a high-level, in ASP.NET 1.1, the Cache is implemented as a Hashtable supporting the concepts of a least recently used (LRU) algorithm to ensure that if memory is needed, items can be removed from the Cache—a set of programming interfaces for inserts and removing items from the Cache—and finally the notion of dependencies, supporting time, file, and key dependencies.
The dependency model is one of the more important features of the Cache because it allows statements such as:
- At this point in time, this item in the Cache will no longer be valid (time based dependency).
- If this Cache entry changes, this other Cache entry is also invalid (key based dependency).
- If this file changes, this item in Cache is no longer valid (file based dependency).
When programming with the Cache, you always check to see if the item exists before using it. Thus, when working with the Cache it's a good idea to follow the pattern below:
' #1 – Get a reference to the object Dim myCachedArrayList As ArrayList = Cache("MyArrayList") ' #2 – Check if the reference is null If (myCachedArrayList) Is Nothing Then ' Add items to array list myCachedArrayList = PopulateMyArrayList() ' Cache Cache.Insert("MyArrayList", myCachedArrayList) End If ' #3 – Return the data Return myCachedArrayList
Looks good right? We have a flexible model for storing data in memory that can be invalidated under certain conditions. There are a couple shortcomings that ASP.NET developers have frequently requested be addressed. One is database cache invalidation, which means when this data in the database changes, remove this item from the Cache. The second is the ability to create your own cache dependencies.
The good news is that in ASP.NET 2.0, previously code-named "Whidbey," both of these shortcomings are fixed. There are now two database cache dependency models—one for Microsoft SQL Server 7 and 2000, and another for SQL Server 2005, previously code-named "Yukon." The second big change is that the CacheDependency class is unsealed and replumbed to allow you to write your own dependency rules for the Cache.
The bad news is ASP.NET 2.0 isn't available yet for production use. However, we can build a similar database cache invalidation system today with ASP.NET 1.1. In fact, while I was on the ASP.NET team this is the technique I used to prototype the database cache invalidation used for Microsoft SQL Server 7 and SQL Server 2000 in early 2002.
The two database cache invalidation options available in ASP.NET 2.0 are quite different. The option for SQL Server 2000 or SQL Server 7.0 is restricted to what is being called table level notification. Only notifications to actions performed on a SQL table are raised. For example, UPDATE, INSERT, and DELETE actions on a table. Whereas in SQL Server 2005, notifications can be received from changes to the results of dynamic SQL, stored procedures, views, and simple table level invalidations.
Unfortunately, the database cache dependency model used by SQL Server 2005 cannot be replicated with ASP.NET 1.1 as there are new features built directly into the database to support notification not available in prior versions of SQL Server.
Database Cache Invalidation: What You Need to Know
There are several techniques in use today for accomplishing database Cache invalidation. The first of which I made available nearly four years ago, and used an extended stored procedure technique to notify ASP.NET when data in the database changed. This prototype was the ASP.NET team's first investigation into how we could implement database cache invalidation. If you'd like to read about this technique in more detail, you can read the article at http://www.dotnetjunkies.com/Tutorial/A4ED5FD6-D385-4475-A357-27CF43A78205.dcik
Another technique that many people have used is to have SQL Server touch an external file (file dependency) when data in the database changes and the Cache needs to be invalidated.
These techniques demo very well, make for great articles, and work for small applications. However, I would strongly recommend avoiding these techniques if you are building large complex applications. Case in point, we wouldn't use these techniques for our own applications, such as the Forums running at www.asp.net/forums or the next version we're building called Community Server (www.communityserver.org).
Let's briefly discuss why these existing techniques are flawed, starting with the extended stored procedure model.
Extended Stored Procedure HTTP Push Model
The extended stored procedure push model uses the following architecture:
- A custom class that is used to add an item to the Cache and make it dependent upon a database notification.
- An HttpHandler that can receive notifications and invalidate items from the Cache.
- A trigger on the table in the database that is being monitored for changes.
- A table in the database tracking the tables being monitored for changes and the applications requiring notification when data changes.
- An extended stored procedure used to call an application's HttpHandler to notify it of a change.
When a change is detected, using whatever logic is necessary in a stored procedure or trigger, a change notification stored procedure is called. This change notification procedure retrieves a list of Web applications to be notified of the change, and then for each application calls an extended stored procedure that makes an HTTP call to the Web application instructing it to remove a specific cache entry, thus pushing the change to the application. The Web application simply receives an HTTP request that contains a cache key that needs to be removed. Internally the application simply calls Cache.Remove() with the cache key sent.
Sounds great, right? Yes, and it actually works very well. But here are the downsides:
- The extended stored procedure makes an HTTP callback to the server that Cache needs to be invalidated. The stored procedure is executed as part of the ATOMIC operation that changed the data within the database. It is called from a trigger when the table is modified. Unlike other protocols, HTTP isn't fire and forgot, instead any given request also expects a response. Thus, the extended stored procedure can't complete until the HTTP call completes. If the Web server is on a slow network or takes a long time to respond, this delay blocks the database operation from completing. The blocking further causes SQL Server to potentially serialize other UPDATESs/DELETEs/INSERTs or worse, block threads from completing. Now, multiply this by the number of servers in your server farm and the total number of database operations.
- If the Web server is running in Web garden mode, where multiple process are created to simulate virtual Web servers, there is no way to instruct a given request to be assigned to a particular process. In other words, when running in Web garden mode the server is running many virtual Web servers for your application. When the extended stored procedure calls back to your server to notify it a change has occurred, it has no way of ensuring that all the virtual Web servers get notified. You can potentially end up with one application being updated, but several others remaining out of sync.
If you are running in a small server environment, not running the Web servers in Web garden mode, and SQL Server is not a contentious resource in your system. The extended stored procedure HTTP push model works great, but if you get into a situation where you're application suddenly grows, you may bottleneck your database or get into a situation where your Cache is not always in sync.
File Update Model
The second technique, and definitely the simpler of the two, updates a file when data changes versus attempting to use HTTP to callback to the application. The ASP.NET application developer uses the standard Cache file change dependency to monitor the file for changes, when the file changes the cached item is removed.
This technique does not have the Web garden problems associated with the extended stored procedure push model, but it does share many of the same blocking issues associated with the extended stored procedure technique. Additionally, it introduces its own idiosyncrasies:
- File contention becomes an issue when using this technique because the SQL Server can only update the 'change' file when the file is not locked by another operation. Thus, all changes in the database that need to be notified of changes must coordinate locking this file, changing some value, and then unlocking the file. In other words, the database serializes work against the file. Again, the problem faced here is the serialization and blocking that can occur on SQL Server.
- File change notification also becomes an issue when using this technique because to be used in a Web farm the change file must be placed on a share and the proper security permissions granted for the various Web servers to view this share and monitor the file for changes.
The file update model works well for small server environments, supports Web gardens, and cases where SQL Server is not a contentious resource in your system. In fact, it's probably a better choice because it is much simpler than the extended stored procedure solution. This model breaks down, however, in larger server environments or environments where the database is already a gating resource in the system.
As you can see, both techniques have applicability, but you need to make good decisions as to whether or not you can use these techniques based on server size and load. Any time you introduce known blocking operations into your application, you are adding a potential bottleneck to scalability and performance.
Database Cache Invalidation: ASP.NET 2.0 Style
The original goal of the extended stored procedure cache invalidation model was to begin some early prototyping of the database cache invalidation problem. The ASP.NET team knew this functionality was something they wanted to solve in version 2.0, but needed to better understand how this could be built in a scalable manner.
In fact, as a company Microsoft knew how important this was and put together a team from ASP.NET, IIS, SQL Server, ADO.NET, and ISA server to work on this problem, also known as The Caching Taskforce.
Note My blog posting of the Bill Gates meeting was specifically to walk him through the work we had done in the ASP.NET/Yukon implementation. You can read the article at http://weblogs.asp.net/rhoward/archive/2003/04/28/6128.aspx.
The result of this caching task force was two new architectures for database cache invalidation. The first was designed into the system and was simply part of ASP.NET, ADO.NET, and SQL Server 2005. It built a scalable model for push notifications at a super-granular level. For example, tell me when this results of this stored procedure change. The SQL Server 2005 implementation of cache invalidation cannot be mirrored or implemented in version 1.1 of the .NET Framework. We'll save the discussion for how this system works for a later article. The second technique was created to support database cache invalidation for SQL Server 7.0 and SQL Server 2000. Obviously nothing could be added to the database, so we had to work in the constraints of today's technologies. The good news is the exact same technique used for SQL Server 7.0 and SQL Server 2000 support can be accomplished today with ASP.NET 1.1.
Are We There Yet?
We've all seen the ads on TV, or experienced it first hand if you have kids, where the kid in a car is going to some eagerly anticipated destination constantly asking if they have arrived yet. They are polling until they receive the desired response.
Similarly, the database cache invalidation technique used for SQL Server 7.0 and SQL Server 2000 polls the database to check for changes. The automatic reaction here is usually negative—isn't polling a bad thing? However, once you understand more of what is happening, the simplicity and scalability of the design quickly emerges.
Two large problems must be overcome for any database cache invalidation system:
- Prevent blocking on the database: It's super important for the database to be as fast and efficient as possible, thus avoiding blocking or serializing during data changes is a must. First, the purpose of the database is to effectively manage and allow access to the data.
- Ensure Cache consistency: A solution that can only guarantee notification to a single virtual server in a Web server running in Web garden mode is useless. All applications must be capable of receiving notifications when data changes.
The premise upon which the polling model is built is this—the cost of polling the database is substantially less than the cost of re-executing the original query. Furthermore, the polling should not happen on a request thread, but rather happen as a background operation.
A good scenario here is Community Server. Community Server is a complex application and uses many normalized tables to join related data together to fulfill requests. A common task is to retrieve paged sets of data through a stored procedure to display a paged view of threads in a particular forum. The stored procedure that fulfills a paged thread view requests executes a series of select statements joining from 3-5 tables, creates a temporary table, selects out of the temporary table, and performs another join. In other words, the data transformation that must occur to satisfy a request is quite an operation.
With the database cache invalidation model used in ASP.NET 2.0, which we'll implement shortly, a change notification table is created in the database. The originally requested data is cached at the application level, where it can be quickly retrieved, and every few seconds the application layer polls the database to determine if the data has changed. Unlike the original request, the poll is accessing a single table whose row depth is no greater than the number of tables in the database. The poll operation retrieves all the records from this change notification table. In all likelihood, this table is so small that the entire results can be paged to memory in SQL and quickly accessed.
The results of this change notification table are then analyzed within the application layer. The resultset is a list of change ids for each table in the database. If the change id values are different than the change ids currently cached, the related cached entries are invalidated and upon the next request the full look-up operation is used (because the data is not in the cache) and the cache repopulated and the process starts over again. Figure 1 shows how this architecture works.
Figure 1. Architecture diagram
A very, very important aspect employed by the polling is that the polling occurs on a background thread apart from the thread executing the request. Thus, if the results can be served from cache, the database is never accessed when the request is made. However, once a change is detected the entries are removed from the Cache and the next request executes as normal and repopulates the Cache starting the system all over again, as shown in Figure 2.
Figure 2. Architecture diagram after change
To accomplish polling on a background thread we'll take advantage of one of my favorite and little known classes in the .NET Framework—Timer. The Timer class can be found in the System.Threading namespace. What the Timer allows you to do is create an event that is raised periodically, by a programmatically determined interval. When the Timer is awakened, it grabs a thread from the thread pool of the current application domain and raises an event. It is within this event that our code can periodically run to validate changes, or lack of changes, in the database.
The Background Service Class
We use this same technique in Community Server to send e-mails or index posts on a preset interval of time versus on each request. This has saved us a lot of time spent in requests that were adding new posts, as previously we did these operations every time a new post was added.
In Community Server, we use the Timer as a static instance inside of an HttpModule. When the ASP.NET application is initialized, the static timer is instantiated and the poll internal period. When the poll event gets raised we:
- Execute the necessary SQL to pull from the database a list of change ids.
- Compare the change ids from the database to corresponding values stored in the ASP.NET Cache. Values in the Cache that don't match are updated, which force dependent items to be removed from the Cache.
You can download a fully-working sample of database cache invalidation from the Blackbelt Slides and Demos from my Microsoft Tech-Ed 2004 presentation at http://www.rob-howard.net/.
This particular piece of sample code works for the Northwind sample database that ships with SQL Server. Before using it, you'll also need to make a few changes to the database.
First, you need to add the ChangeNotification table:
CREATE TABLE [dbo].[ChangeNotification] ( [Table] [nvarchar] (256) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [ChangeID] [int] NOT NULL )
Second, you need to add a trigger to the Products table:
CREATE TRIGGER [ChangeNotificationTrigger] ON [dbo].[Products] FOR INSERT, UPDATE, DELETE AS UPDATE ChangeNotification SET ChangeID = ChangeID + 1 WHERE [Table] = 'Products'
The trigger is applied whenever the Products table is modified and, in turn, updates a row in the ChangeNotification table.
While this is an overly simplistic example, it should provide a good starting point if you want to implement this for your own application. Some of the shortcomings not addressed in this sample include:
- Table level changes only: Modifying this for views or even row-level changes is not terribly difficult.
- Row locking: Update logic needs to be added to account for large modifications to the Products table. For example, updating 100 products would result in 100 changes to the ChangeNotification table. Similar to the version of this that ships with ASP.NET 2.0, you'll likely want to add some logic to better deal with large updates.
The caching system of ASP.NET is something that all ASP.NET developers should strive to use. Planning early for using the Cache and understanding what aspects of your application greatly aid in making the best use of the Cache. Several of the ASP.NET 1.1 limitations, such as database cache invalidation, can be overcome using some of the same techniques used in ASP.NET 2.0. The use of the Timer class in this article shows one possible way of accomplishing this and while there are several other options, this polling technique is recommended. I think you'll also find the Timer class to be useful for other problems as well.
- Microsoft ASP.NET Coding Strategies with the Microsoft ASP.NET Team
- ASP.NET v. 2.0-The Beta Version by Alex Homer, Dave Sussman, and Rob Howard
- Introducing Microsoft ASP.NET 2.0
Rob Howard is the founder of Telligent Systems, a Microsoft .NET best practices software company that specializes in ASP.NET and collaboration technologies. He is also a Microsoft Regional Director for Dallas, TX. Prior to founding Telligent Rob was a member of the Microsoft ASP.NET team where he helped design many features found in ASP.NET 1.0, 1.1, and 2.0. You can learn more about Telligent at www.telligentsystems.com and Community Server at www.communityserver.org. Rob's email address is email@example.com.