Abhijit Gadkari
Summary: The impact of cache is wellunderstood in the system-design domain. While the concept of cache isextensively utilized in the von Neumann architecture, the same is not true for thedistributed-computing architecture. For example, consider a three-tieredWeb-based business application running on a commercial RDBMS. Every time a newWeb page loads, many database calls are made to fill the drop down lists on thepage. Performance of the application is greatly affected by the unnecessarydatabase calls and the network traffic between the Web server and the databaseserver.
Introduction
Distributed Systems
Cache and the Principle of Locality
Temporal Cache
Spatial Cache
Cache Replacement Algorithms
Primed Cache vs. Demand Cache: Can We Predict the Future?
Primed Cache
Demand Cache
Caching in the ORM World!
Transactional Cache
Shared Cache
Managing the Interaction
Chasing the Right Size Cache
Conclusion
Acknowledgement
References
In production, many applications buckle down because they treatthe database as their cache. Web server-based application-level cache can beeffectively used to mitigate this problem. An effective caching mechanism isthe foundation of any distributed-computing architecture. The focus of thisarticle is to understand the importance of caching in designing effective andefficient distributed architecture. I will discuss the principle of locality ofcache, basic caching patterns like temporal and spatial cache, and primed anddemand cache, followed by an explanation of the cache replacement algorithms.
ORM technologies are becoming part of the mainstream applicationdesign, adding a level of abstraction. Implementing ORM-level cache willimprove the performance of a distributed system. I will explain different ORMcaching levels such as transactional cache, shared cache, and the details ofintercache interaction. I’ll also explore the impact of ORM caching onapplication design.
A distributed system is a heterogeneous system. Diverse systemcomponents are connected to each other via a common network. Applications usingTCP/IP-based Internet are examples of open distributed systems.
Figure 1 shows a typical distributed architecture:
.jpg)
Figure 1. Distributedarchitecture
In the distributed environment, different activities occur in concurrentfashion. Usually, common resources like the underlying network, Web/applicationservers, database servers, and cache servers are shared by many clients.Distributing the computing load is the hallmark of distributed systems.Resource sharing and allocation is a major challenge in designing distributedarchitecture. For example, consider a Web-based database-driven businessapplication. The Web server and the database server are hammered with clientrequests. Caching, load-balancing, clustering, pooling, and time-sharingstrategies improve the system performance and availability. I’ll focus oncaching in the distributed environment.
Any frequently consumed resource can be cached to augment theapplication performance. For example, caching a database connection, anexternal configuration file, workflow data, user preferences, or frequentlyaccessed Web pages improve the application performance and availability. Manydistributed-computing platforms offer out-of-the-box caching infrastructure.Java Caching System (JCS) is a distributed composite caching system. In Microsoft.NET Framework, the System.Web.CachingAPI provides the necessary caching framework. The Microsoft project code-named “Velocity”is a distributed-caching platform [1].
The performance of a caching system depends on the underlyingcaching data structure, cache eviction strategy, and cache utilization policy.Typically, a hash table with unique hash keys is used to store the cached data;JCS is a collection of hash tables [2]. The .NET cache implementation is based on the Dictionarydata structure. The cache eviction policy is implemented in terms of areplacement algorithm. Utilizing different strategies such as temporal,spatial, primed, and demand caching can create an effective caching solution.
The word “cache” comes from the French word meaning “to hide.” [3]. Wikipedia definescache as “a temporary storage area where frequently accessed data can be storedfor rapid access.” [4] Cached data is stored in the memory. Defining frequently accessed data is amatter of judgment and engineering. We have to answer two fundamental questionsin order to define a solid caching strategy. What resource should be stored inthe cache? How long should the resource be stored in the cache? The localityprinciple, which came out of work done on the Atlas System’s virtual memory in1959 [5], providesgood guidance on this front, defining temporal and spatial locality. Temporallocality is based on repeatedly referenced resources. Spatial locality statesthat the data adjacent to recently referenced data will be requested in thenear future.
Temporal locality is well suited for frequently accessed,relatively nonvolatile data—for example, a drop-down list on a Web page. Thedata for the drop down list can be stored in the cache at the start of theapplication on the Web server. For subsequent Web page requests, the drop downlist will be populated from the Web server cache and not from the database.This will save unnecessary database calls and will improve applicationperformance.
Figure 2 illustrates a flow chart for this logic:
.jpg)
Figure 2. Temporallocality flow chart
When a resource is added to the cache, resource dependencies canbe added to the caching policy. Dependencies can be configured in terms of anexternal file or other objects in the cache. An expiration policy defines thetime dependency for the cached resource. Many caching APIs provide aprogrammatic way to synchronize the cache with the underlying database.
Figure 3 is sample C# code to populate the temporal cache:
.jpg)
Figure 3. C# code exampleto populate temporal cache
Consider an example of tabular data display like a GridView or an on-screen report.Implementing efficient paging on such controls requires complex logic. Thelogic is based on the number of records displayed per page and the total numberof matching records in the underlying database table. We can either performin-memory paging or hit the database every time the user moves to a differentpage; both are extreme scenarios. A third solution is to exploit the principleof spatial locality to implement an efficient paging solution. For example,consider a GridView displaying 10records per page. For 93 records, we will have 10 pages. Instead of fetchingall records in the memory, we can use the spatial cache to optimize thisprocess. A sliding window algorithm can be used to implement the paging. Let’sdefine the data window just wide enough to cover most of the user requests, say30 records. On page one, we will fetch and cache the first 30 records. Thiscache entry can be user session specific or applicable across the application.As a user browses to the third page, the cache will be updated by replacingrecords in the range of 1–10 by 31–40. In reality, most users won’t browsebeyond the first few pages. The cache will be discarded after five minutes ofinactivity, eliminating the possibility of a memory leak. The logic is based onthe spatial dependencies in the underlying dataset. This caching strategy workslike a charm on a rarely changing static dataset.
Figure 4 illustrates the spatial-cache logic that is used in the GridView example:
.jpg)
Figure 4. Spatial-cachesequence diagram
The drawback of this logic is the possibility of a stale cache. Astale cache is a result of the application modifying the underlying datasetwithout refreshing the associated cache, producing inconsistent results. Manycaching frameworks provide some sort of cache synchronization mechanism tomitigate this problem. In .NET, the SqlCacheDependencyclass in the System.Web.Caching APIcan be used to monitor a specific table [6]. SqlCacheDependencyrefreshes the associated cache when the underlying dataset is updated.
A second important factor in determining an effective cachingstrategy is the lifetime of the cached resource. Usually, resources stored inthe temporal cache are good for the life of an application. Resources that are storedin the spatial cache are either time-dependent or place-dependent.Time-dependent resources should be purged as per the cache expiration policy.Place-specific resources can be discarded based on the state of theapplication. In order to store a new resource in the cache, an existing cachedresource will be moved out of the cache to a secondary storage, such as thehard disk. This process is known as paging. Replacement algorithms such asleast frequently used resource (LFU), least recently used resource (LRU), andmost recently used resource (MRU) can be applied in implementing an effectivecache-eviction strategy, which influences the cache predictability [7]. The goal inimplementing any replacement algorithm is to minimize paging and maximize thecache hit rate. The cache hit rate is the possibility of finding the requestedresource in the cache. In most cases, LRU implementation is a good enoughsolution. JCS and ASP.NET caching is based on the LRU algorithm. In morecomplex scenarios, a combination of LRU and LFU algorithms such as the adaptivereplacement cache (ARC) can be implemented. The idea in ARC is to replace theleast frequently and least recently used cached data. This is achieved bymaintaining two additional scoring lists. These lists will store theinformation regarding the frequency and timestamp of the cached resource. ARC outperformsLRU by dynamically responding to the changing access pattern and continuallybalancing workload and frequency features [8]. Some applications implement a cost-basedeviction policy. For example, in Microsoft SQL Server 2005, zero-cost plans areremoved from the cache and the cost of all other cached plans is reduced byhalf [9]. The costin SQL Server is calculated based on the memory pressure.
A study of replacement algorithms suggests that a good algorithmshould strike a balance between the simplicity of randomness and the complexityinherent in cumulative information [10]. Replacement algorithms play an important role in definingthe cache-eviction policy, which directly affects the cache hit-rate and theapplication performance.
Data-usage predictability also influences the caching strategy.The primed-cache pattern is applicable when the cache or part of the cache canbe predicted in advance [11].This pattern is very effective in dealing with static resources. A Web browsercache is an example of primed cache; cached Web pages will load fast and savetrips to the Web server. The demand-cache pattern is useful when cache cannotbe predicted [12].A cached copy of user credentials is an example of demand cache. The primedcache is populated at the beginning of the application, whereas the demandcache is populated during the execution of the application.
The primed cache minimizes the overhead of requesting externalresources. It is suitable for the read-only resources frequently shared by manyconcurrent users.
Figure 5 illustrates the typical primed cache architecture:
.jpg)
Figure 5. Primed-cacheexample
The cache server cache is primed in advance, and the individualWeb/application server cache is populated from the cache server. EachWeb/application server can read, write, update, and delete the cache on thecache server. The cache server in turn is responsible for synchronizing thecache with the resource environment. Because the primed cache is populated in advance,it improves the application response time. For example, reports with static,fixed parameters can be populated and stored in the cache. This way, thereports are available almost instantly. In .NET, the ICachedReport interface can be used to store the prepopulatedreports in the cache. Updating the primed cache mostly results in updatingexisting cached resources. The cache is refreshed based on a routine scheduleor a predictable event-based mechanism. The primed-cache results in an almostconstant size cache structure [11].
The demand cache is suitable when the future resource demandcannot be predicted. The resource environment acquires the resource only whenit is needed. This optimizes the cache and achieves a better hit-rate. As soonas the resource is available, it is stored in the demand cache. All subsequentrequests for the resource are satisfied by the demand cache. As soon as it iscached, the resource should last long enough to justify the caching cost.
Figure 6 illustrates a class diagram for implementing the demandcache [12]:
.jpg)
Figure 6. Demand cache
For example, a user can have many roles and one role can havemany permissions. Populating the entire permissions domain for all users at thestart of an application will unnecessarily overload the cache. The solution isto store the user credentials in the demand cache on a successful log-in. Allsubsequent authorization requests from the application for alreadyauthenticated users will be fulfilled by the demand cache. This way the demandcache will only store a subset of all possible user permissions in the system.
In the absence of a proper eviction policy, the resource will becached forever. Permanently cached resources will result in memory leak, whichdegrades the cache performance. For example, as the number of authenticatedusers grows, the size of the demand cache increases and the performancedegrades. One way to avoid this problem is to link resource eviction withresource utilization. In our example, the cache size can be managed by removingthe credentials of all logged-off users.
.jpg)
Figure 7. Demand-cacheexample
Predicting the future is a difficult business. In a dynamicenvironment, adaptive caching strategies represent a powerful solution, basedon some sort of application usage heuristics. However, adaptive cachingstrategies are beyond the scope of this article.
Object relational mapping is a way to bridge the impedancemismatch between object-oriented programming (OOP) and relational databasemanagement systems (RDBMS). Many commercial and open-source ORM implementationsare becoming an integral part of the contemporary distributed architecture. Forexample, Microsoft Entity Framework and Language Integrated Query (LINQ), JavaData Objects (JDO), TopLink, Hibernate, NHibernate, and iBATIS are all popularORM implementations. The ORM manager populates the data stored in persistentstorages like a database in the form of an object graph. An object graph is agood caching candidate.
The layering principle, based on the explicit separation ofresponsibilities, is used extensively in the von Neumann architecture tooptimize system performance. N-tier application architecture is an example ofthe layering principle. Similar layering architecture can be used in implementingthe ORM caching solution. The ORM cache can be layered into two differentcategories: the read-only shared cache used across processes, applications, ormachines and the updateable write-enabled transactional cache for coordinatingthe unit of work [13].
Cache layering is prevalent in many ORM solutions—for example,Hibernate’s two-level caching architecture [14]. In a layered-caching framework, the firstlayer represents the transactional cache and the second layer is the sharedcache designed as a process or clustered cache.
Figure 8 illustrates the layered-cache architecture:
.jpg)
Figure 8. Layered-cachearchitecture
Objects formed in a valid state and participating in atransaction can be stored in the transactional cache. Transactions are characterizedby their ACID (Atomicity, Consistency, Isolation, and Durability) properties.Transactional cache demonstrates the same ACID behavior. Transactions areatomic in nature; each transaction will either be committed or rolled back.When a transaction is committed, the associated transactional cache will beupdated. If a transaction is rolled back, all participating objects in thetransactional cache will be restored to their pretransaction state [15]. You can implementthis behavior by using the unit of work pattern [13].
Thrashing, cache corruption, and caching conflicts should bestrictly avoided in implementing the transactional cache. Many cachingimplementations offer a prepackaged transactional cache solution, including theTreeCache implementation in JBoss. TreeCache is a tree structured, replicated,transactional cache based on the pessimistic locking scheme [16].
The shared cache can be implemented as a process cache orclustered cache [14].A process cache is shared by all concurrently running threads in the sameprocess. A clustered cache is shared by multiple processes on the same machineor by different machines. Distributed-caching solutions implement the clusteredcache; for example, the project code-named “Velocity” is a distributed-cachingAPI [1]. The clusteredshared cache introduces resource replication overhead. Replication keeps thecache in a consistent state on all the participating machines. A safe failovermechanism is implemented in the distributed-caching platform; in case of afailure, the cached data can be populated from other participating nodes.
Objects stored in the transactional cache are useful inoptimizing the transaction. As soon as the transaction is over, they can bemoved into the shared cache. All read-only requests for the same resource canbe fulfilled by the shared cache; and, because the shared cache is read-only,all cache coherency problems are easily avoided. The shared cache can beeffectively implemented as an Identity Map [13].
As shown in Figure 9, requests for the same shared-cache resourceresult in returning the same object:
.jpg)
Figure 9. Shared-cacheexample
You can use different coordination techniques to manage theinteraction between the shared and transactional cache [15]. These techniques are explained in thefollowing section on intercache interaction.
The interaction between the shared cache and the transactionalcache depends on the nature of the cached data. Read-only cached data willresult in infrequent cache communication. There are many ways to optimizeintercache communication [15].
One solution is to populate the object graph simultaneously inthe shared and transactional cache. This saves the overhead of moving objectsfrom one cache to the other. On completion of the transaction, an updated copyof the object in the transactional cache will refresh the shared cache instanceof the object. The drawback of this strategy is the possibility of a rarelyused transactional cache in the case of frequent read-only operations.
Another solution is to use the just-in-time copy strategy. Theobject will be moved from the shared cache to the transactional cache at thebeginning of a transaction and will be locked for the duration of thetransaction. This way no other thread, application or machine can use thelocked object. The lock is released on completion of the transaction and theobject is moved back to the shared cache (see Figure 10).
.jpg)
Figure 10. Intercacheinteraction
It is important to minimize the possibility of a stale or corruptcache and maximize resource utilization. The data copying between thetransactional and shared cache should also be minimized in order to increasethe cache hit rate. Because locks are effectively managed in the database,there are some concerns in implementing the same at the application level. Thisdiscussion is important but beyond the scope of this article.
Caching domain-specific dependencies is an essential butdifficult task. As illustrated in Figure 7, caching the combination of allroles and corresponding permissions for the logged-on user will populate alarge object graph. Application patterns like Domain Model and Lazy Load can beeffectively applied in caching such domain dependencies [13]. One important consideration indesigning a caching strategy is the cache size.
There is no definite rule regarding the size of the cache. Cachesize depends on the available memory and the underlying hardware viz. 32/64 bitand single-core/multicore architecture. An effective caching strategy is basedon the Pareto principle (that is, the 80–20 rule). For example, on theecommerce book portal, 80 percent of the book requests might be related to thetop 10,000 books. The application’s performance will greatly improve if thelist of top 10,000 books is cached. Always remember the principle ofdiminishing returns and the bell-shaped graph in deciding cache size. (SeeFigure 11.)
.jpg)
Figure 11. Cached dataversus performance
How much data should be cached depends on many factors such asprocessing load patterns, the number of concurrent connections/requests, andthe type of application (real-time versus batch processing). The goal of anycaching strategy is to maximize the application performance and availability.
Small caching efforts can pay huge dividends in terms of performance.Two or more caching strategies and design patterns like GOF [17], PEAA [13], and Pattern ofEnterprise Integration (PEI) can be clubbed together to implement a solidcaching platform. For example, shared demand cache coupled with a stricttime-based eviction policy can be very effective in optimizing the performanceof a read-heavy distributed system like the enterprise reporting solution.
Forces like software transactional memory (STM), multicore memoryarchitecture such as NUMA (Non-Uniform Memory Access), SMP (symmetricmultiprocessing architectures), and concurrent programming will influence thefuture of caching platforms. In the era of cloud computing, caching will play apivotal role in the design of distributed systems. An efficient caching strategywill differentiate a great distributed architecture from the good. Let yournext design be a great one.
I would like to thank Gita Gadgil for reading the draft materialand providing invaluable feedback and suggestions.
[1] http://code.msdn.microsoft.com/velocity
[2] http://jakarta.apache.org/jcs/getting_started/intro.html
[3] http://dictionary.reference.com/browse/cache
[4] http://en.wikipedia.org/wiki/Cache
[5] Peter J. Denning, “The Locality Principle, Communications ofthe ACM,” July 2005, Vol 48, No 7.
[6] http://msdn.microsoft.com/en-us/library/system.web.caching.sqlcachedependency.aspx
[7] Michael Kircher and Prashant Jain, “Caching,” EuroPloP 2003.
[8] Nimrod Megiddo and Dharmendra S. Modha, “Outperforming LRUwith an Adaptive Replacement Cache Algorithm,” IEEE Computer, April 2004.
[9] Kalen Delaney, InsideMicrosoft SQL Server 2005: Query Tuning and Optimization, Microsoft Press,2007.
[10] L.A. Belady, “A Study of Replacement Algorithms for Virtual StorageComputers,” IBM Systems J. 5, 2 (1966), 78–101.
[11] Octavian Paul Rotaru, “Caching Patterns and Implementation,”Leonardo Journal of Sciences LJS: 5:8, January-June 2006.
[12] Clifton Nock, DataAccess Patterns: Database Interactions in Object-Oriented Applications, Addison-Wesley,2003.
[13] Martin Fowler, Patternof Enterprise Application Architecture (P of EAA), Addison-Wesley, 2002.
[14] Christian Bauer and Gavin King, Java Persistence with Hibernate, Manning Publications, 2006.
[15] Michael Keith and Randy Stafford, “Exposing the ORM Cache,”ACM Queue, Vol 6, No 3, May/June 2008.
[16] TreeCache (http://www.jboss.org/file-access/default/members/jbosscache/freezone/docs/1.4.0/TreeCache/en/html_single/index.html#introduction)
[17] Erich Gamma, Richard Helm, Ralph Johnson, and John M.Vlissides, Design Patterns: Elements ofReusable Object-Oriented Software, Addison-Wesley, 1994.
Abhijit Gadkari is anEnterprise Architect with AMG-SIU. Abhijit’s background includes an M.S. in ISfrom Claremont Graduate University, and post graduation in computer sciencefrom DOE, India. He has 10 years of software design and development experience.His work and research focuses on building SaaS-based IT infrastructure. Heblogs on his SaaS-related work at http://soaas.blogspot.com.
This article was published in the Architecture Journal, a printand online publication produced by Microsoft. For more articles from thispublication, please visit the Architecture Journal Web site.