Microsoft Data
Platform Development Technical Article
Writers: Nithya Sampathkumar, Muralidhar Krishnaprasad, Anil
Nori
Microsoft Corporation
Published: June 2008
Applies To:
- Microsoft project code named “Velocity”
- .NET Framework
- Distributed Cache
- Scale Applications
- Availability
Summary:
“Velocity” is a distributed in-memory application cache platform for developing
scalable, available, and high-performance applications. “Velocity” fuses memory
across multiple computers to give a single unified cache view to applications.
Applications can store any serializable CLR object without worrying about where
the object gets stored. Scalability can be achieved by simply adding more
computers on demand. “Velocity” also allows for copies of data to be stored across
the cluster, thus protecting data against failures. “Velocity” can be
configured to run as a service accessed over the network or can be run embedded
with the distributed application. “Velocity” includes an ASP.NET session
provider object that enables ASP.NET session objects to be stored in the
distributed cache without having to write to databases. This increases the
performance and scalability of ASP.NET applications (17 printed pages)
>>> Introduction
>Most business, consumer, and
entertainment media is now conceived and delivered in digital form. Dramatic
advances in storage technology allow us to capture and store this data
relatively inexpensively. Managing this data—organizing and accessing it,
however is becoming more of a challenge. The advances in processors, memory,
storage, and connectivity have paved the way for next-generation applications
that are data-driven, whose data can reside anywhere (that is, on the server,
desktop, mobile devices, embedded in applications) and that require access from
anywhere (that is, local, remote, over the network, from mobile devices, in
connected and disconnected mode). This naturally leads to distributed,
multi-tiered, and composite application architectures.
Distributed applications require support for a large number
of users, and high performance, throughput, and response time. Service
orientation has a low cost of service, thereby allowing the scalability and
performance to also be at low cost. Another important requirement is the
support for distribution and heterogeneity of data and applications.
Applications are composing data and business logic (for example, mashups) from
sources that can be local, federated, or cloud-based. Composite applications
require aggregated data to be shaped in a form that is most suitable for the
application. Data and logic sharing is also an important requirement in
composite applications.
In this paper, we describe Microsoft’s caching technology,
the project that is code named “Velocity.”
“Velocity”
“Velocity” is a distributed in-memory cache that provides
.NET applications with high-speed access, scale, and high availability to
application data.
.jpg)
Figure 1: “Velocity”
Key Features of “Velocity”
- >>Caches
any serializable CLR object and provides access through simple cache APIs.
- Supports enterprise scale: tens to hundreds of computers.
- Configurable to run as a service accessed over the network
or run embedded with the application.
- Supports common cache configurations.
- Supports dynamic scaling by adding new nodes.
- Configurable number of backup copies to provide high
availability.
- Automatic load balancing.
- Integration with administration and monitoring tools such
as ETW, System Center, etc.
- Provides tight integration with ASP.NET to be able to cache
ASP.NET session data in the cache without having to write it to source
databases. It can also be used as a cache for application data to be able
to cache application data across the entire Web farm.
- Follows the cache-aside architecture (also known as
Explicit Caching) for V1. That is, you must decide explicitly which
objects to put/remove in your applications and “Velocity” does not
synchronize with any source database automatically.
- Support for multiple client languages (for example, PHP,
C#, C++, etc.).
Types of Data
>In this section, we show how
application data can be broadly classified into three categories, and define
their access patterns and what they mean for caching requirements.
Understanding the
different types of data and their semantics helps to understand the different
caching needs that come with usage of that data type.
Reference Data
>Reference data is a version of the
authoritative data. It is either a direct copy (version) of the original data
or aggregated and transformed from multiple data sources. Reference data is practically
immutable—changing the reference data (or the corresponding authoritative data)
creates a new version of the reference data. Reference data is an ideal
candidate for caching; because the reference data does not change, it can be
shared across multiple applications (and users), thereby increasing the scale
and performance.
Consider a product
catalog application aggregating product information across multiple data source
applications. The most common operation on the catalog data is read (or browse);
a typical catalog browse operation iterates over a large amount of product
data, filters it, personalizes it, and then presents the selected data to the
users. Key-based and query-based access is a common form of operation. Caching
is a critical requirement for catalog access because if not cached, operations
against such an aggregate catalog require the operations to be decomposed into
operations on the underlying sources, invoke the underlying operations, collect
responses, and aggregate the results into cohesive responses.
Accessing the large
sets of source data for every catalog operation can be prohibitively expensive,
and can significantly impact the response time and throughput of the
application. Caching the source product data closer to the catalog application
can significantly improve the performance and the scalability of the
application.
Referenced data is
refreshed periodically, usually at configured intervals, from its sources, or
refreshed when the authoritative data sources change. Access to reference data,
although shared, is mostly read. Local updates are often performed for tagging
(to better organize the data). To support a large scale, reference data can be
replicated in multiple caches on different computers in a cluster. As mentioned
above, reference data is easy to cache, which provides high scalability.
Activity Data
>Activity data is generated by the
currently executing activity as part of a business transaction. The data
originates as part of the business transaction and, at the close of the
business transaction, it is retired to the data source as historical (or log)
information.
Consider the
shopping cart data in an online buying application. There is one exclusive
shopping cart for each online buying session. During the buying session, the
shopping cart is cached and updated with selected products. The shopping cart
is visible and accessible only to the buying transaction. Upon checkout, once
the payment is applied, the shopping cart is retired from the cache to a source
application for further processing. Once the business transaction is processed
by the source application, the shopping cart information is logged for auditing
and historical purposes.
While the buying
session is active, the shopping cart is accessed both for read and write but it
is not shared. This exclusive nature of the activity data makes it suitable for
distributed caching. To support large scalability of the buying application, the
shopping carts can be distributed across the cluster of caches. Because the
shopping carts are not shared, the set of shopping carts can be partitioned
across the distributed cache.By
dynamically configuring the distributed cache, the degree of scale can be
controlled.
Resource Data
>>Both reference (shared read) and activity
(exclusive write) data are ideal for caching, but not all application data
falls into these two categories. There is data that is shared, concurrently
read and written into, and accessed by a large number of transactions. Consider
an inventory management application. The inventory of an item includes the
description of the item and the current quantity. The quantity information is
authoritative, volatile, and concurrently accessed by a large number of users
for read/write. Such data is known as the resource data; the business logic
(the order application logic) runs close to the resource data (such as quantity
data). Resource data is usually collected in data stores. For performance reasons,
in “Velocity,” it is cached in the application tier.
Although caching
the quantity data in memory on a single computer can provide performance
improvements, a single cache cannot provide availability or scalability when
the order volume is high. For this purpose, quantity data can be replicated in
multiple caches across the distributed cache. However, because quantity
information is shared and concurrently updated, the consistency of quantity
values needs to be maintained across the distributed cache.
Maintaining
consistency in distributed caches requires data replication, transactions, data
change notifications, and invalidations. However, support for high performance
and scale requires relaxed notions of consistency and transactions. For example,
the order application may over-sell the item (above the available quantity) in
order to allow concurrent updates to quantity for increased scale and
performance. Typically, such conflicts are resolved by vendor intervention.
Scenarios
In this section, we discuss couple of real-world scenarios
and the category of data that they are dealing with and their caching
requirements.
>Social Networking Site
Consider a social networking site that people use for
keeping up with friends. The site allows the uploading of an unlimited number
of photos, sharing links and videos, and learning more about the people they
meet. Such a site can accelerate performance by caching users and friend lists.
User names and friend lists fall under reference data, making it very attractive
for caching.
.jpg)
>Figure
2: Use of cache in a social networking site
Enterprise LOB
Consider the use of distributed cache in a large enterprise
for both scalability and the availability of business data. Assume that it is
an enterprise that provides a Web-based invoice (payment, vendor analysis) and
expense management system for large enterprises, with a focus on litigation
expense management. Vendor analysis is an infrequent and periodic operation
that enterprises perform. Once the desired vendor catalogs are created, they
can be cached as reference data so that purchase and payment processes can gain
performance benefits. The vendor catalogs include information like vendor
names, offered services, price etc. The purchase process includes activities like
order submission, invoicing, and payment. The data in these activities forms
activity data, where the data is primarily accessed and manipulated by a
specific business transaction. Again, such data can be cached for performance,
scale, and availability
Vendor sourcing and analysis is involves aggregating data
from multiple vendor sources; the aggregated data is cached and order, invoice,
and payment data is also cached while these business operations are active.
.jpg)
Figure 3: Use of
cache in an enterprise scenario
>>Velocity Concepts
In this section we look at the concepts that will be useful
when programming with “Velocity.”
>Logical Hierarchy
Named caches and regions are the two basic building blocks
of “Velocity”. A named cache is used for storing a logical grouping of data
together. Regions are logical groupings of objects within named caches.
When working with “Velocity,” a user can run multiple
processes that can host the “Velocity” instance, called cache hosts. The cache
hosts can access named caches that can be stored across all the cluster nodes.
The named cache consists of regions and regions store cache items. This
represents the “Velocity’” logical hierarchy.
.jpg)
Figure 4: “Velocity”
logical hierarchy
Named Cache
You can think of named cache as equivalent to a database. A
named cache is used for storing a logical grouping of data. Also, all physical
configurations and cache policies such as failover, expiration, eviction and so
forth, are specified at this level. In a distributed cache scenario, all the
applications that need to communicate to the same distributed cache must
instantiate the same named cache.
A cluster or node can contain one or more named caches. An
application may use one or more named caches based on the policies for the
various caches. For example, activity data may be stored in a partitioned named
cache that is partitioned while a reference data (a catalog for example) may be
stored in a named cache that is replicated.
Region
Regions are logical grouping of objects within a named
cache. You can think of regions as equivalent to tables, although regions can
also store arbitrary sets of key value pairs. Items within a region are
guaranteed to reside on a single node and are the logical units for replication
and node placement. A named cache consists of one of more regions.
An application is not required to use regions and can use
the put/get/remove APIs using only the key to the object. In fact, the
application will scale better when not using regions because the keys can be
distributed across the named cache. If no region is specified, the system
automatically partitions the keys into multiple implicitly created regions.
Cache Item
A cache item represents the lowest level of caching that
contains the object to be cached along with the key, the object payload, tags,
and the time to live (TTL), the created timestamp, the version, and other
internal bookkeeping information. A region contains one or more of these cache
items.
The following is a code example that shows the creation of a
named cache and region.
// CacheFactory class provides methods to return cache objects
// Create instance of CacheFactory (reads appconfig)
CacheFactory fac = new CacheFactorc();
// Get a named cache from the factory
Cache catalog = fac.GetCache("catalogcache");
//-------------------------------------------------------
// Simple Get/Put
catalog.Put("toy-101", new Toy("thomas", .,.));
// From the same or a different client
Toy toyObj = (Toy)catalog.Get("toy-101");
// ------------------------------------------------------
// Region based Get/Put
catalog.CreateRegion("toyRegion");
// Both toy and toyparts are put in the same region
catalog.Put("toyRegion", "toy-101", new Toy( .,.));
catalog.Put("toyRegion", "toypart-100", new ToyParts(…));
Toy toyObj = (Toy)catalog.Get("toyRegion", "toy-101");
>Cache Concepts
We will be referring to primary and secondary nodes often
when we talk about project “Velocity.”
>Primary
Node
In “Velocity,” we
refer to the node where a region is located as the primary node for that
region. All access to this region will be routed to the primary node for that
region.
>>
Secondary Nodes
If the named cache
is configured to have “backups” for high availability, then one or more other
nodes are chosen to contain a copy of this data. Such nodes are called
secondary nodes for that region. All changes made to the primary node are also
reflected on these secondary nodes. Thus if the primary node for a region
fails, the secondary node can be used to retrieve the data without having to
have logs written to disk.
Cache Types
Project “Velocity” supports the common cache types –
partitioned cache, replicated cache and local cache. Depending on the type of
data, applications can choose the appropriate type of cache.
>>>
Partitioned Cache
In partitioned cache, regions are partitioned among all of
the available nodes on which the named cache is defined. The combined memory of
all the computers across the cluster can be used to cache data, thus increasing
the amount of memory available to the cache. All reads and writes to a region
are directed to the node that contains the primary copy of the region.
>
Scale
Partitioned cache can be used to
achieve a desired scale.
The figure in the next section
shows how Put and Get operations work in the partitioned cache. In the case
where the client performs a Put operation on the cache, the Put assigns the
value “v2” for key “K2”. The routing layer component in Cache1 determines that
the key “K2” really belongs to Cache2 and routes the request to that cache
host. Similarly, Get requests for the same key also get routed to Cache2.
Velocity also supports the configuration where routing layer can be part of the
client.
.jpg)
Figure 5: Routing in
a partitioned cache
Applications can increase scale in
this partitioned cache model by adding more computers. New computers can be
added to achieve two goals:
When new computers are added to the
cluster, automatic load balancing occurs and some partitions on existing nodes
get migrated to the new computers. This results in keys being distributed
across all those computers. This results in the access requests being routed to
more computers now, resulting in increased throughput
When new computers are added, there
is more memory to store more data. Applications can now easily scaled by
increasing the data being cached.
>
Availability
When using a partitioned cache, it
possible to specify a number of nodes as secondary caches. This allows the same
data to be stored on multiple computers. When a primary node fails, one of the
secondary nodes becomes the primary node, enabling applications to continue
accessing the data that was stored on the computer.
The following example shows how
adding an object and retrieving it will work for a partitioned cache with two
secondary caches configured. Take the case where the client is sending a
request to put the value “v2” with key “K2” to Cache1. The routing layer in
Cache1 determines that the key “K2” belongs to Cache2 and routes it
appropriately. Cache2 performs the operation locally and, in addition, sends it
to two other secondary nodes. It waits for an acknowledgement from the
secondary nodes that they have received the message and then acknowledges the
success of the operation back to Cache1. Cache1 then relays the success back to
the client.
A Get operation behaves the same
way as it does in a partitioned cache without secondary caches; the operation
gets routed to the appropriate primary node.
.jpg)
Figure 6: Routing in
a partitioned cache with secondary caches
Replicated Cache
Data is replicated across all nodes of the named cache. This
configuration is very useful for reference data where the access pattern is
mostly reads. This allows scaling by increasing the throughput.
The following example shows how adding an object and
retrieving works for a replicated cache.
The client sends a Put request with key “K2” and value “v2”
to Cache1. Cache1 routes it to the “primary” node for handling key “K2” which
is Cache2. Cache2 performs the write locally and acknowledges back to Cache1.
Cache1 returns the success of the operation back to the user. Cache2 meanwhile
asynchronously propagates the change to all the hosts in that named cache.
A Get request in a replicated cache is always handled
locally.
.jpg)
Figure 7: Routing in
a replicated cache
Local Cache
Applications can also maintain a local cache in the
application process space for frequently accessed items. In the local cache,
payload is kept in the object form. This helps applications save the
deserialization cost as well as the network hop to the primary computer,
resulting in increased performance.
.jpg)
>Figure
8: Routing when using a local cache
Cache Clients
There are two types of cache clients: simple and routing.
The simple client has no routing capabilities and does not
track where each cached object is stored. A simple client is configured to
contact only one cache host in the cluster. If the simple client requests an
object from a cache host that is not located in its memory, that particular
cache host retrieves the object from the cluster and then returns it to the
simple client.
The routing client contains a routing table to keep track of
cached object placement across all cache hosts in the cluster. Because the
routing client keeps track of where each of the cached objects are, it can make
requests directly to the cache host that stores the object in memory.
The diagram below shows a routing client used with a
partitioned cache.
.jpg)
>Figure
9: Routing client used with a partitioned cache
Expiration and Eviction
>“Velocity” ensures the validity of
cache items by supporting expiration and eviction. When adding objects to the
cache, it is possible to optionally specify a time to live (TTL). Expired
objects are removed from the cache.
Applications can
specify a “high watermark” that determines when eviction kicks in. “Velocity”
supports the least-recently-used algorithm for eviction.
Consistency Models
>“Velocity” support both weak and
strong consistency models of access for the different cache types.To support strong consistency, cache
operations (e.g. Get, Put) are performed synchronously before returning to the
client; in a weak consistency access, cache operations return to the client
eagerly and the consistency is maintained in an asynchronous manner.
Concurrency Model
“Velocity” supports both optimistic version-based updates
and pessimistic locking.
Optimistic
Version-based Updates
This model is based on the cache associating a version with
each cache item. Any update to the object increments the version internally.
When an application retrieves the cache item, its version is also returned. An
update of the cache item succeeds only if the version of the passed-in cache
item is the same as the one stored in the cache.
>>>Pessimistic
Locking
An application can
take a lock when retrieving an object from the cache. When using the GetandLock
method, a lock handle is returned along with the object. Any further calls to
GetandLock for that object will fail. However, regular Gets will succeed.
Using PutAndLock,
an object is updated in the cache and the lock is released. Regular Puts remove
the lock and overwrite the object. Also, the UnLock method explicitly unlocks
the object.
Notifications
>In a distributed architecture,
because many clients work with the same data, it may be useful for a client to
know when a cache item is changed by another client. Notifications can be set
on a region or a cache item level. When set, a notification event is generated
for any change (add, put, remove) to the region or cache items.
Deployment Topology
“Velocity” can be deployed as a service or embedded within
the application
Cache Service
In the cache service deployment model, “Velocity” runs as a
separate service that can be accessed from client applications running on the
same computer or on other client computers. Applications can access the
“Velocity” cache through the “Velocity” client APIs. The client API can either
use TCP/IP or pipes (when running on the same computer as the service) to
communicate with the service.
>Embedded
Mode
In the embedded deployment model, “Velocity” runs embedded
within the hosting application. The hosting application maybe an ASP.NET
application (see the ASP.NET Deployment Configurations) or other .NET
application. In the embedded case, the distributed cache runs as part of the
application (or container, to be more precise) and the memory for the cache is
shared with that of the application.
.jpg)
>Figure
10: Deployment topologies with “Velocity”
ASP.NET Integration
>>“Velocity” provides a SessionStoreProvider
class that plugs into the ASP.NET session storage provider model and stores
session states in a partitioned cache.
Using “Velocity” to
store session data enables non-sticky routing and ensures that session
information is available across the cluster. ASP.NET applications can scale by
adding more nodes and also by configuring secondary caches for the session data
and thereby increasing availability.
Performance Testing of Velocity
We created a test application to see how “Velocity” scales.
The application consisted of simulating lots of users on many distributed
client computers. Each user thread performs operations like Get, Put, and
Remove on a 20K payload. We plotted the throughput as the load gradually
increased and also as new servers were added.
The graph below shows the benefits of caching very clearly.
The throughput increases as the load increases until it reaches saturation and
stabilizes. At this point, if we add more servers, load balancing kicks in and
the throughput increases again and the response time goes down.
.jpg)
Figure 11:
Performance graphs when using “Velocity”
Conclusion
“Velocity” can help scale your .NET applications easily and
cheaply by allowing you to bring the memory capacity of multiple computers into
a single unified cache. In addition, it also helps in dynamically scaling your
application as your application needs grow by allowing the dynamic addition of
computers to the cluster. “Velocity” can store all your application
data—reference, activity or resource data—while giving you the performance,
scale, and availability that you need.