Windows Server AppFabric Caching Capacity Planning Guide

Jason Roth, Rama Ramani, Jaime Alva Bravo

March 2011

This white paper provides guidance for Windows Server AppFabric Caching capacity planning.

  1. Introduction

  2. Evaluating AppFabric Caching Performance

  3. Capacity Planning Methodology

    1. Step One: Understand bottlenecks and identify caching candidates

    2. Step Two: Evaluate current workload patterns

    3. Step Three: Understand physical infrastructure and hardware resources

    4. Step Four: Finalize the required performance SLA for all applications

    5. Step Five: Identify appropriate AppFabric features and configuration settings

  4. Capacity Planning Tool

  5. Capacity Planning Next Steps

  6. Monitoring Ongoing Caching Capacity Requirements

Introduction

Windows Server AppFabric caching provides a distributed, in-memory cache cluster. This paper provides capacity planning guidance for an on-premises deployment of Windows Server AppFabric Caching.

The architecture of Windows Server AppFabric Caching is described in detail in the documentation. For more information, see Windows Server AppFabric Caching Features. In summary, an AppFabric cache cluster is made up of one or more cache servers (also referred to as cache hosts). Caches are distributed across the cache hosts and stored in memory.

Note

There is also a Windows Azure Caching release for using caching in the cloud. Some of the steps in this paper for estimating memory requirements also apply to a cloud-based solution, but this paper focuses specifically on the on-premises cache cluster scenario. For more information about the Azure caching feature, see Windows Azure Caching.

The information in this paper is based on planning work that Microsoft has done with customers. All AppFabric caching customers ask a common question: “How many servers are required for my scenario?” In many of these discussions, we start off with a typical answer of, “It depends.” We then quickly drill down into various details and eventually arrive at a good starting point. During this process, other more specific questions emerge:

  • How much caching memory do my applications require?

  • How many machines should I use in a cache cluster?

  • How much memory should be on each machine, and how should they be configured?

  • How does network bandwidth affect the cache cluster performance?

The goal of this white paper is to capture the lessons learned from these customer discussions and to provide a methodology that you can use for your own capacity planning.

If you’re reading this paper, you might fall into one of several development phases:

  1. You are just starting to look at distributed caching and do not yet have detailed data on workload mix, performance requirements, or the server deployment topology. At this point, you might want to start by looking at some performance data for AppFabric caching.

  2. Or, you have already investigated the features and performance of AppFabric caching. Now, you want help to look closely at your specific scenario and high-level requirements. This is where you get into the details of capacity planning.

  3. Finally, you could already be in production. At this stage, you want to understand how to analyze the performance data to ensure that you have sufficient capacity. In addition, you also want to plan for future increases in workload, knowing the key indicators and best practices based on the runtime behavior of the AppFabric cache.

This white paper is structured to address these phases in sequence. If you are just evaluating AppFabric performance, we point you to a comprehensive white paper written by our partner Grid Dynamics. We also summarize the data from that white paper in a form that facilitates your evaluation and informs the capacity planning process.

If you are ready to go through the capacity planning process, we provide a set of steps to formalize a methodology that you can apply to your scenario.

If you’re using or testing a cache cluster, you can validate the capacity planning process by examining a key set of performance indicators to ensure that you have the right capacity. We also discuss some best practices.

Evaluating AppFabric Caching Performance

Our partner, Grid Dynamics, recently completed a series of tests on the performance of AppFabric caching. They published their results in the following white paper: Windows Server AppFabric Cache: A detailed performance & scalability datasheet.

Each test focused on a specific variable, such as the size of the cache or the number of servers in the cache cluster. The Grid Dynamics study can be used to evaluate the performance and scalability of AppFabric caching. You can compare throughput and latency numbers of a wide range of testing scenarios and topologies with your application requirements. Typically, in each test, only one or two parameters were varied, in order to focus on their performance impact. The entire set of parameters includes:

  • Load pattern: the cache usage pattern; that is, the percentage of 'Get', 'Put', 'BulkGet', 'GetAndLock', and 'PutAndUnlock' operations.

  • Cached data size: the amount of data stored in the cache during the test.

  • Cluster size: the number of servers in the cache cluster.

  • Object size: the size of the objects, after serialization, that are stored in the cache.

  • Type complexity: the different .NET object types, such as byte and string[], that are stored in the cache.

  • Security: the security settings of the cache cluster.

In addition to verifying the performance and scalability of AppFabric caching, Grid Dynamics has provided the test harness to enable you to repeat the tests with your own data and workloads. This is yet another option to evaluate caching performance for your specific scenario.

Although we highly recommend viewing the entire study and its conclusions, here is a summary of a few of the study’s findings that will inform best practices discussed throughout the remainder of this paper:

  • AppFabric caching scales linearly as more machines are added to a cache cluster.

  • Cache size has low impact, except for large caches with high percentage of writes. Among other factors, high write workloads put greater pressure on .NET garbage collection when the size of the managed heap is large.

  • High type complexity only affects client-side performance due to serialization.

  • Bulk get calls result in better network utilization (see the sketch after this list).

  • Direct cache access is much faster than access through proxies (ASP.NET, WCF), but this difference is due to the performance of the intermediate layer rather than the caching performance.

  • Pessimistic and optimistic locking perform similarly, so you should use the technique that is best for your application design.

  • Both latency and throughput improve when the conflict ratio decreases.

  • Cache cluster security does decrease performance, and it is enabled by default. The highest throughput and lowest latency are achieved when security is turned off, but this may be unacceptable due to the sensitivity of data and business requirements.

  • Network bottlenecks are reduced by using a dedicated network between application servers and cache servers.
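As an example of the bulk get pattern referenced above, the following sketch retrieves several cached objects in one round trip rather than issuing a Get call per key. This is a minimal sketch: it assumes a cache client configured elsewhere (for example, in the application configuration file), and the cache name and keys are hypothetical.

// Requires: using System;
//           using System.Collections.Generic;
//           using Microsoft.ApplicationServer.Caching;

DataCacheFactory factory = new DataCacheFactory();
DataCache cache = factory.GetCache("default");

List<string> keys = new List<string> { "product-1", "product-2", "product-3" };

// One network round trip for all keys instead of one round trip per key.
foreach (KeyValuePair<string, object> item in cache.BulkGet(keys))
{
    // item.Value is null when the key is not in the cache.
    Console.WriteLine("{0}: {1}", item.Key, item.Value);
}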

The Grid Dynamics paper is a good place to start an evaluation of AppFabric caching, and it also contains raw data and observed patterns that can inform the capacity planning process.

Capacity Planning Methodology

Once you decide that your application could benefit from a distributed in-memory cache, like AppFabric caching, you enter the capacity planning stage. Although it is possible to perform some capacity planning steps through direct testing against an AppFabric cache cluster, you could be required to create an estimate without this type of testing. That is the focus of this section. The following steps provide a systematic way of thinking through your AppFabric caching requirements:

  1. Understand bottlenecks and identify caching candidates

  2. Evaluate current workload patterns

  3. Understand physical infrastructure and hardware resources

  4. Finalize the required performance SLA for all applications

  5. Identify appropriate AppFabric features and configuration settings

This paper provides examples of these steps by examining the needs of a sample online store application. However, caching can be used by any type of .NET application, and you can have more than one application accessing the same cache cluster. In this scenario, you should perform the steps below for each application and aggregate the results to get an accurate estimation for capacity.

Step One: Understand bottlenecks and identify caching candidates

First identify the data you want to cache. This is achieved by using performance analysis tools such as SQL Server Profiler, Performance Monitor, Visual Studio testing, and many others. These tools can identify frequently accessed database objects or slow calls to web services. The data sets returned from these backend stores are potential candidates for caching. Storing this data temporarily in the cache can improve performance and relieve pressure on the backend data stores.

Once you identify potential caching candidates, it is a useful exercise to classify those objects into one of three main categories: Activity data, Reference data, and Resource data. These are best explained with examples.

  • Activity data contains read/write data related to an individual user. For example, in an online store, a user’s shopping cart is activity data. It applies to the user’s current session and might change frequently. Although it is important to maintain high availability for the shopping cart, it is data that doesn’t necessarily require permanent storage in a backend database. Due to its temporary characteristics, activity data is a logical candidate for caching.

  • Reference data is read-only data that is shared by multiple users or by multiple application instances. It is accessed frequently but changes infrequently. For example, in the online store, the product catalog is reference data. The catalog would be valid for one or more days but might be accessed thousands of times by different users. Without some type of caching, each user that views the product catalog requires a trip to the database for the same semi-static data; caching alleviates that repeated pressure on the backend database. Due to its semi-permanent, shared characteristics, reference data is also a logical candidate for caching.

  • Resource data is read/write data that is shared across users. For example, a support forum contains this type of data: all users can read and post replies to forum threads.

Because different types of cached data have different usage patterns, classifying the data into these categories can be helpful. For example, identifying an object as reference data immediately suggests a mostly read-only workload. The categories can also help in determining expiration policies, which tend to be shorter for data that changes more frequently. In development, these logical divisions can suggest areas that can be encapsulated in the code.

It is recommended to create estimates for individual objects and then aggregate that data. The following table shows an example of the information collected at this stage:

Object to Cache            Caching Categorization    Permanent Storage Location
Shopping Cart Object       Activity Data             None
User Preferences Object    Activity Data             Backend Database
Product Catalog            Reference Data            Backend Database
User Forum Thread          Resource Data             Backend Database

When identifying candidate data for caching, you are not required to find data to cache in each category. You might find that performance and scalability can be improved by simply caching your application’s shopping cart. The important point at this step is to use available information to identify the best items to cache.
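To make these categories concrete, here is a minimal sketch of how the sample objects above might be written to the cache with expirations that match their categories. The cache name, keys, and variables (userId, shoppingCart, productCatalog) are hypothetical; the Put overload that takes a TimeSpan sets a per-object timeout.

// Requires: using System;
//           using Microsoft.ApplicationServer.Caching;

DataCacheFactory factory = new DataCacheFactory();
DataCache cache = factory.GetCache("default");

// Activity data: per-user, read/write, short lifetime.
cache.Put("cart:" + userId, shoppingCart, TimeSpan.FromMinutes(30));

// Reference data: shared, mostly read-only, valid for a day or more.
cache.Put("catalog", productCatalog, TimeSpan.FromHours(24));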

Step Two: Evaluate current workload patterns

Once you determine the appropriate data to cache, you then must understand how the application currently accesses the data and the associated workload patterns. At the end of this step, you should be able to arrive at a raw estimate of your caching memory requirements. You should also have a better understanding of how data is accessed and used, which will be important in later steps.

For example, if you target your product catalog as a candidate for caching, you should explore when the application retrieves catalog data and how often these requests are made. Based on the previous step, you know that this is reference data, and therefore, it is a primarily read-only workload. A thorough understanding of workload patterns for different objects will guide future capacity decisions. Let’s take a closer look at what is involved in this step.

There are several ways to get a better understanding of your current data-access patterns:

  1. Thoroughly review the code to see where the data is accessed and the frequency of that access.

  2. Use a code profiler that can provide the method call frequency and associated performance data.

  3. Create instrumentation in the code around specific data access sections. Log the data access attempt and the associated performance of the data operation (a minimal sketch follows this list).

  4. Use a database profiler, such as SQL Server Profiler, to observe the number and duration of database operations.
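For the third option, a minimal instrumentation sketch is shown below. The GetProductCatalog data access method and the ProductCatalog type are hypothetical; the Stopwatch timing would be written to whatever logging framework you already use.

// Requires: using System.Diagnostics;

Stopwatch watch = Stopwatch.StartNew();

ProductCatalog catalog = GetProductCatalog();   // hypothetical data access call

watch.Stop();

// Log the access and its duration for later analysis.
Trace.WriteLine(string.Format(
    "GetProductCatalog: {0} ms", watch.ElapsedMilliseconds));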

Note that many of these techniques could have been used in the previous step to determine what data to target for caching. However, at this stage you are interested in more detailed numbers that can be used in future capacity planning calculations.

Part of this evaluation involves understanding the ratio of reads to writes. The read/write workload can influence later decisions about capacity. For example, a workload with a high percentage of writes will trigger more .NET garbage collection. This is discussed in a later step.

Another factor is the frequency of reads and writes during peak load. The following table shows sample data collected at this stage for the example shopping cart object.

Object to Analyze                     Shopping Cart
Reads %                               50%
Writes %                              50%
Read Operations per Second (Max)      250
Write Operations per Second (Max)     250

Again, this same analysis should be done for each object type. Different object types will have different access patterns and different maximum reads and writes per second under load.

To understand the caching requirements, you must have an estimate for the maximum number of active objects of each type in the cache at any given time. This estimate involves a balance between the frequency of object inserts and the life expectancy of those objects. This is best understood with an example.

In the ASP.NET web application example, operational data may show peak times of 25000 concurrent users. Each user requires session state information, so that is 25000 cached objects. But the expiration of these objects could be set to thirty minutes. During peak times, operational data could show 5000 new users an hour, meaning that 2500 new users could arrive during the thirty-minute expiration period. In addition, some users may close their browser and launch a new session; although they are the same user, they are now using a different session, so additional padding should account for this. Finally, any anticipated growth over the next six to twelve months should be planned for. In the end, the calculation for the maximum number of active objects in the cache might look like this:

Object to Analyze:                              Shopping Cart
Peak Concurrent Users                           25000
New Users During Expiry Period (30 minutes)     2500
Existing Users Starting New Browser Sessions    250
Future Growth Estimate (25%)                    6940
Total Active Objects (Max):                     ~35000

If there are changes to inputs, such as the expiration period, then this will change how many unexpired objects reside in the cache during peak load. This is just an example of the thought process. Other object types may have different patterns and different variables that go into the calculation. For example, if a shared product catalog is valid for an entire day, the maximum number of product catalog objects in the cache during the day should be one.
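The arithmetic behind this kind of estimate is simple enough to capture in code. Here is a minimal sketch using the session state example's inputs (all values are assumptions from the scenario above):

// Inputs from the shopping cart example above.
int peakConcurrentUsers = 25000;
int newUsersDuringExpiry = 2500;    // 5000 per hour * 0.5 hour expiration
int newBrowserSessions = 250;       // same users starting new sessions
double growthFactor = 1.25;         // 25% anticipated growth

double maxActiveObjects =
    (peakConcurrentUsers + newUsersDuringExpiry + newBrowserSessions)
    * growthFactor;                 // ~34,688, rounded to ~35000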

However, knowing the maximum number of objects in the cache only helps if you also know the average object size. Inherently, this is a hard problem. In the shopping cart example, one user may put a single item into their cart while another user could have ten or twenty items. The goal is to understand the average. Like most numbers in this process, it is not going to be perfect, but the end result is a well thought-out starting point for your cache cluster rather than a simple guess.

Objects are stored in the cache in a serialized form. So to understand the average object size, you have to calculate the average serialized object size. AppFabric uses the NetDataContractSerializer class for serialization before storing the items in the cache. To determine the average object size, add instrumentation code to your application that serializes your objects and records their serialized size.

The following code example attempts to estimate the average size of a single object. The object to serialize is named obj, and the length variable records the serialized length. If there are any problems getting the size with the NetDataContractSerializer, the BinaryFormatter is used instead. You can wrap this code in a method for easier use; in that case, obj would be passed in as a parameter and length would be returned from the method.

// Requires the following namespace imports:
//
// using System.IO;
// using System.Runtime.Serialization;
// using System.Runtime.Serialization.Formatters.Binary;
// using System.Xml;
//
// "obj" is the target object whose serialized size is measured.

long length = 0;

// Measure the size produced by the NetDataContractSerializer, which is
// what AppFabric uses when it stores an object in the cache.
using (MemoryStream stream1 = new MemoryStream())
using (XmlDictionaryWriter writer =
    XmlDictionaryWriter.CreateBinaryWriter(stream1))
{
    NetDataContractSerializer serializer = new NetDataContractSerializer();
    serializer.WriteObject(writer, obj);
    writer.Flush();   // flush buffered data before reading the length
    length = stream1.Length;
}

// Fall back to the BinaryFormatter if the first attempt produced no data.
if (length == 0)
{
    using (MemoryStream stream2 = new MemoryStream())
    {
        BinaryFormatter bf = new BinaryFormatter();
        bf.Serialize(stream2, obj);
        length = stream2.Length;
    }
}

Note

Note that if you have a test cache cluster set up, you could add items to the cache and use the Windows PowerShell command, Get-CacheStatistics, to find the size of one or more objects added to the cache. Alternatively, you could divide the total size of multiple objects in the cache by the count of objects in the cache. You can choose between collecting estimates through code or through testing.

If you are planning to use AppFabric caching for session state, it is important to understand that the SQL Server provider for ASP.NET session state always uses the BinaryFormatter instead of the NetDataContractSerializer. Objects serialized by the NetDataContractSerializer class can at times be several times larger than the same objects serialized with the BinaryFormatter. This matters if you are looking at the size of session state objects in SQL Server, because you cannot assume that size will be the same in the cache. You could multiply that size by six to get a rough estimate, but for a better estimate, use the code above on the objects stored in session state.

With the data collected to this point, you can start to formulate total memory requirements. This step involves the following subtasks:

  1. Focus on an object type (for example the Shopping Cart object).

  2. For that object, first take the average serialized object size.

  3. Then add 500 bytes to the average size to account for caching overhead.

  4. Multiply that size by the maximum number of active objects. This gives you the total caching memory requirements for that object type.

  5. Add an estimated overhead for internal caching structures (5%).

  6. Repeat these steps for each identified object type.

  7. Aggregate the results for total caching memory requirements.

In this process, note that there is approximately .5 KB of overhead per object that you should add to the size estimates. Also, there are other internal data structures that require memory in the cache. Regions, tags, and notifications all require additional memory to manage. In our example calculations, we add 5% of the total to account for these internal structures, but it could be more or less for you depending on the extent to which you use these caching features.

At this point, you should consider the impact of one specific AppFabric caching feature: high availability. This feature creates copies of cached items on secondary servers, so that if a single cache server goes down, the secondary cache server takes over and the data is not lost. If you choose to use high availability, you must double the memory estimates. You must also use Windows Server 2008 Enterprise edition or higher. Note that the high availability choice is made at the named cache level. This means that if you create two named caches in the same cluster, you do not have to use high availability for both; the application could put some items in the named cache with high availability and some in the cache without it. This can help you make the most of your memory resources. So it is important to know the decision on high availability, because it doubles the memory requirements for the caches that use it.
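The subtasks above reduce to straightforward arithmetic. Here is a minimal sketch using the activity data example's numbers; the sizes, counts, and overhead figures are this paper's example assumptions, not fixed values:

// Per-object-type memory estimate, following the subtasks above.
double avgSerializedSizeKB = 250;        // measured average serialized size
double perObjectOverheadKB = 0.5;        // ~0.5 KB caching overhead per object
int maxActiveObjects = 35000;            // from the workload analysis
bool highAvailability = true;            // doubles the requirement if enabled
double internalOverhead = 0.05;          // ~5% for regions, tags, notifications

double totalGB = (avgSerializedSizeKB + perObjectOverheadKB)
    * maxActiveObjects / (1024.0 * 1024.0);   // KB -> GB
if (highAvailability)
{
    totalGB *= 2;
}
totalGB *= 1 + internalOverhead;   // ~17.6 GB, matching the table below

Repeating this calculation for each object type and summing the results gives the total caching memory requirement.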

As an example, here is a table where we’re evaluating the requirements for Activity data and Reference data in the same application. Depending on your scenario, you could choose to do this estimate at the object level or the application level. You would just add more columns to this example and label them appropriately.

Object to Analyze:                          Activity Data    Reference Data
Average Serialized Object Size:             250 KB           60 KB
Cache Cluster Overhead per Object:          .5 KB            .5 KB
Adjusted Average Serialized Object Size:    250.5 KB         60.5 KB
Max Active Objects:                         ~35000           ~68000
Caching Memory Requirements:                8.4 GB           3.9 GB
High Availability Enabled?                  Yes (16.8 GB)    No
Internal Data Structures Overhead (5%):     .8 GB            .2 GB
Total Memory Requirements:                  17.6 GB          4.1 GB

The aggregation of these estimates forms the initial memory requirements for the cache cluster. In this example the total is 21.7 GB. With this estimate, you can now begin to look at other considerations, including physical infrastructure, SLA requirements, and AppFabric cache configuration settings.

Step Three: Understand physical infrastructure and hardware resources

It is important to understand your physical infrastructure and the availability of resources. Here are some common questions:

  1. Will you be able to provision physical machines or virtual machines?

  2. If you have existing hardware, what are the machine configurations (for example, 8 GB RAM, Quad-Core)?

  3. Will the cache cluster be located in the same datacenter as the application servers?

  4. What are the networking capabilities?

If you are considering using virtual machines for the cache servers, there are several considerations. For example, you must consider the impact of having multiple virtual machines on the same physical machine. Multiple virtual machines may share the same network card, increasing the possibility of network bottlenecks. Also, high availability might be set up between a primary and secondary cache host that are virtual machines on the same physical machine; if that physical machine goes down, no copy of the data remains. For more information, see Guidance on running AppFabric Cache in a Virtual Machine.

For existing machines, there are no recommended specifications. However, for large cache clusters, we have seen 16 GB RAM, quad-core machines work well. In general, planning for the correct amount of physical memory and network load are the most important considerations.

For both physical and virtual machines, you should note the location of the cache cluster relative to the application or web servers that use the cache. If they are in separate datacenters, make sure that the latency between those datacenters does not negatively affect your performance. At this stage, it might be tempting to use your application or web servers as your cache servers. Although possible, this is not supported. First, any resource usage spikes by services such as IIS on those machines could impact the cache cluster. Second, the caching service assumes that it is on a dedicated server and can potentially use much more memory than you specify.

For networking capabilities, you should evaluate your expected network load and compare it to your infrastructure. First, you have to know how much data each cache server is expected to handle and whether the network card capabilities are sufficient. If not, you could require more cache servers. For example, consider a scenario where the average cached object size is 500.5 KB and there are 240 operations per second on the cache cluster. If a single cache host were used, the results would be as follows:

Number of object reads/writes per second:             240
Number of machines in the cache cluster:              1
Number of cache operations per machine per second:    240
Average object size:                                  500.5 KB
Size of data transmitted per second:                  240 * 500.5 KB = 117.3 MBps

If each machine has a 1 Gbps network card, the maximum throughput is around 119 MBps. The calculated value of 117.3 MBps would likely overwhelm that one server, making it much more likely that the network becomes a bottleneck. However, if three machines were used in the cache cluster, an even distribution of cache requests gives each server one third of that load.
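You can sanity-check any proposed cluster size against network capacity with the same arithmetic. A minimal sketch using the numbers from the table above; the usable NIC throughput figure is a rough assumption:

// Network load estimate for the example above.
double opsPerSecond = 240;          // cache operations per second
double avgObjectSizeKB = 500.5;     // average serialized object size
int cacheServers = 3;               // proposed cluster size

double totalMBps = opsPerSecond * avgObjectSizeKB / 1024;   // ~117.3 MBps
double perServerMBps = totalMBps / cacheServers;            // ~39.1 MBps
double nicCapacityMBps = 119;       // rough usable throughput, 1 Gbps card

bool networkBottleneck = perServerMBps > nicCapacityMBps;   // false here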

Consider also the network utilization of the application servers that access the cache cluster. If they exchange a large volume of data with other systems, you should consider creating a dedicated network between the application servers and the cache cluster. This requires mounting an additional network card on each application server for this purpose.

The final network consideration is whether the network can handle the required load along the entire path. Just having a 1Gbps network card on each cache server is not enough. The switch and other points along the network must also be able to handle the load. Work with operations to meet this requirement.

Step Four: Finalize the required performance SLA for all applications

Before you decide on a final configuration, you must also understand any business requirements, including performance and reliability service level agreements (SLAs). In a practical sense, this step influences the decision on the number of cache clusters and the number of cache hosts in each cluster.

For example, if you have a mission-critical application using a cache cluster, it’s possible that you would want to isolate that cache cluster from other lower-priority applications. Those lower-priority applications could potentially use more memory, network, or CPU resources that would negatively affect the mission-critical application.

Here are specific factors that influence this decision:

  • Security is maintained at the cache cluster level. If a user has access to the cache cluster, that user can potentially access any of the data on the cache cluster (assuming that user knows the cache name and key to request). If you require different security settings for different types of data, separate cache clusters can make sense. For more information, see Managing Security.

  • Eviction of non-expired items occurs when memory reaches the high watermark level. Underestimating the amount of memory necessary for one cache could affect all caches on the cluster. When eviction happens due to memory pressure, it occurs across all caches, even if only one cache is responsible for the memory pressure. This can be somewhat mitigated by creating non-evictable caches, but this must be done carefully. For more information, see Expiration and Eviction.

  • Manageability can be easier with separate cache clusters, up to a point. You might not want to manage separate clusters for 100 different caches, but it might be useful to have separate cache clusters for two different large caches, because you can manage, scale, and monitor them separately.

Finally, some requirements may involve considerations of latency and throughput. To inform this decision, see the Grid Dynamics whitepaper for both guidance and test results. For example, you can compare your throughput requirements for the cache cluster with the published test results from the Grid Dynamics paper. Their study could indicate that you have too few servers for your throughput goals. It is important to recognize that this study may not exactly match your object types, your object sizes, and your physical and logical infrastructure; however, it still provides proven test results that can help you to make an informed decision.

Step Five: Identify appropriate AppFabric features and configuration settings

This part of the process considers specific configuration settings of AppFabric caching as well as the architecture of an AppFabric cache cluster. Even with all of the right empirical and business data collected, you can still plan incorrectly by ignoring these AppFabric caching settings.

The following table lists a variety of AppFabric caching features and associated considerations for capacity planning.

Regions

You can create multiple regions, but each region exists on a single cache host. To take advantage of the distributed cache, applications must use multiple regions. Note that all calls that do not specify a region automatically use default regions internally. Each cache host in the cache cluster must be able to host the maximum size of the largest region.

Notifications

Caches can enable notifications and cache clients can receive those notifications. This adds additional network traffic and processor utilization on the server side. The effect varies depending on the notification interval and the number of notifications sent. High numbers of notifications at very short intervals could have an impact on performance and scalability. There are not firm guidelines for estimating this impact, so it will have to be observed during testing.

Local Cache

Local cache improves performance by caching objects on the clients as well as the server. Consider the memory impact on client machines when using local cache. This feature has no bearing on the server side capacity planning for memory, because all locally cached items also exist on the server. However, if notifications are used for invalidation of the local cache, then the server side could be affected by the notification processing.

Client-side Application Design & Configuration

Client-side application design can affect your overall performance. For example, you should store any DataCacheFactory objects you create for reuse rather than recreating them for each call. You may also benefit from creating one DataCacheFactory object per thread. Or, if you share a single DataCacheFactory object across multiple threads, consider increasing the MaxConnectionsToServer setting, which increases the number of connections to the cache servers per DataCacheFactory (a configuration sketch follows this table).

High Availability

As already stated, caches that use High Availability require double the calculated memory requirement. But this feature also requires a minimum of three servers. If one server goes down, there must be two remaining servers to support primary and secondary copies of each item after a failure. Note that this feature also requires Windows Server 2008 Enterprise edition or higher on all servers.

Named Caches

At this time, there is a limit of 128 named caches. This becomes a capacity planning decision when you have applications that require more than this number. At that point, you require more than one cache cluster, or your applications must be designed to use fewer caches. Another strategy is to programmatically create regions within named caches.

XML Configuration Store

When you use a shared network location for your cache cluster configuration store, you should have at least three servers in the cluster that are all designated as Lead Hosts. For more information about the reasons, see Updating Cache Servers.
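As an illustration of the client-side guidance above (see the Client-side Application Design & Configuration entry), here is a minimal sketch of a shared, programmatically configured cache client. The host name and connection count are assumptions; the same settings can also be supplied in the application configuration file.

// Requires: using System.Collections.Generic;
//           using Microsoft.ApplicationServer.Caching;

DataCacheFactoryConfiguration config = new DataCacheFactoryConfiguration();
config.Servers = new List<DataCacheServerEndpoint>
{
    new DataCacheServerEndpoint("CacheServer1", 22233)   // hypothetical host
};
config.MaxConnectionsToServer = 4;   // more connections for a shared factory

// Create the factory once and reuse it; do not recreate it for each call.
DataCacheFactory factory = new DataCacheFactory(config);
DataCache cache = factory.GetCache("default");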

It should not be a surprise that one of the most important settings to understand is the memory setting on each cache host. You can view the default memory settings on each cache host by using the Windows PowerShell command, Get-CacheHostConfig.

Note

For information about how to use the caching Windows PowerShell commands, see Common Cache Cluster Management Tasks.

The following screenshot shows the output from Get-CacheHostConfig on a machine with 4 GB of RAM.

Get-CacheHostConfig Command

By default, the amount of memory reserved for the cache on a given server is 50% of the total RAM. In this example, half of the RAM is 2 GB. The remaining memory is then available for the operating system and services. Even on machines with much more memory, it is recommended to keep this default setting. As previously mentioned, the caching service assumes that it is running on a dedicated machine, and it may use much more memory than is allocated for the cache. Although part of this memory usage is due to the internal design of the caching service, part is also related to .NET memory management and garbage collection. Even when memory is released in a .NET application, it is not freed from the process memory until garbage collection occurs. The process therefore requires a buffer of physical memory to account for the non-deterministic nature of garbage collection.

Once you understand the impact of garbage collection, you can see that workloads with a high percentage and frequency of writes require a larger buffer of memory to account for garbage collection cycles. Workloads that are mostly read-only do not have this consideration, so in those cases you could consider raising the amount of memory reserved for caching. For example, on a 16 GB machine, you could reserve 12 GB for the cache Size setting (instead of the default of 8 GB), which leaves 4 GB of overhead for the operating system and services. This assumes that the machine is dedicated to the caching service, which is currently the only supported configuration. You should test any such memory configuration with your anticipated load; if the memory allocation is too aggressive, testing will reveal the mistake through memory-related problems, such as eviction or throttling. For more information, see Troubleshooting Server Issues (Windows Server AppFabric Caching). The following example uses the Set-CacheHostConfig command to set the cache size to 12 GB on a server named Server1:

Set-CacheHostConfig -HostName Server1 -CachePort 22233 -CacheSize 12288

The other items to observe in the Get-CacheHostConfig output are the watermark values. The LowWatermark value has a default of 70% of the cache Size setting. When cached memory reaches the LowWatermark, the caching service begins evicting objects that have already expired. This is fine, since those objects are inaccessible anyway.

Cache Host Low Watermark

The HighWatermark value has a default of 90% of the cache Size setting. At the HighWatermark level, objects are evicted regardless of whether they have expired until memory is back to the LowWatermark level. Obviously, this has the potential to hurt performance and also potentially results in a bad user experience.

Cache Host High Watermark

We recommend that you plan your cache usage to the LowWatermark level to avoid the possibility of hitting the HighWatermark. For a more detailed description see Expiration and Eviction.

Tip

Full garbage collection cycles can cause a short delay, which can often be seen in retry errors. For this reason, we recommend that each cache host have 16 GB or less of memory. Machines with greater than 16 GB of RAM can experience longer pauses for full garbage collection cycles. With that said, there is no restriction against using more memory per cache host. A workload that is more read-only might not experience full garbage collection cycles as frequently. You can best determine this through load testing.

In a previous example, we calculated an estimate of 21.7 GB for our total caching memory requirement. Because we want high availability, there must be at least three servers. Assume that each server has 16 GB of RAM. In this example, we keep the default cache Size setting of 8 GB on each server. As previously mentioned, the LowWatermark (70%) should be the target for memory used on each server. That means that each cache server provides an estimated 5.6 GB of caching memory. With these factors, the following table shows that four servers would provide 22.4 GB of caching memory and would meet the 21.7 GB requirement.

Total Memory Requirement                      21.7 GB
Initial Memory (Cache Host)                   16 GB
Cache Size setting (Cache Host)               8 GB
Low Watermark (Cache Host)                    70%
Adjusted Memory Target per Cache Host         5.6 GB
Total Memory on Cache Cluster (4 machines)    5.6 GB * 4 servers = 22.4 GB
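The sizing rule implicit in this table can be stated as a formula: each host contributes its cache Size setting multiplied by the low watermark of usable caching memory, so the required host count is the total requirement divided by that contribution, rounded up. A minimal sketch with the example's numbers:

// Requires: using System;

double totalRequirementGB = 21.7;   // aggregated estimate from step two
double cacheSizeSettingGB = 8;      // default setting on a 16 GB host
double lowWatermark = 0.70;         // plan cache usage to the LowWatermark

double usablePerHostGB = cacheSizeSettingGB * lowWatermark;        // 5.6 GB
int hostsNeeded = (int)Math.Ceiling(
    totalRequirementGB / usablePerHostGB);                         // 4 servers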

Again, remember that you can also use the published results in the Grid Dynamics whitepaper to verify this estimate in relation to throughput and latency goals. Those test results could lead you to slightly modify this initial estimate, such as by adding an additional cache server. The important point here is to use available resources like this to make an informed decision.

Capacity Planning Tool

A spreadsheet is a logical tool for the capacity planning steps in the previous sections. We have put together a sample spreadsheet that you can download here. The entries with an asterisk are items that you should modify based on your planning and resources. The remaining calculations are done by the spreadsheet.

The first part of the spreadsheet specifies the server configuration for each cache host. Note that you can change this as part of the planning process and observe the differences in final calculations. The following screenshot shows this first section.

Capacity Planning Spreadsheet Screen

Important

Unless you are using the default installation values, you are responsible for using the Set-CacheHostConfig command on each cache host to apply the CacheSize and LowWatermark settings.

The second section allows you to fill in estimates for different types of objects. In the example spreadsheet, only two columns are used, for “Activity Data” and “Reference Data”, and a series of empty columns follows. You can rename these columns depending on the level of granularity that you are using in your planning (object, category, application, etc.). The following screenshot shows this second section.

Capacity Planning Spreadsheet Screen

The third section estimates network requirements. You fill in the average read/write object sizes and the max read/write operations per second. This section calculates your maximum network bandwidth requirements for that object type. This can be used to get a rough idea of whether your grouping of machines and network cards can handle the load. As discussed in previous sections, you’ll also want to look at bandwidth across the network path. The following screenshot shows this third section.

Capacity Planning Spreadsheet Screen

The final section aggregates the requirements from the memory and network sections. It then uses the machine configuration specified in the first section to calculate the number of servers. The “Additional Servers” field allows you to increase this calculated total if required. For example, if the calculations specified that only two servers are required, you might add one additional server to the final total in order to properly support high availability. The following screenshot shows this final section.

Capacity Planning Spreadsheet Screen

Note

The screenshots above use numbers similar to the example in this paper, but the estimated number of servers is three rather than four. The reason is that the spreadsheet sets the Cache Size Setting (Set-CacheHostConfig) value to 12 GB to demonstrate this custom setting. Changing this value to 8 GB would yield results similar to those seen in the previous sections of this paper.

Capacity Planning Next Steps

In the previous section, a methodology was presented for determining an initial estimate for the number of cache clusters, the number of cache hosts in each cluster, and the configuration of those cache hosts. But you should recognize that this is an estimate that might change depending on testing as well as ongoing monitoring.

If you decide to move forward with plans to use AppFabric caching, you may create a proof of concept to get a feel for how AppFabric caching would work in your solution. After that, you should set up a test cache cluster and run tests in your environment. Depending on the test results, you might make further changes to your configuration to achieve your capacity, performance, and scalability requirements. The following section covers specific ongoing metrics that you can monitor during both testing and production.

Monitoring Ongoing Caching Capacity Requirements

Caching capacity planning is never an exact science. Many of the numbers going into the conclusions are estimates themselves. Also, application usage and patterns can change over time. Because of this, you should monitor performance indicators to make sure that the cache cluster is fulfilling capacity requirements. Successful capacity planning is an ongoing process that continues in both test and production environments.

The Performance Monitor tool is the best way to monitor capacity on an ongoing basis. On each cache host, we recommend that you monitor the following counters, grouped by monitoring category:

Memory

AppFabric Caching:Host\Total Data Size Bytes

AppFabric Caching:Host\Total Evicted Objects

AppFabric Caching:Host\Total Eviction Runs

AppFabric Caching:Host\Total Memory Evicted

AppFabric Caching:Host\Total Object Count

.NET CLR Memory(DistributedCacheService)\# Gen 0 Collections

.NET CLR Memory(DistributedCacheService)\# Gen 1 Collections

.NET CLR Memory(DistributedCacheService)\# Gen 2 Collections

.NET CLR Memory(DistributedCacheService)\# of Pinned Objects

.NET CLR Memory(DistributedCacheService)\% Time in GC

.NET CLR Memory(DistributedCacheService)\Large Object Heap size

.NET CLR Memory(DistributedCacheService)\Gen 0 heap size

.NET CLR Memory(DistributedCacheService)\Gen 1 heap size

.NET CLR Memory(DistributedCacheService)\Gen 2 heap size

Memory\Available MBytes

Network

Network Interface(*)\Bytes Received/sec

Network Interface(*)\Bytes Sent/sec

Network Interface(*)\Current Bandwidth

CPU

Process(DistributedCacheService)\% Processor Time

Process(DistributedCacheService)\Thread Count

Process(DistributedCacheService)\Working Set

Processor(_Total)\% Processor Time

Throughput

AppFabric Caching:Host\Total Client Requests

AppFabric Caching:Host\Total Client Requests /sec

AppFabric Caching:Host\Total Get Requests

AppFabric Caching:Host\Total Get Requests /sec

AppFabric Caching:Host\Total GetAndLock Requests

AppFabric Caching:Host\Total GetAndLock Requests /sec

AppFabric Caching:Host\Total Read Requests

AppFabric Caching:Host\Total Read Requests /sec

AppFabric Caching:Host\Total Write Operations

AppFabric Caching:Host\Total Write Operations /sec

Miscellaneous

AppFabric Caching:Host\Cache Miss Percentage

AppFabric Caching:Host\Total Expired Objects

AppFabric Caching:Host\Total Notification Delivered

Many of the counters listed here connect directly to the capacity planning process. For example, there are several memory counters. “Memory\Available MBytes” shows how much physical memory remains available on the machine in megabytes. If this counter falls below 10% of the total physical memory, there is a significant chance of throttling. For more information, see Throttling Troubleshooting. Other counters are specific to caching features. “AppFabric Caching:Host\Total Eviction Runs” indicates how often memory is evicted. As memory levels pass the high watermark, this counter reflects the eviction runs that bring memory back to the low watermark. Similarly, the other counters are associated with the capacity planning thought process found in the previous sections of this paper.

Note that the process counters for the service, DistributedCacheService, and the Processor counter for the machine are also captured. High processor utilization can negatively affect the cache cluster performance. If you do detect periods of high processor utilization, it is important to identify whether it is related to the Caching service (DistributedCacheService.exe) or another process on the machine.
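If you prefer to sample a few of these counters from monitoring code rather than the Performance Monitor UI, a sketch like the following can work. It assumes the AppFabric caching counter category and names are installed on the cache host exactly as listed above.

// Requires: using System;
//           using System.Diagnostics;

PerformanceCounter dataSize = new PerformanceCounter(
    "AppFabric Caching:Host", "Total Data Size Bytes");
PerformanceCounter evictionRuns = new PerformanceCounter(
    "AppFabric Caching:Host", "Total Eviction Runs");

Console.WriteLine("Cached data: {0} bytes", dataSize.NextValue());
Console.WriteLine("Eviction runs: {0}", evictionRuns.NextValue());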

In addition to the Performance Monitor tool, you can use the Windows PowerShell commands that install with AppFabric. Many of these commands can be used to monitor the health and state of the Cache Cluster. For more information, see Health Monitoring Tools and Logging and Counters in AppFabric Cache.

See Also

Other Resources

Installing Windows Server AppFabric
MSDN Windows Server AppFabric caching documentation
Programming Guide for AppFabric caching
Configuring an ASP.NET Session State Provider
Cache cluster configuration store options
Windows Server AppFabric Caching Deployment and Management Guide
Windows Server AppFabric Links and Resources
