Managing Cache Cluster Health (Windows Server AppFabric Caching)

It is important to understand the factors that contribute to a healthy Windows Server AppFabric cache cluster. This section describes general concepts and guidance around managing cache cluster health. For more information about the tools used to monitor cache cluster health, see Health Monitoring Tools (Windows Server AppFabric Caching).

General Cache Cluster Health Factors

Three main areas determine the health of the cache cluster:

  • Physical Resources

  • Caching-Specific Configuration Settings

  • Environmental Settings and Changes

Physical Resources

To successfully manage the cache cluster health, you must monitor and manage the basic physical resources on each cache host. The following list briefly covers some of the most important physical resources:

  • Memory -- It is important to understand the memory requirements of the applications that use the cache cluster. These requirements depend on factors such as the average rate at which items are added to the cache, the average item size, the average expiration time, and other characteristics of the workload. Although some planning can be done with averages, other planning must be done in a test environment. You can examine the amount of memory reserved for the cache on each cache host by using the Get-CacheHostConfig Windows PowerShell command. A default size is assigned when the cache host is added to the cache cluster, but you can change the memory reserved for the cache with the Set-CacheHostConfig command. Note that this document does not fully cover the subject of capacity planning.

    Important
    If your memory requirements exceed the capacity of your cache hosts, you can experience performance problems or a condition called throttling. Throttling occurs during low-memory conditions on a cache host. In this state, the cache cluster does not permit new data to be written to the cache host, and applications may receive errors. Throttling is more likely with caches that have eviction disabled, but it can also result from memory pressure caused by other processes. For more information about how to identify and resolve throttling, see Throttling Troubleshooting (Windows Server AppFabric Caching).

  • CPU -- High processor utilization can adversely affect the performance of any application, including the Caching Service (DistributedCacheService.exe) on each cache host. Use Performance Monitor or Task Manager to examine CPU utilization by process name. If a process other than DistributedCacheService.exe consistently uses most of the CPU, evaluate whether that process should run on the same server as the Caching Service.

  • Network Bandwidth -- Because the Caching Service is distributed, network health and bandwidth are critical. Use Performance Monitor or other network monitoring tools to identify any network issues that could negatively affect the cache cluster.
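As a rough illustration of the memory factors listed in the Memory bullet, the following Python sketch estimates steady-state cache memory from the average rate at which items are added, their average size, and their average time to live. It uses Little's law: the average number of resident items is approximately the arrival rate multiplied by how long each item stays in the cache. The function names and parameters are hypothetical. The sketch ignores per-item overhead, eviction, fragmentation, and the watermark thresholds, so treat the result as a starting point for testing rather than a capacity plan.

```python
import math


def estimate_cache_memory_mb(items_per_second: float,
                             avg_item_bytes: float,
                             avg_ttl_seconds: float,
                             secondaries: int = 0) -> float:
    """Rough steady-state memory estimate in megabytes (hypothetical helper).

    Little's law: resident items ~= arrival rate x average time in cache.
    Each secondary (high availability) stores a full extra copy of the data.
    Per-item overhead, eviction, and fragmentation are ignored.
    """
    resident_items = items_per_second * avg_ttl_seconds
    total_bytes = resident_items * avg_item_bytes * (1 + secondaries)
    return total_bytes / (1024 * 1024)


def hosts_needed(required_mb: float, cache_size_mb_per_host: float) -> int:
    """Minimum number of hosts whose reserved cache size covers required_mb.

    No headroom below the eviction thresholds is included.
    """
    return max(1, math.ceil(required_mb / cache_size_mb_per_host))


if __name__ == "__main__":
    # 200 items/s, 10 KB each, 10-minute TTL, high availability enabled.
    need = estimate_cache_memory_mb(200, 10 * 1024, 600, secondaries=1)
    print(need)                      # 2343.75
    print(hosts_needed(need, 1024))  # 3
```

The per-host value passed to hosts_needed is meant to be the memory reserved for the cache on each host, as reported by Get-CacheHostConfig. In practice you would leave additional headroom.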

Caching-Specific Configuration Settings

The health of the cache cluster is also affected by the characteristics of each cache host, each cache, and each application using the cache. The following list reviews some of these configuration details and how they relate to the health of the cache cluster.

  • Eviction -- Caches can enable or disable eviction. Eviction occurs when memory usage on a cache host passes certain thresholds. Use the Get-CacheConfig Windows PowerShell command to check the EvictionType. If EvictionType is None, eviction is disabled. For caches that disable eviction, you must be even more careful to provide enough cache hosts, and enough physical memory on each cache host, to meet your demands. For more information, see Expiration and Eviction. For information about diagnosing and troubleshooting eviction issues, see Eviction Troubleshooting (Windows Server AppFabric Caching).

  • Expiration -- Expiration settings also affect memory growth. Cache configuration settings specify the default TimeToLive value. This is the length of time that objects reside in the cache before expiring. Caches can also disable expiration by setting Expirable to false. You can view these settings with the Get-CacheConfig Windows PowerShell command. Applications can also specify custom TimeToLive values for objects that get placed in the cache. Long expiration times can result in higher cache host memory usage. However, when the memory usage on a cache host reaches the HighWatermark setting, these objects can still be evicted unless EvictionType for a cache is set to None. You can view the HighWatermark setting with the Get-CacheHostConfig Windows PowerShell command. For more information about expiration, see Expiration and Eviction. For information about diagnosing and troubleshooting expiration issues, see Expiration Troubleshooting (Windows Server AppFabric Caching).

  • Custom Regions -- Applications can create custom regions. A region always resides on a single cache host, which can be a problem if the region's memory requirement exceeds the memory available on one cache host. Use the Get-Cache Windows PowerShell command to see the regions on each cache host. If this is an issue, redesign the applications to use smaller regions, or add physical memory to each cache host in the cluster.

  • High Availability -- The high availability cache option also affects memory requirements. If a cache enables high availability, a second copy of every cached item exists on another cache host. This doubles the memory requirements for that cache on the target cache cluster. It also increases the network and CPU load, because items must be copied to the secondary cache hosts. You can view the high availability option with the Get-CacheConfig Windows PowerShell command. If Secondaries is set to 1, high availability is enabled. For more information about high availability, see High Availability.

  • Lead Hosts -- If you are using the XML provider for the cache cluster configuration store, you must understand the role of lead hosts in your cache cluster. If a majority of lead hosts go down, the cache cluster becomes unavailable. To determine whether a cache host is a lead host, use the Get-CacheHostConfig or Export-CacheClusterConfig Windows PowerShell commands. To make a cache host a lead host, use the Import-CacheClusterConfig command. For more information, see Lead Hosts. Unless changed manually, the number of lead hosts is determined by the size of the cache cluster, as shown in the following table.

    Cache Cluster Size    Number of Lead Hosts
    Small                 1
    Medium                3
    Large                 5

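The Lead Hosts bullet states that the cache cluster becomes unavailable if a majority of lead hosts go down. The following minimal Python sketch applies that rule so you can check, before maintenance, whether taking a given set of hosts offline would leave the cluster running. It assumes the cluster stays available only while a strict majority of its lead hosts are up. The default lead host counts in the table are odd, so "a majority down" and "not a majority up" coincide for them. The host names and functions are hypothetical. In practice, the lead host list would come from Get-CacheHostConfig or Export-CacheClusterConfig output. The sketch models only the majority rule described here.

```python
from typing import Iterable


def cluster_survives(lead_hosts: Iterable[str], hosts_down: Iterable[str]) -> bool:
    """Return True if a strict majority of lead hosts would still be running.

    Assumption: the cluster stays available only while more than half of
    its lead hosts are up. Non-lead hosts in hosts_down do not count.
    """
    leads = set(lead_hosts)
    if not leads:
        raise ValueError("at least one lead host is required")
    running = leads - set(hosts_down)
    return len(running) * 2 > len(leads)


def max_lead_failures(lead_count: int) -> int:
    """Largest number of lead hosts that can be down while a majority remains."""
    return (lead_count - 1) // 2


if __name__ == "__main__":
    leads = ["CacheHost1", "CacheHost2", "CacheHost3"]  # hypothetical host names
    print(cluster_survives(leads, ["CacheHost1"]))                # True
    print(cluster_survives(leads, ["CacheHost1", "CacheHost2"]))  # False
    print([max_lead_failures(n) for n in (1, 3, 5)])              # [0, 1, 2]
```

Under this assumption, a small cluster with one lead host tolerates no lead host failures, while the medium and large defaults tolerate one and two respectively.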
Environmental Settings and Changes

Environmental settings and operational tasks also affect the cache cluster. The following list provides some of these factors.

  • Firewall -- For a cache host to communicate with the cache cluster, you must correctly configure the firewall. AppFabric installs a custom Windows Firewall rule named "AppFabric Caching Service (TCP-in)" in the custom group "Windows Server AppFabric: AppFabric Caching Service". If you are using Windows Firewall, you should enable this rule. You should also enable the "Remote Service Management" firewall rules. Note that the AppFabric Configuration Wizard provides options to do this for you. If you are using a different firewall, you must create or enable custom rules in that firewall application. For more information, see the firewall section in Troubleshooting AppFabric Caching.

  • Operating System and Software Updates -- There are times when you must apply updates to either the operating system or software on a cache host to maintain server health or security. You can decide to update all servers at the same time by first stopping the cache cluster with the Stop-CacheCluster command. Or you can update one or more cache hosts at a time while leaving other cache hosts running in the cache cluster. There are several considerations when you use this method. For more information, see Updating Cache Servers (Windows Server AppFabric Caching).

  • IP Address Changes -- Any changes to cache host IP addresses can cause problems with the communication between cache hosts in the cache cluster.
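When you verify firewall rules or investigate an IP address change, a useful first step is to confirm that a cache host accepts TCP connections on its cache port from another machine. The following minimal Python sketch uses only the standard library socket module. The host name in the example is hypothetical, and port 22233 is assumed to be the default AppFabric cache port. Confirm the ports that your cache hosts actually use with Get-CacheHostConfig. A successful connection shows only that the port is reachable, not that the Caching Service is healthy.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example (hypothetical host name, assumed default cache port):
# port_reachable("CacheHost1", 22233)
```

If the check fails from one machine but succeeds locally on the cache host, the firewall configuration or a stale name-to-address mapping is a likely suspect.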

