Health Monitoring Tools (AppFabric 1.1 Caching)
This section describes the tools and commands available for monitoring the health of a Microsoft AppFabric 1.1 for Windows Server cache cluster. These tools include the following:
Performance Monitor
Event Tracing for Windows (ETW)
System Center Operations Manager
Windows PowerShell
Performance Monitor
The AppFabric Caching features install several Performance Monitor counters. For more information about the available counters, see Performance Counters for AppFabric Caching. You can observe or log counter values to establish a baseline of typical cache cluster behavior. For example, in the AppFabric Caching:Cache category, you might observe that the Total Client Requests / sec value stays within general ranges that vary with the time of day. You can use this baseline to identify a trend of increasing client requests to the cache cluster that might necessitate adding cache hosts.
For more general information about using Performance Monitor, see Using Performance Monitor.
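As a minimal sketch of the baseline idea described above, the following Python example assumes you have already exported counter samples (for example, Total Client Requests / sec) as pairs of hour-of-day and value. The function names, the sample data, and the default tolerance of three standard deviations are illustrative assumptions and are not part of AppFabric.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(samples):
    """Group (hour_of_day, value) samples by hour and return
    {hour: (average, standard_deviation)}."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}


def exceeds_baseline(baseline, hour, value, tolerance=3.0):
    """Return True when value is above the hour's average by more than
    `tolerance` standard deviations. Hours with no baseline return False."""
    if hour not in baseline:
        return False
    avg, spread = baseline[hour]
    return value > avg + tolerance * spread


if __name__ == "__main__":
    # Hypothetical logged samples: (hour of day, requests per second).
    logged = [(9, 100), (9, 110), (9, 90), (14, 400), (14, 420), (14, 380)]
    baseline = build_baseline(logged)
    print(exceeds_baseline(baseline, 9, 130))
```

A value that repeatedly exceeds the baseline for its hour may point to the kind of upward trend in client requests that the paragraph above describes. A single spike is usually not enough to justify adding cache hosts.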
Event Tracing for Windows (ETW)
The AppFabric Caching features use Event Tracing for Windows (ETW) to provide status and error information related to the cache cluster. You can use the Event Viewer to examine the ETW logs for AppFabric Caching features.
Open the Event Viewer on a cache host. For instructions on how to launch the Event Viewer, see Start Event Viewer.
In the left navigation pane, expand the Applications and Services Logs folder.
Then expand Microsoft, Windows, and Application Server-System Services.
Select the Admin log.
The Admin log contains informational updates, such as when the AppFabric Caching Service starts or stops. It also contains warnings and errors. Note that these logs can contain events from other AppFabric features, such as hosting and monitoring. You can filter the log to just the Microsoft-Windows Server AppFabric Caching source to focus on events related to the AppFabric Caching features.
The Application Server-System Services folder also contains an Operational log. By default, this log is disabled. To enable it, right-click the Operational log in the navigation pane, and then click Enable Log. The Operational log contains other events, such as low memory conditions.
When evaluating the health of the cache cluster, it is important to examine the event logs on each of the cache hosts that belong to the cluster. A problem with one cache host can have a negative effect on the entire cache cluster.
The Event Viewer is useful for regularly monitoring the health of the cache cluster. However, when troubleshooting an error, you can get a more detailed log of the cache cluster activities with the tracelog.exe tool, which creates a detailed ETL trace log from the command line. You can download the tracelog utility as part of the Windows Software Development Kit. The following command begins logging to the cachedebugtrace.etl file:
tracelog -start debugtrace -f cachedebugtrace.etl -guid "C:\Program Files\Windows Server AppFabric\Manifests\ProviderGUID.txt" -level 5 -cir 512
The following command stops the logging:
tracelog -stop debugtrace
The following command converts the cachedebugtrace.etl log file into a text file named cachedebugtrace.csv:
tracerpt .\cachedebugtrace.etl -o cachedebugtrace.csv -of CSV
|Although the tracerpt tool enables you to view the contents of the log file generated by tracelog, you may need to work with Microsoft support to fully interpret the information.|
System Center Operations Manager
You can use System Center Operations Manager to monitor the health of the AppFabric cache cluster. For more information, see Windows Server AppFabric Management Pack for Operations Manager 2007.
Windows PowerShell
There are several Windows PowerShell commands that indicate the current status and health of a cache cluster. This section demonstrates how to use the following commands:
Get-CacheHost
Get-CacheClusterHealth
Get-CacheStatistics
Note that these commands provide dynamic information based on the current state of the cache cluster. It is often useful to also look at the configuration details with configuration commands such as Export-CacheClusterConfig. These commands are covered in the section Common Cache Cluster Management Tasks (AppFabric 1.1 Caching).
|For more information about how to get started with Windows PowerShell, see Common Cache Cluster Management Tasks (Windows Server AppFabric Caching). For a complete list of commands, see Using Windows PowerShell with AppFabric Caching.|
Use the Get-CacheHost command without any parameters to quickly view the status of the cache hosts in the cache cluster. Some problems occur when one or more cache hosts in a cluster are not running. For example, consider the following output from Get-CacheHost:
PS C:\> Get-CacheHost

HostName : CachePort    Service Name               Service Status    Version Info
--------------------    ------------               --------------    ------------
CacheServer1:22233      AppFabricCachingService    UP                1 [1,1][1,1]
CacheServer2:22233      AppFabricCachingService    DOWN              1 [1,1][1,1]
CacheServer3:22233      AppFabricCachingService    UP                1 [1,1][1,1]
This output shows that there are three cache hosts in the cluster: CacheServer1, CacheServer2, and CacheServer3. The Service Status column indicates that the cache cluster is running, because at least one cache host has a status of UP. However, CacheServer2 is currently stopped with a status of DOWN. This could indicate a problem with CacheServer2, or you might simply need to start the cache host with the Start-CacheHost command. The Get-CacheHost command is often the first command you should run to get a high-level overview of the state of the cache cluster.
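If you capture the Get-CacheHost output as text (for example, in a scheduled report), a short script can list the hosts that are not running. The following Python sketch assumes the column layout shown in the sample output above. The function names and the regular expression are illustrative and are not part of AppFabric.

```python
import re

# Matches data rows such as "CacheServer1:22233  AppFabricCachingService  UP ...".
# Header and separator lines do not match because they lack "name:port".
ROW = re.compile(
    r"^\s*(?P<host>[^\s:]+):(?P<port>\d+)\s+(?P<service>\S+)\s+(?P<status>\S+)"
)

SAMPLE = """\
HostName : CachePort    Service Name               Service Status    Version Info
--------------------    ------------               --------------    ------------
CacheServer1:22233      AppFabricCachingService    UP                1 [1,1][1,1]
CacheServer2:22233      AppFabricCachingService    DOWN              1 [1,1][1,1]
CacheServer3:22233      AppFabricCachingService    UP                1 [1,1][1,1]
"""


def parse_cache_hosts(output):
    """Return a list of (host, port, status) tuples from captured output."""
    hosts = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            hosts.append((match["host"], int(match["port"]), match["status"]))
    return hosts


def hosts_not_up(output):
    """Return the names of hosts whose Service Status is anything other than UP."""
    return [host for host, _port, status in parse_cache_hosts(output) if status != "UP"]


if __name__ == "__main__":
    print(hosts_not_up(SAMPLE))  # prints ['CacheServer2']
```

Parsing text is fragile if the column layout changes, so treat this as a convenience for reports rather than a replacement for checking the hosts directly.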
Use Get-CacheClusterHealth to get detailed information about the health of the cache hosts and the caches residing on those cache hosts. For example, consider the following sample output from the Get-CacheClusterHealth command:
Cluster health statistics
=========================

HostName = CacheServer1
-------------------------
    NamedCache = default
        Healthy              = 0.00
        UnderReconfiguration = 0.00
        NotPrimary           = 0.00
        NoWriteQuorum        = 0.00
        Throttled            = 25.00

    NamedCache = Cache1
        Healthy              = 0.00
        UnderReconfiguration = 0.00
        NotPrimary           = 0.00
        NoWriteQuorum        = 0.00
        Throttled            = 25.00

HostName = CacheServer2
-------------------------
    NamedCache = Cache1
        Healthy              = 25.00
        UnderReconfiguration = 0.00
        NotPrimary           = 0.00
        NoWriteQuorum        = 0.00
        Throttled            = 0.00

    NamedCache = default
        Healthy              = 25.00
        UnderReconfiguration = 0.00
        NotPrimary           = 0.00
        NoWriteQuorum        = 0.00
        Throttled            = 0.00

Unallocated named cache fractions
---------------------------------
Internally, the cache cluster uses a concept of partitions to organize and manage memory. The numbers displayed in the output of the Get-CacheClusterHealth command are percentages of the total number of cache cluster partitions. For example, on CacheServer2 the named cache Cache1 is using 25.00 percent of the total partitions and all of those partitions are healthy. However, the specific percentages are not as important as the categories in which those percentages reside. Adding more caches or cache hosts may reduce Cache1 from 25.00 percent to 10.00 percent, but as long as that 10.00 percent is still in the Healthy category, the cache is still healthy. In the previous example, note that CacheServer1 is showing both caches as Throttled. This is a low-memory condition on that server. For more information about how to troubleshoot this low-memory condition, see Throttling Troubleshooting (Windows Server AppFabric Caching).
The following table describes each category in the Get-CacheClusterHealth output.
|Category|Description|
|Healthy|The cache is operating normally. This is the target state for all caches.|
|UnderReconfiguration|The cache is under reconfiguration. This is an internal state that may have several causes, but it should be temporary and resolve to healthy.|
|NotPrimary|The cache is not currently available. This can happen when secondary copies are promoted to primary. During this transition, the cache may temporarily have a state of NotPrimary.|
|NoWriteQuorum|The cache is read-only, because the cache is unable to create the required number of replicas on secondary cache hosts. This occurs when the cache has the high availability option enabled.|
|Throttled|The cache is read-only, because the cache host is in a throttled memory state. This is a low-memory condition.|
Unallocated named cache fractions represents the percentage of cache partitions that have not been allocated to a specific cache host yet. This state normally appears when the cache cluster is started or when a cache host is started or stopped on the running cluster. This state should typically resolve to healthy.
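Because the category matters more than the specific percentage, one way to summarize captured Get-CacheClusterHealth output is to report every cache that has a nonzero percentage outside the Healthy category. The following Python sketch assumes the "name = value" layout shown in the sample output above. The function names are illustrative and are not part of AppFabric.

```python
CATEGORIES = ("Healthy", "UnderReconfiguration", "NotPrimary", "NoWriteQuorum", "Throttled")

SAMPLE = """\
Cluster health statistics
=========================

HostName = CacheServer1
-------------------------
    NamedCache = default
        Healthy              = 0.00
        UnderReconfiguration = 0.00
        NotPrimary           = 0.00
        NoWriteQuorum        = 0.00
        Throttled            = 25.00

HostName = CacheServer2
-------------------------
    NamedCache = default
        Healthy              = 25.00
        UnderReconfiguration = 0.00
        NotPrimary           = 0.00
        NoWriteQuorum        = 0.00
        Throttled            = 0.00
"""


def parse_cluster_health(output):
    """Return {(host, cache): {category: percentage}} from captured output."""
    health = {}
    host = cache = None
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not sep:
            continue  # titles and dashed separators carry no data
        if key == "HostName":
            host, cache = value, None
        elif key == "NamedCache" and host is not None:
            cache = value
            health[(host, cache)] = {}
        elif key in CATEGORIES and cache is not None:
            health[(host, cache)][key] = float(value)
    return health


def unhealthy_fractions(health):
    """List (host, cache, category, percentage) for nonzero non-Healthy values."""
    return [
        (host, cache, category, pct)
        for (host, cache), fractions in health.items()
        for category, pct in fractions.items()
        if category != "Healthy" and pct > 0
    ]


if __name__ == "__main__":
    for problem in unhealthy_fractions(parse_cluster_health(SAMPLE)):
        print(problem)
```

An empty result means every reported partition is in the Healthy category. Transitional states such as UnderReconfiguration are expected to clear on their own, so a nonzero value there is mainly a concern if it persists.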
The Get-CacheStatistics Windows PowerShell command provides basic information about the contents of a cache. The following example demonstrates how to display the cache statistics for a cache named Cache1.
Get-CacheStatistics -CacheName Cache1
This is sample output from the previous command.
Size         : 12408186
ItemCount    : 1200
RegionCount  : 714
RequestCount : 1200
MissCount    : 1200
The previous output shows that there are 1200 items in Cache1 for a total size of 12408186 bytes. There are 714 regions, which could be user-created or system-created. There have been 1200 requests and the same number of misses. However, it is important not to treat MissCount as a problem indicator in isolation. When the cache cluster is restarted, applications must repopulate the cache. This involves checking whether the cached item exists, which increments MissCount. A high MissCount could indicate that items in the cache have been unexpectedly evicted or that the expiration time on cached items is too low, but these conditions cannot be confirmed with the cache statistics alone. For example, if you use the Put method to add an item that is not in the cache, it increments MissCount, but this is not an error condition.
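To put the raw counts in context, you can derive a miss ratio and an average item size from captured Get-CacheStatistics output. The following Python sketch assumes the "Name : value" layout shown in the sample output above. The function names are illustrative and are not part of AppFabric.

```python
SAMPLE = """\
Size         : 12408186
ItemCount    : 1200
RegionCount  : 714
RequestCount : 1200
MissCount    : 1200
"""


def parse_cache_statistics(output):
    """Return {statistic_name: integer_value} from captured output."""
    stats = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def miss_ratio(stats):
    """Fraction of requests that were misses, or None when there were no requests."""
    if stats.get("RequestCount", 0) == 0:
        return None
    return stats["MissCount"] / stats["RequestCount"]


def average_item_size(stats):
    """Average size in bytes per cached item, or None for an empty cache."""
    if stats.get("ItemCount", 0) == 0:
        return None
    return stats["Size"] / stats["ItemCount"]


if __name__ == "__main__":
    stats = parse_cache_statistics(SAMPLE)
    print(miss_ratio(stats))
    print(average_item_size(stats))
```

For the sample output, the miss ratio is 1.0, which matches a cache that has just been populated. As the text explains, this is not an error condition on its own. Comparing the ratio across several captures over time is more informative than any single value.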
This command can be used together with the Get-CacheConfig command. For example, if the Get-CacheStatistics command showed that Cache1 had an unexpectedly large size of 1 GB, you could examine the cache configuration with Get-CacheConfig to see the eviction and expiration settings.