HangRecoveryAction

Specifies the recovery action that the Cluster network driver takes when the Cluster service heartbeat countdown times out.

Attribute    Value
Data type    DWORD
Access       Read/write
Structure    CLUSPROP_DWORD
Minimum      ClussvcHangActionDisable (0)
Maximum      ClussvcHangActionBugCheckMachine (3)
Default      ClussvcHangActionBugCheckMachine (3)

Windows Server 2003 and Windows 2000 Server: The default value is ClussvcHangActionTerminateService (2).

Remarks

The Cluster network driver (ClusNet) maintains a countdown timer. When the timer reaches 0 (zero), the driver performs the action specified by the HangRecoveryAction property. Each time ClusNet receives a heartbeat from the Cluster service, it resets the timer to the value of the ClusSvcHeartbeatTimeout property. When the Cluster service stops for any reason, ClusNet turns off the countdown timer.
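
The timer behavior can be modeled in a few lines of C. This is a minimal user-mode sketch under stated assumptions, not ClusNet's implementation: the HangCountdown type and the Countdown* functions are hypothetical, time is counted in abstract ticks rather than the real units of ClusSvcHeartbeatTimeout, ClussvcHangActionDisable (0) is modeled as never arming the timer, and the timer is assumed to stop after it fires once.

```c
#include <stdbool.h>

/* Hypothetical model of the ClusNet hang-detection countdown.
   Not the driver's implementation: type and function names are
   invented for illustration, and time is measured in abstract ticks. */
typedef struct {
    unsigned int timeout;    /* stands in for ClusSvcHeartbeatTimeout */
    unsigned int remaining;  /* ticks left before the recovery action fires */
    bool         running;    /* false when the countdown is turned off */
    unsigned int action;     /* HangRecoveryAction value, 0..3 */
} HangCountdown;

/* Arms the countdown. ClussvcHangActionDisable (0) never arms it. */
void CountdownStart(HangCountdown *c, unsigned int timeout, unsigned int action)
{
    c->timeout   = timeout;
    c->remaining = timeout;
    c->action    = action;
    c->running   = (action != 0 && timeout > 0);
}

/* A heartbeat from the Cluster service resets the countdown. */
void CountdownHeartbeat(HangCountdown *c)
{
    if (c->running)
        c->remaining = c->timeout;
}

/* The Cluster service stopped: turn the countdown off. */
void CountdownServiceStopped(HangCountdown *c)
{
    c->running = false;
}

/* Advances one tick. Returns true exactly when the countdown reaches 0,
   that is, when the HangRecoveryAction should be performed.
   Assumption: the countdown stops after firing. */
bool CountdownTick(HangCountdown *c)
{
    if (!c->running)
        return false;
    c->remaining--;
    if (c->remaining == 0) {
        c->running = false;
        return true;
    }
    return false;
}
```

For example, with a timeout of 3 ticks, two ticks followed by a heartbeat put the countdown back at 3, and the action fires only after three further ticks with no heartbeat.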

The HangRecoveryAction property can be set to the following values.

ClussvcHangActionDisable (0)
    Disables the cluster heartbeat and monitoring mechanism.

ClussvcHangActionLog (1)
    Logs an event in the System log of Event Viewer when a heartbeat countdown timeout occurs.

ClussvcHangActionTerminateService (2)
    Terminates the Cluster service when a heartbeat countdown timeout occurs. This is the default on Windows Server 2003 and Windows 2000 Server.

ClussvcHangActionBugCheckMachine (3)
    Causes a system Stop error (bug check) when a heartbeat countdown timeout occurs. This is the default on all other versions.
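
A caller that writes HangRecoveryAction may want to reject values outside the documented Minimum..Maximum range before building a property list. The following C sketch defines the four values locally from the table above (it does not assume which Windows header, if any, declares these names), checks the range, and returns the version-dependent default. HANG_RECOVERY_ACTION, IsValidHangRecoveryAction, and DefaultHangRecoveryAction are hypothetical names.

```c
#include <stdbool.h>

/* HangRecoveryAction values, copied from the documented table.
   Defined locally for this sketch; HANG_RECOVERY_ACTION is a
   hypothetical type name, not a Windows SDK type. */
typedef enum {
    ClussvcHangActionDisable          = 0,
    ClussvcHangActionLog              = 1,
    ClussvcHangActionTerminateService = 2,
    ClussvcHangActionBugCheckMachine  = 3
} HANG_RECOVERY_ACTION;

/* True if value lies in the documented range 0 (Minimum) to 3 (Maximum).
   The value is unsigned, so the lower bound is implicit. */
bool IsValidHangRecoveryAction(unsigned long value)
{
    return value <= ClussvcHangActionBugCheckMachine;
}

/* Documented default: TerminateService on Windows Server 2003 and
   Windows 2000 Server, BugCheckMachine otherwise. */
HANG_RECOVERY_ACTION DefaultHangRecoveryAction(bool isServer2003OrEarlier)
{
    return isServer2003OrEarlier ? ClussvcHangActionTerminateService
                                 : ClussvcHangActionBugCheckMachine;
}
```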

Note: In some extreme cases, other system services may also stop responding, and actions 1 and 2 may not succeed. In such cases, action 3 (bug check) is the only effective recovery measure.

If HangRecoveryAction is set to ClussvcHangActionBugCheckMachine (3), Windows on the cluster node stops responding with Stop error (bug check) code 0x9E. The Stop error causes a failover to another cluster node. If the node is also configured to capture a memory dump file, you may be able to use the information in the dump file to diagnose why the cluster node stopped responding.

The following is an example stack trace from a kernel memory dump for a bug check that the Cluster network driver initiated:

ChildEBP    RetAddr
f9c33ea8    f6e2e11f    nt!KeBugCheckEx+0x19
f9c33ecc    f6e2e836    clusnet!CnpCheckClussvcHang+0xef
f9c33ef0    805070d7    clusnet!CnpHeartBeatDpc+0x47e
f9c33fa4    8050735d    nt!KiTimerExpiration+0x371
f9c33ff4    80543ccf    nt!KiRetireDpcList+0x63

The corresponding bug check summary is similar to the following: BugCheck 9E, {812d5b08, 3c, 0, 0}
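
When triaging several dumps, it can help to split that summary line into the bug check code and its four parameters. A minimal C sketch, assuming the `BugCheck <code>, {<p1>, <p2>, <p3>, <p4>}` layout shown above; BugCheckSummary and ParseBugCheckLine are hypothetical names, not part of any debugger API. All fields are hexadecimal, so the second parameter 3c in the example is 60 decimal. Bug check 0x9E is listed as USER_MODE_HEALTH_MONITOR in the Windows bug check code reference; consult that reference for what each parameter means.

```c
#include <stdio.h>

/* Hypothetical container for a parsed bug check summary line. */
typedef struct {
    unsigned int       code;       /* e.g. 0x9E */
    unsigned long long params[4];  /* the four bug check parameters */
} BugCheckSummary;

/* Parses a line such as "BugCheck 9E, {812d5b08, 3c, 0, 0}".
   The line format is assumed from the example above.
   Returns 1 on success, 0 if the line does not match. */
int ParseBugCheckLine(const char *line, BugCheckSummary *out)
{
    int n = sscanf(line, "BugCheck %x, {%llx, %llx, %llx, %llx}",
                   &out->code, &out->params[0], &out->params[1],
                   &out->params[2], &out->params[3]);
    return n == 5;
}
```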

Note: You must manually configure the server to generate a memory dump file in response to a bug check.

Examples

The following example code sets the value portion of a HangRecoveryAction property list entry to ClussvcHangActionLog (1):

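/* Requires <windows.h> and <clusapi.h>; place these statements inside a function. */
DWORD          ClusSvcHangActionData = 1;   /* ClussvcHangActionLog */
CLUSPROP_DWORD ClusSvcHangActionValue;

/* Describe the value as a DWORD list value and store it in the entry. */
ClusSvcHangActionValue.Syntax.dw = CLUSPROP_SYNTAX_LIST_VALUE_DWORD;
ClusSvcHangActionValue.cbLength  = sizeof(DWORD);
ClusSvcHangActionValue.dw        = ClusSvcHangActionData;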

Requirements

Minimum supported client    None supported
Minimum supported server    Windows 2000 Advanced Server, Windows 2000 Datacenter Server

See Also

CLUSPROP_DWORD


Build date: 10/8/2009

© 2009 Microsoft Corporation. All rights reserved.