
Failures Due to Soft Errors
Conditions that might cause mirroring time-outs include (but are not limited to) the following:
-
Network errors such as TCP link time-outs, dropped or corrupted packets, or packets that are in an incorrect order.
-
A hanging operating system, server, or database state.
-
A Windows server timing out.
-
Insufficient computing resources, such as a CPU or disk overload, the transaction log filling up, or the system is running out of memory or threads. In these cases, you must increase the time-out period, reduce the workload, or change the hardware to handle the workload.
The Mirroring Time-Out Mechanism
Because soft errors are not detectable directly by a server instance, a soft error could potentially cause a server instance to wait indefinitely. To prevent this, database mirroring implements its own time-out mechanism, based on each server instance in a mirroring session sending out a ping on each open connection at a fixed interval.
To keep a connection open, a server instance must receive a ping on that connection in the time-out period defined, plus the time that is required to send one more ping. Receiving a ping during the time-out period indicates that the connection is still open and that the server instances are communicating over it. On receiving a ping, a server instance resets its time-out counter on that connection.
If no ping is received on a connection during the time-out period, a server instance considers the connection to have timed out. The server instance closes the timed-out connection and handles the time-out event according to the state and operating mode of the session.
Even if the other server is actually proceeding correctly, a time-out is considered a failure. If the time-out value for a session is too short for the regular responsiveness of either partner, false failures can occur. A false failure occurs when one server instance successfully contacts another whose response time is so slow that its pings are not received before the time-out period expires.
In high-performance mode sessions, the time-out period is always 10 seconds. This is generally enough to avoid false failures. In high-safety mode sessions, the default time-out period is 10 seconds, but you can change the duration. To avoid false failures, we recommend that the mirroring time-out period always be 10 seconds or more.
To change the time-out value (high-safety mode only)
To view the current time-out value