1.3 Overview

The Dynamic Host Configuration Protocol (DHCP) inherently supports the rebinding operation described in [RFC2131] section 4.4.5. In this operation, a DHCP client first tries to renew its dynamically allocated address at time T1 by sending a unicast DHCPREQUEST message to the DHCP server that originally allocated the address. If that renewal fails, the client tries again at time T2 by broadcasting a DHCPREQUEST message. Any capable DHCP server on the network can respond to the broadcast DHCPREQUEST, which allows a redundant DHCP server to renew the lease if the original DHCP server has failed.
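The timing of renewal and rebinding can be sketched as follows. The sketch uses the default values from [RFC2131] section 4.4.5 (T1 is 0.5 and T2 is 0.875 times the lease duration). It ignores the random "fuzz" that RFC 2131 recommends around these values and any explicit T1/T2 values a server supplies; the names LeaseTimers, default_timers, and next_request_mode are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaseTimers:
    t1: float  # renewal time: unicast DHCPREQUEST to the allocating server
    t2: float  # rebinding time: broadcast DHCPREQUEST that any server may answer


def default_timers(lease_duration: float) -> LeaseTimers:
    """RFC 2131 default T1/T2, in seconds after the lease was obtained (no fuzz)."""
    return LeaseTimers(t1=0.5 * lease_duration, t2=0.875 * lease_duration)


def next_request_mode(elapsed: float, lease_duration: float) -> str:
    """Describe how the client behaves 'elapsed' seconds into its lease."""
    timers = default_timers(lease_duration)
    if elapsed < timers.t1:
        return "bound"      # lease valid, no DHCPREQUEST needed yet
    if elapsed < timers.t2:
        return "renewing"   # unicast DHCPREQUEST to the original server
    if elapsed < lease_duration:
        return "rebinding"  # broadcast DHCPREQUEST; a failover partner may answer
    return "expired"        # lease lost; the client must start over
```

For a one-hour lease, the client renews from 1800 seconds and rebinds from 3150 seconds; only during the rebinding window can a partner server that did not allocate the address take over the lease.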

For this arrangement to work, the lease database on the redundant DHCP server must be synchronized with the one on the original server. The DHCP Failover Protocol provides a mechanism for a pair of DHCP servers to synchronize their lease database information.


Figure 1: DHCP rebinding operation

The speed of address acquisition and confirmation is a key performance metric for DHCP, and the DHCP Failover Protocol is designed to minimize its impact on that speed. To achieve this, the DHCP server that allocates or renews a client's lease responds to the DHCP client first and only then updates its failover partner.
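A minimal sketch of this respond-first ordering follows. The FailoverServer class, its tuple-based messages, and the flush_to_partner step are hypothetical simplifications; network I/O, MCLT limits, and acknowledgments are not modeled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FailoverServer:
    """Hypothetical server model: answers clients first, updates the partner afterwards."""
    name: str
    sent_to_clients: list = field(default_factory=list)
    pending_bndupd: deque = field(default_factory=deque)  # updates not yet sent to the partner

    def handle_request(self, client_id: str, lease_seconds: int) -> None:
        # 1. Reply to the client immediately, so address acquisition is not
        #    delayed by a round trip to the failover partner.
        self.sent_to_clients.append(("DHCPACK", client_id, lease_seconds))
        # 2. Queue the binding update; it reaches the partner only later.
        self.pending_bndupd.append(("BNDUPD", client_id, lease_seconds))

    def flush_to_partner(self) -> list:
        """Drain queued BNDUPD messages in order (actual transmission is not modeled)."""
        sent = list(self.pending_bndupd)
        self.pending_bndupd.clear()
        return sent
```

The client never waits on the partner; the cost is a window in which the partner has not yet heard about the lease, which is the gap the MCLT mechanism is meant to cover.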


Figure 2: Lease synchronization

The protocol also remains robust if the original server fails before it has updated its partner about an allocation or a renewal. Either server grants a DHCP client a lease expiration time that does not exceed the expiration time already communicated to and acknowledged by the partner server plus a preconfigured duration called the maximum client lead time (MCLT). For a fresh allocation, the expiration time acknowledged by the partner is assumed to be 0. Thus, if a server has determined that its partner is down and has to renew a lease for a client that it did not originally allocate, it can safely renew the lease for the time recorded in its lease database (or 0, if the record for this client is missing) plus the MCLT. This scheme is detailed in [IETF-DHCPFOP-12] section 5.2.1.
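The MCLT bound can be expressed as a small function. This is a sketch under the assumption that all times are expressed as seconds remaining from the current moment, so that the 0 used for a fresh allocation simply means "nothing acknowledged yet"; the name max_client_lease and its parameters are hypothetical.

```python
from typing import Optional


def max_client_lease(desired: int, acked_remaining: Optional[int], mclt: int) -> int:
    """Longest lease, in seconds from now, that a failover server may grant a client.

    desired:         lease length the server would normally grant
    acked_remaining: seconds left on the expiration time the partner has
                     acknowledged for this binding (None for a fresh allocation)
    mclt:            maximum client lead time, in seconds
    """
    # A fresh allocation (or an acknowledgment that has already expired) counts as 0.
    acked = max(0, acked_remaining) if acked_remaining is not None else 0
    # Never promise the client more than the partner knows about plus the MCLT.
    return min(desired, acked + mclt)
```

With an MCLT of one hour, a fresh allocation is first granted at most one hour even if the configured lease is a day; later renewals can grow toward the full lease once the partner has acknowledged a longer expiration. The same bound covers a server renewing on behalf of a down partner: passing the remaining time from its own record (or None if the record is missing) yields that time plus the MCLT.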

To synchronize lease information, the two partners communicate over TCP using protocol messages that consist of a fixed-length header followed by variable-length options. The fixed-length header contains the message type and the overall message length; the length field allows each message to be extracted from the TCP data stream.
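The following sketch shows how a length field lets a receiver split the TCP byte stream into whole messages. It assumes, for illustration only, that the header begins with a 2-octet message length (counting the entire message, in network byte order) followed by a 1-octet message type; consult [IETF-DHCPFOP-12] for the actual header layout. MessageFramer is a hypothetical name.

```python
import struct

# Assumed header prefix: 2-octet total length, 1-octet message type, big-endian.
HEADER = struct.Struct("!HB")


class MessageFramer:
    """Splits a TCP byte stream into whole failover messages using the length field."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> list:
        """Append received bytes; return (message_type, full_message) for each complete message."""
        self._buf.extend(data)
        messages = []
        while len(self._buf) >= HEADER.size:
            length, msg_type = HEADER.unpack_from(self._buf, 0)
            if length < HEADER.size:
                raise ValueError(f"invalid message length {length}")
            if len(self._buf) < length:
                break  # the rest of this message has not arrived yet
            messages.append((msg_type, bytes(self._buf[:length])))
            del self._buf[:length]
        return messages
```

Because TCP delivers a byte stream rather than discrete messages, a single read may contain part of a message or several messages; buffering until the advertised length is available handles both cases.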

One of the DHCP servers in the failover pair is designated as the primary server and the other as the secondary server. The primary server is responsible for connection establishment and initialization. The pool of IP addresses available for the subnets serviced by the failover pair is partitioned between the primary and secondary servers; the primary is responsible for performing this partitioning and for allocating addresses to the secondary.
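As an illustration of partitioning, the sketch below splits a list of free addresses between the two servers. The secondary_fraction setting and the choice to give the secondary the highest-numbered addresses are hypothetical policies, not requirements of the protocol; announcing each secondary address with a BNDUPD message is not modeled.

```python
import ipaddress
from typing import Iterable, List, Tuple

Address = ipaddress.IPv4Address


def partition_pool(free: Iterable[Address], secondary_fraction: float) -> Tuple[List[Address], List[Address]]:
    """Split free addresses into (primary_share, secondary_share).

    secondary_fraction is a hypothetical administrator setting; the secondary's
    share is rounded down to a whole number of addresses.
    """
    if not 0.0 <= secondary_fraction <= 1.0:
        raise ValueError("secondary_fraction must be between 0 and 1")
    addrs = sorted(free)
    split = len(addrs) - int(len(addrs) * secondary_fraction)
    # Illustrative policy: the primary keeps the lower addresses and allocates
    # the upper ones to the secondary (each would then be announced in a BNDUPD).
    return addrs[:split], addrs[split:]
```

For example, splitting the six usable hosts of 192.0.2.0/29 with a fraction of 0.5 leaves 192.0.2.1 through 192.0.2.3 with the primary and allocates 192.0.2.4 through 192.0.2.6 to the secondary.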

Communication between the failover servers starts with the primary establishing a TCP connection with the secondary ([IETF-DHCPFOP-12] section 8.2). Once the connection is established, the primary sends a CONNECT message containing the relationship parameters to the secondary ([IETF-DHCPFOP-12] section 7.8.1). These parameters include message authentication parameters, if the administrator has configured them. The secondary accepts the connection request by responding with a CONNECTACK message that has no reject reason option, or rejects it with a CONNECTACK message that includes a reject reason option ([IETF-DHCPFOP-12] section 7.9.1). Once the connection is complete, each server informs its partner of its current state by sending a STATE message ([IETF-DHCPFOP-12] section 7.10).
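The CONNECT/CONNECTACK exchange can be sketched as a small handshake object on the primary and a reply handler on the secondary. The Message type, the option names "reject-reason" and "server-state", and the rule that the secondary accepts only when the parameters exactly match its own configuration are assumptions made for illustration; the secondary's own STATE message and message authentication are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    """Hypothetical in-memory form of a failover message: a type plus named options."""
    type: str
    options: dict = field(default_factory=dict)


class PrimaryHandshake:
    """Primary side of connection setup (runs after the TCP connection is up)."""

    def __init__(self, relationship: dict, state: str) -> None:
        self.relationship = relationship  # relationship parameters sent in CONNECT
        self.state = state                # this server's current failover state
        self.connected = False
        self.reject_reason: Optional[str] = None

    def start(self) -> Message:
        # First message on a new connection: CONNECT with the relationship parameters.
        return Message("CONNECT", dict(self.relationship))

    def on_connectack(self, ack: Message) -> List[Message]:
        if ack.type != "CONNECTACK":
            raise ValueError(f"expected CONNECTACK, got {ack.type}")
        if "reject-reason" in ack.options:
            # Rejected: remember why; nothing further is sent.
            self.reject_reason = ack.options["reject-reason"]
            return []
        # Accepted: tell the partner which state this server is in.
        self.connected = True
        return [Message("STATE", {"server-state": self.state})]


def secondary_on_connect(connect: Message, expected: dict) -> Message:
    """Secondary's reply to CONNECT: accept only if the parameters match its configuration."""
    if connect.type != "CONNECT" or connect.options != expected:
        return Message("CONNECTACK", {"reject-reason": "configuration mismatch"})
    return Message("CONNECTACK")  # no reject reason option: the connection is accepted
```

Encoding acceptance as the absence of an option, rather than a separate status field, mirrors the description above: the same CONNECTACK message type serves both outcomes.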


Figure 3: Connection setup

If the servers have been out of communication, either server can request that its partner send it all the binding database information that it has not already received. It does so by sending an UPDREQ message to the partner. The partner then sends BNDUPD messages to the requesting server, which acknowledges them with BNDACK messages. After the partner has sent all the BNDUPD messages, it sends an UPDDONE message to indicate that the original UPDREQ has been fulfilled ([IETF-DHCPFOP-12] sections 7.3 and 7.5). Similarly, a server that is recovering from a total loss of binding information can send an UPDREQALL message ([IETF-DHCPFOP-12] section 7.4).
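The recovery exchange can be simulated in a few lines. This sketch runs both servers in one process and treats each BNDUPD/BNDACK pair as a synchronous step; the Peer class, its unsent list, and the run_update_request function are hypothetical, and message encoding is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Peer:
    """Hypothetical failover server holding bindings keyed by client ID."""
    name: str
    bindings: Dict[str, int] = field(default_factory=dict)       # client ID -> lease expiry
    unsent: List[Tuple[str, int]] = field(default_factory=list)  # updates the partner lacks


def run_update_request(requester: Peer, responder: Peer, request_all: bool = False) -> List[str]:
    """Simulate UPDREQ (or UPDREQALL) and return the resulting message trace."""
    trace = ["UPDREQALL" if request_all else "UPDREQ"]
    # UPDREQ asks only for updates not yet received; UPDREQALL asks for everything.
    updates = list(responder.bindings.items()) if request_all else list(responder.unsent)
    for client_id, expiry in updates:
        trace.append("BNDUPD")                  # responder sends one binding
        requester.bindings[client_id] = expiry  # requester applies it...
        trace.append("BNDACK")                  # ...and acknowledges it
    responder.unsent.clear()
    trace.append("UPDDONE")                     # every requested update has been sent
    return trace
```

A second UPDREQ immediately after the first produces only UPDREQ followed by UPDDONE, since nothing remains unsent.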


Figure 4: Recovery

Regular binding updates, which correspond to the lease allocations and renewals shown in Figure 2, are sent in BNDUPD messages and acknowledged with BNDACK messages. These two messages form the payload of the DHCP Failover Protocol; all other messages serve ancillary functions. The primary server also uses BNDUPD messages to inform the secondary about the addresses it allocates to the secondary from the available pool.
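To apply the MCLT rule, a server needs to know which expiration time the partner has actually acknowledged, as opposed to merely sent. A minimal sketch, assuming each BNDUPD carries a transaction ID (xid) that the matching BNDACK echoes back and that acknowledgments are processed in the order the updates were sent; the BindingAckTracker class is hypothetical.

```python
import itertools
from typing import Dict, Tuple


class BindingAckTracker:
    """Tracks which lease expiration times the partner has acknowledged (hypothetical)."""

    def __init__(self) -> None:
        self._xids = itertools.count(1)                  # hypothetical transaction IDs
        self.in_flight: Dict[int, Tuple[str, int]] = {}  # xid -> (address, expiry) awaiting BNDACK
        self.acked_expiry: Dict[str, int] = {}           # address -> expiry the partner acknowledged

    def send_bndupd(self, address: str, expiry: int) -> int:
        """Record an outgoing BNDUPD and return its xid."""
        xid = next(self._xids)
        self.in_flight[xid] = (address, expiry)
        return xid

    def on_bndack(self, xid: int) -> None:
        """Promote the matching update to acknowledged; unknown or duplicate xids are ignored."""
        entry = self.in_flight.pop(xid, None)
        if entry is None:
            return
        address, expiry = entry
        self.acked_expiry[address] = expiry  # assumes BNDACKs are processed in send order

    def acked(self, address: str) -> int:
        # 0 means nothing acknowledged yet, matching the fresh-allocation rule.
        return self.acked_expiry.get(address, 0)
```

Until the BNDACK arrives, acked() keeps returning the previously acknowledged value, so any lease granted in the meantime stays within the MCLT of what the partner is known to hold.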

A communication interruption is detected by the loss of the TCP connection. In addition to maintaining an active TCP connection, each server relies on the regular receipt of messages from its partner to confirm that the partner is available. So that its partner continues to regard it as operational, a server sends periodic CONTACT messages whenever no other protocol messages are being transmitted on the connection.
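The keepalive behavior can be sketched as a timer check that runs periodically. The contact_interval and partner_timeout parameters, and the decision to report a silent partner rather than act on it, are assumptions for illustration; consult [IETF-DHCPFOP-12] for the actual timing rules and for what happens after a communication interruption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContactMonitor:
    """Hypothetical liveness timers for one failover connection (times in seconds)."""
    contact_interval: float  # send CONTACT after this long without sending anything
    partner_timeout: float   # report the partner as silent after this long without receiving
    last_sent: float = 0.0
    last_received: float = 0.0

    def on_send(self, now: float) -> None:
        self.last_sent = now      # any protocol message counts as outgoing traffic

    def on_receive(self, now: float) -> None:
        self.last_received = now  # any message from the partner shows it is alive

    def tick(self, now: float) -> List[str]:
        """Check both timers and return the actions due at time 'now'."""
        actions = []
        if now - self.last_sent >= self.contact_interval:
            actions.append("send CONTACT")
            self.last_sent = now
        if now - self.last_received >= self.partner_timeout:
            actions.append("partner silent")
        return actions
```

Because on_send is called for every outgoing message, a busy connection carrying BNDUPD and BNDACK traffic never needs CONTACT messages; they fill in only during idle periods.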

For further details on the DHCP Failover Protocol, see [IETF-DHCPFOP-12].