Cisco CNS Network Registrar User's Guide, 5.0
Configuring DHCP Failover

Table of Contents

Configuring DHCP Failover
Failover Configuration Steps
Configuring Failover Based on Your Scenario
Using the DHCP Failover Configuration Tool
Handling State Transitions and Regimes
Allocating Addresses Among Servers
Changing Failover Server Roles
Handling Special Cases
Troubleshooting Failover

Configuring DHCP Failover

DHCP failover allows you to configure two servers to operate as a redundant pair of DHCP servers. If one server is down, the other server seamlessly takes over and allows existing DHCP clients to keep and renew their IP leases. Clients requesting new leases need not know or care which server is responding to their request for a lease. These clients can obtain leases even if the main server is not operational.

Table 11-1 lists the topics in this chapter and their associated sections.


Table 11-1: DHCP Failover Configuration Topics

If you want to... | Go to...
Know more about how DHCP failover works | "DHCP Failover" section
Have an overview of the failover configuration steps | "Failover Configuration Steps" section
Configure failover based on the three main server scenarios or deployments | "Configuring Failover Based on Your Scenario" section
Use the failover configuration tool to ensure synchronized partners | "Using the DHCP Failover Configuration Tool" section
Determine and handle changes in failover states and regimes | "Handling State Transitions and Regimes" section
Allocate addresses and leases among the failover partners | "Allocating Addresses Among Servers" section
Change the roles of servers | "Changing Failover Server Roles" section
Handle special configuration cases, such as multiple interface server hosts, changing system defaults and the maximum lead time, and supporting BOOTP clients and relays | "Handling Special Cases" section
Troubleshoot failover | "Troubleshooting Failover" section



Failover Configuration Steps

The following steps review the process by which you configure DHCP failover. The remainder of this section describes the steps based on different site configurations.


Step 1   Configure your DHCP server and duplicate its configuration to the backup server.

   a. Duplicate the configurations for scopes, policies, DHCP options, and IP addresses. (You might be able to use the failover configuration tool for this. See "Using the DHCP Failover Configuration Tool" section.)

   b. If you enabled dynamic DNS update on the main server, be sure that it is also enabled on the backup server. (See "Configuring Dynamic DNS Update.")

   c. If you are using reservations, make sure that reservations are configured identically on each server. (See the "Reserving a Lease" section.)

   d. If you are using client-class, configure both servers with identical client-classes. (See the "Defining Client-Classes" section.)

   e. Give the scopes the same set of selection tags. (See the "Setting Client-Class Scope Selection Criteria" section.)

   f. Enter the clients in both clusters, or enter them in LDAP and direct both servers to the same LDAP server. (See "Configuring LDAP.")

Step 2   Choose your backup configuration scenario: basic, back office, or symmetrical.

Step 3   Set up identical configurations on both servers. (You might be able to use the failover configuration tool for this: see the "Using the DHCP Failover Configuration Tool" section.)

Step 4   Configure the servers according to the scenario you chose in Step 2.

Step 5   Reload both servers. (See the "Configuring the Basic Scenario" section.)

Step 6   If you use BOOTP relays (IP helper address), configure all BOOTP relays to point to both servers. (See the "Changing System Defaults" section.)


Configuring Failover Based on Your Scenario

In all the following examples, the main and backup servers were already configured identically from a scope, policy, client, and client-class standpoint, and the server-wide default capabilities are used. These examples illustrate only the failover-specific configuration commands.

Before you consider these server deployments, read about how failover works in the "DHCP Failover" section. You may also want to look at the "Configuring Failover on Multiple Interface Hosts" section.


Tip You can also use the failover configuration tool for all these scenarios. For details, see the "Using the DHCP Failover Configuration Tool" section.

Configuring the Basic Scenario

This failover scenario involves a main server and a single backup server (Figure 11-1).


Figure 11-1: Simple Failover Configuration


On Main Server A:

Enable failover, specify the name of the backup server, and reload the server.

nrcmd> dhcp enable failover 
nrcmd> dhcp set failover-backup-server=dhcpB.example.com 
nrcmd> server dhcp reload 
 
On Backup Server B:

Enable failover, specify the name of the main server, and reload the server.

nrcmd> dhcp enable failover 
nrcmd> dhcp set failover-main-server=dhcpA.example.com 
nrcmd> server dhcp reload 

Note   The server that is reloaded last might get an error stating that the failover is not available. You can safely ignore this message on startup.

Using a Script:

Instead of typing the CLI commands interactively, you can create a command file and run it to configure both the main and backup servers. Here are the contents of that file:

dhcp enable failover 
dhcp set failover-main-server=dhcpA.example.com 
dhcp set failover-backup-server=dhcpB.example.com 
 

Configuring the Back Office Scenario

This failover scenario involves two main servers and a single backup server (Figure 11-2). The main servers are Aserver and Bserver and the backup server is Cserver. The main servers have three scopes each: scope1, scope2, and scope3 on Aserver; and scope4, scope5, and scope6 on Bserver. This scenario is well suited for scopes that are on the same LAN segment, which requires the same main and backup server combination. In this case scope1, scope2, and scope3 are on the same LAN segment, and scope4, scope5, and scope6 are on a different one.


Figure 11-2: Back Office Failover Configuration


On Both Main Servers (A and B):

On each main server, enable failover and specify the name of the backup server. (Note that if you disable failover at the server level, only those scopes that have failover enabled can engage in failover activity.)

nrcmd> dhcp enable failover 
nrcmd> dhcp set failover-backup-server=Cserver.example.com 
 
On Backup Server (C):

If you have more than one main server, you need to list the scopes of the other servers individually. The following example enables failover for the server (and therefore for each scope) and designates the main server for each scope.

nrcmd> dhcp enable failover 
nrcmd> scope scope1 set failover-main-server=Aserver.example.com 
nrcmd> scope scope2 set failover-main-server=Aserver.example.com 
nrcmd> scope scope3 set failover-main-server=Aserver.example.com 
nrcmd> scope scope4 set failover-main-server=Bserver.example.com 
nrcmd> scope scope5 set failover-main-server=Bserver.example.com 
nrcmd> scope scope6 set failover-main-server=Bserver.example.com 
 
Duplicating Configurations on the Backup Server:

Instead of entering the CLI commands interactively, you could create a command file so that you can duplicate the information about servers A and B to backup server C.

On main server A, create FileA that contains all the information about the scopes and includes the following specifications for failover, then run the file.

scope scope1 set failover=scope-enabled 
scope scope1 set failover-main-server=Aserver.example.com 
scope scope1 set failover-backup-server=Cserver.example.com 
scope scope2 set failover=scope-enabled 
scope scope2 set failover-main-server=Aserver.example.com 
scope scope2 set failover-backup-server=Cserver.example.com 
scope scope3 set failover=scope-enabled 
scope scope3 set failover-main-server=Aserver.example.com 
scope scope3 set failover-backup-server=Cserver.example.com 
 
nrcmd -b <FileA 
 

On main server B, create a similar FileB, then run the file.

scope scope4 set failover=scope-enabled 
scope scope4 set failover-main-server=Bserver.example.com 
scope scope4 set failover-backup-server=Cserver.example.com 
scope scope5 set failover=scope-enabled 
scope scope5 set failover-main-server=Bserver.example.com 
scope scope5 set failover-backup-server=Cserver.example.com 
scope scope6 set failover=scope-enabled 
scope scope6 set failover-main-server=Bserver.example.com 
scope scope6 set failover-backup-server=Cserver.example.com 
 
nrcmd -b <FileB 
 

On backup server C, run the two command files.

nrcmd -b <FileA 
nrcmd -b <FileB 
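
The contents of FileA and FileB follow the same three-line pattern for every scope, so they can be generated rather than typed by hand. The following Python sketch is only an illustration: the helper name scope_failover_commands is hypothetical, and it emits nothing beyond the three scope commands already shown in this section.

```python
def scope_failover_commands(scopes, main_server, backup_server):
    """Build the per-scope failover lines used in FileA and FileB."""
    lines = []
    for scope in scopes:
        lines.append(f"scope {scope} set failover=scope-enabled")
        lines.append(f"scope {scope} set failover-main-server={main_server}")
        lines.append(f"scope {scope} set failover-backup-server={backup_server}")
    return lines


# FileA: scopes owned by Aserver, backed up by Cserver.
file_a = scope_failover_commands(
    ["scope1", "scope2", "scope3"], "Aserver.example.com", "Cserver.example.com"
)
# FileB: scopes owned by Bserver, backed up by Cserver.
file_b = scope_failover_commands(
    ["scope4", "scope5", "scope6"], "Bserver.example.com", "Cserver.example.com"
)

print("\n".join(file_a))
```

Writing each list to a file gives only the failover lines of a command file. As described above, FileA and FileB also carry the rest of the scope information, which this sketch does not produce.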
 

Configuring the Symmetrical Scenario

This failover scenario involves two servers that share network responsibilities by acting as backups for each other based on certain scopes (Figure 11-3). Aserver is the main for scopes 1 through 3 and the backup for scopes 4 through 6. Bserver plays the reverse role.

On Server A:

Enable failover for the server, then specify the name of the backup server for the first three scopes and the main server for the last three scopes.

nrcmd> dhcp enable failover 
nrcmd> scope scope1 set failover-backup-server=Bserver.example.com 
nrcmd> scope scope2 set failover-backup-server=Bserver.example.com 
nrcmd> scope scope3 set failover-backup-server=Bserver.example.com 
nrcmd> scope scope4 set failover-main-server=Bserver.example.com 
nrcmd> scope scope5 set failover-main-server=Bserver.example.com 
nrcmd> scope scope6 set failover-main-server=Bserver.example.com 
 

Figure 11-3: Symmetrical Failover Configuration


On Server B:

Do the reverse for Bserver as for Aserver. Enable failover, then specify the name of the main server for the first three scopes and the backup for the last three scopes.

nrcmd> dhcp enable failover 
nrcmd> scope scope1 set failover-main-server=Aserver.example.com
nrcmd> scope scope2 set failover-main-server=Aserver.example.com
nrcmd> scope scope3 set failover-main-server=Aserver.example.com
nrcmd> scope scope4 set failover-backup-server=Aserver.example.com
nrcmd> scope scope5 set failover-backup-server=Aserver.example.com
nrcmd> scope scope6 set failover-backup-server=Aserver.example.com
 
Using a Script:

The following script is for the symmetrical server case in which Aserver is serving scopes scope1 through scope3 as main, and Bserver is serving scopes scope4 through scope6 as main. They are backing up each other's scopes. You can use this script on both servers.

dhcp enable failover 
scope scope1 set failover-main-server=Aserver.example.com 
scope scope1 set failover-backup-server=Bserver.example.com 
scope scope2 set failover-main-server=Aserver.example.com 
scope scope2 set failover-backup-server=Bserver.example.com 
scope scope3 set failover-main-server=Aserver.example.com 
scope scope3 set failover-backup-server=Bserver.example.com 
 
scope scope4 set failover-main-server=Bserver.example.com 
scope scope4 set failover-backup-server=Aserver.example.com 
scope scope5 set failover-main-server=Bserver.example.com 
scope scope5 set failover-backup-server=Aserver.example.com 
scope scope6 set failover-main-server=Bserver.example.com 
scope scope6 set failover-backup-server=Aserver.example.com 
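
Because the script names both the main and the backup server for every scope, each server can derive its own role per scope from the same file. The Python sketch below illustrates that reading on a shortened copy of the script. The roles_for helper and its parsing are hypothetical and are not part of Network Registrar.

```python
# Shortened copy of the shared script (one scope per direction).
SCRIPT = """\
dhcp enable failover
scope scope1 set failover-main-server=Aserver.example.com
scope scope1 set failover-backup-server=Bserver.example.com
scope scope4 set failover-main-server=Bserver.example.com
scope scope4 set failover-backup-server=Aserver.example.com
"""


def roles_for(server, script):
    """Map each scope to 'main' or 'backup' for the given server name."""
    roles = {}
    for line in script.splitlines():
        parts = line.split()
        # Consider only lines shaped like: scope <name> set <attribute>=<value>
        if len(parts) != 4 or parts[0] != "scope" or parts[2] != "set":
            continue
        attribute, _, value = parts[3].partition("=")
        if value != server:
            continue
        if attribute == "failover-main-server":
            roles[parts[1]] = "main"
        elif attribute == "failover-backup-server":
            roles[parts[1]] = "backup"
    return roles


print(roles_for("Aserver.example.com", SCRIPT))
print(roles_for("Bserver.example.com", SCRIPT))
```

The two results mirror each other: every scope that one server holds as main, the other holds as backup.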
 

Avoiding Configuration Mistakes

There are several ways to make a simple mistake and lose the DHCP redundancy provided by the failover protocol. Make sure that you have not made one of the following mistakes:

  • Enabled failover on one server and not on the other.

  • Configured the main and backup servers differently, for example, by failing to include some scopes in the backup's configuration.

  • Failed to configure the underlying network infrastructure, that is, failed to reconfigure all of your BOOTP relays to send DHCP packets to both main and backup servers.

In the first two situations, Network Registrar detects and logs the configuration mistakes during processing, although a mistake may be detected a considerable time after the actual misconfiguration occurred. However, Network Registrar cannot detect the mistake in the third situation.

You can only detect BOOTP configuration errors by performing live tests (akin to fire drills) in which you periodically take the main server out of service to verify that the backup server is available to DHCP clients.

Maintaining Failover Servers

After you configure failover, verify that failover is running correctly by checking the log files and running a report (see the "Displaying Related DHCP Servers" section) that displays information about the connection status of the main and backup servers.

When you check the log files for each server, note that there are two possible roles for each server with respect to every other server: it may be a main server for another server, or a backup server for another server. There is a failover object within the DHCP server for each of these roles, and the object name directly reflects its role.

  • In the simple configuration (Figure 11-1), server A has one failover object as main for Bserver and server B has one failover object as backup for Aserver.
  • In the back office configuration (see Figure 11-2), each main server has one failover object as main for Cserver, while server C has two failover objects, as backup for Aserver and as backup for Bserver.

When the DHCP server configures itself, it logs every network that is part of each failover object. In addition, it reports the configuration parameters (such as failover-maximum-client-lead-time, failover-backup-percentage, failover-use-safe-period state, and so on).

Monitoring Failover Server Status

Run the dhcp getRelatedServers command to create a report about the connection status of the main and backup servers. The command displays the following information (in Table 11-2).


Table 11-2: getRelatedServers Report

Column | Description
Type | Main, Backup, DNS, or LDAP
Name | DNS host name
Address | IP address in dotted octet format
Requests | Number of outstanding requests, or two dashes (--) if not applicable
Communications | OK or INTERRUPTED
Localhost State | Failover state of this server, or two dashes (--)
Partner State | Failover state of the associated failover server, or two dashes (--)



Changing General DHCP Configurations

In the following cases, if you make any changes to your main DHCP server you must duplicate those changes to the backup server.

  • Adding, deleting, or modifying scopes

  • Changing policies

  • Adding, deleting, or modifying clients

  • Adding or changing IP addresses

  • Adding or changing reservations

  • Adding, deleting, or modifying client-classes

  • Enabling or disabling dynamic DNS updates

  • Enabling or disabling dynamic BOOTP


Tip Consider maintaining your entire DHCP server configuration in CLI command files and always make any changes to those files.

Using the DHCP Failover Configuration Tool

The DHCP failover configuration tool ensures that users can duplicate configurations without having to manually enter the configurations on the backup machine, saving time and preventing errors. To accomplish this task, the failover configuration tool uses the cnrFailoverConfig command.

The types of configuration options currently supported by the failover configuration tool are:

The three main steps in the failover configuration process are:

1. Make a copy of the main server's configuration in the form of a CLI batch file, using the cnrFailoverConfig -clone command.

2. Load the CLI batch file on the backup server.

3. Check to make sure both servers are identical, using the cnrFailoverConfig -compare command.

The following sections explain each step.

Step 1: Clone the Main Server's Configuration

The following procedure shows how to clone the main server's configuration.

   a. Configure the main server, including saving the configuration and reloading the server.

   b. Create a batch configuration file on the main server, at the system command line.

    % cnrFailoverConfig -clone -mc cluster -mu user -mp password -o config_file 
     
    
The generated configuration file may contain lines such as the following:

    dhcp set client-class=enabled
    dhcp unset collect-performance-statistics
    dhcp set defer-lease-extensions=false
    dhcp set discover-interfaces=enabled
    dhcp set dns-timeout=5000
    ...
     
    

Step 2: Load the Batch File on the Backup Server

Perform these substeps to load the batch configuration file on the backup server.

    % nrcmd -b < config_file 
    username: admin
    password:
    100 Ok
    session:
        cluster = localhost
        default-format = user
        user-name = admin
        visibility = 5
     
    nrcmd>
    Set the configuration to match DHCP server localhost
    ...
     
    
    % nrcmd 
    nrcmd> save
     
    

Step 3: Compare the Configurations, Reload, and Confirm

Perform these substeps to compare the configurations of the main and backup servers.

   a. Invoke the cnrFailoverConfig -compare command from the system command line. (You can be on either the main or backup server.)

    % cnrFailoverConfig -compare -mc main_cluster -mu main_user -mp main_password 
          -bc backup_cluster -bu backup_user -bp backup_password -verbose -o compare_file 
     
    

    Note   If you perform an additional reverse comparison between the main and the backup server, the results may not be the same, especially if you have an unsymmetrical back office type failover configuration (see the "Configuring Failover Based on Your Scenario" section).

   b. Check the output file for discrepancies. Since configuration differences between the two servers are marked with the word "difference," you can search for that word using a text editor. (If you omit the -verbose switch, only the differences appear in the output file.)

If there are differences, go back to the "Step 1: Clone the Main Server's Configuration" section and correct the problem.

    nrcmd> server DHCP reload 
     
    
    nrcmd> server DHCP getRelatedServers 
     
    

Handling State Transitions and Regimes

During normal operation the DHCP failover servers transition from one state to another. The servers stay in their current state until all of the actions specified on the state transition are completed. If communication fails during one of the actions, the server simply stays in the current state and attempts a transition whenever the conditions for a transition are fulfilled.

Failover States

Table 11-3 describes the failover states and how they transition.


Table 11-3: Failover States and Transitions

STARTUP
   Description: The server can learn its partner's state before doing anything. A server tries to contact its partner during this state. When it succeeds, it immediately enters another state.
   Transition: A server automatically leaves STARTUP state after a short time, typically seconds.

NORMAL
   Description: The server can communicate with its partner. The main server can respond to all client requests. The backup server responds only to renewal and rebinding requests. This is one of the few states in which the main and backup server operations are different.

COMMUNICATIONS-INTERRUPTED
   Description: A server cannot communicate with its partner. Main and backup servers automatically cycle (without administrative intervention) between NORMAL and COMMUNICATIONS-INTERRUPTED states as the network connection fails and recovers, or as the partner cycles between operational and non-operational. Servers cannot give duplicate addresses while they cycle between these states.

PARTNER-DOWN
   Description: Either server can enter this state. It ignores the fact that the other server can still operate and service a different set of clients and simply acts as if it is the only server operating. For this reason, only one server should be in this state. The servers can then properly resynchronize and no duplicate addressing can occur. Until the partners resynchronize, an address is in a pending available state.

POTENTIAL-CONFLICT
   Description: The two servers are trying to re-integrate, but at least one of them is running in a state that did not guarantee automatic re-integration. The server may determine that two different clients were offered and accepted the same IP address, and tries to resolve that conflict, but the two clients may not be operating.

RECOVER
   Description: The server has no information in its stable storage, or it is reintegrating with its partner in PARTNER-DOWN state after it comes back up. When a server is in this state, it tries to refresh its stable storage from its partner. A main server in RECOVER state does not immediately start serving leases again. (In fact, do not reload a server in this state; it will only slow it down.)

RECOVER-DONE
   Description: Allows an interlocked transition for one server from RECOVER state and another from PARTNER-DOWN state, or COMMUNICATIONS-INTERRUPTED state into NORMAL state.

PAUSED
   Description: A server can inform its partner that it will be out of service for a relatively short time.
   Transition: Lets the partner transition to COMMUNICATIONS-INTERRUPTED state immediately and begin servicing clients without interruption.

SHUTDOWN
   Description: A server can inform its partner that it will be out of service for a long time.
   Transition: Lets the partner transition immediately to PARTNER-DOWN state and take over completely.
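
Three of the transitions described in Table 11-3 are simple enough to express as a lookup. The Python sketch below is a deliberately reduced model, not the protocol's full state machine: it covers only the partner announcements (PAUSED and SHUTDOWN) and the automatic cycling between NORMAL and COMMUNICATIONS-INTERRUPTED. The function names are hypothetical.

```python
from enum import Enum


class State(Enum):
    STARTUP = "STARTUP"
    NORMAL = "NORMAL"
    COMMUNICATIONS_INTERRUPTED = "COMMUNICATIONS-INTERRUPTED"
    PARTNER_DOWN = "PARTNER-DOWN"
    POTENTIAL_CONFLICT = "POTENTIAL-CONFLICT"
    RECOVER = "RECOVER"
    RECOVER_DONE = "RECOVER-DONE"
    PAUSED = "PAUSED"
    SHUTDOWN = "SHUTDOWN"


# State a server enters when its partner announces PAUSED or SHUTDOWN.
ON_PARTNER_ANNOUNCEMENT = {
    State.PAUSED: State.COMMUNICATIONS_INTERRUPTED,
    State.SHUTDOWN: State.PARTNER_DOWN,
}


def react_to_partner(current, announced):
    """Return the next local state; unchanged if this sketch models no transition."""
    return ON_PARTNER_ANNOUNCEMENT.get(announced, current)


def on_link_change(current, link_up):
    """Model the automatic NORMAL <-> COMMUNICATIONS-INTERRUPTED cycling."""
    if current is State.NORMAL and not link_up:
        return State.COMMUNICATIONS_INTERRUPTED
    if current is State.COMMUNICATIONS_INTERRUPTED and link_up:
        return State.NORMAL
    return current


print(react_to_partner(State.NORMAL, State.SHUTDOWN).value)
```

Nothing in this model moves a server from COMMUNICATIONS-INTERRUPTED to PARTNER-DOWN on its own. That step requires either a partner announcement or an explicit decision.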



Handling Failover Regimes

The failover protocol operates in three regimes, which correspond loosely to failover states: Normal, Communications Interrupted, and Partner Down.

Each server operates differently in each of these regimes (see Table 11-4).


Table 11-4: Server Operations in Regimes

Normal
   Main server: Responsive to all DHCP client requests and allocates IP addresses to new clients from its pool of available addresses. It has allocated some addresses to the backup server for the backup server to use if communication is interrupted.
   Backup server: Unresponsive to DHCP client requests except renewals or rebinding requests. It has requested and received a set of addresses to allocate to new DHCP clients if communication with the main server is interrupted.

Communications Interrupted
   Main server: Responsive to all DHCP client requests. It cannot tell if the backup server went down or if it is just unable to communicate. It operates normally, although it cannot reallocate an address from one DHCP client to another while in this regime.
   Backup server: Cannot tell if the main server is down or simply not communicating. In either case, the server is responsive to all DHCP client requests and can allocate addresses from the pool of available addresses it received from the main server.
   Both servers: Servers usually transition between Normal and Communications Interrupted regimes as one or the other server comes up or goes down.

Partner Down
   Both servers: The running server is guaranteed that the other server is down. The running server has control of all of the IP addresses, can offer any configured lease or extension, and at any time can reallocate an address from one client to another.

   A server only transitions to Partner Down regime if it is informed that the other partner is indeed down. The notification can be either through the protocol (used when the partner knows that it is going down) or because the server was unable to communicate with its partner, it automatically entered the Communications Interrupted regime, and the administrator used the server DHCP setPartnerDown command.

   The server DHCP setPartnerDown command tells the server that its partner is down. You could configure failover to effect an automatic transition from Communications Interrupted regime to Partner Down regime after the safe period passes, but doing so would run the risk of duplicate IP address allocation if the partner is not actually down.
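
The behavior in Table 11-4 can be condensed into a small decision function. The Python sketch below is an illustrative restatement of the table only. The regime, role, and request labels are made up for the example and are not Network Registrar identifiers.

```python
# Illustrative labels for the three regimes in Table 11-4.
REGIMES = ("normal", "communications-interrupted", "partner-down")


def responds_to(role, regime, request):
    """Return True if a server in this role answers the request in this regime.

    role: "main" or "backup"; request: "new", "renew", or "rebind".
    """
    if regime not in REGIMES:
        raise ValueError(f"unknown regime: {regime}")
    if role == "backup" and regime == "normal":
        # In Normal regime the backup answers only renewals and rebinding requests.
        return request in ("renew", "rebind")
    # The main server in every regime, and the backup outside Normal, answer all requests.
    return True


# Whether a server may reallocate an address from one client to another.
# Table 11-4 does not state this for Normal regime, so it is omitted here.
CAN_REALLOCATE = {
    "communications-interrupted": False,
    "partner-down": True,
}

print(responds_to("backup", "normal", "new"))
```

The only case in which a request goes unanswered by one of the servers is a new client reaching the backup in Normal regime, where the main server is expected to respond instead.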



Ideally, you would let the servers move from Normal regime to Communications Interrupted regime and back again, because these are safe. You would then never need to intervene to move a server into Partner Down regime. In some cases, however, this is not practical because a server running in Communications Interrupted regime is not using the available IP addresses efficiently. This can restrict the amount of time a server can effectively service DHCP clients.

There are restrictions on either server running in Communications Interrupted regime that do not apply to a server running in Partner Down regime:

In addition, if the backup server is running in Communications Interrupted regime, the following restriction applies: the backup server may run out of IP addresses to allocate to newly arriving DHCP clients, because it normally has only a small pool of available IP addresses, whereas the main server has the majority of available addresses.

The length of time a server can successfully run in Communications Interrupted regime is limited only by the number of IP addresses that have been allocated to it, and the corresponding arrival rate of the DHCP client DISCOVER packets for new clients. (As far as failover is concerned, a server that is responsive to DISCOVER packets is also responsive to INIT-REBOOT packets.) When there is a high arrival rate of new DHCP clients or a high turnover rate of the client IP addresses, you may need to move the server into Partner Down regime more quickly.

Enabling the Safe Period

The failover safe period is optional and is disabled by default. It is the period after which either the main or backup server automatically transitions from COMMUNICATIONS-INTERRUPTED state to PARTNER-DOWN state. You should only enable a safe period if, in the event of a server failure, it is more important to get an IP address than to risk receiving a duplicate address.

When the servers are in COMMUNICATIONS-INTERRUPTED state, neither server can function long term. This state exists to allow the servers to easily survive transient network communications failures of a few minutes to a few days. Note that the actual time period a server can function effectively in COMMUNICATIONS-INTERRUPTED state depends on the DHCP activity of the network in terms of arrival and departure of DHCP clients on the network.

If both servers are still operating, but cannot communicate, you have no choice except to leave them in COMMUNICATIONS-INTERRUPTED state. In most situations, however, when one server is down for an extended period and the operational server can no longer function effectively in COMMUNICATIONS-INTERRUPTED state, it must be moved into PARTNER-DOWN state.

There are two ways that a server can move into this state:

  • Through an administrative command—There is no difficulty moving into PARTNER-DOWN state, since it is an accurate reflection of reality and the protocol was designed to operate correctly.

  • When the safe period expires—When the servers are running unattended for extended periods and require an automatic way to enter PARTNER-DOWN state.

Configuring the safe period entails some risk, because it allows one server to enter PARTNER-DOWN state when the other server may not be down. If this should occur, duplicate IP addresses could be allocated.

The purpose of the safe period is to allow network operations staff some time to react to a server moving into COMMUNICATIONS-INTERRUPTED state. During the safe period, the only requirement is that the network operations staff determine if both servers are still running—and if they are, to either fix the network communications failure, or to take one of the servers down before the expiration of the safe period.

The length of the safe period is installation specific, and depends in large part on the number of unallocated IP addresses within the subnet address pool and the expected frequency of arrival of previously unknown DHCP clients requiring IP addresses. Many environments should be able to support safe periods of several days.

During this safe period, either server allows renewals from any existing client. The only limitations are the need for IP addresses that the DHCP server can hand out to new DHCP clients and the need to reallocate IP addresses to different DHCP clients.

The number of extra IP addresses required is equal to the expected total number of new DHCP clients encountered during the safe period. This is dependent on the arrival rate of new DHCP clients, not on the total number of outstanding leases on IP addresses.
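
As a worked example of this estimate, the Python sketch below multiplies an assumed arrival rate by an assumed safe period. The function name and the numbers are hypothetical. Substitute rates measured on your own network.

```python
import math


def extra_addresses_needed(new_clients_per_hour, safe_period_hours):
    """Expected number of new DHCP clients arriving during the safe period."""
    # Depends only on the arrival rate of new clients, not on outstanding leases.
    return math.ceil(new_clients_per_hour * safe_period_hours)


# Assumed figures: 5 new clients per hour over a 24-hour safe period.
print(extra_addresses_needed(5, 24))
```

With these assumed figures, the result is 120 extra addresses, regardless of how many leases are already outstanding.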

Even if you can afford only a short safe period, because of a shortage of IP addresses or a very high arrival rate of new DHCP clients, you still gain substantial benefit by allowing the DHCP subsystem to ride through minor problems that can be fixed within an hour. In such cases, there is no possibility of duplicate IP address allocation, and re-integration after the failure is resolved is automatic and requires no operator intervention.

Setting the PARTNER-DOWN State

If your corporate policy is to have minimal manual intervention, issue the dhcp enable failover-use-safe-period command. In the absence of explicit actions after the failover safe period expires, the responding server goes into PARTNER-DOWN state on its own. You cannot get duplicate address allocation if the partner is down (unless both servers go into PARTNER-DOWN state).

If your corporate policy is to avoid conflict under any circumstances, then you should never let the backup server go into PARTNER-DOWN state unless by explicit command. Allocate sufficient IP addresses to the backup server so that it can handle the arrival of new clients during periods when there is no administrative coverage. It is fairly simple to determine if the main server is down and to inform the backup server explicitly.

To manually set the PARTNER-DOWN state:


Step 1   Make sure the main server is truly down.

Step 2   Use the dhcp set failover-safe-period command to set the failover safe period to 24 hours:

nrcmd> dhcp set failover-safe-period=24 
 

Step 3   Use the server dhcp setPartnerDown command to inform the backup that the main server is down, thereby giving it the time the main server ceased to operate.

After you issue this command, there is no possibility of duplicate IP address allocation.


Allocating Addresses Among Servers

To keep both your main and backup servers operating in spite of a network partition (in which both servers can communicate with clients but not with each other), you must allocate more IP addresses than you need to run a single server. How do you determine how many additional addresses you need?

You must configure the main server to allocate a percentage of the currently available addresses in each scope's address pool to the backup server. These addresses are then unavailable to the main server. The backup server uses them when it is running but cannot communicate with the main server and does not know whether the main server is down.

There is no single percentage answer for all environments. It depends on the arrival rate of new DHCP clients and the reaction time of your network administration staff. The backup server needs enough addresses from each scope to satisfy the requests of all new DHCP clients that arrive during the period in which the backup does not know if the main server is down.

Even during the Partner Down regime, the backup server waits for the MCLT (see the "Changing the Maximum Client Lead Time" section) and lease time to expire before reallocating any leases. When these times expire, the backup server does the following:

  • Offers its leases from its private pool of addresses

  • Offers leases from the main server's pool of addresses

  • Offers expired leases to new clients

During the day, if the administrative staff can respond within two hours to the COMMUNICATIONS-INTERRUPTED state and determine whether the main server is working, then the backup server needs enough addresses to support a reasonable upper bound on the number of new DHCP clients that might arrive during those two hours.

During off-hours, if the administrative staff can respond within 12 hours to the same situation, and considering that the arrival rate of previously unseen DHCP clients is also lower, then the backup server needs enough addresses to support a reasonable upper bound on the number of DHCP clients that might arrive during those 12 hours.

Consequently, the number of addresses over which the backup requires sole control would be the greater of the two numbers, and would ultimately be expressed as a percentage of the currently available (unreserved) addresses in each scope.
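
As a rough worked example of the sizing reasoning above, the following Python sketch takes the greater of the daytime and off-hours estimates and expresses it as a percentage of the available addresses. The function name, its parameters, and the sample arrival rates are hypothetical illustrations, not Network Registrar settings; substitute figures measured at your own site.

```python
import math


def backup_percentage(available, day_rate, day_hours, night_rate, night_hours):
    """Estimate the addresses (and percentage) the backup server should control.

    available   -- currently available (unreserved) addresses in the scope
    day_rate    -- assumed upper bound on new clients per hour during the day
    day_hours   -- assumed daytime staff response time, in hours
    night_rate  -- assumed upper bound on new clients per hour during off-hours
    night_hours -- assumed off-hours staff response time, in hours
    """
    if available <= 0:
        raise ValueError("scope has no available addresses")
    day_need = day_rate * day_hours
    night_need = night_rate * night_hours
    # The backup needs the greater of the two estimates, rounded up.
    needed = math.ceil(max(day_need, night_need))
    # Express the result as a whole percentage of the available addresses.
    percent = math.ceil(100 * needed / available)
    return needed, percent


# Hypothetical scope: 200 available addresses, 10 new clients/hour by day
# (2-hour response), 1 new client/hour during off-hours (12-hour response).
needed, percent = backup_percentage(200, 10, 2, 1, 12)
print(needed, percent)  # prints: 20 10
```

In this example the daytime estimate (20 addresses) exceeds the off-hours estimate (12 addresses), so the backup would need sole control of about 10 percent of the scope's available addresses.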

If you use client-classes, remember that some clients can only use some sets of scopes and not others.


Note   During failover, clients can sometimes obtain leases whose expiration times are shorter than the amount configured. This is a normal part of keeping the server partners synchronized. Typically this happens only for the first lease period, or during the COMMUNICATIONS-INTERRUPTED state.

Importing Backup Server Leases to the Main Server

One way to bring the backup server's lease information to the main server is to import it.


Step 1   Stop the backup server.

nrcmd> server DHCP stop 
 

Step 2   Enable import mode on the main server.

nrcmd> dhcp enable import-mode 
 

Step 3   Reload the main server.

nrcmd> server DHCP reload 
 

Step 4   Import the leases into the main server.

nrcmd> import leases leasefile.txt 
 

Step 5   Disable import mode on the main server.

nrcmd> dhcp disable import-mode 
 

Step 6   Reload the main server.

nrcmd> server DHCP reload 
 

Step 7   Start the backup server.

nrcmd> server DHCP start 
 

Exporting Leases

There are two ways that you can export lease information:

De-activating Leases in Failover

When DHCP safe failover is in use and you de-activate a lease with either the GUI or the CLI, it is de-activated only in the cluster to which you are connected. If you wish the lease to be de-activated on both the main and backup servers, you must connect to and de-activate the lease in each server's cluster.

Changing Failover Server Roles


Caution   Be careful when you change the role of a failover server. Remember that all state information about IP addresses in a scope is lost from a server if it is ever reloaded without that scope in its configuration.

Making a Nonfailover Server a Failover Main

This is a way to update an existing installation and increase the availability of the DHCP service it offers.


Note   You can use this procedure only if the original server never participated in failover.


Step 1   Install Network Registrar on the original server and ensure that it operates correctly after the installation.

Step 2   Install Network Registrar on the machine that is to be the backup server. Note the machine's DNS name.

Step 3   Enable failover on the original server. Use the DNS name of the recently installed backup server. (See the "Configuring the Basic Scenario" section.)

nrcmd> dhcp enable failover 
nrcmd> dhcp set failover-backup-server=backupserver.example.com 
 

Step 4   Reload the main server. It should go into the PARTNER-DOWN state and stay there, because it cannot locate the backup server (since the backup server is not yet configured). There should be no change in the operation of the main server at this point.

nrcmd> server dhcp reload 
 

Step 5   Duplicate the main server's configuration on the backup server, including scopes (also secondary scopes), policies, and client-class. If you are using client-class, make sure that the clients are entered into each cluster or that each server can access an LDAP database with the client information.

Step 6   Enable failover on the backup server.

nrcmd> dhcp enable failover 
nrcmd> dhcp set failover-main-server=mainserver.example.com 
 

Step 7   Reconfigure all of the operational BOOTP relays to forward broadcast DHCP packets to both the main and the backup server.

Step 8   Reload the backup server.

nrcmd> server dhcp reload 
 

After you complete these steps:

1. The backup server detects the main server and moves into RECOVER state.

2. The backup server refreshes its stable storage with the lease information that resides on the main server. When the refresh is complete and the maximum client lead time (MCLT) has elapsed, the backup server moves into RECOVER-DONE state.

3. The main server moves into NORMAL state.

4. The backup server moves into NORMAL state.

5. The backup server uses a pool request to ask the main server for addresses to allocate if communications become interrupted.

6. After these addresses are allocated, the main server sends this information to the backup server.
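
When you verify this procedure, it can help to compare the states you observe for each server against the sequence listed above. The following Python sketch is one hypothetical way to do that; it assumes you have already transcribed the observed state names by hand (for example, from the log files), and it does not parse any Network Registrar log format.

```python
# Expected state sequences, taken from the procedure and list above.
EXPECTED = {
    "main": ["PARTNER-DOWN", "NORMAL"],
    "backup": ["RECOVER", "RECOVER-DONE", "NORMAL"],
}


def collapse(states):
    """Drop consecutive duplicates, such as repeated notes of the same state."""
    result = []
    for state in states:
        if not result or result[-1] != state:
            result.append(state)
    return result


def follows_expected(role, observed):
    """Return True if the observed states match the expected sequence so far."""
    seen = collapse(observed)
    expected = EXPECTED[role]
    return seen == expected[:len(seen)]


# The backup is still recovering, which matches the expected order.
print(follows_expected("backup", ["RECOVER", "RECOVER", "RECOVER-DONE"]))  # prints: True
# The backup skipped RECOVER-DONE, which does not match.
print(follows_expected("backup", ["RECOVER", "NORMAL"]))  # prints: False
```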

Replacing a Partner With Defective Storage

If a failover server loses its stable storage (hard disk), it is possible to replace the server and have it recover its state information from its partner.


Step 1   Determine which server lost its stable storage.

Step 2   Use the server dhcp setPartnerDown command to tell the other server that its partner is down. If you do not specify a time, Network Registrar uses the current time.

nrcmd> server dhcp setPartnerDown backupserver.example.com FEB 02 13:10 2001 
 

Step 3   When the replacement server is operational, re-install Network Registrar.

Step 4   Duplicate the configuration from the original server to the replacement server.

Step 5   On the replacement server, set the failover recovery time to approximately when the server failed.

nrcmd> dhcp set failover-recover "FEB 02 13:20 2001" 
 

Step 6   Reload the replacement server.

nrcmd> server dhcp reload 
 

After you complete these steps:

1. The replacement server moves into RECOVER state.

2. The other server sends the replacement server all its information.

3. The replacement server moves into RECOVER-DONE state when the MCLT has been reached.

4. The other server moves into NORMAL state.

5. The replacement server moves into NORMAL state. The replacement server requests addresses, but few new ones are allocated, because the addresses previously allocated were sent in step 2.


Note   After the replacement server is in NORMAL state, you must specify the dhcp set failover-recover 0 command and reload the replacement server.

Removing a Backup Server and Halting Failover Operation

There are times when you might need to remove the backup server and halt all failover operations.


Step 1   On the backup server, remove all the scopes that were designated as backup to the main server.

nrcmd> scope scope1 delete 
nrcmd> scope scope2 delete 
...
 

Step 2   On the main server, remove the failover capability from those scopes that were main for the backup server (or disable failover server-wide if that is how it was configured).

nrcmd> scope scope1 set failover=scope-disabled 
nrcmd> dhcp disable failover 
 

Step 3   Reload both servers.

nrcmd> server dhcp reload 
 

Adding a Main Server to an Existing Backup Server

You might want an existing backup server to also act as the backup for a new main server.


Step 1   Duplicate the main server's scopes, policies, and other configurations on the backup server.

Step 2   Configure the main to enable failover and point to the backup.

nrcmd> dhcp enable failover 
nrcmd> dhcp set failover-backup-server=backupserver.example.com 
 

Step 3   Configure the backup to enable failover for the new scopes that point to the new main.

nrcmd> dhcp enable failover 
nrcmd> dhcp set failover-main-server=mainserver.example.com 
 

Step 4   Reload both servers.

nrcmd> server dhcp reload 
 

Network Registrar performs the same steps as those described in the "Making a Nonfailover Server a Failover Main" section.

Handling Special Cases

There are special failover configuration cases explained in the following sections:

Configuring Failover on Multiple Interface Hosts

If you plan to use failover on a server host with multiple interfaces, you must explicitly configure the local server's name or IP address. This requires an additional command. For example, if you have a host with two interfaces, serverA and serverB, and you want to make serverA the main server in a failover situation, you must define serverA as the failover-main-server before you set the backup server name (external serverC). If you do not do this, failover might not initialize correctly and might try to use the wrong interface.

nrcmd> dhcp set failover-main-server=serverA 
nrcmd> dhcp set failover-backup-server=serverC 

Note   With multiple interfaces on one host, you must specify a host name that points to only one address or A record. You cannot set up your servers for round-robin support.

Changing the Maximum Client Lead Time

The maximum client lead time (MCLT) controls how much ahead of the backup server's lease expiration the client's lease expiration can be. Generally, if you enabled failover on your DHCP server, you should not change the failover-maximum-client-lead-time option.
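
The effect of the MCLT on lease times can be illustrated with a simplified model. The following Python sketch assumes that a server never lets a client's lease expire more than the MCLT later than the expiration time its partner knows about, and that when the partner has no current record the limit is measured from the current time. This is an illustration of the definition above, not Network Registrar's actual algorithm; the function name and the sample values (a 3600-second MCLT and a one-day lease) are hypothetical.

```python
def max_lease_seconds(now, partner_known_expiry, mclt, configured_lease):
    """Return the longest lease (in seconds) allowed under the simplified model.

    now                  -- current time, in seconds
    partner_known_expiry -- expiration time the partner has on record
                            (use a value <= now if it has no current record)
    mclt                 -- maximum client lead time, in seconds
    configured_lease     -- lease time configured in the policy, in seconds
    """
    # Measure from whichever is later: now, or the expiration the partner knows.
    base = max(now, partner_known_expiry)
    latest_allowed_expiry = base + mclt
    return min(configured_lease, latest_allowed_expiry - now)


# First lease: the partner knows nothing yet, so the lease is capped at the MCLT.
print(max_lease_seconds(0, 0, 3600, 86400))         # prints: 3600
# Renewal: the partner already knows a later expiration, so the full lease fits.
print(max_lease_seconds(1800, 88200, 3600, 86400))  # prints: 86400
```

Under this model, a client's first lease can be shorter than the configured lease time, while later renewals receive the full configured time.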

If you must change the MCLT, do the following:


Step 1   Reload the backup server to ensure that all information that the backup server has to tell the main server is up to date on the main server. Ideally, wait until both partners have stabilized (NORMAL) and have exchanged any updates. At a minimum, wait until the backup server has completed its cache update process, as noted in the log file.

nrcmd> server DHCP reload 
 

Step 2   Change the MCLT on the main server. (The backup server ignores any MCLT configured locally, because it derives its MCLT value directly from the main server.)

nrcmd> dhcp set failover-maximum-client-lead-time=4800 
 

Step 3   Stop the backup server.

nrcmd> server DHCP stop 
 

Step 4   Reload the main server.

nrcmd> server DHCP reload 
 

Step 5   Start the backup server.

nrcmd> server DHCP start 
 

Changing System Defaults

You can change some of the system defaults, such as the number of leases that the main server should send to the backup server, or the MCLT (see the "Changing the Maximum Client Lead Time" section). Whenever you change the system defaults, you need to change them on both servers. On each server:

    nrcmd> dhcp set failover-poll-interval=14 
     
    

Supporting BOOTP Clients

You can configure scopes to support two types of BOOTP clients—static and dynamic.

Static BOOTP

Static BOOTP clients are supported using DHCP reservations. When you enable failover, remember to configure both the main and the backup server with identical reservations.

Dynamic BOOTP

Dynamic BOOTP clients are supported through the scope enable dynamic-bootp command. When you use failover, however, there are additional restrictions placed on the address usage in such scopes, because BOOTP clients are allocated IP addresses permanently and receive leases that never expire.

When a server whose scope does not have the dynamic-bootp option enabled goes to PARTNER-DOWN, it can allocate any available IP address from that scope, whether or not it was initially available to any partner. However, when the dynamic-bootp option is set, each partner can only allocate its own addresses. Consequently scopes that enable the dynamic-bootp option require more addresses to support failover.
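
The following Python sketch illustrates why dynamic BOOTP scopes need more addresses. It compares how many of a scope's initially available addresses a server can allocate in PARTNER-DOWN, with and without the dynamic-bootp option. The function name, parameters, and sample figures are hypothetical, and the model deliberately ignores the MCLT and lease-time waiting periods.

```python
def allocatable_in_partner_down(available, backup_percent, role, dynamic_bootp):
    """Return how many initially available addresses a server can allocate.

    available      -- addresses initially available in the scope
    backup_percent -- whole percentage of available addresses given to the backup
    role           -- "main" or "backup"
    dynamic_bootp  -- True if the scope has dynamic-bootp enabled
    """
    if role not in ("main", "backup"):
        raise ValueError("role must be 'main' or 'backup'")
    backup_share = available * backup_percent // 100
    main_share = available - backup_share
    if not dynamic_bootp:
        # Without dynamic BOOTP, any available address can be allocated.
        return available
    # With dynamic BOOTP, each partner can allocate only its own addresses.
    return backup_share if role == "backup" else main_share


# Hypothetical scope: 200 available addresses, 10 percent given to the backup.
print(allocatable_in_partner_down(200, 10, "backup", False))  # prints: 200
print(allocatable_in_partner_down(200, 10, "backup", True))   # prints: 20
print(allocatable_in_partner_down(200, 10, "main", True))     # prints: 180
```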

When using dynamic BOOTP:

Configuring BOOTP Relays

Network Registrar's failover protocol works with all existing BOOTP relay (also called IP helper) implementations. BOOTP relay is a router capability that supports DHCP clients that are not locally connected to a DHCP server. For details about configuring routers, see the "Configuring a BOOTP Relay Router" section of Chapter 7.

If you are using BOOTP Relay, ensure that any BOOTP relay implementations point at both the main and backup servers. If they do not and the main fails, DHCP clients will not be serviced, because the backup server cannot see the required packets.

If you cannot configure the BOOTP Relay implementation to forward DHCP broadcast packets to two different DHCP servers, configure the router to forward the packets to a subnet-local broadcast address for a LAN segment, which could contain both the main and backup DHCP servers. Then, ensure that both the main and backup DHCP servers are resident on the same LAN segment.

Troubleshooting Failover

This section describes how to avoid configuration mistakes, monitor failover operations, and detect and handle network failures.

Monitoring Failover Operations

You can examine the DHCP server log files after you reload to verify your failover configuration. To change the amount of information reported in the log files pertaining to failover operation, do the following:

Using the GUI:

Step 1   Double-click the DHCP server to open its properties.

Step 2   In the DHCP Server Properties dialog box, click the Advanced tab.

Step 3   Click the Debug Settings button.

Step 4   Select the Enable Debug check box.

Step 5   In the Category field, enter Y = 1 (or 2, 3, or 4, depending on how detailed you want the messages to be).

Step 6   Leave Output set to MLOG.

Step 7   Click Apply.

Step 8   Click OK.

Step 9   Reload the server.


Using the CLI:

To make sure previous messages do not get overwritten, append the failover-detail parameter to the end of the dhcp set log-settings command. Reload the DHCP server for the change to the log settings to take effect.

    nrcmd> dhcp set log-settings=default,incoming-packets,missing-options,failover-detail 
     
    
  • Use server dhcp setDebug Y=1 to cause the failover-detail messages to take effect immediately.

  • Use server dhcp setDebug Y=2 to log more detailed failover messages.

  • Use server dhcp setDebug Y=3 to cause all messages logged for Y=1 or Y=2 to appear, as well as a formatted dump of each nonpoll failover packet.

  • Use server dhcp setDebug Y=4 to log all the poll messages and generate lots of log messages (causing the log files to roll over and log information to be quickly lost).

The settings Y=1 and Y=2 and the log-settings failover-detail property do not significantly load the server or add much information to the log files. You can leave them on for all but the most heavily loaded servers.

Both the Y=3 and Y=4 debug settings slow the server considerably and should be used only when absolutely necessary. In particular, the Y=4 setting produces a large amount of output and should be used only when you need to verify that the server is communicating at the most basic level.

Detecting and Handling Network Failures

How do you know that the servers are not communicating or that one server is down? The cause could be a network failure or a network partition, in which both servers can still communicate with clients but not with each other. It is also possible that one server really is down. Table 11-5 describes some symptoms, causes, and solutions for failover problems.


Table 11-5: Detecting and Handling Failures
Symptom: New clients cannot get addresses
Cause: A backup server is in COMMUNICATIONS-INTERRUPTED with too few addresses
Solution: Increase the backup percentage on the main server

Symptom: Error messages about mismatched scopes
Cause: Mismatched scope configurations between partners
Solution: Reconfigure your servers

Symptom: Log messages about failure to communicate with partner
Cause: Server cannot communicate with its partner

Symptom: Main server fails. Some clients cannot renew or rebind leases. The leases expire even when the backup server is up and possibly processing some client requests.
Cause: Some BOOTP relay (ip-helper) was not configured to point at both servers (see the "Configuring BOOTP Relays" section)
Solution:

  • Reconfigure BOOTP relays to point at both main and backup server

  • Run a fire drill test: take the main server down for a day or so and see if your user community can get and renew leases

Symptom: SNMP trap: other server not responding
Cause: Server cannot communicate with its partner

Symptom: SNMP trap: dhcp failover config mismatch
Cause: Mismatched scope configurations between partners
Solution: Reconfigure your servers

Symptom: Users complain that they cannot use services or system as expected
Cause: Mismatched policies and client-classes between partners
Solution: Reconfigure partners to have identical policies; possibly use LDAP for client registration if currently registering clients directly in partners