Cisco CNS Network Registrar User's Guide, 6.0
Configuring DHCP Failover

Table Of Contents

Configuring DHCP Failover

Failover Configuration Procedure

Configuring Three Types of Failover

Configuring the Basic Scenario

Configuring the Back Office Scenario

Configuring the Symmetrical Scenario

Scope Failover Attribute States

Avoiding Configuration Mistakes

Maintaining Failover Servers

Monitoring Failover Server Status

Changing General DHCP Configurations

Using the Web UI to Configure Failover Partners

Handling State Transitions

Moving a Server into PARTNER-DOWN State

State Transitions During Integration

Allocating Addresses Among Servers

Importing Backup Server Leases to the Main Server

Exporting Lease Information

Specifying Clusters for the export addresses Command

Database Output Format of the export addresses Command

Text File Output Format of the export addresses Command

Addresses Reported by the export addresses Command

Error Reports for the export addresses Command

Changing Failover Server Roles

Making a Nonfailover Server a Failover Main

Replacing a Server Having Defective Storage

Removing a Backup Server and Halting Failover Operation

Adding a Main Server to an Existing Backup Server

Configuring Failover on Multiple Interface Hosts

Maximum Client Lead Time and Lease Period Factor

Setting DHCP Request and Response Packet Buffers

Changing System Defaults

Supporting BOOTP Clients

Static BOOTP

Dynamic BOOTP

Configuring BOOTP Relays

DHCPLEASEQUERY and Failover

Troubleshooting Failover

Monitoring Failover Operations

Detecting and Handling Network Failures


Configuring DHCP Failover


You can use DHCP failover to configure two DHCP servers to operate as a redundant pair. If one server is down, the other server seamlessly takes over so that new DHCP clients can get addresses and existing ones can renew them. Clients requesting new leases need not know or care about which server is responding to their request for a lease. These clients can obtain leases even if the main server is down.

Table 11-1 lists the topics in this chapter and their associated sections. The Network Registrar Web UI Guide explains how to accomplish these tasks using the Web-based interface.

Table 11-1 DHCP Failover Configuration Topics 

If you want to... | See...
Know more about how DHCP failover works | "DHCP Failover" section
Have an overview of the failover configuration steps | "Failover Configuration Procedure" section
Configure failover based on the three main server scenarios or deployments | "Configuring Three Types of Failover" section
Determine and handle changes in failover states | "Handling State Transitions" section
Allocate addresses and leases among the failover partners | "Allocating Addresses Among Servers" section
Import backup server leases to the main server | "Importing Backup Server Leases to the Main Server" section
Export lease information | "Exporting Lease Information" section
De-activate leases | "Changing Failover Server Roles" section
Change failover server roles | "Changing Failover Server Roles" section
Configure failover on multiple interfaces | "Configuring Failover on Multiple Interface Hosts" section
Configure the failover maximum client lead time (MCLT) and lease period factor | "Maximum Client Lead Time and Lease Period Factor" section
Set the DHCP request and response packet buffers for failover | "Setting DHCP Request and Response Packet Buffers" section
Change system defaults for failover | "Changing System Defaults" section
Support BOOTP clients in failover | "Supporting BOOTP Clients" section
Configure BOOTP relay agents for failover | "Configuring BOOTP Relays" section
Configure DHCPLEASEQUERY for failover | "DHCPLEASEQUERY and Failover" section
Troubleshoot failover | "Troubleshooting Failover" section


Failover Configuration Procedure

Use this procedure to configure DHCP failover.


Step 1 Configure your DHCP server and duplicate the configuration on its partner, based on the configuration scenario you selected:

a. Duplicate the configurations for scopes, policies, DHCP options, and addresses on the partner server. Do this manually, or use the Web UI. See the Network Registrar Web UI Guide.

b. If you enable dynamic DNS update on the main server, ensure that you also enable it on the partner. For help on doing this, see "Configuring Dynamic DNS Update."

c. If you use reservations, ensure that they are identical on each server. For help on doing this, see the "Reserving a Lease" section.

d. If you use client-classes, configure both servers with identical client-classes. For help on doing this, see the "Defining Client-Classes and Their Properties" section.

e. Give the scopes the same set of scope selection tags. For help on doing this, see the "Setting Client-Class Scope Selection Criteria" section.

f. Enter the clients in both clusters, or enter them in LDAP and direct both servers to the same LDAP server. For help on doing this, see "Configuring LDAP."

Step 2 If you use BOOTP relay (IP helpers), configure all BOOTP relay servers to point to both servers. See the "Changing System Defaults" section.

Step 3 Reload both servers.


Configuring Three Types of Failover

There are three basic failover scenarios, as described in the following subsections:

Basic failover—A main and a backup server

Back office failover—Two mains and one backup server

Symmetrical failover—Two servers that act as main and backups for each other.

If you plan to configure failover on a server with multiple interfaces, see the "Configuring Failover on Multiple Interface Hosts" section.

Configuring the Basic Scenario

The basic failover scenario involves a main server and a single backup server (Figure 11-1).

Figure 11-1 Simple Failover Configuration

Using the Web UI

You can automate the basic scenario configuration using the Network Registrar Web UI (see the Network Registrar Web UI Guide).

Using a CLI Command File

To set up a basic failover configuration, create and run the same command file on both servers:

dhcp enable failover 
dhcp set failover-main-server=dhcpA.example.com. failover-backup-server=dhcpB.example.com. 
dhcp reload 


Note The server that you reload last might return an error that failover is not available. You can safely ignore this message on startup.


Configuring the Back Office Scenario

The back office failover scenario involves two (or more) main servers and a single backup server (Figure 11-2). The main servers are Aserver and Bserver and the backup server is Cserver.

The main servers have three scopes each—scope1, scope2, and scope3 on one; and scope4, scope5, and scope6 on the other. This scenario is appropriate for scopes on the same LAN segment, which requires the same main and backup server combination. The two sets of scopes are on different LAN segments.

Figure 11-2 Back Office Failover Configuration

Using the Web UI

You can automate the back office scenario configuration using the Network Registrar Web UI (see the Network Registrar Web UI Guide).


Note Do not use the Exact synchronization operation with a back office failover scenario.


Using a CLI Command File

To set up a back office failover configuration, create and run the following command file on Cserver. Run configuration files on Aserver and Bserver with only their appropriate scope data.

scope scope1 set failover=scope-enabled failover-main-server=Aserver.example.com. 
scope scope1 set failover-backup-server=Cserver.example.com. 
scope scope2 set failover=scope-enabled failover-main-server=Aserver.example.com. 
scope scope2 set failover-backup-server=Cserver.example.com. 
scope scope3 set failover=scope-enabled failover-main-server=Aserver.example.com. 
scope scope3 set failover-backup-server=Cserver.example.com. 
scope scope4 set failover=scope-enabled failover-main-server=Bserver.example.com. 
scope scope4 set failover-backup-server=Cserver.example.com. 
scope scope5 set failover=scope-enabled failover-main-server=Bserver.example.com. 
scope scope5 set failover-backup-server=Cserver.example.com. 
scope scope6 set failover=scope-enabled failover-main-server=Bserver.example.com. 
scope scope6 set failover-backup-server=Cserver.example.com. 
dhcp reload 

Configuring the Symmetrical Scenario

The symmetrical failover scenario involves multiple (in this case, two) servers that share network responsibilities by acting as backups for each other based on certain scopes (Figure 11-3). Aserver is the main for scopes 1 through 3 and the backup for scopes 4 through 6. Bserver plays the reverse role.

Figure 11-3 Symmetrical Failover Configuration

Using the Web UI

The symmetrical scenario is configurable using the Network Registrar Web UI (see the Network Registrar Web UI Guide).

Using a CLI Command File

To set up a symmetrical failover configuration, create and run the same command file on both servers.

scope scope1 set failover=scope-enabled failover-main-server=Aserver.example.com. 
scope scope1 set failover-backup-server=Bserver.example.com. 
scope scope2 set failover=scope-enabled failover-main-server=Aserver.example.com. 
scope scope2 set failover-backup-server=Bserver.example.com. 
scope scope3 set failover=scope-enabled failover-main-server=Aserver.example.com. 
scope scope3 set failover-backup-server=Bserver.example.com. 
scope scope4 set failover=scope-enabled failover-main-server=Bserver.example.com. 
scope scope4 set failover-backup-server=Aserver.example.com. 
scope scope5 set failover=scope-enabled failover-main-server=Bserver.example.com. 
scope scope5 set failover-backup-server=Aserver.example.com. 
scope scope6 set failover=scope-enabled failover-main-server=Bserver.example.com. 
scope scope6 set failover-backup-server=Aserver.example.com. 
dhcp reload 
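Because the back office and symmetrical command files differ only in which server is main and which is backup for each scope, they can be generated from a simple mapping. The following is a minimal sketch, not part of Network Registrar: the function name failover_command_file and the mapping layout are hypothetical, and the output uses only the scope and dhcp commands shown in the command files above.

```python
def failover_command_file(assignments):
    """Build nrcmd command-file lines for scope-level failover.

    assignments maps a scope name to a (main_server, backup_server) pair.
    This helper is illustrative; it only formats the commands shown in
    the command files above.
    """
    lines = []
    for scope, (main, backup) in assignments.items():
        lines.append(
            f"scope {scope} set failover=scope-enabled failover-main-server={main}"
        )
        lines.append(f"scope {scope} set failover-backup-server={backup}")
    lines.append("dhcp reload")
    return lines


A = "Aserver.example.com."
B = "Bserver.example.com."

# Symmetrical scenario: Aserver is main for scopes 1-3, Bserver for scopes 4-6.
symmetrical = {f"scope{i}": (A, B) for i in (1, 2, 3)}
symmetrical.update({f"scope{i}": (B, A) for i in (4, 5, 6)})

for line in failover_command_file(symmetrical):
    print(line)
```

For the back office scenario, the same helper could be called with Cserver.example.com. as the backup for every scope.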

Scope Failover Attribute States

The scope failover attribute has three possible states:

scope-enabled—Indicates that this scope, and all scopes that are secondary to it or to which it is a secondary on this LAN segment, are enabled for failover. Scope parameters (not server parameters) should determine the main and backup servers.

If more than one scope on the same LAN segment is scope-enabled for failover, then the main and backup servers must be identical for each. An error occurs if one scope on a LAN segment is scope-enabled and another is scope-disabled, unless the other scope has failover enabled server-wide.

scope-disabled—Disables a scope and all other scopes associated with it on a LAN segment from participating in failover. It only has meaning if failover is defined server-wide.

use-server-settings—Indicates that this scope should use the settings for main and backup servers unless another scope associated with it on a LAN segment is either explicitly scope-enabled or scope-disabled. If one scope on a LAN segment is scope-enabled or scope-disabled, it overrides any scope for which use-server-settings is set on that LAN segment. Note that if you set the scope attribute failover-backup-percentage explicitly, Network Registrar uses it, even if the scope is set to use-server-settings.

Avoiding Configuration Mistakes

Follow these explicit steps to avoid DHCP failover mistakes:

1. Enable failover on both servers.

2. Configure the main and backup servers with exactly the same scopes.

3. Reconfigure all of your BOOTP relays to send DHCP packets to both the main and backup servers.

4. Provide the partner server with enough addresses to service clients if the main server fails.

Network Registrar detects and logs configuration mistakes during processing if you do not follow the first two steps, although it may detect a mistake some time after the actual misconfiguration occurred.

Network Registrar cannot ensure that the third and fourth steps are taken. You can only detect BOOTP configuration errors by performing live tests in which you periodically take the main server out of service to verify that the backup server is available to DHCP clients. As far as addresses are concerned, ensure that both partners are configured with a wide enough range of addresses so that the backup server can provide leases while the main server is down for a reasonable amount of time.

Maintaining Failover Servers

After you configure failover, verify that failover is running correctly:

In the Web UI, click DHCP on the Primary Navigation bar and Failover on the Secondary Navigation bar, then see if there is a main and backup server relationship set up on the List DHCP Failover Pairs page.

In the GUI, click the DHCP server icon in the Server Manager window, then choose Show related servers from the right-click popup menu or from the Servers menu.

Look at the log files and run a Related Servers report. See the "Displaying Related DHCP Servers" section.

The server log files show that a server can play only two possible roles: main or backup to another server. There is a failover object in the DHCP server for each of these roles, and the object name directly reflects its role.

When the DHCP server configures itself, it logs every network that is part of each failover object. It also reports the configuration parameters, such as failover-maximum-client-lead-time, failover-backup-percentage, and failover-use-safe-period.

Monitoring Failover Server Status

Run the dhcp getRelatedServers command to create a report about the connection status of the main and backup servers. The command displays the information shown in Table 11-2.

Table 11-2 getRelatedServers Report 

Column | Description
Type | Main, Backup, DNS, or LDAP.
Name | DNS host name.
Address | IP address in dotted octet format.
Requests | Number of outstanding requests, or the failover recovery or DNS update status. In the failover RECOVER state, the column shows the Percent of Failover Recovery yet-to-complete value, starting with 100 at the beginning of the recovery and decreasing to zero, when the partners are again synchronized. If the server is in the failover NORMAL state, you can use the dhcp set log-settings=failover-detail command to enable showing the Percent of Failover Bind-Update yet-to-complete value (the percent of configured leases not yet scanned) in this column.
Communications | OK or INTERRUPTED.
Localhost State | Failover state of this server, or two dashes (--) if not applicable.
Partner State | Failover state of the associated failover server, or two dashes (--) if not applicable.


Changing General DHCP Configurations

If you change any of these main DHCP server configurations, you must duplicate them to the backup:

Scopes

Policies

Clients

IP addresses

Reservations

Client-classes

Dynamic DNS updates

Dynamic BOOTP

Namespaces


Tip Consider maintaining your entire DHCP server configuration in CLI command files and always make any changes to those files.


Using the Web UI to Configure Failover Partners

The Web UI has a convenient way of synchronizing failover partners. For details, see the "DHCP Administration" chapter of the Network Registrar Web UI Guide. Synchronizing partners in this way recreates the relevant scope configurations set on the main server on the backup server. Make configuration changes on the main server only, and make them before you synchronize the partners. Use these basic steps:

1. Change the main server or scope configuration to enable failover and set the main and backup addresses.

2. Reload the main server.

3. Push the configuration to the backup server by synchronizing the pair.

4. Reload the backup server.

Handling State Transitions

During normal operation, the failover partners transition between states. They stay in their current state until all the actions for the state transition are completed and, if communication fails, until the conditions for the next state are fulfilled. For a review of the failover states, see the "Failover States and Transitions" section.

Moving a Server into PARTNER-DOWN State

One or both failover partners could potentially move into COMMUNICATIONS-INTERRUPTED state. Fortunately, they cannot issue duplicate addresses while in this state. However, having a server in this state over longer periods is not a good idea, because there are restrictions on what a server can do. The main server cannot re-allocate expired leases and the backup server can run out of addresses from its pool. COMMUNICATIONS-INTERRUPTED state was designed for servers to easily survive transient communication failures of a few minutes to a few days. A server might function effectively in this state for only a short time, depending on the client arrival and departure rate. After that, it would be better to move a server into PARTNER-DOWN state so it can completely take over the lease functions until the servers resynchronize.

There are two ways a server can move into PARTNER-DOWN state:

User action—An administrator sets a server into PARTNER-DOWN state based on an accurate assessment of reality. The failover protocol handles this correctly.

The failover safe period expires—When the servers run unattended for longer periods, they need an automatic way to enter PARTNER-DOWN state.

Network operators might not notice in time that a server is down or uncommunicative. The failover safe period addresses this by giving operators some time to react to a server moving into COMMUNICATIONS-INTERRUPTED state. During the safe period, the only requirement is that the operators determine whether both servers are still running and, if so, fix the network communications failure or take one of the servers down before the safe period expires.

During this safe period, either server allows renewals from any existing client, but there is a major risk of issuing duplicate addresses. This is because one server can suddenly enter PARTNER-DOWN state while the other is still operating. Because of this risk, the failover safe period is disabled by default. Enable the safe period only if, during a server failure, it is more important for clients to get an address than to avoid the risk of receiving a duplicate one.

The length of the safe period is installation-specific, and depends on the number of unallocated addresses in the pool and the expected arrival rate of previously unknown clients requiring addresses. The safe period is typically 24 hours, although many environments can support periods of several days.

The number of extra addresses required for the safe period should equal the total number of new clients that a server expects to encounter during that period. This depends on the arrival rate of new clients, not on the total number of outstanding leases. Even if you can afford only a short safe period, because of a shortage of addresses or a high arrival rate of new clients, you can benefit substantially by allowing DHCP to ride through minor problems that are fixable in an hour. There is minimal chance of duplicate address allocation, and re-integration after the failure is resolved is automatic and requires no operator intervention.
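The relationship between the safe period, the arrival rate of new clients, and the spare addresses can be sketched as simple arithmetic. This is an illustrative calculation only: the function names and the sample rate of five new clients per hour are assumptions, not Network Registrar values.

```python
import math


def safe_period_addresses(new_clients_per_hour, safe_period_hours):
    """Extra addresses needed to serve new clients for the whole safe period."""
    return math.ceil(new_clients_per_hour * safe_period_hours)


def max_safe_period_hours(spare_addresses, new_clients_per_hour):
    """Longest safe period (in hours) that the spare addresses can cover."""
    if new_clients_per_hour == 0:
        return math.inf  # no new clients arrive, so any period is covered
    return spare_addresses / new_clients_per_hour


# Hypothetical site: 5 previously unknown clients per hour, 24-hour safe period.
print(safe_period_addresses(5, 24))
print(max_safe_period_hours(60, 5))
```

Only the arrival rate of new clients enters the calculation. The number of outstanding leases does not, which matches the guidance above.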

Follow these guidelines to decide between manual intervention and the safe period for transitioning to PARTNER-DOWN state:

If your corporate policy is to have minimal manual intervention, set the safe period. Use the dhcp enable failover-use-safe-period command to enable the safe period and use the dhcp set failover-safe-period command to set the duration (86400 seconds, or 24 hours, by default).

nrcmd> dhcp enable failover-use-safe-period 
nrcmd> dhcp set failover-safe-period=24h 
nrcmd> dhcp reload 

If your corporate policy is to avoid conflict under any circumstances, then never let the backup server go into PARTNER-DOWN state unless by explicit command. Allocate sufficient addresses to the backup server so that it can handle new client arrivals during periods when there is no administrative coverage. Use the dhcp setPartnerDown command, specifying the name of the partner server. This moves all the scopes running failover with the partner into PARTNER-DOWN state immediately, unless you specify a date and time with the command. This date and time should be when the partner was last known to be operational.

There are two conventions for specifying the date:

-num unit (a time in the past), where num is a decimal number and unit is s, m, h, d, or w for seconds, minutes, hours, days or weeks respectively. For example, specify -3d for three days.

Month (name or its first three letters), day, hour (24-hour convention), year (fully specified year or last two digits). These examples notify the backup server that its main server went down three days ago, or at 12 midnight on October 31, 2002:

nrcmd> server dhcp setPartnerDown dhcp2.example.com. -3d 
nrcmd> server dhcp setPartnerDown dhcp2.example.com. Oct 31 00:00:00 2002 
nrcmd> dhcp reload 
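To check which moment a given date specification refers to before passing it to setPartnerDown, the two conventions can be modeled as follows. This is a sketch under stated assumptions, not the parser that nrcmd uses: the function name partner_down_time is hypothetical, and only the abbreviated-month, four-digit-year form shown in the example is handled for absolute dates.

```python
import re
from datetime import datetime, timedelta

# Unit letters from the "-num unit" convention, mapped to timedelta keywords.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def partner_down_time(spec, now):
    """Return the local datetime that a date specification describes.

    spec is either "-num unit" (a time in the past relative to now) or an
    absolute date such as "Oct 31 00:00:00 2002".
    """
    match = re.fullmatch(r"-(\d+(?:\.\d+)?)([smhdw])", spec)
    if match:
        amount, unit = float(match.group(1)), match.group(2)
        return now - timedelta(**{_UNITS[unit]: amount})
    # Absolute form; full month names and two-digit years are not
    # handled by this sketch.
    return datetime.strptime(spec, "%b %d %H:%M:%S %Y")


now = datetime(2002, 11, 3, 0, 0, 0)
print(partner_down_time("-3d", now))
```

Both forms yield a time local to the process doing the parsing, which is consistent with the note that follows about entering times local to the nrcmd process.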


Note Wherever you specify a date and time in the CLI, enter the time that is local to the nrcmd process. If the server is running in a different time zone than this process, disregard the time zone where the server is running and use local time instead.


State Transitions During Integration

Table 11-3 describes what happens when servers enter various states and how they initially integrate and later re-integrate with each other under certain conditions.

Table 11-3 Failover State Transitions and Integration Processes 

Integration
Results

Into NORMAL state

1. The newly configured backup server contacts the main server, which starts in PARTNER-DOWN state.

2. Because the backup server is a new partner, it goes into RECOVER state and sends a Binding Request message to the main server.

3. The main server replies with Binding Update messages that include the leases in its lease state database.

4. After the backup server acknowledges these messages, the main server responds with a Binding Complete message.

5. The backup server goes into RECOVER-DONE state.

6. Both servers go into NORMAL state.

7. The backup server sends Pool Request messages.

8. The main server responds with the leases to allocate to the backup server based on the failover-backup-percentage configured.

After COMMUNICATIONS-INTERRUPTED state

1. When a server comes back up and connects with a partner in this state, the returning server moves into the same state and then immediately into NORMAL state.

2. The partner also moves into NORMAL state.

After PARTNER-DOWN state

When a server comes back up and connects with a partner in this state, the server compares the time it went down with the time the partner went into this state.

If the server finds that it went down and the partner subsequently went into this state:

a. The returning server moves into RECOVER state and sends an Update Request message to the partner.

b. The partner returns all the binding data it was unable to send earlier and follows up with an Update Done message.

c. The returning server moves into RECOVER-DONE state.

d. Both servers move into NORMAL state.

If the returning server finds that it was still operating when the partner went into PARTNER-DOWN state:

a. The server goes into POTENTIAL-CONFLICT state, which also causes the partner to go into this state.

b. The main server sends an update request to the backup server.

c. The backup server responds with all unacknowledged updates to the main server and finishes off with an Update Done message.

d. The main server moves into NORMAL state.

e. The backup server sends the main server an Update Request message requesting all unacknowledged updates.

 

f. The main server sends these updates and finishes off with an Update Done message.

g. The backup server goes into NORMAL state.

After the server loses its lease state database

A returning server usually retains its lease state database. However, it can also lose it because of a catastrophic failure or intentional removal.

1. When a server with a missing lease database returns with a partner that is in PARTNER-DOWN or COMMUNICATIONS-INTERRUPTED state, the server determines whether the partner ever communicated with it. If not, it assumes that it has lost its database, moves into RECOVER state, and sends an Update Request All message to its partner.

2. The partner responds with binding data about every lease in its database and follows up with an Update Done message.

 

3. The returning server waits the maximum client lead time (MCLT) period, typically one hour, and moves into RECOVER-DONE state. For details on the MCLT, see the "Maximum Client Lead Time and Lease Period Factor" section.

4. Both servers then move into NORMAL state.

After a lease state database backup restoration

When a returning server has its lease state database restored from backup, and if it reconnects with its partner without additional data, it only requests lease binding data that it has not yet seen. This data may be different from what it expects.

1. In this case, you must configure the returning server with the failover-recover attribute set to the time the backup occurred.

2. The server moves into RECOVER state and requests all its partner's data. The server waits the MCLT period, typically one hour, from when the backup occurred and goes into RECOVER-DONE state. For details on the MCLT, see the "Maximum Client Lead Time and Lease Period Factor" section.

3. Once the server returns to NORMAL state, you must unset its failover-recover attribute, or set it to zero.

nrcmd> dhcp set failover-recover=0 

After the operational server had failover disabled

If the operating server had failover enabled, disabled, and subsequently re-enabled, you must take special care when bringing a newly configured backup server into operation. The backup server must have no lease state data and must have the failover-recover attribute set to the current time minus the MCLT interval, typically one hour. For details on the MCLT, see the "Maximum Client Lead Time and Lease Period Factor" section.

1. The backup server then knows to request all the lease state data from the main server. Unlike the case described in the "After the server loses its lease state database" row of this table, the backup server cannot request this data automatically because it has no record of having ever communicated with the main server.

2. After reconnecting, the backup server goes into RECOVER state, requests all the main server's lease data, and goes into RECOVER-DONE state.

3. Both servers go into NORMAL state. At this point, you must unset the backup server's failover-recover attribute, or set it to zero.

nrcmd> dhcp set failover-recover=0 
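The "After PARTNER-DOWN state" row of Table 11-3 turns on a single time comparison made by the returning server. The decision can be sketched as below. The function and parameter names are hypothetical, and treating equal timestamps as the RECOVER case is an assumption that the table does not address.

```python
from datetime import datetime


def returning_server_state(went_down_at, partner_entered_partner_down_at):
    """First state of a server that reconnects to a PARTNER-DOWN partner.

    RECOVER: the server went down, and the partner moved into
    PARTNER-DOWN state afterward.
    POTENTIAL-CONFLICT: the server was still operating when the partner
    moved into PARTNER-DOWN state.
    """
    # Assumption: equal timestamps are treated as the RECOVER case.
    if went_down_at <= partner_entered_partner_down_at:
        return "RECOVER"
    return "POTENTIAL-CONFLICT"


down = datetime(2002, 10, 31, 0, 0, 0)
partner_down = datetime(2002, 10, 31, 6, 0, 0)
print(returning_server_state(down, partner_down))
```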


Allocating Addresses Among Servers

To keep failover partners operating despite a network partition (when both servers can communicate with clients, but not with each other), allocate more addresses than for a single server. Configure the main server to allocate a percentage of the currently available addresses in each scope to the backup server. This makes these addresses unavailable to the main server. The backup server uses these addresses when it cannot talk to the main server and cannot tell if it is down.


Note If a Network Registrar 6.0 failover server receives an update from a Network Registrar DHCP server running prior to Network Registrar 6.0, the unavailable leases do not have a timeout value. In this case, the Network Registrar 6.0 server uses the unavailable-timeout value configured in the scope policy or system_default_policy policy as the timeout for the unavailable lease. When the lease times out, the policy causes the lease to transition to available in both failover partners.


Using the CLI

You can set the percentage of currently available addresses in each scope using the dhcp set failover-backup-percentage and scope set failover-backup-percentage commands. Note that setting the backup percentage on the server level sets the value for all scopes not set with that attribute. However, if set at the scope level, the backup percentage overrides the one at the server level.

nrcmd> dhcp set failover-backup-percentage=10 
nrcmd> scope examplescope set failover-backup-percentage=10 

No single value suits all environments, although 10 percent is a reasonable choice. The percentage depends on the new client arrival rate and the network operator's reaction time. The backup server needs enough addresses from each scope to satisfy all new client requests arriving during the time it does not know if the main server is down. Even during PARTNER-DOWN state, the backup server waits for the maximum client lead time (MCLT) and lease time to expire before re-allocating leases. See the "Maximum Client Lead Time and Lease Period Factor" section. When these times expire, the backup server:

Offers leases from its private pool.

Offers leases from the main server's pool.

Offers expired leases to new clients.

During the day, an operator likely responds within two hours to COMMUNICATIONS-INTERRUPTED state to determine if the main server is working. The backup server then needs enough addresses to support a reasonable upper bound on the number of new clients that could arrive during those two hours.

During off-hours, the arrival rate of previously unknown clients is likely to be less. The operator can usually respond within 12 hours to the same situation. The backup server then needs enough addresses to support a reasonable upper bound on the number of clients that could arrive during those 12 hours.

The number of addresses over which the backup server requires sole control is the greater of the two numbers. You would express this number as a percentage of the currently available (unassigned) addresses in each scope. If you use client-classes, remember that some clients can only use some sets of scopes and not others. See "Configuring Clients and Client-Classes."
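The sizing rule above (take the greater of the daytime and off-hours estimates, then express it as a percentage of the currently available addresses) can be worked through as follows. This is a rough sketch: the function name, the arrival rates, and the scope size in the example are hypothetical, while the two-hour and 12-hour response times come from the text.

```python
import math


def backup_percentage(available, day_rate, night_rate, day_hours=2, night_hours=12):
    """Suggest a failover-backup-percentage value for one scope.

    available  -- currently available (unassigned) addresses in the scope
    day_rate   -- upper bound on new clients per hour during the day
    night_rate -- upper bound on new clients per hour during off-hours
    """
    if available <= 0:
        raise ValueError("scope has no available addresses")
    # The backup server needs the greater of the two estimates.
    needed = max(day_rate * day_hours, night_rate * night_hours)
    return min(100, math.ceil(100 * needed / available))


# Hypothetical scope: 500 available addresses, 20 new clients per hour by
# day and 4 per hour during off-hours.
print(backup_percentage(500, 20, 4))
```

In this example the off-hours estimate (48 addresses) exceeds the daytime one (40), so it determines the percentage.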


Note During failover, clients can sometimes obtain leases whose expiration times are shorter than the amount configured. This is a normal part of keeping the server partners synchronized. Typically this happens only for the first lease period, or during COMMUNICATIONS-INTERRUPTED state.


For all servers or scopes for which you set dhcp enable failover or scope enable failover, you must set the failover-backup-percentage attribute. This is the percentage of currently available (unreserved) leases that the backup server can use for allocations to new DHCP clients when the main server is down. You can use the default, which is 10 percent, or specify another value.

For scopes for which you set scope enable dynamic-bootp, use the failover-dynamic-bootp-backup-percentage attribute rather than the failover-backup-percentage attribute. The failover-dynamic-bootp-backup-percentage is the percentage of available addresses that the main server should send to the backup server for use with BOOTP clients.


Note You must define this percentage on the main server. If you define it on the backup server, Network Registrar ignores it (to enable duplicating configuration through scripts). If you do not define it, Network Registrar uses the default or specified failover-backup-percentage.
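The precedence among the server-level, scope-level, and dynamic BOOTP percentages described in this section can be summarized in a short sketch. It models only the rules stated here as evaluated on the main server. The function name and parameters are hypothetical, and None stands for an attribute that is not set.

```python
DEFAULT_BACKUP_PERCENTAGE = 10  # default stated in this section


def effective_backup_percentage(server_value=None, scope_value=None,
                                dynamic_bootp=False, bootp_value=None):
    """Resolve the backup percentage that applies to one scope."""
    # A scope-level failover-backup-percentage overrides the server level.
    if scope_value is not None:
        base = scope_value
    elif server_value is not None:
        base = server_value
    else:
        base = DEFAULT_BACKUP_PERCENTAGE
    # Dynamic BOOTP scopes use their own percentage when it is defined,
    # and otherwise fall back to the default or specified value.
    if dynamic_bootp and bootp_value is not None:
        return bootp_value
    return base


print(effective_backup_percentage(server_value=20, scope_value=5))
```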


The failover-dynamic-bootp-backup-percentage is distinct from the failover-backup-percentage attribute, because if scope name enable bootp is set on a scope, a server, even in PARTNER-DOWN state, never grants leases on addresses that are available to the other server. Network Registrar does not grant leases because the partner might give them out using dynamic BOOTP, and you can never safely assume that they are available again. To properly support dynamic BOOTP while using the failover protocol, do this on every LAN segment in which you want BOOTP support:

Create one scope for dynamic BOOTP.

Enable BOOTP and dynamic BOOTP.

Disable DHCP for that scope.

Importing Backup Server Leases to the Main Server

One way to bring the backup server's lease information to the main server is to import it.

Using the CLI


Step 1 Stop the backup server.

nrcmd> dhcp stop 

Step 2 Enable import mode on the main server.

nrcmd> dhcp enable import-mode 

Step 3 Reload the main server.

nrcmd> dhcp reload 

Step 4 Import the leases into the main server.

nrcmd> import leases leasefile.txt 

Step 5 Disable import mode on the main server.

nrcmd> dhcp disable import-mode 

Step 6 Reload the main server.

nrcmd> dhcp reload 

Step 7 Start the backup server.

nrcmd> dhcp start 
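The order of these steps matters: the backup server stays stopped for the whole import, and import mode is enabled before the import and disabled after it. The following Python sketch only lists the commands above against the server each one runs on, so that an operator can check the sequence. It does not contact any server, and the function name and the "main"/"backup" labels are hypothetical.

```python
from typing import List, Tuple


def import_lease_plan(lease_file: str) -> List[Tuple[str, str]]:
    """Return (server, nrcmd command) pairs in the order of Steps 1 to 7."""
    return [
        ("backup", "dhcp stop"),
        ("main", "dhcp enable import-mode"),
        ("main", "dhcp reload"),
        ("main", f"import leases {lease_file}"),
        ("main", "dhcp disable import-mode"),
        ("main", "dhcp reload"),
        ("backup", "dhcp start"),
    ]


# Print the checklist for the lease file used in the steps above.
for step, (server, command) in enumerate(import_lease_plan("leasefile.txt"), start=1):
    print(f"Step {step} [{server}] nrcmd> {command}")
```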


Exporting Lease Information

You can export lease information.

Using the CLI

There are two ways that you can export lease information:

Use the export leases command—Exports lease information about the state of all current and expired leases. This command does not identify the lease information as belonging to the main or backup servers. The -server or -client options determine what time format the output should be.

The -client option writes out the lease time as a string in the month, day, time, year format, such as Apr 15 16:35:48 2002.

The -server option writes out the state of all current and expired leases to the DHCP server's log directory using the output file that you specify. It writes lease times as integers representing the number of seconds since midnight GMT Jan 1, 1970, for example, 903968580. You can also specify the namespace as part of the export.

nrcmd> export leases -server -namespace blue -time-numeric leaseout.txt 


Note Leases exported with the -client and -server options may show different results when also using dynamic DNS updates, if there is a conflict with the client's host name. This occurs because the lease exported with the -client option shows the host name requested by the client, while a lease exported with the -server option shows the host name used by the server to perform the DNS update.


Use the export addresses command to export information about every address configured in every server that is specified in the configuration file, or database (to its ip_address table by default). This includes addresses specified in DHCP scope ranges, namespaces, DNS static addresses, and explicitly reserved addresses both for DNS and DHCP servers. However, addresses in scope ranges that are not in use (allocated or reserved) do not appear in the output. The export addresses command also displays the failover role, if any.

nrcmd> export addresses file=out.txt namespace=blue config=config.txt dhcp-only 
	time-ascii 
nrcmd> export addresses database=mcd username=admin password=changeme 
	table=ip_address time-numeric 

Specifying Clusters for the export addresses Command

Using the CLI

The export addresses command exports all active IP addresses into a specified database or CSV text file. You can determine which clusters the command pertains to in many ways. Network Registrar follows a precedence order. Any of the specific cluster specifications can override the default specification or previous specification:

Default cluster (localhost).

UNIX environment or Windows Registry variable AIC_CLUSTER.

-C flag on the command line allows you to specify a single cluster.

clusters attribute in the configuration file. This allows you to specify a group of clusters. This example specifies clusters in a .nrconfig file, the default configuration file, or in a file that you specify with the config keyword:

Cluster information for export addresses 
[export addresses] 
clusters=machine1 username password, machine2 username password [...] 
clusters=host1 admin, host2, host3 admin3 passwd3 

Separate cluster specifications with commas. Within each cluster specification, separate the three arguments with spaces. For long lines you can use continuation lines. You can embed carriage returns; you do not need to use continuation escape indicators.

You can optionally specify a username and password for the cluster. If you omit a username or password for a particular cluster, Network Registrar uses the last username or password listed. If you omit usernames or passwords, Network Registrar uses the information from the command line -N and -P arguments, and then the Windows Registry or environment variables AIC_NAME and AIC_PASSWORD. See the Network Registrar CLI Reference for invoking the nrcmd command. If Network Registrar cannot find a username or password or the supplied username and password are incorrect, the command issues a warning for that cluster.
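The inheritance rule for omitted usernames and passwords can be sketched in Python as follows. This is an illustration of the rule as described above, not the parser that Network Registrar uses. The names Cluster and parse_clusters are hypothetical, and the default_user and default_password parameters stand in for values that would come from the -N and -P arguments or the AIC_NAME and AIC_PASSWORD variables.

```python
from typing import List, NamedTuple, Optional


class Cluster(NamedTuple):
    host: str
    username: Optional[str]
    password: Optional[str]


def parse_clusters(value: str,
                   default_user: Optional[str] = None,
                   default_password: Optional[str] = None) -> List[Cluster]:
    """Parse the value of a clusters= line into per-cluster credentials.

    A cluster that omits its username or password inherits the last one
    listed before it; if none was listed, the defaults apply.
    """
    clusters = []
    user, password = default_user, default_password
    for spec in value.split(","):        # commas separate cluster specifications
        fields = spec.split()            # spaces (or embedded newlines) separate arguments
        if not fields:
            continue
        if len(fields) > 3:
            raise ValueError(f"too many arguments in cluster spec: {spec!r}")
        if len(fields) >= 2:
            user = fields[1]
        if len(fields) == 3:
            password = fields[2]
        clusters.append(Cluster(fields[0], user, password))
    return clusters


for cluster in parse_clusters("host1 admin, host2, host3 admin3 passwd3"):
    print(cluster)
```

In this sketch host2 inherits the username admin from host1, and a cluster whose password is still None would be the case in which the command issues a warning.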

Database Output Format of the export addresses Command

Using the CLI

The export addresses command writes the database output to the ip_addresses table or to the specified table name. Table 11-4 shows the columns in the ip_addresses table.

Table 11-4 export addresses Database Output 

Column                   Data Type         Null?  Description
client_id                varchar(256)      yes    Client identifier for the lease (DHCP).
client_class             varchar(256)      yes    Client-class name (DHCP).
cluster_name             varchar(256)      no     Cluster name from which the information came.
dns_name                 varchar(256)      yes    Fully qualified DNS name for assigned addresses. If Null, the DHCP lease is not bound to a DNS name.
failover-role            varchar(64)       yes    Failover role, if any, of leases.
ip_address               unsigned integer  no     32-bit IP address.
ip_text                  varchar(15)       no     IP address in dotted decimal notation.
lease_expiration_time    timestamp         yes    Date and time when the lease expires (DHCP).
lease_state_change_time  timestamp         yes    Date and time when the lease last changed state (DHCP).
lease_transaction_time   timestamp         yes    Date and time of the last transaction (DHCP).
mac_address              varchar(256)      yes    MAC address text field in the form type,length,hex:hex:hex..., such as "1,6,00:d0:ba:d3:bd:3b". The type and length are both in decimal, whereas the data is in hexadecimal.
namespace-name           varchar(256)      yes    Namespace name. If Null, the current namespace, as set by the session set current namespace command.
requested_name           varchar(64)       yes    UNIX or WINS host name (DHCP).
scope_pool_name          varchar(256)      yes    Name of the scope (DHCP) from which the address was allocated.
state                    varchar(20)       no     Available, Assigned, Unavailable, Leased, Expired, De-activated, Released, Other Available, Pending Available.
subnet_bits              integer           yes    Number of bits in subnet mask for the scope.
type                     varchar(20)       no     STATIC, DYNAMIC, or RESERVED.
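The mac_address column packs three values into one text field. As a minimal sketch, assuming the field always follows the type,length,hex:hex:hex... form described in the table, it could be split in Python like this. The names MacField and parse_mac_field are hypothetical.

```python
from typing import NamedTuple


class MacField(NamedTuple):
    hw_type: int     # decimal hardware type
    length: int      # decimal number of address bytes
    address: bytes   # the address itself, decoded from hexadecimal


def parse_mac_field(text: str) -> MacField:
    """Split a mac_address value such as "1,6,00:d0:ba:d3:bd:3b"."""
    type_text, length_text, hex_text = text.split(",", 2)
    hw_type = int(type_text)
    length = int(length_text)
    address = bytes(int(octet, 16) for octet in hex_text.split(":"))
    if len(address) != length:
        raise ValueError(f"expected {length} bytes, found {len(address)}")
    return MacField(hw_type, length, address)


mac = parse_mac_field("1,6,00:d0:ba:d3:bd:3b")
print(mac.hw_type, mac.length, ":".join(f"{byte:02x}" for byte in mac.address))
```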


If the target database does not support the data type of a field, the export command replaces it with a text (varchar) field as listed in Table 11-5. Column names that exceed the target database's supported maximum name width get truncated to the allowable width.

Table 11-5 Field Data Types 

Data Type       Replacement Text
ip_address      varchar(10) in hexadecimal, such as "0x1234abcd"
Null            varchar(1) ""
other integers  varchar(11) in decimal, such as "28"
timestamp       varchar(26) as either a string of the form "Mon Apr 15 02:03:55 2002" if time-ascii is specified or an unsigned integer string of seconds since midnight GMT Jan 01 00:00:00 1970 if time-numeric is specified; times are always in UTC
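A script that reads the replacement text fields has to convert them back to native values. The following Python sketch shows one way to do that for the ip_address and timestamp replacements. The function names are hypothetical, and the strptime format string is an assumption derived from the sample string in the table.

```python
import calendar
import ipaddress
import time

# Assumed layout of the time-ascii form, for example "Mon Apr 15 02:03:55 2002".
ASCII_FORMAT = "%a %b %d %H:%M:%S %Y"


def ip_from_hex(text: str) -> ipaddress.IPv4Address:
    """Convert a replacement value such as "0x1234abcd" to an IPv4 address."""
    return ipaddress.IPv4Address(int(text, 16))


def ip_to_hex(address: str) -> str:
    """Convert dotted decimal notation to the 10-character hexadecimal form."""
    return f"0x{int(ipaddress.IPv4Address(address)):08x}"


def ascii_to_numeric(text: str) -> int:
    """Convert a time-ascii string (UTC) to seconds since Jan 1, 1970."""
    return calendar.timegm(time.strptime(text, ASCII_FORMAT))


def numeric_to_ascii(seconds: int) -> str:
    """Convert seconds since Jan 1, 1970 to the time-ascii form (UTC)."""
    return time.strftime(ASCII_FORMAT, time.gmtime(seconds))


print(ip_from_hex("0x1234abcd"))      # 18.52.171.205
print(numeric_to_ascii(903968580))    # Mon Aug 24 14:23:00 1998
```

Both time conversions use UTC (calendar.timegm and time.gmtime), which matches the statement that exported times are always in UTC.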


Text File Output Format of the export addresses Command

Using the CLI

If you specify writing the output to a CSV text file, the export addresses command writes each line in the file as one row in the database. The export addresses command outputs each field to the text file in the order listed in Table 11-4. The first line in the file is not a table row, but instead contains a hash (#) symbol followed by the text of each of the column names separated by commas. The command handles all fields that require text substitution as the previous section describes.


Note The output is not in a guaranteed order. The order depends on the data in the system. Therefore, if order is important to you, use a tool to sort the data.
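For example, a short Python script can read the text file and sort it by address. This is a sketch under stated assumptions: the sample data is invented and uses only four of the columns from Table 11-4 to stay short, and the function names are hypothetical. Sorting on the parsed address rather than on the ip_text string avoids placing 192.168.10.20 before 192.168.9.5.

```python
import csv
import io
import ipaddress

# Hypothetical, trimmed sample: a real export has every column of Table 11-4.
SAMPLE = """#cluster_name,ip_text,state,type
localhost,192.168.10.20,Leased,DYNAMIC
localhost,192.168.9.5,Leased,DYNAMIC
localhost,192.168.10.3,Available,RESERVED
"""


def read_export(text: str) -> list:
    """Read export text whose first line is '#' followed by the column names."""
    lines = text.splitlines()
    columns = [name.strip() for name in lines[0].lstrip("#").split(",")]
    reader = csv.DictReader(io.StringIO("\n".join(lines[1:])), fieldnames=columns)
    return list(reader)


def sort_by_address(rows: list) -> list:
    """Sort rows numerically by the dotted decimal ip_text column."""
    return sorted(rows, key=lambda row: ipaddress.IPv4Address(row["ip_text"]))


for row in sort_by_address(read_export(SAMPLE)):
    print(row["ip_text"], row["state"], row["type"])
```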


Addresses Reported by the export addresses Command

Using the CLI

The export addresses command reports every address configured in every server that is specified in the configuration file. This includes addresses specified in DHCP scope ranges, DNS static addresses, and explicitly reserved addresses both for DNS and DHCP servers. However, unused (unallocated and unreserved) addresses in DHCP scope ranges do not appear in the table.

The report displays multiple entries for an address if the address is served by more than one server, is in more than one scope, or has multiple DNS names. Thus, you cannot use a unique column as a key, but you can generate a unique key from a set of columns such as ip_address, type, cluster_name, scope_pool_name, or dns_name.
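A composite key of this kind can be sketched in Python as follows. The choice of all five columns, the function names, and the sample rows are assumptions for illustration; which columns you need depends on how your clusters and scopes overlap.

```python
from collections import Counter

# Columns combined into one key, taken from the list in the paragraph above.
KEY_COLUMNS = ("ip_address", "type", "cluster_name", "scope_pool_name", "dns_name")


def composite_key(row: dict) -> tuple:
    """Build a key from several columns, because no single column is unique."""
    return tuple(row.get(column) for column in KEY_COLUMNS)


def duplicate_keys(rows: list) -> list:
    """Return the composite keys that occur more than once."""
    counts = Counter(composite_key(row) for row in rows)
    return [key for key, count in counts.items() if count > 1]


# Hypothetical rows: one address reported by two clusters.
rows = [
    {"ip_address": 3232238100, "type": "DYNAMIC", "cluster_name": "host1",
     "scope_pool_name": "scope1", "dns_name": None},
    {"ip_address": 3232238100, "type": "DYNAMIC", "cluster_name": "host2",
     "scope_pool_name": "scope1", "dns_name": None},
]

print(len({row["ip_address"] for row in rows}))   # 1: ip_address alone is not unique
print(duplicate_keys(rows))                       # []: the composite key is unique
```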

Error Reports for the export addresses Command

Using the CLI

The export addresses command attempts to establish communication with the clusters you specify. It reports an error in these cases:

If the export addresses command cannot establish communication with any of the selected clusters, it issues messages on each cluster it could not reach and exits with "101 Ok, with warnings."

If the export addresses command cannot connect to the database or manipulate the table, it reports "326 Database access error:" followed by the text that ODBC reports.

If ODBC is not installed on the system, it reports "340 ODBC 3.x or higher required. ODBC not installed." If there is an incompatible version of ODBC present, it issues the message, "340 ODBC 3.x or higher required. ODBC.y installed."


Note If successful, the export addresses command prints "100 Ok" both before and after Network Registrar lists the addresses. The first "100 Ok" means that the command is processing (without rejection because of existing locks, licensing problems, or command syntax errors). The second "100 Ok" indicates that the command successfully completed its processing.
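A script that wraps the command can separate these status lines from the address data by their leading three-digit code. The sketch below is an illustration only. It assumes that every status line starts with the code followed by a space, and that any code other than 100 or 101 indicates an error; only the codes 100, 101, 326, and 340 are named in this section. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed shape of a status line: three digits, whitespace, then the message.
STATUS_LINE = re.compile(r"^(\d{3})\s+(.*)$")


def parse_status(line: str) -> Optional[Tuple[int, str]]:
    """Return (code, message) for a status line, or None for any other line."""
    match = STATUS_LINE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def classify(code: int) -> str:
    """Map a status code to a coarse result (assumption: other codes are errors)."""
    if code == 100:
        return "ok"
    if code == 101:
        return "warning"
    return "error"


for line in ("100 Ok", "101 Ok, with warnings", "326 Database access error: example"):
    code, message = parse_status(line)
    print(code, classify(code), message)
```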


Changing Failover Server Roles


Caution Be careful when you change the role of a failover server. Remember that all address states in a scope are lost from a server if it is ever reloaded without that scope in its configuration.

Making a Nonfailover Server a Failover Main

You can update an existing installation and increase the availability of the DHCP service it offers. You can use this procedure only if the original server never participated in failover.

Using the CLI


Step 1 Install Network Registrar on the original server and ensure that it operates correctly after the installation.

Step 2 Install Network Registrar on the machine that is to be the backup server. Note the machine's DNS name.

Step 3 Enable failover on the original server. Use the DNS name of the recently installed backup server. See the "Configuring the Basic Scenario" section.

nrcmd> dhcp enable failover 
nrcmd> dhcp set failover-backup-server=backupserver.example.com. 

Step 4 Reload the main server. It should go into PARTNER-DOWN state and stay there. It cannot locate the backup server, because it is not yet configured. There should be no change in main server operation at this point.

Step 5 Duplicate the main server's configuration on the backup server, including scopes (including secondary), policies, and client-classes. If you use client-classes, make sure the clients are entered into each cluster or that each server can access an LDAP database with the client data.

Step 6 Enable failover on the backup server, similar to Step 3. Be sure to define the main server.

nrcmd> dhcp set failover-main-server=mainserver.example.com. 

Step 7 Reconfigure all the operational BOOTP relays to forward broadcast DHCP packets to both the main and backup server.

Step 8 Reload the backup server.


After you complete these steps:

1. The backup server detects the main server and moves into RECOVER state.

2. The backup server refreshes its stable storage with the main server's lease data. When the refresh is complete and the maximum client lead time (MCLT) has passed, it moves into RECOVER-DONE state.

3. The main server moves into NORMAL state.

4. The backup server moves into NORMAL state.

5. The backup server uses a pool request to ask the main server for addresses to allocate if communication is interrupted.

6. After allocating these addresses, the main server sends this data to the backup server.

Replacing a Server Having Defective Storage

If a failover server loses its stable storage (hard disk), you can replace the server and have it recover its state information from its partner.

Using the CLI


Step 1 Determine which server lost its stable storage.

Step 2 Use the dhcp setPartnerDown command to tell the other server that its partner is down. If you do not specify a time, Network Registrar uses the current time.

nrcmd> dhcp setPartnerDown backupserver.example.com. oct 31 13:10 2001 

Step 3 When the server is again operational, re-install Network Registrar.

Step 4 Duplicate the configuration on the server from its partner.

Step 5 Set the failover recovery time to the approximate time when the server failed.

nrcmd> dhcp set failover-recover "FEB 02 13:20 2001" 

Step 6 Reload the replacement server.

nrcmd> dhcp reload 


These actions then occur:

1. The recovered server moves into RECOVER state.

2. Its partner sends it all its data.

3. The server moves into RECOVER-DONE state when it reaches its maximum client lead time.

4. Its partner moves into NORMAL state.

5. The recovered server moves into NORMAL state. It can request addresses, but can allocate few new ones, because its partner already sent it all its previously allocated addresses.

6. Use the dhcp set failover-recover=0 command on the recovered server, then reload the server.

Removing a Backup Server and Halting Failover Operation

There are times when you might need to remove the backup server and halt all failover operations.

Using the CLI


Step 1 On the backup server, remove all the scopes that were designated as a backup to the main server.

nrcmd> scope scope1 delete 
nrcmd> scope scope2 delete 
...

Step 2 On the main server, remove the failover capability from those scopes that were main for the backup server, or disable failover server-wide if that is how it was configured.

nrcmd> scope scope1 set failover=scope-disabled 
nrcmd> dhcp disable failover 

Step 3 Reload both servers.

nrcmd> dhcp reload 


Adding a Main Server to an Existing Backup Server

You can use an existing backup server for a main server.

Using the CLI


Step 1 Duplicate the main server's scopes, policies, and other configurations on the backup server.

Step 2 Configure the main server to enable failover and point to the backup server.

nrcmd> dhcp enable failover 
nrcmd> dhcp set failover-backup-server=backupserver.example.com. 

Step 3 Configure the backup server to enable failover for the new scopes that point to the new main server.

nrcmd> dhcp enable failover 
nrcmd> dhcp set failover-main-server=mainserver.example.com. 

Step 4 Reload both servers. Network Registrar performs the same steps as those described in the "Making a Nonfailover Server a Failover Main" section.

nrcmd> server dhcp reload 


Configuring Failover on Multiple Interface Hosts

If you plan to use failover on a server host with multiple interfaces, you must explicitly configure the local server's name or address. This requires an additional command. For example, if you have a host with two interfaces, serverA and serverB, and you want to make serverA the main failover server, you must define serverA as the failover-main-server before you set the backup server name (external serverC). If you do not do this, failover might not initialize correctly and might try to use the wrong interface.

Using the CLI

nrcmd> dhcp set failover-main-server=serverA failover-backup-server=serverC 


Note With multiple interfaces on one host, you must specify a host name that points to only one address or A record. You cannot set up your servers for round-robin support.


Maximum Client Lead Time and Lease Period Factor

You can set two properties for failover that control certain adjustments to the lease period, the maximum client lead time (MCLT) and the lease period factor. These adjustments are essential for failover.

MCLT—Controls the maximum time by which the expiration of a lease offered to a client can exceed the expiration time known to the partner server. The default MCLT is one hour, which is optimized for most configurations. As defined by the failover protocol, the lease period given to a client can never be more than the MCLT added to the most recently received potential expiration time from the failover partner, or to the current time, whichever is later. That is why you sometimes see the initial lease period as only an hour, or an hour longer than expected for renewals. This hour is the MCLT, a form of lease insurance. The actual lease time is recalculated when the main server comes back.

The MCLT is necessary because of failover's use of lazy updates. Using lazy updates, the server can issue or renew leases to clients before updating its partner, which it can then do in batches of updates. If the server goes down and cannot communicate the lease information to its partner, the partner may try to re-offer the lease to another client based on what it last knew the expiration to be. The MCLT guarantees that there is an added window of opportunity for the client to renew. The way that a lease offer and renewal works with the MCLT is:

a. The client sends a DHCPDISCOVER to the server, requesting a desired lease period (say, three days). The server responds with a DHCPOFFER with an initial lease period of only the MCLT (one hour by default). The client then requests the one-hour MCLT lease period and the server acknowledges it.

b. The server sends its partner a bind update containing the lease expiration for the client as the current time plus the MCLT (one hour). The update also includes the potential expiration time as the current time plus the client's desired period plus the MCLT (three days and an hour). The partner acknowledges the potential expiration, thereby guaranteeing the transaction.

c. When the client sends a renewal request halfway through its lease (in one-half hour), the server acknowledges with the client's desired lease period (three days). The server then updates its partner with the lease expiration as the current time plus the desired lease period (three days), and the potential expiration as the current time plus the desired period and another half of this period (3 + 1.5 = 4.5 days). The partner acknowledges this potential expiration of 4.5 days. In this way, the main server tries to have its partner always lead the client in its understanding of the client's lease period so that it can always offer it to the client.

Lease period factor—Controls how far ahead of the client's lease expiration the partner's recorded expiration can be. It is the multiple of the desired lease period that the main server sends to the partner when it reports a lease renewal. Typical values are:

1.5—The default and optimized factor. It is the lease period itself plus half again the lease, best used if the renewal period is 50% of the lease.

1.0—Same as the lease period the main server gives the client. With this value, neither server can ever offer a client a lease or renewal longer than the MCLT.

2.0—Twice the lease period the main server gives the client.

The lease period factor interacts with the lease renewal period. If the renewal period is more than 50% of the lease, you must also increase the factor. The calculation is:

1 + renew-percentage = factor

Thus, the usual renewal period of 50% might take the default (1 + 0.5 =) 1.5 lease period factor. A renewal period of 80% would more appropriately take a (1 + 0.8 =) 1.8 lease period factor.

You must define the lease period factor for the main DHCP server only. If defined for a partner, the main server ignores it, to enable duplicating the configurations through scripts.
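The arithmetic in this section can be worked through in Python. This is a sketch of the calculations as described above, using the three-day desired lease, the one-hour default MCLT, and the 1.5 default factor from the example. The function names and the start time are hypothetical, and the code models only the times involved, not the protocol messages.

```python
from datetime import datetime, timedelta

MCLT = timedelta(hours=1)   # default maximum client lead time


def lease_period_factor(renew_percentage: float) -> float:
    """factor = 1 + renew-percentage, with the percentage given as 0 to 100."""
    return 1 + renew_percentage / 100


def initial_grant(now: datetime, desired: timedelta, mclt: timedelta = MCLT):
    """First lease: the client gets only the MCLT.

    Returns (client expiration, potential expiration sent to the partner).
    """
    return now + mclt, now + desired + mclt


def renewal_grant(now: datetime, desired: timedelta, factor: float = 1.5):
    """Renewal: the client gets the desired period.

    Returns (client expiration, potential expiration sent to the partner).
    """
    return now + desired, now + desired * factor


start = datetime(2002, 4, 15, 12, 0, 0)      # hypothetical time of the first request
desired = timedelta(days=3)

print(initial_grant(start, desired))          # expires in 1 hour; potential is 3 days 1 hour out
renewal_time = start + MCLT / 2               # the client renews halfway through its lease
print(renewal_grant(renewal_time, desired))   # expires in 3 days; potential is 4.5 days out
print(lease_period_factor(50), lease_period_factor(80))
```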

Using the CLI

Generally, if you enabled failover on your DHCP server, you should not have to change the failover-maximum-client-lead-time or lease-period-factor attributes. However, you can do so explicitly.


Step 1 Reload the backup server to ensure that all data that the backup server has for the main server is up-to-date. Ideally, you should wait until both partners have stabilized in NORMAL state and have exchanged all updates. At least wait until the backup server completes its cache update process, as the log file indicates.

nrcmd> dhcp reload 

Step 2 Change the MCLT or lease period factor on the main server. The backup server ignores any MCLT that you configured on it, because it derives its MCLT value directly from the main. The default MCLT is 60 minutes and the default lease period factor is 1.5.

nrcmd> dhcp set failover-maximum-client-lead-time=4800 lease-period-factor=2.0 

Step 3 Stop the backup server.

nrcmd> dhcp stop 

Step 4 Reload the main server.

nrcmd> dhcp reload 

Step 5 Start the backup server.

nrcmd> dhcp start 


Setting DHCP Request and Response Packet Buffers

Failover requires more response packet buffers than request packet buffers. The general rule is that if you set 1000 or fewer request buffers, the number of response buffers should be at least 50% higher (1.5 times as many). Shorter lease times require more response buffers.

Using the CLI

Set the max-dhcp-requests and max-dhcp-responses attributes for the DHCP server according to the ratios described. The default is twice as many response buffers as request buffers.

nrcmd> dhcp set max-dhcp-requests=500 max-dhcp-responses=1000 
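The ratio can be checked with a few lines of Python before you apply a setting. This sketch covers only the stated rule for 1000 or fewer request buffers; the function names are hypothetical, and the section gives no rule for larger request counts.

```python
def min_response_buffers(request_buffers: int) -> int:
    """Smallest response-buffer count that is at least 1.5 times the requests."""
    if request_buffers > 1000:
        raise ValueError("the 1.5 times rule is stated only for 1000 or fewer request buffers")
    return (3 * request_buffers + 1) // 2   # integer form of ceil(1.5 * n)


def ratio_ok(request_buffers: int, response_buffers: int) -> bool:
    """Check a max-dhcp-requests / max-dhcp-responses pair against the rule."""
    return response_buffers >= min_response_buffers(request_buffers)


print(min_response_buffers(500))   # 750
print(ratio_ok(500, 1000))         # True: the example above uses the default 2:1 ratio
print(ratio_ok(500, 600))          # False
```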

Changing System Defaults

You can change some system defaults, such as the number of leases that the main server should send to the backup server, or the MCLT. See the "Maximum Client Lead Time and Lease Period Factor" section. However, you need to change them on both servers.

Using the CLI

On each server:

Change the poll interval—The interval at which partners contact each other to confirm network connectivity. The default is 10 seconds.

nrcmd> dhcp set failover-poll-interval=14 

Change the poll timeout—Failover partners who cannot communicate for failover-poll-timeout seconds will conclude that they lost network connectivity, and change their operational states appropriately. The default is 60 seconds.

nrcmd> dhcp set failover-poll-timeout=120 

If you enable failover on a UNIX system, you could set the sms-network-discovery attribute to enable computing the client os-type for leased addresses. This can help if you have a Windows partner server and want to use the dhcp updateSms command on it.

Supporting BOOTP Clients

You can configure scopes to support two types of BOOTP clients—static and dynamic.

Static BOOTP

You can support static BOOTP clients using DHCP reservations. When you enable failover, remember to configure both the main and the backup server with identical reservations.

Dynamic BOOTP

You can enable dynamic BOOTP clients using the scope name enable dynamic-bootp command. When using failover, however, there are additional restrictions on address usage in such scopes, because BOOTP clients get permanent addresses and leases that never expire.

Using the CLI

When a server whose scope does not have the dynamic-bootp option enabled goes to PARTNER-DOWN state, it can allocate any available (unassigned) address from that scope, whether or not it was initially available to any partner. However, when the dynamic-bootp option is set, each partner can only allocate its own addresses. Consequently, scopes that enable the dynamic-bootp option require more addresses to support failover.

When using dynamic BOOTP:

Segregate dynamic BOOTP clients to a single scope. Disable DHCP clients from using that scope with the scope name disable dhcp command.

Use the dhcp set failover-dynamic-bootp-backup-percentage command to allocate a greater percentage of addresses to the backup server for this scope, as much as 50 percent higher than a regular backup percentage.

Configuring BOOTP Relays

The Network Registrar failover protocol works with BOOTP relay (also called IP helper), a router capability that supports DHCP clients that are not locally connected to a server. For details about configuring routers, see the "BOOTP Relay" section of Chapter 7.

If you use BOOTP relay, ensure that the implementations point to both the main and backup servers. If they do not and the main fails, clients are not serviced, because the backup cannot see the required packets. If you cannot configure BOOTP relay to forward broadcast packets to two different servers, configure the router to forward the packets to a subnet-local broadcast address for a LAN segment, which could contain both the main and backup servers. Then, ensure that both the main and backup servers are on the same LAN segment.

DHCPLEASEQUERY and Failover

To accommodate DHCPLEASEQUERY messages sent to a DHCP failover backup server when the main server is down, the main server must communicate the relay-agent-info (82) option values to its partner server. To accomplish this, the main server uses DHCP failover update messages.

Troubleshooting Failover

This section describes how to avoid failover configuration mistakes, monitor failover operations, and detect and handle network problems.

Monitoring Failover Operations

You can examine the DHCP server log files on both partner servers to verify your failover configuration.

Using the CLI

You can make a few important log and debug settings to troubleshoot failover. Use the dhcp set log-settings=failover-detail command to increase the number and detail of failover messages logged. To ensure that previous messages do not get overwritten, add the failover-detail attribute to the end of the list. Use the no-failover-conflict attribute to inhibit logging server failover conflicts, or the no-failover-activity attribute to inhibit logging normal server failover activity. Then, reload the server.

nrcmd> dhcp set log-settings=default,incoming-packets,missing-options,failover-detail 
nrcmd> dhcp reload 

You can also isolate failover misconfigurations more easily if you use the dhcp getRelatedServers command. See the "Monitoring Failover Server Status" section.

Detecting and Handling Network Failures

Table 11-6 describes some symptoms, causes, and solutions for failover problems.

Table 11-6 Detecting and Handling Failures 

Symptom
Cause
Solution

New clients cannot get addresses

A backup server is in COMMUNICATIONS-INTERRUPTED state with too few addresses

Increase the backup percentage on the main server.

Error messages about mismatched scopes

There are mismatched scope configurations between partners

Reconfigure your servers.

Log messages about failure to communicate with partner

Server cannot communicate with its partner

Check the status of the server.

Main server fails. Some clients cannot renew or rebind leases. The leases expire even when the backup server is up and possibly processing some client requests.

Some BOOTP relay (ip-helper) was not configured to point at both servers; see the "Configuring BOOTP Relays" section.

Reconfigure BOOTP relays to point at both the main and backup servers.

Run a fire drill test—Take the main server down for a day or so and see if your user community can get and renew leases

SNMP trap: other server not responding

Server cannot communicate with its partner

Check the status of the server.

SNMP trap: dhcp failover configuration mismatch

Mismatched scope configurations between partners

Reconfigure your servers.

Users complain that they cannot use services or system as expected

Mismatched policies and client-classes between partners

Reconfigure partners to have identical policies; possibly use LDAP for client registration if currently registering clients directly in partners.