
Cisco Network Registrar

How To Manage CNS Cisco Network Registrar DHCP Failover

Document ID: 46807

Updated: Jan 31, 2006


Introduction

This document explains how to set up and manage Cisco CNS Network Registrar Dynamic Host Configuration Protocol (DHCP) failover server pairs for Release 5.5 or later. A failover pair consists of the main and backup DHCP servers that interact in a failover configuration; the backup server takes over to lease addresses to clients if the main server is down.

The remainder of this document explains what DHCP failover is, how it operates, and how to set up, verify, and troubleshoot simple failover.

Prerequisites

Requirements

There are no specific requirements for this document.

Components Used

The information in this document is based on these software and hardware versions:

  • Cisco Network Registrar Version 5.5 and 6.0.x

  • Sun Solaris E220R server with Solaris 8 installed

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

Conventions

For more information on document conventions, see the Cisco Technical Tips Conventions.

What is DHCP Failover?

Dynamic Host Configuration Protocol (DHCP) failover is a protocol designed to allow multiple servers to operate on a single network. Cisco Network Registrar allows a backup DHCP server to take over for a main Network Registrar DHCP server if the main server fails. In order for this to work reliably, the primary and secondary servers must maintain a consistent database of the lease information. Servers must coordinate all lease activity so that this information is synchronized in case of failover.

There are three types of Network Registrar DHCP failover configurations: simple, backoffice, and symmetric.

Simple failover is the easiest failover to manage. This document focuses on how to implement, verify, and troubleshoot simple failover. There are no performance benefits with backoffice and symmetric configurations.

Simple Failover

Simple failover includes a single main server and its backup server as shown in this illustration:

17867.gif

These are the advantages of simple failover:

  • Configuration is simple—failover is enabled at server level.

  • The backup server is identical to the main server.

  • Simple failover is easy to maintain as the network changes.

Backoffice Failover

Backoffice failover includes many main servers that use a single backup server as shown in this illustration:

17868.gif

An advantage of backoffice failover is that it reduces the total number of servers that you must manage.

These are the disadvantages of backoffice failover:

  • You must size the backup server to handle the sum of the configurations.

  • You must duplicate the changes to the main server on the backup server.

Symmetric Failover

With symmetric failover, the network is divided between two servers and they back up each other.

17869.gif

Caution: Although the two servers in symmetric failover share the network load in NORMAL mode, the gain in processing capacity is minimal. A backup server operates at about 40% of the main server's capacity to keep the lease database synchronized. If the servers back each other up, they must dedicate a portion of their processing capacity to synchronization, which leaves less capacity available to service clients. Symmetric failover is also error prone, since you must configure each scope individually. Use simple failover, because it has the most advantages with the fewest disadvantages.

How Failover Operates

The failover protocol is designed to protect against several kinds of failures:

  • Failure of the main server—The backup server takes over the services that were provided by the main server until the main server is operational again. Despite the fact that the main server updates the backup server after it responds to the DHCP client (a lazy update), there is no possibility of duplicate IP address allocation even if the main server fails before it updates the backup server.

  • Failure of the communications path between the main and backup servers—The backup server cannot distinguish between a failure of the main server and the failure of the communications path to the main server. The failover protocol is designed to operate correctly in either event. There is no possibility of duplicate IP address allocation even if both servers remain operational and each communicates with only a subset of DHCP clients.

You can use numerous parameters to define your failover configuration. The parameters are described in the Failover Parameters and Their Descriptions section.

After you start the servers, each server contacts the other. After contact is established, the main server provides the backup server with a private pool of IP addresses to use in the event of a failure. The main server updates the backup server whenever it performs an operation for a DHCP client.

The main server continues to update the backup server and the backup server allows the main server to service DHCP client requests.

If a failure occurs on the main server, the backup server takes over and renews the addresses of the current clients and offers addresses to new clients. When the main server is operational again, it automatically reintegrates with the backup server without administrator intervention.

This illustration is an example of message traffic in a failover configuration:

101400.gif

You can change some system defaults, such as the percentage of leases that the main server sends to the backup server, or the maximum client lead time (MCLT) as shown in the illustration. If you enabled failover on your DHCP server, do not change the failover-maximum-client-lead-time or lease-period-factor attributes. If you need more information about how to change the default MCLT and/or lease period factor settings, refer to the Maximum Client Lead Time and Lease Period Factor section in this document.

Choosing a Backup Percentage

Set the backup percentage large enough to allow the backup server to continue to serve new clients in the event that the main server fails. Calculate the backup percentage based on the number of available addresses. The default backup percentage is 10%. Set the percentage to a larger value if you expect extended outages, because the main server reclaims addresses once per hour if its address pool drops below its predefined percentage. For example, with the default 10% backup percentage, the main server reclaims addresses if its address pool falls below 90%.
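As a rough sizing aid, the backup percentage arithmetic can be sketched in Python. The function names, the rounding, and the sample figures (2000 available addresses, 25 new clients per hour, a 12-hour outage) are illustrative assumptions, not Network Registrar behavior.

```python
import math

def backup_pool_size(available_addresses: int, backup_percentage: int = 10) -> int:
    """Approximate number of available addresses handed to the backup server.

    Rounding down is an assumption; the source does not say how the server rounds.
    """
    if not 0 <= backup_percentage <= 100:
        raise ValueError("backup_percentage must be between 0 and 100")
    return available_addresses * backup_percentage // 100

def min_backup_percentage(available_addresses: int,
                          new_clients_per_hour: float,
                          outage_hours: float) -> int:
    """Smallest whole percentage that covers the new clients expected during an outage."""
    needed = math.ceil(new_clients_per_hour * outage_hours)
    return min(100, math.ceil(100 * needed / available_addresses))

print(backup_pool_size(2000))               # 200 addresses with the default 10%
print(min_backup_percentage(2000, 25, 12))  # 15 (300 addresses needed out of 2000)
```

In this sketch the default 10% would cover only 8 hours at 25 new clients per hour, which is why a longer expected outage calls for a larger percentage.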

Maximum Client Lead Time and Lease Period Factor

If the default parameters do not fit your environment, there are two tunable parameters that are critical to the functionality of failover: the maximum client lead time (MCLT) and the lease period factor. If you change these parameters, you must change them on both servers.

  • MCLT—The MCLT controls the maximum time offered to a client beyond the expiration of a lease of which the partner server is aware. The default MCLT is one hour, which is optimized for most configurations. The failover protocol states that the lease period given a client can never be more than the MCLT added to the most recently received expiration time from the failover partner, or the current time, whichever is later. That is why you sometimes see the initial lease period as only an hour, or an hour longer than expected for renewals. This hour is the MCLT, a form of lease insurance. The actual lease time is recalculated when the main server comes back.

    The MCLT is necessary because failover uses lazy updates. With lazy updates, the server can issue or renew leases to clients before it updates its partner, which it can then do in batches of updates. If the server goes down before it communicates the lease information to its partner, the partner could otherwise try to re-offer the lease to another client based on its last knowledge of the lease. The MCLT guarantees that there is an added window of opportunity for the client to renew. A lease offer and renewal works with the MCLT in this manner:

    1. The client sends a DHCPDISCOVER to the server and requests a desired lease period, such as three days.

    2. The server responds with a DHCPOFFER with an initial lease period of only the MCLT, one hour by default.

    3. The client then requests the one-hour MCLT lease period and the server acknowledges.

    4. The server sends its partner a bind update that contains the lease expiration for the client as the current time plus the MCLT, one hour. The update also includes the potential expiration time as the current time plus the desired period of the client plus the MCLT, three days and an hour.

    5. The partner acknowledges the potential expiration, and guarantees the transaction.

    6. When the client sends a renewal request halfway through its lease, in one-half hour, the server acknowledges with the desired lease period of the client, three days.

    7. The server then updates its partner with the lease expiration as the current time plus the desired lease period, three days, and the potential expiration as the current time plus the desired period and another half of this period, 3 + 1.5 = 4.5 days.

    8. The partner acknowledges this potential expiration of 4.5 days. In this way, the main server tries to keep the partner's record of the lease period ahead of what the client holds, so that the partner can always offer that period to the client.
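The lease offer and renewal steps above can be traced with a short Python sketch. It only reproduces the arithmetic of the worked example (one-hour MCLT, three-day desired lease, factor of 1.5); the function names and the start time are hypothetical, and this is not how the server is implemented.

```python
from datetime import datetime, timedelta

MCLT = timedelta(hours=1)     # default maximum client lead time
DESIRED = timedelta(days=3)   # lease period the client asks for
FACTOR = 1.5                  # default lease period factor

def initial_lease(now: datetime):
    """Steps 2 to 4: the client gets only the MCLT; the partner is told desired + MCLT."""
    client_expiration = now + MCLT
    potential_expiration = now + DESIRED + MCLT
    return client_expiration, potential_expiration

def renewal(now: datetime):
    """Steps 6 and 7: the client gets the desired period; the partner is told desired * factor."""
    client_expiration = now + DESIRED
    potential_expiration = now + DESIRED * FACTOR
    return client_expiration, potential_expiration

start = datetime(2006, 1, 31, 12, 0)          # arbitrary start time for the example
first_client, first_potential = initial_lease(start)
renew_at = start + MCLT / 2                   # the client renews halfway through its lease
second_client, second_potential = renewal(renew_at)

print(first_client - start)         # 1:00:00
print(first_potential - start)      # 3 days, 1:00:00
print(second_client - renew_at)     # 3 days, 0:00:00
print(second_potential - renew_at)  # 4 days, 12:00:00
```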

    There is no correct value for the MCLT. The default of one hour works well in most environments, but there is a trade-off between a short and a long MCLT value:

    • Short MCLT value—A short MCLT value means that after you enter the PARTNER-DOWN state, a server only has to wait a short time before it starts to allocate the IP addresses of the partner to DHCP clients. It only has to wait a short time after the expiration of a lease on an IP address before it re-allocates that IP address to another DHCP client.

      The downside of a short MCLT value is that the initial lease interval offered to every new DHCP client is short. This increases traffic, because those clients must send their first renewal after half of the short MCLT.

      In addition, the lease extensions that a server in the COMMUNICATIONS-INTERRUPTED state can give are only the MCLT after the server is in COMMUNICATIONS-INTERRUPTED for the desired client lease period. If a server stays in COMMUNICATIONS-INTERRUPTED for that long, then the leases it hands out are short, and that increases the load on that server.

    • Long MCLT value—A long MCLT value means that the initial lease period is longer and the time that a server in the COMMUNICATIONS-INTERRUPTED state extends leases, after it has been in the COMMUNICATIONS-INTERRUPTED state for the desired client lease period, is longer.

      However, a server that enters the PARTNER-DOWN state has to wait the longer MCLT before it allocates the IP addresses for the partner to new DHCP clients. This means that additional IP addresses are required to cover this time period. In addition, the server in PARTNER-DOWN must wait the longer MCLT from every lease expiration before it re-allocates an IP address to a different DHCP client.

  • Lease period factor—The lease period factor controls how far ahead of the client the lease expiration can be according to the partner. The lease period factor is a multiple of the desired lease period that is used to update the partner when the main server informs it of a lease renewal. Typical values are:

    • 1.5—The default and optimized lease period factor is 1.5. It is the lease period plus half the lease. This value is best used if the renewal period is 50% of the lease.

    • 1.0—Same as the lease period the main server gives the client.

    • 2.0—Twice the lease period the main server gives the client.

    The lease period factor interacts with the lease renewal period. If the renewal period is more than 50% of the lease, you must also increase the factor. The calculation is:

    1 + renew-percentage = factor
    

    The usual renewal period of 50% might take the default (1 + 0.5 =) 1.5 lease period factor. A renewal period of 80% would take a (1 + 0.8 =) 1.8 lease period factor.

    You must define the lease period factor only for the main DHCP server. The main server ignores lease period factors defined for a partner, to enable you to duplicate the configurations through scripts.
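The factor calculation can be checked with a few lines of Python. This is a minimal sketch of the formula 1 + renew-percentage, with the renewal percentage expressed as a fraction; the function names are hypothetical.

```python
def lease_period_factor(renew_fraction: float) -> float:
    """Return 1 + renew-percentage, with the percentage given as a fraction (0.5 = 50%)."""
    if not 0 < renew_fraction <= 1:
        raise ValueError("renew_fraction must be greater than 0 and at most 1")
    return 1 + renew_fraction

def potential_lease_days(desired_days: float, factor: float) -> float:
    """Lease period reported to the partner: the desired period times the factor."""
    return desired_days * factor

print(lease_period_factor(0.5))        # 1.5
print(lease_period_factor(0.8))        # 1.8
print(potential_lease_days(3, 1.5))    # 4.5
```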

    The length of the safe period is:

    • Installation-specific

    • Dependent on the number of addresses not allocated in the pool

    • Dependent on the expected arrival rate of previously unknown clients that require addresses

    The safe period is typically 24 hours, although many environments support periods of several days. The number of extra addresses required for the safe period must equal the expected number of new clients that a server encounters during that period. This depends on the arrival rate of new clients, not on the total number of leases. Even if you can afford only a short safe period, because of a shortage of spare addresses or a high arrival rate of new clients, you can benefit substantially when you allow DHCP to ride through minor problems that are fixable in an hour. There is little chance of duplicate address allocation and, after the failure is resolved, re-integration is automatic and requires no operator intervention.
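The relationship between the safe period and the number of spare addresses can be sketched as follows. The function names and the sample arrival rate of 5 new clients per hour are assumptions chosen only for illustration.

```python
import math

def extra_addresses_for_safe_period(new_clients_per_hour: float,
                                    safe_period_hours: float = 24) -> int:
    """Spare addresses needed to serve every new client that arrives during the safe period."""
    return math.ceil(new_clients_per_hour * safe_period_hours)

def longest_safe_period_hours(spare_addresses: int, new_clients_per_hour: float) -> float:
    """Longest safe period that a given number of spare addresses can cover."""
    if new_clients_per_hour <= 0:
        raise ValueError("new_clients_per_hour must be positive")
    return spare_addresses / new_clients_per_hour

print(extra_addresses_for_safe_period(5))   # 120 addresses for the default 24 hours
print(longest_safe_period_hours(60, 5))     # 12.0 hours with only 60 spare addresses
```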

Failover States

In Network Registrar, failover states define what the current communication situation is between the main and backup failover pair.

Regime: Normal

  • Main server: Responsive to all DHCP client requests and allocates IP addresses to new clients from its pool of available IP addresses. It allocates to the backup server some IP addresses to use if communications are interrupted.

  • Backup server: Unresponsive to DHCP client requests except renewals or rebind requests. The backup server requests and receives a set of IP addresses to use for allocation to new DHCP clients if communication with the main server is interrupted.

Regime: Communications Interrupted

  • Main server: Responsive to all DHCP client requests. It cannot tell if the backup server has gone down or if the backup server is just unable to communicate. It operates normally, although it cannot reallocate an IP address from one DHCP client to another while in this regime.

  • Backup server: Cannot tell if the main server is down or simply not communicating. In either case, the backup server is responsive to all DHCP client requests and can allocate IP addresses from the pool of available addresses it received from the main server.

Servers usually transition between Normal and Communications Interrupted as one or the other server goes up and down.

Regime: Partner Down

  • Main or backup server: The running server is guaranteed that the other server is down. The running server has control of all of the IP addresses, can offer any configured lease time or lease extension period, and at any time can reallocate an IP address from one client to another. A server only transitions to Partner Down if it is informed that the other partner is down. The notification can come through the protocol (used when the partner knows that it is going down), or from the administrator: when the server is unable to communicate with its partner, it automatically enters the Communications Interrupted regime, and the administrator then uses the setPartnerDown command to tell the server that its partner is down. You can configure failover to effect an automatic transition from Communications Interrupted to Partner Down after the safe period passes, but you run the risk of duplicate IP address allocations if the partner is not actually down.

To check on failover states, use the getrelatedservers CLI command:

nrcmd> dhcp getrelatedservers
100 Ok 
Type   Name                    Address            Requests Communications localhost State Partner State 
MAIN   main.test.cisco.com     192.168.1.1        0 OK     NORMAL          NORMAL 
BACKUP backup.test.cisco.com   192.168.1.2        0 OK     NORMAL          NORMAL
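If you capture this output in a script, a small parser can confirm that both servers report NORMAL. The sketch below assumes the whitespace-separated, seven-column row layout shown in the sample output; it is not based on a documented output format, and the class and function names are hypothetical.

```python
from typing import NamedTuple

class RelatedServer(NamedTuple):
    role: str
    name: str
    address: str
    requests: int
    communications: str
    local_state: str
    partner_state: str

def parse_related_servers(output: str) -> list[RelatedServer]:
    """Extract the MAIN and BACKUP rows from captured getrelatedservers output."""
    servers = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with the server type and have exactly seven columns.
        if len(fields) == 7 and fields[0] in ("MAIN", "BACKUP"):
            role, name, address, requests, comms, local, partner = fields
            servers.append(RelatedServer(role, name, address, int(requests),
                                         comms, local, partner))
    return servers

def all_normal(servers: list[RelatedServer]) -> bool:
    """True only if at least one row was found and every row is OK/NORMAL/NORMAL."""
    return bool(servers) and all(
        s.communications == "OK" and s.local_state == "NORMAL" and s.partner_state == "NORMAL"
        for s in servers
    )

sample = """\
100 Ok
Type   Name                    Address            Requests Communications localhost State Partner State
MAIN   main.test.cisco.com     192.168.1.1        0 OK     NORMAL          NORMAL
BACKUP backup.test.cisco.com   192.168.1.2        0 OK     NORMAL          NORMAL
"""
print(all_normal(parse_related_servers(sample)))  # True
```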

How to Set Up Simple Failover

Step-by-Step Instructions for Releases Prior to 6.0

Complete these steps to set up simple failover for releases prior to version 6.0.

  1. Use the mcdadmin export utility to get an exact replica of the main server:

    mcdadmin -x -e exportfile.mcd
    
    
    

    There is no output when you create this file. To verify that the file exists, run the UNIX ls command.

  2. Install the same version of Network Registrar on the backup server.

  3. Copy the exportfile.mcd to the backup server.

    Remove the DNS and TFTP entries that conflict from the file. Find the groups for servers/name/dns and servers/name/tftp and use a text editor, such as vi, to remove them.

  4. Use the mcdadmin import utility on the backup server.

    mcdadmin -c -o -i exportfile.mcd
    
    

    You are prompted for a username and password. The default username is admin and the default password is changeme.

  5. Enable failover, and set the main and backup configurations identically on both servers, for example:

    nrcmd> dhcp enable failover
    nrcmd> dhcp set failover-main-server=192.168.0.1
    nrcmd> dhcp set failover-backup-server=192.168.0.110
    
    
  6. Restart both servers.

    nrcmd> dhcp reload
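Because steps 5 and 6 must be entered identically on both servers, it can help to generate the command list once. The following Python sketch only builds the command strings shown above after it validates the two addresses; the function name is hypothetical, and how you feed the commands to nrcmd is left to your environment.

```python
import ipaddress

def failover_commands(main_ip: str, backup_ip: str) -> list[str]:
    """Build the nrcmd commands from steps 5 and 6 for a simple failover pair."""
    main = ipaddress.ip_address(main_ip)      # raises ValueError on a malformed address
    backup = ipaddress.ip_address(backup_ip)
    if main == backup:
        raise ValueError("main and backup servers must have different addresses")
    return [
        "dhcp enable failover",
        f"dhcp set failover-main-server={main}",
        f"dhcp set failover-backup-server={backup}",
        "dhcp reload",
    ]

# The same list is applied on the main server and on the backup server.
for command in failover_commands("192.168.0.1", "192.168.0.110"):
    print(command)
```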
    
    

Step-by-Step Instructions for Release 6.0

Note: In release 6.0, the mcdadmin utility is replaced by cnr_exim. You can use the steps described for previous releases, but you should use the Network Registrar Web UI Failover Configuration tool as described in this section.

Network Registrar creates a failover pair relationship when you configure the servers to use failover.

To configure failover for the server, complete these steps while connected to the main DHCP server.

Note: In this procedure, you must specify both a main and backup server IP address.

  1. On the Primary Navigation bar, click DHCP.

  2. On the Secondary Navigation bar, click DHCP Server and the Edit DHCP Server page displays.

  3. On the Edit DHCP Server page, set these attributes:

    • Failover settings—Click on.

    • Main Server—Enter the IP address of the main DHCP server in the failover pair.

    • Backup Server—Enter the IP address of the backup DHCP server in the failover pair.

  4. Accept the other failover settings, unless you have reason to change them.

  5. Click Modify Server to save the configuration.

  6. On the Secondary Navigation bar, click Failover and the List DHCP Failover Pairs page displays.

  7. Click the failover pair name to open the Edit DHCP Failover Pair page.

  8. Set the user credentials to access the backup server, then click Modify Failover.

  9. On the List DHCP Failover Pairs page, click Run, or the Report icon to see a list of changes to be made before they are applied.

  10. On the Run/Report Synchronize Failover page, select the Exact synchronization mode. Click Run to apply the changes, or click Report to preview them first and then click Run if you are satisfied with the results of the report.

  11. On the List DHCP Failover Pairs page, click the Manage Servers icon and the Manage DHCP Failover Servers page displays.

  12. Click the Reload icon to reload the main server, then reload the backup server.

Refer to the Cisco Network Registrar Web UI Guide, 6.0 for more information.

Use Case: Simple DHCP Failover Using the CLI

This use case works for release 5.5. It lists the steps to configure and enable simple DHCP failover on a pair of Network Registrar DHCP servers with the command line interface (nrcmd):

Note: This example assumes your scopes are already defined.

  1. On the main server, define the IP address of the main server, the IP address of the backup server, and enable failover. You should define the IP address of the servers rather than the FQDN so that name resolution is not required for the failover pair to communicate. Assume this scenario:

    Main CNR server: 192.168.0.1
    Backup CNR server: 192.168.0.110 
    
    

    Set the options as follows:

    nrcmd> dhcp set failover-main-server=192.168.0.1
    100 Ok
    failover-main-server=192.168.0.1
    
    nrcmd> dhcp set failover-backup-server=192.168.0.110
    100 Ok
    failover-backup-server=192.168.0.110
    
    nrcmd> dhcp enable failover
    100 Ok
    failover=on
    
    
  2. After you enter the three commands, perform a DHCP reload. You have already enabled failover, so as soon as the configuration file is imported to the backup server and DHCP reloads on the backup, the main and backup servers begin to communicate.

    1. Verify that failover is configured on the main server with the getrelatedservers command. This verifies the state of the main server:

      nrcmd> dhcp getrelatedservers
      100 Ok
      Type Name      Address     Requests Communications localhost State Partner State 
      MAIN gateways 192.168.0.1        0 INTERRUPTED    RECOVER         --
      
      
    2. To export the configuration from the main server, from the /opt/nwreg2/usrbin directory in Solaris run:

      mcdadmin -x -e failoverexport.mcd 
      

      When prompted for username and password, use the admin account. The default username and password predefined in Network Registrar is admin/changeme.

    3. Open the file that you exported to verify that it is not corrupt. The first text line in the file looks like this:

      # version: 1.0 
    4. Move the file to the backup server and import it to the Network Registrar database.

      # ./mcdadmin -c -o -i failoverexport.mcd
      
    5. Start the backup server.

  3. After a period of time that depends on the size of your Network Registrar deployment, the servers synchronize. Run the getrelatedservers command; the output looks similar to this:

    nrcmd> dhcp getrelatedservers
    100 Ok
    Type Name      Address     Requests Communications localhost State Partner State
    MAIN gateways 192.168.0.1        0 OK             NORMAL          NORMAL 
    
    
  4. Look in the log file for a statement similar to this after the DHCP reload:

     06/19/2003  9:41:19 name/dhcp/1 Info Configuration 0 04092 Failover is enabled server-wide. Main server name: '192.168.0.1', backup server name: '192.168.0.110', mclt = 3600, backup-percentage = 10, dynamic-bootp-backup-percentage = 0, lease-period-factor = 150, use-safe-period: disabled, safe-period = 0.
    
  5. Attempt to get a lease. Verify that communication between the main and backup servers is in a normal state by viewing the log file, /var/nwreg2/logs/name_dhcp_1_log, on the backup server. On the backup server, you see this message in the name_dhcp_1_log log file:

    07/15/2003 10:26:29 name/dhcp/1 Info Failover 0 04666 Received DHCPDISCOVER packet but failover state of normal did not allow network '127.0.0.1' to respond to clients.  Dropping packet.
    

    This message states that the failover state between the two servers is normal and that the backup server does not respond to clients when the failover state is NORMAL.
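The two log messages in steps 4 and 5 can also be checked programmatically. The Python sketch below parses only the sample lines shown in this document; the regular expressions and function names are assumptions, and other releases may word these messages differently. Reading lease-period-factor = 150 as the 1.50 default is likewise an inference from the sample, not a documented rule.

```python
import re

CONFIG_LINE = (
    "06/19/2003  9:41:19 name/dhcp/1 Info Configuration 0 04092 Failover is enabled "
    "server-wide. Main server name: '192.168.0.1', backup server name: '192.168.0.110', "
    "mclt = 3600, backup-percentage = 10, dynamic-bootp-backup-percentage = 0, "
    "lease-period-factor = 150, use-safe-period: disabled, safe-period = 0."
)
DROP_LINE = (
    "07/15/2003 10:26:29 name/dhcp/1 Info Failover 0 04666 Received DHCPDISCOVER packet "
    "but failover state of normal did not allow network '127.0.0.1' to respond to clients.  "
    "Dropping packet."
)

def parse_failover_settings(line: str):
    """Return the server names and numeric settings from the startup message, or None."""
    servers = re.search(r"Main server name: '([^']+)', backup server name: '([^']+)'", line)
    if servers is None:
        return None
    settings = {"main": servers.group(1), "backup": servers.group(2)}
    for key, value in re.findall(r"([a-z][a-z-]*) = (\d+)", line):
        settings[key] = int(value)
    return settings

def is_backup_drop(line: str) -> bool:
    """True for the message the backup logs when it ignores a client in NORMAL state."""
    return "Info Failover" in line and "Dropping packet" in line

settings = parse_failover_settings(CONFIG_LINE)
print(settings["mclt"])                        # 3600
print(settings["lease-period-factor"] / 100)   # 1.5, assuming the log prints a percentage
print(is_backup_drop(DROP_LINE))               # True
```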

Failover Parameters and Their Descriptions

Default values are in the Parameter column. The failover parameters are listed here in order of importance.

Parameter Description
failover = disabled Controls whether all scopes that use the failover configuration for the server can engage in failover.
failover-main-server = With failover enabled, the DNS name of the main server associated with all scopes where scope name set failover-main-server is not set. If this DNS name resolves to the IP address of the current server, this server operates as the main server for all of these scopes. It is an error if both the main and backup server names resolve to addresses on the same server. Specify the IP address rather than the FQDN. Optional, no default.
failover-backup-server = With failover enabled, the DNS name of the backup server associated with all scopes if you did not use the scope name set failover-backup-server command. If this DNS name resolves to the IP address of the current server, this server operates as the backup server for all of these scopes. It is an error if both the main and backup server names resolve to addresses on the same server. Specify the IP address rather than the FQDN. Optional, no default.
failover-backup-percentage = 10 With failover enabled, the percentage of currently available (unleased) addresses that the main server sends to the backup server to allocate to new DHCP clients when the main server is down. The value is only meaningful for the main server. Optional, default 10 percent.
failover-maximum-client-lead-time = 60m With failover enabled, the maximum client lead time, in seconds. The MCLT is the maximum time that one server can extend the lease of a client beyond what its partner knows it to be. You must define the MCLT on the main server, which communicates it to its partner. It is ignored on a backup server.
failover-dynamic-bootp-backup-percentage = When failover is enabled, the percentage of currently available (unreserved) addresses that the main server should send to the backup server for scopes set with scope name enable bootp.
failover-use-safe-period = disabled With failover enabled, you must enable this attribute to cause Network Registrar to go into the PARTNER-DOWN state automatically after the safe period passes. If you disable this attribute (the default), Network Registrar never goes into the PARTNER-DOWN state automatically. You must then use the dhcp setPartnerDown command.
failover-safe-period=24h With failover enabled and the failover-use-safe-period attribute set, the safe period, in seconds. You must define it in the main server. The safe period can differ on the main and backup servers. Refer to the Network Registrar User's Guide, 6.0 for more information. Optional, default 86400 seconds (24 hours).
failover-bulking = enabled With failover enabled, controls whether a failover bind update (BNDUPD) contains multiple lease state updates. Affects only the lease state updates that DHCP client activity generates. Optional, enabled by default.
failover-lease-period-factor = 1.50 With failover enabled, the multiple of the desired lease period used to update the backup server when the main server informs it of a new DHCP client lease period.
failover-poll-interval = 15s With failover enabled, the polling interval of the failover partners (in seconds) to confirm network connectivity. Optional, default 15 seconds.
failover-poll-timeout = 60s With failover enabled, the interval, in seconds, after which failover partners who cannot communicate know that they lost network connectivity. Optional, default 60 seconds. Generally, you should not change the failover-poll-timeout.
failover-recover = "Wed Dec 31 17:00 1969" With failover enabled, time at which the server performs initialization and goes into RECOVER state. If server A is running, server B issues this command to ask for the state of server A. Enter the time as month (name or its first three letters), day, hour (24 hour), and year (fully specified year or last two digits), all enclosed in double quotes; for example, "Jun 30 20:00:00 2002". Optional, default zero (0).
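The table gives time defaults in a short form (60m, 24h, 15s) while the descriptions speak of seconds. The Python sketch below converts the short form to seconds so that the values can be compared, for example with the mclt = 3600 value in the startup log message. The suffix handling is inferred from the table notation only and is not a statement about the input formats that nrcmd accepts.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}

def to_seconds(value: str) -> int:
    """Convert a duration such as '60m' or '24h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]

# Time-valued defaults copied from the Parameter column.
defaults = {
    "failover-maximum-client-lead-time": "60m",
    "failover-safe-period": "24h",
    "failover-poll-interval": "15s",
    "failover-poll-timeout": "60s",
}
for name, text in defaults.items():
    print(f"{name} = {to_seconds(text)} seconds")
```

With these defaults the sketch reports 3600, 86400, 15, and 60 seconds respectively.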

Verify

Complete these steps to ensure that failover is enabled. The log messages assume that default logging is turned ON.

  1. Ping from one server to the other to verify TCP/IP connectivity. Make sure that the routers are configured to forward clients to both servers.

  2. Check the startup logs to ensure that you end up in NORMAL mode.

  3. After startup, have a client attempt to get a lease.

  4. Verify that the DHCP log file, name_dhcp_1_log, located in /var/nwreg2/logs in a default installation, contains DHCPBNDACK or DHCPBNDUPD messages from each server.

  5. Verify that the same log file on the backup server contains messages that the backup server drops requests because failover is in the NORMAL state.

  6. Run the getrelatedservers command to verify that failover states are NORMAL.

    nrcmd> dhcp getrelatedservers
    100 Ok
    Type   Name                    Address            Requests Communications localhost State Partner State
    MAIN   main.test.cisco.com     192.168.1.1        0 OK     NORMAL          NORMAL
    BACKUP backup.test.cisco.com   192.168.1.2        0 OK     NORMAL          NORMAL
    

Troubleshoot

There are several things you can do to troubleshoot simple failover:

  • Ensure that you have bi-directional connectivity between your main and backup servers.

  • Verify that these parameters are defined globally:

    • failover=on

    • failover-main-server=ip address of main server

    • failover-backup-server=ip address of backup server

    • failover-recover=0

  • Verify that both servers have the identical configuration.

  • Check the name_dhcp_1_log log file for any errors.

    06/19/2003 10:54:44 name/dhcp/1 Info Failover 0 04011 Failover: as main for 192.168.0.110, the backup server NAKed a DHCPBNDUPD message about Lease: 192.168.1.140. The reason was: 'IP address not configured'.  The associated message was:  The server could not process a lease update message for IP address.
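To find messages like this one in a large log, you can filter for NAKed bind updates and group them by reason. The Python sketch below is written against the single sample line above; the pattern and function name are assumptions, and other error messages need their own patterns. A reason such as 'IP address not configured' would be consistent with the two servers not having identical configurations, so it is worth rechecking that item first.

```python
import re
from collections import Counter

NAK_PATTERN = re.compile(
    r"as main for (?P<backup>\S+), the backup server NAKed a DHCPBNDUPD message "
    r"about Lease: (?P<lease>\d+(?:\.\d+){3})\.\s+The reason was: '(?P<reason>[^']*)'"
)

def find_naks(lines):
    """Return one dict (backup, lease, reason) per NAKed bind update found in the lines."""
    return [m.groupdict() for m in map(NAK_PATTERN.search, lines) if m]

log_lines = [
    "06/19/2003 10:54:44 name/dhcp/1 Info Failover 0 04011 Failover: as main for "
    "192.168.0.110, the backup server NAKed a DHCPBNDUPD message about Lease: "
    "192.168.1.140. The reason was: 'IP address not configured'.  The associated "
    "message was:  The server could not process a lease update message for IP address.",
]

naks = find_naks(log_lines)
print(naks[0]["lease"])                       # 192.168.1.140
print(Counter(n["reason"] for n in naks))     # Counter({'IP address not configured': 1})
```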
