Either or both failover partners can move into COMMUNICATIONS-INTERRUPTED state. They cannot issue duplicate addresses while in this state. However, leaving a server in this state for long periods is not a good idea, because the state restricts what a server can do: the main server cannot reallocate expired leases, and the backup server can run out of addresses from its pool.
COMMUNICATIONS-INTERRUPTED state was designed for servers to easily survive
transient communication failures of a few minutes to a few days. A server might
function effectively in this state for only a short time, depending on the
client arrival and departure rate. After that, it would be better to move a
server into PARTNER-DOWN state so it can completely take over the lease
functions until the servers resynchronize.
There are two ways a server can move into PARTNER-DOWN state:
- Administrator action: An administrator sets a server into PARTNER-DOWN state based on an accurate assessment of reality. The failover protocol handles this correctly. Never set both partners to PARTNER-DOWN.
- Safe period expires: When the servers run unattended for longer periods, they need an automatic way to enter PARTNER-DOWN state.
Network operators might not sense in time that a server is down or uncommunicative. This is the purpose of the failover safe period, which gives network operators time to react to a server moving into COMMUNICATIONS-INTERRUPTED state. During the safe period, the only requirement is that the operators determine whether both servers are still running and, if so, fix the network communications failure or take one of the servers down before the safe period expires.
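The two paths into PARTNER-DOWN can be modeled as a small state transition. The following Python sketch is illustrative only: the FailoverServer class, its field names, and the tick method are hypothetical and are not part of Network Registrar. The 4-hour default mirrors the 8.2 default described in this section, and the sketch assumes an administrator's command is honored only while the server is in COMMUNICATIONS-INTERRUPTED state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

COMM_INTERRUPTED = "COMMUNICATIONS-INTERRUPTED"
PARTNER_DOWN = "PARTNER-DOWN"


@dataclass
class FailoverServer:
    """Hypothetical model of one failover partner (not a Network Registrar API)."""
    state: str
    comm_interrupted_since: Optional[datetime] = None
    use_safe_period: bool = True
    safe_period: timedelta = timedelta(hours=4)  # assumed 8.2+ default

    def set_partner_down(self) -> None:
        """Path 1: an administrator explicitly moves the server to PARTNER-DOWN."""
        # Assumption: only honored from COMMUNICATIONS-INTERRUPTED in this sketch.
        if self.state == COMM_INTERRUPTED:
            self.state = PARTNER_DOWN

    def tick(self, now: datetime) -> None:
        """Path 2: the safe period expires while the server runs unattended."""
        if (
            self.state == COMM_INTERRUPTED
            and self.use_safe_period
            and self.comm_interrupted_since is not None
            and now >= self.comm_interrupted_since + self.safe_period
        ):
            self.state = PARTNER_DOWN
```

With use_safe_period set to False, the modeled server stays in COMMUNICATIONS-INTERRUPTED state indefinitely until set_partner_down is called, which corresponds to a policy of entering PARTNER-DOWN only by explicit command.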
The length of the
safe period is installation-specific, and depends on the number of unallocated
addresses in the pool and the expected arrival rate of previously unknown
clients requiring addresses.
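As a rough sizing aid, the longest safe period a pool can cover is the number of unallocated addresses divided by the arrival rate of previously unknown clients. The helper below is hypothetical and assumes a constant arrival rate, which real networks rarely have, so treat its result as an estimate rather than a guarantee.

```python
def max_safe_period_hours(unallocated_addresses: int, new_clients_per_hour: float) -> float:
    """Estimate how many hours the free pool lasts during a safe period.

    Assumes previously unknown clients arrive at a constant rate; renewals of
    existing leases do not consume free addresses and are ignored.
    """
    if new_clients_per_hour <= 0:
        # No new clients means the pool is never exhausted.
        return float("inf")
    return unallocated_addresses / new_clients_per_hour
```

For example, max_safe_period_hours(200, 50) returns 4.0, so under these assumptions 200 free addresses cover a 4-hour safe period at 50 new clients per hour.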
In Cisco Prime Network Registrar 8.2 or later, the use-safe-period attribute is enabled by default for a failover pair, and the default safe period is 4 hours. As a result, if a failover server remains in COMMUNICATIONS-INTERRUPTED state for 4 hours, it enters PARTNER-DOWN state automatically when the safe period elapses. Review whether this setting is appropriate for your network, and adjust the safe-period value based on your network requirements.
In addition, during the safe period either server allows renewals from any existing client. However, there is a major risk of issuing duplicate addresses, because one server can suddenly enter PARTNER-DOWN state when the safe period elapses while the other is still operating. To prevent this problem, if you keep the default use-safe-period setting, it is important that you put operational procedures in place to alert operations personnel when the failover partners lose contact with each other. In particular, in the event of a network communications failure, operator intervention is required before the safe period elapses: either take one failover server offline or disable the use-safe-period attribute on both servers.
Because use-safe-period is enabled by default, consider whether it suits your network and monitoring; you may want to disable use-safe-period or adjust the safe-period value accordingly.
The number of extra addresses required for the safe period should equal the expected total number of new clients a server encounters during that period. This depends on the arrival rate of new clients, not on the total number of outstanding leases. Even if you can afford only a short safe period, because of a shortage of addresses or a high arrival rate of new clients, you can benefit substantially by letting DHCP ride through minor problems that are fixable in an hour. There is minimal chance of duplicate address allocation, and reintegration after the failure is resolved is automatic and requires no operator intervention.
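Sizing the extra addresses works in the other direction: multiply the arrival rate of new clients by the safe period length. The hypothetical helper below assumes a constant arrival rate and rounds up, since a fraction of an address cannot be provisioned. Note that the count of outstanding leases does not appear anywhere in the calculation.

```python
import math


def extra_addresses_for_safe_period(new_clients_per_hour: float, safe_period_hours: float) -> int:
    """Extra free addresses needed to serve new clients for the whole safe period.

    Only the arrival rate of previously unknown clients matters; existing
    clients renew their current leases and need no new addresses.
    """
    return math.ceil(new_clients_per_hour * safe_period_hours)
```

At 30 new clients per hour and a 4-hour safe period, extra_addresses_for_safe_period(30, 4) returns 120.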
In Cisco Prime Network Registrar 8.2 or later, if the failover safe period is longer than the MCLT and the failover server enters PARTNER-DOWN state because the safe period elapsed, the server can immediately start allocating its partner's other-available leases to DHCP clients. The advantage is that the server has additional leases to allocate. The disadvantage is that operator intervention is required within the safe period in case of a network communications failure: either take one failover server offline or disable the use-safe-period attribute on both servers before the safe period elapses. Without operator intervention, both failover servers transition to PARTNER-DOWN state and start allocating each other's addresses to new DHCP clients.
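The interaction between the safe period and the MCLT can be sketched as follows. This is a simplified model, not Network Registrar's implementation: it assumes the MCLT is measured from the start of COMMUNICATIONS-INTERRUPTED state (as described for setPartnerDown without a date) and that the server can use its partner's other-available leases once it is in PARTNER-DOWN and the MCLT has elapsed. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def partner_leases_usable_at(
    comm_interrupted_start: datetime,
    safe_period: timedelta,
    mclt: timedelta,
) -> datetime:
    """Earliest time a server that entered PARTNER-DOWN via the safe period
    may allocate its partner's other-available leases (simplified model)."""
    entered_partner_down = comm_interrupted_start + safe_period
    mclt_elapsed = comm_interrupted_start + mclt
    # If the safe period is longer than the MCLT, this is the moment the
    # server enters PARTNER-DOWN, so partner leases are usable immediately.
    return max(entered_partner_down, mclt_elapsed)
```

With a 4-hour safe period and a 1-hour MCLT, the result equals the moment the server enters PARTNER-DOWN. With a 30-minute safe period and the same MCLT, the server would still wait another 30 minutes after entering PARTNER-DOWN.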
Here are some guidelines to help you decide whether to use manual intervention or the safe period for transitioning to PARTNER-DOWN state:
- If your corporate policy is to have minimal manual intervention, use the safe period. Enable the failover use-safe-period attribute, and then set the safe-period duration (4 hours by default). Make this duration long enough that operations personnel can investigate the cause of the communication failure and confirm that the partner is truly down.
- If your corporate policy is to avoid conflict under any circumstances, never let either server go into PARTNER-DOWN state except by explicit command. Allocate sufficient addresses to the backup server so that it can handle new client arrivals during periods when there is no administrative coverage. You can set PARTNER-DOWN in the web UI or the CLI:
  - Web UI: On the Manage Failover Servers tab, if the partner is in the Communications-interrupted failover state, you can click Down, which is paired with an input field for the PARTNER-DOWN date setting. This setting is initialized to the value of the start-of-communications-interrupted attribute. (In Normal web UI mode, you cannot set this date to a value earlier than the initialized date. In Expert web UI mode, you can set it to any date.)
  - CLI: Use setPartnerDown with the failover-pair command, specifying the name of the partner server and, optionally, a date. This moves all the scopes running failover with the partner into PARTNER-DOWN state immediately, unless you specify a date and time with the command. This date and time should be when the partner was last known to be operational.
In Cisco Prime Network Registrar 8.2 or later, if you use setPartnerDown in the CLI and specify the date and time when the partner was last known to be operational, the failover server calculates the MCLT from that specified time. If you do not specify a date and time, the failover server calculates the MCLT from the time it moved to COMMUNICATIONS-INTERRUPTED state. In case of a network communications failure, it is important that you specify in the setPartnerDown command the actual time the partner was last known to be operational; otherwise, duplicate IP addresses can result.
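The effect of the setPartnerDown date on the MCLT calculation can be sketched as follows. The function is hypothetical and models only the choice of reference time described above, not how the server enforces it.

```python
from datetime import datetime, timedelta
from typing import Optional


def partner_down_mclt_deadline(
    comm_interrupted_start: datetime,
    mclt: timedelta,
    last_known_up: Optional[datetime] = None,
) -> datetime:
    """Time at which the MCLT used after setPartnerDown elapses (simplified model).

    With a date: the MCLT is measured from when the partner was last known up.
    Without a date: it is measured from the start of COMMUNICATIONS-INTERRUPTED.
    """
    reference = last_known_up if last_known_up is not None else comm_interrupted_start
    return reference + mclt
```

Suppose communications failed at 08:00 but the partner kept serving clients until 11:00, with a 1-hour MCLT. Omitting the date places the deadline at 09:00, two hours before the partner actually stopped, while passing 11:00 places it at 12:00. The earlier deadline is how duplicate addresses can arise.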
There are two conventions for specifying the date:
- -num unit (a time in the past), where num is a decimal number and unit is s, m, h, d, or w for seconds, minutes, hours, days, or weeks, respectively. For example, specify -3d for three days ago:
  nrcmd> failover-pair dhcp2.example.com. setPartnerDown -3d
- Month (name or its first three letters), day, hour (24-hour convention), year (fully specified year or last two digits). This example notifies the backup server that its main server went down at 12 o'clock midnight on October 31, 2002:
  nrcmd> failover-pair dhcp2.example.com. setPartnerDown Oct 31 00:00:00 2002
When you specify a date and time in the CLI, enter the time that is local to the nrcmd process. If the server is running in a different time zone than this process, disregard the time zone where the server is running and use local time instead.
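To make the two date conventions concrete, here is a hypothetical parser that accepts both forms. It is not the nrcmd implementation: the function name is invented, it assumes the absolute form always includes HH:MM:SS as in the example, and it treats all times as naive local times, consistent with the rule that dates are entered in the local time of the nrcmd process.

```python
import re
from datetime import datetime, timedelta

# Map the documented unit letters to timedelta keyword arguments.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}

# Absolute forms: abbreviated or full month name, four- or two-digit year.
_ABSOLUTE_FORMATS = (
    "%b %d %H:%M:%S %Y",
    "%B %d %H:%M:%S %Y",
    "%b %d %H:%M:%S %y",
    "%B %d %H:%M:%S %y",
)


def parse_partner_down_date(spec: str, now: datetime) -> datetime:
    """Convert a setPartnerDown-style date spec into a naive local datetime."""
    spec = spec.strip()
    # Relative form: -num unit, meaning "num units before now".
    rel = re.fullmatch(r"-(\d+)([smhdw])", spec)
    if rel:
        amount = int(rel.group(1))
        return now - timedelta(**{_UNITS[rel.group(2)]: amount})
    # Absolute form: try each accepted layout in turn.
    for fmt in _ABSOLUTE_FORMATS:
        try:
            return datetime.strptime(spec, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date spec: {spec!r}")
```

For example, with now set to midnight on November 3, 2002, both "-3d" and "Oct 31 00:00:00 2002" resolve to midnight on October 31, 2002.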