How High Availability Works
The following figure shows the main components and process flow for a Prime Infrastructure High Availability (HA) setup with the primary server in the active state.
Figure 9-1 Prime Infrastructure High Availability (HA) Architecture
An HA deployment consists of two Prime Infrastructure servers: a primary and a secondary. Under normal circumstances, the primary server is active and manages the network. The corresponding secondary server is passive. The secondary server is in constant communication with the primary server and monitoring the primary server’s status. The secondary also has a complete copy of the data on the primary, but it does not actively manage the network until the primary fails. When the primary fails, the secondary takes over (you can trigger this manually, which is recommended, or have it triggered automatically). You use the secondary server to manage the network while working to restore access to the primary server. When the primary is available again, you can initiate a failback operation and resume network management via the primary.
If you choose to deploy the primary and secondary servers on the same IP subnet, you can configure your devices to send notifications to Prime Infrastructure at a single virtual IP address. If you choose to disperse the two servers geographically, such as to facilitate disaster recovery, you will need to configure your devices to send notifications to both servers.
About the Primary and Secondary Servers
In any Prime Infrastructure HA implementation, for a given instance of a primary server, there must be one and only one dedicated secondary server.
Typically, each HA server has its own IP address or host name. If you place the servers on the same subnet, they can share the same IP using Virtual IP Addressing, which simplifies device configuration.
Once HA is set up, you should avoid changing the IP addresses or host names of the HA servers, as this will break the HA setup (see Resetting the Server IP Address or Host Name).
Sources of Failure
Prime Infrastructure servers can fail due to issues in one or more of the following areas:
-
Application Processes
: Failure of one or more of the Prime Infrastructure server processes, including NMS Server, MATLAB, TFTP, FTP, and so on. You can view the operational status of each of these application processes by running the ncs status command through the admin console.
-
Database Server
: One or more database-related processes could be down. The Database Server runs as a service in Prime Infrastructure.
-
Network
: Problems with network access or reachability issues.
-
System
: Problems related to the server's physical hardware or operating system.
-
Virtual Machine
(VM): Problems with the VM environment on which the primary and secondary servers were installed (if HA is running in a VM environment).
File and Database Synchronization
Whenever the HA configuration determines that there is a change on the primary server, it synchronizes this change with the secondary server. These changes are of two types:
1.
Database
: These include database updates related to configuration, performance and monitoring data.
2.
File
: These include changes to configuration files.
Database changes are synchronized with the help of the Oracle Recovery Manager (RMAN). RMAN creates the active and standby databases and synchronizes them whenever there is a change.
File changes are synchronized using the HTTPS protocol. File synchronization is done either in:
-
Batch
: This category includes files that are not updated frequently (such as license files). These files are synchronized once every 500 seconds.
-
Near Real-Time
: Files that are updated frequently fall under this category. These files are synchronized once every 11 seconds.
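The two file synchronization categories can be pictured as a simple interval lookup. The following Python sketch is illustrative only: the category keys, `SYNC_INTERVAL_SECONDS`, `is_sync_due`, and `syncs_per_hour` are hypothetical names and do not correspond to any Prime Infrastructure API. Only the 500-second and 11-second intervals come from the text above.

```python
# Illustrative only: these names are hypothetical, not a Prime Infrastructure API.
# The two intervals are the ones stated in this section.
SYNC_INTERVAL_SECONDS = {
    "batch": 500,          # infrequently updated files, such as license files
    "near_real_time": 11,  # frequently updated files
}


def is_sync_due(category: str, seconds_since_last_sync: float) -> bool:
    """Return True when a file in the given category is due for another sync."""
    return seconds_since_last_sync >= SYNC_INTERVAL_SECONDS[category]


def syncs_per_hour(category: str) -> int:
    """Number of complete sync cycles that fit into one hour."""
    return 3600 // SYNC_INTERVAL_SECONDS[category]
```

Under these intervals, a batch file changed just after a sync cycle can wait up to 500 seconds before it reaches the secondary server, while a near real-time file waits at most 11 seconds.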
By default, the HA framework is configured to copy all the required configuration data, including:
-
Report configurations
-
Configuration Templates
-
TFTP-root
-
Administration settings
-
Licensing files
-
Key store
HA Server Communications
The primary and secondary HA servers exchange the following messages in order to maintain the health of the HA system:
-
Database Sync: Includes all the information necessary to ensure that the databases on the primary and secondary servers are running and synchronized.
-
File Sync: Includes frequently updated configuration files. These are synchronized every 11 seconds, while other infrequently updated configuration files are synchronized every 500 seconds.
-
Process Sync: Ensures that application- and database-related processes are running. These messages fall under the Heartbeat category.
-
Health Monitor Sync: These messages check for the following failure conditions:
– Network failures
– System failures (in the server hardware and operating system)
– Health Monitor failures
Health Monitor Process
Health Monitor (HM) is the main component managing HA operations. Separate instances of HM run as an application process on both the primary and the secondary server. HM performs the following functions:
-
Synchronizes database and configuration data related to HA (this excludes databases that sync separately using Oracle Data Guard).
-
Exchanges heartbeat messages between the primary and secondary servers every five seconds, to ensure communications are maintained between the servers.
-
Checks the available disk space on both servers at regular intervals, and generates events when storage space runs low.
-
Manages, controls and monitors the overall health of the linked HA servers. If there is a failure on the primary server then it is the Health Monitor’s job to activate the secondary server.
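The heartbeat exchange described above can be sketched as a simple timeout check. This is a minimal Python illustration, not Health Monitor's actual logic: the five-second interval comes from this section, but `MISSED_HEARTBEAT_LIMIT` and the function name `peer_is_reachable` are assumptions made for the example, since this guide does not state how many missed heartbeats count as a failure.

```python
HEARTBEAT_INTERVAL_SECONDS = 5  # stated in this section
MISSED_HEARTBEAT_LIMIT = 3      # ASSUMPTION: not specified in this guide


def peer_is_reachable(last_heartbeat_at: float, now: float) -> bool:
    """Treat the peer as reachable until several heartbeats in a row are missed.

    Both arguments are timestamps in seconds on the same clock.
    """
    allowed_silence = HEARTBEAT_INTERVAL_SECONDS * MISSED_HEARTBEAT_LIMIT
    return (now - last_heartbeat_at) <= allowed_silence
```

Tolerating more than one missed heartbeat is a common way to avoid reacting to a single dropped message, which is also the concern behind the Manual failover recommendation later in this chapter.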
Health Monitor Web Page
You control HA behavior using the Health Monitor web page. Each Health Monitor instance running on the primary server or secondary server has its own web page. The following figure shows an example of the Health Monitor web page for a primary server in the “Primary Active” state.
Figure 9-2 Health Monitor Web Page (Primary Server)
- Settings area displays Health Monitor state and configuration detail in five separate sections.
- Status indicates current functional status of the HA setup (green check mark indicates that HA is on and working).
- Events table displays all current HA-related events, in chronological order, with most recent event at the top.
- Secondary IP Address identifies the IP of the peer server for this primary server (on the secondary server, this field is labeled “Primary IP Address”).
- State shows current HA state of the server on which this instance of Health Monitor is running.
- Logging lets you change the logging level (your choice of Error, Informational, or Trace). You must press Save to change the logging level.
- Failover Type shows whether you have Manual or Automatic failover configured.
- Action shows actions you can perform, such as failover or failback. Action buttons are enabled only when Health Monitor detects HA state changes needing action.
- Logs area lets you download Health Monitor log files.
- Identifies the HA server whose Health Monitor web page you are viewing.
Virtual IP Addressing
Under normal circumstances, you configure the devices that you manage using Prime Infrastructure to send their syslogs, SNMP traps and other notifications to the Prime Infrastructure server’s IP address. When HA is implemented, you will have two separate Prime Infrastructure servers, with two different IP addresses. If we fail to reconfigure devices to send their notifications to the secondary server as well, then when the secondary Prime Infrastructure server goes into Active mode, none of these notifications will be received by the secondary server.
To avoid this additional device configuration overhead, HA supports use of a virtual IP that both servers can share as the Management Address. The two servers will switch IPs as needed during failover and failback processes. At any given time, the virtual IP Address will always point to the correct Prime Infrastructure server.
You can enable virtual IP addressing during HA setup, by specifying that you want to use this feature and then specifying the virtual IPv4 and IPv6 addresses you want the servers to use (see Setting Up High Availability).
Note that you cannot use this feature unless the addresses for both of the HA servers and the virtual IP are all in the same subnet. This can have an impact on how you choose to deploy your HA servers (see Planning HA Deployments and Using the Local Model).
Planning HA Deployments
Prime Infrastructure’s HA feature supports the following deployment models:
-
Local
: Both of the HA servers are located on the same subnet (giving them Layer 2 proximity), usually in the same data center.
-
Campus
: Both HA servers are located in different subnets connected via LAN. Typically, they will be deployed on a single campus, but at different locations within the campus.
-
Remote
: Each HA server is located in a separate, remote subnet connected via WAN. Each server is in a different facility. The facilities are geographically dispersed across countries or continents.
The following sections explain the advantages and disadvantages of each model, and discuss underlying restrictions that affect all deployment models.
HA will function using any of the supported deployment models. The main restriction is on HA’s performance and reliability, which depends on the bandwidth and latency criteria discussed in Network Throughput Restrictions on HA. As long as you are able to successfully manage these parameters, it is a business decision (based on business parameters, such as cost, enterprise size, geography, compliance standards, and so on) as to which of the available deployment models you choose to implement.
Network Throughput Restrictions on HA
Prime Infrastructure HA performance is always subject to the following limiting factors:
-
The net bandwidth available to Prime Infrastructure for handling all operations. These operations include (but are not restricted to) HA registration, database and file synchronization, and triggering failback.
-
The net latency of the network across the links between the primary and secondary servers. Irrespective of the physical proximity of these two servers, high latency on these links can affect how Prime Infrastructure maintains sessions between the primary and secondary servers.
-
The net throughput that can be delivered by the network that connects the primary and secondary servers. Net throughput varies with the net bandwidth and latency, and can be considered a function of these two factors.
These limits apply to at least some degree in every deployment model, although some models are more prone to problems than others. For example: Because of the high level of geographic dispersal, the Remote deployment model is more likely to have problems with both bandwidth and latency. But both the Local and Campus models, if not properly configured, are also highly susceptible to problems with throughput, as they can be saddled by low bandwidth and high latency on networks with high usage.
You will rarely see throughput problems affecting failover, as the two HA servers are in more or less constant communication and the database changes are replicated quickly. Most failovers take approximately 20 minutes. You will encounter the impact of these limiting factors most often during failback operations, where changes in the secondary server’s database must be replicated back to the primary server all at once. Variations in net throughput during failback, irrespective of database size or other factors, can mean the difference between a failback operation that completes successfully in under an hour, and one that fails after as long as six or seven hours.
Cisco has tested the impact of bandwidth and latency on HA throughput for the Remote model, which is usually the worst case for these limiting factors. For acceptable performance during failback with Remote deployments, Cisco recommends that you ensure that the available network bandwidth between the primary and secondary servers is at least 100 Mbps, with network latency of no more than 200 milliseconds. Under these conditions, with a database size of approximately 100 GB (generating a backup file size of around 10 GB), failback will take approximately 4.5 hours. Increasing the network bandwidth to above 450 Mbps, with network latency reduced to the sub-millisecond range, can shorten the failback time to approximately 1.5 hours or less.
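To put these figures in perspective, it helps to compute how long the backup file alone would take to cross the link at full line rate. The Python sketch below is back-of-the-envelope arithmetic only; the function name is hypothetical, and it assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second) and ignores protocol overhead and latency.

```python
def ideal_transfer_seconds(size_gb: float, bandwidth_mbps: float) -> float:
    """Time to move size_gb (decimal gigabytes) over a link at full line rate."""
    bits = size_gb * 8 * 1000**3
    return bits / (bandwidth_mbps * 1000**2)


# The 10 GB backup file size and both bandwidths are taken from the text above.
for mbps in (100, 450):
    minutes = ideal_transfer_seconds(10, mbps) / 60
    print(f"{mbps} Mbps: {minutes:.1f} minutes")
```

The raw transfer works out to roughly 13 minutes at 100 Mbps and 3 minutes at 450 Mbps. Both are far shorter than the quoted failback times of 4.5 and 1.5 hours, which suggests that failback involves considerably more than copying the backup file. This guide does not break the failback time down further, so treat the calculation as a lower bound rather than an estimate.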
Using the Local Model
The main advantage of the Local deployment model is that it permits use of a virtual IP address as the single management address for the system. Users can use this virtual IP to connect to Prime Infrastructure, and devices can use it as the destination for their SNMP trap and other notifications.
The only restriction on assigning a virtual IP address is to have that IP address in the same subnet as the IP address assignment for the primary and secondary servers. For example: If the primary and secondary servers have the following IP address assignments within the given subnet, the virtual IP address for both servers can be assigned as follows:
-
Subnet mask: 255.255.255.224 (/27)
-
Primary server IP address: 10.10.101.2
-
Secondary server IP address: 10.10.101.3
-
Virtual IP address: 10.10.101.[4-30] e.g., 10.10.101.4. Note that the virtual IP address can be any of a range of addresses that are valid and unused for the given subnet mask.
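The same-subnet rule in this example can be checked before HA registration with Python's standard `ipaddress` module. This is a minimal sketch: the function name `virtual_ip_is_valid` is hypothetical, and the check covers only subnet membership and distinctness. It cannot tell whether the candidate address is already in use on the network, which you must verify separately.

```python
import ipaddress


def virtual_ip_is_valid(primary: str, secondary: str, virtual: str, netmask: str) -> bool:
    """Check that all three addresses are distinct usable hosts on one subnet."""
    # strict=False lets us derive the subnet from a host address plus mask.
    subnet = ipaddress.ip_network(f"{primary}/{netmask}", strict=False)
    addresses = [ipaddress.ip_address(a) for a in (primary, secondary, virtual)]
    usable = set(subnet.hosts())  # excludes the network and broadcast addresses
    return len(set(addresses)) == 3 and all(a in usable for a in addresses)
```

With the values above, `virtual_ip_is_valid("10.10.101.2", "10.10.101.3", "10.10.101.4", "255.255.255.224")` returns `True`, while a candidate of `10.10.101.31` (the broadcast address of the /27) or `10.10.101.33` (outside the subnet) returns `False`.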
In addition to this main advantage, the Local model also has the following advantages:
-
Usually provides the highest bandwidth and lowest latency.
-
Simplified administration.
-
Device configuration for forwarding syslogs and SNMP notifications is much easier.
The Local model has the following disadvantages:
-
Because both servers are co-located in the same data center, they are exposed to the same site-wide failures, including power outages and natural disasters.
-
Increased exposure to catastrophic site impacts will complicate business continuity planning and may increase disaster-recovery insurance costs.
Using the Campus Model
The Campus model assumes that the deploying organization is located at one or more geographical sites within a city, state or province, so that it has more than one location forming a “campus”. This model has the following advantages:
-
Usually provides bandwidth and latency comparable to the Local model, and better than the Remote model.
-
Is simpler to administer than the Remote model.
The Campus model has the following disadvantages:
-
More complicated to administer than the Local model.
-
Does not permit use of a virtual IP address as the single management address for the system, so it requires more device configuration (see What If I Cannot Use Virtual IP Addressing?).
-
May provide lower bandwidth and higher latency than the Local model. This can affect HA reliability and may require administrative intervention to remedy (see Network Throughput Restrictions on HA).
-
While not located at the same site, it will still be exposed to city-, state-, or province-wide disasters. This may complicate business continuity planning and increase disaster-recovery costs.
Using the Remote Model
The Remote model assumes that the deploying organization has more than one site or campus, and that these locations communicate across geographical boundaries by WAN links. It has the following advantages:
-
Least likely to be affected by natural disasters. This is usually the least complex and costly model with respect to business continuity and disaster recovery.
-
May reduce business insurance costs.
The Remote model has the following disadvantages:
-
More complicated to administer than the Local or Campus models.
-
Does not permit use of a virtual IP address as the single management address for the system, so it requires more device configuration (see What If I Cannot Use Virtual IP Addressing?).
-
Usually provides lower bandwidth and higher latency than the other two models. This can affect HA reliability and may require administrative intervention to remedy (see Network Throughput Restrictions on HA).
What If I Cannot Use Virtual IP Addressing?
Depending on the deployment model you choose, not configuring a virtual IP address may result in the administrator having to perform some additional steps in order to ensure that syslogs and SNMP notifications are forwarded to the secondary server in case of a failover from the primary to the secondary server. The usual method is to configure the devices to forward all syslogs and traps to both servers, usually via forwarding them to a given subnet or range of IP addresses that includes both the primary and secondary server.
This configuration work should be done at the same time HA is being set up: that is, after the secondary server is installed but before HA registration. It must be completed before a failover so that the chance of losing data is eliminated or reduced. Not using a virtual IP address entails no change to the secondary server install procedure. The primary and secondary servers still need to be provisioned with their individual IP addresses, as normal.
Automatic Versus Manual Failover
Configuring HA for Automatic failover reduces the need for network administrators to manage HA. It also reduces the time taken to respond to the conditions that provoked the failover, since it brings up the secondary server automatically.
However, we recommend that the system be configured for Manual failover under most conditions. Following this recommendation ensures that Prime Infrastructure does not go into a state where it keeps failing over to the secondary server due to intermittent network outages. This scenario is most likely when deploying HA using the Remote model. This model is often especially susceptible to extreme variations in bandwidth and latency (see Planning HA Deployments and Network Throughput Restrictions on HA).
If the failover type is set to Automatic and the network connection goes down or the network link between the primary and secondary servers becomes unreachable, there is also a small possibility that both the primary and secondary servers will become active at the same time. We refer to this as the “split brain scenario”.
To prevent this, the primary server always checks to see if the secondary server is Active. As soon as the network connection or link is restored and the primary is able to reach the secondary again, the primary server checks the secondary server's state. If the secondary's state is Active, then the primary server goes down on its own. Users can then trigger a normal, manual failback to the primary server (see Triggering Failback).
Note that this scenario only occurs when the primary HA server is configured for Automatic failover. Configuring the primary server for Manual failover eliminates the possibility of this scenario. This is another reason why we recommend Manual failover configuration.
Automatic failover is especially ill-advised for larger enterprises. If a deployment uses Automatic failover anyway and a split-brain scenario occurs, an administrator may be forced to choose between the data newly added to the primary and the data newly added to the secondary. In other words, there is a possibility of data loss whenever a split-brain scenario occurs. For help dealing with a split-brain scenario if it does occur, see Recovering From Split-Brain Scenario.
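The primary server's reconnect check described above reduces to a single decision. The following Python sketch illustrates that decision only; `HAState`, `primary_action_on_reconnect`, and the returned action strings are hypothetical names, not part of Prime Infrastructure.

```python
from enum import Enum


class HAState(Enum):
    ACTIVE = "Active"
    OTHER = "Other"  # placeholder for any state other than Active


def primary_action_on_reconnect(secondary_state: HAState) -> str:
    """Decide what the primary does once it can reach the secondary again."""
    if secondary_state is HAState.ACTIVE:
        # The secondary took over during the outage, so the primary steps down
        # and waits for an administrator to trigger a manual failback.
        return "shut_down_primary"
    return "remain_active"
```

Note that this check resolves which server stays active, but it does not merge the data written to each server while both were active. That is why a split-brain scenario can still lead to data loss.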
To ensure that HA is managed correctly, we recommend that administrators always confirm the overall health of the HA deployment before initiating failover or failback, including:
-
The current state of the primary.
-
The current state of the secondary.
-
The current state of the connectivity between the two servers.
Setting Up High Availability
To use the HA capabilities in Prime Infrastructure, you must:
1. Install a second Prime Infrastructure server, which will run as your secondary server.
2. Configure High Availability mode on the primary server.
If you install the primary server in FIPS mode, the secondary server must also be installed in FIPS mode. You will also need to generate a valid, signed SSL certificate and import it into both the primary and secondary servers before performing HA registration (see Setting Up HA in FIPS Mode for details).
Before You Begin Setting Up High Availability
Before you begin, you will need:
-
The Prime Infrastructure installation software. You will use this software to create the secondary HA server. The version of this software must match the version of Prime Infrastructure installed on your primary server. You can use the CLI show version command to verify the current version of the primary server software.
-
If you have applied patches to your primary server, you must also patch the secondary server to the same level. Choose Administration > Software to see a list of the patches applied to the primary server. Then, after setting up High Availability, follow the procedure in Patching Paired High Availability Servers to patch the secondary server to the same level as the primary server.
-
A secondary server with hardware and software specifications that match or exceed the requirements for your primary server. For example: If your primary server was installed as a Prime Infrastructure Standard size OVA, your secondary server must also be installed as a Standard server, and must meet or exceed all requirements given for Standard size servers in the Cisco Prime Infrastructure Quick Start Guide.
-
The IP address or host name of the secondary server. You will need these when configuring HA on the primary server. Note that if you plan on using the virtual IP feature (see Virtual IP Addressing), the secondary server must be on the same subnet as the primary server.
-
The virtual IPv4 and IPv6 (if used) IP address you want to use as the virtual IP for both servers. This is required only if you plan to use the virtual IP feature.
-
An authentication key of any length. It must contain at least three of the following types of characters: lowercase letters, uppercase letters, digits and special characters. You will enter this authentication key when you install the secondary server. The HA implementation uses this key to authenticate communications between the primary and secondary servers. Administrators also use the key to configure HA in the primary server, and to log on to the secondary server's Health Monitor page to monitor the HA implementation and troubleshoot problems with it.
-
A Prime Infrastructure user ID with Administrator privileges on the primary server.
-
A valid email address to which HA state-change notifications can be sent. Prime Infrastructure will send email notifications for the following changes: HA registration, failure, failover, and failback.
-
Sufficient network bandwidth on the link between the two servers, with the lowest latency achievable. Failure to provide acceptable link quality will interfere with data replication and may lead to HA failures. See Network Throughput Restrictions on HA.
-
If there is a firewall configured between the primary and the secondary servers, ensure that the firewall permits incoming and outgoing TCP/UDP on the following ports:
– 8082: Used by the Health Monitor process to exchange heartbeat messages
– 1522: Used by Oracle to synchronize data
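The authentication key rule in the list above (at least three of the four character types) is easy to check before you start the secondary server installation. This is a minimal Python sketch under one stated assumption: the guide does not define "special characters", so the example treats them as ASCII punctuation. The function name `is_valid_auth_key` is hypothetical.

```python
import string


def is_valid_auth_key(key: str) -> bool:
    """True if the key uses at least three of the four character classes."""
    classes = [
        any(c in string.ascii_lowercase for c in key),
        any(c in string.ascii_uppercase for c in key),
        any(c in string.digits for c in key),
        # ASSUMPTION: "special characters" means ASCII punctuation.
        any(c in string.punctuation for c in key),
    ]
    return sum(classes) >= 3
```

For example, `is_valid_auth_key("Abc123")` returns `True` (lowercase, uppercase, and digits), while `is_valid_auth_key("abc123")` returns `False` because it uses only two character types.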
Installing the Secondary Server
If your primary server has been patched, be sure to apply the same patches to your secondary server after installation and before registering HA on the primary server.
If you installed the primary server in FIPS mode, the secondary server must also be installed in FIPS mode. You will also need to generate a valid, signed SSL certificate and import it into both the primary and secondary servers before performing HA registration (see Setting Up HA in FIPS Mode). Once you have installed the primary and secondary servers in FIPS mode and applied the SSL certificates, the HA configuration will use SSH encryption during all data transfers and other inter-server communications.
Make sure you have already decided on an authentication key, as explained in Before You Begin Setting Up High Availability.
To install the secondary server, follow these steps:
Step 1 Begin installing the Prime Infrastructure server software on your secondary server just as you would for a primary server. For instructions on installing the server, see the Cisco Prime Infrastructure Quick Start Guide.
Step 2 During the installation, you will be prompted as follows:
Will this server be used as a secondary for HA? (yes/no)
Enter yes at the prompt.
Step 3 You will then be prompted for the HA authentication key, as follows:
Enter Authentication Key:
Enter the authentication key at the prompt. Enter it again at the confirmation prompt.
Step 4 When the secondary server is installed:
a. Use the CLI show version command on both servers, to verify that they are at the same version and patch level (see Checking Prime Infrastructure Version and Patch Status).
b. Register HA on the primary server (see Registering High Availability on the Primary Server).
Registering High Availability on the Primary Server
You always register HA on the primary server. The primary server needs no configuration during installation in order to participate in the HA configuration. The primary only needs to have the IP address or host name of the secondary server, plus the authentication key you set during the secondary installation, an email address for notifications, and the Failover Type. Note that you follow these same steps when re-registering HA.
If your primary and secondary servers are installed in FIPS mode, you must generate a valid, signed SSL certificate and import it into both the primary and secondary servers before registering HA mode on the primary server. See Setting Up HA in FIPS Mode for details.
Step 1 Log in to Prime Infrastructure with a user ID and password that has administrator privileges.
Step 2 From the menu, select Administration > System Settings > High Availability. Prime Infrastructure displays the HA status page.
Step 3 Select HA Configuration and then complete the fields as follows:
-
Secondary Server
: Enter the IP address or the host name of the secondary server.
-
Authentication Key
: Enter the authentication key password you set during the secondary server installation.
-
Email Address
: Enter the address (or comma-separated list of addresses) to which notification about HA state changes should be mailed. If you have already configured email notifications using the Mail Server Configuration page (see Configuring Email Settings), the email addresses you enter here will be appended to the list of addresses already configured for the mail server.
-
Failover Type
: Select either Manual or Automatic. We recommend that you select Manual (see Automatic Versus Manual Failover).
Step 4 If you are using the virtual IP feature (see Virtual IP Addressing): Select the Virtual IP checkbox, then complete the additional fields as follows:
-
IPv4 Address
: Enter the virtual IPv4 address you want both HA servers to use.
-
IPv6 Address
: (Optional) Enter the IPv6 address you want both HA servers to use.
Note that virtual IP addressing will not work unless both servers are on the same subnet.
Step 5 Click Save to save your changes. Prime Infrastructure initiates the HA registration process. When registration completes successfully, Configuration Mode will display the value HA Enabled.
Checking High Availability Status
You can check the status of High Availability on any Prime Infrastructure server on which HA is enabled.
Step 1 Open a CLI session with the Prime Infrastructure server (see Connecting Via CLI).
Step 2 Enter the following command to display the current status of Prime Infrastructure HA processes:
PIServer/admin# ncs ha status
What Happens During HA Registration
Once you finish entering configuration information and click the Save button on the HA Configuration page, the primary and secondary HA servers will register with each other and begin copying all database and configuration data from the primary to the secondary server.
The time required to complete the copying is a function of the amount of database and configuration data being replicated and the available bandwidth on the network link between the two servers. The bigger the data and the slower the link, the longer the replication will take. For a relatively fresh server (in operation for a few days), with 100 devices and a 1 Gbps link, copying will take approximately 25 minutes.
During HA registration, the primary and secondary server state will go through the following state transitions:
Primary HA State Transitions... | Secondary HA State Transitions...
From: HA Not Configured | From: HA Not Configured
To: HA Initializing | To: HA Initializing
To: Primary Active | To: Secondary Syncing
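The registration state transitions above can be written down as two ordered sequences, which is handy if you script checks against the states reported on the HA Status or Health Monitor pages. The Python sketch below is illustrative only: the state names come from this section, while `REGISTRATION_TRANSITIONS`, `next_state`, and `registration_complete` are hypothetical helpers, not a Prime Infrastructure API.

```python
from typing import Optional

# State names are taken from this section; the structure itself is illustrative.
REGISTRATION_TRANSITIONS = {
    "primary": ["HA Not Configured", "HA Initializing", "Primary Active"],
    "secondary": ["HA Not Configured", "HA Initializing", "Secondary Syncing"],
}


def next_state(role: str, current: str) -> Optional[str]:
    """Return the state that follows `current` during registration, if any."""
    states = REGISTRATION_TRANSITIONS[role]
    index = states.index(current)
    return states[index + 1] if index + 1 < len(states) else None


def registration_complete(primary_state: str, secondary_state: str) -> bool:
    """True once both servers have reached their final registration states."""
    return (
        primary_state == REGISTRATION_TRANSITIONS["primary"][-1]
        and secondary_state == REGISTRATION_TRANSITIONS["secondary"][-1]
    )
```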
You can view these state changes on the HA Status page for the primary server, or the Health Monitor web pages for each of the two servers. If you are using the HA Status page, click Refresh to view progress. Once the data is fully synchronized, the HA Status page will be updated to show the current state as “Primary Active”, as shown in the following figure.
Figure 9-3 HA Status Page: Primary Active
After registration is initiated, there is a small window of time (usually less than five minutes) during which the database process on the primary server is restarted. During this period, the database will be offline. Once the database server is restarted, Prime Infrastructure initiates synchronization between the primary and the secondary HA servers. The synchronization should not have any impact on user activity, although users may observe slow system response until the synchronization is complete. The length of the synchronization is a function of the total database size, and is handled at the Oracle level by the RMAN-related processes. There is no impact on the execution of user- or system-related activity during the sync.
During registration, Prime Infrastructure performs a full database replication to the secondary server.
Patching Paired High Availability Servers
If your current Prime Infrastructure implementation has High Availability servers that are not at the same patch level, or you have a new patch you must install on both your HA servers, follow the steps below. You must start the patch install with the primary server in “Primary Active” state and the secondary server in “Secondary Syncing” state.
Patching of primary and secondary HA servers takes approximately one hour. During that period, both servers will be down.
Step 1 Ensure that your HA implementation is enabled and ready for update:
a. Log in to the primary server using an ID with Administrator privileges.
b. Select Administration > System Settings > High Availability. The primary server state displayed on the HA Status page should be “Primary Active”.
c. Select HA Configuration. The current Configuration Mode should show “HA Enabled”. We recommend that you set the Failover Type to “manual” during the patch installation.
d. Access the secondary server’s Health Monitor (HM) web page by pointing your browser to the following URL:
https://ServerIP:8082
where ServerIP is the IP address or host name of the secondary server.
e. You will be prompted for the authentication key entered when HA was enabled. Enter it and click Login.
f. Verify that the secondary server state displayed on the HM web page is in the “Secondary Syncing” state.
Step 2 Download the patch and install it on the primary server:
a. Point your browser to the software patches listing for Cisco Prime Infrastructure 2.2.
b. Click the Download button for the patch file you need to install (the file name ends with a UBF file extension), and save the file locally.
c. Log in to the primary server using an ID with administrator privileges and choose Administration > Software Update.
d. Click Upload Update File and browse to the location where you saved the patch file.
e. Click OK to upload the file.
f. When the upload is complete: On the Software Upload page, verify that the Name, Published Date and Description of the patch file are correct.
g. Select the patch file and click Install. When the installation is complete, you will see a message confirming this.
h. After the installation is complete on the primary server, verify that the Status of Updates table on the Software Update page shows “Installed” or “Installed [Requires Restart]” for the patch.
Step 3 Install the same patch on the secondary server:
a. Access the secondary server’s HM web page by pointing your browser to the following URL:
https://ServerIP:8082
where ServerIP is the IP address or host name of the secondary server.
b. You will be prompted for the authentication key entered when HA was enabled. Enter it and click Login.
c. Click the HM web page’s Software Update link. You will be prompted for the authentication key a second time. Enter it and click Login again.
d. Click Upload Update File and browse to the location where you saved the patch file.
e. Click OK to upload the file.
f. When the upload is complete, confirm on the Software Upload page that the Name, Published Date, and Description of the patch file are correct.
g. Select the patch file and click Install. When the installation is complete, you will see a message confirming this.
h. After the installation is complete on the secondary server, verify that the Status of Updates table on the Software Update page shows “Installed” or “Installed [Requires failover]” for the patch.
Step 4 Stop the servers in the following sequence, using the commands explained in Restarting Prime Infrastructure:
a. On the secondary server, run the ncs stop command.
b. On the primary server, run the ncs stop command.
Step 5 Restart and monitor the servers in the following sequence:
a. On the secondary server, run the following commands in this order:
– Run the ncs start command (see Restarting Prime Infrastructure) to start the secondary server. Wait for the processes on the secondary to restart.
– Run the ncs status command (see Checking Prime Infrastructure Server Status) to verify that the secondary’s processes have restarted. The only process you should see started on the secondary is “Health Monitor”.
– Run the ncs ha status command (see Checking High Availability Status) to verify that the secondary state is “Secondary Lost Primary”.
Once the secondary server is in “Secondary Lost Primary” state, you can go on to the next step.
b. On the primary server, run the following commands in this order:
– Run the ncs start command to restart the primary server. Wait for the processes on the primary to restart.
– Run the ncs status command to verify that the primary’s Health Monitor and other processes have restarted.
Once all the processes on the primary are up and running, automatic HA registration will be triggered between the primary and secondary servers. This normally completes after a few minutes. You will also receive email notification that registration has started.
Step 6 Once registration completes, verify the patch installation as follows:
a. Run the ncs ha status command on both the primary and secondary servers. You should see the primary server state change from “HA Initializing” to “Primary Active”. You should see the secondary server state change from “Secondary Lost Primary” to “Secondary Syncing”.
b. Log in to the primary server and access its Software Update page as you did in step 2, above. The “Status” column on the Status of Updates > Status tab should show “Installed” for the patch.
c. Access the secondary server’s Health Monitor page as you did in step 3, above. The “Status” column on the Status of Updates > Status tab should show “Installed” for the patch.
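The stop, start, and verify order in Steps 4 through 6 matters, so it can help to hold it as data in a checklist or operator script. The following is a minimal Python sketch under that assumption. The names `PlanStep` and `restart_plan` are hypothetical, and the code only builds and prints the plan; it does not run any command on either server.

```python
from typing import List, NamedTuple, Optional


class PlanStep(NamedTuple):
    server: str                    # "primary" or "secondary"
    command: str                   # CLI command named in the steps above
    expect: Optional[str] = None   # text to look for before moving on


def restart_plan() -> List[PlanStep]:
    """Return the stop/start/verify order from Steps 4 to 6 (illustrative only)."""
    return [
        PlanStep("secondary", "ncs stop"),
        PlanStep("primary", "ncs stop"),
        PlanStep("secondary", "ncs start"),
        PlanStep("secondary", "ncs status", "Health Monitor"),
        PlanStep("secondary", "ncs ha status", "Secondary Lost Primary"),
        PlanStep("primary", "ncs start"),
        PlanStep("primary", "ncs status"),
        PlanStep("primary", "ncs ha status", "Primary Active"),
        PlanStep("secondary", "ncs ha status", "Secondary Syncing"),
    ]


if __name__ == "__main__":
    for number, step in enumerate(restart_plan(), start=1):
        note = f" (expect: {step.expect})" if step.expect else ""
        print(f"{number}. on {step.server}: {step.command}{note}")
```

Keeping the expected state next to each command makes it harder to start the primary before the secondary has reached “Secondary Lost Primary”.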
Patching New High Availability Servers
If you are setting up a new Prime Infrastructure High Availability (HA) implementation and your new servers are not at the same patch level, follow the steps below to install patches on both servers and bring them to the same patch level.
Step 1 Download the patch and install it on the primary server:
a. Point your browser to the software patches listing for Cisco Prime Infrastructure 2.2.
b. Click the Download button for the patch file you need to install (the file name ends with a UBF file extension), and save the file locally.
c. Log in to the primary server using an ID with administrator privileges and choose Administration > Software Update.
d. Click Upload Update File and browse to the location where you saved the patch file.
e. Click OK to upload the file.
f. When the upload is complete, verify on the Software Upload page that the Name, Published Date, and Description of the patch file are correct.
g. Select the patch file and click Install. When the installation is complete, you will see a message confirming this.
h. After the installation is complete on the primary server, verify that the Status of Updates table on the Software Update page shows “Installed” or “Installed [Requires Restart]” for the patch.
i. Before you continue, restart the primary server as follows:
– Use the ncs stop and ncs start commands to restart the server (see Restarting Prime Infrastructure).
– Use the ncs status command to verify that the primary’s Health Monitor and other processes have restarted (see Checking Prime Infrastructure Server Status).
Step 2 Install the same patch on the secondary server:
a. Access the secondary server’s Health Monitor (HM) web page by pointing your browser to the following URL:
https://ServerIP:8082
where ServerIP is the IP address or host name of the secondary server.
b. You will be prompted for the secondary server authentication key. Enter it and click Login.
c. Click the HM web page’s Software Update link. You will be prompted for the authentication key a second time. Enter it and click Login again.
d. Click Upload Update File and browse to the location where you saved the patch file.
e. Click OK to upload the file.
f. When the upload is complete, confirm on the Software Upload page that the Name, Published Date, and Description of the patch file are correct.
g. Select the patch file and click Install. When the installation is complete, you will see a message confirming this.
h. After the installation is complete on the secondary server, verify that the Status of Updates table on the Software Update page shows “Installed” or “Installed [Requires failover]” for the patch.
i. Before you continue, restart the secondary server as follows:
– Use the ncs stop and ncs start commands (see Restarting Prime Infrastructure) to restart the server.
– Use the ncs status command (see Checking Prime Infrastructure Server Status) to verify that the secondary’s Health Monitor process has restarted.
Step 3 Verify that the patch status is the same on both servers, as follows:
a. Log in to the primary server and access its Software Update page as you did in step 1, above. The “Status” column should show “Installed” instead of “Installed [Requires Restart]” for the installed patch.
b. Access the secondary server’s Health Monitor page as you did in step 2, above. The “Status” column should show “Installed” instead of “Installed [Requires Failover]” for the installed patch.
Step 4 Register the servers as explained in Registering High Availability on the Primary Server.
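The status check in Step 3 amounts to comparing two strings read from the Status column. As a rough Python sketch, assuming you have copied those strings from each server's page by hand (the function names `restart_pending` and `patch_levels_match` are hypothetical, and nothing here queries Prime Infrastructure):

```python
def restart_pending(status: str) -> bool:
    """True when the Status value still carries a bracketed "Requires ..." qualifier."""
    return status.strip().lower().startswith("installed [requires")


def patch_levels_match(primary_status: str, secondary_status: str) -> bool:
    """True only when both servers report plain "Installed" for the patch."""
    return primary_status.strip() == "Installed" and secondary_status.strip() == "Installed"


if __name__ == "__main__":
    print(restart_pending("Installed [Requires Restart]"))
    print(patch_levels_match("Installed", "Installed"))
```

If `patch_levels_match` returns False, the restart in Step 1 or Step 2 has probably not completed, and registration in Step 4 should wait.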
Monitoring High Availability
Once you have configured HA (see Registering High Availability on the Primary Server), most of your interactions with it will involve accessing the server Health Monitor web page and responding to email notifications by triggering a failover or failback. Special cases are also covered in this section.
Accessing the Health Monitor Web Page
You can access the Health Monitor web page for the primary or secondary server at any time by pointing your browser to the following URL:
https://ServerIP:8082
where ServerIP is the IP address or host name of the primary or secondary server whose Health Monitor web page you want to see.
You can also access the Health Monitor web page for the currently active server by logging in to the Prime Infrastructure GUI, selecting Administration > High Availability > HA Status, and then clicking Launch Health Monitor.
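If you keep the server addresses in a script or inventory file, the Health Monitor URL can be derived from them using the pattern shown above. This is a small Python sketch; the function name `health_monitor_url` and the example addresses are hypothetical, and only the `https://ServerIP:8082` pattern comes from this guide.

```python
def health_monitor_url(server: str, port: int = 8082) -> str:
    """Build the Health Monitor URL for a server IP address or host name."""
    server = server.strip()
    if not server:
        raise ValueError("server must be an IP address or host name")
    return f"https://{server}:{port}"


if __name__ == "__main__":
    # Example addresses are placeholders from the documentation address range.
    for name in ("192.0.2.10", "pi-secondary.example.com"):
        print(health_monitor_url(name))
```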
Triggering Failover
Failover is the process of activating the secondary server in response to a detected failure on the primary.
Health Monitor (HM) detects failure conditions using the heartbeat messages that the two servers exchange (see How High Availability Works). If the primary server does not respond to three consecutive heartbeat messages from the secondary, it is considered to have failed. During the health check, HM also checks the application process status and database health; if a check gets no proper response, the corresponding process or database is also treated as having failed.
The HA system takes approximately 10 to 15 seconds to detect a process failure on the primary server and initiate a failover. If the secondary server is unable to reach the primary server due to a network issue, it might take more time to initiate a failover. In addition, it may take additional time for the application processes on the secondary server to be fully operational.
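The rule of three consecutive missed heartbeats can be illustrated with a small counter. This is a minimal Python sketch, not Health Monitor's actual implementation: the class name `HeartbeatTracker` and its `record` method are hypothetical, and the sketch assumes that a single answered heartbeat resets the count.

```python
class HeartbeatTracker:
    """Counts consecutive unanswered heartbeats (illustrative only)."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold   # three misses, per the text above
        self.missed = 0

    def record(self, answered: bool) -> bool:
        """Record one heartbeat result; return True once the peer counts as failed."""
        self.missed = 0 if answered else self.missed + 1
        return self.missed >= self.threshold


if __name__ == "__main__":
    tracker = HeartbeatTracker()
    for answered in (False, False, True, False, False, False):
        print(answered, tracker.record(answered))
```

In the sample run, the two early misses do not trigger a failure because the third heartbeat is answered; only the final run of three misses does.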
As soon as HM detects the failure, it sends an email notification. The email includes the failure status along with a link to the secondary server's Health Monitor web page.
If HA is currently configured for automatic failover (see Registering High Availability on the Primary Server), the secondary server will activate automatically and there is no action you need to perform.
If HA is currently configured for manual failover, you must trigger the failover as follows:
Step 1 Access the secondary server's Health Monitor web page using the web link given in the email notification, or using the steps in Accessing the Health Monitor Web Page.
Step 2 Trigger the failover by clicking the Failover button.
Triggering Failback
Failback is the process of re-activating the primary server once it is back online. It also transfers Active status from the secondary server to the primary, and stops active network monitoring processes on the secondary.
When a failback is triggered, the secondary server replicates its current database information and updated files to the primary server. The time it takes to complete the failback from the secondary server to the primary server will depend on the amount of data that needs to be replicated and the available network bandwidth.
Once the data has begun replicating successfully, HA changes the state of the primary server to Primary Active and the state of the secondary server to Secondary Syncing. Once all the data is copied, all the processes on the secondary server will be shut down except for the Health Monitor and database.
During failback, the secondary server is available except during the period when processes are started on the primary and stopped on the secondary. Both servers’ Health Monitor web pages are accessible for monitoring the progress of the failback. Users can also connect to the secondary server to access all normal functionality, with these caveats:
- Do not initiate configuration or provisioning activity while the failback is in progress.
- Be aware that, after a successful failback, the secondary server will go down and control will switch over to the primary server. During this process, Prime Infrastructure will be inaccessible to the users for a few moments.
You must always trigger failback manually, as follows:
Step 1 Access the secondary server's Health Monitor web page using the link given in the email notification, or using the steps in Accessing the Health Monitor Web Page.
Step 2 Trigger the failback by clicking the Failback button.
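Because failback time depends on the amount of data to replicate and the available bandwidth, a rough lower bound can be estimated before you trigger it. The Python sketch below is only arithmetic: the function name `min_transfer_seconds` is hypothetical, the 0.8 link-efficiency factor is an assumption rather than a Prime Infrastructure figure, and real failbacks also include process start and stop time that this ignores.

```python
def min_transfer_seconds(data_gib: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Lower bound on copy time: bits to move divided by usable bits per second."""
    if data_gib < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, rate > 0, and efficiency in (0, 1]")
    bits = data_gib * 1024**3 * 8                 # GiB -> bits
    usable_bps = link_mbps * 1_000_000 * efficiency
    return bits / usable_bps


if __name__ == "__main__":
    # Hypothetical figures: 50 GiB of data over a 100 Mbps link.
    seconds = min_transfer_seconds(50, 100)
    print(f"at least {seconds / 60:.0f} minutes")
```

For the hypothetical figures shown, the bound works out to roughly 90 minutes, which gives a sense of how long configuration changes should be held back.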
Responding to Other HA Events
All HA-related events are displayed on the HA Status page, on the Health Monitor web pages, and on the Prime Infrastructure Alarms and Events page. Most events require no response from you other than Triggering Failover and Triggering Failback. A few events are more complex, as explained in the topics that follow.
HA Registration Fails
If HA registration fails, you will see the following HA state-change transitions for each server (instead of those detailed in What Happens During HA Registration):
Primary HA State Transitions... | Secondary HA State Transitions...
From: HA Initializing | From: HA Initializing
To: HA Not Configured | To: HA Not Configured
To recover from failed HA registration, follow the steps below.
Step 1 Use ping and other tools to check the network connection between the two Prime Infrastructure servers. Confirm that the secondary server is reachable from the primary, and vice versa.
Step 2 Check that the gateway, subnet mask, virtual IP address (if configured), server hostname, DNS, and NTP settings are all correct.
Step 3 Check that the configured DNS and NTP servers are reachable from the primary and secondary servers, and that both are responding without latency or other network-specific issues.
Step 4 Check that all Prime Infrastructure licenses are correctly configured.
Step 5 Once you have remedied any connectivity or setting issues, try the steps in Registering High Availability on the Primary Server again.
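The reachability checks in Steps 1 and 3 can be complemented by a simple TCP connection test run from each server's network. The following Python sketch is one way to do that, not a Cisco tool: the function name `check_reachability` is hypothetical, the host names in the usage example are placeholders, and the choice of ports is yours (8082 is the Health Monitor port used elsewhere in this guide). A successful TCP connection shows only that the port answers, not that the service behind it is healthy.

```python
import socket
from typing import Callable, Dict, Iterable, Tuple

Target = Tuple[str, int]


def check_reachability(
    targets: Iterable[Target],
    timeout: float = 3.0,
    connect: Callable = socket.create_connection,
) -> Dict[Target, bool]:
    """Try a TCP connection to each (host, port) and report success per target."""
    results: Dict[Target, bool] = {}
    for host, port in targets:
        try:
            conn = connect((host, port), timeout)
            conn.close()
            results[(host, port)] = True
        except OSError:
            # Covers refused connections, timeouts, and name-resolution errors.
            results[(host, port)] = False
    return results


if __name__ == "__main__":
    # Placeholder host names; replace with your own servers.
    for target, ok in check_reachability([("pi-secondary.example.com", 8082)]).items():
        print(target, "reachable" if ok else "NOT reachable")
```

The `connect` parameter exists so the function can be exercised with a stand-in during testing; in normal use, leave it at its default.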
Network is Down (Automatic Failover)
If there is a loss of network connectivity between the two Prime Infrastructure servers, you will see the following HA state-change transitions for each server, assuming that the Failover Type is set to “Automatic”:
Primary HA State Transitions... | Secondary HA State Transitions...
From: Primary Active | From: Secondary Syncing
To: Primary Lost Secondary | To: Secondary Lost Primary
To: Primary Lost Secondary | To: Secondary Failover
To: Primary Lost Secondary | To: Secondary Active
You will get email notification that the secondary has lost the primary. Once the automatic failover is completed, you will get another email notification that the secondary server is now active.
In this case, you will want to recover by following the steps below.
Step 1 Check on and restore network connectivity between the two servers. Once network connectivity is restored, and the primary server can detect that the secondary is active, all services on the primary will be stopped. You will see the following state changes:
Primary HA State Transitions... | Secondary HA State Transitions...
From: Primary Lost Secondary | From: Secondary Active
To: Primary Failover | To: Secondary Active
Step 2 Trigger a failback from the secondary to the primary (see Triggering Failback). You will then see the following state transitions:
Primary HA State Transitions... | Secondary HA State Transitions...
From: Primary Failover | From: Secondary Active
To: Primary Failback | To: Secondary Failback
To: Primary Failback | To: Secondary Post Failback
To: Primary Active | To: Secondary Syncing
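If you record the HA state of each server at intervals during a failback (for example, by noting what the Health Monitor page shows), the record can be compared with the expected failback sequence listed above. The Python sketch below assumes such a hand-collected list of state names. The names `collapse` and `follows` are hypothetical, and a slow polling interval could miss a short-lived state and produce a false mismatch.

```python
from itertools import groupby
from typing import Iterable, List, Sequence

# Expected failback sequences, taken from the transitions listed above.
PRIMARY_FAILBACK = ["Primary Failover", "Primary Failback", "Primary Active"]
SECONDARY_FAILBACK = [
    "Secondary Active",
    "Secondary Failback",
    "Secondary Post Failback",
    "Secondary Syncing",
]


def collapse(samples: Iterable[str]) -> List[str]:
    """Drop consecutive repeats, since polling sees each state many times."""
    return [state for state, _ in groupby(samples)]


def follows(samples: Iterable[str], expected: Sequence[str]) -> bool:
    """True when the polled states, de-duplicated, equal the expected order."""
    return collapse(samples) == list(expected)


if __name__ == "__main__":
    observed = ["Primary Failover", "Primary Failback", "Primary Failback", "Primary Active"]
    print(follows(observed, PRIMARY_FAILBACK))
```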
Network is Down (Manual Failover)
If there is a loss of network connectivity between the two Prime Infrastructure servers, you will see the following HA state-change transitions for each server, assuming that the Failover Type is set to “Manual”:
Primary HA State Transitions... | Secondary HA State Transitions...
From: Primary Active | From: Secondary Syncing
To: Primary Lost Secondary | To: Secondary Lost Primary
You will get email notifications that each server has lost the other. In this case, you will want to follow the steps below.
Step 1 Check on and restore network connectivity between the two servers.
Step 2 As soon as network connectivity is restored, use the HM web page for the secondary server to trigger a failover from the primary to the secondary server. You will see the following state changes:
Primary HA State Transitions... | Secondary HA State Transitions...
From: Primary Lost Secondary | From: Secondary Lost Primary
To: Primary Lost Secondary | To: Secondary Failover
To: Primary Failover | To: Secondary Active
Step 3 Once you have received email notification that the secondary is now active, trigger a failback from the secondary to the primary (see Triggering Failback). You will then see the following state transitions:
Primary HA State Transitions... | Secondary HA State Transitions...
From: Primary Failover | From: Secondary Active
To: Primary Failback | To: Secondary Failback
(no change) | To: Secondary Post Failback
To: Primary Active | To: Secondary Syncing
Process Restart Fails (Automatic Failover)
The Prime Infrastructure Health Monitor process is responsible for attempting to restart any Prime Infrastructure server processes that have failed. Generally speaking, the current state of the primary and secondary servers should be “Primary Active” and “Secondary Syncing” at the time any such failures occur.
If HM cannot restart a critical process on the primary server, then the primary server is considered to have failed. If your currently configured Failover Type is “automatic”, you will see the following state transitions:
Primary HA State Transitions... | Secondary HA State Transitions...
From: Primary Active | From: Secondary Syncing
To: Primary Uncertain | To: Secondary Lost Primary
To: Primary Failover | To: Secondary Failover
To: Primary Failover | To: Secondary Active
When this process is complete, you will get an email notification that the secondary server is now active.
In this case, you will want to follow the steps below.
Step 1 Restart the primary server and ensure that it is running. Once the primary is restarted, it will be in the state “Primary Alone”.
Step 2 Trigger a failback from the secondary to the primary (see Triggering Failback). You will then see the following state transitions:
Primary HA State Transitions... | Secondary HA State Transitions...
From: Primary Alone | From: Secondary Active
To: Primary Failback | To: Secondary Failback
To: Primary Failback | To: Secondary Post Failback
To: Primary Active | To: Secondary Syncing
Process Restart Fails (Manual Failover)
The Prime Infrastructure Health Monitor process is responsible for attempting to restart any Prime Infrastructure server processes that have failed. Generally speaking, the current state of the primary and secondary servers should be “Primary Active” and “Secondary Syncing” at the time any such failures occur.
If HM cannot restart a critical process on the primary server, then the primary server is considered to have failed. You will receive an email notification of this failure. If your currently configured Failover Type is “Manual”, you will see the following state transitions:
Primary HA State Transitions... | Secondary HA State Transitions...
From: Primary Active | From: Secondary Syncing
To: Primary Uncertain | To: Secondary Lost Primary
In this case, you will want to follow the steps below.
Step 1 Trigger on the secondary server a failover from the primary to the secondary (see Triggering Failover). You will then see the following state transitions:
Primary HA State Transitions... | Secondary HA State Transitions...
From: Primary Uncertain | From: Secondary Lost Primary
To: Primary Failover | To: Secondary Failover
To: Primary Failover | To: Secondary Active
Step 2 Restart the primary server and ensure that it is running. Once the primary server is restarted, the primary’s HA state will be “Primary Alone”.
Step 3 Trigger a failback from the secondary to the primary (see Triggering Failback). You will then see the following state transitions:
Primary HA State Transitions... | Secondary HA State Transitions...
From: Primary Alone | From: Secondary Active
To: Primary Failback | To: Secondary Failback
To: Primary Failback | To: Secondary Post Failback
To: Primary Active | To: Secondary Syncing
Primary Server Restarts During Sync (Manual)
If the primary Prime Infrastructure server is restarted while the secondary server is syncing, you will see the following state transitions:
Primary HA State Transitions... | Secondary HA State Transitions...
From: Primary Active | From: Secondary Syncing
To: Primary Alone | To: Secondary Lost Primary
To: HA Initializing | To: HA Initializing
To: Primary Active | To: Secondary Syncing
The “Primary Alone” and the initialization states occur immediately after the primary comes back online. No administrator response should be required.
Secondary Server Restarts During Sync
If the secondary Prime Infrastructure server is restarted while syncing with the primary server, you will see the following state transitions:
Primary HA State Transitions... | Secondary HA State Transitions...
From: Primary Active | From: Secondary Syncing
To: Primary Lost Secondary | To: Secondary Lost Primary
To: HA Initializing | To: HA Initializing
To: Primary Active | To: Secondary Syncing
No administrator response should be required.
Both HA Servers Are Down
If both the primary and secondary servers are shut down at the same time, you can recover by bringing them back up in the correct order, as explained in the steps below.
Step 1 Restart the secondary server and the instance of Prime Infrastructure running on it.
Step 2 When Prime Infrastructure is running on the secondary, access the secondary server’s Health Monitor web page (see Accessing the Health Monitor Web Page). You will see the secondary server transition to the state “Secondary Lost Primary”.
Step 3 Restart the primary server and the instance of Prime Infrastructure running on it. When Prime Infrastructure is running on the primary, the primary will automatically register with the secondary and enable HA. To verify this, access the primary server’s Health Monitor web page. You will see the two servers transition through the following series of HA states:
Primary HA State Transitions... | Secondary HA State Transitions...
To: Primary Alone | To: Secondary Lost Primary
To: HA Initializing | To: HA Initializing
To: Primary Active | To: Secondary Syncing
Replacing the Primary Server
Under normal circumstances, the state of your primary and secondary servers will be “Primary Active” and “Secondary Syncing”, respectively. If the primary server fails for any reason, a failover to the secondary will take place, either automatically or manually.
You may find that restoring full HA access requires you to reinstall the primary server using new hardware. If this happens, you can follow the steps below to bring up the new primary server without data loss.
Step 1 Ensure that the secondary server is currently in “Secondary Active” state. If you have set the Failover Type on the primary server to “manual”, you will need to trigger the failover to the secondary manually (see Triggering Failover).
Step 2 Ensure that the old primary server you are replacing has been disconnected from the network.
Step 3 Ensure that the new primary server is ready for use. This will include connecting it to the network and assigning it the same server IP, subnet mask, and gateway as the old primary server. You will also need to enter the same authentication key that you entered when installing the secondary server.
Step 4 Trigger a failback from the secondary to the newly installed primary (see Triggering Failback). You will see the two servers transition through the following series of HA states:
Primary HA State Transitions... | Secondary HA State Transitions...
From: HA not configured | From: Secondary Active
To: Primary Failback | To: Secondary Failback
To: Primary Failback | To: Secondary Post Failback
To: Primary Active | To: Secondary Syncing
Recovering From Split-Brain Scenario
As explained in Automatic Versus Manual Failover, the possibility of data loss always exists on the rare occasions when a “split-brain scenario” occurs. The choices and actions available to the Administrator in this case are as follows:
1. Choose to go with the newly added data on the primary and forget the data that was added on the secondary. To choose this option:
a. Once the network is up, the primary will go down and the HA status of the primary server will be “Primary Failover”.
b. Remove HA using the primary or secondary CLI (see Removing HA Via the CLI).
c. Restart the primary server (see Restarting Prime Infrastructure). The primary’s HA status will change to “Primary Alone”.
d. Re-register the secondary with the primary using the primary HA Configuration page (see Registering High Availability on the Primary Server).
2. Choose to go with the newly added data on the secondary and forget the data that was added on the primary. To choose this option:
a. Once the network is up, the primary will go down and the HA status of the primary server will be “Primary Failover”.
b. Using the web browser, the administrator should confirm that a user can log into the secondary server’s Prime Infrastructure page (for example, https://x.x.x.x:443). Do not proceed until this access has been verified.
c. Once access to the secondary is verified, the administrator should initiate a failback from the secondary server's Health Monitor web page (see Triggering Failback). Users can continue to perform monitoring activities on the secondary server until the switchover to the primary is completed.
Setting Up HA in FIPS Mode
If you have installed your primary HA server in FIPS mode, you must also install your secondary server in FIPS mode. You will also need to generate an SSL certificate and import it into both HA servers, as well as your clients communicating with the servers.
The following instructions apply to FIPS mode and to the installation of CA certificates. Note that Prime Infrastructure in non-FIPS mode allows you to use self-signed certificates. Installing in FIPS mode requires you to use CA certificates that are signed by an external registered Certificate Authority (CA).
Note Online Certificate Status Protocol (OCSP) client authentication is not supported in an HA setup.
About Certificates, Certificate Authorities (CAs), and Certificate Signing Requests (CSRs)
A certificate is an electronic document that identifies a server, company, or another entity, and that associates that identity with a public encryption key.
Certificates can be self-signed or can be attested to by a digital signature from a certificate authority (CA).
A self-signed certificate is an identity certificate that is signed by its own creator. That is, the person who created the certificate also signed off on its legitimacy.
A CA is an entity that validates identities and issues certificates. The certificate issued by the CA binds a particular public key to the name of the entity that the certificate identifies, such as the name of a server or company. Only the public key that the certificate identifies works with the corresponding private key possessed by the entity that the certificate identifies. Certificates help prevent the use of fake public keys by intruders impersonating legitimate entities.
A CSR is a message that an applicant sends to a CA to apply for a digital identity certificate. Before creating a CSR, the applicant first generates a key pair and keeps the private key secret. The CSR contains information that identifies the applicant, such as a directory name in the case of an X.509 certificate, and the public key chosen by the applicant. The corresponding private key is not included in the CSR, but is used to digitally sign the entire request.
The CSR can be accompanied by other credentials or proofs of identity required by the certificate authority, and the certificate authority can contact the applicant for further information. For the most part, a third-party CA company, such as Entrust or VeriSign, requires a CSR before the company can create a digital certificate.
CSR generation is independent of the device on which you plan to install an external certificate. Therefore, a CSR and a private key file can be generated on any individual machine which supports CSR generation. CSR generation is not switch-dependent or appliance-dependent in this case.
Generating CSRs
To generate a Certificate Signing Request (CSR) for a third-party certificate using Cisco Prime Infrastructure:
Step 1 Connect to the primary server via CLI (see Connecting Via CLI). Do not enter “configure terminal” mode.
Step 2 At the command line, enter the following command:
admin# ncs key genkey -newdn -csr csrfile.csr repository reponame
Where:
- csrfile is the name of the new CSR file.
- reponame is the location of the Prime Infrastructure repository to which the newly created CSR files should be backed up (up to 80 alphanumeric characters).
The command generates a new key/self-signed certificate pair, and outputs the CSR to the specified filename in the specified repository.
Step 3 Because the command includes the -newdn flag, you will be prompted for Distinguished Name fields for the certificate. To avoid browser warnings in future, be sure to specify in the domain name field the final hostname that will be used to access the Prime Infrastructure servers.
Step 4 Once the CSR is generated, submit it to the Certificate Authority.
Generating CSRs: Example
The following example shows how to generate a new RSA encryption public key and CSR certificate files using the Prime Infrastructure server. The example includes responses to Distinguished Name prompts:
admin# ncs key genkey -newdn -csr csrfile.csr repository ncs-sftp-repo
Prime Infrastructure server is running
Changes will take affect on the next server restart
Enter the domain name of the server: PrimeInfrastructureServer
Enter the name of your organizational unit: Cloud Systems
Enter the name of your organization: Cisco
Enter the name of your city or locality: San Jose
Enter the name of your state or province: California
Enter the two letter code for your country: US
Generating RSA keys
Importing CA Certificates to Prime Infrastructure Servers
Once you have received the signed CA certificate, import it to a trust store in Prime Infrastructure using the ncs key importcacert command. You need to perform this task on both the primary and secondary HA servers installed in FIPS mode; otherwise, registration cannot happen.
The following example shows how to apply the CA certificate file to a trust store in Prime Infrastructure server:
admin# ncs key importcacert alias1 cacertfile repository ncs-sftp-repo
admin# ncs key importsignedcert server.cer repository ncs-sftp-repo
Step 1 Connect to the primary server via CLI (see Connecting Via CLI). Do not enter “configure terminal” mode.
Step 2 At the prompt, enter the following command to import the CA certificate file:
admin# ncs key importcacert CA-Alias CA.cer repository defaultRepo
If you have more than one CA certificate file, repeat this step for each CA cert file.
Step 3 When you are finished importing all CA cert files, import the CN.cer file into the server:
admin# ncs key importsignedcert CN.cer repository defaultRepo
Step 4 To restart the Prime Infrastructure server and apply the changes, issue the following two commands in this order:
ncs stop
ncs start
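When there are several CA certificate files, it is easy to import the signed certificate too early or to forget the restart. The Python sketch below assembles the command lines in the order given in Steps 2 through 4 so they can be reviewed before being entered at the CLI. The function name `import_commands` is hypothetical, the alias, file, and repository names in the usage example are the placeholders used in the steps above, and the code only builds text; it does not connect to or run anything on a server.

```python
from typing import List, Mapping


def import_commands(ca_certs: Mapping[str, str], signed_cert: str, repo: str) -> List[str]:
    """Build the CLI lines in the order of Steps 2 to 4 (nothing is executed).

    ca_certs maps each CA alias to its certificate file name.
    """
    if not ca_certs:
        raise ValueError("at least one CA certificate is required")
    # One importcacert line per CA certificate file, in the order supplied.
    lines = [
        f"ncs key importcacert {alias} {filename} repository {repo}"
        for alias, filename in ca_certs.items()
    ]
    # The signed server certificate is imported only after all CA certificates.
    lines.append(f"ncs key importsignedcert {signed_cert} repository {repo}")
    # A restart applies the changes.
    lines += ["ncs stop", "ncs start"]
    return lines


if __name__ == "__main__":
    for line in import_commands({"CA-Alias": "CA.cer"}, "CN.cer", "defaultRepo"):
        print(line)
```

Remember that the same sequence has to be entered on both the primary and the secondary server.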
High Availability Reference Information
The following sections supply reference information on HA.
HA Configuration Mode Reference
Table 9-1 High Availability Modes
Mode | Description
HA not configured | HA is not configured on this Prime Infrastructure server.
HA initializing | The HA registration process between the primary and secondary server has started.
HA enabled | HA is enabled between the primary and secondary server.
HA alone | Primary server is now running alone. HA is enabled, but the primary server is out of sync with the secondary, or the secondary is down or otherwise unreachable.
HA State Reference
The following table lists all possible HA states, including those that require no response from you.
Table 9-2 High Availability States
State | Server | Description
Stand Alone | Both | HA is not configured on this Prime Infrastructure server.
Primary Alone | Primary | Primary restarted after it lost secondary. Only Health Monitor is running in this state.
HA Initializing | Both | HA registration process between the primary and secondary server has started.
Primary Active | Primary | Primary server is now active and is synchronizing with secondary server.
Primary Failover | Primary | Primary server detected a failure.
Primary Failback | Primary | Failback triggered by the user is currently in progress.
Primary Lost Secondary | Primary | Primary server is unable to communicate with the secondary server.
Primary Uncertain | Primary | Primary server's application processes are not able to connect to its database.
Secondary Alone | Secondary | Primary server is not reachable from secondary after primary server restart.
Secondary Syncing | Secondary | Secondary server is synchronizing the database and configuration files from the primary.
Secondary Active | Secondary | Failover from the primary server to the secondary server has completed successfully.
Secondary Lost Primary | Secondary | Secondary server is not able to connect to the primary server (occurs when the primary fails or network connectivity is lost). In case of automatic failover from this state, the secondary will automatically move to Active state. In case of a manual failover, the user can trigger a failover to make the secondary active.
Secondary Failover | Secondary | Failover triggered and in progress.
Secondary Failback | Secondary | Failback triggered and in progress (database and file replication is in progress).
Secondary Post Failback | Secondary | This state occurs after failback is triggered, replication of database and configuration files from the secondary to the primary is complete, and Health Monitor has initiated changes of the secondary server's status to Secondary Syncing and the primary server's status to Primary Active. These status changes and associated process starts and stops are in progress.
Secondary Uncertain | Secondary | Secondary server's application processes are not able to connect to secondary server's database.
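For scripting or runbook purposes, the state-to-server mapping in Table 9-2 can be captured as a simple lookup. The following is a minimal sketch, assuming you maintain the mapping yourself: the names HA_STATES, states_for, and user_must_trigger_failover are hypothetical and are not provided by Prime Infrastructure. The failover check encodes only what the table says about the Secondary Lost Primary state.

```python
from typing import Dict, List

# State name -> server on which the state can appear (from Table 9-2).
HA_STATES: Dict[str, str] = {
    "Stand Alone": "Both",
    "Primary Alone": "Primary",
    "HA Initializing": "Both",
    "Primary Active": "Primary",
    "Primary Failover": "Primary",
    "Primary Failback": "Primary",
    "Primary Lost Secondary": "Primary",
    "Primary Uncertain": "Primary",
    "Secondary Alone": "Secondary",
    "Secondary Syncing": "Secondary",
    "Secondary Active": "Secondary",
    "Secondary Lost Primary": "Secondary",
    "Secondary Failover": "Secondary",
    "Secondary Failback": "Secondary",
    "Secondary Post Failback": "Secondary",
    "Secondary Uncertain": "Secondary",
}


def states_for(server: str) -> List[str]:
    """List the states a given server ("Primary" or "Secondary") can report."""
    if server not in ("Primary", "Secondary"):
        raise ValueError(f"unknown server role: {server!r}")
    return [
        state for state, owner in HA_STATES.items() if owner in (server, "Both")
    ]


def user_must_trigger_failover(state: str, failover_type: str) -> bool:
    """True only when the secondary has lost the primary and failover is manual.

    With automatic failover, the secondary moves to the active state by itself.
    """
    if state not in HA_STATES:
        raise ValueError(f"unknown HA state: {state!r}")
    if failover_type not in ("manual", "automatic"):
        raise ValueError(f"unknown failover type: {failover_type!r}")
    return state == "Secondary Lost Primary" and failover_type == "manual"


if __name__ == "__main__":
    print(states_for("Secondary"))
    print(user_must_trigger_failover("Secondary Lost Primary", "manual"))
```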
HA State Transition Reference
The following figure details all possible state transitions for the primary server.
Figure 9-4 Primary Server State Transitions
The following figure details all possible state transitions for the secondary server.
Figure 9-5 Secondary Server State Transitions
High Availability CLI Command Reference
The following table lists the CLI commands available for HA management. Log in as admin to run these commands on the primary server (see Connecting Via CLI):
Table 9-3 High Availability Commands
Command | Description
ncs ha ? | Get help with high availability CLI commands.
ncs ha authkey authkey | Update the authentication key for high availability.
ncs ha remove | Remove the High Availability configuration.
ncs ha status | Get the current status for High Availability.
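If you generate these commands from a script before pasting them into a CLI session, a small check can catch a missing or stray argument. The sketch below is an illustration only: build_ha_command and HA_SUBCOMMANDS are hypothetical names, and the helper knows only the four subcommands listed in Table 9-3. It formats a command string and does not run anything on the server.

```python
from typing import Dict, Optional

# Subcommands from Table 9-3; the value says whether an argument is required.
HA_SUBCOMMANDS: Dict[str, bool] = {
    "?": False,
    "authkey": True,
    "remove": False,
    "status": False,
}


def build_ha_command(subcommand: str, argument: Optional[str] = None) -> str:
    """Return an 'ncs ha ...' command string, validating the argument count."""
    if subcommand not in HA_SUBCOMMANDS:
        raise ValueError(f"unknown ncs ha subcommand: {subcommand!r}")

    needs_argument = HA_SUBCOMMANDS[subcommand]
    if needs_argument and not argument:
        raise ValueError(f"'ncs ha {subcommand}' requires an argument")
    if not needs_argument and argument is not None:
        raise ValueError(f"'ncs ha {subcommand}' takes no argument")

    parts = ["ncs", "ha", subcommand]
    if argument:
        parts.append(argument)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_ha_command("status"))
    print(build_ha_command("authkey", "MyNewAuthKey"))
```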
Resetting the Authentication Key
Prime Infrastructure administrators can change the HA authentication key using the ncs ha authkey command. Ensure that the new authentication key meets the password standards (see Before You Begin Setting Up High Availability).
Step 1 Connect to the primary server via CLI (see Connecting Via CLI). Do not enter “configure terminal” mode.
Step 2 Enter the following at the command line:
admin# ncs ha authkey MyNewAuthKey
Where MyNewAuthKey is the new authentication key.
Removing HA Via the GUI
The simplest method for removing an existing HA implementation is via the GUI, as shown in the following steps. You can also remove the HA setup via the command line (see Removing HA Via the CLI).
Step 1 Log in to Prime Infrastructure with a user ID that has administrator privileges.
Step 2 Select Administration > Settings > High Availability.
Step 3 Select Remove.
Removing HA Via the CLI
If for any reason you cannot access the Prime Infrastructure GUI on the primary server (see Removing HA Via the GUI), you can remove the HA setup via the command line, as follows:
Step 1 Connect to the primary server via CLI (see Connecting Via CLI). Do not enter “configure terminal” mode.
Step 2 Enter the following at the command line:
admin# ncs ha remove
Removing HA During Restore
Prime Infrastructure does not back up configuration settings related to High Availability.
In order to restore a Prime Infrastructure implementation that is using HA, be sure to restore the backed up data to the primary server only. The restored primary will automatically replicate its data to the secondary server. Running a restore on the secondary server is not needed and will generate an error message.
To restore a Prime Infrastructure implementation that uses HA, follow the steps below.
Step 1 Remove the HA settings from the primary server (see Removing HA Via the GUI).
Step 2 Restore the primary server as needed (see Restoring From Backups).
Step 3 Once the restore is complete, perform the HA registration process again (see Registering High Availability on the Primary Server).
Using HA Error Logging
Error logging for the High Availability feature is disabled by default to save disk space and maximize performance. If you are having trouble with HA, the best place to begin is to enable error logging and examine the log files.
Step 1 View the Health Monitor page for the server having trouble (see Accessing the Health Monitor Web Page).
Step 2 In the Logging area, in the Message Level dropdown, select the error-logging level you want.
Step 3 Click Save.
Step 4 When you want to download the log files: In the Logs area, click Download. You can open the downloaded log files using any ASCII text editor.
Resetting the Server IP Address or Host Name
Avoid changing the IP address or host name of the primary or secondary server, if possible. If you must change the IP address or host name, remove the HA configuration from the primary server before making the change (see Removing HA Via the GUI). When finished, re-register HA (see Registering High Availability on the Primary Server).