A minor failure event is an event that caused a failover and can be cleared without replacing hardware or reimaging the appliance. Some examples include:
• A monitored service failing more than 5 times on the active unit.
• A service that failed to start or that stopped.
• An external event, such as a network failure.
• A single disk failure. Replace the disk and reboot the appliance. If more than one disk fails, you must perform a major failure event recovery.
When a failover occurs, clear the cause of the failover and reboot the failed appliance. It will boot to standby and receive data from the active unit. Rebooting the appliance also clears the monitored service fail counters.
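The fail-counter behavior described above can be pictured with a short sketch. It is only an illustration, not the product's implementation: the class `ServiceFailCounter`, its method names, and the constant name are hypothetical. Only the threshold (more than 5 failures) and the reset on reboot come from the text.

```python
from collections import Counter

# Threshold taken from the text: a failover is caused when a monitored
# service fails MORE than 5 times on the active unit.
MAX_SERVICE_FAILURES = 5


class ServiceFailCounter:
    """Hypothetical model of the per-service fail counters on one appliance."""

    def __init__(self):
        self._failures = Counter()

    def record_failure(self, service):
        """Record one failure; return True if the count now exceeds the threshold."""
        self._failures[service] += 1
        return self._failures[service] > MAX_SERVICE_FAILURES

    def count(self, service):
        """Return the current failure count for a service (0 if none recorded)."""
        return self._failures[service]

    def reboot(self):
        """Rebooting the appliance clears all monitored service fail counters."""
        self._failures.clear()
```

In this model, the sixth failure of the same service is the first one that returns `True`, and calling `reboot()` returns every counter to zero.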
If you cannot clear the condition that caused failover, you may have to perform a major event recovery.
Major failure events are events that require the appliance to be reimaged or replaced in order to bring it back into service.
If you need to replace hardware, obtain the replacement hardware before starting the recovery process. If you need to replace an appliance, you will need to obtain and install a new license for the appliance.
Note: A single disk failure is a minor failure event. Multiple disk failures are a major failure event.
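The minor and major rules in this section can be summarized as a small decision function. This is a minimal sketch under stated assumptions: the names `Severity` and `classify_failure` and all parameters are hypothetical, and the sketch always treats an uncleared cause as major even though the text only says a major recovery "may" be required in that case.

```python
from enum import Enum


class Severity(Enum):
    MINOR = "minor"
    MAJOR = "major"


def classify_failure(failed_disks=0, needs_reimage=False,
                     needs_appliance_replacement=False, cause_cleared=True):
    """Classify a failover cause using the rules stated in this section."""
    # Multiple disk failures are a major failure event.
    if failed_disks > 1:
        return Severity.MAJOR
    # Needing to reimage or replace the appliance defines a major failure event.
    if needs_reimage or needs_appliance_replacement:
        return Severity.MAJOR
    # Simplifying assumption: the text says an uncleared cause "may" require
    # a major recovery; this sketch always treats it as major.
    if not cause_cleared:
        return Severity.MAJOR
    # A single disk failure, or any cause that was cleared, is minor.
    return Severity.MINOR
```

For example, `classify_failure(failed_disks=1)` yields `Severity.MINOR`, while `classify_failure(failed_disks=2)` yields `Severity.MAJOR`.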
There are two major recovery procedures, depending upon which appliance failed:
• If a secondary appliance failed, see Recover from Secondary Appliance Failure.
• If a primary appliance failed, see Recover from Primary Appliance Failure.
Prerequisites
This procedure must be performed from the appliance console. You cannot perform this procedure through an SSH session.
To recover from a secondary appliance failure, complete the following steps:
Step 1 On the pair of appliances that did not fail, make the primary appliance the active appliance.
Step 2 Back up the active appliances in your failover cluster.
Step 3 Revert the active appliances to Standalone mode:
a. Log in to AAI.
b. Choose FAIL_OVER > REVERT.
Step 4 Apply the virtual FQDN and IP address to the primary appliances. This reverts them to the pre-failover configuration.
Step 5 Pair the primary appliances.
The appliances now operate in a standard standalone configuration.
Step 6 Reimage the secondary appliances.
Step 7 Reconfigure failover. See Configure Failover for Cisco Digital Signs for the failover configuration process.
Recovering a failed primary requires some additional steps because you cannot use a secondary appliance as a primary appliance. You must reimage the secondary appliances after converting the failover cluster to standalone mode.
Procedure
Step 1 On the pair of appliances that did not fail, make the primary appliance the active appliance.
Step 2 Back up the active appliances in your failover cluster.
Step 3 Revert the standby appliances to Standalone mode:
a. Log in to AAI.
b. Choose FAIL_OVER > REVERT.
Step 4 Revert the active appliances to Standalone mode:
a. Log in to AAI.
b. Choose FAIL_OVER > REVERT.
Step 5 Reimage the failed primary appliance and the two standby appliances.
Step 6 Apply the virtual FQDN and IP address to the primary appliances. This reverts them to the pre-failover configuration.
Step 7 Pair the primary appliances.
Step 8 Restore the cluster backup on the appliances.
Step 9 Reconfigure failover. See Configure Failover for Cisco Digital Signs for the failover configuration process.
Split brain occurs when both nodes become active or when the data on each node becomes out of sync with the other node. To recover, you must decide which set of data to keep. The recovery process overwrites the other set of data.
Procedure
Step 1 Determine which device you want to use as the data source. This is the appliance whose data will be used to populate the cluster.
Step 2 On the appliance you want to receive the data, do the following:
a. Log in to AAI.
b. Choose FAIL_OVER > RECOVER.
If split brain is not occurring, you will receive a message that split brain was not detected. Cancel out of the split brain recovery process.
If split brain is occurring, the data selection page appears.
c. Choose OVERWRITE_DATA.
d. Choose Yes if prompted to continue.
Step 3 On the appliance you are going to use as the data source, do the following:
a. Log in to AAI.
b. Choose FAIL_OVER > RECOVER.
If split brain is not occurring, you will receive a message that split brain was not detected. Cancel out of the split brain recovery process.
If split brain is occurring, the data selection page appears.
c. Choose USE_DATA.
d. Choose Yes if prompted to continue.
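The order of operations in the split brain procedure can be written down as a simple plan, which may help when preparing a runbook. This is a sketch only: the function `split_brain_recovery_plan` and the appliance names are hypothetical, and nothing here drives the appliance. The menu labels (`FAIL_OVER > RECOVER`, `OVERWRITE_DATA`, `USE_DATA`) are the ones named in the steps above.

```python
def split_brain_recovery_plan(source, receiver):
    """Return the ordered AAI actions for split brain recovery.

    source:   appliance whose data is kept and used to populate the cluster
    receiver: appliance whose data is overwritten
    """
    if source == receiver:
        raise ValueError("source and receiver must be different appliances")
    return [
        # Step 2: the receiving appliance goes first and gives up its data.
        (receiver, "FAIL_OVER > RECOVER", "OVERWRITE_DATA"),
        # Step 3: the data source then confirms that its data is used.
        (source, "FAIL_OVER > RECOVER", "USE_DATA"),
    ]


# Hypothetical appliance names, for illustration only.
for appliance, menu, choice in split_brain_recovery_plan("dmm-a", "dmm-b"):
    print(f"{appliance}: {menu}, then choose {choice}")
```

Listing the receiving appliance first mirrors the step order in the procedure and makes it harder to select the wrong data option on the wrong appliance.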