This chapter describes how to troubleshoot issues related to high availability (HA).
Cisco VSG high availability (HA) is a subset of the Cisco NX-OS HA. The following HA features minimize or prevent traffic disruption in the event of a failure:
Cisco VSG redundancy is equivalent to HA pairing. The two possible redundancy states are active and standby. An active Cisco VSG is always paired with a standby Cisco VSG. HA pairing is based on the Cisco VSG ID. Two Cisco VSGs that are assigned an identical ID are automatically paired. All processes running in the Cisco VSG are data path critical. If one process fails in an active Cisco VSG, a failover to the standby Cisco VSG occurs instantly and automatically.
The Cisco VSG software contains independent processes known as services. Each service performs a function or set of functions for a subsystem or feature set of the Cisco VSG. Each service and each service instance runs as an independent, protected process, which provides a highly fault-tolerant software infrastructure and isolates faults between services. A failure in one service instance does not affect any other service running at that time. For example, two instances of a routing protocol run as separate processes.
The Cisco VSG HA pair configuration allows uninterrupted traffic forwarding by using stateful failovers when a failure occurs.
The following key problems are found in Cisco VSG HA. In addition to these issues, some of the common Cisco NX-OS HA symptoms are listed in Table 7-1. The symptoms that are related to high availability, their possible causes, and recommended solutions are as follows.
– In a multitenant environment with a shared management network, a domain ID collision (two or more tenants using the same domain ID) triggers spontaneous reboots of the Cisco VSGs that are involved in the collision. There is no workaround for this issue. Domain IDs must be unique across all tenants when they are provisioned.
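The uniqueness requirement above can be checked before domain IDs are provisioned. The following is a minimal sketch in Python, assuming a hypothetical mapping of tenant names to the domain IDs planned for each tenant. The function name `find_domain_id_collisions` and the sample data are illustrative only and are not part of any Cisco tool.

```python
from collections import defaultdict


def find_domain_id_collisions(tenant_domain_ids):
    """Return {domain_id: sorted tenant names} for IDs used by more than one tenant.

    tenant_domain_ids is a hypothetical mapping of tenant name -> iterable of
    domain IDs planned for that tenant (an assumption, not a Cisco data format).
    """
    owners = defaultdict(set)
    for tenant, domain_ids in tenant_domain_ids.items():
        for domain_id in domain_ids:
            owners[domain_id].add(tenant)
    # Keep only the domain IDs that are claimed by two or more tenants.
    return {
        domain_id: sorted(tenants)
        for domain_id, tenants in owners.items()
        if len(tenants) > 1
    }


if __name__ == "__main__":
    # Illustrative planning data; replace with your own provisioning records.
    planned = {
        "tenant-a": [101, 102],
        "tenant-b": [103],
        "tenant-c": [102, 104],
    }
    for domain_id, tenants in sorted(find_domain_id_collisions(planned).items()):
        print(f"domain ID {domain_id} is shared by: {', '.join(tenants)}")
```

An empty result means that no two tenants in the planning data share a domain ID.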
| Symptom | Possible Causes | Solution |
|---|---|---|
| | | 1. Verify the role of each Cisco VSG by entering the show system redundancy status command. 2. Update an incorrect role by entering the system redundancy role command. 3. Save the configuration by entering the copy run start command. |
| | Network connectivity problems are occurring between the Cisco VSG and the upstream and virtual switches. The problem could be in the control or management VLAN. | Restore connectivity as follows: 1. From the Hyper-V client, shut down the Cisco VSG that should be in standby mode. 2. From the Hyper-V client, power on the standby Cisco VSG after network connectivity is restored. |
| The active Cisco VSG does not complete synchronization with the standby Cisco VSG. | | 1. Verify the software version on both Cisco VSGs by entering the show version command. 2. Reinstall the secondary Cisco VSG with the same version used in the primary. |
| | Fatal errors occur during the gsync process. Check the gsyncctrl log by entering the show system internal log sysmgr gsyncctrl command and look for fatal errors. | Reload the standby Cisco VSG by entering the reload module standby_module_number command. See the “Reloading a Module” section. |
| | The Cisco VSG has connectivity only through the management interface. When a Cisco VSG is able to communicate through the management interface, but not through the control interface, the active Cisco VSG resets the standby to prevent the two Cisco VSGs from being in HA mode and out of sync. | Check control VLAN connectivity between the primary and secondary Cisco VSG by entering the show system internal redundancy info command. In the output, degraded_mode flag = true. If there is no connectivity, restore it through the control interface. |
| | The following network connectivity problems might be occurring: | If network problems exist, do the following: 1. From the Hyper-V client, shut down the Cisco VSG that should be in standby mode. 2. From the Hyper-V client, bring up the standby Cisco VSG after network connectivity is restored. |
| | The boot variables for the active and standby Cisco VSGs are set to different image names, or the image names are the same but the files are not the correct files. When the active and standby Cisco VSG devices are running different versions that are not HA compatible, they are unable to synchronize. | Update the software version or the boot variables as follows: 1. From each Cisco VSG (active and standby), verify the software version by entering the show version command. 2. Reload the standby Cisco VSG with the version that is running on the active Cisco VSG by doing one of the following: – Correct the boot variable names. – Replace the incorrect software files. See the “Reloading a Module” section. |
| | The broadcast traffic from the standby to the active Cisco VSG might prevent the Cisco VSGs from synchronizing. The standby Cisco VSG tries to contact the active Cisco VSG periodically, but if broadcast traffic problems persist for over a minute when the standby device is booting up, the system cannot synchronize. | Fix the traffic problem and reload the standby Cisco VSG as follows: 1. From the standby Cisco VSG, verify the broadcast traffic problem by entering the show system internal log sysmgr verctrl command. The following message appears: 3. Reload the standby Cisco VSG by using the reload module standby_module_number command. See the “Reloading a Module” section. |
| | The active Cisco VSG falsely detects a disconnect with the standby device, which is removed and reinserted, and synchronization does not occur. | Verify redundancy states and reload the standby Cisco VSG as follows: 1. Verify active Cisco VSG redundancy by using the show system internal redundancy status command. The output is as follows: 2. Verify the standby Cisco VSG redundancy by using the show system internal redundancy status command. The output is as follows: 3. Reload the standby Cisco VSG by using the reload module standby_module_number command. See the “Reloading a Module” section. |
| The Cisco VSG HA pair reboots continuously in headless mode (VSMs are down). | The nonsystem VLAN Cisco VSG ports are down after they reconnect following a reboot of the Cisco VSG because the VSM is not present to bring them up. | Check if the service and HA VLANs are configured as system VLANs. If they are not system VLANs and the Cisco VSG pair reboots for any reason, they do not come back up until the VSM comes up. |
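The system VLAN check in the last solution can be expressed as a simple set comparison. The following is a minimal sketch in Python, assuming that the system VLAN IDs and the service and HA VLAN IDs have already been read from the configuration by hand. The function name `vlans_missing_from_system` and all VLAN numbers are hypothetical.

```python
def vlans_missing_from_system(system_vlans, required_vlans):
    """Return {role: vlan_id} for required VLANs that are not system VLANs.

    system_vlans: iterable of VLAN IDs configured as system VLANs.
    required_vlans: mapping of a role label (for example "service" or "ha")
    to the VLAN ID used for that role. Both inputs are assumed to be
    collected manually; this sketch does not query any device.
    """
    system = set(system_vlans)
    return {
        role: vlan
        for role, vlan in required_vlans.items()
        if vlan not in system
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    system_vlans = [10, 20]
    required = {"service": 20, "ha": 30}
    missing = vlans_missing_from_system(system_vlans, required)
    for role, vlan in missing.items():
        print(f"{role} VLAN {vlan} is not configured as a system VLAN")
```

Any VLAN reported by this check is one that, according to the table above, would not come back up after a reboot until the VSM is available.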
This section lists commands that you can use to troubleshoot problems related to high availability.
You can display the domain ID information by entering the show vsg command.
This example shows how to display the domain ID information:
You can check the system redundancy status by entering the show system redundancy status command.
This example shows how to display the system redundancy status:
You can check the system internal redundancy status by entering the show system internal redundancy info command.
This example shows how to display the system internal redundancy status information:
You can check the internal system manager state by entering the show system internal sysmgr state command.
This example shows how to display the internal system manager state information:
You can reload a module by entering the reload module command.
Note Using the reload command without specifying a module reloads the whole system.
This example shows how to reload a module:
The standby Cisco VSG console is not accessible externally. You can access the standby Cisco VSG console through the active Cisco VSG console by entering the attach module module-number command.
This example shows how to access the standby Cisco VSG console through the active Cisco VSG console:
You can check for errors in the event history by entering the show system internal sysmgr event-history errors command.
This example shows how to display errors that have been logged in the event history:
If the standby Cisco VSG is stuck in the synchronization stage, follow these steps on the active Cisco VSG:
Step 1 Enter the show system internal sysmgr state command and check for a line similar to the following:
If this entry is present and shows the same xxxx value for a long time, the system has trouble synchronizing the state for one of the processes.
Step 2 Identify the process by entering the show system internal sysmgr service running | grep xxxx command.
This message appears in the first few lines of the output:
Step 3 If you are able to log in to the standby Cisco VSG console (you might need to press Ctrl-C after entering the password), check the output of these two commands:
Step 4 If access to the system is available only after the standby server has booted up and synchronized with the active server, use the following commands: