HA Issues

HA Issues with Initial Setup

The following table lists issues that you could encounter with the initial setup of HA:

Environment Issue Resolution
ISO installation

I/O error on shared LUN

Ignore this error. Installation proceeds as normal after you click Ignore.

Validating first node

No shared storage devices detected during first node installation

Verify that the RDM disk (shared disk) was added with the specified configuration.

Failed to write on disk

Validating second node

Peer node unreachable

  • Verify that installation is complete on the first node.

  • Verify network connectivity between the two nodes.

Expected shared storage device not found

Verify that the same shared storage device (the same LUN) is configured on both nodes.

Node not added to the cluster

  • Verify that the IP address configured on the second node matches the peer node IP address entered during first node setup.

  • Verify that the username and password for the peer node are correct.

  • Verify that both nodes run the same version of Cisco UCS Central.

HA Issues with NFS

The following table lists issues that you could encounter with NFS:

Issue Resolution

Using NFS shared storage for HA

  • You can mount the NFS point only by using an IPv4 address.

  • During restore, you cannot configure RDM if the backup was taken on NFS HA.

  • You cannot switch back to RDM from NFS.

  • In the tech support files, sharedStorage.txt contains the results of the performance diagnostic on the NFS server that you are using as shared storage.
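The IPv4-only restriction in the first item can be checked before you configure the mount. The following is a minimal Python sketch and is not part of Cisco UCS Central; the function name is_ipv4_literal is hypothetical and is used only for illustration.

```python
import ipaddress


def is_ipv4_literal(host: str) -> bool:
    """Return True only if host is a literal IPv4 address.

    Hypothetical helper: hostnames and IPv6 addresses return False.
    """
    try:
        ipaddress.IPv4Address(host)
        return True
    except ValueError:
        # AddressValueError is a subclass of ValueError.
        return False
```

A value that fails this check, such as a hostname or an IPv6 address, would need to be replaced with the IPv4 address of the NFS server.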

Boot failures

If Cisco UCS Central goes down because of an ungraceful shutdown or an unexpected reboot, it could fail to boot due to file system errors.

Contact Cisco TAC for help recovering from a file system error.

Firewall Issues

When Cisco UCS Manager is registered with Cisco UCS Central, the NFS mount definition MO is disabled. When you move Cisco UCS Manager from an ungrouped domain to a domain group root or a subgroup, the NFS mount is enabled so that you can mount the following partitions:

  • /bootflash/images: Stores the firmware images that are copied to Cisco UCS domains.

  • /bootflash/cfg: Stores the Cisco UCS Manager scheduled configuration and full state backups.

This mount can fail for several reasons, one of which is that the required NFS ports are not open between Cisco UCS Manager and Cisco UCS Central. The following TCP ports must be open between Cisco UCS Central and a registered Cisco UCS domain for the firmware management and backup functionality to work correctly:

  • LOCKD_TCPPORT=32803

  • MOUNTD_PORT=892

  • RQUOTAD_PORT=875

  • STATD_PORT=32805

  • NFS_PORT="nfs"(2049)

  • RPC_PORT="sunrpc"(111)
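One way to test whether these ports are reachable is to attempt a TCP connection to each one. The following is a minimal Python sketch under stated assumptions: the function names tcp_port_open and check_required_ports are hypothetical, and the script is not a Cisco tool. A successful connection shows only that the port is reachable from the host where the script runs, which might differ from what Cisco UCS Manager sees through the firewall.

```python
import socket

# TCP ports taken from the list above; keys mirror the list entries.
REQUIRED_PORTS = {
    "LOCKD_TCPPORT": 32803,
    "MOUNTD_PORT": 892,
    "RQUOTAD_PORT": 875,
    "STATD_PORT": 32805,
    "NFS_PORT": 2049,
    "RPC_PORT": 111,
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_required_ports(host, timeout=3.0):
    """Map each required port name to True (reachable) or False."""
    return {
        name: tcp_port_open(host, port, timeout)
        for name, port in REQUIRED_PORTS.items()
    }
```

Calling check_required_ports with the IP address of Cisco UCS Central returns a dictionary keyed by port name. Any entry that is False points to a port worth checking in the firewall rules.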

Cisco UCS Manager in lost visibility state

Sometimes Cisco UCS Manager loses visibility to Cisco UCS Central because of communication or network failures. Check the FSM status under the control-ep policy and fix the problem. The NFS mounts recover automatically.

Internal NFS server issues

Sometimes, all of the communication channels look fine, but you observe NFS mount failures on Cisco UCS Manager.

In this case, the dcos AG logs look similar to the following:

mount: 10.193.190.211:/bootflash/cfg/10.193.23.70 failed, reason given by server: Permission denied
+ '[' 32 -ne 0 ']'

Contact Cisco TAC for help.
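When you gather information for the TAC case, it can help to extract the failing mount lines from the logs. The following is a minimal Python sketch that assumes the log lines follow the format of the sample above; the function name find_mount_failures is hypothetical, and the actual log format might vary between releases.

```python
import re

# Pattern inferred from the single sample line shown above (an assumption).
MOUNT_FAILURE = re.compile(
    r"^mount:\s+(?P<server>[^:\s]+):(?P<path>\S+)\s+failed, "
    r"reason given by server:\s+(?P<reason>.+)$",
    re.MULTILINE,
)


def find_mount_failures(log_text):
    """Return a list of (server, path, reason) tuples for failed mounts."""
    return [
        (m.group("server"), m.group("path"), m.group("reason").strip())
        for m in MOUNT_FAILURE.finditer(log_text)
    ]
```

Each tuple gives the NFS server address, the export path, and the reason reported by the server, which are the details shown in the sample log line.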