Using the Cisco UCS Provider for Proactive High Availability (HA)

Cisco UCS Provider for Proactive HA

The Cisco UCS Provider for Proactive HA feature allows the system to assess the health of the server running the ESXi host. It assesses whether the server is healthy, moderately degraded, or severely degraded. Any fault from the Cisco-approved predefined list of faults that occurs with critical or major severity is reported to vCenter. For more information on the Proactive HA feature and providers, see the VMware documentation.
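As a rough illustration of the reporting rule described above, the following Python sketch checks a fault against a predefined list and a severity filter. The names APPROVED_FAULT_IDS, Fault, and should_report are hypothetical and exist only for this example; they are not part of the Cisco UCS plug-in or of any vSphere API, and the fault list shown is only a small subset of Table 1.

```python
from dataclasses import dataclass

# Illustrative subset of the predefined fault list (Table 1 has the full list).
APPROVED_FAULT_IDS = {"F0185", "F0373", "F0374"}

# Severities that this chapter says are reported to vCenter.
REPORTED_SEVERITIES = {"critical", "major"}


@dataclass(frozen=True)
class Fault:
    fault_id: str
    severity: str


def should_report(fault: Fault) -> bool:
    """Return True when the fault is on the approved list and is critical or major."""
    return (
        fault.fault_id in APPROVED_FAULT_IDS
        and fault.severity.lower() in REPORTED_SEVERITIES
    )
```

A fault that is on the list but has a lower severity, or a critical fault that is not on the list, is not reported in this sketch.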

Prerequisites and User Privileges

To use the Cisco UCS Provider for Proactive HA, we recommend that you enable the following:

  • vSphere DRS

  • Proactive HA

You must have the following privileges to use the Cisco UCS Provider for Proactive HA:

  • Health Update Provider

    • Register

    • Unregister

    • Update

  • Host

    • Inventory

      • Modify Cluster

    • Configuration

      • Quarantine

      • Maintenance

  • Storage Views

    • View
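The privilege list above can be captured as data so that a role can be checked before registration is attempted. This is a minimal sketch: the strings are the UI labels from the list above joined with " > ", not vSphere API privilege identifiers, and missing_privileges is a hypothetical helper.

```python
# Privilege labels copied from the list above (UI labels, not API identifiers).
REQUIRED_PRIVILEGES = frozenset({
    "Health Update Provider > Register",
    "Health Update Provider > Unregister",
    "Health Update Provider > Update",
    "Host > Inventory > Modify Cluster",
    "Host > Configuration > Quarantine",
    "Host > Configuration > Maintenance",
    "Storage Views > View",
})


def missing_privileges(granted):
    """Return the required privilege labels absent from `granted`, sorted."""
    return sorted(REQUIRED_PRIVILEGES - set(granted))
```

An empty result means that the role carries every privilege listed in this section.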

Registering a Cisco UCS Provider

Procedure


Step 1

Launch the vSphere HTML Client.

Step 2

From the Shortcuts page, launch the Cisco UCS plug-in.

Step 3

Click the Proactive HA Registration tab.

Step 4

In the Register Cisco UCS Provider area, enter the following:

Name      Description
--------  ---------------------------
Username  Enter the vCenter username.
Password  Enter the vCenter password.

Note 

To update the vCenter credentials for the Cisco UCS Provider, enter the new password and click Update.

Step 5

Click Register.

The Cisco UCS Provider becomes visible once the UCS domains that manage all the hosts in the cluster are registered.

Important 

To upgrade the registered Cisco UCS Manager plug-in, unregister the Cisco UCS Provider for Proactive HA, upgrade the registered plug-in, and then register the Cisco UCS Provider for Proactive HA again. For more information on how to unregister the Cisco UCS Provider and upgrade the plug-in, see Unregistering a Cisco UCS Provider and Upgrading Cisco UCS Manager Plug-In for vSphere HTML Client.
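The visibility condition noted in Step 5 amounts to a check over every host in the cluster. The sketch below assumes a simple mapping from host name to managing UCS domain; provider_visible and the host and domain names in the usage lines are hypothetical and do not come from the plug-in.

```python
def provider_visible(host_domains, registered_domains):
    """Return True when every host's managing UCS domain is registered.

    host_domains: mapping of host name -> name of the UCS domain that manages it.
    registered_domains: collection of registered UCS domain names.
    An empty cluster trivially satisfies the check in this sketch.
    """
    registered = set(registered_domains)
    return all(domain in registered for domain in host_domains.values())


# Hypothetical cluster with two hosts managed by two different domains.
hosts = {"esxi-01": "ucs-domain-a", "esxi-02": "ucs-domain-b"}
partially_registered = provider_visible(hosts, ["ucs-domain-a"])
fully_registered = provider_visible(hosts, ["ucs-domain-a", "ucs-domain-b"])
```

In this sketch, one unregistered domain is enough to keep the provider hidden for the whole cluster.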


Enabling Cisco UCS Provider

Before you begin

  • Enable vSphere DRS.

  • From vSphere Availability, enable Proactive HA.

  • Register all the UCS domains that manage the hosts in the cluster. To register the domains, see Registering the UCS Domains.

Procedure


Step 1

Click Hosts and Clusters > Cluster > Configure > vSphere Availability > Proactive HA > Edit.

Step 2

On the Proactive HA Failures and Responses tab, complete the following:

Name                             Description
-------------------------------  ----------------------------------------------------
Automation Level drop-down list  Whether to migrate the VMs automatically or manually
                                 if a host fails. This can be one of the following:
                                   • Manual
                                   • Automated
                                 We recommend that you select the Automated level.

Remediation drop-down list       The action to be taken depending on the severity of
                                 the failure. This can be one of the following:
                                   • Quarantine mode for all failures
                                   • Quarantine mode for moderate and Maintenance mode
                                     for severe failures (Mixed)
                                   • Maintenance mode for all failures
                                 We recommend that you select Mixed mode.

Step 3

From the list, check the Cisco UCS Provider check box, and click OK.
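The three Remediation options in Step 2 differ only in how they treat moderate and severe failures. The following Python sketch models that mapping; Severity, Remediation, and host_action are hypothetical names used for illustration and are not vSphere API types.

```python
from enum import Enum


class Severity(Enum):
    MODERATE = "moderate"
    SEVERE = "severe"


class Remediation(Enum):
    QUARANTINE = "Quarantine mode for all failures"
    MIXED = "Mixed"
    MAINTENANCE = "Maintenance mode for all failures"


def host_action(remediation: Remediation, severity: Severity) -> str:
    """Return the mode a degraded host is placed in for the chosen remediation."""
    if remediation is Remediation.QUARANTINE:
        return "quarantine"
    if remediation is Remediation.MAINTENANCE:
        return "maintenance"
    # Mixed: quarantine for moderate failures, maintenance for severe ones.
    return "maintenance" if severity is Severity.SEVERE else "quarantine"
```

With the recommended Mixed mode, only severe failures place the host in maintenance mode.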


Unregistering a Cisco UCS Provider

Procedure


Step 1

Launch the vSphere HTML Client.

Step 2

From the Shortcuts page, launch the Cisco UCS plug-in.

Step 3

Click the Proactive HA Registration tab.

Step 4

Click Unregister.


Modifying Cisco UCS Failure Conditions

Before you begin

  • Enable vSphere DRS.

  • From vSphere Availability, enable Proactive HA.

  • Register all the UCS domains that manage the hosts in the cluster. To register the domains, see Registering the UCS Domains.

Procedure


Step 1

Click Hosts and Clusters > Cluster > Configure > vSphere Availability > Proactive HA > Edit.

Step 2

From the list of providers under the Providers tab, check the Cisco UCS Provider check box, and click Edit.

A list of Cisco UCS Provider failure conditions appears.

Step 3

To block a failure condition on a host in the cluster, check the failure condition and the associated host check box.

Step 4

To select all current and future hosts in the cluster, check the Cluster-level check box.

Step 5

Click OK.
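The difference between blocking a failure condition on individual hosts (Step 3) and at the cluster level (Step 4) can be sketched as a small data model. BlockedConditions and its methods are hypothetical; the sketch only illustrates that a cluster-level block also covers hosts added later.

```python
from dataclasses import dataclass, field


@dataclass
class BlockedConditions:
    # Fault IDs blocked for every current and future host (Cluster-level check box).
    cluster_level: set = field(default_factory=set)
    # Fault IDs blocked per host: host name -> set of fault IDs.
    per_host: dict = field(default_factory=dict)

    def block(self, fault_id, host=None):
        """Block a failure condition on one host, or cluster-wide when host is None."""
        if host is None:
            self.cluster_level.add(fault_id)
        else:
            self.per_host.setdefault(host, set()).add(fault_id)

    def is_blocked(self, fault_id, host):
        """Return True when the condition is blocked for this host."""
        return (
            fault_id in self.cluster_level
            or fault_id in self.per_host.get(host, set())
        )
```

Because the cluster-level set is consulted for any host name, a host that joins the cluster later is covered without a further change.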


List of Cisco UCS Provider Failure Conditions for Proactive HA

Table 1. Fault Conditions in Cisco UCS Provider

Sl. No.  Fault ID  Component Type  Description
-------  --------  --------------  --------------------------------------------------------------------------
1.       F0190     Memory          Memory array voltage exceeds the specified hardware voltage
2.       F0539     Network         IO controller temperature is outside the upper or lower critical threshold
3.       F0185     Memory          Memory Unit Inoperable
4.       F0313     Power           Compute Physical BIOS POST Timeout
5.       F0317     Storage         Compute Physical Inoperable
6.       F0373     Fan             Equipment Fan Inoperable
7.       F0374     Power           Equipment PSU Inoperable
8.       F0484     Fan             Equipment Fan Performance Threshold Lower Non Recoverable
9.       F0187     Memory          Memory Unit Thermal Threshold Critical
10.      F0188     Memory          Memory Unit Thermal Threshold Non Recoverable
11.      F0312     Storage         Compute Physical Thermal Problem
12.      F0382     Fan             Equipment Fan Module Thermal Threshold Critical
13.      F0384     Fan             Equipment Fan Module Thermal Threshold Non Recoverable
14.      F0383     Power           Equipment PSU Thermal Threshold Critical
15.      F0385     Storage         Equipment PSU Thermal Threshold Non Recoverable
16.      F0540     Network         Compute IOHub Thermal Threshold Non Recoverable
17.      F0191     Memory          Memory Array Voltage Threshold Non Recoverable
18.      F0389     Power           Equipment PSU Voltage Threshold Critical
19.      F0391     Power           Equipment PSU Voltage Threshold Non Recoverable
20.      F0425     Power           Compute Board CMOS Voltage Threshold Non Recoverable
21.      F0310     Power           Compute Board Power Error
22.      F0311     Power           Compute Physical Power Problem
23.      F0369     Power           Equipment PSU Power Supply Problem
24.      F37600    Memory          Memory temperature beyond threshold
25.      F35962    Power           Motherboard power consumption beyond threshold
26.      F0174     Power           Processor is inoperable
27.      F0181     Power           Local disk has become inoperable
28.      F1004     Power           Storage controller is inaccessible
29.      F0209     Network         Network facing adapter interface is down
30.      F1007     Power           Virtual drive has become inoperable
31.      F1706     Memory          ADDDC Memory RAS Problem
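For scripting or reporting, the fault IDs and component types in Table 1 can be held in a lookup table. The following Python sketch transcribes those two columns as listed; FAULT_COMPONENTS, faults_by_component, and component_counts are hypothetical names and are not provided by the plug-in.

```python
from collections import Counter

# Fault ID -> component type, transcribed from Table 1.
FAULT_COMPONENTS = {
    "F0190": "Memory", "F0539": "Network", "F0185": "Memory",
    "F0313": "Power", "F0317": "Storage", "F0373": "Fan",
    "F0374": "Power", "F0484": "Fan", "F0187": "Memory",
    "F0188": "Memory", "F0312": "Storage", "F0382": "Fan",
    "F0384": "Fan", "F0383": "Power", "F0385": "Storage",
    "F0540": "Network", "F0191": "Memory", "F0389": "Power",
    "F0391": "Power", "F0425": "Power", "F0310": "Power",
    "F0311": "Power", "F0369": "Power", "F37600": "Memory",
    "F35962": "Power", "F0174": "Power", "F0181": "Power",
    "F1004": "Power", "F0209": "Network", "F1007": "Power",
    "F1706": "Memory",
}


def faults_by_component(component_type):
    """Return the sorted fault IDs listed under one component type."""
    return sorted(
        fault_id
        for fault_id, component in FAULT_COMPONENTS.items()
        if component == component_type
    )


def component_counts():
    """Count how many fault conditions Table 1 lists per component type."""
    return Counter(FAULT_COMPONENTS.values())
```

For example, faults_by_component("Fan") returns the four fan-related fault IDs from the table.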

Adding Custom Faults for Proactive HA Monitoring

Before you begin

Perform the following steps only if the Cisco UCS Provider for Proactive HA is already registered:

  • Disable the HA provider and turn off Proactive HA from vSphere Availability in the cluster settings.

  • Unregister the Cisco UCS Provider for Proactive HA. For instructions, see Unregistering a Cisco UCS Provider.

Procedure


Step 1

Launch the vSphere HTML Client.

Step 2

From the Shortcuts page, launch the Cisco UCS plug-in.

Step 3

Click the Proactive HA Registration tab.

Step 4

In the Fault Monitoring Details area, enter the following:

Name            Description
--------------  ---------------------------
ID              Fault code
Description     Description of the fault
Component Type  Component type of the fault

Step 5

Click ADD.

Important 

The ADD button is enabled only when the Cisco UCS Provider for Proactive HA is unregistered. To unregister the provider, see Unregistering a Cisco UCS Provider.

Step 6

Register the Cisco UCS Provider for Proactive HA. For instructions, see Registering a Cisco UCS Provider.

Step 7

Turn on Proactive HA and enable the HA provider from vSphere Availability in the cluster settings.
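The ordering constraint in this procedure (custom faults can be added only while the provider is unregistered) can be sketched as a small state model. CustomFault and ProviderState are hypothetical names; the fields mirror the ID, Description, and Component Type entries in the Fault Monitoring Details area.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CustomFault:
    fault_code: str      # "ID" field
    description: str     # "Description" field
    component_type: str  # "Component Type" field


@dataclass
class ProviderState:
    registered: bool = True
    custom_faults: list = field(default_factory=list)

    def add_fault(self, fault: CustomFault) -> None:
        """Add a custom fault; mirrors ADD being enabled only when unregistered."""
        if self.registered:
            raise RuntimeError(
                "Unregister the Cisco UCS Provider before adding faults"
            )
        self.custom_faults.append(fault)
```

In this sketch, calling add_fault while registered is True raises an error, which corresponds to the disabled ADD button; setting registered back to True afterward corresponds to Step 6.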