Using the Cisco UCS Provider for Proactive High Availability (HA)

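This chapter includes the following sections:

  • Cisco UCS Provider for Proactive HA

  • Prerequisites and User Privileges

  • Registering Cisco UCS Manager Provider and Enabling Proactive HA Feature

  • Enabling Cisco UCS Manager Provider

  • Unregistering a UCS Manager Provider

  • Modifying Cisco UCS Failure Conditions

  • List of Cisco UCS Provider Failure Conditions for Proactive HA

  • Adding Custom Faults for Proactive HA Monitoring

  • Deleting Custom Faults for Proactive HA Monitoring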

Cisco UCS Provider for Proactive HA

The Cisco UCS Provider for Proactive HA feature allows the system to assess the health of the server running the ESXi host and determine whether the server is healthy, moderately degraded, or severely degraded. Any fault of critical or major severity from the Cisco-approved predefined list of faults is reported to vCenter. For more information on the Proactive HA feature and providers, see the VMware documentation.
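
The reporting rule above can be read as a filter: a fault reaches vCenter only if its code is on the predefined list and its severity is critical or major. The following Python sketch illustrates that rule under stated assumptions. The `Fault` type, the function names, and the short code subset are hypothetical, and the mapping of critical faults to "severely degraded" and major faults to "moderately degraded" is an assumption, not documented provider behavior.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative subset of codes from the predefined list in this chapter.
APPROVED_FAULT_CODES = {"F0185", "F0373", "F0374", "F1706"}

# Per the text above, only critical and major faults are reported.
REPORTED_SEVERITIES = {"critical", "major"}


@dataclass(frozen=True)
class Fault:
    code: str      # UCS fault code, for example "F0185"
    severity: str  # for example "critical", "major", "minor"


def should_report(fault: Fault) -> bool:
    """True if the fault is on the approved list and severe enough to report."""
    return (fault.code in APPROVED_FAULT_CODES
            and fault.severity.lower() in REPORTED_SEVERITIES)


def host_health(faults: list[Fault]) -> str:
    """Roll reported faults up into a single health state.

    ASSUMPTION: critical -> severely degraded, major -> moderately degraded.
    """
    reported = [f for f in faults if should_report(f)]
    if any(f.severity.lower() == "critical" for f in reported):
        return "severely degraded"
    if reported:
        return "moderately degraded"
    return "healthy"


print(host_health([Fault("F0185", "major")]))  # moderately degraded
```

Faults outside the list, or below major severity, leave the host "healthy" in this model.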

Prerequisites and User Privileges

To use the Cisco UCS Provider for Proactive HA, we recommend that you enable the following:

  • vSphere DRS

  • Proactive HA

You must have the following privileges to use the Cisco UCS Provider for Proactive HA:

  • Health Update Provider

    • Register

    • Unregister

    • Update

  • Host

    • Inventory

      • Modify Cluster

    • Configuration

      • Quarantine

      • Maintenance

  • Storage Views

    • View
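
If you audit a role before registering the provider, the privilege list above reduces to a set difference. This is a minimal sketch: the privilege strings reuse this document's labels rather than the internal vSphere privilege IDs, and `missing_privileges` is a hypothetical helper, not part of any VMware or Cisco API.

```python
from __future__ import annotations

# Labels copied from the privilege list above (not internal vSphere privilege IDs).
REQUIRED_PRIVILEGES = frozenset({
    "Health Update Provider > Register",
    "Health Update Provider > Unregister",
    "Health Update Provider > Update",
    "Host > Inventory > Modify Cluster",
    "Host > Configuration > Quarantine",
    "Host > Configuration > Maintenance",
    "Storage Views > View",
})


def missing_privileges(granted: set[str]) -> list[str]:
    """Return the required privileges absent from `granted`, sorted for stable output."""
    return sorted(REQUIRED_PRIVILEGES - granted)


# A role that only has Storage Views > View is missing the other six privileges.
print(missing_privileges({"Storage Views > View"}))
```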

Registering Cisco UCS Manager Provider and Enabling Proactive HA Feature

Before you begin

  • The default username for the Cisco UCS Manager Plug-in Appliance Web UI is admin.

Procedure


Step 1

Log in to the Cisco UCS Manager Plug-in Appliance Web UI. Enter admin as the username and provide the password.

The list of registered VMware vCenter servers is displayed in the Registered VMware vCenter Servers table.

Step 2

Click Register.

The Add VMware vCenter Servers pop-up window is displayed.

Step 3

Enter the required information for the VMware vCenter server in the Add VMware vCenter Servers pop-up window.

  1. Enter the following details:

    Field            Description
    FQDN/Server IP   FQDN or IP address of the VMware vCenter server.
    Port             The port to use for communication. The default is 443.
    Username         User name for the VMware vCenter server.
    Password         Password for the VMware vCenter server.

  2. Check the Proactive HA check box to enable Proactive HA on the vCenter server.

  3. Click Next.

    The vCenter server details are validated and added to the Registered VMware vCenter Servers table.

    The Proactive HA Status field displays Enabled for the vCenter server.

Alternatively, you can enable Proactive HA on a VMware vCenter server by performing the following steps:

Step 4

Launch the vSphere HTML Client.

Step 5

Click the Proactive HA Registration tab.

Step 6

In the Register Cisco UCS Manager Provider area, select the VMware vCenter server.

Step 7

Click Register.
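
The fields in the Add VMware vCenter Servers window (Step 3) can be modeled as a small record with the documented default port of 443. This sketch checks only that the form is complete; the class and function names are hypothetical, the host name and credentials are placeholders, and the validation that the appliance performs when you click Next is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass

DEFAULT_VCENTER_PORT = 443  # default shown in the Add VMware vCenter Servers window


@dataclass
class VCenterRegistration:
    server: str                   # FQDN or IP address of the vCenter server
    username: str
    password: str
    port: int = DEFAULT_VCENTER_PORT
    proactive_ha: bool = False    # state of the Proactive HA check box


def validate(reg: VCenterRegistration) -> list[str]:
    """Return a list of problems; an empty list means every field is filled in."""
    errors = []
    if not reg.server.strip():
        errors.append("FQDN/Server IP is required")
    if not 1 <= reg.port <= 65535:
        errors.append("Port must be between 1 and 65535")
    if not reg.username:
        errors.append("Username is required")
    if not reg.password:
        errors.append("Password is required")
    return errors


reg = VCenterRegistration("vcenter.example.com", "administrator@vsphere.local",
                          "example-password", proactive_ha=True)
print(reg.port, validate(reg))  # 443 []
```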


Enabling Cisco UCS Manager Provider

Procedure


Step 1

Click Hosts and Clusters > Cluster > Configure > vSphere Availability > Proactive HA > Edit.

Step 2

On the Proactive HA Failures and Responses tab, complete the following:

Automation Level drop-down list

    Whether VMs are migrated automatically or manually when a host failure occurs. This can be one of the following:

      • Manual

      • Automated

    We recommend that you select Automated.

Remediation drop-down list

    The action to take, depending on the severity of the failure. This can be one of the following:

      • Quarantine mode for all failures

      • Quarantine mode for moderate and Maintenance mode for severe failures (Mixed)

      • Maintenance mode for all failures

    We recommend that you select Mixed mode.

Step 3

From the list, check the Cisco UCS Manager Provider check box, and click OK.
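
The Remediation options in Step 2 amount to a mapping from failure severity to host mode. The sketch below illustrates that mapping only; the enum, the function name, and the health-state strings are assumptions, and in practice vSphere, not the provider, applies the remediation.

```python
from enum import Enum
from typing import Optional


class Remediation(Enum):
    QUARANTINE_ALL = "Quarantine mode for all failures"
    MIXED = "Quarantine mode for moderate and Maintenance mode for severe failures"
    MAINTENANCE_ALL = "Maintenance mode for all failures"


def host_action(remediation: Remediation, health: str) -> Optional[str]:
    """Return the mode a degraded host is placed in, or None if it is healthy.

    `health` is assumed to be "healthy", "moderately degraded", or
    "severely degraded".
    """
    if health == "healthy":
        return None
    if remediation is Remediation.QUARANTINE_ALL:
        return "quarantine"
    if remediation is Remediation.MAINTENANCE_ALL:
        return "maintenance"
    # Mixed: moderate failures -> quarantine, severe failures -> maintenance.
    return "maintenance" if health == "severely degraded" else "quarantine"


print(host_action(Remediation.MIXED, "severely degraded"))  # maintenance
```

Mixed mode is the only option whose outcome depends on severity, which is why it balances disruption against risk.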


Unregistering a UCS Manager Provider

Before you begin

Before you unregister, perform the following step if the Proactive HA provider is enabled:

  • Disable the HA provider and turn off Proactive HA under vSphere Availability on the vCenter cluster Configure page.

Procedure


Step 1

Launch the vSphere HTML Client.

Step 2

Click the Proactive HA Registration tab.

Step 3

Click Unregister.
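
The prerequisite for unregistering imposes an order: disable the provider and turn off Proactive HA first, then unregister. A minimal sketch of that guard follows; `ClusterHaState` and `unregister` are hypothetical names and do not call any vSphere API.

```python
from dataclasses import dataclass


@dataclass
class ClusterHaState:
    provider_enabled: bool   # Cisco UCS provider selected under Proactive HA
    proactive_ha_on: bool    # Proactive HA turned on under vSphere Availability
    registered: bool = True  # provider registered from the Cisco UCS plug-in


def unregister(state: ClusterHaState) -> None:
    """Unregister only after the provider is disabled and Proactive HA is off."""
    if state.provider_enabled or state.proactive_ha_on:
        raise RuntimeError(
            "Disable the HA provider and turn off Proactive HA before unregistering"
        )
    state.registered = False


state = ClusterHaState(provider_enabled=False, proactive_ha_on=False)
unregister(state)
print(state.registered)  # False
```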


Modifying Cisco UCS Failure Conditions

Procedure


Step 1

Click Hosts and Clusters > Cluster > Configure > vSphere Availability > Proactive HA > Edit.

Step 2

On the Providers tab, check the Cisco UCS Provider check box in the list of providers, and click Edit.

A list of Cisco UCS Provider failure conditions appears.

Step 3

To block a failure condition on a host in the cluster, check the check boxes for the failure condition and the associated host.

Step 4

To select all current and future hosts in the cluster, check the Cluster-level check box.

Step 5

Click OK.
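
Steps 3 and 4 combine per-host blocks with cluster-level blocks that also cover hosts added later. One way to model that is sketched below; the class and method names are hypothetical, the host names are placeholders, and keying failure conditions by fault code is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FailureConditionBlocks:
    # Failure condition (fault code) -> hosts on which it is blocked.
    per_host: dict[str, set[str]] = field(default_factory=dict)
    # Failure conditions blocked at cluster level (current and future hosts).
    cluster_level: set[str] = field(default_factory=set)

    def block(self, condition: str, host: str) -> None:
        self.per_host.setdefault(condition, set()).add(host)

    def block_cluster_wide(self, condition: str) -> None:
        self.cluster_level.add(condition)

    def is_blocked(self, condition: str, host: str) -> bool:
        """A cluster-level block also covers hosts added after it was set."""
        return (condition in self.cluster_level
                or host in self.per_host.get(condition, set()))


blocks = FailureConditionBlocks()
blocks.block("F0373", "esxi-01")
blocks.block_cluster_wide("F0374")
print(blocks.is_blocked("F0374", "host-added-later"))  # True
```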


List of Cisco UCS Provider Failure Conditions for Proactive HA

Table 1. Fault Conditions in Cisco UCS Provider

No.  Fault ID  Component Type  Description
1    F0190     Memory          Memory array voltage exceeds the specified hardware voltage
2    F0539     Network         IO controller temperature is outside the upper or lower critical threshold
3    F0185     Memory          Memory Unit Inoperable
4    F0313     Power           Compute Physical BIOS POST Timeout
5    F0317     Storage         Compute Physical Inoperable
6    F0373     Fan             Equipment Fan Inoperable
7    F0374     Power           Equipment PSU Inoperable
8    F0484     Fan             Equipment Fan Performance Threshold Lower Non Recoverable
9    F0187     Memory          Memory Unit Thermal Threshold Critical
10   F0188     Memory          Memory Unit Thermal Threshold Non Recoverable
11   F0312     Storage         Compute Physical Thermal Problem
12   F0382     Fan             Equipment Fan Module Thermal Threshold Critical
13   F0384     Fan             Equipment Fan Module Thermal Threshold Non Recoverable
14   F0383     Power           Equipment PSU Thermal Threshold Critical
15   F0385     Storage         Equipment PSU Thermal Threshold Non Recoverable
16   F0540     Network         Compute IOHub Thermal Threshold Non Recoverable
17   F0191     Memory          Memory Array Voltage Threshold Non Recoverable
18   F0389     Power           Equipment PSU Voltage Threshold Critical
19   F0391     Power           Equipment PSU Voltage Threshold Non Recoverable
20   F0425     Power           Compute Board CMOS Voltage Threshold Non Recoverable
21   F0310     Power           Compute Board Power Error
22   F0311     Power           Compute Physical Power Problem
23   F0369     Power           Equipment PSU Power Supply Problem
24   F37600    Memory          Memory temperature beyond threshold
25   F35962    Power           Motherboard power consumption beyond threshold
26   F0174     Power           Processor is inoperable
27   F0181     Power           Local disk has become inoperable
28   F1004     Power           Storage controller is inaccessible
29   F0209     Network         Network facing adapter interface is down
30   F1007     Power           Virtual drive has become inoperable
31   F1706     Memory          ADDDC Memory RAS Problem
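
If you script against these fault codes, the table can be kept as a lookup. The sketch below transcribes the Fault ID and Component Type columns of Table 1 into a dictionary and groups the codes by component type; the variable and function names are hypothetical.

```python
from collections import defaultdict

# Fault ID -> component type, transcribed from Table 1.
FAULT_COMPONENTS = {
    "F0190": "Memory", "F0539": "Network", "F0185": "Memory", "F0313": "Power",
    "F0317": "Storage", "F0373": "Fan", "F0374": "Power", "F0484": "Fan",
    "F0187": "Memory", "F0188": "Memory", "F0312": "Storage", "F0382": "Fan",
    "F0384": "Fan", "F0383": "Power", "F0385": "Storage", "F0540": "Network",
    "F0191": "Memory", "F0389": "Power", "F0391": "Power", "F0425": "Power",
    "F0310": "Power", "F0311": "Power", "F0369": "Power", "F37600": "Memory",
    "F35962": "Power", "F0174": "Power", "F0181": "Power", "F1004": "Power",
    "F0209": "Network", "F1007": "Power", "F1706": "Memory",
}


def codes_by_component() -> dict:
    """Group the predefined fault codes by component type."""
    groups = defaultdict(list)
    for code, component in FAULT_COMPONENTS.items():
        groups[component].append(code)
    return dict(groups)


print(sorted(codes_by_component()["Network"]))  # ['F0209', 'F0539', 'F0540']
```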

Adding Custom Faults for Proactive HA Monitoring

Before you begin

You must unregister Proactive HA in the domains before you add a custom fault. See Unregistering a UCS Manager Provider.

Before you unregister, perform the following step if the Proactive HA provider is registered:

  • Disable the HA provider and turn off Proactive HA under vSphere Availability on the vCenter cluster Configure page.

Procedure


Step 1

Launch the vSphere HTML Client.

Step 2

From the Shortcuts page, launch the Cisco UCS plug-in.

Step 3

Click the Proactive HA Registration tab.

Step 4

Click ADD.

Step 5

In the Fault Monitoring Details area, enter the following:

Name             Description
Fault Code       The fault code to monitor, for example F0185.
Description      A description of the fault.
Component Type   The component type of the fault, for example Memory.
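
The Fault Monitoring Details fields and the unregister-first prerequisite can be expressed as a guarded add operation. This is a sketch under assumptions: the "F plus digits" code pattern is inferred from Table 1, rejecting duplicate codes is an assumed rule, F9999 is a made-up code, and all names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# ASSUMPTION: custom codes follow the "F" + digits pattern seen in Table 1.
FAULT_CODE_PATTERN = re.compile(r"F\d+")


@dataclass(frozen=True)
class CustomFault:
    fault_code: str      # Fault Code field
    description: str     # Description field
    component_type: str  # Component Type field, for example "Memory"


def add_custom_fault(monitored: dict[str, CustomFault], fault: CustomFault,
                     provider_registered: bool) -> None:
    """Add a custom fault, enforcing the unregister-first prerequisite."""
    if provider_registered:
        raise RuntimeError("Unregister Proactive HA before adding a custom fault")
    if not FAULT_CODE_PATTERN.fullmatch(fault.fault_code):
        raise ValueError(f"unexpected fault code format: {fault.fault_code!r}")
    if fault.fault_code in monitored:  # ASSUMPTION: duplicates are rejected
        raise ValueError(f"{fault.fault_code} is already monitored")
    monitored[fault.fault_code] = fault


monitored: dict[str, CustomFault] = {}
# F9999 is a made-up code used only for illustration.
add_custom_fault(monitored, CustomFault("F9999", "Example fault", "Memory"),
                 provider_registered=False)
print(sorted(monitored))  # ['F9999']
```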


Deleting Custom Faults for Proactive HA Monitoring

Before you begin

You must unregister Proactive HA in the domains before you delete a custom fault. See Unregistering a UCS Manager Provider.

Before you unregister, perform the following step if the Proactive HA provider is registered:

  • Disable the HA provider and turn off Proactive HA under vSphere Availability on the vCenter cluster Configure page.

Procedure


Step 1

Launch the vSphere HTML Client.

Step 2

From the Shortcuts page, launch the Cisco UCS Manager plug-in.

Step 3

Click the Proactive HA Registration tab.

Step 4

Select the custom fault that you want to delete.

Step 5

Click Delete.

The confirmation pop-up window is displayed.
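
Deletion follows the same prerequisite and adds a confirmation step. In the sketch below, a callback stands in for the confirmation pop-up; the function name, the callback, the dictionary of custom faults, and the made-up code F9999 are hypothetical.

```python
from typing import Callable, Dict


def delete_custom_fault(custom_faults: Dict[str, str], fault_code: str,
                        provider_registered: bool,
                        confirm: Callable[[str], bool]) -> bool:
    """Delete a custom fault after confirmation; return True if it was removed."""
    if provider_registered:
        raise RuntimeError("Unregister Proactive HA before deleting a custom fault")
    if fault_code not in custom_faults:
        # ASSUMPTION: only custom faults can be selected for deletion.
        raise KeyError(fault_code)
    if not confirm(fault_code):  # stands in for the confirmation pop-up window
        return False
    del custom_faults[fault_code]
    return True


custom = {"F9999": "Example custom fault"}  # F9999 is a made-up code
print(delete_custom_fault(custom, "F9999", False, lambda code: True), custom)  # True {}
```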