GPU Management

Overview

GPUs are widely used for high-performance computing and graphics processing. The BMC monitors GPU health indicators, such as temperature, to help prevent overheating or malfunction under heavy computational loads, which protects the reliability and longevity of the hardware.

Monitored and Controlled Features

The BMC supports the following GPU monitoring and control functions:

  • Monitor GPU temperature

  • Monitor current GPU power consumption

  • Monitor the temperature of components on the GPU board

  • Monitor the power consumption of components on the GPU board

  • Display the version of components on the GPU board

  • Remotely update the GPU firmware and the component firmware on the GPU board

Configuring GPU Date and Time Settings


Note


This option is available only on some Cisco UCS C885A M8 Rack Server configurations.


Procedure


Step 1

From the Navigation Pane, select Settings > Date and time.

Step 2

Select the GPU tab.

Step 3

Under Configure Settings, choose one of the following options:

  • Manual

  • Set GPU Datetime to be the same as BMC Datetime

Step 4

For Manual, update the following properties:

Name                        Description
Date field                  Enter the date in YYYY-MM-DD format.
24-hour time (UTC) field    Enter the time in HH:MM format.

Step 5

Alternatively, select Set GPU Datetime to be the same as BMC Datetime to import the date and time settings from the BMC automatically.

Step 6

Click Set.
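If the Manual values are prepared outside the web UI (for example, in a script or a change record), they can be checked against the formats listed in Step 4 before being entered. The following is a minimal Python sketch under that assumption. The function name parse_gpu_datetime is hypothetical and is not part of the BMC; requiring zero-padded values is also an assumption about how the fields should be filled in.

```python
from datetime import datetime, timezone


def parse_gpu_datetime(date_text: str, time_text: str) -> datetime:
    """Check the Manual fields (hypothetical helper, not a BMC API).

    date_text must look like YYYY-MM-DD and time_text like HH:MM (24-hour).
    Returns a timezone-aware UTC datetime, or raises ValueError.
    """
    # strptime rejects impossible values such as 2025-02-30 or 24:00.
    parsed = datetime.strptime(f"{date_text} {time_text}", "%Y-%m-%d %H:%M")

    # strptime tolerates non-padded input such as "2025-3-14", so compare
    # the re-formatted value with the input to enforce the exact layout.
    if parsed.strftime("%Y-%m-%d") != date_text:
        raise ValueError(f"date must be in YYYY-MM-DD format: {date_text!r}")
    if parsed.strftime("%H:%M") != time_text:
        raise ValueError(f"time must be in HH:MM format: {time_text!r}")

    # The time field is labeled as UTC, so attach the UTC timezone.
    return parsed.replace(tzinfo=timezone.utc)
```

For example, parse_gpu_datetime("2025-03-14", "09:30") returns a UTC datetime, while "2025-3-14" or "24:00" raises ValueError.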


Viewing GPU FRU Information

Procedure


Step 1

From the Navigation Pane, select Hardware Status > Inventory and LEDs.

Step 2

Select the GPU tab.

Step 3

Under FRU Assembly, you can view the following properties:

Name                Description
Model               Displays the GPU model.
Name                Displays the GPU name.
Part Number         Lists the part number associated with the GPU.
Physical Context    Describes the physical context or placement of the GPU.
Serial Number       Displays the serial number of the GPU.
Vendor              Identifies the vendor or manufacturer of the GPU.

Step 4

Under Versions, you can view the following properties:

Name              Description
Name column       Identifies the component or software related to the GPU.
Version column    Shows the version number associated with the component or software.


Viewing GPU Power Configuration

Procedure


Step 1

From the Navigation Pane, select Resource Management > Power.

Step 2

Select the GPU tab.

Step 3

You can view the following properties:

Name                        Description
Name column                 Identifies the GPU.
Power Consumption column    Displays the current power usage.
Power Cap column            Indicates the maximum power limit set for the GPU.


Applying GPU Power Cap

Procedure


Step 1

From the Navigation Pane, select Resource Management > Power.

Step 2

Check the Apply power cap check box.

Step 3

In the Power cap value (in watts) field, enter a value between 200 and 750.

Step 4

Click Save.
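The range given in Step 3 can be checked before a value is entered. The following is a minimal Python sketch; the function name parse_power_cap and the constant names are hypothetical, and treating the limits as inclusive whole-watt values is an assumption, because the procedure states only "between 200 and 750".

```python
# Limits taken from Step 3; inclusive bounds are an assumption.
MIN_POWER_CAP_W = 200
MAX_POWER_CAP_W = 750


def parse_power_cap(text: str) -> int:
    """Check a power cap value in watts (hypothetical helper, not a BMC API).

    Returns the value as an int, or raises ValueError if the text is not
    a whole number or falls outside the documented range.
    """
    watts = int(text)  # raises ValueError for input such as "abc" or "300.5"
    if not MIN_POWER_CAP_W <= watts <= MAX_POWER_CAP_W:
        raise ValueError(
            f"power cap must be between {MIN_POWER_CAP_W} and "
            f"{MAX_POWER_CAP_W} watts, got {watts}"
        )
    return watts
```

For example, parse_power_cap("500") returns 500, while "199", "751", and "300.5" raise ValueError.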


Updating GPU Firmware

Before you begin

Ensure that the firmware file is available on the client before starting this procedure.
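One way to confirm this prerequisite on the client is a quick file check before opening the web UI. The following is a minimal Python sketch; the function name check_firmware_file is hypothetical, and it verifies only that the path points to a non-empty regular file. It does not validate the firmware image itself.

```python
from pathlib import Path


def check_firmware_file(path: str) -> Path:
    """Confirm that a firmware file is present locally (hypothetical helper).

    Returns the resolved path, or raises FileNotFoundError / ValueError.
    """
    candidate = Path(path).expanduser()
    if not candidate.is_file():
        raise FileNotFoundError(f"firmware file not found: {candidate}")
    if candidate.stat().st_size == 0:
        raise ValueError(f"firmware file is empty: {candidate}")
    return candidate.resolve()
```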

Procedure


Step 1

From the Navigation Pane, select Operations > Firmware update.

Step 2

From the Device drop-down list, select GPU.

Step 3

Click Add File, browse to the firmware file, and select it.

Step 4

Click Start Update to initiate the firmware update.


What to do next

After the firmware update completes, perform an AC power cycle to activate and complete the GPU upgrade.