Cisco UCS C885A M8 Dense GPU Server Getting Started Guide




1. About this document

This document serves as a "Getting Started" guide for the Cisco UCS® C885A M8 Rack Server, providing foundational technical instructions and procedures for initial setup and common maintenance tasks. It outlines processes for obtaining required software and firmware, performing updates for various server components, and configuring specific hardware elements.

Scope and assumptions

This guide covers the essential steps for bringing your Cisco UCS C885A M8 Rack Server into operation. Specifically, it addresses:

     Initial server preparation: performing firmware updates for server components (for example, BMC, BIOS, GPU, and FPGA) that can or must be done prior to installing an operating system.

     Host operating system installation: guidance on installing a supported host operating system.

     Post-OS host configurations: procedures for updating host-level components (for example, NVIDIA SuperNICs/DPUs and Intel® Ethernet Network Adapter X710 cards), configuring network interface modes, and installing necessary drivers and toolkits (for example, NVIDIA GPU drivers and NVIDIA CUDA Toolkit) from within the installed OS.

The procedures described require familiarity with the Linux command-line interface.

This guide primarily focuses on Ubuntu 22.04 Long Term Support (LTS) and 24.04 LTS as the supported host operating systems. Future versions are planned to cover additional supported operating systems.

Examples provided for firmware bundle names, driver versions, and utility file names are illustrative and subject to change as new versions are released.

Users must obtain the latest or specific required versions of software, firmware, and utilities from the official Cisco® software download site (software.cisco.com) and the respective component vendor websites (for example, Intel and NVIDIA). Consequently, commands and file paths shown in this guide may need to be adjusted by the user to match the actual filenames and versions they download.

Purpose

This guide offers practical steps and a foundational understanding for common operations to help users quickly get started with their Cisco UCS C885A M8 Rack Server.

Disclaimer

This document does not replace the product documentation available on Cisco.com.

For detailed configuration information, advanced procedures, troubleshooting, and the most current updates and release notes, always refer to the official Cisco UCS C885A M8 Rack Server configuration guides, user guides, and release notes published on software.cisco.com and Cisco.com.

2. Initial server setup and firmware updates (pre-OS deployment)

This section details the process for updating the foundational firmware components of your Cisco UCS C885A M8 Rack Server. These updates are crucial for optimal performance, stability, and compatibility, and they should be performed before installing any host operating system. This includes firmware for components such as the baseboard management controller (BMC), BIOS, GPU, and field-programmable gate array (FPGA).

Prerequisite: Before proceeding with the firmware updates, ensure that your server's baseboard management controller (BMC) has been initially set up, is networked, and is reachable. This includes assigning an IP address and configuring basic network connectivity. For detailed instructions on the physical installation and initial BMC configuration, please refer to the following official Cisco documentation:

     Cisco UCS C885A M8 Server Installation and Service Guide: https://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c/hw/c885A/install/b_c885a-m8-server-hig.html

     Cisco Baseboard Management Controller Configuration Guide for the Cisco UCS C885A M8 Rack Server, Release 1.1: https://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c/sw/gui/config/guide/1-1-0/b_openbmc-configuration-guide_1-1.html
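As a quick sanity check from your management workstation, you can verify that the BMC is reachable and that its Redfish service responds before proceeding (the IP address below is illustrative; substitute your BMC's address and credentials):

$ ping -c 3 10.0.0.1

$ curl -k -u <BMC_USERNAME>:<BMC_PASSWORD> https://10.0.0.1/redfish/v1/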

2.1. Software download

To begin, download the necessary firmware bundles and updater scripts from the official Cisco software download site.

1.     Navigate to software.cisco.com.

2.     Search for "Cisco UCS C885A M8" or the specific firmware version you require.

3.     Download the latest recommended firmware bundle (.tar.gz file) for your server.

Download firmware and updater script from software.cisco.com.

Figure 1. Firmware Download

2.2. Update bundle directory structure

Once downloaded, extract the firmware bundle. This bundle contains the necessary files for both out-of-band (OOB) and host-level component updates. Example:

$ tar xzvf ucs-c885a-m8-1.0.0.250003.tar.gz

Upon extraction, the bundle will typically have the directory structure shown in Figure 2, in which:

     host: contains networking drivers and firmware for components that are updated from within a host operating system (for example, NVMe drives and NVIDIA BlueField-3 cards). Each binary within this directory usually includes a README file with specific instructions.

     oob: contains firmware for out-of-band components such as the BIOS, BMC, and FPGA. This directory may contain different software versions based on specific hardware configurations.

Figure 2. Firmware bundle structure
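If the figure is not available, the extracted bundle broadly resembles the layout below. Only the host and oob directories are described in this guide; the subdirectory names shown are illustrative (the bluefield-3 and connectx7 directories are referenced in section 5.5) and vary by bundle version:

ucs-c885a-m8-1.0.0.250003/
├── host/
│   ├── bluefield-3/
│   ├── connectx7/
│   └── ...
└── oob/
    └── ...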

2.3. Upgrade script (out-of-band components)

The provided upgrade script is designed to update the out-of-band (OOB) components of the C885A M8 server, including the BMC, BIOS, GPU, and FPGA, using the Redfish API. This script is ideal for performing updates before an operating system is installed or when the host is powered off.

IMPORTANT CONSIDERATION: BIOS updates require the host to be powered down. This means you cannot launch the upgrade script from a host operating system that is running on the server you intend to upgrade. It must be run from a separate management workstation that has network access to the server's BMC IP address.

Prerequisites for the management workstation:

     Python: Version 3.1 or higher

     Python libraries: requests, beautifultable, urllib3. To install these, run:

    $ pip install requests beautifultable urllib3

Upgrade script: inventory test (optional, but recommended)

Before performing an update, you can optionally run an inventory test using the upgrade script. This test identifies which components require updates by comparing their currently running firmware versions with the versions included in the firmware bundle.

Command syntax:

$ python3 ucs-c885a-m8-upgrade-v1.2.py -U <BMC_USERNAME> -P <BMC_PASSWORD> -I <BMC_IP_ADDRESS> -B <FIRMWARE_BUNDLE_PATH>

Example:

$ python3 ucs-c885a-m8-upgrade-v1.2.py -U user -P password -I 10.0.0.1 -B ucs-c885a-m8-1.1.0.250022.tar.gz

Extracting firmware bundle... success

Validating BMC login details... success

Inventory started... success

 

Inventory Details

-----------------

IP              : 10.0.0.1

Hostname        : C885A-LLLYYWWSSSS

Board Serial    : LLLYYWWSSSS

Product Name    : C885A M8

Host Power State: Off

GPU Model       :

+------+------------+--------------------+---------------------+-----------------+

| S.No | Component  | Running FW version | Packaged FW Version | Update Required |

+------+------------+--------------------+---------------------+-----------------+

| 1    | BMC        | 1.0.32             | 1.0.35              | Yes             |

+------+------------+--------------------+---------------------+-----------------+

| 2    | BIOS       | 1.0.32             | 1.0.35              | Yes             |

+------+------------+--------------------+---------------------+-----------------+

| 3    | GPU        | HGX-22.10-1-rc80   | HGX-22.10-1-rc80    | No              |

+------+------------+--------------------+---------------------+-----------------+

| 4    | DCSCM-FPGA | 4.01               | 4.01                | No              |

+------+------------+--------------------+---------------------+-----------------+

| 5    | MB-FPGA    | 4.01               | 4.01                | No              |

+------+------------+--------------------+---------------------+-----------------+

| 6    | HIB-FPGA   | 6.0                | 6.0                 | No              |

+------+------------+--------------------+---------------------+-----------------+

Inventory completed successfully

Run the upgrade script

Once you have reviewed the inventory and are ready to proceed with the updates, run the upgrade script with the -F option to initiate the firmware flash process.

Command syntax:

$ python3 ucs-c885a-m8-upgrade-v1.2.py -U <BMC_USERNAME> -P <BMC_PASSWORD> -I <BMC_IP_ADDRESS> -B <FIRMWARE_BUNDLE_PATH> -F

Example:

$ python3 ucs-c885a-m8-upgrade-v1.2.py -U user -P password -I 10.0.0.1 -B ucs-c885a-m8-1.1.0.250022.tar.gz -F

Update Status

-------------

IP              : 10.0.0.1

Hostname        : C885A-LLLYYWWSSSS

Board Serial    : LLLYYWWSSSS

Product Name    : C885A M8

Host Power State: Off

GPU Model       :

+------+------------+--------------------+---------------------+-----------------+---------------+-------------------+

| S.No | Component  | Running FW version | Packaged FW Version | Update Required | Update Status | Update Percentage |

+------+------------+--------------------+---------------------+-----------------+---------------+-------------------+

| 1    | BMC        | 1.0.32             | 1.0.35              | Yes             | Triggered     |         0         |

+------+------------+--------------------+---------------------+-----------------+---------------+-------------------+

| 2    | BIOS       | 1.0.32             | 1.0.35              | Yes             | Completed     |        100        |

+------+------------+--------------------+---------------------+-----------------+---------------+-------------------+

| 3    | GPU        | HGX-22.10-1-rc80   | HGX-22.10-1-rc80    | No              | Skipped       |         -         |

+------+------------+--------------------+---------------------+-----------------+---------------+-------------------+

| 4    | DCSCM-FPGA | 4.01               | 4.01                | No              | Skipped       |         -         |

+------+------------+--------------------+---------------------+-----------------+---------------+-------------------+

| 5    | MB-FPGA    | 4.01               | 4.01                | No              | Skipped       |         -         |

+------+------------+--------------------+---------------------+-----------------+---------------+-------------------+

| 6    | HIB-FPGA   | 6.0                | 6.0                 | No              | Skipped       |         -         |

+------+------------+--------------------+---------------------+-----------------+---------------+-------------------+

Update completed successfully

The BIOS update has been completed successfully. Please power ON the host to activate.

The BMC  update has been successfully triggered and will take approximately 15 minutes to complete. During this time, the HTTPS service will be unavailable.

Note: Please wait until the HTTPS service is restored.

Post-update actions

     BIOS update: If the BIOS was updated, the server will remain powered off. You must power on the host (using the BMC web UI or the Redfish API) to activate the new BIOS firmware (a Redfish example is shown after this list).

     BMC update: The BMC update will proceed in the background. The BMC's web interface (HTTPS service) will be temporarily unavailable during this process (approximately 15 minutes). Wait for the service to be restored before attempting to access the BMC again.
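For reference, the host can also be powered on through the Redfish API from the management workstation. The following is a minimal example; the system resource name (system) and other details may differ on your BMC, so verify the path under /redfish/v1/Systems first:

$ curl -k -u <BMC_USERNAME>:<BMC_PASSWORD> -H "Content-Type: application/json" -X POST https://<BMC_IP_ADDRESS>/redfish/v1/Systems/system/Actions/ComputerSystem.Reset -d '{"ResetType": "On"}'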

2.4. Manual update from the BMC web UI

In cases where a manual update is preferred or required, you can also update system components using the web user interface of the BMC.

Steps:

1.     Access the BMC web interface by navigating to its IP address in your web browser.

2.     Log in with your BMC credentials.

3.     Navigate to the firmware update section. (The exact path may vary slightly depending on the BMC firmware version, but it is typically found under "Operations," "Firmware.")

4.     Follow the on-screen instructions to upload the necessary firmware files (for example, for BMC, BIOS, or FPGA) from the extracted OOB directory of your firmware bundle.

Warning: The server needs to be powered down to perform a BIOS update. Ensure that you are prepared for this interruption. If a host operating system is installed, perform a clean host shutdown before initiating a BIOS update through the BMC web UI.

BMC update

Figure 3. Firmware update from BMC

Figure 4. Firmware update image file selection

Figure 5. Update firmware confirm window

BIOS update

Figure 6. OEM firmware update from BMC

Figure 7. Device and image file selection screen

Figure 8. Firmware update status message

Figure 9. Firmware update status message

Figure 10. Firmware update status message

Figure 11. Server power operations

3. Host operating system installation

After successfully updating the server's foundational firmware components (BMC, BIOS, etc.) as described in Section 2, the next step is to install a supported host operating system (OS). This guide focuses on Ubuntu LTS versions, which are validated for the Cisco UCS C885A M8 Rack Server.

3.1. Supported operating systems

The Cisco UCS C885A M8 Rack Server is validated to support the following host operating systems, as of July 2025:

     Ubuntu Server 22.04 LTS

     Ubuntu Server 24.04 LTS

     Red Hat Enterprise Linux > 9.4

     Red Hat Enterprise Linux Core OS > 4.16

     VMware ESXi 8.0 U3

3.2. Prerequisites for OS installation

Before beginning the OS installation, ensure that you have the following:

     Operating system installation media: an ISO image of the desired operating system version

     BMC access: ensure that you have network access to the server's BMC IP address and login credentials, because the BMC KVM Console is the tool to use for remote OS installation.

     Network connectivity: basic network connectivity for the server, especially if performing a network-based installation or if the OS installer requires internet access for updates or package downloads.

3.3. Operating system install with BMC KVM

The BMC KVM (Keyboard, Video, Mouse) Console allows you to remotely view the server's console and interact with it as if you were physically present, including mounting virtual media for OS installation.

Steps:

1.     Connect to the server BMC.

2.     Go to "Operations," "Virtual Media."

3.     Select a method to mount your ISO image:

    Load Image from a web browser: the image will be streamed from your workstation.

    Load image from external server: remotely mount the image from a CIFS or HTTPS location, reachable from the BMC.

4.     Under "Operations," "Server power operations," select "CD" as the boot source override and check the "Enable one time boot" option. This ensures the server boots from the ISO image you have mapped (a Redfish alternative is shown after Figure 12).

5.     Reboot the server.

6.     Under "Operations," "KVM," launch the KVM console and wait for the ISO image to boot.

7.     Proceed normally with OS installation.

Note: It is recommended to host the image as close as possible to the server to avoid possible issues due to bandwidth constraints and/or high latency.

Figure 12. Server power operations, one time boot
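The same one-time boot override shown in step 4 can also be set through the Redfish API instead of the web UI. The following is a minimal example; the system resource name (system) may differ on your BMC:

$ curl -k -u <BMC_USERNAME>:<BMC_PASSWORD> -H "Content-Type: application/json" -X PATCH https://<BMC_IP_ADDRESS>/redfish/v1/Systems/system -d '{"Boot": {"BootSourceOverrideEnabled": "Once", "BootSourceOverrideTarget": "Cd"}}'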

3.4. Operating system install with PXE boot

For large-scale deployments or automated installations, you can leverage a PXE server to deploy the OS over the network. This method requires a preconfigured PXE environment on your network.

Steps:

1.     Configure PXE server: ensure that your PXE environment is properly set up and ready to serve from the interface you want to boot from.

2.     Configure server for network boot:

    Under "Operations," "KVM," launch the KVM console.

    Reboot the server and wait for the boot menu prompt instructions.

    Enter the boot menu by pressing ESC or F7.

    Select the interface you want to boot from.

3.     Proceed normally with OS installation.

4. Software and driver downloads

This chapter provides a consolidated list of official sources for downloading the necessary software and drivers for your Cisco UCS C885A M8 Rack Server. Subsequent chapters will detail the installation procedures for these components.

4.1. Cisco host drivers image

Cisco provides comprehensive host driver images for the Cisco UCS C885A M8 Rack Server, consolidating necessary drivers and utilities for various operating systems. These images can be downloaded from the official Cisco® software download site (software.cisco.com).

To obtain and utilize these images:

1.     Download the ISO: navigate to software.cisco.com and search for "Cisco UCS C885A M8" to find and download the appropriate host driver ISO image for your operating system (for example, Linux or VMware).

2.     Mount the ISO: once downloaded, mount the ISO image onto your host operating system. For Linux, this can typically be done with a command such as:

$ sudo mkdir -p /mnt/drivers

$ sudo mount -o loop <downloaded_iso_file.iso> /mnt/drivers

3.     Access drivers: navigate to the mounted directory to access the organized driver packages.

Important considerations:

     Content: the host driver images include drivers for Linux (supporting RHEL/CoreOS and Ubuntu distributions) and VMware.

     Third-party software: while the ISO provides a structured way to access drivers, not all third-party software binaries are directly included due to redistribution limitations. Some folders within the ISO contain README.html files or direct links to specific software versions that Cisco has tested and validated.

     Version support: these linked versions represent the configurations tested by Cisco. However, the system generally supports newer versions of drivers and software as they become available from the respective vendors. Users are encouraged to obtain the latest or specific required versions from official vendor websites if not directly provided in the ISO.

Example Linux driver directory structure:

The Linux host driver ISO typically presents a directory structure similar to this:

├── Network

│   └── nVIDIA

│       └── MCX7

│           ├── RHEL

│           │   ├── RHEL9.4

│           │   │   └── README.html

│           │   └── RHELcoreos

│           │       └── README.html

│           └── Ubuntu

│               ├── Ubuntu22.04LTS

│               │   └── README.html

│               └── Ubuntu24.04LTS

│                   └── README.html

├── release.txt

├── Storage

│   ├── Broadcom

│   │   └── UCS-M2-NVRAID

│   │       ├── RHEL

│   │       │   ├── RHEL9.4

│   │       │   │   └── README.html

│   │       │   └── RHELcoreos

│   │       │       └── README.html

│   │       └── Ubuntu

│   │           ├── Ubuntu22.04LTS

│   │           │   └── README.html

│   │           └── Ubuntu24.04LTS

│   │               └── README.html

│   └── Kioxia

│       └── UCS-NVDxxxxxxxx

│           ├── RHEL

│           │   ├── RHEL9.4

│           │   │   └── README.html

│           │   └── RHELcoreos

│           │       └── README.html

│           └── Ubuntu

│               ├── Ubuntu22.04LTS

│               │   └── README.html

│               └── Ubuntu24.04LTS

│                   └── README.html

├── tag.txt

└── Video

    ├── AMD

    │   └── UCSC-GPU-MI210

    │       ├── RHEL

    │       │   ├── RHEL9.4

    │       │   │   └── README.html

    │       │   └── RHELcoreos

    │       │       └── README.html

    │       └── Ubuntu

    │           ├── Ubuntu22.04LTS

    │           │   └── README.html

    │           └── Ubuntu24.04LTS

    │               └── README.html

    └── nVIDIA

        └── Tesla_Hx

            ├── RHEL

            │   ├── RHEL9.4

            │   │   └── nvidia-driver-local-repo-rhel9-570.86.15-1.0-1.x86_64.rpm

            │   └── RHELcoreos

            │       └── nvidia-driver-local-repo-rhel9-570.86.15-1.0-1.x86_64.rpm

            └── Ubuntu

                ├── Ubuntu22.04LTS

                │   └── nvidia-driver-local-repo-ubuntu2204-570.86.15_1.0-1_arm64.deb

                └── Ubuntu24.04LTS

                    └── nvidia-driver-local-repo-ubuntu2404-570.86.15_1.0-1_amd64.deb

These downloads are relevant for general host-level components:

     Intel Ethernet Network Adapter 700 Series NVM update utility:

    Source: Intel Support website

    Example URL: https://www.intel.com/content/www/us/en/download/18190/non-volatile-memory-nvm-update-utility-for-intel-ethernet-network-adapter-700-series.html

4.2. NVIDIA GPU stack downloads

These downloads are specific to NVIDIA GPU drivers and compute platforms.

     NVIDIA DOCA-Host Installer (for NVIDIA BlueField-3 DPUs/SuperNICs)

    Source: NVIDIA Developer website

     https://developer.nvidia.com/doca-downloads

    Note: The last standalone MLNX_OFED release is October 2024 (Long Term Support, 3 years). Starting January 2025, all new features are included only in DOCA-OFED.

     NVIDIA GPU drivers (NVIDIA Data Center Drivers):

    Source: NVIDIA Official Drivers website

    Example URL: www.nvidia.com/drivers (Search for your specific GPU model and OS.)

     NVIDIA CUDA Toolkit:

    Source: NVIDIA Developer website

    Example URL: developer.nvidia.com/cuda-toolkit (Select your specific OS and version.)

4.3. AMD GPU stack downloads

These downloads are specific to AMD GPU drivers and compute platforms (ROCm).

AMD ROCm platform:

     Source: AMD ROCm GitHub Repository and AMD Radeon APT Repository

     The installation process involves adding an APT repository, which pulls the necessary packages. No direct .deb download is typically required for the main stack.

     GPG Key URL: https://repo.radeon.com/rocm/rocm.gpg.key

     Driver and ROCm installation guide: https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/install/detailed-install/package-manager/package-manager-ubuntu.html

RCCL tests (source code for benchmarking):

     Source: ROCm GitHub Repository

     URL: https://github.com/ROCm/rccl-tests.git (This GitHub repository is cloned and built from source.)

5. Post-OS host component configuration and updates

With the host operating system successfully installed and basic network connectivity established, you can now proceed with configuring and updating the server's various host-level components.

This chapter covers the configuration and updates for host-level components that are not directly part of the GPU software stack, such as network interface cards (for example, the Intel Ethernet Network Adapter X710) and the east-west and north-south networking adapters.

5.1. Prerequisites

Before proceeding with the updates, ensure the following:

     Supported OS: Ubuntu 22.04 LTS or 24.04 LTS is installed and running.

     A sudo user (a user capable of running commands with root privileges) can be used.

     Required packages: nvme-cli, dracut (or initramfs-tools), unzip, and screen (or tmux) are installed (not mandatory, but highly recommended; an example install command is shown after this list).

     Internet connectivity: the host OS is manageable and connected to the internet (with or without proxy) to download necessary packages and drivers.

     System up-to-date: all system packages are updated and repositories synchronized.

    apt update ; apt upgrade -y

     NICs and/or DPUs visibility: confirm that your network interface cards (NICs) and/or data-processing units (DPUs) are properly placed and visible from the host OS.

    $ lspci | grep -Ei "ethernet|infiniband"
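For example, on Ubuntu the recommended helper packages listed above can be installed in one step (initramfs-tools is present by default on Ubuntu; substitute tmux for screen if preferred):

$ sudo apt update

$ sudo apt install -y nvme-cli unzip screen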

5.2. Use of “screen” or "tmux" on Linux (optional, but recommended)

Since many firmware updates and software installations can be lengthy processes, it is highly recommended to perform these operations within a persistent terminal session provided by utilities such as screen or tmux. This prevents session loss due to network interruptions (for example, SSH disconnection) and allows you to detach from the session and reattach later.

Example for screen:

Start a screen session:

$ screen

Show all running sessions:

$ screen -ls

There is a screen on:

        12572.pts-0.sephiroth   (10/27/2024 07:45:13 PM)        (Attached)

1 Socket in /run/screen/S-rtortori.

Resume a session:

$ screen -r 12572

Alternatively, to start a tmux session:

$ tmux

Refer to the respective documentation for more advanced screen or tmux commands.

5.3. Update Intel Ethernet Network Adapter X710 card

Update the Intel Ethernet Network Adapter X710 card using the NVM Update Tool downloaded from intel.com.

Open a screen session or a tmux session (optional):

$ screen

Or:

$ tmux

Navigate to the directory where you downloaded the Intel Ethernet NVM Update Tool.

Extract the Intel Ethernet NVM Update Tool for Linux (change the .zip file name as required):

$ unzip 700Series_NVMUpdatePackage_v9_53.zip

$ tar xzvf 700Series_NVMUpdatePackage_v9_53_Linux.tar.gz && cd 700Series/Linux_x64

Run the updater and check whether the firmware is up to date:

$ sudo ./nvmupdate64e

Intel(R) Ethernet NVM Update Tool

NVMUpdate version 1.42.24.2

Copyright(C) 2013 - 2024 Intel Corporation.

WARNING: To avoid damage to your device, do not stop the update or reboot or power off the system during this update.

Inventory in progress. Please wait [***|......]

Num Description                          Ver.(hex)  DevId S:B    Status

=== ================================== ============  ===== ======  ==============

01) Intel(R) Ethernet Network Adapter   9.80(9.50)   1572  00:210  Update

    X710-2 for OCP NIC 3.0                                         available

Options: Adapter Index List (comma-separated), [A]ll, e[X]it

Update the Intel X710 card (using the utility from the firmware bundle):

Open a screen session or a tmux session (optional):

$ screen

Or:

$ tmux

Navigate to the directory where the firmware bundle is located, extract the .zip file, and initiate update:

$ sudo ./nvmupdate64e -a PCIe_X710-T2L-OCP3/ -u -l -c nvmupdate.cfg

Intel(R) Ethernet NVM Update Tool

NVMUpdate version 1.42.8.0

Copyright(C) 2013 - 2024 Intel Corporation.

Config file read.

[…]

Inventory

[00:210:00:00]: Intel(R) Ethernet Network Adapter X710-T2L for OCP 3.0

Flash inventory started.

[…]

OROM inventory finished.

[00:210:00:01]: Intel(R) Ethernet Network Adapter X710-TL

Device already inventoried.

Update

[00:210:00:00]: Intel(R) Ethernet Network Adapter X710-T2L for OCP 3.0

Flash update started.

|======================[ 90%]================>.....|

[…]

5.4. Installing prerequisites on host for NVIDIA BlueField-3 DPUs/SuperNICs (Cisco UCS C885A M8 Rack Server NVIDIA GPUs only)

To work with NVIDIA BlueField-3 DPUs, you need to install specific NVIDIA DOCA components and utilities.

Navigate to the directory where you downloaded the DOCA-Host Installer and install the doca-host local repository package:

$ cd ~/drivers #replace with the right directory where the doca-host deb file is located

$ sudo dpkg -i doca-host_2.8.0-204000-24.07-ubuntu2204_amd64.deb

Install, start, and enable at boot rshim:

$ sudo apt update && sudo apt install -y rshim pv

$ sudo systemctl daemon-reload

$ sudo systemctl start rshim

$ sudo systemctl enable rshim

Install the DOCA packages and the firmware updater, then restart openibd and mst.

Note: This will attempt a firmware update on all supported interfaces. Use of screen/tmux is recommended.

$ sudo apt update && sudo apt install doca-all mlnx-fw-updater

$ sudo /etc/init.d/openibd restart

$ sudo mst restart

Reboot the system, then verify that rshim is running and can see the devices:

$ sudo reboot

After reboot, verify:

$ ls /dev/ | grep rshim

rshim0

[…]

5.5. Update SuperNICs/DPUs (NVIDIA B3220, B3140H, and ConnectX-7)

You can update SuperNICs/DPUs using the mlxup utility or by leveraging the firmware bundle provided by Cisco.

Using mlxup:

     Download mlxup

    Navigate to: https://network.nvidia.com/support/firmware/mlxup-mft/ and download the mlxup version of your choice.

    Example: $ wget https://www.mellanox.com/downloads/firmware/mlxup/4.29.0/SFX/linux_x64/mlxup

     Change permissions and execute the mlxup firmware utility. If new firmware is required, you will be prompted for an action:

    $ chmod +x mlxup

    $ sudo ./mlxup

     By default, all cards that require an update will be updated. Refer to the documentation for additional information on how to update firmware selectively or how to pull the latest version directly from NVIDIA. Optionally, you can use the local firmware files included in the bundle downloaded from Cisco.com:

     https://docs.nvidia.com/networking/display/mlxupfwutility

     Note: Updating all of the cards can take a long time. Using tmux or screen is recommended so that the update session is not lost if your terminal connection drops.

Using a firmware bundle:

Alternatively, you can use the firmware files provided within the Cisco UCS C885A M8 Rack Server update bundle for NVIDIA BlueField-3 and NVIDIA ConnectX-7 devices.

Steps:

1.     Navigate to ./host/bluefield-3 and ./host/connectx7 directories.

2.     Follow the instructions in the README file.

3.     IMPORTANT: As of July 2025, only Ubuntu 22.04 LTS and 24.04 LTS have been tested.

5.6. Query SuperNICs and DPUs

To quickly view the status of all SuperNICs/DPUs and their firmware versions against the latest available from NVIDIA, you can use a simple bash script.

In the same directory where you downloaded mlxup, create a file named mlxup_query.sh with the following content:

#!/bin/bash

# Execute the mlxup query command and store the output

output=$(sudo ./mlxup --query --online)

 

# Check if mlxup command returned any output

if [ -z "$output" ]; then

    echo "Error: mlxup --query --online returned no output or an error occurred. Please check the mlxup utility and its permissions."

    exit 1

fi

 

# Use awk to parse the output and format it into a table

echo "$output" | awk '

BEGIN {

    # Print header

    print "PCI Device Name\tDescription\tStatus";

    print "---------------\t-----------\t------";

    pci_device_name = "";

    description = "";

    status = "";

}

 

# Capture PCI Device Name

/PCI Device Name:/ {

    # Remove "PCI Device Name:" and any leading/trailing whitespace

    pci_device_name = $0;

    sub(/^[ \t]*PCI Device Name:[ \t]*/, "", pci_device_name);

    gsub(/^[ \t]+|[ \t]+$/, "", pci_device_name); # Trim any remaining leading/trailing whitespace

}

 

# Capture Description

/Description:/ {

    # Remove "Description:" and any leading/trailing whitespace

    description = $0;

    sub(/^[ \t]*Description:[ \t]*/, "", description);

    # Take only the part before the first semicolon, as in your original script

    split(description, desc_parts, ";");

    description = desc_parts[1];

    gsub(/^[ \t]+|[ \t]+$/, "", description); # Trim any remaining leading/trailing whitespace

}

 

# Capture Status and print the collected data

/Status:/ {

    # Remove "Status:" and any leading/trailing whitespace

    status = $0;

    sub(/^[ \t]*Status:[ \t]*/, "", status);

    gsub(/^[ \t]+|[ \t]+$/, "", status); # Trim any remaining leading/trailing whitespace

   

    # Print the collected data for the current device

    print pci_device_name "\t" description "\t" status;

   

    # Reset variables for the next device block

    pci_device_name = "";

    description = "";

    status = "";

}

' | column -t -s $'\t'

 

Change permissions and execute to have a quick view of all SuperNICs/DPUs and their status. It will check against the latest version available from NVIDIA:

$ chmod +x mlxup_query.sh

$ ./mlxup_query.sh

PCI Device Name                Description                                                      Status

---------------                 -----------                                                     ------

/dev/mst/mt41692_pciconf8       Nvidia BlueField-3 B3140H E-series HHHL SuperNIC                Up to date

/dev/mst/mt41692_pciconf7       Nvidia BlueField-3 B3140H E-series HHHL SuperNIC                Up to date

/dev/mst/mt41692_pciconf6       Nvidia BlueField-3 B3140H E-series HHHL SuperNIC                Up to date

/dev/mst/mt41692_pciconf5       Nvidia BlueField-3 B3140H E-series HHHL SuperNIC                Up to date

/dev/mst/mt41692_pciconf4       Nvidia BlueField-3 B3140H E-series HHHL SuperNIC                Up to date

/dev/mst/mt41692_pciconf3       Nvidia BlueField-3 B3140H E-series HHHL SuperNIC                Up to date

/dev/mst/mt41692_pciconf2       NVIDIA BlueField-3 B3240 P-Series Dual-slot FHHL DPU          Up to date

/dev/mst/mt41692_pciconf1       Nvidia BlueField-3 B3140H E-series HHHL SuperNIC                Up to date

/dev/mst/mt41692_pciconf0       Nvidia BlueField-3 B3140H E-series HHHL SuperNIC                Up to date

5.7. Change interface mode to InfiniBand or Ethernet

NVIDIA BlueField-3 DPUs can operate in either InfiniBand or Ethernet mode. You can check and change the mode using mlxconfig.

Run the following command. (Interfaces starting with "ibs" are running in InfiniBand mode, whereas interfaces starting with "ens" run in Ethernet mode.)

$ sudo lshw -C network -short

H/W path               Device           Class          Description

=====================================================================

/0/100/1.1/0/1/0       ens204f0np0      network        MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/117/1.1/0/3/0       ens203f0np0      network        MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/11e/1.1/0/2/0       ibs211f0         network        MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/11e/1.1/0/2/0.1     ibs211f1         network        MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/11e/1.1/0/3/0       ens201f0np0      network        MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/125/1.1/0/1/0       ens202f0np0      network        MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/12c/1.1/0/5/0       ens207f0np0      network        MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/133/1.1/0/1/0       ens208f0np0      network        MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/13a/1.1/0/2/0       ens206f0np0      network        MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/13a/3.1/0           ens21f0          network        Ethernet Controller X710 for 10GbE SFP+

/0/13a/3.1/0.1         ens21f1          network        Ethernet Controller X710 for 10GbE SFP+

/0/141/1.1/0/4/0       ens205f0np0      network        MT43244 BlueField-3 integrated ConnectX-7 network controller

/a                     enxc60fa3e9679a  network        Ethernet interface

Ensure mst is started:

$ sudo mst status

MST modules:

------------

    MST PCI module is not loaded

    MST PCI configuration module is not loaded

[…]

$ sudo mst start

Starting MST (Mellanox Software Tools) driver set

Loading MST PCI module - Success

Loading MST PCI configuration module - Success

Create devices

Unloading MST PCI module (unused) – Success

Identify the MST name for the interfaces you want to convert:

$ sudo mst status -v

MST modules:

------------

    MST PCI module is not loaded

    MST PCI configuration module loaded

PCI devices:

------------

DEVICE_TYPE             MST                           PCI       RDMA             NET

[…]

BlueField3(rev:1)       /dev/mst/mt41692_pciconf2.1   45:00.1   mlx5_2          net-ibs211f1

BlueField3(rev:1)       /dev/mst/mt41692_pciconf2     45:00.0   mlx5_1          net-ibs211f0

BlueField3(rev:1)       /dev/mst/mt41692_pciconf1     2b:00.0   mlx5_5          net-ens203f0np0

[…]

In this case, for a dual-port card such as the B3220, the MST device is /dev/mst/mt41692_pciconf2 (the second port appears as /dev/mst/mt41692_pciconf2.1).

If you want to convert an interface to Ethernet mode, change the LINK_TYPE for both ports to “2” (Ethernet).

To convert an Ethernet interface to InfiniBand mode, change the LINK_TYPE for both ports to “1” (InfiniBand).

$ sudo mlxconfig -d /dev/mst/mt41692_pciconf2 set LINK_TYPE_P1=2 LINK_TYPE_P2=2

Device #1:

----------

Device type:        BlueField3

Name:               900-9D3B6-00SN-A_Ax

Description:        NVIDIA BlueField-3 B3240 P-Series Dual-slot FHHL DPU; 400GbE / NDR IB (default mode); Dual-port QSFP112; PCIe Gen5.0 x16 with x16 PCIe extension option; 16 Arm cores; 32GB on-board DDR; integrated BMC; Crypto Disabled

Device:             /dev/mst/mt41692_pciconf2

Configurations:                                          Next Boot       New

        LINK_TYPE_P1                                IB(1)                ETH(2)

        LINK_TYPE_P2                                IB(1)                ETH(2)

 Apply new Configuration? (y/n) [n] : y

Applying... Done!

-I- Please reboot machine to load new configurations.

While the output suggests rebooting the system, DPUs will NOT reset even if the host is rebooted. You have to perform a complete host power cycle; that is, shut it down and power it back on.

Alternatively, you can gracefully shut down the BlueField Arm cores without rebooting the host OS, using the following procedure:

$ sudo mlxfwreset -d /dev/mst/mt41692_pciconf2 -l 1 -t 4 r

The reset level for device, /dev/mst/mt41692_pciconf2 is:

1: Only ARM side will not remain up ("Immediate reset").

Please be aware that resetting the Bluefield may take several minutes. Exiting the process in the middle of the waiting period will not halt the reset.

The ARM side will be restarted, and it will be unavailable for a while.

Continue with reset?[y/N] y

-I- Sending Reset Command To Fw             -Done

-I- FW was loaded successfully.

Warm reboot of the interface and host reboot:

The following procedure will perform a warm reboot of the interface, then send a reboot command to the host OS:

$ sudo mlxfwreset -d /dev/mst/mt41692_pciconf2 -l 4 r

The reset level for device, /dev/mst/mt41692_pciconf2 is:

4: Warm Reboot

Please be aware that resetting the Bluefield may take several minutes. Exiting the process in the middle of the waiting period will not halt the reset.

The ARM side will be restarted, and it will be unavailable for a while.

Continue with reset?[y/N] y

-I- Sending Reset Command To Fw             -Done

-I- Sending reboot command to machine       -

Check that the interfaces are now in the expected state after the necessary power cycle or reset:

$ sudo lshw -C network -short

H/W path             Device          Class     Description

=====================================================================

/0/100/1.1/0/1/0     ens204f0np0     network    MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/117/1.1/0/3/0     ens203f0np0     network    MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/11e/1.1/0/2/0     ens211f0np0     network    MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/11e/1.1/0/2/0.1   ens211f1np1     network    MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/11e/1.1/0/3/0     ens201f0np0     network    MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/125/1.1/0/1/0     ens202f0np0     network    MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/12c/1.1/0/5/0     ens207f0np0     network    MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/133/1.1/0/1/0     ens208f0np0     network    MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/13a/1.1/0/2/0     ens206f0np0     network    MT43244 BlueField-3 integrated ConnectX-7 network controller

/0/13a/3.1/0         ens21f0         network    Ethernet Controller X710 for 10GbE SFP+

/0/13a/3.1/0.1       ens21f1         network    Ethernet Controller X710 for 10GbE SFP+

/0/141/1.1/0/4/0     ens205f0np0     network    MT43244 BlueField-3 integrated ConnectX-7 network controller

/a                   enxc60fa3e9679a network    Ethernet interface

5.8. Configuring NIC mode on NVIDIA BlueField-3 DPUs

NVIDIA BlueField-3 (BF3) DPUs can operate in either DPU mode (with ARM cores active for offload) or NIC mode (where ARM cores are disabled, functioning purely as a network interface).

Identify BF3 cards and ensure that mst is started using the command: sudo mst start

Identify a card and its mode of operation (use the mlxup_query.sh script for convenience). In the following example, we target an NVIDIA BlueField-3 B3240 card:        

$ ./mlxup_query.sh

PCI Device Name              Description                                               Status

---------------              -----------                                               ------

/dev/mst/mt41692_pciconf8       Nvidia BlueField-3 B3140H E-series HHHL SuperNIC                Up to date

/dev/mst/mt41692_pciconf7       Nvidia BlueField-3 B3140H E-series HHHL SuperNIC                Up to date

/dev/mst/mt41692_pciconf6       Nvidia BlueField-3 B3140H E-series HHHL SuperNIC                Up to date

/dev/mst/mt41692_pciconf5       Nvidia BlueField-3 B3140H E-series HHHL SuperNIC                Up to date

/dev/mst/mt41692_pciconf4       Nvidia BlueField-3 B3140H E-series HHHL SuperNIC                Up to date

/dev/mst/mt41692_pciconf3       Nvidia BlueField-3 B3140H E-series HHHL SuperNIC                Up to date

/dev/mst/mt41692_pciconf2       NVIDIA BlueField-3 B3240 P-Series Dual-slot FHHL DPU            Up to date

/dev/mst/mt41692_pciconf1       Nvidia BlueField-3 B3140H E-series HHHL SuperNIC                Up to date

/dev/mst/mt41692_pciconf0       Nvidia BlueField-3 B3140H E-series HHHL SuperNIC                Up to date

Query the card to understand its current mode of operation:

$ sudo mlxconfig -d /dev/mst/mt41692_pciconf2 -e q | grep 'INTERNAL_CPU_MODEL\|EXP_ROM_UEFI_ARM_ENABLE\|INTERNAL_CPU_OFFLOAD_ENGINE'

 

 INTERNAL_CPU_MODEL                     EMBEDDED_CPU(1)      EMBEDDED_CPU(1)     EMBEDDED_CPU(1)

 INTERNAL_CPU_OFFLOAD_ENGINE            ENABLED(0)           ENABLED(0)          ENABLED(0)

 EXP_ROM_UEFI_ARM_ENABLE                True(1)              True(1)             True(1)

 

In the above example, INTERNAL_CPU_OFFLOAD_ENGINE is set to ENABLED(0). This means the card is running in DPU mode. To convert to NIC mode, we need to set it to DISABLED(1):

$ sudo mlxconfig -d /dev/mst/mt41692_pciconf2 s INTERNAL_CPU_OFFLOAD_ENGINE=1

 

Device #1:

----------

Device type:        BlueField3

Name:               900-9D3B6-00SN-A_Ax

Description:        NVIDIA BlueField-3 B3240 P-Series Dual-slot FHHL DPU; 400GbE / NDR IB (default mode); Dual-port QSFP112; PCIe Gen5.0 x16 with x16 PCIe extension option; 16 Arm cores; 32GB on-board DDR; integrated BMC; Crypto Disabled

Device:             /dev/mst/mt41692_pciconf2

Configurations:                                          Next Boot       New

        INTERNAL_CPU_OFFLOAD_ENGINE                 ENABLED(0)           DISABLED(1)

 Apply new Configuration? (y/n) [n] : y

Power off the server.

$ poweroff

Note: A simple reboot will restart only the host; it will NOT restart the DPUs. A full power cycle is required for the change to take effect.

Power on the server from the BMC using the WebUI or through Redfish APIs, then start the MST service:

$ sudo mst start

Check the BF3’s mode of operation:

$ sudo mlxconfig -d /dev/mst/mt41692_pciconf2 -e q | grep 'INTERNAL_CPU_MODEL\|EXP_ROM_UEFI_ARM_ENABLE\|INTERNAL_CPU_OFFLOAD_ENGINE'

 

 INTERNAL_CPU_MODEL                    EMBEDDED_CPU(1)      EMBEDDED_CPU(1)      EMBEDDED_CPU(1)

 INTERNAL_CPU_OFFLOAD_ENGINE           ENABLED(0)           DISABLED(1)          DISABLED(1)

 EXP_ROM_UEFI_ARM_ENABLE               True(1)              True(1)              True(1)

6. NVIDIA GPU software stack installation and validation

This chapter details the installation and validation of the NVIDIA GPU software stack, including drivers, the NVIDIA CUDA Toolkit, and NVIDIA Fabric Manager.

6.1. Install NVIDIA CUDA Toolkit without drivers

This section guides you through installing the NVIDIA CUDA Toolkit, which provides the development environment for GPU-accelerated applications. It is often installed separately from the GPU drivers.

Download the installer, update apt and install cuda-toolkit-12-6:

$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb

$ sudo dpkg -i cuda-keyring_1.1-1_all.deb

$ sudo apt update

$ sudo apt install -y cuda-toolkit-12-6

Blacklist the Nouveau driver. The open-source Nouveau driver can conflict with NVIDIA's proprietary drivers. Blacklist it to prevent issues by adding the nouveau module to the modprobe blacklist:

Edit the /usr/lib/modprobe.d/blacklist-nouveau.conf file using an editor of your choice:

$ sudo vi /usr/lib/modprobe.d/blacklist-nouveau.conf

Append the following two lines and save:

blacklist nouveau

options nouveau modeset=0

Apply the settings (if you are using dracut):

$ sudo dracut --force

Apply the settings (if you are using initramfs-tools):

$ sudo update-initramfs -u -k all

Modify environment variables.

Create a file named /etc/profile.d/cuda.sh and make the script executable. Remember to modify the PATH variable with the correct CUDA path, which might change depending on the version you installed:

$ sudo tee /etc/profile.d/cuda.sh > /dev/null << EOF

#!/bin/bash

# Set CUDA environment variables globally

export LD_LIBRARY_PATH=/usr/local/cuda-12.9/lib64:\$LD_LIBRARY_PATH

export PATH=\$PATH:/usr/local/cuda-12.9/bin

EOF

Change permissions and apply:

$ sudo chmod +x /etc/profile.d/cuda.sh

$ source /etc/profile.d/cuda.sh

Check if CUDA is running:

$ nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2024 NVIDIA Corporation

Built on Thu_Sep_12_02:18:05_PDT_2024

Cuda compilation tools, release 12.6, V12.6.77

Build cuda_12.6.r12.6/compiler.34841621_0

6.2. Install NVIDIA GPU drivers and NVIDIA Fabric Manager

Install the NVIDIA Data Center Drivers. Replace the filename with the desired version (refer to Chapter 4, “Software and driver downloads,” for the software download):

$ sudo dpkg -i nvidia-driver-local-repo-ubuntu2204-560.35.03_1.0-1_amd64.deb

$ sudo cp /var/nvidia-driver-local-repo-ubuntu2204-560.35.03/nvidia-driver-local-73056A76-keyring.gpg /usr/share/keyrings/

$ sudo apt update && sudo apt install -y cuda-drivers-fabricmanager-560

NVIDIA Fabric Manager is a critical component for managing the NVIDIA GPU interconnect (NVLink or NVSwitch) in multi-GPU systems, ensuring optimal communication and performance.

Install Fabric Manager from apt and start it:

$ sudo apt install -y nvidia-fabricmanager-560

$ sudo systemctl start nvidia-fabricmanager

$ sudo systemctl enable nvidia-fabricmanager

Reboot the system:

$ sudo reboot

Verify Fabric Manager status:

$ sudo systemctl status nvidia-fabricmanager

6.3. NVIDIA GPU diagnostics and topology verification

     Follow the instructions to download NVIDIA DCGM (Data Center GPU Manager)

    https://developer.nvidia.com/dcgm

    Documentation:

    https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/dcgm-diagnostics.html

    Start and enable DCGM

    $ sudo systemctl --now enable nvidia-dcgm
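Once the DCGM service is running, a short diagnostic pass can serve as a quick health check. For example, dcgmi discovery lists the GPUs that DCGM can see, and -r 1 runs the quick test level (higher levels run longer):

$ dcgmi discovery -l

$ dcgmi diag -r 1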

Verify GPU detection and status:

$ nvidia-smi

Thu Jul 31 14:18:14 2025

+-----------------------------------------------------------------------------------------+

| NVIDIA-SMI 575.57.08              Driver Version: 575.57.08      CUDA Version: 12.9     |

|-----------------------------------------+------------------------+----------------------+

| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |

| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |

|                                         |                        |               MIG M. |

|=========================================+========================+===============|

|   0  NVIDIA H200                    On  |   00000000:03:00.0 Off |                    0 |

| N/A   38C    P0            121W /  700W |  130733MiB / 143771MiB |      0%      Default |

|                                         |                        |             Disabled |

+-----------------------------------------+------------------------+----------------------+

|   1  NVIDIA H200                    On  |   00000000:29:00.0 Off |                    0 |

| N/A   30C    P0             76W /  700W |       4MiB / 143771MiB |      0%      Default |

|                                         |                        |             Disabled |

+-----------------------------------------+------------------------+----------------------+

|   2  NVIDIA H200                    On  |   00000000:59:00.0 Off |                    0 |

| N/A   32C    P0             77W /  700W |       4MiB / 143771MiB |      0%      Default |

|                                         |                        |             Disabled |

+-----------------------------------------+------------------------+----------------------+

|   3  NVIDIA H200                    On  |   00000000:63:00.0 Off |                    0 |

| N/A   36C    P0             76W /  700W |       4MiB / 143771MiB |      0%      Default |

|                                         |                        |             Disabled |

+-----------------------------------------+------------------------+----------------------+

|   4  NVIDIA H200                    On  |   00000000:7B:00.0 Off |                    0 |

| N/A   37C    P0             76W /  700W |       4MiB / 143771MiB |      0%      Default |

|                                         |                        |             Disabled |

+-----------------------------------------+------------------------+----------------------+

|   5  NVIDIA H200                    On  |   00000000:A3:00.0 Off |                    0 |

| N/A   32C    P0             75W /  700W |       4MiB / 143771MiB |      0%      Default |

|                                         |                        |             Disabled |

+-----------------------------------------+------------------------+----------------------+

|   6  NVIDIA H200                    On  |   00000000:D3:00.0 Off |                    0 |

| N/A   35C    P0             77W /  700W |       4MiB / 143771MiB |      0%      Default |

|                                         |                        |             Disabled |

+-----------------------------------------+------------------------+----------------------+

|   7  NVIDIA H200                    On  |   00000000:E5:00.0 Off |                    0 |

| N/A   30C    P0             74W /  700W |       4MiB / 143771MiB |      0%      Default |

|                                         |                        |             Disabled |

+-----------------------------------------+------------------------+----------------------+

 

+-----------------------------------------------------------------------------------------+

| Processes:                                                                              |

|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |

|        ID   ID                                                               Usage      |

|==================================================================================|

|    0   N/A  N/A            7602      C   /usr/bin/python3                      12971... |

|    0   N/A  N/A            8683      C   /usr/local/bin/python3                 1002MiB |

+-----------------------------------------------------------------------------------------+

Verify interconnect topology (NVLink/NVSwitch):

$ nvidia-smi topo -m


Verify Fabric Manager status:

$ sudo systemctl status nvidia-fabricmanager

nvidia-fabricmanager.service - NVIDIA fabric manager service

     Loaded: loaded (/lib/systemd/system/nvidia-fabricmanager.service; enabled; vendor preset: enabled)

     Active: active (running) since Thu 2025-07-31 03:46:43 UTC; 10h ago

   Main PID: 4321 (nv-fabricmanage)

      Tasks: 18 (limit: 629145)

     Memory: 20.7M

        CPU: 12.643s

     CGroup: /system.slice/nvidia-fabricmanager.service

             └─4321 /usr/bin/nv-fabricmanager -c /usr/share/nvidia/nvswitch/fabricmanager.cfg

7. AMD GPU software stack installation and validation

This chapter details the installation and validation of the AMD GPU software stack (ROCm), including drivers, compute runtimes, and performance tests.

7.1. Key AMD technologies

     ROCm (Radeon Open Compute platform): AMD's open-source software platform for GPU computing

     HIP (Heterogeneous-Compute Interface for Portability): A C++ programming language that allows developers to write portable code for both AMD (ROCm) and NVIDIA (CUDA) GPUs

     Infinity Fabric: AMD's high-speed, low-latency interconnect technology for GPU-to-GPU and GPU-to-CPU communication

     RCCL (ROCm Communication Collectives Library): AMD's library for multi-GPU collective communication operations

7.2. Prerequisites

     Supported OS (in this case, Ubuntu 24.04 LTS) is installed and running.

     A sudo user (a user capable of running commands with root privileges)

     Internet connectivity: the host OS is manageable and connected to the internet to download necessary packages.

     Updated system: all system packages are updated and repositories synchronized (apt update ; apt upgrade -y)

     Follow instructions here: https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/install/detailed-install/prerequisites.html

     Install additional development packages:

    $ sudo apt install python3-setuptools python3-wheel environment-modules

     Kernel headers: ensure your kernel headers are installed, because they are required for building kernel modules:

    $ sudo apt update

    $ sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"

7.3. ROCm repository setup and installation

Note: The following applies to Ubuntu 24.04 LTS. For other operating systems please refer to the AMD documentation at: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/detailed-install.html.

Download and convert the package signing key:

$ sudo mkdir --parents --mode=0755 /etc/apt/keyrings

$ wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \

    gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null

Register kernel-mode driver:

$ echo "deb [arch=amd64,i386 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/6.4.2/ubuntu noble main" \

    | sudo tee /etc/apt/sources.list.d/amdgpu.list

$ sudo apt update

Install Kernel Driver and reboot the system:

$ sudo apt install amdgpu-dkms

$ sudo reboot

After reboot, check the AMDGPU driver installation:

$ sudo dkms status

Expected output:

$ sudo dkms status

[...]

amdgpu/6.12.12-2187269.24.04, 6.8.0-71-generic, x86_64: installed

[...]

Register ROCm packages and update apt:

$ echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.4.2 noble main" \

    | sudo tee /etc/apt/sources.list.d/rocm.list

$ echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \

    | sudo tee /etc/apt/preferences.d/rocm-pin-600

$ sudo apt update

Install ROCm:

$ sudo apt install rocm clinfo
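Optionally, before moving on to the environment configuration, confirm that the packages were installed. Exact package names may vary slightly between ROCm releases:

$ dpkg -l | grep -E "rocm|amdgpu-dkms" | head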

7.4. Environment configuration

Set environment variables to ensure that ROCm binaries and libraries are found by the system.

Configure shared objects:

$ sudo tee --append /etc/ld.so.conf.d/rocm.conf <<EOF

/opt/rocm/lib

/opt/rocm/lib64

EOF

$ sudo ldconfig
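To confirm that the dynamic linker now resolves the ROCm libraries, you can optionally list the cached entries. The library names shown depend on the installed ROCm version:

$ ldconfig -p | grep "/opt/rocm" | head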

Create a script to set the environment variables persistently. Quote the heredoc delimiter ('EOF') so that $PATH and $LD_LIBRARY_PATH are expanded at login time rather than when the file is written:

$ sudo tee /etc/profile.d/rocm.sh > /dev/null << 'EOF'

#!/bin/bash

# Set ROCm environment variables globally

export PATH=/opt/rocm-6.4.2/bin:$PATH

export LD_LIBRARY_PATH=/opt/rocm-6.4.2/lib:$LD_LIBRARY_PATH

EOF

Make the script executable and apply it:

$ sudo chmod +x /etc/profile.d/rocm.sh

$ source /etc/profile.d/rocm.sh

Add the current user to the video and render groups to allow GPU access (repeat for any other users that require it):

$ sudo usermod -a -G video,render $LOGNAME
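Group membership changes take effect at the next login. To verify the assignment, run:

$ id -nG $LOGNAME

The output should include both video and render.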

Create a new file /etc/udev/rules.d/70-amdgpu.rules with the following content:

KERNEL=="kfd", MODE="0666"

SUBSYSTEM=="drm", KERNEL=="renderD*", MODE="0666"

These udev rules ensure that the device files used by AMD GPUs for compute operations (/dev/kfd and the /dev/dri/renderD* nodes) are accessible with read and write permissions by all users on the system. This provides a broad level of access to the GPU hardware, which can be necessary for certain ROCm configurations or if default group permissions are insufficient.

Note on MODE="0666": While 0666 grants universal read/write access and simplifies troubleshooting, for production environments a more restrictive permission, such as 0660 combined with proper group ownership (for example, GROUP="video"), might be preferred for enhanced security, assuming users are correctly added to the video or render groups.

Apply the rules:

$ sudo udevadm control --reload-rules && sudo udevadm trigger

To ensure that all newly created users are automatically included in the render and video groups, configure the system's default user-creation settings. Because adduser reads a single, space-separated EXTRA_GROUPS list, specify both groups in one assignment:

$ echo 'ADD_EXTRA_GROUPS=1' | sudo tee -a /etc/adduser.conf

$ echo 'EXTRA_GROUPS="video render"' | sudo tee -a /etc/adduser.conf
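Optionally, confirm the defaults that adduser will apply to new users:

$ grep -E "EXTRA_GROUPS" /etc/adduser.conf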

7.5. Kernel tunings for performance and stability

These kernel tunings are critical for optimal performance and stability in multi-GPU systems.

Note: After making changes to GRUB, a system reboot is required. It's best to apply both changes and then reboot once.

Disable NUMA Auto Balancing. While beneficial for general-purpose server workloads, automatic NUMA balancing can introduce undesirable overheads for high-performance computing (HPC) and artificial intelligence (AI) applications, especially those heavily utilizing GPUs. For instance, the continuous monitoring and migration of memory pages can consume CPU cycles and introduce latency.

Edit the sysctl configuration file using your favorite editor. For example, with vi:

$ sudo vi /etc/sysctl.conf

Add the following line to the end of the file, then save and close the file:

kernel.numa_balancing = 0

Apply the changes immediately:

$ sudo sysctl -p
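Verify that automatic NUMA balancing is now disabled; the kernel exposes the current value through procfs:

$ cat /proc/sys/kernel/numa_balancing

0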

Disable IOMMU for Optimal RCCL Performance. For optimal RCCL (ROCm Communication Collectives Library) performance in AI/ML workloads, particularly in multi-GPU setups, it is recommended to disable the IOMMU. This configuration, along with other parameters, can enhance communication efficiency and reduce overhead.

Edit the GRUB bootloader configuration with your favorite editor. For example, with vi:

$ sudo vi /etc/default/grub

Find the line that starts with GRUB_CMDLINE_LINUX_DEFAULT. Modify it to include the following parameters, ensuring that quiet splash is retained if present:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on rd.driver.blacklist=nouveau transparent_hugepage=madvise iommu=off pci=realloc=off"

Save and close the file.

Note on amd_iommu=on and iommu=off: While iommu=off explicitly disables the IOMMU for performance, amd_iommu=on is included as part of the recommended kernel configuration for AMD CPU platforms. These parameters serve different purposes in the kernel's IOMMU handling for AMD platforms and help achieve optimal collective-communication (RCCL) performance.

Update GRUB to apply the changes:

$ sudo update-grub

Reboot the system:

$ sudo reboot

After reboot, verify that the new GRUB parameters are applied:

$ cat /proc/cmdline

BOOT_IMAGE=/boot/vmlinuz-6.14.0-27-generic root=UUID=... ro quiet splash amd_iommu=on rd.driver.blacklist=nouveau transparent_hugepage=madvise iommu=off pci=realloc=off

7.6. Verification and testing

After installation and configuration, verify that the AMD GPUs are fully functional and performing optimally.

Hardware and driver configuration

Verify AMD GPU detection:

$ lspci -nn | grep -Ei "Processing accelerators"

Expected output (may change based on the GPU):

05:00.0 Processing accelerators [1200]: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI325X] [1002:74a5]

2b:00.0 Processing accelerators [1200]: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI325X] [1002:74a5]

5b:00.0 Processing accelerators [1200]: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI325X] [1002:74a5]

65:00.0 Processing accelerators [1200]: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI325X] [1002:74a5]

7d:00.0 Processing accelerators [1200]: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI325X] [1002:74a5]

a5:00.0 Processing accelerators [1200]: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI325X] [1002:74a5]

d5:00.0 Processing accelerators [1200]: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI325X] [1002:74a5]

e7:00.0 Processing accelerators [1200]: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI325X] [1002:74a5]

Verify that the amdgpu kernel module is loaded:

$ lsmod | grep -i "amdgpu"

amdgpu                19718144  0

amddrm_ttm_helper        12288  1 amdgpu

amdttm                  114688  2 amdgpu,amddrm_ttm_helper

amddrm_buddy             24576  1 amdgpu

amdxcp                   12288  1 amdgpu

drm_exec                 12288  1 amdgpu

drm_suballoc_helper      20480  1 amdgpu

amd_sched                61440  1 amdgpu

amdkcl                   28672  3 amd_sched,amdttm,amdgpu

drm_display_helper      278528  1 amdgpu

cec                      94208  2 drm_display_helper,amdgpu

video                    77824  1 amdgpu

i2c_algo_bit             16384  2 ast,amdgpu

drm_ttm_helper           16384  1 amdgpu

Get basic GPU information with rocm-smi:

$ rocm-smi

================================ ROCm System Management Interface ==============================

=========================================== Concise Info ======================================

Device   Node  IDs              Temp        Power     Partitions          SCLK    MCLK    Fan  Perf  PwrCap   VRAM%  GPU%

              (DID,     GUID)  (Junction)  (Socket)  (Mem, Compute, ID)

============================================================================================

0         3     0x74a5,   51110  50.0°C      131.0W    NPS1, SPX, 0        132Mhz  900Mhz  0%   auto  1000.0W  0%     0%

1        5     0x74a5,   31547  43.0°C      117.0W    NPS1, SPX, 0        132Mhz  900Mhz  0%   auto  1000.0W  0%     0%

2        4     0x74a5,   40734  43.0°C      127.0W    NPS1, SPX, 0        131Mhz  900Mhz  0%   auto  1000.0W  0%     0%

3        2     0x74a5,   61326  52.0°C      123.0W    NPS1, SPX, 0        132Mhz  900Mhz  0%   auto  1000.0W  0%     0%

4        7     0x74a5,   58756  48.0°C      119.0W    NPS1, SPX, 0        132Mhz  900Mhz  0%   auto  1000.0W  0%     0%

5        9     0x74a5,   49118  41.0°C      111.0W    NPS1, SPX, 0        132Mhz  900Mhz  0%   auto  1000.0W  0%     0%

6        8     0x74a5,   23547  53.0°C      129.0W    NPS1, SPX, 0        132Mhz  900Mhz  0%   auto  1000.0W  0%     0%

7        6     0x74a5,   11886  43.0°C      127.0W    NPS1, SPX, 0        133Mhz  900Mhz  0%   auto  1000.0W  0%     0%

============================================================================================

======================================= End of ROCm SMI Log ===================================

Detailed GPU Information with rocminfo (partial output, for one GPU agent):

$ rocminfo

[...]

*******

Agent 3

*******

  Name:                    gfx942

  Uuid:                    GPU-9747d7d125d4be59

  Marketing Name:          AMD Instinct MI325X

  Vendor Name:             AMD

  Feature:                 KERNEL_DISPATCH

  Profile:                 BASE_PROFILE

  Float Round Mode:        NEAR

  Max Queue Number:        128(0x80)

  Queue Min Size:          64(0x40)

  Queue Max Size:          131072(0x20000)

  Queue Type:              MULTI

  Node:                    2

  Device Type:             GPU

  Cache Info:

    L1:                    32(0x20) KB

    L2:                    4096(0x1000) KB

    L3:                    262144(0x40000) KB

  Chip ID:                 29861(0x74a5)

  ASIC Revision:           1(0x1)

  Cacheline Size:          128(0x80)

  Max Clock Freq. (MHz):   2100

  BDFID:                   25856

  Internal Node ID:        2

  Compute Unit:            304

  SIMDs per CU:            4

  Shader Engines:          32

  Shader Arrs. per Eng.:   1

  WatchPts on Addr. Ranges:4

  Coherent Host Access:    FALSE

  Memory Properties:

  Features:                KERNEL_DISPATCH

  Fast F16 Operation:      TRUE

  Wavefront Size:          64(0x40)

  Workgroup Max Size:      1024(0x400)

  Workgroup Max Size per Dimension:

    x                        1024(0x400)

    y                        1024(0x400)

    z                        1024(0x400)

  Max Waves Per CU:        32(0x20)

  Max Work-item Per CU:    2048(0x800)

  Grid Max Size:           4294967295(0xffffffff)

  Grid Max Size per Dimension:

    x                        4294967295(0xffffffff)

    y                        4294967295(0xffffffff)

    z                        4294967295(0xffffffff)

  Max fbarriers/Workgrp:   32

  Packet Processor uCode:: 185

  SDMA engine uCode::      24

  IOMMU Support::          None

  Pool Info:

    Pool 1

      Segment:                 GLOBAL; FLAGS: COARSE GRAINED

      Size:                    268419072(0xfffc000) KB

      Allocatable:             TRUE

      Alloc Granule:           4KB

      Alloc Recommended Granule:2048KB

      Alloc Alignment:         4KB

      Accessible by all:       FALSE

    Pool 2

      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED

      Size:                    268419072(0xfffc000) KB

      Allocatable:             TRUE

[...]

HIP compiler and runtime validation

This test confirms that the HIP compiler (hipcc) and the ROCm runtime can successfully compile and execute code on your AMD GPUs.

Create a test file with your favorite editor (for example, hip_add.cpp):

#include <hip/hip_runtime.h>

#include <iostream>

// HIP kernel to add two arrays

__global__ void add(int *a, int *b, int *c, int n) {

    int id = blockIdx.x * blockDim.x + threadIdx.x;

    if (id < n) {

        c[id] = a[id] + b[id];

    }

}

int main() {

    int N = 1000000; // Number of elements

    int *h_a, *h_b, *h_c; // Host arrays

    int *d_a, *d_b, *d_c; // Device arrays

    // Allocate host memory

    h_a = new int[N];

    h_b = new int[N];

    h_c = new int[N];

    // Initialize host arrays

    for (int i = 0; i < N; ++i) {

        h_a[i] = i;

        h_b[i] = i * 2;

    }

    // Allocate device memory

    hipMalloc(&d_a, N * sizeof(int));

    hipMalloc(&d_b, N * sizeof(int));

    hipMalloc(&d_c, N * sizeof(int));

    // Copy data from host to device

    hipMemcpy(d_a, h_a, N * sizeof(int), hipMemcpyHostToDevice);

    hipMemcpy(d_b, h_b, N * sizeof(int), hipMemcpyHostToDevice);

    // Launch kernel

    int blockSize = 256;

    int numBlocks = (N + blockSize - 1) / blockSize;

    hipLaunchKernelGGL(add, dim3(numBlocks), dim3(blockSize), 0, 0, d_a, d_b, d_c, N);

    // Synchronize to ensure kernel completion

    hipDeviceSynchronize();

    // Copy result from device to host

    hipMemcpy(h_c, d_c, N * sizeof(int), hipMemcpyDeviceToHost);

    // Verify result

    bool success = true;

    for (int i = 0; i < 10; ++i) { // Check first 10 elements

        if (h_c[i] != (h_a[i] + h_b[i])) {

            success = false;

            break;

        }

    }

    if (success) {

        std::cout << "HIP kernel execution successful! Result for first 10 elements:" << std::endl;

        for (int i = 0; i < 10; ++i) {

            std::cout << h_c[i] << " ";

        }

        std::cout << std::endl;

    } else {

        std::cout << "HIP kernel execution failed!" << std::endl;

    }

    // Free device memory

    hipFree(d_a);

    hipFree(d_b);

    hipFree(d_c);

    // Free host memory

    delete[] h_a;

    delete[] h_b;

    delete[] h_c;

    return 0;

}

Save the file and compile it:

$ hipcc hip_add.cpp -o hip_add

Note: You might see "nodiscard" warnings, which are harmless for this test.

Run the program:

$ ./hip_add

Expected output:

HIP kernel execution successful! Result for first 10 elements:

0 3 6 9 12 15 18 21 24 27
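Optionally, to confirm that the test also runs on a specific GPU, you can restrict device visibility with the HIP_VISIBLE_DEVICES environment variable (a standard ROCm/HIP variable). For example, to run on the second GPU:

$ HIP_VISIBLE_DEVICES=1 ./hip_add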

PyTorch integration and GPU detection

Validate that the PyTorch machine-learning framework can detect and utilize all your AMD GPUs.

Note: Running the following steps inside a Python virtual environment is recommended; however, setting up and using virtual environments is out of scope for this document.

Install pip (if not already present), and upgrade it:

$ sudo apt install python3-pip

$ pip3 install --upgrade pip

Install PyTorch built for ROCm. PyTorch publishes wheels for specific ROCm major/minor versions; for ROCm 6.4.2, the ROCm 6.0 wheels are typically compatible:

$ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0

Create a Python verification script with your favorite editor (for example, test_gpu.py):

import torch

# Check if ROCm (CUDA for PyTorch) is available

print(f"ROCm (CUDA) available: {torch.cuda.is_available()}")

# Print the number of detected GPUs

print(f"Number of GPUs detected: {torch.cuda.device_count()}")

# Print the name of the first GPU

if torch.cuda.device_count() > 0:

    print(f"Name of GPU 0: {torch.cuda.get_device_name(0)}")

else:

    print("No GPUs found by PyTorch.")

# Optional: Perform a simple tensor operation on a GPU

if torch.cuda.is_available():

    x = torch.rand(5, 5).to("cuda") # Create a random tensor on GPU

    y = torch.rand(5, 5).to("cuda")

    z = x + y

    print(f"Result of GPU tensor operation (first row): {z[0]}")

Run the verification script:

$ python3 test_gpu.py

Expected output (may change based on the GPU model):

ROCm (CUDA) available: True

Number of GPUs detected: 8

Name of GPU 0: AMD Instinct MI325X

Result of GPU tensor operation (first row): tensor([1.2773, 0.0741, 1.0373, 0.7632, 0.4326], device='cuda:0')
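As an optional quick check equivalent to the script above, the following one-liner lists every GPU that PyTorch can see; all installed GPUs (eight in this example) should be reported:

$ python3 -c "import torch; print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])"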

Multi-GPU communication fabric validation (AMD Infinity Fabric / AMD xGMI)

For high-performance AI/ML and HPC, efficient communication between GPUs is paramount. This section validates the performance of the AMD Infinity Fabric (AMD xGMI) interconnect.

Show the interconnect topology:

$ rocm-smi --showtopo

========================== ROCm System Management Interface ===========================

================================ Weight between two GPUs =================================
       GPU0   GPU1   GPU2   GPU3   GPU4   GPU5   GPU6   GPU7
GPU0   0      15     15     15     15     15     15     15
GPU1   15     0      15     15     15     15     15     15
GPU2   15     15     0      15     15     15     15     15
GPU3   15     15     15     0      15     15     15     15
GPU4   15     15     15     15     0      15     15     15
GPU5   15     15     15     15     15     0      15     15
GPU6   15     15     15     15     15     15     0      15
GPU7   15     15     15     15     15     15     15     0

================================= Hops between two GPUs ==================================
       GPU0   GPU1   GPU2   GPU3   GPU4   GPU5   GPU6   GPU7
GPU0   0      1      1      1      1      1      1      1
GPU1   1      0      1      1      1      1      1      1
GPU2   1      1      0      1      1      1      1      1
GPU3   1      1      1      0      1      1      1      1
GPU4   1      1      1      1      0      1      1      1
GPU5   1      1      1      1      1      0      1      1
GPU6   1      1      1      1      1      1      0      1
GPU7   1      1      1      1      1      1      1      0

=============================== Link Type between two GPUs ===============================
       GPU0   GPU1   GPU2   GPU3   GPU4   GPU5   GPU6   GPU7
GPU0   0      XGMI   XGMI   XGMI   XGMI   XGMI   XGMI   XGMI
GPU1   XGMI   0      XGMI   XGMI   XGMI   XGMI   XGMI   XGMI
GPU2   XGMI   XGMI   0      XGMI   XGMI   XGMI   XGMI   XGMI
GPU3   XGMI   XGMI   XGMI   0      XGMI   XGMI   XGMI   XGMI
GPU4   XGMI   XGMI   XGMI   XGMI   0      XGMI   XGMI   XGMI
GPU5   XGMI   XGMI   XGMI   XGMI   XGMI   0      XGMI   XGMI
GPU6   XGMI   XGMI   XGMI   XGMI   XGMI   XGMI   0      XGMI
GPU7   XGMI   XGMI   XGMI   XGMI   XGMI   XGMI   XGMI   0

================================= Numa Nodes =======================================

GPU[0]          : (Topology) Numa Node: 0

GPU[0]          : (Topology) Numa Affinity: 0

GPU[1]          : (Topology) Numa Node: 0

GPU[1]          : (Topology) Numa Affinity: 0

GPU[2]          : (Topology) Numa Node: 0

GPU[2]          : (Topology) Numa Affinity: 0

GPU[3]          : (Topology) Numa Node: 0

GPU[3]          : (Topology) Numa Affinity: 0

GPU[4]          : (Topology) Numa Node: 1

GPU[4]          : (Topology) Numa Affinity: 1

GPU[5]          : (Topology) Numa Node: 1

GPU[5]          : (Topology) Numa Affinity: 1

GPU[6]          : (Topology) Numa Node: 1

GPU[6]          : (Topology) Numa Affinity: 1

GPU[7]          : (Topology) Numa Node: 1

GPU[7]          : (Topology) Numa Affinity: 1

=============================== End of ROCm SMI Log =================================

     Weight 15: indicates direct, high-bandwidth AMD xGMI (or AMD Infinity Fabric) connections between all GPUs

     1 hop: confirms a full-mesh interconnect topology, where every GPU can directly reach every other GPU in a single hop

     Link type XGMI: explicitly states that all inter-GPU communication is through the high-speed AMD Infinity Fabric

     NUMA nodes: the GPUs are distributed across two NUMA nodes (four GPUs per node), which is an optimal configuration for dual-socket CPU systems.

$ rocm-smi --showtopoaccess

========================= ROCm System Management Interface ===========================

========================= Link accessibility between two GPUs ============================

             GPU0      GPU1        GPU2      GPU3        GPU4        GPU5      GPU6       GPU7

GPU0    True         True         True         True         True         True         True         True

GPU1    True         True         True         True         True         True         True         True

GPU2    True         True         True         True         True         True         True         True

GPU3    True         True         True         True         True         True         True         True

GPU4    True         True         True         True         True         True         True         True

GPU5    True         True         True         True         True         True         True         True

GPU6    True         True         True         True         True         True         True         True

GPU7    True         True         True         True         True         True         True         True

============================== End of ROCm SMI Log ==================================

The table above confirms that full peer-to-peer (P2P) access is enabled between all GPUs. Combined with the AMD xGMI link types, this ensures high-speed, direct memory access between any two GPUs.

Measure collective communication performance (RCCL tests)

The RCCL tests are performance benchmarks that need to be built from source.

Install build dependencies:

$ sudo apt update

$ sudo apt install git cmake build-essential

Clone the rccl-tests repository:

$ git clone https://github.com/ROCm/rccl-tests.git

$ cd rccl-tests

Build the tests:

$ mkdir build

$ cd build

$ cmake -DROCM_PATH=/opt/rocm-6.4.2 ..

$ make -j$(nproc)

Run all_reduce_perf. This test measures the bandwidth and latency of the all-reduce collective operation across all GPUs:

$ ./all_reduce_perf -b 8 -e 128M -f 2 -g 8

Expected output (partial):

# nThread 1 nGpus 8 minBytes 8 maxBytes 134217728 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0

#

rccl-tests: Version develop:645be0e

# Using devices

#  Rank  0 Group  0 Pid 3177763 on       cpoc device  0 [0000:65:00] AMD Instinct MI325X

#  Rank  1 Group  0 Pid 3177763 on       cpoc device  1 [0000:05:00] AMD Instinct MI325X

#  Rank  2 Group  0 Pid 3177763 on       cpoc device  2 [0000:5b:00] AMD Instinct MI325X

#  Rank  3 Group  0 Pid 3177763 on       cpoc device  3 [0000:2b:00] AMD Instinct MI325X

#  Rank  4 Group  0 Pid 3177763 on       cpoc device  4 [0000:e7:00] AMD Instinct MI325X

#  Rank  5 Group  0 Pid 3177763 on       cpoc device  5 [0000:7d:00] AMD Instinct MI325X

#  Rank  6 Group  0 Pid 3177763 on       cpoc device  6 [0000:d5:00] AMD Instinct MI325X

#  Rank  7 Group  0 Pid 3177763 on       cpoc device  7 [0000:a5:00] AMD Instinct MI325X

#

#                                                              out-of-place                       in-place

#     size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong

#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)

           8             2     float     sum      -1    55.74    0.00    0.00      0    47.12    0.00    0.00      0

          16             4     float     sum      -1    49.27    0.00    0.00      0    47.95    0.00    0.00      0

          32             8     float     sum      -1    48.35    0.00    0.00      0    47.70    0.00    0.00      0

          64            16     float     sum      -1    47.84    0.00    0.00      0    55.69    0.00    0.00      0

         128            32     float     sum      -1    48.07    0.00    0.00      0    55.65    0.00    0.00      0

         256            64     float     sum      -1    39.94    0.01    0.01      0    39.47    0.01    0.01      0

         512           128     float     sum      -1    40.59    0.01    0.02      0    40.33    0.01    0.02      0

        1024           256     float     sum      -1    39.69    0.03    0.05      0    49.94    0.02    0.04      0

        2048           512     float     sum      -1    50.32    0.04    0.07      0    50.79    0.04    0.07      0

        4096          1024     float     sum      -1    50.14    0.08    0.14      0    49.96    0.08    0.14      0

        8192          2048     float     sum      -1    44.16    0.19    0.32      0    44.11    0.19    0.33      0

       16384          4096     float     sum      -1    50.68    0.32    0.57      0    50.85    0.32    0.56      0

       32768          8192     float     sum      -1    50.88    0.64    1.13      0    50.64    0.65    1.13      0

       65536         16384     float     sum      -1    51.20    1.28    2.24      0    43.94    1.49    2.61      0

      131072         32768     float     sum      -1    55.54    2.36    4.13      0    61.78    2.12    3.71      0

      262144         65536     float     sum      -1    61.73    4.25    7.43      0    63.68    4.12    7.20      0

      524288        131072     float     sum      -1    62.33    8.41   14.72      0    63.33    8.28   14.49      0

     1048576        262144     float     sum      -1    55.97   18.73   32.78      0    55.99   18.73   32.77      0

     2097152        524288     float     sum      -1    62.56   33.52   58.66      0    63.36   33.10   57.92      0

     4194304       1048576     float     sum      -1    61.44   68.27  119.47      0    61.24   68.49  119.86      0

     8388608       2097152     float     sum      -1    94.56   88.72  155.25      0    97.79   85.78  150.12      0

    16777216       4194304     float     sum      -1    167.2  100.34  175.60      0    177.8   94.36  165.12    0

    33554432       8388608     float     sum      -1    285.4  117.55  205.72      0    301.0  111.48  195.09    0

    67108864      16777216     float     sum      -1    467.7  143.49  251.10      0    480.7  139.61  244.33   0

   134217728      33554432     float     sum      -1    833.9  160.95  281.66      0    842.2  159.36  278.87  0

# Errors with asterisks indicate errors that have exceeded the maximum threshold.

# Out of bounds values : 0 OK

# Avg bus bandwidth    : 51.71

#

Interpretation: This test successfully measured the collective communication performance across all 8 AMD GPUs. For the largest message size (128 MB), the achieved algorithm bandwidth (algbw) is approximately 161 GB/s, and the bus bandwidth (busbw) is approximately 282 GB/s (using out-of-place values). These high bandwidth figures confirm the efficient utilization of the AMD Infinity Fabric (AMD xGMI) interconnect for multi-GPU collective operations.
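For reference, the benchmark derives bus bandwidth from algorithm bandwidth following the methodology documented for nccl-tests (from which rccl-tests is derived): for all-reduce, busbw = algbw * 2 * (n - 1) / n, where n is the number of GPUs. With n = 8 the factor is 1.75, so 160.95 GB/s * 1.75 ≈ 281.7 GB/s, which matches the reported busbw of 281.66 GB/s.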

Measure point-to-point bidirectional bandwidth

The sendrecv_perf test measures the effective bandwidth of direct communication between a pair of GPUs (with -g 2, the first two GPUs enumerated by the runtime are used).

Run the following command:

$ ./sendrecv_perf -b 128M -e 1G -f 2 -g 2

Expected output (partial):

# nThread 1 nGpus 2 minBytes 134217728 maxBytes 1073741824 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0

#

rccl-tests: Version develop:645be0e

# Using devices

#  Rank  0 Group  0 Pid 3189719 on       cpoc device  0 [0000:65:00] AMD Instinct MI325X

#  Rank  1 Group  0 Pid 3189719 on       cpoc device  1 [0000:05:00] AMD Instinct MI325X

#

#                                                              out-of-place                       in-place

#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong

#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)

   134217728      33554432      float     sum      -1   2886.8   46.49   46.49      0    2897.0   46.33  46.33    N/A

   268435456      67108864     float     sum      -1   5740.7   46.76   46.76      0     5741.7   46.75   46.75   N/A

   536870912     134217728     float     sum      -1    11472   46.80   46.80     0    11414   47.04   47.04     N/A

  1073741824     268435456     float     sum      -1    22763   47.17   47.17    0    22777   47.14  47.14      N/A

# Errors with asterisks indicate errors that have exceeded the maximum threshold.

# Out of bounds values : 0 OK

# Avg bus bandwidth    : 46.8101

Interpretation: This test successfully measured the point-to-point bandwidth between two AMD GPUs. The algbw values show consistent performance across large message sizes, indicating saturation. The highest observed algbw is approximately 47.17 GB/s (from the 1GB message size, out-of-place). This represents the effective unidirectional bandwidth.

To calculate the maximum GPU-to-GPU bidirectional speed, we double this value (assuming simultaneous send and receive): 47.17 GB/s (unidirectional) * 2 = 94.34 GB/s

Result: The maximum GPU-to-GPU bidirectional speed between two directly connected AMD GPUs is approximately 94.3 GB/s. This result aligns closely with the theoretical bidirectional bandwidth of a single AMD xGMI link (~100 GB/s), demonstrating the effective utilization of the direct xGMI connection.

7.7. Install Mellanox drivers

Download MLNX_OFED driver:

$ wget "https://content.mellanox.com/ofed/MLNX_OFED-24.10-3.2.5.0/MLNX_OFED_LINUX-24.10-3.2.5.0-ubuntu24.04-x86_64.iso"

--2025-09-04 15:37:25-- https://content.mellanox.com/ofed/MLNX_OFED-24.10-3.2.5.0/MLNX_OFED_LINUX-24.10-3.2.5.0-ubuntu24.04-x86_64.iso

Resolving content.mellanox.com (content.mellanox.com)... 107.178.241.102

Connecting to content.mellanox.com (content.mellanox.com)|107.178.241.102|:443... connected.

HTTP request sent, awaiting response... 200 OK

Length: 304336896 (290M) [application/x-iso9660-image]

Saving to: ‘MLNX_OFED_LINUX-24.10-3.2.5.0-ubuntu24.04-x86_64.iso’

MLNX_OFED_LINUX-24.10-3.2.5.0-ubu 100%[========================================================>] 290.24M 21.7MB/s in 16s

2025-09-04 15:37:41 (18.1 MB/s) - ‘MLNX_OFED_LINUX-24.10-3.2.5.0-ubuntu24.04-x86_64.iso’ saved [304336896/304336896]

Create a temporary directory and mount the downloaded ISO:

$ sudo mkdir -p /mnt/mlnx

$ sudo mount ./MLNX_OFED_LINUX-24.10-3.2.5.0-ubuntu24.04-x86_64.iso /mnt/mlnx

$ cd /mnt/mlnx

Install drivers:

$ sudo ./mlnxofedinstall
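When the installer completes, it typically prompts you to restart the driver stack. As an optional verification, confirm the installed MLNX_OFED version and reload the drivers (a reboot achieves the same result):

$ ofed_info -s

$ sudo /etc/init.d/openibd restart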

8. Glossary of terms

     APT (Advanced Package Tool): a free software user interface that works with core libraries to handle the installation and removal of software on Debian-based Linux distributions such as Ubuntu

     BIOS (Basic Input/Output System): firmware used to perform hardware initialization during the booting process (power-on startup) and to provide runtime services for operating systems and programs

     BMC (Baseboard Management Controller): a specialized microcontroller that manages the interface between system management hardware and software. It provides out-of-band management capabilities for the server, allowing remote monitoring and control even when the main server OS is down or powered off.

     CUDA (Compute Unified Device Architecture): a parallel computing platform and programming model developed by NVIDIA for its GPUs. CUDA enables software developers to use a GPU for general-purpose processing.

     DCGM (Data Center GPU Manager): a suite of tools for managing and monitoring NVIDIA GPUs in data-center environments, providing health, diagnostics, and performance metrics

     DOCA (Data Center on a Chip Architecture): NVIDIA's software framework for BlueField DPUs, enabling developers to create high-performance, software-defined, hardware-accelerated applications for data centers

     DPU (Data Processing Unit): a new class of programmable processor that combines a CPU, network interface, and programmable acceleration engines to offload and accelerate data-center infrastructure tasks. NVIDIA BlueField is an example of a DPU.

     dracut: a tool used on Linux systems to create an initial RAM disk image (initramfs) that is loaded into memory during the boot process, containing necessary drivers and tools to mount the root filesystem

     dpkg: the package management system for Debian-based Linux distributions. It is used to install, remove, and provide information about .deb packages

     dnf/yum: package managers used on Red Hat–based Linux distributions (such as RHEL, CentOS, Fedora, and Rocky Linux) for installing, updating, and removing software packages

     FPGA (Field-Programmable Gate Array): an integrated circuit designed to be configured by a customer or a designer after manufacturing – hence "field-programmable."

     GPU (Graphics Processing Unit): a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. In AI/HPC, they are used for parallel computations.

     HIP (Heterogeneous-Compute Interface for Portability): a C++ programming language and runtime API developed by AMD that allows developers to write portable code for both AMD (ROCm) and NVIDIA (CUDA) GPUs. It serves as AMD's primary programming interface for GPU computing.

     InfiniBand: a networking technology designed for data interconnection, often employed in high-performance computing (HPC) environments and certain specialized data center applications

     Infinity Fabric: AMD's high-speed, low-latency interconnect technology used in AMD CPUs and GPUs for communication between chiplets, CPU sockets, and GPUs. It is AMD's equivalent to NVIDIA's NVLink, enabling direct peer-to-peer communication between GPUs.

     KVM Console (Keyboard, Video, Mouse Console): a feature typically provided by a BMC that allows remote access to the server's console, enabling users to interact with the server's BIOS, OS installation, and operating system as if they were directly connected with a physical keyboard, monitor, and mouse

     LTS (Long Term Support): a designation for software releases (such as Ubuntu) that receive extended periods of support, including security updates and bug fixes, making them suitable for production environments

     MST (Mellanox Software Tools): a set of utilities provided by NVIDIA (formerly Mellanox) for managing and configuring Mellanox/NVIDIA network adapters (such as ConnectX and BlueField) and their firmware

     OOB (Out-of-Band): refers to management capabilities that operate independently of the main server's operating system and network connection. BMC provides OOB management.

     PCIe (Peripheral Component Interconnect Express): a high-speed serial computer expansion bus standard. Used for connecting high-performance peripherals such as GPUs and network adapters.

     PXE (Preboot Execution Environment): a standardized client-server interface that allows a computer to boot from a network interface independently of data storage devices (such as hard disks) or installed operating systems

     RCCL (ROCm Communication Collectives Library): AMD's library for multi-GPU collective communication operations, providing high-performance primitives for parallel computing. It is AMD's equivalent to NVIDIA's NCCL.

     Redfish API: an open industry-standard specification and schema designed to meet the needs of IT administrators for scalable, secure, and interoperable management of modern hardware. Used by BMC for remote management.

     ROCm (Radeon Open Compute platform): AMD's open-source software platform for GPU computing. It includes drivers, libraries, and tools for high-performance computing and AI/ML workloads on AMD GPUs. It is AMD's comprehensive software ecosystem for GPU acceleration, analogous to NVIDIA's CUDA.

     screen / tmux: terminal multiplexers that allow users to manage multiple shell sessions within a single terminal window or remote SSH session. They enable sessions to be detached and reattached, preventing work loss from network disconnections.

     SuperNIC: a term used by NVIDIA to describe high-performance network interface cards, often with advanced capabilities beyond traditional NICs, such as acceleration for AI and HPC workloads

 

Learn more