Introduction

Overview

The key to success when troubleshooting the system hardware is to isolate the problem to a specific system component. The first step is to compare what the system is doing to what it should be doing. Because a startup problem can usually be attributed to a single component, it is more efficient to isolate the problem to a subsystem rather than troubleshoot each separate component in the system.

Terminology

The following terms are essential for understanding the concepts in this troubleshooting guide.

Term    Definition
CPU     Central Processing Unit. A general term for both CISC (complex instruction set computer) and RISC (reduced instruction set computer) processors.
FHHL    Full Height, Half Length. A PCIe card form factor.
GPU     Graphics Processing Unit
NIC     Network Interface Card
NVMe    Non-Volatile Memory Express
OCP     Open Compute Project
PCIe    Peripheral Component Interconnect Express. Refers to a communication link defined by the PCI-SIG standards body.
BMC     Baseboard Management Controller
FRU     Field Replaceable Unit
DCSCM   Datacenter-ready Secure Control Module. An Open Compute Project specification.
IPMI    Intelligent Platform Management Interface

Initial Power Up

Problems with the initial power up are often caused by a module that is not firmly connected to the backplane or a power supply that has been disconnected from the power cord connector.

Overheating can also cause problems with the system, though typically only after the system has been operating for an extended period. The most common cause of overheating is the failure of a fan module.
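Because a failed fan module is the most common cause of overheating, it can help to scan a saved sensor listing for fans that report a problem. The following is a minimal sketch only. It assumes a pipe-delimited listing of the form `name | id | status | entity | reading`, similar to what IPMI command-line tools commonly print; the exact format on your server may differ, and the sample values are invented for illustration.

```python
def parse_fan_lines(text):
    """Return (name, status, reading) tuples from pipe-delimited sensor lines.

    Assumed line format (not taken from this guide):
        name | sensor id | status | entity id | reading
    """
    rows = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 5:
            continue  # skip blank lines and anything in an unexpected shape
        name, _sensor_id, status, _entity_id, reading = fields
        rows.append((name, status, reading))
    return rows


def fans_needing_attention(text):
    """Return the names of fans whose status field is anything other than 'ok'."""
    return [name for name, status, _reading in parse_fan_lines(text)
            if status.lower() != "ok"]


# Invented sample data, for illustration only.
SAMPLE = """\
FAN1 | 30h | ok | 29.1 | 5400 RPM
FAN2 | 31h | cr | 29.2 | 0 RPM
"""

if __name__ == "__main__":
    print(fans_needing_attention(SAMPLE))  # prints ['FAN2']
```

Any fan that this sketch reports should be checked against the Fan LED on the chassis before you take further action.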

Guidelines for Troubleshooting

When you troubleshoot issues with a C-Series Rack-Mount Server or any component in it, we recommend that you follow the guidelines below.

Guideline: Take screenshots of the fault or error message dialog box and other relevant areas.
Description: These screenshots provide visual cues about the state of the C-Series server when the problem occurred. If your computer does not have screenshot software, check the documentation for your operating system, which may include this functionality.

Guideline: Record the steps that you took directly before the issue occurred.
Description: If you have access to screen or keystroke recording software, repeat the steps you took and record what occurs. If you do not have access to this type of software, repeat the steps and make detailed notes of each step and what happens after it.

Guideline: Enter the show tech-support command.
Description: Information about the current state of the server is very helpful to the Cisco Technical Assistance Center (TAC) and frequently provides what is needed to identify the source of the problem.
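If you do not have recording software, a small script can keep step-by-step notes consistent. The sketch below is a hypothetical helper, not part of any Cisco tooling: the `StepLog` class and its method names are assumptions made for illustration. It timestamps each step and what you observed so that the notes can be pasted into a support case.

```python
from datetime import datetime, timezone


class StepLog:
    """Hypothetical helper that records each step and the observed result."""

    def __init__(self):
        self.entries = []

    def record(self, action, observed):
        """Store one step with the current UTC time."""
        self.entries.append((datetime.now(timezone.utc), action, observed))

    def render(self):
        """Return the notes as numbered plain text, one step per entry."""
        lines = []
        for number, (when, action, observed) in enumerate(self.entries, start=1):
            lines.append(f"{number}. [{when:%Y-%m-%d %H:%M:%S} UTC] {action}")
            lines.append(f"   Observed: {observed}")
        return "\n".join(lines)


if __name__ == "__main__":
    log = StepLog()
    log.record("Pressed the power button", "Fan LED turned red")
    log.record("Reseated the fan module", "Fan LED turned green")
    print(log.render())
```

Plain text is used here so that the notes can be attached alongside screenshots without any special viewer.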

Most chassis in the Cisco UCS C-Series Rack Server family include the following subsystems:

  • Power supply: This includes the power supply fans.

  • Fan module: The chassis fan module should operate whenever system power is on. To determine whether it is operating, check that the Fan LED is green and listen for the fan module. A red Fan LED indicates that one or more fans in the fan module are not operating; if you see this, contact your customer service representative immediately.


Note


There are no installation adjustments that you can make if the fan module does not function properly at initial startup.
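The Fan LED guidance above can be summarized as a simple lookup. This is an illustrative sketch only: the function name `fan_led_action` is hypothetical, and the fallback message for colors other than green or red is an assumption, because this section describes only those two states.

```python
def fan_led_action(color):
    """Map a Fan LED color to the action described in this section."""
    normalized = color.strip().lower()
    if normalized == "green":
        return "Fan module is operating."
    if normalized == "red":
        return ("One or more fans in the fan module are not operating. "
                "Contact your customer service representative immediately.")
    # Assumption: any other state is outside what this section covers.
    return "LED state is not described in this section."


if __name__ == "__main__":
    for observed in ("green", "red", "off"):
        print(f"{observed}: {fan_led_action(observed)}")
```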