Cisco UCS Manager B-Series Troubleshooting Guide
Introduction

Table Of Contents

Overview

Troubleshooting Information in the Cisco UCS Manager GUI

Troubleshooting Information in the Cisco UCS Manager CLI

Additional Troubleshooting Documentation

Faults

Fault Severities

Fault States

Fault Types

Properties of Faults

Lifecycle of Faults

Faults in the Cisco UCS Manager GUI

Faults in the Cisco UCS Manager CLI

Fault Collection Policy

Events

Properties of Events

Events in the Cisco UCS Manager GUI

Events in the Cisco UCS Manager CLI

Core Files

Core Files in the Cisco UCS Manager GUI

Core Files in the Cisco UCS Manager CLI

Core File Exporter

Audit Log

Properties of the Audit Log Entries

Audit Log in the Cisco UCS Manager GUI

Audit Log in the Cisco UCS Manager CLI

System Event Log

SEL File

SEL Policy

Syslog

Syslog Configuration

Syslog Location

Syslog Entry Format

Syslog Entry Severities

Syslog Entry Parameters

Syslog Services


Overview


This chapter provides an overview of where to find faults, events, and other information that can help you troubleshoot issues with Cisco Unified Computing System (Cisco UCS) B-Series Servers.

This chapter includes the following sections:

Troubleshooting Information in the Cisco UCS Manager GUI

Troubleshooting Information in the Cisco UCS Manager CLI

Additional Troubleshooting Documentation

Faults

Events

Core Files

Audit Log

System Event Log

Syslog

Troubleshooting Information in the Cisco UCS Manager GUI

The Cisco UCS Manager GUI provides several tabs and other areas that you can use to find troubleshooting information for a Cisco UCS instance. For example, you can view faults and events for specific objects or for all objects in the system.

The Admin tab in the Navigation pane provides access to faults, events, core files, and other information that can help you troubleshoot issues.

If you select Faults, Events and Audit Log in the Filter field on the Admin tab, the Cisco UCS Manager GUI limits the tree browser so that you can only access the following:

The faults for all components in the system

The events for all components in the system

The audit log for the system

Any core files created by the fabric interconnects in the system

The fault collection and core file export settings


Note: Fault thresholds might need to be modified. See the "Statistics Threshold Policy" section in the Cisco UCS Manager GUI Configuration Guide.


Troubleshooting Information in the Cisco UCS Manager CLI

The Cisco UCS Manager CLI includes several show commands that you can execute to find troubleshooting information for a Cisco UCS instance. These show commands are scope-aware, which means that if you enter the show fault command from the top scope, it displays all faults in the system. However, if you scope to a specific object, the show fault command displays faults that are related to that object only.


Note: Fault thresholds might need to be modified. See the "Statistics Threshold Policy" section in the Cisco UCS Manager CLI Configuration Guide.


Additional Troubleshooting Documentation

Additional troubleshooting information is available in the following documents:

Cisco UCS Manager Faults and Error Message Reference—Contains information about Cisco UCS Manager faults and System Event Log messages, including BIOS and CIMC messages.

Cisco UCS C-Series Servers Integrated Management Controller Troubleshooting Guide—Contains information about how to troubleshoot issues with C-Series rack-mount servers.

Faults

In Cisco UCS, a fault is a mutable object that is managed by Cisco UCS Manager. Each fault represents a failure in the Cisco UCS instance or an alarm threshold that has been raised. During its lifecycle, a fault can change from one state or severity to another.

Each fault includes information about the operational state of the affected object at the time the fault was raised. If the fault is transitional and the failure is resolved, the object transitions to a functional state.

A fault remains in Cisco UCS Manager until the fault is cleared and deleted according to the settings in the fault collection policy.

You can view all faults in a Cisco UCS instance from either the Cisco UCS Manager CLI or the Cisco UCS Manager GUI. You can also configure the fault collection policy to determine how a Cisco UCS instance collects and retains faults.


Note: All Cisco UCS faults can be trapped by SNMP.


This section includes the following topics:

Fault Severities

Fault States

Fault Types

Properties of Faults

Lifecycle of Faults

Faults in the Cisco UCS Manager GUI

Faults in the Cisco UCS Manager CLI

Fault Collection Policy

Fault Severities

A fault raised in a Cisco UCS instance can transition through more than one severity during its lifecycle. Table 1-1 describes the fault severities that you may encounter.

Table 1-1 Fault Severities in Cisco UCS

Severity
Description

Critical

Service-affecting condition that requires immediate corrective action. For example, this severity could indicate that the managed object is out of service and its capability must be restored.

Major

Service-affecting condition that requires urgent corrective action. For example, this severity could indicate a severe degradation in the capability of the managed object and that its full capability must be restored.

Minor

Nonservice-affecting fault condition that requires corrective action to prevent a more serious fault from occurring. For example, this severity could indicate that the detected alarm condition is not degrading the capacity of the managed object.

Warning

Potential or impending service-affecting fault that currently has no significant effects in the system. You should take action to further diagnose, if necessary, and correct the problem to prevent it from becoming a more serious service-affecting fault.

Condition

Informational message about a condition, possibly independently insignificant.

Info

Basic notification or informational message, possibly independently insignificant.


Fault States

A fault raised in a Cisco UCS instance transitions through more than one state during its lifecycle. Table 1-2 describes the possible fault states in alphabetical order.

Table 1-2 Fault States in Cisco UCS

State
Description

Cleared

Condition that has been resolved and cleared.

Flapping

Fault that was raised, cleared, and raised again within a short time interval, known as the flap interval.

Soaking

Fault that was raised and cleared within a short time interval, known as the flap interval. Because this state may be a flapping condition, the fault severity remains at its original active value, but this state indicates the condition that raised the fault has cleared.


Fault Types

A fault raised in a Cisco UCS instance can be one of the types described in Table 1-3.

Table 1-3 Types of Faults in Cisco UCS

Type
Description

fsm

A finite state machine (FSM) task has failed to complete successfully, or Cisco UCS Manager is retrying one of the stages of the FSM.

equipment

Cisco UCS Manager has detected that a physical component is inoperable or has another functional issue.

server

Cisco UCS Manager cannot complete a server task, such as associating a service profile with a server.

configuration

Cisco UCS Manager cannot successfully configure a component.

environment

Cisco UCS Manager has detected a power problem, thermal problem, voltage problem, or loss of CMOS settings.

management

Cisco UCS Manager has detected a serious management issue, such as one of the following: critical services could not be started, the primary fabric interconnect could not be identified, or components in the instance include incompatible firmware versions.

connectivity

Cisco UCS Manager has detected a connectivity problem, such as an unreachable adapter.

network

Cisco UCS Manager has detected a network issue, such as a link down.

operational

Cisco UCS Manager has detected an operational problem, such as a log capacity issue or a failed server discovery.


Properties of Faults

Cisco UCS Manager provides detailed information about each fault raised in a Cisco UCS instance. Table 1-4 describes the fault properties that you can view in the Cisco UCS Manager CLI or the Cisco UCS Manager GUI.

Table 1-4 Fault Properties 

Property Name
Description

Severity

Current severity level of the fault, which can be any of the severities described in Table 1-1.

Last Transition

Day and time on which the severity for the fault last changed. If the severity has not changed since the fault was raised, this property displays the original creation date.

Affected Object

Component that is affected by the condition that raised the fault.

Description

Description of the fault.

ID

Unique identifier assigned to the fault.

Type

Type of fault that has been raised, which can be any of the types described in Table 1-3.

Cause

Unique identifier associated with the condition that caused the fault.

Created at

Day and time when the fault occurred.

Code

Unique identifier assigned to the fault.

Number of Occurrences

Number of times the event that raised the fault occurred.

Original Severity

Severity assigned to the fault the first time it occurred.

Previous Severity

Previous severity level.

Highest Severity

Highest severity encountered for this issue.
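
The relationship between the Severity, Original Severity, Previous Severity, and Highest Severity properties can be illustrated with a short Python sketch. The class name and the numeric ranking are assumptions made for illustration: the ranking treats the order of Table 1-1 as a strict ordering from most to least severe, and the sketch assumes a newly raised fault reports its original severity as its previous severity. It is not how Cisco UCS Manager stores faults.

```python
# Severities from Table 1-1. Treating the table order as a strict ranking
# (critical highest, info lowest) is an assumption of this sketch.
SEVERITY_RANK = {
    "info": 0,
    "condition": 1,
    "warning": 2,
    "minor": 3,
    "major": 4,
    "critical": 5,
}


class FaultSeverityHistory:
    """Hypothetical record of the severity-related fault properties."""

    def __init__(self, original):
        if original not in SEVERITY_RANK:
            raise ValueError(f"unknown severity: {original}")
        self.original = original   # Original Severity
        self.current = original    # Severity
        self.previous = original   # Previous Severity (assumed equal at creation)
        self.highest = original    # Highest Severity

    def change_severity(self, new):
        """Apply a severity transition and update the derived properties."""
        if new not in SEVERITY_RANK:
            raise ValueError(f"unknown severity: {new}")
        self.previous = self.current
        self.current = new
        if SEVERITY_RANK[new] > SEVERITY_RANK[self.highest]:
            self.highest = new


history = FaultSeverityHistory("warning")
history.change_severity("major")
history.change_severity("minor")
print(history.original, history.previous, history.current, history.highest)
```

In this sketch the highest severity never decreases, even after the current severity drops back to a lower level.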


Lifecycle of Faults

Faults in Cisco UCS are stateful. Only one instance of a given fault can exist on each object. If the same fault occurs a second time, Cisco UCS increases the number of occurrences by one.
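
The one-instance-per-object rule can be pictured as a table keyed by the affected object and the fault code. The following Python sketch is only an illustration: the FaultTable class, the object name, and the fault codes are hypothetical and do not reflect the internal data model of Cisco UCS Manager.

```python
class FaultTable:
    """Hypothetical fault table keyed by (affected object, fault code)."""

    def __init__(self):
        self._occurrences = {}

    def raise_fault(self, affected_object, code):
        """Record a fault; a repeat increments the count instead of adding an entry."""
        key = (affected_object, code)
        self._occurrences[key] = self._occurrences.get(key, 0) + 1
        return self._occurrences[key]

    def __len__(self):
        return len(self._occurrences)


table = FaultTable()
# The object name and fault codes below are made up for the example.
table.raise_fault("chassis-1/blade-1", "F-EXAMPLE-1")
table.raise_fault("chassis-1/blade-1", "F-EXAMPLE-1")  # same fault, same object
table.raise_fault("chassis-1/blade-1", "F-EXAMPLE-2")  # different fault
print(len(table))  # two distinct faults, one of which occurred twice
```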

A fault has the following lifecycle:

1. A condition occurs in the system and Cisco UCS raises a fault. This is the active state.

2. When the condition is alleviated, the fault enters a flapping or soaking interval that is designed to prevent flapping. Flapping occurs when a fault is raised and cleared several times in rapid succession. During the flapping interval, the fault retains its severity for the length of time specified in the fault collection policy.

3. If the condition reoccurs during the flapping interval, the fault returns to the active state. If the condition does not reoccur during the flapping interval, the fault is cleared.

4. The cleared fault enters the retention interval. This interval ensures that the fault reaches the attention of an administrator even if the condition that caused the fault has been alleviated, and that the fault is not deleted prematurely. The retention interval retains the cleared fault for the length of time specified in the fault collection policy.

5. If the condition reoccurs during the retention interval, the fault returns to the active state. If the condition does not reoccur, the fault is deleted.
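
The five steps above can be sketched as a small state machine. This Python model is a simplified illustration under stated assumptions: the class and method names are hypothetical, time is passed in explicitly as seconds, and the interval values in the usage example are arbitrary rather than Cisco UCS defaults.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    SOAKING = "soaking"    # condition alleviated, flapping interval running
    CLEARED = "cleared"    # retention interval running
    DELETED = "deleted"


class Fault:
    """Hypothetical model of the fault lifecycle; times are in seconds."""

    def __init__(self, flap_interval, retention_interval, now=0):
        self.flap_interval = flap_interval
        self.retention_interval = retention_interval
        self.state = State.ACTIVE          # step 1: the fault is raised
        self.occurrences = 1
        self._since = now                  # time the current state was entered

    def _enter(self, state, now):
        self.state = state
        self._since = now

    def tick(self, now):
        """Expire intervals: soaking -> cleared -> deleted."""
        if self.state is State.SOAKING and now - self._since >= self.flap_interval:
            # step 3: no recurrence during the flapping interval
            self._enter(State.CLEARED, self._since + self.flap_interval)
        if self.state is State.CLEARED and now - self._since >= self.retention_interval:
            # step 5: no recurrence during the retention interval
            self._enter(State.DELETED, self._since + self.retention_interval)

    def condition_cleared(self, now):
        """Step 2: the condition is alleviated and the intervals start."""
        self.tick(now)
        if self.state is State.ACTIVE:
            self._enter(State.SOAKING, now)

    def condition_raised(self, now):
        """Steps 3 and 5: a recurrence returns the fault to the active state."""
        self.tick(now)
        if self.state in (State.SOAKING, State.CLEARED):
            self.occurrences += 1
            self._enter(State.ACTIVE, now)
        # A deleted fault stays deleted; a recurrence would raise a new fault.


fault = Fault(flap_interval=10, retention_interval=100)
fault.condition_cleared(now=5)
fault.condition_raised(now=8)      # recurs inside the flapping interval
fault.condition_cleared(now=20)
fault.tick(now=30)                 # flapping interval expires
fault.tick(now=130)                # retention interval expires
print(fault.state, fault.occurrences)
```

Passing the clock in as an argument keeps the model deterministic, which makes each transition easy to check by hand.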

Faults in the Cisco UCS Manager GUI

If you want to view faults for a single object in the system, navigate to that object in the Cisco UCS Manager GUI and click the Faults tab in the Work pane. If you want to view faults for all objects in the system, navigate to the Faults node on the Admin tab under Faults, Events and Audit Log.

You can also view a summary of all faults that have occurred in a Cisco UCS instance in the Fault Summary area in the upper left of the Cisco UCS Manager GUI.

Each fault severity is represented by a different icon. The number below each icon indicates how many faults of that severity have occurred in the system. If you click an icon, the Cisco UCS Manager GUI opens the Faults tab in the Work area and displays the details of all faults with that severity.

Faults in the Cisco UCS Manager CLI

If you want to view the faults for all objects in the system, enter the show fault command from the top-level scope. If you want to view the faults for a specific object, scope to that object and then execute the show fault command.

If you want to view all available details about a fault, enter the show fault detail command.

Fault Collection Policy

The fault collection policy controls the lifecycle of a fault in the Cisco UCS instance, including the length of time that each fault remains in the flapping and retention intervals.


Tip: For information on how to configure the fault collection policy, see the Cisco UCS configuration guides, which are accessible through the Cisco UCS B-Series Servers Documentation Roadmap.


Events

In Cisco UCS, an event is an immutable object that is managed by Cisco UCS Manager. Each event represents a nonpersistent condition in the Cisco UCS instance. After Cisco UCS Manager creates and logs an event, the event does not change. For example, if you power on a server, Cisco UCS Manager creates and logs an event for the beginning and the end of that request.

You can view events for a single object, or you can view all events in a Cisco UCS instance from either the Cisco UCS Manager CLI or the Cisco UCS Manager GUI. Events remain in Cisco UCS until the event log fills up. When the log is full, Cisco UCS Manager purges the log and all events in it.

This section includes the following topics:

Properties of Events

Events in the Cisco UCS Manager GUI

Events in the Cisco UCS Manager CLI

Properties of Events

Cisco UCS Manager provides detailed information about each event created and logged in a Cisco UCS instance. Table 1-5 describes the event properties that you can view in the Cisco UCS Manager CLI or the Cisco UCS Manager GUI.

Table 1-5 Event Properties

Property Name
Description

Affected Object

Component that created the event.

Description

Description of the event.

Cause

Unique identifier associated with the event.

Created at

Day and time when the event was created.

User

Type of user that created the event, such as one of the following: admin, internal, or blank.

Code

Unique identifier assigned to the event.


Events in the Cisco UCS Manager GUI

If you want to view events for a single object in the system, navigate to that object in the Cisco UCS Manager GUI and click the Events tab in the Work pane. If you want to view events for all objects in the system, navigate to the Events node on the Admin tab under Faults, Events and Audit Log.

Events in the Cisco UCS Manager CLI

If you want to view events for all objects in the system, enter the show event command from the top-level scope. If you want to view events for a specific object, scope to that object and then enter the show event command.

If you want to view all available details about an event, enter the show event detail command.

Core Files

Critical failures in Cisco UCS Manager and some of the Cisco UCS components, such as a fabric interconnect or an I/O module, can cause the system to create a core file. Each core file contains a large amount of data about the system and the component at the time of the failure.

Cisco UCS Manager manages the core files from all of the components. You can configure Cisco UCS Manager to export a copy of a core file to a location on an external TFTP server as soon as that core file is created.

This section includes the following topics:

Core Files in the Cisco UCS Manager GUI

Core Files in the Cisco UCS Manager CLI

Core File Exporter

Core Files in the Cisco UCS Manager GUI

You can find out if a component in the Cisco UCS instance generated a core file by navigating to the Core Files node on the Admin tab under Faults, Events and Audit Log.

Core Files in the Cisco UCS Manager CLI

You can find out if a component in the Cisco UCS instance generated a core file by entering the following commands:

scope monitoring

scope sysdebug

show cores

Core File Exporter

If you enable the Core File Exporter, you can configure Cisco UCS Manager to export core files to a specified location on the network through TFTP as soon as they are created. This functionality exports a tar file with the contents of the core file to the specified location.


Tip: For information on how to enable the exporter, see the Cisco UCS configuration guides, which are accessible through the Cisco UCS B-Series Servers Documentation Roadmap.


Audit Log

The audit log records actions performed by users in Cisco UCS Manager, including direct and indirect actions. Each entry in the audit log represents a single, non-persistent action. For example, if a user logs in, logs out, or creates, modifies, or deletes an object such as a service profile, Cisco UCS Manager adds an entry to the audit log for that action.

You can view the audit log entries in the Cisco UCS Manager CLI, Cisco UCS Manager GUI, or output from the show tech-support command.

This section includes the following topics:

Properties of the Audit Log Entries

Audit Log in the Cisco UCS Manager GUI

Audit Log in the Cisco UCS Manager CLI

Properties of the Audit Log Entries

Cisco UCS Manager provides detailed information about each entry in the audit log. Table 1-6 describes the audit log entry properties that you can view in the Cisco UCS Manager CLI or the Cisco UCS Manager GUI.

Table 1-6 Audit Log Entry Properties

Property Name
Description

ID

Unique identifier associated with the audit log message.

Affected Object

Component affected by the user action.

Severity

Current severity level of the user action associated with the audit log message. These severities are also used for the faults, as described in Table 1-1.

Trigger

User role associated with the user that raised the message.

User

Type of user that created the event, as follows: admin, internal, or blank.

Indication

Action indicated by the audit log message. This can be one of the following:

creation—A component was added to the system.

modification—An existing component was changed.

Description

Description of the user action.


Audit Log in the Cisco UCS Manager GUI

If you want to view the audit log, navigate to the Audit Log node on the Admin tab under Faults, Events and Audit Log.

Audit Log in the Cisco UCS Manager CLI

If you want to view the audit log, enter the following commands:

scope security

show audit-logs

System Event Log

The system event log (SEL) resides on the CIMC in NVRAM. It records most of the server-related events, such as overvoltage and undervoltage, temperature events, fan events, events from BIOS, and so on. The SEL is primarily used for troubleshooting purposes.


Tip: For more information about the SEL, including how to view the SEL for each server and configure the SEL policy, see the Cisco UCS configuration guides, which are accessible through the Cisco UCS B-Series Servers Documentation Roadmap.


This section includes the following topics:

SEL File

SEL Policy

SEL File

The SEL file is approximately 40 KB. No further events can be recorded when the SEL file is full. It must be cleared before additional events can be recorded.

SEL Policy

You can use the SEL policy to back up the SEL to a remote server and optionally clear the SEL after a backup operation occurs. Backup operations can be triggered by specific actions, or they can occur at regular intervals. You can also manually back up or clear the SEL.

Cisco UCS Manager automatically generates the SEL backup file, according to the settings in the SEL policy. The filename format is sel-SystemName-ChassisID-ServerID-ServerSerialNumber-Timestamp

For example, a filename could be sel-UCS-A-ch01-serv01-QCI12522939-20091121160736.
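
If you collect SEL backups on a remote server, a script can recover the fields from each filename. The following Python sketch is a minimal illustration, not a Cisco tool: the function and type names are hypothetical, the timestamp layout (YYYYMMDDhhmmss) is inferred from the example filename above, and the sketch assumes that only the system name contains hyphens.

```python
from datetime import datetime
from typing import NamedTuple


class SelBackupName(NamedTuple):
    system_name: str
    chassis_id: str
    server_id: str
    serial_number: str
    timestamp: datetime


def parse_sel_backup_name(filename: str) -> SelBackupName:
    """Split a SEL backup filename into its five documented fields."""
    prefix = "sel-"
    if not filename.startswith(prefix):
        raise ValueError(f"not a SEL backup filename: {filename!r}")
    # The system name may itself contain hyphens (for example "UCS-A"),
    # so peel the four trailing fields off from the right.
    parts = filename[len(prefix):].rsplit("-", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected SEL backup filename: {filename!r}")
    system, chassis, server, serial, stamp = parts
    # Assumption: the timestamp is YYYYMMDDhhmmss, as in the documented example.
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return SelBackupName(system, chassis, server, serial, when)


name = parse_sel_backup_name("sel-UCS-A-ch01-serv01-QCI12522939-20091121160736")
print(name.system_name, name.chassis_id, name.server_id, name.timestamp.isoformat())
```

Splitting from the right is the design choice that matters here: a left-to-right split on hyphens would break the system name UCS-A into two fields.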

Syslog

The syslog provides a central point for collecting and processing system logs that you can use to troubleshoot and audit the Cisco UCS instance. Cisco UCS Manager relies on the Cisco NX-OS syslog mechanism and API, and on the syslog feature of the primary fabric interconnect to collect and process the syslog entries.

This section includes the following topics:

Syslog Configuration

Syslog Location

Syslog Entry Format

Syslog Entry Severities

Syslog Entry Parameters

Syslog Services

Syslog Configuration

Cisco UCS Manager manages and configures the syslog collectors for the Cisco UCS instance and deploys the configuration to the fabric interconnect or fabric interconnects. This configuration affects all syslog entries generated in the Cisco UCS instance by Cisco NX-OS or by Cisco UCS Manager.

Syslog Location

You can configure Cisco UCS Manager to do one or more of the following with the syslog and syslog entries:

Display the syslog entries in the console or on the monitor

Store the syslog entries in a file

Forward the syslog entries to up to three external log collectors where the syslog for the Cisco UCS instance is stored

Syslog Entry Format

Each syslog entry generated by a Cisco UCS component is formatted as follows:

Year month date hh:mm:ss hostname %facility-severity-MNEMONIC description

For example: 2007 Nov 1 14:07:58 excal-113 %MODULE-5-MOD_OK: Module 1 is online
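
When you process collected syslog files with a script, the entry format above can be matched with a regular expression. The following Python sketch is an illustration based only on the format and example shown here: the pattern and function name are hypothetical, and the sketch assumes that the facility and mnemonic consist of uppercase letters, digits, and underscores, that the severity is a single digit from 0 to 7, and that a colon may follow the mnemonic as it does in the example.

```python
import re

# Pattern derived from the documented format:
# Year month date hh:mm:ss hostname %facility-severity-MNEMONIC description
SYSLOG_ENTRY = re.compile(
    r"^(?P<year>\d{4}) (?P<month>[A-Z][a-z]{2}) +(?P<day>\d{1,2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) (?P<hostname>\S+) "
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):? "
    r"(?P<description>.*)$"
)


def parse_syslog_entry(line):
    """Return the fields of one syslog entry as a dict, or None if it does not match."""
    match = SYSLOG_ENTRY.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])
    return fields


entry = parse_syslog_entry(
    "2007 Nov 1 14:07:58 excal-113 %MODULE-5-MOD_OK: Module 1 is online"
)
print(entry["hostname"], entry["facility"], entry["severity"], entry["mnemonic"])
```

Returning None for lines that do not match lets a caller skip continuation lines or entries from other sources without raising an error.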

Syslog Entry Severities

A syslog entry is assigned a Cisco UCS severity by Cisco UCS Manager. Table 1-7 shows how the Cisco UCS severities map to the syslog severities.

Table 1-7 Syslog Entry Severities in Cisco UCS

Cisco UCS Severity    Syslog Severity
Critical              CRIT
Major                 ERR
Minor                 ERR
Warning               WARNING
Info                  INFO
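
The mapping in Table 1-7 can be expressed as a lookup table, for example in a script that correlates Cisco UCS faults with collected syslog entries. The following Python sketch uses hypothetical names and includes only the rows listed in the table. Because the table gives no syslog equivalent for the Condition severity, the sketch raises an error for it rather than guessing.

```python
# Rows of Table 1-7. Keys are lowercased Cisco UCS severities.
UCS_TO_SYSLOG_SEVERITY = {
    "critical": "CRIT",
    "major": "ERR",
    "minor": "ERR",
    "warning": "WARNING",
    "info": "INFO",
}


def syslog_severity(ucs_severity):
    """Return the syslog severity for a Cisco UCS severity listed in Table 1-7."""
    key = ucs_severity.strip().lower()
    if key not in UCS_TO_SYSLOG_SEVERITY:
        raise ValueError(f"no syslog mapping listed for severity: {ucs_severity!r}")
    return UCS_TO_SYSLOG_SEVERITY[key]


print(syslog_severity("Major"), syslog_severity("Minor"))
```

Note that Major and Minor both map to ERR, so the original Cisco UCS severity cannot be recovered from the syslog severity alone.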


Syslog Entry Parameters

Table 1-8 describes the information contained in each syslog entry.

Table 1-8 Syslog Message Content

Name
Description

Facility

Logging facility that generated and sent the syslog entry. The facilities are broad categories that are represented by integers. These sources can be one of the following standard Linux facilities: local0, local1, local2, local3, local4, local5, local6, or local7.

Severity

Severity of the event, alert, or issue that caused the syslog entry to be generated. The severity can be one of the following: emergencies, critical, alerts, errors, warnings, information, notifications, or debugging.

Hostname

Hostname included in the syslog entry, which depends upon the component where the entry originated, as follows:

The fabric interconnect, the Cisco UCS Manager, or the hostname of the Cisco UCS instance

For all other components, the hostname associated with the virtual interface

Timestamp

Date and time when the syslog entry was generated.

Message

Description of the event, alert, or issue that caused the syslog entry to be generated.


Syslog Services

The following Cisco UCS components use the Cisco NX-OS syslog services to generate syslog entries for system information and alerts:

I/O module—All syslog entries are sent by syslogd to the fabric interconnect to which it is connected.

CIMC—All syslog entries are sent to the primary fabric interconnect in a cluster configuration.

Adapter—All syslog entries are sent by NIC-Tools/Syslog to both fabric interconnects.

Cisco UCS Manager—Self-generated syslog entries are logged according to the syslog configuration.