Overview
This chapter provides an overview of where to find faults, events, and other information that can help you troubleshoot issues with Cisco Unified Computing System (Cisco UCS) B-Series Servers.
This chapter includes the following sections:
•Troubleshooting Information in the Cisco UCS Manager GUI
•Troubleshooting Information in the Cisco UCS Manager CLI
•Additional Troubleshooting Documentation
•Faults
•Events
•Core Files
•Audit Log
•System Event Log
•Syslog
Troubleshooting Information in the Cisco UCS Manager GUI
The Cisco UCS Manager GUI provides several tabs and other areas that you can use to find troubleshooting information for a Cisco UCS instance. For example, you can view faults and events for specific objects or for all objects in the system.
The Admin tab in the Navigation pane provides access to faults, events, core files, and other information that can help you troubleshoot issues.
If you select Faults, Events and Audit Log in the Filter field on the Admin tab, the Cisco UCS Manager GUI limits the tree browser so that you can only access the following:
•The faults for all components in the system
•The events for all components in the system
•The audit log for the system
•Any core files created by the fabric interconnects in the system
•The fault collection and core file export settings
Note Fault thresholds might need to be modified. See the "Statistics Threshold Policy" section in the Cisco UCS Manager GUI Configuration Guide.
Troubleshooting Information in the Cisco UCS Manager CLI
The Cisco UCS Manager CLI includes several show commands that you can execute to find troubleshooting information for a Cisco UCS instance. These show commands are scope-aware, which means that if you enter the show fault command from the top scope, it displays all faults in the system. However, if you scope to a specific object, the show fault command displays faults that are related to that object only.
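The scope-aware behavior described above can be modeled with a small sketch. This is not the Cisco UCS Manager implementation. The `Fault` class, the `show_fault` function, and the path-style object identifiers are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    severity: str
    affected_object: str  # hypothetical path-style identifier of the object
    description: str


def show_fault(faults, scope=""):
    """Return every fault at the top scope, or only faults under `scope`."""
    if not scope:
        # Top scope: all faults in the system are visible.
        return list(faults)
    # Scoped: keep faults on the object itself or on anything beneath it.
    return [
        f for f in faults
        if f.affected_object == scope
        or f.affected_object.startswith(scope + "/")
    ]


# Illustrative data only; these are not real fault records.
FAULTS = [
    Fault("major", "sys/chassis-1/blade-2", "example server fault"),
    Fault("minor", "sys/chassis-1/fan-module-1", "example fan fault"),
    Fault("warning", "sys/chassis-2/blade-1", "example thermal warning"),
]

all_faults = show_fault(FAULTS)
chassis_1_faults = show_fault(FAULTS, "sys/chassis-1")
```

In this model, the top-scope call returns all three faults, and the call scoped to the first chassis returns only the two faults on objects beneath it.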
Note Fault thresholds might need to be modified. See the "Statistics Threshold Policy" section in the Cisco UCS Manager CLI Configuration Guide.
Additional Troubleshooting Documentation
Additional troubleshooting information is available in the following documents:
•Cisco UCS Manager Faults and Error Message Reference—Contains information about Cisco UCS Manager faults and System Event Log messages, including BIOS and CIMC messages.
•Cisco UCS C-Series Servers Integrated Management Controller Troubleshooting Guide—Contains information about how to troubleshoot issues with C-Series rack-mount servers.
Faults
In Cisco UCS, a fault is a mutable object that is managed by Cisco UCS Manager. Each fault represents a failure in the Cisco UCS instance or an alarm threshold that has been raised. A fault can change from one state or severity to another during its lifecycle.
Each fault includes information about the operational state of the affected object at the time the fault was raised. If the fault is transitional and the failure is resolved, the object transitions to a functional state.
A fault remains in Cisco UCS Manager until the fault is cleared and deleted according to the settings in the fault collection policy.
You can view all faults in a Cisco UCS instance from either the Cisco UCS Manager CLI or the Cisco UCS Manager GUI. You can also configure the fault collection policy to determine how a Cisco UCS instance collects and retains faults.
Note All Cisco UCS faults can be trapped by SNMP.
This section includes the following topics:
•Fault Severities
•Fault States
•Fault Types
•Properties of Faults
•Lifecycle of Faults
•Faults in the Cisco UCS Manager GUI
•Faults in the Cisco UCS Manager CLI
•Fault Collection Policy
Fault Severities
A fault raised in a Cisco UCS instance can transition through more than one severity during its lifecycle. Table 1-1 describes the fault severities that you may encounter.
Table 1-1 Fault Severities in Cisco UCS
Severity | Description
Critical | Service-affecting condition that requires immediate corrective action. For example, this severity could indicate that the managed object is out of service and its capability must be restored.
Major | Service-affecting condition that requires urgent corrective action. For example, this severity could indicate a severe degradation in the capability of the managed object and that its full capability must be restored.
Minor | Nonservice-affecting fault condition that requires corrective action to prevent a more serious fault from occurring. For example, this severity could indicate that the detected alarm condition is not degrading the capacity of the managed object.
Warning | Potential or impending service-affecting fault that has no significant effects in the system. You should take action to further diagnose, if necessary, and correct the problem to prevent it from becoming a more serious service-affecting fault.
Condition | Informational message about a condition, possibly independently insignificant.
Info | Basic notification or informational message, possibly independently insignificant.
Fault States
A fault raised in a Cisco UCS instance transitions through more than one state during its lifecycle. Table 1-2 describes the possible fault states in alphabetical order.
Table 1-2 Fault States in Cisco UCS
State | Description
Cleared | Condition that has been resolved and cleared.
Flapping | Fault that was raised, cleared, and raised again within a short time interval, known as the flap interval.
Soaking | Fault that was raised and cleared within a short time interval, known as the flap interval. Because this state may be a flapping condition, the fault severity remains at its original active value, but this state indicates the condition that raised the fault has cleared.
Fault Types
A fault raised in a Cisco UCS instance can be one of the types described in Table 1-3.
Table 1-3 Types of Faults in Cisco UCS
Type | Description
fsm | FSM task has failed to complete successfully, or Cisco UCS Manager is retrying one of the stages of the FSM.
equipment | Cisco UCS Manager has detected that a physical component is inoperable or has another functional issue.
server | Cisco UCS Manager cannot complete a server task, such as associating a service profile with a server.
configuration | Cisco UCS Manager cannot successfully configure a component.
environment | Cisco UCS Manager has detected a power problem, thermal problem, voltage problem, or loss of CMOS settings.
management | Cisco UCS Manager has detected a serious management issue, such as one of the following: critical services could not be started; the primary fabric interconnect could not be identified; components in the instance include incompatible firmware versions.
connectivity | Cisco UCS Manager has detected a connectivity problem, such as an unreachable adapter.
network | Cisco UCS Manager has detected a network issue, such as a link down.
operational | Cisco UCS Manager has detected an operational problem, such as a log capacity issue or a failed server discovery.
Properties of Faults
Cisco UCS Manager provides detailed information about each fault raised in a Cisco UCS instance. Table 1-4 describes the fault properties that you can view in the Cisco UCS Manager CLI or the Cisco UCS Manager GUI.
Table 1-4 Fault Properties
Property Name | Description
Severity | Current severity level of the fault, which can be any of the severities described in Table 1-1.
Last Transition | Day and time on which the severity for the fault last changed. If the severity has not changed since the fault was raised, this property displays the original creation date.
Affected Object | Component that is affected by the condition that raised the fault.
Description | Description of the fault.
ID | Unique identifier assigned to the fault.
Type | Type of fault that has been raised, which can be any of the types described in Table 1-3.
Cause | Unique identifier associated with the condition that caused the fault.
Created at | Day and time when the fault occurred.
Code | Unique identifier assigned to the fault.
Number of Occurrences | Number of times the event that raised the fault occurred.
Original Severity | Severity assigned to the fault the first time it occurred.
Previous Severity | Previous severity level.
Highest Severity | Highest severity encountered for this issue.
Lifecycle of Faults
Faults in Cisco UCS are stateful. Only one instance of a given fault can exist on each object. If the same fault occurs a second time, Cisco UCS increases the number of occurrences by one.
A fault has the following lifecycle:
1. A condition occurs in the system and Cisco UCS raises a fault. This is the active state.
2. When the condition is alleviated, the fault enters a flapping or soaking interval that is designed to prevent flapping. Flapping occurs when a fault is raised and cleared several times in rapid succession. During the flapping interval, the fault retains its severity for the length of time specified in the fault collection policy.
3. If the condition reoccurs during the flapping interval, the fault returns to the active state. If the condition does not reoccur during the flapping interval, the fault is cleared.
4. The cleared fault enters the retention interval. This interval ensures that the fault reaches the attention of an administrator, even if the condition that caused the fault has been alleviated, and that the fault is not deleted prematurely. The retention interval retains the cleared fault for the length of time specified in the fault collection policy.
5. If the condition reoccurs during the retention interval, the fault returns to the active state. If the condition does not reoccur, the fault is deleted.
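The lifecycle steps above can be sketched as a small state machine. This is a simplified model under stated assumptions, not the Cisco UCS Manager implementation. The class names, the state labels (`active`, `soaking`, `cleared`), and the interval values in the usage lines are hypothetical. The sketch also assumes that time advances in discrete steps and does not carry leftover time from one interval into the next.

```python
from dataclasses import dataclass


@dataclass
class FaultRecord:
    severity: str
    state: str = "active"   # active -> soaking -> cleared -> (deleted)
    occurrences: int = 1
    timer: float = 0.0      # seconds spent in the current interval


class FaultLifecycle:
    def __init__(self, flap_interval, retention_interval):
        self.flap_interval = flap_interval
        self.retention_interval = retention_interval
        self.faults = {}    # (object, code) -> FaultRecord, one per object

    def raise_fault(self, obj, code, severity):
        """Steps 1, 3, and 5: create the fault or return it to active."""
        key = (obj, code)
        record = self.faults.get(key)
        if record is None:
            record = self.faults[key] = FaultRecord(severity)
        else:
            # Only one instance per object: count the repeat occurrence.
            record.occurrences += 1
            record.state = "active"
            record.timer = 0.0
        return record

    def condition_resolved(self, obj, code):
        """Step 2: the condition is alleviated; severity is retained."""
        record = self.faults[(obj, code)]
        if record.state == "active":
            record.state = "soaking"
            record.timer = 0.0

    def advance(self, seconds):
        """Steps 3 to 5: move time forward and apply interval expiry."""
        for key in list(self.faults):
            record = self.faults[key]
            if record.state == "active":
                continue
            record.timer += seconds
            if record.state == "soaking" and record.timer >= self.flap_interval:
                record.state = "cleared"
                record.timer = 0.0
            elif record.state == "cleared" and record.timer >= self.retention_interval:
                del self.faults[key]


# Hypothetical interval values, for illustration only.
lifecycle = FaultLifecycle(flap_interval=10, retention_interval=100)
lifecycle.raise_fault("example-object", "F0001", "major")
lifecycle.condition_resolved("example-object", "F0001")
lifecycle.advance(5)                                        # still soaking
lifecycle.raise_fault("example-object", "F0001", "major")   # back to active
```

After the last call, the single fault record is active again with an occurrence count of two, which mirrors the rule that only one instance of a given fault exists on each object.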
Faults in the Cisco UCS Manager GUI
If you want to view faults for a single object in the system, navigate to that object in the Cisco UCS Manager GUI and click the Faults tab in the Work pane. If you want to view faults for all objects in the system, navigate to the Faults node on the Admin tab under Faults, Events and Audit Log.
You can also view a summary of all faults that have occurred in a Cisco UCS instance in the Fault Summary area in the upper left of the Cisco UCS Manager GUI.
Each fault severity is represented by a different icon. The number below each icon indicates how many faults of that severity have occurred in the system. If you click an icon, the Cisco UCS Manager GUI opens the Faults tab in the Work area and displays the details of all faults with that severity.
Faults in the Cisco UCS Manager CLI
If you want to view the faults for all objects in the system, enter the show fault command from the top-level scope. If you want to view the faults for a specific object, scope to that object and then execute the show fault command.
If you want to view all available details about a fault, enter the show fault detail command.
Fault Collection Policy
The fault collection policy controls the lifecycle of a fault in the Cisco UCS instance, including the length of time that each fault remains in the flapping and retention intervals.
Tip For information on how to configure the fault collection policy, see the Cisco UCS configuration guides, which are accessible through the Cisco UCS B-Series Servers Documentation Roadmap.
Events
In Cisco UCS, an event is an immutable object that is managed by Cisco UCS Manager. Each event represents a nonpersistent condition in the Cisco UCS instance. After Cisco UCS Manager creates and logs an event, the event does not change. For example, if you power on a server, Cisco UCS Manager creates and logs an event for the beginning and the end of that request.
You can view events for a single object, or you can view all events in a Cisco UCS instance from either the Cisco UCS Manager CLI or the Cisco UCS Manager GUI. Events remain in Cisco UCS until the event log fills up. When the log is full, Cisco UCS Manager purges the log and all events in it.
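The purge-when-full behavior can be illustrated with a minimal sketch. The `EventLog` class and its capacity value are hypothetical, and the real log size is not stated here. The sketch also assumes that the event that arrives when the log is full is kept after the purge, which this chapter does not confirm.

```python
class EventLog:
    """Toy event log that discards all entries once it is full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._events = []

    def log(self, description):
        # When the log is full, purge every stored event (assumed timing).
        if len(self._events) >= self.capacity:
            self._events.clear()
        self._events.append(description)

    def events(self):
        # Return a tuple so that callers cannot modify logged events.
        return tuple(self._events)


event_log = EventLog(capacity=3)   # hypothetical capacity
for name in ("event-1", "event-2", "event-3", "event-4"):
    event_log.log(name)
```

In this model, only the fourth event remains after the loop, because logging it triggered a purge of the first three. The practical consequence is that events you need to keep should be collected before the log fills.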
This section includes the following topics:
•Properties of Events
•Events in the Cisco UCS Manager GUI
•Events in the Cisco UCS Manager CLI
Properties of Events
Cisco UCS Manager provides detailed information about each event created and logged in a Cisco UCS instance. Table 1-5 describes the event properties that you can view in the Cisco UCS Manager CLI or the Cisco UCS Manager GUI.
Table 1-5 Event Properties
Property Name | Description
Affected Object | Component that created the event.
Description | Description of the event.
Cause | Unique identifier associated with the event.
Created at | Day and time when the event was created.
User | Type of user that created the event, such as one of the following: admin, internal, blank.
Code | Unique identifier assigned to the event.
Events in the Cisco UCS Manager GUI
If you want to view events for a single object in the system, navigate to that object in the Cisco UCS Manager GUI and click the Events tab in the Work pane. If you want to view events for all objects in the system, navigate to the Events node on the Admin tab under Faults, Events and Audit Log.
Events in the Cisco UCS Manager CLI
If you want to view events for all objects in the system, enter the show event command from the top-level scope. If you want to view events for a specific object, scope to that object and then enter the show event command.
If you want to view all available details about an event, enter the show event detail command.
Core Files
Critical failures in Cisco UCS Manager and some of the Cisco UCS components, such as a fabric interconnect or an I/O module, can cause the system to create a core file. Each core file contains a large amount of data about the system and the component at the time of the failure.
Cisco UCS Manager manages the core files from all of the components. You can configure Cisco UCS Manager to export a copy of a core file to a location on an external TFTP server as soon as that core file is created.
This section includes the following topics:
•Core Files in the Cisco UCS Manager GUI
•Core Files in the Cisco UCS Manager CLI
•Core File Exporter
Core Files in the Cisco UCS Manager GUI
You can find out whether a component in the Cisco UCS instance generated a core file by navigating to the Core Files node on the Admin tab under Faults, Events and Audit Log.
Core Files in the Cisco UCS Manager CLI
You can find out if a component in the Cisco UCS instance generated a core file by entering the following commands:
•scope monitoring
•scope sysdebug
•show cores
Core File Exporter
If you enable the Core File Exporter, you can configure Cisco UCS Manager to export core files through TFTP to a specified location on the network as soon as they are created. The exporter sends a tar file with the contents of the core file to that location.
Tip For information on how to enable the exporter, see the Cisco UCS configuration guides, which are accessible through the Cisco UCS B-Series Servers Documentation Roadmap.
Audit Log
The audit log records actions performed by users in Cisco UCS Manager, including direct and indirect actions. Each entry in the audit log represents a single, non-persistent action. For example, if a user logs in, logs out, or creates, modifies, or deletes an object such as a service profile, Cisco UCS Manager adds an entry to the audit log for that action.
You can view the audit log entries in the Cisco UCS Manager CLI, Cisco UCS Manager GUI, or output from the show tech-support command.
This section includes the following topics:
•Properties of the Audit Log Entries
•Audit Log in the Cisco UCS Manager GUI
•Audit Log in the Cisco UCS Manager CLI
Properties of the Audit Log Entries
Cisco UCS Manager provides detailed information about each entry in the audit log. Table 1-6 describes the audit log entry properties that you can view in the Cisco UCS Manager CLI or the Cisco UCS Manager GUI.
Table 1-6 Audit Log Entry Properties
Property Name | Description
ID | Unique identifier associated with the audit log message.
Affected Object | Component affected by the user action.
Severity | Current severity level of the user action associated with the audit log message. These severities are also used for the faults, as described in Table 1-1.
Trigger | User role associated with the user that raised the message.
User | Type of user that created the event, as follows: admin, internal, blank.
Indication | Action indicated by the audit log message. This can be one of the following: creation (a component was added to the system); modification (an existing component was changed).
Description | Description of the user action.
Audit Log in the Cisco UCS Manager GUI
If you want to view the audit log, navigate to the Audit Log node on the Admin tab under Faults, Events and Audit Log.
Audit Log in the Cisco UCS Manager CLI
If you want to view the audit log, enter the following commands:
•scope security
•show audit-logs
System Event Log
The system event log (SEL) resides on the CIMC in NVRAM. It records most of the server-related events, such as overvoltage and undervoltage, temperature events, fan events, events from BIOS, and so on. The SEL is primarily used for troubleshooting purposes.
Tip For more information about the SEL, including how to view the SEL for each server and configure the SEL policy, see the Cisco UCS configuration guides, which are accessible through the Cisco UCS B-Series Servers Documentation Roadmap.
This section includes the following topics:
•SEL File
•SEL Policy
SEL File
The SEL file is approximately 40 KB. No further events can be recorded when the SEL file is full. It must be cleared before additional events can be recorded.
SEL Policy
You can use the SEL policy to back up the SEL to a remote server and optionally clear the SEL after a backup operation occurs. Backup operations can be triggered by specific actions, or they can occur at regular intervals. You can also manually back up or clear the SEL.
Cisco UCS Manager automatically generates the SEL backup file according to the settings in the SEL policy. The filename format is sel-SystemName-ChassisID-ServerID-ServerSerialNumber-Timestamp.
For example, a filename could be sel-UCS-A-ch01-serv01-QCI12522939-20091121160736.
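A backup filename in this format can be split back into its fields, as the following sketch shows. The function and class names are hypothetical. The sketch assumes that only the system name may contain hyphens, so it splits from the right, and that the timestamp is laid out as YYYYMMDDhhmmss. The example above is consistent with both assumptions, but this chapter does not state them.

```python
from datetime import datetime
from typing import NamedTuple


class SelBackupName(NamedTuple):
    system_name: str
    chassis_id: str
    server_id: str
    serial_number: str
    timestamp: datetime


def parse_sel_backup_name(filename):
    """Split sel-SystemName-ChassisID-ServerID-ServerSerialNumber-Timestamp."""
    prefix = "sel-"
    if not filename.startswith(prefix):
        raise ValueError(f"not a SEL backup filename: {filename!r}")
    # Split from the right so that hyphens in the system name are preserved.
    parts = filename[len(prefix):].rsplit("-", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    system, chassis, server, serial, stamp = parts
    # Assumed timestamp layout: YYYYMMDDhhmmss.
    return SelBackupName(
        system, chassis, server, serial,
        datetime.strptime(stamp, "%Y%m%d%H%M%S"),
    )


parsed = parse_sel_backup_name(
    "sel-UCS-A-ch01-serv01-QCI12522939-20091121160736"
)
```

Splitting from the right is the main design choice here: a left-to-right split would break the system name UCS-A into two fields.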
Syslog
The syslog provides a central point for collecting and processing system logs that you can use to troubleshoot and audit the Cisco UCS instance. Cisco UCS Manager relies on the Cisco NX-OS syslog mechanism and API, and on the syslog feature of the primary fabric interconnect to collect and process the syslog entries.
This section includes the following topics:
•Syslog Configuration
•Syslog Location
•Syslog Entry Format
•Syslog Entry Severities
•Syslog Entry Parameters
•Syslog Services
Syslog Configuration
Cisco UCS Manager manages and configures the syslog collectors for the Cisco UCS instance and deploys the configuration to the fabric interconnect or fabric interconnects. This configuration affects all syslog entries generated in the Cisco UCS instance by Cisco NX-OS or by Cisco UCS Manager.
Syslog Location
You can configure Cisco UCS Manager to do one or more of the following with the syslog and syslog entries:
•Display the syslog entries in the console or on the monitor
•Store the syslog entries in a file
•Forward the syslog entries to up to three external log collectors where the syslog for the Cisco UCS instance is stored
Syslog Entry Format
Each syslog entry generated by a Cisco UCS component is formatted as follows:
Year month date hh:mm:ss hostname %facility-severity-MNEMONIC description
For example: 2007 Nov 1 14:07:58 excal-113 %MODULE-5-MOD_OK: Module 1 is online
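An entry in this format can be broken into its fields with a regular expression. The following is a minimal sketch, not a complete syslog parser. The pattern and the function name are hypothetical. The sketch assumes a single-digit numeric severity, a colon after the mnemonic as in the example, and facility and mnemonic names made of uppercase letters, digits, and underscores.

```python
import re

# Year month date hh:mm:ss hostname %facility-severity-MNEMONIC description
ENTRY_PATTERN = re.compile(
    r"^(?P<year>\d{4}) (?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) (?P<hostname>\S+) "
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z0-9_]+)"
    r":?\s*(?P<description>.*)$"
)


def parse_syslog_entry(line):
    """Return the fields of one entry as a dict, or None if it does not match."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])  # numeric severity level
    return fields


entry = parse_syslog_entry(
    "2007 Nov 1 14:07:58 excal-113 %MODULE-5-MOD_OK: Module 1 is online"
)
```

Returning None for lines that do not match lets a caller skip or count unparsed lines instead of failing on the first unexpected entry.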
Syslog Entry Severities
A syslog entry is assigned a Cisco UCS severity by Cisco UCS Manager. Table 1-7 shows how the Cisco UCS severities map to the syslog severities.
Table 1-7 Syslog Entry Severities in Cisco UCS
Cisco UCS Severity | Syslog Severity
Critical | CRIT
Major | ERR
Minor | ERR
Warning | WARNING
Info | INFO
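The mapping in Table 1-7 can be expressed as a simple lookup, for example in a script that post-processes exported faults. The dictionary and function names are hypothetical. Severities that do not appear in the table, such as Condition, return None in this sketch because this chapter does not state how they map.

```python
# Cisco UCS severity -> syslog severity, as listed in Table 1-7.
UCS_TO_SYSLOG_SEVERITY = {
    "critical": "CRIT",
    "major": "ERR",
    "minor": "ERR",
    "warning": "WARNING",
    "info": "INFO",
}


def syslog_severity(ucs_severity):
    """Look up the syslog severity; None if the table has no entry."""
    return UCS_TO_SYSLOG_SEVERITY.get(ucs_severity.strip().lower())


major_level = syslog_severity("Major")
minor_level = syslog_severity("Minor")
```

Both Major and Minor map to ERR, so the original Cisco UCS severity cannot be recovered from the syslog severity alone.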
Syslog Entry Parameters
Table 1-8 describes the information contained in each syslog entry.
Table 1-8 Syslog Message Content
Name | Description
Facility | Logging facility that generated and sent the syslog entry. The facilities are broad categories that are represented by integers. These sources can be one of the following standard Linux facilities: local0, local1, local2, local3, local4, local5, local6, local7.
Severity | Severity of the event, alert, or issue that caused the syslog entry to be generated. The severity can be one of the following: emergencies, critical, alerts, errors, warnings, information, notifications, debugging.
Hostname | Hostname included in the syslog entry, which depends on the component where the entry originated, as follows: the fabric interconnect, the Cisco UCS Manager, or the hostname of the Cisco UCS instance; for all other components, the hostname associated with the virtual interface.
Timestamp | Date and time when the syslog entry was generated.
Message | Description of the event, alert, or issue that caused the syslog entry to be generated.
Syslog Services
The following Cisco UCS components use the Cisco NX-OS syslog services to generate syslog entries for system information and alerts:
•I/O module—All syslog entries are sent by syslogd to the fabric interconnect to which it is connected.
•CIMC—All syslog entries are sent to the primary fabric interconnect in a cluster configuration.
•Adapter—All syslog entries are sent by NIC-Tools/Syslog to both fabric interconnects.
•Cisco UCS Manager—Self-generated syslog entries are logged according to the syslog configuration.