About Environmental Monitoring
Environmental monitoring of chassis components provides early warning of possible component failure. This warning helps you keep the system operating safely and reliably and avoid network interruptions.
This section describes how to monitor critical system components so that you can identify and rapidly correct hardware-related problems.
Using CLI Commands to Monitor Your Environment
Enter the show environment [all | counters | history | location | sensor | status | summary | table] command to display system status information. Keyword descriptions are listed in the following table.
Keyword | Purpose |
---|---|
all | Displays a detailed listing of all the environmental monitor parameters, such as the power supplies, temperature readings, and voltage readings. This is the default. |
counters | Displays operational counters. |
history | Displays the sensor state change history. |
location | Displays sensors by location. |
sensor | Displays the sensor summary. |
status | Displays the operational status of field-replaceable units (FRUs), along with power and power supply fan sensor information. |
summary | Displays the summary of all the environment monitoring sensors. |
table | Displays a sensor state table. |
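If you generate these commands from a script, it can help to validate the keyword before sending anything to the device. The following is a minimal sketch that only builds the command string. The function name `show_environment_command` is hypothetical, and how the string is delivered to the switch (SSH, console, or other) is outside its scope.

```python
# Keywords accepted by "show environment", as listed in the table above.
KEYWORDS = {
    "all", "counters", "history", "location",
    "sensor", "status", "summary", "table",
}


def show_environment_command(keyword=None):
    """Build a 'show environment' command string (hypothetical helper).

    With no keyword the bare command is returned, which the table above
    describes as equivalent to 'all'.
    """
    if keyword is None:
        return "show environment"
    if keyword not in KEYWORDS:
        raise ValueError(f"unsupported keyword: {keyword!r}")
    return f"show environment {keyword}"


print(show_environment_command("status"))
```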
Displaying Environment Conditions
Supervisor modules and their associated line cards support multiple temperature sensors per card. The environment condition output includes the temperature reading from each sensor and the temperature thresholds for each sensor. These line cards support three thresholds: warning, critical, and shutdown.
The following example illustrates how to display the environment condition on a supervisor module. The thresholds appear within parentheses, in the order given in the Threshold column header.
Device# show environment
Number of Critical alarms: 0
Number of Major alarms: 0
Number of Minor alarms: 0
Slot Sensor Current State Reading Threshold(Minor,Major,Critical,Shutdown)
---- ------ ------------- ------- ---------------------------------------
R0 HotSwap: Volts Normal 53 V DC na
R0 HotSwap: Power Normal 231 Watts na
R0 Temp: Coretemp Normal 46 Celsius (107,117,123,125)(Celsius)
R0 Temp: DopplerD Normal 55 Celsius (107,117,123,125)(Celsius)
R0 V1: VX1 Normal 845 mV na
R0 V1: VX2 Normal 1499 mV na
R0 V1: VX3 Normal 1058 mV na
R0 V1: VX4 Normal 849 mV na
R0 V1: VX5 Normal 1517 mV na
R0 V1: VX6 Normal 1306 mV na
R0 V1: VX7 Normal 1007 mV na
R0 V1: VX8 Normal 1098 mV na
R0 V1: VX9 Normal 1205 mV na
R0 V1: VX10 Normal 1704 mV na
R0 V1: VX11 Normal 1208 mV na
R0 V1: VX12 Normal 1804 mV na
R0 V1: VX13 Normal 2518 mV na
R0 V1: VX14 Normal 3288 mV na
R0 Temp: outlet Normal 39 Celsius (55 ,65 ,75 ,100)(Celsius)
R0 Temp: inlet Normal 35 Celsius (45 ,55 ,65 ,72 )(Celsius)
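When this output is collected by a script, the temperature rows can be split into a reading and its thresholds. The following Python sketch assumes the row layout shown above (slot, `Temp:` label, sensor name, state, reading in Celsius, then four comma-separated thresholds in the header's Minor, Major, Critical, Shutdown order). The function names are hypothetical, and the layout may differ on other platforms or software releases.

```python
import re

# Matches temperature rows such as:
#   R0 Temp: inlet Normal 35 Celsius (45 ,55 ,65 ,72 )(Celsius)
ROW = re.compile(
    r"^(?P<slot>\S+)\s+Temp:\s+(?P<sensor>\S+)\s+(?P<state>\S+)\s+"
    r"(?P<reading>-?\d+)\s+Celsius\s+\((?P<thresholds>[^)]*)\)"
)
# Threshold order taken from the column header in the output above.
LEVELS = ("minor", "major", "critical", "shutdown")


def parse_temperature_row(line):
    """Return a dict for a temperature row, or None for any other line."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    values = [int(v) for v in match.group("thresholds").split(",")]
    if len(values) != len(LEVELS):
        return None
    return {
        "slot": match.group("slot"),
        "sensor": match.group("sensor"),
        "state": match.group("state"),
        "reading": int(match.group("reading")),
        "thresholds": dict(zip(LEVELS, values)),
    }


def headroom(row):
    """Degrees Celsius between the reading and the lowest (minor) threshold."""
    return row["thresholds"]["minor"] - row["reading"]


sample = "R0 Temp: inlet Normal 35 Celsius (45 ,55 ,65 ,72 )(Celsius)"
row = parse_temperature_row(sample)
print(row["sensor"], row["reading"], headroom(row))  # inlet 35 10
```

Voltage and power rows report `na` instead of thresholds, so the parser returns None for them rather than guessing.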
The following example illustrates how to display the LED status on a supervisor module.
Device# show hardware led
Current Mode: STATUS
SWITCH: C9407R
SYSTEM: AMBER
SUPERVISOR: ACTIVE
STATUS: (10) Te3/0/1:BLACK Te3/0/2:BLACK Te3/0/3:BLACK Te3/0/4:BLACK Te3/0/5:BLACK Te3/0/6:BLACK Te3/0/7:BLACK Te3/0/8:BLACK Fo3/0/9:BLACK Fo3/0/10:BLACK
BEACON: BLACK
RJ45 CONSOLE: GREEN
FANTRAY STATUS: GREEN
FANTRAY BEACON: BLACK
POWER-SUPPLY 1 BEACON: BLACK
POWER-SUPPLY 3 BEACON: BLACK
Displaying Onboard Failure Logging (OBFL) Information
The OBFL feature records operating temperatures, hardware uptime, interrupts, and other important events and messages that can help you diagnose problems with line cards and supervisor modules installed in a switch. Data is logged to files stored in nonvolatile memory. When the onboard hardware starts up, a first record is made for each monitored area and becomes the base value for subsequent records. The OBFL feature uses a circular updating scheme to collect continuous records and archive older (historical) records, which ensures accurate data about the system.
Data is recorded in one of two formats: continuous information, which stores a snapshot of measurements and samples in a continuous file, and summary information, which provides details about the data being collected. Use the show logging onboard command to display the data. The message “No historical data to display” appears when historical data is not available.
The following example shows the output of the show logging onboard RP active voltage detail command.
Device# show logging onboard RP active voltage detail
--------------------------------------------------------------------------------
VOLTAGE SUMMARY INFORMATION
--------------------------------------------------------------------------------
Number of sensors : 16
--------------------------------------------------------------------------------
Sensor ID Normal Range Maximum Sensor Value
--------------------------------------------------------------------------------
SYSTEM Rail-5.0 0 0 - 5 0
SYSTEM Rail-0.9PEX 1 0 - 5 1
SYSTEM Rail-0.9 2 0 - 5 1
SYSTEM Rail-1.8 3 0 - 5 0
SYSTEM Rail-3.3 4 0 - 5 1
SYSTEM Rail-2.5 5 0 - 5 1
SYSTEM Rail-1.5CPU 6 0 - 5 1
SYSTEM Rail-1.5 7 0 - 5 1
SYSTEM Rail-1.2 8 0 - 5 1
SYSTEM Rail-1.1 9 0 - 5 1
SYSTEM Rail-1.0 10 0 - 5 1
SYSTEM Rail-0.9CPU 11 0 - 5 1
SYSTEM Rail-0.85 12 0 - 5 2
SYSTEM Rail-0.85DOPv 13 0 - 5 3
SYSTEM Rail-0.85DOPv^N 14 0 - 5 5
SYSTEM Rail-0.85DOPv^O 15 0 - 5 0
--------------------------------------------------------------------------------
Sensor Value
Total Time of each Sensor
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
No historical data
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
VOLTAGE CONTINUOUS INFORMATION
--------------------------------------------------------------------------------
Sensor ID
--------------------------------------------------------------------------------
SYSTEM Rail-5.0 0
SYSTEM Rail-0.9PEX 1
SYSTEM Rail-0.9 2
SYSTEM Rail-1.8 3
SYSTEM Rail-3.3 4
SYSTEM Rail-2.5 5
SYSTEM Rail-1.5CPU 6
SYSTEM Rail-1.5 7
SYSTEM Rail-1.2 8
SYSTEM Rail-1.1 9
SYSTEM Rail-1.0 10
SYSTEM Rail-0.9CPU 11
SYSTEM Rail-0.85 12
SYSTEM Rail-0.85DOPv 13
SYSTEM Rail-0.85DOPv^N 14
SYSTEM Rail-0.85DOPv^O 15
--------------------------------------------------------------------------------
Time Stamp | Sensor Voltage 0V
MM/DD/YYYY HH:MM:SS | Sensor Value
--------------------------------------------------------------------------------
05/06/2015 16:42:51 0 1 1 0 1 1 1 1 1 1 1 1 2 3 5 0
05/06/2015 18:24:24 0 1 1 0 1 1 1 1 1 1 1 1 2 3 5 0
05/10/2015 17:53:42 0 1 1 0 1 1 1 1 1 1 1 1 2 3 5 0
08/30/2017 16:14:40 0 1 1 0 1 1 1 1 1 1 1 1 2 3 5 0
08/30/2017 23:34:24 0 1 1 0 1 1 1 1 1 1 1 1 2 3 5 0
08/31/2017 22:16:23 0 1 1 0 1 1 1 1 1 1 1 1 2 3 5 0
09/01/2017 00:57:15 0 1 1 0 1 1 1 1 1 1 1 1 2 3 5 0
--------------------------------------------------------------------------------
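The continuous information rows above can be compared programmatically to see whether any sensor value changed between records. The following Python sketch assumes each data row is a timestamp in MM/DD/YYYY HH:MM:SS form followed by one integer per sensor, listed in sensor ID order. That column order is an assumption based on the sample output, and the function names are hypothetical.

```python
from datetime import datetime

SENSOR_COUNT = 16  # "Number of sensors" in the summary section above


def parse_continuous_row(line, sensor_count=SENSOR_COUNT):
    """Return (timestamp, [values]) for a data row, or None otherwise."""
    parts = line.split()
    if len(parts) != 2 + sensor_count:
        return None  # separator, header, or truncated line
    try:
        stamp = datetime.strptime(f"{parts[0]} {parts[1]}", "%m/%d/%Y %H:%M:%S")
        values = [int(p) for p in parts[2:]]
    except ValueError:
        return None
    return stamp, values


def changed_sensors(older, newer):
    """Sensor IDs (assumed to be column positions) whose value differs."""
    return [i for i, (a, b) in enumerate(zip(older[1], newer[1])) if a != b]


lines = [
    "05/06/2015 16:42:51 0 1 1 0 1 1 1 1 1 1 1 1 2 3 5 0",
    "09/01/2017 00:57:15 0 1 1 0 1 1 1 1 1 1 1 1 2 3 5 0",
]
first, last = (parse_continuous_row(line) for line in lines)
print(last[0] - first[0], changed_sensors(first, last))
```

For the two sample rows the list of changed sensors is empty, because every sensor reports the same value in both records.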
Emergency Actions
The chassis can power down a single card, which allows a targeted response to over-temperature conditions on line cards. However, the chassis cannot operate safely when the temperature of the supervisor module itself exceeds the critical threshold. In that case, the supervisor module turns off the chassis power supplies to protect itself from overheating. When this happens, you can recover the switch only by cycling the power switches on the power supplies off and on, or by cycling the AC or DC inputs to the power supplies.
Critical and shutdown temperature emergencies trigger the same action. The following table lists temperature emergencies but does not distinguish between critical and shutdown emergencies.
Emergency | Action |
---|---|
Case 1. Complete fan failure emergency. | A syslog message is displayed and the chassis shuts down. |
Case 2. Temperature emergency on a line card. | Power down the line card. |
Case 3. Temperature emergency on a power supply. When the critical or shutdown alarm threshold is exceeded, all the power supplies shut down. | Power cycle the device to recover from the power supply shutdown. |
Case 4. Temperature emergency on the active supervisor module. | Power down the chassis. |
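For monitoring tools that need to describe what happens in each case, the table above can be expressed as a simple lookup. The following Python sketch is illustrative only: the enum and its member names are hypothetical, and the action strings paraphrase the table rather than reproduce any device message.

```python
from enum import Enum


class Emergency(Enum):
    """The four emergency cases from the table above (names are hypothetical)."""
    COMPLETE_FAN_FAILURE = 1
    LINE_CARD_TEMPERATURE = 2
    POWER_SUPPLY_TEMPERATURE = 3
    ACTIVE_SUPERVISOR_TEMPERATURE = 4


# Action text paraphrased from the table; critical and shutdown
# emergencies are not distinguished because they trigger the same action.
ACTIONS = {
    Emergency.COMPLETE_FAN_FAILURE: "syslog message, then chassis shutdown",
    Emergency.LINE_CARD_TEMPERATURE: "power down the line card",
    Emergency.POWER_SUPPLY_TEMPERATURE: "all power supplies shut down; power cycle to recover",
    Emergency.ACTIVE_SUPERVISOR_TEMPERATURE: "power down the chassis",
}


def emergency_action(emergency):
    """Return the documented action for an emergency case."""
    return ACTIONS[emergency]


print(emergency_action(Emergency.LINE_CARD_TEMPERATURE))
```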
System Alarms
The system has two types of alarms: major and minor. A major alarm indicates a critical problem that could lead to system shutdown. A minor alarm is informational: it alerts you to a problem that could become critical if corrective action is not taken.
The following table lists the possible environment alarms.
Event | Alarm Type |
---|---|
A temperature sensor over its warning threshold | Minor |
A temperature sensor over its critical threshold | Major |
A temperature sensor over its shutdown threshold | Major |
A partial fan failure | Minor |
A complete fan failure | Major |
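The mapping in the table above can be sketched as two small Python functions. This is an illustration under stated assumptions, not the switch's implementation: the function names are hypothetical, a reading equal to a threshold is treated as having reached it, and a fan failure is treated as complete when every fan has failed.

```python
def temperature_alarm(reading, warning, critical, shutdown):
    """Return 'major', 'minor', or None for one temperature sensor.

    Assumption: a reading that equals a threshold counts as reaching it.
    """
    if reading >= critical or reading >= shutdown:
        return "major"
    if reading >= warning:
        return "minor"
    return None


def fan_alarm(failed_fans, total_fans):
    """Return 'major' for a complete fan failure, 'minor' for a partial one.

    Assumption: "complete" means every fan has failed.
    """
    if failed_fans <= 0:
        return None
    return "major" if failed_fans >= total_fans else "minor"


# Hypothetical readings checked against hypothetical thresholds.
print(temperature_alarm(50, warning=55, critical=65, shutdown=75))
print(temperature_alarm(60, warning=55, critical=65, shutdown=75))
print(fan_alarm(failed_fans=1, total_fans=4))
```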
Fan failure alarms are issued as soon as the fan failure condition is detected and are canceled when the fan failure condition clears. Temperature alarms are issued as soon as the temperature reaches the threshold temperature. An LED on the supervisor module indicates whether an alarm has been issued.
When the system issues a major alarm, it starts a timer whose duration depends on the alarm. If the alarm is not canceled before the timer expires, the system takes emergency action to protect itself from the effects of overheating. The timer values and the emergency actions depend on the type of supervisor module.
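The timer behavior described above can be modeled as a deadline that is set when a major alarm is issued and cleared when the alarm is canceled. The following Python sketch is a simplified model: the class name is hypothetical, and the 300-second duration is a placeholder, because the actual timer values depend on the alarm and the supervisor module and are not given here.

```python
class MajorAlarmTimer:
    """Simplified model of the major-alarm timer (hypothetical class)."""

    def __init__(self, duration_s):
        self.duration_s = duration_s  # placeholder; real values vary
        self.deadline = None          # None means no alarm is pending

    def issue(self, now):
        """Start the timer when the alarm is issued; do not restart it."""
        if self.deadline is None:
            self.deadline = now + self.duration_s

    def cancel(self):
        """Clear the timer when the alarm condition is canceled."""
        self.deadline = None

    def emergency_action_due(self, now):
        """True if the alarm was not canceled before the timer expired."""
        return self.deadline is not None and now >= self.deadline


timer = MajorAlarmTimer(duration_s=300)
timer.issue(now=0)
print(timer.emergency_action_due(now=120))  # still inside the timer window
print(timer.emergency_action_due(now=300))  # timer expired, alarm not canceled
```

Passing the current time in as an argument keeps the model deterministic, which makes it easy to test without waiting for real time to elapse.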
Note: Refer to the Hardware Installation Guide for information on LEDs, including the startup behavior of the supervisor module system LED.
Event | Alarm Type | Supervisor LED Color | Description and Action |
---|---|---|---|
Card temperature exceeds the critical threshold. | Major | Red | Syslog message displays when the alarm is issued. |
Card temperature exceeds the shutdown threshold. | Major | Red | Syslog message displays when the alarm is issued. |
Chassis temperature exceeds the warning threshold. | Minor | Orange | Syslog message displays when the alarm is issued. |
Chassis fan tray experiences partial failure. | Minor | Orange | Syslog message displays when the alarm is issued. |
Chassis fan tray experiences complete failure. | Major | Red | Syslog message displays when the alarm is issued. |