Overview of the Cisco cBR Router Thermal Monitoring

The Cisco cBR routers are equipped with a comprehensive thermal monitoring system. You can enable the thermal protection shutdown of the system for critical thermal sensors. The routers have preconfigured thermal shutdown levels and general alarm levels for all the sensors. The values indicated in this document are specific to the product IDs referenced and apply to Cisco IOS-XE 3.18.0S and later.

The main cooling system of the Cisco cBR comprises five fan modules at the rear of the chassis, with two fans in each module. Additionally, each power module has two internal fans for self-cooling.

The Cisco cBR does not power up unless all five fan modules are installed. The fan modules do not have to be 100% functional, but they must all be present. The fan modules (functional or not) must be installed in the chassis at startup to seal the chassis fan bay slots and prevent air recirculation to the functional fans.

When an individual fan fails or the complete fan module is removed, the system continues to run and post alarms. The system does not power down because of missing or failed fans. Alarms are posted for failed or removed fan modules and any thermal event that occurs at the line card level. The system relies on the line card level thermal monitoring of the critical sensors and any enabled system thermal protection to shut down a line card.

Overview of the Cisco cBR AC and DC Power Supplies Thermal Sensors

The Cisco cBR AC Power Supply (cBR-AC-PS) and Cisco cBR DC Power Supply (cBR-DC-PS), referred to here as the CBR8 power supplies, are equipped with overtemperature protection shutdown. This safety feature cannot be turned off and is configured by the manufacturer of the power supply. The power supplies have numerous sensors placed throughout to prevent thermal runaway. If a power supply sensor reaches its overtemperature limit, the power supply shuts itself down.

On the released power supplies, IOS reads only the inlet and outlet temperature sensors. The power supply inlet sensor has a preconfigured overtemperature protection shutdown limit of 65°C. Because of sensor tolerances, an overtemperature shutdown can occur at a true facility inlet temperature as low as 60°C. This setting is above the recommended operational range for the CBR8 product.

If too few power supply modules are functioning, the system shuts down the line cards first and then the entire system.

Enabling Thermal Shutdown

You can configure the Cisco cBR system to protect the major power-consuming chips in the chassis, which reside on the front line cards. When you enable the thermal shutdown configuration, the chassis shuts down a line card when its major heat-generating chips reach their design limits, which are listed in the tables in the section Critical Thermal Sensor Identifiers and Temperature Limit Set-Points for Specific Line Cards. Only the line card with a thermal event is shut down.

To enable the thermal shutdown feature, complete the following procedure:

router#configure terminal  
router(config)#facility-alarm critical exceed-action shutdown  

The primary supervisor is an exception to the thermal shutdown configuration. When the thermal shutdown feature is enabled and the primary supervisor has a thermal event that exceeds the shutdown limit, all the front line cards in the chassis are shut down but the primary supervisor continues to run and provide telemetry until the failure or event is cleared or the system completely shuts down due to the thermal event.

The thermal shutdown configuration affects only the front line cards and does not affect the rear physical interface card (PIC), power supplies, or fan modules.

Recovering a Card After a Thermal Shutdown

You must clear the thermal events on the line card and the primary supervisor before the line card comes back online.

After a thermal shutdown event, you must review the alarms, system and facility temperatures, and failure logs to determine the root cause of the thermal event and correct it.

Once a line card is placed into the thermal shutdown state, there are three ways to recover the line card:

  • Issue the hw-module slot x reload command: Use this command to try to bring the line card back online. You must issue the command twice. The first issue resets the line card, which stays offline by default. The second issue allows the card to boot up if the thermal event alarm has cleared.

  • Online insertion and removal (OIR) of the card or a complete card replacement: This procedure allows the line card slot to boot up. You must perform this procedure twice or combine it with the hw-module slot x reload command. Before OIR or card replacement, issue the hw-module slot x reload command. After the card is reset, you can OIR or replace the card. When reinserted, the card boots up normally if the thermal event has cleared.

  • Reboot or power cycle of the entire chassis: This procedure also clears the thermal shutdown alarm. If the thermal conditions persist after the reboot, the line cards return to the shutdown state.

Viewing the System Component Temperatures

To view the system component temperatures, use the show env | inc Temp command.

The following example shows the output for the show env | inc Temp command.

router#show env | inc Temp  
 5/1   Temp: RTMAC      Normal           40 Celsius
 5/1   Temp: INLET      Normal           31 Celsius
 5/1   Temp: OUTLET     Normal           31 Celsius
 5/1   Temp: MAX6697    Normal           54 Celsius
 5/1   Temp: TCXO       Normal           37 Celsius
 5/1   Temp: SUP_OUT    Normal           54 Celsius
 5/1   Temp: 3882_1 P   Normal           46 Celsius
 5/1   Temp: 3882_2 P   Normal           40 Celsius
 5/1   Temp: 3882_3 P   Normal           46 Celsius
 5/1   Temp: INLET PD   Normal           28 Celsius

The output displays the slot location, sensor name, status, and temperature.
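
The four columns described above lend themselves to simple scripted checks. The following Python sketch parses captured show env | inc Temp output into tuples. The function name parse_show_env and the regular expression are illustrative assumptions, not part of any Cisco tool, and the pattern assumes that the status field is a single word such as Normal.

```python
import re

# Assumed line layout: <slot> Temp: <sensor name> <status> <value> Celsius
# Sensor names may contain a space (for example "3882_1 P").
LINE_RE = re.compile(
    r"^\s*(?P<slot>\S+)\s+Temp:\s+(?P<sensor>.+?)\s+(?P<status>\S+)"
    r"\s+(?P<celsius>-?\d+)\s+Celsius\s*$"
)


def parse_show_env(text):
    """Return (slot, sensor, status, celsius) tuples for each sensor line."""
    readings = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # prompt lines and blank lines are skipped
            readings.append(
                (match["slot"], match["sensor"], match["status"],
                 int(match["celsius"]))
            )
    return readings


sample = """\
 5/1   Temp: RTMAC      Normal           40 Celsius
 5/1   Temp: MAX6697    Normal           54 Celsius
 5/1   Temp: 3882_1 P   Normal           46 Celsius
"""

readings = parse_show_env(sample)
hottest = max(readings, key=lambda r: r[3])
print(hottest)  # ('5/1', 'MAX6697', 'Normal', 54)
```

Keeping the temperature as an integer makes it straightforward to sort the readings or compare them against the limits listed later in this document.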

Viewing Temperature Sensor Alarm Status

To view the posted temperature sensor alarms, use the show facility-alarm status command, which shows all the alarms in the system. This command also shows whether the fan modules are installed and functioning properly, along with other important alarm states related to a thermal event.

The following example shows the output for the show facility-alarm status command.

router#show facility-alarm status  
System Totals  Critical: 11  Major: 1  Minor: 0

Source                     Time                   Severity      Description [Index]
------                     ------                 --------      -------------------
Temp: Outlet P0/4          Apr 18 2016 13:05:12   INFO          Temp Above Normal [4]
Temp: Outlet P1/4          Apr 18 2016 13:05:12   INFO          Temp Above Normal [4]
Temp: Outlet P2/4          Apr 18 2016 13:05:12   INFO          Temp Above Normal [4]
Power Supply Bay 1         Apr 18 2016 13:05:12   INFO          Power Supply/FAN Module Missing [2]
Power Supply Bay 3         Apr 18 2016 13:05:12   INFO          Power Supply/FAN Module Missing [2]
Power Supply Bay 4         Apr 18 2016 13:05:12   INFO          Power Supply/FAN Module Missing [2]
Power Supply Bay 5         Apr 18 2016 13:05:12   INFO          Power Supply/FAN Module Missing [2]
Fan Slot 0                 Apr 18 2016 13:05:12   CRITICAL      Fan Tray Module Missing [0]
Fan Slot 0                 Apr 18 2016 13:05:12   CRITICAL      System shutdown will occur in few min [1]
Fan Slot 1                 Apr 18 2016 13:05:12   CRITICAL      Fan Tray Module Missing [0]
Fan Slot 1                 Apr 18 2016 13:05:12   CRITICAL      System shutdown will occur in few min [1]
Fan Slot 2                 Apr 18 2016 13:05:12   CRITICAL      Fan Tray Module Missing [0]
Fan Slot 2                 Apr 18 2016 13:05:12   CRITICAL      System shutdown will occur in few min [1]
Fan Slot 3                 Apr 18 2016 13:05:12   CRITICAL      Fan Tray Module Missing [0]
Fan Slot 3                 Apr 18 2016 13:05:12   CRITICAL      System shutdown will occur in few min [1]
Fan Slot 4                 Apr 18 2016 13:05:12   CRITICAL      Fan Tray Module Missing [0]
Fan Slot 4                 Apr 18 2016 13:05:12   CRITICAL      System shutdown will occur in few min [1]
Cable3/0-MAC2              Apr 18 2016 13:07:05   INFO          Physical Port Administrative State Down [1]
Cable3/0-MAC4              Apr 18 2016 13:07:05   INFO          Physical Port Administrative State Down [1]
sup 0                      Apr 18 2016 13:05:13   MAJOR         Unknown state [0]
TenGigabitEthernet5/1/3    Apr 18 2016 19:13:17   CRITICAL      Physical Port Link Down [35]
SFP+ container 5/1/4       Apr 18 2016 13:05:23   INFO          Transceiver Missing [0]
SFP+ container 5/1/5       Apr 18 2016 13:05:23   INFO          Transceiver Missing [0]
SFP+ container 5/1/6       Apr 18 2016 13:05:23   INFO          Transceiver Missing [0]
SFP+ container 5/1/7       Apr 18 2016 13:05:23   INFO          Transceiver Missing [0]

To view only the thermal alarms, use the show facility-alarm status | inc Temp command.

The following example shows the output for the show facility-alarm status | inc Temp command.

router#show facility-alarm status | inc Temp  
Temp: Outlet P0/4          Apr 18 2016 13:05:12   INFO          Temp Above Normal [4]
Temp: Outlet P1/4          Apr 18 2016 13:05:12   INFO          Temp Above Normal [4]
Temp: Outlet P2/4          Apr 18 2016 13:04:12   INFO          Temp Above Normal [4]
Temp: U18 P10/1            Apr 18 2016 13:03:12   INFO          Temp Above Normal [4]
Temp: U17 P11/1            Apr 18 2016 13:01:12   INFO          Temp Above Normal [4]
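
Captured alarm output of this kind can also be processed offline. The Python sketch below splits each alarm line into source, time, severity, and description, and then finds the oldest thermal alarm. The function name parse_alarm_line is hypothetical, and the sketch assumes that the columns are separated by two or more spaces, as in the example output above.

```python
import re
from datetime import datetime


def parse_alarm_line(line):
    """Split one alarm line into a dict, or return None for other lines."""
    # Assumption: columns are separated by runs of two or more spaces.
    fields = re.split(r"\s{2,}", line.strip())
    if len(fields) != 4:
        return None
    source, time_text, severity, description = fields
    try:
        when = datetime.strptime(time_text, "%b %d %Y %H:%M:%S")
    except ValueError:
        return None  # header, separator, or totals line
    return {
        "source": source,
        "time": when,
        "severity": severity,
        "description": description,
    }


sample = """\
Temp: Outlet P0/4          Apr 18 2016 13:05:12   INFO          Temp Above Normal [4]
Temp: U17 P11/1            Apr 18 2016 13:01:12   INFO          Temp Above Normal [4]
"""

alarms = [a for a in map(parse_alarm_line, sample.splitlines()) if a]
oldest = min(alarms, key=lambda a: a["time"])
print(oldest["source"], oldest["severity"])  # Temp: U17 P11/1 INFO
```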

Setting Up SNMP Traps for Temperature Alarms

You can send alarms to a trap server by using the snmp-server enable traps alarms informational command. This command enables traps for all alarms, not only the thermal alarms. There is no command to enable traps for thermal alarms only. You cannot configure the temperature alarm values because the thresholds are preprogrammed on the line cards.

Below is an example of a thermal alarm trap. This example shows a critical alarm for slot 5 BB_DIE on the CBR-SUP-160G card.

Received SNMPv1 Trap:
Community: public
Enterprise: ciscoEntityAlarmMIBNotificationsPrefix
Agent-addr: 10.0.10.10
Enterprise Specific trap.
Enterprise Specific trap: 1
Time Ticks: 362693
ceAlarmHistEntPhysicalIndex.74 = 60142
ceAlarmHistAlarmType.74 = 2
ceAlarmHistSeverity.74 = critical(1)
ceAlarmHistTimeStamp.74 = 362692
ceAlarmDescrText.9.2 = Temp Above Normal

In the example above, you can find “ceAlarmHistEntPhysicalIndex.74 = 60142” in the trap. Using this index, you can get the detailed descriptor from the SNMP table “entPhysicalDescr”.

For example, entPhysicalDescr.60142 = Temp: BB_DIE.

Each thermal alarm trap contains its PhysicalIndex. For example, for BB_DIE, ceAlarmHistEntPhysicalIndex.74 = 60142, so its index has the form 6xxxx. For the front line cards, cylons, and SUP, the PhysicalIndex starts from (slot+1)*10,000.

In this example, the equation is (5+1)*10,000 = 60,000, which means the sensor that sent the alarm is in slot 5.

Once you get the sensor's PhysicalIndex that starts from (slot+1)*10,000, use the above formula to get the slot number, and search for the PhysicalIndex in the entPhysicalDescr table to get its descriptor.
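
The slot calculation described above can be sketched in Python as follows. The function name slot_from_physical_index and the one-entry lookup table are hypothetical, and the sketch applies only to line card and SUP indexes, which is the range the formula covers.

```python
def slot_from_physical_index(physical_index):
    """Return the chassis slot for a line card or SUP sensor index.

    Follows the rule in the text: indexes start at (slot + 1) * 10,000.
    Indexes below 10,000 (for example power supply and fan entries)
    are outside that rule, so None is returned for them.
    """
    if physical_index < 10_000:
        return None
    return physical_index // 10_000 - 1


# Excerpt of an entPhysicalDescr table; the single entry is the one
# quoted in the text for the trap example.
ent_physical_descr = {60142: "Temp: BB_DIE"}

index = 60142  # value of ceAlarmHistEntPhysicalIndex.74 in the trap
slot = slot_from_physical_index(index)
print(slot, ent_physical_descr.get(index, "unknown"))  # 5 Temp: BB_DIE
```

In practice the lookup table would be filled from an SNMP walk of entPhysicalDescr on the router that sent the trap.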

Each line card type has a different entPhysicalDescr table. Always reference the entPhysicalDescr table for the specific line card from which the alarm originates.

The following are some examples of entPhysicalDescr tables.

  • Example entPhysicalDescr for temperature sensors for power supplies installed in bays P0 and P5. Bays P1-P4 are empty.

    entPhysicalDescr.1000 = Power Supply Bay
    entPhysicalDescr.1001 = Cisco cBR CCAP AC Power Supply
    entPhysicalDescr.1002 = PEM Iout
    entPhysicalDescr.1003 = PEM Vout
    entPhysicalDescr.1004 = PEM Vin
    entPhysicalDescr.1005 = Temp: INLET
    entPhysicalDescr.1006 = Temp: OUTLET
    entPhysicalDescr.1020 = Power Supply Bay
    entPhysicalDescr.1040 = Power Supply Bay
    entPhysicalDescr.1060 = Power Supply Bay
    entPhysicalDescr.1080 = Power Supply Bay
    entPhysicalDescr.1100 = Power Supply Bay
    entPhysicalDescr.1101 = Cisco cBR CCAP AC Power Supply
    entPhysicalDescr.1102 = PEM Iout
    entPhysicalDescr.1103 = PEM Vout
    entPhysicalDescr.1104 = PEM Vin
    entPhysicalDescr.1105 = Temp: INLET
    entPhysicalDescr.1106 = Temp: OUTLET
    
    
  • Example entPhysicalDescr for temperature sensors for fan modules installed in P10-P14 bays.

    entPhysicalDescr.2000 = Fan Slot
    entPhysicalDescr.2001 = Cisco cBR Fan Assembly
    entPhysicalDescr.2002 = Temp: U17
    entPhysicalDescr.2003 = Temp: U18
    entPhysicalDescr.2004 = Temp: FC 
    entPhysicalDescr.2005 = MPL115A
    entPhysicalDescr.2012 = Fan
    entPhysicalDescr.2013 = Fan
    entPhysicalDescr.2020 = Fan Slot
    entPhysicalDescr.2021 = Cisco cBR Fan Assembly
    entPhysicalDescr.2022 = Temp: U17
    entPhysicalDescr.2023 = Temp: U18
    entPhysicalDescr.2024 = Temp: FC 
    entPhysicalDescr.2025 = MPL115A
    entPhysicalDescr.2032 = Fan
    entPhysicalDescr.2033 = Fan
    entPhysicalDescr.2040 = Fan Slot
    entPhysicalDescr.2041 = Cisco cBR Fan Assembly
    entPhysicalDescr.2042 = Temp: U17
    entPhysicalDescr.2043 = Temp: U18
    entPhysicalDescr.2044 = Temp: FC 
    entPhysicalDescr.2045 = MPL115A
    entPhysicalDescr.2052 = Fan
    entPhysicalDescr.2053 = Fan
    entPhysicalDescr.2060 = Fan Slot
    entPhysicalDescr.2061 = Cisco cBR Fan Assembly
    entPhysicalDescr.2062 = Temp: U17
    entPhysicalDescr.2063 = Temp: U18
    entPhysicalDescr.2064 = Temp: FC 
    entPhysicalDescr.2065 = MPL115A
    entPhysicalDescr.2072 = Fan
    entPhysicalDescr.2073 = Fan
    entPhysicalDescr.2080 = Fan Slot
    entPhysicalDescr.2081 = Cisco cBR Fan Assembly
    entPhysicalDescr.2082 = Temp: U17
    entPhysicalDescr.2083 = Temp: U18
    entPhysicalDescr.2084 = Temp: FC 
    entPhysicalDescr.2085 = MPL115A
    entPhysicalDescr.2092 = Fan
    entPhysicalDescr.2093 = Fan
    
    
  • Example entPhysicalDescr for temperature sensors for CBR-SUP-160G installed in slot 4.

    entPhysicalDescr.50000 = Cisco cBR CCAP Supervisor Card
    entPhysicalDescr.50001 = Cisco cBR CCAP Supervisor MB
    entPhysicalDescr.50141 = Temp: Y0_DIE
    entPhysicalDescr.50142 = Temp: BB_DIE
    entPhysicalDescr.50143 = Temp: VP_DIE
    entPhysicalDescr.50144 = Temp: RT-E_DIE
    entPhysicalDescr.50145 = Temp: INLET_1
    entPhysicalDescr.50146 = Temp: INLET_2
    entPhysicalDescr.50147 = Temp: OUTLET_1
    entPhysicalDescr.50148 = Temp: 3882_1
    entPhysicalDescr.50149 = Temp: 3882_2
    entPhysicalDescr.50150 = Temp: 3882_2A
    entPhysicalDescr.50151 = Temp: 3882_2B
    entPhysicalDescr.50152 = Temp: 3882_3
    entPhysicalDescr.50153 = Temp: 3882_3A
    entPhysicalDescr.50154 = Temp: 3882_3B
    entPhysicalDescr.50155 = Temp: 3882_4
    entPhysicalDescr.50156 = Temp: 3882_4A
    entPhysicalDescr.50157 = Temp: 3882_4B
    entPhysicalDescr.50158 = Temp: 3882_5
    entPhysicalDescr.50159 = Temp: 3882_5A
    entPhysicalDescr.50160 = Temp: 3882_5B
    entPhysicalDescr.50161 = Temp: 3882_6
    entPhysicalDescr.50162 = Temp: 3882_6A
    entPhysicalDescr.50163 = Temp: 3882_6B
    entPhysicalDescr.50164 = Temp: 3882_7
    entPhysicalDescr.50165 = Temp: 3882_8
    entPhysicalDescr.50166 = Temp: 3882_9
    entPhysicalDescr.50167 = Temp: 3882_9A
    entPhysicalDescr.50168 = Temp: 3882_9B
    entPhysicalDescr.50169 = Temp: 3882_10
    entPhysicalDescr.50170 = Temp: 3882_10A
    entPhysicalDescr.50171 = Temp: 3882_10B
    entPhysicalDescr.50172 = Temp: 3882_11
    entPhysicalDescr.50173 = Temp: 3882_11A
    entPhysicalDescr.50174 = Temp: 3882_11B
    entPhysicalDescr.50175 = Temp: 8314_1
    entPhysicalDescr.50176 = Temp: 8314_2
    entPhysicalDescr.50177 = Temp: 3536_1A
    entPhysicalDescr.50178 = Temp: 3536_1B
    entPhysicalDescr.50179 = Temp: AS_DIE
    entPhysicalDescr.50182 = SUP_dSUM
    
    
  • Example entPhysicalDescr for temperature sensors for CBR-LC-8D30-16U30 installed in slot 7.

    entPhysicalDescr.80000 = Cisco cBR CCAP Line Card
    entPhysicalDescr.80001 = Cisco cBR CCAP Line Card
    entPhysicalDescr.80014 = Temp: CAPRICA
    entPhysicalDescr.80015 = Temp: BASESTAR
    entPhysicalDescr.80016 = Temp: RAIDER
    entPhysicalDescr.80017 = Temp: CPU
    entPhysicalDescr.80018 = Temp: INLET
    entPhysicalDescr.80019 = Temp: OUTLET
    entPhysicalDescr.80020 = Temp: DIGITAL
    entPhysicalDescr.80021 = Temp: UPX
    entPhysicalDescr.80022 = Temp: LEOBEN1
    entPhysicalDescr.80023 = Temp: LEOBEN2
    
    

Supported SNMP MIBs

The following SNMP MIBs are supported for thermal sensors:

  • CISCO-ENVMON-MIB

  • CISCO-ENTITY-ALARM-MIB

  • ENTITY-SENSOR-MIB

  • ENTITY-MIB

SUP_dSUM Alarm

The SUP_dSUM alarm is an alarm on the supervisor motherboard of the CBR-SUP-250G, CBR-CCAP-SUP-160G, and CBR-CCAP-SUP-60G cards. It is a summation of the values of the sensors spread across the motherboard. The SUP_dSUM alarm is triggered in cases such as open chassis slots that were not filled properly after cards were removed. This alarm is not a temperature sensor alarm but a warning to inspect the system. It is displayed in the facility alarm status output and in the SNMP trap notifications.

Below is an example of the SUP_dSUM alarm:

router#sho facility-alarm status | inc SUP  
SUP_dSUM R0/192   Jan 19 2016 10:48:32    Critical   CHECK FOR OPEN SLOTS & BLOCKED AIR INTAKE [9]

Critical Thermal Sensor Identifiers and Temperature Limit Set-Points for Specific Line Cards


Note

This section contains only the important sensors and their alarm limits. It does not list all sensors that are displayed when you run a query with temperature as the criterion.
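
As an illustration of how the limits in the tables of this section are read, the following Python sketch maps a temperature reading to an alarm level. The classify function is hypothetical, the limits are the Temp: VP_DIE values listed for the CBR-SUP-250G, and treating a level as reached when the reading is equal to or above its limit is an assumption.

```python
def classify(reading, limits):
    """Return the highest level whose limit the reading has reached.

    A limit of None stands for an NA entry in the tables.
    Assumption: a level applies when reading >= limit.
    """
    level = "Normal"
    for name in ("Minor", "Major", "Critical", "Shutdown"):
        limit = limits.get(name)
        if limit is not None and reading >= limit:
            level = name
    return level


# Temp: VP_DIE limits for the CBR-SUP-250G, taken from the table.
vp_die = {"Minor": None, "Major": 60, "Critical": 65, "Shutdown": 72}

for reading in (54, 60, 66, 72):
    print(reading, classify(reading, vp_die))
# 54 Normal
# 60 Major
# 66 Critical
# 72 Shutdown
```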


CBR-SUP-250G: Specific Thermal Sensor Identifier and Temperature Limit Set-Point

Sensor Name          Minor   Major   Critical   Shutdown   System Response
-----------          -----   -----   --------   --------   ---------------
Temp: VP_DIE         NA      60      65         72         Power Down Card
Temp: MB_IN_1        42      47      52         57         Power Down Card
Temp: MB_IN_2        42      47      52         57         Power Down Card
Temp: AS_DIE         82      90      95         103        Power Down Card
Temp: Y1_DIE         NA      60      65         72         Power Down Card
Temp: Y2_DIE         NA      60      65         72         Power Down Card
Temp: Y3_DIE         NA      60      65         72         Power Down Card
Temp: Y0_DIE         NA      60      65         72         Power Down Card
Temp: Falcon_DIE     NA      63      70         77         Just Alarm
SUP_dSUM             NA      NA      75         NA         Alarm/UI
Temp: MB_OUT_1       75      80      85         NA         Just Alarm
Temp: VP_CHIP        80      85      90         98         Power Down Card
Temp: Falcon_CHIP    80      85      90         98         Power Down Card
Temp:CPU_C0          NA      75      85         91         Power Down Card
Temp:CPU_C1          NA      75      85         91         Power Down Card
Temp:CPU_C2          NA      75      85         91         Power Down Card
Temp:CPU_C3          NA      75      85         91         Power Down Card
Temp:CPU_C4          NA      75      85         91         Power Down Card
Temp:CPU_C5          NA      75      85         91         Power Down Card
Temp:CPU_C6          NA      75      85         91         Power Down Card
Temp:CPU_C7          NA      75      85         91         Power Down Card

CBR-CCAP-SUP-160G/CBR-CCAP-SUP-60G Motherboard Specific Thermal Sensor Identifier and Temperature Limit Set-Point

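Card                                 IOS Sensor Name   Minor Limit   Major Limit   Critical Limit   Shutdown Limit   System Response At Shutdown Limit (If Enabled)
----                                 ---------------   -----------   -----------   --------------   --------------   ----------------------------------------------
CBR-CCAP-SUP-160G/CBR-CCAP-SUP-60G   Temp: VP_DIE      NA            60            65               72               Power Down Card
CBR-CCAP-SUP-160G/CBR-CCAP-SUP-60G   Temp: MB_IN_1     42            47            52               57               Power Down Card
CBR-CCAP-SUP-160G/CBR-CCAP-SUP-60G   Temp: MB_IN_2     42            47            52               57               Power Down Card
CBR-CCAP-SUP-160G/CBR-CCAP-SUP-60G   Temp: AS_DIE      82            90            95               103              Power Down Card
CBR-CCAP-SUP-160G/CBR-CCAP-SUP-60G   Temp: Y0_DIE      NA            60            65               72               Power Down Card
CBR-CCAP-SUP-160G/CBR-CCAP-SUP-60G   Temp: BB_DIE      NA            60            65               72               Power Down Card
CBR-CCAP-SUP-160G/CBR-CCAP-SUP-60G   Temp: RT-E_DIE    NA            63            70               77               Power Down Card
CBR-CCAP-SUP-160G/CBR-CCAP-SUP-60G   Temp: MB_OUT_1    75            80            87               NA               Alarm Only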

CBR-CCAP-SUP-160G Daughterboard Specific Thermal Sensor Identifier and Temperature Limit Set-Point

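Card                IOS Sensor Name   Minor Limit   Major Limit   Critical Limit   Shutdown Limit   System Response At Shutdown Limit (If Enabled)
----                ---------------   -----------   -----------   --------------   --------------   ----------------------------------------------
CBR-CCAP-SUP-160G   Temp: Y1_DIE      NA            60            65               72               Power Down Card
CBR-CCAP-SUP-160G   Temp: Y2_DIE      NA            60            65               72               Power Down Card
CBR-CCAP-SUP-160G   Temp: Y3_DIE      NA            60            65               72               Power Down Card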

CBR-2X100G-PIC: Specific Thermal Sensor Identifier and Temperature Limit Set-Point

Sensor Name          Minor   Major   Critical   Shutdown   System Response
-----------          -----   -----   --------   --------   ---------------
Temp:INLET           50      60      70         NA         Just alarm
Temp:RT_OUT          70      80      90         NA         Just alarm
Temp:MB_OUT          85      95      105        NA         Just alarm
Temp: SUP_OUT        55      65      75         NA         Just alarm

CBR-CCAP-LC-40G Specific Thermal Sensor Identifier and Temperature Limit Set-Point

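Card              IOS Sensor Name   Minor Limit   Major Limit   Critical Limit   Shutdown Limit   System Response at Shutdown Limit (If Enabled)
----              ---------------   -----------   -----------   --------------   --------------   ----------------------------------------------
CBR-CCAP-LC-40G   Temp: CAPRICA     NA            75            80               85               Power Down Card
CBR-CCAP-LC-40G   Temp: BASESTAR    80            85            90               98               Power Down Card
CBR-CCAP-LC-40G   Temp: RAIDER      80            85            90               98               Power Down Card
CBR-CCAP-LC-40G   Temp: CPU         80            85            90               95               Power Down Card
CBR-CCAP-LC-40G   Temp: INLET       42            47            52               57               Power Down Card
CBR-CCAP-LC-40G   Temp: UPX         NA            70            75               NA               Alarm Only
CBR-CCAP-LC-40G   Temp: LEOBEN1     NA            70            75               NA               Alarm Only
CBR-CCAP-LC-40G   Temp: LEOBEN2     NA            73            78               85               Power Down Card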

CBR-FAN-ASSEMBLY Specific Thermal Sensor Identifier and Temperature Limit Set-Point

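Card               IOS Sensor Name   Minor Limit   Major Limit   Critical Limit   Shutdown Limit   System Response at Shutdown Limit (If Enabled)
----               ---------------   -----------   -----------   --------------   --------------   ----------------------------------------------
CBR-FAN-ASSEMBLY   Temp: U17         60            65            70               NA               Alarm Only
CBR-FAN-ASSEMBLY   Temp: U18         60            65            70               NA               Alarm Only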

CBR-XX-PS Specific Thermal Sensor Identifier and Temperature Limit Set-Point

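Card        IOS Sensor Name   Minor Limit   Major Limit   Critical Limit   Shutdown Limit   System Response at Shutdown Limit (If Enabled)
----        ---------------   -----------   -----------   --------------   --------------   ----------------------------------------------
CBR-XX-PS   Temp: INLET       50            55            60               65               Alarm Only
CBR-XX-PS   Temp: OUTLET      60            65            70               NA               Alarm Only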

CBR-DPIC-8X10G Specific Thermal Sensor Identifier and Temperature Limit Set-Point

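Card             IOS Sensor Name            Minor Limit   Major Limit   Critical Limit   Shutdown Limit   System Response at Shutdown Limit (If Enabled)
----             ---------------            -----------   -----------   --------------   --------------   ----------------------------------------------
CBR-DPIC-8X10G   Temp: INLET                55            60            75               NA               Alarm only
CBR-DPIC-8X10G   Temp: ZYNQ_OUTLET          70            80            90               NA               Alarm only
CBR-DPIC-8X10G   Temp: SWT_OUTLET           70            80            90               NA               Alarm only
CBR-DPIC-8X10G   Temp: PHY_OUTLET           70            80            90               NA               Alarm only
CBR-DPIC-8X10G   Temp: SFP_OUTLET/ OUTLET   70            80            90               NA               Alarm only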