Introduction
This document describes how to troubleshoot fan module failures on Cisco ASR 9000 Series (ASR9k) routers.
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
Note: You must have access to the Cisco IOS® XR CLI and the admin CLI.
Components Used
The information in this document is based on these software and hardware versions:
- The ASR 9000 series encompasses a range of models, including the ASR 9001, ASR 9006, ASR 9010, ASR 9901, ASR 9906, ASR 9910, ASR 9912, and ASR 9922, among others.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Background Information
The Cisco ASR 9000 Series Aggregation Services Routers (ASR9k) are high-performance routers designed for service provider networks. They offer the scalability, reliability, and advanced features that demanding network environments require. The ASR9k routers use a modular hardware architecture that allows flexible configuration and expansion to meet diverse network requirements.
Key characteristics of the ASR9k router family include:
• Modular Design: ASR9k routers feature modular components such as route processors, line cards, and fan trays, which allow upgrades and maintenance without disrupting network operations.
• Cooling System: Each model has its own fan arrangement. For example, the ASR 9001 uses a single front-accessible fan tray that contains redundant fans to ensure continuous cooling. The fan tray supports side-to-side airflow and, from software release 4.3.0 onwards, supports online insertion and removal (OIR) with certain ambient temperature restrictions, which improves serviceability.
• High Availability: The ASR9k series supports redundant power supplies and fans, which contribute to high availability and minimize downtime.
• Performance and Scalability: Designed to handle large-scale aggregation and edge routing, ASR9k routers support high throughput and advanced routing protocols suitable for service provider core and edge networks.
• Software Features: The routers run Cisco IOS® XR software, which provides carrier-grade reliability, modularity, and programmability to support evolving network demands.
Problem
A fan module or fan tray failure in an ASR 9000 Series router can lead to inadequate cooling, resulting in the overheating of critical hardware components. This overheating can cause system instability, degraded performance, unexpected shutdowns, or permanent hardware damage, ultimately impacting network availability and service reliability. Given the critical role of the cooling system in maintaining device health, timely detection and mitigation of fan failures are essential to prevent network disruptions and maintain high availability in service provider environments.
Procedure to Resolve Fan Module Failure in ASR9k
The troubleshooting approach is consistent across ASR 9000 Series models; only the physical actions differ, depending on whether the model uses a fixed fan module or a modular fan tray.
Step 1. Initial CLI Verification
Log in to the router through the Cisco IOS® XR CLI and run these commands to identify the status of the fan trays and individual fans. These commands are common to all ASR 9000 platforms that run Cisco IOS® XR.
Step 1.1. Check Platform Status: Run this command to determine whether an entire fan tray has failed or whether the problem is limited to one or more fans within a fan tray.
Sample Command Output:
RP/0/RSP0/CPU0:ASR-9006#show platform
Wed Jul 16 12:16:00.408 IST
Node              Type                       State             Config state
--------------------------------------------------------------------------------
0/RSP0/CPU0       A9K-RSP5-SE(Active)        IOS XR RUN        NSHUT
0/RSP1/CPU0       A9K-RSP5-SE(Standby)       IOS XR RUN        NSHUT
0/FT0             ASR-9006-FAN-V2            OPERATIONAL       NSHUT
0/FT1             ASR-9006-FAN-V2            OPERATIONAL       NSHUT
0/0/CPU0          A9K-MOD200-SE              IOS XR RUN        NSHUT
0/0/0             A9K-MPA-20X1GE             OK
0/1/CPU0          A9K-8X100GE-SE             IOS XR RUN        NSHUT
0/2/CPU0          A9K-MOD200-SE              IOS XR RUN        NSHUT
0/2/0             A9K-MPA-20X10GE            OK
0/PT0             A9K-DC-PEM-V2              OPERATIONAL       NSHUT
RP/0/RSP0/CPU0:ASR-9006#
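Note: If all fan trays are in the OPERATIONAL state, the fan trays themselves are working. If any fan tray is in a state other than OPERATIONAL, that fan tray has failed. An OPERATIONAL tray can still contain an individual failed fan (in this example, 0/FT0 is OPERATIONAL although one of its fans reports no speed in Step 1.2), so always continue to Step 1.2.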
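When you check many routers, the fan tray check from this note can be scripted against captured show platform output. The Python sketch below is a hypothetical helper, not a Cisco tool: it does not connect to the router, it assumes the column layout of the sample output above, and it treats any fan tray row (a location such as 0/FT0) whose State column is not OPERATIONAL as failed.

```python
import re

# Fan tray rows (plus one other row) copied from the sample "show platform" output.
SHOW_PLATFORM = """\
Node              Type                       State             Config state
--------------------------------------------------------------------------------
0/RSP0/CPU0       A9K-RSP5-SE(Active)        IOS XR RUN        NSHUT
0/FT0             ASR-9006-FAN-V2            OPERATIONAL       NSHUT
0/FT1             ASR-9006-FAN-V2            OPERATIONAL       NSHUT
"""

# Assumption: fan tray rows start with a location such as 0/FT0, and the
# State column is the third whitespace-separated field on those rows.
FAN_TRAY_ROW = re.compile(r"^(\d+/FT\d+)\s+(\S+)\s+(\S+)")


def fan_tray_states(output: str) -> dict[str, str]:
    """Map each fan tray location to the State value reported by show platform."""
    states = {}
    for line in output.splitlines():
        match = FAN_TRAY_ROW.match(line.strip())
        if match:
            location, _pid, state = match.groups()
            states[location] = state
    return states


def non_operational_trays(output: str) -> list[str]:
    """Return fan tray locations whose State is anything other than OPERATIONAL."""
    return [loc for loc, state in fan_tray_states(output).items() if state != "OPERATIONAL"]


if __name__ == "__main__":
    print(fan_tray_states(SHOW_PLATFORM))
    print(non_operational_trays(SHOW_PLATFORM))
```

For the sample output, both trays are OPERATIONAL, so the list of non-operational trays is empty; as the note explains, that alone does not rule out a failed fan inside a tray.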
Step 1.2. Identify Failed Fan Modules: Run this command to check the status and speed of individual fans within a fan tray.
Sample Command Output:
RP/0/RSP0/CPU0:ASR-9006#admin show environment fan
Wed Jul 16 12:16:09.843 IST
=============================================================================
                                Fan speed (rpm)
Location     FRU Type           FAN_0   FAN_1   FAN_2   FAN_3   FAN_4   FAN_5
-----------------------------------------------------------------------------
0/FT0        ASR-9006-FAN-V2    -       7710    7590    8970    7500    7530
0/FT1        ASR-9006-FAN-V2    7590    7560    7590    7590    7560    7560
0/PT0-PM0    PWR-2KW-DC-V2      8022    8559
0/PT0-PM1    PWR-2KW-DC-V2      6280    6237
0/PT0-PM2    PWR-2KW-DC-V2      7914    8559
0/PT0-PM3    PWR-2KW-DC-V2      7978    8516
RP/0/RSP0/CPU0:ASR-9006#
Note: A dash (`-`) or an RPM value significantly lower than that of the other fans in the same tray can indicate a failed or failing fan. In this sample, FAN_0 in 0/FT0 shows a dash.
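The "dash or significantly lower RPM" rule can be expressed as a small check on captured admin show environment fan output. This Python sketch is hypothetical: the 50% of tray median threshold (LOW_RPM_RATIO) is an assumption for illustration, not a Cisco specification, and the parser assumes every fan in a tray row prints either a number or a dash, so that the column position identifies the fan. Power module rows (0/PT0-PMx) are skipped.

```python
import statistics

# Rows copied from the sample "admin show environment fan" output.
SHOW_ENV_FAN = """\
Location     FRU Type           FAN_0   FAN_1   FAN_2   FAN_3   FAN_4   FAN_5
-----------------------------------------------------------------------------
0/FT0        ASR-9006-FAN-V2    -       7710    7590    8970    7500    7530
0/FT1        ASR-9006-FAN-V2    7590    7560    7590    7590    7560    7560
0/PT0-PM0    PWR-2KW-DC-V2      8022    8559
"""

# Hypothetical threshold: flag a fan running below 50% of its tray's median RPM.
LOW_RPM_RATIO = 0.5


def suspect_fans(output: str, ratio: float = LOW_RPM_RATIO) -> list[tuple[str, str, str]]:
    """Return (location, fan, reason) for tray fans that show '-' or run unusually slowly."""
    suspects = []
    for line in output.splitlines():
        fields = line.split()
        # Only fan tray rows: location such as 0/FT0, then FRU type, then readings.
        if len(fields) < 3 or "/FT" not in fields[0]:
            continue
        location, readings = fields[0], fields[2:]
        rpms = [int(r) for r in readings if r.isdigit()]
        median = statistics.median(rpms) if rpms else 0
        for index, value in enumerate(readings):
            fan = f"FAN_{index}"
            if value == "-":
                suspects.append((location, fan, "no speed reading"))
            elif value.isdigit() and int(value) < ratio * median:
                suspects.append((location, fan, f"{value} rpm vs tray median {median}"))
    return suspects


if __name__ == "__main__":
    for location, fan, reason in suspect_fans(SHOW_ENV_FAN):
        print(f"{location} {fan}: {reason}")
```

Comparing each fan with its own tray's median, rather than with a fixed RPM value, keeps the check independent of the fan tray model; for the sample, only FAN_0 in 0/FT0 is flagged.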
Step 1.3. Verify Fan Module Failure from Logs: Run this command to check system logs for fan-related alarms.
Sample logs:
RP/0/RSP0/CPU0:ASR-9006# show logging | include FAN
0/RSP0/ADMIN0:2025 Jul 10 07:52:41.797 IST: canbus_driver[4134]: %PLATFORM-CANB_SERVER-3-ALARM_INDICATION : Raise alarm from CBC0 in slot 0/FT0, alarm code CBC_ALRM_AT_LEAST_ONE_FAN_FAILED
0/RSP0/ADMIN0:2025 Jul 10 07:53:42.798 IST: canbus_driver[4134]: %PLATFORM-CANB_SERVER-3-ALARM_INDICATION : Raise alarm from CBC0 in slot 0/FT0, alarm code CBC_ALRM_AT_LEAST_ONE_FAN_FAILED
0/RSP0/ADMIN0:2025 Jul 10 07:54:43.800 IST: canbus_driver[4134]: %PLATFORM-CANB_SERVER-3-ALARM_INDICATION : Raise alarm from CBC0 in slot 0/FT0, alarm code CBC_ALRM_AT_LEAST_ONE_FAN_FAILED
0/RSP0/ADMIN0:2025 Jul 10 07:55:44.799 IST: canbus_driver[4134]: %PLATFORM-CANB_SERVER-3-ALARM_INDICATION : Raise alarm from CBC0 in slot 0/FT0, alarm code CBC_ALRM_AT_LEAST_ONE_FAN_FAILED
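In the sample, the same alarm for 0/FT0 recurs roughly once a minute, so grouping the messages by slot and alarm code makes a long log easier to read. This Python sketch assumes the message shape of the sample lines above only; the regular expression and the helper name fan_alarms are illustrative, not part of any Cisco tool.

```python
import re
from collections import defaultdict

# Two of the sample log lines from Step 1.3, pasted as captured.
SAMPLE_LOG = """\
0/RSP0/ADMIN0:2025 Jul 10 07:52:41.797 IST: canbus_driver[4134]: %PLATFORM-CANB_SERVER-3-ALARM_INDICATION : Raise alarm from CBC0 in slot 0/FT0, alarm code CBC_ALRM_AT_LEAST_ONE_FAN_FAILED
0/RSP0/ADMIN0:2025 Jul 10 07:53:42.798 IST: canbus_driver[4134]: %PLATFORM-CANB_SERVER-3-ALARM_INDICATION : Raise alarm from CBC0 in slot 0/FT0, alarm code CBC_ALRM_AT_LEAST_ONE_FAN_FAILED
"""

# Assumed message shape, based only on the sample lines above.
ALARM_RE = re.compile(
    r":(?P<ts>\d{4} \w{3} +\d+ [\d:.]+) \w+: .*?"
    r"Raise alarm from \S+ in slot (?P<slot>[^\s,]+), alarm code (?P<code>\S*FAN\S*)"
)


def fan_alarms(log_text: str) -> dict:
    """Group fan-related 'Raise alarm' messages by (slot, alarm code).

    Each value is the list of timestamps, in log order.
    """
    grouped = defaultdict(list)
    for line in log_text.splitlines():
        match = ALARM_RE.search(line)
        if match:
            grouped[(match["slot"], match["code"])].append(match["ts"])
    return dict(grouped)


if __name__ == "__main__":
    for (slot, code), stamps in fan_alarms(SAMPLE_LOG).items():
        print(f"{slot} {code}: {len(stamps)} occurrence(s), first {stamps[0]}, last {stamps[-1]}")
```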
Step 2. Environmental and Physical Inspection
Environmental factors can significantly impact fan operation and overall system cooling.
Ambient Conditions:
- Verify that the ambient temperature and airflow around the router are within operational limits. High temperatures can cause fans to work harder or fail prematurely.
- Check whether any dust filters or air plenums are clogged or improperly installed and restricting airflow.
Physical Inspection for Obstructions/Damage:
- Inspect the fan module or fan tray for visible debris, loose wiring, or obstructions that can prevent the fans from spinning freely. Dust accumulation is a common cause of fan issues.
- For models with modular fan trays (for example, ASR 9006, ASR 9010, and ASR 99xx), if it is safe to do so and within operational guidelines, carefully pull out the suspected fan tray. Visually inspect the individual fans for blades that do not spin or for visible damage. While the tray is out, check for dust buildup on the fans and inside the chassis slot.
- For models with fixed fan modules (for example, ASR 9001), physical inspection of the fan module and connectors is limited, but still check for any external signs of damage or obstruction.
Step 3. Check for Known Issues and Bugs
Before proceeding with hardware replacement, it is advisable to check if the observed fan failure aligns with any known software or hardware bugs.
- Cisco Bug Search Tool: Search the Cisco Bug Search Tool (BST) with keywords such as "ASR 9000 fan failure" or "ASR [model number] fan", together with the specific Cisco IOS® XR version that runs on your device. Look for known issues that can cause fan misreporting or actual fan failures.
- Cisco Support Documentation: Review Cisco support documentation and community forums for similar reported issues and recommended workarounds or fixes.
Step 4. Remedial Actions and Replacement
The next steps depend on the type of fan module in your ASR 9000 Series router.
For ASR 9000 Series with Fixed Fan Modules (for example, ASR 9001):
Models like the ASR 9001 have a fixed fan module that is not hot-swappable.
- Power Cycle: If the initial checks and environmental corrections do not resolve the issue, power cycle the router. A power cycle can clear transient issues and allow the fan module to reinitialize correctly.
- Replacement (RMA): If the fan module is still failed after a power cycle, a Return Merchandise Authorization (RMA) for the entire chassis is typically required.
Note: Replacement of a fixed fan module requires planned downtime as the router must be powered down.
For ASR 9000 Series with Modular Fan Trays (for example, ASR 9006, ASR 9010, ASR 99xx models):
These models feature hot-swappable modular fan trays.
Reseating (JACK-OUT and JACK-IN, JOJI):
- Carefully perform a JACK-OUT and JACK-IN (JOJI) on the fan tray that contains the failed fan(s): physically remove the fan tray, then reinsert it.
- While the fan tray is out, inspect it thoroughly for debris or loose wiring that could prevent the fans from spinning. On reinsertion, observe whether all fans start to spin.
- After reseating, verify the fan status again with "admin show environment fan".
Replacement (RMA): If the fan(s) are still in a failed state or the fan tray remains non-operational after reseating, proceed with an RMA for the fan tray.
- Collect Evidence Logs: Run "show logging | include FAN" again to capture the logs related to the fan tray JOJI for documentation.
Sample logs:
RP/0/RSP0/CPU0:ASR-9006# show logging | include FAN
0/RSP0/ADMIN0:Jul 12 01:39:25.215 : shelf_mgr[4169]: %INFRA-SHELF_MGR-5-CARD_REMOVAL : Location: 0/FT0, Serial#:FOC222XXX
0/RSP0/ADMIN0:Jul 12 01:39:26.522 : shelf_mgr[4169]: %INFRA-SHELF_MGR-5-CARD_INSERTION : Location: 0/FT0, Serial #:FOC222XXX
0/RSP0/ADMIN0:Jul 12 01:39:26.522 : shelf_mgr[4169]: %INFRA-SHELF_MGR-6-CARD_HW_OPERATIONAL : Card: 0/FT0 hardware state going to Operational
0/RSP0/ADMIN0:Jul 12 01:42:23.584 : shelf_mgr[4169]: %INFRA-SHELF_MGR-5-CARD_REMOVAL : Location: 0/FT0, Serial#:FOC222XXX
0/RSP0/ADMIN0:Jul 12 01:44:40.495 : shelf_mgr[4169]: %INFRA-SHELF_MGR-5-CARD_INSERTION : Location: 0/FT0, Serial #:FOC222XXX
0/RSP0/ADMIN0:Jul 12 01:44:40.495 : shelf_mgr[4169]: %INFRA-SHELF_MGR-6-CARD_HW_OPERATIONAL : Card: 0/FT0
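For the RMA case notes, it can help to summarize each JOJI cycle from these shelf_mgr messages. This Python sketch is hypothetical: it pairs each CARD_REMOVAL with the next CARD_INSERTION at the same location and reports how long the tray was out of the slot. It assumes the message format of the sample lines above, and because those lines carry no year, it uses an arbitrary placeholder year only so that the timestamps can be subtracted.

```python
import re
from datetime import datetime

# The first five sample shelf_mgr lines, pasted as captured.
SHELF_LOG = """\
0/RSP0/ADMIN0:Jul 12 01:39:25.215 : shelf_mgr[4169]: %INFRA-SHELF_MGR-5-CARD_REMOVAL : Location: 0/FT0, Serial#:FOC222XXX
0/RSP0/ADMIN0:Jul 12 01:39:26.522 : shelf_mgr[4169]: %INFRA-SHELF_MGR-5-CARD_INSERTION : Location: 0/FT0, Serial #:FOC222XXX
0/RSP0/ADMIN0:Jul 12 01:39:26.522 : shelf_mgr[4169]: %INFRA-SHELF_MGR-6-CARD_HW_OPERATIONAL : Card: 0/FT0 hardware state going to Operational
0/RSP0/ADMIN0:Jul 12 01:42:23.584 : shelf_mgr[4169]: %INFRA-SHELF_MGR-5-CARD_REMOVAL : Location: 0/FT0, Serial#:FOC222XXX
0/RSP0/ADMIN0:Jul 12 01:44:40.495 : shelf_mgr[4169]: %INFRA-SHELF_MGR-5-CARD_INSERTION : Location: 0/FT0, Serial #:FOC222XXX
"""

# Assumed message shape, based only on the sample lines above.
EVENT_RE = re.compile(
    r"ADMIN0:(?P<ts>\w{3} +\d+ [\d:.]+) : shelf_mgr\[\d+\]: "
    r"%INFRA-SHELF_MGR-\d-(?P<event>CARD_\w+) : (?:Location|Card): (?P<loc>[^\s,]+)"
)

# The log lines carry no year; 2000 is an arbitrary placeholder (a leap year,
# so "Feb 29" still parses). Only differences between timestamps are used.
PLACEHOLDER_YEAR = "2000"


def reseat_intervals(log_text: str) -> list:
    """Pair each CARD_REMOVAL with the next CARD_INSERTION at the same location.

    Returns (location, removed_at, inserted_at, seconds_out) tuples.
    """
    removed_at = {}
    intervals = []
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if not match:
            continue
        ts = datetime.strptime(f"{PLACEHOLDER_YEAR} {match['ts']}", "%Y %b %d %H:%M:%S.%f")
        location = match["loc"]
        if match["event"] == "CARD_REMOVAL":
            removed_at[location] = ts
        elif match["event"] == "CARD_INSERTION" and location in removed_at:
            start = removed_at.pop(location)
            intervals.append((location, start, ts, (ts - start).total_seconds()))
    return intervals


if __name__ == "__main__":
    for location, start, end, seconds in reseat_intervals(SHELF_LOG):
        print(f"{location}: removed {start:%H:%M:%S}, reinserted {end:%H:%M:%S}, out for {seconds:.1f} s")
```

For the sample, this reports two JOJI cycles on 0/FT0, with the tray out of the slot for about 1.3 seconds and about 137 seconds.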
- Collect Product ID (PID) and Serial Number (SN): Obtain the PID and SN of the faulty fan tray, which are required for the RMA process.
Command Syntax:
RP/0/RSP0/CPU0:ASR-9006# show inventory location <location of failed FAN tray>
Sample Command Output:
RP/0/RSP0/CPU0:ASR-9006# show inventory location 0/FT0
NAME: "0/FT0", DESCR: "ASR-9006 Fan Tray V2"
PID: ASR-9006-FAN-V2 , VID: V02, SN: FOC222XXX
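The PID and SN can be pulled out of captured show inventory output so that they are copied into the RMA request exactly. This Python sketch is a hypothetical helper (the name rma_details is illustrative); it assumes the NAME/DESCR/PID/VID/SN layout of the sample above and raises an error if PID or SN is missing, because both are required for the RMA.

```python
import re

# Sample "show inventory location 0/FT0" output from above.
INVENTORY = '''\
NAME: "0/FT0", DESCR: "ASR-9006 Fan Tray V2"
PID: ASR-9006-FAN-V2 , VID: V02, SN: FOC222XXX
'''

# Assumed field formats, based on the sample output only.
FIELD_PATTERNS = {
    "name": r'NAME:\s*"([^"]*)"',
    "descr": r'DESCR:\s*"([^"]*)"',
    "pid": r"PID:\s*([^\s,]+)",
    "vid": r"VID:\s*([^\s,]+)",
    "sn": r"SN:\s*([^\s,]+)",
}


def rma_details(output: str) -> dict[str, str]:
    """Extract the inventory fields that are present; PID and SN must be found."""
    details = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, output)
        if match:
            details[field] = match.group(1)
    missing = [field for field in ("pid", "sn") if field not in details]
    if missing:
        raise ValueError(f"inventory output is missing: {', '.join(missing)}")
    return details


if __name__ == "__main__":
    print(rma_details(INVENTORY))
```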
- Proceed with RMA: Initiate the RMA process with Cisco for the faulty fan tray.
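The branching in Step 4 can be summarized as a small decision function. This Python sketch is a hypothetical aid that restates the procedure in this document: the caller supplies the fan type (fixed fan module or modular fan tray) rather than the model, because this document gives model examples only, not a complete mapping.

```python
from enum import Enum


class FanType(Enum):
    """Fan arrangement, as distinguished in Step 4 of this document."""
    FIXED = "fixed fan module"    # for example, ASR 9001
    MODULAR = "modular fan tray"  # for example, ASR 9006, ASR 9010, ASR 99xx


def next_action(fan_type: FanType, first_remedy_done: bool, still_failed: bool) -> str:
    """Suggest the next Step 4 action (hypothetical helper).

    first_remedy_done: the power cycle (fixed) or JOJI reseat (modular) has been done.
    still_failed: the fan or fan tray is still reported as failed at the latest check.
    """
    if not still_failed:
        return "Fans report normally: keep monitoring."
    if fan_type is FanType.FIXED:
        if not first_remedy_done:
            return "Power cycle the router (planned downtime required)."
        return "Open an RMA for the entire chassis."
    if not first_remedy_done:
        return "Reseat (JOJI) the fan tray, then re-run admin show environment fan."
    return "Collect logs, PID, and SN, then open an RMA for the fan tray."


if __name__ == "__main__":
    print(next_action(FanType.MODULAR, first_remedy_done=False, still_failed=True))
    print(next_action(FanType.MODULAR, first_remedy_done=True, still_failed=True))
```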