Introduction
This document describes how to troubleshoot fan module failures on NCS XR platforms.
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
Note: Cisco recommends that you have access to the Cisco IOS® XR CLI and the admin CLI.
Components Used
The information in this document is based on these software and hardware versions:
This includes, but is not limited to, these series:
- NCS 540 Series
- NCS 560 Series
- NCS 5500 Series
- NCS 5700 Series
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Background Information
The Cisco NCS XR router series includes several platforms designed for different use cases and performance levels, each with distinct fan module types and system architectures:
• Cisco NCS 540 Series: This is a small-density XR router aimed at sub-100G bandwidth applications such as 5G NR backhaul, FTTx, and enterprise branch deployments. It uses fan modules with a 3+1 fan redundancy design and side-to-side forced air cooling. Power supplies are fixed with 1+1 AC/DC redundancy, and the system is ruggedized with conformal coating and supports Class C timing compliance.
• Cisco NCS 560 Series: This modular system includes three high-speed modular fan trays that must be populated for operation. These fan trays contain redundant fans and are field-serviceable, thus allowing replacement without system shutdown. The system supports operation with single fan failures and enforces time limits for reinsertion of fan trays based on ambient temperature. It also features a built-in dust filter to optimize airflow. Power supplies are modular with AC and DC options, supporting load-sharing and protection schemes.
• Cisco NCS 5500 Series: This highly fault-resilient modular router platform is designed for data center and high-performance networking environments. It features modular, field-replaceable fan modules that support serviceability and redundancy. Troubleshooting involves checking system logs and hardware status, and managing software packages to maintain system stability. The platform supports Cisco IOS® XR software with modular packages and resiliency features.
• Cisco NCS 5700 Series: Building on the NCS 5500 platform, this series includes an enhanced forwarding ASIC design and runs Cisco IOS® XR7 OS. It has variants such as NCS-57B1-6D24 and NCS-57B1-5DSE. The system is modular with field-replaceable fan trays and power supplies, and supports high availability and fault resilience. Fan trays are designed for redundancy and hot-swapping. Cisco IOS® XR7 OS provides advanced software features for system monitoring and fault management.
Problem
Fan failures in Cisco NCS XR routers impact system cooling and reliability. The nature and severity of problems vary by platform due to differences in fan module design and serviceability. There are several models in the NCS 540 series which use fixed, non-field-replaceable fan modules with a 3+1 redundancy design. Here, fan failure typically requires service or replacement of the entire unit. This potentially leads to longer downtime and more complex troubleshooting.
The NCS 560, 5500, and 5700 series, and a few models in the NCS 540 series, employ modular, field-replaceable fan trays designed for redundancy and hot-swapping. This allows continued operation during single fan failures and enables easier maintenance without system shutdown.
Fan failures in these modular systems can trigger system alerts, require monitoring of ambient temperature constraints, and necessitate timely fan tray reinsertion to maintain optimal airflow and system stability. Overall, fan failure in NCS XR routers poses risks of overheating, degraded performance, and potential hardware damage. This necessitates prompt detection, diagnosis, and appropriate remedial actions tailored to the specific router series and fan architecture.
Procedure to Resolve FAN Module Failure in NCS XR Platform
The troubleshooting procedure for fan module failures in NCS XR platforms generally outlines a consistent approach, with specific physical actions differing based on whether the model uses a fixed fan module or a modular fan tray.
Step 1. Initial CLI Verification
Log in to the router through the Cisco IOS® XR CLI and execute these commands to identify the status of fan trays and individual fans. These commands are common across all NCS XR platforms that run Cisco IOS® XR.
Step 1.1. Check Platform Status: Run this command to identify whether the failure affects an entire fan tray or one or more fans within a fan tray.
Sample Command Output:
RP/0/RP0/CPU0:N540X-12Z16G-SYS-D#show platform
Thu Jul 24 12:33:45.143
Node              Type                       State             Config state
--------------------------------------------------------------------------------
0/RP0/CPU0        N540X-12Z16G-SYS-D(Active) IOS XR RUN        NSHUT
0/PM0             N540-PSU-FIXED-D           OPERATIONAL       NSHUT
0/PM1             N540-PSU-FIXED-D           OPERATIONAL       NSHUT
0/FT0             N540-FAN                   OPERATIONAL       NSHUT
RP/0/RP0/CPU0:N540X-12Z16G-SYS-D#
Note: If all fan trays show the "OPERATIONAL" state, the fan trays themselves work as expected. If any fan tray shows a state other than "OPERATIONAL", that fan tray is in a failed state.
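The check described in this note can also be scripted against captured CLI output. This is a minimal sketch in Python, not a Cisco-provided tool. It assumes the output has the same column layout as the sample shown earlier and that fan tray locations contain "/FT". The names fan_tray_states and failed_fan_trays are hypothetical.

```python
import re

# Sample rows copied from the "show platform" output in this document.
SHOW_PLATFORM = """\
Node Type State Config state
--------------------------------------------------------------------------------
0/RP0/CPU0 N540X-12Z16G-SYS-D(Active) IOS XR RUN NSHUT
0/PM0 N540-PSU-FIXED-D OPERATIONAL NSHUT
0/PM1 N540-PSU-FIXED-D OPERATIONAL NSHUT
0/FT0 N540-FAN OPERATIONAL NSHUT
"""

# Assumption: a fan tray row starts with a location such as 0/FT0. The state
# is whatever sits between the type column and the last (config state) column.
FAN_TRAY_ROW = re.compile(
    r"^(?P<node>\d+/FT\d+)\s+(?P<type>\S+)\s+(?P<state>.+?)\s+(?P<config>\S+)\s*$"
)


def fan_tray_states(show_platform_text):
    """Return {location: state} for every fan tray row found in the text."""
    states = {}
    for line in show_platform_text.splitlines():
        match = FAN_TRAY_ROW.match(line.strip())
        if match:
            states[match.group("node")] = match.group("state")
    return states


def failed_fan_trays(show_platform_text):
    """Return fan tray locations whose state is anything but OPERATIONAL."""
    states = fan_tray_states(show_platform_text)
    return [node for node, state in states.items() if state != "OPERATIONAL"]


if __name__ == "__main__":
    print(fan_tray_states(SHOW_PLATFORM))
    print(failed_fan_trays(SHOW_PLATFORM))
```

An empty list from failed_fan_trays means every fan tray row reports OPERATIONAL. Individual fans inside the tray still need to be checked in the next step.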
Step 1.2. Identify Failed Fan Modules: Run this command to check the status and speed of individual fans within a fan tray.
Sample Command Output:
RP/0/RP0/CPU0:N540X-12Z16G-SYS-D#show environment fan
Thu Jul 24 12:33:09.673
=========================================================================================
                                  Fan speed (rpm)
Location      FRU Type              FAN_0    FAN_1    FAN_2    FAN_3
-----------------------------------------------------------------------------------------
0/FT0         N540-FAN              25680    0        25440    26130
RP/0/RP0/CPU0:N540X-12Z16G-SYS-D#
Note: An RPM value of 0, or a value significantly lower than that of the other fans in the same tray, can indicate a failed or failing fan.
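The comparison described in this note can be sketched in Python as shown here. The note does not define "significantly lower", so the threshold (below half of the tray median) is an assumption chosen only for illustration. The names parse_fan_speeds, suspect_fans, and LOW_RPM_RATIO are hypothetical, and the parser assumes the row layout of the sample output.

```python
from statistics import median

# Sample rows copied from the "show environment fan" output in this document.
SHOW_ENV_FAN = """\
Location FRU Type FAN_0 FAN_1 FAN_2 FAN_3
-----------------------------------------------------------------------------------------
0/FT0 N540-FAN 25680 0 25440 26130
"""

# Assumption: "significantly lower" means below half of the tray median.
LOW_RPM_RATIO = 0.5


def parse_fan_speeds(text):
    """Return {location: [rpm, ...]} for rows that look like fan tray rows."""
    speeds = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected fields: location, FRU type, then one integer RPM per fan.
        if (len(fields) >= 3 and "/FT" in fields[0]
                and all(f.isdigit() for f in fields[2:])):
            speeds[fields[0]] = [int(f) for f in fields[2:]]
    return speeds


def suspect_fans(text, ratio=LOW_RPM_RATIO):
    """Return (location, fan name, rpm) for stopped or unusually slow fans."""
    suspects = []
    for location, rpms in parse_fan_speeds(text).items():
        reference = median(rpms)
        for index, rpm in enumerate(rpms):
            if rpm == 0 or rpm < ratio * reference:
                suspects.append((location, f"FAN_{index}", rpm))
    return suspects


if __name__ == "__main__":
    print(suspect_fans(SHOW_ENV_FAN))
```

With the sample output, only FAN_1 in 0/FT0 is reported, which matches the alarm shown in Step 1.3.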
Step 1.3. Verify Fan Module Failure from Alarms: Run this command to check the active system alarms for fan-related entries.
Sample Command Output:
RP/0/RP0/CPU0:N540X-12Z16G-SYS-D#show alarms brief system active
Thu Jul 24 12:33:23.874
------------------------------------------------------------------------------------
Active Alarms
------------------------------------------------------------------------------------
Location        Severity     Group            Set Time                   Description
------------------------------------------------------------------------------------
0/FT0           Minor        Environ          07/24/2025 10:35:44 WIB    Fan 1: Out of tolerance
0/FT0           Minor        Environ          07/24/2025 10:35:44 WIB    Sensor in failed state
0               Minor        Environ          07/24/2025 10:35:44 WIB    Sensor in failed state
RP/0/RP0/CPU0:N540X-12Z16G-SYS-D#
Note: Alarm messages indicating "Fan X: Out of tolerance" or "Sensor in failed state" confirm fan failures.
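The alarm check can be sketched in Python as a simple filter over captured output. This sketch assumes the row layout of the sample above (location, severity, group, date, time, a single-token time zone, then the description) and matches only the two descriptions named in the note. The names fan_alarms and FAN_PATTERNS are hypothetical.

```python
import re

# Sample rows copied from the "show alarms brief system active" output
# in this document.
SHOW_ALARMS = """\
Location Severity Group Set Time Description
------------------------------------------------------------------------------------
0/FT0 Minor Environ 07/24/2025 10:35:44 WIB Fan 1: Out of tolerance
0/FT0 Minor Environ 07/24/2025 10:35:44 WIB Sensor in failed state
0 Minor Environ 07/24/2025 10:35:44 WIB Sensor in failed state
"""

# Assumption: the time zone is a single token (WIB in the sample).
ALARM_ROW = re.compile(
    r"^(?P<location>\S+)\s+(?P<severity>\S+)\s+(?P<group>\S+)\s+"
    r"(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<tz>\S+)\s+"
    r"(?P<description>.+)$"
)

# The two descriptions this document associates with fan failures.
FAN_PATTERNS = ("out of tolerance", "sensor in failed state")


def fan_alarms(text, fan_tray_only=False):
    """Return (location, severity, description) for fan-related alarm rows."""
    alarms = []
    for line in text.splitlines():
        match = ALARM_ROW.match(line.strip())
        if not match:
            continue  # header, separator, or a row in an unexpected layout
        description = match.group("description").strip()
        if not any(p in description.lower() for p in FAN_PATTERNS):
            continue
        if fan_tray_only and "/FT" not in match.group("location"):
            continue
        alarms.append((match.group("location"), match.group("severity"), description))
    return alarms


if __name__ == "__main__":
    for alarm in fan_alarms(SHOW_ALARMS):
        print(alarm)
```

The optional fan_tray_only argument limits the result to rows raised against a fan tray location, which can help when "Sensor in failed state" is also reported at the chassis level.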
Step 2. Environmental and Physical Inspection
Environmental factors can significantly impact fan operation and overall system cooling.
- Ambient Conditions:
- Verify ambient temperature and airflow around the router to ensure it is within operational limits. High temperatures can cause fans to work harder or fail prematurely.
- Check for any dust filters or air plenums that can be clogged or improperly installed, restricting airflow.
- Physical Inspection for Obstructions/Damage:
- Inspect the fan module/tray for any visible debris, loose wiring, or obstructions that can prevent fans from spinning freely. Dust accumulation is a common cause of fan issues.
- For platforms with modular fan trays (for example, NCS 560, NCS 5500, NCS 5700, and some NCS 540 models), if it is safe to do so and within operational guidelines, consider carefully pulling out the suspected fan tray. Visually inspect the individual fans for non-spinning blades or visible damage. While the tray is out, check for dust buildup on the fans and within the chassis slot.
- For platforms with fixed fan modules (for example, some NCS 540 models), physical inspection of the fan module and connectors is limited, but must still be performed to look for any external signs of damage or obstruction.
Step 3. Check for Known Issues and Bugs
Before proceeding with hardware replacement, it is advisable to check if the observed fan failure aligns with any known software or hardware bugs.
- Cisco Bug Search Tool: Search the Cisco Bug Search Tool (BST) with keywords such as "NCS XR fan failure," "NCS [model number] fan," and the specific Cisco IOS® XR version that runs on your device. Look for known issues that can cause fan misreporting or actual failures.
- Cisco Support Documentation: Review Cisco support documentation and community forums for similar reported issues and recommended workarounds or fixes.
Step 4. Remedial Actions and Replacement
The next steps depend on the type of fan module in your NCS XR Platform.
For NCS XR Platforms with Fixed Fan Modules (for example, some NCS 540 models)
Models with fixed fan modules are typically not hot-swappable.
- Power Cycle: If the initial checks and environmental adjustments do not resolve the issue, perform a power cycle of the router. This can sometimes clear transient issues and allow the fan module to re-initialize correctly.
- Replacement (RMA): If the fan module is confirmed as failed after a power cycle, it typically requires a Return Merchandise Authorization (RMA) for the entire unit or chassis.
Note: Replacement of a fixed fan module requires planned downtime as the router must be powered down.
For NCS XR Platforms with Modular Fan Trays (for example, NCS 560, NCS 5500, NCS 5700, and some NCS 540 models)
These platforms feature hot-swappable modular fan trays.
- Reseating (JACK-OUT and JACK-IN - JOJI):
- Carefully perform a JACK-OUT and JACK-IN (JOJI) procedure on the fan tray that contains the failed fan module(s). This involves physically removing the fan tray and then re-inserting it.
- While the fan tray is pulled out, conduct a thorough visual inspection for any debris or loose wiring that can prevent the fans from spinning. Upon re-insertion, observe whether all fans attempt to spin.
- After reseating, verify the status again using "show environment fan".
- Replacement (RMA): If any fan module is still in a failed state, or the fan tray remains Non-Operational after reseating, proceed with an RMA for the fan tray.
- Collect Evidence Logs: Run "show logging | include FAN" to capture logs related to the fan tray JOJI for documentation purposes.
Sample logs:
RP/0/RP0/CPU0:N540-24Z8Q2C-SYS# show logging | include FAN
0/RSP0/ADMIN0:Jul 12 01:39:25.215 : shelf_mgr[4169]: %INFRA-SHELF_MGR-5-CARD_REMOVAL : Location: 0/FT0, Serial#: N/A
0/RSP0/ADMIN0:Jul 12 01:39:26.522 : shelf_mgr[4169]: %INFRA-SHELF_MGR-5-CARD_INSERTION : Location: 0/FT0, Serial #: N/A
0/RSP0/ADMIN0:Jul 12 01:39:26.522 : shelf_mgr[4169]: %INFRA-SHELF_MGR-6-CARD_HW_OPERATIONAL : Card: 0/FT0 hardware state going to Operational
0/RSP0/ADMIN0:Jul 12 01:42:23.584 : shelf_mgr[4169]: %INFRA-SHELF_MGR-5-CARD_REMOVAL : Location: 0/FT0, Serial#: N/A
0/RSP0/ADMIN0:Jul 12 01:44:40.495 : shelf_mgr[4169]: %INFRA-SHELF_MGR-5-CARD_INSERTION : Location: 0/FT0, Serial #:N/A
0/RSP0/ADMIN0:Jul 12 01:44:40.495 : shelf_mgr[4169]: %INFRA-SHELF_MGR-6-CARD_HW_OPERATIONAL : Card: 0/FT0
- Collect Product ID (PID) and Serial Number (SN): Obtain the PID and SN of the faulty fan tray, which are required for the RMA process.
Command Syntax:
RP/0/RP0/CPU0:N540-24Z8Q2C-SYS# show inventory location <location of failed FAN tray>
Sample Command Output:
RP/0/RP0/CPU0:N540-24Z8Q2C-SYS# show inventory location 0/FT0
NAME: "0/FT0", DESCR: "NCS 540 Fan"
PID: N540-FAN , VID: N/A, SN: N/A
- Proceed with RMA: Initiate the RMA process with Cisco for the faulty fan tray.
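Before the RMA is opened, the PID and SN can be extracted from captured "show inventory location" output. This is a minimal Python sketch that assumes a single inventory entry in the NAME/DESCR/PID/VID/SN layout of the sample above. The names parse_inventory and missing_rma_fields are hypothetical.

```python
import re

# Sample entry copied from the "show inventory location 0/FT0" output
# in this document.
SHOW_INVENTORY = '''\
NAME: "0/FT0", DESCR: "NCS 540 Fan"
PID: N540-FAN , VID: N/A, SN: N/A
'''

# Assumption: one inventory entry per capture, in the layout shown above.
FIELD_PATTERNS = {
    "name": r'NAME:\s*"([^"]*)"',
    "descr": r'DESCR:\s*"([^"]*)"',
    "pid": r"PID:\s*([^,]*?)\s*,",
    "vid": r"VID:\s*([^,]*?)\s*,",
    "sn": r"SN:\s*(\S+)",
}


def parse_inventory(text):
    """Return a dict with name, descr, pid, vid, and sn (None if absent)."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        record[field] = match.group(1) if match else None
    return record


def missing_rma_fields(record):
    """Return the RMA-relevant fields that the CLI did not report."""
    return [f for f in ("pid", "sn") if record.get(f) in (None, "", "N/A")]


if __name__ == "__main__":
    record = parse_inventory(SHOW_INVENTORY)
    print(record)
    print(missing_rma_fields(record))
```

In the sample output the SN is reported as N/A, so missing_rma_fields returns the sn field. In that case the serial number is not available from this command output and needs to be obtained by other means before the RMA is raised.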