Valuable time and resources are often wasted replacing hardware that actually functions properly. This document helps troubleshoot common hardware issues with the Cisco 7500 Series Router and, more specifically, its Route Switch Processor (RSP) card, and provides pointers for identifying faulty hardware.
Note: This document does not cover any software-related failures except for those that are often mistaken as hardware issues.
The information in this document is based on these software and hardware versions:
All Cisco IOS® software releases
These RSPs in any of the 7500 Series Routers that include the 7505, 7507, 7513, and 7576:
*Supported on 7505, 7507, and 7513. RSP16 is not supported on the 7576.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
The Cisco 7500 series router has at least one RSP and between one and 11 interface processors (Legacy IP or Versatile Interface Processor - VIP).
The RSP handles the main functions of the router. It is responsible for routing protocol algorithms, packet switching in non-distributed environments, higher-level features, and so forth. The Interface Processors (IPs and VIPs) contain the network interfaces for the router. RSPs can only go into certain slots within the 7500 Series Router, as outlined here (slot numbering starts at 0):
7507: slots 2 and 3
7513: slots 6 and 7
7576: slots 6 and 7
Note that for the 7507, 7513, and 7576, the lower and higher slot numbers are referred to as the Primary RSP slot and the Secondary RSP slot, respectively.
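The slot rules above can be captured in a small lookup. This is a minimal sketch, assuming the three slot pairs listed apply, in order, to the 7507, 7513, and 7576 (the models named in the note above). `RSP_SLOTS` and `rsp_slot_role` are hypothetical names used only for illustration.

```python
# Hypothetical lookup: chassis model -> (Primary RSP slot, Secondary RSP slot).
# Assumption: the slot pairs map, in order, to the 7507, 7513, and 7576.
RSP_SLOTS = {
    "7507": (2, 3),
    "7513": (6, 7),
    "7576": (6, 7),
}


def rsp_slot_role(model, slot):
    """Return 'primary', 'secondary', or None if the slot cannot hold an RSP."""
    if model not in RSP_SLOTS:
        raise ValueError(f"no RSP slot data for model {model!r}")
    primary, secondary = RSP_SLOTS[model]
    if slot == primary:
        return "primary"
    if slot == secondary:
        return "secondary"
    return None
```

For example, `rsp_slot_role("7513", 7)` reports the Secondary RSP slot, while slot 0 on the same chassis is not an RSP slot.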
There are six different versions of the RSPs used for the Cisco 7500 Series Routers:
Route Switch Processor Type: Description
RSP1: Contains a MIPS R4600 CPU that runs at 100 MHz internally, 50 MHz external bus speed, and supports memory options from 16 MB to 128 MB
RSP2: Contains a MIPS R4600 CPU that runs at 100 MHz internally, 50 MHz external bus speed, and supports memory options from 32 MB to 128 MB
RSP4: Contains a MIPS R5000 CPU that runs at 200 MHz internally, 100 MHz external bus speed, and supports memory options from 32 MB to 256 MB
RSP4+: This RSP is identical to the RSP4 except that it has Error-Correcting Code (ECC) memory protection/correction and an updated version of ROMMON
RSP8: Contains a MIPS R7000 CPU that runs at 250 MHz internally, 100 MHz external bus speed, and supports memory options from 64 MB to 256 MB
RSP16: Contains a MIPS R7000 CPU that runs at 500 MHz internally, 100 MHz external bus speed, and supports memory options from 64 MB to 1 GB of Synchronous Dynamic RAM (SDRAM). The RSP16 supports Error-Correcting Code (ECC) and includes 2 MB of Static RAM (SRAM) for Layer 3 cache.
Use the Cisco Download Software Area (registered customers only) in order to check the minimum amount of memory (RAM and Flash) required by the Cisco IOS software, and/or download the Cisco IOS software image. Refer to Memory Requirements in order to determine the amount of memory (RAM and Flash) installed.
In the Cisco Download Software Area (registered customers only), you need to select the platform and the recommended Cisco IOS software release from step 1 in order to view the memory requirements.
The Error Message Decoder (registered customers only) Tool allows you to check the definition of an error message. Error messages appear on the console of Cisco products, usually in this form:
%XXX-n-YYYY : [text]
This is an example of an error message:
Router# %SYS-2-MALLOCFAIL: Memory allocation of [dec] bytes failed from [hex], pool [chars], alignment [dec]
Some error messages are informational only, while others indicate hardware or software failures and require action. The Error Message Decoder (registered customers only) Tool provides an explanation of the message, a recommended action, if needed, and if available, a link to a document that provides extensive troubleshooting information about that error message.
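When you sort through a long console capture, it can help to split each message into its parts. This is a minimal sketch, assuming the `%XXX-n-YYYY` fields are the facility, a severity digit from 0 to 7, and a mnemonic, which is the general Cisco IOS convention rather than something this document states. `MSG_RE`, `SEVERITY_NAMES`, and `parse_ios_message` are hypothetical names. Messages with an extra sub-facility field are not matched.

```python
import re

# Assumed field layout: %FACILITY-SEVERITY-MNEMONIC: text
MSG_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+)"
    r"\s*:\s*(?P<text>.*)"
)

# Standard syslog severity names, indexed by severity digit.
SEVERITY_NAMES = [
    "emergencies", "alerts", "critical", "errors",
    "warnings", "notifications", "informational", "debugging",
]


def parse_ios_message(line):
    """Return the parts of the first error message on the line, or None."""
    match = MSG_RE.search(line)
    if match is None:
        return None
    severity = int(match.group("severity"))
    return {
        "facility": match.group("facility"),
        "severity": severity,
        "severity_name": SEVERITY_NAMES[severity],
        "mnemonic": match.group("mnemonic"),
        "text": match.group("text").strip(),
    }
```

The function uses `search` rather than `match`, so a leading prompt or timestamp on the line does not prevent the message from being found.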
This show log output shows the low memory error message %SYS-2-MALLOCFAIL, which is due to the BGP Router process. Check the show processes memory and show memory summary output in order to verify the memory usage of the BGP process.
Router#show log
%SYS-2-MALLOCFAIL: Memory allocation of 32768 bytes failed from 0x403B4650, alignment 0
Pool: Processor Free: 406936 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process= "BGP Router", ipl= 0, pid= 158
-Traceback= 403B96D0 403BD8BC 403B4658 40DF73C0 402476FC 4064FA10 4061C840 406268A0 40626A4C 40816EC4 408102B0 40ED0820 408103C0 407D46A8
Jun 30 10:27:40.836 UTC: %FIB-3-NORPXDRQELEMS: Exhausted XDR queuing elements while preparing message for slot 4
-Process= "BGP Router", ipl= 0, pid= 158
-Traceback= 40DF74A0 402476FC 4064FA10 4061C840 406268A0 40626A4C 40816EC4 408102B0 40ED0820 408103C0 407D46A8
%BGP-5-ADJCHANGE: neighbor 10.10.10.254 Down BGP Notification sent
%BGP-3-NOTIFICATION: sent to neighbor 10.10.10.254 4/0 (hold time expired) 0 bytes
%BGP-5-ADJCHANGE: neighbor 10.10.10.99 Down BGP Notification sent
%BGP-3-NOTIFICATION: sent to neighbor 10.10.10.99 4/0 (hold time expired) 0 bytes
%BGP-5-ADJCHANGE: neighbor 10.10.10.100 Down BGP Notification sent
%BGP-3-NOTIFICATION: sent to neighbor 10.10.10.100 4/0 (hold time expired) 0 bytes
%BGP-5-ADJCHANGE: neighbor 10.10.10.254 Up
Router#show processes memory
Processor Pool Total: 229224896 Used: 198433716 Free: 30791180
Fast Pool Total: 131072 Used: 131024 Free: 48
!--- Output suppressed.
Router#show memory summary
Head Total(b) Used(b) Free(b) Lowest(b) Largest(b)
Processor 42564E40 229224896 198457508 30767388 22200 196700
Fast 42544E40 131072 131024 48 48 48
In the previous output, the largest available block in Processor memory is 196700 bytes, and the total free memory is 30767388 bytes. The router needs more than 40 MB of free memory in order to accommodate transient BGP memory usage when neighbors go up and down. In this scenario, consider a memory upgrade, or set up BGP filters or reconfigure BGP in order to reduce the size of the routing table. This is an example of a low memory issue on a router.
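The arithmetic behind this reading of the show memory summary output can be sketched as follows. The code assumes each pool row has the seven whitespace-separated columns shown above, and it treats the 40 MB figure as 40 * 1024 * 1024 bytes, which is an assumption. `parse_memory_summary` and `assess_pool` are hypothetical helper names.

```python
# Assumption: "more than 40 MB" is read here as 40 * 1024 * 1024 bytes.
REQUIRED_FREE_BYTES = 40 * 1024 * 1024


def parse_memory_summary(line):
    """Parse one pool row of 'show memory summary' into a dict."""
    pool, head, total, used, free, lowest, largest = line.split()
    return {
        "pool": pool,
        "head": head,
        "total": int(total),
        "used": int(used),
        "free": int(free),
        "lowest": int(lowest),
        "largest": int(largest),
    }


def assess_pool(row, required_free=REQUIRED_FREE_BYTES):
    """Summarize how much memory is free and how fragmented it is."""
    return {
        # Share of the pool that is currently free.
        "free_pct": 100.0 * row["free"] / row["total"],
        # A small value means free memory is split into many small blocks.
        "largest_pct_of_free": 100.0 * row["largest"] / row["free"],
        "below_required": row["free"] < required_free,
    }


row = parse_memory_summary(
    "Processor 42564E40 229224896 198457508 30767388 22200 196700"
)
report = assess_pool(row)
```

For the Processor row, roughly 13 percent of the pool is free, the largest block is under 1 percent of that free memory, and the free total falls below the 40 MB threshold. This is consistent with the memory fragmentation cause reported in the log.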
An RSP can reboot or reload for various reasons, several of which are due to potential hardware issues. This section explains how to capture the different types of output that help with troubleshooting, and how to identify symptoms that can be caused by bad hardware. Troubleshooting tips for the symptoms are listed in the Troubleshooting Guidelines section.
The first step is to capture as much information about the problem as possible in order to determine what causes the issue. This information is essential in order to determine the cause of the problem:
Crashinfo file(s) - When a Route Switch Processor (RSP) crashes, it attempts to save a crashinfo file to its bootflash. Refer to Retrieving Information from the Crashinfo File for details about crashinfo files. If the crashinfo file is created successfully, it is normally present in the bootflash of the RSP that crashed. Note that if the router has dual RSPs, the crashinfo file might be on the bootflash of the standby RSP if that RSP crashed while it was the primary RSP.
Console logs and/or syslog information - These can be crucial in determining the original issue when multiple symptoms occur, which is usually the case with the Cisco 7500 Series Router. Troubleshooting is far more effective when the console log or syslog is available. If the router is set up to send logs to a syslog server, check the server for the log. For console logs, make sure that you are directly connected to the console port of the router, with the settings described in Apply Correct Terminal Emulator Settings for Console Connections, and that logging is enabled.
show technical-support output - The show technical-support command is a compilation of many different commands, which includes show version, show running-config, and show stacks. When an RSP experiences issues, the Technical Support engineer usually asks for this information. Collect the show technical-support output before you reload or power-cycle the router, because these actions can cause all information about the problem to be lost: the context information saved on the stacks is cleared when the router is reloaded.
show environment commands - The show environment all command displays the power supply and temperature status of the router. The show environment last and show environment table commands are also helpful.
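Before you reload the router, you can check that a saved console capture contains every command listed above. This is a minimal sketch, assuming the capture is plain text in which each command appears on a prompt line such as `Router#show version`, typed in full rather than abbreviated. `REQUIRED_COMMANDS` and `missing_captures` are hypothetical names.

```python
# Commands this document asks you to capture before a reload or power-cycle.
REQUIRED_COMMANDS = [
    "show technical-support",
    "show environment all",
    "show environment last",
    "show environment table",
]


def missing_captures(transcript, required=REQUIRED_COMMANDS):
    """Return the required commands that have no prompt line in the transcript."""
    issued = set()
    for line in transcript.splitlines():
        # A prompt line looks like 'Router#show version'; keep the text after '#'.
        _, separator, command = line.partition("#")
        if separator:
            # Collapse repeated spaces and ignore case before comparing.
            issued.add(" ".join(command.split()).lower())
    return [command for command in required if command not in issued]
```

An empty list means that all four commands were issued in the capture. The check does not cover the crashinfo file or the syslog server, which you need to verify separately.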
Some issues might be misinterpreted as hardware problems, when, in fact, they are not. Some of the more common issues are when the router stops responding or hangs, or when the router fails due to new hardware installation. This is a list of symptoms, explanations, and troubleshooting steps for these commonly misinterpreted issues:
The %RSP-3-RESTART: cbus complex error message might be due to configuration changes, Online Insertion and Removal (OIR) of an interface processor, or other software or hardware issues. This error message is discussed in detail in What Causes a "%RSP-3-RESTART: cbus complex"?.
These are some troubleshooting guidelines, which depend on the type of issue you encounter:
Parity Errors - Parity errors on a 7500 are most commonly caused by bad hardware. In order to troubleshoot parity errors, capture the output at the time of the crash. Once you have this information, refer to Processor Memory Parity Errors - RSP for detailed troubleshooting steps.
Bus Error at a Valid Address - Refer to Troubleshooting Bus Error Crashes for further information on bus errors. If the address of the bus error is a valid address, then the most likely cause of the problem is a hardware failure.
Continuous Rebooting - If the Cisco 7500 Series Router continuously reboots, even after a power-cycle of the router, complete these steps:
Remove all the cards except for the RSP. If the RSP sits in the Secondary RSP slot, move it to the Primary RSP slot, and power-cycle the router. If the router does not boot up with the RSP in the Primary slot, move that RSP to the Secondary slot, and reload the router.
If the router still does not work properly, collect the console log/syslog of the boot sequence and create a service request with Cisco Technical Support.
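The order of the steps above can be sketched as a small decision function. This is an illustration only. `next_action` and its parameters are hypothetical, and the message returned when the router does boot is an assumption, because the steps above only describe the failure path.

```python
def next_action(cards_removed, primary_boot=None, secondary_boot=None):
    """Return the next step for a router that reboots continuously.

    cards_removed: True once every card except the RSP has been removed.
    primary_boot / secondary_boot: True or False for the boot result with
    the RSP in that slot, or None if that slot has not been tried yet.
    """
    if not cards_removed:
        return "Remove all the cards except for the RSP"
    if primary_boot is None:
        return "Seat the RSP in the Primary RSP slot and power-cycle the router"
    if primary_boot:
        # Assumption: the steps above do not say what follows a clean boot.
        return "Router boots with only the RSP installed"
    if secondary_boot is None:
        return "Move the RSP to the Secondary RSP slot and reload the router"
    if secondary_boot:
        return "Router boots with only the RSP installed"
    return "Collect the console log/syslog of the boot sequence and open a service request"
```

Each call takes what has been tried so far and returns a single next step, so the function can be called again after each boot attempt.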
If you have identified a component that needs to be replaced, contact your Cisco partner or reseller to request a replacement for the hardware component that is causing the issue. If you have a support contract directly with Cisco, use the Cisco Technical Support Service Request Tool (registered customers only) to open a Cisco Technical Support service request for a hardware replacement. Make sure you attach the following information:
Console captures that show the error messages
Console captures that show the troubleshooting steps taken and the boot sequence during each step
The hardware component that failed and the serial number for the chassis