Hardware Troubleshooting for the Cisco 7200 Series Router

Document ID: 16122

Updated: Mar 09, 2009

Introduction

Valuable time and resources are often wasted replacing hardware that actually functions properly. This document helps troubleshoot potential hardware issues with Cisco 7200 Series Routers, and can help you identify which component may be causing a hardware failure, depending on the type of error that the router is experiencing.

Note: This document does not cover any software-related failures except for those that are often mistaken as hardware issues.

Prerequisites

Background

The Cisco 7200 Series Router contains a single Network Processor Engine (NPE) or Network Services Engine (NSE), an Input/Output (I/O) controller card, and can have up to six Port Adapters (PAs) for the 7206/7206VXR chassis.

For a more detailed understanding of the Cisco 7200 Series Router Architecture, refer to Cisco 7200 Series Router Architecture.

Requirements

Cisco recommends that you have knowledge of these topics:

Components Used

The information in this document is not specific to any Cisco IOS® software release, but applies to all Cisco IOS software versions that run on the 7200 Series Router.

This document covers troubleshooting on the 7200 Series Router for both the standard and VXR chassis including the 7202, 7204/7204VXR, and the 7206/7206VXR.

For hardware troubleshooting assistance on the uBR7200 Series platforms, refer to Hardware Troubleshooting for the Cisco uBR72xx / uBR7246 VXR Universal Broadband Router.

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

Hardware-Software Compatibility and Memory Requirements

Whenever you install a new card, module, or Cisco IOS software image, it is important to verify that the router has enough memory, and that the hardware and software are compatible with the features you wish to use.

Perform these recommended steps to check for hardware-software compatibility and memory requirements:

  1. Use the Software Advisor tool (registered customers only) to choose software for your network device.

    Tips:

  2. Use the Download Software Area (registered customers only) to check the minimum amount of memory (RAM and Flash) required by the Cisco IOS software, and/or to download the Cisco IOS software image. To determine the amount of memory (RAM and Flash) installed on your router, see Memory Requirements.

    Tips:

    • If you want to keep the same features as the version that is currently running on your router, but don't know which feature set you are using, enter the show version command on your router and paste it on the Output Interpreter (registered customers only) tool to find out. It is important to check for feature support, especially if you plan to use recent software features.

    • If you need to upgrade the Cisco IOS software image to a new version or feature set, refer to How to Choose a Cisco IOS Software Release for more information.

  3. If you determine that a Cisco IOS software upgrade is required, complete the Software Installation and Upgrade Procedure for the Cisco 7200 series router.

Upgrading the Boot Image

Refer to Oversized Cisco 7200/uBR 7200 Boot Image for more information about the upgrade of the boot image on older and newer models of the Cisco 7200 Series Router.

Error Messages

The Error Message Decoder (registered customers only) tool allows you to check the meaning of an error message. Error messages appear on the console of Cisco products, usually in the following form:

%XXX-n-YYYY : [text]

Here is an example error message:

Router# %SYS-2-MALLOCFAIL: Memory allocation of [dec] bytes failed from [hex], pool [chars], alignment [dec]

Some error messages are informational only, while others indicate hardware or software failures and require action. The Error Message Decoder (registered customers only) tool provides an explanation of the message, a recommended action (if needed), and if available, a link to a document that provides extensive troubleshooting information about that error message.
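As a rough illustration of the %XXX-n-YYYY layout described above, the following Python sketch splits a console line into its parts. The field names (facility, severity, mnemonic) and the function name are labels chosen for this example, and the pattern assumes the severity is a single digit from 0 to 7; it is not a replacement for the Error Message Decoder tool.

```python
import re

# Pattern for the "%XXX-n-YYYY : [text]" layout shown above.
# The group names are illustrative labels, not official Cisco terminology.
MESSAGE_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+)"
    r"\s*:\s*(?P<text>.*)"
)


def parse_message(line):
    """Return the message fields as a dict, or None if the line does not match."""
    match = MESSAGE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])
    return fields


sample = ("Router# %SYS-2-MALLOCFAIL: Memory allocation of [dec] bytes "
          "failed from [hex], pool [chars], alignment [dec]")
print(parse_message(sample))
```

Splitting the message this way makes it easier to group console logs by message type before looking each one up.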

Conventions

Refer to Cisco Technical Tips Conventions for more information on document conventions.

Identifying the Issue

The first step in determining the cause is to capture as much information about the problem as possible. The following information is essential:

  • Crashinfo file(s): When the router crashes, a file is saved into the bootflash of the I/O controller. That file contains details on the reason why the crash occurred. Refer to Retrieving Information from the Crashinfo File for more details.

  • Console logs and/or Syslog information: These are crucial in determining the originating issue if multiple symptoms are occurring. For more information on how to set up your PC to view console logs, refer to Applying Correct Terminal Emulator Settings for Console Connections. If the router is set up to send logs to a syslog server, you may find some information on what happened. For details, refer to How to Configure Cisco Devices for Syslog. In general, it is best to be directly connected to the router on the console port with logging enabled.

  • Show Technical-Support: The show technical-support command is a compilation of many different commands which includes show version, show running-config and show stacks. When a Cisco 7200 runs into problems, the Cisco Technical Assistance Center (TAC) usually asks for this information. It is important to collect the show technical-support before a reload or power-cycle as either of these can cause all information about the problem to be lost.

  • The complete bootup sequence if the router experiences boot errors.

If you have the output of a show command from your Cisco device, you can use the Output Interpreter (registered customers only) tool to display potential issues and fixes. To use the Output Interpreter tool, you must be a registered customer, be logged in, and have JavaScript enabled.

Common Problems

There are a few issues that can be misinterpreted as hardware problems when, in fact, they are not. For instance, a failure following a new hardware installation is not always a hardware issue. Another example is when the router stops responding or "hangs".

This table provides symptoms, explanations, and troubleshooting steps for these commonly misinterpreted issues:

Symptom: Router hangs
Explanation: A hang occurs when the router boots to a certain point and then no longer accepts any commands or keystrokes. In other words, the console screen freezes after a certain point. Hangs are not necessarily hardware issues and, most of the time, are software issues. If your router hangs, refer to Troubleshooting Router Hangs.

Symptom: The Port Adapter (PA) is not recognized and comes up with a console message such as:
%PA-2-UNDEFPA: Undefined Port Adapter type 106 in bay 2
Explanation: Boot images do not support crypto engines such as SA-ISA or SA-VAM. If one of them is inserted into the chassis, an "Undefined Port Adapter" message appears at bootup, and the card is detected only when the main crypto Cisco IOS software image is loaded. The boot process is also slowed down by 1-2 minutes. This is expected behavior and does not affect the operation of the router.

Symptom: Bad CPU ID error messages
Explanation: Bad CPU ID error messages are always due to Cisco IOS software (usually the boot image) that does not recognize either the NPE-300/NPE-400 or the VXR chassis. Upgrading the Cisco IOS software or the boot image to a version that supports the unrecognized hardware solves this problem. Refer to What Causes "BAD CPU ID" Messages for more information.

Symptom: CPU (Central Processing Unit) utilization is running very high
Explanation: While there are hardware problems that can cause this, it is much more likely that the router is misconfigured or that something on the network is causing the problem. Refer to Troubleshooting High CPU Utilization on a Cisco Router.

Symptom: Memory allocation errors - SYS-2-MALLOCFAIL
Explanation: Memory allocation errors are almost never caused by hardware problems. Troubleshooting tips for memory allocation errors are located on the Troubleshooting Memory Problems page.

Symptom: Router crashes
Explanation: Not all crashes are caused by bad hardware. Troubleshooting Router Crashes can help you determine whether or not the crash was caused by software.

Symptom: %PLATFORM-3-PACONFIG and %C7200-3-PACONFIG error messages
Explanation: These error messages are often caused by an incorrect port adapter configuration. Refer to What Causes %PLATFORM-3-PACONFIG and %C7200-3-PACONFIG Error Messages? for more information.

Symptom: %SYS-3-CPUHOG messages
Explanation: Refer to What Causes %SYS-3-CPUHOG Messages?, which explains the causes of %SYS-3-CPUHOG error messages and how to troubleshoot them.

Symptom: Buffer leaks
Explanation: Buffer leaks are Cisco IOS software bugs. There are two different kinds of buffer leaks: wedged interface and system buffer leaks. The show interfaces and show buffers commands help determine the type of buffer leak you are encountering. See Troubleshooting Buffer Leaks for more information.

Symptom: Bus Error Crashes and Bus Error Exceptions
System restarted by bus error at PC 0x30EE546, address 0xBB4C4
or
** System received a Bus Error exception **
Explanation: The system encounters a bus error when the processor tries to access a memory location that either does not exist (a software error) or does not respond properly (a hardware problem). For more information regarding this issue, refer to Troubleshooting Bus Error Crashes.

Symptom: SegV Exceptions
System restarted by error - a SegV exception
or
** System received a SegV exception **
Explanation: Refer to SegV Exceptions for more information regarding this issue.

Symptom:
System restarted by error
Software-forced crash
or
** System received a Software forced crash **
Explanation: A software-forced crash occurs when the router detects a severe, unrecoverable error and reloads itself to prevent the sending of corrupted data. For more information regarding this issue, refer to Understanding Software-forced Crashes.

Symptom: %ERR-1-GT64120 (PCI0): Fatal error, Memory Parity Error
Explanation: Data with bad parity can be reported by several of the parity checking devices on the C7200/NPE router for any read or write operation. Refer to Cisco 7200 Parity Error Fault Tree for more information.

Symptom: %RSP-3-RESTART: interface [xxx], output stuck/frozen/not transmitting messages
Explanation: Refer to What Causes %RSP-3-RESTART: interface [xxx], output stuck/frozen/not transmitting Messages? to troubleshoot this type of error message.

Symptom: Online Insertion and Removal (OIR)
Explanation: Refer to Online Insertion and Removal (OIR) Support in Cisco Routers for more information.

Step-by-Step Troubleshooting

Parity Errors

Parity errors are among the most common errors and are frequently misunderstood. Without appropriate troubleshooting, they can cause unnecessary downtime.

The purpose of this section is to describe what forms of parity errors can be detected by Cisco IOS software, and how to decipher or diagnose a "hard parity error" (one that reoccurs and is due to faulty or damaged hardware) and a "soft parity error" (a transient change in charge in a DRAM cell that is not due to faulty or damaged hardware). There is evidence of significant field returns for "soft parity errors" for which replacing the hardware has no benefit.

Recommended Actions

At the first occurrence of a parity error, it is not possible to distinguish between a "soft parity error" and a "hard parity error". From experience, most parity occurrences are soft parity errors and can usually be dismissed. If you have recently changed some hardware or have moved the chassis, try reseating the affected part (DRAM, SRAM, NPE, PA). Frequent multiple parity occurrences signify faulty hardware. The affected part (DRAM, PA, VIP, or motherboard) should be replaced using the troubleshooting instructions mentioned below.
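The rule of thumb above (a single occurrence is usually soft, frequent repeats point to hardware) can be sketched as a small Python check. The 30-day window and the threshold of two events are assumptions picked for illustration only; the document does not define what counts as "frequent", so adjust both to your own operating experience.

```python
from datetime import datetime, timedelta


def looks_like_hard_parity_error(event_times,
                                 window=timedelta(days=30),  # assumed window
                                 threshold=2):               # assumed threshold
    """Return True if at least `threshold` parity events fall within any `window`."""
    times = sorted(event_times)
    for i, start in enumerate(times):
        # Count the events that occur within `window` of this event.
        in_window = [t for t in times[i:] if t - start <= window]
        if len(in_window) >= threshold:
            return True
    return False


# Hypothetical crash timestamps taken from console logs.
single = [datetime(2009, 1, 5)]
repeated = [datetime(2009, 1, 5), datetime(2009, 1, 8)]
print(looks_like_hard_parity_error(single))
print(looks_like_hard_parity_error(repeated))
```

A True result only suggests that the affected part deserves a closer look; it does not identify which part to replace.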

Understand the Cisco 7200 Series Architecture for Effective Troubleshooting

Refer to Cisco 7200 Series Router Architecture for an overview of this platform.

The Cisco 7200 series uses DRAM, SDRAM, and SRAM memory on the NPE in various combinations, depending on the NPE model. The other main components of the architecture are:

  • PCI Bus—There are three PCI data buses in the Cisco 7200: PCI 0, PCI 1, and PCI 2. PCI 1 and PCI 2 extend from the NPE to the midplane and interconnect the media interfaces (port adapters) to the CPU and the memory on the NPE. PCI 0 is separate and is used to connect the media interface and the PCMCIA on the I/O controller to the CPU and the memory on the NPE. Running at 25 MHz, PCI 0, PCI 1, and PCI 2 provide up to 800 Mbps each in bandwidth.

  • I/O controller—Provides the console connection, the auxiliary connection, the NVRAM, the Boot ROM, the Boot FLASH, and the built-in interface controller (either an Ethernet or Fast Ethernet interface). The I/O controller also provides access to the Flash memory cards in the PCMCIA card slot through PCI bus 0.

  • I/O Bus—Interconnects the non-PCI components on the I/O controller (Console port, AUX port, NVRAM, Boot ROM, and the Boot FLASH) to the CPU and the NPE.
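The 800 Mbps figure quoted for each PCI bus is consistent with simple arithmetic if the bus is assumed to carry 32 bits per clock cycle. The 32-bit width is an assumption of this sketch (it is not stated in this document), and the result is a theoretical peak that ignores arbitration and protocol overhead.

```python
def pci_peak_bandwidth_mbps(clock_mhz, bus_width_bits=32):
    """Theoretical peak bandwidth: one bus-width transfer per clock cycle.

    bus_width_bits=32 is an assumption of this sketch.
    """
    return clock_mhz * bus_width_bits


# 25 MHz x 32 bits = 800 Mbps, matching the figure quoted above.
print(pci_peak_bandwidth_mbps(25))
```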

Understand the Different Sources of Parity Errors that may Cause a Reload and the Reporting of a Parity Error

  • DRAM parity error (transient (alpha particle) or hard failure)

  • SRAM parity error (transient or hard failure)

  • Processor internal cache parity exception (instruction or data cache)

  • Interface processor writing bad parity into MEMD (SRAM)

  • Bus parity error (error in CMD, address, or data portion of a bus transaction)

  • Manufacturing defect (bad solder, broken traces, cold solder joint, and so on)

Refer to the Cisco 7200 Parity Error Fault Tree to view the steps to troubleshoot and isolate which part or component of a Cisco 7200 is failing when you identify a variety of parity error messages.

Understand the most Common Reports of Parity Errors

Refer to Processor Memory Parity Errors (PMPEs) for detailed information about parity error reports.

One way to find out where the error has occurred is by looking at the "restart reason" in the console logs, and in the output of the show version command:

Parity Error in DRAM

If you have not manually reloaded the router after the crash, the show version output should look like this:

System restarted by processor memory parity error at PC 0x601799C4, address 0x0 
System image file is "flash:c4500-inr-mz.111-14.bin", booted via flash 

If you have the output of a show command from your Cisco device, you can use the Output Interpreter (registered customers only) tool to display potential issues and fixes. To use the Output Interpreter tool, you must be a registered customer, be logged in, and have JavaScript enabled.

If a crashinfo file is available, or if console logs have been captured, you might also see something similar to this:

 *** Cache Error Exception *** 
Cache Err Reg = 0xa0255c61 
data reference, primary cache, data field error , error on SysAD Bus 
PC = 0xbfc0edc0, Cause = 0xb800, Status Reg = 0x34408007 

Repeated parity errors in DRAM mean that either the DRAM or the chassis is defective. If the chassis has been recently moved, or if hardware configuration changes have been performed, reseating the DRAM chips can solve the problem. Otherwise, replace the DRAM as a first step. This should prevent the parity errors. If the router still crashes, replace the chassis only after first exhausting all the information in this section and consulting the Cisco TAC.

Parity Error in SRAM

If you have not manually reloaded the router after the crash, you will see something like this in the show version output:

System  restarted by shared memory parity error at PC 0x60130F40 
System image file is "flash:c4500-inr-mz.111-14.bin", booted via flash 

If a crashinfo file is available, or if console logs have been captured, you might also see something similar to this:

*** Shared Memory Parity Error *** 
shared memory control register= 0xffe3 
error(s) reported for: CPU on byte(s): 0/1 

or

%PAR-1-FATAL: Shared memory parity error 
shared memory status register= 0xFFEF 
error(s) reported for: CPU on byte(s): 0/1 2/3 

or

*** Shared Memory Parity Error *** 
shared memory control register= 0xffdf 
error(s) reported for: NIM1 on byte(s): 0/1  2/3

Repeated parity errors in SRAM most probably indicate either defective SRAM chips or a defective network module that has written bad parity into the SRAM. If the chassis has been moved recently, or if hardware configuration changes have been performed, reseating the network modules and the SRAM chips can solve the problem. Otherwise, check where the error was reported in the console logs (see the output examples above).

Note: If the error is reported for CPU, replace the SRAM. If the error is reported for NIM(x), replace the network module in slot (x). The SRAM allocated to slot (x) might also be affected, so you might have to replace the SRAM as well.
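To triage the restart reason from saved show version output, a short script along these lines can help. It is a minimal sketch: the phrase-to-label mapping is taken only from the examples in this document, the function name is made up for illustration, and anything else is reported as unclassified.

```python
# Phrases taken from the restart-reason examples in this document.
RESTART_HINTS = [
    ("processor memory parity error", "DRAM parity error"),
    ("shared memory parity error", "SRAM parity error"),
    ("bus error", "bus error"),
]


def classify_restart(show_version_text):
    """Return a label for the first 'System restarted by' line, or None."""
    for line in show_version_text.splitlines():
        # Collapse repeated spaces and ignore case before matching.
        normalized = " ".join(line.lower().split())
        if "system restarted by" not in normalized:
            continue
        for phrase, label in RESTART_HINTS:
            if phrase in normalized:
                return label
        return "unclassified"
    return None


dram = "System restarted by processor memory parity error at PC 0x601799C4, address 0x0"
sram = "System  restarted by shared memory parity error at PC 0x60130F40"
print(classify_restart(dram))
print(classify_restart(sram))
```

The label only indicates where to start looking; the sections above describe what to reseat or replace in each case.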

Refer to these links for more information:

%IP-3-LOOPPAK: Looping packet detected and dropped

The %IP-3-LOOPPAK: Looping packet detected and dropped error message is received because of a looping packet that has been detected. A common cause is a misconfiguration of an IP helper address. The helper address should be the same address as that of the server of the intended service. Putting the address of the router in the helper address causes a routing loop to be created.

The recommended action is to analyze the source and destination address of the looped packets and verify that the configuration of the IP helper addresses in the router correctly points to the right device and does not point to the local router itself.
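The helper-address check described above can be partly automated against a saved configuration. The sketch below looks for ip helper-address statements that point at one of the router's own interface addresses. The interface names and addresses in the sample are invented for the example, and the parsing is deliberately simple (it does not handle secondary addresses, VRFs, or other variations).

```python
import re

IPV4 = r"\d+\.\d+\.\d+\.\d+"


def self_referencing_helpers(config_text):
    """Return helper addresses that match an address configured on this router."""
    local = set(re.findall(r"^\s*ip address (" + IPV4 + r") ",
                           config_text, re.MULTILINE))
    helpers = re.findall(r"^\s*ip helper-address (" + IPV4 + r")",
                         config_text, re.MULTILINE)
    return [address for address in helpers if address in local]


# Hypothetical configuration excerpt; names and addresses are illustrative.
sample_config = """\
interface FastEthernet0/0
 ip address 10.1.1.1 255.255.255.0
 ip helper-address 10.1.1.1
!
interface FastEthernet1/0
 ip address 10.2.2.1 255.255.255.0
 ip helper-address 10.9.9.9
"""
print(self_referencing_helpers(sample_config))
```

Any address the function returns is a helper that points back at the local router and is a candidate cause of the looping packets.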

System Restarted by Bus Error Exception

The system encounters a bus error when the processor tries to access a memory location that either does not exist (a software error) or does not respond properly (a hardware problem). A bus error can be identified by looking at the output of the show version command provided by the router (if it has not been power-cycled or manually reloaded).

This issue can be either hardware- or software-related. This is an example of such an error message:

*** System received a Bus Error exception ***

signal= 0xa, code= 0x18, context= 0x6206b820

PC = 0x606e356c, Cause = 0x6020, Status Reg = 0x3400800

This is followed by a router reload. In some cases, however, the router goes into a loop of crashes and reloads and manual intervention is required to break out of this loop. Refer to the Troubleshooting Techniques for Bus Error Exception Boot Loops section of Troubleshooting Bus Error Crashes for more information.

For potential hardware-related issues, complete these steps:

  1. Power down the router and remove the Port Adapters (PAs) from the unit. Power the system back up and see if the problem continues.

  2. If the system reloads correctly, place each PA back into the router one at a time, watching for proper installation (no bus error exceptions).

  3. If the system does NOT reload correctly, and continues to reboot or display the Bus Error Exception message, further investigation is necessary to determine the root cause of the errors. The issue might be within the I/O controller or the NPE, or it could be a software error. Refer to Troubleshooting Bus Error Crashes for more information about this issue.
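The isolation steps above follow a simple pattern that can be written down as a procedure. In this sketch, boots_cleanly is a hypothetical stand-in for the manual step of powering up the router with a given set of PAs installed and watching the console for bus error exceptions; the bay labels are likewise placeholders.

```python
def find_suspect_port_adapters(port_adapters, boots_cleanly):
    """Reinsert PAs one at a time and collect those that trigger a failure.

    boots_cleanly(installed) -> bool is a placeholder for a manual power-up
    and console check with the listed PAs installed.
    Returns None if the router fails even with no PAs installed.
    """
    if not boots_cleanly([]):
        # Step 3: the problem is not a PA. Look at the I/O controller,
        # the NPE, or the software.
        return None
    installed, suspects = [], []
    for pa in port_adapters:
        if boots_cleanly(installed + [pa]):
            installed.append(pa)   # Step 2: this PA is fine, leave it in.
        else:
            suspects.append(pa)    # This PA triggers the failure, leave it out.
    return suspects


# Simulated chassis in which the PA in "bay3" is faulty.
simulated = lambda installed: "bay3" not in installed
print(find_suspect_port_adapters(["bay1", "bay2", "bay3", "bay4"], simulated))
```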

Continuously Rebooting

If the Cisco 7200 series router is continuously rebooting, even after a power-cycle of the router, then something is probably wrong with the hardware. Complete these troubleshooting steps:

  1. Remove all the cards, except for the NPE and the I/O controller card; then power-cycle the router.

  2. If it still fails, check if there is a valid image on it. To do this, you must be directly connected to the console port of the router. Send the break key within the first 60 seconds of bootup to go into ROMmon. From there, you can follow the procedures in ROMmon Recovery Procedure to try to recover.

  3. If the router still does not boot, and you are sure that there is a valid image on it, then the NPE and/or the I/O controller card is most likely faulty. However, the fault may be limited to the memory of the NPE or NSE. In this case, replace the memory.

  4. If the router still fails, replace the I/O controller.

  5. If the router still fails, replace the NPE or NSE.

Difference between NPE-G2 and NPE-G1

Difference: Burst size
NPE-G2: Burst size is not programmable and is always based on the system cache line size.
NPE-G1: Burst size is programmable through MAC registers.
Impact on system performance: Throughput differences can be seen for packets that cross the cache line boundary (for example, 128/129B for 32B cache line size).

Difference: Interrupt coalescing
NPE-G2: Purely depends on the timer expiration.
NPE-G1: Both the timer and the number of received/transmitted packets are used.
Impact on system performance: For some low rate (pps) scenarios, it is possible to see some extra latency (order of use).

Difference: Egress port saturation
NPE-G2: Re-parenting and en-queuing.
NPE-G1: Re-parenting and en-queuing.
Impact on system performance: IOS behavior, and has CPU impact once the port gets saturated.

Difference: Cache line size
NPE-G2: RX DMA starts to move data to system iomem when it has received a cache line size worth of data.
NPE-G1: Controlled by burst size.
Impact on system performance: Lower throughput for packet sizes crossing the cache line boundaries (n * cache line size + 1).

Difference: Interrupt level handling
NPE-G2: ~1/10 of the CPU core speed (for example, at system bus speed) due to external I/O.
NPE-G1: At the CPU core speed (very fast).
Impact on system performance: Features which extensively change interrupt levels (such as IPS/FW, etc.) will not see x2 performance.
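The cache line boundary effect mentioned for burst size and cache line size comes down to rounding up. Under the simplifying assumption that data moves in whole cache-line-sized transfers, a packet that is one byte over a multiple of the line size needs one more transfer than a packet that fits exactly. The function name below is made up for this example.

```python
def cache_line_transfers(packet_bytes, cache_line_bytes=32):
    """Number of whole cache-line transfers needed to move a packet (rounded up)."""
    return (packet_bytes + cache_line_bytes - 1) // cache_line_bytes


# The 128/129B example with a 32B cache line: 129 bytes needs one extra transfer.
print(cache_line_transfers(128))  # 4
print(cache_line_transfers(129))  # 5
```

That extra transfer for a single additional byte is why throughput drops at packet sizes of n * cache line size + 1.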

Troubleshooting Router Hangs

A 7200 Series Router might experience a router hang. A hang is when the router boots to a certain point and then no longer accepts any commands or keystrokes. In other words, the console screen hangs after a certain point.

Hangs are not necessarily hardware issues and, most of the time, are software issues. If your router is experiencing a router hang, refer to Troubleshooting Router Hangs.

Troubleshooting Bandwidth Points

Refer to Bandwidth and Bandwidth Point Requirements for details.

You can use a Cisco 7200 Series Router with a port adapter configuration that exceeds the guidelines listed in this section; however, to prevent anomalies from occurring while the router is in use — for example, High CPU (Sluggish performance) — Cisco strongly recommends restricting the port adapter types installed in the router according to the guidelines listed in the links in this section and based on the hardware you have installed.

Note: Your port adapter configuration must be within the above guidelines before the Cisco Technical Assistance Center will troubleshoot anomalies that are occurring in your Cisco 7200 series router.

Strictly speaking, the reason not to exceed the bandwidth points on a 7200 is not total bus capacity, but instantaneous bus bandwidth and memory latency. In other words, this is a bus bandwidth issue, not a CPU-loading issue. At some point (regardless of packet throughput), all the port adapters will have data at the same time and will issue memory requests simultaneously. With this much PCI bus contention, there is no guarantee that all the PAs will be serviced before overruns, and possibly PCI bus timeouts, occur.

The other issue is SRAM allocation. There is a limited block of SRAM, which is divided among the first three fast interfaces, so any additional fast interface has to use a DRAM memory pool. This increases the memory latency for that interface and makes overruns likely. (Note: This is only relevant for the NPE-150 and NPE-200.)
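Checking a planned port adapter configuration against a bandwidth point limit is a simple sum. The sketch below shows the idea only: the adapter names, point values, and limit are placeholders invented for this example, not Cisco figures. Take the real values from the bandwidth point guidelines for your NPE and chassis.

```python
def check_bandwidth_points(installed, points_per_adapter, limit):
    """Return (total_points, within_limit) for a list of installed adapters."""
    total = sum(points_per_adapter[name] for name in installed)
    return total, total <= limit


# Placeholder values for illustration only; these are NOT Cisco's figures.
example_points = {"PA-EXAMPLE-FAST": 300, "PA-EXAMPLE-SLOW": 100}
example_limit = 600

planned = ["PA-EXAMPLE-FAST", "PA-EXAMPLE-FAST", "PA-EXAMPLE-SLOW"]
print(check_bandwidth_points(planned, example_points, example_limit))
```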

Port adapters use various types of resources from the chassis and the NPE or NSE. Bandwidth is a term that describes port adapter resource requirements. Bandwidth includes variables such as speed, memory, CPU requirements, and PCI bus bandwidth. Because of changes in architecture in the network processing engines over the years, two methods have been developed to describe port adapter bandwidth requirements. The methods are reflected in the Bandwidth Resource Requirement column and the Bandwidth Points column of Table 1-6. However, the information in these columns must be considered with the information in these sections:

Troubleshooting Port Adapters

Here is a list of troubleshooting resources:

Troubleshooting Serial Interfaces

Here is a list of references to use for troubleshooting serial interfaces:

Information to Collect if You Open a TAC Case

If you have identified a component that needs to be replaced, contact your Cisco partner or reseller to request a replacement for the hardware component that is causing the issue. If you have a support contract directly with Cisco, use the TAC Case Open tool (registered customers only) to open a TAC case and request a hardware replacement. Make sure you attach the following information:
  • Console captures showing the error messages
  • Console captures showing the troubleshooting steps taken and the boot sequence during each step
  • The hardware component that failed and the serial number for the chassis
  • Troubleshooting logs
  • Output from the show technical-support command

Related Information
