Hardware Troubleshooting for the Cisco 10000 (ESR) Series Router

Document ID: 16321

Updated: Mar 09, 2009

Introduction

This document explains processes and procedures for user-level hardware troubleshooting on the Cisco 10000 Edge Services Router (ESR). These are the troubleshooting steps that you can take before you escalate the problem to Cisco Technical Support.

Prerequisites

Requirements

Cisco recommends that you have knowledge of these topics:

Overview

The Cisco 10000 Edge Services Router (ESR) is a high capacity Layer 3 router optimized to support selected Cisco IOS® software services at wire speed performance on thousands of DS0/DS1/E1 connections. Designed primarily for use in a telecommunications central office environment, it provides interfaces that connect to large numbers of subscribers using low-speed circuits, and aggregates these into a small number of high-speed trunk interfaces. The 10008 chassis has eight line card slots, and the 10005 chassis has five line card slots. Both chassis have two dedicated slots for Performance Routing Engine (PRE) modules.

Components Used

The information in this document is based on these software and hardware versions:

  • Cisco 10008 Series Edge Services Router

  • All Cisco IOS software releases that run on the Cisco 10000 Series Edge Services Routers (ESRs)

The outputs shown in this document are based on Cisco IOS Software Release 12.2(15)BZ.

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

Supported Cisco IOS Software Releases

When you add new hardware to a Cisco 10000 Series ESR, first verify that the hardware is supported on the platform and by the Cisco IOS software release. Use the Software Advisor tool (registered customers only) in order to find out which Cisco IOS software release supports your hardware.

Software is stored on the PRE module which includes two PCMCIA slots that are accessible from the front panel. Either slot can store a Cisco IOS software image or configuration file.

The Flash memory present on Cisco 10000 line cards is used to store a simple ROM monitor or boot loader. The loader executes after a system reset, line card reset, or line card insertion.

Line card images might also be stored in PRE Flash memory or on an external Trivial File Transfer Protocol (TFTP) server.

The PRE stores the system configuration in a 512KB nonvolatile RAM (NVRAM) device. Configuration information read from NVRAM is buffered in RAM after initialization, and is written back to the NVRAM device when you save the configuration.

Before you upgrade the Cisco 10000 ESR, use the Download Software Area and the release notes of the new Cisco IOS software release in order to check the memory requirements. Refer to Software Installation and Upgrade Procedures for more information about the upgrade procedure.

Conventions

Refer to Cisco Technical Tips Conventions for more information on document conventions.

Identify the Issue

These sections contain basic troubleshooting steps for commonly seen issues on the Cisco 10000 ESR platform.

Capture as much information about the problem as possible. This information is essential in order to determine the cause of the problem:

  • Console logs

  • show technical-support output

  • Complete bootup sequence if the router experiences boot errors

Memory Parity Errors

A router might reload due to a processor memory parity error similar to this example:

10008#show version
Cisco Internetwork Operating System Software
IOS (tm) 10000 Software (C10K-P11-M), Version 12.2(15)BZ,  RELEASE SOFTWARE (fc1)
TAC Support: http://www.cisco.com/tac
Copyright (c) 1986-2003 by cisco Systems, Inc.
Compiled Thu 03-Apr-03 15:12 by leccese
Image text-base: 0x60008954, data-base: 0x61780000
ROM: System Bootstrap, Version 12.0(9r)SL2, RELEASE SOFTWARE (fc1)
ESR10008 uptime is 28 minutes
System returned to ROM by processor memory parity error at PC 0x60301298, 
address 0x0 at 12:05:31 UTC  Sun Oct 12 2003
System restarted at 13:33:29 UTC  Sun Oct 12 2003
System image file is "disk0:c10k-p11-mz.122-15.BZ"

!--- Output suppressed.
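
If you need to check saved show version output from many routers, the reload reason can be extracted with a short script. This is only a minimal Python sketch and not a Cisco tool. The function name parse_reload_reason and the regular expression are assumptions modeled on the output format shown above, in which the line can wrap after the PC value.

```python
import re

# Pattern modeled on the "System returned to ROM by ..." line shown above.
# The PC and address fields are optional, because a reload reason such as a
# manual reload does not include them. This is an assumption, not a Cisco spec.
_RELOAD_RE = re.compile(
    r"System returned to ROM by (?P<reason>.+?)"
    r"(?: at PC (?P<pc>0x[0-9A-Fa-f]+),\s*address (?P<address>0x[0-9A-Fa-f]+))?"
    r"\s+at\s+\d{2}:\d{2}:\d{2}",
    re.DOTALL,
)


def parse_reload_reason(show_version_text):
    """Return the reload reason, PC, and address, or None if no line matches."""
    match = _RELOAD_RE.search(show_version_text)
    if match is None:
        return None
    return {
        # Collapse any line wrap inside the reason into single spaces.
        "reason": " ".join(match.group("reason").split()),
        "pc": match.group("pc"),
        "address": match.group("address"),
    }


# Excerpt copied from the show version output above, including the line wrap.
sample = (
    "ESR10008 uptime is 28 minutes\n"
    "System returned to ROM by processor memory parity error at PC 0x60301298, \n"
    "address 0x0 at 12:05:31 UTC  Sun Oct 12 2003\n"
    "System restarted at 13:33:29 UTC  Sun Oct 12 2003\n"
)

info = parse_reload_reason(sample)
print(info)
```

The returned reason string (here, a processor memory parity error) tells you which of the sections in this document applies.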

Soft Versus Hard Parity Errors

There are two different kinds of parity errors:

  • Soft parity errors: These occur when an energy level within the Dynamic RAM (DRAM) changes (for example, from a one to a zero). When the CPU references the affected location, the system either crashes (if the error is in an area that is not recoverable) or attempts to recover by restarting the affected subsystem. In case of a soft parity error, there is no need to swap any of the components.

  • Hard parity errors: These occur when there is a DRAM or board failure that causes data to be corrupted. In this case, re-seat or replace the affected component, which usually means swapping the DRAM or the board.

    Multiple parity errors at the same address indicate a hard parity error. There are more complicated cases that are harder to identify, but in general, more than one parity error in a particular memory region within a relatively short period of time (several weeks to months) can be considered a hard parity error.

Studies show that soft parity errors are 10 to 100 times more frequent than hard parity errors. Therefore, wait for a second parity error before you replace anything, because this greatly reduces the impact on your network. This show log output is an example of a soft parity error.

%C10720_Access4GE8FE-3-GB_ACC_FPGA_INT: Access FPGA interrupt
VA_TX_PAR_ERR (code 0x4)

%C10720_Access4GE8FE-3-GB_BUF_FPGA_INT: Buffer FPGA#1 interrupt
TX_DDR_PARITY_INT_STATUS (code 0x)

The course of action for this type of problem is to monitor the router for several weeks after the first incident and if the problem occurs again, replace the defective hardware.
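
The monitor-then-replace rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a Cisco procedure: the function name classify_parity_events is hypothetical, the 90-day window is an assumed stand-in for "several weeks to months", the addresses in the sample are made up, and the sketch compares exact addresses rather than whole memory regions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def classify_parity_events(events, window=timedelta(days=90)):
    """Group parity errors by address and flag repeats inside the window.

    events: iterable of (datetime, address) tuples taken from crash records.
    Returns a dict that maps each address to "hard (suspected)" or
    "soft (monitor)". The default window of 90 days is an assumption.
    """
    by_address = defaultdict(list)
    for when, address in events:
        by_address[address].append(when)

    verdicts = {}
    for address, times in by_address.items():
        times.sort()
        # A repeat at the same address within the window suggests failing
        # hardware; a single occurrence is treated as a soft error.
        repeated = any(
            later - earlier <= window
            for earlier, later in zip(times, times[1:])
        )
        verdicts[address] = "hard (suspected)" if repeated else "soft (monitor)"
    return verdicts


# Hypothetical crash history: two errors at one address, one at another.
history = [
    (datetime(2003, 10, 12, 12, 5, 31), 0x62A00010),
    (datetime(2003, 11, 2, 8, 0, 0), 0x62A00010),
    (datetime(2003, 10, 20, 9, 30, 0), 0x63B00020),
]

verdicts = classify_parity_events(history)
for address, verdict in sorted(verdicts.items()):
    print(hex(address), verdict)
```

An address flagged as a suspected hard error is a candidate for re-seating or replacement. Any other address only needs continued monitoring.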

Refer to Processor Memory Parity Errors for more information about parity errors.

Refer to 10000 ESR PRE1 Parity Error Fault Tree in order to isolate which part of the Cisco 10000 ESR has failed when you see any of a variety of parity error messages.

Bus Errors

Either a hardware failure or a software bug can cause bus errors. Examine the output of a show version from the router in order to determine the cause. This is an excerpt from the show version command:

System returned to ROM by bus error at PC 0x0, address 0x0 at 
04:15:55 UTC Thu Oct 9 2003
System restarted at 04:18:56 UTC Thu Oct 9 2003
System image file is "disk0:c10k-p11-mz.122-15.BZ"
cisco C10008 (PRE1-RP) processor with 458751K/65536K bytes of memory.

If the address accessed is a valid address, then the problem is most likely hardware. In that case, map the address against a memory map or the output of the show region command from the router in order to determine which hardware component is defective. If the address is invalid, as 0x0 is in this example, the problem is software-related. Decode the stack trace and search for a bug. Registered CCO users who are logged in can use the Output Interpreter tool (registered customers only) in order to decode the show stacks output and search for a known bug.

10008#show region

Region Manager:

     Start         End      Size(b)   Class   Media     Name
0x08000000  0x0FFFFFFF   134217728    Iomem     R/W    iomem
0x28000000  0x2FFFFFFF   134217728    Iomem     R/W    iomem:(iomem_cwt)
0x60000000  0x67FFFFFF   134217728    Local     R/W    main
0x60008900  0x60C57FFF    12908288    IText     R/O    main:text
0x60C58000  0x60D4AFDF      995296    IData     R/W    main:data
0x60D4AFE0  0x6106825F     3265152     IBss     R/W    main:bss
0x61068260  0x67FFFFFF   117013920    Local     R/W    main:heap
0x70000000  0x7FFFFFFF   268435456    Local     R/W    heap2
0x80000000  0x87FFFFFF   134217728    Local     R/W    main:(main_k0)
0xA0000000  0xA7FFFFFF   134217728    Local     R/W    main:(main_k1)

In the previous example, the memory address does not fall into a valid memory range, so a software bug most likely caused the problem. If the address falls within a hardware range, you can replace the memory in order to resolve this issue. In some cases, the replacement of the processor might also be necessary. Refer to Troubleshooting Bus Error Crashes for more information on how to troubleshoot bus errors.
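
The address check described above can also be scripted. This minimal Python sketch parses rows in the show region format and reports which regions, if any, contain a given address. The function names parse_regions and regions_containing are hypothetical, the row pattern is an assumption based only on the columns shown above, and the sample holds a subset of the rows from the example output.

```python
import re

# One table row: Start, End, Size(b), Class, Media, Name.
_ROW_RE = re.compile(
    r"^\s*(0x[0-9A-Fa-f]+)\s+(0x[0-9A-Fa-f]+)\s+\d+\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)


def parse_regions(show_region_text):
    """Return a list of (start, end, class, name) tuples from show region rows."""
    regions = []
    for line in show_region_text.splitlines():
        match = _ROW_RE.match(line)
        if match:  # header and blank lines do not match and are skipped
            start, end, region_class, _media, name = match.groups()
            regions.append((int(start, 16), int(end, 16), region_class, name))
    return regions


def regions_containing(address, regions):
    """Return the names of all regions whose range includes the address."""
    return [name for start, end, _cls, name in regions if start <= address <= end]


# Subset of the rows from the show region output above.
SHOW_REGION = """\
     Start         End      Size(b)   Class   Media     Name
0x08000000  0x0FFFFFFF   134217728    Iomem     R/W    iomem
0x60000000  0x67FFFFFF   134217728    Local     R/W    main
0x60008900  0x60C57FFF    12908288    IText     R/O    main:text
0x60C58000  0x60D4AFDF      995296    IData     R/W    main:data
0x70000000  0x7FFFFFFF   268435456    Local     R/W    heap2
"""

regions = parse_regions(SHOW_REGION)

# The bus error address from the example falls in no region.
print(regions_containing(0x0, regions))         # []
# An address inside the image text falls in both main and main:text.
print(regions_containing(0x60301298, regions))  # ['main', 'main:text']
```

An empty result corresponds to the invalid-address case, which points to software. A non-empty result names the memory regions to investigate as possible hardware faults.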

Router Hangs

Router hangs can be either software- or hardware-related. A router hang occurs when the router stops switching traffic. The router might also be unresponsive on the console (you do not get a router prompt). Refer to Troubleshooting Router Hangs for details on how to troubleshoot a router hang.

Parallel Express Forwarding (PXF) Errors

PXF issues can be difficult to diagnose and might be caused by hardware or software. Such troubleshooting is beyond the scope of this document. If you receive any PXF error messages in the logging buffer or on the console, create a service request with Cisco Technical Support for further troubleshooting.

Basic Troubleshooting on PREs

Troubleshooting PREs describes how to troubleshoot Performance Routing Engines (PREs). It provides information on how to troubleshoot PRE fault states, the management Ethernet port, and the serial port.

Basic Troubleshooting on Line Cards

These links provide troubleshooting help for Cisco 10000 ESR line cards:

PEM Faults and Blower Failures

PEM Faults and Blower Failures discusses troubleshooting faults on the Cisco 10000 ESR Power Entry Modules (PEMs) and blower modules.

Alarms and Error Messages

Cisco 10000 ESR Alarms and Error Messages provides troubleshooting steps for alarms and error messages on the Cisco 10000 ESR.
