Troubleshooting Bus Error Crashes

Document ID: 7949

Updated: Nov 29, 2006

Introduction

This document explains how to identify bus error crashes and how to troubleshoot those crashes depending on the type of processor you have in your Cisco router.

Prerequisites

Requirements

Cisco recommends that you read Troubleshooting Router Crashes before proceeding with this document.

Components Used

The information in this document is based on these software and hardware versions:

  • All Cisco IOS® software versions

  • All Cisco routers

Note: This document does not apply to Cisco Catalyst switches or MGX platforms.

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

Conventions

Refer to the Cisco Technical Tips Conventions for more information on document conventions.

Identifying Bus Error Crashes

The system encounters a bus error when the processor tries to access a memory location that either does not exist (a software error) or does not respond properly (a hardware problem). You can identify a bus error from the output of the show version command, provided that the router has not been power-cycled or manually reloaded since the crash.

If you have the output of a show version or show technical-support command (from enable mode) from your Cisco device, you can use the Output Interpreter Tool to display potential issues and fixes. To use the tool, you must be a registered customer, be logged in, and have JavaScript enabled.

Router uptime is 2 days, 21 hours, 30 minutes
System restarted by bus error at PC 0x30EE546, address 0xBB4C4
System image file is "flash:igs-j-l.111-24.bin", booted via flash
.........

At the console prompt, this error message can also be seen during a bus error:

*** System received a Bus Error exception *** 
signal= 0xa, code= 0x8, context= 0x608c3a50
PC = 0x60368518, Cause = 0x20, Status Reg = 0x34008002
.........

After this, the router reloads. In some cases, however, the router goes into a loop of crashes and reloads and manual intervention is required to break out of this loop.

Another related issue is a Versatile Interface Processor (VIP) crash. If this problem occurs, error messages similar to these are logged:

%VIP2 R5K-1-MSG: slot0 System reloaded by a Bus Error exception
%VIP2 R5K-1-MSG: slot0 caller=0x600BC974
%VIP2 R5K-1-MSG: slot0 System exception: sig=10, code=0x408,
context=0x605B51E0

Finally, another bus error crash type is a line card crash on a Cisco 12000 Series Internet Router. If this problem occurs, error messages similar to these are logged in the show context output:

Router#show context

...

CRASH INFO: Slot 1, Index 1, Crash at 11:27:15 utc  Wed May 16 2001
 VERSION:
 GS Software (GLC1-LC-M), Version 12.0(16.5)S, EARLY DEPLOYMENT MAINTENANCE
 INTERIM SOFTWARE
 TAC Support: http://www.cisco.com/pcgi-bin/ibld/view.pl?i=support
 Compiled Thu 29-Mar-01 17:12 by ninahung
 Card Type: 3 Port Gigabit Ethernet, S/N
 System exception: SIG=10, code=0x2008, context=0x40D8DF44
 System restarted by a Bus Error exception
 STACK TRACE:
 -Traceback= 40165800 4038D0FC 4025C7BC 4026287C 4029581C 402EECF8 400C0144
 CONTEXT:
 $0 : 00000000, AT : 00000000, v0 : 00000044, v1 : 0FE00020
 a0 : 00000000, a1 : 0FE00000, a2 : 00000000, a3 : 39EC6AAB
 t0 : 00000030, t1 : 34008D01, t2 : 34008100, t3 : FFFF00FF
 t4 : 400C01E8, t5 : 00000001, t6 : 00000001, t7 : 00000001
 s0 : 40DCDD20, s1 : 0FE00000, s2 : 00000000, s3 : 000005DC
 s4 : 00000000, s5 : 0FE00020, s6 : 00000004, s7 : 414CF120
 t8 : 41680768, t9 : 00000000, k0 : 00000000, k1 : FFFF8DFD
 gp : 40CB9780, sp : 4105BFE8, s8 : 41652BA0, ra : 4038D0FC
 EPC : 0x40165800, SREG : 0x34008D03, Cause : 0x00002008
 ErrorEPC : 0xBFC22B94
 -Process Traceback= No Extra Traceback

See Troubleshooting Line Card Crashes on the Cisco 12000 Series Internet Router for more details.

If you have the output of a show context command from your Cisco device, you can use the Output Interpreter Tool to display potential issues and fixes. To use the tool, you must be a registered customer, be logged in, and have JavaScript enabled.

Troubleshooting Bus Error Crashes

The first thing to do is to find out which memory location (also known as the "address" or "address operand") the router tried to access when the bus error occurred. This information indicates whether the fault lies with the Cisco IOS Software or the router hardware. In the example, "System restarted by bus error at PC 0x30EE546, address 0xBB4C4", the memory location that the router tried to access is 0xBB4C4. Do not confuse this address with the program counter (PC) value, which is 0x30EE546 in this example.
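As a minimal sketch of this first step, the following Python snippet pulls the PC and the address out of a restart line. The regular expression and the function name parse_bus_error are illustrative assumptions based only on the sample lines in this document, not a Cisco-provided format definition.

```python
import re

# Pattern inferred from the sample restart lines in this document (an assumption).
BUS_ERROR_RE = re.compile(
    r"bus error at PC (0x[0-9A-Fa-f]+), address (0x[0-9A-Fa-f]+)"
)

def parse_bus_error(line):
    """Return (pc, address) as integers, or None if the line is not a bus error."""
    match = BUS_ERROR_RE.search(line)
    if match is None:
        return None
    pc, address = (int(group, 16) for group in match.groups())
    return pc, address

line = "System restarted by bus error at PC 0x30EE546, address 0xBB4C4"
pc, address = parse_bus_error(line)
# Zero-pad to eight hex digits so the values line up with show region ranges.
print(f"PC=0x{pc:08X} address=0x{address:08X}")
```

Keeping the two values in separate variables makes it harder to compare the PC, rather than the address, against the memory ranges by mistake.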

The second thing to do is determine the type of processor in the router. Memory address locations for routers differ depending on the type of processor. There are two main types of processors in Cisco routers:

  • 68000 Processors

    This is part of a show version output that indicates that the router has a 68000 processor:

    cisco 2500 (68030) processor (revision D) with 8192K/2048K bytes of memory.
    

    Router platforms that have 68000 processors include:

    • Cisco 1000 Series Routers

    • Cisco 1600 Series Routers

    • Cisco 2500 Series Routers

    • Cisco 4000 Series Routers

    • Route Processor (RP) Modules on Cisco 7000 (RP) Series Routers

  • Reduced Instruction Set Computing (RISC) Processors

    This is part of a show version output that indicates that the router has a RISC processor:

    cisco 3640 (R4700) processor (revision 0x00) with 49152K/16384K bytes of memory.

    The R in (R4700) indicates a RISC processor.

    Router platforms that have RISC processors include:

    • Cisco 3600 Series Routers

    • Cisco 4500 Series Routers

    • Cisco 4700 Series Routers

    • Route Switch Processor (RSP) Modules on Cisco 7500 Series and Cisco 7000 (RSP7000) Series Routers

    • Network Processor Engine (NPE) Modules on Cisco 7200 Series Routers

    • Multilayer Switch Feature Card (MSFC) on the Cisco 7600 Series Routers or Catalyst 6000 Switch

    • Performance Routing Engine (PRE) Modules on Cisco 10000 Series Internet Routers

    • Gigabit Route Processor (GRP) Modules on Cisco 12000 Series Internet Routers

Once you have determined the address and the processor type, you can start with more detailed troubleshooting.
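The processor check can be sketched in a few lines of Python. The function name processor_family and both patterns are hypothetical; in particular, treating any five-digit name that starts with 68 as a 68000-family processor is an assumption that goes beyond the single 68030 sample shown above.

```python
import re

# Captures the processor name inside the parentheses that precede "processor",
# as in "cisco 2500 (68030) processor (revision D) with ...".
PROCESSOR_RE = re.compile(r"\(([^)]+)\) processor")

def processor_family(show_version_line):
    """Classify a show version line as '68000', 'RISC', or 'unknown'."""
    match = PROCESSOR_RE.search(show_version_line)
    if match is None:
        return "unknown"
    name = match.group(1)
    if name.startswith("R"):
        # The leading R, as in R4700, indicates a RISC processor.
        return "RISC"
    if re.fullmatch(r"68\d{3}", name):
        # Assumed naming pattern for the 68000 family (for example 68030).
        return "68000"
    return "unknown"

print(processor_family(
    "cisco 2500 (68030) processor (revision D) with 8192K/2048K bytes of memory."
))
print(processor_family(
    "cisco 3640 (R4700) processor (revision 0x00) with 49152K/16384K bytes of memory."
))
```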

Troubleshooting Bus Error Crashes on 68000 Processor Platforms

With the address accessed by the router when the bus error occurred, use the show region command to determine the memory location the address corresponds to. If the address reported by the bus error does not fall within the ranges displayed in the show region output, this means that the router tried to access an address that is not valid. This indicates that it is a Cisco IOS Software problem. Use the Output Interpreter Tool (registered customers only) to decode the output of the show stacks command and identify the Cisco IOS Software bug that causes the bus error.

On the other hand, if the address falls within one of the ranges in the show region output, it means that the router accessed a valid memory address, but the hardware corresponding to that address does not respond properly. This indicates a hardware problem.

Here is an example of the show region output:

Router#show region
Region Manager:
     Start         End     Size(b)  Class  Media  Name
0x00000000  0x007FFFFF     8388608  Local  R/W    main
0x00001000  0x0001922F       98864  IData  R/W    main:data
0x00019230  0x000666B3      316548  IBss   R/W    main:bss
0x000666B4  0x007FEFFF     7965004  Local  R/W    main:heap
0x007FF000  0x007FFFFF        4096  Local  R/W    main:flhlog
0x00800000  0x009FFFFF     2097152  Iomem  R/W    iomem
0x03000000  0x037FFFFF     8388608  Flash  R/O    flash
0x0304033C  0x037A7D3F     7764484  IText  R/O    flash:text

Note: In some earlier Cisco IOS Software versions, this command is not available. The show region output is part of the show tech-support output from Cisco IOS Software Release 12.0(9).

Addresses are displayed in hexadecimal format. The addresses that fall within the "Start" and "End" ranges are valid memory addresses.

Main corresponds to main memory or dynamic RAM (DRAM).

iomem corresponds to input/output (I/O) memory, which is implemented differently on different platforms. For example, it is DRAM on the Cisco 2500 and shared RAM (SRAM) on the Cisco 4000.

To continue with the previous example, System restarted by bus error at PC 0x30EE546, address 0xBB4C4, assume that this bus error crash comes from the Cisco 2500 router that produced the show region output above. The address 0xBB4C4 is equivalent to 0x000BB4C4. According to the show region output, this address falls within the range of "main", or more specifically, "main:heap" (0x000666B4-0x007FEFFF). As mentioned earlier, "main" corresponds to the main memory or the DRAM, so the DRAM chips need to be checked.
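The lookup described above can be sketched in Python as follows. The helper names parse_regions and regions_containing are hypothetical, and the row pattern assumes the six-column layout of the sample show region output in this document.

```python
import re

# Rows copied from the Cisco 2500 show region example in this document.
SHOW_REGION = """\
0x00000000  0x007FFFFF     8388608  Local  R/W    main
0x00001000  0x0001922F       98864  IData  R/W    main:data
0x00019230  0x000666B3      316548  IBss   R/W    main:bss
0x000666B4  0x007FEFFF     7965004  Local  R/W    main:heap
0x007FF000  0x007FFFFF        4096  Local  R/W    main:flhlog
0x00800000  0x009FFFFF     2097152  Iomem  R/W    iomem
0x03000000  0x037FFFFF     8388608  Flash  R/O    flash
0x0304033C  0x037A7D3F     7764484  IText  R/O    flash:text
"""

# Start, End, Size(b), Class, Media, Name (assumed column layout).
ROW_RE = re.compile(
    r"^\s*(0x[0-9A-Fa-f]+)\s+(0x[0-9A-Fa-f]+)\s+\d+\s+\S+\s+\S+\s+(\S+)\s*$"
)

def parse_regions(text):
    """Return a list of (start, end, name) tuples; non-matching lines are skipped."""
    regions = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            start, end, name = match.groups()
            regions.append((int(start, 16), int(end, 16), name))
    return regions

def regions_containing(address, regions):
    """Return the names of all regions whose Start..End range includes address."""
    return [name for start, end, name in regions if start <= address <= end]

regions = parse_regions(SHOW_REGION)
hits = regions_containing(0xBB4C4, regions)
print(hits)  # ['main', 'main:heap']
print("check hardware" if hits else "suspect software")
```

An empty result means the address is outside every listed range, which corresponds to the software case described earlier in this section.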

If this is a new router, or if the router has been moved from one location to another, the memory chips often become loose. It's a good idea to reseat or firmly push the memory chips into the slot. Most of the time, this is sufficient for solving this type of crash.

For bus error crashes with addresses that do not fall within the show region address ranges, use the Output Interpreter Tool to decode the output of the show stacks command and identify the Cisco IOS Software bug that is causing the bus error. If you are uncertain which bug ID matches or which Cisco IOS software version contains the fix, consider upgrading your Cisco IOS software to the latest version in your release train. This often resolves the issue, because the latest version usually contains fixes for a large number of bugs.

If you have the output of a show stacks or show technical-support (from enable mode) command from your Cisco device, you can use the Output Interpreter Tool to display potential issues and fixes. To use the tool, you must be a registered customer, be logged in, and have JavaScript enabled.

Troubleshooting Bus Error Crashes on RISC Processor Platforms

It is recommended that you read the section on Troubleshooting Bus Error Crashes on 68000 Processor Platforms before you proceed with this section.

On RISC processors, Cisco IOS Software uses virtual addresses through the use of the Translation Lookaside Buffer (TLB) that translates virtual addresses into physical addresses. The address reported by bus errors on RISC processors is therefore the virtual address as opposed to the physical address used by the 68000 processors.

The output of the show region command must be used to check the address reported by the bus error. To illustrate this, let's take the following example:

System was restarted by bus error at PC 0x60104864, address 0xC

Using the show region command output below, you can verify that 0xC is not a valid virtual address, and you can conclude that the bus error was caused by a software problem. Use the Output Interpreter Tool (registered customers only) to decode the output of the show stacks or show technical-support (from enable mode) command and identify the Cisco IOS Software bug that is causing the bus error.

Another advantage of the show region command is that its output reflects the amount of memory installed on the router, which determines the memory mapping. For example, if you have 64 MB of DRAM (64 x 1024 x 1024 = 67108864 bytes = 0x4000000 bytes), the DRAM range is 0x60000000 - 0x63FFFFFF. This is confirmed with the show region command:

Router#show version | i of memory
cisco RSP2 (R4700) processor with 65536K/2072K bytes of memory.

Router#show region
Region Manager:
     Start         End     Size(b)  Class  Media  Name
0x40000000  0x40001FFF        8192  Iomem  REG    qa
0x40002000  0x401FFFFF     2088960  Iomem  R/W    memd
0x48000000  0x48001FFF        8192  Iomem  REG    QA:writethru
0x50002000  0x501FFFFF     2088960  Iomem  R/W    memd:(memd_bitswap)
0x58002000  0x581FFFFF     2088960  Iomem  R/W    memd:(memd_uncached)
0x60000000  0x63FFFFFF    67108864  Local  R/W    main
0x60010908  0x60C80B11    13042186  IText  R/O    main:text
0x60C82000  0x60F5AF1F     2985760  IData  R/W    main:data
0x60F5AF20  0x610E35FF     1607392  IBss   R/W    main:BSS
0x610E3600  0x611035FF      131072  Local  R/W    main:fastheap
0x61103600  0x63FFFFFF    49269248  Local  R/W    main:heap
0x80000000  0x83FFFFFF    67108864  Local  R/W    main:(main_k0)
0x88000000  0x88001FFF        8192  Iomem  REG    QA_k0
0x88002000  0x881FFFFF     2088960  Iomem  R/W    memd:(memd_k0)
0xA0000000  0xA3FFFFFF    67108864  Local  R/W    main:(main_k1)
0xA8000000  0xA8001FFF        8192  Iomem  REG    QA_k1
0xA8002000  0xA81FFFFF     2088960  Iomem  R/W    memd:(memd_k1)

If you have a bus error at 0x65FFFFFF, the show region output takes the amount of memory into account and tells you that it's an illegal address (software bug).
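The range arithmetic can be checked with a short Python sketch. The base address 0x60000000 is taken from the "main" row of the show region output above; the helper names and the assumption that the base stays the same for other memory sizes are illustrative only.

```python
DRAM_BASE = 0x60000000  # start of "main" in the show region output above

def dram_range(megabytes, base=DRAM_BASE):
    """Return the (start, end) virtual addresses covered by the given DRAM size."""
    size = megabytes * 1024 * 1024
    return base, base + size - 1

def in_dram(address, megabytes):
    """Report whether address falls inside the DRAM range for this memory size."""
    start, end = dram_range(megabytes)
    return start <= address <= end

start, end = dram_range(64)
print(f"0x{start:08X} - 0x{end:08X}")   # 0x60000000 - 0x63FFFFFF
print(in_dram(0x65FFFFFF, 64))          # False: outside the 64 MB range
print(in_dram(0x65FFFFFF, 128))         # True under the same-base assumption
```

The last two lines illustrate why a fixed address table is not enough: whether 0x65FFFFFF is legal depends on how much memory is installed, which the show region output already takes into account.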

In summary:

  • Use the show region command to verify whether the address indicated by the bus error is within the address ranges used by the router.

  • If the address falls within a virtual address range, replace the hardware corresponding to this range.

  • If the address does not fall within a virtual address range, use the Output Interpreter Tool (registered customers only) to decode the output of the show stacks or the show technical-support (from enable mode) command and identify the Cisco IOS software bug that is causing the bus error.

  • Give serious consideration to installing the most recent maintenance release of the Cisco IOS Software train that you are currently running.

Special Types of Bus Error Crashes

A special type of bus error crash is one caused by a corrupted program counter (PC). The PC value is the location of the instruction that the processor was executing when the bus error occurred. When a bus error caused by a corrupted PC occurs, the following message appears on the console:

%ALIGN-1-FATAL: Corrupted program counter 

  pc=0x0, ra=0x601860BC, sp=0x60924540, at=0x60224854

In this case, the PC has jumped to the address 0x0 (probably because of a null pointer), but this is not where the instruction is located. This is a software problem so there is no need to check with the show region command.

On other RISC platforms (Cisco 3600, 4500, and so forth), you get a SegV exception when jumping to an illegal PC, not a bus error.

Another type of bus error crash that occurs from time to time is when the PC value is equal to the address value. For instance:

System returned to ROM by bus error at PC 0x606B34F0, address 0x606B34F0

From the crashinfo file:

Unexpected exception, CPU signal 10, PC = 0x606B34F0
   
  $0 : 00000000, AT : A001A24A, v0 : 00000000, v1 : 00000000
  a0 : 00000000, a1 : 429CC394, a2 : 00000000, a3 : 62544344
  t0 : 6069F424, t1 : 3400FF00, t2 : FFFFFFFB, t3 : 00000000
  t4 : 606B8E68, t5 : 80000000, t6 : AA5C1022, t7 : 62FDE9D4
  s0 : 62300000, s1 : 6281A1B8, s2 : 80007E20, s3 : 00000001
  s4 : 00000001, s5 : 00000000, s6 : 62310000, s7 : 62544344
  t8 : 62FDEA1C, t9 : 0D0D0D0D, k0 : 623079C0, k1 : 00000014
  gp : 620B9E20, sp : 61E7E300, s8 : 00000000, ra : 606B8E68
  EPC : 606B34F0, ErrorEPC : 606B8E68, SREG : 3400FF02
  Cause 00004018 (Code 0x6): Instruction Bus Error exception
   
  -Traceback= 606B34F0 606B8E68

Notice the k1 register value is 0x14 (hexadecimal) which is equal to 20 in decimal. This points to a Cache Parity Exception. In this particular case, the parity error is not handled properly and is being masked by a bus error. The router has crashed due to a software bus error in the function handling a Cache Parity Exception.
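The numbers in this crashinfo excerpt can be cross-checked with a small Python sketch. It assumes the standard MIPS layout in which the exception code occupies bits 6 through 2 of the Cause register; the function name exc_code and the two-entry name table are illustrative and cover only the bus error codes.

```python
def exc_code(cause):
    """Extract the exception code field (bits 6..2) from a MIPS Cause value."""
    return (cause >> 2) & 0x1F

# Only the two bus error codes are listed here; other codes are left unnamed.
EXC_NAMES = {6: "Instruction Bus Error", 7: "Data Bus Error"}

cause = 0x00004018            # "Cause 00004018" from the crashinfo file
code = exc_code(cause)
print(hex(code), EXC_NAMES.get(code, "other"))   # 0x6 Instruction Bus Error

k1 = int("00000014", 16)      # k1 register value from the crashinfo file
print(k1)                     # 20

pc, address = 0x606B34F0, 0x606B34F0
print(pc == address)          # True: PC equals the reported address
```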

You should consider this crash as a regular processor memory parity error crash and follow the recommendations given in Processor Memory Parity Errors (PMPEs).

You should also consider upgrading the Cisco IOS software release to a version which has a fix for CSCdv68388 - "Change cache error exception handler to resume not crash" which has been fixed since Cisco IOS Software Release 12.2(10).

Troubleshooting Techniques for Bus Error Exception Boot Loops

This section focuses on general troubleshooting techniques for bus error exception boot loops:

  • Cisco IOS software loaded does not support installed hardware

  • Software Failure

  • Mis-seated Hardware

  • Hardware Failure

Cisco IOS Software Loaded Does Not Support Installed Hardware

Verify that all network cards are supported by the Cisco IOS software. The Software Advisor (registered customers only) gives you the minimum versions of Cisco IOS software needed for hardware. Verify, also, that the bootflash image supports the hardware installed if you have a router that supports a boot image such as the Cisco 7200 or Cisco 7500 series router.

Software Failure

On 2600 and 3600 routers, the router's I/O memory is configurable as a percentage of the main memory. If the I/O memory settings are inappropriate for the installed network modules or WAN interface cards (WICs), the 2600/3600 platform may have trouble booting and may crash with bus errors.

If a software configuration change has recently been made, and the router is in a booting loop, a software bug may be causing this issue.

If the router is not able to boot up, you can bypass the configuration to identify whether that is causing the issue. Follow these steps:

  1. Break into ROMMON by sending the break sequence to the router during the first 60 seconds of boot up.

  2. From ROM Monitor, use the confreg command to change the configuration register to a setting, such as 0x2142, to ignore the router's configuration:

    rommon 1 > confreg 0x2142
    
    You must reset or power cycle for new config to take effect
    
    rommon 2 > reset

If the router boots without any errors, there is a configuration issue causing the problem. Verify that your configuration is supported in the Cisco IOS software and by the hardware. If it is supported, use the Bug Toolkit (registered customers only) to identify any software bugs that you may be experiencing. Give serious consideration to installing the most recent maintenance release of the Cisco IOS software train that you are currently running.
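To see why 0x2142 bypasses the configuration, the following Python sketch tests bit 6 of the configuration register, which tells the router to ignore the startup configuration in NVRAM. The value 0x2102 is not taken from this document; it is used here as the commonly seen normal setting, and the function name is hypothetical.

```python
IGNORE_CONFIG_BIT = 0x0040  # bit 6: ignore the startup configuration in NVRAM

def ignores_startup_config(confreg):
    """Report whether the given configuration register value sets bit 6."""
    return bool(confreg & IGNORE_CONFIG_BIT)

print(ignores_startup_config(0x2142))       # True
print(ignores_startup_config(0x2102))       # False (assumed normal setting)
print(hex(0x2142 & ~IGNORE_CONFIG_BIT))     # 0x2102: same value with bit 6 cleared
```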

Mis-seated Hardware

If you are experiencing a bus error exception booting loop, it may be caused by mis-seated hardware. For lower-end platforms such as the 3600 or 4000 router, reseat the network modules/network processors.

For higher-end platforms such as the 7200 or 7500 routers, reseat the processor, VIP, port adapters, or line card that is reloading due to a bus error exception.

Hardware Failure

The information contained in the bus error does not help to isolate the hardware. Therefore, it is important to remove and reinsert cards to find the problem hardware. Here are some recommended steps to isolate the problem:

[Figure flow1.jpg: flowchart of the recommended steps to isolate the problem hardware]

Note: If the router no longer experiences the continuous loop after you follow the troubleshooting steps above, the problem may have been caused by a mis-seated network module. Monitor the router for 24 hours to make sure that it continues to function without experiencing the issue again.

Information to Collect if You Open a Service Request

If you still need assistance after following the troubleshooting steps above and want to open a case with Cisco Technical Support, be sure to include the following information for troubleshooting a bus error or bus error exception:

  • Troubleshooting performed before opening the case

  • show technical-support output (if possible, in enable mode)

  • show log output or console captures, if available

  • crashinfo file (if present and not already included in the show technical-support output)

  • show region output (if not already included in the show technical-support output)

Attach the collected data to your case in non-zipped, plain text format (.txt). You can attach information to your case by uploading it using the Case Query Tool (registered customers only). If you cannot access the Case Query Tool, you can attach the relevant information to your case by sending it to attach@cisco.com with your case number in the subject line of your message.

Note: Do not manually reload or power-cycle the router before you collect the above information, unless this is required to troubleshoot a bus error exception. A reload can erase information that is needed to determine the root cause of the problem.
