This document explains how to identify bus error crashes and how to troubleshoot those crashes depending on the type of processor you have in your Cisco router.
Cisco recommends that you read Troubleshooting Router Crashes before proceeding with this document.
The information in this document is based on these software and hardware versions:
All Cisco IOS® software versions
All Cisco routers
Note: This document does not apply to Cisco Catalyst switches or MGX platforms.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
Refer to the Cisco Technical Tips Conventions for more information on document conventions.
The system encounters a bus error when the processor tries to access a memory location that either does not exist (a software error) or does not respond properly (a hardware problem). If the router has not been power-cycled or manually reloaded since the crash, you can identify a bus error from the output of the show version command:
Router uptime is 2 days, 21 hours, 30 minutes
System restarted by bus error at PC 0x30EE546, address 0xBB4C4
System image file is "flash:igs-j-l.111-24.bin", booted via flash
.........
During a bus error, this error message can also appear on the console:
*** System received a Bus Error exception ***
signal= 0xa, code= 0x8, context= 0x608c3a50
PC = 0x60368518, Cause = 0x20, Status Reg = 0x34008002
.........
After this, the router reloads. In some cases, however, the router goes into a loop of crashes and reloads, and manual intervention is required to break out of this loop.
Another related issue is a Versatile Interface Processor (VIP) crash. If this problem occurs, error messages similar to these are logged:
%VIP2 R5K-1-MSG: slot0 System reloaded by a Bus Error exception
%VIP2 R5K-1-MSG: slot0 caller=0x600BC974
%VIP2 R5K-1-MSG: slot0 System exception: sig=10, code=0x408, context=0x605B51E0
Finally, another bus error crash type is a line card crash on a Cisco 12000 Series Internet Router. If this problem occurs, error messages similar to these are logged in the show context output:
Router#show context
...
CRASH INFO: Slot 1, Index 1, Crash at 11:27:15 utc Wed May 16 2001
VERSION:
GS Software (GLC1-LC-M), Version 12.0(16.5)S, EARLY DEPLOYMENT MAINTENANCE INTERIM SOFTWARE
TAC Support: http://www.cisco.com/pcgi-bin/ibld/view.pl?i=support
Compiled Thu 29-Mar-01 17:12 by ninahung
Card Type: 3 Port Gigabit Ethernet, S/N
System exception: SIG=10, code=0x2008, context=0x40D8DF44
System restarted by a Bus Error exception
STACK TRACE:
-Traceback= 40165800 4038D0FC 4025C7BC 4026287C 4029581C 402EECF8 400C0144
CONTEXT:
$0 : 00000000, AT : 00000000, v0 : 00000044, v1 : 0FE00020
a0 : 00000000, a1 : 0FE00000, a2 : 00000000, a3 : 39EC6AAB
t0 : 00000030, t1 : 34008D01, t2 : 34008100, t3 : FFFF00FF
t4 : 400C01E8, t5 : 00000001, t6 : 00000001, t7 : 00000001
s0 : 40DCDD20, s1 : 0FE00000, s2 : 00000000, s3 : 000005DC
s4 : 00000000, s5 : 0FE00020, s6 : 00000004, s7 : 414CF120
t8 : 41680768, t9 : 00000000, k0 : 00000000, k1 : FFFF8DFD
gp : 40CB9780, sp : 4105BFE8, s8 : 41652BA0, ra : 4038D0FC
EPC : 0x40165800, SREG : 0x34008D03, Cause : 0x00002008
ErrorEPC : 0xBFC22B94
-Process Traceback= No Extra Traceback
See Troubleshooting Line Card Crashes on the Cisco 12000 Series Internet Router for more details.
The first thing to do is to find out which memory location (also known as the "address" or "address operand") the router tried to access when the bus error occurred. This information indicates whether the fault most likely lies with the Cisco IOS Software or with the router hardware. In the example "System restarted by bus error at PC 0x30EE546, address 0xBB4C4", the memory location that the router tried to access is 0xBB4C4. Do not confuse it with the program counter (PC) value, 0x30EE546.
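If you review show version output from many routers, a small script can separate the two values so that you always check the address rather than the PC. The following Python sketch assumes the restart reason uses the wording shown in this document; the function name parse_bus_error is hypothetical and is not part of any Cisco tool.

```python
import re
from typing import Optional, Tuple

# Matches the tail of the restart reason in "show version", for example
#   "System restarted by bus error at PC 0x30EE546, address 0xBB4C4".
# Only the part from "by bus error" onward is matched, because the start of
# the message varies ("restarted", "was restarted", "returned to ROM").
BUS_ERROR_RE = re.compile(
    r"by bus error at PC (0x[0-9A-Fa-f]+), address (0x[0-9A-Fa-f]+)"
)


def parse_bus_error(line: str) -> Optional[Tuple[int, int]]:
    """Return (pc, address) as integers, or None if the line is not a bus error."""
    match = BUS_ERROR_RE.search(line)
    if match is None:
        return None
    pc_text, address_text = match.groups()
    return int(pc_text, 16), int(address_text, 16)


result = parse_bus_error(
    "System restarted by bus error at PC 0x30EE546, address 0xBB4C4"
)
if result is not None:
    pc, address = result
    # The address operand, not the PC, is the value to look up with show region.
    print(f"PC = 0x{pc:X}, address = 0x{address:X}")
```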
The second thing to do is determine the type of processor in the router, because memory address locations differ depending on the type of processor. There are two main types of processors in Cisco routers: 68000 processors and Reduced Instruction Set Computing (RISC) processors.
68000 Processors
This is part of a show version output that indicates that the router has a 68000 processor:
cisco 2500 (68030) processor (revision D) with 8192K/2048K bytes of memory.
Router platforms that have 68000 processors include:
Cisco 1000 Series Routers
Cisco 1600 Series Routers
Cisco 2500 Series Routers
Cisco 4000 Series Routers
Route Processor (RP) Modules on Cisco 7000 (RP) Series Routers
Reduced Instruction Set Computing (RISC) Processors
This is part of a show version output that indicates that the router has a RISC processor:
cisco 3640 (R4700) processor (revision 0x00) with 49152K/16384K bytes of memory.
The R in (R4700) indicates a RISC processor.
Router platforms that have RISC processors include:
Cisco 3600 Series Routers
Cisco 4500 Series Routers
Cisco 4700 Series Routers
Route Switch Processor (RSP) Modules on Cisco 7500 Series and Cisco 7000 (RSP7000) Series Routers
Network Processor Engine (NPE) Modules on Cisco 7200 Series Routers
Multilayer Switch Feature Card (MSFC) on the Cisco 7600 Series Routers or Catalyst 6000 Switch
Performance Routing Engine (PRE) Modules on Cisco 10000 Series Internet Routers
Gigabit Route Processor (GRP) Modules on Cisco 12000 Series Internet Routers
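The rule above (a 680x0 CPU name versus a CPU name that starts with R) can be applied mechanically to the processor line of show version. This Python sketch is a heuristic: it assumes the line is formatted like the two examples in this document, and other platforms may format the parenthesized field differently, so a result of None simply means "check manually". The function name processor_family is hypothetical.

```python
import re
from typing import Optional

# Matches the processor line of "show version", for example:
#   "cisco 2500 (68030) processor (revision D) with 8192K/2048K bytes of memory."
#   "cisco 3640 (R4700) processor (revision 0x00) with 49152K/16384K bytes of memory."
# Assumption: the CPU type is the parenthesized token just before "processor".
CPU_RE = re.compile(r"^cisco\s+\S+\s+\((?P<cpu>[^)]+)\)\s+processor", re.IGNORECASE)


def processor_family(line: str) -> Optional[str]:
    """Classify a show version processor line as "68000", "RISC", or None."""
    match = CPU_RE.search(line.strip())
    if match is None:
        return None
    cpu = match.group("cpu")
    if cpu.startswith("680"):  # 68000-family parts such as the 68030
        return "68000"
    if cpu.upper().startswith("R"):  # RISC parts such as the R4700
        return "RISC"
    return None  # unrecognized naming; check the platform documentation


print(processor_family("cisco 2500 (68030) processor (revision D) with 8192K/2048K bytes of memory."))
print(processor_family("cisco 3640 (R4700) processor (revision 0x00) with 49152K/16384K bytes of memory."))
```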
Once you have determined the address and the processor type, you can start with more detailed troubleshooting.
Troubleshooting Bus Error Crashes on 68000 Processor Platforms
Once you know the address that the router accessed when the bus error occurred, use the show region command to determine which memory region the address belongs to. If the address does not fall within any of the ranges displayed in the show region output, the router tried to access an invalid address, which indicates a Cisco IOS Software problem. Use the Output Interpreter Tool (registered customers only) to decode the output of the show stacks command and identify the Cisco IOS Software bug that causes the bus error.
On the other hand, if the address falls within one of the ranges in the show region output, it means that the router accessed a valid memory address, but the hardware corresponding to that address does not respond properly. This indicates a hardware problem.
Here is an example of the show region output:
Router#show region
Region Manager:

Start       End         Size(b)   Class  Media  Name
0x00000000  0x007FFFFF  8388608   Local  R/W    main
0x00001000  0x0001922F  98864     IData  R/W    main:data
0x00019230  0x000666B3  316548    IBss   R/W    main:bss
0x000666B4  0x007FEFFF  7965004   Local  R/W    main:heap
0x007FF000  0x007FFFFF  4096      Local  R/W    main:flhlog
0x00800000  0x009FFFFF  2097152   Iomem  R/W    iomem
0x03000000  0x037FFFFF  8388608   Flash  R/O    flash
0x0304033C  0x037A7D3F  7764484   IText  R/O    flash:text
Note: This command is not available in some earlier Cisco IOS Software versions. Starting with Cisco IOS Software Release 12.0(9), the show region output is included in the show tech-support output.
Addresses are displayed in hexadecimal format. The addresses that fall within the "Start" and "End" ranges are valid memory addresses.
Main corresponds to main memory or dynamic RAM (DRAM).
iomem corresponds to input/output (I/O) memory, whose physical implementation varies by platform: for example, it is DRAM on the Cisco 2500 and shared RAM (SRAM) on the Cisco 4000.
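Checking an address against show region by eye is error-prone, especially because regions are nested (main:heap lies inside main). The following Python sketch parses the data rows, assuming the six-column Start/End/Size(b)/Class/Media/Name layout shown above, and lists every region that contains a given address. The helper names parse_show_region and regions_containing are hypothetical.

```python
import re
from typing import List, NamedTuple


class Region(NamedTuple):
    start: int
    end: int
    size: int
    cls: str    # "Class" column, e.g. Local, IData, Iomem, Flash
    media: str  # "Media" column, e.g. R/W, R/O
    name: str


# One data row of "show region": Start End Size(b) Class Media Name
ROW_RE = re.compile(
    r"^\s*(0x[0-9A-Fa-f]+)\s+(0x[0-9A-Fa-f]+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)


def parse_show_region(text: str) -> List[Region]:
    """Parse show region output; prompt and header lines do not match and are skipped."""
    regions = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            start, end, size, cls, media, name = match.groups()
            regions.append(Region(int(start, 16), int(end, 16), int(size), cls, media, name))
    return regions


def regions_containing(address: int, regions: List[Region]) -> List[Region]:
    """Return every region whose inclusive Start-End range contains the address."""
    return [r for r in regions if r.start <= address <= r.end]
```

An empty result from regions_containing corresponds to the software case described above; a non-empty result points at the hardware behind the listed regions.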
Return to the earlier example, "System restarted by bus error at PC 0x30EE546, address 0xBB4C4". This crash comes from a Cisco 2500 router with the show region output shown above. The address 0xBB4C4 is equivalent to 0x000BB4C4, which falls within the range of main and, more specifically, main:heap (0x000666B4-0x007FEFFF). As mentioned earlier, main corresponds to main memory, or DRAM, so check the DRAM chips.
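The lookup in this example can be reproduced in a few lines. This Python sketch hard-codes the ranges from the Cisco 2500 show region output above and reports the smallest region that contains the address, on the assumption that the most specific name (main:heap rather than main) is the most useful; the function name most_specific_region is hypothetical.

```python
from typing import List, Optional, Tuple

# (start, end, name) rows copied from the Cisco 2500 show region example above.
REGIONS: List[Tuple[int, int, str]] = [
    (0x00000000, 0x007FFFFF, "main"),
    (0x00001000, 0x0001922F, "main:data"),
    (0x00019230, 0x000666B3, "main:bss"),
    (0x000666B4, 0x007FEFFF, "main:heap"),
    (0x007FF000, 0x007FFFFF, "main:flhlog"),
    (0x00800000, 0x009FFFFF, "iomem"),
    (0x03000000, 0x037FFFFF, "flash"),
    (0x0304033C, 0x037A7D3F, "flash:text"),
]


def most_specific_region(address: int) -> Optional[str]:
    """Return the name of the smallest region containing address, or None."""
    matches = [(end - start, name) for start, end, name in REGIONS if start <= address <= end]
    return min(matches)[1] if matches else None


# Leading zeros do not change the value: 0xBB4C4 and 0x000BB4C4 are the same address.
assert 0xBB4C4 == 0x000BB4C4
print(most_specific_region(0xBB4C4))  # main:heap
```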
If the router is new, or if it has been moved from one location to another, the memory chips often become loose. Reseat the memory chips, or press them firmly into their slots. In most cases, this is enough to resolve this type of crash.
For bus error crashes with addresses that do not fall within the show region address ranges, use the Output Interpreter Tool to decode the output of the show stacks command and identify the Cisco IOS Software bug that is causing the bus error. If you are uncertain which bug ID matches or which Cisco IOS Software version contains the fix, one option is to upgrade to the latest version in your release train. This often resolves the issue, because the latest version usually contains fixes for a large number of bugs.
Read the section on Troubleshooting Bus Error Crashes on 68000 Processor Platforms before you proceed with this section.
On RISC processors, Cisco IOS Software uses virtual addresses, which the Translation Lookaside Buffer (TLB) translates into physical addresses. The address reported by a bus error on a RISC processor is therefore a virtual address, not a physical address as on 68000 processors.
Use the output of the show region command to check the address reported by the bus error. Consider this example:
System was restarted by bus error at PC 0x60104864, address 0xC
Using the show region command output below, you can verify that 0xC is not a valid virtual address (the lowest region starts at 0x40000000), so you can conclude that the bus error was caused by a software problem. Use the Output Interpreter Tool (registered customers only) to decode the output of the show stacks or show tech-support command (from enable mode) and identify the Cisco IOS Software bug that is causing the bus error.
Another advantage of the show region command is that it accounts for the amount of memory installed on the router, on which the memory mapping depends. For example, if you have 64 MB of DRAM (64 x 1024 x 1024 = 67108864 bytes = 0x4000000 bytes), the DRAM range is 0x60000000-0x63FFFFFF. The show region command confirms this:
Router#show version | i of memory
cisco RSP2 (R4700) processor with 65536K/2072K bytes of memory.
Router#show region
Region Manager:

Start       End         Size(b)   Class  Media  Name
0x40000000  0x40001FFF  8192      Iomem  REG    qa
0x40002000  0x401FFFFF  2088960   Iomem  R/W    memd
0x48000000  0x48001FFF  8192      Iomem  REG    QA:writethru
0x50002000  0x501FFFFF  2088960   Iomem  R/W    memd:(memd_bitswap)
0x58002000  0x581FFFFF  2088960   Iomem  R/W    memd:(memd_uncached)
0x60000000  0x63FFFFFF  67108864  Local  R/W    main
0x60010908  0x60C80B11  13042186  IText  R/O    main:text
0x60C82000  0x60F5AF1F  2985760   IData  R/W    main:data
0x60F5AF20  0x610E35FF  1607392   IBss   R/W    main:BSS
0x610E3600  0x611035FF  131072    Local  R/W    main:fastheap
0x61103600  0x63FFFFFF  49269248  Local  R/W    main:heap
0x80000000  0x83FFFFFF  67108864  Local  R/W    main:(main_k0)
0x88000000  0x88001FFF  8192      Iomem  REG    QA_k0
0x88002000  0x881FFFFF  2088960   Iomem  R/W    memd:(memd_k0)
0xA0000000  0xA3FFFFFF  67108864  Local  R/W    main:(main_k1)
0xA8000000  0xA8001FFF  8192      Iomem  REG    QA_k1
0xA8002000  0xA81FFFFF  2088960   Iomem  R/W    memd:(memd_k1)
If a bus error reports address 0x65FFFFFF, the show region output, which takes the installed memory into account, shows that the address lies outside every range (main ends at 0x63FFFFFF). It is therefore an illegal address, which indicates a software bug.
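The arithmetic behind the 64 MB example, and the reason 0x65FFFFFF is rejected, fits in a few lines. This Python sketch assumes that main DRAM is mapped contiguously upward from 0x60000000, as in the RSP2 output above; the function name main_dram_range is hypothetical, and on a real router the show region output remains the authority.

```python
from typing import Tuple

MB = 1024 * 1024  # bytes per megabyte


def main_dram_range(dram_mb: int, base: int = 0x60000000) -> Tuple[int, int]:
    """Return the inclusive (start, end) addresses of main DRAM.

    Assumption: DRAM is mapped contiguously upward from `base`; 0x60000000
    is the base in the RSP2 example above and may differ on other platforms.
    """
    size = dram_mb * MB
    return base, base + size - 1


start, end = main_dram_range(64)
print(f"64 MB = {64 * MB} bytes = 0x{64 * MB:X} bytes")
print(f"main: 0x{start:08X}-0x{end:08X}")
print("0x65FFFFFF inside main:", start <= 0x65FFFFFF <= end)
```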
Use the show region command to verify whether the address indicated by the bus error is within the address ranges used by the router.
If the address falls within a virtual address range, replace the hardware corresponding to this range.
If the address does not fall within a virtual address range, use the Output Interpreter Tool (registered customers only) to decode the output of the show stacks or show tech-support command (from enable mode) and identify the Cisco IOS Software bug that is causing the bus error.
Give serious consideration to installing the most recent maintenance release of the Cisco IOS Software train that you are currently running.
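The steps above amount to a simple decision rule. This Python sketch applies it to the ranges copied from the RSP2 show region example in this section; the triage function and its return values are hypothetical, and a real check should use the show region output from the router that crashed, because the ranges depend on the platform and the installed memory.

```python
from typing import List, Tuple

# (start, end, name) rows copied from the RSP2 show region example above.
# These ranges are specific to that router and its 64 MB of DRAM.
RSP2_REGIONS: List[Tuple[int, int, str]] = [
    (0x40000000, 0x40001FFF, "qa"),
    (0x40002000, 0x401FFFFF, "memd"),
    (0x48000000, 0x48001FFF, "QA:writethru"),
    (0x50002000, 0x501FFFFF, "memd:(memd_bitswap)"),
    (0x58002000, 0x581FFFFF, "memd:(memd_uncached)"),
    (0x60000000, 0x63FFFFFF, "main"),
    (0x60010908, 0x60C80B11, "main:text"),
    (0x60C82000, 0x60F5AF1F, "main:data"),
    (0x60F5AF20, 0x610E35FF, "main:BSS"),
    (0x610E3600, 0x611035FF, "main:fastheap"),
    (0x61103600, 0x63FFFFFF, "main:heap"),
    (0x80000000, 0x83FFFFFF, "main:(main_k0)"),
    (0x88000000, 0x88001FFF, "QA_k0"),
    (0x88002000, 0x881FFFFF, "memd:(memd_k0)"),
    (0xA0000000, 0xA3FFFFFF, "main:(main_k1)"),
    (0xA8000000, 0xA8001FFF, "QA_k1"),
    (0xA8002000, 0xA81FFFFF, "memd:(memd_k1)"),
]


def triage(address: int) -> Tuple[str, List[str]]:
    """Return ("hardware", regions) if the address is mapped, else ("software", [])."""
    hits = [name for start, end, name in RSP2_REGIONS if start <= address <= end]
    return ("hardware", hits) if hits else ("software", [])


for addr in (0xC, 0x65FFFFFF, 0x61200000):
    verdict, regions = triage(addr)
    print(f"0x{addr:08X}: suspect {verdict} {regions}")
```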
A special type of bus error crash is one caused by a corrupted program counter (PC). The PC value is the location of the instruction that the processor was executing when the bus error occurred. When a bus error caused by a corrupted PC occurs, this message appears on the console:
%ALIGN-1-FATAL: Corrupted program counter pc=0x0, ra=0x601860BC, sp=0x60924540, at=0x60224854
In this case, the PC has jumped to address 0x0 (probably because of a null pointer), which is not where the instruction is located. This is a software problem, so there is no need to check the address with the show region command.
On other RISC platforms (Cisco 3600, 4500, and so forth), you get a SegV exception when jumping to an illegal PC, not a bus error.
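When console logs are collected automatically, the corrupted-PC message can be recognized and its fields extracted so that this case is not mistakenly sent to a show region lookup. This Python sketch assumes the message format shown above; the function name parse_corrupted_pc is hypothetical.

```python
import re
from typing import Dict, Optional

MARKER = "%ALIGN-1-FATAL: Corrupted program counter"
# "name=0xVALUE" fields that follow the marker.
FIELD_RE = re.compile(r"\b(pc|ra|sp|at)=(0x[0-9A-Fa-f]+)")


def parse_corrupted_pc(line: str) -> Optional[Dict[str, int]]:
    """Return the pc/ra/sp/at fields as integers, or None for other messages."""
    if MARKER not in line:
        return None
    return {name: int(value, 16) for name, value in FIELD_RE.findall(line)}


fields = parse_corrupted_pc(
    "%ALIGN-1-FATAL: Corrupted program counter "
    "pc=0x0, ra=0x601860BC, sp=0x60924540, at=0x60224854"
)
if fields is not None and fields.get("pc") == 0:
    print("PC jumped to 0x0: software problem, no show region lookup needed")
```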
Another type of bus error crash that occurs from time to time is one where the PC value equals the address value. For instance:
System returned to ROM by bus error at PC 0x606B34F0, address 0x606B34F0
From the crashinfo file:
Unexpected exception, CPU signal 10, PC = 0x606B34F0
$0 : 00000000, AT : A001A24A, v0 : 00000000, v1 : 00000000
a0 : 00000000, a1 : 429CC394, a2 : 00000000, a3 : 62544344
t0 : 6069F424, t1 : 3400FF00, t2 : FFFFFFFB, t3 : 00000000
t4 : 606B8E68, t5 : 80000000, t6 : AA5C1022, t7 : 62FDE9D4
s0 : 62300000, s1 : 6281A1B8, s2 : 80007E20, s3 : 00000001
s4 : 00000001, s5 : 00000000, s6 : 62310000, s7 : 62544344
t8 : 62FDEA1C, t9 : 0D0D0D0D, k0 : 623079C0, k1 : 00000014
gp : 620B9E20, sp : 61E7E300, s8 : 00000000, ra : 606B8E68
EPC : 606B34F0, ErrorEPC : 606B8E68, SREG : 3400FF02
Cause 00004018 (Code 0x6): Instruction Bus Error exception
-Traceback= 606B34F0 606B8E68
Notice that the k1 register value is 0x14 (hexadecimal), which equals 20 in decimal. This value points to a Cache Parity Exception. In this particular case, the parity error is not handled properly and is masked by a bus error: the router crashed because of a software bus error in the function that handles the Cache Parity Exception.
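Both clues in this crashinfo output can be read programmatically. This Python sketch assumes the register-dump format shown above. It takes the exception code from bits 6..2 of the MIPS Cause register (0x00004018 yields code 6, Instruction Bus Error, matching the "(Code 0x6)" annotation) and flags k1 = 0x14; note that the k1 rule comes from this document, not from the MIPS architecture. The function name analyze_crashinfo is hypothetical.

```python
import re
from typing import Dict, Optional

# "name : value" pairs in the register dump, e.g. "k1 : 00000014".
REGISTER_RE = re.compile(r"(\S+)\s*:\s*([0-9A-Fa-f]{8})\b")
# The Cause line, e.g. "Cause 00004018 (Code 0x6): Instruction Bus Error exception".
CAUSE_RE = re.compile(r"\bCause\s+([0-9A-Fa-f]{8})\b")

# MIPS exception codes for bus errors (ExcCode field of the Cause register).
BUS_ERROR_CODES = {6: "Instruction Bus Error", 7: "Data Bus Error"}

# Per this document: k1 = 0x14 (20 decimal) points to a Cache Parity Exception.
CACHE_PARITY_K1 = 0x14


def analyze_crashinfo(text: str) -> Dict[str, object]:
    """Summarize a crashinfo register dump in the format shown above."""
    registers = {name: int(value, 16) for name, value in REGISTER_RE.findall(text)}
    exc_code: Optional[int] = None
    cause_match = CAUSE_RE.search(text)
    if cause_match:
        cause = int(cause_match.group(1), 16)
        exc_code = (cause >> 2) & 0x1F  # ExcCode is bits 6..2 of Cause
    return {
        "exc_code": exc_code,
        "exception": BUS_ERROR_CODES.get(exc_code, "not a bus error code"),
        "cache_parity_suspected": registers.get("k1") == CACHE_PARITY_K1,
    }
```

If cache_parity_suspected is True, treat the crash as the processor memory parity error case described next.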
You should consider this crash as a regular processor memory parity error crash and follow the recommendations given in Processor Memory Parity Errors (PMPEs).
Also consider upgrading to a Cisco IOS Software release that includes the fix for CSCdv68388, "Change cache error exception handler to resume not crash", which is fixed in Cisco IOS Software Release 12.2(10) and later.
This section focuses on general troubleshooting techniques for bus error exception boot loops:
Cisco IOS software loaded does not support installed hardware
Verify that all network cards are supported by the Cisco IOS software. The Software Advisor (registered customers only) gives you the minimum Cisco IOS software versions needed for your hardware. If your router uses a boot image, as the Cisco 7200 and Cisco 7500 Series routers do, also verify that the bootflash image supports the installed hardware.
On 2600 and 3600 routers, the router's I/O memory is configurable as a percentage of the main memory. If the I/O memory settings are inappropriate for the installed network modules or WAN interface cards (WICs), the 2600/3600 platform may have trouble booting and may crash with bus errors.
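The percentage translates directly into the two memory figures in the processor line of show version. For example, the Cisco 3640 output earlier in this document (49152K/16384K) is consistent with 64 MB of DRAM and 25 percent I/O memory. This Python sketch shows the arithmetic only; it assumes the split is an exact percentage of total DRAM, while the allowed percentages and their granularity are platform-specific. The function name split_dram is hypothetical.

```python
from typing import Tuple


def split_dram(total_kb: int, iomem_percent: int) -> Tuple[int, int]:
    """Return (processor_kb, iomem_kb) when iomem_percent of DRAM is I/O memory.

    Assumption: the split is an exact percentage of total DRAM.
    """
    iomem_kb = total_kb * iomem_percent // 100
    return total_kb - iomem_kb, iomem_kb


print(split_dram(65536, 25))  # (49152, 16384)
```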
If a software configuration change has recently been made, and the router is in a booting loop, a software bug may be causing this issue.
If the router cannot boot, bypass the configuration to determine whether the configuration is causing the issue. Follow these steps:
Break into ROMMON by sending the break sequence to the router during the first 60 seconds of boot up.
From ROM Monitor, use the confreg command to set the configuration register to a value, such as 0x2142, that makes the router ignore its configuration:
rommon 1 > confreg 0x2142
You must reset or power cycle for new config to take effect
rommon 2 > reset
If the router boots without any errors, there is a configuration issue causing the problem. Verify that your configuration is supported in the Cisco IOS software and by the hardware. If it is supported, use the Bug Toolkit (registered customers only) to identify any software bugs that you may be experiencing. Give serious consideration to installing the most recent maintenance release of the Cisco IOS software train that you are currently running.
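The value 0x2142 used in the ROM Monitor step above differs from the commonly used 0x2102 only in bit 6 (0x0040), which tells the router to ignore the contents of NVRAM, so the startup configuration is not loaded. This Python sketch decodes just the boot field and that bit; the other configuration register bits (console speed, break handling, and so on) are deliberately left out, and the function name describe_confreg is hypothetical.

```python
from typing import Dict

BOOT_FIELD_MASK = 0x000F   # bits 0-3: boot field
IGNORE_NVRAM_BIT = 0x0040  # bit 6: ignore NVRAM contents (startup configuration)


def describe_confreg(value: int) -> Dict[str, object]:
    """Decode the two configuration register fields relevant to this procedure."""
    return {
        "boot_field": value & BOOT_FIELD_MASK,
        "ignores_startup_config": bool(value & IGNORE_NVRAM_BIT),
    }


for value in (0x2102, 0x2142):
    print(f"0x{value:04X}", describe_confreg(value))
```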
If you are experiencing a bus error exception booting loop, it may be caused by mis-seated hardware. For lower-end platforms such as the 3600 or 4000 router, reseat the network modules/network processors.
For higher-end platforms such as the 7200 or 7500 routers, reseat the processor, VIP, port adapters, or line card that is reloading due to a bus error exception.
The information contained in the bus error does not help to isolate the hardware. Therefore, it is important to remove and reinsert cards to find the problem hardware. Here are some recommended steps to isolate the problem:
If the router no longer experiences the continuous loop after you follow the troubleshooting steps above, the problem may have been caused by a mis-seated network module. Monitor the router for 24 hours to make sure that it continues to function without the issue recurring.
If you still need assistance after following the troubleshooting steps above and want to open a case with Cisco Technical Support, be sure to include the following information for troubleshooting a bus error or bus error exception:
Note: Do not manually reload or power-cycle the router before you collect the above information, unless doing so is required to troubleshoot a bus error exception. A reload or power cycle can erase information that is needed to determine the root cause of the problem.