The information in this document is based on these software and hardware versions:
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
In order to interpret a VIP crash, it is important to first understand the basic architecture of the VIP. The figure in this section shows the functional block diagram of the VIP2, which includes these components:
Orion reduced instruction set computing (RISC) CPU and associated circuitry, which includes Dynamic RAM (DRAM), L2 cache, RENO application-specific integrated circuit (ASIC), and Boot ROM.
CyBus ASIC—The component that controls and transfers packets between the VIP2 Static RAM (SRAM) and the system packet memory (MEMD) across the CyBus or CxBus.
Packet memory ASIC—Responsible for moving packets between port adapters and SRAM.
Peripheral Component Interconnect (PCI) buses—Data paths between the port adapters and VIP2 SRAM.
Bridges—Responsible for isolating the individual PCI buses of the port adapters.
The VIP2 microcode (firmware) is an image that provides card-specific software instructions. A programmable read-only memory (PROM) device on the VIP2 contains a default microcode boot image that assists the system in finding and loading the microcode image from the Cisco IOS® software bundle or from Flash memory. The microcode boot image in the PROM initializes the VIP2, and then assists downloading the VIP2 microcode image. All interfaces of the same type load the same microcode image, either from the Cisco IOS software bundle or from Flash memory. Although Flash memory can store multiple microcode versions for a specific interface type, only one image can load at startup.
The show controllers cbus command displays the currently loaded and running microcode version for each interface processor and the VIP2. The show startup-config command shows the current system instructions for loading microcode at startup.
When you troubleshoot, you can use the figure in this section as a guide to read information from a VIP crashinfo file or the syslog. As an example, look at this syslog output, which shows that bad parity was found on a read from the VIP SRAM:
From this output, you can see that this VIP is a VIP2-50.
The difference between a VIP2-10, VIP2-15, VIP2-20, and a VIP2-40 is the amount of DRAM and SRAM on each. The various VIP2s (if they have not been upgraded) can be distinguished in the show diag command output by the memory configurations shown in this table:
The information contained in the crashinfo file can be invaluable when you try to resolve software issues or diagnose the underlying cause of system crashes. In addition to logging information and a stack trace for the VIP, the crashinfo file contains extensive memory and context information. Each time a VIP crashes, the VIP attempts to write a crashinfo file to the bootflash of the RSP. Crashinfo files are stored in this format:
crashinfo_vip_<slot>_<date>-<time>
You can issue the dir command in order to locate VIP crashinfo files as shown here:
Directory of bootflash:/
1 -rw- 3951876 Jan 01 2000 00:01:22 rsp-boot-mz.111-22.CA
2 -rw- 162641 Jun 21 2000 12:53:40 crashinfo_vip_0_20000621-125340
3 -rw- 162778 Jun 21 2000 13:00:10 crashinfo_vip_0_20000621-130010
7602176 bytes total (3324492 bytes free)
The bootflash of this router contains two VIP crashinfo files. Issue the show file or more command in order to view and capture the contents of these files according to this procedure:
Start logging with your terminal program.
Issue the term length 0 command.
Issue the more bootflash:<crashinfo filename> command.
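Once the dir output has been captured, the crashinfo files can be picked out of it with a short script. This is a minimal sketch, not a Cisco tool. It assumes the name layout seen in the listing above (crashinfo_vip_, a number taken here to be the slot, then a date and time stamp). The function name find_vip_crashinfo and the SAMPLE_DIR text are illustrative only.

```python
import re
from datetime import datetime

# Assumed name layout, inferred from the sample listing:
# crashinfo_vip_<slot>_<YYYYMMDD>-<HHMMSS>
CRASHINFO_RE = re.compile(r"crashinfo_vip_(\d+)_(\d{8}-\d{6})")


def find_vip_crashinfo(dir_output):
    """Return (filename, slot, timestamp) for each VIP crashinfo file found."""
    found = []
    for match in CRASHINFO_RE.finditer(dir_output):
        slot = int(match.group(1))
        stamp = datetime.strptime(match.group(2), "%Y%m%d-%H%M%S")
        found.append((match.group(0), slot, stamp))
    return found


# Sample text copied from the dir listing in this document.
SAMPLE_DIR = """\
1 -rw- 3951876 Jan 01 2000 00:01:22 rsp-boot-mz.111-22.CA
2 -rw- 162641 Jun 21 2000 12:53:40 crashinfo_vip_0_20000621-125340
3 -rw- 162778 Jun 21 2000 13:00:10 crashinfo_vip_0_20000621-130010
"""

for name, slot, stamp in find_vip_crashinfo(SAMPLE_DIR):
    print(name, slot, stamp.isoformat())
```

Sorting the result by timestamp shows which crash happened most recently, which is usually the file to capture first.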
VIP crashes are classified into several categories based on the cause of the crash. Any time a non-recoverable error is found, the VIP crashes. These errors can be the result of parity errors, software or hardware that causes a negative acknowledgement (NACK) to be placed on the CyBus, or software problems. This section provides information on each of these error types.
Parity errors occur on a VIP when the hardware attempts to check the validity of data by comparing computed parity values to previous parity values for the same data. A single bit flip in the data can result in a parity error. When you diagnose parity errors on a VIP, it is important to understand each location at which parity is checked and at which parity errors can potentially occur. This diagram outlines this information. In addition, refer to Cisco 7500 VIP Fault Tree Analysis for more information about parity errors.
As shown in this diagram, there are seven different types of parity errors that can occur on a VIP. Note that errors can be received from another source and might not have originated within the VIP itself. The source of the parity error can be from the Route/Switch Processor (RSP), another VIP, or from poorly seated or faulty port adapters. In order to properly understand a VIP crash, it is important to diagnose the source of the crash.
It is also important to understand that data with bad parity can be reported by several of the parity checking devices on the VIP and Cisco 7500 Series Router for any single read or write operation. For example, if the VIP reads a packet on a transmit queue on the RSP into its own SRAM, and there is a parity error in the SRAM of the RSP, then you see error messages from the MD ASIC on the RSP, the CYA ASIC on the VIP, and also the PCI/packet memory ASIC on the VIP.
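The basic parity check described in this section can be illustrated with a few lines of code. This is a minimal sketch that assumes one even-parity bit per 32-bit word for simplicity. The actual parity width used by each VIP component is not stated in this document, and the function names are illustrative.

```python
def parity_bit(value):
    """Return the even-parity bit: 1 if the value has an odd number of 1 bits."""
    return bin(value).count("1") % 2


def check_parity(value, stored_parity):
    """Recompute parity on read and compare it with the parity stored on write."""
    return parity_bit(value) == stored_parity


word = 0x65776179
stored = parity_bit(word)       # parity computed when the word was written

one_flip = word ^ (1 << 5)      # the same word with a single bit flipped
two_flips = word ^ 0b11         # the same word with two bits flipped

print(check_parity(word, stored))       # True: data is intact
print(check_parity(one_flip, stored))   # False: a single bit flip is detected
print(check_parity(two_flips, stored))  # True: an even number of flips is missed
```

The sketch also shows why parity only detects errors. The check reveals that a word is bad, but not which bit changed, so the hardware cannot repair the data.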
This diagram shows fault-tree analysis for VIP crashes:
VIP4 and VIP6 Parity Errors and ECC Detection
The VIP4-50, VIP4-80, and VIP6-80 use an Error-Correcting Code (ECC) with single-bit error correction and double-bit error detection for CPU memory and packet memory. Both memories are Synchronous Dynamic RAM (SDRAM). A single-bit error in SDRAM is corrected, and the system continues to operate normally.
Multibit parity errors at numbers 2 or 3 in this table are fatal events that cause ECC multibit errors to occur. The CPU internal cache and the buses in the system use single-bit parity detection. As shown here, the architecture of the VIP4 and VIP6 is different from that of the VIP2. Therefore, some error messages are not seen, and other error messages are reported differently than they are on the VIP2. In this parity error section, differences between the VIP2, the VIP4, and the VIP6 are noted and explained.
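To show how single-bit correction and double-bit detection behave, here is a small sketch that uses a textbook extended Hamming (8,4) code. This is an assumption chosen for illustration. The document does not describe the actual code or word width used by the VIP4 and VIP6 memory controllers, and the encode and decode names are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                       # index 0: overall parity, 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2          # makes the parity of all 8 bits even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos              # XOR of the positions that hold a 1
    overall = sum(code) % 2              # 0 when the overall parity still holds
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1              # single-bit error: syndrome is its position
        status = "corrected"
    else:
        return "uncorrectable", None     # double-bit error: detected, not fixable
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
single = list(word)
single[6] ^= 1                           # one flipped bit
double = list(word)
double[2] ^= 1                           # two flipped bits
double[5] ^= 1

print(decode(word))     # ('ok', 11)
print(decode(single))   # ('corrected', 11)
print(decode(double))   # ('uncorrectable', None)
```

The three results mirror the behavior described above: clean data passes, a single-bit error is repaired silently, and a multibit error can only be reported, which is why it is fatal.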
Cache parity error exceptions occur when bad parity is discovered in the CPU or in the primary data cache. The parity error might have occurred in the VIP DRAM, the DRAM controller, the primary cache, or in the CPU itself. Parity errors discovered in this location are also referred to as processor memory parity errors (PMPEs). These errors result in an immediate crash of the VIP and the output looks similar on both VIPs and RSPs. A sig value of twenty (sig=20) indicates that a cache parity error exception has occurred. The sig value is displayed in the system log messages for the crash.
Recent code also provides a meaningful verbose line as shown here:
Oct 21 00:11:14.913: %VIP2-1-MSG: slot0 System reloaded by a Cache Parity Exception
Oct 21 00:11:14.913: %VIP2-1-MSG: slot0 System Reload called from 0x60125C8C,
Oct 21 00:11:14.913: %VIP2-1-MSG: slot0 System exception: sig=20, code=0x20025B69,
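When you scan a long syslog, it can help to extract the sig value automatically. This is a minimal sketch that assumes the "System exception: sig=" wording shown in the sample lines of this document. The lookup table contains only the two signal values that this document explains, and the name describe_exception is illustrative.

```python
import re

# Only the signal values explained in this document are listed here.
KNOWN_SIGNALS = {
    20: "cache parity error exception",
    10: "bus error exception",
}

# Accepts both "sig=20" and the "sig22" form that appears in some samples.
SIG_RE = re.compile(r"System exception: sig=?(\d+)")


def describe_exception(line):
    """Return (sig, description) for a 'System exception' syslog line, or None."""
    match = SIG_RE.search(line)
    if match is None:
        return None
    sig = int(match.group(1))
    return sig, KNOWN_SIGNALS.get(sig, "not described in this document")


line = ("Oct 21 00:11:14.913: %VIP2-1-MSG: slot0 System exception: "
        "sig=20, code=0x20025B69,")
print(describe_exception(line))   # (20, 'cache parity error exception')
```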
Information contained in the VIP crashinfo file also points to the same parity error location in the primary data cache:
Error: primary data cache, fields: data,
virtual addr 0x6058A000, physical addr(21:3) 0x18A000, vAddr(14:12) 0x2000
virtual address corresponds to main:data, cache word 0
Low Data High Data Par Low Data High Data Par
L1 Data : 0:0xFEFFFEFE 0x65776179 0x13 1:0x20536572 0x76657220 0x89
2:0x646F6573 0x206E6F74 0x9C 3:0x20737570 0x706F7274 0xF8
Low Data High Data Par Low Data High Data Par
Mem Data: 0:0xFEFFFEFE 0x65776179 0x13 1:0x20536572 0x76657220 0x89
2:0x646F6573 0x206E6F74 0x9C 3:0x20737570 0x706F7274 0xF8
A primary cache parity error, or PMPE, can be a transient error. If this is the first instance of a PMPE, you can usually safely ignore it. However, if the same VIP experiences a second or subsequent PMPE, replace the VIP. Sometimes replacement of the DRAM alone also resolves the issue.
VIP4 and VIP6 Note—Parity errors which occur in the CPU internal cache and in the CyAsic are detected as cache parity error exceptions. Single bit parity errors in the CPU memory are corrected and no action needs to be taken. Multi-bit parity errors in the CPU memory are detected as a procmem ecc multi-bit parity error. The CPU memory in the VIP should be replaced if a procmem ecc multi-bit parity error is reported.
Oct 25 09:30:54.708: %VIP4-50 RM5271-1-MSG: slot4 PMA error register1 00000000
Oct 25 09:30:54.716: %VIP4-50 RM5271-1-MSG: slot4 Procmem ECC multi-bit error
Oct 25 09:30:54.724: %VIP4-50 RM5271-1-MSG: slot4 PCI1 master address 00000000
Oct 25 09:30:54.732: %VIP4-50 RM5271-1-MSG: slot4 PCI1 slave address 00000000
Oct 25 09:30:54.740: %VIP4-50 RM5271-1-MSG: slot4 Latched Addresses
Oct 25 09:30:54.748: %VIP4-50 RM5271-1-MSG: slot4 Procmem ECC multi-bit exception
addr 22220000 025F0860
Oct 25 09:30:54.756: %VIP4-50 RM5271-1-MSG: slot4 Procmem ECC multi-bit exception
data 00000000 00000000
Oct 25 09:30:54.764: %VIP4-50 RM5271-1-MSG: slot4 MPU addr exception/WPE address
Oct 25 09:30:54.772: %VIP4-50 RM5271-1-MSG: slot4 MPU WPE addr/WPE data 00000000
Oct 25 09:30:54.780: %VIP4-50 RM5271-1-MSG: slot4 ProcMem addr exception 0 0000000
Oct 25 09:30:54.788: %VIP4-50 RM5271-1-MSG: slot4 Pakmem addr exception 00000000
Oct 25 09:31:15.824: %VIP4-50 RM5271-1-MSG: slot4 System reloaded by a fatal
Oct 25 09:31:15.836: %VIP4-50 RM5271-1-MSG: slot4 caller=0x600BCE18
Oct 25 09:31:15.844: %VIP4-50 RM5271-1-MSG: slot4 System exception: sig22,
code 0x0, context=0x60615F28
Parity Error from CyBus
If these errors are seen when a VIP downloads from the MEMD in the RSP, this usually indicates that another VIP has written bad parity to the MEMD, or that the MEMD has been corrupted. If the source is the MEMD and the errors continue, replace the RSP. Conversely, if the source of the bad parity is another VIP, reseat and, if necessary, replace the VIP that writes the bad parity.
%VIP2-1-MSG: slot1 Nevada Error Interrupt Register 0x3
%VIP2-1-MSG: slot1 CYASIC Error Interrupt register 0x2020000C
%VIP2-1-MSG: slot1 Parity Error internal to CYA
%VIP2-1-MSG: slot1 Parity Error in data from CyBus
!--- Bad parity is received by the VIP from the CyBus.
%VIP2-1-MSG: slot1 CYASIC Other Interrupt register 0x200100
%VIP2-1-MSG: slot1 QE HIGH Priority Interrupt
%VIP2-1-MSG: slot1 CYBUS Error register 0xD001A02, PKT Bus Error register 0x0
%VIP2-1-MSG: slot1 PMA error register = 0070000440000000
%VIP2-1-MSG: slot1 Packet Bus Write Parity error
!--- The bad parity that was received from the CyBus is written to SRAM.
%VIP2-1-MSG: slot1 PCI master address = 0700004
%VIP2-1-MSG: slot1 PA Bay 0 Upstream PCI-PCI Bridge, Handle=0
%VIP2-1-MSG: slot1 DEC21050 bridge chip, config=0x0
%VIP2-1-MSG: slot1 (0x00): cfid = 0x00011011
%VIP2-1-MSG: slot1 (0x04): cfcs= 0x02800147
%VIP2-1-MSG: slot1 (0x08):cfccid = 0x06040002
%VIP2-1-MSG: slot1 (0x0C):cfpmlt = 0x00010000
%VIP2-1-MSG: slot1 (0x18): cfsmlt = 0x00010100
%VIP2-1-MSG: slot1 (0x1C): cfsis = 0x22807020
%VIP2-1-MSG: slot1 Received Master Abort on secondary bus
%VIP2-1-MSG: slot1 (0x20): cfmla = 0x01F00000
Note: The VIP4 and VIP6 show the same error messages about the CyBus parity error, but the packet bus write parity error message is not displayed.
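Because bad parity can be reported by several devices for a single operation, it is useful to see which slot reported which parity message. This is a minimal sketch that assumes the "%FACILITY-severity-MSG: slotN text" shape of the sample lines in this document. The function name and the SAMPLE text, which is assembled from those sample lines, are illustrative.

```python
import re
from collections import defaultdict

# Matches "%<FACILITY>-<severity>-MSG: slot<N> <text>" as seen in the samples.
MSG_RE = re.compile(r"%(.+?)-(\d)-MSG: slot(\d+) (.*)")


def parity_reports_by_slot(log_text):
    """Map each slot number to the parity-related messages that slot reported."""
    reports = defaultdict(list)
    for line in log_text.splitlines():
        match = MSG_RE.search(line)
        if match and "parity" in match.group(4).lower():
            reports[int(match.group(3))].append(match.group(4).strip())
    return dict(reports)


SAMPLE = """\
%VIP2-1-MSG: slot1 CYASIC Error Interrupt register 0x2020000C
%VIP2-1-MSG: slot1 Parity Error internal to CYA
%VIP2-1-MSG: slot1 Parity Error in data from CyBus
%VIP2-1-MSG: slot1 Packet Bus Write Parity error
May 10 09:22:15.520: %VIP2-1-MSG: slot11 Packet Bus Write Parity error
"""

for slot, messages in sorted(parity_reports_by_slot(SAMPLE).items()):
    print(slot, messages)
```

A slot that reports only "Parity Error in data from CyBus" received the bad data from elsewhere, so the grouping is a starting point for locating the source rather than a verdict on that slot.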
VIP I/O Controller and Reno Read Parity Error
Both DRAM controller parity errors and Input/Output (I/O) controller parity errors are detected by the RENO ASIC. A parity error that originates in DRAM or in the DRAM controller is reported as a cache parity exception. A parity error detected by the I/O controller is reported, as shown in this output. Often, parity errors reported by the I/O controller have originated elsewhere and are reported by the I/O controller in addition to messages from other locations.
SRAM parity errors can also be transient, so treat the first occurrence the same way as DRAM parity errors. If the errors persist, replace the SRAM or the VIP.
VIP4 and VIP6 Note—Single bit parity errors in the packet memory are corrected. Multi-bit parity errors in the packet memory are detected as a pakmem ecc multi-bit parity error. The VIP packet memory should be replaced if a pakmem ecc multi-bit parity error is reported.
The PMA ASIC reports a packet bus write parity error any time data with bad parity is written to packet memory. In this example, the VIP is only the messenger, and the problem does not lie in the memory of this VIP.
May 10 09:22:14.520: %VIP2-1-MSG: slot11 PMA error register = 2080002800800200
May 10 09:22:15.520: %VIP2-1-MSG: slot11 Packet Bus Write Parity error
Note: The VIP4 and VIP6 do not show this error message.
VIP PCI Bus Parity Error
Parity errors can be detected in PCI buses 1 and 2, both of which directly interface with the port adapters. These buses are bridged together by a third PCI bus, bus 0, on which parity errors can also be detected. Parity errors that originate from any of the PCI buses are most commonly caused by poorly seated or faulty port adapters. Any time you see these messages in the syslog output of a VIP crash, reseat the port adapter as the first step to resolve the issue.
PCI bus <num> parity error
PCI bus <num> system error
Detected Parity Error on secondary bus
If reseating the port adapter does not solve the issue, the problem lies with either the port adapter or the VIP. Move the port adapter to another bay and insert a second port adapter into the original bay in order to troubleshoot. This usually points to the offending hardware. An example is shown here:
Mar 16 19:34:54: %GEIP-1-MSG: slot9 Nevada Error Interrupt Register = 0x6
Mar 16 19:34:54: %GEIP-1-MSG: slot9 PCI bus 0 system error
Mar 16 19:34:54: %GEIP-1-MSG: slot9 PMA error register = 0080043800100000
Mar 16 19:34:54: %GEIP-1-MSG: slot9 PCI IRDY time-out
Mar 16 19:34:54: %GEIP-1-MSG: slot9 PCI master address = 0800438
Mar 16 19:34:54: %GEIP-1-MSG: slot9 PA Bay 0 Upstream PCI-PCI Bridge, Handle=0
Note: The same errors occur with the VIP4 and the VIP6, but the error messages are different. The errors are detected as a PCI master parity error and a PCI slave parity error. Perform the same steps as outlined for VIP PCI Bus Parity Errors in order to troubleshoot this problem.
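The PCI messages listed in this section can be searched for in a captured syslog. This is a minimal sketch that assumes the exact message wording shown above for the VIP2 and the GEIP. The VIP4 and VIP6 wording is not shown in this document and is therefore not matched. The names pci_errors and SAMPLE are illustrative.

```python
import re

# The three message shapes listed in this section; the group is the bus number.
PCI_PATTERNS = [
    re.compile(r"PCI bus (\d+) parity error"),
    re.compile(r"PCI bus (\d+) system error"),
    re.compile(r"Detected Parity Error on secondary bus"),
]
SLOT_RE = re.compile(r"slot(\d+) ")


def pci_errors(log_text):
    """Return (slot, message) for each line that carries a PCI error message."""
    hits = []
    for line in log_text.splitlines():
        for pattern in PCI_PATTERNS:
            match = pattern.search(line)
            if match:
                slot = SLOT_RE.search(line)
                hits.append((int(slot.group(1)) if slot else None, match.group(0)))
                break
    return hits


# Sample lines copied from the GEIP example in this section.
SAMPLE = """\
Mar 16 19:34:54: %GEIP-1-MSG: slot9 Nevada Error Interrupt Register = 0x6
Mar 16 19:34:54: %GEIP-1-MSG: slot9 PCI bus 0 system error
Mar 16 19:34:54: %GEIP-1-MSG: slot9 PMA error register = 0080043800100000
"""

print(pci_errors(SAMPLE))   # [(9, 'PCI bus 0 system error')]
```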
When the VIP tries to write to an invalid address in MEMD, the RSP places a NACK on the CyBus for that slot. This is usually a software problem, but can also be a hardware issue. For example, in this output, the VIP writes 4 bytes to an invalid address, so the RSP places a NACK on the CyBus for that slot.
VIP crashes not caused by any of the reasons in this document are most commonly due to other software issues. These crashes can be manifested in a variety of different ways. These are general suggestions to reduce the risk of VIP crashes due to software problems and to cope with them if they occur:
Always make sure that the Cisco IOS software image supports the VIP.
Always keep the RSP-BOOT image and the main Cisco IOS software image at the same version.
Ensure that the VIP configuration and the port adapter are supported by the current version of Cisco IOS software.
Check the release notes for the correct Cisco IOS software level and memory requirements.
This is an example of a system log output of a VIP crash due to a software problem:
Possibly the most important piece of information to obtain in the event of a software problem is the crashinfo file for the VIP. See the Obtain a VIP Crashinfo File section for instructions to capture this information.
If the VIP crashes repeatedly, you might see this message when you review the crashinfo file:
00:00:11: %LINK-3-UPDOWN: Interface POS1/0, changed state to up
IOBUS Error Interrupt Status register 0x0
Unexpected exception, CPU signal 10, PC = 0x602A7660
-Traceback= 602A7660 602AB238
The CPU signal 10 error message indicates a bus error exception. Bus errors can be caused by either software or hardware. As a first step, reseat the module and monitor the router. If the module continues to crash after you reseat it, contact Cisco TAC through the TAC Case Open tool and provide the crashinfo file.
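The signal number, program counter, and traceback addresses are the details most worth recording from a message like this one. This is a minimal sketch that assumes the "Unexpected exception, CPU signal" and "-Traceback=" wording shown in the sample above. The function name summarize_exception is illustrative.

```python
import re

PC_RE = re.compile(r"CPU signal (\d+), PC = (0x[0-9A-Fa-f]+)")
TRACE_RE = re.compile(r"-Traceback= ((?:[0-9A-Fa-f]+ ?)+)")


def summarize_exception(crash_text):
    """Pull the CPU signal, program counter, and traceback out of crash text."""
    summary = {"signal": None, "pc": None, "traceback": []}
    pc_match = PC_RE.search(crash_text)
    if pc_match:
        summary["signal"] = int(pc_match.group(1))
        summary["pc"] = int(pc_match.group(2), 16)
    trace_match = TRACE_RE.search(crash_text)
    if trace_match:
        summary["traceback"] = [int(a, 16) for a in trace_match.group(1).split()]
    return summary


# Sample text copied from the crashinfo excerpt in this section.
SAMPLE = """\
IOBUS Error Interrupt Status register 0x0
Unexpected exception, CPU signal 10, PC = 0x602A7660
-Traceback= 602A7660 602AB238
"""

info = summarize_exception(SAMPLE)
print(info["signal"], hex(info["pc"]), [hex(a) for a in info["traceback"]])
```

If the same PC and traceback appear in every crash, that consistency suggests a repeatable software path. Addresses that vary between crashes fit better with a hardware fault, although neither pattern is conclusive on its own.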
It is a good idea to create a VIP crash summary file with this information before you open a case. Include this information in the Problem Description field of the TAC Case Open tool.
Attach the collected data to your case in non-zipped, plain text format (.txt). You can upload the information with the Case Query tool. If you cannot access the Case Query tool, send the information to email@example.com with your case number in the subject line of your message.
Note: If possible, do not manually reload or power-cycle the router before you collect this information, because doing so can destroy information that is needed to determine the root cause of the problem.