Troubleshooting SAR Crashes on the PA-A3

Document ID: 10501

Updated: Nov 15, 2007

Introduction

In rare circumstances, the main processor on a PA-A3 ATM port adapter may crash and print a "crashdump" to the console, with output similar to this:

%ATMPA-3-SARCRASH: ATM1/0: SAR1 Chip Crashdump: 
%ATMPA-7-REG00: status 0xF040FF00, cause 0x00008018, epc 0xBFC002EC    
%ATMPA-7-REG01: ccc 0x03E7B620, eepc 0x00000000, apu_status 0x00015010    
%ATMPA-7-REG02: edma_src 0x4B050964, edma_dest 0xA0820968, edma_cntl 0x00280000    
%ATMPA-7-REG03: edma_count 0x060001E0, edma_status 0x00000000, aci_cntrl 0x44400540 
%ATMPA-7-CWREG00: zero 0xBABEBABE, at 0x10000000, v0 0xBFC002EC, v1 0xF040FF00 
%ATMPA-7-CWREG01: a0 0xB8000804, a1 0x08000000, a2 0x00000190, a3 0x10338530    
%ATMPA-7-CWREG02: t0 0x8066B590, t1 0x00015010, t2 0x4B050964, t3 0xA0820968    
%ATMPA-7-CWREG03: t4 0x060001E0, t5 0x00280000, t6 0x00000000, t7 0x44400540    
%ATMPA-7-CWREG04: s0 0xC0000000, s1 0x00008001, s2 0x00000000, s3 0x00000000    
%ATMPA-7-CWREG05: s4 0xB8100000, s5 0x4B01EA44, s6 0x88800000, s7 0x008002F4    
%ATMPA-7-CWREG06: t8 0xF557C400, t9 0xB8000000, k0 0x00000000, k1 0xAB0DE6D4    
%ATMPA-7-CWREG07: gp 0x8080309C, sp 0x8080398C, fp/s8 0xCCCCCCCD, ra 0x80801440    
%ATMPA-7-MISC0: 00 0x00008001, 01 0x00000000, 02 0x00000000, 03 0xB8100000    
%ATMPA-7-MISC1: 04 0x4B01EA44, 05 0x88800000, 06 0x008002F4, 07 0x00000000    
%ATMPA-7-MISC2: 08 0x00000000, 09 0x00000000, 10 0x00000000, 11 0x00000000    
%ATMPA-7-MISC3: 12 0x00000000, 13 0x00000000, 14 0x00000000, 15 0x00000000

This document explains how to troubleshoot segmentation and reassembly (SAR) crashes on the PA-A3.
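When a crashdump is captured from a console log, it can help to load the register values into a structure before decoding them. The following Python sketch is an illustration only and is not a Cisco tool. The function name parse_crashdump and the returned layout are assumptions made for this example, and the sample is an excerpt of the crashdump shown above.

```python
import re

# Matches "name 0xVALUE" pairs such as "cause 0x00008018" or "fp/s8 0xCCCCCCCD".
PAIR = re.compile(r"([A-Za-z0-9_/]+)\s+0x([0-9A-Fa-f]{1,8})")

def parse_crashdump(text):
    """Return {message_tag: {register_name: value}} for each %ATMPA-7-* line."""
    groups = {}
    for line in text.splitlines():
        m = re.match(r"\s*%ATMPA-7-(\w+):\s*(.*)", line)
        if not m:
            continue  # skips the %ATMPA-3-SARCRASH header and unrelated lines
        tag, body = m.groups()
        groups[tag] = {name: int(value, 16) for name, value in PAIR.findall(body)}
    return groups

sample = """\
%ATMPA-3-SARCRASH: ATM1/0: SAR1 Chip Crashdump:
%ATMPA-7-REG00: status 0xF040FF00, cause 0x00008018, epc 0xBFC002EC
%ATMPA-7-CWREG07: gp 0x8080309C, sp 0x8080398C, fp/s8 0xCCCCCCCD, ra 0x80801440
%ATMPA-7-MISC0: 00 0x00008001, 01 0x00000000, 02 0x00000000, 03 0xB8100000
"""

regs = parse_crashdump(sample)
print(hex(regs["REG00"]["cause"]))  # 0x8018
```

The cause value under the REG00 tag is the one used for the exception-code decoding described later in this document.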

Prerequisites

Requirements

There are no specific requirements for this document.

Components Used

This document is not restricted to specific software and hardware versions.

Conventions

For more information on document conventions, refer to the Cisco Technical Tips Conventions.

PA-A3 Architecture

The PA-A3 uses a chip called LSI ATMizer II to provide SAR as well as other key functions. The name of the SAR is displayed in the output of the show controllers atm command.

router# show controller atm 3/0 
Interface ATM3/0 is up 
Hardware is ENHANCED ATM PA - DS3 (45Mbps)    
Lane client mac address is 0030.7b1e.9054 
Framer is PMC PM7345 S/UNI-PDH, SAR is LSI ATMIZER II 
Firmware rev: G119, Framer rev: 1, ATMIZER II rev: 3 

!--- Output suppressed.

The ATMizer microcode (firmware) is an image that provides SAR-specific software instructions. The Versatile Interface Processor (VIP) IOS® on the Cisco 7500 platform and the system IOS on the 7200 platform contain the SAR firmware, which is downloaded to the SAR when it comes out of reset. Use one of the following commands, depending on the platform, to display the microcode version currently loaded and running on your ATM interface:

  • 7200 series - show controller atm (see sample output above)

  • 7500 series - show controller vip slot# tech
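If the show controller atm output from a 7200 has been saved as text, the revision line can be extracted programmatically. The Python sketch below is an illustration only. The function name sar_revisions is an assumption for this example, and the pattern follows only the 7200 sample output shown above. The format of the 7500 show controller vip output is not covered here.

```python
import re

# Pattern follows the "Firmware rev" line in the 7200 sample output.
REV_LINE = re.compile(
    r"Firmware rev:\s*(\w+),\s*Framer rev:\s*(\w+),\s*ATMIZER II rev:\s*(\w+)"
)

def sar_revisions(show_output):
    """Return firmware, framer, and ATMizer II revisions, or None if absent."""
    m = REV_LINE.search(show_output)
    if m is None:
        return None
    return {"firmware": m.group(1), "framer": m.group(2), "atmizer": m.group(3)}

sample = """\
Interface ATM3/0 is up
Framer is PMC PM7345 S/UNI-PDH, SAR is LSI ATMIZER II
Firmware rev: G119, Framer rev: 1, ATMIZER II rev: 3
"""

print(sar_revisions(sample))
```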

The PA-A3 uses two SARs to provide the processing power necessary to handle receive and transmit traffic simultaneously on a high-speed OC-3 or OC-12 link.

Note: A single SAR is sufficient for a DS-3/E-3, but the PA-A3-T3 also uses two SARs for consistency.

The SAR crashdump indicates which SAR is experiencing the problem.

%ATMPA-3-SARCRASH: ATM1/0: SAR1 Chip Crashdump:   
 SAR0 = receive
 SAR1 = transmit
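The SAR number can be read from the first line of the crashdump and mapped to its direction. The following Python sketch is an illustration only. The function name crashed_sar is an assumption for this example, and the regular expression is based solely on the sample message shown above.

```python
import re

# Mapping stated in this document: SAR0 = receive, SAR1 = transmit.
SAR_ROLE = {0: "receive", 1: "transmit"}

def crashed_sar(line):
    """Return (interface, sar_number, role) from a %ATMPA-3-SARCRASH line, or None."""
    m = re.search(r"%ATMPA-3-SARCRASH:\s*(\S+):\s*SAR(\d)\s+Chip Crashdump", line)
    if m is None:
        return None
    sar = int(m.group(2))
    return m.group(1), sar, SAR_ROLE.get(sar, "unknown")

print(crashed_sar("%ATMPA-3-SARCRASH: ATM1/0: SAR1 Chip Crashdump:"))
# ('ATM1/0', 1, 'transmit')
```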

The PA-A3 is supported on the 7xxx series routers. The 7200 and 7500 routers use peripheral component interconnect (PCI) buses as a data path between the port adapters and "host" memory. Host memory is the local SRAM on the VIP or the SRAM on the Network Processing Engine (NPE) of the 7200.

This diagram illustrates the architecture of the VIP2 and the location of the PCI buses:

[Diagram: VIP2 architecture and PCI bus locations (vip_crash1.gif)]

The SAR provides connectivity to the PCI bus for transfers into packet memory. It also provides SAR functionality for ATM cell processing and a PHY or physical interface to the external wire.

Crash Types

SAR crashes fall into several categories based on the cause of the crash. The SAR crashes whenever it encounters a non-recoverable error, which can originate in either software or hardware. To determine the cause, consult the cause register, which appears in the second line of the crashdump output. The Exception-Code is recorded in bits two through six of the cause register value, counting from the right-most bit as bit zero. For example:

%ATMPA-7-REG00: status 0xF040FF10, cause 0x00004018, epc 0x80802F68
  1. Translate the hexadecimal value 0x00004018 into binary. Each hex digit represents four bits. In our example, 4 = 0100, 0 = 0000, 1 = 0001, and 8 = 1000, so the low-order 16 bits of 0x00004018 are 0100 0000 0001 1000 (the four leading zero digits contribute only zero bits).

  2. Locate bits two through six by counting from right to left, starting at bit zero. In our example, bits two through six equate to 00110.

  3. Convert these five bits back into hexadecimal. In our example, 00110 converts to 0x06.

  4. Consult the exception code table. In the example, the SAR crashed in response to a secondary bus error exception.

Exception Code   Description                  Likely Cause
0x00             Interrupt                    Interrupt condition asserted.
0x01             TLB modification exception
0x02             TLB exception (load/fetch)
0x03             TLB exception (store)
0x04             Address error (load/fetch)   Unaligned address (software).
0x05             Address error (store)        Unaligned address (software).
0x06             Bus error                    Bus timeout, parity errors, etc. (hardware).
0x07             Reserved
0x08             Syscall                      Attempt to execute SYSCALL instruction.
0x09             Breakpoint                   Attempt to execute BREAK instruction.
0x0a             Reserved instruction         Attempt to execute invalid instruction.
0x0b             Coprocessor unusable         Attempt to execute on unusable coprocessor.
0x0c             Arithmetic overflow
0x0d             Trap
0x0e             Reserved
0x0f             Floating-point               Attempt to access non-existent FPU.
0x10-1f          Reserved
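The manual binary conversion in the steps above is equivalent to a shift and a mask: shift the cause value right by two bits and keep the low five bits. The following Python sketch is an illustration only. The names exception_code, describe, and EXCEPTION_CODES are assumptions for this example, and the descriptions are copied from the exception code table.

```python
# Descriptions from the exception code table; codes not listed are reserved.
EXCEPTION_CODES = {
    0x00: "Interrupt",
    0x01: "TLB modification exception",
    0x02: "TLB exception (load/fetch)",
    0x03: "TLB exception (store)",
    0x04: "Address error (load/fetch)",
    0x05: "Address error (store)",
    0x06: "Bus error",
    0x08: "Syscall",
    0x09: "Breakpoint",
    0x0A: "Reserved instruction",
    0x0B: "Coprocessor unusable",
    0x0C: "Arithmetic overflow",
    0x0D: "Trap",
    0x0F: "Floating-point",
}

def exception_code(cause):
    """Extract bits two through six of the cause register (bit zero is right-most)."""
    return (cause >> 2) & 0x1F

def describe(cause):
    """Return (code, description) for a cause register value."""
    code = exception_code(cause)
    return code, EXCEPTION_CODES.get(code, "Reserved")

print(describe(0x00004018))  # (6, 'Bus error')
```

The value 0x00004018 is the cause register from the example above, and the result matches the 0x06 obtained by hand.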

When the cause register value has bit 15 set to one, the cause of the SAR crash is a PCI abort or parity error due to hardware. For example, with only bit 15 set, the cause register appears in the crashdump as:

cause 0x00008000

Troubleshoot this cause value by replacing the ATM port adapter. If the problem persists, replace the versatile interface processor (VIP) if using a 7500 series router, or the network processing engine (NPE) / network services engine (NSE) if using a 7200 or 7400 series router.
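The bit 15 test can be expressed as a single mask operation. The following Python sketch is an illustration only, and the names PCI_ERROR_BIT and is_pci_abort_or_parity are assumptions for this example.

```python
PCI_ERROR_BIT = 1 << 15  # 0x00008000, counting the right-most bit as bit zero

def is_pci_abort_or_parity(cause):
    """True when bit 15 of the cause register value is set."""
    return bool(cause & PCI_ERROR_BIT)

for cause in (0x00008000, 0x00008018, 0x00004018):
    print(f"0x{cause:08X}", is_pci_abort_or_parity(cause))
```

Note that 0x00008018, the cause value in the crashdump at the start of this document, has bit 15 set in addition to a non-zero exception code, so the test must look at bit 15 alone rather than compare the whole value to 0x00008000.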

Known Issues

Cisco Bug ID CSCdr09895 prevents repeated crashdumps from being printed to the console since only the first crashdump is relevant to troubleshooting. The following Bug IDs resolve rare conditions that cause SAR crashes. Please use the Bug Toolkit (registered customers only) to determine if your Cisco IOS Software release is affected by these Bug IDs.

Cisco Bug ID Explanation
CSCdp62791 Prevents SAR1 crashes by not sending packets to the SAR on an unconfigured VC or sending packets to the SAR with invalid encapsulation.

Note: Although it reports a different symptom, CSCdp01166 is a duplicate of CSCdp62791 and is fixed by it.

CSCdp42529 Prevents SAR1 crashes caused by receiving a cell on a VPI/VCI pair that does not exist on the transmit SAR. This problem can occur when a large number of SVCs are being created or torn down, so the transmit SAR misses a command interrupt from the host CPU on the VIP or NPE. When this occurs, a VC is defined on the receive SAR only, and the transmit SAR crashes if an OAM loopback or resource management cell is received on the undefined VPI/VCI pair.
CSCdr09895 Prevents SAR0 crashes that occur when, with heavy transit traffic, the SAR tries to access an illegal address in secondary memory, where packets waiting for segmentation and reassembly are stored. This condition is called a bus error.
CSCdp64588 Prevents a router crash due to repeated SAR (0 or 1) crashes. When the SAR crashes, the PCI host-driver (which provides an interface between the PA-A3 and the PCI bus in the router) tries to start the PA again. If there are repeated SAR crashes and the PA does not respond to the host-driver, the host-driver tries to shut down (power down) the PA, and the PA is switched off. In some cases, if the SAR has crashed and the host-driver has already cleared the memory related to this PA, the router crashes because of a bus error.

Troubleshooting

The following points summarize how to troubleshoot SAR crashes on a PA-A3 ATM port adapter:

  • Determine whether SAR0 (receive SAR) or SAR1 (transmit SAR) crashed. The first line of the crashdump identifies the SAR:

    %ATMPA-3-SARCRASH: ATM1/0: SAR1 Chip Crashdump
  • Use the exception code table to decode the cause register value in the crashdump output.

  • If the cause register value has bit 15 set to one, replace the hardware.

  • If the cause register is any other value, collect the following information for Cisco Technical Support:

    • crashdump output

    • show controller atm (7200 series)

    • show controller vip slot# tech (7500 series)

    • show tech-support

  • Consider installing the most recent maintenance release of the Cisco IOS Software train that you are currently running.
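The decision steps in this list can be combined into one routine that takes captured crashdump text and reports the next action. The following Python sketch is an illustration only and is not a Cisco tool. The function name triage and the report layout are assumptions for this example, and the patterns are based solely on the sample crashdump in this document.

```python
import re

# Items this document asks you to collect for Cisco Technical Support.
COLLECT = [
    "crashdump output",
    "show controller atm (7200 series)",
    "show controller vip slot# tech (7500 series)",
    "show tech-support",
]

def triage(crashdump):
    """Summarize which SAR crashed and what to do next."""
    sar = re.search(r"SAR(\d) Chip Crashdump", crashdump)
    cause = re.search(r"cause\s+0x([0-9A-Fa-f]+)", crashdump)
    if sar is None or cause is None:
        raise ValueError("text does not look like a SAR crashdump")
    cause_value = int(cause.group(1), 16)
    role = {"0": "receive", "1": "transmit"}.get(sar.group(1), "unknown")
    report = {
        "sar": f"SAR{sar.group(1)} ({role})",
        "exception_code": (cause_value >> 2) & 0x1F,  # bits two through six
    }
    if cause_value & (1 << 15):  # bit 15: PCI abort or parity error
        report["action"] = "replace hardware"
    else:
        report["action"] = "collect information for Cisco Technical Support"
        report["collect"] = COLLECT
    return report

dump = """\
%ATMPA-3-SARCRASH: ATM1/0: SAR1 Chip Crashdump:
%ATMPA-7-REG00: status 0xF040FF00, cause 0x00008018, epc 0xBFC002EC
"""
print(triage(dump))
```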
