Processor Memory Parity Errors (PMPEs)

Document ID: 6345

Updated: Jan 31, 2006

Introduction

This document explains what causes parity errors on Cisco routers, and how to troubleshoot them.

Prerequisites

Requirements

Cisco recommends that you have knowledge of how to troubleshoot router crashes.

Refer to Troubleshooting Router Crashes for more information.

Components Used

This document is not restricted to specific software and hardware versions.

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

Conventions

Refer to Cisco Technical Tips Conventions for more information on document conventions.

Identify a Parity Error

Memory parity errors are reported on MIPS processor-based products such as:

  • Cisco 4500/4700 Series Routers

  • Cisco 7500 Series Routers (RSP1, RSP2, RSP4, RSP8, VIP2-10, VIP2-15, VIP2-20, VIP2-40, VIP2-50)

  • Cisco 7000 Series Routers (RSP 7000)

  • Cisco 7200 Series Routers (NPE-100, NPE-150, NPE-175, NPE-200, NPE-225, NPE-300)

  • Cisco 12000 Series Internet Router

The following messages all indicate that bad parity was detected somewhere in the system. The list is not exhaustive, but it contains the most common messages:

  • In the show version command output:

    System restarted by processor memory parity error at PC 0x6014F7C0,
     address 0x0

    or

    System restarted by shared memory parity error at PC 0x60130F40

  • In the console logs, or in the crashinfo files:

    -  *** Cache Error Exception ***
       Cache Err Reg = 0xa401a65a
       data reference, primary cache, data field error , error on SysAD Bus
       PC = 0xbfc17950, Cause = 0x0, Status Reg = 0x3040d007
    
    
    -  Error: primary data cache, fields: data,
       virtual addr 0x6058A000, physical addr(21:3) 0x18A000, vAddr(14:12) 0x2000
       virtual address corresponds to main:data, cache word 0
                                      
                    Low Data   High Data  Par    Low Data   High Data  Par
       L1 Data  : 0:0xFEFFFEFE 0x65776179 0x13 1:0x20536572 0x76657220 0x89
                  2:0x646F6573 0x206E6F74 0x9C 3:0x20737570 0x706F7274 0xF8  
                                            
                    Low Data   High Data  Par    Low Data   High Data  Par
       Mem Data : 0:0xFEFFFEFE 0x65776179 0x13 1:0x20536572 0x76657220 0x89
                  2:0x646F6573 0x206E6F74 0x9C 3:0x20737570 0x706F7274 0xF8
    
    
    -  *** Shared Memory Parity Error ***
       shared memory control register= 0xffe3
       error(s) reported for: CPU on byte(s): 0/1
    
    -  %PAR-1-FATAL: Shared memory parity error
       shared memory status register= 0xFFEF
       error(s) reported for: CPU on byte(s): 0/1 2/3
    
    
    -  %RSP-3-ERROR: MD error 0000008000000200
       %RSP-3-ERROR: QA parity error (bytes 0:3) 02
       %RSP-3-ERROR: MEMD parity error condition
       %RSP-2-QAERROR: reused or zero link error, write at addr 0100 (QA) 
           log 22010000, data 00000000 00000000
       %RSP-3-RESTART: cbus complex
    
    
    -  %RSP-3-ERROR: CyBus error 01
       %RSP-3-ERROR: read data parity
       %RSP-3-ERROR: read parity error (bytes 0:7) 20
       %RSP-3-ERROR: physical address (bits 20:15) 000000
    
    -  %RSP-3-ERROR: MD error 00800080C000C000
       %RSP-3-ERROR: SRAM parity error (bytes 0:7) F0
       %RSP-3-RESTART: cbus complex
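If you collect console logs or crashinfo files from many routers, a small script can flag the messages listed above. The following Python sketch is only an illustration: the pattern table and the function name find_parity_messages are hypothetical, and the regular expressions are built solely from the example messages in this document, so they can miss variants that this list does not show.

```python
import re

# Hypothetical pattern table, built only from the example messages above.
PARITY_PATTERNS = {
    "restart reason": re.compile(
        r"System\s+restarted by (processor|shared) memory parity error"),
    "cache error": re.compile(
        r"\*\*\* Cache Error Exception \*\*\*|Error: primary data cache"),
    "shared memory": re.compile(
        r"\*\*\* Shared Memory Parity Error \*\*\*|%PAR-1-FATAL"),
    # Any %RSP-3-ERROR line that mentions parity (QA, MEMD, SRAM, read data).
    "RSP parity": re.compile(r"%RSP-3-ERROR:.*parity", re.IGNORECASE),
}


def find_parity_messages(text: str) -> dict[str, list[str]]:
    """Group the lines of a log capture by the parity pattern they match."""
    hits: dict[str, list[str]] = {}
    for line in text.splitlines():
        for label, pattern in PARITY_PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(label, []).append(line.strip())
    return hits
```

Pass the full text of a show version capture, a console log, or a crashinfo file; an empty result means that none of the example messages appear, not that the router is free of parity errors.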

Soft Versus Hard Parity Errors

There are two kinds of parity errors:

  • Soft parity errors

    These errors occur when the energy level that represents a bit (a one or a zero) inside a chip changes, so that the stored data no longer matches its parity. When the CPU references the affected location, the system either crashes (if the error is in an area that is not recoverable) or recovers by restarting the affected subsystem (for example, the CyBus complex restarts if the error was in the packet memory (MEMD)). A soft parity error does not require you to swap the board or any of its components. See the Related Information section for additional information about soft parity errors.

  • Hard parity errors

    These errors occur when a chip or board failure corrupts data. In this case, you need to re-seat or replace the affected component, which usually means a memory chip swap or a board swap. Multiple parity errors at the same address indicate a hard parity error. Other cases are more complicated and harder to identify. In general, if you see more than one parity error in a particular memory region within a relatively short period, you can treat it as a hard parity error.

Studies have shown that soft parity errors are 10 to 100 times more frequent than hard parity errors. Therefore, Cisco highly recommends that you wait for a second parity error before you replace anything. This greatly reduces the impact of unnecessary hardware replacements on your network.
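The rule of thumb above (a single error is probably soft; more than one error in the same memory region within a relatively short period points to a hard error) can be expressed as a small check over a history of parity events. This Python sketch assumes that you record each event yourself with a timestamp and a region name; the ParityEvent class, the region labels, and the 30-day default window are hypothetical choices, not values that Cisco specifies.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ParityEvent:
    when: datetime   # time of the crash or error message
    region: str      # hypothetical label, e.g. "DRAM", "SRAM", "MEMD"


def likely_hard_regions(events: list[ParityEvent],
                        window: timedelta = timedelta(days=30)) -> set[str]:
    """Return regions with at least two parity errors no more than `window` apart.

    A region that appears only once is treated as a probable soft error.
    """
    times_by_region: dict[str, list[datetime]] = defaultdict(list)
    for event in events:
        times_by_region[event.region].append(event.when)

    suspects: set[str] = set()
    for region, times in times_by_region.items():
        times.sort()
        # If any two errors fall within the window, some adjacent pair does too.
        if any(later - earlier <= window
               for earlier, later in zip(times, times[1:])):
            suspects.add(region)
    return suspects
```

Treat a flagged region as a reason to look closer rather than as a verdict: as noted above, some hard-error cases are harder to identify than a simple repeat count.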

Isolate the Problem

A router has memory in several locations. In theory, a parity error can affect any memory location, but most memory problems occur in dynamic RAM (DRAM) or shared RAM (SRAM). The platform-specific sections that follow explain how to find out which memory location is affected and, if the error turns out to be a hard parity error, which part you must replace.

Cisco 4500 and 4700 Platforms

On the Cisco 4500 and 4700 platforms, the crashinfo file is not available in releases earlier than Cisco IOS® Software Release 12.2(10) and 12.2(10)T.

One way to find out where the error occurred is to look at the restart reason in the console logs and in the output of the show version command:
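The restart reason names the memory that reported the error: "processor memory" corresponds to DRAM and "shared memory" to SRAM, as the two cases that follow show. A minimal Python sketch that extracts this from saved show version output might look like this; the function name restart_reason is hypothetical, and the regular expression is based only on the example restart lines in this document.

```python
import re
from typing import Optional, Tuple

# Based on the example restart lines in this document; other wordings may exist.
RESTART = re.compile(
    r"System\s+restarted by (processor|shared) memory parity error"
    r" at PC (0x[0-9A-Fa-f]+)")

# Processor memory parity errors point to DRAM, shared memory ones to SRAM.
REGION = {"processor": "DRAM", "shared": "SRAM"}


def restart_reason(show_version: str) -> Optional[Tuple[str, int]]:
    """Return (memory region, PC) for a parity-related restart, else None."""
    match = RESTART.search(show_version)
    if match is None:
        return None
    return REGION[match.group(1)], int(match.group(2), 16)
```

A result of None only means that the restart line did not match; a router that was reloaded manually after the crash no longer shows the parity restart reason at all.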

  • Parity Error in DRAM:

    If you did not manually reload the router after the crash, the show version output looks like this:

    System restarted by processor memory parity error at PC 0x601799C4,
     address 0x0
    System image file is "flash:c4500-inr-mz.111-14.bin", booted via flash

    If a crashinfo file is available, or if console logs have been captured, you can also see something like this:

    *** Cache Error Exception ***
     Cache Err Reg = 0xa0255c61
     data reference, primary cache, data field error , error on SysAD Bus
     PC = 0xbfc0edc0, Cause = 0xb800, Status Reg = 0x34408007

    Repeated parity errors in DRAM indicate that either the DRAM or the chassis is defective. If you recently moved the chassis, or if you made any hardware configuration changes, re-seat the DRAM chips to solve the problem. Otherwise, replace the DRAM as a first step; this should stop the parity errors. If the router still crashes, replace the chassis.

  • Parity Error in SRAM:

    If you did not manually reload the router after the crash, the show version command output looks like this:

    System restarted by shared memory parity error at PC 0x60130F40
    System image file is "flash:c4500-inr-mz.111-14.bin", booted via flash

    If a crashinfo file is available, or if console logs have been captured, you can also see something like this:

    *** Shared Memory Parity Error ***
    shared memory control register= 0xffe3
    error(s) reported for: CPU on byte(s): 0/1

    or

    %PAR-1-FATAL: Shared memory parity error
    shared memory status register= 0xFFEF
    error(s) reported for: CPU on byte(s): 0/1 2/3

    or

    *** Shared Memory Parity Error ***
    shared memory control register= 0xffdf
    error(s) reported for: NIM1 on byte(s): 0/1  2/3

    Repeated parity errors in SRAM indicate either defective SRAM chips or a defective network module that has written bad parity into the SRAM. If you moved the chassis recently, or if you made any hardware configuration changes, re-seat the network modules and the SRAM chips to solve the problem. Otherwise, check which source the console logs report the error for (see the output examples above):

    -  If the error is reported for the CPU, replace the SRAM.

    -  If the error is reported for NIM(x), replace the network module in slot (x). The SRAM allocated to slot (x) can also be affected; in that case, replace the SRAM as well.
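The replacement rule above depends only on the source named in the "error(s) reported for:" line, so it is easy to apply mechanically. This Python sketch is hypothetical (the function name suspect_parts and the returned wording are illustrative) and assumes the line format shown in the Cisco 4500 and 4700 examples above.

```python
import re

# Matches lines such as "error(s) reported for: NIM1 on byte(s): 0/1  2/3".
REPORTED_FOR = re.compile(r"error\(s\) reported for:\s*(\w+) on byte\(s\)")


def suspect_parts(log_text: str) -> list[str]:
    """Map each reported source to the part that this section says to replace."""
    parts = []
    for match in REPORTED_FOR.finditer(log_text):
        source = match.group(1)
        nim = re.fullmatch(r"NIM(\d+)", source)
        if source == "CPU":
            parts.append("SRAM")
        elif nim:
            slot = nim.group(1)
            parts.append(f"network module in slot {slot} (possibly also its SRAM)")
        else:
            parts.append(f"unrecognized source: {source}")
    return parts
```

Apply it only after you have ruled out a badly seated module or SRAM chip, because a first, single error does not justify a replacement.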

Route/Switch Processor (RSP), Network Processing Engine (NPE), and Route Processor (RP) Platforms

As on the Cisco 4500 and 4700 platforms, the problem on these platforms can be caused by faulty DRAM or SRAM. It can also be caused by a defective processor card (RP, RSP, or NPE). The Cisco 7000 and 7500 can also report parity errors generated by a faulty or badly seated interface processor (legacy xIP or VIP).

Check the crashinfo file and the console logs for one of these error messages:

Parity Error in DRAM or SRAM (MEMD)

For the RP, RSP, and NPE, you usually see something like this:

Error: primary data cache, fields: data, (SysAD)
virtual addr 0x6058A000, physical addr(21:3) 0x18A000, vAddr(14:12) 0x2000
virtual address corresponds to main:data, cache word 0

or simply:

Error: primary data cache, fields: data, SysAD
phy21:3 0x201880, va14:12 0x1000, addr 63E01880

This indicates a problem on the processor card itself (RP, RSP, or NPE). If the problem occurs only once, it is most probably a transient (soft) error.
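Because a hard parity error tends to recur at the same address, comparing the physical address field across several crashes can help you decide whether the first occurrence was transient. This Python sketch assumes the two message formats shown above, where the field is labeled physical addr(21:3) or phy21:3; the label suggests that the field holds only address bits 21 to 3, so a match is a strong hint rather than proof. The function name is hypothetical.

```python
import re
from collections import Counter

# Physical address field in both cache error formats shown above.
PHYS_ADDR = re.compile(r"(?:physical addr\(21:3\)|phy21:3)\s+(0x[0-9A-Fa-f]+)")


def repeated_physical_addresses(crash_logs: list[str]) -> list[int]:
    """Return physical address fields that appear in more than one crash log."""
    counts: Counter = Counter()
    for log in crash_logs:
        # Count each address once per crash, even if a log repeats it.
        counts.update({int(m.group(1), 16) for m in PHYS_ADDR.finditer(log)})
    return sorted(addr for addr, n in counts.items() if n > 1)
```

Pass one string per crash (for example, one per crashinfo file); an address returned by the function has been reported in at least two separate crashes.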

Parity Error Pulled from SRAM

For the RSP, the message can look like this:

%RSP-3-ERROR: MD error 0000008000000200
%RSP-3-ERROR: QA parity error (bytes 0:3) 02
%RSP-3-ERROR: MEMD parity error condition
%RSP-2-QAERROR: reused or zero link error, write at addr 0100 (QA)
    log 22010000, data 00000000 00000000
%RSP-3-RESTART: cbus complex

or

%RSP-3-ERROR: CyBus error 01
%RSP-3-ERROR: read data parity
%RSP-3-ERROR: read parity error (bytes 0:7) 20
%RSP-3-ERROR: physical address (bits 20:15) 000000

If nothing indicates that another interface processor wrote bad parity into the SRAM (for example, %VIP2-1-MSG error messages), the SRAM itself is the most likely cause of the parity error. In this case, replace the RSP.

If other error messages indicate that an interface processor wrote bad parity, the cause can be a faulty or badly seated interface processor.
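The decision above (RSP SRAM versus an interface processor that writes bad parity) can be sketched as a check over a log capture. This Python example is a rough illustration: it assumes that %RSP-3-ERROR lines that mention parity mark the SRAM error and that %VIP2-1-MSG lines are the evidence of an interface processor problem; the function name and return strings are hypothetical.

```python
def rsp_parity_suspect(log_text: str) -> str:
    """Suggest where to look after an RSP SRAM (MEMD) parity error."""
    lines = [line.strip() for line in log_text.splitlines()]
    rsp_parity = any("%RSP-3-ERROR" in line and "parity" in line.lower()
                     for line in lines)
    vip_messages = [line for line in lines if "%VIP2-1-MSG" in line]

    if not rsp_parity:
        return "no RSP parity error found"
    if vip_messages:
        # Another interface processor may have written bad parity into SRAM.
        return "check the interface processor in: " + vip_messages[0]
    # No sign of another writer: the SRAM on the RSP is the likely cause.
    return "replace the RSP"
```

Other error messages can also point to an interface processor, so an answer of "replace the RSP" is only as good as the log capture you pass in.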

Versatile Interface Processor

If you receive %VIP2-1-MSG: slot(x) messages in the logs or in the crashinfo file, refer to Troubleshooting VIP Crashes.

Recommended Actions

At the first occurrence of a parity error, it is not possible to distinguish between a soft and a hard parity error. In practice, most parity errors are soft parity errors, and you can usually dismiss them. If you have recently changed some hardware or moved the box, try to re-seat the affected part (DRAM, SRAM, NPE, RP, RSP, or VIP). Repeated parity errors signify faulty hardware. Replace the affected part (DRAM, RSP, VIP, or motherboard) as described in the platform-specific sections of this document.
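One reading of these recommendations, expressed as a Python sketch: re-seat first after a hardware change, treat a single error as probably soft, and replace the part only when errors repeat. The function name, the Action values, and the already_reseated flag are hypothetical; the flag exists only so that re-seating is not suggested twice for the same change.

```python
from enum import Enum


class Action(Enum):
    RESEAT = "re-seat the affected part"
    MONITOR = "treat as a probable soft error and monitor"
    REPLACE = "replace the affected part"


def recommended_action(occurrences: int, recent_hw_change: bool,
                       already_reseated: bool = False) -> Action:
    """Map the parity error history of one part to a next step."""
    if recent_hw_change and not already_reseated:
        # Moved box or changed hardware: a badly seated part is a likely cause.
        return Action.RESEAT
    if occurrences <= 1:
        # A first error cannot be classified yet; most turn out to be soft.
        return Action.MONITOR
    # Repeated errors signify faulty hardware.
    return Action.REPLACE
```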

Information to Collect if You Open a TAC Service Request

If you still need assistance after you follow the troubleshooting steps above and want to open a service request with the Cisco TAC, be sure to include this information:

  • Troubleshooting performed before you opened the service request.
  • show tech-support command output (in enable mode, if possible).
  • show logging command output, or console captures if available.
  • crashinfo files, if present and not already included in the show tech-support output. If multiple crashinfo files exist, include all of them.
  • The number of reloads caused by processor memory parity errors that you have seen, and when each one occurred.

Attach the collected data to your case in non-zipped, plain text format (.txt). To attach information to your service request, upload it through the TAC Service Request Tool (registered customers only). If you cannot access the Service Request Tool, send the relevant information to attach@cisco.com with your service request number in the subject line of your message.

Note: Do not manually reload or power-cycle the router before you collect the above information, unless a reload is required as part of troubleshooting the processor memory parity error. A reload can cause the loss of information that is needed to determine the root cause of the problem.

Related Information
