Router Hardware Management Guide for Cisco 8000 Series Routers


Machine check error notifications



Describes how the router logs machine check errors to the MCE log file and emits syslog messages with error details so that an operator can identify and act on hardware-detected processor faults.


The Machine Check Error (MCE) is a hardware error detection mechanism that

  • identifies hardware failures in CPUs, memory, power, or other critical components

  • triggers system logs in /var/log/mcelog.log to record the event, and

  • initiates corrective actions such as restarting affected line cards, route processors, or the entire router.

Before Release 24.4.1, you had to manually check the MCE log file at /var/log/mcelog.log, or the messages on the syslog server, to determine whether a router reboot was caused by an MCE or by another issue.

From Release 24.4.1 onwards, the Cisco IOS XR Software logs the error in the MCE log file and notifies you by displaying a syslog message.

This is an example of an MCE syslog message that the router displays:

Router:Oct 28 22:37:44.293 UTC: shelfmgr[377]: %PLATFORM-CPA_INTF_SHELFMGR-3-CPU_MCERR : CPU Machine Check Error 
condition reported for node0_RP0_CPU0: corrected DIMM memory error count exceeded threshold: 10 in 24h . Reported at 2024-10-28 22:37:44.00000 UTC
Table 1. Feature history table

| Feature Name | Release Information | Feature Description |
|---|---|---|
| Machine check error notifications | Release 26.1.1 | Introduced in this release on: Centralized Systems (8400 [ASIC: K100]) (select variants only*). *This feature is now supported on Cisco 8404-SYS-D routers. |
| Machine check error notifications | Release 25.4.1 | Introduced in this release on: Fixed Systems (8010 [ASIC: A100]) (select variants only*). *This feature is supported on: 8011-32Y8L2H2FH, 8011-12G12X4Y-A/D. |
| Machine check error notifications | Release 25.1.1 | Introduced in this release on: Fixed Systems (8700 [ASIC: K100], 8010 [ASIC: A100]) (select variants only*). *This feature is supported on: 8712-MOD-M, 8011-4G24Y4H-I. |
| Machine check error notifications | Release 24.4.1 | Introduced in this release on: Fixed Systems (8200, 8700); Centralized Systems (8600); Modular Systems (8800 [LC ASIC: Q100, Q200, P100]). You can now identify and resolve MCE-related issues quickly and easily because Cisco IOS XR Software displays a syslog notification for MCE errors, eliminating the need to manually check for them in the MCE log file. |

Syslog message information

The syslog message displays this information about the error:

  • Error title - CPA_INTF_SHELFMGR-3-CPU_MCERR

  • Error description - CPU Machine Check Error

  • Error location - RP/0/RP0/CPU0

  • Error type - DIMM memory error

  • Error time - 2024-10-28 22:37:44.00000 UTC
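As a minimal sketch of how the fields listed above map onto the raw message, the following Python function extracts them from the sample syslog line with a regular expression. The function name parse_mce_syslog and the pattern are illustrative assumptions derived only from the single example in this guide. They are not a Cisco-documented message grammar, and other MCE messages may be worded differently.

```python
import re

# Sample message from this guide, joined onto a single line.
SAMPLE = (
    "Router:Oct 28 22:37:44.293 UTC: shelfmgr[377]: "
    "%PLATFORM-CPA_INTF_SHELFMGR-3-CPU_MCERR : CPU Machine Check Error "
    "condition reported for node0_RP0_CPU0: corrected DIMM memory error "
    "count exceeded threshold: 10 in 24h . Reported at 2024-10-28 22:37:44.00000 UTC"
)

# Assumed layout, inferred from the one sample above (not an official grammar).
MCE_PATTERN = re.compile(
    r"%PLATFORM-(?P<title>CPA_INTF_SHELFMGR-(?P<severity>\d)-CPU_MCERR)\s*:\s*"
    r"(?P<description>.+?) condition reported for (?P<node>[^:\s]+):\s*"
    r"(?P<detail>.+?)\s*\.\s*Reported at (?P<reported_at>.+?)\s*$"
)


def parse_mce_syslog(line):
    """Return a dict of MCE fields, or None if the line is not an MCE message."""
    match = MCE_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])  # the digit inside the error title
    return fields


if __name__ == "__main__":
    for name, value in parse_mce_syslog(SAMPLE).items():
        print(f"{name}: {value}")
```

The node is returned exactly as it appears in the message (node0_RP0_CPU0). Translating it to the RP/0/RP0/CPU0 form shown in the list above is left to the reader, because this guide does not define that mapping.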

Error detail and recommended action

MCE major errors in a router

These are some of the major MCE errors that can occur in a router:

  • Card power zone error: Displays an undervoltage or overvoltage failure condition on the line card (LC) or fabric card (FC). When this error occurs, the system attempts to recover by power-cycling the LC or FC.

  • Single Event Upset (SEU) error: Displays corrected and uncorrected SEU events that can happen in FPGA devices.

  • Central Processing Unit (CPU) error: Displays all CPU errors.

If these errors occur in a router, you can view them by using the show alarms command. For more information, refer to the Monitoring Alarms and Implementing Alarm Log Correlation section in the System Monitoring Configuration Guide for Cisco 8000 Series Routers.


Restrictions for MCE major errors

From Release 24.2.11, the show alarms command output includes only the power zone errors.


View error details in the Cisco Feature Navigator error messages tool

Perform these steps to see error details in the Cisco Feature Navigator Error Messages Tool:

Procedure

1. Log in to the Cisco Feature Navigator Error Messages Tool. The tool provides these search options:

  • Release - Displays error details for a specific Cisco IOS XR release.

  • Error - Displays error details for the error title that you provide.

  • Compare - Displays error details by comparing different Cisco IOS XR releases.

2. Click the Error option.

3. Enter the error title, for example, CPA_INTF_SHELFMGR-3-CPU_MCERR.

4. Click Submit to view the error details.

The error details contain these sections:

  • Error

  • Severity

  • Limit

  • Format

  • Explanation

  • Recommended action

For more information about the error details sections and the Cisco Feature Navigator Error Messages Tool, refer to the Cisco IOS XR System Error Message Reference Guide.


View error details in the MCE log file

Perform these steps to see error details in the MCE log file:

Procedure

1. Navigate to the MCE log file located at /var/log/mcelog.log.

2. Open the mcelog.log file to view the error details.
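The two steps above can also be scripted. The following is a minimal sketch, assuming a Python interpreter is available wherever the log file (or a copy of it) can be read. The helper name tail_log and the default of 50 lines are hypothetical choices, and the sketch makes no assumption about the format of the log entries; it only prints the most recent lines.

```python
from collections import deque
from pathlib import Path

# Log location given in this guide.
MCE_LOG_PATH = Path("/var/log/mcelog.log")


def tail_log(path=MCE_LOG_PATH, max_lines=50):
    """Return the last max_lines lines of the log file as a list of strings."""
    with open(path, "r", errors="replace") as handle:
        # deque with maxlen keeps only the newest lines while reading the file once.
        return [line.rstrip("\n") for line in deque(handle, maxlen=max_lines)]


if __name__ == "__main__":
    try:
        for entry in tail_log():
            print(entry)
    except OSError as exc:
        # Covers a missing file or insufficient permissions.
        print(f"Could not read {MCE_LOG_PATH}: {exc}")
```

Pass a different path to tail_log to inspect a copy of the file that was collected from the router.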