Describes how the router logs machine check errors to the MCE log file and emits syslog messages with error details so that an operator can identify and act on hardware-detected processor faults.
The Machine Check Error (MCE) is a hardware error detection mechanism that
- identifies hardware failures in CPUs, memory, power, or other critical components,
- triggers system logs in /var/log/mcelog.log to record the event, and
- initiates corrective actions such as restarting affected line cards, route processors, or the entire router.
Before Release 24.4.1, you had to manually check the MCE log file at /var/log/mcelog.log or the syslog server to determine whether a router reboot was caused by an MCE or by another issue.
From Release 24.4.1 onwards, the Cisco IOS XR Software logs the error in the MCE log file and notifies you by displaying a syslog message.
This is an example of an MCE syslog message that the router displays:
Router:Oct 28 22:37:44.293 UTC: shelfmgr[377]: %PLATFORM-CPA_INTF_SHELFMGR-3-CPU_MCERR : CPU Machine Check Error
condition reported for node0_RP0_CPU0: corrected DIMM memory error count exceeded threshold: 10 in 24h . Reported at 2024-10-28 22:37:44.00000 UTC
| Feature Name | Release Information | Feature Description |
|---|---|---|
| Machine check error notifications | Release 26.1.1 | Introduced in this release on: Centralized Systems (8400 [ASIC: K100]) (select variants only*) *This feature is now supported on Cisco 8404-SYS-D routers. |
| Machine check error notifications | Release 25.4.1 | Introduced in this release on: Fixed Systems (8010 [ASIC: A100]) (select variants only*) *This feature is supported on: |
| Machine check error notifications | Release 25.1.1 | Introduced in this release on: Fixed Systems (8700 [ASIC: K100], 8010 [ASIC: A100]) (select variants only*) *This feature is supported on: |
| Machine check error notifications | Release 24.4.1 | Introduced in this release on: Fixed Systems (8200, 8700); Centralized Systems (8600); Modular Systems (8800 [LC ASIC: Q100, Q200, P100]). You can now identify and resolve MCE-related issues quickly and easily because Cisco IOS XR Software displays a syslog notification for MCE errors, eliminating the need to manually check for them in the MCE log file. |
Syslog message information
The syslog message displays this information about the error:
- Error title: CPA_INTF_SHELFMGR-3-CPU_MCERR
- Error description: CPU Machine Check Error
- Error location: RP/0/RP0/CPU0
- Error type: DIMM memory error
- Error time: 2024-10-28 22:37:44.00000 UTC
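If you collect these notifications on a syslog server, the fields listed above can be pulled out of the message text with a script. The following is a minimal sketch, not a Cisco-provided tool: the function name `parse_mce_syslog`, the group names, and the regular expression are assumptions derived only from the single example message shown earlier, so other MCE messages may need a different pattern. The example message is joined onto one line here.

```python
import re
from typing import Dict, Optional

# Pattern inferred from the one example message in this section (an assumption,
# not a documented message grammar).
MCE_PATTERN = re.compile(
    r"%(?P<facility>[A-Z]+)-(?P<title>[A-Z_]+-(?P<severity>\d)-CPU_MCERR)\s*:\s*"
    r"(?P<description>CPU Machine Check Error) condition reported for "
    r"(?P<node>\S+?):\s*(?P<detail>.*?)\s*\.\s*Reported at\s+(?P<time>.+?)\s*$"
)


def parse_mce_syslog(line: str) -> Optional[Dict[str, str]]:
    """Return the MCE fields found in a syslog line, or None if it is not an MCE message."""
    match = MCE_PATTERN.search(line)
    return match.groupdict() if match else None


# The example message from this section, written as a single line.
SAMPLE = (
    "Router:Oct 28 22:37:44.293 UTC: shelfmgr[377]: "
    "%PLATFORM-CPA_INTF_SHELFMGR-3-CPU_MCERR : CPU Machine Check Error "
    "condition reported for node0_RP0_CPU0: corrected DIMM memory error count "
    "exceeded threshold: 10 in 24h . Reported at 2024-10-28 22:37:44.00000 UTC"
)

fields = parse_mce_syslog(SAMPLE)
if fields is not None:
    for name, value in fields.items():
        print(f"{name}: {value}")
```

Note that the message itself reports the location as node0_RP0_CPU0; the sketch returns that string unchanged rather than guessing at a conversion to the RP/0/RP0/CPU0 form.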
Error detail and recommended action
- Cisco Feature Navigator error messages tool: Provides detailed error information and recommended actions. For more information, refer to View error details in the Cisco Feature Navigator error messages tool.
- MCE log file: Stores all past errors at /var/log/mcelog.log. You can use the MCE log file to determine whether the current error has occurred before and troubleshoot accordingly. For more information, refer to View error details in the MCE log file.
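Checking whether an error has occurred before amounts to searching the log for earlier entries of the same kind. The sketch below illustrates that idea under two assumptions: that you have copied the log off the router to a local text file, and that a plain keyword such as the error type is enough to recognize related entries. The layout of entries in the MCE log file is not described here, so the function `find_past_occurrences` is hypothetical and does no parsing beyond a case-insensitive substring match.

```python
from typing import Iterable, List


def find_past_occurrences(log_lines: Iterable[str], keyword: str) -> List[str]:
    """Return every log line that mentions keyword, ignoring case."""
    needle = keyword.lower()
    return [line.rstrip("\n") for line in log_lines if needle in line.lower()]


# Usage with a local copy of the log (the file name is an assumption):
#
# with open("mcelog.log", encoding="utf-8", errors="replace") as log_file:
#     hits = find_past_occurrences(log_file, "DIMM")
#     print(f"{len(hits)} earlier entries mention DIMM")
```

A non-empty result suggests a recurring fault, which is worth mentioning when you follow the recommended action for the error.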
MCE major errors in a router
These are some of the major MCE errors that can occur in a router:
- Card power zone error: Displays an undervoltage or overvoltage failure condition on the Line Card (LC) or Fabric Card (FC). When this error occurs, the system attempts to recover by power-cycling the LC or FC.
- Single Event Upset (SEU) error: Displays corrected and uncorrected SEU events that can happen in FPGA devices.
- Central Processing Unit (CPU) error: Displays all CPU errors.
If any of these errors occur in a router, you can view them using the show alarms command. For more information, refer to the Monitoring Alarms and Implementing Alarm Log Correlation section in the System Monitoring Configuration Guide for Cisco 8000 Series Routers.