THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.
Affected Product Name | Description | Comments |
---|---|---|
A99-RP3-SE | ASR 9900 Route Processor 3 for Service Edge | |
A99-RP3-TR | ASR 9900 Route Processor 3 for Packet Transport | |
A9K-RSP5-SE | ASR 9000 Route Switch Processor 5 for Service Edge | |
A9K-RSP5-TR | ASR 9000 Route Switch Processor 5 for Packet Transport | |
A9K-RSP880-LT-SE | ASR 9000 Route Switch Processor 880-LT for Service Edge | |
A9K-RSP880-LT-TR | ASR 9000 Route Switch Processor 880-LT for Packet Transport | |
ASR-9901 | ASR 9901 Compact Chassis, 2RU | |
Defect ID | Headline |
---|---|
CSCwc26739 | ASR9K- Some DIMMs failing at higher than expected rate |
A limited number of Dual In-line Memory Module (DIMM) and Dynamic Random Access Memory (DRAM) components shipped in Cisco ASR 9000 Series Aggregation Services Routers are impacted by an issue in the memory supplier's manufacturing process.
This issue may cause a DIMM not to be recognized during bootup, or may lead to a hardware failure following a reboot, although the expected failure rate is low.
In this case, a manufacturing deviation in specific DIMM and DRAM components was contained to a specific date range. Since the discovery of this issue, additional limits have been imposed on the manufacturing process to help prevent future components from experiencing this issue.
The impacted Cisco ASR 9000 Series Aggregation Services Routers may reload or fail to boot. Routers that reboot may also show the following failure signatures.
The following syslog indicates that the line card failed to boot successfully:
LC/0/0/CPU0:Aug 10 13:34:34.277 UTC: l2fib[177]: %OS-SHMWIN-2-ERROR_ENCOUNTERED : SHMWIN: Error encountered: System memory state is severe, please check the availability of the system memory
The following machine check errors may appear in the kernel logs:
Aug 30 12:43:26 host kernel: [98198866.920521] mce_notify_irq: 1 callbacks suppressed
Aug 30 12:43:26 host kernel: [98198866.920530] mce: [Hardware Error]: Machine check events logged
Aug 30 12:43:26 host kernel: [98198866.920831] mce: [Hardware Error]: Machine check events logged
Aug 30 12:43:26 host kernel: [98198866.925193] CMCI storm detected: switching to poll mode
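The syslog and kernel log signatures above can be searched for in saved log text. The following is a minimal Python sketch, not a Cisco tool. The function name find_signatures and the dictionary labels are illustrative assumptions, and the patterns are taken only from the example lines in this field notice, so real logs may need additional patterns.

```python
import re

# Patterns copied from the example log lines in this field notice.
# The dictionary keys are illustrative labels, not Cisco terminology.
SIGNATURES = {
    "shmwin_memory_severe": re.compile(
        r"%OS-SHMWIN-2-ERROR_ENCOUNTERED.*System memory state is severe"
    ),
    "machine_check_logged": re.compile(
        r"mce: \[Hardware Error\]: Machine check events logged"
    ),
    "cmci_storm": re.compile(r"CMCI storm detected"),
}


def find_signatures(log_text):
    """Count how many lines of log_text match each known signature."""
    counts = {label: 0 for label in SIGNATURES}
    for line in log_text.splitlines():
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


# Sample input: the example lines quoted in this field notice.
sample_log = """\
LC/0/0/CPU0:Aug 10 13:34:34.277 UTC: l2fib[177]: %OS-SHMWIN-2-ERROR_ENCOUNTERED : SHMWIN: Error encountered: System memory state is severe, please check the availability of the system memory
Aug 30 12:43:26 host kernel: [98198866.920521] mce_notify_irq: 1 callbacks suppressed
Aug 30 12:43:26 host kernel: [98198866.920530] mce: [Hardware Error]: Machine check events logged
Aug 30 12:43:26 host kernel: [98198866.920831] mce: [Hardware Error]: Machine check events logged
Aug 30 12:43:26 host kernel: [98198866.925193] CMCI storm detected: switching to poll mode
"""

signature_counts = find_signatures(sample_log)
print(signature_counts)
```

A nonzero count only shows that a log line resembles one of the examples. It does not confirm that the card is affected, which still depends on the serial number check.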
The following snippet, taken from show tech os output captured from the sysadmin login, indicates a memory error:
-----------------------------snip log--------------------------------
mcelog: failed to prefill DIMM database from DMI data
Kernel does not support page offline interface
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 9
MISC 228aa040101086 ADDR ff007f40
TIME 1643221467 Wed Jan 26 18:24:27 2022
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000003110a MCGSTATUS 0
MCGCAP 7000c14 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 85
Hardware event. This is not a software error.
-------------------------------end log-------------------------------
The following snippet shows memory failures or MCE failure signatures:
MCE Logs
MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR
Transaction: Memory read error
M2M: MscodDataRdErr
STATUS 8c00004001010090 MCGSTATUS 0
MCGCAP 7000c14 APICID a SOCKETID 0
CPUID Vendor Intel Family 6 Model 85
Hardware event. This is not a software error.
Cisco recommends replacement of affected Route Processors (RPs) or Route Switch Processors (RSPs) on Cisco ASR 9000 Series Aggregation Services Routers that exhibit the failure symptoms described in this field notice and that match an impacted serial number.
To determine if a serial number is affected, see the Serial Number Validation section of this field notice.
Collect the show tech os output from the sysadmin login.
To check the number of DIMMs recognized, connect to the host operating system of the card on the suspected Cisco ASR 9000 Series Aggregation Services Router and run the dmidecode -t 17 command. The following example shows the output of this check on the active Route Processor (RP):
RP/0/RSP1/CPU0:ASR9910-1-xrg-403#admin
sysadmin-vm:0_RSP0# run chvrf 0 ssh my_host
Thu May 22 13:47:00.719 UTC+00:00
Last login: Thu May 22 13:42:24 2025 from 10.0.2.15
[ios:~]$ dmidecode -t 17 | egrep "Size: [0-9]|^\s+Locator:"
Size: 8192 MB
Locator: Slot0
Locator: Slot1
Size: 8192 MB
Locator: Slot2
Locator: Slot3
Size: 8192 MB
Locator: Slot4
Locator: Slot5
Size: 8192 MB
Locator: Slot6
Locator: Slot7
[ios:~]$
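The filtered dmidecode output above can be tallied with a short script. The following Python sketch is an illustration only. The function name dimm_slots is hypothetical, and the sketch assumes that a "Size: <n> MB" line is printed only for a populated slot and appears immediately before that slot's "Locator:" line, as in the example output.

```python
import re


def dimm_slots(filtered_output):
    """Map each Locator to its size in MB, or None if no Size line precedes it."""
    slots = {}
    pending_size = None
    for line in filtered_output.splitlines():
        size = re.match(r"\s*Size:\s+(\d+)\s+MB", line)
        locator = re.match(r"\s*Locator:\s+(\S+)", line)
        if size:
            # Remember the size until the matching Locator line arrives.
            pending_size = int(size.group(1))
        elif locator:
            slots[locator.group(1)] = pending_size
            pending_size = None
    return slots


# Sample input: the example output shown in this field notice.
sample_dmidecode = """\
Size: 8192 MB
Locator: Slot0
Locator: Slot1
Size: 8192 MB
Locator: Slot2
Locator: Slot3
Size: 8192 MB
Locator: Slot4
Locator: Slot5
Size: 8192 MB
Locator: Slot6
Locator: Slot7
"""

slots = dimm_slots(sample_dmidecode)
populated = [name for name, size in slots.items() if size is not None]
total_mb = sum(size for size in slots.values() if size is not None)
print(populated, total_mb)
```

The sketch only reports what the output shows. It does not know how many DIMMs a given card should contain, so the result has to be judged against the expected configuration of that card.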
To check the total memory, run the show system resources command. The following example shows output of this command:
ASR-9903-A# show system resources
--snip--
Node         Physical                      Application                   Boot Partition     CPUs  Shmwin
             Total    Available  Cached    Total    Available  Cached    Total   Available        Total   Available
0/RP1-Host   32.34G^  492.92M^   250.04M   31.58G^  481.00M^   243.94M   1.86G   1.33G      6     N/A     N/A
0/RP1-Admin  1.93G    820.64M    173.03M   1.88G    801.00M    167.94M   7.50G   6.05G      2     N/A     N/A
0/RP1-XR     27.61G   18.84G     3.01G     26.96G   8.39G      2.94G     N/A     N/A        N/A   6.65G   5.97G
0/RP0-Host   32.34G^  304.66M^   174.20M   31.58G^  297.00M^   169.94M   1.86G   1.33G      6     N/A     N/A
0/RP0-Admin  1.93G    1.09G      88.04M    1.88G    1.07G      84.90M    7.50G   6.05G      2     N/A     N/A
0/RP0-XR     27.61G   18.71G     2.38G     26.96G   18.27G     2.33G     N/A     N/A        N/A   6.65G   5.97G
0/0-Host     27.61G^  1.66G^     211.03M   26.97G^  1.62G^     205.94M   1.86G   1.34G      6     N/A     N/A
0/0-Admin    1.00G    460.43M    62.00M    984.00M  449.00M    59.96M    7.50G   6.00G      1     N/A     N/A
0/0-XR       22.96G   14.57G     4.30G     22.42G   14.23G     4.20G     N/A     N/A        N/A   6.65G   5.73G
--snip--
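To compare total memory across nodes, the Physical Total column can be pulled out of saved show system resources output. The following Python sketch is a rough illustration under stated assumptions. The function name physical_totals is hypothetical, the first value after the node name is assumed to be the Physical Total column, and a trailing "^" is ignored because this notice does not define its meaning.

```python
import re

# Matches a data row such as "0/RP1-Host 32.34G^ ..." and captures the node
# name plus the first value column, assumed here to be Physical Total.
# A trailing "^" is treated as optional and ignored.
ROW = re.compile(r"^\s*(\d+/\S+)\s+(\d+(?:\.\d+)?)([GM])\^?\s")


def physical_totals(output):
    """Return {node: (value, unit)} for the first value column of each row."""
    totals = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            node, value, unit = match.groups()
            totals[node] = (float(value), unit)
    return totals


# Sample input: three rows from the example output in this field notice.
sample_resources = """\
ASR-9903-A# show system resources
--snip--
0/RP1-Host 32.34G^ 492.92M^ 250.04M 31.58G^ 481.00M^ 243.94M 1.86G 1.33G 6 N/A N/A
0/RP0-Host 32.34G^ 304.66M^ 174.20M 31.58G^ 297.00M^ 169.94M 1.86G 1.33G 6 N/A N/A
0/0-Admin 1.00G 460.43M 62.00M 984.00M 449.00M 59.96M 7.50G 6.00G 1 N/A N/A
--snip--
"""

totals = physical_totals(sample_resources)
for node, (value, unit) in totals.items():
    print(node, value, unit)
```

Header lines, the prompt line, and the snip markers do not start with a slot/node name, so they are skipped rather than parsed.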
Follow these steps on the affected device to collect MCE logs from the host shell:
admin
run chvrf 0 bash -l
ssh <specific node sysadmin IP or host name>
ssh my_host
cd /var/log/
cat mcelog.log
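Once the contents of mcelog.log have been copied off the device, the records can be summarized offline. The following Python sketch is a minimal illustration, not a Cisco utility. The function name summarize_mcelog is hypothetical, and the use of bit 61 as the uncorrected-error flag is an assumption based on the Intel architectural MCi_STATUS register layout, which this notice does not describe.

```python
import re

# Assumption: bit 61 of MCi_STATUS is the "uncorrected error" (UC) flag,
# per the Intel architectural machine-check register layout.
UC_BIT = 61


def summarize_mcelog(text):
    """Collect MCA descriptions and STATUS values from mcelog text."""
    summary = {"mca": [], "status": [], "uncorrected": 0}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("MCA:"):
            summary["mca"].append(line[len("MCA:"):].strip())
        status = re.match(r"STATUS\s+([0-9a-fA-F]+)\b", line)
        if status:
            value = status.group(1)
            summary["status"].append(value)
            if (int(value, 16) >> UC_BIT) & 1:
                summary["uncorrected"] += 1
    return summary


# Sample input: lines taken from the two example records in this field notice.
sample_mcelog = """\
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 9
MCi status:
Uncorrected error
MCA: corrected filtering (some unreported errors in same region)
STATUS ee2000000003110a MCGSTATUS 0
MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR
Transaction: Memory read error
STATUS 8c00004001010090 MCGSTATUS 0
"""

mce_summary = summarize_mcelog(sample_mcelog)
print(mce_summary)
```

In the sample, the first STATUS value has the assumed UC bit set, which agrees with the "Uncorrected error" line in the same record, and the second does not.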
Cisco provides the Serial Number Validation Tool to verify whether a device is impacted by this issue. To check the device, enter the serial number in the Serial Number Validation Tool.
Important: For security reasons, you must click the Serial Number Validation Tool link that is provided in this section. Do not copy and paste the link into a browser. Use of the Serial Number Validation Tool URL external to this field notice will fail.
Version | Description | Section | Date |
---|---|---|---|
1.0 | Initial Release | — | 2025-JUN-16 |
For further assistance or for more information about this field notice, contact the Cisco Technical Assistance Center (TAC).
To receive email updates about Field Notices (reliability and safety issues), Security Advisories (network security issues), and end-of-life announcements for specific Cisco products, set up a profile in My Notifications.