Introduction
This document describes general troubleshooting tips for collecting additional information for a memory leak problem.
Prerequisites
Requirements
Cisco recommends that you have basic knowledge of these topics:
- Basic knowledge of Cisco IOS® XE
- Basic knowledge of Embedded Event Manager (EEM)
Components Used
This document is not restricted to specific software and hardware versions. It applies to any Cisco IOS XE routing platform, such as ASR1000, ISR4000, ISR1000, Cat8000, or Cat8000v.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Background Information
In this document, you can find common logs that the device generates in case of high memory utilization.
It also shows how you can use the Embedded Event Manager (EEM) feature to help TAC monitor and collect data in situations where the IOS XE router frequently runs out of memory.
The purpose of this document is not to explain troubleshooting procedures in depth; instead, references to more detailed troubleshooting guides are provided where available.
Symptoms of IOS XE Routers Running out of Memory
When dealing with high memory usage problems, you typically see a log message indicating that the warning limit of 85% has been reached. This value can vary depending on the version. Different logs are generated depending on where the system found the problem:
TCAM problems:
CPP_FM-3-CPP_FM_TCAM_WARNING
IOSd (Control plane):
SYS-2-MALLOCFAIL
SYS-2-CHUNKEXPANDFAIL
SYS-4-CHUNKSIBLINGSEXCEED
QFP (Data plane):
QFPOOR-4-LOWRSRC_PERCENT_WARN
QFPOOR-4-TOP_EXMEM_USER
CPPEXMEM-3-NOMEM
CPPEXMEM-3-TOPUSER
Temporary file system (TMPFS):
PLATFORM-3-ELEMENT_TMPFS_WARNING
General system log (need isolation):
PLATFORM-4-ELEMENT_WARNING
PLATFORM-3-ELEMENT_CRITICAL
Note: Log improvements are available from version 16.12 and later.
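As a quick check, you can search the local logging buffer for any of these mnemonics before engaging TAC. This is a minimal example; adjust the pattern list to the logs relevant to your platform and release:
show logging | include ELEMENT_WARNING|ELEMENT_CRITICAL|MALLOCFAIL|LOWRSRC_PERCENT_WARN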
Information TAC Needs for Initial Triage
show clock
show version
show platform resources
show platform software status control-processor brief
show process memory sorted
show memory statistics
show memory allocating-process totals
show process memory platform sorted
show logging
- In case of an unexpected reload due to a low memory condition:
core file/system report
- Graph of memory utilization over time.
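If you prefer to gather these outputs into a single file instead of copying them from the terminal session, you can redirect the first command and append the rest. This is a minimal sketch; the file name TAC_triage.txt is only an example:
show version | redirect bootflash:TAC_triage.txt
show platform resources | append bootflash:TAC_triage.txt
show process memory sorted | append bootflash:TAC_triage.txt
show process memory platform sorted | append bootflash:TAC_triage.txt
show logging | append bootflash:TAC_triage.txt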
Attaching a show tech-support output is desirable; it is helpful for TAC, and you can benefit from the automation TAC has developed to help you find issues faster.
Conditions leading to high memory utilization are always software related. However, not all instances of high memory usage are unexpected; it is important to consider the available DRAM and the mix of features running on the device.
Troubleshooting high memory utilization is smoother, more effective, and involves better TAC interaction if you use RADKit. This tool, developed by Cisco, provides TAC a highly secure and easy way to access the devices you select in your network. For more information, visit: Cisco RADKit
Note: Make sure you are running a supported version. Look for the End-of-Sale and End-of-Life document for the release. If needed, move to a version that is currently under Software Maintenance Releases. Otherwise, TAC can be limited in the troubleshooting and resolution options.
For complete documentation on memory troubleshooting, refer to these guides:
On ISR4K: Memory Troubleshooting Guide for Cisco 4000 Series ISRs.
On ASR1K: ASR 1000 Series Router Memory Troubleshoot Guide.
Understanding High Memory Usage
In Cisco IOS XE routers, DRAM is one of the most important resources that supports core functionality. DRAM stores different data types and process/feature information that is essential for both control plane and data plane operations.
Main uses of DRAM in IOS XE routers include:
IOSd Memory (Control Plane Structures): Stores information related to the control plane of the device, such as routing information and protocols, network management structures, system configuration, and feature data.
QFP Memory (Data Plane Structures): Stores everything around QFP operations handled by the microcode, such as key structures of features stored in the QFP, microcode instructions, and forwarding instructions.
Temporary File System (TMPFS): Mounted in DRAM and managed by IOSd, TMPFS serves as a quick-access storage area for files needed by the processes. If those files need to be persistent, they are moved to the harddisk or bootflash. TMPFS enhances system performance by reducing read/write times for temporary data.
General Processes Running on the Linux Kernel: Since IOS XE operates on a Linux-based kernel, DRAM also supports various system processes that run on top of this kernel.
High memory utilization, greater than 85%, typically indicates significant DRAM consumption, which can impact router performance. This elevated usage can result from legitimate demands, such as storing extensive routing tables or enabling resource-intensive features. However, it can also signal issues like inefficient memory management by certain features or memory leaks, where memory is not properly released back to the system after use.
By monitoring memory utilization across IOSd memory, QFP memory, TMPFS, and general Linux processes, you and TAC can identify potential problems early.
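As a reference, each of these memory areas has its own set of show commands. This is a non-exhaustive sketch; command availability and syntax can vary by platform and release:
- IOSd (control plane): show memory statistics, show process memory sorted
- QFP (data plane): show platform hardware qfp active infrastructure exmem statistics
- Linux processes and overall DRAM (including TMPFS): show platform software status control-processor brief, show process memory platform sorted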
EEM to Monitor Memory Utilization
For memory troubleshooting, TAC needs to collect the output of a set of commands over a period of time to identify the offending process. Sometimes, after the culprit process is identified, additional specific commands are needed, which makes memory troubleshooting one of the most time-consuming types of troubleshooting.
In order to make this troubleshooting easier, you can use the EEM feature to monitor and automatically collect information. There are two main considerations when writing the EEM script: the trigger and the commands to be collected.
Triggers
Pattern. You can use one of the patterns from the section Symptoms of IOS XE Routers Running out of Memory. The format looks like this:
event syslog pattern <pattern> ratelimit 300 maxrun 180
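For example, using one of the IOSd patterns listed in the symptoms section as the trigger:
event syslog pattern "SYS-2-MALLOCFAIL" ratelimit 300 maxrun 180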
One consideration when using a pattern as a trigger is that the log is generated only once the warning threshold is reached. Depending on the memory consumption rate, if you try to collect the data manually at that point, you or TAC might not have enough time for more detailed troubleshooting.
Cron timer. Example of a cron timer to be activated every 30 minutes:
event timer cron name HalfHour cron-entry "*/30 * * * *"
One advantage of a cron timer over a pattern is that you do not need to wait until the device almost runs out of memory resources to collect information. Depending on the memory consumption rate, with proper monitoring and information, TAC can identify the offending process before the warning threshold is reached.
Note: The ratelimit and maxrun options are used to guarantee that the entire set of outputs is collected. They also help avoid additional noise or repeated EEM activation in situations where multiple logs appear in a short period of time.
EEM examples with general commands for initial triage:
configure terminal
event manager applet TAC_EEM authorization bypass
event syslog pattern " PLATFORM-4-ELEMENT_WARNING" ratelimit 300 maxrun 180
action 0.1 cli command "enable"
action 0.2 cli command "term exec prompt timestamp"
action 0.3 cli command "term length 0"
action 0.4 cli command "show process memory platform sorted | append bootflash:TAC_EEM.txt"
action 0.5 cli command "show processes memory platform sorted location chassis 1 R0 | append bootflash:TAC_EEM.txt"
action 0.9 cli command "show platform resources | append bootflash:TAC_EEM.txt"
action 1.0 cli command "show platform software status control-processor brief | append bootflash:TAC_EEM.txt"
action 1.1 cli command "show clock | append bootflash:TAC_EEM.txt"
action 1.3 cli command "show platform software process memory chassis active r0 all sorted | append bootflash:TAC_EEM.txt"
action 1.5 cli command "show process memory platform accounting | append bootflash:TAC_EEM.txt"
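After the applet is configured, you can confirm that it is registered and review the data it collects; the file name matches the one used in the applet actions:
show event manager policy registered
dir bootflash: | include TAC_EEM
more bootflash:TAC_EEM.txt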
Monitor daily with a cron timer:
configure terminal
event manager applet TAC_EEM2 authorization bypass
event timer cron name DAILY cron-entry "0 0 * * *"
action 0.1 cli command "enable"
action 0.2 cli command "term exec prompt timestamp"
action 0.3 cli command "term length 0"
action 0.4 cli command "show process memory platform sorted | append bootflash:TAC_EEM2.txt"
action 0.5 cli command "show processes memory platform sorted location chassis 1 R0 | append bootflash:TAC_EEM2.txt"
action 0.6 cli command "show processes memory platform sorted location chassis 2 R0 | append bootflash:TAC_EEM2.txt"
action 0.9 cli command "show platform resources | append bootflash:TAC_EEM2.txt"
action 1.0 cli command "show platform software status control-processor brief | append bootflash:TAC_EEM2.txt"
action 1.1 cli command "show log | append bootflash:TAC_EEM2.txt"
action 1.2 cli command "show clock | append bootflash:TAC_EEM2.txt"
action 1.3 cli command "show platform software process memory chassis active r0 all sorted | append bootflash:TAC_EEM2.txt"
action 1.5 cli command "show process memory platform accounting | append bootflash:TAC_EEM2.txt"
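Once TAC confirms that enough data has been collected, you can remove the applets and their output files. This is a minimal cleanup sketch:
configure terminal
 no event manager applet TAC_EEM
 no event manager applet TAC_EEM2
end
delete /force bootflash:TAC_EEM.txt
delete /force bootflash:TAC_EEM2.txt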
For a more comprehensive list of commands, refer to the guides in the section Information TAC Needs for Initial Triage.
Core File
When memory utilization reaches a critical level, chances are that the operating system forces a crash in order to recover from this condition, generating a system report that contains a core file.
The core file is a full dump of memory for a particular process that crashed at a certain point in time. This core file is critical for TAC to inspect memory and analyze source code in order to understand the conditions and potential reasons for the unexpected reload/crash of the process.
The core file helps TAC and developers to find the root cause of the problem, debug, and fix the issue.
Note: Even though TAC and developers strive to find a root cause, there are times when the crash was a consequence of a network event or a timing issue that makes it virtually impossible to reproduce in the lab.
For more information about unexpected reloads and how to retrieve a core file, refer to Troubleshoot Unexpected Reloads in Cisco IOS® Platforms with TAC.
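As a quick check after an unexpected reload, you can look for the system report and core files in local storage. The exact location varies by platform (bootflash: or harddisk:), so treat these paths as examples:
dir bootflash:core
dir harddisk:core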