
Cisco IOS Software Releases 12.1 Mainline

Troubleshooting Memory Problems

Document ID: 6507

Updated: Nov 28, 2006


Introduction

This document explains the symptoms and possible causes of memory allocation failure (MALLOCFAIL), and offers guidelines for troubleshooting memory problems.

Prerequisites

Requirements

There are no specific requirements for this document.

Components Used

The information in this document is based on these software and hardware versions:

  • All Cisco IOS® software versions

  • All Cisco routers

    Note: This document does not apply to Cisco Catalyst switches that utilize CatOS or MGX platforms.

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

Conventions

Refer to Cisco Technical Tips Conventions for more information on document conventions.

What is Memory Allocation Failure?

Memory allocation failure means either:

  • The router has used all available memory (temporarily or permanently), or

  • The memory has fragmented into such small pieces that the router cannot find a usable available block. This can happen with the processor memory (used by the Cisco Internetwork Operating System [IOS]) or with the packet memory (used by incoming and outgoing packets).

Symptoms

Symptoms of memory allocation failure include, but are not limited to:

  • The console or log message: "%SYS-2-MALLOCFAIL: Memory allocation of 1028 bytes failed from 0x6015EC84, Pool Processor, alignment 0"

  • Refused Telnet sessions

  • The output of the show processes memory command is displayed no matter which command you type at the console

  • No output from some show commands

  • "Low on memory" messages

  • The console message "Unable to create EXEC - no memory or too many processes"

  • Router hanging, no console response.

"Unable to Create EXEC" Error or When the Console Does Not Respond

When a router is low on memory, in some instances it is not possible to Telnet to the router. At this point, it is important to get access to the console port to collect data for troubleshooting. When connecting to the console port, however, you might see this:

%% Unable to create EXEC - no memory or too many processes

If you see the above message, there is not even enough available memory to allow for a console connection. There are steps you can take to allow data capture through the console. If you help the router free some memory, the console may respond, and you can then capture the necessary data from the router for troubleshooting.

Note: If Border Gateway Protocol (BGP) is configured on the router, you should refer to Achieve Optimal Routing and Reduce BGP Memory Consumption to reduce the memory consumption related to this process.

These are the steps for trying to capture data using the console port under very low memory conditions:

  1. Disconnect the LAN and WAN cables from the interfaces on the router. This will cause the router to stop passing packets.

  2. Recheck the console. Are you able to get a response and execute commands? After a few moments, there should be enough memory available to allow the console to respond.

  3. Collect the needed information from the privileged EXEC mode (Router#). At minimum, you want to collect the complete output of the following commands: show memory allocating-process totals (or show memory summary if show memory allocating-process totals is not available), show logging, and if possible, show technical-support.

  4. After you have collected the necessary data, reconnect all of the LAN and WAN links and continue to monitor the memory usage of the router.

Understanding the Error Message

When you issue the show logging command, you should see a message similar to this:

%SYS-2-MALLOCFAIL: Memory allocation of [X] bytes failed from
0x6015EC84, pool [Pool], alignment 0 -Process= 
"[Process]" ipl= 6, pid=5

[X] = the number of bytes the router tried to allocate, but could not find enough free memory to do so

[Pool] indicates whether the processor memory ('pool Processor') or the packet memory ('pool I/O') is affected. High-end routers (7000 and 7500 series) keep their buffers in main dynamic random-access memory (DRAM), so a lack of packet memory is reported as 'pool Processor'. 7200 series routers and Versatile Interface Processor (VIP) cards may also report errors in pool Peripheral Component Interconnect ('pool PCI') for the packet memory.

[Process] is the process that was affected by the lack of memory.
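If you collect many log messages, a small script can pull out the three fields described above. The following Python sketch assumes the message layout shown in this document (wrapped across lines or not) and uses a hypothetical regular expression written for this example; the function name parse_mallocfail and the pool-to-memory mapping are illustrative, not part of any Cisco tool. Other IOS releases may format the message slightly differently.

```python
import re
from typing import Optional

# Matches the MALLOCFAIL layout shown in this document. The "-Process=" part is
# optional because some captures (such as the console example) omit it.
MALLOCFAIL_RE = re.compile(
    r"Memory allocation of (?P<size>\d+) bytes failed from (?P<caller>0x[0-9A-F]+), "
    r"pool (?P<pool>[\w/]+), alignment (?P<align>\d+)"
    r"(?:\s*-\s*Process=\s*(?:\"(?P<proc>[^\"]*)\"|<(?P<irq>[^>]*)>))?",
    re.IGNORECASE,
)

# Pool-to-memory mapping as described in the paragraph above.
POOL_MEANING = {
    "processor": "processor memory (also packet memory on 7000/7500 series)",
    "i/o": "packet (I/O) memory",
    "pci": "packet memory (7200 series and VIP cards)",
}


def parse_mallocfail(message: str) -> Optional[dict]:
    """Extract the size, pool, and process from one MALLOCFAIL message."""
    # Logged messages are often wrapped, so collapse runs of whitespace first.
    m = MALLOCFAIL_RE.search(" ".join(message.split()))
    if m is None:
        return None
    irq = m.group("irq")
    return {
        "bytes": int(m.group("size")),
        "caller": m.group("caller"),
        "pool": m.group("pool"),
        "memory": POOL_MEANING.get(m.group("pool").lower(), "unknown pool"),
        "process": m.group("proc") or (f"<{irq}>" if irq else None),
        # A failure at <interrupt level> points to a software problem.
        "interrupt_level": irq == "interrupt level",
    }
```

For the interrupt-level example later in this document, parse_mallocfail returns 68 bytes in pool Processor with interrupt_level set to True.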

Possible Causes

In Processor Memory ("Pool Processor" on all platforms)

Memory Size Does not Support the Cisco IOS Software Image

Memory Leak Bug

Large Quantity of Memory Used for Normal or Abnormal Processes

Memory Fragmentation Problem or Bug

Memory Allocation Failure at Process = <interrupt level>

Known Issues

Known 70x0 Issue when Loading Large Cisco IOS Software from Flash or Netboot

IP Input and CiscoWorks UT Discovery

In Packet Memory ("I/O" or "Processor" on high-end routers, "PCI" on 7200 series and VIP cards)

Not Enough Shared Memory for the Interfaces

Buffer Leak Bug

Router Running Low on Fast Memory

Troubleshooting

Security-related Problem

MALLOCFAIL errors are commonly caused by a security issue, such as a worm or virus operating in your network. This is especially likely if there have been no recent changes to the network, such as a Cisco IOS upgrade on the router. Usually, a configuration change, such as adding lines to your access lists, can mitigate the effects of this problem. The Cisco Product Security Advisories and Notices page contains information on detecting the most likely causes and on specific workarounds.

For additional information, refer to:

Memory Size Does not Support the Cisco IOS Software Image

First, check the Download Software Area (registered customers only) for the minimum memory size for the feature set and version that you are running. Make sure it is sufficient. The memory requirements on Cisco.com are the minimum recommended sizes for the correct functioning of the router in most company networks. The actual memory requirements vary according to protocols, routing tables, and traffic patterns.

Memory Leak Bug

If you have the output of the show memory allocating-process totals, show memory summary, or show technical-support command (in enable mode) from your Cisco device, you can use Output Interpreter (registered customers only) to display potential issues and fixes. To use Output Interpreter, you must be a registered customer, be logged in, and have JavaScript enabled.

A memory leak occurs when a process requests or allocates memory and then forgets to free (de-allocate) the memory when it is finished with that task. As a result, the memory block is reserved until the router is reloaded. Over time, more and more memory blocks are allocated by that process until there is no free memory available. Depending on the severity of the low memory situation at this point, the only option you may have is to reload the router to get it operational again.

This is a Cisco Internetwork Operating System (IOS) bug. To get rid of it, upgrade to the latest version in your release train (for example, if you are running Cisco IOS Software Release 11.2(14), upgrade to the latest 11.2(x) image).

If this doesn't solve the problem, or if you do not want to upgrade the router, enter the show processes memory command at regular intervals over a period of time (for example, every few hours or days depending on whether you have a fast or slow leak). Check to see if free memory continues to decrease and is never returned. The rate at which free memory disappears depends on how often the event occurs that leads to the leak. Since the memory is never freed, you can track the process that is using the memory by taking memory snapshots over time. Keep in mind that different processes allocate and de-allocate memory as needed, so you will see differences, but as the leak continues, you should see one process that is continually consuming more memory (Note: it is normal for some processes, such as Border Gateway Protocol (BGP) or Open Shortest Path First (OSPF) router, to use more than one megabyte of memory; this does not mean they are leaking).

To identify the process that is consuming more memory, compare the Holding column of the show processes memory command over the time interval. Sometimes you can very clearly see that one process is holding several megabytes of memory. Sometimes it takes several snapshots to find the culprit. When a significant amount of memory has been lost, collect a show memory allocating-process totals command or show memory summary command for additional troubleshooting. Then contact the Cisco Technical Assistance Center (TAC) and provide the information you collected, along with a show technical-support summary of the router.

The Output Interpreter tool allows you to receive an analysis of the show memory allocating-process totals command or show memory summary output.

The table gives the first three lines of the show memory summary command output:

Router>show memory summary 

            Head       Total (b)   Used (b)  Free (b)   Lowest (b)  Largest (b)
Processor   60AB4ED0   5550384     2082996   3467388    3464996     3454608
I/O         40000000   16777216    1937280   14839936   14839936    14838908

Total = the total amount of memory available after the system image loads and builds its data structures.

Used = the amount of memory currently allocated.

Free = the amount of memory currently free.

Lowest = the lowest amount of free memory recorded by the router since it was last booted.

Largest = the largest free memory block currently available.

The show memory allocating-process totals command contains the same information as the first three lines of the show memory summary command.
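To compare captures taken at different times, it helps to turn these lines into numbers. This Python sketch parses pool rows such as the Processor and I/O lines above; parse_memory_header and PoolStats are hypothetical names, and the parser assumes each pool row has exactly seven whitespace-separated columns, as in the sample.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PoolStats:
    name: str
    head: int      # start address of the pool (shown in hex)
    total: int
    used: int
    free: int
    lowest: int
    largest: int

    def pct_used(self) -> float:
        return round(100 * self.used / self.total, 1)


def parse_memory_header(output: str) -> Dict[str, PoolStats]:
    """Parse the pool lines at the top of show memory summary output."""
    pools: Dict[str, PoolStats] = {}
    for line in output.splitlines():
        fields = line.split()
        # Pool rows: name, hex head address, then five decimal byte counts.
        if len(fields) != 7:
            continue
        try:
            head = int(fields[1], 16)
            total, used, free, lowest, largest = (int(f) for f in fields[2:])
        except ValueError:
            continue  # header or other non-data line
        pools[fields[0]] = PoolStats(fields[0], head, total, used, free, lowest, largest)
    return pools
```

For the sample above, the Processor pool is about 37.5 percent used and the I/O pool about 11.5 percent; in both rows, Used plus Free equals Total.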

Here is what you can learn from the show processes memory command output:

Router>show processes memory 
Total: 3149760, Used: 2334300, Free: 815460

PID   TTY   Allocated    Freed      Holding    Getbufs    Retbufs   Process
0     0     226548       1252       1804376    0          0         *Initialization*
0     0     320          5422288    320        0          0         *Scheduler*
0     0     5663692      2173356    0          1856100    0         *Dead*
1     0     264          264        3784       0          0         Load Meter
2     2     5700         5372       13124      0          0         Virtual Exec
3     0     0            0          6784       0          0         Check heaps
4     0     96           0          6880       0          0         Pool Manager
5     0     264          264        6784       0          0         Timers
6     0     2028         672        8812       0          0         ARP Input
7     0     96           0          6880       0          0         SERIAL A' detect
8     0     504          264        7024       0          0         ATM ILMI Input
9     0     0            0          6784       0          0         ILMI Process
10    0     136          0          6920       0          0         M32_runts pring
11    0     136          0          6920       0          0         Call drop procs
12    0     340          340        12784      0          0         ATMSIG Timer
13    0     445664       442936     13904      0          0         IP Input
14    0     2365804      2357152    17992      0          0         CDP Protocol
15    0     528          264        7048       0          0         MOP Protocols
16    0     188          0          9972       0          0         IP Background
17    0     0            1608       6784       0          0         TCP Timer
18    0     5852116      0          14236      0          0         TCP Protocols

Allocated = the total amount of bytes that have been allocated by the process since the router booted.

Freed = the total amount of bytes that have been released by this process.

Holding = the total amount of bytes currently held by this process. This is the most important column for troubleshooting because it shows the actual amount of memory attributed to this process. Holding does not necessarily equal Allocated minus Freed because some processes allocate a block of memory that is later returned to the free pool by another process.
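The snapshot comparison described earlier can be automated. This Python sketch assumes you have saved the raw show processes memory output from several captures, oldest first; it keys each row by PID and process name (because several entries share PID 0) and reports processes whose Holding value increased in every interval. The function names and the 100,000-byte growth threshold are hypothetical; as noted earlier, a large but stable Holding value (for example, for BGP or OSPF) is not by itself a leak.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, str]  # (PID, process name); PID 0 is shared by several entries


def parse_holding(output: str) -> Dict[Key, int]:
    """Map each process in show processes memory output to its Holding bytes."""
    holding: Dict[Key, int] = {}
    for line in output.splitlines():
        # PID TTY Allocated Freed Holding Getbufs Retbufs Process (name may contain spaces)
        fields = line.split(None, 7)
        if len(fields) == 8 and fields[0].isdigit():
            holding[(int(fields[0]), fields[7].strip())] = int(fields[4])
    return holding


def suspect_leaks(snapshots: List[str], min_growth: int = 100_000) -> List[Tuple[Key, int]]:
    """Return processes whose Holding rose in every interval, largest growth first.

    snapshots are raw outputs ordered oldest to newest; min_growth is an
    arbitrary example threshold in bytes, not a Cisco-recommended value.
    """
    parsed = [parse_holding(s) for s in snapshots]
    suspects = []
    for key in parsed[0]:
        series = [p.get(key) for p in parsed]
        if any(v is None for v in series):
            continue  # process ended or restarted between captures
        if all(later > earlier for earlier, later in zip(series, series[1:])):
            growth = series[-1] - series[0]
            if growth >= min_growth:
                suspects.append((key, growth))
    return sorted(suspects, key=lambda item: item[1], reverse=True)
```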

The *Dead* Process

The *Dead* process is not a real process. It accounts for memory that was allocated under the context of another process that has since terminated. This memory is reclaimed by the kernel and returned to the memory pool by the router when required; this is the way IOS handles memory. A memory block is considered dead if the process that created it has exited (is no longer running). Each block records the address and PID of the process that created it. During periodic memory tallying, if the process that the scheduler finds for a block's PID does not match the process that the block recorded, the block is marked as dead.

Therefore, memory marked as belonging to process *Dead* was allocated under the control of a process that no longer runs. It is normal to have a significant chunk of memory in such a state. Here is an example:

Memory is allocated when configuring Network Address Translation (NAT) during a Telnet session. That memory is accounted for under the Telnet process ("Virtual Exec"). Once this process is terminated, the memory for the NAT configuration is still in use. This is shown using the *dead* process.

You can see under which context the memory was allocated using the show memory dead command, under the "What" column:

Router#show memory dead 
               Head   Total(b)    Used(b)    Free(b)  Lowest(b) Largest(b) 
      I/O    600000    2097152     461024    1636128    1635224    1635960 
  
          Processor memory 
  
 Address  Bytes Prev.    Next     Ref  PrevF   NextF   Alloc PC  What 
1D8310       60 1D82C8   1D8378     1                  3281FFE   Router Init 
2CA964       36 2CA914   2CA9B4     1                  3281FFE   Router Init 
2CAA04      112 2CA9B4   2CAAA0     1                  3A42144   OSPF Stub LSA RBTree 
2CAAA0       68 2CAA04   2CAB10     1                  3A420D4   Router Init 
2ED714       52 2ED668   2ED774     1                  3381C84   Router Init 
2F12AC       44 2F124C   2F1304     1                  3A50234   Router Init 
2F1304       24 2F12AC   2F1348     1                  3A420D4   Router Init 
2F1348       68 2F1304   2F13B8     1                  3381C84   Router Init 
300C28      340 300A14   300DA8     1                  3381B42   Router Init 

If a memory leak is detected, and the *Dead* process seems to be the one consuming the memory, include a show memory dead in the information provided to the Cisco TAC.
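When the *Dead* process holds a lot of memory, totaling the show memory dead output by the What column shows which allocating context owns most of it. This Python sketch assumes rows laid out like the sample above, with the PrevF and NextF columns blank; dead_bytes_by_context is a hypothetical helper, and rows that fill those columns would need a different pattern.

```python
import re
from collections import Counter

# One allocated block: Address, Bytes, Prev., Next, Ref, Alloc PC, What.
# Assumes PrevF/NextF are blank, as in the sample output above.
DEAD_ROW_RE = re.compile(
    r"^\s*[0-9A-F]+\s+(?P<bytes>\d+)\s+[0-9A-F]+\s+[0-9A-F]+\s+\d+\s+"
    r"[0-9A-F]+\s+(?P<what>\S.*?)\s*$"
)


def dead_bytes_by_context(output: str) -> Counter:
    """Total bytes of *Dead* memory per allocating context (the What column)."""
    totals: Counter = Counter()
    for line in output.splitlines():
        m = DEAD_ROW_RE.match(line)
        if m:  # header and pool-summary lines do not match the row pattern
            totals[m.group("what")] += int(m.group("bytes"))
    return totals
```

For the nine sample rows, Router Init accounts for 692 bytes and OSPF Stub LSA RBTree for 112 bytes.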

Large Quantity of Memory Used for Normal or Abnormal Processes

This is one of the most difficult causes to verify. The problem is characterized by a large amount of free memory but a small value in the "Lowest" column. In this case, a normal or abnormal event (for example, a large routing instability) causes the router to use an unusually large amount of processor memory for a short period of time, during which memory runs out. During that period, the router reports MALLOCFAIL. Soon after, the memory may be freed and the problem may disappear (for example, when the network stabilizes). The memory shortage may also be due to a combination of factors, such as:

  • a memory leak that has consumed a large amount of memory, and then a network instability pushes the free memory to zero

  • the router does not have enough memory to begin with, but the problem is discovered only during a rare network event.

If the router has not been rebooted, enter the show memory allocating-process totals command (or show memory summary if show memory allocating-process totals is not available) and look at the first three lines, comparing the Free and Lowest values. The log messages may also provide clues about which process was consuming a lot of memory.

If large memory usage was due to a:

  • normal event, the solution is to install more memory.

  • rare or abnormal event, fix the related problem. You may then decide to purchase extra memory for future "insurance".
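The difference between Total and Lowest is the most memory the pool has had in use since the last boot, which is the trace a transient shortage leaves behind. This Python sketch computes that peak from the Total, Free, and Lowest values; peak_usage is a hypothetical helper, and the 5 percent and 25 percent cutoffs used to flag a "transient shortage" are illustrative assumptions, not Cisco guidance.

```python
def peak_usage(total: int, free: int, lowest: int,
               low_ratio: float = 0.05, healthy_ratio: float = 0.25) -> dict:
    """Summarize current versus worst-case use of one memory pool.

    A pool is flagged when Lowest fell below low_ratio of Total even though
    Free is currently above healthy_ratio of Total (both cutoffs are assumed).
    """
    return {
        "used_now": total - free,
        "peak_used": total - lowest,  # most memory in use at any time since boot
        "lowest_pct": round(100 * lowest / total, 1),
        "transient_shortage": lowest < total * low_ratio and free > total * healthy_ratio,
    }
```

With the Processor line from the show memory summary example earlier in this document (Total 5550384, Free 3467388, Lowest 3464996), the peak usage is 2,085,388 bytes and no transient shortage is flagged.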

Memory Fragmentation Problem or Bug

This situation means that a process has consumed a large amount of processor memory and then released most or all of it, leaving fragments of memory still allocated either by this process, or by other processes that allocated memory during the problem. If the same event occurs several times, the memory may fragment into very small blocks, to the point where all processes requiring a larger block of memory cannot get the amount of memory that they need. This may affect router operation to the extent that you cannot connect to the router and get a prompt if the memory is badly fragmented.

This problem is characterized by a low value in the "Largest" column (under 20,000 bytes) of the show memory command output but a sufficient value in the "Free" column (1 MB or more), or some other wide disparity between the two columns. This may happen when the router gets very low on memory, because IOS has no defragmentation routine.
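The rule of thumb above can be expressed as a simple check on one pool line of show memory output. In this Python sketch, looks_fragmented is a hypothetical helper, the column order (name, head, Total, Used, Free, Lowest, Largest) follows the sample output in this document, and 1 MB is taken as 1,048,576 bytes, which is an assumption.

```python
def looks_fragmented(row: str, largest_limit: int = 20_000,
                     free_floor: int = 1_048_576) -> bool:
    """Apply the Largest/Free rule of thumb to one pool line of show memory output."""
    # Columns: name, head, Total, Used, Free, Lowest, Largest (bytes).
    fields = row.split()
    free, largest = int(fields[4]), int(fields[6])
    # Plenty of free memory overall, but no single block large enough to use.
    return largest < largest_limit and free >= free_floor
```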

If you suspect memory fragmentation, shut down some interfaces. This may free the fragmented blocks. If this works, the memory is behaving normally, and all you have to do is add more memory. If shutting down interfaces doesn't help, it may be a bug. The best course of action is to contact your Cisco support representative with the information you have collected.

Memory Allocation Failure at Process = <interrupt level>

This situation can be identified by the process in the error message. If the process is listed as <interrupt level>, as in the following example, then the memory allocation failure is being caused by a software problem.

"%SYS-2-MALLOCFAIL: Memory allocation of 68 bytes failed from 0x604CEF48, 
pool Processor, alignment 0-Process= <interrupt level>, ipl= 3"

This is a Cisco Internetwork Operating System (IOS) bug. You can use the Bug Toolkit (registered customers only) to search for a matching software bug ID for this issue. Once the software bug has been identified, upgrade to a Cisco IOS software version that contains the fix to resolve the problem.

Memory Depletion Due to Downloading per User Access Lists

Access lists can consume a lot of memory when they are used on a per-user basis. When these access lists are too large to be classified as mini access control lists (ACLs), they are compiled as Turbo ACLs. Each time this occurs, the Turbo ACL (TACL) process has to run and compile the new ACL. As a result, whether traffic is permitted or denied can depend on the compile time and the available processing time.

Compiled ACLs have to be sent down to XCM. When only limited space is available and that memory runs out, the console messages appear and the memory defragger starts.

These are the workarounds:

  • Use concise ACLs with fewer Access Control Entries (ACEs) so that they compile as mini ACLs. This reduces both memory consumption and the processing power needed for compilation.

  • Use predefined ACLs on the router that are referenced through the RADIUS Filter-Id attribute.

Known Issues

Known 70x0 Problem when Loading Large Cisco IOS Software from Flash or Netboot

When a 7000 Route Processor (RP) boots an image from Flash, it first loads the ROM image and then the flash image into memory. The old RP only has 16 MB of memory, and the Enterprise versions of Cisco IOS Software release later than version 11.0 are larger than 8 MB when uncompressed. Therefore, when you load the image from ROM and then Flash, the 7000 RP may run out of memory, or the memory may become fragmented during the boot-up process so that the router has memory-related error messages.

The solution is to enable Fast Boot in the configuration register so that the RP loads only a minimal subset of the Cisco IOS Software image from ROM and then loads the complete Cisco IOS Software image from Flash. To enable Fast Boot, set the configuration register to 0x2112. This also speeds up the boot process.
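The configuration register is a 16-bit value, and 0x2112 differs from the common default of 0x2102 only in bit 4 (0x0010). The following Python sketch shows that arithmetic; it assumes, based only on this section, that bit 4 is the Fast Boot bit on the 7000 RP, and with_fast_boot and boot_field are hypothetical helpers. The low four bits (the boot field) are unchanged. You apply the resulting value with the config-register global configuration command, and it takes effect at the next reload.

```python
DEFAULT_REGISTER = 0x2102  # a common default configuration register value
FAST_BOOT_BIT = 0x0010     # inferred: the only bit that differs between 0x2102 and 0x2112


def with_fast_boot(register: int) -> int:
    """Set the Fast Boot bit, leaving every other bit unchanged."""
    return register | FAST_BOOT_BIT


def boot_field(register: int) -> int:
    """Return the low four bits, which form the boot field."""
    return register & 0x000F


new_value = with_fast_boot(DEFAULT_REGISTER)
print(f"config-register 0x{new_value:04X}")  # config-register 0x2112
```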

IP Input and CiscoWorks UT Discovery

Using the UT Discovery feature of CiscoWorks may cause the amount of free memory to become very small on some of your routers. The show processes memory command may indicate a lot of memory held by the "IP Input" process. This is a particular case of the Large Quantity of Memory Used for Normal or Abnormal Processes problem for the "IP Input" process, which can also result in a Memory Fragmentation issue if the low memory condition causes the memory to be fragmented.

The UT Discovery feature causes the Network Management Station to send a ping sweep to every IP address in every discovered subnet. The memory issues are caused by the growing size of the IP fast-switching cache on the router, because a new cache entry is created for every new destination. Because the mask used for cache entries depends on how the major network is subnetted, the presence of an address using a 32-bit mask (for example, a loopback address) in a major network causes all entries for that network to use a 32-bit mask. As a result, a huge number of cache entries is created, using a large amount of memory.
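To see why a single 32-bit mask in a major network matters, consider how many cache entries one ping sweep creates under each mask. This Python sketch is a simplified model, not the actual IOS fast-switching code: it assumes one cache entry per distinct prefix, and cache_entries and the 10.1.1.0/24 subnet are hypothetical.

```python
import ipaddress
from typing import Iterable


def cache_entries(destinations: Iterable[str], prefix_len: int) -> int:
    """Count distinct cache prefixes if every entry uses prefix_len bits."""
    # strict=False lets a host address be masked down to its network prefix.
    return len({
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in destinations
    })


# A ping sweep of every host in one /24 subnet (254 addresses).
sweep = [str(host) for host in ipaddress.ip_network("10.1.1.0/24").hosts()]
print(cache_entries(sweep, 24), cache_entries(sweep, 32))  # 1 254
```

With a /24 mask, the 254 swept hosts share a single entry; with a /32 mask, every host gets its own entry, so the count grows with the number of addresses swept.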

The best solution is to disable UT Discovery. You can do this by following the steps below:

  1. Open C:\Program Files\CSCOpx\etc\cwsi\ANIServer.properties.

  2. Add "UTPingSweep=0".

  3. Restart ANI.

This may cause the User Tracking table to miss some end servers, or go out of date (this might be an issue with another Cisco application called User Registration Tool, which relies on UT), but it does not affect the Campus Discovery which uses only SNMP traffic. CEF switching may also improve this situation (with CEF, the IP cache is created from the routing table at bootup). Refer to How to Choose the Best Router Switching Path for Your Network for more information about CEF and other available switching paths.

There are many other applications that can result in similar low memory situations. In most cases, the root cause of the problem is not the router, but the application itself. Normally you should be able to prevent those packet storms by checking the configuration of the application.

Not Enough Shared Memory for the Interfaces

Some routers (for example, 2600, 3600, and 4000 Series) require a minimum amount of I/O memory to support certain interface processors.

If the router is still running low on shared memory even after a reload, physically removing some interfaces can solve the problem.

On 3600 Series Routers, the global configuration command memory-size iomem i/o-memory-percentage can be used to reallocate the percentage of DRAM to use for I/O memory and processor memory. The values permitted for i/o-memory-percentage are 10, 15, 20, 25 (the default), 30, 40, and 50. A minimum of 4 MB of memory is required for I/O memory.
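The split between I/O and processor memory is simple arithmetic on the DRAM size. This Python sketch checks a proposed percentage against the permitted values and the 4 MB minimum listed above; iomem_split is a hypothetical helper, and the router may round the actual carve differently. For example, memory-size iomem 25 on a router with 64 MB of DRAM leaves 16 MB for I/O memory.

```python
ALLOWED_PERCENTAGES = (10, 15, 20, 25, 30, 40, 50)  # from the text; 25 is the default
MIN_IOMEM_MB = 4


def iomem_split(dram_mb: float, pct: int = 25) -> tuple:
    """Return (I/O MB, processor MB) for memory-size iomem <pct>."""
    if pct not in ALLOWED_PERCENTAGES:
        raise ValueError(f"{pct}% is not a permitted i/o-memory-percentage")
    io_mb = dram_mb * pct / 100
    if io_mb < MIN_IOMEM_MB:
        raise ValueError(f"{io_mb} MB of I/O memory is below the {MIN_IOMEM_MB} MB minimum")
    return io_mb, dram_mb - io_mb


print(iomem_split(64))  # (16.0, 48.0)
```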

In order to troubleshoot this problem, refer to:

Buffer Leak Bug

If you have the output of the show buffers or show technical-support command (in enable mode) from your Cisco device, you can use Output Interpreter (registered customers only) to display potential issues and fixes. To use Output Interpreter, you must be a registered customer, be logged in, and have JavaScript enabled.

When a process is finished with a buffer, the process should free the buffer. A buffer leak occurs when the code forgets to process a buffer, or forgets to free it after it is done with the packet. As a result, the buffer pool continues to grow as more and more packets are stuck in the buffers.

You can identify a buffer leak with the show buffers command. One or more of the public buffer pools will be abnormally large, with few free buffers. After a reload, you may see that the number of free buffers never gets close to the total number of buffers.

The Output Interpreter tool allows you to receive an analysis of the show buffers output.

In the example below, the Middle buffers are affected. The show buffers output indicates that 8094 buffers are in use and not being freed (8122 total minus 28 free):

Public buffer pools:
Small buffers, 104 bytes (total 50, permanent 50):
     50 in free list (20 min, 150 max allowed)
     403134 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Middle buffers, 600 bytes (total 8122, permanent 200):
     28 in free list (10 min, 300 max allowed)
     154459 hits, 41422 misses, 574 trims, 8496 created
Big buffers, 1524 bytes (total 50, permanent 50):
     50 in free list (5 min, 150 max allowed)
     58471 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
VeryBig buffers, 4520 bytes (total 10, permanent 10): 
     10 in free list (0 min, 100 max allowed)
     0 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Large buffers, 5024 bytes (total 0, permanent 0)
     0 in free list (0 min, 10 max allowed) 
     0 hits, 0 misses, 0 trims, 0 created  
     0 failures (0 no memory)
Huge buffers, 18024 bytes (total 0, permanent 0): 
     0 in free list (0 min, 4 max allowed)
     0 hits, 0 misses, 0 trims, 0 created  
     0 failures (0 no memory) 

This is a Cisco IOS software bug. Upgrade to the latest version in your release train to fix known buffer leak bugs (for example, if you are running Cisco IOS Software Release 11.2(14), upgrade to the latest 11.2(x) image). If this does not help, or if it is not possible to upgrade the router, issue the following commands for the problem pool when the router is low on memory. These commands display additional information about the content of the buffers:

  • show buffer old: shows allocated buffers more than one minute old

  • show buffer pool (small - middle - big - verybig - large - huge): shows a summary of the buffers for the specified pool

  • show buffer pool (small - middle - big - verybig - large - huge) dump: shows a hex/ASCII dump of all the buffers in use in the given pool.

Refer to Troubleshooting Buffer Leaks for additional details.
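The in-use count for the Middle pool in the show buffers example earlier in this section (8122 total minus 28 free) can be computed for every pool at once. This Python sketch parses show buffers output laid out like that example and flags pools whose total is far above the permanent count while almost none are free; parse_buffers, suspect_pools, and both thresholds are hypothetical choices for illustration, not Cisco criteria.

```python
import re
from typing import Dict, List

POOL_RE = re.compile(r"(\w+) buffers, (\d+) bytes \(total (\d+), permanent (\d+)\)")
FREE_RE = re.compile(r"(\d+) in free list")


def parse_buffers(output: str) -> Dict[str, dict]:
    """Collect size, total, permanent, free, and in-use counts per buffer pool."""
    pools: Dict[str, dict] = {}
    current = None
    for line in output.splitlines():
        pool = POOL_RE.search(line)
        if pool:
            current = pool.group(1)
            pools[current] = {"size": int(pool.group(2)), "total": int(pool.group(3)),
                              "permanent": int(pool.group(4)), "free": 0}
            continue
        free = FREE_RE.search(line)
        if free and current is not None:
            pools[current]["free"] = int(free.group(1))
    for stats in pools.values():
        stats["in_use"] = stats["total"] - stats["free"]
    return pools


def suspect_pools(pools: Dict[str, dict], growth_factor: int = 10,
                  free_ratio: float = 0.01) -> List[str]:
    """Pools far above their permanent size with almost no free buffers."""
    return [
        name for name, s in pools.items()
        if s["permanent"] > 0
        and s["total"] > growth_factor * s["permanent"]
        and s["free"] < free_ratio * s["total"]
    ]
```

Applied to the example output, only the Middle pool is flagged, with 8094 buffers in use.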

Router Running Low on Fast Memory

This problem is specific to the 7500 series. If the router runs out of "fast" memory, it will use its main Dynamic RAM (DRAM) instead. No action is required.

IPFAST-4-RADIXDELETE: Error trying to delete prefix entry [IP_address]/[dec] (expected [hex], got [hex])

The IPFAST-4-RADIXDELETE: Error trying to delete prefix entry [IP_address]/[dec] (expected [hex], got [hex]) error message indicates that the router's fast-switching cache table in memory is corrupt. When the router tries to clear the cache table during normal processing, or when the clear ip cache command is entered, the system fails to delete entries because of the memory corruption. Each time the router fails to delete such an entry, it reports the IPFAST-4-RADIXDELETE message.

In order to resolve a cache table memory corruption issue, a hard reboot of the router is needed. A reboot will recarve the system memory structures and allow the fast cache to rebuild corruption-free.

%SYS-2-CHUNKEXPANDFAIL: Could not expand chunk pool for TACL Bitmap. No memory available

The %SYS-2-CHUNKEXPANDFAIL: Could not expand chunk pool for TACL Bitmap. No memory available error message means that there is not enough processor memory left to grow the specified chunk pool. It may be caused by a process that is behaving abnormally.

To track down the cause, periodically capture the output of these commands (how often depends on how frequently the issue occurs) so that the memory usage of the router can be monitored:

  • show processes memory sorted

  • show memory statistics

  • show memory allocating-process totals

Troubleshooting Summary

Pool "Processor" Memory Allocation Failures

Follow these steps:

  1. Check the memory requirements for your Cisco IOS software release version or feature set.

  2. If possible, upgrade to the latest Cisco IOS software release version in your release train.

  3. Check for a large quantity of memory used for normal or abnormal processes. If required, add more memory.

  4. Check whether this is a memory leak or a fragmentation problem (or a buffer leak on high-end routers).

  5. Collect the relevant information and contact the TAC.

Pool "I/O" Memory Allocation Failures ("Processor" on high-end routers, "PCI" on 7200 series)

Follow these steps:

  1. Check the shared memory requirements (see Not Enough Shared Memory for the Interfaces).

  2. If possible, upgrade to the latest Cisco IOS Software release version in your release train.

  3. Determine which buffer pool is affected, collect the relevant information, and contact the Cisco TAC.

Information to Collect if You Open a TAC Service Request

If you still need assistance after following the troubleshooting steps above and want to open a TAC Service Request (registered customers only), be sure to include the following information:
  • Troubleshooting performed before opening the case
  • show technical-support output (in enable mode if possible) - multiple captures to show how router use of memory has changed over time
  • show logging output or console captures, if available
  • show memory allocating-process totals or show memory summary - multiple captures to show how router use of memory has changed over time
You might need to use the techniques in the "Unable to Create EXEC" Error or When the Console Does Not Respond section to get the information. Multiple captures of the information may be necessary to determine the cause of the problem. Because there are several types of memory leaks, the TAC engineer may need additional information once the type of memory leak is identified. If you suspect a memory fragmentation problem, please include:
  • show memory free
  • show memory bigger
If you suspect a buffer leak, please include:
  • show buffer old
  • show buffer pool (small - middle - big - verybig - large - huge): for the problem pool. For example, if you suspect a leak in the middle pool, include the command show buffer pool middle
  • show buffer pool (small - middle - big - verybig - large - huge) packet: for the problem pool. For example, if you suspect a leak in the middle pool, include the command show buffer pool middle packet
You can attach information to your case by uploading it using the TAC Service Request Tool (registered customers only). If you cannot access the Service Request Tool, you can send the information in an email attachment to attach@cisco.com with your case number in the subject line of your message to attach the relevant information to your case.

Note: Please do not manually reload or power-cycle the router before collecting the above information unless required to troubleshoot memory problems, as this can cause the loss of information needed to determine the root cause of the problem. Your TAC engineer may suggest reloading the router and collecting additional information after the reload as part of the troubleshooting, depending on the cause.

Related Information
