This document explains the symptoms and possible causes of memory allocation failure (MALLOCFAIL), and offers guidelines for troubleshooting memory problems.
There are no specific requirements for this document.
The information in this document is based on these software and hardware versions:
All Cisco IOS® software versions
All Cisco routers
Note: This document does not apply to Cisco Catalyst switches that utilize CatOS or MGX platforms.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
Refer to Cisco Technical Tips Conventions for more information on document conventions.
Memory allocation failure means either:
The router has used all available memory (temporarily or permanently), or
The memory has fragmented into such small pieces that the router cannot find a usable available block. This can happen with the processor memory (used by Cisco IOS® Software) or with the packet memory (used by incoming and outgoing packets).
Symptoms of memory allocation failure include, but are not limited to:
The console or log message: "%SYS-2-MALLOCFAIL: Memory allocation of 1028 bytes failed from 0x6015EC84, Pool Processor, alignment 0"
Refused Telnet sessions
The output of the show processes memory command is displayed no matter what command you type on a console
No output from some show commands
"Low on memory" messages
The console message "Unable to create EXEC - no memory or too many processes"
The router hangs and does not respond on the console.
When a router is low on memory, in some instances it is not possible to Telnet to the router. At this point, it is important to get access to the console port to collect data for troubleshooting. When connecting to the console port, however, you might see this:
%% Unable to create EXEC - no memory or too many processes
If you see the above message, there is not even enough available memory to allow for a console connection. There are steps you can take to allow data capture through the console. If you help the router free some memory, the console may respond, and you can then capture the necessary data from the router for troubleshooting.
Note: If Border Gateway Protocol (BGP) is configured on the router, you should refer to Achieve Optimal Routing and Reduce BGP Memory Consumption to reduce the memory consumption related to this process.
These are the steps for trying to capture data using the console port under very low memory conditions:
Disconnect the LAN and WAN cables from the interfaces on the router. This will cause the router to stop passing packets.
Recheck the console. Are you able to get a response and execute commands? After a few moments, there should be enough memory available to allow the console to respond.
Collect the needed information from the privileged EXEC mode (Router#). At minimum, collect the complete output of the following commands: show memory allocating-process totals (or show memory summary if show memory allocating-process totals is not available), show logging, and, if possible, show tech-support.
After you have collected the necessary data, reconnect all of the LAN and WAN links and continue to monitor the memory usage of the router.
When you do a show logging command, you should see something like this:
%SYS-2-MALLOCFAIL: Memory allocation of [X] bytes failed from 0x6015EC84, pool [Pool], alignment 0 -Process= "[Process]" ipl= 6, pid=5
[X] = the number of bytes the router tried to allocate, but could not find enough free memory to do so
[Pool] indicates whether the processor memory ('pool Processor') or the packet memory ('pool I/O') is affected. High-end routers (7000 and 7500 Series) have their buffers in main dynamic random-access memory (DRAM), so a lack of packet memory is reported as 'pool Processor'. The 7200 Series and Versatile Interface Processor (VIP) cards may also report packet memory errors in 'pool PCI'.
[Process] is the process that was affected by the lack of memory.
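When you sift through large logs, it can help to extract these fields programmatically. Here is a minimal sketch in Python that assumes the message layout shown above; the exact format can vary between IOS releases, so treat the pattern as a starting point rather than a definitive parser:

```python
import re

# Pattern based on the MALLOCFAIL layout described in the text above.
# The -Process= portion is optional because some messages omit it.
MALLOCFAIL_RE = re.compile(
    r'%SYS-2-MALLOCFAIL: Memory allocation of (?P<bytes>\d+) bytes failed '
    r'from 0x(?P<pc>[0-9A-Fa-f]+), [Pp]ool (?P<pool>\w+), alignment \d+'
    r'(?:\s*-Process= "(?P<process>[^"]+)")?'
)

def parse_mallocfail(line):
    """Return (bytes, pool, process) from a MALLOCFAIL line, or None."""
    m = MALLOCFAIL_RE.search(line)
    if not m:
        return None
    return int(m.group("bytes")), m.group("pool"), m.group("process")

line = ('%SYS-2-MALLOCFAIL: Memory allocation of 1028 bytes failed from '
        '0x6015EC84, pool Processor, alignment 0 -Process= "IP Input", '
        'ipl= 6, pid=5')
print(parse_mallocfail(line))
```

Running the parser over a saved show logging capture lets you tally which pool and which process fail most often before contacting the TAC.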
Commonly, MALLOCFAIL errors are caused by a security issue, such as a worm or virus operating in your network. This is especially likely to be the cause if there have been no recent changes to the network, such as a Cisco IOS upgrade on the router. Usually, a configuration change, such as adding lines to your access lists, can mitigate the effects of this problem. The Cisco Product Security Advisories and Notices page contains information on detecting the most likely causes and on specific workarounds.
For additional information, refer to:
First, check the Download Software Area (registered customers only) for the minimum memory size for the feature set and version that you are running. Make sure it is sufficient. The memory requirements on Cisco.com are the minimum recommended sizes for the correct functioning of the router in most company networks. The actual memory requirements vary according to protocols, routing tables, and traffic patterns.
A memory leak occurs when a process requests or allocates memory and then forgets to free (de-allocate) the memory when it is finished with that task. As a result, the memory block is reserved until the router is reloaded. Over time, more and more memory blocks are allocated by that process until there is no free memory available. Depending on the severity of the low memory situation at this point, the only option you may have is to reload the router to get it operational again.
This is a Cisco IOS Software bug. To get rid of it, upgrade to the latest version in your release train (for example, if you are running Cisco IOS Software Release 11.2(14), upgrade to the latest 11.2(x) image).
If this doesn't solve the problem, or if you do not want to upgrade the router, enter the show processes memory command at regular intervals over a period of time (for example, every few hours or days, depending on whether you have a fast or slow leak). Check whether free memory continues to decrease and is never returned. The rate at which free memory disappears depends on how often the event occurs that leads to the leak. Since the memory is never freed, you can track the process that is using the memory by taking memory snapshots over time. Keep in mind that different processes allocate and de-allocate memory as needed, so you will see differences, but as the leak continues, you should see one process that continually consumes more memory. (Note: it is normal for some processes, such as the Border Gateway Protocol (BGP) or Open Shortest Path First (OSPF) router processes, to use more than one megabyte of memory; this does not mean they are leaking.)
To identify the process that is consuming more memory, compare the Holding column of the show processes memory command over the time interval. Sometimes you can very clearly see that one process is holding several megabytes of memory. Sometimes it takes several snapshots to find the culprit. When a significant amount of memory has been lost, collect the output of the show memory allocating-process totals command or the show memory summary command for additional troubleshooting. Then contact the Cisco Technical Assistance Center (TAC) and provide the information you collected, along with the show tech-support output of the router.
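The snapshot comparison can be sketched as follows. This assumes you have already reduced each show processes memory snapshot to a mapping of process name to Holding bytes; the figures below are illustrative, not from a real leak:

```python
def holding_deltas(snapshot_a, snapshot_b, threshold=0):
    """Compare the Holding column of two 'show processes memory' snapshots.

    snapshot_a / snapshot_b map process name -> bytes held (Holding column),
    with snapshot_b taken some hours or days after snapshot_a.  Returns the
    processes whose held memory grew by more than `threshold` bytes,
    largest growth first.
    """
    deltas = {}
    for proc, held_later in snapshot_b.items():
        growth = held_later - snapshot_a.get(proc, 0)
        if growth > threshold:
            deltas[proc] = growth
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative snapshots: "IP Input" holds steadily more memory over time,
# which is the signature of a leak; the other processes fluctuate normally.
first  = {"IP Input": 13904,   "CDP Protocol": 17992, "TCP Protocols": 14236}
second = {"IP Input": 2513904, "CDP Protocol": 18320, "TCP Protocols": 14236}
print(holding_deltas(first, second, threshold=100000))
```

A nonzero threshold filters out the normal allocate/de-allocate churn so only sustained growth stands out.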
The Output Interpreter tool allows you to receive an analysis of the show memory allocating-process totals command or show memory summary output.
This example shows the first three lines of the show memory summary command output:
Router>show memory summary
                Head    Total(b)     Used(b)     Free(b)   Lowest(b)  Largest(b)
Processor   60AB4ED0     5550384     2082996     3467388     3464996     3454608
      I/O   40000000    16777216     1937280    14839936    14839936    14838908
Total = the total amount of memory available after the system image loads and builds its data structures.
Used = the amount of memory currently allocated.
Free = the amount of memory currently free.
Lowest = the lowest amount of free memory recorded by the router since it was last booted.
Largest = the largest free memory block currently available.
The show memory allocating-process totals command contains the same information as the first three lines of the show memory summary command.
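As a sketch of how those fields line up, here is a small parser for the pool rows. It assumes the column order shown above (Head, Total, Used, Free, Lowest, Largest) and checks that Used plus Free equals Total for each pool:

```python
def parse_memory_summary(output):
    """Parse the pool rows of 'show memory summary' into dictionaries.

    Sketch only: assumes the column order shown in the example above and
    skips any line that does not start with a known pool name.
    """
    fields = ("head", "total", "used", "free", "lowest", "largest")
    pools = {}
    for line in output.splitlines():
        parts = line.split()
        if parts and parts[0] in ("Processor", "I/O"):
            values = [parts[1]] + [int(p) for p in parts[2:7]]
            pools[parts[0]] = dict(zip(fields, values))
    return pools

output = """\
                Head    Total(b)     Used(b)     Free(b)   Lowest(b)  Largest(b)
Processor   60AB4ED0     5550384     2082996     3467388     3464996     3454608
      I/O   40000000    16777216     1937280    14839936    14839936    14838908
"""
pools = parse_memory_summary(output)
for name, p in pools.items():
    # Used + Free should account for the whole pool.
    assert p["used"] + p["free"] == p["total"]
print(pools["Processor"]["largest"])
```

Saving these parsed values alongside a timestamp makes it easy to chart Free and Lowest over time when you suspect a slow leak.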
Here is what you can learn from the show processes memory command output:
Router>show processes memory
Total: 3149760, Used: 2334300, Free: 815460
 PID TTY  Allocated      Freed    Holding  Getbufs  Retbufs Process
   0   0     226548       1252    1804376        0        0 *Initialization*
   0   0        320    5422288        320        0        0 *Scheduler*
   0   0    5663692    2173356          0  1856100        0 *Dead*
   1   0        264        264       3784        0        0 Load Meter
   2   2       5700       5372      13124        0        0 Virtual Exec
   3   0          0          0       6784        0        0 Check heaps
   4   0         96          0       6880        0        0 Pool Manager
   5   0        264        264       6784        0        0 Timers
   6   0       2028        672       8812        0        0 ARP Input
   7   0         96          0       6880        0        0 SERIAL A' detect
   8   0        504        264       7024        0        0 ATM ILMI Input
   9   0          0          0       6784        0        0 ILMI Process
  10   0        136          0       6920        0        0 M32_runts pring
  11   0        136          0       6920        0        0 Call drop procs
  12   0        340        340      12784        0        0 ATMSIG Timer
  13   0     445664     442936      13904        0        0 IP Input
  14   0    2365804    2357152      17992        0        0 CDP Protocol
  15   0        528        264       7048        0        0 MOP Protocols
  16   0        188          0       9972        0        0 IP Background
  17   0          0       1608       6784        0        0 TCP Timer
  18   0    5852116          0      14236        0        0 TCP Protocols
Allocated = the total number of bytes that have been allocated by the process since the router booted.
Freed = the total number of bytes that have been released by this process.
Holding = the total number of bytes currently held by this process. This is the most important column for troubleshooting because it shows the actual amount of memory attributed to this process. Holding does not necessarily equal Allocated minus Freed, because some processes allocate a block of memory that is later returned to the free pool by another process.
The *Dead* process is not a real process. It is there to account for memory allocated under the context of another process that has since terminated. The memory allocated to this process is reclaimed by the kernel and returned to the memory pool by the router itself when required. This is the way IOS handles memory. A memory block is considered dead if the process that created the block has exited (is no longer running). Each block keeps track of the address and PID of the process that created it. During periodic memory tallying, if the process the scheduler finds from a block's PID does not match the process the block remembers, the block is marked as dead.
Therefore, memory marked as belonging to process *Dead* was allocated under the control of a process that no longer runs. It is normal to have a significant chunk of memory in such a state. Here is an example:
Memory is allocated when configuring Network Address Translation (NAT) during a Telnet session. That memory is accounted for under the Telnet process ("Virtual Exec"). Once this process is terminated, the memory for the NAT configuration is still in use. This is shown using the *dead* process.
You can see under which context the memory was allocated using the show memory dead command, under the "What" column:
Router#show memory dead
        Head   Total(b)    Used(b)    Free(b)  Lowest(b) Largest(b)
I/O   600000    2097152     461024    1636128    1635224    1635960

Processor memory
Address  Bytes  Prev.   Next    Ref  PrevF  NextF  Alloc PC  What
1D8310      60  1D82C8  1D8378    1                 3281FFE  Router Init
2CA964      36  2CA914  2CA9B4    1                 3281FFE  Router Init
2CAA04     112  2CA9B4  2CAAA0    1                 3A42144  OSPF Stub LSA RBTree
2CAAA0      68  2CAA04  2CAB10    1                 3A420D4  Router Init
2ED714      52  2ED668  2ED774    1                 3381C84  Router Init
2F12AC      44  2F124C  2F1304    1                 3A50234  Router Init
2F1304      24  2F12AC  2F1348    1                 3A420D4  Router Init
2F1348      68  2F1304  2F13B8    1                 3381C84  Router Init
300C28     340  300A14  300DA8    1                 3381B42  Router Init
If a memory leak is detected, and the *Dead* process seems to be the one consuming the memory, include a show memory dead in the information provided to the Cisco TAC.
This is one of the most difficult causes to verify. The problem is characterized by a large amount of free memory, but a small value in the "Lowest" column. In this case, a normal or abnormal event (for example, a large routing instability) causes the router to use an unusually large amount of processor memory for a short period of time, during which the memory has run out. During that period, the router reports MALLOCFAIL. It might happen that soon after, the memory is freed and the problem disappears (for example, the network stabilizes). The memory shortage may also be due to a combination of factors, such as:
a memory leak that has consumed a large amount of memory, and then a network instability pushes the free memory to zero
the router does not have enough memory to begin with, but the problem is discovered only during a rare network event.
If the router was not rebooted, enter the show memory allocating-process totals command (or the show memory summary if show memory allocating-process totals is not available) and look at the first three lines. The log messages may provide clues about what process was consuming a lot of memory:
If large memory usage was due to a:
normal event, the solution is to install more memory.
rare or abnormal event, fix the related problem. You may then decide to purchase extra memory for future "insurance".
This situation means that a process has consumed a large amount of processor memory and then released most or all of it, leaving fragments of memory still allocated either by this process, or by other processes that allocated memory during the problem. If the same event occurs several times, the memory may fragment into very small blocks, to the point where all processes requiring a larger block of memory cannot get the amount of memory that they need. This may affect router operation to the extent that you cannot connect to the router and get a prompt if the memory is badly fragmented.
This problem is characterized by a low value in the "Largest" column (under 20,000 bytes) of the show memory command, but a sufficient value in the "Free" column (1 MB or more), or some other wide disparity between the two columns. This can happen when the router gets very low on memory, since there is no defragmentation routine in Cisco IOS.
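That rule of thumb can be expressed as a small check. The thresholds are taken from the guideline above and should be treated as heuristics, not hard limits:

```python
def looks_fragmented(free_bytes, largest_bytes,
                     largest_floor=20_000, free_floor=1_000_000):
    """Heuristic from the text: plenty of free memory overall, yet no
    single contiguous block of usable size, suggests fragmentation.

    Thresholds (Largest under ~20 KB while Free is 1 MB or more) follow
    the guideline above; adjust them to your platform if needed.
    """
    return free_bytes >= free_floor and largest_bytes < largest_floor

# Example: 3.4 MB free overall, but the largest contiguous block is
# only 12 KB -- a classic fragmentation signature.
print(looks_fragmented(free_bytes=3_467_388, largest_bytes=12_288))
```

Applied to the Free and Largest values from show memory snapshots, this flags the fragmentation pattern even when total free memory looks healthy.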
If you suspect memory fragmentation, shut down some interfaces. This may free the fragmented blocks. If this works, the memory is behaving normally, and all you have to do is add more memory. If shutting down interfaces doesn't help, it may be a bug. The best course of action is to contact your Cisco support representative with the information you have collected.
This situation can be identified by the process in the error message. If the process is listed as <interrupt level>, as in the following example, then the memory allocation failure is being caused by a software problem.
"%SYS-2-MALLOCFAIL: Memory allocation of 68 bytes failed from 0x604CEF48, pool Processor, alignment 0-Process= <interrupt level>, ipl= 3"
This is a Cisco IOS Software bug. You can use the Bug Toolkit (registered customers only) to search for a matching software bug ID for this issue. Once the software bug has been identified, upgrade to a Cisco IOS Software version that contains the fix to resolve the problem.
Access lists can consume a large amount of memory when they are applied on a per-user basis. Access lists that are too large to be classified as mini access control lists (ACLs) are compiled as turbo ACLs. Each time this occurs, the turbo ACL (TACL) process has to run and process the new ACL. This can result in traffic being permitted or denied based on the compile time and available processing time.
Compiled ACLs have to be sent down to XCM. When only limited space is available and the memory runs out, the console messages are seen and the memory defragmenter starts.
This is the workaround:
Use concise ACLs with fewer access control entries (ACEs). These compile as mini ACLs, which reduces both memory consumption and the processing power needed for compilation.
Use predefined ACLs on the router that are referenced via the RADIUS attribute Filter-Id.
When a 7000 Route Processor (RP) boots an image from Flash, it first loads the ROM image and then the flash image into memory. The old RP only has 16 MB of memory, and the Enterprise versions of Cisco IOS Software release later than version 11.0 are larger than 8 MB when uncompressed. Therefore, when you load the image from ROM and then Flash, the 7000 RP may run out of memory, or the memory may become fragmented during the boot-up process so that the router has memory-related error messages.
The solution is to enable Fast Boot from the configuration register so that the RP only loads a minimum subset of the Cisco IOS Software image in ROM, and then loads the complete Cisco IOS Software from Flash. To enable Fast Boot, set the configuration register to 0x2112. This will also speed up the boot process.
Using the UT Discovery feature of CiscoWorks may cause the amount of free memory to become very small on some of your routers. The show proc memory command may indicate a lot of memory held up by the "IP input" process. This is a particular case of the Large Quantity of Memory Used for Normal or Abnormal Processes problem for the "IP input" process, which can also result in a Memory Fragmentation issue, if the low memory condition causes the memory to be fragmented.
The UT Discovery feature causes the Network Management Station to send out a sweep of ping for all IPs in every discovered subnet. The memory issues are caused by the growing size of the IP fast-switching cache on the router, because new cache entries are created for every new destination. Since the mask used for the entries in the cache depends on how it is subnetted, the presence of an address using a 32 bit mask (for example, a loopback address) in a major network causes all entries for that network to use a 32 bit mask. This results in a huge number of cache entries to be created, using a large amount of memory.
The best solution is to disable UT Discovery. You can do this by following the steps below:
Go to C:\Program Files\CSCOpx\etc\cwsi\ANIServer.properties.
This may cause the User Tracking table to miss some end servers, or go out of date (this might be an issue with another Cisco application called User Registration Tool, which relies on UT), but it does not affect the Campus Discovery which uses only SNMP traffic. CEF switching may also improve this situation (with CEF, the IP cache is created from the routing table at bootup). Refer to How to Choose the Best Router Switching Path for Your Network for more information about CEF and other available switching paths.
There are many other applications that can result in similar low memory situations. In most cases, the root cause of the problem is not the router, but the application itself. Normally you should be able to prevent those packet storms by checking the configuration of the application.
Some routers (for example, 2600, 3600, and 4000 Series) require a minimum amount of I/O memory to support certain interface processors.
If the router is running low on shared memory, even after a reload, physically removing interfaces solves the problem.
On 3600 Series Routers, the global configuration command memory-size iomem i/o-memory-percentage can be used to reallocate the percentage of DRAM to use for I/O memory and processor memory. The values permitted for i/o-memory-percentage are 10, 15, 20, 25 (the default), 30, 40, and 50. A minimum of 4 MB of memory is required for I/O memory.
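As a rough illustration of that split, the following sketch computes the I/O and processor shares for a given DRAM size and percentage, and enforces the 4 MB I/O minimum. Real platforms round the split to platform-specific boundaries, so treat this as an approximation only:

```python
VALID_IOMEM_PERCENTAGES = (10, 15, 20, 25, 30, 40, 50)  # 25 is the default
MIN_IO_BYTES = 4 * 1024 * 1024  # minimum I/O memory per the text above

def iomem_split(dram_bytes, percentage=25):
    """Approximate (io_bytes, processor_bytes) for 'memory-size iomem <pct>'.

    Sketch only: real platforms round the split to hardware-specific
    boundaries, so this just illustrates the percentages.
    """
    if percentage not in VALID_IOMEM_PERCENTAGES:
        raise ValueError(f"unsupported iomem percentage: {percentage}")
    io = dram_bytes * percentage // 100
    if io < MIN_IO_BYTES:
        raise ValueError("I/O memory would fall below the 4 MB minimum")
    return io, dram_bytes - io

# 64 MB of DRAM with the default 25% I/O split:
io, proc = iomem_split(64 * 1024 * 1024)
print(io // (1024 * 1024), proc // (1024 * 1024))
```

For a router with 64 MB of DRAM, the default split reserves roughly 16 MB for I/O memory and leaves roughly 48 MB for processor memory.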
In order to troubleshoot this problem, refer to:
Shared memory requirements for the 4000/4500/4700 routers.
When a process is finished with a buffer, the process should free the buffer. A buffer leak occurs when the code forgets to process a buffer, or forgets to free it after it is done with the packet. As a result, the buffer pool continues to grow as more and more packets are stuck in the buffers.
You can identify a buffer leak using the show buffers command. Some of the Public Buffer pools should be abnormally large with few free buffers. After a reload, you may see that the number of free buffers never gets close to the number of total buffers.
The Output Interpreter tool allows you to receive an analysis of the show buffers output.
In the example below, the Middle buffers are affected. The show buffers command indicates that 8094 buffers are in use and have not been freed (8122 total minus 28 free):
Public buffer pools:
Small buffers, 104 bytes (total 50, permanent 50):
     50 in free list (20 min, 150 max allowed)
     403134 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Middle buffers, 600 bytes (total 8122, permanent 200):
     28 in free list (10 min, 300 max allowed)
     154459 hits, 41422 misses, 574 trims, 8496 created
Big buffers, 1524 bytes (total 50, permanent 50):
     50 in free list (5 min, 150 max allowed)
     58471 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
VeryBig buffers, 4520 bytes (total 10, permanent 10):
     10 in free list (0 min, 100 max allowed)
     0 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Large buffers, 5024 bytes (total 0, permanent 0):
     0 in free list (0 min, 10 max allowed)
     0 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Huge buffers, 18024 bytes (total 0, permanent 0):
     0 in free list (0 min, 4 max allowed)
     0 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
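A quick way to spot the suspicious pool in output like this is to compute total minus free for each pool. Here is a hedged sketch that assumes the two-line pool layout shown in this document; other IOS releases may format show buffers differently:

```python
import re

# Pull (total, free) per pool out of 'show buffers' text and flag pools
# where most buffers are in use, as in the Middle-buffer example above.
POOL_RE = re.compile(
    r"(?P<name>\w+) buffers, \d+ bytes \(total (?P<total>\d+), "
    r"permanent \d+\):?\s*(?P<free>\d+) in free list"
)

def buffers_in_use(show_buffers_output, min_in_use=1000):
    """Return pools holding more than `min_in_use` buffers (total - free)."""
    suspects = {}
    for m in POOL_RE.finditer(show_buffers_output):
        in_use = int(m.group("total")) - int(m.group("free"))
        if in_use > min_in_use:
            suspects[m.group("name")] = in_use
    return suspects

sample = """\
Middle buffers, 600 bytes (total 8122, permanent 200):
     28 in free list (10 min, 300 max allowed)
Big buffers, 1524 bytes (total 50, permanent 50):
     50 in free list (5 min, 150 max allowed)
"""
print(buffers_in_use(sample))
```

Run against the full example above, only the Middle pool would be flagged, with 8094 buffers in use.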
This is a Cisco IOS Software bug. Upgrade to the latest version in your release train to fix known buffer leak bugs (for example, if you are running Cisco IOS Software Release 11.2(14), upgrade to the latest 11.2(x) image). If this doesn't help, or if it is not possible to upgrade the router, issue the following commands for the problem pool when the router is low on memory. These commands display additional information about the content of the buffers:
show buffer old: shows allocated buffers more than one minute old
show buffer pool (small - middle - big - verybig - large - huge): shows a summary of the buffers for the specified pool
show buffer pool (small - middle - big - verybig - large - huge) dump: shows a hex/ASCII dump of all the buffers in use of a given pool.
Refer to Troubleshooting Buffer Leaks for additional details.
This problem is specific to the 7500 series. If the router runs out of "fast" memory, it will use its main Dynamic RAM (DRAM) instead. No action is required.
IPFAST-4-RADIXDELETE: Error trying to delete prefix entry [IP_address]/[dec] (expected [hex], got [hex])
The IPFAST-4-RADIXDELETE: Error trying to delete prefix entry [IP_address]/[dec] (expected [hex], got [hex]) error message indicates that the router's fast-switching cache table in memory is corrupt. When the router tries to clear the cache table during normal processing, or when the clear ip cache command is entered, the system fails to delete entries due to the memory corruption. When the router fails to delete such an entry, the IPFAST-4-RADIXDELETE message is reported.
In order to resolve a cache table memory corruption issue, a hard reboot of the router is needed. A reboot will recarve the system memory structures and allow the fast cache to rebuild corruption-free.
The reason for the %SYS-2-CHUNKEXPANDFAIL: Could not expand chunk pool for TACL Bitmap. No memory available error message is that there is not enough processor memory left to grow the chunk pool specified. It is possibly caused by a process that behaves abnormally.
The workaround is to periodically capture (depending on the frequency of the issue) the output of these commands so that memory usage of the router can be monitored:
show processes memory sorted
show memory statistics
show memory allocating-process totals
Follow these steps:
Check the memory requirements for your Cisco IOS software release version or feature set.
If possible, upgrade to the latest Cisco IOS software release version in your release train.
Check for a large quantity of memory used for normal or abnormal processes. If required, add more memory.
Check whether this is a memory leak or memory fragmentation (on high-end routers, a buffer leak).
Collect the relevant information and contact the TAC.
Follow these steps:
Check the shared memory requirements (see Not Enough Shared Memory for the Interfaces).
If possible, upgrade to the latest Cisco IOS Software release version in your release train.
Determine which buffer pool is affected, collect the relevant information, and contact the Cisco TAC.
If you still need assistance after following the troubleshooting steps above and want to open a TAC Service Request (registered customers only), be sure to include the following information:
Note: Do not manually reload or power-cycle the router before collecting the above information unless it is required to troubleshoot the memory problem, as this can cause the loss of important information that is needed to determine the root cause of the problem. Depending on the cause, your TAC engineer may suggest reloading the router and collecting additional information after the reload as part of the troubleshooting.
The Cisco Support Community is a forum for you to ask and answer questions, share suggestions, and collaborate with your peers.