ASR 1000 Series Router Memory Troubleshoot Guide

Document ID: 116777

Updated: Nov 19, 2013

Contributed by Vishnu Asok and Girish Devgan, Cisco TAC Engineers.

Introduction

This document describes how to check system memory and troubleshoot memory issues on Cisco ASR 1000 Series Aggregation Services Routers (ASR1K).

Prerequisites

Requirements

Cisco recommends that you have basic knowledge of these topics:

  • Cisco IOS-XE software
  • ASR CLI

Note: You might need a special license in order to log in to the Linux shell on the ASR 1001 Series router.

Components Used

The information in this document is based on these software and hardware versions:

  • All ASR1K platforms
  • All Cisco IOS-XE software releases that support the ASR1K platform

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

ASR Memory Layout Overview

With most previous Cisco router platforms, the majority of the internal software processes run within Cisco IOS® (IOS) memory. The ASR1K platform introduces a distributed software architecture that moves many Operating System (OS) responsibilities out of the IOS process. In this architecture, IOS, which was previously responsible for almost all of the internal software processes, now runs as one of many Linux processes. This allows other Linux processes to share responsibility for the operation of the router.

The ASR1K runs IOS-XE, not the traditional IOS. In IOS-XE, a Linux component runs the kernel, and IOS runs as a daemon, hereafter referred to as IOSd (IOS daemon). As a result, memory must be split between the Linux kernel and the IOSd instance.

The memory split between IOSd and the rest of the system is fixed at startup and cannot be modified. On a 4-GB system, IOSd is allocated approximately 2 GB; on an 8-GB system, IOSd is allocated approximately 3.8 GB (with software redundancy disabled).

Because the ASR1K has a 64-bit architecture, every pointer in every data structure in the system consumes twice the memory it does on a traditional 32-bit, single-CPU router (8 bytes instead of 4 bytes). In exchange, 64-bit addressing enables IOS to overcome its 2-GB addressable-memory limitation, which allows it to scale to millions of routes.
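This pointer-size arithmetic is easy to demonstrate. The minimal Python sketch below (illustrative only, not Cisco tooling) prints the native pointer size and the resulting cost of a large pointer table on a 64-bit host:

# Minimal sketch: illustrate the 32-bit vs. 64-bit pointer cost.
# struct.calcsize("P") returns the native pointer size in bytes.
import struct

pointer_bytes = struct.calcsize("P")  # 8 on a 64-bit OS, 4 on a 32-bit OS
print(f"Native pointer size: {pointer_bytes} bytes")

# A table of one million pointers costs twice as much on a 64-bit
# system: roughly 4 MB of pointers becomes roughly 8 MB.
entries = 1_000_000
print(f"{entries} pointers consume about {entries * pointer_bytes / 1e6:.0f} MB")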

Note: Ensure that you have sufficient memory available before you activate any new features. Cisco recommends that you have at least 8 GB DRAM if you receive the entire Border Gateway Protocol (BGP) routing table when software redundancy is enabled in order to prevent memory exhaustion.

Memory Allocation Under the lsmpi_io Pool

The Linux Shared Memory Punt Interface (LSMPI) memory pool is used in order to transfer packets from the forwarding processor to the route processor. This memory pool is carved into preallocated buffers at router initialization, as opposed to the processor pool, where IOS-XE allocates memory blocks dynamically. On the ASR1K platform, the lsmpi_io pool has little free memory, generally less than 1000 bytes, which is normal. Cisco recommends that you disable monitoring of the LSMPI pool by network management applications in order to avoid false alarms.


ASR1000# show memory statistics

                Head    Total(b)     Used(b)     Free(b)   Lowest(b)  Largest(b)

Processor   2C073008  1820510884   173985240  1646525644  1614827804  1646234064

lsmpi_io    996481D0     6295088     6294120         968         968         968
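If your network management station must poll memory in spite of this recommendation, one option is to exclude the lsmpi_io pool from the free-memory check. This is a hypothetical Python sketch that parses the show memory statistics output shown above; the seven-column layout and the 10% threshold are assumptions, so verify them against your release:

# Hypothetical sketch: check pool free memory from "show memory statistics"
# output (captured with whatever transport you already use), and skip the
# lsmpi_io pool, whose near-zero free space is normal by design.
FREE_THRESHOLD_PCT = 10  # illustrative assumption, not a Cisco limit


def check_pools(show_memory_output: str) -> None:
    for line in show_memory_output.splitlines():
        fields = line.split()
        # Expected columns: name, head, total, used, free, lowest, largest
        if len(fields) != 7 or not fields[2].isdigit():
            continue
        name, total, free = fields[0], int(fields[2]), int(fields[4])
        if name == "lsmpi_io":
            continue  # preallocated buffer pool; do not alert on it
        pct_free = 100.0 * free / total
        if pct_free < FREE_THRESHOLD_PCT:
            print(f"ALERT: pool {name} is only {pct_free:.1f}% free")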

If there are any issues in the LSMPI path, the Device xmit fail counter increments in this command output (some output omitted):


ASR1000-1# show platform software infrastructure lsmpi driver

LSMPI Driver stat ver: 3

Packets:

        In: 674572

       Out: 259861

Rings:

        RX: 2047 free    0    in-use    2048 total

        TX: 2047 free    0    in-use    2048 total

    RXDONE: 2047 free    0    in-use    2048 total

    TXDONE: 2047 free    0    in-use    2048 total

Buffers:

        RX: 7721 free    473  in-use    8194 total

Reason for RX drops (sticky):

    Ring full        : 0

    Ring put failed  : 0

    No free buffer   : 0

    Receive failed   : 0

    Packet too large : 0

    Other inst buf   : 0

    Consecutive SOPs : 0

    No SOP or EOP    : 0

    EOP but no SOP   : 0

    Particle overrun : 0

    Bad particle ins : 0

    Bad buf cond     : 0

    DS rd req failed : 0

    HT rd req failed : 0

Reason for TX drops (sticky):

    Bad packet len   : 0

    Bad buf len      : 0

    Bad ifindex      : 0

    No device        : 0

    No skbuff        : 0

    Device xmit fail : 0

    Device xmit rtry : 0

    Tx Done ringfull : 0

    Bad u->k xlation : 0

    No extra skbuff  : 0

<snip>
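Because these drop counters are sticky, a single nonzero value only proves that a drop happened at some point; compare two snapshots taken over time to confirm that a counter, such as Device xmit fail, actively increments. This hypothetical Python sketch pulls the nonzero drop reasons out of the output above:

# Hypothetical sketch: list the nonzero, sticky drop counters from
# "show platform software infrastructure lsmpi driver" output.
import re


def nonzero_drops(lsmpi_output: str) -> dict:
    drops = {}
    in_drop_section = False
    for line in lsmpi_output.splitlines():
        if line.startswith("Reason for"):
            in_drop_section = True  # RX or TX drop-reason section begins
            continue
        if in_drop_section:
            match = re.match(r"\s+(\S.*?)\s*:\s*(\d+)\s*$", line)
            if match and int(match.group(2)) > 0:
                drops[match.group(1)] = int(match.group(2))
    return drops

# Compare nonzero_drops() across two snapshots; a rising "Device xmit fail"
# count points at a problem in the LSMPI path.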

Memory Usage

The control CPUs in the ASR1K chassis, such as the Route Processor (RP), the Embedded Services Processor (ESP), and the Shared Port Adapter (SPA) Interface Processor (SIP), run IOS-XE software. This OS software consists of a Linux-based kernel and a common set of OS-level utility programs, and includes the Cisco IOS software that runs as a user process on the RP card. Within IOS-XE, each child process operates in protected memory under the Linux kernel that runs on each card.

Verify Memory Usage on IOS-XE

Enter the show platform software status control-processor brief command in order to monitor the memory usage on the RP, the ESP, and the SIP. When you compare memory usage between samples, ensure that the system state, such as the feature configuration and the traffic load, is identical.


ASR1K# show platform software status control-processor brief 

<snip>

Memory (kB)

Slot  Status   Total    Used (Pct)     Free (Pct)     Committed (Pct)

RP0   Healthy  3907744  1835628 (47%)  2072116 (53%)  2614788 (67%)

ESP0  Healthy  2042668   789764 (39%)  1252904 (61%)  3108376 (152%)

SIP0  Healthy   482544   341004 (71%)   141540 (29%)   367956 (76%)

SIP1  Healthy   482544   315484 (65%)   167060 (35%)   312216 (65%)

Note: Committed memory is an estimate of how much RAM you need in order to guarantee that the system is never Out of Memory (OOM) for this workload. Normally, the kernel overcommits memory. For example, when you malloc 1 GB, nothing happens immediately; the memory is truly allocated on demand, only when you begin to use it, and only as much as you use.
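For trending, this hypothetical Python sketch recomputes the per-slot used percentage from the Memory (kB) table shown above. The nine-column row layout is an assumption taken from this output, and the computed percentages are informational only; the platform determines the Healthy, Warning, or Critical status internally:

# Hypothetical sketch: recompute per-slot memory usage from the
# "show platform software status control-processor brief" table.
def slot_memory_usage(brief_output: str) -> None:
    for line in brief_output.splitlines():
        fields = line.split()
        # Expected: slot, status, total, used, (pct), free, (pct), committed, (pct)
        if len(fields) == 9 and fields[2].isdigit():
            slot, total, used = fields[0], int(fields[2]), int(fields[3])
            print(f"{slot}: {100.0 * used / total:.0f}% of {total} kB used")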


Each processor listed in the show platform software status control-processor brief output can report a status of Healthy, Warning, or Critical, dependent upon the amount of free memory. If any of the processors report a Warning or Critical status, enter the monitor platform software process <slot> command in order to identify the top contributor.


BGL.J.16-ASR1000-4# monitor platform software process ?

  0   SPA-Inter-Processor slot 0

  1   SPA-Inter-Processor slot 1

  F0  Embedded-Service-Processor slot 0

  F1  Embedded-Service-Processor slot 1

  FP  Embedded-Service-Processor

  R0  Route-Processor slot 0

  R1  Route-Processor slot 1

  RP  Route-Processor

  <cr>

You might be prompted to set the terminal-type before you can execute the monitor platform software process command:


BGL.J.16-ASR1000-4# monitor platform software process r0

Terminal type 'network' unsupported for command

Change the terminal type with the 'terminal terminal-type' command.

The terminal type is set to network by default. In order to set the appropriate terminal type, enter the terminal terminal-type command:


ASR1000# terminal terminal-type vt100

Once the correct terminal type is configured, you can enter the monitor platform software process command (some output omitted):


ASR1000# monitor platform software process r0

top - 00:34:59 up  5:02,  0 users,  load average: 2.43, 1.52, 0.73

Tasks: 136 total,   4 running, 132 sleeping,   0 stopped,   0 zombie

Cpu(s):  0.8%us,  2.3%sy,  0.0%ni, 96.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Mem:   2009852k total,  1811024k used,   198828k free,   135976k buffers

Swap:        0k total,        0k used,        0k free,  1133544k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

25956 root      20   0  928m 441m 152m R  1.2 22.5   4:21.32 linux_iosd-imag

29074 root      20   0  106m  95m 6388 S  0.0  4.9   0:14.86 smand

24027 root      20   0  114m  61m  55m S  0.0  3.1   0:05.07 fman_rp

25227 root      20   0 27096  13m  12m S  0.0  0.7   0:04.35 imand

23174 root      20   0 33760  11m 9152 S  1.0  0.6   1:58.00 cmand

23489 root      20   0 23988 7372 4952 S  0.2  0.4   0:05.28 emd

24755 root      20   0 19708 6820 4472 S  1.0  0.3   3:39.33 hman

28475 root      20   0 20460 6448 4792 S  0.0  0.3   0:00.26 psd

27957 root      20   0 16688 5668 3300 S  0.0  0.3   0:00.18 plogd

14572 root      20   0  4576 2932 1308 S  0.0  0.1   0:02.37 reflector.sh

<snip>

Note: In order to sort the output in descending order of memory usage, press Shift + M.


Warning: Open a Cisco Technical Assistance Center (TAC) case if any of the processors report a Critical or Warning status, and you need assistance in order to identify the cause.

Verify Memory Usage on IOSd

If you notice that the linux_iosd-imag process holds an unusually large amount of memory in the monitor platform software process rp active command output, focus your troubleshooting efforts on the IOSd instance. It is likely that a specific process within IOSd does not free memory. Troubleshoot memory-related issues in the IOSd pool the same way that you troubleshoot any software-based forwarding platform, such as the Cisco 2800, 3800, or 3900 Series platforms.

ASR1000# monitor platform software process rp active

PID USER   PR  NI VIRT  RES  SHR S %CPU %MEM TIME+  COMMAND

25794 root  20  0 2929m 1.9g 155m R 99.9 38.9 1415:11 linux_iosd-imag

23038 root   20  0 33848 13m  10m S  5.9  0.4  30:53.87 cmand

9599 root   20  0  2648 1152 884 R  2.0  0.0  0:00.01 top

<snip>

Enter the show process memory sorted command in order to identify the problem process:


ASR1000# show process memory sorted

Processor Pool Total: 1733568032 Used: 1261854564 Free: 471713468

lsmpi_io Pool Total: 6295088 Used: 6294116 Free: 972


PID TTY  Allocated    Freed       Holding    Getbufs    Retbufs  Process

522   0  1587708188   803356800  724777608      54432        0   BGP Router

234   0  3834576340  2644349464  232401568  286163388    15876   IP RIB Update

0     0   263244344    36307492  215384208          0        0   *Init
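To track a suspect over time, this hypothetical Python sketch ranks processes by the Holding column of the output above (the column positions are assumptions taken from this release). A process whose Holding value grows steadily between samples is the usual leak candidate:

# Hypothetical sketch: rank IOSd processes by the Holding column of
# "show process memory sorted" output.
def top_holders(proc_mem_output: str, count: int = 5) -> None:
    rows = []
    for line in proc_mem_output.splitlines():
        fields = line.split()
        # Expected: PID, TTY, Allocated, Freed, Holding, Getbufs, Retbufs, name...
        if len(fields) >= 8 and fields[0].isdigit() and fields[4].isdigit():
            rows.append((int(fields[4]), " ".join(fields[7:])))
    for holding, name in sorted(rows, reverse=True)[:count]:
        print(f"{name}: holding {holding} bytes")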

Note: Open a TAC case if you require assistance in order to troubleshoot or identify if the memory usage is legitimate.

Verify TCAM Utilization on an ASR1K

Traffic classification is one of the most basic functions found in routers and switches. Many applications and features require that infrastructure devices classify traffic in order to provide differentiated services to different users, or in order to apply features selectively. The classification process must be fast so that the throughput of the device is not greatly degraded. The ASR1K platform uses the fourth generation of Ternary Content Addressable Memory (TCAM4) for this purpose.

In order to determine the total number of TCAM cells available on the platform, and the number of free entries that remain, enter this command:


ASR1000# show platform hardware qfp active tcam resource-manager usage 

Total TCAM Cell Usage Information

----------------------------------

Name                        : TCAM #0 on CPP #0

Total number of regions     : 3

Total tcam used cell entries : 65528

Total tcam free cell entries : 30422

Threshold status            : below critical limit

Note: Cisco recommends that you always check the threshold status before you make any changes to access lists or Quality of Service (QoS) policies, so that the TCAM has sufficient free cells available in order to program the entries.
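As an aid for that check, this hypothetical Python sketch computes the percentage of free TCAM cells from the output shown above; the field labels are taken from this release and can differ in others:

# Hypothetical sketch: compute the percentage of free TCAM cells from
# "show platform hardware qfp active tcam resource-manager usage" output.
import re


def tcam_free_pct(tcam_output: str) -> float:
    used = int(re.search(r"used cell entries\s*:\s*(\d+)", tcam_output).group(1))
    free = int(re.search(r"free cell entries\s*:\s*(\d+)", tcam_output).group(1))
    return 100.0 * free / (used + free)

# With the sample above (65528 used, 30422 free), about 32% of cells remain.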


If the forwarding processor runs critically low on free TCAM cells, the ESP might generate logs similar to these and then crash, which stops traffic forwarding if there is no redundancy:


%CPPTCAMRM-6-TCAM_RSRC_ERR: SIP0: cpp_sp: 
 Allocation failed because of insufficient TCAM resources
 in the system.

%CPPOSLIB-3-ERROR_NOTIFY: SIP0: cpp_sp:
 cpp_sp encountered an error -Traceback=
 1#d7f63914d8ef12b8456826243f3b60d7
 errmsg:7EFFC525C000+1175 cpp_common_os:7EFFC8D20000+D1E5
 cpp_common_os:7EFFC8D20000+D12E

Verify Memory Utilization on QFP

In addition to the physical memory, there is also memory attached to the Quantum Flow Processor (QFP) ASIC that is used in order to store forwarding data structures, such as the Forwarding Information Base (FIB) and QoS policies. The amount of DRAM available to the QFP ASIC is fixed at 256 MB, 512 MB, or 1 GB, dependent upon the ESP module.

Enter the show platform hardware qfp active infrastructure exmem statistics command in order to determine the exmem memory usage. The sum of the used IRAM and DRAM memory gives the total QFP memory in use.


The IRAM is the instruction memory for the QFP software. If DRAM is exhausted, available IRAM can be used. If the IRAM runs critically low on memory, you might see this error message:


%QFPOOR-4-LOWRSRC_PERCENT: F1: cpp_ha:  QFP 0 IRAM resource low
  - 97 percent depleted

%QFPOOR-4-LOWRSRC_PERCENT: F1: cpp_ha:  QFP 0 IRAM resource low
  - 98 percent depleted 

In order to determine the process that consumes most of the memory, enter the show platform hardware qfp active infra exmem statistics user command:


ASR1000# show platform hardware qfp active infra exmem statistics user

Type: Name: IRAM, CPP: 0

  Allocations  Bytes-Alloc  Bytes-Total  User-Name

  ----------------------------------------------------

  1            115200       115712       CPP_FIA

Type: Name: DRAM, CPP: 0

  Allocations  Bytes-Alloc  Bytes-Total  User-Name

  ----------------------------------------------------

  4            1344         4096         P/I

  9            270600       276480       CEF

  1            1138256      1138688      QM RM

  1            4194304      4194304      TCAM

  1            65536        65536        Qm 16

  3            15745024     15745024     ING_EGR_UIDB
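As a first-pass analysis before you engage the TAC, this hypothetical Python sketch totals the Bytes-Total column per user across the IRAM and DRAM sections of the output above (the row layout is an assumption from this release):

# Hypothetical sketch: total exmem usage per user across the IRAM and DRAM
# sections of the "exmem statistics user" output, then report the largest.
def exmem_top_user(exmem_output: str) -> None:
    totals = {}
    for line in exmem_output.splitlines():
        fields = line.split()
        # Expected rows: Allocations, Bytes-Alloc, Bytes-Total, User-Name...
        if len(fields) >= 4 and all(f.isdigit() for f in fields[:3]):
            name = " ".join(fields[3:])  # user names can contain spaces
            totals[name] = totals.get(name, 0) + int(fields[2])
    if totals:
        user, total = max(totals.items(), key=lambda kv: kv[1])
        print(f"Top exmem user: {user} ({total} bytes)")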

Once you identify the feature that holds most of the memory, collect the output of the show platform hardware qfp active feature <feature> command, and contact the Cisco TAC in order to determine the root cause.
