Cisco ASR 1000 Series Aggregation Services Routers

ASR 1000 Series Router Memory Troubleshoot Guide

Document ID: 116777

Updated: Nov 19, 2013

Contributed by Vishnu Asok and Girish Devgan, Cisco TAC Engineers.

Introduction

This document describes how to check system memory and troubleshoot memory-related issues on the Cisco ASR 1000 Series Aggregation Services Routers (ASR1K).

Prerequisites

Requirements

Cisco recommends that you have basic knowledge of these topics:

  • Cisco IOS-XE software
  • ASR CLI

Note: You might need a special license in order to log in to the Linux shell on the ASR 1001 Series router.

Components Used

The information in this document is based on these software and hardware versions:

  • All ASR1K platforms
  • All Cisco IOS-XE software releases that support the ASR1K platform

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

ASR Memory Layout Overview

With most of the software-based router platforms, the majority of the internal software processes are run within the Cisco IOS® memory. The ASR1K platform introduces a distributed software architecture that moves many Operating System (OS) responsibilities out of the IOS process. The IOS in this architecture, which was previously responsible for almost all of the internal operations, now runs as one of many Linux processes. This allows other Linux processes to share responsibility for the operation of the router. 

The ASR1K runs IOS-XE, not the traditional IOS. In IOS-XE, a Linux component runs the kernel, and the IOS runs as a daemon, which is hereafter referred to as IOSd (IOS-Daemon). This creates a requirement that the memory be split between the Linux kernel and the IOSd instance.

The memory split between IOSd and the rest of the system is fixed at startup and cannot be modified. For a 4-GB system, IOSd is allocated approximately 2 GB, and for an 8-GB system, IOSd is allocated approximately 4 GB (with software redundancy disabled).

Because the ASR1K has a 64-bit architecture, every pointer in every data structure in the system consumes twice as much memory as it does on the traditional single-CPU platforms (8 bytes instead of 4 bytes). In return, 64-bit addressing enables IOS to overcome its 2-GB addressable memory limitation, which allows it to scale to millions of routes.

Note: Ensure that you have sufficient memory available before you activate any new features. Cisco recommends that you have at least 8 GB DRAM if you receive the entire Border Gateway Protocol (BGP) routing table when software redundancy is enabled in order to prevent memory exhaustion.

Memory Allocation under the lsmpi_io pool

The Linux Shared Memory Punt Interface (LSMPI) memory pool is used in order to transfer packets from the forwarding processor to the route processor. This memory pool is carved at router initialization into preallocated buffers, as opposed to the processor pool, where IOS-XE allocates memory blocks dynamically. On the ASR1K platform, the lsmpi_io pool has little free memory – generally less than 1000 bytes – which is normal. Cisco recommends that you disable monitoring of the LSMPI pool by the network management applications in order to avoid false alarms.

ASR1000# show memory statistics
           Head    Total(b)    Used(b)    Free(b)   Lowest(b)  Largest(b)
Processor 2C073008  1820510884  173985240  1646525644  1614827804  1646234064
lsmpi_io  996481D0  6295088     6294120    968     968     968
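
As an illustration of how a monitoring script might exclude the LSMPI pool from low-memory alarms, here is a minimal Python sketch that parses the show memory statistics output shown above. The function names, the IGNORED_POOLS set, and the 10 percent threshold are assumptions made for this example and are not part of any Cisco tool.

```python
SAMPLE = """\
           Head    Total(b)    Used(b)    Free(b)   Lowest(b)  Largest(b)
Processor 2C073008  1820510884  173985240  1646525644  1614827804  1646234064
lsmpi_io  996481D0  6295088     6294120    968     968     968
"""

# Pools whose near-zero free memory is expected (assumption for this sketch).
IGNORED_POOLS = {"lsmpi_io"}


def parse_memory_statistics(text):
    """Return {pool_name: {"total": int, "used": int, "free": int}}."""
    pools = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows: pool name, hex head address, then five decimal counters.
        if len(fields) == 7 and all(f.isdigit() for f in fields[2:]):
            total, used, free = (int(f) for f in fields[2:5])
            pools[fields[0]] = {"total": total, "used": used, "free": free}
    return pools


def low_memory_pools(pools, min_free_pct=10.0):
    """Names of monitored pools whose free share is below min_free_pct."""
    return [
        name
        for name, pool in pools.items()
        if name not in IGNORED_POOLS
        and 100.0 * pool["free"] / pool["total"] < min_free_pct
    ]


if __name__ == "__main__":
    pools = parse_memory_statistics(SAMPLE)
    print(low_memory_pools(pools))
```

With the sample output, the lsmpi_io pool is skipped even though it has only 968 bytes free, so no alarm is raised for it.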

If there are any issues in the LSMPI path, the Device xmit fail counter increments in the output of this command (some output omitted):

ASR1000-1# show platform software infrastructure lsmpi driver
LSMPI Driver stat ver: 3
Packets:
        In: 674572
       Out: 259861
Rings:
        RX: 2047 free    0    in-use    2048 total
        TX: 2047 free    0    in-use    2048 total
    RXDONE: 2047 free    0    in-use    2048 total
    TXDONE: 2047 free    0    in-use    2048 total

Buffers:
        RX: 7721 free    473  in-use    8194 total
Reason for RX drops (sticky):
    Ring full        : 0
    Ring put failed  : 0
    No free buffer   : 0
    Receive failed   : 0
    Packet too large : 0
    Other inst buf   : 0
    Consecutive SOPs : 0
    No SOP or EOP    : 0
    EOP but no SOP   : 0
    Particle overrun : 0
    Bad particle ins : 0
    Bad buf cond     : 0
    DS rd req failed : 0
    HT rd req failed : 0
Reason for TX drops (sticky):
    Bad packet len   : 0
    Bad buf len      : 0
    Bad ifindex      : 0
    No device        : 0
    No skbuff        : 0
    Device xmit fail : 0
    Device xmit rtry : 0
    Tx Done ringfull : 0
    Bad u->k xlation : 0
    No extra skbuff  : 0
<snip>
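
Because the drop counters are cumulative, an increment is only visible when two captures of the output are compared. The following Python sketch shows one way to do that, assuming the output format shown above. The function names are hypothetical, and the second capture in the usage example is synthetic.

```python
SAMPLE = """\
Reason for RX drops (sticky):
    Ring full        : 0
    No free buffer   : 0
Reason for TX drops (sticky):
    No skbuff        : 0
    Device xmit fail : 0
    Device xmit rtry : 0
"""


def parse_drop_counters(text):
    """Return {"RX": {reason: count}, "TX": {reason: count}}."""
    counters = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Reason for ") and "drops" in stripped:
            section = stripped.split()[2]  # "RX" or "TX"
            counters[section] = {}
        elif section and ":" in stripped:
            reason, _, value = stripped.rpartition(":")
            if value.strip().isdigit():
                counters[section][reason.strip()] = int(value)
    return counters


def counter_delta(before, after, section="TX", reason="Device xmit fail"):
    """Difference of one counter between two parsed captures."""
    return after[section].get(reason, 0) - before[section].get(reason, 0)


if __name__ == "__main__":
    first = parse_drop_counters(SAMPLE)
    # Synthetic second capture, used only to demonstrate the comparison.
    second = parse_drop_counters(
        SAMPLE.replace("Device xmit fail : 0", "Device xmit fail : 7")
    )
    print(counter_delta(first, second))
```

A nonzero delta for Device xmit fail between two captures suggests that the LSMPI path deserves a closer look.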

Memory Usage

The ASR1K comprises these functional elements in its system:

  • ASR 1000 Series Route Processor (RP)
  • ASR 1000 Series Embedded Services Processor (ESP)
  • ASR 1000 Series SPA Interface Processor (SIP)

You should therefore monitor the memory utilization of each of these processors in a production environment.

The control processors run Cisco IOS-XE software that consists of a Linux-based kernel and a common set of OS-level utility programs, which includes Cisco IOS that runs as a user process on the RP card.

Verify Memory Usage on IOS-XE

Enter the show platform software status control-processor brief command in order to monitor the memory usage on the RP, the ESP, and the SIP. When you compare memory usage between two captures, the system state must be identical with regard to aspects such as the feature configuration and traffic.

ASR1K# show platform software status control-processor brief 
<snip>

Memory (kB)
Slot Status   Total    Used (Pct)     Free (Pct) Committed (Pct)
RP0 Healthy  3907744  1835628 (47%)  2072116 (53%)  2614788 (67%)
ESP0 Healthy  2042668  789764 (39%)  1252904 (61%)  3108376 (152%)
SIP0 Healthy  482544   341004 (71%)   141540 (29%)   367956 (76%)
SIP1 Healthy  482544   315484 (65%)   167060 (35%)   312216 (65%)

Note: Committed memory is an estimate of how much RAM you need in order to guarantee that the system is never Out of Memory (OOM) for this workload. Normally, the kernel overcommits memory. For example, when you run a 1-GB malloc, nothing really happens. You only receive true memory-on-demand when you begin to use that allocated memory, and only as much as you use.
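
The memory table can also be checked programmatically. The following Python sketch, which assumes the column layout shown above, extracts each row and lists the slots that are not Healthy as well as the slots whose committed memory exceeds the total memory. The regular expression and function names are assumptions made for this example.

```python
import re

SAMPLE = """\
Memory (kB)
Slot Status   Total    Used (Pct)     Free (Pct) Committed (Pct)
RP0 Healthy  3907744  1835628 (47%)  2072116 (53%)  2614788 (67%)
ESP0 Healthy  2042668  789764 (39%)  1252904 (61%)  3108376 (152%)
SIP0 Healthy  482544   341004 (71%)   141540 (29%)   367956 (76%)
SIP1 Healthy  482544   315484 (65%)   167060 (35%)   312216 (65%)
"""

# One data row: slot, status, total, then three "value (pct%)" pairs.
ROW = re.compile(
    r"^(?P<slot>\S+)\s+(?P<status>\S+)\s+(?P<total>\d+)"
    r"\s+(?P<used>\d+)\s+\(\s*(?P<used_pct>\d+)%\)"
    r"\s+(?P<free>\d+)\s+\(\s*(?P<free_pct>\d+)%\)"
    r"\s+(?P<committed>\d+)\s+\(\s*(?P<committed_pct>\d+)%\)\s*$"
)


def parse_memory_rows(text):
    """Return one dict per slot, with numeric columns converted to int."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            row = match.groupdict()
            for key, value in row.items():
                if key not in ("slot", "status"):
                    row[key] = int(value)
            rows.append(row)
    return rows


def not_healthy(rows):
    """Slots that report Warning or Critical (anything but Healthy)."""
    return [r["slot"] for r in rows if r["status"] != "Healthy"]


def overcommitted(rows):
    """Slots where committed memory is larger than physical memory."""
    return [r["slot"] for r in rows if r["committed"] > r["total"]]


if __name__ == "__main__":
    rows = parse_memory_rows(SAMPLE)
    print(not_healthy(rows), overcommitted(rows))
```

In the sample, ESP0 is overcommitted (152%) yet still reports Healthy, which is consistent with the note about kernel overcommit: the status follows free memory, not committed memory.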

Each processor listed in the show platform software status control-processor brief output reports its status as Healthy, Warning, or Critical, depending on the amount of free memory. If any processor displays a status of Warning or Critical, enter the monitor platform software process <slot> command in order to identify the top contributor.

ASR1K# monitor platform software process ?
  0   SPA-Inter-Processor slot 0
  1   SPA-Inter-Processor slot 1
  F0  Embedded-Service-Processor slot 0
  F1  Embedded-Service-Processor slot 1
  FP  Embedded-Service-Processor
  R0  Route-Processor slot 0
  R1  Route-Processor slot 1
  RP  Route-Processor
  <cr>

You might be prompted to set the terminal-type before you can execute the monitor platform software process command:

ASR1K# monitor platform software process r0
Terminal type 'network' unsupported for command
Change the terminal type with the 'terminal terminal-type' command.

The terminal type is set to network by default. In order to set the appropriate terminal type, enter the terminal terminal-type command:

ASR1K# terminal terminal-type vt100

Once the correct terminal type is configured, you can enter the monitor platform software process command (some output omitted):

ASR1000# monitor platform software process r0
top - 00:34:59 up  5:02,  0 users,  load average: 2.43, 1.52, 0.73
Tasks: 136 total,   4 running, 132 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.8%us,  2.3%sy,  0.0%ni, 96.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2009852k total,  1811024k used,   198828k free,   135976k buffers
Swap:        0k total,        0k used,        0k free,  1133544k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

25956 root      20   0  928m 441m 152m R  1.2 22.5   4:21.32 linux_iosd-imag
29074 root      20   0  106m  95m 6388 S  0.0  4.9   0:14.86 smand
24027 root      20   0  114m  61m  55m S  0.0  3.1   0:05.07 fman_rp
25227 root      20   0 27096  13m  12m S  0.0  0.7   0:04.35 imand
23174 root      20   0 33760  11m 9152 S  1.0  0.6   1:58.00 cmand
23489 root      20   0 23988 7372 4952 S  0.2  0.4   0:05.28 emd
24755 root      20   0 19708 6820 4472 S  1.0  0.3   3:39.33 hman
28475 root      20   0 20460 6448 4792 S  0.0  0.3   0:00.26 psd
27957 root      20   0 16688 5668 3300 S  0.0  0.3   0:00.18 plogd
14572 root      20   0  4576 2932 1308 S  0.0  0.1   0:02.37 reflector.sh
<snip>

Note: In order to sort the output in descending order of memory usage, press Shift + M.
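
If you capture this output to a file, the same ranking can be reproduced offline. The following Python sketch assumes the standard Linux top conventions, where a bare number in the RES column is in KiB and the m and g suffixes denote MiB and GiB. The function names are hypothetical.

```python
SAMPLE = """\
PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

25956 root      20   0  928m 441m 152m R  1.2 22.5   4:21.32 linux_iosd-imag
23489 root      20   0 23988 7372 4952 S  0.2  0.4   0:05.28 emd
29074 root      20   0  106m  95m 6388 S  0.0  4.9   0:14.86 smand
24027 root      20   0  114m  61m  55m S  0.0  3.1   0:05.07 fman_rp
"""

# Multipliers to KiB for the suffixes that top appends (assumed convention).
UNITS = {"k": 1, "m": 1024, "g": 1024 * 1024}


def to_kib(field):
    """Convert a top memory field such as '441m' or '7372' to KiB."""
    suffix = field[-1].lower()
    if suffix in UNITS:
        return float(field[:-1]) * UNITS[suffix]
    return float(field)


def parse_top_rows(text):
    """Return one dict per process row of the captured output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Process rows have at least 12 columns and begin with a numeric PID.
        if len(fields) >= 12 and fields[0].isdigit():
            rows.append({
                "pid": int(fields[0]),
                "res_kib": to_kib(fields[5]),
                "mem_pct": float(fields[9]),
                "command": " ".join(fields[11:]),
            })
    return rows


def top_by_resident(rows, count=3):
    """The `count` processes with the largest resident memory."""
    return sorted(rows, key=lambda r: r["res_kib"], reverse=True)[:count]


if __name__ == "__main__":
    for row in top_by_resident(parse_top_rows(SAMPLE)):
        print(row["command"], row["res_kib"])
```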

Verify Memory Usage on IOSd

If you notice that the linux_iosd-imag process holds an unusually large amount of memory in the monitor platform software process rp active command output, focus your troubleshooting efforts on the IOSd instance. It is likely that a specific process within IOSd does not free its memory. Troubleshoot memory-related issues in the IOSd instance the same way that you troubleshoot any software-based forwarding platform, such as the Cisco 2800, 3800, or 3900 Series.

ASR1K# monitor platform software process rp active
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25794 root  20   0 2929m 1.9g 155m R 99.9 38.9  1415:11  linux_iosd-imag
23038 root  20   0 33848  13m  10m S  5.9  0.4  30:53.87 cmand
 9599 root  20   0  2648 1152  884 R  2.0  0.0   0:00.01 top
<snip>

Enter the show process memory sorted command in order to identify the problem process:

ASR1000# show process memory sorted
Processor Pool Total: 1733568032 Used: 1261854564 Free: 471713468
lsmpi_io Pool Total: 6295088 Used: 6294116 Free: 972

PID  TTY   Allocated       Freed     Holding    Getbufs  Retbufs  Process
522    0  1587708188   803356800   724777608      54432        0  BGP Router
234    0  3834576340  2644349464   232401568  286163388    15876  IP RIB Update
  0    0   263244344    36307492   215384208          0        0  *Init

Note: Open a TAC case if you require assistance in order to troubleshoot or identify if the memory usage is legitimate.
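
In order to put the Holding column into perspective, it can help to express each process as a share of the used Processor pool. The following Python sketch does this for the output shown above. The function names are assumptions, and a large share by itself does not prove a leak: with a full BGP table, for example, a large BGP Router footprint can be legitimate.

```python
import re

SAMPLE = """\
Processor Pool Total: 1733568032 Used: 1261854564 Free: 471713468
lsmpi_io Pool Total: 6295088 Used: 6294116 Free: 972

PID TTY  Allocated   Freed       Holding    Getbufs    Retbufs  Process

522  0 1587708188  803356800   724777608  54432      0        BGP Router
234  0 3834576340 2644349464  232401568  286163388  15876  IP RIB Update
0    0  263244344   36307492  215384208  0          0        *Init
"""

POOL = re.compile(r"^Processor Pool Total:\s*(\d+)\s+Used:\s*(\d+)", re.M)


def parse_process_memory(text):
    """Return (used_bytes_of_processor_pool, [process rows])."""
    used = int(POOL.search(text).group(2))
    rows = []
    for line in text.splitlines():
        # Seven numeric columns, then a process name that may contain spaces.
        fields = line.split(None, 7)
        if len(fields) == 8 and all(f.isdigit() for f in fields[:7]):
            rows.append({
                "pid": int(fields[0]),
                "holding": int(fields[4]),
                "process": fields[7].strip(),
            })
    return used, rows


def holding_share(text):
    """Return [(process, percent of used Processor pool)], largest first."""
    used, rows = parse_process_memory(text)
    shares = [(r["process"], 100.0 * r["holding"] / used) for r in rows]
    return sorted(shares, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, pct in holding_share(SAMPLE):
        print(f"{name}: {pct:.1f}%")
```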

Verify TCAM Utilization on an ASR1K

Traffic classification is one of the most basic functions found in routers and switches. Many applications and features require that the infrastructure devices provide these differentiated services for different users based on quality requirements. The traffic classification process should be quick, so that the throughput of the device is not greatly degraded. The ASR1K platform uses the 4th generation of Ternary Content Addressable Memory (TCAM4) for this purpose.

In order to determine the total number of TCAM cells available on the platform, and the number of free entries that remain, enter this command:

ASR1000# show platform hardware qfp active tcam resource-manager usage 

Total TCAM Cell Usage Information
----------------------------------
Name                        : TCAM #0 on CPP #0
Total number of regions     : 3
Total tcam used cell entries : 65528
Total tcam free cell entries : 30422
Threshold status            : below critical limit

Note: Cisco recommends that you always check the threshold status before you make any changes to Access-lists or Quality of Service (QoS) policies, so that the TCAM has sufficient free cells available in order to program the entries.
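
As a quick pre-change check, the usage output can be reduced to a utilization percentage. The following Python sketch assumes that the total number of cells equals the used entries plus the free entries, which the output does not state explicitly. The function name is hypothetical.

```python
SAMPLE = """\
Total TCAM Cell Usage Information
----------------------------------
Name                        : TCAM #0 on CPP #0
Total number of regions     : 3
Total tcam used cell entries : 65528
Total tcam free cell entries : 30422
Threshold status            : below critical limit
"""


def parse_tcam_usage(text):
    """Return used/free cell counts, percent used, and threshold status."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    used = int(fields["Total tcam used cell entries"])
    free = int(fields["Total tcam free cell entries"])
    return {
        "used": used,
        "free": free,
        # Assumption: total cells = used + free.
        "used_pct": 100.0 * used / (used + free),
        "status": fields["Threshold status"],
    }


if __name__ == "__main__":
    usage = parse_tcam_usage(SAMPLE)
    print(f"{usage['used_pct']:.1f}% used, status: {usage['status']}")
```

Under that assumption, the sample shows 65528 of 95950 cells in use, or roughly 68 percent.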

If the forwarding processor runs critically low on free TCAM cells, the ESP might generate logs similar to those shown below and might crash. If there is no redundancy, this results in traffic disruption.

%CPPTCAMRM-6-TCAM_RSRC_ERR: SIP0: cpp_sp: Allocation failed because of insufficient
TCAM resources in the system.

%CPPOSLIB-3-ERROR_NOTIFY: SIP0: cpp_sp:cpp_sp encountered an error -
Traceback=1#s7f63914d8ef12b8456826243f3b60d7 errmsg:7EFFC525C000+1175

Verify Memory Utilization on QFP

In addition to the physical memory, there is also memory attached to the Quantum Flow Processor (QFP) ASIC that holds the forwarding data structures, which include data such as the Forwarding Information Base (FIB) and QoS policies. The amount of DRAM available for the QFP ASIC is fixed at 256 MB, 512 MB, or 1 GB, dependent upon the ESP module.

Enter the show platform hardware qfp active infrastructure exmem statistics command in order to determine the exmem memory usage. The sum of the memory for IRAM and DRAM that is used gives the total QFP memory that is in use.

BGL.I.05-ASR1000-1# show platform hardware qfp active infra exmem statistics user

Type: Name: IRAM, CPP: 0
  Allocations  Bytes-Alloc  Bytes-Total  User-Name
  ------------------------------------------------------
  1            115200       115712       CPP_FIA
Type: Name: DRAM, CPP: 0
  Allocations  Bytes-Alloc  Bytes-Total  User-Name
  -----------------------------------------------------
  4            1344          4096         P/I
  9            270600        276480       CEF
  1            1138256       1138688      QM RM
  1            4194304       4194304      TCAM
  1            65536         65536        Qm 16
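
The per-user output above can be totaled per memory type with a short script. The following Python sketch sums the Bytes-Total column for IRAM and DRAM and reports the largest user in each. The choice of Bytes-Total rather than Bytes-Alloc and the function names are assumptions made for this example.

```python
import re

SAMPLE = """\
Type: Name: IRAM, CPP: 0
  Allocations  Bytes-Alloc  Bytes-Total  User-Name
  ------------------------------------------------------
  1            115200       115712       CPP_FIA
Type: Name: DRAM, CPP: 0
  Allocations  Bytes-Alloc  Bytes-Total  User-Name
  -----------------------------------------------------
  4            1344          4096         P/I
  9            270600        276480       CEF
  1            1138256       1138688      QM RM
  1            4194304       4194304      TCAM
  1            65536         65536        Qm 16
"""

TYPE = re.compile(r"^Type: Name: (\w+), CPP: (\d+)")


def parse_exmem_users(text):
    """Return {memory_type: [(user_name, bytes_total), ...]}."""
    usage = {}
    current = None
    for line in text.splitlines():
        match = TYPE.match(line.strip())
        if match:
            current = match.group(1)
            usage[current] = []
            continue
        # Three numeric columns, then a user name that may contain spaces.
        fields = line.split(None, 3)
        if current and len(fields) == 4 and all(f.isdigit() for f in fields[:3]):
            usage[current].append((fields[3].strip(), int(fields[2])))
    return usage


def totals(usage):
    """Sum of Bytes-Total per memory type."""
    return {mem: sum(size for _, size in users) for mem, users in usage.items()}


def largest_user(usage, mem_type):
    """The (user_name, bytes_total) pair with the most memory in mem_type."""
    return max(usage[mem_type], key=lambda item: item[1])


if __name__ == "__main__":
    usage = parse_exmem_users(SAMPLE)
    print(totals(usage), largest_user(usage, "DRAM"))
```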

The IRAM is the instruction memory for QFP software. In the event that DRAM is exhausted, available IRAM can be used. If the IRAM runs critically low on memory, you might see this error message:

%QFPOOR-4-LOWRSRC_PERCENT: F1: cpp_ha: QFP 0 IRAM resource low - 97 percent depleted
%QFPOOR-4-LOWRSRC_PERCENT: F1: cpp_ha: QFP 0 IRAM resource low - 98 percent depleted 
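
If you collect syslog messages centrally, a small filter can track how the depletion percentage evolves. The following Python sketch matches only the message format shown above. The regular expression and the function name are assumptions made for this example.

```python
import re

SAMPLE = """\
%QFPOOR-4-LOWRSRC_PERCENT: F1: cpp_ha: QFP 0 IRAM resource low - 97 percent depleted
%QFPOOR-4-LOWRSRC_PERCENT: F1: cpp_ha: QFP 0 IRAM resource low - 98 percent depleted
"""

LOW_RSRC = re.compile(
    r"%QFPOOR-4-LOWRSRC_PERCENT: (?P<slot>\S+): cpp_ha: "
    r"QFP (?P<qfp>\d+) (?P<resource>\S+) resource low - "
    r"(?P<pct>\d+) percent depleted"
)


def depletion_events(log_text):
    """Return [(slot, resource, percent_depleted)] in log order."""
    return [
        (m.group("slot"), m.group("resource"), int(m.group("pct")))
        for m in LOW_RSRC.finditer(log_text)
    ]


if __name__ == "__main__":
    for slot, resource, pct in depletion_events(SAMPLE):
        print(slot, resource, pct)
```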

In order to determine the process that consumes most of the memory, enter the show platform hardware qfp active infra exmem statistics user command:

ASR1000# show platform hardware qfp active infra exmem statistics user

Type: Name: IRAM, CPP: 0
  Allocations  Bytes-Alloc  Bytes-Total  User-Name
  ----------------------------------------------------
  1            115200       115712       CPP_FIA

Type: Name: DRAM, CPP: 0
  Allocations  Bytes-Alloc  Bytes-Total  User-Name
  ----------------------------------------------------
  4            1344         4096         P/I
  9            270600       276480       CEF
  1            1138256      1138688      QM RM
  1            4194304      4194304      TCAM
  1            65536        65536        Qm 16