Cisco BPX/IGX/IPX WAN Software

Explaining WAN Switch Software Resource Aborts

Document ID: 15034



Contents

Introduction
Prerequisites
      Requirements
      Components Used
      Conventions
Explaining the WAN Switch Software Resource Aborts
Switch Software Interprocess Communication Example
Screen Captures to Identify Source of Switch Software Resource Abort
Switch Software Error 52 – Time
Switch Software Error 3000000 – Memory
Software Error 501
Switch Software Error 1000 – Process Letters
Related Information

Introduction

This document explains WAN switch software errors that represent resource aborts or failures in interprocess communication on the Cisco BPX 8600 and IGX 8400 series switches.

Prerequisites

Requirements

There are no specific requirements for this document.

Components Used

The information in this document is based on the software and hardware versions below.

  • Cisco BPX 8600

  • Cisco IGX 8400

The information presented in this document was created from devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If you are working in a live network, ensure that you understand the potential impact of any command before using it.

Conventions

For more information on document conventions, see the Cisco Technical Tips Conventions.

Explaining the WAN Switch Software Resource Aborts

Interprocess communications allow for activities such as adding connections or running background tests. Resource aborts typically result in the Active processor card running out of memory blocks and switching to the Standby processor. If the switch from the Active to the Standby processor is not graceful, the node will rebuild.

BPX and IGX processor card types include:

  • BCC

  • BCC-3

  • BCC-4

  • NPM

  • NPM-32

  • NPM-64

Switch software requires three resources to execute:

  • Time (processor real-time)

  • Memory

  • Process letters

If one of these resources is depleted, the software could fail to perform a required operation and will generate an abort. Causes of switch software resource aborts include:

  • Problems at other nodes in the network that overload the aborted processor with messages and trigger an abort.

  • An application card on the node that overloads the aborted processor with messages and triggers an abort.

Switch Software Interprocess Communication Example

The processes used to add a connection are described below.

  1. The user interface process accepts and validates the command to add a connection. This process passes a letter to the transaction handler (TRNS) process. In this case, the term letter represents a message.

  2. The TRNS process requests the resource (RSRC) process to allocate a channel and then passes a letter to the network (NETW) process for transmission to the remote node.

  3. The letter received by the remote node is handled by the NETW process, which then passes it to the TRNS process at the remote node.

  4. The remote node TRNS process then requests the RSRC process to allocate a channel.

  5. Local and remote TRNS processes pass a letter to the failure handler (FAIL) process to log and print the event. The user will see the connection added in the screen display.

  6. Local and remote protocol (PROT) processes then log an entry in the event log of each node. The Cisco WAN Manager (CWM) database is then updated.
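The letter hops in steps 1 through 5 can be sketched as a small table and tallied per receiving process. This is an illustrative model only, not switch software code: "UI" is a hypothetical label for the user interface process, and treating the RSRC channel request as a letter is an assumption.

```python
from collections import Counter

# Ordered letter hops for the add-connection walk-through.
# "UI" is a hypothetical label; the other process names come from the text.
# Modeling the RSRC request as a letter is an assumption.
HOPS = [
    ("local", "UI", "TRNS"),     # step 1: validated command passed to TRNS
    ("local", "TRNS", "RSRC"),   # step 2: request a channel
    ("local", "TRNS", "NETW"),   # step 2: hand off for transmission
    ("remote", "NETW", "TRNS"),  # step 3: remote NETW passes letter to TRNS
    ("remote", "TRNS", "RSRC"),  # step 4: remote channel request
    ("local", "TRNS", "FAIL"),   # step 5: log and print the event
    ("remote", "TRNS", "FAIL"),  # step 5: same on the remote node
]

def letters_received(hops):
    """Count the letters each (node, process) pair receives."""
    counts = Counter()
    for node, _sender, receiver in hops:
        counts[(node, receiver)] += 1
    return counts

counts = letters_received(HOPS)
for (node, process), n in sorted(counts.items()):
    print(f"{node:6} {process}: {n} letter(s)")
```

Even a single connection add consumes several letters on each node, which is why a burst of such activity can strain the letter pool.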

Screen Captures to Identify Source of Switch Software Resource Abort

To collect graceful switchover and node rebuild events, use the dsplog command to view the node log, and check the CWM event log. Collecting node and network activity for the hour before the event helps determine the cause of the resource abort.

You can view switch software errors 52, 3000000, and 1000 for the Active and Standby processor cards by issuing the following Service-level commands.

  • dspswlog – displays the Active processor software error log in summary

  • dspswlog d – displays the Active processor software error log in detail

  • dspswlog s – displays the Standby processor software error log in summary

  • dspswlog s d – displays the Standby processor software error log in detail

Use the following Service-level commands to determine the cause of the switch software errors.

  • cbstats – displays control bus statistics

  • cbstats a – displays abort data for control bus statistics

  • nwstats a – displays abort data for network statistics

  • dspmemblk – displays memory available for dynamic and static regions

  • dspqs a – displays abort data for processor queue statistics

  • dspprf – displays the status of the profiler facility on the Active processor

  • dspprf a – displays the abort status on the Active processor

  • dspprf t – displays total status on the Active processor

  • dspprfhist – displays 120 most recent samples in 20-second intervals on the Active processor

To display profiler data for the Standby processor card, append the letter "s" to the normal commands:

  • dspprf s

  • dspprf a s

  • dspprf t s

The profiler is a facility on the IGX and BPX systems that collects and displays statistics characterizing resource usage within system software. Use the data for online performance measurement, debugging, and post-mortem analysis.

Profiler screens collected immediately after a resource abort can help identify which process caused the abort. Switch software processes tracked by the profiler are listed by priority below. This information is available when you issue the dspprf command.

Process  Description
IDLE     This is not a process but an indication of the percentage of processor capability or real-time not being used.
RSRC     This process manages resources allocated to trunks and channels.
CBUS     This process handles communication between the processor card and other cards in the node.
NETW     This process sends and receives network messages to other nodes and to the Standby processor card.
TRNS     This process executes state tables and routes events, including timers, to the proper state tables.
FAIL     This process handles all failure notification events. It also handles the node log and logs information about CWM maintenance items.
SNMP     This process supports SNMP agent features. It handles SNMP communications and translates between the internal protocols and controls and SNMP commands.
PROT     This process handles the protocol for communication between the node and CWM.
TXIO     This process handles communication through the control and auxiliary ports.
ILMI     This process handles the ILMI protocol for ATM signaling in the BPX.
SUMM     If processes are configured not to be displayed using the cnfprf command, their statistics are summarized in this field.

Switch Software Error 52 – Time

Switch software error 52 is also referred to as a Watchdog Timeout. Each switch software process must execute within a set time interval, its watchdog limit. This limit appears under the Wdt Lim column on the screen after you issue the dspprf t command.

igx4       TN    Service         IGX 8420  9.3.11    May  23 2001 17:52 GMT 


Active           Last Cleared: Date/Time Not Set     Snapshot 
Current

Proc  Send Fails Wdt Cnt  Wdt Mrk Wdt Lim
IDLE      0         0        0         0
RSRC      0         0       10        60
CBUS      0         0       10       150
NETW      0         0       10       250
TRNS      0         0       10       350
FAIL      0         0       10       550
SNMP      0         0       10       850
PROT      0         0       10       900
TXIO      0         0       10       950
ILMI      0         0       10       650
SUMM      0        N/A      N/A      N/A  
                                                                                
Last Command: dspprf t

Each time a switch software process executes, it resets its watchdog count. If a high-level process such as control bus (CBUS) or network (NETW) is very busy and does not allow a lower-level process such as the failure handler (FAIL) to execute, the watchdog count for the lower-level process is not reset. If the count reaches the watchdog limit, the software assumes there is a problem in the higher-level process and attempts to resolve the issue by generating an abort.
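The watchdog behavior described above can be illustrated with a simplified simulation. This is a sketch under stated assumptions, not the actual scheduler: the limits are taken from the dspprf t capture, the units are treated as abstract ticks, and a busy process is assumed to starve every process listed below it while processes above it still run.

```python
# Watchdog limits from the dspprf t capture, in the listed (priority) order.
# Treating the limits as scheduler "ticks" is an assumption for illustration.
LIMITS = {
    "RSRC": 60, "CBUS": 150, "NETW": 250, "TRNS": 350, "FAIL": 550,
    "SNMP": 850, "PROT": 900, "TXIO": 950, "ILMI": 650,
}

def first_watchdog_timeout(limits, busy, ticks):
    """Return the first starved process to reach its limit, or None."""
    order = list(limits)
    counts = dict.fromkeys(order, 0)
    # Rank of the highest-priority process that never yields, if any.
    blocker = min((order.index(p) for p in busy), default=None)
    for _ in range(ticks):
        for rank, proc in enumerate(order):
            if blocker is None or rank <= blocker:
                counts[proc] = 0       # process ran and reset its count
            else:
                counts[proc] += 1      # starved by the busy process
                if counts[proc] >= limits[proc]:
                    return proc        # watchdog limit reached: abort
    return None

print(first_watchdog_timeout(LIMITS, {"NETW"}, 1000))
print(first_watchdog_timeout(LIMITS, set(), 1000))
```

With NETW permanently busy, the starved process with the smallest limit (TRNS in this model) is the first to time out; with no busy process, no timeout occurs.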

Processor card IDLE or real-time availability may be depleted by a sustained high level of activity on the node such as handling a large number of network messages, which uses the NETW process, or by sending a very large number of configuration messages to cards, which uses the CBUS process.
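When reviewing many captures, the dspprf t screen can be parsed offline to see how close each process has come to its watchdog limit. The following is a minimal sketch that works on saved screen text only. It assumes Wdt Mrk is a high-water mark of the watchdog count, which the text does not state, and the function names are hypothetical.

```python
# Rows copied from the dspprf t capture in this document.
CAPTURE = """\
Proc  Send Fails Wdt Cnt  Wdt Mrk Wdt Lim
IDLE      0         0        0         0
RSRC      0         0       10        60
CBUS      0         0       10       150
NETW      0         0       10       250
TRNS      0         0       10       350
FAIL      0         0       10       550
SNMP      0         0       10       850
PROT      0         0       10       900
TXIO      0         0       10       950
ILMI      0         0       10       650
SUMM      0        N/A      N/A      N/A
"""

def parse_dspprf_t(text):
    """Parse process rows into dicts; skip headers and N/A rows."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5 or not all(v.isdigit() for v in parts[1:]):
            continue
        send_fails, wdt_cnt, wdt_mrk, wdt_lim = map(int, parts[1:])
        rows[parts[0]] = {"send_fails": send_fails, "wdt_cnt": wdt_cnt,
                          "wdt_mrk": wdt_mrk, "wdt_lim": wdt_lim}
    return rows

def limit_usage(rows):
    """Fraction of the limit reached by Wdt Mrk (assumed high-water mark)."""
    return {name: r["wdt_mrk"] / r["wdt_lim"]
            for name, r in rows.items() if r["wdt_lim"] > 0}

rows = parse_dspprf_t(CAPTURE)
usage = limit_usage(rows)
closest = max(usage, key=usage.get)
print(closest, round(usage[closest], 2))
```

A process whose usage approaches 1.0 in successive captures would be a candidate for closer inspection.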

Switch Software Error 3000000 – Memory

There are pools of memory from which memory blocks are allocated. If a critical function requires memory but the memory is unavailable, the software then generates an abort action. A sustained high level of activity on a node can deplete memory, although a lack of memory is usually caused by:

  • memory leak in which the software fails to free memory

  • memory fragmentation in which there is not enough contiguous memory for the requested block size

A switch software error 3000000 also results if a process overwrites the memory that has been allocated to another process.

Use the dspmemblk command to view the availability of memory and to check for memory fragmentation. The size of the largest contiguous memory block available for allocation in each region, including the dynamic and static regions, appears in the Max Block in Bytes column below.
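Fragmentation can be illustrated with a short example: the total free memory exceeds the request, yet no single contiguous block is large enough. The block sizes below are made up for illustration and do not come from a real node.

```python
def can_allocate(free_blocks, request):
    """True if any single contiguous free block can hold the request."""
    return any(size >= request for size in free_blocks)

# Hypothetical free-block sizes in bytes (illustrative values only).
free_blocks = [40_000, 30_000, 30_000, 20_000]
request = 102_400

total_free = sum(free_blocks)
enough_in_total = total_free >= request
fits_contiguously = can_allocate(free_blocks, request)

print(total_free, enough_in_total, fits_contiguously)
```

Here 120,000 bytes are free in total, but the largest block is 40,000 bytes, so a 102,400-byte request fails.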

igx4       TN    Service         IGX 8420  9.3.20    May  24 2001 11:11 GMT 
        Max Block  # Available Blocks           Max Block  # Available Blocks
Region   in Bytes   MAX    -1k   -2k   Region   in Bytes    MAX    -1k   -2k
                                       
 STAT    102400      8      8     9    
 DYNM    102400      10     10    10   
 STTC    102400      8      8     8    
 POOL    102400      2      2     2    
 HIT1      *                           
 ST02    102400      10     10    10   
                                                                                                                                                        
   * Hitless Region: use dspfreelocal  
                                                                                
Last Command: dspmemblk
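A saved dspmemblk screen can be parsed the same way to compare regions. This is a rough sketch on the capture text above. It assumes the MAX, -1k, and -2k columns count available blocks at the maximum size and at 1 KB and 2 KB below it, which the text does not spell out, and the function name is hypothetical.

```python
# Left-hand columns copied from the dspmemblk capture in this document.
CAPTURE = """\
        Max Block  # Available Blocks
Region   in Bytes   MAX    -1k   -2k
 STAT    102400      8      8     9
 DYNM    102400      10     10    10
 STTC    102400      8      8     8
 POOL    102400      2      2     2
 HIT1      *
 ST02    102400      10     10    10
   * Hitless Region: use dspfreelocal
"""

def parse_dspmemblk(text):
    """Parse region rows; skip headers, the hitless region, and footnotes."""
    regions = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5 or not all(v.isdigit() for v in parts[1:]):
            continue
        max_block, n_max, n_1k, n_2k = map(int, parts[1:])
        regions[parts[0]] = {"max_block": max_block, "blocks_max": n_max,
                             "blocks_1k": n_1k, "blocks_2k": n_2k}
    return regions

regions = parse_dspmemblk(CAPTURE)
# Region with the fewest blocks available at the maximum size.
tightest = min(regions, key=lambda name: regions[name]["blocks_max"])
print(tightest, regions[tightest])
```

Comparing these values across captures taken over time can show whether the largest available block in a region is shrinking.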

Software Error 501

If memory cannot be allocated for a noncritical function such as a user-requested display, a software error 501 is logged. For example, after enabling large numbers of Trivial File Transfer Protocol (TFTP) interval statistics, a switch may start logging multiple software error 501s and a software error 3000000. Eventually the switch spontaneously restarts with a resource abort 3000000.
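The difference between a logged error and an abort can be sketched as follows. This is a simplified model, not the switch software allocator: the function, the exception class, and the byte counts are hypothetical, and only the error numbers come from the text.

```python
class ResourceAbort(Exception):
    """Raised when a critical allocation cannot be satisfied."""

def allocate(free_bytes, size, critical, error_log):
    """Return the free bytes remaining after the request.

    A failed noncritical request logs error 501 and leaves memory unchanged;
    a failed critical request raises an abort (3000000 in the text).
    """
    if size <= free_bytes:
        return free_bytes - size
    if critical:
        raise ResourceAbort(3000000)
    error_log.append(501)
    return free_bytes

log = []
free = 1000                                                # hypothetical pool
free = allocate(free, 800, critical=False, error_log=log)  # succeeds
free = allocate(free, 500, critical=False, error_log=log)  # fails, logs 501
try:
    allocate(free, 500, critical=True, error_log=log)
    aborted = None
except ResourceAbort as exc:
    aborted = exc.args[0]

print(log, free, aborted)
```

Repeated 501 entries in this model are an early sign that a later critical request may fail in the same way.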

Switch Software Error 1000 – Process Letters

Software processes communicate by using a real-time operating system to send letters. The number of letters available varies between 4,000 and 16,000, depending upon the switch software release. If a letter is not available, the software is unable to continue executing and aborts. Process letters may be depleted by a sustained high level of activity on the node. A high level of activity may prevent a lower-level process from reading letters in its queue and returning them to the operating system for reuse. Eventually the lower-level process may accumulate all the available letters. The higher-level process, which uses all available real-time, aborts if it cannot obtain a letter.
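Letter depletion can be modeled with a fixed pool and a queue that is never drained. This is an illustrative sketch, not the real-time operating system's mechanism: the class and function names are hypothetical, and the pool size of 4,000 is the lower bound quoted above.

```python
from collections import deque

class LetterPool:
    """Fixed pool of letters shared by all processes (simplified model)."""
    def __init__(self, size):
        self.free = size

    def take(self):
        if self.free == 0:
            raise RuntimeError("abort 1000: no process letters available")
        self.free -= 1

    def give_back(self):
        self.free += 1

def simulate(pool_size, sends, low_priority_runs):
    """Send letters to a lower-level process; return letters left in the pool."""
    pool = LetterPool(pool_size)
    queue = deque()
    for i in range(sends):
        pool.take()              # higher-level process obtains a letter
        queue.append(i)          # letter waits in the lower-level queue
        if low_priority_runs:
            queue.popleft()      # lower-level process reads the letter
            pool.give_back()     # and returns it for reuse
    return pool.free

print(simulate(4000, 5000, low_priority_runs=True))
try:
    simulate(4000, 5000, low_priority_runs=False)
except RuntimeError as exc:
    print(exc)
```

When the lower-level process runs, the pool stays full no matter how many letters are sent; when it is starved, the sender aborts as soon as it needs one more letter than the pool holds.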

Related Information



Updated: Apr 30, 2009