Document ID: 15034
Contents
Introduction
Prerequisites
Requirements
Components Used
Conventions
Explaining the WAN Switch Software Resource Aborts
Switch Software Interprocess Communication Example
Screen Captures to Identify Source of Switch Software Resource Abort
Switch Software Error 52 – Time
Switch Software Error 300000 – Memory
Software Error 501
Switch Software Error 1000 – Process Letters
Related Information
Introduction
This document explains WAN switch software errors that represent resource aborts or failures in interprocess communication on the Cisco BPX 8600 and IGX 8400 series switches.
Prerequisites
Requirements
There are no specific requirements for this document.
Components Used
The information in this document is based on the software and hardware versions below.
- Cisco BPX 8600
- Cisco IGX 8400
The information presented in this document was created from devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If you are working in a live network, ensure that you understand the potential impact of any command before using it.
Conventions
For more information on document conventions, see the Cisco Technical Tips Conventions.
Explaining the WAN Switch Software Resource Aborts
Interprocess communications allow for activities such as adding connections or running background tests. Resource aborts typically result in the Active processor card running out of memory blocks and switching to the Standby processor. If the switch from the Active to the Standby processor is not graceful, the node will rebuild.
BPX and IGX processor card types include:
- BCC
- BCC-3
- BCC-4
- NPM
- NPM-32
- NPM-64
Switch software requires three resources to execute:
- Time (processor real-time)
- Memory
- Process letters
If one of these resources is depleted, the software can fail to perform a required operation and generates an abort. Causes of switch software resource aborts include:
- Problems at other nodes in the network that overload the aborted processor with messages and trigger an abort.
- An application card on the node that overloads the aborted processor with messages and triggers an abort.
Switch Software Interprocess Communication Example
The processes used to add a connection are described below.
- The user interface process accepts and validates the command to add a connection. This process passes a letter to the transaction handler (TRNS) process. In this case, the term letter represents a message.
- The TRNS process requests the resource (RSRC) process to allocate a channel and then passes a letter to the network (NETW) process for transmission to the remote node.
- The letter received by the remote node is handled by the NETW process, which then passes it to the TRNS process at the remote node.
- The remote node TRNS process then requests the RSRC process to allocate a channel.
- Local and remote TRNS processes pass a letter to the failure handler (FAIL) process to log and print the event. The user will see the connection added in the screen display.
- Local and remote protocol (PROT) processes then log an entry in the event log of each node. The Cisco WAN Manager (CWM) database is then updated.
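The hand-offs in the steps above can be sketched as a small letter-passing model. This is only an illustration, not the switch software's implementation: the letter names (`add_con`, `alloc_channel`, and so on), the `handle` routing function, and the assumption that the FAIL process hands the event on to PROT are all hypothetical, and the sketch does not try to reproduce the exact ordering of events between the two nodes.

```python
from collections import deque


def other(node):
    """Return the peer node name in this two-node sketch."""
    return "remote" if node == "local" else "local"


def handle(node, proc, letter):
    """Hypothetical routing: return the letters a process sends after handling one."""
    if proc == "TRNS" and letter == "add_con":
        out = [(node, "RSRC", "alloc_channel")]         # allocate a channel
        if node == "local":
            out.append((node, "NETW", "send_add_con"))  # forward to the remote node
        out.append((node, "FAIL", "log_event"))         # log and print the event
        return out
    if proc == "NETW" and letter == "send_add_con":
        return [(other(node), "NETW", "recv_add_con")]
    if proc == "NETW" and letter == "recv_add_con":
        return [(node, "TRNS", "add_con")]
    if proc == "FAIL" and letter == "log_event":
        return [(node, "PROT", "update_event_log")]     # assumed hand-off to PROT
    return []                                           # RSRC and PROT end the chain


def add_connection_trace():
    """Deliver letters in FIFO order, starting with the letter from the user interface."""
    queue = deque([("local", "TRNS", "add_con")])
    trace = []
    while queue:
        node, proc, letter = queue.popleft()
        trace.append((node, proc, letter))
        queue.extend(handle(node, proc, letter))
    return trace


if __name__ == "__main__":
    for step in add_connection_trace():
        print(step)
```

Even in this reduced model, one user command produces ten letters across the two nodes, which gives a sense of why a burst of connection activity can load the processor card.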
Screen Captures to Identify Source of Switch Software Resource Abort
To collect graceful switch and node rebuild events, use the dsplog command for the node log and consult the CWM event log. Collecting node and network activity for the hour before the event helps determine the cause of the resource abort.
You can view switch software errors 52, 3000000, and 1000 for the Active and Standby processor cards by issuing the following Service-level commands.
- dspswlog – displays the Active processor software error log in summary
- dspswlog d – displays the Active processor software error log in detail
- dspswlog s – displays the Standby processor software error log in summary
- dspswlog s d – displays the Standby processor software error log in detail
Use the following Service-level commands to determine the cause of the switch software errors.
- cbstats – displays control bus statistics
- cbstats a – displays abort data for control bus statistics
- nwstats a – displays abort data for network statistics
- dspmemblk – displays memory available for dynamic and static regions
- dspqs a – displays abort data for processor queue statistics
- dspprf – displays the status of the profiler facility on the Active processor
- dspprf a – displays the abort status on the Active processor
- dspprf t – displays total status on the Active processor
- dspprfhist – displays 120 most recent samples in 20-second intervals on the Active processor
To display profiler data for the Standby processor card, append the letter "s" to the normal commands:
- dspprf s
- dspprf a s
- dspprf t s
The profiler is a facility on the IGX and BPX systems that collects and displays statistics characterizing resource usage within system software. Use the data for online performance measurement, debugging, and post-mortem analysis.
Profiler screens collected immediately after a resource abort can help identify which process caused the abort. Switch software processes tracked by the profiler are listed by priority below. This information is available when you issue the dspprf command.
| Process | Description |
|---|---|
| IDLE | This is not a process but an indication of the percentage of processor capability or real-time not being used. |
| RSRC | This process manages resources allocated to trunks and channels. |
| CBUS | This process handles communication between the processor card and other cards in the node. |
| NETW | This process sends and receives network messages to other nodes and to the Standby processor card. |
| TRNS | This process executes state tables and routes events, including timers, to the proper state tables. |
| FAIL | This process handles all failure notification events. It also handles the node log and logs information about CWM maintenance items. |
| SNMP | This process supports SNMP agent features. It handles SNMP communications and translates between the internal protocols and controls and SNMP commands. |
| PROT | This process handles the protocol for communication between the node and CWM. |
| TXIO | This process handles communication through the control and auxiliary ports. |
| ILMI | This process handles the ILMI protocol for ATM signaling in the BPX. |
| SUMM | If processes are configured not to be displayed using the cnfprf command, their statistics are summarized in this field. |
Switch Software Error 52 – Time
Switch software error 52 is also referred to as a Watchdog Timeout. Each switch software process has a time limit within which it must execute. This limit appears under the Wdt Lim column on the screen after you issue the dspprf t command.
igx4 TN Service IGX 8420 9.3.11 May 23 2001 17:52 GMT
Active Last Cleared: Date/Time Not Set Snapshot
Current
Proc Send Fails Wdt Cnt Wdt Mrk Wdt Lim
IDLE 0 0 0 0
RSRC 0 0 10 60
CBUS 0 0 10 150
NETW 0 0 10 250
TRNS 0 0 10 350
FAIL 0 0 10 550
SNMP 0 0 10 850
PROT 0 0 10 900
TXIO 0 0 10 950
ILMI 0 0 10 650
SUMM 0 N/A N/A N/A
Last Command: dspprf t
Each time a switch software process executes, it resets its watchdog count. If a high-level process such as control bus (CBUS) or network (NETW) is very busy and does not allow a lower-level process such as the failure handler (FAIL) to execute, the watchdog count for the lower-level process does not reset. If the count reaches the watchdog limit, the software assumes there is a problem in the higher-level process and attempts to resolve the issue by generating an abort.
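The watchdog behavior described above can be illustrated with a short simulation. It is a simplified sketch under stated assumptions, not the actual scheduler: the limits are copied from the Wdt Lim column of the dspprf t capture, the limit is treated as a count of abstract ticks (the real unit is not given here), processes are assumed to be listed highest priority first, and the function name `first_watchdog_timeout` is hypothetical.

```python
# Wdt Lim values from the dspprf t capture, highest-priority process first.
# Treating the limit as a number of scheduler ticks is an assumption of this sketch.
PRIORITY = ["RSRC", "CBUS", "NETW", "TRNS", "FAIL"]
WDT_LIMIT = {"RSRC": 60, "CBUS": 150, "NETW": 250, "TRNS": 350, "FAIL": 550}


def first_watchdog_timeout(hog, max_ticks=1000):
    """Simulate one process (hog) that is always busy.

    Processes at or above the hog's priority still run and reset their
    watchdog count; lower-priority processes are starved and their count
    grows by one per tick. Returns (process, tick) for the first process
    whose count reaches its limit, or None if no limit is reached.
    """
    counts = {proc: 0 for proc in PRIORITY}
    hog_rank = PRIORITY.index(hog)
    for tick in range(1, max_ticks + 1):
        for rank, proc in enumerate(PRIORITY):
            if rank <= hog_rank:
                counts[proc] = 0       # process executed, count resets
            else:
                counts[proc] += 1      # process starved this tick
                if counts[proc] >= WDT_LIMIT[proc]:
                    return proc, tick
    return None


if __name__ == "__main__":
    print(first_watchdog_timeout("NETW"))
```

In this model a permanently busy NETW process starves both TRNS and FAIL, and TRNS reaches its limit first because its limit is lower.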
Processor card IDLE or real-time availability may be depleted by a sustained high level of activity on the node such as handling a large number of network messages, which uses the NETW process, or by sending a very large number of configuration messages to cards, which uses the CBUS process.
Switch Software Error 300000 – Memory
There are pools of memory from which memory blocks are allocated. If a critical function requires memory but the memory is unavailable, the software then generates an abort action. A sustained high level of activity on a node can deplete memory, although a lack of memory is usually caused by:
- memory leak in which the software fails to free memory
- memory fragmentation in which there is not enough contiguous memory for the requested block size
A switch software error 300000 also results if a process overwrites the memory that has been allocated to another process.
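To make the fragmentation case concrete, the following minimal sketch models a memory region as a row of equal-sized blocks and shows how an allocation can fail even though plenty of memory is free in total. The block map and the helper names `largest_free_run` and `can_allocate` are hypothetical and do not reflect how the switch software manages its memory pools.

```python
def largest_free_run(blocks):
    """Return the length of the longest run of contiguous free blocks.

    blocks is a list of booleans where True means the block is free.
    """
    best = run = 0
    for free in blocks:
        run = run + 1 if free else 0
        best = max(best, run)
    return best


def can_allocate(blocks, size):
    """A request succeeds only if `size` contiguous free blocks exist."""
    return largest_free_run(blocks) >= size


if __name__ == "__main__":
    # Hypothetical region: 7 of 10 blocks are free, but they are scattered.
    region = [True, False, True, True, False, True, True, True, False, True]
    print("free blocks:", sum(region))
    print("largest contiguous run:", largest_free_run(region))
    print("can allocate 4 blocks:", can_allocate(region, 4))
```

Here seven blocks are free but the largest contiguous run is three, so a request for four blocks fails. A memory leak looks different: the total number of free blocks itself keeps shrinking over time.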
Use the dspmemblk command to view the availability of memory and to check for memory fragmentation. The Max Block in Bytes column shows the size of the largest contiguous memory block available for allocation in each dynamic or static region.
igx4 TN Service IGX 8420 9.3.20 May 24 2001 11:11 GMT
Max Block # Available Blocks Max Block # Available Blocks
Region in Bytes MAX -1k -2k Region in Bytes MAX -1k -2k
STAT 102400 8 8 9
DYNM 102400 10 10 10
STTC 102400 8 8 8
POOL 102400 2 2 2
HIT1 *
ST02 102400 10 10 10
* Hitless Region: use dspfreelocal
Last Command: dspmemblk
Software Error 501
If memory cannot be allocated for a noncritical function such as a user-requested display, a software error 501 is logged. For example, after enabling large numbers of Trivial File Transfer Protocol (TFTP) interval statistics, a switch may start logging multiple software error 501s and a software error 3000000. Eventually the switch spontaneously restarts with a resource abort 3000000.
Switch Software Error 1000 – Process Letters
Software processes communicate by using a real-time operating system to send letters. The number of letters available varies between 4,000 and 16,000, depending upon the switch software release. If a letter is not available, the software is unable to continue executing and aborts. Process letters may be depleted by a sustained high level of activity on the node. A high level of activity may prevent a lower-level process from reading letters in its queue and returning them to the operating system for reuse. Eventually the lower-level process may accumulate all the available letters. The higher-level process, which uses all available real-time, aborts if it cannot obtain a letter.
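The depletion mechanism can be sketched with a fixed pool of letters, a busy sender, and a starved reader. This is an illustrative model only: the pool size of 4,000 is the lower bound quoted above, while the per-tick send and read rates and the function name `ticks_until_abort` are assumptions chosen for the example.

```python
from collections import deque


def ticks_until_abort(pool_size, sends_per_tick, reads_per_tick, max_ticks=10000):
    """Return the tick at which a sender cannot obtain a letter, or None.

    Each tick the higher-level process takes `sends_per_tick` letters from
    the pool and queues them for a lower-level process, which only gets
    enough real-time to read and return `reads_per_tick` of them.
    """
    free = pool_size
    inbox = deque()                      # letters waiting at the lower-level process
    for tick in range(1, max_ticks + 1):
        for _ in range(sends_per_tick):
            if free == 0:
                return tick              # no letter available: the sender aborts
            free -= 1
            inbox.append("letter")
        for _ in range(min(reads_per_tick, len(inbox))):
            inbox.popleft()
            free += 1                    # letter returned to the pool for reuse
    return None


if __name__ == "__main__":
    print(ticks_until_abort(4000, sends_per_tick=10, reads_per_tick=2))
    print(ticks_until_abort(4000, sends_per_tick=2, reads_per_tick=2))
```

When the reader keeps pace with the sender the pool never empties; when it falls behind, the backlog grows steadily until every letter sits in the lower-level process's queue and the sender aborts.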
Related Information
- Cisco WAN Switching Solutions - Cisco Documentation
- Guide to New Names and Colors for WAN Switching Products
- Downloads - WAN Switching Software
- Technical Support - Cisco Systems
| Updated: Apr 30, 2009 | Document ID: 15034 |
