Cisco 10000 Series Router Software Configuration Guide
Cisco 10000 Series Router PXF Stall Monitor

Table Of Contents

Cisco 10000 Series Router PXF Stall Monitor

Feature History of Cisco 10000 Series Router PXF Stall Monitor

Information about Cisco 10000 Series Router PXF Stall Monitor

Recovery Actions

Restrictions for Cisco 10000 Series Router PXF Stall Monitor

Configuring Cisco 10000 Series Router PXF Stall Monitor

Configuration Example of Cisco 10000 Series Router PXF Stall Monitor


Cisco 10000 Series Router PXF Stall Monitor


In Cisco IOS Release 12.2(33)XNE, the Cisco 10000 series routers include a mechanism that verifies whether Parallel Express Forwarding (PXF) can forward packet traffic.

The PXF forwards traffic, and a fault in the PXF can silently halt that traffic: the router drops packets without updating the error counters. Such faults can also affect the devices that are connected to the router. Manual intervention is required to recover from silent PXF failures.

The Cisco 10000 Series Router PXF Stall Monitor feature monitors PXF forwarding by collecting information about forwarded traffic and any stalls on the PXF to route processor (RP), PXF to line card (LC), and LC to PXF paths.

This chapter includes the following topics:

Feature History of Cisco 10000 Series Router PXF Stall Monitor

Information about Cisco 10000 Series Router PXF Stall Monitor

Restrictions for Cisco 10000 Series Router PXF Stall Monitor

Configuring Cisco 10000 Series Router PXF Stall Monitor

Configuration Example of Cisco 10000 Series Router PXF Stall Monitor

Feature History of Cisco 10000 Series Router PXF Stall Monitor

Cisco IOS Release | Description | Required PRE
12.2(33)XNE | This feature was introduced in Cisco 10000 series routers. | PRE3 and PRE4


Information about Cisco 10000 Series Router PXF Stall Monitor

The Cisco 10000 Series Router PXF Stall Monitor feature periodically verifies the basic forwarding capability of the PXF. The PXF stall monitor (PSM) is present on both the active and standby PREs but operates only on the active PRE. The PSM on a standby PRE starts detection as soon as that PRE becomes active. A user can turn the PSM on or off without disrupting the operation of the router.

Prior to Cisco IOS Release 12.2(33)XNE, Cisco 10000 series routers could detect the following stall conditions on the packet data path and the control path:

PXF stall—On the LC to PXF path, shown in Figure 17-1, if there are no packets passing between Cobalt3 and the PXF, the PXF stalls. The stall occurs when a packet is not completely read from the Internal Packet Memory (IPM); therefore, the entry remains in the Cobalt3 New Work Queue (NWQ). This entry is marked as the top entry and stays there. To catch this condition, a 24-bit counter in the Cobalt3 NWQ counts the number of clock ticks outstanding for the top entry of each Cobalt3 NWQ. When the programmable threshold is reached, an event is sent to the RP, which reloads the PXF.

Iron Bus (IB) stuck pause request—On the PXF to LC path, shown in Figure 17-2, an IB Stuck Pause Request error is detected by the Nickel driver. This error occurs when the LC egress data path does not function correctly. When this error occurs, the Nickel driver restarts the Iron Bus. If the error repeats from the same LC within 4 seconds, the entire LC is reloaded.

Control path—The RP and LCs are connected over the Backplane Ethernet (BPE). Through the BPE, the RP monitors the condition of the LCs using keepalive packets that the LCs send to the RP every second. If a keepalive packet does not reach the RP on time, the corresponding LC is reloaded.

Figure 17-1 Data Path - LC to PXF

Figure 17-2 Data Path - PXF to LC/RP
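The Iron Bus stuck pause handling described above (restart the Iron Bus on the first error, reload the LC if the same LC repeats the error within 4 seconds) can be modeled roughly as follows. This is only an illustrative sketch, not the Nickel driver's actual logic: the class name, the action strings, the inclusive 4-second boundary, and the choice to forget the slot's history after a reload are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict

# From the text: a repeat from the same LC within 4 seconds reloads the LC.
REPEAT_WINDOW_SECONDS = 4.0


@dataclass
class IronBusErrorTracker:
    """Hypothetical model of the IB Stuck Pause Request escalation."""

    # Time of the last stuck pause error seen per LC slot.
    last_error_time: Dict[int, float] = field(default_factory=dict)

    def on_stuck_pause(self, slot: int, now: float) -> str:
        """Return the action to take for an error reported by `slot` at `now`."""
        previous = self.last_error_time.get(slot)
        # Assumption: "within 4 seconds" is treated as an inclusive bound.
        if previous is not None and now - previous <= REPEAT_WINDOW_SECONDS:
            # Assumption: a reload clears the history for this slot.
            del self.last_error_time[slot]
            return "reload-lc"
        self.last_error_time[slot] = now
        return "restart-iron-bus"
```

For example, two errors from slot 2 at 0.0 and 3.0 seconds produce a bus restart followed by an LC reload, while an error from a different slot in between is handled independently.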

In certain cases, the PSM detects stalls due to failures in the following data flow paths:

PXF to LC

LC to PXF

PXF to RP

Stall detection is based on threshold values. For example, to avoid false alarms, a threshold sets the number of times a stall condition must be detected before it is declared a stall. The PSM reports the stall condition using syslog messages.
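The idea of requiring several detections before declaring a stall can be sketched as a small counter. The sketch below assumes that the detections must be consecutive and that the count restarts after a stall is declared; the source does not state either detail, and the class and method names are hypothetical.

```python
class StallDetector:
    """Declare a stall only after `threshold` detections in a row (assumed)."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive = 0

    def observe(self, stall_condition_seen: bool) -> bool:
        """Record one polling result; return True when a stall is declared."""
        if stall_condition_seen:
            self.consecutive += 1
        else:
            # Assumption: a healthy sample clears the running count.
            self.consecutive = 0
        if self.consecutive >= self.threshold:
            # Assumption: counting starts over once the stall is reported.
            self.consecutive = 0
            return True
        return False
```

With a threshold of 3, the sample sequence stall, stall, healthy, stall, stall, stall declares a stall only on the last sample, which is how a threshold filters out short transient conditions.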

The new stall conditions are:

LC stall that is fixed by resetting the LC.

RP queue stall that is fixed by resetting the Hyper Transport Data Protocol (HTDP) interface.

Recovery Actions

The Cisco 10000 Series Router PXF Stall Monitor feature provides the following primary recovery actions:

LC restart

HTDP reset

The PRE switchover is a secondary action that is taken after the configured number of primary recovery actions has been executed. Table 17-1 lists the recovery time for the primary and secondary actions.

Table 17-1 Recovery Time Taken for Primary and Secondary Actions

Actions | Time taken
LC restart | 81 seconds with minimum configuration on the LC. The LC takes 48 seconds to come up and the user interface takes 33 seconds to come up. If line card redundancy is supported, the time can be brought down to less than 5 seconds.
HTDP reset | 4 seconds with minimum configuration. Shutdown takes 2 seconds and HTDP takes another 2 seconds to come up. With scaled configuration, the time can go up to 20 seconds.
PRE switchover | 2 seconds, if SSO switchover occurs.
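As a worked example of the figures in Table 17-1, the sketch below adds up the minimum-configuration recovery times for a run of primary actions followed by the PRE switchover. It assumes that the switchover follows exactly `threshold` primary actions and ignores any time that passes between stalls; the function names and action labels are hypothetical.

```python
from typing import List

# Minimum-configuration times from Table 17-1, in seconds.
RECOVERY_SECONDS = {
    "lc-restart": 48 + 33,   # LC comes up (48) + user interface comes up (33)
    "htdp-reset": 2 + 2,     # shutdown (2) + HTDP comes up (2)
    "pre-switchover": 2,     # when an SSO switchover occurs
}


def escalation_sequence(primary_action: str, threshold: int) -> List[str]:
    """Primary action repeated `threshold` times, then the secondary action."""
    if primary_action not in ("lc-restart", "htdp-reset"):
        raise ValueError(f"unknown primary action: {primary_action}")
    return [primary_action] * threshold + ["pre-switchover"]


def total_recovery_seconds(actions: List[str]) -> int:
    """Sum of the per-action recovery times, excluding time between stalls."""
    return sum(RECOVERY_SECONDS[action] for action in actions)
```

Under these assumptions and the default threshold of 3, three LC restarts plus a switchover add up to 3 x 81 + 2 = 245 seconds of recovery time, and three HTDP resets plus a switchover add up to 3 x 4 + 2 = 14 seconds.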

Restrictions for Cisco 10000 Series Router PXF Stall Monitor

The Cisco 10000 Series Router PXF Stall Monitor feature has the following restrictions:

For LCs using Vanadium and Nickel10G chips, there is enhanced detection logic using the LC FIB/TIB counter provided by Vanadium/Nickel10G. However, for LCs using Barium chips, the LC FIB/TIB counter is not used in the detection logic. Therefore, LC-PXF stall detection is not possible with LCs using the Barium chip.

PSM is not supported on PRE1 and PRE2.

The recovery time for an LC with redundancy is 6 seconds per stall.

Only GigE SPA cards are supported. The LC to PXF stall detection is supported on SIP-600, channelized STM-1, channelized OC12 and half-height Gigabit Ethernet line cards.

PSM cannot detect an LC that stalls before the interfaces on the LC come up.

PSM cannot detect stalls in LCs where multiple link bundling is done across LCs.

PSM state in the active PRE is not synchronized with the standby PRE, even in SSO mode. Therefore, during a switchover from active PRE to standby PRE, there is a delay in detecting a stall that occurred during the switchover.

Management ports available on the PRE are not monitored by PSM.

Configuring Cisco 10000 Series Router PXF Stall Monitor

This section describes how to configure and monitor the Cisco 10000 Series Router PXF Stall Monitor feature.

SUMMARY STEPS

1. enable

2. configure terminal

3. hw-module pxf stall-monitoring

4. hw-module pxf stall-monitoring [HT-Reset 4-6 | LC-Reset 4-6]

5. exit

6. show pxf stall-monitoring [counters | reset {active-status | cob-fib | cob-tib | pxf-drop} subslot sub slot]

DETAILED STEPS

 
Command or Action
Purpose

Step 1 

enable

Example:

Router> enable

Enables privileged EXEC mode.

Enter your password if prompted.

Step 2 

configure terminal

Example:

Router# configure terminal

Enters the global configuration mode.

Step 3 

hw-module pxf stall-monitoring
Example:

Router(config)# hw-module pxf stall-monitoring

Enables PSM on the Cisco 10000 series router. By default, the threshold values for LC reset and HTDP reset are set to 3. To change the default threshold values, enter the hw-module pxf stall-monitoring command again with the LC-Reset or HT-Reset keyword, as shown in Step 4.

Step 4 

hw-module pxf stall-monitoring [HT-Reset <4-6> | LC-Reset <4-6>]

Example:

Router(config)# hw-module pxf stall-monitoring HT-Reset 4

Configures the threshold value, that is, the number of LC or HTDP resets performed before the secondary action (PRE switchover) takes place. Use the following parameters:

HT-Reset 4-6—Specifies the threshold value for HTDP reset. You can configure the value ranging from 4 to 6.

LC-Reset 4-6—Specifies the threshold value for LC reset. You can configure the value ranging from 4 to 6.

Step 5 

exit

Example:
Router(config)# exit

Exits the global configuration mode.

Step 6 

show pxf stall-monitoring [counters | reset {active-status | cob-fib | cob-tib | pxf-drop} subslot <sub slot>]


Example:

Router# show pxf stall-monitoring

Displays the current configuration and the operating status of the PSM. The command also displays the number of stalls recorded since the PSM was last enabled. You can use the following parameters:

counters—Displays statistical information for all the counters.

reset—Displays the following counters:

active-status—Displays the active status on the specified subslot.

cob-fib—Displays the Cobalt FIB counter on the specified subslot.

cob-tib—Displays the Cobalt TIB counter on the specified subslot.

pxf-drop—Displays the PXF per RSRC drop counter on the specified subslot.

subslot sub slot—Displays information about the specified subslot.
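For scripted deployments, the command sequence in the steps above can be assembled and range-checked before it is sent to a router. The helper below is a minimal sketch: the function name and its parameters are hypothetical, it only builds strings and does not connect to a device, and it rejects anything outside the documented 4 to 6 range (the source does not say whether the CLI accepts the default value 3 when entered explicitly).

```python
from typing import List, Optional

# Documented configurable range for HT-Reset and LC-Reset (default is 3).
CONFIGURABLE_RANGE = range(4, 7)


def build_psm_config(ht_reset: Optional[int] = None,
                     lc_reset: Optional[int] = None) -> List[str]:
    """Return the CLI lines that enable PSM and optionally set thresholds."""
    commands = [
        "enable",
        "configure terminal",
        "hw-module pxf stall-monitoring",
    ]
    for keyword, value in (("HT-Reset", ht_reset), ("LC-Reset", lc_reset)):
        if value is None:
            continue  # keep the default threshold for this action
        if value not in CONFIGURABLE_RANGE:
            raise ValueError(f"{keyword} must be between 4 and 6, got {value}")
        commands.append(f"hw-module pxf stall-monitoring {keyword} {value}")
    commands += ["exit", "show pxf stall-monitoring"]
    return commands
```

Calling `build_psm_config(ht_reset=5, lc_reset=4)` yields the same configuration lines used in the configuration example of this chapter, followed by the verification command.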

Configuration Example of Cisco 10000 Series Router PXF Stall Monitor

The following example shows how to configure and verify the PSM:

Router# show pxf stall-monitoring
pxf stall-monitoring : Disabled

Router# configure terminal
Enter configuration commands, one per line.  End with CNTL/Z.
Router(config)# hw-module pxf stall-monitoring
Router(config)# exit
Router# show pxf stall-monitoring
pxf stall-monitoring : Enabled
Stall History
=============
Stall Threshold Configuration
=============================
Primary Action = LC-reset Threshold = 3 (default)
Primary Action = HT-reset Threshold = 3 (default)

Secondary action = SSO SwitchOver
Router# configure terminal
Enter configuration commands, one per line.  End with CNTL/Z.
Router(config)# hw-module pxf stall-monitoring HT-Reset 5
Router(config)# hw-module pxf stall-monitoring LC-Reset 4
Router(config)# exit
Router# show pxf stall-monitoring
pxf stall-monitoring : Enabled
Stall History
=============
Stall Threshold Configuration
=============================
Primary Action = LC-reset Threshold = 4
Primary Action = HT-reset Threshold = 5

Secondary action = SSO SwitchOver
Router# show pxf stall-monitoring counters
To RP Counters
==============
IOS To RP Counter = 22299
PXF To RP Drop Counter = 0
Current Counter Values
======================
Slot 0 Subslot 0 Cob TIB = 0 Cob FIB = 0 PXF Drop = 0
Slot 0 Subslot 1 Cob TIB = 0 Cob FIB = 0 PXF Drop = 0
Slot 1 Subslot 0 Cob TIB = 0 Cob FIB = 0 PXF Drop = 0
Slot 1 Subslot 1 Cob TIB = 0 Cob FIB = 0 PXF Drop = 0
Slot 2 Subslot 0 Cob TIB = 102092182 Cob FIB = 6354 PXF Drop = 0
Slot 2 Subslot 1 Cob TIB = 0 Cob FIB = 0 PXF Drop = 0
Slot 3 Subslot 0 Cob TIB = 1204 Cob FIB = 0 PXF Drop = 0
Slot 3 Subslot 1 Cob TIB = 1280 Cob FIB = 0 PXF Drop = 0
Slot 3 Subslot 2 Cob TIB = 4975 Cob FIB = 10370 PXF Drop = 0
Slot 3 Subslot 3 Cob TIB = 5172 Cob FIB = 13840 PXF Drop = 0
Slot 5 Subslot 0 Cob TIB = 102077261 Cob FIB = 0 PXF Drop = 0
Slot 5 Subslot 1 Cob TIB = 19888 Cob FIB = 0 PXF Drop = 0
Slot 6 Subslot 0 Cob TIB = 0 Cob FIB = 0 PXF Drop = 0
Slot 6 Subslot 1 Cob TIB = 2486 Cob FIB = 0 PXF Drop = 0
Slot 7 Subslot 0 Cob TIB = 0 Cob FIB = 0 PXF Drop = 0
Slot 7 Subslot 1 Cob TIB = 0 Cob FIB = 0 PXF Drop = 0
Slot 8 Subslot 0 Cob TIB = 0 Cob FIB = 0 PXF Drop = 0
Slot 8 Subslot 1 Cob TIB = 0 Cob FIB = 0 PXF Drop = 0
Line Card Participant Status
============================
Slot 1 Subslot 0 = 1
Slot 1 Subslot 1 = 0
Slot 2 Subslot 0 = 1
Slot 2 Subslot 1 = 1
Slot 3 Subslot 0 = 0
Slot 3 Subslot 1 = 1
Slot 3 Subslot 2 = 1
Slot 3 Subslot 3 = 0
Slot 5 Subslot 0 = 1
Slot 5 Subslot 1 = 1
Slot 6 Subslot 0 = 1
Slot 6 Subslot 1 = 1
Slot 7 Subslot 0 = 1
Slot 7 Subslot 1 = 0
Slot 8 Subslot 0 = 1
Slot 8 Subslot 1 = 0
Line Card Active Status
=======================
Slot 1 Subslot 0 = 0
Slot 1 Subslot 1 = 0
Slot 2 Subslot 0 = 1
Slot 2 Subslot 1 = 0
Slot 3 Subslot 0 = 0
Slot 3 Subslot 1 = 0
Slot 3 Subslot 2 = 0
Slot 3 Subslot 3 = 0
Slot 5 Subslot 0 = 0
Slot 5 Subslot 1 = 0
Slot 6 Subslot 0 = 0
Slot 6 Subslot 1 = 0
Slot 7 Subslot 0 = 0
Slot 7 Subslot 1 = 0
Slot 8 Subslot 0 = 0
Slot 8 Subslot 1 = 0
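When the output of show pxf stall-monitoring counters is captured for offline review, the per-subslot counter lines can be turned into structured data. The parser below is a sketch that assumes the line layout matches the sample output above exactly; the function name and dictionary keys are hypothetical, and other sections of the output (participant and active status) are simply skipped.

```python
import re
from typing import Dict, Tuple

# Matches lines such as:
#   Slot 2 Subslot 0 Cob TIB = 102092182 Cob FIB = 6354 PXF Drop = 0
COUNTER_LINE = re.compile(
    r"Slot\s+(\d+)\s+Subslot\s+(\d+)\s+"
    r"Cob TIB\s*=\s*(\d+)\s+Cob FIB\s*=\s*(\d+)\s+PXF Drop\s*=\s*(\d+)"
)


def parse_counters(output: str) -> Dict[Tuple[int, int], Dict[str, int]]:
    """Map (slot, subslot) to its Cobalt TIB, Cobalt FIB and PXF drop counters."""
    counters: Dict[Tuple[int, int], Dict[str, int]] = {}
    for line in output.splitlines():
        match = COUNTER_LINE.search(line)
        if match is None:
            continue  # headings, status lines, and blank lines are ignored
        slot, subslot, tib, fib, drop = map(int, match.groups())
        counters[(slot, subslot)] = {
            "cob_tib": tib,
            "cob_fib": fib,
            "pxf_drop": drop,
        }
    return counters


SAMPLE = """\
Current Counter Values
======================
Slot 2 Subslot 0 Cob TIB = 102092182 Cob FIB = 6354 PXF Drop = 0
Slot 3 Subslot 2 Cob TIB = 4975 Cob FIB = 10370 PXF Drop = 0
Line Card Participant Status
============================
Slot 1 Subslot 0 = 1
"""
```

Running `parse_counters(SAMPLE)` returns two entries, keyed by `(2, 0)` and `(3, 2)`; comparing two such snapshots taken some time apart is one way to see which counters are advancing.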