Table Of Contents
Troubleshooting Router Switch Fabric and Data Path
Understanding Switch Fabric Architecture
Getting Started with Fabric Troubleshooting
Troubleshooting Packet Drops
Displaying Traffic Status in Line Cards and RSP Cards
Locating Packet Drops by Examining Counters
Locating Drops of Punted Packets
Packet Drop from LC to LC
Packet Drop Between RSP and LC
Packet Drop After Certain Actions
Packet Drop After a Redundancy Switchover
Packet Drop with Unknown Reason
Troubleshooting RSP and LC Crashes
Active RSP Is Crashing
Standby RSP Is Crashing
LC Is Crashing
Troubleshooting Complete Loss of Traffic
No Traffic from LC to LC
No Traffic Between RSP and LC
Gathering Fabric Information Before Calling TAC
Troubleshooting Router Switch Fabric and Data Path
This chapter describes techniques for troubleshooting the router switch fabric and data path. It includes the following sections:
•
Understanding Switch Fabric Architecture
•
Getting Started with Fabric Troubleshooting
•
Troubleshooting Packet Drops
•
Troubleshooting RSP and LC Crashes
•
Troubleshooting Complete Loss of Traffic
•
Gathering Fabric Information Before Calling TAC
Understanding Switch Fabric Architecture
Figure 7-1 provides an overview of the switch fabric architecture.
Figure 7-1 Switch Fabric Architecture
As shown in Figure 7-1, there are two fabric interface ASICs on each RSP. Each fabric interface ASIC provides 40 GB of throughput. If one RSP is lost, the shelf can still operate at full capacity without loss of bandwidth.
Each line card (LC) has four 23 GB fabric channels on which to send traffic to the fabric ASICs. The switch fabric is in an active/active relationship. All four fabric ASICs are active, even though the RSP cards are in an active/standby relationship. The system performs load balancing on unicast traffic across these four channels.
The arbiters are in an active/standby relationship (the arbiter on the active RSP card is the active arbiter). Both the active and standby arbiters receive requests for switch fabric access from the LCs. If there is a switchover of the active RSP, the standby RSP arbiter has a current copy of switch fabric requests, which helps to speed up the switchover.
Figure 7-2 shows the data path from ingress to egress. (Several types of LCs are shown in this example.)
Figure 7-2 Data Path
As shown in the drawing, the path travelled by each data packet is:
Incoming interface on LC --> NP mapped to incoming interface on LC --> Bridge3 on LC --> FIA on LC --> Crossbar switch on RSP --> FIA on LC --> Bridge3 on LC --> NP mapped to outgoing interface --> Outgoing interface
Note
In this document, the network processor ASICs are referred to either as network processors (NPs) or network processor units (NPUs).
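When you compare counters hop by hop, it can help to treat the data path above as an ordered list of checkpoints. The following Python sketch is illustrative only: the stage labels paraphrase the path above, while the packet counts and the first_loss helper are hypothetical and do not come from any router command.

```python
# Ordered checkpoints along the LC-to-LC data path described above.
DATA_PATH = [
    "Incoming interface",
    "Ingress NP",
    "Ingress Bridge3",
    "Ingress FIA",
    "Crossbar switch on RSP",
    "Egress FIA",
    "Egress Bridge3",
    "Egress NP",
    "Outgoing interface",
]


def first_loss(counts):
    """Return (upstream, downstream, lost) for the first hop where the
    packet count falls, or None if no count decreases along the path.

    `counts` maps every stage name in DATA_PATH to a packet count that was
    gathered by hand; the values are hypothetical inputs.
    """
    for upstream, downstream in zip(DATA_PATH, DATA_PATH[1:]):
        lost = counts[upstream] - counts[downstream]
        if lost > 0:
            return upstream, downstream, lost
    return None


if __name__ == "__main__":
    sample = {stage: 1000 for stage in DATA_PATH}
    # Pretend 25 packets went missing between the crossbar and the egress FIA.
    for stage in DATA_PATH[5:]:
        sample[stage] = 975
    print(first_loss(sample))
```

The first hop at which the count falls is where to focus the counter checks described later in this chapter.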
Getting Started with Fabric Troubleshooting
To begin troubleshooting problems with the fabric, perform the following steps.
Step 1
Look for active platform fault manager (PFM) alarms on the LCs and RSPs.
Step 2
Check that you have the appropriate version of the bridge field-programmable gate arrays (FPGAs) in your RSP card.
Step 3
Check that you have the correct software version, board, and FPGA and ASIC versions.
RP/0/RSP0/CPU0:router# show version
RP/0/RSP0/CPU0:router# show inventory raw
RP/0/RSP0/CPU0:router# show hw-module fpd location all
Step 4
Check if there are any errors detected by the system diagnostics.
RP/0/RSP0/CPU0:router# show diag
Step 5
Check that you have the appropriate version of the NPs in your RSP cards.
RP/0/RSP0/CPU0:router# show controllers np summary all
----------------------------------------------------------------
[total 4 NP] Driver - Version 10.26a Build 9 ( Dec 13 2008, 20:47:03 )
NP 0 : Hardware rev v2 A1
: Ucode - Version: 255.255 Build Date: ( Dec 12 2008, 2:13:00 )
NP 1 : Hardware rev v2 A1
: Ucode - Version: 255.255 Build Date: ( Dec 12 2008, 2:13:00 )
NP 2 : Hardware rev v2 A1
: Ucode - Version: 255.255 Build Date: ( Dec 12 2008, 2:13:00 )
NP 3 : Hardware rev v2 A1
: Ucode - Version: 255.255 Build Date: ( Dec 12 2008, 2:13:00 )
Node: 0/2/CPU0: <-- [ LC built with A0 NPU that has known issue ]
----------------------------------------------------------------
[total 4 NP] Driver - Version 10.26a Build 9 ( Dec 13 2008, 20:47:03 )
NP 0 : Hardware rev v2 A0
: Ucode - Version: 255.255 Build Date: ( Dec 12 2008, 2:13:00 )
NP 1 : Hardware rev v2 A0
: Ucode - Version: 255.255 Build Date: ( Dec 12 2008, 2:13:00 )
NP 2 : Hardware rev v2 A0
: Ucode - Version: 255.255 Build Date: ( Dec 12 2008, 2:13:00 )
NP 3 : Hardware rev v2 A0
: Ucode - Version: 255.255 Build Date: ( Dec 12 2008, 2:13:00 )
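If you save the output of show controllers np summary all to a file, a short script can flag the NPs whose hardware revision needs attention. The following Python sketch is a minimal example under stated assumptions: the regular expressions are modeled only on the sample output above, the find_np_revisions helper is hypothetical, and A0 is flagged because the sample output marks it as having a known issue.

```python
import re

# Abbreviated from the sample output above.
SAMPLE = """\
Node: 0/2/CPU0:
[total 4 NP] Driver - Version 10.26a Build 9 ( Dec 13 2008, 20:47:03 )
NP 0 : Hardware rev v2 A0
: Ucode - Version: 255.255 Build Date: ( Dec 12 2008, 2:13:00 )
NP 1 : Hardware rev v2 A0
: Ucode - Version: 255.255 Build Date: ( Dec 12 2008, 2:13:00 )
"""

NODE_RE = re.compile(r"^Node:\s*([^\s:]+)")
NP_RE = re.compile(r"^NP\s+(\d+)\s*:\s*Hardware rev\s+(\S+)\s+(\S+)")


def find_np_revisions(text, flagged=("A0",)):
    """Return (node, np_index, stepping) for every NP whose stepping
    (the last field of the "Hardware rev" line) is in `flagged`."""
    node = None
    hits = []
    for raw in text.splitlines():
        line = raw.strip()
        node_match = NODE_RE.match(line)
        if node_match:
            node = node_match.group(1)
            continue
        np_match = NP_RE.match(line)
        if np_match and np_match.group(3) in flagged:
            hits.append((node, int(np_match.group(1)), np_match.group(3)))
    return hits


if __name__ == "__main__":
    for node, index, stepping in find_np_revisions(SAMPLE):
        print(f"{node}: NP {index} is hardware revision {stepping}")
```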
Troubleshooting Packet Drops
This section explains how to track packets through the system from ingress to egress, and how to troubleshoot packet drops. It includes the following sections:
•
Displaying Traffic Status in Line Cards and RSP Cards
•
Locating Packet Drops by Examining Counters
•
Locating Drops of Punted Packets
•
Packet Drop from LC to LC
•
Packet Drop Between RSP and LC
•
Packet Drop After Certain Actions
•
Packet Drop After a Redundancy Switchover
•
Packet Drop with Unknown Reason
Displaying Traffic Status in Line Cards and RSP Cards
Figure 7-3 shows the traffic path on the LC and the corresponding CLI commands you use to display the status at each point in the path.
Figure 7-3 LC Traffic Path and Corresponding CLI Commands
Figure 7-4 shows the traffic path on the RSP and the corresponding CLI commands you use to display information at each point in the path.
Figure 7-4 RSP Traffic Path and Corresponding CLI Commands
Locating Packet Drops by Examining Counters
To locate the source of packet drops, perform the following procedure.
SUMMARY STEPS
1.
Clear the interface counters
2.
Clear the NP counters
3.
Clear the fabric counters
4.
Start the traffic pattern that caused the packet drop
5.
Display the NP-to-interface mapping
6.
Check the counters at the input interface
7.
Check the NP counters
8.
Check the NP Bridge3 counters
9.
Check the bridge counters
10.
Check the fabric interface ASIC (FIA) counters
11.
Check the crossbar counters
Note
For the procedure to troubleshoot drops of punted packets, see the "Locating Drops of Punted Packets" section.
DETAILED STEPS
Step 1
Clear the interface counters.
RP/0/RSP0/CPU0:router# clear counters all
Clear "show interface" counters on all interfaces [confirm]
Step 2
Clear the NP counters.
RP/0/RSP0/CPU0:router# clear controller np counters all
Step 3
Clear fabric counters.
a.
Clear FIA and bridge counters on the LC and RSP.
RP/0/RSP0/CPU0:router# clear controller fabric fia location
b.
Clear fabric crossbar counters.
RP/0/RSP0/CPU0:router# clear controller fabric crossbar-counters location
Step 4
Start the traffic pattern that caused the packet drop.
Step 5
Run the following command to display the NP-to-interface mapping.
RP/0/RSP0/CPU0:router# show controllers np ports all
Step 6
Check the counters at the input interface.
RP/0/RSP0/CPU0:router# show interfaces type location
Step 7
Check the NP counters to verify that traffic is flowing along the data path.
RP/0/RSP0/CPU0:router# show controllers np counters {np0|np1|np2|np3|all} location node-id {| include DROP}
RP/0/RSP0/CPU0:router# show controllers np counters np3 location 0/0/CPU0
RP/0/RSP0/CPU0:router# show controllers np counters np3 location 0/0/CPU0 | include DROP
The show controllers np command displays information about counters that helps you troubleshoot drops in the LCs. The names of the internal NP counters have the general format STAGE_DIRECTION_ACTION, for example, PARSE_FABRIC_RECEIVE_CNT, RESOLVE_EGRESS_DROP_CNT, and MODIFY_FRAMES_PADDED_CNT.
The values of stage, direction, and action are as follows:
•
There are five stages in the NP:
–
Parse
–
Search-I
–
Modify
–
Search-II
–
Resolve
•
Examples of the direction are:
–
Ingress
–
Egress
–
Next_hop
•
Examples of the action are:
–
Drop_count
–
Down
There are additional counters, such as DROP, PUNT, and DIAGS, that provide important information but are not associated with a specific internal NP stage. Drop and punt counters are kept as an aggregate total per stage.
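The naming convention makes it possible to sort counters mechanically. The following Python sketch is a rough illustration, not a complete decoder: the classify_counter helper is hypothetical, it recognizes only the stage prefixes and directions that appear in the counter names quoted in this section, and the way the Search-I and Search-II stages are spelled in counter names is an assumption.

```python
# Stage prefixes seen in the counter names quoted above. The "SEARCH" prefix
# for the Search-I and Search-II stages is an assumption.
STAGES = ("PARSE", "SEARCH", "MODIFY", "RESOLVE")


def classify_counter(name):
    """Split an NP counter name into the parts described above.

    Counters such as DROP_IPV4_NEXT_HOP_DOWN are not tied to a specific
    stage, so their stage is reported as None.
    """
    tokens = name.split("_")
    stage = tokens[0] if tokens[0] in STAGES else None
    direction = None
    for candidate in ("INGRESS", "EGRESS"):
        if candidate in tokens:
            direction = candidate
    return {
        "stage": stage,
        "direction": direction,
        "drop": "DROP" in tokens,
        "punt": "PUNT" in tokens,
    }


if __name__ == "__main__":
    for counter in (
        "PARSE_FABRIC_RECEIVE_CNT",
        "RESOLVE_EGRESS_DROP_CNT",
        "DROP_IPV4_NEXT_HOP_DOWN",
    ):
        print(counter, classify_counter(counter))
```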
Example
RP/0/RSP0/CPU0:router# show controllers np ports all
Thu Jan 1 02:18:48.264 UTC
Node: 0/0/CPU0:
----------------------------------------------------------------
-- ------ --- ---------------------------------------------------
0 1 0 GigabitEthernet0/0/0/30 - GigabitEthernet0/0/0/39
1 1 0 GigabitEthernet0/0/0/20 - GigabitEthernet0/0/0/29
2 0 0 GigabitEthernet0/0/0/10 - GigabitEthernet0/0/0/19
3 0 0 GigabitEthernet0/0/0/0 - GigabitEthernet0/0/0/9
RP/0/RSP0/CPU0:router# show interfaces tenGigE 0/1/0/0
Thu Jan 1 01:10:01.908 UTC
TenGigE0/1/0/0 is up, line protocol is up
Interface state transitions: 1
Hardware is TenGigE, address is 001e.bdfd.1736 (bia 001e.bdfd.1736)
MTU 1514 bytes, BW 10000000 Kbit
reliability 255/255, txload 0/255, rxload 0/255
Full-duplex, 10000Mb/s, LR, link type is force-up
output flow control is off, input flow control is off
ARP type ARPA, ARP timeout 04:00:00
Last clearing of "show interface" counters never
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
0 packets input, 0 bytes, 0 total input drops
0 drops for unrecognized upper-level protocol
Received 0 broadcast packets, 0 multicast packets
0 runts, 0 giants, 0 throttles, 0 parity
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 packets output, 0 bytes, 0 total output drops
Output 0 broadcast packets, 0 multicast packets
0 output errors, 0 underruns, 0 applique, 0 resets
0 output buffer failures, 0 output buffers swapped out
In the following example, there were some ingress and egress drops in the RESOLVE stage. All of these drops in the ingress (9 drops) and egress (6 drops) were caused by the next hop being unreachable (a total of 15 drops for IPv4 next hop down).
RP/0/RSP0/CPU0:router# show controllers np counters np3 location 0/0/CPU0 | include DROP
Mon Nov 15 12:18:35.289 EST
30 RESOLVE_INGRESS_DROP_CNT 9 0
31 RESOLVE_EGRESS_DROP_CNT 6 0
295 DROP_IPV4_NEXT_HOP_DOWN 15 0
The following example shows a typical output from the same command, but without the modifier | include DROP.
RP/0/RSP0/CPU0:router# show controllers np counters np3
Mon Nov 15 12:20:35.289 EST
----------------------------------------------------------------
Show global stats counters for NP3, revision v3
Read 20 non-zero NP counters:
Offset Counter FrameValue Rate (pps)
-------------------------------------------------------------------------------
23 PARSE_FABRIC_RECEIVE_CNT 417 0
30 RESOLVE_INGRESS_DROP_CNT 9 0
31 RESOLVE_EGRESS_DROP_CNT 6 0
53 MODIFY_FRAMES_PADDED_CNT 3230 0
67 PARSE_MOFRR_SWITCH_MSG_RCVD_FROM_FAB 920 0
70 RESOLVE_INGRESS_L2_PUNT_CNT 1081 0
71 RESOLVE_EGRESS_L3_PUNT_CNT 4613 0
74 RESOLVE_LEARN_FROM_NOTIFY_CNT 3484 0
75 RESOLVE_BD_FLUSH_DELETE_CNT 104 0
83 RESOLVE_MOFRR_HASH_UPDATE_CNT 463 0
87 RESOLVE_MOFRR_SWITCH_MSG_INGNORED 407 0
295 DROP_IPV4_NEXT_HOP_DOWN 15 0
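The reasoning used in the DROP example (9 ingress drops plus 6 egress drops account for the 15 next-hop-down drops) can be checked with a small script. The following Python sketch assumes the four-column layout shown above (offset, counter name, frame value, rate); the parse_np_counters and resolve_drops_vs_reasons helpers are hypothetical names introduced for illustration.

```python
import re

# Offset, counter name, frame value, rate (pps), as in the output above.
COUNTER_RE = re.compile(r"^\s*(\d+)\s+([A-Z0-9_]+)\s+(\d+)\s+(\d+)\s*$")

SAMPLE = """\
Mon Nov 15 12:18:35.289 EST
30 RESOLVE_INGRESS_DROP_CNT 9 0
31 RESOLVE_EGRESS_DROP_CNT 6 0
295 DROP_IPV4_NEXT_HOP_DOWN 15 0
"""


def parse_np_counters(text):
    """Return {counter_name: frame_value} for every counter line in text."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(3))
    return counters


def resolve_drops_vs_reasons(counters):
    """Return (drops counted in the RESOLVE stage, drops counted by reason)."""
    stage_total = sum(
        value for name, value in counters.items()
        if name.startswith("RESOLVE_") and name.endswith("_DROP_CNT")
    )
    reason_total = sum(
        value for name, value in counters.items() if name.startswith("DROP_")
    )
    return stage_total, reason_total


if __name__ == "__main__":
    stage_total, reason_total = resolve_drops_vs_reasons(parse_np_counters(SAMPLE))
    print(f"RESOLVE stage drops: {stage_total}, drops by reason: {reason_total}")
```

If the two totals differ, some drops have a cause other than the DROP_ reasons listed, or they occurred in a stage other than RESOLVE.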
Step 8
Check the NP Bridge3 counters.
RP/0/RSP0/CPU0:router# show controllers np fabric-counters all ?
RP/0/RSP0/CPU0:router# show controllers np fabric-counters all <np instance or all> location <location>
RP/0/RSP0/CPU0:router# show controllers np fabric-counters all np3 location 0/5/CPU0
Check the NP-bridge rx/tx counters for each NP on the LC. View the packet sent and received counts, bytes transferred, packet counters categorized by packet size, and so forth. The fields of interest are:
xaui_a_t_transmited_packets_cnt: The number of packets sent by the NP to the bridge
xaui_a_r_received_packets_cnt: The number of packets sent by the bridge to the NP
Step 9
Check the bridge counters.
RP/0/RSP0/CPU0:router# show controllers fabric fia bridge stats location node-id
Examples
RP/0/RSP0/CPU0:router# show controllers fabric fia bridge stats location 0/RSP0/CPU0
Mon Nov 22 14:14:48.010 PST
Device Rx Interface Packet Error Threshold
--------------------------------------------------------------------------------
Bridge0 From-Fabric(DDR) 492283 0 0
RP/0/RSP0/CPU0:router# show controllers fabric fia bridge stats location 0/1/CPU0
Mon Nov 22 14:18:54.834 PST
UC - Unicast , MC - Multicast
LP - LowPriority , HP - HighPriority
--------------------------------------------------------------------------------
Cast/ Packet Packet Error Threshold
Prio Direction Count Drops Drops
--------------------------------------------------------------------------------
UC HP Fabric to NP-0 70329 0 0
UC LP Fabric to NP-0 0 0 0
UC HP Fabric to NP-1 70329 0 0
UC LP Fabric to NP-1 0 0 0
UC HP Fabric to NP-2 70329 0 0
UC LP Fabric to NP-2 0 0 0
UC HP Fabric to NP-3 70329 0 0
UC LP Fabric to NP-3 0 0 0
----------------------------------------------------------------
UC Total Egress 281316 0 0
MC HP Fabric to NP-0 0 0 0
MC LP Fabric to NP-0 0 0 0
MC HP Fabric to NP-1 0 0 0
MC LP Fabric to NP-1 0 0 0
MC HP Fabric to NP-2 0 0 0
MC LP Fabric to NP-2 0 0 0
MC HP Fabric to NP-3 0 0 0
MC LP Fabric to NP-3 0 0 0
---------------------------------------------------------------
--------------------------------------------------
UC HP NP-0 to Fabric 70329
UC HP NP-1 to Fabric 70329
UC HP NP-2 to Fabric 70329
UC HP NP-3 to Fabric 70329
--------------------------------------------------
--------------------------------------------------
Ingress Drop Stats (MC & UC combined)
**************************************
PriorityPacket Error Threshold
--------------------------------------------------
--------------------------------------------------
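The per-NP rows in the LC bridge output can be totaled and scanned for drops with a script. The following Python sketch assumes the row layout shown in the example above (cast, priority, direction, packet count, error drops, threshold drops); the parse_egress_rows and summarize helpers are hypothetical names, and only the "Fabric to NP" rows are handled.

```python
import re

ROW_RE = re.compile(
    r"^(UC|MC)\s+(HP|LP)\s+Fabric to NP-(\d+)\s+(\d+)\s+(\d+)\s+(\d+)$"
)

# Unicast rows copied from the example output above.
SAMPLE = """\
UC HP Fabric to NP-0 70329 0 0
UC LP Fabric to NP-0 0 0 0
UC HP Fabric to NP-1 70329 0 0
UC LP Fabric to NP-1 0 0 0
UC HP Fabric to NP-2 70329 0 0
UC LP Fabric to NP-2 0 0 0
UC HP Fabric to NP-3 70329 0 0
UC LP Fabric to NP-3 0 0 0
"""


def parse_egress_rows(text):
    """Parse the "Fabric to NP-n" rows into a list of dictionaries."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            cast, prio, np_index, packets, err, thr = match.groups()
            rows.append({
                "cast": cast,
                "prio": prio,
                "np": int(np_index),
                "packets": int(packets),
                "error_drops": int(err),
                "threshold_drops": int(thr),
            })
    return rows


def summarize(rows):
    """Return (packet totals per cast type, rows that report any drops)."""
    totals = {}
    dropping = []
    for row in rows:
        totals[row["cast"]] = totals.get(row["cast"], 0) + row["packets"]
        if row["error_drops"] or row["threshold_drops"]:
            dropping.append(row)
    return totals, dropping


if __name__ == "__main__":
    totals, dropping = summarize(parse_egress_rows(SAMPLE))
    print(totals, dropping)
```

For the sample rows, the unicast total matches the UC Total Egress line in the example output (281316), and no row reports drops.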
Step 10
Check the FIA counters.
RP/0/RSP0/CPU0:router# show controllers fabric fia stats location location
RP/0/RSP0/CPU0:router# show controllers fabric fia stats location 0/RSP0/CPU0
Wed Aug 25 12:36:43.151 DST
FIA:0 DDR Packet counters:
=========================
FIA:0 SuperFrame counters:
=========================
To Unicast Xbar[0] 821335
To MultiCast Xbar[0] 7758
To MultiCast Xbar[2] 15807
From Unicast Xbar[0] 629854
From MultiCast Xbar[0] 2589
From MultiCast Xbar[2] 2588
FIA:0 Total Drop counters:
=========================
RP/0/RSP0/CPU0:router# show controllers fabric fia stats location 0/2/CPU0
FIA:0 DDR Packet counters:
=========================
FIA:0 SuperFrame counters:
=========================
FIA:0 Total Drop counters:
=========================
RP/0/RSP0/CPU0:router# show controllers fabric fia q-depth [location location]
Thu Jan 1 02:16:37.227 UTC
Total Pkt queue depth count = 0
Step 11
Check the crossbar counters to make sure there are no dropped packets.
RP/0/RSP0/CPU0:router# show controllers fabric crossbar statistics instance [0|1] location location
RP/0/RSP0/CPU0:router# show controllers fabric crossbar statistics instance 0 location 0/RSP0/CPU0
Location: 0/RSP0/CPU0 (physical slot 4)
Fabric info for node 0/RSP0/CPU0 (physical slot: 4)
Dropped packets : mcast unicast
+---------------------------------------------------------------+
Xbar timeout buf bp pkts : 0 0
Locating Drops of Punted Packets
To locate drops of punted packets, perform the following procedure.
SUMMARY STEPS
1.
Clear all packet counters
2.
Start traffic
3.
Check traffic counters at each component
4.
Check NP counters for NP mapping to interface, and check NP0 for inject packet count
5.
Check fabric-related counters
6.
Check punt FPGA counters
DETAILED STEPS
Step 1
Clear all packet counters as described in the "Locating Packet Drops by Examining Counters" section.
Step 2
Start traffic.
Step 3
Check traffic counters at each component in the punted packet path. Use a procedure similar to the one described in the "Locating Packet Drops by Examining Counters" section. However, for punted packets, the data path is:
Incoming Interface --> NP --> LC CPU --> NP --> Bridge3 --> LC FIA --> RSP Crossbar --> Punt FPGA on RSP --> RSP CPU --> RSP FIA --> RSP Crossbar --> LC FIA --> LC CPU --> NP0 --> LC FIA --> Crossbar --> RSP FIA --> RSP CPU
Step 4
Check the NP counters for NP mapping to interface, and check NP0 for the inject packet count. The following fields provide information on the NP counters:
801 PARSE_FABRIC_RECEIVE_CNT
820 PARSE_LC_INJECT_TO_FAB_CNT
872 RESOLVE_INGRESS_L2_PUNT_CNT
970 MODIFY_FABRIC_TRANSMIT_CNT
822 PARSE_FAB_INJECT_IPV4_CNT
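Because the counter display lists only non-zero counters (see the "Read 20 non-zero NP counters" line in the earlier example), a counter of interest that is absent from the output has not incremented. The following Python sketch lists which of the five counters above are silent; the silent_punt_counters helper and the sample values are hypothetical.

```python
# Counters of interest on the punt path, as listed in this step.
PUNT_PATH_COUNTERS = (
    "PARSE_FABRIC_RECEIVE_CNT",
    "PARSE_LC_INJECT_TO_FAB_CNT",
    "RESOLVE_INGRESS_L2_PUNT_CNT",
    "MODIFY_FABRIC_TRANSMIT_CNT",
    "PARSE_FAB_INJECT_IPV4_CNT",
)


def silent_punt_counters(counters):
    """Return the punt-path counters that are missing or zero.

    `counters` maps counter names to frame values, for example as
    transcribed from the NP counter output.
    """
    return [name for name in PUNT_PATH_COUNTERS if counters.get(name, 0) == 0]


if __name__ == "__main__":
    # Hypothetical values: traffic is punted, but nothing is injected.
    observed = {
        "PARSE_FABRIC_RECEIVE_CNT": 417,
        "RESOLVE_INGRESS_L2_PUNT_CNT": 1081,
        "MODIFY_FABRIC_TRANSMIT_CNT": 960,
    }
    print(silent_punt_counters(observed))
```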
Step 5
Check the fabric-related counters for any packet drops.
RP/0/RSP0/CPU0:router# show controllers fabric crossbar statistics instance 0 location 0/RSP0/CPU0
RP/0/RSP0/CPU0:router# show controllers fabric fia stats [location location]
RP/0/RSP0/CPU0:router# show controllers fabric fia stats location 0/5/CPU0
RP/0/RSP0/CPU0:router# show controllers fabric fia bridge stats [location location]
RP/0/RSP0/CPU0:router# show controllers fabric fia bridge stats location 0/RSP0/CPU0
Wed Aug 25 14:12:03.916 DST
Device Rx Interface Packet Error Threshold
--------------------------------------------------------------------------------
Bridge0 From-Fabric(DDR) 603698 0 0
RP/0/RSP0/CPU0:router# show controllers fabric fia bridge stats location 0/5/CPU0
Wed Aug 25 14:12:20.867 DST
UC - Unicast , MC - Multicast
LP - LowPriority , HP - HighPriority
--------------------------------------------------------------------------------
Cast/ Packet Packet Error Threshold
Prio Direction Count Drops Drops
--------------------------------------------------------------------------------
UC HP Fabric to NP-0 28 0 0
UC LP Fabric to NP-0 0 0 0
UC HP Fabric to NP-1 28 0 0
UC LP Fabric to NP-1 0 0 0
UC HP Fabric to NP-2 28 0 0
UC LP Fabric to NP-2 0 0 0
UC HP Fabric to NP-3 28 0 0
UC LP Fabric to NP-3 0 0 0
----------------------------------------------------------------
MC HP Fabric to NP-0 205 0 0
MC LP Fabric to NP-0 2 0 0
MC HP Fabric to NP-1 205 0 0
MC LP Fabric to NP-1 2 0 0
MC HP Fabric to NP-2 205 0 0
MC LP Fabric to NP-2 2 0 0
MC HP Fabric to NP-3 205 0 0
MC LP Fabric to NP-3 2 0 0
---------------------------------------------------------------
Step 6
To check for packets punted to and injected from the LC or RP CPU, run the following commands.
RP/0/RSP0/CPU0:router# show spp interface location node-id
RP/0/RSP0/CPU0:router# show spp node-counters location node-id
RP/0/RSP0/CPU0:router# show spp node location node-id
RP/0/RSP0/CPU0:router# show spp sid stats location node-id
RP/0/RSP0/CPU0:router# show spp client location node-id
Note
To clear the spp counters, run the command clear spp {client | interface | node-counters} location node-id. This command clears client statistics, interface statistics, and per-node counters, depending on the keyword you use.
Step 7
To query the punt switch for the statistics on the LC CPU, run the following command.
RP/0/RSP0/CPU0:router# show controllers punt-switch switch-stats location node-id
Packet Drop from LC to LC
In this scenario, you have configured the system, the RSP and LC have come up and are stable, and LC-to-LC traffic is going through, but some packets are dropped.
The possible causes are:
•
Traffic dropped at interface
•
Traffic dropped at NP3
•
Traffic dropped at bridge
•
Traffic dropped at the fabric I/O
•
Synchronization between the fabric I/O and the fabric NP or fabric arbiter NP has a problem
•
Traffic has wrong vqi
•
Oversubscribed traffic
•
Unknown failures
Locate the Problem and Take Corrective Action
Follow this procedure to locate the problem. After you locate the problem, take corrective action based on your findings. Corrective action might include, for example, configuration updates or hardware/software version upgrades.
Step 1
If not already done, perform the procedures in the "Getting Started with Fabric Troubleshooting" section to verify that you have the correct versions of the hardware and software.
Step 2
Collect the sync status of the fabric on the LC.
show controllers fabric fia link-status location 0/1/CPU0
show controllers fabric fia bridge ddr-status location 0/1/CPU0
show controllers fabric fia bridge sync-status location 0/1/CPU0
Step 3
Collect configuration information.
Step 4
Dump the PFM errors on both the source and destination LCs.
show pfm location 0/1/CPU0
Step 5
Collect the fabric I/O and bridge counters on both the source and destination cards.
show controllers np counters all
show controllers fabric fia stats location 0/1/CPU0
show controllers fabric fia bridge stats location 0/1/CPU0
Step 6
Collect redundancy information.
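When you collect this data from more than one card, it helps to generate the command list once and paste it into a capture session. The following Python sketch only builds strings from the commands shown in the steps above; the lc_to_lc_collection_commands helper is hypothetical, and running the per-card commands against both the source and destination LC is an assumption based on Step 4 and Step 5.

```python
# Per-card commands taken from Step 2, Step 4, and Step 5 above.
PER_CARD_COMMANDS = (
    "show controllers fabric fia link-status location {loc}",
    "show controllers fabric fia bridge ddr-status location {loc}",
    "show controllers fabric fia bridge sync-status location {loc}",
    "show pfm location {loc}",
    "show controllers fabric fia stats location {loc}",
    "show controllers fabric fia bridge stats location {loc}",
)


def lc_to_lc_collection_commands(source_lc, destination_lc):
    """Return the show commands to capture for an LC-to-LC drop."""
    commands = ["show controllers np counters all"]
    # dict.fromkeys removes a duplicate location while preserving order.
    for location in dict.fromkeys([source_lc, destination_lc]):
        commands.extend(cmd.format(loc=location) for cmd in PER_CARD_COMMANDS)
    return commands


if __name__ == "__main__":
    for command in lc_to_lc_collection_commands("0/1/CPU0", "0/5/CPU0"):
        print(command)
```

The node locations 0/1/CPU0 and 0/5/CPU0 are examples; substitute the locations of the cards that carry the affected traffic.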
Where to Go Next
If you have not been able to locate or correct the problem, you might be able to clear it by performing the following steps. However, these steps might delete information that would help you perform additional troubleshooting with Cisco Technical Support. Some of the steps involve stopping or reducing traffic streams, which might not be appropriate on a deployed system. Consult with your network administrator before you perform any of these steps.
Caution 
Before you follow these next steps, consider contacting Cisco Technical Support. Some of these steps can cause loss of data that would be useful for future analysis and troubleshooting, or could cause loss of traffic.
Step 1
Perform reset -h at the LC ROMMON, and then reboot the LC to see if this clears the problem.
Step 2
Pull out the LC and reinsert it to see if it can boot up.
Step 3
Stop other streams of traffic to see if this failed stream can go through.
Step 4
Reduce the rate of the traffic to see if the drop continues.
Packet Drop Between RSP and LC
In this scenario, you have configured the system, and the RSP and LC have come up and are stable, but one of the following problems occurs:
•
Protocol or ping traffic (punt path traffic) has some drops.
•
Initially, ping or protocol packets do not go through, but they recover later.
The possible causes are:
•
Traffic dropped at interface
•
Traffic dropped at NP3
•
Traffic dropped at bridge
•
Traffic dropped at the fabric I/O
•
Sync between the fabric I/O and the fabric NP or fabric arbiter NP has a problem
•
Traffic has wrong vqi
•
Traffic dropped at the Punt FPGA
•
sn database sync issue
•
Unknown failures
Locate the Problem and Take Corrective Action
Follow this procedure to locate the problem. After you locate the problem, take corrective action based on your findings. Corrective action might include, for example, configuration updates or hardware/software version upgrades.
Step 1
If not already done, perform the procedures in the "Getting Started with Fabric Troubleshooting" section to verify that you have the correct versions of the hardware and software.
Step 2
Collect the sync status of the fabric on the LC.
show controllers fabric fia link-status location 0/1/CPU0
show controllers fabric fia bridge ddr-status location 0/1/CPU0
show controllers fabric fia bridge sync-status location 0/1/CPU0
Step 3
Collect configuration information.
Step 4
Dump the PFM errors for the card.
show pfm location 0/1/CPU0
show pfm location 0/RSP0/CPU0
Step 5
Collect the fabric I/O and bridge counters on both the RSP and LC.
show controllers np counters all
show controllers fabric fia stats location 0/1/CPU0
show controllers fabric fia bridge stats location 0/1/CPU0
show controllers fabric fia stats location 0/RSP0/CPU0
Where to Go Next
If you have not been able to locate or correct the problem, you might be able to clear it by performing the following steps. However, these steps might delete information that would help you perform additional troubleshooting with Cisco Technical Support. Some of the steps involve stopping or reducing traffic streams, which might not be appropriate on a deployed system. Consult with your network administrator before you perform any of these steps.
Caution 
Before you follow these next steps, consider contacting Cisco Technical Support. Some of these steps can cause loss of data that would be useful for future analysis and troubleshooting, or could cause loss of traffic.
Step 1
Perform reset -h at the LC ROMMON, and then reboot the LC to see if this clears the problem.
Step 2
Pull out the LC and reinsert it to see if it can boot up.
Step 3
Stop other streams of traffic to see if this failed stream can go through.
Step 4
Determine whether the drop is a single burst in the beginning or is continuous.
Step 5
Determine whether the drop is associated with a particular packet size.
Packet Drop After Certain Actions
In this scenario, the system is configured, the RSP and LC have come up, and traffic flows properly for some time. However, after certain actions, such as a configuration change, online insertion and removal (OIR) of an LC or RSP, an LC reload, or a software upgrade, some traffic drops or a complete loss of traffic is observed.
The possible causes are:
•
Traffic dropped at interface
•
Traffic dropped at NP3
•
Traffic dropped at bridge
•
Traffic dropped at the fabric I/O
•
Sync between the fabric I/O and the fabric NP or fabric arbiter NP has a problem
•
Traffic has wrong vqi
•
Traffic dropped at the Punt FPGA
•
sn database sync issue
•
Unknown failures
Locate the Problem and Take Corrective Action
Follow this procedure to locate the problem. After you locate the problem, take corrective action based on your findings. Corrective action might include, for example, configuration updates or hardware/software version upgrades.
Step 1
Perform the procedures in the "Getting Started with Fabric Troubleshooting" section to verify that you have the correct versions of the hardware and software.
Step 2
Collect the sync status of the fabric on the LC.
show controllers fabric fia link-status location 0/1/CPU0
show controllers fabric fia bridge ddr-status location 0/1/CPU0
show controllers fabric fia bridge sync-status location 0/1/CPU0
Step 3
Collect configuration information.
Step 4
Dump the PFM errors for the card.
show pfm location 0/1/CPU0
show pfm location 0/RSP0/CPU0
Step 5
Collect the fabric I/O and bridge counters on both the RSP and LC.
show controllers np counters all
show controllers fabric fia stats location 0/1/CPU0
show controllers fabric fia bridge stats location 0/1/CPU0
show controllers fabric fia stats location 0/RSP0/CPU0
Step 6
Collect redundancy information.
Where to Go Next
If you have not been able to locate or correct the problem, you might be able to clear it by performing the following steps. However, these steps might delete information that would help you perform additional troubleshooting with Cisco Technical Support. Some of the steps involve stopping or reducing traffic streams, which might not be appropriate on a deployed system. Consult with your network administrator before you perform any of these steps.
Caution 
Before you follow these next steps, consider contacting Cisco Technical Support. Some of these steps can cause loss of data that would be useful for future analysis and troubleshooting, or could cause loss of traffic.
Step 1
Perform reset -h at the LC ROMMON, and then reboot the LC to see if this clears the problem.
Step 2
Pull out the LC and reinsert it to see if it can boot up.
Step 3
Stop other streams of traffic to see if this failed stream can go through.
Step 4
Repeat Step 1 through Step 3 to determine whether the results are reproducible.
Packet Drop After a Redundancy Switchover
In this scenario, you have configured the system, RSP and LC have come up, and traffic is flowing properly for some time. However, after a switchover (by a command or OIR), you see some traffic drop or complete traffic loss.
The possible causes are:
•
Traffic dropped at interface
•
Traffic dropped at NP3
•
Traffic dropped at bridge
•
Traffic dropped at the fabric I/O
•
Sync between the fabric I/O and the fabric NP or fabric arbiter NP has a problem
•
Traffic has wrong vqi
•
Traffic dropped at the Punt FPGA
•
sn database sync issue
•
Fabric is stuck
•
Unknown failures
Locate the Problem and Take Corrective Action
Follow this procedure to locate the problem. After you locate the problem, take corrective action based on your findings. Corrective action might include, for example, configuration updates or hardware/software version upgrades.
Step 1
Perform the procedures in the "Getting Started with Fabric Troubleshooting" section to verify that you have the correct versions of the hardware and software.
Step 2
Collect the sync status of the fabric on the LC before and after the switchover.
show controllers fabric fia link-status location 0/1/CPU0
show controllers fabric fia bridge ddr-status location 0/1/CPU0
show controllers fabric fia bridge sync-status location 0/1/CPU0
Step 3
Collect configuration information.
Step 4
Dump the PFM errors for the card.
show pfm location 0/1/CPU0
show pfm location 0/RSP0/CPU0
Step 5
Collect the fabric I/O and bridge counters on both the RSP and LC.
show controllers np counters all
show controllers fabric fia stats location 0/1/CPU0
show controllers fabric fia bridge stats location 0/1/CPU0
show controllers fabric fia stats location 0/RSP0/CPU0
Step 6
Collect redundancy information.
Where to Go Next
If you have not been able to locate or correct the problem, you might be able to clear it by performing the following steps. However, these steps might delete information that would help you perform additional troubleshooting with Cisco Technical Support. Some of the steps involve stopping or reducing traffic streams, which might not be appropriate on a deployed system. Consult with your network administrator before you perform any of these steps.
Caution 
Before you follow these next steps, consider contacting Cisco Technical Support. Some of these steps can cause loss of data that would be useful for future analysis and troubleshooting, or could cause loss of traffic.
Step 1
Stop other streams of traffic to see if this failed stream can go through again.
Step 2
Repeat Step 1 several times to determine if the result is reproducible.
Step 3
Perform a switchover back to the other side to determine whether both directions have the same traffic problems.
Step 4
After obtaining the necessary approvals from your network and system administrators (because this step will stop all traffic on this unit), reboot the entire system and check to see if it recovers.
Packet Drop with Unknown Reason
In this scenario, you have configured the system, RSP and LC have come up, and traffic is flowing properly for a significant time (at least several days). However, for an unknown reason, the system experiences traffic drops or complete traffic loss.
The possible causes are:
•
Traffic dropped at interface
•
Traffic dropped at NP3
•
Traffic dropped at bridge
•
Traffic dropped at the fabric I/O
•
Sync between the fabric I/O and the fabric NP or fabric arbiter NP has a problem
•
Traffic has wrong vqi
•
Traffic dropped at the Punt FPGA
•
Fabric is stuck
•
Unknown failures
Locate the Problem and Take Corrective Action
Follow this procedure to locate the problem. After you locate the problem, take corrective action based on your findings. Corrective action might include, for example, configuration updates or hardware/software version upgrades.
Step 1
Perform the procedures in the "Getting Started with Fabric Troubleshooting" section to verify that you have the correct versions of the hardware and software.
Step 2
Collect the sync status of the fabric on the LC before and after the switchover.
show controllers fabric fia link-status location 0/1/CPU0
show controllers fabric fia bridge ddr-status location 0/1/CPU0
show controllers fabric fia bridge sync-status location 0/1/CPU0
Step 3
Dump the PFM errors for the card.
show pfm location 0/1/CPU0
show pfm location 0/RSP0/CPU0
Step 4
Collect the fabric I/O and bridge counters on both the RSP and LC.
show controllers np counters all
show controllers fabric fia stats location 0/1/CPU0
show controllers fabric fia bridge stats location 0/1/CPU0
show controllers fabric fia stats location 0/RSP0/CPU0
Step 5
Collect redundancy information.
Step 6
Check for drops on the fabric I/O interface (FIA drop counters) on the LC in both the ingress (to fabric) and egress (from fabric) directions.
show controllers fabric fia drops egress location
show controllers fabric fia drops ingress location
show controllers fabric fia error egress location
show controllers fabric fia error ingress location
Step 7
Check for drops on the bridge. Counters are a combination of high priority (HP), low priority (LP), unicast, multicast, DDR, and DDR-threshold packets. They are further segregated into critical and informational based on their severity. All Ethernet LCs have two bridges. Use the following command to obtain this information.
show controllers fabric fia bridge stats location <linecard location>
Step 8
Check for drops on the punt FPGA on the RSP.
show controllers fabric fia bridge stats location 0/RSP0/CPU0
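Because Step 2 asks for status before and after the switchover, it can help to compare two counter snapshots rather than read each one in isolation. The following is a minimal sketch of that comparison, not a Cisco tool. It assumes you have already copied the counter values by hand into dictionaries. The counter names and numbers are hypothetical and do not reflect the actual output format of the show commands.

```python
def changed_counters(before, after):
    """Return {name: delta} for every counter that grew between two snapshots."""
    deltas = {}
    for name, after_value in after.items():
        # A counter missing from the first snapshot is treated as starting at 0.
        delta = after_value - before.get(name, 0)
        if delta > 0:
            deltas[name] = delta
    return deltas


# Hypothetical counter names and values, entered by hand for illustration only.
before_switchover = {"ingress_drops": 10, "egress_drops": 0, "crc_errors": 2}
after_switchover = {"ingress_drops": 10, "egress_drops": 57, "crc_errors": 3}

print(changed_counters(before_switchover, after_switchover))
```

Counters that did not move across the switchover are filtered out, so the result points only at the stages worth a closer look.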
Where to Go Next
If you have not been able to locate or correct the problem, you might be able to clear it by performing the following steps. However, these steps might delete information that would help you perform additional troubleshooting with Cisco Technical Support. Some of the steps involve stopping or reducing traffic streams, which might not be appropriate on a deployed system. Consult with your network administrator before you perform any of these steps.
Caution 
Before you follow these next steps, consider contacting Cisco Technical Support. Some of these steps can cause loss of data that would be useful for future analysis and troubleshooting, or could cause loss of traffic.
Step 1
Stop other streams of traffic to see if this failed stream can go through again.
Step 2
Reboot the LCs one at a time and check if the traffic recovers.
Step 3
After obtaining the necessary approvals from your network and system administrators (because this step will stop all traffic on this unit), reboot the entire system and check to see if it recovers.
Step 4
Reconfigure the system to see if it recovers.
Troubleshooting RSP and LC Crashes
This section explains how to troubleshoot the following problems:
•
Active RSP Is Crashing
•
Standby RSP Is Crashing
•
LC Is Crashing
Active RSP Is Crashing
In this scenario, the active RSP keeps crashing and the RSP console shows that the active fabric manager or fia_rsp (the fabric I/O process) terminates repeatedly.
The possible causes are:
•
Initialization of the fabric I/O fails for some reason
•
Fabric self-test fails
•
The synchronization between the fabric I/O and the fabric NP or fabric arbiter NP has a problem
•
Unknown failures
Locate the Problem and Take Corrective Action
Follow this procedure to locate the problem. After you locate the problem, take corrective action based on your findings. Corrective action might include, for example, configuration updates or hardware/software version upgrades.
Step 1
Perform the procedures in the "Getting Started with Fabric Troubleshooting" section to verify that you have the correct versions of the hardware and software.
Step 2
Collect the sync status of the fabric on the RSP card.
show controllers fabric fia link-status location <0/RSP0/CPU0>
show controllers fabric fia bridge sync-status location
show controllers fabric fia bridge sync-status location 0/1/cpu0
Step 3
Dump the PFM errors for the card.
show pfm location <0/rsp0/cpu0>
Step 4
Collect the fabric I/O and punt counters.
show controllers fabric fia stats location <0/rsp0/CPU0>
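When an RSP crashes repeatedly, you may need to collect the same output several times. The sketch below expands the command templates from Steps 2 through 4 for a given location so they can be pasted into a console session. It only builds strings and does not connect to the router. The names `RSP_COMMAND_TEMPLATES` and `commands_for` are hypothetical. The bridge sync-status command is left out because this procedure does not show an RSP location for it.

```python
# Templates copied from Steps 2 through 4; {location} is the only variable part.
RSP_COMMAND_TEMPLATES = [
    "show controllers fabric fia link-status location {location}",
    "show pfm location {location}",
    "show controllers fabric fia stats location {location}",
]


def commands_for(location, templates=RSP_COMMAND_TEMPLATES):
    """Fill the location into each template and return the command strings."""
    return [template.format(location=location) for template in templates]


# Example location taken from the placeholders used in this procedure.
for command in commands_for("0/RSP0/CPU0"):
    print(command)
```

Keeping the templates in one list makes it easier to run an identical set of commands on each attempt, so that outputs from different crashes can be compared.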
Where to Go Next
If you have not been able to locate or correct the problem, you might be able to clear it by performing the following steps. However, these steps might delete information that would help you perform additional troubleshooting with Cisco Technical Support. Some of the steps involve stopping or reducing traffic streams, which might not be appropriate on a deployed system. Consult with your network administrator before you perform any of these steps.
Caution 
Before you follow these next steps, consider contacting Cisco Technical Support. Some of these steps can cause loss of data that would be useful for future analysis and troubleshooting, or could cause loss of traffic.
Step 1
Perform `reset -h' at the ROMMON and reboot the RSP again to see if this clears the problem.
Step 2
Pull out the RSP and reinsert it to see if it can boot up.
Step 3
Swap the slot (put the RSP card into the other RSP slot) and see if it can boot up properly.
Standby RSP Is Crashing
In this scenario, the active RSP is up and running, but the standby RSP keeps crashing. The RSP console shows that the standby fabric manager or fia_rsp (the fabric I/O process) terminates repeatedly.
The possible causes are:
•
Initialization of the standby fabric I/O fails for some reason
•
Fabric self-test on the standby card fails
•
The sync between the fabric I/O and the fabric NP or fabric arbiter NP has a problem
•
Communication between the active and standby cards is not working
•
Unknown failures
Locate the Problem and Take Corrective Action
Follow this procedure to locate the problem. After you locate the problem, take corrective action based on your findings. Corrective action might include, for example, configuration updates or hardware/software version upgrades.
Step 1
If not already done, perform the procedures in the "Getting Started with Fabric Troubleshooting" section to verify that you have the correct versions of the hardware and software.
Step 2
Collect the sync status of the fabric on the RSP card.
show controllers fabric fia link-status location <0/RSP0/CPU0>
Step 3
Dump the PFM errors for the card.
show pfm location <0/rsp0/cpu0>
Step 4
Dump the redundancy status.
Step 5
Collect the fabric I/O and punt counters.
show controllers fabric fia stats location <0/1/CPU0>
Where to Go Next
If you have not been able to locate or correct the problem, you might be able to clear it by performing the following steps. However, these steps might delete information that would help you perform additional troubleshooting with Cisco Technical Support. Some of the steps involve stopping or reducing traffic streams, which might not be appropriate on a deployed system. Consult with your network administrator before you perform any of these steps.
Caution 
Before you follow these next steps, consider contacting Cisco Technical Support. Some of these steps can cause loss of data that would be useful for future analysis and troubleshooting, or could cause loss of traffic.
Step 1
Perform `reset -h' at the ROMMON and reboot the standby RSP again to see if this clears the problem.
Step 2
Pull out the RSP and reinsert it to see if it can boot up.
Step 3
Swap the slot (put the RSP card into the other RSP slot) and see if it can boot up properly.
LC Is Crashing
In this scenario, an LC keeps crashing and the RSP console shows that fia_lc (the fabric I/O process) terminates repeatedly.
The possible causes are:
•
Initialization of the LC fabric I/O fails for some reason
•
Fabric self-test on the LC fails
•
The synchronization between the fabric I/O and the fabric NP or fabric arbiter NP has a problem
•
Communication between the LC and the RSP is not working properly
•
There is a sync problem between the fabric I/O and the bridge
•
Unknown failures
Locate the Problem and Take Corrective Action
Follow this procedure to locate the problem. After you locate the problem, take corrective action based on your findings. Corrective action might include, for example, configuration updates or hardware/software version upgrades.
Step 1
If not already done, perform the procedures in the "Getting Started with Fabric Troubleshooting" section to verify that you have the correct versions of the hardware and software.
Step 2
Collect the sync status of the fabric on the LC.
show controllers fabric fia link-status location <0/1/CPU0>
show controllers fabric fia bridge ddr-status location <0/1/cpu0>
show controllers fabric fia bridge sync-status location 0/1/cpu0
Step 3
Dump the PFM errors for the card.
show pfm location <0/1/cpu0>
Step 4
Collect the fabric I/O and bridge counters.
show controllers fabric fia stats location <0/1/CPU0>
show controllers fabric fia bridge stats location <0/1/CPU0>
Where to Go Next
If you have not been able to locate or correct the problem, you might be able to clear it by performing the following steps. However, these steps might delete information that would help you perform additional troubleshooting with Cisco Technical Support. Some of the steps involve stopping or reducing traffic streams, which might not be appropriate on a deployed system. Consult with your network administrator before you perform any of these steps.
Caution 
Before you follow these next steps, consider contacting Cisco Technical Support. Some of these steps can cause loss of data that would be useful for future analysis and troubleshooting, or could cause loss of traffic.
Step 1
Perform `reset -h' at the LC ROMMON and reboot the LC again to see if this clears the problem.
Step 2
Pull out the LC and reinsert it to see if it can boot up.
Step 3
Swap the slot (pull out the LC and insert it into another LC slot) and see if it can boot up properly.
Step 4
Insert a different LC of the same type to see if that card can boot up properly.
Troubleshooting Complete Loss of Traffic
This section explains how to troubleshoot scenarios in which the system is active but traffic does not go through. It includes the following topics:
•
No Traffic from LC to LC
•
No Traffic Between RSP and LC
No Traffic from LC to LC
In this scenario, you have configured the system and the RSP and LC have come up and are stable, but no LC-to-LC traffic is going through.
The possible causes are:
•
Traffic dropped at the interface
•
Traffic dropped at NP3
•
Traffic dropped at the bridge
•
Traffic dropped at the fabric I/O
•
Sync between the fabric I/O and the fabric NP or fabric arbiter NP has a problem
•
Traffic has the wrong VQI
•
Unknown failures
Locate the Problem and Take Corrective Action
Follow this procedure to locate the problem. After you locate the problem, take corrective action based on your findings. Corrective action might include, for example, configuration updates or hardware/software version upgrades.
Step 1
Perform the procedures in the "Getting Started with Fabric Troubleshooting" section to verify that you have the correct versions of the hardware and software.
Step 2
Collect the sync status of the fabric on the LC.
show controllers fabric fia link-status location <0/1/CPU0>
show controllers fabric fia bridge ddr-status location <0/1/cpu0>
show controllers fabric fia bridge sync-status location 0/1/cpu0
Step 3
Collect configuration information.
Step 4
Dump the PFM errors for the card.
show pfm location <0/1/cpu0>
show pfm location <0/rsp0/cpu0>
Step 5
Collect the fabric I/O and bridge counters on both the source and destination cards.
show controllers np counters all
show controllers fabric fia stats location 0/1/CPU0
show controllers fabric fia bridge stats location 0/1/CPU0
Step 6
Collect redundancy information.
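The possible causes listed for this scenario follow the packet path stage by stage, so one way to read the collected counters is to look for the first stage that receives packets but does not forward them. The sketch below illustrates that reasoning only. The stage order (interface, NP3, bridge, fabric I/O) is an assumption drawn from the cause list, and the packet counts are hypothetical values that you would read manually from the command output.

```python
def first_lossy_stage(stages):
    """Return (stage_name, packets_lost) for the first stage that forwards
    fewer packets than it receives, or None if no stage loses packets.

    stages is an ordered list of (name, packets_in, packets_out) tuples.
    """
    for name, packets_in, packets_out in stages:
        if packets_out < packets_in:
            return name, packets_in - packets_out
    return None


# Hypothetical ingress-side counts, listed in the assumed path order.
ingress_path = [
    ("interface", 5000, 5000),
    ("NP3", 5000, 5000),
    ("bridge", 5000, 0),
    ("fabric I/O", 0, 0),
]

print(first_lossy_stage(ingress_path))
```

In this example the bridge receives 5000 packets and forwards none, so the bridge counters and sync status are the next things to examine.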
Where to Go Next
If you have not been able to locate or correct the problem, you might be able to clear it by performing the following steps. However, these steps might delete information that would help you perform additional troubleshooting with Cisco Technical Support. Some of the steps involve stopping or reducing traffic streams, which might not be appropriate on a deployed system. Consult with your network administrator before you perform any of these steps.
Caution 
Before you follow these next steps, consider contacting Cisco Technical Support. Some of these steps can cause loss of data that would be useful for future analysis and troubleshooting, or could cause loss of traffic.
Step 1
Perform `reset -h' at the LC ROMMON and reboot the LC again to see if this clears the problem.
Step 2
Pull out the LC and reinsert it to see if it can boot up and carry traffic.
Step 3
Stop other streams of traffic to see if this failed stream can go through.
Step 4
Run online diagnostics to locate errors in the system. For additional information on diagnostics, see the "Using Diagnostic Commands" section on page 1-59.
No Traffic Between RSP and LC
In this scenario, you have configured the system and the RSP and LC have come up and are stable, but no protocol or ping traffic (punt path traffic) is going through.
The possible causes are:
•
Traffic dropped at the interface
•
Traffic dropped at NP3
•
Traffic dropped at the bridge
•
Traffic dropped at the fabric I/O
•
Sync between the fabric I/O and the fabric NP or fabric arbiter NP has a problem
•
Traffic has the wrong VQI
•
Traffic dropped at the punt FPGA
•
Traffic dropped at the protocol level
•
Unknown failures
Locate the Problem and Take Corrective Action
Follow this procedure to locate the problem. After you locate the problem, take corrective action based on your findings. Corrective action might include, for example, configuration updates or hardware/software version upgrades.
Step 1
If not already done, perform the procedures in the "Getting Started with Fabric Troubleshooting" section to verify that you have the correct versions of the hardware and software.
Step 2
Collect the sync status of the fabric on the LC.
show controllers fabric fia link-status location <0/1/CPU0>
show controllers fabric fia bridge ddr-status location <0/1/cpu0>
show controllers fabric fia bridge sync-status location 0/1/cpu0
Step 3
Collect configuration information.
Step 4
Dump the PFM errors for the card.
show pfm location <0/1/cpu0>
show pfm location <0/rsp0/cpu0>
Step 5
Collect the fabric I/O and bridge counters on both the RSP and LC.
show controllers np counters all
show controllers fabric fia stats location 0/1/CPU0
show controllers fabric fia bridge stats location 0/1/CPU0
show controllers fabric fia stats location 0/rsp0/CPU0
Step 6
Collect redundancy information.
Where to Go Next
If you have not been able to locate or correct the problem, you might be able to clear it by performing the following steps. However, these steps might delete information that would help you perform additional troubleshooting with Cisco Technical Support. Some of the steps involve stopping or reducing traffic streams, which might not be appropriate on a deployed system. Consult with your network administrator before you perform any of these steps.
Caution 
Before you follow these next steps, consider contacting Cisco Technical Support. Some of these steps can cause loss of data that would be useful for future analysis and troubleshooting, or could cause loss of traffic.
Step 1
Perform `reset -h' at the LC ROMMON and reboot the LC again to see if this clears the problem.
Step 2
Pull out the LC and reinsert it to see if it can boot up and carry traffic.
Step 3
Pull out the RSP card and reinsert it to see if it can boot up and carry traffic.
Step 4
Stop other streams of traffic to see if this failed stream can go through.
Step 5
Run online diagnostics to locate errors in the system. For additional information on diagnostics, see the "Using Diagnostic Commands" section on page 1-59.
Gathering Fabric Information Before Calling TAC
If you need support from Cisco to troubleshoot the fabric, we recommend that you gather the following information if time permits:
•
Output of the following command (this displays the software version and the line card, fabric card, FPGA, and ASIC versions)
show hw-module fpd location
•
Information on chassis type
•
Platform-related information
•
Ingress interface(s), egress interface(s), and expected packet path
•
Drop counters
•
Logs (capture all logs on the RSP console port)
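If several people are collecting data, it can be useful to track which of the items above are still missing before the case is opened. The following is a minimal sketch of such a checklist. The `TacBundle` class, its field names, and the sample values are hypothetical and are not part of any Cisco tool. Each field simply holds the text you have captured for the corresponding item.

```python
from dataclasses import dataclass, fields


@dataclass
class TacBundle:
    """One text field per item in the list above (field names are hypothetical)."""
    fpd_output: str = ""       # output of show hw-module fpd location
    chassis_type: str = ""     # information on chassis type
    platform_info: str = ""    # platform-related information
    packet_path: str = ""      # ingress/egress interfaces and expected packet path
    drop_counters: str = ""    # drop counters
    console_logs: str = ""     # logs captured on the RSP console port


def missing_items(bundle):
    """Return the names of the fields that are still empty."""
    return [f.name for f in fields(bundle) if not getattr(bundle, f.name).strip()]


# Hypothetical partial collection, for illustration only.
bundle = TacBundle(chassis_type="example chassis", drop_counters="example counters")
print(missing_items(bundle))
```

An empty list from `missing_items` means every item has at least some captured text. It does not check that the text is complete or correct.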