Troubleshooting Line Cards and Interface Modules
This chapter describes how to troubleshoot faults on Cisco uBR10012 line cards. It contains the following sections:
•General Information for Troubleshooting Line Card Crashes
•Troubleshooting the Timing, Communication, and Control Plus Card
•Troubleshooting the DOCSIS Timing, Communication, and Control Card
•Troubleshooting the Gigabit Ethernet Line Card
•Troubleshooting the Cable Interface Line Cards
•Troubleshooting the SIP and SPA Interface Modules
General Information for Troubleshooting Line Card Crashes
Line card crashes occur when the hardware or software encounters a situation that the current design does not anticipate. As a general rule, a crash indicates a configuration error, a software error, or a hardware problem.
Table 4-1 lists the show commands that are most useful in collecting information to troubleshoot line card crashes.
Table 4-1 Relevant Show Commands for Line Card Crashes
Command | Description
show version | Provides general information about the system's hardware and software configurations.
show logging | Displays the general logs of the router.
show diag [slot/subslot] | Provides specific information about a particular slot: type of engine, hardware revision, firmware revision, memory configuration, and so on.
show context [summary | slot [slot/subslot]] | Provides context information about the most recent crashes. This is often the most useful command for troubleshooting line card crashes.
Use the following procedure if you suspect that a line card has crashed.
Step 1 If you can identify the particular card that has crashed or is experiencing problems, first use the other sections in this chapter to perform basic troubleshooting. In particular, ensure that the line card is fully inserted into the proper slot, and that all cables are properly connected.
Step 2 If any system messages were displayed on the console or in the SYSLOG logs at the time of the crash, consult the Cisco IOS System Messages Guide for possible suggestions on the source of the problem.
Step 3 Line cards can crash or appear to crash when an excessive number of debug messages are being generated. In particular, this can happen when using the verbose or detail mode of a debug command, or if the debug command is dumping the contents of packets or packet buffers. If the console contains a large volume of debug output, turn off all debugging with the no debug all command.
Step 4 If the system message log contains messages that indicate the line card is not responding (for example, %IPCOIR-3-TIMEOUT), and the card's LEDs are not lit, the line card might have shut down because of overheating. Ensure that all chassis slots either have the proper card or module installed in them. If a slot is blank, ensure that the slot has a blank front panel installed, so that proper airflow and cooling can be maintained in the chassis.
Step 5 Use the show context summary command to identify all of the line cards that have experienced a crash:
Router# show context summary
1 - crash at 13:52:57 UTC Wed Nov 24 2010
2 - crash at 13:49:03 UTC Wed Nov 24 2010
1 - crash at 13:56:08 UTC Wed Nov 24 2010
Step 6 After identifying the particular card that crashed, use the show context command again to display more information about the most recent crash. For example:
Router# show context slot 5/0
CRASH INFO: Slot 5/0, Index 1, Crash at 13:52:57 UTC Wed Nov 24 2010
10000 Software (UBR10KCLC-LCK8-M), Version 12.2(32.8.12)SCE EXPERIMENTAL IMAGE ENGINEERING
C10K_WEEKLY BUILD, synced to V122_32_8_SCE
Compiled Sun 21-Nov-10 16:30 by jdkerr
Card Type: uBR10000 5x20 Cable Line Card, S/N CAT1221E16A
System exception: sig=10, code=0x8, context=0x64348944
traceback 60A2D984 60A2CC18 600A8AAC 600D5098 602206AC 60220698
$0 : 00000000, AT : 61D50000, v0 : 00000000, v1 : 00000000
a0 : 00000020, a1 : 65AA0FE8, a2 : 00000183, a3 : 00006F39
t0 : 0000C100, t1 : 3400C101, t2 : 60281678, t3 : FFFF00FF
t4 : 60281658, t5 : 000001D9, t6 : 00000000, t7 : 00000000
s0 : 61BF0000, s1 : 00000035, s2 : 0000001E, s3 : 61BF0000
s4 : 64800000, s5 : 00000008, s6 : 64813300, s7 : 60E20000
t8 : 0D0D0D0D, t9 : 00000000, k0 : 65B2367C, k1 : 60268DD0
gp : 61D573A8, sp : 648132C8, s8 : 00000000, ra : 60A2D984
EPC : 0x00000000, SREG : 0x3400C103, Cause : 0x00000008
te to administratively down
SLOT 5/0: Nov 24 13:50:34.143: %UBR10000-5-UPDOWN: Interface Cable5/0/3 U2, changed state
to administratively down
Step 7 Look for the SIG Type in the line that starts with "System exception" to identify the reason for the crash. Table 4-2 lists the most common SIG error types and their causes.
Table 4-2 SIG Value Types
SIG Value | SIG Name | Error Reason
2 | SIGINT | Unexpected hardware interrupt
3 | SIGQUIT | Abort due to break key
4 | SIGILL | Illegal opcode exception
5 | SIGTRAP | Abort due to Break Point or an arithmetic exception
8 | SIGFPE | Floating point unit (FPU) exception
9 | SIGKILL | Reserved exception
10 | SIGBUS | Bus error exception
11 | SIGSEGV | SegV exception
20 | SIGCACHE | Cache parity exception
21 | SIGWBERR | Write bus error interrupt
22 | SIGERROR | Fatal hardware error
23 | SIGRELOAD | Software-forced crash
Step 8 The vast majority of line card crashes are Cache Parity Exceptions (SIG type=20), Bus Error Exceptions (SIG type=10), or Software-Forced Crashes (SIG type=23). Use the following sections to further troubleshoot these problems:
•Cache Parity Errors
•Bus Errors
•Software-Forced Crashes
If the line card crashed for some other reason, capture the output of the show tech-support command. Registered Cisco.com users can decode the output of this command by using the Output Interpreter tool, which is at the following URL:
https://www.cisco.com/cgi-bin/Support/OutputInterpreter/home.pl
Step 9 If you cannot resolve the problem using the information from the Output Interpreter, collect the following information and contact Cisco TAC:
•All relevant information about the problem that you have available, including any troubleshooting you have performed.
•Any console output that was generated at the time of the problem.
•Output of the show tech-support command.
•Output of the show log command (or the log that was captured by your SYSLOG server, if available).
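The lookup described in Step 7 can be sketched in a few lines of Python. This is a minimal illustration, not a Cisco tool: the function name decode_system_exception and the dictionary layout are hypothetical, and the script assumes that the "System exception" line has the same form as in the sample show context output in Step 6.

```python
import re

# SIG values, names, and reasons as listed in Table 4-2.
# 2 is the conventional value for SIGINT and is an assumption here.
SIG_TYPES = {
    2: ("SIGINT", "Unexpected hardware interrupt"),
    3: ("SIGQUIT", "Abort due to break key"),
    4: ("SIGILL", "Illegal opcode exception"),
    5: ("SIGTRAP", "Abort due to Break Point or an arithmetic exception"),
    8: ("SIGFPE", "Floating point unit (FPU) exception"),
    9: ("SIGKILL", "Reserved exception"),
    10: ("SIGBUS", "Bus error exception"),
    11: ("SIGSEGV", "SegV exception"),
    20: ("SIGCACHE", "Cache parity exception"),
    21: ("SIGWBERR", "Write bus error interrupt"),
    22: ("SIGERROR", "Fatal hardware error"),
    23: ("SIGRELOAD", "Software-forced crash"),
}

# Sections of this chapter that cover the three most common SIG types (Step 8).
NEXT_SECTION = {
    20: "Cache Parity Errors",
    10: "Bus Errors",
    23: "Software-Forced Crashes",
}

# Matches a line such as: System exception: sig=10, code=0x8, context=0x64348944
EXCEPTION_RE = re.compile(r"System exception:\s*sig=(\d+),\s*code=(0x[0-9A-Fa-f]+)")


def decode_system_exception(show_context_output):
    """Return the SIG details found in show context output, or None if absent."""
    match = EXCEPTION_RE.search(show_context_output)
    if match is None:
        return None
    sig = int(match.group(1))
    name, reason = SIG_TYPES.get(sig, ("UNKNOWN", "Not listed in Table 4-2"))
    return {
        "sig": sig,
        "code": match.group(2),
        "name": name,
        "reason": reason,
        # None means: capture show tech-support and use the Output Interpreter.
        "section": NEXT_SECTION.get(sig),
    }
```

For the sample output in Step 6 (sig=10), the function reports SIGBUS and points to the Bus Errors section.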
Cache Parity Errors
A cache parity error (SIG type is 20) means that one or more bits at a memory location were unexpectedly changed after they were originally written. This error could indicate a potential problem with the Dynamic Random Access Memory (DRAM) that is onboard the line card.
Parity errors are not expected during normal operations and could force the line card to crash or reload. These memory errors can be categorized in two different ways:
•Soft parity errors occur when an energy level within the DRAM memory changes a bit from a one to a zero, or a zero to a one. Soft errors are rare and are most often the result of normal background radiation. When the CPU detects a soft parity error, it attempts to recover by restarting the affected subsystem, if possible. If the error is in a portion of memory that is not recoverable, it could cause the system to crash. Although soft parity errors can cause a system crash, you do not need to swap the board or any of the components, because the problem is not defective hardware.
•Hard parity errors occur when a hardware defect in the DRAM or processor board causes data to be repeatedly corrupted at the same address. In general, a hard parity error occurs when more than one parity error in a particular memory region occurs in a relatively short period of time (several weeks to months).
When a parity error occurs, take the following steps to resolve the problem:
Step 1 Determine whether this is a soft parity error or a hard parity error. Soft parity errors are 10 to 100 times more frequent than hard parity errors. Therefore, wait for a second parity error before taking any action. Monitor the router for several weeks after the first incident. If the problem recurs, assume that it is a hard parity error and proceed to the next step.
Step 2 When a hard parity error occurs (two or more parity errors at the same memory location), try removing and reinserting the line card, making sure to fully insert the card and to securely tighten the restraining screws on the front panel.
Step 3 If this does not resolve the problem, remove and reseat the DRAM chips. If the problem continues, replace the DRAM chips.
Step 4 If parity errors continue to occur, the problem is with either the line card or the router chassis. Try removing the line card and reinserting it. If the problem persists, try removing the line card from its current slot and reinserting it in another slot, if one is available. If that does not fix the problem, replace the line card.
Step 5 If the problems continue, collect the following information and contact Cisco TAC:
•All relevant information about the problem that you have available, including any troubleshooting you have performed.
•Any console output that was generated at the time of the problem.
•Output of the show tech-support command.
•Output of the show log command (or the log that was captured by your SYSLOG server, if available).
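The soft versus hard distinction used in this procedure can be tracked with a simple record of parity incidents. The following Python sketch is illustrative only: the function name, the (timestamp, location) record format, and the 26-week default window are assumptions, chosen to reflect the "several weeks to months" guideline above. How the memory location is obtained from the crash output is not covered here.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def classify_parity_errors(events, window=timedelta(weeks=26)):
    """Classify each memory location as 'soft' or 'hard'.

    events is an iterable of (timestamp, location) tuples recorded by the
    operator. A location is treated as 'hard' when two parity errors at
    that location fall within the window (an assumed value).
    """
    by_location = defaultdict(list)
    for when, location in events:
        by_location[location].append(when)

    result = {}
    for location, times in by_location.items():
        times.sort()
        # Compare each incident with the next one at the same location.
        repeated = any(
            later - earlier <= window
            for earlier, later in zip(times, times[1:])
        )
        result[location] = "hard" if repeated else "soft"
    return result
```

A location that is reported only once stays classified as soft, which matches the advice to wait for a second parity error before taking any action.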
Bus Errors
Bus errors (SIG type is 10) occur when the line card tries to access a memory location that either does not exist (which indicates a software error) or that does not respond (which indicates a hardware error). Use the following procedure to determine the cause of a bus error and to resolve the problem.
Perform these steps as soon as possible after the bus error. In particular, perform these steps before manually reloading or power cycling the router, or before performing an Online Insertion/Removal (OIR) of the line card, because doing so eliminates much of the information that is useful in debugging line card crashes.
Step 1 Capture the output of the show stacks, show context, and show tech-support commands. Registered Cisco.com users can decode the output of these commands by using the Output Interpreter tool, which is at the following URL:
https://www.cisco.com/cgi-bin/Support/OutputInterpreter/home.pl
Step 2 If the results from the Output Interpreter indicate a hardware-related problem, try removing and reinserting the hardware into the chassis. If this does not correct the problem, replace the DRAM chips on the hardware. If the problem persists, replace the hardware.
Step 3 If the problem appears software-related, verify that you are running a released version of software, and that this release of software supports all of the hardware that is installed in the router. If necessary, upgrade the router to the latest version of software.
Tip The most effective way of using the Output Interpreter tool is to capture the output of the show stacks and show tech-support commands and upload the output into the tool. If the problem appears related to a line card, you can also try decoding the show context command.
Upgrading to the latest version of the Cisco IOS software eliminates all fixed bugs that can cause line card bus errors. If the crash is still present after the upgrade, collect the relevant information from the above troubleshooting, as well as any information about recent network changes, and contact Cisco TAC.
Software-Forced Crashes
Software-forced crashes (SIG type is 23) occur when the Cisco IOS software encounters a problem with the line card and determines that it can no longer continue, so it forces the line card to crash. The original problem could be either hardware-based or software-based.
The most common reason for a software-forced crash on a line card is a "Fabric Ping Timeout," which occurs when the PRE module sends five keepalive messages (fabric pings) to the line card and does not receive a reply. If this occurs, you should see error messages similar to the following in the router's console log:
%GRP-3-FABRIC_UNI: Unicast send timed out (4)
%GRP-3-COREDUMP: Core dump incident on slot 4, error: Fabric ping failure
Fabric ping timeouts are usually caused by one of the following problems:
•High CPU Utilization—Either the PRE module or line card is experiencing high CPU utilization. The PRE module or line card could be so busy that either the ping request or ping reply message was dropped. Use the show processes cpu command to determine whether CPU usage is exceptionally high (at 95 percent or more). If so, see the "High CPU Utilization Problems" section on page 3-8 for information on troubleshooting the problem.
•CEF-Related Problems—If the crash is accompanied by system messages that begin with "%FIB," it could indicate a problem with Cisco-Express Forwarding (CEF) on one of the line card's interfaces. For more information, see Troubleshooting CEF-Related Error Messages, at the following URL:
http://www.cisco.com/en/US/products/hw/routers/ps359/products_tech_note09186a0080110d68.shtml
•IPC Timeout—The InterProcess Communication (IPC) message that carried the original ping request or the ping reply was lost. This could be caused by a software bug that is disabling interrupts for an excessive period of time, high CPU usage on the PRE module, or by excessive traffic on the line card that is filling up all available IPC buffers.
If the router is not running the most current Cisco IOS software, upgrade the router to the latest software release, so that any known IPC bugs are fixed. If the show processes cpu command shows that CPU usage is exceptionally high (at 95 percent or more), or if traffic on the line card is excessive, see the "High CPU Utilization Problems" section on page 3-8.
If the crash is accompanied by %IPC-3-NOBUFF messages, see Troubleshooting IPC-3-NOBUFF Messages on the Cisco 12000, 10000 and 7500 Series, at the following URL:
http://www.cisco.com/en/US/products/hw/routers/ps133/products_tech_note09186a00800945a1.shtml
•Hardware Problem—The card might not be fully inserted into its slot, or the card hardware itself could have failed. In particular, if the problem began occurring after the card was moved or after a power outage, the card could have been damaged by static electricity or a power surge. Only a small number of fabric ping timeouts are caused by hardware failures, so try the following before replacing the card:
–Reload the software on the line card, using the hw-module slot reset command.
–Remove and reinsert the line card in its slot.
–Try moving the card to another slot, if one is available.
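The 95 percent check on show processes cpu output can be automated when the output is captured to a file or a monitoring script. The sketch below is hedged: it assumes the summary line has the form commonly printed by Cisco IOS ("CPU utilization for five seconds: 97%/12%; one minute: 96%; five minutes: 95%"), the function names are hypothetical, and flagging the card when any of the three averages reaches the threshold is an assumption, because the text does not say which average to use.

```python
import re

# Assumed form of the summary line in show processes cpu output.
CPU_RE = re.compile(
    r"CPU utilization for five seconds:\s*(\d+)%/(\d+)%;"
    r"\s*one minute:\s*(\d+)%;\s*five minutes:\s*(\d+)%"
)


def cpu_utilization(show_processes_cpu_output):
    """Extract the utilization figures, or return None if the line is absent."""
    match = CPU_RE.search(show_processes_cpu_output)
    if match is None:
        return None
    five_sec, interrupt, one_min, five_min = (int(g) for g in match.groups())
    return {
        "five_seconds": five_sec,        # total utilization, last five seconds
        "interrupt": interrupt,          # portion spent at interrupt level
        "one_minute": one_min,
        "five_minutes": five_min,
    }


def is_exceptionally_high(utilization, threshold=95):
    """True if any average is at or above the threshold (95 percent or more)."""
    return max(
        utilization["five_seconds"],
        utilization["one_minute"],
        utilization["five_minutes"],
    ) >= threshold
```

If is_exceptionally_high returns True for the PRE module or the line card, continue with the "High CPU Utilization Problems" section.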
If software-forced crashes continue, collect the following information and contact Cisco TAC:
•All relevant information about the problem that you have available, including any troubleshooting you have performed.
•Any console output that was generated at the time of the problem.
•Output of the show tech-support command.
•Output of the show log command (or the log that was captured by your SYSLOG server, if available).
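Several of the system messages named in this chapter point to a likely cause, so a captured log can be scanned for them before contacting Cisco TAC. The following Python sketch assumes the standard %FACILITY-SEVERITY-MNEMONIC message format. The function names and the hint texts are illustrative summaries of this chapter, not Cisco output.

```python
import re

# Cisco IOS system messages have the form %FACILITY-SEVERITY-MNEMONIC: text
MESSAGE_RE = re.compile(r"%([A-Z0-9_]+)-(\d)-([A-Z0-9_]+):")

# Hints paraphrased from this chapter, keyed by (facility, mnemonic).
HINTS = {
    ("IPCOIR", "TIMEOUT"): "Line card not responding; check LEDs, airflow, and blank panels",
    ("GRP", "FABRIC_UNI"): "Possible fabric ping timeout; see Software-Forced Crashes",
    ("GRP", "COREDUMP"): "Possible fabric ping timeout; see Software-Forced Crashes",
    ("IPC", "NOBUFF"): "IPC buffers exhausted; see the IPC-3-NOBUFF tech note",
}


def hint_for(facility, mnemonic):
    """Return a troubleshooting hint, or None if the message is not covered."""
    if facility.startswith("FIB"):
        return "Possible CEF problem on a line card interface"
    return HINTS.get((facility, mnemonic))


def scan_log(lines):
    """Return one finding per log line that carries a message covered above."""
    findings = []
    for line in lines:
        match = MESSAGE_RE.search(line)
        if match is None:
            continue
        facility, severity, mnemonic = match.groups()
        hint = hint_for(facility, mnemonic)
        if hint is not None:
            findings.append({
                "facility": facility,
                "severity": int(severity),
                "mnemonic": mnemonic,
                "hint": hint,
            })
    return findings
```

Messages that the chapter does not discuss, such as %UBR10000-5-UPDOWN, are skipped rather than guessed at.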
Troubleshooting the Timing, Communication, and Control Plus Card
At least one working Timing, Communication, and Control Plus (TCC+) card must be installed in the Cisco uBR10012 router for normal operations. The TCC+ card acts as a secondary processor that performs the following functions:
•Generates and distributes 10.24 MHz clock references to each cable interface line card.
•Generates and distributes 32-bit time stamp references to each cable interface line card.
•Allows software to independently power off any or all cable interface line cards.
•Provides support for Online Insertion/Removal (OIR) operations of line cards.
•Drives the LCD panel used to display system configuration and status information.
•Monitors the supply power usage of the chassis.
•Provides two redundant RJ-45 ports for external timing clock reference inputs such as a Global Positioning System (GPS) or BITS clock.
If the Cisco uBR10012 router does not have a working TCC+ card installed, the WAN and cable interface line cards will experience excessive packet drops, or all traffic will be dropped, because of an invalid timing signal. Also, if no TCC+ card is installed, the cable power command is disabled, because this function is performed by the TCC+ card.
Note Because the TCC+ card is considered a half-height card, use slot numbers 1/1 or 2/1 to display information for the TCC+ card using the show diag command. The show cable clock and show controllers clock-reference commands also use these slot numbers when displaying clock-related information.
For more information about the TCC+ card, refer to the troubleshooting section in the Cisco uBR10012 Universal Broadband Router TCC+ Card guide.
Troubleshooting the DOCSIS Timing, Communication, and Control Card
The DOCSIS Timing, Communication, and Control (DTCC) card performs the following functions:
•In the default DTI mode, a 10.24 MHz clock and a 32-bit DOCSIS timestamp are generated by the DTI server, propagated to the DTI client using the DTI protocol, and distributed by the DTI client to each cable interface line card.
•Allows software to independently power off any or all cable interface line cards.
•Drives the LCD panel used to display system configuration and status information.
•Monitors the supply power usage of the chassis.
•Connects through two RJ-45 cables to the DTI server, which, in turn, can generate the clock using its own oscillator or external timing reference inputs such as a GPS or network clock.
Note In Cisco IOS Release 12.2(33)SCB and later releases, you must ensure that two DTCC cards are installed and configured on the Cisco uBR10012 router before installing the line cards or the shared port adaptor (SPA). Installing and configuring a single DTCC card on a Cisco uBR10012 router is not supported in Cisco IOS Release 12.2(33)SCB and later.
For more information about the DTCC card, refer to the troubleshooting section in the Cisco uBR10012 Universal Broadband Router DTCC Card guide.
Troubleshooting the Gigabit Ethernet Line Card
The Cisco Half-Height Gigabit Ethernet (HHGE) line card contains a single Gigabit Ethernet port that provides a trunk uplink to switches and core routers. The Cisco HHGE line card provides the Cisco uBR10012 universal broadband router with an IEEE 802.3z-compliant Ethernet interface that can run at up to 1 Gbps in full-duplex mode. The line card uses a small form-factor pluggable (SFP) Gigabit Interface Converter (GBIC) module that supports a variety of Gigabit Ethernet interface types (SX, LX/LH, and ZX), which you can change or upgrade at any time.
For more information about the HHGE line card, refer to the troubleshooting section in the Cisco uBR10012 Universal Broadband Router Gigabit Ethernet Half-Height Line Card Installation guide.
Troubleshooting the Cable Interface Line Cards
The cable interface line cards, together with external IF-to-RF upconverters, serve as the RF interface between the cable headend and DOCSIS/EuroDOCSIS-based cable modems.
Note For troubleshooting information about obsolete line cards, see Cisco uBR10012 Universal Broadband Router Troubleshooting Guide.
Troubleshooting the Cisco uBR10-MC5X20U/H Cable Interface Line Card
The Cisco uBR10-MC5X20 U and H cable interface line cards are 20 by 16 inch cards designed specifically for the Cisco uBR10012 router. These cards transmit and receive RF signals between the subscriber and the headend over a hybrid fiber-coaxial (HFC) system.
For more information about this interface processor, refer to the troubleshooting sections in the following guides:
•Cisco uBR10-MC5X20U/H Cable Interface Line Card Hardware Installation Guide
•Configuring the Cisco uBR10-MC5X20U/H Broadband Processing Engine
Troubleshooting the Cisco UBR-MC20X20V Cable Interface Line Card
The Cisco UBR-MC20X20V cable interface line card is a 20 by 16 inch card designed specifically for the Cisco uBR10012 universal broadband router. This card transmits and receives RF signals between the subscriber and the headend over a hybrid fiber-coaxial (HFC) system.
For more information about this line card, refer to the troubleshooting sections in the following guides:
•Cisco UBR-MC20X20V Cable Interface Line Card Hardware Installation Guide
•Configuring the Cisco UBR-MC20X20V Cable Interface Line Card
Troubleshooting the Cisco uBR-MC3GX60V Cable Interface Line Card
The Cisco uBR-MC3GX60V line card is a high-density Modular CMTS line card that provides 72 Annex B downstream channels and 60 upstream channels. It was introduced to provide increased bandwidth to the cable modems.
Its front panel has a four-character alphanumeric display that shows the licensing status information of the upstream and downstream ports. The first two characters represent the downstream license count and the last two characters represent the upstream license count of the line card.
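When the front-panel reading is recorded during troubleshooting, the two license counts can be separated as described above. This short Python sketch assumes that all four characters are decimal digits, which the text does not state explicitly. The function name is hypothetical.

```python
def parse_license_display(display):
    """Split a four-character display reading into (downstream, upstream).

    Assumes the reading consists of four decimal digits (an assumption;
    the display is described only as alphanumeric).
    """
    if len(display) != 4 or not (display.isascii() and display.isdigit()):
        raise ValueError("expected four decimal digits, got %r" % (display,))
    downstream = int(display[:2])  # first two characters
    upstream = int(display[2:])    # last two characters
    return downstream, upstream
```

For example, a reading of 7260 would correspond to 72 downstream and 60 upstream licenses under this assumption.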
For more information about this line card, refer to the troubleshooting sections in the following guides:
•Cisco uBR-MC3GX60V Cable Interface Line Card Hardware Installation Guide
•Configuring the Cisco uBR-MC3GX60V Cable Interface Line Card
Troubleshooting the SIP and SPA Interface Modules
SIPs and SPAs are a carrier card and port adapter architecture that increases modularity, flexibility, and density across Cisco routers for network connectivity. A SIP is a carrier card that inserts into a router slot like a line card. It provides no network connectivity on its own. A SPA is a modular type of port adapter that inserts into a bay (subslot) of a compatible SIP carrier card to provide network connectivity and increased interface port density. A SIP can hold one or more SPAs, depending on the SIP type.
Cisco Wideband SIP
On a Cisco uBR10012 router, the Cisco Wideband SIP occupies two full-height line card slots: either slots 1/0 and 2/0, or slots 3/0 and 4/0.
For more information about this interface processor, refer to the troubleshooting sections in the following guides:
•Cisco uBR10012 Universal Broadband Router SIP and SPA Hardware Installation Guide
•Cisco uBR10012 Universal Broadband Router SIP and SPA Software Configuration Guide
Cisco Wideband SPA
The Cisco Wideband SPA is a single-wide, half-height shared port adapter that applies Cisco Wideband Protocol for a DOCSIS Network formatting to downstream data packets. The Cisco Wideband SPA is used for downstream data traffic only.
For more information about this shared port adapter, refer to the troubleshooting sections in the following guides:
•Cisco uBR10012 Universal Broadband Router SIP and SPA Hardware Installation Guide
•Cisco uBR10012 Universal Broadband Router SIP and SPA Software Configuration Guide