Hardware Failure Checklist for Catalyst 4500/4000, 5500/5000, and 6500/6000 Series Switches Running CatOS

Document ID: 8636

Updated: Aug 30, 2005

Introduction

This document provides general guidelines for determining if there is a hardware failure on a Catalyst switch. The checklist below applies to the Catalyst 4500/4000, 5500/5000, and 6500/6000 series switches running Cisco Catalyst OS (CatOS) software. The goal is to help Cisco customers identify basic hardware issues, or to perform more extensive troubleshooting prior to contacting Cisco Technical Support.

You may also refer to these related troubleshooting documents for further assistance:

Before You Begin

Conventions

For more information on document conventions, see the Cisco Technical Tips Conventions.

Prerequisites

There are no specific prerequisites for this document.

Components Used

The information in this document is based on the commands available in all versions of software for the Catalyst 4500/4000, 5500/5000, and 6500/6000 switches.

  • The only exception to this is the set test diaglevel command, which was introduced in CatOS software release 5.4(1).

  • If you run Cisco IOS® Software on Catalyst 6500/6000 or Catalyst 4500/4000 series switches, the show and set commands used in CatOS do not work. However, the steps used in this document apply to switch hardware and can be applied using the comparable Cisco IOS Software command.

The information presented in this document was created from devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If you are working in a live network, ensure that you understand the potential impact of any command before using it.

Checklist

The following is an orderly troubleshooting process that helps you gather the information necessary to resolve the problem. Narrowing the scope of the problem saves valuable time in finding a solution, and following the steps in order ensures that important data is not lost.

Check Switch Environment

View the output of the show system command for any failures. The status fields relate to the various LEDs on the system components. All of the LEDs on the system should be green; an LED that is not green could indicate a failure. To determine whether a component is failing, it is important to understand the Catalyst switch family components and what the LEDs are telling you.

The status LED on the Supervisor Engine indicates whether the Supervisor Engine has passed all diagnostic tests. The Supervisor Engine contains the system operating software, so check the Supervisor Engine if you have trouble with the system software. Have a console session open and determine whether the Supervisor Engine is in boot or ROM monitor (ROMmon) mode. If the switch is stuck in boot or ROMmon mode, follow the troubleshooting steps in the Recovering Catalyst Switches Running CatOS from Booting Failures document.

Result

The show system command gives you valuable environment and system information for the Catalyst switch. The command output also displays the uptime, which is the amount of time that the switch has been up and running. This information is useful in determining when a switch failure may have occurred.

If you have the output of a show system command from your Cisco device, you can use Output Interpreter (registered customers only) to display potential issues and fixes.

Sample Output

Console (enable) show system
PS1-Status PS2-Status 
---------- ---------- 
ok         none       

Fan-Status Temp-Alarm Sys-Status Uptime d,h:m:s Logout
---------- ---------- ---------- -------------- ---------
faulty     off        faulty     18,22:37:24    20 min

PS1-Type     PS2-Type     
------------ ------------ 
WS-CAC-1300W none         

Modem   Baud  Traffic Peak Peak-Time
------- ----- ------- ---- -------------------------
disable  9600   0%      0% Fri May 24 2002, 07:04:29

PS1 Capacity: 1153.32 Watts (27.46 Amps @42V)

System Name              System Location          System Contact           CC
------------------------ ------------------------ ------------------------ ---
                                                                              
Console (enable)  

Note: If any field reports a failure, such as the faulty Fan-Status above, inspect the fan assembly and power supplies for any problems.

PS#-Status: indicates the status of the power supplies in the chassis. A failing power supply, or a power supply that is not receiving power, can cause the system status to show as faulty, which might look like a failure on the Supervisor Engine module. If the switch has an orange system LED and a PS#-Status of faulty, this does not necessarily mean that the power supply or switch is faulty. It may instead indicate that one of the power supplies is not inserted correctly or is not plugged in.

Note: The Catalyst 4006 requires two power supplies installed to operate the switch, and an additional power supply for redundancy. For more information, review the Standard Equipment section of the Key Features of the Catalyst 4000 Family Switches document.

Fan-Status: if this indicates a problem, the system might overheat, which can cause problems with the operation of the switch.

Sys-Status: indicates whether any failure has been detected in the system.

Traffic and Peak: these fields indicate the load on the Catalyst backplane. Record this utilization while the network is running normally. When a problem occurs later, compare the current values against that baseline to see whether events that generate heavy traffic, such as Spanning Tree Protocol (STP) loops or broadcast storms, are causing other devices to experience slower performance.
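
The status fields described above can also be checked by a script once the show system output has been captured to text. The following Python sketch is illustrative only and is not a Cisco tool. The function names (parse_tables, faulty_fields, parse_uptime) are hypothetical, and the code assumes the fixed-width layout seen in the sample output, where a dashed ruler line sits between each header row and its value row.

```python
import re
from datetime import timedelta

# Excerpt of the sample show system output from this document.
SAMPLE = """\
PS1-Status PS2-Status
---------- ----------
ok         none

Fan-Status Temp-Alarm Sys-Status Uptime d,h:m:s Logout
---------- ---------- ---------- -------------- ---------
faulty     off        faulty     18,22:37:24    20 min
"""


def parse_tables(text):
    """Map each column header to the value printed below its dashed ruler."""
    fields = {}
    lines = text.splitlines()
    for i in range(1, len(lines) - 1):
        ruler = lines[i].strip()
        # A ruler line consists only of dashes and spaces.
        if not ruler or set(ruler) - {"-", " "}:
            continue
        # Each run of dashes marks the character span of one column.
        for span in re.finditer(r"-+", lines[i]):
            header = lines[i - 1][span.start():span.end()].strip()
            value = lines[i + 1][span.start():span.end()].strip()
            if header:
                fields[header] = value
    return fields


def faulty_fields(fields):
    """Return the names of the status fields that report 'faulty'."""
    return [name for name, value in fields.items()
            if name.endswith("-Status") and value == "faulty"]


def parse_uptime(value):
    """Convert an uptime in d,h:m:s form into a timedelta."""
    days, clock = value.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return timedelta(days=int(days), hours=hours,
                     minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    fields = parse_tables(SAMPLE)
    print("Faulty fields:", faulty_fields(fields))
    print("Uptime:", parse_uptime(fields["Uptime d,h:m:s"]))
```

Subtracting the parsed uptime from the time at which the output was captured gives an approximate time of the last restart, which can then be compared with the time the failure was noticed.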

Verify Hardware Operation

View the diagnostic results for the suspected failing module by issuing the show test mod command.

Note: The show test command may show you a diaglevel entry. If this diaglevel is set to bypass or minimal, you can change this by issuing the set test diaglevel complete command, and resetting the module so that the self test occurs. The set test diaglevel complete command executes all self tests available, whereas the minimal and bypass options skip some or all of the tests.

Result

If you see an F in the output of the show test command, this indicates that this part might be suffering from a hardware failure.

Sample 1: Catalyst 4000

Galaxy> (enable) show test 1 

Diagnostic mode (mode at next reset: complete) 
  

Module 1 : 2-port 1000BaseX Supervisor 
 POST Results 
 Network Management Processor (NMP) Status: (. = Pass, F = Fail, U = Unknown) 
 Galaxy Supervisor Status : . 
 CPU Components Status 
   Processor              : . 
   DRAM                   : . 
   RTC                    : . 
   EEPROM                 : . 
   FLASH                  : . 
   NVRAM                  : . 
   Temperature Sensor     : . 
 Uplink Port 1            : . 
 Uplink Port 2            : . 
 Me1  Status              : . 
 EOBC Status              : . 

 SCX1000 - 0 
   Register               : . 
   Switch Sram            : . 
   Switch Gigaports 
    0: .   1: .   2: .   3: . 
    4: .   5: .   6: .   7: . 
    8: .   9: .  10: .  11: . 
 SCX1000 - 1 
   Register               : . 
   Switch SRAM            : . 
   Switch Gigaports 
    0: .   1: .   2: .   3: . 
    4: .   5: .   6: .   7: . 
    8: .   9: .  10: .  11: . 
 SCX1000 - 2 
   Register               : . 
   Switch SRAM            : . 
   Switch Gigaports 
    0: .   1: .   2: .   3: . 
    4: .   5: .   6: .   7: . 
    8: .   9: .  10: .  11: . 

Galaxy> (enable) show test 5 

Diagnostic mode (mode at next reset: complete) 
  

Module 5 : 14-port 1000 Ethernet 
Status: (. = Pass, F = Fail, U = Unknown) 
  Eeprom: . 
  NICE Regs: 
    Ports 1-4 : .    Ports 5-8 : .    Ports 9-12 : . 
  NICE SRAM: 
    Ports 1-4 : .    Ports 5-8 : .    Ports 9-12 : . 

  1000Base Loopback Status: 
  Ports   1  2  3  4  5  6  7  8  9 10 11 12 13 14 
         ----------------------------------------- 
          .  .  .  .  .  .  .  .  .  .  .  .  .  . 

Sample 2: Catalyst 5000

Sacal> show test 1 

Module 1 : 2-port 100BaseFX MM Supervisor 
Network Management Processor (NMP) Status: (. = Pass, F = Fail, U = Unknown) 
  ROM:  .   Flash-EEPROM: .   Ser-EEPROM: .   NVRAM: .   MCP Comm: . 

  EARL Status : 
        NewLearnTest:         . 
        IndexLearnTest:       . 
        DontForwardTest:      . 
        MonitorTest           . 
        DontLearn:            . 
        FlushPacket:          . 
        ConditionalLearn:     . 
        EarlLearnDiscard:     . 
        EarlTrapTest:         . 
  

Line Card Diag Status for Module 1  (. = Pass, F = Fail, N = N/A) 
 CPU         : .    Sprom    : .    Bootcsum : .    Archsum  : N 
 RAM         : .    LTL      : .    CBL      : .    DPRAM    : .   SAMBA : . 
 Saints      : .    Pkt Bufs : .    Repeater : N    FLASH    : N 

  MII Status: 
  Ports 1  2 
  ----------- 
        N  N 

 SAINT/SAGE Status : 
  Ports 1  2  3 
  -------------- 
        .  .  . 

 Packet Buffer Status : 
  Ports 1  2  3 
  -------------- 
        .  .  . 

 Loopback Status [Reported by Module 1] : 
  Ports  1  2  3 
  -------------- 
         .  .  . 

 Channel Status : 
  Ports 1  2 
  ----------- 
        .  . 

Sample 3: Catalyst 6500

tamer>(enable) show test 1 

Diagnostic mode: complete   (mode at next reset: minimal) 
  
Module 1 : 2-port 1000BaseX Supervisor 
Network Management Processor (NMP) Status: (. = Pass, F = Fail, U = Unknown) 
  ROM:  .   Flash-EEPROM: .   Ser-EEPROM: .   NVRAM: .   EOBC Comm: . 

Line Card Status for Module 1 : PASS 

Port Status : 
  Ports 1  2 
  ----------- 
        .  . 

Line Card Diag Status for Module 1  (. = Pass, F = Fail, N = N/A) 

 Module 1 
  Earl V Status : 
        NewLearnTest:             . 
        IndexLearnTest:           . 
        DontForwardTest:          . 
        DontLearnTest:            . 
        ConditionalLearnTest:     . 
        BadBpduTest:              . 
        TrapTest:                 . 
        MatchTest:                . 
        SpanTest:                 . 
        CaptureTest:              . 
        ProtocolMatchTest:        . 
        IpHardwareScTest:         . 
        IpxHardwareScTest:        . 
        MultipleRoutersScTest:    . 
        L3DontScTest:             . 
        L3RedirectTest:           . 
        L3Capture2Test:           . 
        L3VlanMetTest:            . 
        AclPermitTest:            . 
        AclDenyTest:              . 

 Loopback Status [Reported by Module 1] : 
  Ports 1  2 
  ----------- 
        .  . 

 Channel Status : 
  Ports 1  2 
  ----------- 
        .  . 

 InlineRewrite Status : 
  Ports 1  2 
  ----------- 
        .  . 
tamer>(enable) show test 3 

Diagnostic mode: complete   (mode at next reset: minimal) 
  

Module 3 : 48-port 10/100BaseTX Ethernet 

Line Card Status for Module 3 : PASS 

Port Status : 
  Ports 1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
  ----------------------------------------------------------------------------- 
        .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 
        25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 
       ------------------------------------------------------------------------ 
        .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 

Line Card Diag Status for Module 3  (. = Pass, F = Fail, N = N/A) 

 Loopback Status [Reported by Module 1] : 
  Ports 1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
  ----------------------------------------------------------------------------- 
        .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 
  Ports 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 
  ----------------------------------------------------------------------------- 
        .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 

 Channel Status : 
  Ports 1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
  ----------------------------------------------------------------------------- 
        .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 
  Ports 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 
  ----------------------------------------------------------------------------- 
        .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 

 InlineRewrite Status : 
  Ports 1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
  ----------------------------------------------------------------------------- 
        .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 
  Ports 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 
  ----------------------------------------------------------------------------- 
        .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 
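
Because the show test output can run to many lines, a captured copy can be scanned for F results with a small script. The following Python sketch is a heuristic, not a Cisco utility. The find_failures name is hypothetical, and the excerpt is adapted from Sample 1 with the DRAM result changed by hand from a pass to F purely for illustration; no such failure appears in the original sample.

```python
import re

# Matches the legend, for example "(. = Pass, F = Fail, U = Unknown)",
# so that the "F" inside it is not mistaken for a test result.
LEGEND = re.compile(r"\([^)]*=\s*Pass[^)]*\)")
# Matches an "F" that stands alone as its own whitespace-delimited token.
FAIL_TOKEN = re.compile(r"(?<!\S)F(?!\S)")

# Adapted from Sample 1. The DRAM line is altered by hand (hypothetical).
EXCERPT = """\
Module 1 : 2-port 1000BaseX Supervisor
 Network Management Processor (NMP) Status: (. = Pass, F = Fail, U = Unknown)
 CPU Components Status
   Processor              : .
   DRAM                   : F
   RTC                    : .
"""


def find_failures(show_test_output):
    """Return (line number, line text) for each line containing an F result."""
    hits = []
    for number, line in enumerate(show_test_output.splitlines(), start=1):
        body = LEGEND.sub("", line)
        if FAIL_TOKEN.search(body):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    for number, line in find_failures(EXCERPT):
        print(f"line {number}: {line}")
```

For results printed in port tables, the sketch reports only the row that contains the F; match the column position against the port header row above it to identify the port.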

Review Error Messages

View the output of the show logging buffer command for any error messages displayed around the time you encountered the switch failure.

Result

The Catalyst switch may log messages that describe events that have taken place in the switch. Review this output and look up the meaning of any messages in the Message and Recovery Procedures document. This can give you an indication of what exactly went wrong at the time of the failure, and enable you to verify whether the problem is hardware or software related. Use the Error Message Decoder (registered customers only) tool to help decipher the output of any messages.

Sample Output

The syslog may give the following error messages:

SYS-2-FAN_FAIL: Fan failed
SYS-2-MOD_TEMPMINORFAIL: Module 2 minor temperature threshold exceeded 

Looking up these messages in the Message and Recovery Procedures document shows that the first message indicates a failure of one or more fans in the system. The second message indicates that module 2 has exceeded its minor temperature threshold. In this case, you need to examine the fan module in order to resolve the problem.
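
The two sample messages follow the FACILITY-SEVERITY-MNEMONIC: text layout, which makes them easy to split into parts before looking them up. The following Python sketch shows one way to do that. The names parse_messages and most_severe_first are hypothetical, and the pattern is an assumption based only on the two messages shown above. Lower severity numbers are treated as more severe, following the usual syslog convention.

```python
import re

# FACILITY-SEVERITY-MNEMONIC: description
PATTERN = re.compile(
    r"(?P<facility>[A-Z][A-Z0-9_]*)-(?P<severity>[0-7])-"
    r"(?P<mnemonic>[A-Z0-9_]+):\s*(?P<text>.*)"
)

# The two sample messages from this document.
LOG = """\
SYS-2-FAN_FAIL: Fan failed
SYS-2-MOD_TEMPMINORFAIL: Module 2 minor temperature threshold exceeded
"""


def parse_messages(log_text):
    """Split each recognizable log line into its named parts."""
    messages = []
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            entry = match.groupdict()
            entry["severity"] = int(entry["severity"])
            entry["text"] = entry["text"].strip()
            messages.append(entry)
    return messages


def most_severe_first(messages):
    """Order messages so that the lowest severity number comes first."""
    return sorted(messages, key=lambda message: message["severity"])


if __name__ == "__main__":
    for message in most_severe_first(parse_messages(LOG)):
        print(message["severity"], message["mnemonic"], "-", message["text"])
```

The facility and mnemonic together (for example, SYS and FAN_FAIL) are the terms to search for in the Message and Recovery Procedures document.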

In addition to reviewing the error messages, it is a good idea to use the Bug Toolkit (registered customers only) to see if there are any known issues with the software release. The show version command provides the software version information to use for a bug search.

For example, if you identify an exception in the show log command output, use the Bug Toolkit (registered customers only) to search for bugs on your Catalyst platform, software version, and the exception from the show log.

Check Software Compatibility

Check the model number of the module you are having problems with and the software version you are using by issuing the show version command. Determine the total Dynamic Random-Access Memory (DRAM) and total Flash. Use the Software Advisor (registered customers only) or Product Overview for the particular platform to determine if the hardware is compatible with the software.

Result

This command verifies the version of software you are running. This command also has information about the size of Flash and DRAM. This is useful information should you need to upgrade.

Sample Output

Console (enable) show version
WS-C6509 Software, Version NmpSW: 5.5(5)
Copyright (c) 1995-2000 by Cisco Systems
NMP S/W compiled on Dec 14 2000, 17:05:38

System Bootstrap Version: 5.3(1)

Hardware Version: 2.0  Model: WS-C6509  Serial #: SCA0412024U

Mod Port Model               Serial #    Versions
--- ---- ------------------- ----------- --------------------------------------
1   2    WS-X6K-SUP1A-2GE    SAD04281END Hw : 3.2
                                         Fw : 5.3(1)
                                         Fw1: 5.4(2)
                                         Sw : 5.5(5)
                                         Sw1: 5.5(5)
         WS-F6K-PFC          SAD04340506 Hw : 1.1
3   8    WS-X6408-GBIC       SAD0415009A Hw : 2.4
                                         Fw : 5.1(1)CSX
                                         SW : 5.5(5)
4   48   WS-X6348-RJ-45      SAL0446200S Hw : 1.4
                                         Fw : 5.4(2)
                                         SW : 5.5(5)
15  1    WS-F6K-MSFC         SAD04120BNJ Hw : 1.4
                                         Fw : 12.1(8a)E2
                                         SW : 12.1(8a)E2

       DRAM                    FLASH                   NVRAM
Module Total   Used    Free    Total   Used    Free    Total Used  Free
------ ------- ------- ------- ------- ------- ------- ----- ----- -----
1       65408K  37463K  27945K  16384K  15673K    711K  512K  236K  276K

Uptime is 18 days, 21 hours, 54 minutes
Console (enable) 

If an upgrade is required, always check the release notes first for the particular platform and choose the version you need to upgrade to.
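
Before an upgrade, the DRAM and Flash figures from the show version output can be checked against the requirements given in the release notes. The following Python sketch reads the memory row from the sample output above. The names parse_memory_row and fits_in_flash are hypothetical, the code assumes the nine-column Total/Used/Free layout shown in the sample, and the image sizes in the usage lines are made-up values, not real CatOS image sizes.

```python
# Memory row for module 1, copied from the sample show version output.
ROW = "1       65408K  37463K  27945K  16384K  15673K    711K  512K  236K  276K"


def parse_memory_row(row):
    """Return total/used/free sizes in kilobytes for DRAM, FLASH, and NVRAM."""
    tokens = row.split()
    memory = {"module": int(tokens[0])}
    # Nine size columns follow the module number: three per memory type.
    sizes = [int(token.rstrip("K")) for token in tokens[1:10]]
    for index, name in enumerate(("dram", "flash", "nvram")):
        total, used, free = sizes[index * 3:index * 3 + 3]
        memory[name] = {"total": total, "used": used, "free": free}
    return memory


def fits_in_flash(memory, image_kb):
    """Check whether an image of image_kb kilobytes fits in the free Flash."""
    return image_kb <= memory["flash"]["free"]


if __name__ == "__main__":
    memory = parse_memory_row(ROW)
    print("Free Flash (K):", memory["flash"]["free"])
    # Hypothetical image sizes, for illustration only.
    print("700K image fits:", fits_in_flash(memory, 700))
    print("4096K image fits:", fits_in_flash(memory, 4096))
```

In the sample, Used plus Free equals Total for each memory type, which is a quick sanity check that the row was parsed correctly.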

Enable or Disable Port

If you are having problems with multiple ports, try disabling and then re-enabling the problem ports. A port can be enabled or disabled by issuing the set port {enable | disable} mod/port command.

Result

In some situations, the Catalyst switch might encounter problems with one port. Disabling and re-enabling this specific port might solve the problem.

Note: By disabling or enabling a port, you may also trigger some events on the device connected to that port (such as restarting a process on a server that is stuck). In most situations when disabling and re-enabling a port resolves your problem, this means that the problem is not hardware related. If this resolves the issue, reset this line card during a maintenance window so that the self test occurs.

Move Connection to Another Port

If you are having problems on one specific port, try moving the connection to another port. Use a station that you know to be working and connect it to the failing port.

Result

If the previous action of disabling and re-enabling the port does not clear the problem, move the connection to a different port on the same module (with the same configuration). If this solves the problem, this indicates that a hardware failure might have occurred on the original port. If the problem persists, it may be due to the configuration of the connected device. Verify that the port speed and duplex settings are the same on the switch port and the connected device.

For example, a station connected to port 1 on module 7 keeps going up and down. Try swapping the port 1 and port 2 connections on the same module (making sure that port 1 and port 2 have the same configuration). If the original station no longer has the problem on port 2, but the station now on port 1 starts flapping instead, this points to a problem with port 1. If the problem follows the original station to port 2, this could indicate an issue with the configuration, the connected device, or the cable.
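
The reasoning in the port-swap example can be written down as a small decision function. The following Python sketch is only a restatement of that logic. The interpret_port_swap name and its return strings are hypothetical, and the two cases that the text does not cover (both ports flapping, or neither) are labeled here as assumptions.

```python
def interpret_port_swap(flaps_on_original_port, flaps_on_new_port):
    """Interpret the result of swapping two connections on one module.

    flaps_on_original_port: the station now on the original port flaps.
    flaps_on_new_port: the original station still flaps on the new port.
    """
    if flaps_on_original_port and not flaps_on_new_port:
        # The problem stayed with the port.
        return "suspect the original port (possible hardware failure)"
    if flaps_on_new_port and not flaps_on_original_port:
        # The problem followed the station.
        return "suspect configuration, connected device, or cable"
    if flaps_on_original_port and flaps_on_new_port:
        # Not covered by the text; treated as inconclusive (assumption).
        return "inconclusive: problem seen on both ports"
    # Not covered by the text; nothing to diagnose (assumption).
    return "problem not reproduced"


if __name__ == "__main__":
    print(interpret_port_swap(True, False))
    print(interpret_port_swap(False, True))
```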

Reset Module

Have a console session open and capture bootup Power-On Self Test (POST) diagnostics and any system error messages. Reset the module by issuing the reset mod command.

Result

After resetting the module, if the line card comes back online, all the ports pass their diagnostic tests, and traffic starts passing, the module problem is probably due to a software issue. Issue the show test mod command to determine whether this module has passed all of its diagnostic tests on bootup. Note any F (fail) results.

Reseat the Line Card

Remove the module and inspect it for bent pins. To reseat the module, firmly press down the ejector levers, and tighten the installation screws.

Result

In some cases, a badly-seated card can cause symptoms that appear to be a hardware failure. A badly-seated card may cause traffic corruption on the backplane, which might result in various problems occurring in the Catalyst chassis. For example, if one module corrupts traffic on the Catalyst backplane, this can cause the self test to fail for both itself and other modules. Reseating all the cards can resolve this and allow the self tests to pass.

Eliminate Chassis Failure

Try removing all of the line cards in the chassis, except the active Supervisor Engine module and the problem module, to determine if the failure you encountered changes. If the failure continues, move the line card to a known working slot in the chassis.

Note: If the slot previously held a different type of module, save your configuration and issue the clear config module command.

Result

If one of the line cards in the chassis is faulty, it can cause a failure on other line cards as well. In that case, removing one card can resolve the problems seen on the other cards. If the module is still failing after you have removed all the other line cards and moved the line card to another slot, this may indicate that the line card is faulty. If another switch is available, try the module in another chassis to ultimately determine if it is a module or chassis problem.

If the module appears to be operating normally and passing traffic after removing the additional modules and moving the module to another slot, this may indicate a possible problem with the chassis. Try putting the module back in its original location and determine if the failures return. If the module appears to be operating normally and is passing traffic in the original location, this may indicate a software problem. Use the Bug Toolkit (registered customers only) to search for bugs on the Catalyst platform, software version, and the error you are experiencing.
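
The isolation steps above lead to one of a few conclusions, depending on where the module fails. The following Python sketch summarizes them. The interpret_isolation name and its return strings are hypothetical, and the case where failures return in the original slot is an inference from the text rather than something it states outright.

```python
def interpret_isolation(fails_in_new_slot, fails_in_original_slot=None):
    """Summarize the outcome of the chassis isolation test.

    fails_in_new_slot: the module still fails with the other line cards
        removed and after being moved to a known working slot.
    fails_in_original_slot: result after returning the module to its
        original slot, or None if that step has not been tried yet.
    """
    if fails_in_new_slot:
        return "line card may be faulty; try it in another chassis if available"
    if fails_in_original_slot is None:
        return "possible chassis problem; retest in the original slot"
    if fails_in_original_slot:
        # Inferred from the text: the failure is tied to the original slot.
        return "possible chassis or slot problem"
    return "possible software problem; search for known bugs"


if __name__ == "__main__":
    print(interpret_isolation(True))
    print(interpret_isolation(False))
    print(interpret_isolation(False, False))
```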

Reseat the Supervisor Engine

Remove the Supervisor Engine and inspect it for bent pins. Reseat the Supervisor Engine, firmly press down the ejector levers, and tighten the captive installation screws.

For the Catalyst 5500 and 6000 series switches, the Supervisor Engine can be installed in slot 1 or slot 2. To eliminate slot dependency problems, move the Supervisor Engine to the other slot available for the Supervisor Engine module. The Catalyst 5500 and 6000 series switches also support redundancy, which allows the switch to have dual Supervisor Engines. If you are running dual Supervisor Engines, try to force a switchover to the standby Supervisor Engine, either by unplugging the active Supervisor Engine or by issuing the reset mod command. For more information about Supervisor Engine slot requirements, refer to these links:

Result

Have a console session open and capture bootup POST diagnostics and any system error messages. Wait for the Supervisor Engine to initialize. If the Sys-Status field in the show system command output is still faulty, the Supervisor Engine has failed.

Check External Environment

Check what was happening at the time of the failure.

Result

If a failure keeps recurring, examine what else happens at that time and place. This might give you an indication of what is occurring outside of the Catalyst switch that causes it to fail. For example, lights flickering in the building could point to a short interruption of power.
