Troubleshooting Router Hangs

Introduction

This document helps you troubleshoot a router that does not respond. It also discusses the likely causes and how you can eliminate them.

A router appears to stop working when the system is not responsive to the console or to queries sent from the network (for example, Telnet, Simple Network Management Protocol (SNMP), and so on). These problems can be classified into two broad categories:

  • When the console does not respond.

  • When traffic does not go through.

Prerequisites

Requirements

There are no specific requirements for this document.

Components Used

The information in this document is based on these software and hardware versions:

  • All Cisco IOS® software versions

  • All Cisco routers

This document does not apply to Cisco Catalyst switches or MGX platforms.

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

Conventions

For more information on document conventions, refer to the Cisco Technical Tips Conventions.

Console is Not Responsive

Console problems occur when the router becomes unresponsive to input at the console port. An unresponsive console means that a higher priority process prevents the console driver from responding to input.

Steps to Troubleshoot

  • Verify cable connectivity.

  • Verify that the power supply is on.

  • Verify the router LED status. If all LEDs are down, it is most likely an issue with the power supply of the router.

If traffic still flows through the router:

  • Disconnect network interfaces and see if the router responds. A router that is busy forwarding traffic often gives that work priority over servicing exec sessions.

  • You can also attempt to reproduce the problem after you issue these commands:

    On the Cisco 7200 and 7500 Series:

    configure terminal
    scheduler allocate 3000 1000
    ^Z
    

    The scheduler allocate command guarantees CPU time for low priority processes. In this example, it limits fast switching to a maximum of 3000 microseconds (usec) per network interrupt context, and guarantees at least 1000 usec at process level while network interrupts are disabled.

    On all other platforms, use:

    configure terminal 
    scheduler interval 500 
    ^Z 
    

    The scheduler interval command allows low priority processes to be scheduled every 500 usec, and thereby allows some commands to be typed even if CPU usage is at 100%.

    Check the Basic System Management Commands in the Cisco IOS Software Command Reference for more information on these commands.

  • If the console does not respond because the router CPU utilization is high, it is important to find and correct the cause of the high CPU utilization. For example, if process-switched IP traffic causes problems, then this is reflected in the "IP Input" process in the output from the show processes cpu command. In this situation, it is important to collect the output from show interfaces, show interfaces stat, and possibly show processes to further diagnose the problem. To fix the problem, you would need to reduce the amount of IP traffic that is process switched. See Troubleshooting High CPU Utilization on Cisco Routers for more information.

  • Another possible cause of an apparent hang is memory allocation failure; that is, either the router has used all available memory, or the memory has been fragmented into such small pieces that the router cannot find a usable available block. For more information, see Troubleshooting Memory Problems.

  • The router can stop responding due to a security-related problem, such as a worm or virus. This is especially likely if there have been no recent changes to the network, such as a Cisco IOS upgrade on the router. Usually, a configuration change, such as adding lines to your access lists, can mitigate the effects of this problem. The Cisco Security Advisories and Notices page contains information on detection of the most likely causes and specific workarounds.

  • If the router appears to freeze during the bootup process, it can be the result of an improperly configured feature or of a software defect in a configured feature. This is often evident from the appearance of a warning or error message on the console immediately before the router freezes.

    As a workaround to this problem, boot the router into ROMMON, bypass the stored configuration, and then configure the router again. Complete these steps:

    1. Attach a terminal or PC with terminal emulation to the console port of the router.

      Use these terminal settings:

      • 9600 baud rate

      • No parity

      • 8 data bits

      • 1 stop bit

      • No flow control

    2. Reboot the router and break into ROMMON by pressing Break on the terminal keyboard within 60 seconds of power-up. If the break sequence does not work, see Standard Break Key Sequence Combinations During Password Recovery for other key combinations.

    3. Change the configuration register to 0x2142 and then reset the router. For this, execute the confreg 0x2142 command at the rommon 1> prompt. Then type reset at the rommon 2> prompt. This causes the router to boot from Flash without loading the configuration.

    4. Type no after each setup question or press Ctrl-C to skip the initial setup procedure.

    5. Type enable at the Router> prompt.

      You are in enable mode, and see the Router# prompt.

    6. To save an empty configuration (all commands removed), issue the copy running-config startup-config command. Alternatively, if you suspect that a certain command causes the problem, you can edit the stored configuration instead. To do so, issue the copy startup-config running-config command. Then type configure terminal, and make the changes.

    7. When finished, change the configuration register back to 0x2102. For this, type config-register 0x2102 in global configuration mode. Issue the copy running-config startup-config command to commit the changes.
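
For the high CPU case in the list above, the first line of show processes cpu output reports total utilization followed by the share spent at interrupt level. The Python sketch below splits that line into its parts. It is an illustration only: the function name parse_cpu_line and the sample line are assumptions made for this example, not output captured from a real router, and the line layout is assumed to match the common Cisco IOS format.

```python
import re

# Assumed layout of the first line of "show processes cpu" output.
CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu_line(line):
    """Return the utilization figures from one CPU summary line."""
    match = CPU_LINE.search(line)
    if match is None:
        raise ValueError("not a CPU utilization line: %r" % line)
    total, interrupt, one_min, five_min = (int(g) for g in match.groups())
    return {
        "total": total,            # five-second total utilization
        "interrupt": interrupt,    # share spent at interrupt level
        # Process-level load is the total minus the interrupt-level share.
        "process": total - interrupt,
        "one_minute": one_min,
        "five_minutes": five_min,
    }


if __name__ == "__main__":
    # Hypothetical sample line, written for this example.
    sample = ("CPU utilization for five seconds: 99%/12%; "
              "one minute: 97%; five minutes: 95%")
    print(parse_cpu_line(sample))
```

A large process figure next to a small interrupt figure is consistent with load from processes such as IP Input, though the per-process lines of the same command are still needed to confirm which process is responsible.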

If traffic does not flow through the router:

  • If traffic no longer passes through the router and the console is unresponsive, the system itself is probably hung. Generally this means that the router is caught in a continuous loop or stuck at a function. This is almost always caused by a bug in the software. Install the most recent maintenance release of the Cisco IOS software train you currently run.

    Before you create a service request with the Cisco TAC, obtain a stack trace from ROM Monitor. Obtaining stack traces during a problem makes it possible to determine where in the code the router is looping or stuck.

Traffic Does Not Pass Through

Traffic problems occur when the console remains responsive but traffic does not pass through the router. In this case, part of the traffic or some of the interfaces do not respond. Because the console still works, you can collect information from the router through the console port. The causes range from errors on the interfaces to software and hardware problems.

Possible Causes

  • Routing issue – Changes in the network topology or in the configuration of some routers could have affected the routing tables.

  • High CPU Utilization – Issue the show processes cpu command. If the CPU is above 95%, the performance of the router can be affected, and packets can be delayed or dropped. Refer to Troubleshooting High CPU Utilization on Cisco Routers for more information.

  • Interface down – One of the router interfaces can be down. There are multiple events that could cause this, which can range from a wrong configuration command to a hardware failure of the interface or the cable. If some interfaces appear down when you issue a show interfaces command, try to find out what caused it.

  • Wedged interfaces – This is a particular case of buffer leaks that causes the input queue of an interface to fill up to the point where it can no longer accept packets. A reload of the router frees the input queue and restores traffic until the queue fills again, which can take anywhere from a few seconds to a few weeks, based on the severity of the leak.

    The easiest way to identify a wedged interface is to issue a show interfaces command and look for an input queue whose size exceeds its maximum, as in this line (76 packets against a limit of 75):

    Output queue 0/40, 0 drops; input queue 76/75, 27 drops 

    See Troubleshooting Buffer Leaks for detailed guidelines and examples.
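
The input queue check described for wedged interfaces can be automated over captured show interfaces output. The Python sketch below is a minimal illustration that handles only the line format shown in this document. The function name input_queue_state and the rule "size greater than maximum" are assumptions drawn from the sample line, not a Cisco-defined test.

```python
import re

# Matches the input queue counters in a line such as:
#   Output queue 0/40, 0 drops; input queue 76/75, 27 drops
INPUT_QUEUE = re.compile(r"input queue (\d+)/(\d+), (\d+) drops", re.IGNORECASE)


def input_queue_state(line):
    """Return input queue counters from one line, or None if absent."""
    match = INPUT_QUEUE.search(line)
    if match is None:
        return None
    size, limit, drops = (int(g) for g in match.groups())
    return {
        "size": size,
        "max": limit,
        "drops": drops,
        # Assumption: a queue holding more packets than its limit is suspect.
        "suspect_wedged": size > limit,
    }


if __name__ == "__main__":
    line = "Output queue 0/40, 0 drops; input queue 76/75, 27 drops"
    print(input_queue_state(line))
```

A single reading cannot separate a leak from a short traffic burst, so it is worth repeating the check on several captures taken some time apart before you conclude that an interface is wedged.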

Obtain a Stack Trace from ROM Monitor

K-trace refers to the procedure used to obtain a stack trace from the router while it is in ROM Monitor. On routers with older ROM Monitor code, a stack trace is obtained with the k command. On routers that run more recent ROM Monitor code, the stack command can also be used.

Complete these steps to obtain stack traces from a router that does not respond:

  1. Enable the break sequence. For this, change the configuration register value. Bit 8 (0x0100) must be set to zero so that break is not ignored. A value of 0x2002 works.

    Router#configure terminal
    Enter configuration commands, one per line.  End with CNTL/Z.
    Router(config)#config-register 0x2002
    
  2. Reload the router so that the new configuration register value is used.

  3. Send the break sequence when the problem occurs. The ROM Monitor prompt ">" or "rommon 1 >" should then be displayed.

  4. Capture a stack trace. For this, collect the output from either the k 50 or the stack 50 command. The 50 argument prints a longer stack trace.

  5. Issue the c or cont command to continue.

  6. Repeat the last three steps (3 through 5) several times to ensure that multiple points in a continuous loop have been captured.

  7. After you have obtained several stack traces, reboot the router to recover from the hung state.
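
Step 1 relies on individual bits of the configuration register. The Python sketch below decodes only the bits this document depends on, for the three register values it mentions. The function name describe_confreg is an assumption made for this example, and other register bits are deliberately left out.

```python
# Only the configuration register bits this document relies on are decoded.
BREAK_DISABLED = 0x0100  # bit 8: when set, break is ignored after boot
IGNORE_CONFIG = 0x0040   # bit 6: when set, the stored configuration is bypassed
BOOT_FIELD = 0x000F      # bits 0-3: boot field


def describe_confreg(value):
    """Summarize the break, ignore-config and boot field settings."""
    return {
        "break_enabled": not (value & BREAK_DISABLED),
        "ignores_startup_config": bool(value & IGNORE_CONFIG),
        "boot_field": value & BOOT_FIELD,
    }


if __name__ == "__main__":
    for reg in (0x2002, 0x2102, 0x2142):
        print(hex(reg), describe_confreg(reg))
```

Run this way, the sketch reports that 0x2002 differs from 0x2102 only in the break bit, and that 0x2142 differs from 0x2102 only in the bit that bypasses the stored configuration.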

Here is an example of this procedure:

User break detected at location 0x80af570
rommon 1 > k 50
Stack trace:
PC = 0x080af570
Frame 00: FP = 0x02004750    RA = 0x0813d1b4
Frame 01: FP = 0x02004810    RA = 0x0813a8b8
Frame 02: FP = 0x0200482c    RA = 0x08032000
Frame 03: FP = 0x0200483c    RA = 0x040005b0
Frame 04: FP = 0x02004b34    RA = 0x0401517a
Frame 05: FP = 0x02004bf0    RA = 0x04014d9c
Frame 06: FP = 0x02004c00    RA = 0x040023d0
Frame 07: FP = 0x02004c68    RA = 0x04002e9e
Frame 08: FP = 0x02004c78    RA = 0x040154fe
Frame 09: FP = 0x02004e68    RA = 0x04001fc0
Frame 10: FP = 0x02004f90    RA = 0x0400c41e
Frame 11: FP = 0x02004fa4    RA = 0x04000458
Suspect bogus FP = 0x00000000, aborting
rommon 2 > cont

Repeat this procedure several times in the event of a system problem to collect multiple instances of the stack trace.
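
Once several captures exist, it can help to see which return addresses appear in every one of them, since those frames are candidates for the code path the router is stuck in. The Python sketch below is one way to do that, assuming the Frame line layout shown in the example above. The function names return_addresses and common_addresses are hypothetical, and the sketch does not map addresses to function names.

```python
import re

# Matches lines such as: Frame 00: FP = 0x02004750    RA = 0x0813d1b4
FRAME = re.compile(
    r"Frame (\d+): FP = (0x[0-9a-fA-F]+)\s+RA = (0x[0-9a-fA-F]+)"
)


def return_addresses(trace_text):
    """Return the RA values, in frame order, from one k or stack capture."""
    return [int(m.group(3), 16) for m in FRAME.finditer(trace_text)]


def common_addresses(traces):
    """Return RA values found in every capture, ordered as in the first."""
    parsed = [return_addresses(text) for text in traces]
    if not parsed:
        return []
    shared = set(parsed[0]).intersection(*(set(p) for p in parsed[1:]))
    # dict.fromkeys drops repeated addresses while keeping frame order.
    return [ra for ra in dict.fromkeys(parsed[0]) if ra in shared]


if __name__ == "__main__":
    first = ("Frame 00: FP = 0x02004750    RA = 0x0813d1b4\n"
             "Frame 01: FP = 0x02004810    RA = 0x0813a8b8\n")
    # Hypothetical second capture, written for this example.
    second = ("Frame 00: FP = 0x02004750    RA = 0x0813d1b4\n"
              "Frame 01: FP = 0x02004810    RA = 0x08032000\n")
    print([hex(ra) for ra in common_addresses([first, second])])
```

Shared addresses only narrow the search. The full, unmodified captures should still be attached to the service request.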

When a router does not respond, it is almost always a software problem. In this case, collect as much information as possible, including the stack trace, before you open a TAC service request. It is also important to include output from the show version, show running-config, and show interfaces commands.

Information to Collect if You Open a TAC Service Request

If you open a TAC Service Request, please attach the following information to your request for troubleshooting Router Hangs:

  • Troubleshooting performed before opening the case

  • show technical-support output (in enable mode if possible)

  • show log output or console captures if available

  • stack trace from ROM Monitor

Please attach the collected data to your case in non-zipped, plain text format (.txt). You can attach information to your case by uploading it using the TAC Service Request Tool (registered customers only). If you cannot access the TAC Service Request Tool, you can attach the relevant information to your case by sending it to attach@cisco.com with your case number in the subject line of your message.

Note: If the console is responsive, please do not manually reload or power-cycle the router before collecting the above information, unless required to troubleshoot router hangs, as this can cause important information to be lost that is needed for determining the root cause of the problem.

Updated: Aug 02, 2006
Document ID: 15105