What Causes %SYS-3-CPUHOG Messages?

Document ID: 15093

Updated: Jun 24, 2008

Introduction

This document lists the causes of %SYS-3-CPUHOG error messages, and explains how to troubleshoot them.

Prerequisites

Requirements

There are no specific requirements for this document.

Components Used

This document is not restricted to specific software and hardware versions.

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

Conventions

For more information on document conventions, refer to the Cisco Technical Tips Conventions.

Background Information

To reduce the impact of runaway processes, Cisco IOS® software uses a process watchdog timer that allows the scheduler to periodically poll the currently active process. This feature is not the same as preemption. Instead, it is a fail-safe mechanism, which ensures that the system does not become unresponsive or completely lock up due to the total consumption of the CPU by any process.

If a process appears to hang (for example, if it continues to run for a long time), the scheduler can force the process to terminate.

Every time the scheduler allows a process to run on the CPU, it starts a watchdog timer for that process. After a preset period, if the process continues to run, the watchdog process generates an interrupt and causes a router restart by a "software forced crash" (the stack trace shows a watchdog process as the trigger of the crash).
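The watchdog idea can be sketched in Python as a toy model. This is not Cisco IOS code. The names FakeClock and run_scheduler are hypothetical, the 2000 ms default is the CPUHOG threshold that this document quotes for Cisco IOS processes, and the sketch measures the run time after a task yields instead of using a timer interrupt as a real watchdog does.

```python
from typing import Dict, Iterator, List, Tuple


class FakeClock:
    """Deterministic millisecond clock, so the sketch needs no real delays."""

    def __init__(self) -> None:
        self.now_ms = 0

    def advance(self, ms: int) -> None:
        self.now_ms += ms


def run_scheduler(
    tasks: Dict[str, Iterator[None]],
    clock: FakeClock,
    hog_threshold_ms: int = 2000,  # assumed threshold, taken from this document
) -> List[Tuple[str, int]]:
    """Run task generators round-robin and report tasks that hold the CPU too long.

    A task gives up the CPU only when it yields. The scheduler does not preempt;
    it only checks how long each task ran before it yielded.
    """
    warnings: List[Tuple[str, int]] = []
    ready = list(tasks.items())
    while ready:
        name, task = ready.pop(0)
        started = clock.now_ms
        try:
            next(task)                  # run until the task yields
            ready.append((name, task))  # still alive: queue it again
        except StopIteration:
            pass                        # task finished
        ran_ms = clock.now_ms - started
        if ran_ms > hog_threshold_ms:
            warnings.append((name, ran_ms))
    return warnings


def polite(clock: FakeClock) -> Iterator[None]:
    for _ in range(3):
        clock.advance(100)  # short burst of work, then yield
        yield


def hog(clock: FakeClock) -> Iterator[None]:
    clock.advance(2148)     # long stretch of work without yielding
    yield
    clock.advance(50)
    yield


clock = FakeClock()
warnings = run_scheduler({"Polite": polite(clock), "Hog": hog(clock)}, clock)
print(warnings)
```

Only the task that runs past the threshold without yielding is reported. The well-behaved task is never flagged, which mirrors the point that the watchdog is a fail-safe and not a form of preemption.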

The first time the watchdog expires, the scheduler prints a warning message such as:

%SYS-3-CPUHOG: Task ran for 2148 msec (20/13), Process = IP Input, PC = 3199482    
-Traceback= 314B5E6 319948A

This message indicates that a process has held the CPU for too long. Here, it is the "IP Input" process. This message usually appears during transient circumstances, such as an Online Insertion and Removal (OIR), when the router boots up, or under heavy traffic conditions. The "%SYS-3-CPUHOG" messages should not appear during normal operation of the router.
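When many of these messages must be reviewed, it can help to split each one into its fields. The following Python sketch parses the sample message shown above. The regular expressions are inferred only from the samples in this document, the names CpuHog and parse_cpuhog are hypothetical, and the pair of numbers in parentheses is kept as an uninterpreted "counters" field because this document does not define it.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Patterns inferred from the sample messages in this document (an assumption,
# not an official message grammar). Matching is case-insensitive because the
# samples show both "Process =" and "process =".
CPUHOG_RE = re.compile(
    r"%SYS-3-CPUHOG: Task ran for (?P<msec>\d+) msec "
    r"\((?P<a>\d+)/(?P<b>\d+)\),\s*"
    r"process = (?P<process>.+?), PC\s*=\s*(?P<pc>[0-9A-Fa-f]+)",
    re.IGNORECASE,
)
TRACEBACK_RE = re.compile(r"-Traceback=\s*(?P<addrs>[0-9A-Fa-f ]+)")


class CpuHog(NamedTuple):
    ran_msec: int              # how long the task ran, in milliseconds
    counters: Tuple[int, int]  # the "(20/13)" pair; meaning not defined here
    process: str               # name of the process that held the CPU
    pc: str                    # program counter, kept as a hex string
    traceback: List[str]       # addresses from the -Traceback= line, if any


def parse_cpuhog(text: str) -> Optional[CpuHog]:
    """Return the first CPUHOG message found in text, or None."""
    match = CPUHOG_RE.search(text)
    if match is None:
        return None
    # Look for the traceback only after the message itself.
    trace = TRACEBACK_RE.search(text, match.end())
    addrs = trace.group("addrs").split() if trace else []
    return CpuHog(
        ran_msec=int(match.group("msec")),
        counters=(int(match.group("a")), int(match.group("b"))),
        process=match.group("process"),
        pc=match.group("pc"),
        traceback=addrs,
    )


SAMPLE = (
    "%SYS-3-CPUHOG: Task ran for 2148 msec (20/13), "
    "Process = IP Input, PC = 3199482\n"
    "-Traceback= 314B5E6 319948A\n"
)

hog_msg = parse_cpuhog(SAMPLE)
print(hog_msg)
```

The process name and the traceback addresses are the fields that matter most when you search for a known bug or open a service request.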

If the router is busy at interrupt level after a process was scheduled to run, the accounting of how long the process ran can be inaccurate. This is because the CPUHOG mechanism tracks only process-level tasks. It does not track interrupt-level tasks, which are permitted to interrupt the process and gain control of the CPU.

The typical task that runs at interrupt level is packet switching.

Troubleshoot

This section explains how you can troubleshoot CPUHOG messages in different scenarios.

CPUHOG at the Bootup Process

CPUHOG messages during the boot sequence are fairly common. The error message means that the boot process held the CPU slightly longer than the system allows, and the system then printed a message to the console to report it. The process in this case is "Boot Load," which indicates where the CPUHOG occurred:

System Bootstrap, Version 11.1(12)XA, EARLY DEPLOYMENT RELEASE SOFTWARE
(fc1)
Copyright (c) 1997 by cisco Systems, Inc.
C1600 processor with 16384 Kbytes of main memory

program load complete, entry point: 0x4018060, size: 0x108968

%SYS-3-CPUHOG: Task ran for 2040 msec (6/6), Process = Boot Load, PC =40B513A
-Traceback= 407EB6E 407F628 407D118 40180E0 40005B0 4015C3E 40152B2 4014ED4
40025B8 4003086 4015636 40021A8 400C616program load complete, entry point:
0x2005000, size: 0x4195b9
Self decompressing the image :
############################################################################
############################################################################
################################################################## [OK]

You can safely ignore this error message. At the time of the boot process, the boot loader uses the CPU for 2-4 seconds, and does not release it. This is not a problem at boot time, because the CPU needs to run only the boot loader at that point. More recent boot ROMs suppress the printing of that particular message.

You can also encounter a CPUHOG message from the boot helper image whenever the router loads a large image, for example, on Cisco 1600 Series Routers that are configured with more than 16 MB of DRAM.

This message occurs only while the image is being loaded. It is a cosmetic problem that has no effect on the loading process or on the normal operation of the system.

CPUHOG at the Time of an OIR

CPUHOG messages are common at the time of an OIR, because the router has to perform a set of complicated and relatively long tasks. There is no need to worry about CPUHOG messages that occur during OIRs, as long as the card that was inserted comes up properly.

CPUHOG When You Try to Access a Flash Device

A CPUHOG message can appear when you attempt to access a Flash device (such as a Flash card, or a Flash single inline memory module (SIMM)) when the device is defective or when it does not respond. If the problem recurs, please contact your TAC representative.

Note: If you have a Catalyst 6500 that runs Integrated Cisco IOS software (Native Mode) or Hybrid Mode, and it displays CPUHOG messages when you format the MSFC (RP) bootflash:, the cause can be the problem described in Cisco Bug ID CSCdw53175 (registered customers only), which is resolved in Cisco IOS Software Releases 12.1.11b, 12.1(12c)E5, or 12.1(13)E, and later versions.

CPUHOG Due to "CEF IPC Background" Process

On the Cisco 12000 Series Internet Router, the forwarding information base (FIB) is maintained on each line card for use in packet switching. Due to the structure of the FIB tree, routing changes with short subnet masks (between /1 and /4) can cause messages like this in the console log:

SLOT 1: %SYS-3-CPUHOG: Task ran for 4024 msec (690/0), 
process = CEF IPC Background, PC = 400B8908. 
-Traceback= 400B8910 408FF588 408FF6F4 408FFE8C 400A404C 400A4038

When a process in Cisco IOS software runs for longer than 2000 ms (2 seconds), a CPUHOG message is displayed. In the case of Cisco Express Forwarding (CEF) updates for very short subnet masks, the amount of processing required can take more than 2000 ms, which can trigger these messages. The "CEF IPC Background" process is the parent process that controls the addition and removal of prefixes from the forwarding tree.

Additionally, if the CPU is locked up for an extended period, the line card can crash due to a Fabric Ping failure, or the FIB can become disabled due to IPC communication timeouts. If you need to troubleshoot these problems, see Troubleshooting Fabric Ping Timeouts and Failures on the Cisco 12000 Series Internet Router.

In general, routing updates with masks shorter than /7 are erroneous or malicious. Cisco recommends that all customers configure adequate route filtering to prevent the processing and propagation of such updates. If you need assistance to configure routing filters, contact your technical support representative.
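As a rough offline aid, a list of prefixes (for example, one copied from a routing table) can be checked for masks shorter than /7. The following Python sketch is not a substitute for route filtering on the router. The function name is_suspicious_prefix is hypothetical, the check covers IPv4 only, and the exemption for the default route 0.0.0.0/0 is an assumption that this document does not state.

```python
import ipaddress
from typing import List


def is_suspicious_prefix(prefix: str, min_len: int = 7) -> bool:
    """Return True if an IPv4 prefix has a mask shorter than /min_len.

    A prefix length of 0 (the default route) is exempted; that exemption is
    an assumption made for this sketch.
    """
    network = ipaddress.IPv4Network(prefix, strict=False)
    return 0 < network.prefixlen < min_len


# Illustrative prefixes only; they are not taken from a real routing table.
candidates: List[str] = [
    "0.0.0.0/0",    # default route, exempted by assumption
    "128.0.0.0/1",  # within the /1 to /4 range that this document describes
    "64.0.0.0/4",
    "8.0.0.0/5",
    "12.0.0.0/7",   # exactly /7, so not shorter than /7
    "10.0.0.0/8",
]

flagged = [p for p in candidates if is_suspicious_prefix(p)]
print(flagged)
```

Any prefix that the check flags is a candidate for a route filter entry and is worth tracing back to the neighbor that advertised it.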

The "CEF IPC Background" process can also trigger a CPUHOG message when you clear Border Gateway Protocol (BGP) sessions or the routing table.

CPUHOG at the Time of Normal Router Operation

Most of the time, these error messages are due to an internal software bug in the Cisco IOS Software.

The first step to troubleshoot this sort of error message is to look for a known bug. You can use the Bug Toolkit (registered customers only) to find a bug that matches the error. On the Bug Toolkit page, click Launch Bug Toolkit, and select Search for Cisco IOS-related bugs. To narrow your search, you can select your Cisco IOS software version under number 1. Under number 3, you can perform a keyword search for "CPUHOG, <process>", where <process> is the name of the process reported in the message, such as Virtual Exec or IP Input.
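If a log contains many CPUHOG messages, a short script can show which processes appear most often and build the matching search keywords. The following Python sketch is one way to do that. The function names summarize and search_keywords are hypothetical, the pattern is inferred from the sample messages in this document, and the sample log reuses those messages (with one repeated for illustration).

```python
import re
from collections import defaultdict
from typing import Dict, List

# Pattern inferred from the samples in this document; case-insensitive because
# both "Process =" and "process =" appear.
HOG_RE = re.compile(
    r"%SYS-3-CPUHOG: Task ran for (\d+) msec \(\d+/\d+\),\s*process = (.+?), PC",
    re.IGNORECASE,
)


def summarize(log_text: str) -> Dict[str, Dict[str, int]]:
    """Count CPUHOG messages per process and record the longest run time."""
    stats: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"count": 0, "max_msec": 0}
    )
    for msec, process in HOG_RE.findall(log_text):
        entry = stats[process]
        entry["count"] += 1
        entry["max_msec"] = max(entry["max_msec"], int(msec))
    return dict(stats)


def search_keywords(stats: Dict[str, Dict[str, int]]) -> List[str]:
    """Build "CPUHOG, <process>" keywords, most frequent process first."""
    ordered = sorted(stats, key=lambda name: (-stats[name]["count"], name))
    return [f"CPUHOG, {name}" for name in ordered]


# Sample log assembled from the messages in this document; the IP Input
# message is repeated only to illustrate counting.
SAMPLE_LOG = """\
%SYS-3-CPUHOG: Task ran for 2148 msec (20/13), Process = IP Input, PC = 3199482
-Traceback= 314B5E6 319948A
%SYS-3-CPUHOG: Task ran for 2040 msec (6/6), Process = Boot Load, PC =40B513A
SLOT 1: %SYS-3-CPUHOG: Task ran for 4024 msec (690/0),
process = CEF IPC Background, PC = 400B8908.
%SYS-3-CPUHOG: Task ran for 2148 msec (20/13), Process = IP Input, PC = 3199482
"""

stats = summarize(SAMPLE_LOG)
keywords = search_keywords(stats)
for keyword in keywords:
    print(keyword)
```

Each printed keyword can be entered as-is in the keyword search. Start with the process that appears most often.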

You can also upgrade to the latest Cisco IOS Software image in your release train, which includes the fixes for all CPUHOG bugs that have already been resolved in that train.

Information to Collect if You Open a TAC Service Request

If you still need assistance after following the troubleshooting steps above and want to open a service request (registered customers only) with the Cisco TAC, be sure to include the following information:
  • Troubleshooting performed before opening the service request.
  • show technical-support output (in enable mode if possible).
  • show log output or console captures, if available.
  • execute-on slot [slot #] show tech output for the slot that experienced the line card crash.
  • The crashinfo file (if it is available, and has not already been included in the show technical-support output).
Please attach the collected data to your service request in non-zipped, plain text format (.txt). You can attach information to your service request by uploading it with the TAC Service Request tool (registered customers only). If you cannot access the Service Request tool, you can send the information as an email attachment to attach@cisco.com, with your service request number in the subject line of your message.

Note: Do not manually reload or power-cycle the router before you collect this information, unless a reload is required to troubleshoot a line card crash on the Cisco 12000 Series Internet Router. A reload can destroy information that is needed to determine the root cause of the problem.
