User Guide for Cisco Unified Operations Manager 2.0.2
How Operations Manager Calculates Repeated Restarts and Flapping


Operations Manager uses similar calculations to diagnose both repeated restarts and flapping. It considers a system to be restarting repeatedly when the system performs too many cold or warm starts over a short period of time. Table H-1 lists the elements, traps, and user-definable parameters that Operations Manager uses to calculate repeated restarts.

Table H-1 Elements, Traps, and Parameters Used to Calculate Repeated Restarts

Elements: Hosts, Hubs, Routers, Switches

SNMP Traps: Cold Start, Warm Start

Threshold Category: Reachability Settings

Parameters and Parameter Definitions:

Restart trap threshold: Minimum number of SNMP traps required in a user-defined period of time to trigger an event.

Restart trap window: User-defined period within which the minimum number of traps must be received to trigger an event.


Operations Manager considers a network adapter to be flapping when it fluctuates between the Up and Down states too often over a short period of time. Table H-2 lists the elements, traps, and user-definable parameters that Operations Manager uses to diagnose flapping.

Table H-2 Elements, Traps, and Parameters Used to Calculate Flapping

Elements: Network adapters (See Interface Groups, Access Port Groups, and Trunk Port Groups in Data Settings—Threshold Categories, page 18-32.)

SNMP Traps: Link Up, Link Down

Threshold Category: Interface/port flapping settings

Parameters and Parameter Definitions:

Link trap threshold: Minimum number of SNMP traps required in a user-defined period of time to trigger an event.

Link trap window: User-defined period within which the minimum number of traps must be received to trigger an event.


After Operations Manager generates a Repeated Restarts event or a Flapping event, Operations Manager computes the stable time (the amount of time that must elapse without further traps before Operations Manager declares the element stable again). The stable time is at least as long as the time the element was at fault, and at least as long as the trap window; however, it can be no longer than one hour.
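The bounds on the stable time can be illustrated with a short Python sketch. This is not the product's implementation: the function name `stable_time` and its parameters are hypothetical, and the sketch assumes the stable time is simply the smallest value that satisfies both lower bounds, capped at one hour. The guide states only the bounds, not the exact formula.

```python
ONE_HOUR_S = 3600  # upper bound on the stable time stated in this guide


def stable_time(fault_duration_s: int, trap_window_s: int) -> int:
    """Return an illustrative stable time in seconds.

    Assumption: the stable time equals the larger of the fault duration
    and the trap window, limited to one hour. The guide gives only the
    bounds, so the real calculation may differ.
    """
    lower_bound = max(fault_duration_s, trap_window_s)
    return min(lower_bound, ONE_HOUR_S)


print(stable_time(80, 30))    # 80: the fault duration exceeds the window
print(stable_time(10, 30))    # 30: the trap window is the floor
print(stable_time(5000, 30))  # 3600: capped at one hour
```

With a 30-second trap window, a short fault still yields a stable time of at least 30 seconds, and a very long fault never yields more than one hour.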

Figure H-1 illustrates how a system is diagnosed as performing repeated restarts, or how a network adapter is diagnosed as flapping.

Figure H-1 Diagnosing Repeated System Restarts or Flapping Network Adapters

In Figure H-1, the trap window (Restart trap window or Link trap window parameter) has a value of 30 seconds, and the trap threshold (Restart trap threshold or Link trap threshold parameter) has a value of 2. Operations Manager performs the following actions:

1. As soon as Operations Manager receives a Link Down Trap from a physical port or interface (or a Warm Start/Cold Start Trap from a system), Operations Manager begins counting.

2. When Operations Manager receives 2 or more traps within 30 seconds, it considers the network adapter or system to be at fault and generates a Repeated Restarts event or a Flapping event. The trap threshold (Link trap threshold or Restart trap threshold parameter) sets the minimum number of traps (2) that Operations Manager must receive within the trap window (30 seconds, set by the Link trap window or Restart trap window parameter) before it considers an element at fault.

3. Operations Manager continues to receive traps for 80 seconds after the initial trap, resulting in a stable time of 80 seconds (longer than the 30-second trap window and shorter than the one-hour maximum).

The stable time is the amount of time that Operations Manager waits before it clears the Repeated Restarts event or Flapping event.
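The trap counting in steps 1 and 2 can be sketched in Python as follows. The class `TrapWindowDetector` and its method names are hypothetical and are not part of Operations Manager. The sketch assumes a sliding window measured back from the most recent trap and counts every qualifying trap equally; the guide does not say whether the product uses a sliding window or one fixed at the first trap.

```python
from collections import deque


class TrapWindowDetector:
    """Illustrative counter for restart or link traps (hypothetical)."""

    def __init__(self, trap_threshold: int, trap_window_s: float) -> None:
        self.trap_threshold = trap_threshold  # e.g. Link trap threshold
        self.trap_window_s = trap_window_s    # e.g. Link trap window
        self._timestamps = deque()            # arrival times still in the window

    def on_trap(self, timestamp_s: float) -> bool:
        """Record a trap and return True if the element is now at fault."""
        self._timestamps.append(timestamp_s)
        # Discard traps that arrived more than one window before this trap.
        while timestamp_s - self._timestamps[0] > self.trap_window_s:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.trap_threshold


# Values from the Figure H-1 example: threshold of 2, window of 30 seconds.
detector = TrapWindowDetector(trap_threshold=2, trap_window_s=30)
print(detector.on_trap(0))   # False: first trap, counting begins
print(detector.on_trap(12))  # True: second trap within 30 seconds
```

If the second trap arrived 45 seconds after the first, the first trap would fall outside the window and no event would be raised until another trap followed within 30 seconds.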