Faults and Exceptions Diagnosed by DFM
These topics describe the faults and exceptions reported by DFM:
• Backplane Utilization Exception
• Error Exception
• Operational Exception
• Performance Exception
• Power Supply Exception
• Resource Exception
• Temperature Exception
A fault is an abnormal condition that occurs when a system (or component of a system) exceeds a performance threshold or is not functioning properly. An exception is a group of related faults. DFM groups related faults into a single exception.
For example, DFM generates a Resource Exception notification to indicate that a router has exhausted most of its processor or memory resources. Upon receiving a Resource Exception notification, you can access the notification to view which of the following faults has been detected:
• High utilization of the processor
• High utilization of the memory buffer
• High rate of buffer misses
• Excessive fragmentation of system memory
• Insufficient free memory
DFM generates a single exception of a given type per device, regardless of the number of faults that exist.
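The grouping rule can be illustrated with a minimal sketch. It assumes a hypothetical device name (router-1) and a hypothetical lookup table restricted to the Resource Exception faults listed above; it is not DFM's implementation, only a model of "one exception of a given type per device."

```python
from collections import defaultdict

# Hypothetical lookup: fault name -> exception type that groups it.
# Only the Resource Exception faults listed above are included.
FAULT_TO_EXCEPTION = {
    "High Utilization": "Resource",          # processor
    "High Buffer Utilization": "Resource",   # memory buffer
    "High Buffer Miss Rate": "Resource",
    "Excessive Fragmentation": "Resource",
    "Insufficient Free Memory": "Resource",
}


def group_faults(faults):
    """Group (device, fault) pairs into one exception per (device, type)."""
    exceptions = defaultdict(list)
    for device, fault in faults:
        exceptions[(device, FAULT_TO_EXCEPTION[fault])].append(fault)
    return dict(exceptions)


# Two faults on the same (hypothetical) router yield a single exception.
detected = [
    ("router-1", "High Utilization"),
    ("router-1", "Insufficient Free Memory"),
]
grouped = group_faults(detected)
print(grouped)
```

However many Resource faults are detected on router-1, the result holds a single ("router-1", "Resource") entry whose value lists the individual faults.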
Consider a router called moto-gw, which has a faulty processor. DFM reports a Resource Exception for moto-gw and notifies you in the Monitoring Console alarm log as illustrated in Figure 3-1.
Figure 3-1 Resource Exception Notification for Router moto-gw
You can access details about the notification to learn that the router's processor is experiencing High Utilization. Figure 3-2 shows the Notification Properties window, which pinpoints the fault.
Figure 3-2 Faulty Processor on Router moto-gw
Table 3-1 summarizes all exceptions (except for Operational Exceptions, which are listed in Table 3-2), the faults that trigger them, and the thresholds associated with each fault. There is a one-to-one relationship between thresholds and faults. In addition, the table identifies which managed elements the exceptions are generated for and which components of the managed elements can be at fault.
For information on the DiscoveryError and DuplicateIP errors, refer to:
• Discovery Error and Devices Not Supporting SNMP
• Duplicate IP Address Error
Table 3-1 Exception Diagnosis Summary
Managed Element | Exceptions | Component that Can Be at Fault | Fault(s) that Trigger Exception | Threshold Associated with Fault
Systems | Backplane Utilization | Chassis | High Backplane Utilization | Backplane Utilization Threshold
Systems, VLANs | Error | Network Adapters | High Error Rate | Error Threshold
Systems, VLANs | Performance | Network Adapters | High Broadcast Rate | Broadcast Threshold
Systems, VLANs | Performance | Network Adapters | High Collision Rate (for Ethernet adapters only) | Collision Threshold (for Ethernet adapters only)
Systems, VLANs | Performance | Network Adapters | High Discard Rate | Discard Threshold
Systems, VLANs | Performance | Network Adapters | High Queue Drop Rate | Queue Drop Threshold
Systems, VLANs | Performance | Network Adapters | High Utilization | Utilization Threshold
Systems | Power Supply | Power Supply | Power Supply State Not Normal | N/A
Systems | Power Supply | Voltage Sensor | Voltage Out of Range | Relative Voltage Threshold
Systems | Power Supply | Voltage Sensor | Voltage State Not Normal | N/A
Systems | Resource | Memory | Excessive Fragmentation | Memory Fragmentation Threshold
Systems | Resource | Memory | High Buffer Miss Rate | Memory Buffer Miss Threshold
Systems | Resource | Memory | High Buffer Utilization | Memory Buffer Utilization Threshold
Systems | Resource | Memory | Insufficient Free Memory | Free Memory Threshold
Systems | Resource | Processor | High Utilization | Processor Utilization Threshold
Systems | Temperature | Temperature Sensor | Temperature Out of Range | Relative Temperature Threshold
Systems | Temperature | Temperature Sensor | Temperature State Not Normal | N/A
Systems | Temperature | Fan | Fan State Not Normal | N/A
DFM also reports an Operational Exception when a system or component of a system is not functioning properly. You are notified of an operational exception at the system or VLAN level. Table 3-2 summarizes the faults that trigger an operational exception and identifies the systems or components of systems that can be at fault.
Table 3-2 Operational Exception Diagnosis Summary
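Managed Element | Exception | System/Component that Can Be at Fault | Fault(s) that Trigger Exception
Systems, VLANs | Operational | Network Adapter | Backup Activated
Systems, VLANs | Operational | Network Adapter | Exceeded Maximum Uptime
Systems, VLANs | Operational | Network Adapter | Flapping
Systems, VLANs | Operational | Network Adapter | Operationally Down
Systems | Operational | Card | Operationally Down
Systems | Operational | System | Excessive Restarts
Systems | Operational | System | Unresponsive
Systems | Operational | SNMP Agent | Unresponsive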
Note
For descriptions of the network elements managed by DFM, refer to "Network Elements Managed by DFM."
Backplane Utilization Exception
A Backplane Utilization Exception indicates that the data being transmitted over a switch's backplane is excessive. The following fault can trigger a Backplane Utilization Exception.
High Backplane Utilization
This fault is reported if utilization of the backplane's bandwidth exceeds the Backplane Utilization Threshold. For more information about this threshold, which is contained in the Processor and Memory setting, see the "Backplane Utilization Threshold" section.
Error Exception
An Error Exception indicates that a network adapter is experiencing transmission or functional problems. This causes some error counters to exceed the specified thresholds. The thresholds indicate the acceptable number of input or output packets in error as a percentage of the total number of input or output packets transmitted by a network adapter.
An Error Exception notification is generated for:
• A system when any network adapters on the system are experiencing a problem.
• A Virtual Local Area Network (VLAN) when any ports that are members of the VLAN are experiencing a problem.
The following fault can trigger an Error Exception.
High Error Rate
A High Error Rate fault is reported if:
• The input packet rate is greater than or equal to the minimum packet rate, and the input packet error percentage is greater than the Error Threshold configured for this network adapter. The input packet error percentage is calculated by dividing the number of input packets in error by the total number of input packets, and expressing the result as a percentage.
• The output packet rate is greater than or equal to the minimum packet rate, and the output packet error percentage is greater than the Error Threshold configured for this network adapter. The output packet error percentage is calculated by dividing the number of output packets in error by the total number of output packets, and expressing the result as a percentage.
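The two conditions above can be sketched as follows. This is only an illustration of the stated rule: the function names, the tuple layout, and the sample numbers are assumptions, and the packet rates are assumed to be in packets per second.

```python
def error_percentage(packets_in_error, total_packets):
    """Packets in error as a percentage of all packets in one direction."""
    if total_packets == 0:
        return 0.0  # no traffic, so nothing can be in error
    return 100.0 * packets_in_error / total_packets


def direction_exceeds(packet_rate, packets_in_error, total_packets,
                      min_packet_rate, error_threshold):
    """Apply the rule to one direction (input or output)."""
    return (packet_rate >= min_packet_rate
            and error_percentage(packets_in_error, total_packets) > error_threshold)


def high_error_rate(input_stats, output_stats, min_packet_rate, error_threshold):
    """Each stats tuple is (packet_rate, packets_in_error, total_packets)."""
    return any(
        direction_exceeds(*stats, min_packet_rate, error_threshold)
        for stats in (input_stats, output_stats)
    )


# Hypothetical sample: 3% input errors against a 2% threshold.
fault = high_error_rate((500, 30, 1000), (500, 5, 1000),
                        min_packet_rate=100, error_threshold=2.0)
print(fault)
```

The minimum packet rate check keeps a nearly idle adapter from raising the fault on the strength of a handful of bad packets.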
Access this notification in the alarm log to view the breakdown by type of error (for example, runts, giants, cyclic redundancy check (CRC), frame alignment, ignored, and aborted). The type of information available depends on the device's MIB.
For more information about the Error Threshold in the Generic Interface/Port Performance setting, see the "Error Threshold" section; for the Error Threshold in the Ethernet Interface/Port Performance setting, see the "Error Threshold" section.
Operational Exception
An Operational Exception indicates that a system or one of its components is not functioning properly.
An Operational Exception notification is generated for:
• A system when the system or a card or network adapter on the system is experiencing an operational problem.
• A Virtual Local Area Network (VLAN) when any network adapters that are members of the VLAN are experiencing an operational problem.
The following faults can trigger an Operational Exception.
Backup Activated
This fault is reported if a backup port or interface has come online. It is possible to mark a port or interface as a backup. By default, interfaces with an ISDN Type attribute are marked as backup and use the Backup Interface Support setting, which controls the monitoring of backup activation.
Because a backup should not normally be online, DFM notifies you when one comes online. For more information about backup activation, see the "Backup Interface Support" section.
Exceeded Maximum Uptime
This fault is reported if a backup or dial-on-demand port or interface has been in an UP state for too long. It is possible to mark a port or interface as a backup or as dial-on-demand. By default, interfaces with an ISDN Type attribute are marked as backup, and interfaces with a PPP or SLIP Type attribute are marked as dial-on-demand. Because a backup or dial-on-demand port or interface should not normally be up for very long, DFM notifies you when the Maximum Uptime has been exceeded for one of these types of ports or interfaces.
For more information about the Maximum Uptime parameter, see the "Maximum Uptime" section or the "Maximum Uptime" section.
Excessive Restarts
This fault is reported if a system repeatedly restarts over a short period of time. DFM diagnoses this fault by monitoring the number of system cold and warm start traps that have been received within the restart trap window. The Restart Trap Threshold and the Restart Trap Window parameters in the Connectivity setting control analysis of excessive restarts. For more information about restart parameters, refer to the "Connectivity" section.
Flapping
This fault is reported if a port or interface repeatedly alternates between up and down states over a short period of time. DFM diagnoses this fault by monitoring the number of link down traps that have been received within the link trap window for a particular network adapter. The Link Trap Threshold and the Link Trap Window parameters in the Interface/Port Flapping setting control flapping analysis. For more information about link parameters, refer to the "Interface/Port Flapping" section.
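Both Flapping and Excessive Restarts rest on the same idea: count the traps received within a time window and compare the count with a threshold. A minimal sketch of that pattern follows. The class name, the use of seconds for timestamps, and the choice to trigger when the count reaches (rather than exceeds) the threshold are all assumptions, not documented DFM behavior.

```python
from collections import deque


class TrapWindowCounter:
    """Count traps inside a sliding time window (timestamps in seconds)."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp):
        """Record one trap; return True if the windowed count reaches the threshold."""
        self._timestamps.append(timestamp)
        # Discard traps that have fallen out of the window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


# Three link down traps within 60 seconds: the third one trips the check.
bursty = TrapWindowCounter(threshold=3, window_seconds=60)
bursty_results = [bursty.record(t) for t in (0, 10, 20)]

# The same number of traps spread out over time never trips it.
sparse = TrapWindowCounter(threshold=3, window_seconds=60)
sparse_results = [sparse.record(t) for t in (0, 100, 200)]
print(bursty_results, sparse_results)
```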
Note
By default, DFM manages trunk ports but does not manage access ports.
Operationally Down
This fault is reported when a card or network adapter's operational state is not normal. DFM diagnoses this fault by polling the operational status of the network adapter.
Unresponsive
This fault can be reported for a system or for an SNMP Agent.
If this fault is reported for a system, ICMP ping requests and SNMP queries to the device time out with no response.
If this fault is reported for an SNMP Agent, DFM can ping the device, but SNMP requests time out with no response.
Note
A system may also be reported as Unresponsive if the only link (for example, an interface) to the system goes down.
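The distinction between the two forms of the Unresponsive fault can be summarized in a small decision function. This is a sketch only: the function name and the returned labels are hypothetical, and the two boolean inputs stand in for the outcome of whatever ping and SNMP polling is actually performed.

```python
def classify_unresponsive(ping_ok, snmp_ok):
    """Return which element is Unresponsive, or None if SNMP answers.

    ping_ok: True if an ICMP ping request received a response.
    snmp_ok: True if an SNMP query received a response.
    """
    if snmp_ok:
        return None  # the agent answered, so neither fault applies
    if ping_ok:
        # The device is reachable, but its SNMP agent does not answer.
        return "SNMP Agent"
    # Neither ping nor SNMP received a response.
    return "System"


print(classify_unresponsive(ping_ok=True, snmp_ok=False))
print(classify_unresponsive(ping_ok=False, snmp_ok=False))
```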
Performance Exception
A Performance Exception indicates that a network adapter is misconfigured or is exhibiting utilization conditions that affect an adapter's ability to receive or process packets.
A Performance Exception notification is generated for:
• A system when any network adapters on the system are experiencing a performance problem.
• A Virtual Local Area Network (VLAN) when any network adapters that are members of the VLAN are experiencing a performance problem.
The following faults can trigger a Performance Exception.
High Broadcast Rate
This fault is reported if the input packet broadcast percentage exceeds the Broadcast Threshold. The input packet broadcast percentage is the percentage of total capacity that was used to receive broadcast packets. For more information about this threshold in the Generic Interface/Port Performance setting, see the "Broadcast Threshold" section; for this threshold in the Ethernet Interface/Port Performance setting, see the "Broadcast Threshold" section.
High Collision Rate
This fault is reported if the rate of collisions exceeds the Collision Threshold. For more information about this threshold, which is contained in the Ethernet Interface/Port Performance setting, see the "Collision Threshold" section.
Note
This fault is only reported for Ethernet adapters.
High Discard Rate
This fault is reported if:
• The input packet queued rate is greater than the minimum packet rate, and the input packet discard percentage is greater than the Discard Threshold. The input packet queued rate is the rate of packets received without error. The input packet discard percentage is calculated by dividing the rate of input packets discarded by the rate of packets received.
• The output packet queued rate is greater than the minimum packet rate, and the output packet discard percentage is greater than the Discard Threshold. The output packet queued rate is the rate of packets sent without error. The output packet discard percentage is calculated by dividing the rate of output packets discarded by the rate of packets sent.
For more information about the Discard Threshold in the Generic Interface/Port Performance setting, see the "Discard Threshold" section; for this threshold in the Ethernet Interface/Port Performance setting, see the "Discard Threshold" section.
High Queue Drop Rate
This fault is reported if the number of packets discarded due to input or output queue overflow exceeds the Queue Drop Threshold. The input (or output) queue overflow is derived by dividing the number of packets designated to be received (or sent) that were discarded due to queue overflow by the total number of packets in the queue. For more information about this threshold in the Generic Interface/Port Performance setting, see the "Queue Drop Threshold" section; for this threshold in the Ethernet Interface/Port Performance setting, see the "Queue Drop Threshold" section.
High Utilization
This fault is reported if the current utilization is greater than the Utilization Threshold configured for this network adapter.
For more information about this threshold in the Generic Interface/Port Performance setting, see the "Utilization Threshold" section; for this threshold in the Ethernet Interface/Port Performance setting, see the "Utilization Threshold" section. (Default values are different for Ethernet network adapters.)
Refer to the "Understanding the High Utilization Fault" section for information on how this fault is computed.
Understanding the High Utilization Fault
Port and interface network elements represent the connections between switches and routers, respectively, and the wires or other physical media that lead to other devices. The two fundamental properties that describe a port or interface are its speed and its duplex setting:
• The speed of a port or interface is the maximum rate at which it can transport information. It is usually expressed in bits per second (bps).
• The duplex setting of a port or interface controls whether data transmission is bidirectional (Full Duplex) or unidirectional (Half Duplex) at any given time. One way to explain the difference is to imagine that a Full Duplex port or interface has two wires plugged into it: one wire carries bits into the router or switch, and one carries them out. A Half Duplex port or interface, on the other hand, has only a single wire plugged into it. The port or interface can either send bits out that wire or receive them from it; but at any given time, it cannot do both.
The speed of a port or interface is interpreted relative to a single wire. For example, if a port has a speed of 100,000 bps and it is a Full Duplex device, it could, in any one second, send up to 100,000 bits out one wire and simultaneously receive 100,000 bits from the other. Note that this is a total of 200,000 bits in one second. If, on the other hand, it were a Half Duplex device, in one second it could send and receive, in any combination, up to a total of 100,000 bits.
The Current Utilization of a port or interface is the ratio, expressed as a percentage, of the total number of bits sent over some period to the total number of bits that could have been sent in that period. A value of 100% means that, had more traffic arrived in that period, some would have been dropped due to lack of capacity.
For example, a port with a speed of 100,000 bps sends 450,000 bits and receives 400,000 bits in the last 10 seconds. To compute the Current Utilization, DFM first needs to know whether the device is Full or Half Duplex:
• If the device is Full Duplex, the limiting factor is the "more heavily loaded wire." For a Full Duplex, 100,000 bps device, each wire can carry 1,000,000 bits in 10 seconds. The outgoing wire carried 450,000 bits, or 45% of the theoretical capacity; the incoming wire carried 400,000 bits, or 40% of capacity. The Current Utilization is defined to be 45%, the higher percentage.
• If the device is Half Duplex, the total traffic has to go over a single wire. In this case, a total of 850,000 bits moved over the wire, which could have carried 1,000,000. Hence, the Current Utilization is 85%.
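The worked example above can be reproduced with a short calculation. The function name and parameter names are illustrative assumptions; the arithmetic follows the two rules just described (busiest wire for Full Duplex, combined traffic for Half Duplex).

```python
def current_utilization(bits_sent, bits_received, speed_bps, seconds, full_duplex):
    """Return Current Utilization as a percentage of single-wire capacity."""
    capacity = speed_bps * seconds  # bits one wire can carry in the period
    if full_duplex:
        # Two wires: the more heavily loaded one is the limiting factor.
        load = max(bits_sent, bits_received)
    else:
        # One shared wire: all traffic in both directions counts.
        load = bits_sent + bits_received
    return 100.0 * load / capacity


# 100,000 bps port, 450,000 bits sent and 400,000 received in 10 seconds.
full = current_utilization(450_000, 400_000, 100_000, 10, full_duplex=True)
half = current_utilization(450_000, 400_000, 100_000, 10, full_duplex=False)
print(full, half)
```

The two results correspond to the 45% and 85% figures in the example.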
The DFM High Utilization Fault is reported when the Current Utilization exceeds the Utilization Threshold configured for the device.
Devices With an Unspecified Duplex Setting
DFM uses SNMP queries to determine the duplex setting of managed ports and interfaces.
• All switches are able to report the duplex settings for their ports, although in some cases they report them only as Unspecified.
• Most routers do not supply any information about the duplex setting of their interfaces. When this is the case, DFM computes a setting (Half Duplex, Full Duplex, or Unspecified) based on the interface hardware type. Because many common interface types have a default duplex setting but are capable of operating at either Full or Half Duplex, the setting that DFM computes may not be correct for the interface as actually configured.
When DFM cannot correctly determine the duplex mode setting, the value it would compute for Current Utilization using the rules outlined in the previous section would not be correct. To avoid this, DFM uses a mathematical procedure to modify the Current Utilization value in a way that allows it to produce useful results even in the absence of full information.
DFM computes the Current Utilization exactly in any of the following situations:
1. DFM reports the mode as Full Duplex.
2. The port or interface is actually in Half Duplex mode.
3. The sum of the incoming and outgoing data rates exceeds the line speed. In this situation, DFM treats the port or interface as though it were a Full Duplex device. For full-duplex interfaces, this will be the case whenever both "wires" are used at more than half their capacity, a very common situation.
If none of these situations applies, the mathematically adjusted Current Utilization will be larger than the true device utilization. This means that you may receive a High Utilization notification even though the device utilization has not really exceeded the threshold. You will, however, never fail to receive a notification that you should have received.
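The adjustment procedure itself is not described here. Purely as an illustration, the following sketch shows one simple rule that happens to have the properties stated above (exact in the three listed situations, otherwise an overestimate and never an underestimate). It is an assumption made for this example, not DFM's documented algorithm, and the names are hypothetical.

```python
def adjusted_utilization(bits_sent, bits_received, speed_bps, seconds, reported_duplex):
    """Hypothetical adjustment rule; reported_duplex is 'full', 'half', or 'unspecified'."""
    capacity = speed_bps * seconds
    busiest_wire = max(bits_sent, bits_received)
    combined = bits_sent + bits_received
    # Combined traffic above line speed is only possible on a Full Duplex
    # link, so in that case the busiest wire gives the exact value.
    if reported_duplex == "full" or combined > capacity:
        return 100.0 * busiest_wire / capacity
    # Otherwise assume a single shared wire. This is exact for a true Half
    # Duplex link and an overestimate for an undetected Full Duplex link.
    return 100.0 * combined / capacity


# Duplex unknown, combined traffic below line speed: the cautious value.
cautious = adjusted_utilization(450_000, 400_000, 100_000, 10, "unspecified")
# Duplex unknown, combined traffic above line speed: treated as Full Duplex.
inferred = adjusted_utilization(700_000, 600_000, 100_000, 10, "unspecified")
print(cautious, inferred)
```

Under this rule, a link that is really Full Duplex but reported as Unspecified would show 85% in the first case even though its true utilization is 45%, which is the kind of early notification the paragraph above describes.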
Power Supply Exception
A Power Supply Exception indicates that voltage and power supply conditions around a system present a potential hazard. The following faults can trigger a Power Supply Exception.
Voltage Out of Range
This fault is reported if the voltage for this device is outside the normal operating range and exceeds the Relative Voltage Threshold. For more information about this threshold, which is contained in the Environment setting, see the "Relative Voltage Threshold" section.
Voltage State Not Normal
This fault is reported if the voltage sensor test point for this device is not in the NORMAL or SHUTDOWN state. The voltage sensor state can be NORMAL, WARNING, CRITICAL, or SHUTDOWN.
Power Supply State Not Normal
This fault is reported if the power supply for this device is not in the NORMAL or SHUTDOWN state. The power supply state can be NORMAL, WARNING, CRITICAL, or SHUTDOWN.
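The state checks for the voltage sensor and the power supply follow the same pattern: of the four possible states, only WARNING and CRITICAL raise the fault. A minimal sketch, with a hypothetical enum and function name:

```python
from enum import Enum


class SensorState(Enum):
    NORMAL = "NORMAL"
    WARNING = "WARNING"
    CRITICAL = "CRITICAL"
    SHUTDOWN = "SHUTDOWN"


# States that do not raise a "State Not Normal" fault, per the text above.
ACCEPTABLE_STATES = {SensorState.NORMAL, SensorState.SHUTDOWN}


def state_not_normal(state):
    """Return True if the reported state should raise the fault."""
    return state not in ACCEPTABLE_STATES


flagged = [s.name for s in SensorState if state_not_normal(s)]
print(flagged)
```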
Resource Exception
A Resource Exception indicates that processor and memory faults occurred because a system does not have enough processor or memory resources to process the current traffic load. The following faults can trigger a Resource Exception.
Note
A device's MIB may not support all of these faults, depending on the IOS version that the device is running. Refer to Appendix B, "MIBs Polled and SNMP Traps Processed or Passed-Through by DFM."
Excessive Fragmentation
This fault is reported if the system memory is highly fragmented, that is, if the fragmentation exceeds the Memory Fragmentation Threshold. For more information about this threshold, which is contained in the Processor and Memory setting, see the "Memory Fragmentation Threshold" section.
High Buffer Miss Rate
This fault is reported if the rate of buffer misses exceeds the Memory Buffer Miss Threshold. Access this event notification in the alarm log to view the percentage breakdown of buffer misses into the types of misses (such as small, medium, big, large, and huge buffer misses). For more information about this threshold, which is contained in the Processor and Memory setting, see the "Memory Buffer Miss Threshold" section.
High Buffer Utilization
This fault is reported if the number of buffers used exceeds the Memory Buffer Utilization Threshold. Access this event notification in the alarm log to view the percentage of small, medium, big, large, and huge buffers used. For more information about this threshold, which is contained in the Processor and Memory setting, see the "Memory Buffer Utilization Threshold" section.
High Utilization
This fault is reported if the processor utilization exceeds the Processor Utilization Threshold. For more information about this threshold, which is contained in the Processor and Memory setting, see the "Processor Utilization Threshold" section.
Insufficient Free Memory
This fault is reported if the system is running out of memory resources and the percentage of free memory falls below the Free Memory Threshold. This fault is also reported if there has been a failure to allocate a buffer due to lack of memory. For more information about this threshold, which is contained in the Processor and Memory setting, see the "Free Memory Threshold" section.
Temperature Exception
A Temperature Exception indicates that the temperature conditions around the device present a potential hazard. The following faults can trigger a Temperature Exception.
Fan State Not Normal
This fault is reported if the fan for this device is not in the NORMAL state. The fan state can be NORMAL, WARNING, or CRITICAL.
Temperature Out of Range
This fault is reported if the temperature for this device is outside the normal operating range and exceeds the Relative Temperature Threshold.
For more information about this threshold, which is contained in the Environment setting, see the "Relative Temperature Threshold" section.
Temperature State Not Normal
This fault is reported if the temperature sensor for this device is reporting abnormal temperature measurements. The temperature sensor state can be NORMAL, WARNING, or CRITICAL.