Table Of Contents
Faults and Exceptions Diagnosed by DFM
Backplane Utilization Exception
Error Exception
Operational Exception
Performance Exception
Understanding the High Utilization Fault
Power Supply Exception
Resource Exception
Temperature Exception
Faults and Exceptions Diagnosed by DFM
These topics describe the faults and exceptions reported by DFM:
• Backplane Utilization Exception
• Error Exception
• Operational Exception
• Performance Exception
• Power Supply Exception
• Resource Exception
• Temperature Exception
A fault is an abnormal condition that occurs when a system (or a component of a system) exceeds a performance threshold or is not functioning properly. An exception is a group of related faults: DFM reports related faults together under a single exception.
For example, DFM generates a Resource Exception notification to indicate that a router has exhausted most of its processor or memory resources. Upon receiving a Resource Exception notification, you can access the notification to view which of the following faults has been detected:
• High utilization of the processor
• High utilization of the memory buffer
• High rate of buffer misses
• Excessive fragmentation of system memory
• Insufficient free memory
DFM generates a single exception of a given type per device, regardless of the number of faults that exist.
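The grouping rule can be pictured with a minimal Python sketch. It is only an illustration of "one exception of a given type per device": the mapping, the fault labels, and the function name are hypothetical and are not part of DFM.

```python
from collections import defaultdict

# Hypothetical fault-to-exception mapping, based on the Resource Exception
# example above. These labels are illustrative, not DFM identifiers.
FAULT_TO_EXCEPTION = {
    "High Utilization (processor)": "Resource",
    "High Buffer Utilization": "Resource",
    "High Buffer Miss Rate": "Resource",
    "Excessive Fragmentation": "Resource",
    "Insufficient Free Memory": "Resource",
}

def group_faults(active_faults):
    """Return {(device, exception type): [faults]}.

    Each (device, exception type) pair appears once, no matter how many
    faults of that type are active on the device.
    """
    grouped = defaultdict(list)
    for device, fault in active_faults:
        grouped[(device, FAULT_TO_EXCEPTION[fault])].append(fault)
    return dict(grouped)

faults = [
    ("moto-gw", "High Utilization (processor)"),
    ("moto-gw", "Insufficient Free Memory"),
]
exceptions = group_faults(faults)
print(exceptions)
```

With two resource faults active on the same router, the sketch yields a single Resource entry for that router, and the individual faults remain available as details of that entry.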
Consider a router called moto-gw, which has a faulty processor. DFM reports a Resource Exception for moto-gw and notifies you in the Monitoring Console alarm log as illustrated in Figure 3-1.
Figure 3-1 Resource Exception Notification for Router moto-gw
You can access details about the notification to learn that the router's processor is experiencing High Utilization. Figure 3-2 shows the Notification Properties window, which pinpoints the fault.
Figure 3-2 Faulty Processor on Router moto-gw
Table 3-1 summarizes all exceptions (except for Operational Exceptions, which are listed in Table 3-2), the faults that trigger them, and the thresholds associated with each fault. There is a one-to-one relationship between thresholds and faults. In addition, the table identifies which managed elements the exceptions are generated for and which components of the managed elements can be at fault.
For information on the DiscoveryError and DuplicateIP errors, refer to:
• Discovery Error and Devices Not Supporting SNMP
• Duplicate IP Address Error
Table 3-1 Exception Diagnosis Summary
Managed Element | Exceptions | Component that Can Be at Fault | Fault(s) that Trigger Exception | Threshold Associated with Fault
Systems | Backplane Utilization | Chassis | High Backplane Utilization | Backplane Utilization Threshold
Systems, VLANs | Error | Network Adapters | High Error Rate | Error Threshold
Systems, VLANs | Error | Network Adapters | High Error Rate | ErrorTraffic Threshold [1]
Systems, VLANs | Performance | Network Adapters | High Broadcast Rate | Broadcast Threshold
Systems, VLANs | Performance | Network Adapters | High Collision Rate (for Ethernet adapters only) | Collision Threshold (for Ethernet adapters only)
Systems, VLANs | Performance | Network Adapters | High Discard Rate | Discard Threshold
Systems, VLANs | Performance | Network Adapters | High Queue Drop Rate | Queue Drop Threshold
Systems, VLANs | Performance | Network Adapters | High Utilization | Utilization Threshold
Systems | Power Supply | Power Supply | Power Supply State Not Normal | N/A
Systems | Power Supply | Voltage Sensor | Voltage Out of Range | Relative Voltage Threshold
Systems | Power Supply | Voltage Sensor | Voltage State Not Normal | N/A
Systems | Resource | Memory | Excessive Fragmentation | Memory Fragmentation Threshold
Systems | Resource | Memory | High Buffer Miss Rate | Memory Buffer Miss Threshold
Systems | Resource | Memory | High Buffer Utilization | Memory Buffer Utilization Threshold
Systems | Resource | Memory | Insufficient Free Memory | Free Memory Threshold
Systems | Resource | Processor | High Utilization | Processor Utilization Threshold
Systems | Temperature | Temperature Sensor | Temperature Out of Range | Relative Temperature Threshold
Systems | Temperature | Temperature Sensor | Temperature State Not Normal | N/A
Systems | Temperature | Fan | Fan State Not Normal | N/A
DFM also reports an Operational Exception when a system or component of a system is not functioning properly. You are notified of an operational exception at the system or VLAN level. Table 3-2 summarizes the faults that trigger an operational exception and identifies the systems or components of systems that can be at fault.
Table 3-2 Operational Exception Diagnosis Summary
Managed Element | Exception | System/Component that Can Be at Fault | Fault(s) that Trigger Exception
Systems, VLANs | Operational | Network Adapter | Backup Activated; Exceeded Maximum Uptime; Flapping; Operationally Down
Systems | Operational | Card | Operationally Down
Systems | Operational | System | Excessive Restarts; Unresponsive
Systems | Operational | SNMP Agent | Unresponsive; InvalidResponse [1]
Note
For descriptions of the network elements managed by DFM, refer to "Network Elements Managed by DFM."
Backplane Utilization Exception
A Backplane Utilization Exception indicates that the data being transmitted over a switch's backplane is excessive. The following fault can trigger a Backplane Utilization Exception.
High Backplane Utilization
This fault is reported if the chassis bandwidth utilization is excessive and surpasses the Backplane Utilization Threshold. For more information about this threshold, which is contained in the Processor and Memory setting, see the "Processor and Memory" section.
Error Exception
An Error Exception indicates that a network adapter is experiencing transmission or functional problems. This causes some error counters to exceed the specified thresholds. The thresholds indicate the acceptable number of input or output packets in error as a percentage of the total number of input or output packets transmitted by a network adapter.
An Error Exception notification is generated for:
• A system when any network adapters on the system are experiencing a problem.
• A Virtual Local Area Network (VLAN) when any ports that are members of the VLAN are experiencing a problem.
The following fault can trigger an Error Exception.
High Error Rate
A High Error Rate fault is reported if:
• The input packet rate is greater than or equal to the minimum packet rate, and the input packet error percentage is greater than the Error Threshold configured for this network adapter. The input packet error percentage is calculated by dividing the number of input packets in error by the total number of input packets, and expressing the result as a percentage.
• The output packet rate is greater than or equal to the minimum packet rate, and the output packet error percentage is greater than the Error Threshold configured for this network adapter. The output packet error percentage is calculated by dividing the number of output packets in error by the total number of output packets, and expressing the result as a percentage.
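The two conditions above apply the same test to each direction. The following Python sketch restates that test under stated assumptions: the function names, parameter names, and sample numbers are hypothetical, and the sketch does not show how DFM obtains its counters.

```python
def error_percentage(packets_in_error, total_packets):
    """Packets in error as a percentage of all packets in one direction."""
    if total_packets == 0:
        return 0.0  # no traffic, so no error percentage to report
    return 100.0 * packets_in_error / total_packets

def high_error_rate(packet_rate, packets_in_error, total_packets,
                    min_packet_rate, error_threshold):
    """Apply the documented test to one direction (input or output).

    The rate test uses >= and the percentage test uses >, as in the text.
    """
    return (packet_rate >= min_packet_rate
            and error_percentage(packets_in_error, total_packets) > error_threshold)

# Hypothetical sample values: 30 of 1000 input packets in error (3%),
# checked against a 2% Error Threshold and a minimum rate of 100 packets/s.
input_fault = high_error_rate(packet_rate=500, packets_in_error=30,
                              total_packets=1000, min_packet_rate=100,
                              error_threshold=2.0)
# Same error ratio, but traffic is below the minimum packet rate.
quiet_link = high_error_rate(packet_rate=50, packets_in_error=30,
                             total_packets=1000, min_packet_rate=100,
                             error_threshold=2.0)
print(input_fault, quiet_link)
```

The fault is reported if the test succeeds for either the input or the output direction. The minimum-rate check keeps a nearly idle adapter from raising the fault on the strength of a handful of bad packets.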
Access this notification in the alarm log to view the breakdown by type of error (for example, runts, giants, cyclic redundancy check (CRC), frame alignment, ignored, and aborted). The type of information available depends on the device's MIB.
For more information about the error thresholds in the Ethernet Interface/Port Performance setting, see the following:
• Error Threshold
• ErrorTraffic Threshold
For more information about the error thresholds in the Generic Interface/Port Performance setting, see the following:
• Error Threshold
• ErrorTraffic Threshold
Operational Exception
An Operational Exception indicates that a system or one of its components is not functioning properly.
An Operational Exception notification is generated for:
• A system when the system or a card or network adapter on the system is experiencing an operational problem.
• A Virtual Local Area Network (VLAN) when any network adapters that are members of the VLAN are experiencing an operational problem.
The following faults can trigger an Operational Exception.
Backup Activated
This fault is reported if a backup port or interface has come online. It is possible to mark a port or interface as a backup. By default, interfaces with an ISDN Type attribute are marked as backup and use the Backup Interface Support setting, which controls the monitoring of backup activation.
Because a backup should not normally be online, DFM notifies you when one comes online. For more information about backup activation, see the "Backup Interface Support" section.
Exceeded Maximum Uptime
This fault is reported if a backup or dial-on-demand port or interface has been in an UP state for too long. It is possible to mark a port or interface as a backup or as dial-on-demand. By default, interfaces with an ISDN Type attribute are marked as backup, and interfaces with a PPP or SLIP Type attribute are marked as dial-on-demand. Because a backup or dial-on-demand port or interface should not normally be up for very long, DFM notifies you when the Maximum Uptime has been exceeded for one of these ports or interfaces.
For more information about the Maximum Uptime parameter, see the "Maximum Uptime" section.
Excessive Restarts
This fault is reported if a system repeatedly restarts over a short period of time. DFM diagnoses this fault by monitoring the number of system cold and warm starts that have been received within the restart window. The Restart Threshold and the Restart Window parameters in the Connectivity setting control analysis of excessive restarts. For more information about restart parameters, refer to the "Connectivity" section.
Flapping
This fault is reported if a port or interface repeatedly alternates between up and down states over a short period of time. DFM diagnoses this fault by monitoring the number of link downs that have been received within the link window for a particular network adapter. The Link Threshold and the Link Window parameters in the Interface/Port Flapping setting control flapping analysis. For more information about link parameters, refer to the "Interface/Port Flapping" section.
Note
By default, DFM manages trunk ports but does not manage access ports.
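Both the Excessive Restarts and the Flapping analyses count events inside a time window and compare the count with a threshold. The Python sketch below illustrates that general technique only. The class name is hypothetical, and whether DFM reports the fault when the count reaches the threshold or only when it exceeds it is an assumption (the sketch uses "reaches").

```python
from collections import deque

class WindowCounter:
    """Count events (restarts or link downs) inside a sliding time window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold            # e.g. Link Threshold
        self.window_seconds = window_seconds  # e.g. Link Window
        self.events = deque()                 # timestamps, oldest first

    def record(self, timestamp):
        """Record one event and return True if the fault condition holds.

        Assumption: the condition holds when the number of events in the
        window reaches the threshold.
        """
        self.events.append(timestamp)
        # Discard events that have aged out of the window.
        while self.events and self.events[0] <= timestamp - self.window_seconds:
            self.events.popleft()
        return len(self.events) >= self.threshold

# Hypothetical settings: 3 link downs within 60 seconds counts as flapping.
flapping = WindowCounter(threshold=3, window_seconds=60)
results = [flapping.record(t) for t in (0, 20, 50, 200)]
print(results)  # [False, False, True, False]
```

The fourth event arrives long after the first three have left the window, so the count drops back to one and the condition no longer holds.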
Operationally Down
This fault is reported when a card or network adapter's operational state is not normal. DFM diagnoses this fault by polling the operational status of the network adapter.
Unresponsive
This fault can be reported for a system or for an SNMP Agent.
If this fault is reported for a system, both ICMP ping requests and SNMP queries to the device time out with no response.
If this fault is reported for an SNMP Agent, DFM can ping the device, but SNMP requests time out with no response.
Note
A system may also be reported as Unresponsive if the only link (for example, an interface) to the system goes down.
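The distinction between the two cases can be summarized in a short Python sketch. The function name and return strings are hypothetical, and the text does not describe the case in which SNMP answers but ping does not, so the sketch reports no fault there.

```python
def classify_unresponsive(ping_ok, snmp_ok):
    """Map ping and SNMP reachability to the Unresponsive fault, if any.

    ping_ok / snmp_ok: True if the request got a response, False if it
    timed out. Returns a label, or None if neither case above applies.
    """
    if not ping_ok and not snmp_ok:
        return "System Unresponsive"
    if ping_ok and not snmp_ok:
        return "SNMP Agent Unresponsive"
    return None  # SNMP responds; this case is not covered by the text

print(classify_unresponsive(ping_ok=False, snmp_ok=False))
print(classify_unresponsive(ping_ok=True, snmp_ok=False))
```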
InvalidResponse
Note
The InvalidResponse fault is only supported if you have downloaded and installed DFM 1.2 Patch/IDU 1.2.10 (or later), by logging into Cisco.com at: http://www.cisco.com/pcgi-bin/tablebuild.pl/cw2000-dfm.
This fault can be reported for an SNMP agent when queries to the agent return an invalid value.
Performance Exception
A Performance Exception indicates that a network adapter is misconfigured or is exhibiting utilization conditions that affect an adapter's ability to receive or process packets.
A Performance Exception notification is generated for:
• A system when any network adapters on the system are experiencing a performance problem.
• A Virtual Local Area Network (VLAN) when any network adapters that are members of the VLAN are experiencing a performance problem.
The following faults can trigger a Performance Exception.
High Broadcast Rate
This fault is reported if the input packet broadcast percentage exceeds the Broadcast Threshold. The input packet broadcast percentage is the percentage of total capacity that was used to receive broadcast packets. For more information about this threshold in the Generic Interface/Port Performance setting, see the "Broadcast Threshold" section; for this threshold in the Ethernet Interface/Port Performance setting, see the "Broadcast Threshold" section.
High Collision Rate
This fault is reported if the rate of collisions exceeds the Collision Threshold. For more information about this threshold, which is contained in the Ethernet Interface/Port Performance setting, see the "Collision Threshold" section.
Note
This fault is only reported for Ethernet adapters.
High Discard Rate
This fault is reported if:
• The input packet queued rate is greater than the minimum packet rate, and the input packet discard percentage is greater than the Discard Threshold. The input packet queued rate is the rate of packets received without error. The input packet discard percentage is calculated by dividing the rate of input packets discarded by the rate of packets received.
• The output packet queued rate is greater than the minimum packet rate, and the output packet discard percentage is greater than the Discard Threshold. The output packet queued rate is the rate of packets sent without error. The output packet discard percentage is calculated by dividing the rate of output packets discarded by the rate of packets sent.
For more information about the Discard Threshold in the Generic Interface/Port Performance setting, see the "Discard Threshold" section; for this threshold in the Ethernet Interface/Port Performance setting, see the "Discard Threshold" section.
High Queue Drop Rate
This fault is reported if the number of packets discarded due to input or output queue overflow exceeds the Queue Drop Threshold. The input (or output) queue overflow is derived by dividing the number of packets designated to be received (or sent) that were discarded due to queue overflow by the total number of packets in the queue. For more information about this threshold in the Generic Interface/Port Performance setting, see the "Queue Drop Threshold" section; for this threshold in the Ethernet Interface/Port Performance setting, see the "Queue Drop Threshold" section.
High Utilization
This fault is reported if the current utilization is greater than the Utilization Threshold configured for this network adapter.
For more information about this threshold in the Generic Interface/Port Performance setting, see the "Utilization Threshold" section; for this threshold in the Ethernet Interface/Port Performance setting, see the "Utilization Threshold" section. (Default values are different for Ethernet network adapters.)
Refer to the "Understanding the High Utilization Fault" section for information on how this fault is computed.
Understanding the High Utilization Fault
Port and interface network elements represent the connections between switches and routers, respectively, and the wires or other physical media that lead to other devices. The two fundamental properties that describe a port or interface are its speed and its duplex setting:
• The speed of a port or interface is the maximum rate at which it can transport information. It is usually expressed in bits per second (bps).
• The duplex setting of a port or interface controls whether data transmission is bidirectional (Full Duplex) or unidirectional (Half Duplex) at any given time. One way to explain the difference is to imagine that a Full Duplex port or interface has two wires plugged into it: one wire carries bits into the router or switch, and one carries them out. A Half Duplex port or interface, on the other hand, has only a single wire plugged into it. The port or interface can either send bits out that wire or receive them from it; but at any given time, it cannot do both.
The speed of a port or interface is interpreted relative to a single wire. For example, if a port has a speed of 100,000 bps and it is a Full Duplex device, it could, in any one second, send up to 100,000 bits out one wire and simultaneously receive 100,000 bits from the other. Note that this is a total of 200,000 bits in one second. If, on the other hand, it were a Half Duplex device, in one second it could send and receive, in any combination, up to a total of 100,000 bits.
The Current Utilization of a port or interface is the ratio, expressed as a percentage, of the total number of bits sent over some period to the total number of bits that could have been sent in that period. A value of 100% means that, had more traffic arrived in that period, some would have been dropped due to lack of capacity.
For example, suppose a port with a speed of 100,000 bps sent 450,000 bits and received 400,000 bits in the last 10 seconds. To compute the Current Utilization, DFM first needs to know whether the device is Full or Half Duplex:
• If the device is Full Duplex, the limiting factor is the "more heavily loaded wire." For a Full Duplex, 100,000 bps device, each wire can carry 1,000,000 bits in 10 seconds. The outgoing wire carried 450,000 bits, or 45% of the theoretical capacity; the incoming wire carried 400,000 bits, or 40% of capacity. The Current Utilization is defined to be 45%, the higher percentage.
• If the device is Half Duplex, the total traffic has to go over a single wire. In this case, a total of 850,000 bits moved over the wire, which could have carried 1,000,000. Hence, the Current Utilization is 85%.
The DFM High Utilization Fault is reported when the Current Utilization exceeds the Utilization Threshold configured for the device.
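The worked example above can be reproduced with a few lines of Python. This is a sketch of the arithmetic as described in this section, not DFM code, and the function and parameter names are hypothetical.

```python
def current_utilization(bits_sent, bits_received, speed_bps, seconds, full_duplex):
    """Return the Current Utilization as a percentage.

    Full Duplex: the busier of the two wires, relative to one wire's capacity.
    Half Duplex: all traffic shares one wire, so sent and received are summed.
    """
    capacity = speed_bps * seconds  # bits a single wire can carry in the period
    if full_duplex:
        load = max(bits_sent, bits_received)
    else:
        load = bits_sent + bits_received
    return 100.0 * load / capacity

# The example from the text: 100,000 bps port, 10-second period.
full = current_utilization(450_000, 400_000, 100_000, 10, full_duplex=True)
half = current_utilization(450_000, 400_000, 100_000, 10, full_duplex=False)
print(full, half)  # 45.0 85.0
```

The same traffic produces very different percentages depending on the duplex setting, which is why DFM has to establish the duplex setting before it can evaluate the Utilization Threshold.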
Devices With an Unspecified Duplex Setting
DFM uses SNMP queries to determine the duplex setting of managed ports and interfaces.
• All switches are able to report the duplex settings for their ports, although in some cases they report them only as Unspecified.
• Most routers do not supply any information about the duplex setting of their interfaces. When this is the case, DFM computes a setting (Half Duplex, Full Duplex, or Unspecified) based on the interface hardware type. Many common interface types have a default duplex setting but are capable of operating at either Full or Half Duplex, so even when DFM computes a setting, it may not be correct for the interface as actually configured.
How DFM determines duplexity depends on which version of DFM you are running, as described in the following sections.
Algorithm Used by DFM 1.2 Patch/IDU 1.2.8 and Later Releases of DFM
If you have installed DFM 1.2 Patch/IDU 1.2.8 or later, DFM determines the duplexity by going through the following steps.
1. DFM checks the portDuplexity MIB attribute in the CISCO-STACK-MIB, and:
   a. If the value is set to either half duplex or full duplex, DFM uses that setting.
   b. If the portDuplexity attribute is not present or is set to auto/disagree, DFM proceeds to Step 2.
2. DFM checks the dot3StatsDuplexStatus MIB attribute in the ETHERLIKE-MIB, and:
   a. If the value is set to either half duplex or full duplex, DFM uses that setting.
   b. If the value is unknown, DFM proceeds to Step 3.
3. DFM checks the cdpCacheDuplex MIB attribute in the CISCO-CDP-MIB, and:
   a. If the value is set to either half duplex or full duplex, DFM uses that setting.
   b. If the value is unknown, DFM proceeds to Step 4.
4. Because it cannot correctly determine the duplex mode, DFM will do the following:
   a. If the interface is a 10-Mbps Ethernet interface, DFM will assume the setting is half duplex. (DFM considers an interface to be 10-Mbps Ethernet when its Type="*ETHER*" and its MaxSpeed=10000000.)
   b. For all other interfaces, DFM will assume the setting is full duplex.
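The four steps form a fallback chain, which the Python sketch below restates. It assumes the three MIB values have already been retrieved and normalized to strings such as "half", "full", "auto", "disagree", or "unknown" (or None if the attribute is absent). The function name, parameter names, and string encoding are hypothetical, and no SNMP access is shown.

```python
from fnmatch import fnmatchcase

def determine_duplex(port_duplexity, dot3_duplex_status, cdp_cache_duplex,
                     if_type, max_speed):
    """Return "half" or "full" by following Steps 1 through 4 above.

    The first three arguments stand for the values of portDuplexity,
    dot3StatsDuplexStatus, and cdpCacheDuplex, in the order DFM checks them.
    """
    # Steps 1-3: use the first attribute that reports a definite setting.
    for value in (port_duplexity, dot3_duplex_status, cdp_cache_duplex):
        if value in ("half", "full"):
            return value
    # Step 4a: 10-Mbps Ethernet is assumed to be half duplex.
    # Assumption: the Type pattern is matched case-sensitively.
    if fnmatchcase(if_type, "*ETHER*") and max_speed == 10_000_000:
        return "half"
    # Step 4b: every other interface is assumed to be full duplex.
    return "full"

# Hypothetical interface with no usable MIB information.
print(determine_duplex(None, "unknown", "unknown", "ETHERNETCSMACD", 10_000_000))
print(determine_duplex("auto", "unknown", "unknown", "SERIAL", 1_544_000))
```

Because Step 4 always returns a setting, this algorithm never leaves the duplex setting undetermined, although the assumed value may differ from the actual configuration.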
You can download the latest patch/IDU by logging into Cisco.com at: http://www.cisco.com/pcgi-bin/tablebuild.pl/cw2000-dfm.
Algorithm Used by DFM 1.2 Patch/IDU 1.2.5 and 1.2.6
If you have installed DFM 1.2 Patch/IDU 1.2.5 or later, and DFM cannot correctly determine the duplex mode setting of a port or interface, it will disable all utilization and attribute rates, and duplexity will not be reported.
If you have not installed DFM 1.2 Patch/IDU 1.2.5 or later, and DFM cannot correctly determine the duplex mode setting, DFM uses a mathematical procedure to modify the Current Utilization value in a way that allows it to produce useful results even in the absence of full information. (If DFM used the rules in the previous section, the result would be incorrect.) The Current Utilization that DFM computes is exact in any of the following situations:
1. DFM reports the mode as Full Duplex.
2. The port or interface is actually in Half Duplex mode.
3. The sum of the incoming and outgoing data rates exceeds the line speed. In this situation, DFM treats the port or interface as though it were a Full Duplex device. For full-duplex interfaces, this will be the case whenever both "wires" are used at more than half their capacity, a very common situation.
If none of these three situations applies (or if you have an earlier version than DFM Patch/IDU 1.2.5), the mathematically adjusted Current Utilization will be larger than the true device utilization. This means that you may receive a High Utilization notification even though the device utilization has not really exceeded the threshold. You will, however, never fail to receive a notification that you should have received.
Power Supply Exception
A Power Supply Exception indicates that voltage and power supply conditions around a system present a potential hazard. The following faults can trigger a Power Supply Exception.
Voltage Out of Range
This fault is reported if the voltage for this device is outside the normal operating range and exceeds the Relative Voltage Threshold. For more information about this threshold, which is contained in the Environment setting, see the "Relative Voltage Threshold" section.
Voltage State Not Normal
This fault is reported if the voltage sensor test point for this device is in neither the NORMAL nor the SHUTDOWN state. The voltage sensor state can be NORMAL, WARNING, CRITICAL, or SHUTDOWN.
Power Supply State Not Normal
This fault is reported if the power supply for this device is in neither the NORMAL nor the SHUTDOWN state. The power supply state can be NORMAL, WARNING, CRITICAL, or SHUTDOWN.
Resource Exception
A Resource Exception indicates that processor and memory faults occurred because a system does not have enough processor or memory resources to process the current traffic load. The following faults can trigger a Resource Exception.
Note
Flash and free memory components are not instrumented in DFM.
Note
A MIB may not support all of these exceptions, depending on the IOS version on which the MIB is running. Refer to "MIBs Polled and SNMP Traps Processed or Passed-Through by DFM."
Excessive Fragmentation
This fault is reported if the system memory is highly fragmented, that is, if the fragmentation exceeds the Memory Fragmentation Threshold. For more information about this threshold, which is contained in the Processor and Memory setting, see the "Memory Fragmentation Threshold" section.
High Buffer Miss Rate
This fault is reported if the rate of buffer misses exceeds the Memory Buffer Miss Threshold. Access this event notification in the alarm log to view the percentage breakdown of buffer misses into the types of misses (such as small, medium, big, large, and huge buffer misses). For more information about this threshold, which is contained in the Processor and Memory setting, see the "Memory Buffer Miss Threshold" section.
High Buffer Utilization
This fault is reported if the number of buffers used exceeds the Memory Buffer Utilization Threshold. Access this event notification in the alarm log to view the percentage of small, medium, big, large, and huge buffers used. For more information about this threshold, which is contained in the Processor and Memory setting, see the "Memory Buffer Utilization Threshold" section.
High Utilization
This fault is reported if the processor utilization exceeds the Processor Utilization Threshold. For more information about this threshold, which is contained in the Processor and Memory setting, see the "Processor Utilization Threshold" section.
Insufficient Free Memory
This fault is reported if the system is running out of memory resources and the percentage of free memory falls below the Free Memory Threshold. This fault is also reported if there has been a failure to allocate a buffer due to lack of memory. For more information about this threshold, which is contained in the Processor and Memory setting, see the "Free Memory Threshold" section.
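Unlike most thresholds in this chapter, the Free Memory Threshold is a lower bound, and the fault has a second trigger that does not depend on the threshold. The Python sketch below shows both triggers. The function name, parameter names, and sample values are hypothetical.

```python
def insufficient_free_memory(free_bytes, total_bytes, free_memory_threshold,
                             buffer_alloc_failed=False):
    """Return True if either documented trigger applies.

    free_memory_threshold is a percentage; the fault is raised when the
    free-memory percentage falls BELOW it, or when a buffer allocation
    has failed for lack of memory.
    """
    if total_bytes == 0:
        free_pct = 0.0  # assumption: treat an unknown total as no free memory
    else:
        free_pct = 100.0 * free_bytes / total_bytes
    return free_pct < free_memory_threshold or buffer_alloc_failed

# Hypothetical values: 5% free against a 10% threshold.
low_memory = insufficient_free_memory(5, 100, free_memory_threshold=10.0)
# Plenty of free memory, but a buffer allocation failed.
alloc_failure = insufficient_free_memory(50, 100, free_memory_threshold=10.0,
                                         buffer_alloc_failed=True)
print(low_memory, alloc_failure)
```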
Temperature Exception
A Temperature Exception indicates that the temperature conditions around the device present a potential hazard. The following faults can trigger a Temperature Exception.
Fan State Not Normal
This fault is reported if the fan for this device is not in the NORMAL state. The fan state can be NORMAL, WARNING, or CRITICAL.
Temperature Out of Range
This fault is reported if the temperature for this device is outside the normal operating range and exceeds the Relative Temperature Threshold.
For more information about this threshold, which is contained in the Environment setting, see the "Relative Temperature Threshold" section.
Temperature State Not Normal
This fault is reported if the temperature sensor for this device is reporting abnormal temperature measurements. The temperature sensor state can be NORMAL, WARNING, or CRITICAL.