Table Of Contents
Advanced Correlation Scenarios
Device Unreachable Alarm
Connectivity Test
Device Fault Identification
Device Unreachable Example 1
Device Unreachable Example 2
HSRP Alarms
HSRP Example 1
HSRP Registry Parameters
IP Interface Failure Scenarios
IP Interface Status Down Alarm
Correlation of Syslogs/Traps
All IP Interfaces Down Alarm
IP Interface Failure Examples
Interface Example 1
Interface Example 2
Interface Example 3
Interface Example 4
Interface Example 5
ATM Examples
Ethernet, Fast Ethernet, Giga Ethernet Examples
Interface Registry Parameters
"ip interface status down"
"All ip interfaces down"
Advanced Correlation Scenarios
This chapter describes the specific alarms that use advanced correlation logic on top of the root cause analysis flow.
Device Unreachable Alarm describes the "device unreachable" alarm and its correlation, and provides various examples.
HSRP Alarms describes the HSRP alarms and provides various examples.
IP Interface Failure Scenarios describes the "ip interface status down" and "all ip interfaces down" alarms and their correlation, and provides several examples.
Device Unreachable Alarm
Connectivity Test
Connectivity tests verify connectivity between the Cisco ANA VNEs and managed network elements. Connectivity is tested for each protocol through which the VNE polls the device. The supported protocols for connectivity tests are SNMP, Telnet, and ICMP.
A "device unreachable" alarm is issued if one or more of the connectivity tests fail, that is, the device does not respond on that protocol. The alarm is cleared when the connectivity tests for all protocols pass.
Note
The ICMP connectivity test is enabled in Cisco ANA Manage.
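The raise and clear rule described above can be sketched in a few lines of Python. This is only an illustration, not Cisco ANA code: the `Protocol` enum and both function names are hypothetical, and the input is assumed to be a mapping from each polled protocol to whether its connectivity test passed.

```python
from enum import Enum


class Protocol(Enum):
    # The three protocols the text lists as supported for connectivity tests.
    SNMP = "SNMP"
    TELNET = "Telnet"
    ICMP = "ICMP"


def failed_protocols(results: dict) -> list:
    """Names of the protocols whose connectivity test failed, sorted."""
    return sorted(p.value for p, passed in results.items() if not passed)


def device_unreachable(results: dict) -> bool:
    """True while at least one test fails; False once all tests pass."""
    return bool(failed_protocols(results))


# Hypothetical poll result: the device answers SNMP and Telnet but not ICMP.
results = {Protocol.SNMP: True, Protocol.TELNET: True, Protocol.ICMP: False}
alarm_active = device_unreachable(results)
```

With the sample input, a single failing protocol is enough to keep the alarm active; it would clear only when every entry is `True`.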
Device Fault Identification
When a network element stops responding to queries from the management system, one of two things has happened:
• Connectivity to that device has been lost
• The device itself has crashed or restarted
Cisco ANA implements an algorithm that uses additional data to heuristically resolve the ambiguity and declare the Root-Cause correctly. Refer to the examples that follow.
Device Unreachable Example 1
In this example, the router (R1) goes down. As a result, links L2, L3, and L4 go down, in addition to the R1 session.
Figure 3-1 Device Unreachable Example 1
In this case, the system provides the following report:
• Root-Cause: Device Unreachable (R1)
• Correlated events:
  – L2 down
  – L3 down
  – L4 down
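A report of this shape is essentially a small tree: one root-cause alarm with the correlated events attached beneath it. The following Python sketch models that structure and renders the report for Example 1. The `Alarm` class and `render_report` function are hypothetical names chosen for illustration and do not reflect the internal Cisco ANA data model.

```python
from dataclasses import dataclass, field


@dataclass
class Alarm:
    description: str
    children: list = field(default_factory=list)

    def correlate(self, child: "Alarm") -> None:
        """Attach a child alarm that is explained by this alarm."""
        self.children.append(child)


def render_report(root: Alarm) -> list:
    """Return the report as text lines: root cause first, then the tree."""
    lines = [f"Root-Cause: {root.description}"]

    def walk(alarm: Alarm, depth: int) -> None:
        for child in alarm.children:
            lines.append("  " * depth + f"- {child.description}")
            walk(child, depth + 1)

    if root.children:
        lines.append("Correlated events:")
        walk(root, 1)
    return lines


# Example 1: R1 is the root cause; the three link alarms correlate to it.
root = Alarm("Device Unreachable (R1)")
for link in ("L2", "L3", "L4"):
    root.correlate(Alarm(f"{link} down"))
report = render_report(root)
```

The same structure covers Example 2 by correlating a "Device Unreachable (R2)" alarm in place of the L3 alarm.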
Device Unreachable Example 2
In this example, the router (R1) goes down. As a result, links L2, L3, and L4 go down, as well as the R1 session. Router R2, which is accessed through link L3, is also unreachable.
Note
No "link down" alarm is displayed for L3 because its state cannot be determined.
Figure 3-2 Device Unreachable Example 2
Note
If the device has a single link and is managed through that link (in-band management), there is no way to determine whether the device is unreachable because the link is down, or the link is down because the device is unreachable. In this case, Cisco ANA shows that the device is unreachable due to link down.
In this case, the system provides the following report:
• Root-Cause: Device Unreachable (R1)
• Correlated events:
  – L2 down
  – Device Unreachable (R2)
  – L4 down
HSRP Alarms
When the status of an active Hot Standby Router Protocol (HSRP) group changes, a service alarm is generated and a syslog is sent.
Table 3-1 HSRP Service Alarms
Alarm | Is-ticketable | Is-correlation-allowed | Correlated to | Severity
Primary HSRP interface is not active / Primary HSRP interface is active | Yes | No | Can be correlated to several other alarms, for example, link down | Major
Secondary HSRP interface is active / Secondary HSRP interface is not active | Yes | No | Can be correlated to several other alarms, for example, link down | Major
Note
HSRP group information can be viewed in the Inventory window of Cisco ANA NetworkVision.
HSRP Example 1
In this example, the link between Router 2 and Switch 2 is shut down (causing the HSRP standby group on Router 3 to become active), and a link down service alarm is generated. The primary HSRP group on Router 2 is no longer active. A service alarm is generated and correlated to the link down alarm. Router 2 also sends a syslog, which is correlated to the link down alarm.
The secondary HSRP group, configured on Router 3, now changes from standby to active. This network event triggers an IP-based active flow whose destination is the virtual IP address configured in the HSRP group. When the flow reaches its destination, a service alarm is generated and correlated to the link down alarm. Router 3 also sends a syslog, which is correlated to the link down alarm.
Figure 3-3 HSRP Example 1
In this case, the system provides the following report:
• Root-Cause: Link down (Router 2-Switch 2)
• Correlated events:
  – Primary HSRP interface is not active (source: Router 2)
  – %HSRP-6-STATECHANGE: FastEthernet0/0 Grp 1 state Active -> Speak (source: Router 2)
  – Secondary HSRP interface is active (source: Router 3)
  – %STANDBY-6-STATECHANGE: Ethernet0/0 Group 1 state Standby -> Active (source: Router 3)
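The two state-change syslogs in the report use slightly different wording (`%HSRP` with `Grp`, `%STANDBY` with `Group`). The following sketch shows one way to extract the interface, group number, and state transition from both. The regular expression is derived only from the two sample messages above, so it is an assumption that other devices or software versions use the same layout; `HsrpStateChange` and `parse_hsrp_syslog` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class HsrpStateChange(NamedTuple):
    interface: str
    group: int
    old_state: str
    new_state: str


# Pattern built from the two sample messages only; other formats will not match.
_STATE_CHANGE = re.compile(
    r"%(?:HSRP|STANDBY)-6-STATECHANGE: "
    r"(?P<interface>\S+) (?:Grp|Group) (?P<group>\d+) "
    r"state (?P<old>\w+) -> (?P<new>\w+)"
)


def parse_hsrp_syslog(message: str) -> Optional[HsrpStateChange]:
    """Return the parsed state change, or None if the message does not match."""
    match = _STATE_CHANGE.search(message)
    if match is None:
        return None
    return HsrpStateChange(
        match["interface"], int(match["group"]), match["old"], match["new"]
    )


router2 = parse_hsrp_syslog(
    "%HSRP-6-STATECHANGE: FastEthernet0/0 Grp 1 state Active -> Speak"
)
router3 = parse_hsrp_syslog(
    "%STANDBY-6-STATECHANGE: Ethernet0/0 Group 1 state Standby -> Active"
)
```

In this example, `router2` records the primary group leaving the Active state and `router3` records the secondary group becoming Active.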
HSRP Registry Parameters
The following "hsrp group status changed" parameters can be controlled through the Registry for both primary and secondary service alarms:
• flow-delay
• time-stamp-delay
The following "hsrp syslog" parameter can be controlled through the Registry for both primary and secondary HSRP status change syslogs:
• expiration-time
Note
For more information about these parameters see the Event and Alarm Configuration Parameters chapter.
IP Interface Failure Scenarios
This section includes the following:
• IP Interface Status Down Alarm
• All IP Interfaces Down Alarm
• IP Interface Failure Examples
IP Interface Status Down Alarm
Alarms related to subinterfaces (for example, Line Down trap and Line Down syslog) are reported on the IP interfaces configured above the relevant subinterface. In other words, the system represents a subinterface by the IP interface configured above it. All events originating from subinterfaces without an IP configured above them are reported on the underlying Layer1.
An "ip interface status down" alarm is generated when the status of an IP interface (whether it is configured over an interface or a subinterface) changes from "Up" to "Down" or to any other non-operational state. All events originating from the subinterfaces correlate to this alarm. In addition, an "All ip interfaces down" alarm is generated when all of the IP interfaces above a physical port change state to "Down".
Table 3-2 IP Interface Status Down Alarm
Name | Description | Is-ticketable | Is-correlation-allowed | Correlated to | Severity
Interface status down/up | Sent when an IP interface changes oper status to "down" | Yes | Yes | Link Down/Device unreachable/Configuration changed | Major
The alarm description includes the full name of the IP interface, for example, Serial0.2 (including the subinterface identifier, if it is a subinterface), and the alarm source points to the IP interface (not to Layer1).
All syslogs and traps indicating changes in subinterfaces (above which an IP is configured) correlate to the "ip interface status down" alarm, when that alarm is issued. The source of these events is the IPInterface. Syslogs and traps that indicate problems in Layer1 (that do not have a subinterface qualifier in their description) are sourced to Layer1.
Note
If a syslog or trap is received from a subinterface that does not have an IP configured above it, the source of the created alarm is the underlying Layer1.
For example:
• Line down trap (for subinterface)
• Line down syslogs (for subinterface)
For events that occur on subinterfaces:
• When sending the information northbound, the system puts the full subinterface name in the source field, as described in the ifDescr/ifName OID (for example, Serial0/0.1 and not Serial0/0 DLCI 50).
• The source of the alarm is the IPInterface configured above the subinterface.
• If there is no IP configured, the source is the underlying Layer1.
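The sourcing rules above reduce to a short decision: an event that names a subinterface is sourced to the IP interface configured above it, and everything else falls back to Layer1. The sketch below illustrates this under stated assumptions. The function name, the string labels, and the representation of configured IP interfaces as a set of subinterface names are all hypothetical.

```python
from typing import Optional, Set, Tuple


def resolve_event_source(
    physical_port: str,
    subinterface: Optional[str],
    ip_interfaces: Set[str],
) -> Tuple[str, str]:
    """Return (component type, component name) for the alarm source.

    subinterface is None when the event description carries no
    subinterface qualifier, which the text treats as a Layer1 problem.
    """
    if subinterface is not None and subinterface in ip_interfaces:
        # An IP interface is configured above the subinterface.
        return ("IPInterface", subinterface)
    # No qualifier, or no IP configured: use the underlying Layer1.
    return ("Layer1", physical_port)


# Hypothetical port with one subinterface that has an IP configured.
configured = {"Serial0/0.1"}
with_ip = resolve_event_source("Serial0/0", "Serial0/0.1", configured)
without_ip = resolve_event_source("Serial0/0", "Serial0/0.2", configured)
layer1_event = resolve_event_source("Serial0/0", None, configured)
```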
If the main interface goes down, the traps and syslogs of all related subinterfaces are correlated as child tickets to the parent ticket of the main interface.
The following technologies are supported:
• Frame Relay/HSSI
• ATM
• Ethernet, Fast Ethernet, Gigabit Ethernet
• POS
• CHOC
Correlation of Syslogs/Traps
When a trap or syslog is received for the subinterface level, the status of the relevant IP interface is polled immediately and a polled parent event (for example, "ip interface status down") is created. The trap or syslog is correlated to this alarm.
In a multipoint setup, if only some of the circuits under an IP interface go down and the state of the IP interface does not change to "down", no "ip interface status down" alarm is created. All of the circuit down syslogs correlate by flow to the possible root cause, for example, "Device unreachable" on a CE device.
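The two outcomes described above (a polled parent alarm, or correlation by flow when the IP interface stays up) can be illustrated with a small decision function. This is a simplified sketch: the function name and the dictionary layout are hypothetical, and the polled status is passed in as a plain string rather than obtained from a device.

```python
def handle_subinterface_event(event: str, polled_status: str) -> dict:
    """Decide how a subinterface trap or syslog is correlated.

    polled_status is the IP interface status obtained by the immediate poll.
    """
    if polled_status.lower() != "up":
        # The IP interface is down: create the polled parent event and
        # correlate the trap or syslog to it.
        return {
            "parent": "ip interface status down",
            "correlated": [event],
            "correlate_by_flow": [],
        }
    # Multipoint case: the IP interface is still up, so no parent alarm is
    # created and the event looks for a root cause by flow instead.
    return {"parent": None, "correlated": [], "correlate_by_flow": [event]}


interface_down = handle_subinterface_event("Line down trap", "down")
circuit_only = handle_subinterface_event("Circuit down syslog", "up")
```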
All IP Interfaces Down Alarm
• When all of the IP interfaces configured above a physical interface change their state to "down", the "All ip interfaces down" alarm is sent.
• When at least one of the IP interfaces changes its state to "up", a clearing alarm is sent, namely, the "active ip interfaces found" alarm.
• The "ip interface status down" alarm for each of the failed IP interfaces is correlated to the "All ip interfaces down" alarm.
Note
When an "all ip interfaces down" alarm is cleared by the "active ip interfaces found" alarm but there are still correlated "ip interface status down" alarms for some IP interfaces, the severity of the parent ticket is the highest severity among all of the correlated alarms. For example, if there is an uncleared "interface status down" alarm, the severity of the ticket remains Major, even though the "Active ip interfaces found" alarm has a Cleared severity.
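The raise, clear, and severity rules in this section can be sketched as follows. This is an illustration under stated assumptions, not the product logic: the function names are hypothetical, any state other than "up" is treated as down, and the `Severity` enum contains only the two levels mentioned in the text, with an assumed ordering in which Major outranks Cleared.

```python
from enum import IntEnum
from typing import Dict, Iterable, Optional


class Severity(IntEnum):
    # Assumed ordering; only the two levels named in the text are modeled.
    CLEARED = 0
    MAJOR = 1


def next_port_alarm(alarm_active: bool, states: Dict[str, str]) -> Optional[str]:
    """Return the alarm to send for a physical port, or None if nothing changes.

    states maps each IP interface above the port to its current status.
    """
    up = [name for name, status in states.items() if status.lower() == "up"]
    if not alarm_active and states and not up:
        return "All ip interfaces down"
    if alarm_active and up:
        return "active ip interfaces found"  # the clearing alarm
    return None


def ticket_severity(correlated: Iterable[Severity]) -> Severity:
    """The ticket takes the highest severity among its correlated alarms."""
    return max(correlated)


raised = next_port_alarm(False, {"Serial0/0.1": "down", "Serial0/0.2": "down"})
cleared = next_port_alarm(True, {"Serial0/0.1": "up", "Serial0/0.2": "down"})
# One interface is still down, so the ticket stays Major after the clear.
severity = ticket_severity([Severity.CLEARED, Severity.MAJOR])
```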
Table 3-3 All IP Interfaces Down
Name | Description | Is-ticketable | Is-correlation-allowed | Correlated to | Severity
All ip interfaces down/Active ip interfaces found | Sent when all of the IP interfaces configured above a physical port change their oper status to "down" | Yes | Yes | Link Down/Configuration Change | Major
The "All ip interfaces down" alarm is sourced to the Layer1 component. All alarms from "the other side", for example, "device unreachable" correlate to the "All ip interfaces down" alarm.
IP Interface Failure Examples
Note
In all of the examples that follow, it is assumed that problems that occur in the unmanaged cloud, or on the other side of the cloud (for example, an "unreachable" CE device from the point of view of a PE device), cause the state of the relevant IP interfaces to change to "down". This in turn causes the "ip interface status down" alarm to be sent.
If this is not the case, as in some Ethernet networks, and there is no change to the state of the IP interface, all of the events on the subinterfaces that are correlation flow capable try to correlate to other possible root causes, including "cloud problem".
Interface Example 1
In this example, there is multipoint connectivity between a PE and a number of CEs through an unmanaged Frame Relay network. All of the CEs (Router2 and Router3) have logical connectivity to the PE through a multipoint subinterface on the PE (Router10). The "Keep Alive" option is enabled for all circuits. A link is disconnected inside the unmanaged network, which causes all the CEs to become unreachable.
Figure 3-4 Interface Example 1
The following failures are identified in the network:
• A "device unreachable" alarm is generated for each CE
• An "ip interface status down" alarm is generated for the multipoint IP interface on the PE
The following correlation information is provided:
• The root cause is the "ip interface status down" alarm on the multipoint subinterface of the PE
• All of the "device unreachable" alarms are correlated to the "ip interface status down" alarm on the PE
Interface Example 2
In this example, there is point-to-point connectivity between a PE and a CE through an unmanaged Frame Relay network. The "Keep Alive" option is enabled. The interface between the unmanaged network and CE1 is shut down. As a result, CE1 becomes unreachable and the IP interface on the other side (on PE1) changes state to "down".
Figure 3-5 Interface Example 2
The following failures are identified in the network:
• A "device unreachable" alarm is generated on the CE
• An "ip interface status down" alarm is generated on the PE
The following correlation information is provided:
• The root cause is "device unreachable"
  – The "ip interface status down" alarm is correlated to the "device unreachable" alarm
  – The syslogs and traps for the related subinterfaces are correlated to the "ip interface status down" alarm
Interface Example 3
In this example, multiple IP interfaces above the same physical port fail (mixed point-to-point and multipoint Frame Relay connectivity). CE1 (Router2) has a point-to-point connection to PE1 (Router10). CE1 and CE2 (Router3) have multipoint connections to PE1. The IP interfaces on PE1 that are connected to CE1 and CE2 are all configured above Serial0/0. The "Keep Alive" option is enabled. A link is disconnected inside the unmanaged network, which causes all of the CEs to become unreachable.
Figure 3-6 Interface Example 3
The following failures are identified in the network:
• All of the CEs become unreachable
• An "ip interface status down" alarm is generated for each IP interface above Serial0/0 that has failed
The following correlation information is provided:
• The root cause is "All IP interfaces down" on the Serial0/0 port
• The "ip interface status down" alarms are correlated to the "All IP interfaces down" alarm
• The "device unreachable" alarms are correlated to the "All IP interfaces down" alarm
• The syslogs and traps for the related subinterfaces are correlated to the "All IP interfaces down" alarm
Interface Example 4
In this example there is a link down. In a situation where a link down occurs, whether it involves a cloud or not, the link failure is considered to be the most probable root cause for any other failures. In this example, a link is disconnected between the unmanaged network and the PE.
Figure 3-7 Interface Example 4
The following failures are identified in the network:
• A "link down" alarm is generated on Serial0/0
• A "device unreachable" alarm is generated for each CE
• An "ip interface status down" alarm is generated for each IP interface above Serial0/0
• An "All ip interfaces down" alarm is generated on Serial0/0
The following correlation information is provided:
• The "device unreachable" alarms are correlated to the "link down" alarm
• The "ip interface status down" alarms are correlated to the "link down" alarm
• The "All ip interfaces down" alarm is correlated to the "link down" alarm
• All of the traps and syslogs for the subinterfaces are correlated to the "link down" alarm
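The outcome of Example 4 can be expressed as a simple rule: when a "link down" alarm is present, it becomes the root cause and every other alarm correlates to it. The following sketch covers only that one rule and deliberately ignores the flow-based logic the system uses to decide which alarms are actually related. The `Alarm` tuple and the function name are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Alarm(NamedTuple):
    kind: str
    source: str


def correlate_to_link_down(
    alarms: List[Alarm],
) -> Tuple[Optional[Alarm], List[Alarm]]:
    """Return (root cause, correlated alarms) for the Example 4 rule.

    If there is no "link down" alarm, no root is chosen and all alarms
    are returned uncorrelated.
    """
    link_down = [a for a in alarms if a.kind == "link down"]
    if not link_down:
        return None, list(alarms)
    root = link_down[0]
    return root, [a for a in alarms if a is not root]


# The alarms listed for Example 4, with one CE shown for brevity.
alarms = [
    Alarm("device unreachable", "CE1"),
    Alarm("ip interface status down", "Serial0/0.1"),
    Alarm("All ip interfaces down", "Serial0/0"),
    Alarm("link down", "Serial0/0"),
]
root, children = correlate_to_link_down(alarms)
```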
Interface Example 5
In this example, PE1 has multipoint connectivity. One of the circuits under the IP interface has gone down, and CE1, which is connected to it, has become unreachable. The status of the IP interface has not changed and the other circuits are still operational.
Figure 3-8 General Interface Example
The following failures are identified in the network:
• A "device unreachable" alarm is generated on CE1
• A Syslog alarm is generated notifying the user about a circuit down
The following correlation information is provided:
• "device unreachable" on the CE
  – The Syslog alarm is correlated by flow to the possible root cause, for example, a "device unreachable" alarm on CE1
ATM Examples
Similar examples involving ATM technology have the same result, assuming that a failure in an unmanaged network causes the status of the IP interface to change to "Down" (ILMI is enabled).
Ethernet, Fast Ethernet, Giga Ethernet Examples
Interface Example 6
In this example there is an unreachable CE due to a failure in the unmanaged network.
Figure 3-9 Interface Example 6
The following failures are identified in the network:
• A "device unreachable" alarm is generated on the CE
• A "Cloud problem" alarm is generated
The following correlation information is provided:
• No alarms are generated on a PE for Layer1, Layer2 or for the IP layers
• The "device unreachable" alarm is correlated to the "Cloud problem" alarm
Note
This behavior may change depending on the "correlate-to-cloud" value.
Interface Example 7
In this example there is a link down on the PE that results in the CE becoming unreachable.
Figure 3-10 Interface Example 7
The following failures are identified in the network:
• A "link down" alarm is generated on the PE
• An "ip interface status down" alarm is generated on the PE
• A "device unreachable" alarm is generated on the CE
The following correlation information is provided:
• "Link down" on the PE
  – The "ip interface status down" alarm on the PE is correlated to the "link down" alarm
  – The "device unreachable" alarm on the CE is correlated to the "link down" alarm on the PE
  – The traps and syslogs for the subinterface are correlated to the "link down" alarm on the PE
Interface Registry Parameters
"ip interface status down"
The following "ip interface status down" parameters can be controlled through the Registry:
• is-correlation-allowed
• severity
• timeout
• expiration-time
• flow-activation-message
• flow-delay
• time-stamp-delay
• weight
• is-ticketable
Note
For more information about these parameters see the Event and Alarm Configuration Parameters chapter.
"All ip interfaces down"
The following "All ip interfaces down" parameters can be controlled through the Registry:
• is-correlation-allowed
• is-ticketable
• severity
• activate-flow
• correlate
• timeout
• expiration-time
• weight
Note
For more information about these parameters see the Event and Alarm Configuration Parameters chapter.