Table Of Contents
Correlation Logic
Root-Cause Correlation Process
Root-Cause Alarms
Correlation Flows
Box-Level Correlation
Network Correlation Flows
Using Weights
Correlating TCA
Correlation Logic
This chapter describes how Cisco ANA performs correlation logic decisions:
• Root-Cause Correlation Process: Describes the root-cause correlation concept.
• Root-Cause Alarms: Describes the root-cause alarm and weights concepts.
• Correlation Flows: Describes network and box-level correlation flows.
Root-Cause Correlation Process
Root-cause correlation is implemented in several stages within the Cisco ANA VNEs. Initially, the system tries to find the root-cause alarm. When a VNE detects a fault and opens an alarm, it first looks for another open alarm on the same device that qualifies as the root cause of the new alarm. For example, for a "link-down syslog" alarm, the VNE looks for a root-cause alarm within the device, such as "link down". When such a root cause is found and qualified, the correlation relationship is set in the alarm database. This process is box-level correlation.
A more complex scenario is finding the root cause in a different device, which could be many network hops away. In the above example, the link-down alarm could cause multiple BGP Neighbor Down alarms throughout the network. In such cases, the BGP Neighbor Down alarm is configured by default to actively search for a root cause in other VNEs by initiating a network correlation flow. In this example, the VNE that detected the BGP Neighbor Down uses the network topology model maintained in the Cisco ANA fabric to trace the path to its lost neighbor. During this trace it encounters the faulty link and qualifies it as the root cause of the BGP Neighbor Down alarm.
The following figure illustrates the local (box-level) and active (network) correlation processes.
Figure 2-1 Root-Cause Correlation Process
The correlation mechanisms are highly configurable (per alarm), as described in the following sections.
Root-Cause Alarms
Each potential root-cause alarm has a weight that is determined by the customization of the specific event. Refer to "Event and Alarm Configuration Parameters" for additional information about setting the weights. For example, a link-down alarm is configured to allow other alarms to correlate to it. When a link-down event is recognized, other alarms that occur in the network may therefore correlate to it, identifying it as the cause of their occurrence. However, an event that is configured to be the cause of other alarms can in turn correlate to another alarm. The topmost alarm in the correlation tree is the root cause of all the alarms in that tree.
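The correlation tree described above can be pictured with a minimal Python sketch. The `Alarm` class, its `cause` field, and the `root_cause` function are hypothetical names chosen for illustration; they are not part of any Cisco ANA interface, and the sketch only shows the idea of following correlation links up to the topmost alarm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alarm:
    """Hypothetical alarm record; 'cause' is the alarm this one correlates to."""
    name: str
    cause: Optional["Alarm"] = None


def root_cause(alarm: Alarm) -> Alarm:
    """Follow correlation links upward until an alarm with no cause is reached."""
    seen = set()
    current = alarm
    while current.cause is not None:
        # Guard against a malformed (cyclic) correlation chain.
        if id(current) in seen:
            raise ValueError("correlation cycle detected")
        seen.add(id(current))
        current = current.cause
    return current


# Example chain: a syslog alarm correlates to BGP Neighbor Down,
# which in turn correlates to link down.
link_down = Alarm("link down")
bgp_down = Alarm("BGP Neighbor Down", cause=link_down)
syslog = Alarm("BGP neighbor syslog", cause=bgp_down)
top = root_cause(syslog)
```

In this example `top` is the `link_down` alarm, because it is the only alarm in the chain that does not correlate to another alarm.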
Correlation Flows
The VNEs utilize their internal device component model (DCM) in order to perform the actual correlation. This action is considered to be a correlation flow. There are two basic correlation mechanisms used by the VNE:
1. Box-Level Correlation (correlation in the same VNE).
2. Network Correlation Flows (correlation across VNEs).
Each event can be configured to:
• Not correlate at all.
• Perform box-level correlation.
• Perform box-level correlation, followed by network correlation if the box-level correlation fails.
For more information about these parameters, see "Event and Alarm Configuration Parameters".
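The three per-event options listed above amount to a small decision procedure, sketched below in Python. The `CorrelationMode` enum and the `correlate` function are assumptions made for illustration; the actual configuration parameters are those documented in "Event and Alarm Configuration Parameters". The two lookups are passed in as callables so the sketch stays independent of how each correlation is actually performed.

```python
from enum import Enum
from typing import Callable, Optional


class CorrelationMode(Enum):
    """Hypothetical names for the three per-event options."""
    NONE = "none"
    BOX = "box"
    BOX_THEN_NETWORK = "box-then-network"


def correlate(
    mode: CorrelationMode,
    box_level: Callable[[], Optional[str]],
    network: Callable[[], Optional[str]],
) -> Optional[str]:
    """Return the name of the root-cause alarm found, or None."""
    if mode is CorrelationMode.NONE:
        return None
    cause = box_level()
    # Stop here if box-level correlation succeeded, or if the event
    # is not configured to fall back to network correlation.
    if cause is not None or mode is CorrelationMode.BOX:
        return cause
    return network()


# Box-level lookup finds nothing, so the network flow is tried.
result = correlate(
    CorrelationMode.BOX_THEN_NETWORK,
    box_level=lambda: None,
    network=lambda: "link down",
)
```

Note that the network lookup is only invoked when the box-level lookup returns nothing, which mirrors the wording of the third option.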
Box-Level Correlation
When the root-cause problem is at the box level, attempts to correlate to other events are restricted to the specific VNE. This means that the correlation flow does not cross the DCMs of more than one VNE. An example is a port-down syslog event correlating to a port-down event.
An exception to this behavior is the link-down alarm. Because a link entity connects two endpoints in the DCM, it involves the DCMs of two different VNEs, but on each VNE the events are correlated to that VNE's own copy of the link-down event.
Network Correlation Flows
Network problems and their effects are not always restricted to one network element. This means that an event may need to correlate to an alarm several hops away. To do this, the correlation mechanism within the VNE uses an active correlation flow that runs on the VNE's internal DCM and tries to correlate to an alarm along a specified network path. This is similar to the Cisco ANA PathTracer operation when it traces a path on the DCM from point A to point Z, except that the flow tries to correlate to a root-cause alarm along the way rather than just tracing a path. This method is usually applicable to problems in the network layer and above (in the OSI network model) that might be caused by a problem upstream or downstream. An example is an OSPF Neighbor Down event caused by a link-down problem in an upstream router. Another important distinction between Cisco ANA PathTracer and the correlation flow is that the correlation flow may run on a historical snapshot of the network.
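As a rough illustration of a flow that traces a path and looks for alarms along the way, the following Python sketch walks a toy topology. Everything in it is an assumption made for the example: the `topology` dictionary stands in for the DCM (or a historical snapshot of it), element names such as `PE1` and `link-2` are invented, and a breadth-first search is used as a simple substitute for the real path-tracing logic, which this chapter does not describe.

```python
from collections import deque


def trace_path(topology, start, end):
    """Breadth-first search; returns the elements from start to end, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            # Rebuild the path by walking the 'previous' links backwards.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in topology.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def candidates_along_path(topology, open_alarms, start, end):
    """Collect (element, alarm) pairs for open alarms found on the traced path."""
    path = trace_path(topology, start, end) or []
    return [
        (element, alarm)
        for element in path
        for alarm in open_alarms.get(element, ())
    ]


# Toy model: devices and links are both elements of the topology.
topology = {
    "PE1": ["link-1"],
    "link-1": ["PE1", "P1"],
    "P1": ["link-1", "link-2"],
    "link-2": ["P1", "PE2"],
    "PE2": ["link-2"],
}
open_alarms = {"link-2": ["link down"]}
found = candidates_along_path(topology, open_alarms, "PE1", "PE2")
```

Here the trace from `PE1` to its lost neighbor `PE2` passes over `link-2`, so the open link-down alarm on that element is returned as a candidate root cause.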
Using Weights
In cases where there are multiple potential root causes along the same service path, Cisco ANA enables the user to define a priority scheme (weight) which can determine the actual root cause.
The correlation system uses the following weight values to identify the root-cause alarm more precisely:
• weight: -2 (weightless). The flow does not collect weightless alarms, and no network correlation to the alarm is possible.
• weight: -1 (max weight). The correlation flow stops if it encounters a max weight alarm and chooses that alarm as the root cause.
• weight: >=0. The correlation flow collects the alarm but does not stop.
The correlation mechanism will choose the alarm with the highest weight as the root cause for the alarm that triggered the network correlation flow.
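The weight rules above can be summarized in a short Python sketch. The function name, the constant names, and the representation of alarms as `(name, weight)` pairs are hypothetical. Tie-breaking between alarms of equal weight is not specified in this chapter; the sketch keeps the first one encountered, which is an assumption.

```python
WEIGHTLESS = -2   # never collected by the flow
MAX_WEIGHT = -1   # stops the flow and is chosen immediately


def choose_root_cause(alarms_along_path):
    """Pick a root cause from (name, weight) pairs in the order encountered."""
    collected = []
    for name, weight in alarms_along_path:
        if weight == WEIGHTLESS:
            continue              # weightless alarms are ignored
        if weight == MAX_WEIGHT:
            return name           # the flow stops at a max weight alarm
        collected.append((name, weight))
    if not collected:
        return None
    # Highest weight wins; max() keeps the first alarm on a tie (assumption).
    return max(collected, key=lambda item: item[1])[0]


chosen = choose_root_cause([
    ("card out", WEIGHTLESS),
    ("port down", 100),
    ("link down", 500),
])
```

In this example `chosen` is `"link down"`: the weightless alarm is skipped, and of the two collected alarms the one with the higher weight is selected.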
Correlating TCA
TCAs participate in the correlation mechanism, and can correlate or be correlated to other alarms.