Table Of Contents
Correlation Logic
Root-Cause Correlation Process
Root-Cause Alarms
Correlation Flows
Network Correlation Flows
Box-Level Correlation
Using Weights
Correlating TCA
Correlation Logic
This chapter describes how Cisco ANA performs correlation logic decisions.
Root-Cause Correlation Process describes the root-cause correlation concept.
Root-Cause Alarms describes the root-cause alarm and weights concepts.
Correlation Flows describes network and box-level correlation flows.
Root-Cause Correlation Process
Root-cause correlation is implemented in several stages within the Cisco ANA VNEs. Initially, the system tries to find the root-cause alarm locally. When a VNE detects a fault (and opens an alarm), it attempts to find another open alarm within the same device that qualifies as the root cause of the new alarm. For example, for a "link down syslog" alarm, the VNE looks for a root-cause alarm such as "link down" within the device. When such a root cause is found and qualified, the correlation relationship is set in the alarm DB. This process is named Box-Level Correlation.
A more difficult scenario is finding the root cause in a different device, which could be many network hops away. In the above example, the "link down" alarm could cause multiple "BGP Neighbor down" alarms throughout the network. In such cases, the BGP Neighbor down alarm is configured by default to actively search for a root cause in other VNEs by initiating a Network Correlation Flow. In this example, the VNE that detected the BGP Neighbor down alarm uses the network topology model maintained in the Cisco ANA fabric to trace the path to its lost neighbor. During this trace, it encounters the faulty link and qualifies it as the root cause of the BGP Neighbor down alarm.
The following figure illustrates the local and active correlation processes.
Figure 2-1 Root-Cause Correlation Process
The correlation mechanisms are highly configurable (per alarm), as described in the following sections.
Root-Cause Alarms
Potential root-cause alarms have a weight that is determined by the specific event customization. Refer to the Event and Alarm Configuration Parameters section for additional information about setting the weights. For example, a "Link-Down" alarm is configured to allow other alarms to correlate to it. When a "Link-Down" event is recognized, other alarms that occur in the network may correlate to it, identifying it as the cause of their occurrence. However, an event that is configured to be the cause of other alarms can in turn correlate to another alarm. The topmost alarm in the correlation tree is the root cause of all the alarms in that tree.
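The correlation tree described above can be pictured as a chain of cause links, where the root cause is found by following the links upward until an alarm with no cause is reached. The following Python sketch illustrates only that idea. The `Alarm` class, its `caused_by` field, and the third alarm in the example are hypothetical names chosen for illustration; they are not part of the Cisco ANA data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alarm:
    """Hypothetical alarm record; not the Cisco ANA data model."""
    name: str
    caused_by: Optional["Alarm"] = None  # the alarm this one correlates to


def find_root_cause(alarm: Alarm) -> Alarm:
    """Follow correlation links upward until the topmost alarm is reached."""
    visited = set()
    current = alarm
    while current.caused_by is not None:
        # Guard against a malformed (cyclic) correlation chain.
        if id(current) in visited:
            raise ValueError("correlation cycle detected")
        visited.add(id(current))
        current = current.caused_by
    return current


link_down = Alarm("Link-Down")
bgp_down = Alarm("BGP Neighbor down", caused_by=link_down)
# Hypothetical alarm that correlates to an alarm which is itself correlated.
dependent = Alarm("hypothetical dependent alarm", caused_by=bgp_down)

print(find_root_cause(dependent).name)  # prints Link-Down
```

An alarm that correlates to nothing is its own root cause in this sketch, which matches the notion of the topmost alarm in the tree.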
Correlation Flows
The VNEs use their internal DCM (Device Component Model) to perform the actual correlation. This action is called a "correlation flow". The VNE uses two basic correlation mechanisms:
• Box-level correlation (correlation within the same VNE)
• Network correlation (correlation across VNEs)
Each event can be configured to:
• Not correlate at all
• Perform box-level correlation
• Perform box-level correlation and, if the box-level correlation fails, network correlation
For more information about these parameters, see the Event and Alarm Configuration Parameters section.
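The three per-event options listed above amount to a small dispatch with a fallback. The following Python sketch shows one way to express that logic. The `CorrelationMode` enum, the `correlate` function, and the two callables are assumptions made for this example; the actual parameter names are given in the Event and Alarm Configuration Parameters section and are not reproduced here.

```python
from enum import Enum
from typing import Callable, Optional


class CorrelationMode(Enum):
    """Hypothetical names for the three per-event options."""
    NONE = "none"                      # do not correlate at all
    BOX = "box"                        # box-level correlation only
    BOX_THEN_NETWORK = "box+network"   # network correlation if box-level fails


def correlate(
    mode: CorrelationMode,
    box_level: Callable[[], Optional[str]],
    network: Callable[[], Optional[str]],
) -> Optional[str]:
    """Run the stages that the mode allows; return a root cause or None."""
    if mode is CorrelationMode.NONE:
        return None
    cause = box_level()
    # Fall back to the network flow only when box-level found nothing.
    if cause is None and mode is CorrelationMode.BOX_THEN_NETWORK:
        cause = network()
    return cause


def no_local_cause() -> Optional[str]:
    return None  # stand-in for a box-level search that finds nothing


def remote_link_down() -> Optional[str]:
    return "Link Down"  # stand-in for a network flow that finds a cause


result = correlate(CorrelationMode.BOX_THEN_NETWORK, no_local_cause, remote_link_down)
print(result)  # prints Link Down
```

Passing the two stages as callables keeps the network flow from running at all unless the box-level stage has already failed.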
Network Correlation Flows
Network problems and their effects are not always restricted to one network element, so an event may need to correlate to an alarm several hops away. To do so, the correlation mechanism within the VNE uses an active correlation flow that runs on the internal DCM of the VNEs and tries to correlate to an alarm along a specified network path. This is similar to the Cisco ANA PathTracer operation, which traces a path on the DCM from point "A" to point "Z", except that the flow tries to correlate to a root-cause alarm along the way rather than just tracing a path. This method usually applies to problems in the network layer and above (in the OSI model) that may be caused by a problem upstream or downstream. An example is an OSPF Neighbor Down event caused by a Link Down problem in an upstream router. Another important distinction between Cisco ANA PathTracer and the correlation flow is that the correlation flow may run on a historical snapshot of the network.
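The following Python sketch illustrates the two ideas in this section: walking a path hop by hop while looking for alarms, and evaluating the network as it was at the time of the event rather than at the time the flow runs. It is a simplified model under stated assumptions. The `OpenAlarm` class, the hop names, and the timestamps are hypothetical, and the sketch does not represent how the DCM or the alarm store is implemented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OpenAlarm:
    """Hypothetical alarm record with an open/clear time window."""
    name: str
    location: str                       # hop (device or link) that raised it
    opened_at: float                    # seconds on an arbitrary clock
    cleared_at: Optional[float] = None  # None while the alarm is still open

    def active_at(self, t: float) -> bool:
        """True if the alarm was open at time t."""
        return self.opened_at <= t and (self.cleared_at is None or t < self.cleared_at)


def network_correlation_flow(
    path: List[str], alarms: List[OpenAlarm], event_time: float
) -> List[OpenAlarm]:
    """Walk the path from A to Z and collect the alarms that were active
    on each hop at event_time (the historical snapshot)."""
    found = []
    for hop in path:
        for alarm in alarms:
            if alarm.location == hop and alarm.active_at(event_time):
                found.append(alarm)
    return found


# Hypothetical path from the router that lost its neighbor to the neighbor.
path = ["router-a", "link-a-b", "router-b"]
alarms = [OpenAlarm("Link Down", "link-a-b", opened_at=100.0, cleared_at=160.0)]

# The OSPF Neighbor Down event occurred at t=105, while the link was down.
at_event_time = network_correlation_flow(path, alarms, event_time=105.0)
# Evaluating the current state at t=200 would miss the cleared alarm.
at_later_time = network_correlation_flow(path, alarms, event_time=200.0)

print([a.name for a in at_event_time])  # prints ['Link Down']
print([a.name for a in at_later_time])  # prints []
```

The second call shows why a historical snapshot matters: by the time a flow runs, the root-cause alarm may already have cleared.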
Box-Level Correlation
In contrast to network correlation flows, when the root-cause problem is at the "box" level, the attempts to correlate other events are restricted to the specific VNE. This means that the correlation flow does not cross the DCMs of more than one VNE. An example is a Port Down syslog event correlating to a Port Down event. An exception to this behavior is the Link Down alarm. Because a "Link" entity connects two endpoints in the DCM, it involves the DCMs of two different VNEs, but on each VNE the events are correlated to that VNE's own "copy" of the Link Down event.
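As a rough illustration of a search that stays inside one VNE, the following Python sketch looks for a qualifying open alarm only among the alarms held by that VNE. The qualification rule used here (an allowed cause type on the same component) is an assumption made for the example, as are the `Event` and `Vne` classes and the `ALLOWED_CAUSES` table; the source does not specify how an alarm is qualified.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Event:
    """Hypothetical event record."""
    kind: str        # event type, for example "Port Down syslog"
    component: str   # component in the device model, for example a port


# Assumed table: which alarm types may be the root cause of an event type.
ALLOWED_CAUSES: Dict[str, Set[str]] = {
    "Port Down syslog": {"Port Down"},
}


@dataclass
class Vne:
    """Hypothetical VNE holding only the open alarms of its own device."""
    name: str
    open_alarms: List[Event] = field(default_factory=list)

    def box_level_correlate(self, event: Event) -> Optional[Event]:
        """Search this VNE only; never look at another VNE's alarms."""
        allowed = ALLOWED_CAUSES.get(event.kind, set())
        for alarm in self.open_alarms:
            if alarm.kind in allowed and alarm.component == event.component:
                return alarm
        return None


vne_a = Vne("vne-a", [Event("Port Down", "port-1")])
vne_b = Vne("vne-b")  # a different device with no open alarms

syslog = Event("Port Down syslog", "port-1")
print(vne_a.box_level_correlate(syslog))  # finds the Port Down alarm on vne-a
print(vne_b.box_level_correlate(syslog))  # prints None
```

Under this model, the Link Down exception would correspond to each of the two VNEs holding its own Link Down entry in `open_alarms`, so that each VNE still searches only its own list.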
Using Weights
In cases where there are multiple potential root causes along the same service path, Cisco ANA enables the user to define a priority scheme (weight) that determines the actual root cause.
The correlation system uses the following weight values to identify the root-cause alarm more precisely:
• weight: -2 (weightless). The flow does not collect weightless alarms, and no network correlation to the alarm is possible.
• weight: -1 (max weight). The correlation flow stops if it encounters a max weight alarm and chooses that alarm as the root cause.
• weight: >=0. The correlation flow collects the alarm but does not stop.
If the flow completes without encountering a max weight alarm, the correlation mechanism chooses the collected alarm with the highest weight as the root cause of the alarm that triggered the network correlation flow.
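The weight rules above can be sketched as a single pass over the alarms in the order the flow encounters them. The following Python example is an illustration of those rules only. The function name, the `(name, weight)` tuples, and the sample weights are hypothetical, and the tie-breaking behavior (the first alarm encountered wins) is an assumption, because the source does not say how equal weights are resolved.

```python
from typing import Iterable, Optional, Tuple

WEIGHTLESS = -2   # never collected by the flow
MAX_WEIGHT = -1   # stops the flow and is chosen immediately


def choose_root_cause(alarms_on_path: Iterable[Tuple[str, int]]) -> Optional[str]:
    """Pick a root cause from (name, weight) pairs in encounter order."""
    best: Optional[Tuple[str, int]] = None
    for name, weight in alarms_on_path:
        if weight == WEIGHTLESS:
            continue                 # skip: no network correlation to it
        if weight == MAX_WEIGHT:
            return name              # stop the flow at a max weight alarm
        # weight >= 0: collect and keep going; ties keep the earlier alarm
        # (assumption, not stated in the source).
        if best is None or weight > best[1]:
            best = (name, weight)
    return None if best is None else best[0]


# Hypothetical weights, chosen only to exercise each rule.
print(choose_root_cause([("alarm-a", 10), ("alarm-b", 50), ("alarm-c", 20)]))  # prints alarm-b
print(choose_root_cause([("alarm-a", 10), ("alarm-m", -1), ("alarm-b", 50)]))  # prints alarm-m
print(choose_root_cause([("alarm-w", -2)]))                                    # prints None
```

In the second call, `alarm-b` is never examined even though it has the highest numeric weight, because the flow stops at the max weight alarm.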
Correlating TCA
TCAs participate in the correlation mechanism and can correlate or be correlated to other alarms.