Cisco Active Network Abstraction Managing MPLS User Guide Version 3.5.2
Fault Management in MPLS Networks

Table Of Contents

Fault Management In MPLS Networks

MPLS Related Faults

MPLS Black Hole Found Alarm

Broken LSP Discovered Alarm

Black Hole To Link Down

BGP Related Faults

BGP Neighbor Loss

Traffic Engineering Faults

MPLS TE Tunnel Down and TE Tunnel Flapping

Tunnel Reoptimized

Layer 2 VPN Faults

Pseudo Wire (L2 VPN) MPLS Tunnel Down

Alarms Summary


Fault Management In MPLS Networks


This chapter describes the alarms that Cisco ANA detects and reports for BGP, MPLS TE (using RSVP TE), and MPLS black holes, as well as the alarms it reports for Layer 2 and Layer 3 VPNs:

MPLS Related Faults—Describes the "MPLS black hole found" and "Broken LSP discovered" alarms.

BGP Related Faults—Describes the "BGP neighbor loss" alarm.

Traffic Engineering Faults—Describes the "MPLS TE tunnel down", "MPLS TE tunnel flapping" and "Tunnel reoptimized" alarms.

Layer 2 VPN Faults—Describes the "Pseudo Wire (L2 VPN) MPLS tunnel down" alarm.

Alarms Summary—Provides a brief description of the alarms for VPNs, including their severity and the up alarm (clearing alarm) for each.

Cisco ANA supports the following alarms:

MPLS Related Faults:

MPLS black hole found

Broken LSP discovered

BGP Related Faults:

BGP neighbor loss

Traffic Engineering Faults:

MPLS TE tunnel down

MPLS TE tunnel flapping

Tunnel reoptimized

Layer 2 VPN Faults:

Pseudo Wire (L2 VPN) MPLS tunnel down

The alarms are displayed in the ticket pane of the Cisco ANA NetworkVision window. For more information about the ticket pane, see the Cisco Active Network Abstraction NetworkVision User Guide.

MPLS Related Faults

This section includes descriptions of the following MPLS related faults:

MPLS black hole found

Broken LSP discovered


Note The MPLS black hole feature is only supported when the PEs are managed by the system.


MPLS Black Hole Found Alarm

An MPLS "black hole" is an abnormal termination of an MPLS path (LSP) inside an MPLS network. An MPLS black hole exists when a specific interface has untagged entries destined for a known PE router. A router is assumed to function as a PE router if there are services using the MPLS network, such as L3 VPNs or Pseudo Wire (L2 VPN) MPLS tunnels. Note that untagged interfaces can exist in the network in normal situations. For example, untagged interfaces at the boundary of the MPLS cloud are considered normal.

The existence of an MPLS black hole results in the loss of all the MPLS labels on a packet, including the VPN information, which is carried in the inner MPLS label. If a packet goes through an untagged interface, the VPN information is lost, and this loss translates directly to VPN sites losing connectivity.

The "MPLS Black Hole Found" alarm is detected actively by the system, that is, service alarms are generated whenever Cisco ANA discovers an MPLS interface that has at least one untagged LSP leading to a known PE router.

Black hole alarms are detected in one of two ways:

When the system is loaded for the first time and performs the initial discovery of the network.

Through the ongoing discovery process, which identifies changes in the network.
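The detection rule above can be illustrated with a minimal sketch. It is not Cisco ANA code: the LabelEntry model, the find_black_holes function, and the interface names are hypothetical and only assume that each router exposes a label switching table whose untagged entries can be recognized.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabelEntry:
    """One row of a router's label switching table (hypothetical model)."""
    router: str
    interface: str
    destination: str            # router that the LSP leads to
    out_label: Optional[int]    # None models an untagged entry


def find_black_holes(entries, known_pes):
    """Return the (router, interface) pairs that have at least one
    untagged entry leading to a known PE router."""
    found = set()
    for entry in entries:
        if entry.out_label is not None:
            continue  # tagged entry: the LSP continues normally
        if entry.destination not in known_pes:
            continue  # untagged, but not toward a known PE router
        found.add((entry.router, entry.interface))
    return sorted(found)


entries = [
    LabelEntry("P2", "Gi0/1", "PE3", None),   # untagged toward a PE
    LabelEntry("P2", "Gi0/2", "PE2", 17),     # tagged
    LabelEntry("PE3", "Gi0/0", "PE2", None),  # untagged toward a PE
    LabelEntry("P2", "Gi0/3", "CE1", None),   # untagged, not a known PE
]
black_holes = find_black_holes(entries, known_pes={"PE2", "PE3"})
```

In this sketch the same check would run both on the initial discovery and whenever the ongoing discovery process reports a change in a label switching table.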

Broken LSP Discovered Alarm

The "MPLS Black Hole Found" alarm activates a backward flow on the specific untagged entry in order to traverse the full path of the LSPs passing through it. If Cisco ANA locates services (VRFs, Pseudo Wire L2 tunnels) along this path that are using these LSPs, a "Broken LSP Discovered" alarm is issued. Such services can only be found on PE routers, and they can be found on more than one PE router. The source of the "Broken LSP Discovered" alarm is the PE router on which the service was discovered; in many cases this router is different from the router that issued the "MPLS Black Hole Found" alarm.

"Broken LSP Discovered" alarms are correlated to the "MPLS Black Hole Found" alarm (except when the black hole is caused by a link down, as described in Black Hole To Link Down).

The "Broken LSP Discovered" alarm is detected actively by the system, that is, the system generates service alarms.

An example of an MPLS black hole scenario is provided below.

In the network described in this example, the shortest path from PE2 to PE3 is PE2<->P2<->PE3. The link between P2 and PE3 is an MPLS link, meaning interfaces on both sides of the link are configured as MPLS interfaces. Also assume that, for some reason, the MPLS configuration is incomplete or incorrect, for example:

Only one interface is configured as a MPLS interface.

The label distribution protocol is configured differently on both interfaces (protocol mismatch).

In this case the label switching tables on P2 and PE3 have untagged entries for the LSPs between PE2 and PE3. If PE2 and PE3 have VPN services (VRFs, Pseudo Wire tunnels), the data flow between PE2 and PE3 is affected.

Figure 6-1 Example of an MPLS Black Hole Scenario

In this case Cisco ANA does the following:

Identifies untagged label switching entries on P2 and PE3.

Issues "MPLS Black Hole Found" alarms on the interfaces on both sides of the link (since the LSP is unidirectional).

Initiates a backward flow starting from the link on the specific untagged entries and identifies the two LSPs traversing the link, namely:

LSP from PE2 to PE3

LSP from PE3 to PE2

Issues "Broken LSP Discovered" alarms on both LSPs in PE2 and PE3, which are correlated to the corresponding "MPLS Black Hole Found" alarm.
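The steps above can be sketched in a simplified form. This is an illustration only, under stated assumptions: the Alarm class and the broken_lsp_alarms function are hypothetical, each LSP is represented as an already known list of routers (a real backward flow would walk the label switching tables hop by hop), and services are looked up only on the head-end PE of each LSP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alarm:
    name: str
    source: str                             # router that issued the alarm
    correlated_to: Optional["Alarm"] = None


def broken_lsp_alarms(lsps, untagged_hop, services, black_hole):
    """Find the LSPs that cross the untagged hop and issue a
    "Broken LSP Discovered" alarm on each head-end PE that has services.

    lsps:         list of router paths, head-end first
    untagged_hop: (router, next_hop) where the labels are dropped
    services:     dict mapping a PE name to the services found on it
    black_hole:   the "MPLS Black Hole Found" alarm to correlate to
    """
    alarms = []
    for path in lsps:
        hops = set(zip(path, path[1:]))
        if untagged_hop not in hops:
            continue  # this LSP does not pass through the untagged entry
        head_end = path[0]
        if services.get(head_end):
            alarms.append(Alarm("Broken LSP Discovered", head_end, black_hole))
    return alarms


# The scenario from the example: one LSP in each direction.
lsps = [["PE2", "P2", "PE3"], ["PE3", "P2", "PE2"]]
services = {"PE2": ["VRF-A"], "PE3": ["VRF-A"]}

bh_p2 = Alarm("MPLS Black Hole Found", "P2")
bh_pe3 = Alarm("MPLS Black Hole Found", "PE3")

from_p2 = broken_lsp_alarms(lsps, ("P2", "PE3"), services, bh_p2)
from_pe3 = broken_lsp_alarms(lsps, ("PE3", "P2"), services, bh_pe3)
```

Because an LSP is unidirectional, each untagged entry breaks only the LSP that crosses it in that direction, which is why the two black hole alarms each lead to a different "Broken LSP Discovered" alarm.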


Note When the black hole is cleared, the clearing alarm does not activate flows to locate the LSPs that were passing through the untagged entry in order to issue clearing alarms for the broken LSPs. Instead, it relies on the auto clear functionality. The Gateway periodically reviews the tickets and checks whether all the alarms under each ticket are cleared or configured as auto cleared alarms, and whether the Gateway correlation timeout has passed. If so, the Gateway closes the ticket.

With this functionality, once the "MPLS Black Hole Found" alarm is cleared and the configured Gateway correlation timeout has passed, the Gateway can close the ticket, because all the alarms correlated to the "MPLS Black Hole Found" alarm are "Broken LSP Discovered" alarms, which are configured as auto cleared.
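The ticket-closing check described in the note can be expressed as a short sketch. The TicketAlarm class and the can_close_ticket function are hypothetical, and the point from which the correlation timeout is measured is an assumption (here, the time since the ticket last changed); the guide does not specify it.

```python
from dataclasses import dataclass


@dataclass
class TicketAlarm:
    name: str
    cleared: bool = False
    auto_clear: bool = False   # configured as an auto cleared alarm


def can_close_ticket(alarms, seconds_since_last_change, correlation_timeout):
    """A ticket can be closed when every alarm under it is either cleared
    or configured as auto cleared, and the correlation timeout has passed."""
    settled = all(a.cleared or a.auto_clear for a in alarms)
    return settled and seconds_since_last_change >= correlation_timeout


black_hole = TicketAlarm("MPLS Black Hole Found")
ticket = [
    black_hole,
    TicketAlarm("Broken LSP Discovered", auto_clear=True),
    TicketAlarm("Broken LSP Discovered", auto_clear=True),
]

before_clear = can_close_ticket(ticket, 600, 300)  # black hole still active
black_hole.cleared = True
too_early = can_close_ticket(ticket, 60, 300)      # timeout not yet passed
closed = can_close_ticket(ticket, 600, 300)
```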


Black Hole To Link Down

When a link down event in an MPLS network causes an IP reroute, and therefore LDP redistribution, new LSPs may be redirected through a non-MPLS segment, thereby creating a black hole.

In this case the "Broken LSP Discovered" alarms are issued as described in Broken LSP Discovered Alarm, but all the broken LSPs that are found are correlated to the "Link Down" alarm and not to the "MPLS Black Hole Found" alarm.

BGP Related Faults

Cisco ANA monitors BGP neighbor information and makes correlation and impact analysis information available to users.

This section includes a description of the BGP related faults.

BGP Neighbor Loss

In IP/MPLS VPN networks, when BGP connectivity is lost to a specific device, the resulting BGP connection loss translates directly to VPN sites losing connectivity.

The VNE models the BGP connection between routers and actively monitors its state. When connectivity is lost, a BGP neighbor loss alarm is generated from both sides of the connection. Alarms and tickets are issued, and users can view impact analysis information.

The correlation engine identifies various faults that affect the BGP connection and reports them as the root cause for the BGP neighbor loss alarm, for example, Link down, CPU over utilized, and Link data loss.


Note "BGP Neighbor Loss" alarms are not correlated to each other but are correlated to the root cause of the connectivity loss.


The "BGP Neighbor Loss" alarm is detected actively by the system, that is, the system generates service alarms.

The system also supports "BGP neighbor down" syslogs.
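The correlation behavior for BGP neighbor loss can be pictured with a small sketch. The bgp_neighbor_loss_alarms function and the dictionary layout are hypothetical; the sketch only shows that one alarm is raised from each side of the connection and that both point at the same root cause rather than at each other.

```python
def bgp_neighbor_loss_alarms(router_a, router_b, root_cause=None):
    """Build one "BGP Neighbor Loss" alarm per side of the connection.
    Both alarms are correlated to the root cause, not to each other."""
    return [
        {
            "name": "BGP Neighbor Loss",
            "source": source,
            "neighbor": neighbor,
            "correlated_to": root_cause,
        }
        for source, neighbor in ((router_a, router_b), (router_b, router_a))
    ]


alarms = bgp_neighbor_loss_alarms("PE1", "PE2", root_cause="Link down")
sources = [alarm["source"] for alarm in alarms]
```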

Traffic Engineering Faults

This section includes a description of the following Traffic Engineering related faults:

MPLS TE tunnel down

MPLS TE tunnel flapping

Tunnel reoptimized

MPLS TE Tunnel Down and TE Tunnel Flapping

When a TE tunnel's operational status changes to down and the tunnel is not flapping, the system generates a "Tunnel Down" alarm.

The correlation engine identifies various faults that affect the TE tunnel's status and reports on them as the root cause for the TE "Tunnel Down" alarm, for example, Link down.

Multiple up and down alarms that are generated during a short time interval are suppressed and displayed as a "Tunnel Flapping" alarm (according to the specific flapping configuration).

The "MPLS TE Tunnel Down" and the "TE Tunnel flapping" alarms are detected actively by the system, that is, the system generates service alarms.

The system also supports "MPLS TE Tunnel Down" syslogs, which are correlated to the service alarm.
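The distinction between a tunnel that is down and a tunnel that is flapping can be sketched as follows. The classify_tunnel_events function is hypothetical, and the window_seconds and flap_threshold defaults are placeholder assumptions standing in for the flapping configuration, whose actual parameters are not given in this chapter.

```python
def classify_tunnel_events(events, window_seconds=60.0, flap_threshold=3):
    """Classify a TE tunnel from its recent status changes.

    events: list of (timestamp_seconds, state) sorted by time,
            where state is "up" or "down".
    Returns the alarm name to display, or None if no alarm applies.
    """
    if not events:
        return None
    last_time, last_state = events[-1]
    # Keep only the states reported within the window that ends at the last event.
    recent = [state for t, state in events if last_time - t <= window_seconds]
    transitions = sum(1 for a, b in zip(recent, recent[1:]) if a != b)
    if transitions >= flap_threshold:
        return "MPLS TE Tunnel Flapping"   # up/down alarms are suppressed
    if last_state == "down":
        return "MPLS TE Tunnel Down"
    return None


steady_down = classify_tunnel_events([(0, "up"), (100, "down")])
flapping = classify_tunnel_events(
    [(0, "down"), (5, "up"), (10, "down"), (15, "up"), (20, "down")]
)
recovered = classify_tunnel_events([(0, "down"), (300, "up")])
```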

Tunnel Reoptimized

Tunnel reoptimization occurs when a tunnel is up and its route changes but the tunnel continues to remain up. When a TE tunnel is reoptimized to take a different path, the system parses the tunnel reoptimized syslog, if such a syslog is available, and displays this syslog as a ticket.

The "Tunnel Reoptimized" alarm is generated from a syslog message sent by the router.

Layer 2 VPN Faults

This section includes a description of the Layer 2 VPN fault, Pseudo Wire (L2 VPN) MPLS tunnel down.

Pseudo Wire (L2 VPN) MPLS Tunnel Down

A "Pseudo Wire MPLS Tunnel Down" alarm is issued when the pseudo wire link goes down, that is, when both devices report the pseudo wire tunnel as down (based on the status of the tunnel) and the tunnel is not flapping.

The correlation engine identifies various faults that affect the Pseudo Wire tunnel status and reports on them as the root cause for the "Pseudo Wire MPLS Tunnel Down" alarm, for example, Link down.

Cisco ANA traces the LSE path to the edge of the PWE3 tunnel and marks the edges of the tunnel as affected.

The "Pseudo Wire MPLS Tunnel Down" alarm is detected actively by the system, that is, the system generates service alarms.
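The condition for this alarm can be summarized in a brief sketch. The pseudo_wire_alarm function and its return format are hypothetical; the sketch assumes that each tunnel end reports a status of "up" or "down" and that flapping has already been determined elsewhere.

```python
def pseudo_wire_alarm(end_a, end_b, flapping=False):
    """end_a and end_b are (router_name, tunnel_status) pairs, one per device.
    The alarm is raised only when both ends report the tunnel as down
    and the tunnel is not flapping."""
    (router_a, status_a), (router_b, status_b) = end_a, end_b
    if flapping or status_a != "down" or status_b != "down":
        return None
    return {
        "name": "Pseudo Wire MPLS Tunnel Down",
        "affected": [router_a, router_b],   # edges of the tunnel
    }


both_down = pseudo_wire_alarm(("PE1", "down"), ("PE2", "down"))
one_side = pseudo_wire_alarm(("PE1", "down"), ("PE2", "up"))
while_flapping = pseudo_wire_alarm(("PE1", "down"), ("PE2", "down"), flapping=True)
```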

Alarms Summary

The following table describes the alarms that may be displayed in the ticket pane of the Cisco ANA NetworkVision window for VPNs, including their severity and the up alarm for each:

Table 6-1 Alarms Displayed In the Ticket Pane 

| Alarm | Default Severity | Description | Up Alarm |
|-------|------------------|-------------|----------|
| BGP Neighbor Loss | Red (critical) | The "BGP Neighbor Loss" alarm is generated whenever BGP connectivity is lost to a specific device. | BGP Neighbor Found |
| MPLS Black Hole Found | Dark blue (information) | An "MPLS Black Hole Found" alarm is generated whenever Cisco ANA discovers an MPLS interface that has at least one untagged LSP leading to a known PE router. | MPLS Black Hole Cleared |
| Broken LSP Discovered | Orange (major) | The "MPLS Black Hole Found" alarm activates a backward flow on the specific untagged entry in order to traverse the full path of the LSPs passing through it. The "Broken LSP Discovered" alarm is generated whenever Cisco ANA locates services (VRFs, Pseudo Wire L2 tunnels) along this path that are using these LSPs. | N/A |
| MPLS TE Tunnel Down | Orange (major) | The "MPLS TE Tunnel Down" alarm is generated whenever a TE tunnel's operational status changes to down and the tunnel is not flapping. | MPLS TE Tunnel Up |
| MPLS TE Tunnel Flapping | Orange (major) | The "TE Tunnel flapping" alarm is generated whenever multiple up and down alarms are generated during a short time interval and are suppressed. | The last state of the tunnel after it has stopped flapping |
| Pseudo Wire (L2 VPN) MPLS Tunnel Down | Yellow (minor) | The "Pseudo Wire MPLS Tunnel Down" alarm is generated whenever the pseudo wire link goes down, that is, when both devices report the pseudo wire tunnel as down (based on the status of the tunnel). | Layer 2 Tunnel Up |
| Tunnel Reoptimized | Dark blue (information) | The "Tunnel Reoptimized" alarm is generated from a syslog message sent by the router whenever a tunnel is up and its route changes but the tunnel continues to remain up. | N/A |


For more information about the ticket pane, see the Cisco Active Network Abstraction NetworkVision User Guide.