Utilize EEM to Troubleshoot IGP Flaps/Outage over VPN

Document ID: 113696

Updated: Sep 24, 2012

Contributed by Jay Young Taylor, Cisco TAC Engineer.

Introduction

Many cases are opened with the symptom "EIGRP/OSPF/BGP flaps over my DMVPN/GRE/sVTI tunnel". In order to troubleshoot this issue, the first question that needs to be answered is, "Is this a VPN, Routing Protocol or ISP issue?"

The way to test this is to find out whether the underlying transport is still functioning correctly at the time of the flap or outage. Unfortunately, the data is usually reviewed after the event, when this can no longer be determined. This document describes how to use IP Service Level Agreements (SLAs), track objects, and Embedded Event Manager (EEM) to collect this information at the time of the issue.

Prerequisites

Requirements

Cisco recommends that you have knowledge of these topics:

  • IP SLAs

  • EEM

Components Used

The information in this document is based on Cisco IOS® Software Release 15.2(4)M running on a Cisco 881 router, but any recent release (15.0(1)M or later) supports these features.

Conventions

Refer to Cisco Technical Tips Conventions for more information on document conventions.

Feature Information

IP SLAs are processes that run in the background on the router and test a variety of network conditions. In this document, general IP connectivity is tested with the "icmp-echo" operation.

The state of each IP SLA is then tracked with a track object. An EEM applet acts whenever the track object changes state and records the state of the network in the syslog buffer.

With the network state recorded inline with the other syslog messages, you can retroactively determine the state of the network during the flap or outage and decide whether there was a crypto, transport, or IGP issue.

Troubleshooting Methodology

[Figure: eem-tshoot-igp-01.gif]

Two separate SLAs are used, one for each layer of IP connectivity:

  • Public IP Address to Public IP address (172.18.3.52 ---> 172.20.5.43)

    ip sla 100
     icmp-echo 172.20.5.43 source-interface FastEthernet4
     frequency 5
    ip sla schedule 100 life forever start-time now
  • Tunnel IP Address to Tunnel IP address (10.1.12.100 ----> 10.1.12.1)

    ip sla 200
     icmp-echo 10.1.12.1 source-ip 10.1.12.100
     frequency 5
    ip sla schedule 200 life forever start-time now
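
Once both operations are scheduled, it can be useful to confirm that they are running before relying on them. A minimal sketch using standard IOS show commands follows; the output format varies by release, so none is reproduced here.

    show ip sla configuration 100
    show ip sla statistics 100
    show ip sla statistics 200

While the peer answers, the latest operation return code in the statistics output is expected to read OK.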

These SLAs send a single ping packet to the defined peer every 5 seconds. If a reply is received, the SLA is marked "OK". If no reply is received, it is marked "Timeout". Track objects are then used to track the status of each SLA.

  • Public IP Address to Public IP address track

    track 100 ip sla 100
     delay down 15 up 15
  • Tunnel IP Address to Tunnel IP address track

    track 200 ip sla 200
     delay down 15 up 15
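
With a probe frequency of 5 seconds and "delay down 15 up 15", a track object changes state only after its SLA has held the new state for 15 seconds, which is roughly three consecutive probes. This keeps a single lost ping from generating a message. The current state of each track object can be checked with standard IOS commands, shown here as a sketch with the output omitted:

    show track 100
    show track 200
    show track brief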

When a track object changes state, an EEM applet inserts a message into the syslog.

  • Public IP Address to Public IP address track

    event manager applet ipsla100down
     event track 100 state down
     action 1.0 syslog msg "Physical SLA probe failed!"
    event manager applet ipsla100up
     event track 100 state up
     action 1.0 syslog msg "Physical SLA probe came up!"
  • Tunnel IP Address to Tunnel IP address track

    event manager applet ipsla200down
     event track 200 state down
     action 1.0 syslog msg "Tunnel SLA probe failed!"
    event manager applet ipsla200up
     event track 200 state up
     action 1.0 syslog msg "Tunnel SLA probe came up!"
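
To check that all four applets are registered and to see whether any of them has fired, the standard EEM show commands can be used. This is a sketch only, with the output omitted:

    show event manager policy registered
    show event manager history events

Four applets are expected in the registered list. If only three appear, two applets probably share a name, because a second applet configured under an existing name updates that applet instead of creating a new one.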

Data Analysis

When an outage occurs, collect the output of the show log command.

Look for the SLA probe messages generated by the EEM applets above.
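
On a busy router, a filter can narrow the log to the relevant lines. The sketch below assumes the applet message text used in this document and adds the usual neighbor-change mnemonics for EIGRP (NBRCHANGE), OSPF (ADJCHG), and BGP (ADJCHANGE); adjust the pattern to the protocol in use.

    show logging | include SLA probe|NBRCHANGE|ADJCHG|ADJCHANGE

Comparing the timestamps of the probe messages with those of the neighbor changes shows which layer failed first.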

During the outage, if you see:

  • Both SLAs fail. This means:

    • Layer 3 connectivity across the Internet between the two peers was interrupted. This needs further investigation.

    • The tunnel itself is not at fault. It fails as a consequence of the transport interruption above.

  • The Physical SLA does not fail but the Tunnel SLA does. This means:

    • Layer 3 connectivity across the Internet between the two peers is working correctly.

    • There is a problem with the tunnel. Further investigation of the tunnel is necessary.

  • Neither SLA fails. This means:

    • Layer 3 connectivity across the Internet between the two peers is working correctly.

    • Layer 3 unicast connectivity across the Tunnel between the two peers is working correctly.

    • Layer 3 multicast connectivity across the Tunnel is unknown. This can be tested by pinging the multicast address used by the IGP.

    • If the above test succeeds, the problem lies with the routing protocol itself (EIGRP/OSPF/BGP). Further protocol investigation is necessary.
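
The multicast test mentioned in the last case can be sketched as follows. 224.0.0.10 is the group used by EIGRP and 224.0.0.5 is the group used by OSPF. BGP runs over unicast TCP, so the test does not apply to it.

    ping 224.0.0.10
    ping 224.0.0.5

A reply from the peer's tunnel address (10.1.12.1 in this document) suggests that multicast is passing over the tunnel. On a router with several multicast-capable interfaces, the extended ping dialog can be used to select the tunnel interface.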
