Many cases are opened with the symptom "EIGRP/OSPF/BGP flaps over my DMVPN/GRE/sVTI tunnel". In order to troubleshoot this issue, the first question that needs to be answered is, "Is this a VPN, Routing Protocol or ISP issue?"
To test this, find out whether the underlying transport was still functioning correctly at the time of the flap/outage. Unfortunately, the data is usually reviewed after the event, when this can no longer be determined. This document describes how to use IP Service Level Agreements (SLAs), track objects, and Embedded Event Manager (EEM) to collect this information at the time of the issue.
Cisco recommends that you have knowledge of these topics:
The information in this document is based on Cisco IOS® Software Release 15.2(4)M on a Cisco 881 router, but any recent release (15.0(1)M or later) supports these features.
Refer to Cisco Technical Tips Conventions for more information on document conventions.
IP SLAs are processes that run in the background on the router and test a variety of network conditions. In this document, general IP connectivity is tested with the "icmp-echo" operation.
The state of each IP SLA is then tracked with a track object. An EEM applet takes action whenever the track object changes state and records the state of the network in the syslog buffer.
With the network state recorded inline with the other syslog messages, you can retroactively determine the state of the network at the time of the flap/outage and decide whether there was a crypto, transport, or routing protocol issue.
Two separate SLAs are used, one for each layer of IP connectivity: a Physical SLA that tests connectivity across the Internet between the two peers, and a Tunnel SLA that tests connectivity across the tunnel between the two peers.
Each SLA sends a single ping packet to its defined peer every 5 seconds. If the peer responds, the SLA is marked "OK". If it does not respond, the SLA is marked "Timeout". Track objects are then used to track the status of each SLA.
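The two probes and their track objects could be configured along the following lines. This is a minimal sketch, not the exact configuration from the original document: the SLA and track numbers (1 and 2), the peer addresses (203.0.113.2 and 10.0.0.2), and the interface names (FastEthernet4 as the WAN port, Tunnel0 as the tunnel) are all assumptions chosen for illustration and must be adapted to the actual topology.

```
! Physical SLA (hypothetical number 1): pings the peer's transport address
ip sla 1
 icmp-echo 203.0.113.2 source-interface FastEthernet4
 frequency 5
ip sla schedule 1 life forever start-time now
!
! Tunnel SLA (hypothetical number 2): pings the peer's tunnel address
ip sla 2
 icmp-echo 10.0.0.2 source-interface Tunnel0
 frequency 5
ip sla schedule 2 life forever start-time now
!
! Track objects follow the reachability result of each SLA
track 1 ip sla 1 reachability
track 2 ip sla 2 reachability
```

The probes rely on the default icmp-echo timeout, which on the releases discussed here is 5 seconds, so a frequency of 5 seconds should be accepted without further tuning.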
When a track object changes state, a message can be inserted in the syslog.
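One way to insert these messages is with a pair of EEM applets per track object, one for each state transition. The sketch below assumes track objects 1 (Physical SLA) and 2 (Tunnel SLA) already exist; the applet names and message texts are hypothetical and can be changed freely.

```
! Log a message whenever the Physical SLA track object changes state
event manager applet PHYSICAL-SLA-DOWN
 event track 1 state down
 action 1.0 syslog msg "Physical SLA probe failed!"
event manager applet PHYSICAL-SLA-UP
 event track 1 state up
 action 1.0 syslog msg "Physical SLA probe came up!"
!
! Log a message whenever the Tunnel SLA track object changes state
event manager applet TUNNEL-SLA-DOWN
 event track 2 state down
 action 1.0 syslog msg "Tunnel SLA probe failed!"
event manager applet TUNNEL-SLA-UP
 event track 2 state up
 action 1.0 syslog msg "Tunnel SLA probe came up!"
```

Because the messages land in the same buffer as the routing protocol and crypto messages, their timestamps can be compared directly with those of the neighbor flaps.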
When an outage occurs, collect the output of the show log command.
Look for the SLA state-change messages logged around the time of the outage.
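A few exec commands may help when gathering this evidence. The sketch below assumes the messages were written by EEM applets, which IOS logs under the %HA_EM-6-LOG mnemonic, so filtering on HA_EM should isolate them. Running the last two commands without arguments avoids depending on any particular SLA or track numbering.

```
! Show only the EEM-generated messages from the logging buffer
show logging | include HA_EM
! Current state of the track objects and the latest probe results
show track brief
show ip sla statistics
```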
During the outage, if you see:
Both SLAs fail. This means:
Layer 3 connectivity across the Internet between the two peers was interrupted. This needs further investigation.
There is no problem with the tunnel itself. It fails only as a consequence of the transport interruption above.
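The Physical SLA does not fail but the Tunnel SLA does. This means:
Layer 3 connectivity across the Internet between the two peers is working correctly.
Layer 3 connectivity across the tunnel between the two peers was interrupted. This points to a tunnel or crypto (VPN) issue and needs further investigation.
Neither SLA fails. This means: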
Layer 3 connectivity across the Internet between the two peers is working correctly.
Layer 3 unicast connectivity across the Tunnel between the two peers is working correctly.
Layer 3 multicast connectivity across the Tunnel is unknown. This can be tested by pinging the multicast address used by the IGP.
If the multicast test also succeeds, this indicates an application issue (EIGRP/OSPF/BGP). Further protocol investigation is necessary.
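The multicast test mentioned above can be sketched as follows, using the well-known link-local groups for EIGRP and OSPF. BGP runs over unicast TCP, so this check does not apply to it. Note the assumption here: a plain multicast ping from exec mode is generally sent out of every eligible interface, so confirm that a reply comes from the peer's tunnel address rather than from a neighbor on another link.

```
! EIGRP routers listen on 224.0.0.10
ping 224.0.0.10
! OSPF routers listen on 224.0.0.5 (AllSPFRouters)
ping 224.0.0.5
```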