WAN Virtualization Using Over The ToP (OTP)

Introduction

With the introduction of Over the ToP (OTP), Cisco has empowered enterprise customers to regain control of their WAN deployments. By focusing on simplicity, OTP helps remove the complexity of the deployment of branch networks utilizing Multiprotocol Label Switching (MPLS) Virtual Private Network (VPN), and adds the ability to utilize lower cost public networks.
Traditional MPLS VPN deployments consist of a set of sites interconnected by an MPLS provider core network. At each customer site, one or more Customer Edge (CE) devices attach to one or more Provider Edge (PE) devices. MPLS VPN support for Enhanced Interior Gateway Routing Protocol (EIGRP) requires service providers to configure EIGRP between PE and CE for those customers that require native support for EIGRP. Yet PE/CE deployments pose a number of challenges for enterprise customers, specifically the following:

• Either EIGRP or Border Gateway Protocol (BGP) must be run between the PE/CE

• Service providers must carry enterprise routes via Multiprotocol internal BGP (MP-iBGP)

• BGP route propagation impacts enterprise network convergence

• Provider often limits the number of routes being redistributed

• Route flaps within sites result in BGP convergence events

• Route metric changes result in new extended communities flooded into the core

In addition, the need for the service provider to carry site specific routes means the CE devices must be co-supported, and the enterprise customer must consider the following:

• Managed services are required, even if not needed

• Control of traffic flow using multiple providers can be problematic

• Changing providers requires coordination of switch over to prevent route loops

OTP simplifies this. With OTP, enterprise customers can view the WAN as a virtual extension of the network and transparently extend their infrastructure OVER the provider's network. The advantage of this approach includes the following:

• No special requirements on the service provider (this is a provider independent solution)

• No special requirements on the enterprise customers network

• Support for both IPv4 and IPv6

• No route redistribution or site tag management

• No limitation on the number of routes being exchanged between sites

• A single routing protocol solution (convergence does not depend on the service provider)

• Works with both traditional managed and non-managed internet connections

• Complements an L3 any-to-any architecture (optional hair pinning of traffic)

• Support for multiple WAN connections and multiple WAN providers

• Support for connections that are not part of the MPLS VPN backbone (aka "backdoor" links)

Drawback of Existing Solutions

First, let's recap the challenges/drawbacks of existing options.
Using EIGRP on PE/CE Link:
BGP extended communities carry EIGRP metric information across the MP-BGP cloud. This allows internal EIGRP routes to be kept end to end, and allows their metric to be compared with that of the backdoor link. Here are the six defined types:

• 0x8801 (AS#, delay)

• 0x8802 (Reliability, Hop #, BW)

• 0x8803 (Load, MTU)

• 0x8804 (Remote AS#, Origin RID)

• 0x8805 (External Protocol, External metric)

• 0x8800 (Route Type, Tag)

BGP cost extended community is used to support a multi-homed site. The cost community has the format stated below:
Cost:POI:ID:value

• Cost: Extended Community type, set to 0x4301

• Point of Insertion (POI): defines at which step in the BGP path selection process this attribute is used. Set to 128 for the absolute POI (pre-bestpath)

• ID: 128 (internal) or 129 (external)

• Value: EIGRP composite metric
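
The cost community fields described above can be modeled in a few lines. This is a minimal sketch rather than the router's implementation: the class and function names are illustrative, the 8-byte layout (2-byte type, 1-byte POI, 1-byte ID, 4-byte value) is an assumption about the wire format, and the two metric values are made up for the example.

```python
import struct
from dataclasses import dataclass

COST_TYPE = 0x4301        # extended community type, from the text above
POI_PRE_BESTPATH = 128    # absolute POI (pre-bestpath)
ID_INTERNAL = 128         # EIGRP internal route
ID_EXTERNAL = 129         # EIGRP external route


@dataclass(frozen=True)
class CostCommunity:
    poi: int
    community_id: int
    value: int  # EIGRP composite metric

    def __str__(self) -> str:
        # Textual form used in the document: Cost:POI:ID:value
        return f"Cost:{self.poi}:{self.community_id}:{self.value}"

    def pack(self) -> bytes:
        # Assumed layout: type (2 bytes), POI (1), ID (1), value (4), network order
        return struct.pack("!HBBI", COST_TYPE, self.poi, self.community_id, self.value)


def preferred(a: CostCommunity, b: CostCommunity) -> CostCommunity:
    # With the same POI and ID, the route carrying the lower cost is preferred
    return a if a.value <= b.value else b


# Hypothetical metrics: the locally redistributed route has a higher cost
# because its metric was incremented while crossing the local site.
ibgp_route = CostCommunity(POI_PRE_BESTPATH, ID_INTERNAL, 1536640)
local_route = CostCommunity(POI_PRE_BESTPATH, ID_INTERNAL, 1562240)

print(ibgp_route)
print(preferred(ibgp_route, local_route) is ibgp_route)
```

The comparison mirrors the behavior described next: the iBGP route wins because its cost value is lower, regardless of administrative distance.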

The goal is to prefer the iBGP route from originating PE, over the locally redistributed EIGRP route that might be learned in a multi-homed site. Cost community is inserted during EIGRP to BGP redistribution on the originating PE.
On the remote PE, it's compared with the cost community of BGP locally redistributed route from EIGRP (if any). The cost of the locally redistributed EIGRP route is higher since it is incremented while being propagated within the local site between the PEs. The iBGP route is then preferred over the EIGRP route received from 2nd PE and is installed in VRF RIB despite EIGRP getting a lower admin distance (90 < 200) than iBGP (it's done automatically, no need to tune the admin distance).

Note: This routing feedback avoidance mechanism involves only PEs (CEs and customer routers are not involved)

• Site of Origin (SoO)

Since DUAL has no visibility across the MPLS cloud, common parameters between BGP and EIGRP are needed to avoid routing feedback when backdoor links are present. The SoO extended community already exists in BGP (used in multihomed sites to avoid routing feedback of local routes). SoO support for EIGRP was added in release 12.3(8)T.

The goal is to avoid injecting into the MPLS/VPN routes from another site (learned through a backdoor link):

• SoO marking:

Ingress PE sets SoO in BGP update:

– According to site-map attached to VRF interface if there is no SoO in EIGRP redistributed route

– Equal to SoO value already present in EIGRP redistributed route, if any

Egress PE sets SoO in EIGRP update according to the SoO value in the BGP update received

• SoO checking:

– During the import process the SoO value in BGP update is checked against the SoO value of the site-map attached to VRF interface. The update is propagated to CE only if there is no match (this check is done regardless of protocol used on PE/CE link).

– At reception of EIGRP update, the SoO value in the EIGRP update is checked against the SoO value of site-map attached to the incoming interface. This update is accepted only if there is no match (this check can optionally be done on backdoor router).
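
The marking and checking rules above reduce to two small decisions. The following is a simplified sketch of that logic, not PE code; the function names and the SoO strings are hypothetical.

```python
from typing import Optional


def soo_for_bgp_update(route_soo: Optional[str], site_map_soo: str) -> str:
    """SoO marking on the ingress PE.

    Keep the SoO already carried by the redistributed EIGRP route;
    otherwise use the value from the site-map on the VRF interface.
    """
    return route_soo if route_soo is not None else site_map_soo


def accept_update(update_soo: Optional[str], interface_soo: Optional[str]) -> bool:
    """SoO checking: the update is accepted only if there is no match."""
    if update_soo is None or interface_soo is None:
        return True  # nothing to compare against
    return update_soo != interface_soo


# Hypothetical SoO values for two sites
SITE_1 = "1:100"
SITE_2 = "1:200"

print(soo_for_bgp_update(None, SITE_1))    # route had no SoO: site-map value is used
print(accept_update(SITE_1, SITE_1))       # same site: update is rejected
print(accept_update(SITE_2, SITE_1))       # different site: update is accepted
```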

Pros/Cons of the EIGRP on PE/CE Link Option:

• Allows E2E internal EIGRP routes (simplifies the backdoor link scenario)

• WAN connections are 'transparent' for end-customers (there is no other protocol involved)

– Slow convergence (BGP, import process, etc.)

– Backdoor links require SoO filtering

– Very few providers offer EIGRP PE/CE

Using BGP on a PE/CE Link:

If the provider does not support EIGRP, EIGRP customers are typically left using eBGP on PE/CE. This implies that some (basic) BGP skills are needed on the customer side, or they will have to go for a managed CPE solution (where the provider takes care of the CE). BGP easily handles a dual-homed site; however, some special care should be taken when there are backdoor links. Indeed, routes learned through MPLS/VPN are all external routes, and can't be compared with internal routes learned through the backdoor link. The solution is to use a separate EIGRP process on the backdoor link and use mutual redistribution with the 'campus' EIGRP process.
Pros/Cons of the BGP on PE/CE Link Option:

• Provider friendly (no need for redistribution on PE)

• Easy to manage dual-homed sites

• Slow convergence

• A need to use different EIGRP AS# for backdoor links

• BGP knowledge and skills needed at the customer site

EIGRP OTP Architecture

As the name implies, EIGRP Over the ToP allows the customer to establish EIGRP adjacencies across the MPLS/VPN provider cloud. An EIGRP targeted adjacency between CEs is created. This EIGRP neighborship is done via unicast packets, using the CE 'WAN' IP address. This "over the top" peering allows EIGRP to exchange customer prefixes directly between CEs. Customer prefixes are NOT injected in the provider's VRF routing table. In order to allow for proper forwarding of user traffic across the MPLS/VPN cloud, user packets are encapsulated on the CE. The encapsulation header uses the WAN IP addresses of the CEs, which are known in the MPLS/VPN cloud.

Control Plane

The OTP control plane consists of an EIGRP targeted adjacency between CEs. Neighborship is established using the CE WAN address, i.e. the address of the CE on the PE/CE link, so there is no need for any dynamic routing protocol between the PE/CE. The PE just needs to redistribute the connected routes.
This adjacency uses unicast packets, so the CE needs to know the IP address of the remote CE. In the first phase of OTP, only static neighbors are allowed. With manual neighbor configuration, a full mesh of peerings between all CEs would not scale. Instead, OTP uses the concept of a Route Reflector (RR): CEs peer only with RRs, and RRs reflect the routes they receive to the other CEs. Each CE is configured with the RR's WAN address and each RR is configured in EIGRP promiscuous mode, i.e. to accept incoming 'connections' (similar to the BGP listen feature).
The RR reflects the routes untouched, i.e. without changing the metric and keeping the next-hop unchanged (no next-hop-self).
Since the next-hop is not changed (customer site 1 receives an update from the RR with next-hop = customer site 100), the RR doesn't play any role in the data plane, so it could be placed in any site. User traffic does NOT follow control plane traffic; it takes the optimum path in the MPLS/VPN cloud. Below is an entry in the routing table on the CE:
CE#show ip route
...
D 10.1.100.0/24 [90/1536640] via 172.16.100.2, 2w3d, LISP0
It points to a LISP interface (automatically created) and with next-hop learned through EIGRP.
How does OTP handle the challenges described in the Introduction?
a) Dual/multi homed sites are handled by creating a separate EIGRP session with the RR (one per connection). DUAL has full visibility of all links so it handles potential routing feedback. Normal EIGRP mechanisms can be used to select primary/fallback links or to configure load balancing (even unequal cost load sharing is possible).
b) Backdoor connections: DUAL has full visibility on OTP and backdoor connections, so normal EIGRP loop prevention and metric tuning can be used to handle backdoor connections.

Data Plane

Since the customer prefixes are not known in the VRF of provider, customer traffic can't be natively forwarded through the provider cloud, but needs to be encapsulated by CEs before being sent through the provider cloud.
OTP leverages existing LISP encapsulation which:

• Allows dynamic multi-point tunneling

• Provides instance ID field to optionally support virtualization across WAN (see EVN WAN Extension section)

OTP does NOT use the LISP control plane (map server/resolver, etc.); instead, it uses EIGRP to exchange routes and provide the next-hop, which LISP encapsulation uses to reach remote prefixes.
For a given remote prefix:

encapsulation header source IP = local WAN address

encapsulation header dst IP = next-hop of related EIGRP route.

The diagram below illustrates the encapsulation details of a user packet when travelling across MPLS/VPN cloud, from left to right:

• Left CE adds LISP encapsulation with source IP set to its local WAN IP (172.16.1.2) and destination IP set to the remote CE WAN IP (172.16.100.2)

• Left PE forwards the LISP encapsulation packet by adding the MPLS label(s) used to reach the connected network of remote CE (172.16.100.0/30)

• Packet travels across MPLS cloud using optimal path to reach egress PE (i.e. the packet doesn't pass through RR or top PE)

• Right PE removes MPLS label and forwards LISP encapsulated packet to right CE

• Right CE removes the LISP encapsulation and forwards packet to end user

LISP encapsulation uses 36 bytes:

• IP header (20 Bytes)

• UDP header (8 Bytes)

• LISP header (8 Bytes)

• Destination UDP port = 4343

• LISP flags:

– N: set to 1 if there is a nonce

– I: set to 1 if there is instance ID

• Nonce: 24 bits pseudo-randomly generated (anti-spoofing mechanism)

• Instance ID: Not currently supported
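
The header sizes above translate directly into the payload room left on the WAN link. The following is a small arithmetic sketch; the function names are illustrative, and it assumes an IPv4 outer header without options and a 1500-byte WAN MTU for the example.

```python
OUTER_IP_HEADER = 20   # bytes, IPv4 without options (assumption)
UDP_HEADER = 8         # bytes
LISP_HEADER = 8        # bytes
LISP_OVERHEAD = OUTER_IP_HEADER + UDP_HEADER + LISP_HEADER


def max_inner_packet(wan_mtu: int) -> int:
    """Largest user packet that fits in one encapsulated packet without fragmentation."""
    return wan_mtu - LISP_OVERHEAD


def outer_addresses(local_wan_ip: str, eigrp_next_hop: str) -> dict:
    """Outer header addresses: source is the local WAN address,
    destination is the next-hop of the EIGRP route for the prefix."""
    return {"src": local_wan_ip, "dst": eigrp_next_hop}


print(LISP_OVERHEAD)               # 36
print(max_inner_packet(1500))      # 1464
print(outer_addresses("172.16.1.2", "172.16.100.2"))
```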

Deployment Scenarios

OTP has been designed to operate within several WAN network topologies to fit the needs of different enterprise customers. Support for point-to-point peering, route reflectors to simplify large scale branch office deployment, and data encryption, allows customers to extend full network capability to mobile workers, telecommuters, and remote data centers. In the next few sections the basics related to some of the more common deployments will be covered.

OTP Peering

OTP does not provide for dynamic discovery of other peers. Instead, it relies on manual configuration to specify which routers peer with which routers, using the "neighbor" option under EIGRP. OTP supports two modes: point-to-point for finer control, and route-reflector for greater scaling.
Point-to-Point Peering: Point-to-point offers the simplest form of configuration within OTP, and allows OTP to form a peer with a targeted router. This option is controlled by the additional "remote" keyword on the neighbor statement. Once the configuration has been entered, EIGRP will begin sending Hello messages to the address specified. When a Hello message is likewise received from the proper address, routes will then be exchanged.
Route Reflector Peering: If the network has many sites, then OTP offers Route Reflectors (RRs) to form a half-mesh topology and ensure connectivity among all sites in the network. A Route Reflector is an EIGRP peer that receives route updates from remote sites and "reflects" the routes to other sites. Route Reflectors are configured using the keyword "unicast-listen". This option enables the Route Reflectors to listen for unicast Hello messages from other sites, and upon receiving the first Hello message, automatically forms a peering relationship. OTP supports the use of dual or multiple Route Reflectors for redundancy.

Site-to-Site traffic

While some customers may want all traffic to pass through the hub, this is not the most efficient use of bandwidth and could lead to congestion. To improve OTP's ability to scale to 500 remote sites, OTP can be configured to preserve the next-hop address of the advertising site when routing information is sent to other sites, using the "no next-hop-self" configuration under EIGRP. For more information on this command, see the EIGRP Command Reference.
Site Redundancy: The add path support feature enables hubs to advertise multiple best paths to connected sites. A typical OTP deployment would consist of dual hubs (for hub redundancy) connected to more than one service provider (for service-provider redundancy), and provides up to four additional paths to connected sites. This option is configured using the "add-paths" configuration under EIGRP. If, for example, there are two spokes (spoke-1 and spoke-2) at a site, and add-path is configured on the hub, both spoke-1 and spoke-2 will be advertised to other sites, thereby allowing for both redundancy (in the event of loss of connectivity to one of the spokes) and load balancing of traffic to spoke-1 and spoke-2. For more information on this command, see Add Path Support in EIGRP.

OTP Over Public Internet

In addition to working over traditional MPLS/VPN managed solutions, OTP has been designed with low cost public networks in mind by offering GETVPN support. GETVPN provides managed encryption while simplifying the provisioning and management of VPN connections. Key benefits of pairing OTP and GETVPN include the following:

• Simplifies branch-to-branch instantaneous communications: ensures low latency and jitter by enabling full-time, direct communications between sites, without requiring transport through a central hub.

• Maximizes security: provides encryption for MPLS networks while maintaining network intelligence, such as full-mesh connectivity, natural routing path, and Quality of Service (QoS).

• Complies with governmental regulation and privacy laws: helps meet security compliance and internal regulation by encrypting all WAN traffic.

• Offers management flexibility: eliminates complex peer-to-peer key management with group encryption keys.

For more information on GETVPN, see the Cisco IOS GETVPN Solution Deployment Guide.

OTP Configuration

Information on how to configure OTP can be found in EIGRP Over the Top. To illustrate some of the configuration and show commands referenced, consider the following topology:
There are three sites, each connected via MPLS/VPN and backdoor connections are created via DMVPN tunnel across the Internet. This example uses OTP across the MPLS/VPN cloud. EIGRP over the DMVPN tunnel is also used. CE-3 is the NHS.

Configuration of Static Neighbor

On all CE devices, statically define the IP address of the EIGRP RR(s) and define which interface is the WAN interface (i.e. used to reach RRs and all other CEs).
router eigrp <name>
address-family ipv4 unicast autonomous-system <as#>
neighbor A.B.C.D <WAN-intf> remote <2-100> lisp-encap [1-1999]

• A.B.C.D: IP address of the remote neighbor. CEs typically peer only with the EIGRP RR. To avoid a single point of failure, it is recommended to use two RRs. In this case, two neighbor commands are needed (one per RR)

• WAN-intf: interface used to reach PE. This determines the source IP of the LISP encapsulation.

• remote <2-100>: determines the TTL value of the EIGRP packets (does NOT influence TTL of LISP packets, they are sent with TTL = 255)

• lisp-encap [1-1999]: enables LISP encapsulation to reach the prefixes learned through that peer. By default, OTP uses LISP Top Id 0. If this LISP Top Id is already used on the router, specify which Id OTP should use.

In this setup, define on CE-1 and CE-2 a static neighbor, defining the IP address of RR (CE-3). When the first static neighbor using LISP encapsulation is defined, LISP interface is automatically created:
CE-1(config)#router eigrp OTP
CE-1(config-router)# address-family ipv4 unicast autonomous-system 1
CE-1(config-router-af)#neighbor 172.16.3.2 Ethernet1/0 remote 10 lisp-encap
CE-1(config-router-af)#
*Jul 19 08:42:17.757: %LINEPROTO-5-UPDOWN: Line protocol on Interface LISP0, changed state to up
CE-1(config-router-af)#
*Jul 19 08:42:23.076: %DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 172.16.3.2 (Ethernet1/0) is up: new adjacency
CE-1(config-router-af)#
Caution: The WAN-intf IP address must be covered by a network statement for the static neighbor/OTP to take effect.
In this setup, use 'network 10.0.0.0' (to cover inside networks) and 'network 172.16.0.0' (to cover WAN addresses) on all CEs.
The bare minimum to start OTP on CE-1 is to have the network statement including 172.16.1.2 (local WAN IP address).

Configuration of Route-Reflector

RRs should be configured in EIGRP promiscuous mode, specifying the interface to use as a source IP (the IP that should be configured as the static neighbor on all CEs). This EIGRP feature is similar to BGP 'listen' feature and provides similar options.
router eigrp <name>
address-family ipv4 unicast autonomous-system <as#>
remote-neighbors source <WAN-intf> unicast-listen lisp-encap [1-1999] [allow-list <acl-name>] [max-neighbors <1-65535>]

• WAN-intf: interface used to reach PE

• lisp-encap [1-1999]: enables LISP encapsulation to reach the prefixes learned through dynamic peers. By default, OTP uses LISP Top Id 0. If this LISP Top Id is already used on the router, specify which Id OTP should use

• (optional) allow-list: uses a standard named ACL to specify which remote peers can establish a peering

• (optional) max-neighbors: specifies the maximum number of peers the RR will accept

In this setup, CE-3 is defined as RR:
CE-3(config)#router eigrp OTP
CE-3(config-router)# address-family ipv4 unicast autonomous-system 1
CE-3(config-router-af)#remote-neighbors source Ethernet0/0 unicast-listen lisp-encap
CE-3(config-router-af)#
*Jul 19 10:56:36.324: %LINEPROTO-5-UPDOWN: Line protocol on Interface LISP0, changed state to up
Caution: The RR should be configured with no next-hop-self and with split-horizon disabled (it's NOT done automatically). This config is done in EIGRP af-interface config mode for WAN-intf:
router eigrp <name>
address-family ipv4 unicast autonomous-system <as#>
af-interface WAN-intf
no next-hop-self
no split-horizon

Configuration of Session Authentication

Configure SHA2 or MD5 authentication to secure remote peers. The key chain is defined in global config mode while authentication mode and key chain reference is done on the EIGRP af-interface config mode for WAN-intf:
key chain <name>
key 1
key-string <string>
!
router eigrp <name>
address-family ipv4 unicast autonomous-system <as#>
af-interface WAN-intf
authentication mode <md5|hmac-sha-256>
authentication key-chain <name>
In this setup, configure the below key chain on all CEs:
CE-1(config)#key chain OTP
CE-1(config-keychain)# key 1
CE-1(config-keychain-key)# key-string CISCO
And enable MD5 authentication with that key on the WAN-intf:
CE-1(config)#router eigrp OTP
CE-1(config-router)# address-family ipv4 unicast autonomous-system 1
CE-1(config-router-af)# af-interface Ethernet1/0
CE-1(config-router-af-interface)# authentication mode md5
CE-1(config-router-af-interface)# authentication key-chain OTP

Tuning Routes Learned/Sent Through OTP

Use all usual EIGRP mechanisms to tune routes received or sent through OTP.

• Summarization, filtering, offset-lists, etc. are associated with the WAN-intf:

router eigrp <name>
address-family ipv4 unicast autonomous-system <as#>
af-interface WAN-intf
summary-address A.B.C.D/nn [leak-map]
exit-af-interface
!
topology base
distribute-list prefix <name> in|out WAN-intf
offset-list <acl> in|out <0-2147483647> WAN-intf

• Interface bandwidth/delay can be tuned via LISP0 interface:

interface lisp0
bandwidth <1-10000000>
delay <1-16777215>
Unlike regular EIGRP rules, metric is modified before being sent out to the OTP neighbor and the receiving router does NOT modify metric. This means delay needs to be modified on the egress CE LISP interface (not on ingress CE)! Default bandwidth and delay of LISP0 interface are really low, making the path through OTP not very attractive from a metric point of view:
CE-1#show interface lisP 0
LISP0 is up, line protocol is up
Hardware is LISP
Interface is unnumbered. Using address of Ethernet1/0 (172.16.1.2)
MTU 17940 bytes, BW 56 Kbit/sec , DLY 5000 usec ,
reliability 5/255, txload 4/255, rxload 4/255
Encapsulation LISP, loopback not set
Keepalive set (10 sec)
Last input never, output never, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/0 (size/max)
5 minute input rate 1000 bits/sec, 1 packets/sec
5 minute output rate 1000 bits/sec, 1 packets/sec
105 packets input, 10500 bytes, 0 no buffer
Received 0 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
105 packets output, 14280 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 unknown protocol drops
0 output buffer failures, 0 output buffers swapped out
CE-1#

Outputs From Setup

In this example setup, EIGRP is peering through OTP and DMVPN. On CE-1, there is peering with the local site router (CE-1-IN):
CE-1#show eigrp address-family neighbors detail
EIGRP-IPv4 VR(OTP) Address-Family Neighbors for AS(1)
H Address Interface Hold Uptime SRTT RTO Q Seq
(sec) (ms) Cnt Num
1 172.16.3.2 Et1/0 13 00:45:33 1 100 0 86
^^^^^^^^^^
---- peering to OTP RR
Remote Static neighbor (static multihop) (LISP Encap)
Time since Restart 00:44:40
Version 15.0/2.0, Retrans: 0, Retries: 0, Prefixes: 3
Topology-ids from peer - 0
2 10.200.1.3 Tu0 11 00:48:16 1 100 0 78 <----- peering to DMVPN
Version 15.0/2.0, Retrans: 0, Retries: 0, Prefixes: 4
Topology-ids from peer - 0
0 10.1.1.2 Et0/0 14 02:00:43 1 100 0 51
Version 14.0/2.0, Retrans: 1, Retries: 0
Topology-ids from peer - 0
CE-1#
CE-3, which is the EIGRP RR and the DMVPN NHS, gets peering to CE-1 and CE-2:
CE-3#show eigrp address-family neighbors detail
EIGRP-IPv4 VR(OTP) Address-Family Neighbors for AS(1)
H Address Interface Hold Uptime SRTT RTO Q Seq
(sec) (ms) Cnt Num
1 172.16.2.2 Et0/0 12 00:50:31 1 100 0 44
Remote neighbor (unicast-listen) (LISP Encap)
Time since Restart 00:49:41
Version 15.0/2.0, Retrans: 0, Retries: 0, Prefixes: 3
Topology-ids from peer - 0
0 172.16.1.2 Et0/0 11 00:50:34 1 100 0 83
Remote neighbor (unicast-listen) (LISP Encap)
Time since Restart 00:49:41
Version 15.0/2.0, Retrans: 0, Retries: 0, Prefixes: 3
Topology-ids from peer - 0
2 10.200.1.1 Tu0 14 00:53:17 1 100 0 77
Version 15.0/2.0, Retrans: 0, Retries: 0, Prefixes: 2
Topology-ids from peer - 0
3 10.200.1.2 Tu0 13 3w2d 1 100 0 40
Version 15.0/2.0, Retrans: 0, Retries: 0, Prefixes: 2
Topology-ids from peer - 0
CE-3#
Looking at the EIGRP topology table on CE-3 for the prefix learned from CE-1, it was learned via OTP and the DMVPN tunnel:
CE-3#show eigrp address-family ipv4 topology 10.1.1.0/24
EIGRP-IPv4 VR(OTP) Topology Entry for AS(1)/ID(10.1.3.1) for 10.1.1.0/24
State is Passive, Query origin flag is 1, 1 Successor(s), FD is 3407872000, RIB is 26624000
Descriptor Blocks:
10.200.1.1 (Tunnel0), from 10.200.1.1, Send flag is 0x0 <----- route learned from CE-1 via DMVPN
Composite metric is (3407872000/131072000), route is Internal
Vector metric:
Minimum bandwidth is 10000 Kbit
Total delay is 51000000000 picoseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1436
Hop count is 1
Originating router is 172.16.1.2
ECMP Mode: Advertise by default
172.16.2.2 (Ethernet0/0), from 172.16.2.2, Send flag is 0x0
^^^^^^^^^^ ----- route learned from CE-2 via OTP
Composite metric is (18715209142/18649673142), route is Internal
Vector metric:
Minimum bandwidth is 56 Kbit
Total delay is 107000000000 picoseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1436
Hop count is 3
Originating router is 172.16.1.2
172.16.1.2 (Ethernet0/0), from 172.16.1.2, Send flag is 0x0
^^^^^^^^^^ ----- route learned from CE-1 via OTP
Composite metric is (12161609142/12096073142), route is Internal
Vector metric:
Minimum bandwidth is 56 Kbit
Total delay is 7000000000 picoseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1500
Hop count is 1
Originating router is 172.16.1.2
CE-3#
With default bandwidth and delay of the LISP0 interface, OTP routes are less preferable than routes learned via the DMVPN tunnel. To make OTP route for 10.1.1.0/24 more preferable, change the delay/bandwidth on the CE-1 LISP0 interface:
CE-1(config)#interface LISP0
CE-1(config-if)#bandwidth 10000
CE-1(config-if)#delay 1000
CE-1(config-if)#
CE-1#clear ip eigrp neighbors
CE-1#
CE-3#show eigrp address-family ipv4 topology 10.1.1.0/24
EIGRP-IPv4 VR(OTP) Topology Entry for AS(1)/ID(10.1.3.1) for 10.1.1.0/24
State is Passive, Query origin flag is 1, 1 Successor(s), FD is 851968000, RIB is 6656000
Descriptor Blocks:
172.16.1.2 (Ethernet0/0), from 172.16.1.2 , Send flag is 0x0
^^^^^^^^^^ ----- route from CE-1 via OTP preferred!
Composite metric is (851968000/786432000), route is Internal
Vector metric:
Minimum bandwidth is 10000 Kbit
Total delay is 12000000000 picoseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1500
Hop count is 1
Originating router is 172.16.1.2
ECMP Mode: Advertise by default
10.200.1.2 (Tunnel0), from 10.200.1.2, Send flag is 0x0
Composite metric is (4128768000/851968000), route is Internal
Vector metric:
Minimum bandwidth is 10000 Kbit
Total delay is 62000000000 picoseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1436
Hop count is 2
Originating router is 172.16.1.2
10.200.1.1 (Tunnel0), from 10.200.1.1, Send flag is 0x0
Composite metric is (3407872000/131072000), route is Internal
Vector metric:
Minimum bandwidth is 10000 Kbit
Total delay is 51000000000 picoseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1436
Hop count is 1
Originating router is 172.16.1.2
CE-3#

Note: Change the bandwidth/delay on the LISP interface of CE-2 and CE-3 as well, so that the OTP path is used to reach all destinations.
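
The composite and RIB metrics in the outputs above can be reproduced with EIGRP wide-metric arithmetic. The following is a minimal sketch, assuming default K values (K1 = K3 = 1, all others 0) and a RIB scale of 128; the constant and function names are illustrative.

```python
EIGRP_BANDWIDTH = 10_000_000          # reference bandwidth, in Kbit/s
EIGRP_WIDE_SCALE = 65_536             # wide-metric scaling factor
PICOSECONDS_PER_MICROSECOND = 1_000_000
RIB_SCALE = 128                       # assumed default scale applied before RIB install


def wide_metric(min_bandwidth_kbit: int, total_delay_ps: int) -> int:
    """Composite metric from the vector metric (default K values assumed)."""
    throughput = EIGRP_BANDWIDTH * EIGRP_WIDE_SCALE // min_bandwidth_kbit
    latency = total_delay_ps * EIGRP_WIDE_SCALE // PICOSECONDS_PER_MICROSECOND
    return throughput + latency


def rib_metric(composite: int) -> int:
    """Metric installed in the routing table."""
    return composite // RIB_SCALE


# OTP path from CE-1 with the default LISP0 bandwidth (56 Kbit)
print(wide_metric(56, 7_000_000_000))        # 12161609142
# OTP path after setting bandwidth 10000 on CE-1 LISP0
tuned = wide_metric(10_000, 12_000_000_000)
print(tuned, rib_metric(tuned))              # 851968000 6656000
# DMVPN path from CE-1
print(wide_metric(10_000, 51_000_000_000))   # 3407872000
```

The first value shows why the default LISP0 bandwidth dominates the metric: the throughput term alone exceeds the whole composite metric of the DMVPN path.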

Looking at the CE-2 topology, notice the route via OTP is not received:
CE-2#show eigrp address-family ipv4 topology 10.1.1.0/24
EIGRP-IPv4 VR(OTP) Topology Entry for AS(1)/ID(10.1.2.1) for 10.1.1.0/24
State is Passive, Query origin flag is 1, 1 Successor(s), FD is 4128768000, RIB is 32256000
Descriptor Blocks:
10.200.1.3 ( Tunnel0 ), from 10.200.1.3, Send flag is 0x0
Composite metric is (4128768000/851968000), route is Internal
Vector metric:
Minimum bandwidth is 10000 Kbit
Total delay is 62000000000 picoseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1436
Hop count is 2
Originating router is 172.16.1.2
CE-2#
It's because RR is not configured automatically with split-horizon disabled:
CE-3#show eigrp address-family interfaces detail e0/0
EIGRP-IPv4 VR(OTP) Address-Family Interfaces for AS(1)
Xmit Queue PeerQ Mean Pacing Time Multicast Pending
Interface Peers Un/Reliable Un/Reliable SRTT Un/Reliable Flow Timer Routes
Et0/0 2 0/0 0/0 1 0/3 50 0
Hello-interval is 5, Hold-time is 15
Split-horizon is enabled
Next xmit serial <none>
Packetized sent/expedited: 97/24
Hello's sent/expedited: 461508/29
Un/reliable mcasts: 0/0 Un/reliable ucasts: 145/144
Mcast exceptions: 0 CR packets: 0 ACKs suppressed: 2
Retransmissions sent: 3 Out-of-sequence rcvd: 0
Topology-ids on interface - 0
Authentication mode is md5, key-chain is "OTP"
CE-3#
Once EIGRP split-horizon on e0/0 on CE-3 is disabled:
CE-3(config)#router eigrp OTP
CE-3(config-router)# address-family ipv4 unicast autonomous-system 1
CE-3(config-router-af)# af-interface Ethernet0/0
CE-3(config-router-af-interface)# no split-horizon
CE-3(config-router-af-interface)#
%DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 172.16.2.2 (Ethernet0/0) is resync: split horizon changed
%DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 172.16.1.2 (Ethernet0/0) is resync: split horizon changed
Now the OTP route on CE-2 is received:
CE-2#show eigrp address-family ipv4 topology 10.1.1.0/24
EIGRP-IPv4 VR(OTP) Topology Entry for AS(1)/ID(10.1.2.1) for 10.1.1.0/24
State is Passive, Query origin flag is 1, 1 Successor(s), FD is 851968000, RIB is 6656000
Descriptor Blocks:
172.16.3.2 (Ethernet0/0), from 172.16.3.2, Send flag is 0x0
Composite metric is (851968000/786432000), route is Internal
Vector metric:
Minimum bandwidth is 10000 Kbit
Total delay is 12000000000 picoseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1500
Hop count is 1
Originating router is 172.16.1.2
10.200.1.3 (Tunnel0), from 10.200.1.3, Send flag is 0x0
Composite metric is (4128768000/851968000), route is Internal
Vector metric:
Minimum bandwidth is 10000 Kbit
Total delay is 62000000000 picoseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1436
Hop count is 2
Originating router is 172.16.1.2
CE-2#
The metric on CE-2 is the same as on CE-3 (the RR doesn't change the metric of reflected routes) but the next-hop has been reset (172.16.3.2 = WAN-intf of CE-3). This leads to sub-optimal routing since it forces packets to flow (via LISP encapsulation) up to CE-3 and then down to CE-1:
CE-2#traceroute 10.1.1.2
Type escape sequence to abort.
Tracing the route to 10.1.1.2
VRF info: (vrf in name/id, vrf out name/id)
1 10.1.3.1 1 msec 0 msec 1 msec <----- CE-3 (address allocated to LISP0)
2 172.16.1.2 [AS 1] 1 msec 1 msec 1 msec <----- CE-1 (address allocated to LISP0)
3 10.1.1.2 1 msec 1 msec *
CE-2#
To get the optimal path, configure the RR so that it does not reset the next-hop:
CE-3(config)#router eigrp OTP
CE-3(config-router)# address-family ipv4 unicast autonomous-system 1
CE-3(config-router-af)# af-interface Ethernet0/0
CE-3(config-router-af-interface)# no next-hop-self
%DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 172.16.1.2 (Ethernet0/0) is down: next_hop_self value changed
%DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 172.16.2.2 (Ethernet0/0) is down: next_hop_self value changed
%DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 172.16.1.2 (Ethernet0/0) is up: new adjacency
%DUAL-5-NBRCHANGE: EIGRP-IPv4 1: Neighbor 172.16.2.2 (Ethernet0/0) is up: new adjacency
CE-3(config-router-af-interface)#
CE-2 then sees CE-1 as the next-hop, and traffic flows directly to CE-1:
CE-3#show eigrp address-family interfaces detail e0/0
EIGRP-IPv4 VR(OTP) Address-Family Interfaces for AS(1)
Xmit Queue PeerQ Mean Pacing Time Multicast Pending
Interface Peers Un/Reliable Un/Reliable SRTT Un/Reliable Flow Timer Routes
Et0/0 2 0/0 0/0 1 0/3 50 0
Hello-interval is 5, Hold-time is 15
Split-horizon is disabled
Next xmit serial <none>
Packetized sent/expedited: 104/27
Hello's sent/expedited: 461890/31
Un/reliable mcasts: 0/0 Un/reliable ucasts: 154/153
Mcast exceptions: 0 CR packets: 0 ACKs suppressed: 3
Retransmissions sent: 3 Out-of-sequence rcvd: 0
Next-hop-self disabled, next-hop info forwarded
Topology-ids on interface - 0
Authentication mode is md5, key-chain is "OTP"
CE-3#
CE-2#show eigrp address-family ipv4 topology 10.1.1.0/24
EIGRP-IPv4 VR(OTP) Topology Entry for AS(1)/ID(10.1.2.1) for 10.1.1.0/24
State is Passive, Query origin flag is 1, 1 Successor(s), FD is 851968000, RIB is 6656000
Descriptor Blocks:
172.16.1.2 (Ethernet0/0 ), from 172.16.3.2 , Send flag is 0x0
Composite metric is (851968000/786432000), route is Internal
Vector metric:
Minimum bandwidth is 10000 Kbit
Total delay is 12000000000 picoseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1500
Hop count is 1
Originating router is 172.16.1.2
10.200.1.3 (Tunnel0), from 10.200.1.3, Send flag is 0x0
Composite metric is (4128768000/851968000), route is Internal
Vector metric:
Minimum bandwidth is 10000 Kbit
Total delay is 62000000000 picoseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1436
Hop count is 2
Originating router is 172.16.1.2
CE-2#
CE-2#traceroute 10.1.1.2
Type escape sequence to abort.
Tracing the route to 10.1.1.2
VRF info: (vrf in name/id, vrf out name/id)
1 172.16.1.2 [AS 1] 1 msec 1 msec 1 msec <--- go directly (via LISP encaps) to CE-1
2 10.1.1.2 1 msec 0 msec *
CE-2#
Take a look now at the RIB/CEF table:
CE-2#show ip route 10.1.1.0
Routing entry for 10.1.1.0/24
Known via "eigrp 1", distance 90, metric 6656000, type internal
Redistributing via eigrp 1
Last update from 172.16.1.2 on LISP0, 00:09:49 ago
Routing Descriptor Blocks:
* 172.16.1.2 , from 172.16.3.2, 00:09:49 ago, via LISP0
Route metric is 6656000, traffic share count is 1
Total delay is 12000 microseconds, minimum bandwidth is 10000 Kbit
Reliability 255/255, minimum MTU 1500 bytes
Loading 1/255, Hops 1
CE-2#
CE-2#show ip cef 10.1.1.0 internal
10.1.1.0/24, epoch 0, RIB[I], refcount 5, per-destination sharing
sources: RIB
feature space:
IPRM: 0x00028000
ifnums:
LISP0 (13): 172.16.1.2
path B2CC1260, path list B2D9A1EC, share 1/1, type attached nexthop, for IPv4
nexthop 172.16.1.2 LISP0, adjacency IP midchain out of LISP0, addr 172.16.1.2 B2AF5470
output chain: IP midchain out of LISP0, addr 172.16.1.2 B2AF5470 IP adj out of Ethernet0/0, addr 172.16.2.1 B0F02FD0
CE-2#
CE-2#show adjacency lisP 0 internal
Protocol Interface Address
IP LISP0 172.16.1.2(6) <----- Adjacency of CE-1
0 packets, 0 bytes
epoch 0
sourced in sev-epoch 1
Encap length 36
4500000000004000FF1120C8AC100202
AC100102000010F7000000008088C58B
00000000
L2 destination address byte offset 0
L2 destination address byte length 0
Link-type after encap: ip
LISP
Next chain element:
IP adj out of Ethernet0/0, addr 172.16.2.1
parent oce 0xB0F03040
frame originated locally (Null0)
L3 mtu 1464
mtu update from interface suppressed
Flags (0x4808E6)
Fixup disabled
HWIDB/IDB pointers 0xB273FFC0/0xB274DA78
IP redirect disabled
Switching vector: IPv4 midchain adj oce
Post encap features: LISP
LISP source RLOC 172.16.2.2
term adj IP adj out of Ethernet0/0, addr 172.16.2.1
LISP stack to 172.16.1.2 in Default (0x0)
nh tracking enabled: 172.16.1.2/32
IP adj out of Ethernet0/0, addr 172.16.2.1
nexthop adj observers:
- LISP ios adj mgr
Adjacency pointer 0xB2AF5470
Next-hop 172.16.1.2
IP LISP0 172.16.3.2 (6) <----- Adjacency of CE-3
0 packets, 0 bytes
epoch 0
sourced in sev-epoch 1
Encap length 36
4500000000004000FF111EC8AC100202
AC100302000010F700000000804B4295
00000000
L2 destination address byte offset 0
L2 destination address byte length 0
Link-type after encap: ip
LISP
Next chain element:
IP adj out of Ethernet0/0, addr 172.16.2.1
parent oce 0xB0F03040
frame originated locally (Null0)
L3 mtu 1464
mtu update from interface suppressed
Flags (0x4808E6)
Fixup disabled
HWIDB/IDB pointers 0xB273FFC0/0xB274DA78
IP redirect disabled
Switching vector: IPv4 midchain adj oce
Post encap features: LISP
LISP source RLOC 172.16.2.2
term adj IP adj out of Ethernet0/0, addr 172.16.2.1
LISP stack to 172.16.3.2 in Default (0x0)
nh tracking enabled: 172.16.3.2/32
IP adj out of Ethernet0/0, addr 172.16.2.1
nexthop adj observers:
- LISP ios adj mgr
Adjacency pointer 0xB2AF5730
Next-hop 172.16.3.2
CE-2#
In the CEF adjacency table, the pre-built LISP encapsulation for CE-1 is:
4500000000004000FF1120C8AC100202
AC100102000010F7000000008088C58B
00000000

Decode:
IP -> 4500000000004000FF1120C8AC100202
AC100102
flags: 0x4 -> DF bit set (see MTU/Fragmentation Issues)
TTL: 0xFF -> 255
Src IP: 0xAC100202 -> 172.16.2.2 (local WAN-intf IP of CE-2)
Dst IP: 0xAC100102 -> 172.16.1.2 (WAN-intf of CE-1)
UDP -> 000010F700000000
Src Port: 0x0000 (not predefined - IOS uses same source port for a given adjacency)
Dst port: 0x10F7 -> 4343 (Dst port is fixed)
LISP -> 8088C58B00000000
flags: 0x80 -> N bit set (Nonce-present bit)
Nonce: 0x88C58B (pseudo-randomly generated and different for each adjacency;
neighbor computes its own nonce for traffic in other direction)
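The manual decode above can be reproduced with a short script. The following is a minimal sketch, assuming the 36 bytes are a plain 20-byte IPv4 header, an 8-byte UDP header and an 8-byte LISP data header, as the decode describes. The function name `decode_lisp_encap` and the dictionary keys are illustrative and are not part of IOS.

```python
import ipaddress
import struct

# Pre-built encapsulation string copied from the CE-2 adjacency towards CE-1.
ENCAP_HEX = (
    "4500000000004000FF1120C8AC100202"
    "AC100102000010F7000000008088C58B"
    "00000000"
)


def decode_lisp_encap(hex_string):
    """Split a 36-byte IPv4 + UDP + LISP data header into named fields."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 36:
        raise ValueError("expected 36 bytes, got %d" % len(raw))

    # IPv4 header: 20 bytes, no options (assumption based on the 0x45 first byte).
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])

    # UDP header: source port, destination port, length, checksum.
    src_port, dst_port, udp_len, udp_sum = struct.unpack("!HHHH", raw[20:28])

    # LISP data header: 1 flags byte, 24-bit nonce, then a 32-bit field.
    lisp_flags = raw[28]
    nonce = int.from_bytes(raw[29:32], "big")

    return {
        "df_bit": bool(flags_frag & 0x4000),   # DF is bit 14 of flags/offset
        "ttl": ttl,
        "protocol": proto,                     # 17 = UDP
        "src_ip": str(ipaddress.IPv4Address(src)),
        "dst_ip": str(ipaddress.IPv4Address(dst)),
        "src_port": src_port,
        "dst_port": dst_port,
        "n_bit": bool(lisp_flags & 0x80),      # nonce-present flag
        "nonce": nonce,
    }


if __name__ == "__main__":
    for name, value in decode_lisp_encap(ENCAP_HEX).items():
        print(name, value)
```

Length and checksum fields are zero in the pre-built string, presumably because they are filled in per packet at forwarding time, so the sketch does not try to validate them.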
EPC (Embedded Packet Capture) can now be used to capture LISP encapsulated packets:
PE1#monitor capture buffer CAP size 10000 max-size 1000
(-> set buffer size in KB and max-size of capture packets, default = 68 Bytes)
PE1#monitor capture point ip cef E0/0-IN-OUT e0/0 both
PE1#monitor capture point associate E0/0-IN-OUT CAP
PE1#monitor capture point start E0/0-IN-OUT
Now, perform some pings from CE-1-IN to CE-2 and CE-3 to generate traffic:
CE-1-IN#ping 10.1.2.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.1.2.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/2 ms
CE-1-IN#ping 10.1.3.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.1.3.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms
CE-1-IN#
Next, stop the capture on PE1 and export the buffer to a file:
PE1#monitor capture point stop E0/0-IN-OUT
PE1#monitor capture buffer CAP export unix:capture.pcap
The following table provides informational decodes of some packets:

Echo request from CE-1 to CE-2 (captured on PE1 e0/0 in), followed by the echo reply from CE-2 to CE-1 (captured on PE1 e0/0 out):
Internet Protocol Version 4, Src: 172.16.1.2 (172.16.1.2), Dst: 172.16.2.2 (172.16.2.2)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
Total Length: 136
Identification: 0x00ba (186)
Flags: 0x02 (Don't Fragment)
Fragment offset: 0
Time to live: 254
Protocol: UDP (17)
Header checksum: 0x2086 [correct]
Source: 172.16.1.2 (172.16.1.2)
Destination: 172.16.2.2 (172.16.2.2)
[Source GeoIP: Unknown]
[Destination GeoIP: Unknown]
User Datagram Protocol, Src Port: intellistor-lm (1539), Dst Port: unicall (4343)
Source port: intellistor-lm (1539)
Destination port: unicall ( 4343 )
Length: 116
Checksum: 0x0000 (none)
Locator/ID Separation Protocol (Data)
Flags: 0x80
1... .... = N bit (Nonce present): Set
.0.. .... = L bit (Locator-Status-Bits field enabled): Not set
..0. .... = E bit (Echo-Nonce-Request): Not set
...0 .... = V bit (Map-Version present): Not set
.... 0... = I bit (Instance ID present): Not set
.... .000 = Reserved: 0x00
Nonce: 8973467 (0x88ec9b)
Internet Protocol Version 4, Src: 10.1.1.2 (10.1.1.2), Dst: 10.1.2.1 (10.1.2.1)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
Total Length: 100
Identification: 0x00aa (170)
Flags: 0x00
Fragment offset: 0
Time to live: 254
Protocol: ICMP (1)
Header checksum: 0xa4ea [correct]
Source: 10.1.1.2 (10.1.1.2)
Destination: 10.1.2.1 (10.1.2.1)
[Source GeoIP: Unknown]
[Destination GeoIP: Unknown]
Internet Control Message Protocol
Internet Protocol Version 4, Src: 172.16.2.2 (172.16.2.2), Dst: 172.16.1.2 (172.16.1.2)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
Total Length: 136
Identification: 0x009c (156)
Flags: 0x02 (Don't Fragment)
Fragment offset: 0
Time to live: 252
Protocol: UDP (17)
Header checksum: 0x22a4 [correct]
Source: 172.16.2.2 (172.16.2.2)
Destination: 172.16.1.2 (172.16.1.2)
[Source GeoIP: Unknown]
[Destination GeoIP: Unknown]
User Datagram Protocol, Src Port: intellistor-lm (1539), Dst Port: unicall (4343)
Source port: intellistor-lm (1539)
Destination port: unicall ( 4343 )
Length: 116
Checksum: 0x0000 (none)
Locator/ID Separation Protocol (Data)
Flags: 0x80
1... .... = N bit (Nonce present): Set
.0.. .... = L bit (Locator-Status-Bits field enabled): Not set
..0. .... = E bit (Echo-Nonce-Request): Not set
...0 .... = V bit (Map-Version present): Not set
.... 0... = I bit (Instance ID present): Not set
.... .000 = Reserved: 0x00
Nonce: 8963467 (0x88c58b)
Internet Protocol Version 4, Src: 10.1.2.1 (10.1.2.1), Dst: 10.1.1.2 (10.1.1.2)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
Total Length: 100
Identification: 0x00aa (170)
Flags: 0x00
Fragment offset: 0
Time to live: 255
Protocol: ICMP (1)
Header checksum: 0xa3ea [correct]
Source: 10.1.2.1 (10.1.2.1)
Destination: 10.1.1.2 (10.1.1.2)
[Source GeoIP: Unknown]
[Destination GeoIP: Unknown]
Internet Control Message Protocol

Note: The Identification field in the IP header of the LISP encapsulation is incremented for each packet, which makes it possible to detect packet loss in a capture.
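As an illustration of the note above, the sketch below looks for gaps in a list of outer IP Identification values. It assumes the values were taken, in capture order, from packets with the same outer source address, and that nothing else advances the counter between them. The function name `find_missing_ids` is hypothetical.

```python
def find_missing_ids(ids):
    """Return the Identification values skipped between consecutive packets.

    The field is 16 bits wide, so the counter wraps from 65535 back to 0.
    """
    missing = []
    for prev, cur in zip(ids, ids[1:]):
        gap = (cur - prev) % 65536
        # A gap of 1 means consecutive packets; anything larger means
        # that gap - 1 values were not seen in the capture.
        for step in range(1, gap):
            missing.append((prev + step) % 65536)
    return missing


if __name__ == "__main__":
    # 0x00ba is the Identification of the echo request shown above;
    # the following values are made up for the example.
    print(find_missing_ids([0x00BA, 0x00BB, 0x00BD, 0x00BE]))
```

Reordered packets would show up as very large gaps, so a real check would need to sort or window the values first.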

Finally, below is the capture of a targeted EIGRP packet from CE-1 to CE-3 (RR):
EIGRP targeted packet (captured on PE-1 e0/0 in)
Internet Protocol Version 4, Src: 172.16.1.2 (172.16.1.2), Dst: 172.16.3.2 (172.16.3.2)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0xc0 (DSCP 0x30: Class Selector 6 ; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
Total Length: 100
Identification: 0x0000 (0)
Flags: 0x00
Fragment offset: 0
Time to live: 9
Protocol: EIGRP (88)
Header checksum: 0x545e [correct]
Source: 172.16.1.2 (172.16.1.2)
Destination: 172.16.3.2 (172.16.3.2)
[Source GeoIP: Unknown]
[Destination GeoIP: Unknown]
Cisco EIGRP
Version: 2
Opcode: Hello (5)
Checksum: 0x8688 [correct]
Flags: 0x00000000
Sequence: 0
Acknowledge: 0
Virtual Router ID: 0 (Address-Family)
Autonomous System: 1
Authentication MD5
Parameters
Software Version: EIGRP=15.0, TLV=2.0
Targeted EIGRP packets are unicast, using the TTL configured in the neighbor command.
Like any EIGRP packet, they are marked with IP precedence 6 (CS6).
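The relationship between the 0xc0 value in the capture and "precedence 6 (CS6)" can be checked with a few bit shifts. This is a small sketch; the function name `split_tos` is illustrative only.

```python
def split_tos(tos):
    """Break the IPv4 ToS / DS byte into DSCP, IP precedence and ECN."""
    return {
        "dscp": tos >> 2,        # upper 6 bits
        "precedence": tos >> 5,  # upper 3 bits
        "ecn": tos & 0x3,        # lower 2 bits
    }


if __name__ == "__main__":
    print(split_tos(0xC0))
```

For 0xc0 this gives DSCP 48 (0x30, Class Selector 6), precedence 6 and ECN 0, which matches the decode of the targeted hello.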

MTU and Fragmentation Issues

Because OTP adds an extra 36-byte header, it must deal with potential MTU and fragmentation issues. The DF bit is always set in the LISP encapsulation, which prevents fragmentation of encapsulated packets inside the provider cloud and so avoids re-assembly on the egress CE. The idea is to force fragmentation before encapsulation, so that re-assembly is done by the end hosts. For the ingress CE to fragment before encapsulation, it needs to know the maximum MTU that can cross the provider cloud with OTP encapsulation.
This is determined automatically if the MTU of the WAN interface is supported end to end across the provider cloud.
If this is not the case (i.e. there are lower-MTU links within the provider cloud), manually change the IP MTU of the WAN interface to match the lowest MTU within the provider cloud. Otherwise, PMTUD is broken for end users, which may lead to connectivity issues over OTP.

Note: Check the calculated max mtu by looking at the CEF adjacencies on the LISP interface. In the case below, the WAN-intf gets 1500 MTU, so L3 mtu = 1464 (1500 - 36):

CE#show adjacency lisP 0 int | i mtu
L3 mtu 1464
mtu update from interface suppressed
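The 36-byte figure and the resulting L3 mtu can be written out as a small calculation. This is a sketch that assumes the overhead is made up of the three headers seen in the adjacency decode (IPv4 without options, UDP, LISP data header). The names are illustrative.

```python
IPV4_HEADER_BYTES = 20  # outer IPv4 header, no options
UDP_HEADER_BYTES = 8
LISP_HEADER_BYTES = 8

# Matches "Encap length 36" in the adjacency output.
OTP_OVERHEAD_BYTES = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + LISP_HEADER_BYTES


def lisp_l3_mtu(wan_ip_mtu):
    """L3 MTU of the LISP adjacency for a given IP MTU on the WAN interface."""
    return wan_ip_mtu - OTP_OVERHEAD_BYTES


if __name__ == "__main__":
    print(OTP_OVERHEAD_BYTES, lisp_l3_mtu(1500))
```

With a 1500-byte WAN interface this yields the 1464 shown by the `show adjacency` output.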
What About Big Packets Across Setup?
If CE-2 is pinged from CE-1-IN with a 1500-byte packet, without the DF bit set, the ping fails:
CE-1-IN#ping 10.1.2.1 size 1500
Type escape sequence to abort.
Sending 5, 1500-byte ICMP Echos to 10.1.2.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
CE-1-IN#
This is because the MTU of the CE-1 WAN-intf (1500) is not supported end to end through the provider cloud. The MPLS interface has a 1500-byte MTU, so the end-to-end MTU with 2x MPLS labels (4 bytes each) is 1492. When CE-1 receives a 1500-byte packet, it fragments it according to the L3 mtu of the CEF adjacency of the next-hop (1464):
CE-1#show ip cef 10.1.2.1
10.1.2.0/24
nexthop 172.16.2.2 LISP0
CE-1#
CE-1#show adjacency lisP 0 in
Protocol Interface Address
IP LISP0 172.16.2.2 (6)
144 packets, 86476 bytes
epoch 0
sourced in sev-epoch 8
Encap length 36
4500000000004000FF1120C8AC100102
AC100202000010F7000000008088EC9B
00000000
L2 destination address byte offset 0
L2 destination address byte length 0
Link-type after encap: ip
LISP
Next chain element:
IP adj out of Ethernet1/0, addr 172.16.1.1
parent oce 0xB0F3B6F0
frame originated locally (Null0)
L3 mtu 1464
mtu update from interface suppressed
Flags (0x4808E6)
Fixup disabled
HWIDB/IDB pointers 0xB172E300/0xB22E8288
IP redirect disabled
Switching vector: IPv4 midchain adj oce
Post encap features: LISP
LISP stack to 172.16.2.2 in Default (0x0)
nh tracking enabled: 172.16.2.2/32
IP adj out of Ethernet1/0, addr 172.16.1.1
nexthop adj observers:
- LISP ios adj mgr
LISP source RLOC 172.16.1.2
term adj IP adj out of Ethernet1/0, addr 172.16.1.1
Adjacency pointer 0xB1908598
Next-hop 172.16.2.2
The big fragment (1500 bytes) is then dropped by PE1 because it exceeds the MTU.
Running 'debug ip icmp' on CE-1 confirms this:
CE-1#deb ip icmp
ICMP packet debugging is on
CE-1#
ICMP: dst (172.16.1.2) frag. needed and DF set unreachable rcv from 172.16.1.1 mtu:1492
CE-1#
ICMP: dst (172.16.1.2) frag. needed and DF set unreachable rcv from 172.16.1.1 mtu:1492
CE-1#
Any packet (without the DF bit set) bigger than 1464 bytes is dropped in the same manner.
Only packets that fit through with LISP encapsulation and without fragmentation are successful: 1492 - 36 = 1456 bytes
CE-1-IN#ping 10.1.2.1 size 1456
Type escape sequence to abort.
Sending 5, 1456-byte ICMP Echos to 10.1.2.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms
CE-1-IN#
Packets between 1457 and 1464 are not fragmented by CE-1 but are also dropped by PE1:
CE-1-IN#ping 10.1.2.1 size 1457
Type escape sequence to abort.
Sending 5, 1457-byte ICMP Echos to 10.1.2.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
CE-1-IN#
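The three size ranges observed above (up to 1456, 1457 to 1464, above 1464) can be modelled in a few lines. This is a simplified sketch, not a description of the IOS forwarding code. It assumes standard IPv4 fragmentation (20-byte header, fragment payload aligned to 8 bytes) and that any encapsulated packet larger than the provider path MTU is dropped because the outer DF bit is set. All function names are hypothetical.

```python
OTP_OVERHEAD_BYTES = 36   # outer IPv4 + UDP + LISP ("Encap length 36")
MPLS_LABEL_BYTES = 4
INNER_IP_HEADER_BYTES = 20


def provider_path_mtu(core_interface_mtu, label_count):
    """MTU left for IP once the MPLS label stack is accounted for."""
    return core_interface_mtu - label_count * MPLS_LABEL_BYTES


def largest_piece(size, l3_mtu):
    """Size of the biggest inner packet the CE hands to the encapsulation."""
    if size <= l3_mtu:
        return size  # no fragmentation needed
    # Fragment payload must be a multiple of 8 bytes.
    payload = (l3_mtu - INNER_IP_HEADER_BYTES) // 8 * 8
    return INNER_IP_HEADER_BYTES + payload


def outcome_without_df(size, wan_ip_mtu, path_mtu):
    """Return (fragmented_by_ce, delivered) for a packet sent without DF."""
    l3_mtu = wan_ip_mtu - OTP_OVERHEAD_BYTES
    fragmented = size > l3_mtu
    outer_size = largest_piece(size, l3_mtu) + OTP_OVERHEAD_BYTES
    delivered = outer_size <= path_mtu
    return fragmented, delivered


if __name__ == "__main__":
    path = provider_path_mtu(1500, 2)  # 1492 in this setup
    for size in (1456, 1457, 1464, 1500):
        print(size, outcome_without_df(size, wan_ip_mtu=1500, path_mtu=path))
```

Under these assumptions the model reproduces the ping results: 1456 bytes passes, 1457 and 1464 are not fragmented but are dropped, and 1500 is fragmented by CE-1 with its first fragment still too large for the provider path.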
In practice, most applications use PMTUD: packets are sent with the DF bit set so that the end host can adapt the MTU of the connection when it receives an ICMP type 3, code 4 message (fragmentation needed and DF set).
PMTUD will NOT work properly if there is a mismatch between the MTU of the WAN-intf and the end-to-end MTU.
In this setup, when an application using PMTUD needs to send a 1500-byte packet, it receives an ICMP 3/4 message from CE-1 with mtu=1464:
CE-1-IN#deb ip icmp
ICMP packet debugging is on
CE-1-IN#ping 10.1.2.1 size 1500 df
Type escape sequence to abort.
Sending 5, 1500-byte ICMP Echos to 10.1.2.1, timeout is 2 seconds:
Packet sent with the DF bit set
M
ICMP: dst (10.1.1.2) frag. needed and DF set unreachable rcv from 10.1.1.1 mtu:1464.M
ICMP: dst (10.1.1.2) frag. needed and DF set unreachable rcv from 10.1.1.1 mtu:1464.M
Success rate is 0 percent (0/5)
CE-1-IN#
ICMP: dst (10.1.1.2) frag. needed and DF set unreachable rcv from 10.1.1.1 mtu:1464
CE-1-IN#
It then adapts the MTU of the connection to 1464, and these packets are dropped on PE1. The ICMP 3/4 generated by PE1 towards CE-1 does not update the L3 mtu of the adjacency (mtu update from interface suppressed), i.e. there is no mechanism such as tunnel PMTUD for LISP. The real MTU of the connection is therefore never communicated to the end host:
CE-1-IN#ping 10.1.2.1 size 1464 df
Type escape sequence to abort.
Sending 5, 1464-byte ICMP Echos to 10.1.2.1, timeout is 2 seconds:
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)
CE-1-IN#

Note: No ICMP 3/4 is received by CE-1-IN! The ICMP 3/4 messages from PE1 reach only CE-1, and the L3 mtu of the adjacency stays at 1464:

CE-1#deb ip icmp
ICMP packet debugging is on
CE-1#
ICMP: dst (172.16.1.2) frag. needed and DF set unreachable rcv from 172.16.1.1 mtu:1492
ICMP: dst (172.16.1.2) frag. needed and DF set unreachable rcv from 172.16.1.1 mtu:1492
CE-1#show adjacency lisp0 in | i mtu
L3 mtu 1464
mtu update from interface suppressed
PMTUD is broken and the application will fail. The solution is to manually set the ip mtu of the WAN-intf:
CE-1(config)#int e1/0
CE-1(config-if)#ip mtu 1492
CE-1(config-if)#
CE-1#show adjacency lisp0 in | i mtu
L3 mtu 1456
mtu update from interface suppressed
End-users then receive the correct MTU to avoid fragmentation through OTP cloud:
CE-1-IN#ping 10.1.2.1 size 1500 df-bit
Type escape sequence to abort.
Sending 5, 1500-byte ICMP Echos to 10.1.2.1, timeout is 2 seconds:
Packet sent with the DF bit set
M
*Jul 24 10:58:11.370: ICMP: dst (10.1.1.2) frag. needed and DF set unreachable rcv from 10.1.1.1 mtu:1456 .M
*Jul 24 10:58:13.373: ICMP: dst (10.1.1.2) frag. needed and DF set unreachable rcv from 10.1.1.1 mtu:1456.M
Success rate is 0 percent (0/5)
CE-1-IN#
*Jul 24 10:58:15.374: ICMP: dst (10.1.1.2) frag. needed and DF set unreachable rcv from 10.1.1.1 mtu:1456
CE-1-IN#
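The difference between the broken and the corrected behaviour can be summarised as a small PMTUD simulation from the end host's point of view. This is a hedged sketch of the behaviour described in this section, not of any IOS internals. It assumes the ingress CE answers oversized DF packets with an ICMP 3/4 carrying the adjacency L3 mtu, and that drops inside the provider cloud are never signalled back to the host. The function name `simulate_pmtud` and its return values are made up for the illustration.

```python
def simulate_pmtud(initial_size, wan_ip_mtu, path_mtu,
                   overhead=36, max_rounds=5):
    """Return (result, final_size) for a host sending DF-marked packets."""
    l3_mtu = wan_ip_mtu - overhead
    size = initial_size
    for _ in range(max_rounds):
        if size > l3_mtu:
            # Ingress CE sends ICMP 3/4 with mtu = L3 mtu; host lowers its size.
            size = l3_mtu
            continue
        if size + overhead > path_mtu:
            # Encapsulated packet is dropped in the provider cloud. The ICMP
            # 3/4 goes to the CE (outer source) and never reaches the host.
            return "black hole", size
        return "ok", size
    return "unresolved", size


if __name__ == "__main__":
    # WAN-intf left at 1500 while the provider path supports 1492.
    print(simulate_pmtud(1500, wan_ip_mtu=1500, path_mtu=1492))
    # WAN-intf set to "ip mtu 1492" as in the fix above.
    print(simulate_pmtud(1500, wan_ip_mtu=1492, path_mtu=1492))
```

In the first case the host settles on 1464 and its packets disappear silently; in the second it settles on 1456, which fits through the provider cloud once encapsulated.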