This document describes how to troubleshoot an eBGP (External Border Gateway Protocol) session that is stuck in the Active state because of incorrect LPTS (Local Packet Transport Services) entries.
Contributed by William Xu, Cisco TAC Engineer.
Cisco recommends that you have knowledge of these topics:
LPTS for IOS XR
The information in this document is based on ASR9000 (Aggregation Services Router) platforms.
The information in this document was created from devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any commands.
When you configure eBGP, the session can be stuck in the Active state indefinitely if:
There is no update-source command configured
There is a topology change that causes traffic to take a different path
These symptoms are seen when this issue occurs:
IP addresses are reachable
Both BGP peers remain stuck in the Active state
A packet capture shows that the routers send many TCP resets (RSTs)
The show tcp trace error command reports this error for BGP sessions:
Feb 18 09:32:15.393 tcp/error 0/RSP0/CPU0 t9 Lpts set the drop flag for 179 -> 5368, drop packet (pak 0xb1cf80f3) and send a RST
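On a router with many sessions, it can help to filter the trace output for this message. The following Python sketch parses lines in the format of the sample above and keeps only drops that involve the BGP port (179). The regular expression is inferred from that single sample line, and the helper names (parse_lpts_drop, bgp_lpts_drops) are hypothetical, so adjust the pattern to the output you actually collect.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

BGP_PORT = 179

# Assumed format, inferred from one sample "tcp/error" line; adjust as needed.
LPTS_DROP_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+[\d:.]+)\s+tcp/error\s+(?P<node>\S+)\s+\S+\s+"
    r"Lpts set the drop flag for (?P<port_a>\d+) -> (?P<port_b>\d+), "
    r"drop packet \(pak (?P<pak>0x[0-9a-fA-F]+)\) and send a RST"
)


@dataclass(frozen=True)
class LptsDrop:
    timestamp: str
    node: str
    ports: tuple[int, int]  # in the order printed in the message ("A -> B")
    pak: str

    @property
    def involves_bgp(self) -> bool:
        return BGP_PORT in self.ports


def parse_lpts_drop(line: str) -> Optional[LptsDrop]:
    """Return the parsed drop record, or None if the line is not an LPTS drop."""
    m = LPTS_DROP_RE.search(line.strip())
    if m is None:
        return None
    return LptsDrop(
        timestamp=m["timestamp"],
        node=m["node"],
        ports=(int(m["port_a"]), int(m["port_b"])),
        pak=m["pak"],
    )


def bgp_lpts_drops(lines: Iterable[str]) -> Iterator[LptsDrop]:
    """Yield only the LPTS drops where one of the ports is 179."""
    for line in lines:
        drop = parse_lpts_drop(line)
        if drop is not None and drop.involves_bgp:
            yield drop


sample = ("Feb 18 09:32:15.393 tcp/error 0/RSP0/CPU0 t9 Lpts set the drop flag "
          "for 179 -> 5368, drop packet (pak 0xb1cf80f3) and send a RST")
for d in bgp_lpts_drops([sample]):
    print(d.node, d.ports)  # 0/RSP0/CPU0 (179, 5368)
```

A steady stream of these records for port 179 while both peers sit in the Active state matches the symptom list above.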
In summary, the root cause is that the LPTS entries are not updated when routing and forwarding change, so they remain stale after the topology change.
BGP has received some enhancements for this issue. These two scenarios describe the issue in more detail.
Note: iBGP (Internal Border Gateway Protocol) normally does not hit this issue because iBGP sessions are typically configured with update-source.
Scenario 1 - Multihop eBGP with Topology Change
Build a multihop eBGP session between ASR9K-1 and ASR9K-3. The peer IP addresses are 188.8.131.52 and 184.108.40.206 on the physical interfaces, and no update-source command is configured. With the current topology, the session comes up in the Established state. This is expected because both routers use the interface in subnet 220.127.116.11/24 as the egress interface.
Then shut down the direct link between ASR9K-1 and ASR9K-3. The peer addresses are now reachable through ASR9K-2 over the multihop path, so ping succeeds. The source IP addresses match at both ends, but the BGP session is still in the Active state.
When the BGP neighbors are configured, LPTS entries are created according to the CEF (Cisco Express Forwarding) table. On ASR9K-1, IP address 18.104.22.168 is reachable through the 22.214.171.124/24 subnet, so the corresponding LPTS entries are programmed. One entry allows the BGP neighbor to connect to port 179 on local IP address 126.96.36.199. Because ASR9K-1 also tries to initiate a TCP session from local port 26036, there is a second entry for that connection.
When the link between ASR9K-1 and ASR9K-3 goes down, the peers are reachable through ASR9K-2 with a new local source IP address. However, the topology change does not trigger an LPTS update. The original port 179 entry keeps the original local IP address, which prevents the router from accepting incoming TCP connection requests to the new local IP address. Hence, the BGP session remains stuck in the Active state at both ends.
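The failure mode can be illustrated with a toy model in Python. It is a minimal sketch under stated assumptions, not how IOS XR implements CEF or LPTS: the FIB is a dictionary that maps a destination prefix to the local address of the egress interface, LPTS is a set of (local IP, local port) bindings programmed once when the neighbor is configured, and all addresses are hypothetical documentation addresses. Only the port numbers 179 and 26036 come from the scenario.

```python
import ipaddress

BGP_PORT = 179


def source_address_for(fib: dict[str, str], dest: str) -> str:
    """Longest-prefix match: return the local address of the egress interface."""
    dest_ip = ipaddress.ip_address(dest)
    best_net, best_src = None, None
    for prefix, local_addr in fib.items():
        net = ipaddress.ip_network(prefix)
        if dest_ip in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_src = net, local_addr
    if best_src is None:
        raise LookupError(f"no route to {dest}")
    return best_src


class ToyLpts:
    """Toy LPTS: only SYNs that match a programmed (local IP, local port) pass."""

    def __init__(self) -> None:
        self.bindings: set[tuple[str, int]] = set()

    def program_bgp_neighbor(self, fib: dict[str, str], neighbor: str, local_port: int) -> None:
        # Entries are derived from the forwarding table at configuration time only.
        src = source_address_for(fib, neighbor)
        self.bindings.add((src, BGP_PORT))    # the peer may connect to us on port 179
        self.bindings.add((src, local_port))  # our own outgoing connection attempt

    def handle_syn(self, dst_ip: str, dst_port: int) -> str:
        return "accept" if (dst_ip, dst_port) in self.bindings else "drop + RST"


NEIGHBOR = "203.0.113.3"                        # hypothetical ASR9K-3 peer address
fib_before = {"203.0.113.0/24": "192.0.2.1"}    # egress over the direct link
fib_after = {"203.0.113.0/24": "198.51.100.1"}  # egress via ASR9K-2 after the link fails

lpts = ToyLpts()
lpts.program_bgp_neighbor(fib_before, NEIGHBOR, local_port=26036)

# Topology change: the FIB changes, but nothing reprograms the LPTS bindings.
new_src = source_address_for(fib_after, NEIGHBOR)
print(new_src, lpts.handle_syn(new_src, BGP_PORT))  # 198.51.100.1 drop + RST
```

The point of the model is that the bindings are a snapshot of the forwarding state when the neighbor was configured, so a connection to the new local address on port 179 finds no match and is reset.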
Scenario 2 - eBGP with Update Source Address Change
Deploy an eBGP session between ASR9K-1 and ASR9K-3 with IP addresses 188.8.131.52 and 184.108.40.206. A new addressing plan changes these addresses to 220.127.116.11 and 18.104.22.168. If you configure eBGP first and then change the IP addresses on the interfaces, the eBGP session is stuck in the Active state.
The cause is the same as in Scenario 1. When you configure the eBGP session, the LPTS entries are generated from the local egress interface address at that point, and they are not updated when the interface addresses change afterward.
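For this scenario, a quick check is to compare the local addresses in the BGP-related LPTS entries with the addresses currently configured on the interfaces. The Python helper below is hypothetical; it assumes you have copied the bindings from the router's LPTS output by hand as (local IP, local port) pairs, and it flags port 179 entries whose local address no longer exists on any interface. The addresses in the example are hypothetical documentation addresses.

```python
from typing import Iterable

BGP_PORT = 179


def stale_bgp_bindings(
    bindings: Iterable[tuple[str, int]],
    interface_addresses: Iterable[str],
) -> list[tuple[str, int]]:
    """Return port 179 bindings whose local address is no longer on any interface."""
    current = set(interface_addresses)
    return sorted(
        (local_ip, port)
        for local_ip, port in bindings
        if port == BGP_PORT and local_ip not in current
    )


# Hypothetical values: bindings recorded when eBGP was configured, and the
# interface addresses after the readdressing.
bindings = [("192.0.2.1", 179), ("192.0.2.1", 26036)]
interfaces_now = ["198.51.100.1"]
print(stale_bgp_bindings(bindings, interfaces_now))  # [('192.0.2.1', 179)]
```

An empty result does not prove the LPTS state is correct; it only rules out this one mismatch.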
An enhancement for this issue was introduced in IOS XR Release 6.1.1. From this release, when BGP tries to re-establish the session, LPTS updates its entries with the new local IP address. How long the update takes depends on the hold time configured at both ends, so you might need to wait for some time before the session comes up.
Even with this enhancement, a BGP session can still be stuck in the Active state if passive mode is configured. If BGP does not try to re-establish the session, the local IP address is not checked, so the LPTS entries are not updated.
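The difference between an actively retrying neighbor and a passive one can be sketched with a small toy model in Python. This is an assumption-laden illustration, not the actual 6.1.1 code: a reconnect attempt re-resolves the local source address and reprograms the port 179 binding, while a passive-only neighbor never initiates and so keeps its stale binding. The class name and addresses are hypothetical, and neither hold-time timing nor the later 6.2.1 fix is modeled.

```python
BGP_PORT = 179


class NeighborModel:
    """Toy model of one BGP neighbor's port 179 LPTS binding (not IOS XR code)."""

    def __init__(self, source_at_config: str, passive_only: bool) -> None:
        self.passive_only = passive_only
        # Binding programmed from the egress address when the neighbor was configured.
        self.bindings = {(source_at_config, BGP_PORT)}

    def reconnect_attempt(self, current_source: str) -> None:
        # Assumed 6.1.1-style behavior: an outgoing attempt re-checks the local
        # source address and reprograms the binding (replaced outright here).
        if self.passive_only:
            return  # a passive-only neighbor never initiates, so nothing refreshes
        self.bindings = {(current_source, BGP_PORT)}

    def accepts(self, dst_ip: str) -> bool:
        return (dst_ip, BGP_PORT) in self.bindings


OLD_SRC, NEW_SRC = "192.0.2.1", "198.51.100.1"  # hypothetical addresses

active = NeighborModel(OLD_SRC, passive_only=False)
passive = NeighborModel(OLD_SRC, passive_only=True)
for neighbor in (active, passive):
    neighbor.reconnect_attempt(NEW_SRC)  # one retry; timing is not modeled

print(active.accepts(NEW_SRC), passive.accepts(NEW_SRC))  # True False
```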
Another enhancement for this situation was introduced in IOS XR Release 6.2.1:
CSCvb15128 - BGP session stuck in active while router has Passive BGP mode configured