GSR
IOS 12.0(32)SY8
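The customer noticed that two IBGP neighbors kept flapping. The pattern was clear: about 5 minutes after the BGP session came up, it went down because the hold timer expired, and then the session was re-established almost immediately.
The log shows this cycle of the IBGP session dropping and coming back. Every drop is caused by a hold timer expiry, and the notification is sent by the peer, which means the peer was not receiving the keepalive messages sent by the local router.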
Dec 6 13:28:36: %BGP-5-ADJCHANGE: neighbor 2.2.2.2 Up
Dec 6 13:33:55: %BGP-3-NOTIFICATION: received from neighbor 2.2.2.2 4/0 (holdtime expired) 0 bytes
Dec 6 13:33:55: %BGP-5-ADJCHANGE: neighbor 2.2.2.2 Down BGP Notification received
Dec 6 13:34:22: %BGP-5-ADJCHANGE: neighbor 2.2.2.2 Up
Dec 6 13:39:37: %BGP-3-NOTIFICATION: received from neighbor 2.2.2.2 4/0 (holdtime expired) 0 bytes
Dec 6 13:39:37: %BGP-5-ADJCHANGE: neighbor 2.2.2.2 Down BGP Notification received
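The "about 5 minutes" pattern can be checked directly from the ADJCHANGE messages. The following is a minimal Python sketch, not part of the original case notes. The function name `uptimes` and the placeholder year are assumptions made for illustration, since the syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Log lines copied from the case above.
LOG = """\
Dec 6 13:28:36: %BGP-5-ADJCHANGE: neighbor 2.2.2.2 Up
Dec 6 13:33:55: %BGP-3-NOTIFICATION: received from neighbor 2.2.2.2 4/0 (holdtime expired) 0 bytes
Dec 6 13:33:55: %BGP-5-ADJCHANGE: neighbor 2.2.2.2 Down BGP Notification received
Dec 6 13:34:22: %BGP-5-ADJCHANGE: neighbor 2.2.2.2 Up
Dec 6 13:39:37: %BGP-3-NOTIFICATION: received from neighbor 2.2.2.2 4/0 (holdtime expired) 0 bytes
Dec 6 13:39:37: %BGP-5-ADJCHANGE: neighbor 2.2.2.2 Down BGP Notification received
"""

ADJCHANGE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d): %BGP-5-ADJCHANGE: neighbor (\S+) (Up|Down)"
)

# Assumption: the log has no year, so a placeholder year is used for parsing.
PLACEHOLDER_YEAR = 2000


def uptimes(log_text):
    """Return (neighbor, seconds_up) for every Up -> Down pair in the log."""
    up_since = {}
    sessions = []
    for line in log_text.splitlines():
        match = ADJCHANGE.match(line)
        if not match:
            continue  # skip NOTIFICATION and any other messages
        stamp, neighbor, state = match.groups()
        when = datetime.strptime(
            f"{PLACEHOLDER_YEAR} {stamp}", "%Y %b %d %H:%M:%S"
        )
        if state == "Up":
            up_since[neighbor] = when
        elif neighbor in up_since:
            started = up_since.pop(neighbor)
            sessions.append((neighbor, int((when - started).total_seconds())))
    return sessions


for neighbor, seconds in uptimes(LOG):
    print(f"{neighbor} stayed up for {seconds} s")
```

For the two sessions in the log this gives 319 s and 315 s, both a little over 5 minutes.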
R1# show ip bgp vpnv all neighbors 2.2.2.2
BGP neighbor is 2.2.2.2, remote AS 65350, internal link
Description: To_ R2
Member of peer-group NXVRRgroup for session parameters
BGP version 4, remote router ID 202.100.126.219
BGP state = Established, up for 00:00:56
Last read 00:00:51, last write 00:00:56, hold time is 180, keepalive interval is 60 seconds
Neighbor capabilities:
Route refresh: advertised and received(new)
Four-octets ASN Capability: advertised and received
Address family VPNv4 Unicast: advertised and received
Message statistics:
InQ depth is 0
OutQ depth is 0
Sent Rcvd
Opens: 35 35
Notifications: 2 28
Updates: 935784 467
Keepalives: 133137 147643
Route Refresh: 0 1
Total: 1068931 148175
Default minimum time between advertisement runs is 0 seconds
For address family: VPNv4 Unicast
BGP table version 1316545, neighbor version 0/0
Output queue size : 0
Index 3, Offset 0, Mask 0x8
Route-Reflector Client
Member of update-group 3
NXVRRgroup peer-group member
NEXT_HOP is always this router
Sent Rcvd
Prefix activity: ---- ----
Prefixes Current: 3591 184 (Consumes 12512 bytes)
Prefixes Total: 0 184
Implicit Withdraw: 0 0
Explicit Withdraw: 0 0
Used as bestpath: n/a 46
Used as multipath: n/a 0
Outbound Inbound
Local Policy Denied Prefixes: -------- -------
Total: 0 0
Number of NLRIs in the update sent: max 0, min 0
Address tracking is enabled, the RIB does have a route to 2.2.2.2
Connections established 35; dropped 34
Last reset 00:01:17, due to BGP Notification received, hold time expired
Connection state is ESTAB, I/O status: 1, unread input bytes: 0
Mininum incoming TTL 0, Outgoing TTL 255
Local host: 1.1.1.1, Local port: 179
Foreign host: 2.2.2.2, Foreign port: 24434
Enqueued packets for retransmit: 0, input: 0 mis-ordered: 0 (0 bytes)
Event Timers (current time is 0x74A9E3D88):
Timer Starts Wakeups Next
Retrans 2 0 0x0
TimeWait 0 0 0x0
AckHold 4 3 0x0
SendWnd 0 0 0x0
KeepAlive 0 0 0x0
GiveUp 0 0 0x0
PmtuAger 0 0 0x0
DeadWait 0 0 0x0
iss: 1432533502 snduna: 1432533575 sndnxt: 1432533575 sndwnd: 65463
irs: 4098882880 rcvnxt: 4098886860 rcvwnd: 61556 delrcvwnd: 3979
SRTT: 836 ms, RTTO: 3946 ms, RTV: 1137 ms, KRTT: 0 ms
minRTT: 0 ms, maxRTT: 300 ms, ACK hold: 200 ms
Flags: passive open, nagle, path mtu capable, gen tcbs,
SACK option permitted
Datagrams (max data segment is 4394 bytes):
Rcvd: 8 (out of order: 0), with data: 6, total data bytes: 3979
Sent: 5 (retransmit: 0, fastretransmit: 0), with data: 1, total data bytes: 72
An extended ping sourced from loopback0 with the DF bit set shows that a 2200-byte datagram cannot get through on this path:
R1#ping
Protocol [ip]:
Target IP address: 2.2.2.2
Repeat count [5]:
Datagram size [100]: 2200
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface: loopback0
Type of service [0]:
Set DF bit in IP header? [no]: yes
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 5, 2200-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
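The ping above shows only that 2200 bytes is too large. Finding the largest size that still passes amounts to a search over datagram sizes with DF set. Here is a minimal Python sketch of that search, under stated assumptions: `probe` is a hypothetical callback that would send one DF-set ping of the given size and report success, and `fake_probe` with its 1500-byte limit is a made-up stand-in, not a value measured in this case.

```python
def largest_passing_size(probe, low, high):
    """Binary-search the largest size in [low, high] for which probe(size) is True.

    Assumes that if a size passes, every smaller size passes as well.
    Returns None when even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid       # mid fits, so the answer is mid or larger
        else:
            high = mid - 1  # mid is too big, so the answer is smaller
    return low


def fake_probe(size, path_mtu=1500):
    """Stand-in for a real DF-set ping; 1500 is an illustrative value only."""
    return size <= path_mtu


print(largest_passing_size(fake_probe, 100, 2200))
```

On a real router the same idea is covered by the "Sweep range of sizes" prompt of the extended ping dialog.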
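The root cause was a change in the customer's IGP environment: one router along the path had selected a backup link as its egress, and the interface MTU on that link was very small. The neighbor output above reports a maximum data segment of 4394 bytes, so full-sized BGP update packets were larger than what this link could carry and got stuck there, which led to the hold timer expiring. Small packets still fit through the link, which is consistent with the session being re-established right after each drop.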
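The size mismatch can be made concrete with a little arithmetic. The sketch below is illustrative only and rests on assumptions: IP and TCP headers of 20 bytes each with no options, no MPLS label overhead, and a path MTU somewhere below 2200 bytes (inferred from the failed 2200-byte ping, with 2199 used as the most generous possible value). The 19-byte figure is the fixed BGP message header, which is all a KEEPALIVE consists of.

```python
IP_HEADER = 20          # assumption: no IP options
TCP_HEADER = 20         # assumption: no TCP options
BGP_KEEPALIVE = 19      # a KEEPALIVE is just the 19-byte BGP header
MAX_DATA_SEGMENT = 4394 # from "Datagrams (max data segment is 4394 bytes)"
FAILED_PING_SIZE = 2200 # datagram size that failed with DF set


def ip_packet_size(tcp_payload):
    """Total IP packet size for a TCP segment carrying tcp_payload bytes."""
    return IP_HEADER + TCP_HEADER + tcp_payload


def fits(tcp_payload, path_mtu):
    """True if the resulting IP packet can cross the path without fragmentation."""
    return ip_packet_size(tcp_payload) <= path_mtu


# Most generous path MTU still consistent with the failed ping.
best_case_mtu = FAILED_PING_SIZE - 1

print(ip_packet_size(MAX_DATA_SEGMENT))        # 4434
print(fits(MAX_DATA_SEGMENT, best_case_mtu))   # False
print(ip_packet_size(BGP_KEEPALIVE))           # 59
print(fits(BGP_KEEPALIVE, best_case_mtu))      # True
```

Even in the best case a full-sized update segment is roughly twice as large as the path allows, while a lone keepalive is far below the limit.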
Show ip bgp *