ASR 5000 Series: "BGPPeerSessionDown" Trap Appears in Less Than the Hold Timer Period After a Broken-Connectivity Event

October 27, 2016 - Machine Translated

Introduction

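This document explains the timing between a triggering event and the moment Border Gateway Protocol (BGP) peers are marked down with the BGPPeerSessionDown trap. The time it takes for the peers to be marked down can be less than the hold timer period. This specific issue was reported on the Cisco Aggregated Services Router (ASR) 5000, but it applies equally to the ASR 5500.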

Contributed by Dave Damerjian, Cisco TAC Engineer.

Problem

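In this specific case, there was an npumgr process restart on the Demux Packet Services Card (PSC) 1 in the ASR 5000 due to a micro-engine issue, which is a not-so-uncommon transient problem (no need for an RMA):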

2015-Jun-13+13:51:44.198 [sft 58000 info] [1/0/4255 <sft:100>
sft_monitor.c:115]
[software internal system critical-info syslog] SFT : Forced 1 times RX
packet at slot 1, cpu 0, inst 100, inflight packets 30

2015-Jun-13+13:51:45.306 [sft 58000 info] [1/0/4255 <sft:100>
sft_monitor.c:115]
[software internal system critical-info syslog] SFT : Forced 81 times RX
packet at slot 1, cpu 0, inst 100, inflight packets 110

2015-Jun-13+13:51:45.205 [sft 58000 info] [1/0/4255 <sft:100>
sft_monitor.c:115]
[software internal system critical-info syslog] SFT : Forced 71 times RX
packet at slot 1, cpu 0, inst 100, inflight packets 100

Sat Jun 13 13:51:45 2015 Internal trap notification 73 (ManagerFailure)
facility npumgr instance 1 card 1 cpu 1

2015-Jun-13+13:51:45.335 [npuctrl 16019 error] [8/0/4729 <npuctrl:0>
rl_sf_handler.c:2570] [software internal system syslog] SF CTRL:
monitoring_recovery:
Task packet test failed on failed_card 1, calling npuctrl_sf_insert_card()

2015-Jun-13+13:51:48.469 [npuctrl 16019 error] [8/0/4729 <npuctrl:0>
rl_sf_handler.c:2558] [software internal system syslog] SF CTRL:
monitoring_recovery:
too many sf insert calls on failed_card 1, cnt = 1 calling
npuctrl_restart_npumgr()

Sat Jun 13 13:51:48 2015 Internal trap notification 150 (TaskFailed)
facility npumgr instance 1 on card 1 cpu 1

2015-Jun-13+13:51:48.470 [npuctrl 16020 info] [8/0/4729 <npuctrl:0>
npuctrl_func.c:230] [software internal system critical-info syslog]
CTRL: restart npumgr instance 1

2015-Jun-13+13:51:48.547 [rct 13012 info] [8/0/4643 <rct:0> rct_task.c:323]
[software internal system critical-info syslog] Death notification of task
npumgr/1 on 1/1 sent to parent task npuctrl/0

Sat Jun 13 13:51:58 2015 Internal trap notification 1099 (ManagerRestart)
facility npumgr instance 1 card 1 cpu 1

Sat Jun 13 13:51:58 2015 Internal trap notification 151 (TaskRestart)
facility npumgr instance 1 on card 1 cpu 1

2015-Jun-13+13:51:58.376 [npuctrl 16018 info] [8/0/4729 <npuctrl:0>
npuctrl_msg.c:241] [software internal system critical-info syslog]
task facility npumgr instance 1 created

The engineering scanner flags it clearly:

%%%%%%%%%%%%% SFT : Forced X times RX packet at slot Y %%%%%%%%%%%%%
May be a case of Ucode storage corruption. Please check techzone article
2015-Jun-13+13:51:48.729 [sft 58000 info] [1/0/4255  sft_monitor.c:115]
[software internal system critical-info syslog] SFT : Forced 321 times
RX packet at slot 1, cpu 0, inst 100, inflight packets 238(Count: 33,         
First seen: 2015-Jun-13+13:51:44.903,     
Last seen: 2015-Jun-13+13:51:48.729)

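These Simple Network Management Protocol (SNMP) traps indicate that all of the BGP peers of the enterprise gateway went down within a 10-second window: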

Sat Jun 13 13:52:00 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS14 ipaddr 55.54.84.107

Sat Jun 13 13:52:02 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS16 ipaddr 55.54.84.123

Sat Jun 13 13:52:03 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS06 ipaddr 55.54.84.43

Sat Jun 13 13:52:04 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS04 ipaddr 55.54.84.26

Sat Jun 13 13:52:04 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS14 ipaddr 55.54.84.106

Sat Jun 13 13:52:04 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS05 ipaddr 55.54.84.35

Sat Jun 13 13:52:04 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS02 ipaddr 55.54.84.11

Sat Jun 13 13:52:04 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn EXGWin ipaddr 55.55.245.4

Sat Jun 13 13:52:05 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS16 ipaddr 55.54.84.122

Sat Jun 13 13:52:05 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS12 ipaddr 55.54.84.91

Sat Jun 13 13:52:05 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS01 ipaddr 55.54.84.3

Sat Jun 13 13:52:05 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS11 ipaddr 55.54.84.83

Sat Jun 13 13:52:05 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS15 ipaddr 55.54.84.115

Sat Jun 13 13:52:05 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS01 ipaddr 55.54.84.2

Sat Jun 13 13:52:06 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS04 ipaddr 55.54.84.27

Sat Jun 13 13:52:06 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS05 ipaddr 55.54.84.34

Sat Jun 13 13:52:06 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS11 ipaddr 55.54.84.82

Sat Jun 13 13:52:06 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS06 ipaddr 55.54.84.42

Sat Jun 13 13:52:07 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Ingress ipaddr 55.55.245.5

Sat Jun 13 13:52:07 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS03 ipaddr 55.54.84.18

Sat Jun 13 13:52:07 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS10 ipaddr 55.54.84.254

Sat Jun 13 13:52:08 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS03 ipaddr 55.54.84.19

Sat Jun 13 13:52:08 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS15 ipaddr 55.54.84.114

Sat Jun 13 13:52:09 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS02 ipaddr 55.54.84.10

Sat Jun 13 13:52:10 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS13 ipaddr 55.54.84.98

Sat Jun 13 13:52:10 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS12 ipaddr 55.54.84.90
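The 10-second window can be checked mechanically from saved trap output. The following is a minimal sketch, not a Cisco tool: the function name `trap_window` is hypothetical, the sample holds only the first and last traps from the output above, and the timestamp format string is an assumption based on how those lines look.

```python
from datetime import datetime

# First and last BGPPeerSessionDown traps, copied from the output above.
SAMPLE = """\
Sat Jun 13 13:52:00 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS14 ipaddr 55.54.84.107
Sat Jun 13 13:52:10 2015 Internal trap notification 119 (BGPPeerSessionDown)
vpn Egress-MPLS12 ipaddr 55.54.84.90
"""

MARKER = " Internal trap notification "


def trap_window(text, trap_name="BGPPeerSessionDown"):
    """Return (first, last, count) for the named trap found in text."""
    times = []
    for line in text.splitlines():
        if MARKER in line and f"({trap_name})" in line:
            # Everything before the marker is the timestamp.
            stamp = line.split(MARKER)[0].strip()
            # %a and %b expect English day/month names (C locale).
            times.append(datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"))
    if not times:
        raise ValueError(f"no {trap_name} traps found")
    return min(times), max(times), len(times)


first, last, count = trap_window(SAMPLE)
spread = (last - first).total_seconds()
print(count, "traps between", first.time(), "and", last.time(),
      "spread", spread, "seconds")
```

Feeding the full trap listing to `trap_window` instead of the two-trap sample gives the same first and last timestamps, since those are the earliest and latest traps shown.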

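BGP is controlled on the Demux PSC, which in this case is card 1, the card with the problem. It is therefore not unexpected for BGP to go down. In addition, because this is an active Inter-Chassis Session Recovery (ICSR) chassis, there was a Service Redundancy Protocol (SRP) switchover: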

[local]Enterprise_XGW> show srp call-loss statistics
Switchover-9  started at : Sat Jun 13 13:52:06 2015,  took 3 seconds to finish.
    Switchover reason : BGP failure
    Total number of active calls at switchover time : 714711

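Solution

Question

If the event occurred at 13:51:45 according to the traps and logs, would the peers not be expected to go down no sooner than one BGP hold timer period later?

Answer

The BGP settings for all of these peers are identical: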

timers bgp keepalive-interval 10 holdtime-interval 60

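Although the hold time is configured as 60 seconds, the negotiation with the peers honors the lower value, which is 30 seconds: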

******** show ip bgp neighbors *******
Saturday June 13 14:42:38 UTC 2015
BGP neighbor is 55.55.245.4, remote AS 22394, local AS  64873, external link
  BGP version 4, remote router ID 55.54.244.197
  BGP state = Established,up for 5d04h29m
  Hold time is 30 seconds, keepalive interval is 10 seconds
  Configured Hold time is 60 seconds, keepalive interval is 10 seconds
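The output above is consistent with the rule in RFC 4271 (section 4.2) that each speaker uses the smaller of its configured hold time and the hold time received in the peer's OPEN message. A minimal sketch of that rule follows; the function name is hypothetical, and the peer's advertised value of 30 seconds is an assumption inferred from the negotiated value, since the peer's OPEN message is not shown.

```python
def negotiated_hold_time(local_hold, peer_hold):
    """Return the smaller of the two advertised hold times, in seconds."""
    for value in (local_hold, peer_hold):
        # RFC 4271: a hold time is either 0 or at least 3 seconds.
        if value != 0 and value < 3:
            raise ValueError("hold time must be 0 or >= 3 seconds")
    return min(local_hold, peer_hold)


LOCAL_HOLD = 60  # from "timers bgp keepalive-interval 10 holdtime-interval 60"
PEER_HOLD = 30   # assumption: inferred from "Hold time is 30 seconds"

hold = negotiated_hold_time(LOCAL_HOLD, PEER_HOLD)
print("negotiated hold time:", hold, "seconds")
```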

How, then, can peers going down between 13:52:00 and 13:52:10 be explained when the event was logged at 13:51:45?

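The most likely answer is that connectivity was lost due to the Network Processor Unit (NPU) issue before the first log entry appeared. For example, assume it was lost 5 seconds earlier, at 13:51:40. Each BGP peer pair sends and receives keepalives every 10 seconds, each on its own cycle. The peer pairs are not synchronized with each other with respect to the keepalive interval, even though every pair uses the same 10-second setting. Because the keepalive interval is 10 seconds, you can assume that within any 10-second slice of time every peer has sent a keepalive. If connectivity broke at 13:51:40, every peer pair sent its last keepalive at some point between 13:51:30 and 13:51:40, depending on where it was in its cycle (remember that each pair is independent of the others). With no further keepalives received after that range, the 30-second hold timer expires in the range 13:52:00 - 13:52:10, which is precisely when all of the peers were marked down.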
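The arithmetic in this paragraph can be written out as a short calculation. This is a sketch under the same assumption the text makes (connectivity lost at 13:51:40); the function name `down_window` is hypothetical.

```python
from datetime import datetime, timedelta


def down_window(break_time, hold_s, keepalive_s):
    """Earliest and latest hold-timer expiry after connectivity breaks."""
    hold = timedelta(seconds=hold_s)
    keepalive = timedelta(seconds=keepalive_s)
    # A peer's last successful keepalive arrived at most one keepalive
    # interval before the break, and at the latest right at the break.
    oldest_last_keepalive = break_time - keepalive
    newest_last_keepalive = break_time
    # The hold timer expires hold_s seconds after the last keepalive.
    return oldest_last_keepalive + hold, newest_last_keepalive + hold


# Assumption taken from the text: connectivity broke at 13:51:40.
assumed_break = datetime(2015, 6, 13, 13, 51, 40)
earliest, latest = down_window(assumed_break, hold_s=30, keepalive_s=10)
print(earliest.time(), "-", latest.time())  # prints: 13:52:00 - 13:52:10
```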

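In short, after the moment connectivity is broken (whether that moment can be determined is another matter), BGP peers can be expected to be marked down somewhere between the hold time interval minus the negotiated keepalive interval and the full hold time interval. In this case, that is between 20 and 30 seconds.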
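The same rule can be turned around to bound when connectivity was lost, given the first and last BGPPeerSessionDown traps. This is only a sketch of the reasoning above, not a diagnostic procedure: the function name is hypothetical, and it assumes every peer lost connectivity at the same instant, that traps are emitted the moment the hold timer expires, and that the 1-second trap timestamps are exact.

```python
from datetime import datetime, timedelta


def estimate_break_interval(first_down, last_down, hold_s, keepalive_s):
    """Bound the break time from the first and last peer-down times."""
    hold = timedelta(seconds=hold_s)
    keepalive = timedelta(seconds=keepalive_s)
    # For every peer: down - hold <= break <= down - hold + keepalive.
    lower = last_down - hold                # tightest lower bound
    upper = first_down - hold + keepalive   # tightest upper bound
    if lower > upper:
        # Traps are spread over more than one keepalive interval, so the
        # single-instant assumption does not fit the data.
        return None
    return lower, upper


first_trap = datetime(2015, 6, 13, 13, 52, 0)
last_trap = datetime(2015, 6, 13, 13, 52, 10)
bounds = estimate_break_interval(first_trap, last_trap,
                                 hold_s=30, keepalive_s=10)
print(bounds)
```

With the values from this case, both bounds land on 13:51:40, which matches the break time assumed in the explanation. Real trap timing is rarely this clean, so treat the result as an estimate.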

Document ID: 119157