 
            
            
This document describes the next steps to remediate the following faults:
"Code" : "F0321",
"Description" : "Controller <id> is unhealthy because: Data Layer Partially Degraded Leadership",
"Dn" : "topology/pod-<POD-ID>/node-<NODE-ID>/av/node-<NODE-ID>/fault-F0321",
"Code" : "F0321",
"Description" : "Controller 3 is unhealthy because: Data Layer Partially Diverged"
"Dn" : "topology/pod-<POD-ID>/node-<NODE-ID>/av/node-<NODE-ID>/fault-F0321",
"Code" : "F0325",
"Description" : "Connectivity has been lost to the leader for some data subset(s) of a service on <node >, the service may have unexpectedly restarted or failed",
"Dn" : "topology/pod-<POD-ID>/node-<NODE-ID>/av/node-<NODE-ID>/fault-F0325",
"Code" : "F0323",
"Description" : "Lost connectivity to leader for some data subset(s) of Access <Service> on <controller >",
"Dn" : "topology/pod-<POD-ID>/node-<NODE-ID>/av/node-<NODE-ID>/fault-F0323",
If you have an Intersight-connected ACI fabric, a Service Request was generated on your behalf to indicate that instances of this fault were found in your Intersight-connected ACI fabric.
This specific fault is raised when the APIC cluster is unhealthy. Data Layer Partially Diverged is seen when any of the shards/replicas are down (represented by '\' in the acidiag rvread output). The fault is also raised when a replica or database is completely missing on an APIC, represented by 'X'. Any underlying issue must be fixed in order to restore the health of the cluster.
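A minimal way to inspect this is to run acidiag rvread on the APIC CLI. Only the command is shown here because the exact output layout varies by release; per the description above, healthy replicas are listed normally, while '\' marks a degraded replica and 'X' a missing replica or database.
apic1# acidiag rvread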
If the fabric is in production, do not attempt any intrusive steps (such as power cycling, reloading, or decommissioning an APIC) to resolve the cluster issue. Collect the TS (tech support) files and upload them to the TAC case to get the exact steps to recover the APIC cluster.
Run the acidiag cluster command on the APIC CLI. It performs multiple checks, including connectivity to the APICs, and all test results must return OK. Any result other than OK must be investigated to find its cause.
######## Sample output on a healthy cluster ########
apic1# acidiag cluster
Admin password:
Running...
Checking Wiring and UUID: OK
Checking AD Processes: Running
Checking All Apics in Commission State: OK
Checking All Apics in Active State: OK
Checking Fabric Nodes: OK
Checking Apic Fully-Fit: OK
Checking Shard Convergence: OK
Checking Leadership Degration: Optimal leader for all shards
Ping OOB IPs:
APIC-1: 10.197.204.149 - OK
APIC-2: 10.197.204.150 - OK
APIC-3: 10.197.204.151 - OK
Ping Infra IPs:
APIC-1: 10.0.0.1 - OK
APIC-2: 10.0.0.2 - OK
APIC-3: 10.0.0.3 - OK
Checking APIC Versions: Same (5.2(4d))
Checking SSL: OK
Full file system(s): None
Done!
######## Sample output on an unhealthy cluster ########
apic1# acidiag cluster
Admin password:
Running...
Checking Wiring and UUID: switch(302) reports apic(3) has wireIssue: unapproved-ctrlr
Checking AD Processes: Running
Checking All Apics in Commission State: OK
Checking All Apics in Active State: OK
Checking Fabric Nodes: OK
Checking Apic Fully-Fit: OK
Checking Shard Convergence: OK
Checking Leadership Degration: Non optimal leader for shards : 3:1,3:2,3:4,3:5,3:7,3:8,3:10,3:11,3:13,3:14,3:16,3:17,3:19,3:20,3:22,3:23,3:25,3:26,3:28,3:29,3:31,3:32,6:1,6:2,6:4,6:5,6:7,6:8,6:10,6:11,6:13,6:14,6:16,6:17,6:19,6:20,6:22,6:23,6:25,6:26,6:28,6:29,6:31,6:32,9:1,9:2,9:4,9:5,9:7,9:8,9:10,9:11,9:13,9:14,9:16,9:17,9:19,9:20,9:22,9:23,9:25,9:26,9:28,9:29,9:31,9:32,10:1,10:2,10:4,10:5,10:7,10:8,10:10,10:11,10:13,10:14,10:16,10:17,10:19,10:20,10:22,10:23,10:25,10:26,10:28,10:29,10:31,10:32,11:1,11:2,11:4,11:5,11:7,11:8,11:10,11:11,11:13,11:14,11:16,11:17,11:19,11:20,11:22,11:23,11:25,11:26,11:28,11:29,11:31,11:32,14:1,14:2,14:4,14:5,14:7,14:8,14:10,14:11,14:13,14:14,14:16,14:17,14:19,14:20,14:22,14:23,14:25,14:26,14:28,14:29,14:31,14:32,16:1,16:2,16:4,16:5,16:7,16:8,16:10,16:11,16:13,16:14,16:16,16:17,16:19,16:20,16:22,16:23,16:25,16:26,16:28,16:29,16:31,16:32,22:1,22:2,22:4,22:5,22:7,22:8,22:10,22:11,22:13,22:14,22:16,22:17,22:19,22:20,22:22,22:23,22:25,22:26,22:28,22:29,22:31,22:32,23:1,23:2,23:4,23:5,23:7,23:8,23:10,23:11,23:13,23:14,23:16,23:17,23:19,23:20,23:22,23:23,23:25,23:26,23:28,23:29,23:31,23:32,33:1,34:1,34:2,34:4,34:5,34:7,34:8,34:10,34:11,34:13,34:14,34:16,34:17,34:19,34:20,34:22,34:23,34:25,34:26,34:28,34:29,34:31,34:32,35:1,35:2,35:4,35:5,35:7,35:8,35:10,35:11,35:13,35:14,35:16,35:17,35:19,35:20,35:22,35:23,35:25,35:26,35:28,35:29,35:31,35:32,36:1,39:1,39:2,39:4,39:5,39:7,39:8,39:10,39:11,39:13,39:14,39:16,39:17,39:19,39:20,39:22,39:23,39:25,39:26,39:28,39:29,39:31,39:32
Ping OOB IPs:
APIC-1: 10.197.204.184 - OK
APIC-2: 10.197.204.185 - OK
APIC-3: 10.197.204.186 - OK
Ping Infra IPs:
APIC-1: 10.0.0.1 - OK
APIC-2: 10.0.0.2 - OK
APIC-3: 10.0.0.3 - OK
Checking APIC Versions: Same (5.2(3e))
Checking SSL: OK
Full file system(s): None
Done!
Ensure that the APIC SSDs are healthy and that none of these faults are raised on the ACI fabric: F2730, F2731, and F2732. These faults can be looked up from the APIC CLI, or verified in the GUI (System > Faults).
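As a hedged sketch, one way to look these up from the APIC CLI is to query the fault records and filter for the codes; the faultRecord class matches the example record below, but the exact egrep context values are an assumption:
apic1# moquery -c faultRecord | egrep -B 6 -A 25 "F2730|F2731|F2732"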
#####  Example: 
# faultRecord
ack             : no
cause           : equipment-wearout
changeSet       : available:unspecified, blocks:unspecified, capUtilized:0, device:Solid State Device, fileSystem:/dev/sdb, firmwareVersion:Dxxxxxxx, mediaWearout:1, model:INTEL SSDSC2BB120G4, mount:/dev/sdb, name:/dev/sdb, operSt:ok, serial:ABCDxxxxxxxxxxxXYZ, used:unspecified
childAction     : 
code            : F2730
created         : 2022-01-10T03:13:08.026+00:00
delegated       : no
descr           : Storage unit /dev/sdb on Node 3 with hostname apic1.cisco.com mounted at /dev/sdb has 1% life remaining
dn              : topology/pod-2/node-3/sys/ch/p-[/dev/sdb]-f-[/dev/sdb]/fault-F2730
domain          : infra
highestSeverity : warning
lastTransition  : 2022-01-10T03:13:08.026+00:00
lc              : raised
occur           : 1
origSeverity    : warning
prevSeverity    : warning
rule            : eqpt-storage-wearout-warning
severity        : warning
status          : 
subject         : equipment-wearout
type            : operational
# faultRecord
ack             : no
cause           : equipment-wearout
changeSet       : available:unspecified, blocks:unspecified, capUtilized:0, device:Solid State Device, fileSystem:/dev/sdb, firmwareVersion:Dxxxxxxx, mediaWearout:1, model:INTEL SSDSC2BB120G4, mount:/dev/sdb, name:/dev/sdb, operSt:ok, serial:ABCDxxxxxxxxxxxXYZ, used:unspecified
childAction     : 
code            : F2731
created         : 2022-01-10T03:13:08.026+00:00
delegated       : no
descr           : Storage unit /dev/sdb on Node 3 mounted at /dev/sdb has 1% life remaining
dn              : topology/pod-2/node-3/sys/ch/p-[/dev/sdb]-f-[/dev/sdb]/fault-F2731
domain          : infra
highestSeverity : major
lastTransition  : 2022-01-10T03:13:08.026+00:00
lc              : raised
occur           : 1
origSeverity    : major
prevSeverity    : major
rule            : eqpt-storage-wearout-major
severity        : major
status          : 
subject         : equipment-wearout
type            : operational

Check That All the DME Processes Are Running
Run ps -aux | egrep "svc|nginx.bin|dhcp" on the APIC CLI.
The expected output is:
apic1# ps -ef | egrep "svc|nginx.bin|dhcp"
root      3063     1  5 22:08 ?        00:04:40 /mgmt//bin/nginx.bin -p /data//nginx/
root      8889     1  7 21:53 ?        00:06:43 /mgmt//bin/svc_ifc_appliancedirector.bin --x
ifc       8891     1  1 21:53 ?        00:01:29 /mgmt//bin/svc_ifc_policydist.bin --x
root      8893     1  2 21:53 ?        00:02:28 /mgmt//bin/svc_ifc_bootmgr.bin --x
ifc       8894     1  1 21:53 ?        00:01:41 /mgmt//bin/svc_ifc_vmmmgr.bin --x
ifc       8895     1  2 21:53 ?        00:02:14 /mgmt//bin/svc_ifc_topomgr.bin --x
ifc       8901     1  2 21:53 ?        00:02:22 /mgmt//bin/svc_ifc_observer.bin --x
root      8903     1  1 21:53 ?        00:01:40 /mgmt//bin/svc_ifc_plgnhandler.bin --x
ifc       8914     1  1 21:53 ?        00:01:34 /mgmt//bin/svc_ifc_domainmgr.bin --x
ifc       8915     1  2 21:53 ?        00:02:04 /mgmt//bin/svc_ifc_dbgr.bin --x
ifc       8917     1  1 21:53 ?        00:01:34 /mgmt//bin/svc_ifc_edmgr.bin --x
ifc       8918     1  1 21:53 ?        00:01:22 /mgmt//bin/svc_ifc_vtap.bin --x
ifc       8922     1  2 21:53 ?        00:02:09 /mgmt//bin/svc_ifc_eventmgr.bin --x
ifc       8925     1  3 21:53 ?        00:03:15 /mgmt//bin/svc_ifc_reader.bin --x
ifc       8929     1  1 21:53 ?        00:01:34 /mgmt//bin/svc_ifc_idmgr.bin --x
ifc       8930     1  1 21:53 ?        00:01:26 /mgmt//bin/svc_ifc_licensemgr.bin --x
ifc       8937     1  3 21:53 ?        00:03:18 /mgmt//bin/svc_ifc_policymgr.bin --x
ifc       8941     1  1 21:53 ?        00:01:34 /mgmt//bin/svc_ifc_scripthandler.bin --x
root     11157     1  1 21:54 ?        00:01:29 /mgmt//bin/dhcpd.bin -f -4 -cf /data//dhcp/dhcpd.conf -lf /data//dhcp/dhcpd.lease -pf /var/run//dhcpd.pid --no-pid bond0.3902
root     11170     1  4 21:54 ?        00:04:15 /mgmt//bin/svc_ifc_ae.bin --x
admin    17094 16553  0 23:27 pts/0    00:00:00 grep -E svc|nginx.bin|dhcp
You can also check for fault code F1419, which is raised when a DME fails.
apic1# show faults code F1419 history
ID                     : 4294971876
Description            : Service policymgr failed on apic bgl-aci02-apic1 of fabric
                        POD02 with a hostname bgl-aci02-apic1
Severity               : major
DN                     : subj-[topology/pod-1/node-1/sys/proc/proc-
                        policymgr]/fr-4294971876
Created                : 2022-03-21T18:29:20.570+12:00
Code                   : F1419
Type                   : operational
Cause                  : service-failed
Change Set             : id (Old: 5152, New: 0), maxMemAlloc (Old: 1150246912, New:
                        0), operState (Old: up, New: down)
Action                 : creation
Domain                 : infra
Life Cycle             : soaking
Count Fault Occurred   : 1
Acknowledgement Status : no
If connectivity is lost between the APICs, one possible cause is a wiring issue. The acidiag cluster command also reports wiring issues present on the links. These are all the possible wiring issues:
ctrlr-uuid-mismatch - APIC UUID mismatch (duplicate APIC ID)
fabric-domain-mismatch - Adjacent node belongs to a different fabric
wiring-mismatch - Invalid connection (leaf to leaf, spine to non-leaf, leaf fabric port to non-spine, and so on)
adjacency-not-detected - No LLDP adjacency on the fabric port
infra-vlan-mismatch - Infra VLAN mismatch between the leaf and the APIC
pod-id-mismatch - Pod ID mismatch between the APIC and the leaf
unapproved-ctrlr - The SSL handshake between the APIC and the connected leaf did not complete
unapproved-serialnumber - A node was detected whose serial number is not present in the APIC database
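As a hedged example, the wiring issues reported by a leaf can also be queried from its LLDP interface objects; the lldpIf class is real, but the prompt and the filter string are assumptions about the moquery filter syntax:
leaf302# moquery -c lldpIf -f 'lldp.If.wiringIssues!=""'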
If the output does not match the expected output shown in the DME process section, try to start the missing DME with acidiag start <DME>. For example, if svc_ifc_eventmgr is missing, try acidiag start eventmgr.
apic1# ps -aux | egrep "svc|nginx.bin|dhcp"
root      5112  7.3  0.4 1033952 323180 ?      Ssl  Mar10 3073:27 /mgmt//bin/svc_ifc_appliancedirector.bin --x
ifc       5117  1.7  0.6 1062664 439876 ?      Ssl  Mar10 720:52 /mgmt//bin/svc_ifc_topomgr.bin --x
ifc       5118  2.1  2.2 2164512 1468200 ?     Ssl  Mar10 884:11 /mgmt//bin/svc_ifc_policymgr.bin --x
ifc       5119  1.5  0.3 1115984 256904 ?      Ssl  Mar10 664:51 /mgmt//bin/svc_ifc_licensemgr.bin --x
ifc       5120  1.5  0.5 1088252 356760 ?      Ssl  Mar10 666:26 /mgmt//bin/svc_ifc_edmgr.bin --x
root      5121  1.6  0.6 1125948 423392 ?      Ssl  Mar10 698:11 /mgmt//bin/svc_ifc_bootmgr.bin --x
ifc       5123  2.3  1.2 1474388 800564 ?      Ssl  Mar10 994:15 /mgmt//bin/svc_ifc_eventmgr.bin --x
ifc       5126  1.5  8.2 6032524 5363184 ?     Ssl  Mar10 635:58 /mgmt//bin/svc_ifc_reader.bin --x
root      5130  4.6  0.6 1092480 439580 ?      Ssl  Mar10 1927:08 /mgmt//bin/svc_ifc_ae.bin --x
ifc       5132  1.6  0.8 1312136 567420 ?      Ssl  Mar10 689:43 /mgmt//bin/svc_ifc_vmmmgr.bin --x
ifc       5133  1.5  0.5 1064176 346760 ?      Ssl  Mar10 659:03 /mgmt//bin/svc_ifc_domainmgr.bin --x
ifc       5135  1.8  1.6 1736876 1099924 ?     Ssl  Mar10 770:39 /mgmt//bin/svc_ifc_observer.bin --x
root      5141  1.5  0.7 1092948 458156 ?      Ssl  Mar10 663:41 /mgmt//bin/svc_ifc_plgnhandler.bin --x
ifc       5146  2.0  0.6 1037676 397236 ?      Ssl  Mar10 857:43 /mgmt//bin/svc_ifc_idmgr.bin --x
ifc       5148  1.3  0.3 650596 222336 ?       Ssl  Mar10 580:25 /mgmt//bin/svc_ifc_vtap.bin --x
ifc       5160  1.6  0.6 1098280 453492 ?      Ssl  Mar10 669:17 /mgmt//bin/svc_ifc_scripthandler.bin --x
root      7089  1.4  0.4 856360 315016 ?       Ssl  Mar10 592:04 /mgmt//bin/dhcpd.bin -f -4 -cf /data//dhcp/dhcpd.conf -lf /data//dhcp/dhcpd.lease -pf /var/run//dhcpd.pid --no-pid bond0.3903
admin    29834  0.0  0.0 112800  1780 pts/1    S+   17:22   0:00 grep -E svc|nginx.bin|dhcp
ifc      30432  1.4  0.6 894088 405968 ?       Ssl  Mar17 473:45 /mgmt//bin/svc_ifc_policydist.bin --x
root     31215  2.8  5.2 4503880 3397276 ?     Ssl  Apr05 124:08 /mgmt//bin/nginx.bin -p /data//nginx/

Compared to the expected output shown in the DME process section, svc_ifc_dbgr.bin is missing from this output. The process can be started with acidiag start dbgr:
apic1# acidiag start dbgr
apic1# ps -aux | egrep "svc|nginx.bin|dhcp"
root      5112  7.3  0.4 1033952 323240 ?      Ssl  Mar10 3073:43 /mgmt//bin/svc_ifc_appliancedirector.bin --x
ifc       5117  1.7  0.6 1062664 439876 ?      Ssl  Mar10 720:56 /mgmt//bin/svc_ifc_topomgr.bin --x
ifc       5118  2.1  2.2 2164512 1468200 ?     Ssl  Mar10 884:16 /mgmt//bin/svc_ifc_policymgr.bin --x
ifc       5119  1.5  0.3 1115984 256904 ?      Ssl  Mar10 664:55 /mgmt//bin/svc_ifc_licensemgr.bin --x
ifc       5120  1.5  0.5 1088252 356760 ?      Ssl  Mar10 666:30 /mgmt//bin/svc_ifc_edmgr.bin --x
root      5121  1.6  0.6 1125948 423392 ?      Ssl  Mar10 698:15 /mgmt//bin/svc_ifc_bootmgr.bin --x
ifc       5123  2.3  1.2 1474388 800784 ?      Ssl  Mar10 994:21 /mgmt//bin/svc_ifc_eventmgr.bin --x
ifc       5126  1.5  8.2 6032524 5363184 ?     Ssl  Mar10 636:01 /mgmt//bin/svc_ifc_reader.bin --x
root      5130  4.6  0.6 1092480 439580 ?      Ssl  Mar10 1927:18 /mgmt//bin/svc_ifc_ae.bin --x
ifc       5132  1.6  0.8 1312136 567420 ?      Ssl  Mar10 689:46 /mgmt//bin/svc_ifc_vmmmgr.bin --x
ifc       5133  1.5  0.5 1064176 346760 ?      Ssl  Mar10 659:07 /mgmt//bin/svc_ifc_domainmgr.bin --x
ifc       5135  1.8  1.6 1736876 1099924 ?     Ssl  Mar10 770:43 /mgmt//bin/svc_ifc_observer.bin --x
root      5141  1.5  0.7 1092948 458156 ?      Ssl  Mar10 663:45 /mgmt//bin/svc_ifc_plgnhandler.bin --x
ifc       5146  2.0  0.6 1037676 397236 ?      Ssl  Mar10 857:48 /mgmt//bin/svc_ifc_idmgr.bin --x
ifc       5148  1.3  0.3 650596 222336 ?       Ssl  Mar10 580:28 /mgmt//bin/svc_ifc_vtap.bin --x
ifc       5160  1.6  0.6 1098280 453492 ?      Ssl  Mar10 669:21 /mgmt//bin/svc_ifc_scripthandler.bin --x
root      7089  1.4  0.4 856360 315016 ?       Ssl  Mar10 592:07 /mgmt//bin/dhcpd.bin -f -4 -cf /data//dhcp/dhcpd.conf -lf /data//dhcp/dhcpd.lease -pf /var/run//dhcpd.pid --no-pid bond0.3903
ifc       7609  126  0.5 987404 362824 ?       Ssl  17:25   0:02 /mgmt//bin/svc_ifc_dbgr.bin --x  <=====
admin     7762  0.0  0.0 112800  1668 pts/1    S+   17:26   0:00 grep -E svc|nginx.bin|dhcp
ifc      30432  1.4  0.6 894088 405968 ?       Ssl  Mar17 473:48 /mgmt//bin/svc_ifc_policydist.bin --x
root     31215  2.8  5.2 4503880 3397252 ?     Ssl  Apr05 124:13 /mgmt//bin/nginx.bin -p /data//nginx/

After acidiag start dbgr is run, the process is running again (marked with <===== above). If you do not see the process start, contact TAC for further troubleshooting.
Run show core to check for any core files, and upload them to the SR if present.
apic1# show core                   
 Node  Module  Creation-Time  File-Size  Service       Process  Original-Location   Exit-Code  Death-Reason  Last-Heartbeat 
 ----  ------  -------------  ---------  ------------  -------  ------------------  ---------  ------------  -------------- 
 
Ctrlr-Id  Creation-Time          File-Size  Service       Process  Original-Location                         Exit-Code 
 --------  ---------------------  ---------  ------------  -------  ----------------------------------------  --------- 
 1         2021-10-05T21:19:55.0  204534444  eventmgr      22453    /dmecores/svc_ifc_eventmgr.bin_log.22453  134       
           00-07:00                                                 .tar.gz

Capture the APIC TS (tech support) logs and upload them to the SR for further troubleshooting: https://www.cisco.com/c/en/us/support/docs/cloud-systems-management/application-policy-infrastructure-controller-apic/214520-guide-to-collect-tech-support-and-tac-re.html
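For reference, recent APIC releases can also trigger an on-demand controller tech support from the CLI; this command is noted here as an assumption, so confirm the exact syntax and options against the linked guide:
apic1# trigger techsupport controllers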
| Revision | Publish Date | Comments |
|---|---|---|
| 1.0 | 06-Apr-2022 | Initial Release |
 
    
    