This document describes Cisco ACI leaf and spine troubleshooting, including a triage table, switch-specific checks, and APIC-side correlation.
You can troubleshoot most ACI switch issues faster when you use an ordered command sequence instead of jumping directly into deep internal commands. Start with software and hardware baseline checks, continue with diagnostics and environmental state, and then correlate active switch issues on the APIC before moving into feature-specific commands.
| Goal | Command | What to Look For | What to Do Next |
|---|---|---|---|
| Confirm ACI mode and version | show version | ACI kickstart image, expected release, and an explainable reset reason | If the switch is not in ACI mode, stop and correct the boot image first. |
| Verify module health | show module | Modules are 'ok' and online diagnostics are 'pass' | If any active module is not 'ok' or diagnostics fail, treat it as a hardware issue first. |
| Verify power, fan, and thermal state | show environment | Operational PSUs are 'ok', fan state is 'ok', temperatures are 'normal' | If the only anomaly is a redundant PSU in 'shut' state, verify design intent before escalating. |
| Verify diagnostic results | show diagnostic result module all | Tests show '.' (pass) across active modules | If any test is 'F', 'A', or 'I', correlate with module and fault output. |
| Check discovery and fabric baseline | show discoveryissues | System state, adjacency, infra VLAN, and policy download checks | If discovery checks fail, fix baseline connectivity before you troubleshoot tenants or routing. |
| Correlate on the APIC | show faults leaf <node-id> or show faults history leaf <node-id> | Fault code, severity, and affected DN | Use the APIC view to separate active symptoms from already-cleared historical events. |
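The ordered sequence in the table can be expressed as a small gate check. This is a minimal Python sketch, not a Cisco tool: TRIAGE_ORDER and first_blocking_failure are hypothetical names, and the choice of which steps block further triage is an assumption based on the What to Do Next column. Environment and APIC correlation are treated as non-blocking here because a shut redundant PSU calls for a design check rather than an immediate stop.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding of the triage table: (goal, command, blocking).
# Assumption: a failed blocking step means later output is not worth
# interpreting until that step is fixed.
TRIAGE_ORDER: List[Tuple[str, str, bool]] = [
    ("Confirm ACI mode and version", "show version", True),
    ("Verify module health", "show module", True),
    ("Verify power, fan, and thermal state", "show environment", False),
    ("Verify diagnostic results", "show diagnostic result module all", True),
    ("Check discovery and fabric baseline", "show discoveryissues", True),
    ("Correlate on the APIC", "show faults leaf <node-id>", False),
]


def first_blocking_failure(results: Dict[str, bool]) -> Optional[str]:
    """Return the first failed blocking command in table order, else None.

    `results` maps a command to True (looked healthy) or False (did not).
    Commands that have not been run yet are skipped.
    """
    for _goal, command, blocking in TRIAGE_ORDER:
        if blocking and results.get(command) is False:
            return command
    return None
```

For example, if show version looks healthy but show module does not, the sketch points you back to show module before you look at discovery or APIC faults.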
Before you interpret runtime state, verify that the node is discovered, registered, and running ACI-mode software. For switch onboarding and baseline discovery checks, use the built-in show discoveryissues command and confirm that the APIC reports the node as in-service.
leaf-A# show version
Software
BIOS: version 05.53
kickstart: version 16.1(3f) [build 16.1(3f)]
system: version 16.1(3f) [build 16.1(3f)]
PE: version 6.1(3f)
kickstart image file is: /bootflash/aci-n9000-dk9.16.1.3f.bin <--- ACI mode indicator
system image file is: /bootflash/auto-s
Hardware
cisco N9K-C93108TC-FX ("supervisor")
Device name: leaf-A
Last reset at 241000 usecs after Wed Mar 11 17:28:38 2026 JST
Reason: reset-requested-by-cli-command-reload
What good looks like: Kickstart and system lines are present, the kickstart image starts with 'aci-n9000', and the reset reason is explainable.
What bad looks like: The output shows a standalone NX-OS image file and no ACI kickstart or system lines.
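If you collect show version output from many switches, you can apply the same good/bad criteria in a script. This Python sketch assumes the output text has already been captured (for example, from an SSH session); parse_show_version and is_aci_mode are hypothetical helpers, and the regular expressions match only the line formats shown above.

```python
import re
from typing import Dict, Optional


def parse_show_version(output: str) -> Dict[str, Optional[str]]:
    """Extract the fields the triage table checks from 'show version' text."""
    def grab(pattern: str) -> Optional[str]:
        match = re.search(pattern, output, re.MULTILINE)
        return match.group(1).strip() if match else None

    return {
        "kickstart_version": grab(r"^\s*kickstart: version (\S+)"),
        "system_version": grab(r"^\s*system: version (\S+)"),
        # (\S+) stops before any trailing '<---' annotation.
        "kickstart_image": grab(r"^\s*kickstart image file is: (\S+)"),
        "reset_reason": grab(r"^\s*Reason: (.+)$"),
    }


def is_aci_mode(fields: Dict[str, Optional[str]]) -> bool:
    """Good: kickstart and system lines exist and the image is aci-n9000*."""
    image = fields["kickstart_image"] or ""
    basename = image.rsplit("/", 1)[-1]
    return (
        fields["kickstart_version"] is not None
        and fields["system_version"] is not None
        and basename.startswith("aci-n9000")
    )
```

The reset reason is returned as text rather than judged, because whether a reason is explainable depends on your change history.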
This section uses a fixed-form-factor leaf switch as the baseline. The output is based on a live ACI leaf and reflects both healthy and degraded conditions that are useful during triage.
Run this command in order to verify software level, image mode, uptime, and last reset reason.
leaf-A# show version
Software
BIOS: version 05.53
kickstart: version 16.1(3f) [build 16.1(3f)]
system: version 16.1(3f) [build 16.1(3f)]
PE: version 6.1(3f)
kickstart image file is: /bootflash/aci-n9000-dk9.16.1.3f.bin
system image file is: /bootflash/auto-s
Hardware
cisco N9K-C93108TC-FX ("supervisor")
Device name: leaf-A
Kernel uptime is 29 day(s), 19 hour(s), 52 minute(s), 45 second(s)
Last reset at 241000 usecs after Wed Mar 11 17:28:38 2026 JST
Reason: reset-requested-by-cli-command-reload
Service: PolicyElem Ch reload
Run this command in order to verify the line card state and the online diagnostic result at the module level.
leaf-A# show module
Mod  Ports  Module-Type                          Model               Status
---  -----  -----------------------------------  ------------------  ----------
1    54     48x10G+6x40/100G Switch              N9K-C93108TC-FX     ok
Mod  Online Diag Status
---  ------------------
1    pass  <--- basic diagnostic baseline
What good looks like: The active module is ok and the online diagnostic state is pass.
What bad looks like: Module state is not ok or the diagnostic state is not pass.
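This sketch applies the module criteria to captured show module text. It assumes the two-table layout shown above (status table, then online diagnostic table) and a fixed leaf where the expected status is ok; parse_show_module and unhealthy_modules are hypothetical names.

```python
from typing import Dict, List, Tuple


def parse_show_module(output: str) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Return ({mod: status}, {mod: online diag status}) from 'show module'."""
    status: Dict[int, str] = {}
    diag: Dict[int, str] = {}
    section = ""
    for line in output.splitlines():
        # Drop '<---' annotations and collapse column padding.
        norm = " ".join(line.split("<---")[0].split())
        if norm.startswith("Mod Ports"):
            section = "status"
            continue
        if norm.startswith("Mod Online Diag Status"):
            section = "diag"
            continue
        tokens = norm.split()
        if not tokens or not tokens[0].isdigit():
            continue  # separator rows, blank lines, command echo
        mod = int(tokens[0])
        if section == "status":
            status[mod] = tokens[-1]  # Status is the last column
        elif section == "diag" and len(tokens) > 1:
            diag[mod] = tokens[1]
    return status, diag


def unhealthy_modules(output: str) -> List[int]:
    """Modules whose status is not 'ok' or whose online diag is not 'pass'."""
    status, diag = parse_show_module(output)
    bad = {m for m, s in status.items() if s != "ok"}
    bad |= {m for m, d in diag.items() if d != "pass"}
    return sorted(bad)
```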
Run this command in order to verify PSU, fan, and thermal state.
leaf-A# show environment
Power Supply:
Supply  Model            Output  Capacity  Status
1       NXA-PAC-500W-PE  0 W     500 W     shut  <--- redundant PSU not in use
2       NXA-PAC-500W-PE  219 W   500 W     ok
Fan:
Fan1(sys_fan1)  NXA-FAN-30CFM-F  Status: ok
Fan2(sys_fan2)  NXA-FAN-30CFM-F  Status: ok
Fan3(sys_fan3)  NXA-FAN-30CFM-F  Status: ok
Fan4(sys_fan4)  NXA-FAN-30CFM-F  Status: ok
Temperature:
1  Inlet(1)          37  normal
1  outlet(2)         38  normal
1  x86 processor(3)  71  normal
1  Homewood(4)       56  normal
What good looks like: The active PSU is ok, fans are ok, and temperatures are normal.
What bad looks like: An operational PSU is failed, fan status is not ok, or any thermal sensor is not normal.
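This sketch sorts show environment findings into problems and informational notes. The rule that shut and Absent PSUs are informational is an assumption that mirrors the redundant-PSU guidance in this document; confirm it against your power design. The section parsing assumes the Power Supply:, Fan:, and Temperature: headers shown above, and summarize_environment is a hypothetical name.

```python
from typing import Dict, List

# Assumption: 'shut' and 'Absent' PSUs may be intentionally unused units,
# so they are reported as informational rather than as problems.
INFO_PSU_STATES = {"shut", "Absent"}


def summarize_environment(output: str) -> Dict[str, List[str]]:
    """Split 'show environment' text into problems and informational notes."""
    problems: List[str] = []
    info: List[str] = []
    section = ""
    for raw in output.splitlines():
        line = raw.split("<---")[0].strip()
        if line.endswith(":"):
            section = line[:-1]  # e.g. "Power Supply", "Fan", "Temperature"
            continue
        tokens = line.split()
        if not tokens:
            continue
        if section == "Power Supply" and tokens[0].isdigit():
            state = tokens[-1]
            if state in INFO_PSU_STATES:
                info.append(f"PSU {tokens[0]} is {state}")
            elif state != "ok":
                problems.append(f"PSU {tokens[0]} is {state}")
        elif section == "Fan" and "Status:" in tokens:
            idx = tokens.index("Status:")
            state = tokens[idx + 1] if idx + 1 < len(tokens) else "unknown"
            if state != "ok":
                problems.append(f"{tokens[0]} is {state}")
        elif section == "Temperature" and tokens[0].isdigit():
            # Row layout: module, sensor name (may contain spaces), temp, status
            if tokens[-1] != "normal":
                sensor = " ".join(tokens[1:-2])
                problems.append(f"sensor {sensor} is {tokens[-1]}")
    return {"problems": problems, "info": info}
```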
Run this command in order to validate the actual online diagnostic tests, not just the summary field in show module.
leaf-A# show diagnostic result module all
Current bootup diagnostic level: bypass
Module 1: 48x10G (Active)
Test results: (. = Pass, F = Fail, I = Incomplete, U = Untested, A = Abort, E = Error disabled)
1) bios-mem-----------------------> .
2) mgmtp-lb-----------------------> .
22) cpu-cache----------------------> .
23) mem-health---------------------> .
24) ssd-acc------------------------> .
33) fpga-reg-chk-------------------> .
43) tahoe-mem----------------------> .
What good looks like: All required tests show '.'.
What bad looks like: Any F, I, or A result for active hardware.
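This sketch scans the per-test results for F, I, or A codes and reports the module each one belongs to. It assumes the "N) test-name----> code" line format and the "Module N:" headers shown above; failed_tests is a hypothetical name.

```python
import re
from typing import List, Tuple

# Result codes treated as failures, taken from the legend in the output.
FAILURE_CODES = {"F", "I", "A"}
TEST_LINE = re.compile(r"^\s*(\d+)\)\s+(\S+?)-*>\s*(\S)\s*$")
MODULE_LINE = re.compile(r"^\s*Module\s+(\d+):")


def failed_tests(output: str) -> List[Tuple[int, str, str]]:
    """Return (module, test_name, code) for every test flagged F, I, or A."""
    failures: List[Tuple[int, str, str]] = []
    module = -1  # -1 means no 'Module N:' header has been seen yet
    for line in output.splitlines():
        header = MODULE_LINE.match(line)
        if header:
            module = int(header.group(1))
            continue
        test = TEST_LINE.match(line)
        if test and test.group(3) in FAILURE_CODES:
            failures.append((module, test.group(2), test.group(3)))
    return failures
```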
Run this command in order to validate onboarding, adjacency, infra VLAN, and controller reachability. This is one of the most useful first-pass commands for leaf switches.
leaf-A# show discoveryissues
Check 3 HW Modules Check
Test01 Fans status check PASSED
Test02 Power Supply status check FAILED
[Warn] Operational state of sys/ch/psuslot-1/psu is: shut
[Info] Ignore this if it is a redundant power supply
Check 5 System State
Test01 Check System State PASSED
[Info] TopSystem State is : in-service
Check 8 Infra VLAN Check
Test01 Check if infra VLAN is received PASSED
[Info] Infra VLAN received is : 4093
Check 10 IS-IS Adj Info
Test01 check IS-IS adjacencies PASSED
[Info] IS-IS adjacencies found on interfaces:
[Info] eth1/54.30
[Info] eth1/51.31
[Info] eth1/53.32
Check 11 Reachability to APIC
Test01 Ping check to APIC FAILED
[Error] Ping to APIC IP 198.51.100.1 from 198.51.100.64 with MTU 1450 failed.
This example is useful because it shows a realistic mixed result: the node is in service and has fabric adjacencies, but the ping check to the APIC still fails while one redundant PSU is shut. Interpret each failure in context instead of treating every failed line as equally severe.
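This sketch collects FAILED tests from show discoveryissues together with the bracketed hints under them, so that you can review each failure in context. The needs_follow_up heuristic, which treats a failure as informational when the switch itself adds the redundant power supply hint, is an assumption for illustration and does not replace a design check. Both function names are hypothetical.

```python
import re
from typing import Any, Dict, List, Optional

CHECK_LINE = re.compile(r"^\s*Check\s+\d+\s+(.+?)\s*$")
TEST_LINE = re.compile(r"^\s*Test\d+\s+(.+?)\s+(PASSED|FAILED)\s*$")


def failed_discovery_checks(output: str) -> List[Dict[str, Any]]:
    """Collect FAILED tests and the [Warn]/[Error]/[Info] lines under them."""
    results: List[Dict[str, Any]] = []
    check = ""
    current: Optional[Dict[str, Any]] = None
    for line in output.splitlines():
        c = CHECK_LINE.match(line)
        if c:
            check, current = c.group(1), None
            continue
        t = TEST_LINE.match(line)
        if t:
            current = None
            if t.group(2) == "FAILED":
                current = {"check": check, "test": t.group(1), "messages": []}
                results.append(current)
            continue
        if current is not None and line.strip().startswith("["):
            current["messages"].append(line.strip())
    return results


def needs_follow_up(failure: Dict[str, Any]) -> bool:
    """Hypothetical heuristic: the redundant-PSU hint marks a failure as
    informational; every other failure needs follow-up."""
    return not any("redundant power supply" in m for m in failure["messages"])
```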
This section uses a modular spine switch. The structure of the output is different from a fixed leaf because you must evaluate line cards, fabric modules, supervisors, and system controllers separately.
spine-A# show version
Software
BIOS: version 05.53
kickstart: version 16.1(3f) [build 16.1(3f)]
system: version 16.1(3f) [build 16.1(3f)]
PE: version 6.1(3f)
kickstart image file is: /bootflash/aci-n9000-dk9.16.1.3f-cs_64.bin <--- modular spine image
system image file is: /bootflash/auto-s
Hardware
cisco N9K-SUP-A+ ("supervisor")
Device name: spine-A
Last reset at 983000 usecs after Wed Mar 11 17:31:09 2026 JST
Reason: reset-requested-by-cli-command-reload
Run this command in order to verify every hardware plane in the chassis.
spine-A# show module
Mod  Ports  Module-Type                          Model               Status
---  -----  -----------------------------------  ------------------  ----------
1    32     32p 40/100G Ethernet Module          N9K-X9732C-EX       ok
2    32     32p 40/100G Ethernet Module          N9K-X9732C-EX       ok
3    36     36p 40/100G Ethernet Module          N9K-X9736C-FX       ok
22   0      Fabric Module                        N9K-C9504-FM-E      ok
23   0      Fabric Module                        N9K-C9504-FM-E      ok
24   0      Fabric Module                        N9K-C9504-FM-E      ok
26   0      Fabric Module                        N9K-C9504-FM-E      ok
27   0      Supervisor Module                    N9K-SUP-A+          active
28   0      Supervisor Module                    N9K-SUP-A+          standby
29   0      System Controller                    N9K-SC-A            standby
30   0      System Controller                    N9K-SC-A            active
Mod  Online Diag Status
---  ------------------
1    pass
2    pass
3    pass
22   pass
23   pass
24   pass
26   pass
27   pass
28   pass
29   pass
30   pass
What good looks like: Line cards, fabric modules, supervisors, and system controllers are all present and diagnostics are pass.
What bad looks like: Missing or non-ok fabric modules, supervisor failover anomalies, or any failed module diagnostics.
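For a modular spine, this sketch checks the "what bad looks like" cases against captured show module text. It assumes a chassis with dual supervisors and dual system controllers (one active and one standby each, as shown above) and takes the expected fabric module count as an input, because the correct count depends on your chassis design. All names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Assumption: these module types run as one active plus one standby.
PAIRED = ("Supervisor Module", "System Controller")


def parse_modules(output: str) -> List[Tuple[int, str, str]]:
    """Return (mod, module_type, status) rows from the 'show module' status table."""
    rows: List[Tuple[int, str, str]] = []
    in_table = False
    for line in output.splitlines():
        norm = " ".join(line.split("<---")[0].split())
        if norm.startswith("Mod Ports"):
            in_table = True
            continue
        if norm.startswith("Mod Online Diag"):
            break  # the status table ends where the diag table begins
        tokens = norm.split()
        if in_table and len(tokens) >= 5 and tokens[0].isdigit():
            # Columns: Mod, Ports, Module-Type (may contain spaces), Model, Status
            rows.append((int(tokens[0]), " ".join(tokens[2:-2]), tokens[-1]))
    return rows


def chassis_findings(output: str, expected_fabric_modules: int) -> List[str]:
    """List non-ok modules, active/standby anomalies, and missing fabric modules."""
    rows = parse_modules(output)
    findings: List[str] = []
    paired: Counter = Counter()
    for mod, mtype, status in rows:
        if mtype in PAIRED:
            paired[(mtype, status)] += 1
        elif status != "ok":
            findings.append(f"module {mod} ({mtype}) is {status}")
    for mtype in PAIRED:
        if paired[(mtype, "active")] != 1 or paired[(mtype, "standby")] != 1:
            findings.append(f"{mtype}: expected one active and one standby")
    fabric = sum(1 for _, mtype, _ in rows if mtype == "Fabric Module")
    if fabric < expected_fabric_modules:
        findings.append(f"expected {expected_fabric_modules} fabric modules, found {fabric}")
    return findings
```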
spine-A# show environment
Power Supply:
Supply  Model            Output  Capacity  Status
1       N9K-PAC-3000W-B  1031 W  3000 W    ok
2       N9K-PAC-3000W-B  0 W     3000 W    shut
3       N9K-PAC-3000W-B  992 W   3000 W    ok
4       ------------     N/A W   0 W       Absent
Power Usage Summary:
Power Supply redundancy mode (operational)      Non-Redundant(combined)
Total Power Output (actual draw)                1523 W
Total Power Available for additional modules    1793 W
Fan:
Fan1(sys_fan1)  N9K-C9504-FAN  Status: ok
Fan2(sys_fan2)  N9K-C9504-FAN  Status: ok
Fan3(sys_fan3)  N9K-C9504-FAN  Status: ok
Fan4(sys_fan4)  N9K-C9504-FAN  Status: ok
Fan5(sys_fan5)  N9K-C9504-FAN  Status: ok
Fan6(sys_fan6)  N9K-C9504-FAN  Status: ok
Temperature:
1   ATOM processor(1)       32  normal
3   Homewood instance 2(3)  78  normal
22  LAC instance 1(2)       70  normal
27  x86 processor(4)        36  normal
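This output is a good example of a chassis that is healthy even though one PSU is shut and another slot is Absent. The operational redundancy mode, Non-Redundant(combined), explains why the chassis is still operational: the two PSUs in ok state carry the load without the shut or absent units.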
spine-A# show diagnostic result module all
Current bootup diagnostic level: bypass
Module 1: 32p 40/100G Ethernet Module
1) bios-mem-----------------------> .
9) mvl4p-eobc-snake---------------> .
39) lcfc-conn----------------------> .
43) tahoe-mem----------------------> .
Module 22: Fabric Module
10) mvl10p-snake-------------------> .
42) fclc-conn----------------------> .
43) tahoe-mem----------------------> .
Module 27: Supervisor Module (Active)
24) ssd-acc------------------------> .
32) nvram-cksum--------------------> .
35) eobc-mon-----------------------> .
Module 30: System Controller
11) bcm28p-snake-------------------> .
41) pcie-bus-----------------------> .
On a modular spine, the main value of this command is breadth: you can confirm that line cards, fabric modules, supervisors, and system controllers all pass diagnostics in a single view.
After you validate the switch CLI baseline, move to the APIC in order to correlate the node with active and historical fault objects. This is the fastest way to determine whether the switch issue is isolated, policy-related, environmental, or already cleared.
apic-A# show version
Role        Pod  Node  Name       Version
----------  ---  ----  ---------  --------------
controller  1    1     apic-A     6.1(3f)
controller  1    2     apic-B     6.1(3f)
controller  1    3     apic-C     6.1(3f)
leaf        1    101   leaf-A     n9000-16.1(3f)
spine       1    201   spine-A    n9000-16.1(3f)
Use this command in order to verify release alignment between controllers and switches before you assume a software mismatch.
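This sketch checks release alignment from the APIC show version table. It assumes the usual ACI numbering convention in which a switch release is ten major versions above the matching APIC release (16.1(3f) with 6.1(3f) in the output above); verify that convention against the release notes for your software train before you rely on it. The function names are hypothetical.

```python
import re
from typing import List, Tuple

ROLES = ("controller", "leaf", "spine")
VERSION = re.compile(r"(?:n9000-)?(\d+)\.(\d+)\((\w+)\)")


def parse_apic_show_version(output: str) -> List[Tuple[str, str, str]]:
    """Return (role, name, version) rows from the APIC 'show version' table."""
    rows: List[Tuple[str, str, str]] = []
    for line in output.splitlines():
        tokens = line.split()  # Role, Pod, Node, Name, Version
        if len(tokens) == 5 and tokens[0] in ROLES:
            rows.append((tokens[0], tokens[3], tokens[4]))
    return rows


def normalize(version: str, role: str) -> Tuple[int, int, str]:
    """Map a release onto APIC numbering so controllers and switches compare."""
    m = VERSION.fullmatch(version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = int(m.group(1)), int(m.group(2)), m.group(3)
    if role != "controller":
        major -= 10  # assumption: switch 16.x pairs with APIC 6.x
    return major, minor, patch


def misaligned_nodes(rows: List[Tuple[str, str, str]]) -> List[str]:
    """Names of switches whose release does not match the controllers."""
    apic = {normalize(v, r) for r, _, v in rows if r == "controller"}
    if len(apic) != 1:
        return ["controllers do not agree on a single release"]
    (target,) = apic
    return [n for r, n, v in rows if r != "controller" and normalize(v, r) != target]
```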
apic-A# show faults leaf 101
Code : F0532
Severity : critical
Lifecycle : raised
DN : topology/pod-1/node-101/sys/phys-[eth1/11]/phys/fault-F0532
Description : Port is down, reason being Link Not Connected(Connected),
used by EPG on node 101 with hostname leaf-A
Code : F1451
Severity : minor
Lifecycle : raised
DN : topology/pod-1/node-101/sys/ch/psuslot-1/psu/fault-F1451
Description : Power supply shutdown.
Code : F1699
Severity : warning
Lifecycle : raised
DN : topology/pod-1/node-101/sys/time/prov-198.51.100.10/status/fault-F1699
Description : NTP configuration on Leaf leaf-A is not synced to NTP server
This output is useful because it immediately separates three domains: access ports used by EPGs, PSU state, and time synchronization.
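This sketch groups active faults by the domain that their DN points to. The DN-fragment map covers only the three DNs in the output above and is an assumption; extend it for the fault classes that you see in your fabric. Wrapped description lines are ignored, and the helper names are hypothetical.

```python
import re
from typing import Dict, List

FIELD = re.compile(r"^\s*(Code|Severity|Lifecycle|DN|Description)\s*:\s*(.*)$")

# Hypothetical DN-fragment -> domain map, built only from the DNs above.
DOMAINS = [
    ("/sys/phys-[", "access port"),
    ("/sys/ch/psuslot-", "power supply"),
    ("/sys/time/", "time sync"),
]


def parse_faults(output: str) -> List[Dict[str, str]]:
    """One dict per fault; each 'Code' line starts a new fault record."""
    faults: List[Dict[str, str]] = []
    for line in output.splitlines():
        m = FIELD.match(line)
        if m is None:
            continue  # continuation lines without a 'Key :' prefix
        if m.group(1) == "Code":
            faults.append({})
        if faults:
            faults[-1][m.group(1)] = m.group(2).strip()
    return faults


def domain_of(dn: str) -> str:
    for fragment, domain in DOMAINS:
        if fragment in dn:
            return domain
    return "other"


def faults_by_domain(output: str) -> Dict[str, List[str]]:
    """Map each domain to the fault codes raised in it."""
    grouped: Dict[str, List[str]] = {}
    for fault in parse_faults(output):
        domain = domain_of(fault.get("DN", ""))
        grouped.setdefault(domain, []).append(fault.get("Code", "?"))
    return grouped
```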
apic-A# show faults history leaf 101
ID          : 8589940065
Description : Port is down, reason:Link Not Connected(Connected), used by:Fabric
Severity    : minor
Code        : F1394
Action      : modification
Life Cycle  : raised
ID          : 8589940026
Description : TCA: ingress drop packets rate value 233 raised above threshold 200
Severity    : warning
Code        : F112128
Action      : creation
ID          : 8589939383
Description : BGP peer is not established, current state Idle
Severity    : cleared
Code        : F0299
Action      : deletion
Use the history view in order to distinguish active problems from transient events that have already recovered.
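This sketch compares the history view with the active fault list. It reports the codes that the history marks as cleared, and the codes that appear in history but not among the active faults from show faults leaf 101. The record layout (an ID line starts each record) is assumed from the output above, and the helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, List

FIELD = re.compile(r"^\s*(ID|Description|Severity|Code|Action|Life Cycle)\s*:\s*(.*)$")


def parse_history(output: str) -> List[Dict[str, str]]:
    """Split 'show faults history' text into records; 'ID' starts a new one."""
    records: List[Dict[str, str]] = []
    for line in output.splitlines():
        m = FIELD.match(line)
        if m is None:
            continue
        if m.group(1) == "ID":
            records.append({})
        if records:
            records[-1][m.group(1)] = m.group(2).strip()
    return records


def cleared_codes(output: str) -> List[str]:
    """Codes that have at least one history record with Severity 'cleared'."""
    return sorted({r["Code"] for r in parse_history(output)
                   if "Code" in r and r.get("Severity") == "cleared"})


def not_currently_active(output: str, active_codes: Iterable[str]) -> List[str]:
    """Codes seen in history but absent from the active fault list."""
    seen = {r["Code"] for r in parse_history(output) if "Code" in r}
    return sorted(seen - set(active_codes))
```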
apic-A# moquery -c topSystem -f 'top.System.name=="spine-A"'
# top.System
dn          : topology/pod-1/node-201/sys
name        : spine-A
role        : spine
state       : in-service
oobMgmtAddr : 198.51.100.201
version     : n9000-16.1(3f)
Use this query in order to confirm that the APIC view of the node matches the switch you are troubleshooting.
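This sketch parses moquery text into attribute dictionaries and compares selected attributes with what you expect for the node. It assumes the "# class" header and "attr : value" layout shown above; parse_moquery, node_mismatches, and the expected values are illustrative.

```python
from typing import Dict, List


def parse_moquery(output: str) -> List[Dict[str, str]]:
    """Parse moquery text; each '# <class>' line starts a new object."""
    objects: List[Dict[str, str]] = []
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            objects.append({})
            continue
        # Split on the first colon only, so values that contain ':' survive.
        key, sep, value = stripped.partition(":")
        if sep and objects:
            objects[-1][key.strip()] = value.strip()
    return objects


def node_mismatches(obj: Dict[str, str], expected: Dict[str, str]) -> List[str]:
    """Describe every expected attribute that differs in the APIC object."""
    return [
        f"{key}: expected {want}, got {obj.get(key)}"
        for key, want in expected.items()
        if obj.get(key) != want
    ]
```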
Problem: show environment or show discoveryissues reports a PSU in shut state.
Operational Check: Compare PSU state with the configured and operational redundancy mode in the same output.
Root Cause: In many lab and non-redundant deployments, one PSU is intentionally unused.
Solution: Treat the output as informational unless the active PSU is degraded or the redundancy mode does not match design intent.
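The decision in this entry can be written as a small function. This is a sketch under stated assumptions: intended_mode is a hypothetical input that represents your design intent (it is not read from the switch), and the state and mode strings mirror the show environment output in this document.

```python
from typing import List


def classify_shut_psu(psu_states: List[str], operational_mode: str,
                      intended_mode: str) -> str:
    """Classify a shut PSU finding.

    psu_states: per-slot states such as 'ok', 'shut', 'Absent', 'fail'.
    operational_mode: e.g. 'Non-Redundant(combined)' from show environment.
    intended_mode: hypothetical design-intent value supplied by you.
    """
    in_use = [s for s in psu_states if s not in ("shut", "Absent")]
    if not in_use or any(s != "ok" for s in in_use):
        return "escalate"        # an in-use PSU is degraded, or none is in use
    if operational_mode != intended_mode:
        return "review design"   # redundancy mode does not match design intent
    return "informational"       # shut PSU is expected for this design
```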
Problem: show discoveryissues shows the node as in-service but APIC ping checks fail.
Configuration Check: Verify management and infra reachability design, including the APIC-facing path used by the test.
Operational Check: Confirm IS-IS adjacencies, infra VLAN deployment, and active APIC-side faults for the node.
Root Cause: The node can have enough baseline fabric state to join while still exposing controller reachability or policy download edge cases.
Solution: Use the APIC fault view and node management configuration to isolate whether the failure is management path related, tunnel related, or policy related.
Collect techsupport and escalate when one or more of these conditions exist:
| Revision | Publish Date | Comments |
|---|---|---|
| 1.0 | 12-Jun-2026 | Initial Release |