The following protocols fail on the control plane:
ARP resolution fails.
Ports on modules 1 and 2 of the Nexus 9000 are reported down due to UDLD errors.
N9K-1(config-if)# 2018 Oct 20 07:23:23 N9K-1 %ETHPORT-5-IF_ADMIN_UP: Interface port-channel100 is admin up .
2018 Oct 20 07:23:23 N9K-1 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
2018 Oct 20 07:23:23 N9K-1 last message repeated 1 time
2018 Oct 20 07:23:23 N9K-1 %ETHPORT-5-IF_DOWN_ERROR_DISABLED: Interface Ethernet2/2 is down (Error disabled. Reason:UDLD empty echo)
2018 Oct 20 07:23:23 N9K-1 last message repeated 1 time
2018 Oct 20 07:23:23 N9K-1 %ETHPORT-5-IF_DOWN_ERROR_DISABLED: Interface Ethernet2/1 is down (Error disabled. Reason:UDLD empty echo)
2018 Oct 20 07:23:25 N9K-1 last message repeated 1 time
The line cards in modules 1 and 2 fail the L2ACLRedirect diagnostic test on the chassis.
'Show module'
Mod  Online Diag Status
---  ------------------
1    Fail   <--- cleared the module 1 and 2 error [show logging nvram]
2    Fail   <--- module 2 reloaded
3    Pass
Module 1 and 2:
11) L2ACLRedirect-----------------> E
12) BootupPortLoopback: U
Another way a customer can hit this state is when a SUP/LC from a T2 ASIC-based chassis is moved to a Tahoe-based chassis.
Note: For more information about ASIC troubleshooting, contact Cisco TAC.
CSCvc36411 Upgrading from T2 to Tahoe based line cards / FM can cause diagnostic failure and TCAM issues
Analysis
This issue is seen when the TCAM values are set to 0 on N9K-2.
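One way to inspect the TCAM carving on N9K-2 is the command sketched below. It lists the configured size of each TCAM region. Which regions read 0 in this case is not stated here, so treat this as a starting point and not a definitive check.

```
! Displays the configured size of each TCAM region on the switch
N9K-2# show hardware access-list tcam region
```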
Capturing on inband
2018-10-23 04:02:40.568119 b0:aa:77:30:75:bf -> ff:ff:ff:ff:ff:ff ARP Who has 1.1.1.1? Tell 1.1.1.2
2018-10-23 04:02:40.568558 cc:46:d6:af:ff:bf -> b0:aa:77:30:75:bf ARP 1.1.1.1 is at cc:46:d6:af:ff:bf
2018-10-23 04:02:48.574800 b0:aa:77:30:75:bf -> ff:ff:ff:ff:ff:ff ARP Who has 1.1.1.1? Tell 1.1.1.2
2018-10-23 04:02:48.575230 cc:46:d6:af:ff:bf -> b0:aa:77:30:75:bf ARP 1.1.1.1 is at cc:46:d6:af:ff:bf   <--- ARP reply packet sent by agg1
The ELAM capture on N9K-2 shows the ARP response from N9K-1.
Note: Contact Cisco TAC to verify the ELAM capture.
PING 1.1.1.1 (1.1.1.1): 56 data bytes
36 bytes from 1.1.1.2: Destination Host Unreachable
Request 0 timed out
36 bytes from 1.1.1.2: Destination Host Unreachable
Request 1 timed out
36 bytes from 1.1.1.2: Destination Host Unreachable
Request 2 timed out
36 bytes from 1.1.1.2: Destination Host Unreachable
Request 3 timed out
36 bytes from 1.1.1.2: Destination Host Unreachable
N9K-2# show ip arp | inc 1.1.1.1   <--- ARP entry is not populated
To isolate the ARP issue, add a static ARP entry and disable UDLD.
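A minimal sketch of that isolation step on N9K-2 follows. The MAC address is the one that answers for 1.1.1.1 in the inband capture. The interface name Ethernet1/1 is a placeholder and not taken from this case: substitute the Layer 3 interface that faces N9K-1. The sketch also assumes a fiber port, where UDLD is turned off per interface with udld disable.

```
! Ethernet1/1 is a hypothetical interface name; use the link toward N9K-1
N9K-2# configure terminal
N9K-2(config)# interface Ethernet1/1
! Static ARP entry for 1.1.1.1, using the MAC seen in the ARP reply
N9K-2(config-if)# ip arp 1.1.1.1 cc46.d6af.ffbf
! Assumes a fiber port; disables UDLD on this interface only
N9K-2(config-if)# udld disable
```

If the peer address sits on a port-channel, the static ARP entry goes on the Layer 3 port-channel interface, and UDLD is disabled on each physical member port.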
After the static ARP entry was added, ping from 1.1.1.2 to 1.1.1.1 started working, but it fails again if UDLD is enabled.
N9K-2(config)# ping 1.1.1.2
PING 1.1.1.2 (1.1.1.2): 56 data bytes
64 bytes from 1.1.1.2: icmp_seq=0 ttl=255 time=0.32 ms
64 bytes from 1.1.1.2: icmp_seq=1 ttl=255 time=0.285 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=255 time=0.282 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=255 time=0.284 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=255 time=0.291 ms
Although ping works, UDLD errors are still seen on the interface when UDLD is enabled.
No CoPP drops are seen, as shown below:
N9K-2# show hardware internal cpu-mac inband active-fm traffic-to-sup
Active FM Module for traffic to sup: 0x00000016   <--- Module 22 (0x16 = 22 decimal)
N9K-2# show policy-map interface control-plane module 22 | inc dropp