This document describes how to troubleshoot MAC address moves on Nexus 9000 switches.
2018 Nov 14 15:53:26.943 N9K %-SLOT1-5-BCM_L2_LEARN_DISABLE: MAC Learning Disabled unit=0
2018 Nov 14 15:53:27.769 N9K %-SLOT1-5-BCM_L2_LEARN_ENABLE: MAC Learning Enabled unit=0
A switch builds its MAC address table by learning: when it receives a frame, it associates the source MAC address of the frame with the LAN port on which the frame arrived. Under a loop condition, the same MAC address can be learned on two different ports of the switch.
When the BCM ASIC learns too many MAC addresses in a short period, BCM_USD disables and then re-enables MAC learning in hardware, and the messages shown below are logged. This can happen when there are too many MAC moves, flaps, or loops, or when the rate of new MAC learns or moves exceeds a certain threshold. By default, the Nexus 9000 may not log messages that specifically report MAC moves. However, when the rate of moves is high, the following logs appear:
2018 Nov 14 15:53:26.943 N9K %-SLOT1-5-BCM_L2_LEARN_DISABLE: MAC Learning Disabled unit=0
2018 Nov 14 15:53:27.769 N9K %-SLOT1-5-BCM_L2_LEARN_ENABLE: MAC Learning Enabled unit=0
2018 Nov 14 15:53:27.863 N9K %-SLOT1-5-BCM_L2_LEARN_DISABLE: MAC Learning Disabled unit=0
2018 Nov 14 15:53:28.770 N9K %-SLOT1-5-BCM_L2_LEARN_ENABLE: MAC Learning Enabled unit=0
These messages indicate an event in the MAC address table and can appear when MAC addresses move continuously in the environment. In other words, the switch received frames with the same source MAC address on two or more interfaces at a very high rate. The switch counts the number of MAC "move-backs" and weighs them based on the number of times the MAC address moves. To protect the control plane, the switch then disables dynamic MAC learning.
At this point, check the MAC move counters to determine whether, and how many, MAC moves the device has experienced:
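To gauge how often and for how long learning is being disabled, the DISABLE/ENABLE pairs can be paired up offline from a saved copy of the log. The following is a minimal Python sketch, not a Cisco tool: the function name `learning_disabled_windows` is hypothetical, and the regular expression is an assumption derived only from the sample messages quoted in this document.

```python
import re
from datetime import datetime

# Assumed message layout, taken from the sample BCM_L2_LEARN syslogs above.
LEARN_RE = re.compile(
    r"(\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}) \S+ "
    r"%-SLOT(\d+)-5-BCM_L2_LEARN_(DISABLE|ENABLE): "
    r"MAC Learning \w+ unit=(\d+)"
)

def learning_disabled_windows(log_text):
    """Pair each DISABLE with the next ENABLE for the same slot and unit.

    Returns a list of (slot, unit, disabled_at, seconds_disabled) tuples.
    """
    pending = {}   # (slot, unit) -> timestamp of the unmatched DISABLE
    windows = []
    for m in LEARN_RE.finditer(log_text):
        stamp = datetime.strptime(m.group(1), "%Y %b %d %H:%M:%S.%f")
        key = (int(m.group(2)), int(m.group(4)))
        if m.group(3) == "DISABLE":
            pending[key] = stamp
        elif key in pending:
            start = pending.pop(key)
            windows.append(
                (key[0], key[1], start, (stamp - start).total_seconds())
            )
    return windows
```

Many short windows in quick succession, as in the sample above, point to a sustained burst of learns or moves rather than a single event.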
N9K# sh mac address-table notification mac-move
MAC Move Notify Triggers: 1
Number of MAC Addresses added: 612336
Number of MAC Addresses moved: 612328
Number of MAC Addresses removed: 0
A nonzero "Number of MAC Addresses moved" counter, such as the one above, suggests that the switch is experiencing MAC moves.
The next step is to identify the MAC address that causes the problem, along with the VLAN and the interfaces involved. To see this information, raise the logging level of L2FM from its default value of 2 to 5 on the N9K platform.
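Because these counters are cumulative, it can help to capture the command output twice and compare the "moved" value, to see whether moves are still occurring. The sketch below assumes the output has been saved as text; the function names `parse_mac_move_counters` and `moved_delta` are hypothetical, and the field labels are taken only from the sample output in this document.

```python
import re

# Field labels as they appear in the sample output above (assumed stable).
COUNTER_PATTERNS = {
    "triggers": r"MAC Move Notify Triggers:\s*(\d+)",
    "added": r"Number of MAC Addresses added:\s*(\d+)",
    "moved": r"Number of MAC Addresses moved:\s*(\d+)",
    "removed": r"Number of MAC Addresses removed:\s*(\d+)",
}

def parse_mac_move_counters(output):
    """Return the counters found in the command output as a dict of ints."""
    counters = {}
    for name, pattern in COUNTER_PATTERNS.items():
        match = re.search(pattern, output)
        if match:
            counters[name] = int(match.group(1))
    return counters

def moved_delta(before, after):
    """Number of additional MAC moves between two captures of the output."""
    first = parse_mac_move_counters(before).get("moved", 0)
    second = parse_mac_move_counters(after).get("moved", 0)
    return second - first
```

A delta that keeps growing between captures indicates that the condition is ongoing rather than historical.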
N9K# conf t
Enter configuration commands, one per line. End with CNTL/Z.
N9K(config)# logging level l2fm 5
N9K(config)# end
N9K# sho logging level l2fm
Facility        Default Severity        Current Session Severity
--------        ----------------        ------------------------
l2fm                    2                       5
From this point on, any MAC moves are reported in the syslog:
2018 Nov 14 16:04:23.881 N9K %L2FM-4-L2FM_MAC_MOVE2: Mac 0000.117d.e02e in vlan 741 has moved between Po6 to Eth1/3
2018 Nov 14 16:04:23.883 N9K %L2FM-4-L2FM_MAC_MOVE2: Mac 0000.117d.e02e in vlan 741 has moved between Po6 to Eth1/3
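When many of these messages are logged, it is easier to rank them than to read them one by one. The following Python sketch tallies moves per MAC address, VLAN, and interface pair from saved log text. The function name `summarize_mac_moves` is hypothetical, and the pattern is an assumption based only on the L2FM_MAC_MOVE2 samples shown above.

```python
import re
from collections import Counter

# Assumed layout of the L2FM_MAC_MOVE2 message, from the samples above.
MOVE_RE = re.compile(
    r"%L2FM-4-L2FM_MAC_MOVE2: Mac (\S+) in vlan (\d+) "
    r"has moved between (\S+) to (\S+)"
)

def summarize_mac_moves(log_text):
    """Count moves per (mac, vlan, from_port, to_port), most frequent first."""
    counts = Counter()
    for mac, vlan, src, dst in MOVE_RE.findall(log_text):
        counts[(mac, int(vlan), src, dst)] += 1
    return counts.most_common()
```

The entries at the top of the list identify the MAC address, VLAN, and pair of interfaces to investigate first.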
In such a case, the switch can detect and limit the number of times that a MAC address moves from one port to another. Until Cisco NX-OS Release 6.0(2)U3(1), when a loop was detected between two ports, MAC learning was disabled for 180 seconds. Starting with Release 7.0(3)I7(3), you can instead configure the switch to bring down the port with the lower interface index when such a loop is detected, by using the "mac address-table loop-detect port-down" command.
N9K# conf t
Enter configuration commands, one per line. End with CNTL/Z.
N9K(config)# mac address-table loop-detect port-down ?
  <CR>
N9K(config)# mac address-table loop-detect port-down
N9K(config)# exit
N9K#
With this command enabled, any further loop detection brings down the interface with the lower interface index:
2018 Nov 13 19:33:54.773 N9K %ETHPORT-5-IF_DOWN_NONE: Interface port-channel6 is down (None)
2018 Nov 13 19:33:59.046 N9K %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel6: Ethernet2/1 is down
2018 Nov 13 19:33:59.049 N9K %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel6: Ethernet2/2 is down
2018 Nov 13 19:33:59.166 N9K %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel6: first operational port changed from Ethernet2/1 to none
2018 Nov 13 19:33:59.235 N9K %ETHPORT-5-IF_DOWN_ERROR_DISABLED: Interface port-channel6 is down (Error disabled. Reason:error)
2018 Nov 13 19:33:59.244 N9K %ETHPORT-5-IF_DOWN_CFG_CHANGE: Interface Ethernet2/2 is down(Config change)
2018 Nov 13 19:33:59.252 N9K %ETHPORT-5-IF_DOWN_CFG_CHANGE: Interface Ethernet2/1 is down(Config change)
2018 Nov 13 19:34:05.269 N9K %ETHPORT-5-IF_DOWN_CHANNEL_ERR_DISABLED: Interface Ethernet2/2 is down (Channel error disabled)
2018 Nov 13 19:34:05.303 N9K last message repeated 1 time
2018 Nov 13 19:34:05.303 N9K %ETHPORT-5-IF_DOWN_CHANNEL_ERR_DISABLED: Interface Ethernet2/1 is down (Channel error disabled)
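To confirm quickly which interfaces were error-disabled during such an event, the relevant ETHPORT messages can be filtered from saved log text. This is a minimal Python sketch: the function name `error_disabled_interfaces` is hypothetical, and the two message mnemonics it matches are assumed from the sample log above, so other error-disable messages may exist that it does not catch.

```python
import re

# Only the two error-disable mnemonics seen in the sample log are matched.
ERR_DISABLED_RE = re.compile(
    r"%ETHPORT-5-IF_DOWN_(?:ERROR_DISABLED|CHANNEL_ERR_DISABLED): "
    r"Interface (\S+) is down"
)

def error_disabled_interfaces(log_text):
    """Return error-disabled interface names, in first-seen order, no repeats."""
    names = ERR_DISABLED_RE.findall(log_text)
    return list(dict.fromkeys(names))
```

For the sample log above, the result lists the port channel and both of its member ports.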
Use the following command to verify the currently configured action:
N9K# show mac address-table loop-detect
Port Down Action Mac Loop Detect : disabled
To verify that the feature disabled the correct interface, confirm the interface index of each port involved: