This document describes how to troubleshoot an ARP storm when no excessive inband ARP traffic is seen.
An ARP storm is a common denial-of-service (DoS) attack seen in data center environments.
The common switch logic for handling ARP packets is that an ARP packet with either:
a broadcast destination Media Access Control (MAC) address, or
a unicast destination MAC address that belongs to the switch
is processed by the ARP process in software if the Switch Virtual Interface (SVI) is up in the receiving VLAN.
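The punt rule above can be written as a small predicate. This is only an illustrative sketch of the logic as described here, not switch code; the function name, the `switch_macs` parameter, and the example MAC addresses are hypothetical.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def punted_to_software(dst_mac, switch_macs, svi_up):
    """Return True if an ARP packet would be handed to the software ARP process.

    dst_mac     -- destination MAC address of the ARP packet
    switch_macs -- MAC addresses owned by the switch (hypothetical input)
    svi_up      -- True if the SVI is up in the receiving VLAN
    """
    dst = dst_mac.lower()
    owned = {mac.lower() for mac in switch_macs}
    # Condition 1: the packet is broadcast or addressed to the switch itself.
    addressed_to_switch = dst == BROADCAST_MAC or dst in owned
    # Condition 2: the SVI for the receiving VLAN must be up.
    return addressed_to_switch and svi_up


if __name__ == "__main__":
    switch_macs = {"00:11:22:33:44:55"}  # made-up example address
    print(punted_to_software("FF:FF:FF:FF:FF:FF", switch_macs, svi_up=True))
    print(punted_to_software("FF:FF:FF:FF:FF:FF", switch_macs, svi_up=False))
```

Both conditions must hold, which is why a broadcast ARP request in a VLAN without an SVI is not expected to reach the software ARP process.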
By this logic, if one or more malicious hosts keep sending ARP requests in a VLAN for which the switch is the gateway, those ARP requests are processed in software and can overwhelm the switch. On some older Cisco switch models and software versions, the ARP process drives CPU usage to a high level and the system becomes too busy to handle other control plane traffic. The common way to trace such an attack is to run an inband capture to identify the source MAC address of the ARP storm.
Suppose, however, that no excessive inband ARP traffic is seen, yet the switch still sees ARP-related problems, for example a directly connected host with an incomplete ARP entry. Could this still be caused by an ARP storm?
On the Nexus 7000, the answer is yes.
In the Nexus 7000 line card design, to support ARP packet processing in CoPP, an ARP request drives a special logical interface (LIF) and is then rate limited by CoPP in the forwarding engine (FE). This happens whether or not an SVI is up for the VLAN.
Hence, even when the final forwarding decision made by the FE is not to send the ARP request to the inband CPU (because no SVI is up for the VLAN), the CoPP counter is still updated. As a result, CoPP becomes saturated with excessive ARP requests and drops legitimate ARP requests and replies. In this scenario, you do not see any excessive inband ARP packets, but the switch is still affected by the ARP storm.
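To illustrate why a storm that shares the policer starves legitimate ARP, the following is a minimal simulation of a generic single-rate token-bucket policer. It is an assumption-laden sketch, not a model of the actual forwarding engine: the 680 kbps rate and 250 ms burst are taken from the CoPP class shown in this document, while the 64-byte packet size, the packet rates, and all function names are hypothetical.

```python
def police(packets, cir_bps=680_000, bc_ms=250):
    """Run packets through a simple token-bucket policer.

    packets -- iterable of (time_s, size_bytes, label), sorted by time
    Returns {label: {"conformed": n, "violated": n}}.
    """
    rate = cir_bps / 8.0            # refill rate in bytes per second
    burst = rate * bc_ms / 1000.0   # bucket depth in bytes
    tokens, last_t = burst, 0.0     # bucket starts full
    stats = {}
    for t, size, label in packets:
        # Refill for the elapsed time, capped at the bucket depth.
        tokens = min(burst, tokens + (t - last_t) * rate)
        last_t = t
        counts = stats.setdefault(label, {"conformed": 0, "violated": 0})
        if tokens >= size:
            tokens -= size
            counts["conformed"] += 1
        else:
            counts["violated"] += 1
    return stats


def build_traffic(duration_s, storm_pps, legit_pps, size=64):
    """Build a time-sorted mix of storm and legitimate ARP packets."""
    packets = [(i / storm_pps, size, "storm")
               for i in range(int(duration_s * storm_pps))]
    # A small time offset keeps legitimate packets off the storm timestamps.
    packets += [(i / legit_pps + 5e-6, size, "legit")
                for i in range(int(duration_s * legit_pps))]
    return sorted(packets)


if __name__ == "__main__":
    print("no storm:", police(build_traffic(2, 0, 50)))
    print("storm   :", police(build_traffic(2, 100_000, 50)))
```

In this model the policer cannot tell storm packets from legitimate ones, so once the bucket is drained almost every packet in the class is dropped, regardless of which host sent it.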
Enhancement bug CSCub47533 has been filed for this day-one CoPP behavior.
There are a few options for identifying the source of the ARP storm in this scenario. One effective option is as follows.
First, identify which module the ARP storm comes from.
N7K# sh policy-map interface control-plane class copp-system-p-class-normal
Control Plane
  service-policy input copp-system-p-policy-strict

    class-map copp-system-p-class-normal (match-any)
      match access-group name copp-system-p-acl-mac-dot1x
      match exception ip multicast directly-connected-sources
      match exception ipv6 multicast directly-connected-sources
      match protocol arp
      set cos 1
      police cir 680 kbps bc 250 ms
        conform action: transmit
        violate action: drop
      module 3:
        conformed 4820928 bytes,
          5-min offered rate 0 bytes/sec
          peak rate 104 bytes/sec at Thu Aug 25 08:12:12 2016
        violated 9730978848 bytes,
          5-min violate rate 6983650 bytes/sec
          peak rate 7632238 bytes/sec at Thu Aug 25 00:43:33 2016
      module 4:
        conformed 4379136 bytes,
          5-min offered rate 0 bytes/sec
          peak rate 38 bytes/sec at Wed Aug 24 07:12:09 2016
        violated 0 bytes,
          5-min violate rate 0 bytes/sec
          peak rate 0 bytes/sec
      ...
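On a chassis with many modules, the per-module counters can be scanned with a short script instead of by eye. The following is a sketch that assumes the output text has been saved from the CLI and that each module section contains the `violated` and `5-min violate rate` fields in the form shown above; the function name is hypothetical, and the regular expression is written against this sample only.

```python
import re

# Matches one "module N:" section and pulls out its violated byte count
# and 5-minute violate rate. Whitespace-insensitive, so it works whether
# the output is wrapped onto one line or spread across several.
MODULE_RE = re.compile(
    r"module\s+(\d+)\s*:.*?violated\s+(\d+)\s+bytes.*?"
    r"5-min violate rate\s+(\d+)\s+bytes/sec",
    re.DOTALL,
)


def violating_modules(show_output):
    """Return (module, violated_bytes, violate_rate) tuples, worst first."""
    rows = [(int(m), int(v), int(r))
            for m, v, r in MODULE_RE.findall(show_output)]
    dropped = [row for row in rows if row[1] > 0]
    return sorted(dropped, key=lambda row: row[2], reverse=True)


SAMPLE = """
module 3: conformed 4820928 bytes, 5-min offered rate 0 bytes/sec
 peak rate 104 bytes/sec at Thu Aug 25 08:12:12 2016
 violated 9730978848 bytes, 5-min violate rate 6983650 bytes/sec
 peak rate 7632238 bytes/sec at Thu Aug 25 00:43:33 2016
module 4: conformed 4379136 bytes, 5-min offered rate 0 bytes/sec
 peak rate 38 bytes/sec at Wed Aug 24 07:12:09 2016
 violated 0 bytes, 5-min violate rate 0 bytes/sec
 peak rate 0 bytes/sec
"""

if __name__ == "__main__":
    for module, violated, rate in violating_modules(SAMPLE):
        print(f"module {module}: violated {violated} bytes, {rate} bytes/sec")
```

For the sample output, only module 3 is reported. Its 5-minute violate rate of 6983650 bytes/sec is roughly 82 times the policed rate of 680 kbps (85,000 bytes/sec, reading kbps as 1000 bits per second), which points to module 3 as the ingress module of the storm.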
Second, use the ELAM procedure to capture ARP packets hitting that module. You might need to repeat the capture several times, but while a storm is in progress, a capture is much more likely to catch a violating ARP packet than a legitimate one. Identify the source MAC address and VLAN from the ELAM capture.
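Because each capture yields a single packet, it helps to tally the results of repeated captures: the source that shows up most often is the most likely origin of the storm. The sketch below assumes the source MAC address and VLAN have been read manually from each ELAM capture; it does not parse ELAM output, and the function name and sample values are made up for illustration.

```python
from collections import Counter


def rank_sources(captures):
    """Rank (source MAC, VLAN) pairs by how often they were captured.

    captures -- iterable of (source_mac, vlan) pairs, one per ELAM capture
    Returns a list of ((mac, vlan), count), most frequent first.
    """
    tally = Counter((mac.lower(), int(vlan)) for mac, vlan in captures)
    return tally.most_common()


if __name__ == "__main__":
    # Hypothetical results noted down from five separate captures.
    captures = [
        ("00:50:56:AA:BB:01", 100),
        ("00:50:56:aa:bb:01", 100),
        ("00:50:56:aa:bb:02", 200),
        ("00:50:56:aa:bb:01", 100),
        ("00:50:56:aa:bb:01", 100),
    ]
    for (mac, vlan), count in rank_sources(captures):
        print(f"{mac} vlan {vlan}: seen in {count} captures")
```

A source that dominates the tally after a handful of captures is a reasonable candidate to investigate first; a flat distribution suggests more captures are needed.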