This document describes common commands and processes to monitor CPU usage and troubleshoot high CPU usage issues on Cisco Nexus 7000 Series platforms. The commands and sample EEM script are based on Nexus 7000 Release 6.1 and earlier and are subject to change in future releases.
CPU Usage on Nexus 7000 Platforms
The Nexus 7000 platform is a Linux-based system with a preemptive scheduler that allows fair access to CPU resources for all processes. Unlike the Cisco Catalyst 6500 Series, there is no separate route processor (RP) and switch processor (SP). Supervisor Engine 1 has a dual-core processor, Supervisor Engine 2 has a quad-core processor, and Supervisor Engine 2E has two quad-core processors.
The Cisco NX-OS operating system takes advantage of preemptive CPU multitasking, so processes can take advantage of an idle CPU in order to complete tasks faster. Therefore, the history option may report CPU spikes that do not necessarily indicate a problem. However, if average CPU usage remains high compared to normal, baseline CPU usage for a particular network, you might need to investigate high CPU usage.
Default hardware rate limiters (HWRL) and default control plane policing (CoPP) are enabled to help protect the supervisor inband interface on Nexus 7000 platforms.
Commands and Scripts to Monitor Processes and CPUs
The Cisco CLI Analyzer (registered customers only) supports certain show commands. Use the Cisco CLI Analyzer in order to view an analysis of show command output.
show processes Command
Use this command in order to display information about active processes.
switch# show processes
PID    State  PC        Start_cnt    TTY   Type  Process
-----  -----  --------  -----------  ----  ----  -------------
    1      S  41520eb8            1     -     O  init
    2      S         0            1     -     O  kthreadd
    3      S         0            1     -     O  migration/0
    4      S         0            1     -     O  ksoftirqd/0
    5      S         0            1     -     O  watchdog/0
    6      S         0            1     -     O  migration/1
    7      S         0            1     -     O  ksoftirqd/1
    8      S         0            1     -     O  watchdog/1
    9      S         0            1     -     O  events/0
   10      S         0            1     -     O  events/1
   11      S         0            1     -     O  khelper
   12      S         0            1     -     O  kblockd/0
PC - Current program counter in hexadecimal format
Start_cnt - Number of times a process has been started or restarted
TTY - Terminal that controls the process. A hyphen (-) usually means a daemon not running on any particular terminal.
Process - Name of the process
The State column uses these codes:
D - Uninterruptible sleep (usually I/O)
R - Runnable (on run queue)
S - Sleeping
T - Traced or stopped
Z - Defunct (zombie) process
ER - Should be running but currently not running
show system resources Command
Use this command in order to display system-related CPU and memory statistics.
switch# show system resources
Load average:   1 minute: 0.36   5 minutes: 0.39   15 minutes: 0.44
Processes   :   1068 total, 1 running
CPU states  :   0.5% user,   5.5% kernel,   94.0% idle
Memory usage:   8245436K total,   3289920K used,   4955516K free
Current memory status: OK
Load average - Number of processes that are running. The average reflects the system load over the past 1, 5, and 15 minutes.
Processes - Number of processes in the system and how many processes are actually running when the command is issued.
CPU states - CPU usage percentage in user mode, kernel mode, and idle time in the last one second. For a dual-core Supervisor, CPU is averaged across both cores.
Memory usage - Total memory, used memory, free memory, memory used for buffers, and memory used for cache in kilobytes. Buffers and the cache are included in the used memory statistics.
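When this output is collected repeatedly (for example, from saved session logs), it can help to turn it into numbers that are easy to compare against a baseline. The following is a minimal Python sketch, not a Cisco tool; the function names are hypothetical, and the regular expressions assume the output layout shown above.

```python
import re

# Sample text copied from the "show system resources" output above.
SAMPLE = """\
Load average:   1 minute: 0.36   5 minutes: 0.39   15 minutes: 0.44
Processes   :   1068 total, 1 running
CPU states  :   0.5% user,   5.5% kernel,   94.0% idle
Memory usage:   8245436K total,   3289920K used,   4955516K free
Current memory status: OK
"""


def parse_system_resources(text):
    """Extract load averages, CPU states, and memory figures (hypothetical helper)."""
    result = {}
    load = re.search(
        r"1 minute:\s*([\d.]+)\s+5 minutes:\s*([\d.]+)\s+15 minutes:\s*([\d.]+)",
        text,
    )
    if load:
        result["load_avg"] = tuple(float(v) for v in load.groups())
    cpu = re.search(r"([\d.]+)% user,\s*([\d.]+)% kernel,\s*([\d.]+)% idle", text)
    if cpu:
        user, kernel, idle = (float(v) for v in cpu.groups())
        result["cpu"] = {"user": user, "kernel": kernel, "idle": idle}
    mem = re.search(r"(\d+)K total,\s*(\d+)K used,\s*(\d+)K free", text)
    if mem:
        total, used, free = (int(v) for v in mem.groups())
        result["mem_kb"] = {"total": total, "used": used, "free": free}
    return result


def cpu_busy_percent(parsed):
    """Busy CPU is everything that is not idle."""
    return round(100.0 - parsed["cpu"]["idle"], 1)


if __name__ == "__main__":
    parsed = parse_system_resources(SAMPLE)
    print(parsed["load_avg"], cpu_busy_percent(parsed))
```

Because the CPU states line covers only the last second, a single sample says little; comparing many samples against the normal baseline for the network is more meaningful.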
show processes cpu Command
Use this command in order to show the CPU usage at the process level.
show system internal sysmgr service pid <pid> Command
Use this command in order to display additional details, such as restart time, crash status, and current state, on the process/service by PID.
switch# show system internal processes cpu
top - 17:37:26 up 4 days, 18:37,  3 users,  load average: 0.16, 0.35, 0.33
Tasks: 450 total,   2 running, 448 sleeping,   0 stopped,   0 zombie
Cpu(s):  3.5%us,  4.5%sy,  0.0%ni, 91.2%id,  0.1%wa,  0.1%hi,  0.5%si,  0.0%st
Mem:   8245436k total,  4193248k used,  4052188k free,    27668k buffers
Swap:        0k total,        0k used,        0k free,  1919664k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
 2908 root      20   0  112m 8516 5516 S  7.5  0.1 264:58.67  pfm
31710 sjlan     20   0  3732 1656 1140 R  3.8  0.0   0:00.04  top
 3192 root      20   0  334m  47m  11m S  1.9  0.6  25:38.39  netstack
 3578 svc-isan  20   0  118m  13m 6952 S  1.9  0.2  24:59.08  stp
 5151 root      20   0  209m  46m  11m S  1.9  0.6  38:55.52  netstack
 5402 svc-isan  20   0  117m  15m 9140 S  1.9  0.2  36:09.08  stp
 5751 root      20   0  209m  46m  10m S  1.9  0.6  41:20.58  netstack
 6098 svc-isan  20   0  151m  15m 6188 S  1.9  0.2   3:58.40  mrib
 6175 svc-isan  20   0  118m  16m 9580 S  1.9  0.2  47:12.00  stp
    1 root      20   0  1988  604  524 S  0.0  0.0   0:06.52  init
    2 root      15  -5     0    0    0 S  0.0  0.0   0:00.00  kthreadd
    3 root      RT  -5     0    0    0 S  0.0  0.0   0:00.08  migration/0
    4 root      15  -5     0    0    0 S  0.0  0.0   1:07.83  ksoftirqd/0
switch# show system internal sysmgr service pid 2908
Service "Platform Manager" ("platform", 5):
        UUID = 0x18, PID = 2908, SAP = 39
        State: SRV_STATE_HANDSHAKED (entered at time Mon Oct 15 23:03:45 2012).
        Restart count: 1
        Time of last restart: Mon Oct 15 23:03:44 2012.
        The service never crashed since the last reboot.
        Tag = N/A
        Plugin ID: 0
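The top-style output of show system internal processes cpu shown above lends itself to simple offline filtering, for example to list the PIDs worth passing to show system internal sysmgr service pid. The following Python sketch is illustrative only; the helper names and the 1.0 percent default threshold are assumptions, and the column positions are taken from the header row in the sample.

```python
# A few rows copied from the top-style output above.
SAMPLE = """\
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
 2908 root      20   0  112m 8516 5516 S  7.5  0.1 264:58.67  pfm
31710 sjlan     20   0  3732 1656 1140 R  3.8  0.0   0:00.04  top
 3192 root      20   0  334m  47m  11m S  1.9  0.6  25:38.39  netstack
 3578 svc-isan  20   0  118m  13m 6952 S  1.9  0.2  24:59.08  stp
    1 root      20   0  1988  604  524 S  0.0  0.0   0:06.52  init
"""


def parse_top_rows(text):
    """Return one dict per process row that follows the PID/USER header."""
    rows = []
    header_seen = False
    for line in text.splitlines():
        fields = line.split()
        if not header_seen:
            # Skip the summary lines until the column header is found.
            header_seen = fields[:2] == ["PID", "USER"]
            continue
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        rows.append(
            {
                "pid": int(fields[0]),
                "user": fields[1],
                "state": fields[7],
                "cpu": float(fields[8]),
                "command": " ".join(fields[11:]),
            }
        )
    return rows


def top_consumers(rows, threshold=1.0):
    """Rows at or above the threshold, highest %CPU first (threshold is an assumption)."""
    busy = [row for row in rows if row["cpu"] >= threshold]
    return sorted(busy, key=lambda row: row["cpu"], reverse=True)


if __name__ == "__main__":
    for row in top_consumers(parse_top_rows(SAMPLE)):
        print(row["pid"], row["command"], row["cpu"])
```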
Sample EEM Script
This is an example script that captures intermittent high CPU usage. The values used as well as the commands issued can be modified depending on the requirements:
event manager applet HIGH-CPU
  event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.6.1 get-type exact entry-op ge entry-val 80 exit-val 30 poll-interval 5
  action 1.0 syslog msg High CPU hit $_event_pub_time
  action 2.0 cli enable
  action 3.0 cli show clock >> bootflash:high-cpu.txt
  action 4.0 cli show processes cpu sort >> bootflash:high-cpu.txt
Note: It is necessary to define 'exit-val.' As the script collects data, it increases CPU utilization. A value for exit-val ensures that the script does not run in an endless loop.
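The role of exit-val can be pictured as a hysteresis loop: the applet fires when the polled value reaches entry-val and does not fire again until the value has dropped back to exit-val. The Python sketch below models that behavior under stated assumptions; it is not how EEM is implemented, the function name is hypothetical, and the exact comparison used to re-arm the event is assumed to be "less than or equal to exit-val".

```python
def run_hysteresis(samples, entry_val=80, exit_val=30):
    """Return the indexes of the polled samples at which the applet would fire.

    Assumptions: the event fires when a sample is >= entry_val, and it is
    re-armed only after a later sample is <= exit_val.
    """
    fired_at = []
    armed = True
    for index, value in enumerate(samples):
        if armed and value >= entry_val:
            fired_at.append(index)
            armed = False  # stay quiet while CPU remains high
        elif not armed and value <= exit_val:
            armed = True  # CPU has dropped back to exit-val; re-arm
    return fired_at


if __name__ == "__main__":
    # Hypothetical CPU readings, one per poll interval.
    polled = [10, 85, 90, 95, 50, 88, 25, 82]
    print(run_hysteresis(polled))
```

With the hypothetical readings in the usage example, the applet fires on the second sample, stays silent while the CPU remains elevated (including the data collection that the applet itself triggers), and fires again only after the reading of 25 has re-armed it.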
High CPU Usage Caused by Process or Traffic
Unlike Cisco IOS® software platforms, NX-OS does not distinguish between process and interrupt CPU usage when CPU usage is monitored. A quick way to determine the cause of high CPU usage is to use the show system internal processes cpu command. Most likely, high CPU usage triggered by traffic would cause Netstack, as well as other features and processes such as Address Resolution Protocol (ARP) and Internet Group Management Protocol (IGMP), to run high.
Process Causes High CPU Usage
Depending upon the processes and issues that are causing high CPU usage, you may need to capture specific commands. These sections describe methods that might be helpful.
show system internal <feature> mem-stats/memstats | in Grand Command
Use this command in order to show the memory allocation for a process; use the 'in Grand' option to monitor the Grand total memory. A memory leak can cause a process to misbehave, which can result in high CPU usage.
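One way to use the Grand total is to record it at regular intervals and check whether it only ever grows. The following Python sketch assumes the totals have already been extracted into a list of numbers; the function names, the minimum of four samples, and the sample values are all hypothetical.

```python
def grows_steadily(samples, min_samples=4):
    """True if every sample is larger than the one before it.

    A steadily growing Grand total is only a hint of a leak, not proof;
    min_samples is an arbitrary guard against judging from too few readings.
    """
    if len(samples) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


def growth_per_interval(samples):
    """Average change in the Grand total between consecutive readings."""
    return (samples[-1] - samples[0]) / (len(samples) - 1)


if __name__ == "__main__":
    # Hypothetical Grand total readings taken at a fixed interval.
    readings = [1000, 1200, 1450, 1700]
    if grows_steadily(readings):
        print("possible leak, average growth:", growth_per_interval(readings))
```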
Use the debug logfile command whenever possible to direct the output to a specified file and to avoid locking up the session or filling up the syslog. This is an example of debug Simple Network Management Protocol (SNMP):
switch# show log last 10
2012 Oct 17 17:51:06 SITE1-AGG1 %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface
Ethernet10/10, operational Transmit Flow Control state changed to off
2012 Oct 17 17:51:09 SITE1-AGG1 %ETH_PORT_CHANNEL-5-PORT_SUSPENDED:
Ethernet10/10: Ethernet10/10 is suspended
2012 Oct 17 17:51:51 SITE1-AGG1 last message repeated 1 time
2012 Oct 17 17:51:51 SITE1-AGG1 %ETHPORT-5-IF_DOWN_LINK_FAILURE:
Interface Ethernet10/10 is down (Link failure)
2012 Oct 17 17:51:52 SITE1-AGG1 %ETHPORT-5-SPEED: Interface Ethernet10/10,
operational speed changed to 10 Gbps
2012 Oct 17 17:51:52 SITE1-AGG1 %ETHPORT-5-IF_DUPLEX: Interface
Ethernet10/10, operational duplex mode changed to Full
2012 Oct 17 17:51:52 SITE1-AGG1 %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface
Ethernet10/10, operational Receive Flow Control state changed to off
2012 Oct 17 17:51:52 SITE1-AGG1 %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface
Ethernet10/10, operational Transmit Flow Control state changed to off
2012 Oct 17 17:51:55 SITE1-AGG1 %ETH_PORT_CHANNEL-5-PORT_UP: port-channel11:
Ethernet10/10 is up
2012 Oct 17 17:51:56 SITE1-AGG1 %ETHPORT-5-IF_UP: Interface Ethernet10/10
is up in mode trunk
Use the debug-filter command when possible in order to minimize the output on a production system. For example, a packet loss causes unidirectional link detection (UDLD) empty echo:
Use these tools when traffic causes high CPU usage:
Ethanalyzer - Monitor the type of traffic to or from the CPU.
Configuration - Check the switch/interface/feature configuration.
CoPP/Hardware Rate Limiter - Ensure CoPP and HWRL are configured properly. Sometimes the CPU might not run high because it is being protected by CoPP and rate limiters. Check CoPP and HWRL to see if there are drops for certain traffic/packets.
Note: Both CoPP and HWRL are available only from the default virtual device context (VDC). They are enforced by each individual I/O module. Aggregate traffic from multiple modules can still burden the CPU heavily.
Root Cause Analysis of High CPU Usage
A network outage can be resolved by user intervention, or it can recover by itself. If you suspect that high CPU usage caused a network outage, use these guidelines in order to investigate causes.
Symptoms of high CPU usage include control plane instability, data plane connectivity issues caused by control plane failure, protocol flapping such as Hot Standby Router Protocol (HSRP)/RP flapping, UDLD error disabling, Spanning Tree Protocol (STP) failure, and other connectivity issues.
show processes cpu history Command
If the switch was not reloaded or switched over, run the show processes cpu history command within 72 hours of the outage in order to see if high CPU usage occurred at the time of the event.
CoPP and HWRL
If high CPU usage was the root cause of a past outage, and if you suspect that the outage was triggered by network traffic, you can use CoPP and HWRL (hardware rate limiter) in order to help identify the type of traffic.
show policy-map interface control-plane Command
This is sample output from the show policy-map interface control-plane command:
switch# show policy-map interface control-plane
Control Plane

  service-policy input: copp-system-p-policy-strict

    class-map copp-system-p-class-critical (match-any)
      match access-group name copp-system-p-acl-bgp
      match access-group name copp-system-p-acl-bgp6
      match access-group name copp-system-p-acl-igmp
      match access-group name copp-system-p-acl-msdp
      match access-group name copp-system-p-acl-ospf
      match access-group name copp-system-p-acl-pim
      match access-group name copp-system-p-acl-pim6
      match access-group name copp-system-p-acl-rip
      match access-group name copp-system-p-acl-rip6
      match access-group name copp-system-p-acl-vpc
      match access-group name copp-system-p-acl-eigrp
      match access-group name copp-system-p-acl-eigrp6
      match access-group name copp-system-p-acl-mac-l2pt
      match access-group name copp-system-p-acl-mpls-ldp
      match access-group name copp-system-p-acl-mpls-oam
      match access-group name copp-system-p-acl-ospf6
      match access-group name copp-system-p-acl-otv-as
      match access-group name copp-system-p-acl-mac-otv-isis
      match access-group name copp-system-p-acl-mpls-rsvp
      match access-group name copp-system-p-acl-mac-fabricpath-isis
      match protocol mpls router-alert
      match protocol mpls exp 6
      set cos 7
      police cir 39600 kbps , bc 250 ms
      module 1 :
        conformed 1108497274 bytes; action: transmit
        violated 0 bytes; action: drop
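Because the policy lists many classes and reports counters per module, it can be easier to scan saved output for nonzero violated counters than to read it by eye. The Python sketch below is one hedged way to do that; the function names are hypothetical, and the parsing assumes the class-map, module, and conformed/violated line layout shown in the sample above.

```python
import re

# Shortened excerpt of the CoPP output above (most match lines omitted).
SAMPLE = """\
    class-map copp-system-p-class-critical (match-any)
      match access-group name copp-system-p-acl-bgp
      set cos 7
      police cir 39600 kbps , bc 250 ms
      module 1 :
        conformed 1108497274 bytes; action: transmit
        violated 0 bytes; action: drop
"""


def copp_counters(text):
    """Return {(class_name, module): {"conformed": n, "violated": n}}."""
    counters = {}
    current_class = None
    current_module = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = re.match(r"class-map (\S+)", line)
        if match:
            current_class, current_module = match.group(1), None
            continue
        match = re.match(r"module (\d+)\s*:", line)
        if match:
            current_module = int(match.group(1))
            continue
        match = re.match(r"(conformed|violated) (\d+) bytes", line)
        if match and current_class is not None and current_module is not None:
            entry = counters.setdefault((current_class, current_module), {})
            entry[match.group(1)] = int(match.group(2))
    return counters


def classes_with_drops(counters):
    """(class, module) pairs whose violated counter is nonzero."""
    return sorted(key for key, c in counters.items() if c.get("violated", 0) > 0)


if __name__ == "__main__":
    print(classes_with_drops(copp_counters(SAMPLE)))
```

A nonzero violated counter identifies the class of traffic that CoPP is dropping on that module, which narrows down the type of traffic aimed at the CPU.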
Bahrain registers (cleared by chip reset only)
=======================================================
revision              0x00000108
scratchpad            0xaaaaaaaa
MAC status            0x00000001
MAC SerDes synced     0x00000001
MAC status 2          0x000100f8
Auto-XOFF config      1
Auto-XOFF status      0
After NX-OS Version 5.X, 'events' is a command option that provides the time when the maximum packets per second (PPS) receive (RX) or transmit (TX) CPU rate is reached. This example shows how to determine the time when the last peak of CPU traffic was encountered:
switch# show hardware internal cpu-mac inband events
1) Event:TX_PPS_MAX, length:4, at 648617 usecs after Fri Oct 19 13:23:06 2012 new maximum = 926
2) Event:TX_PPS_MAX, length:4, at 648622 usecs after Fri Oct 19 13:15:06 2012 new maximum = 916
3) Event:TX_PPS_MAX, length:4, at 648612 usecs after Fri Oct 19 13:14:06 2012 new maximum = 915
4) Event:TX_PPS_MAX, length:4, at 648625 usecs after Fri Oct 19 13:12:06 2012 new maximum = 914
5) Event:TX_PPS_MAX, length:4, at 648626 usecs after Fri Oct 19 13:11:06 2012 new maximum = 911
6) Event:TX_PPS_MAX, length:4, at 648620 usecs after Fri Oct 19 13:08:06 2012 new maximum = 910
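To line these events up with an outage window, the timestamps can be converted into sortable values. The Python sketch below is an illustration only; the helper names are hypothetical, and the regular expression assumes the event layout shown above, with "new maximum" either on the same line or on the following line.

```python
import re
from datetime import datetime, timedelta

# Two events copied from the output above.
SAMPLE = """\
1) Event:TX_PPS_MAX, length:4, at 648617 usecs after Fri Oct 19 13:23:06 2012 new maximum = 926
2) Event:TX_PPS_MAX, length:4, at 648622 usecs after Fri Oct 19 13:15:06 2012 new maximum = 916
"""

EVENT_RE = re.compile(
    r"Event:(\w+), length:\d+, at (\d+) usecs after "
    r"(\w{3} \w{3} +\d+ [\d:]+ \d{4})\s+new maximum = (\d+)"
)


def parse_events(text):
    """Return (event_name, timestamp, new_maximum) tuples."""
    events = []
    for match in EVENT_RE.finditer(text):
        name, usecs, stamp, peak = match.groups()
        # Collapse repeated spaces (single-digit days) before parsing.
        base = datetime.strptime(" ".join(stamp.split()), "%a %b %d %H:%M:%S %Y")
        events.append((name, base + timedelta(microseconds=int(usecs)), int(peak)))
    return events


def highest_peak(events):
    """The event that recorded the largest packets-per-second maximum."""
    return max(events, key=lambda event: event[2])


if __name__ == "__main__":
    name, when, peak = highest_peak(parse_events(SAMPLE))
    print(name, when.isoformat(), peak)
```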
show system internal pktmgr internal vdc inband <int> Command
Use this command to identify the source of traffic punted to CPU.
switch# show system internal pktmgr internal vdc inband e1/5
Interface     Src Index   VDC ID   Packet rcvd
--------------------------------------------------------
Ethernet1/5   0xa1d       1        14640
Netstack is a complete IP stack implemented in the user space of Nexus 7000. Components include an L2 Packet Manager, ARP, Adjacency Manager, IPv4, Internet Control Message Protocol v4 (ICMPv4), IPv6, ICMPv6, TCP/UDP, and socket library. When traffic to the CPU is triggering high CPU usage, you often see that Netstack and its respective process are running high.
show system inband queuing status Command
This example shows how to display the Netstack queueing algorithm in use:
switch# show system inband queuing status
  Weighted Round Robin Algorithm
  Weights BPDU - 32, Q0 - 8, Q1 - 4, Q2 - 2 Q3 - 64
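To get a feel for what these weights mean, they can be turned into relative shares. The Python sketch below assumes a textbook weighted round robin in which, when every queue is continuously backlogged, each queue is served in proportion to its weight; the actual Netstack scheduler may behave differently, and the function names are hypothetical.

```python
import re

# Weights line copied from the output above.
WEIGHTS_LINE = "Weights BPDU - 32, Q0 - 8, Q1 - 4, Q2 - 2 Q3 - 64"


def parse_weights(line):
    """Map each queue name to its configured weight."""
    return {name: int(weight) for name, weight in re.findall(r"(BPDU|Q\d) - (\d+)", line)}


def service_shares(weights):
    """Fraction of service per queue, assuming all queues are always backlogged."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    for queue, share in service_shares(parse_weights(WEIGHTS_LINE)).items():
        print(f"{queue}: {share:.1%}")
```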
show system inband queuing statistics Command
This example shows the counters in the kernel-loadable module (KLM) and the user space process.
The KLM is a single instance that runs on the default VDC and operates on both the inband and management interfaces. The KLM comes into the picture only during ingress packet processing, when it sends ingress frames to the right VDC's Netstack for processing.
switch# show system inband queuing statistics
  Inband packets unmapped to a queue: 0
  Inband packets mapped to bpdu queue: 7732593
  Inband packets mapped to q0: 686667
  Inband packets mapped to q1: 0
  Inband packets mapped to q2: 0
  Inband packets mapped to q3: 20128
  In KLM packets mapped to bpdu: 7732593
  In KLM packets mapped to arp : 912
  In KLM packets mapped to q0 : 686667
  In KLM packets mapped to q1 : 0
  In KLM packets mapped to q2 : 0
  In KLM packets mapped to q3 : 20128
  In KLM packets mapped to veobc : 0
  Inband Queues:
  bpdu: recv 1554390, drop 0, congested 0 rcvbuf 2097152, sndbuf 262142 no drop 1
  (q0): recv 686667, drop 0, congested 0 rcvbuf 2097152, sndbuf 262142 no drop 0
  (q1): recv 0, drop 0, congested 0 rcvbuf 2097152, sndbuf 262142 no drop 0
  (q2): recv 0, drop 0, congested 0 rcvbuf 2097152, sndbuf 262142 no drop 0
  (q3): recv 20128, drop 0, congested 0 rcvbuf 2097152, sndbuf 262142 no drop 0
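The drop and congested counters in the Inband Queues section are usually the first thing to check. As a rough illustration, the Python sketch below pulls them out of saved output and lists any queue where either counter is nonzero; the helper names are hypothetical, and the pattern assumes the line layout shown above.

```python
import re

# Inband Queues lines copied from the output above (q1 and q2 omitted).
SAMPLE = """\
Inband Queues:
bpdu: recv 1554390, drop 0, congested 0 rcvbuf 2097152, sndbuf 262142 no drop 1
(q0): recv 686667, drop 0, congested 0 rcvbuf 2097152, sndbuf 262142 no drop 0
(q3): recv 20128, drop 0, congested 0 rcvbuf 2097152, sndbuf 262142 no drop 0
"""

QUEUE_RE = re.compile(r"\(?(bpdu|q\d)\)?: recv (\d+), drop (\d+), congested (\d+)")


def queue_counters(text):
    """Return {queue: {"recv": n, "drop": n, "congested": n}}."""
    return {
        match.group(1): {
            "recv": int(match.group(2)),
            "drop": int(match.group(3)),
            "congested": int(match.group(4)),
        }
        for match in QUEUE_RE.finditer(text)
    }


def troubled_queues(counters):
    """Queues that report any drops or congestion."""
    return sorted(q for q, c in counters.items() if c["drop"] or c["congested"])


if __name__ == "__main__":
    print(troubled_queues(queue_counters(SAMPLE)))
```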
show system internal pktmgr internal vdc global-stats Command
This command is similar to the preceding show system inband queuing statistics command and provides many details:
switch# show system internal pktmgr internal vdc global-stats
VDC KLM global statistics:
  Inband packets not mapped to a VDC: 0
  Inband diag packets received: 998222
  Weighted Round Robin Algorithm
  Weights BPDU - 32, Q0 - 8, Q1 - 4, Q2 - 2 Q3 - 64
  Inband packets unmapped to a queue: 0
  Inband packets mapped to bpdu queue: 7734430 (7734430)
  Inband packets mapped to q0: 686779 (686779)
  Inband packets mapped to q1: 0 (0)
  Inband packets mapped to q2: 0 (0)
  Inband packets mapped to q3: 20128 (20128)
  Pkt Size History : 2811395 for index 1
  Pkt Size History : 274508 for index 2
  Pkt Size History : 74284 for index 3
  Pkt Size History : 43401 for index 4
  Pkt Size History : 70915 for index 5
  Pkt Size History : 35602 for index 6
  Pkt Size History : 30085 for index 7
  Pkt Size History : 29408 for index 8
  Pkt Size History : 21221 for index 9
  Pkt Size History : 15683 for index 10
  Pkt Size History : 13212 for index 11
  Pkt Size History : 10646 for index 12
  Pkt Size History : 9290 for index 13
  Pkt Size History : 50298 for index 14
  Pkt Size History : 5473 for index 15
  Pkt Size History : 4871 for index 16
  Pkt Size History : 4687 for index 17
  Pkt Size History : 5507 for index 18
  Pkt Size History : 15416 for index 19
  Pkt Size History : 11333 for index 20
  Pkt Size History : 5478 for index 21
  Pkt Size History : 4281 for index 22
  Pkt Size History : 3543 for index 23
  Pkt Size History : 3059 for index 24
  Pkt Size History : 2228 for index 25
  Pkt Size History : 4390 for index 26
  Pkt Size History : 19892 for index 27
  Pkt Size History : 524 for index 28
  Pkt Size History : 478 for index 29
  Pkt Size History : 348 for index 30
  Pkt Size History : 447 for index 31
  Pkt Size History : 1545 for index 32
  Pkt Size History : 152 for index 33
  Pkt Size History : 105 for index 34
  Pkt Size History : 1424 for index 35
  Pkt Size History : 43 for index 36
  Pkt Size History : 60 for index 37
  Pkt Size History : 60 for index 38
  Pkt Size History : 46 for index 39
  Pkt Size History : 58 for index 40
  Pkt Size History : 829 for index 41
  Pkt Size History : 32 for index 42
  Pkt Size History : 26 for index 43
  Pkt Size History : 1965 for index 44
  Pkt Size History : 21 for index 45
  Pkt Size History : 1 for index 46
  Pkt Size History : 1 for index 48
  Pkt Size History : 1 for index 51
  Pkt Size History : 1 for index 52
  Pkt Size History : 1 for index 53
  Pkt Size History : 3 for index 55
  In KLM packets mapped to bpdu: 7734430
  In KLM packets mapped to arp : 912
  In KLM packets mapped to q0 : 686779
  In KLM packets mapped to q1 : 0
  In KLM packets mapped to q2 : 0
  In KLM packets mapped to q3 : 20128
  In KLM packets mapped to veobc : 0
  In KLM Queue Mapping (0 1 2 3 4)
  Data Available in FDs (0 0 0 0 0)
  Inband Queues:
  bpdu: recv 1556227, drop 0, congested 0 rcvbuf 2097152, sndbuf 262142 no drop 1
  (q0): recv 686779, drop 0, congested 0 rcvbuf 2097152, sndbuf 262142 no drop 0
  (q1): recv 0, drop 0, congested 0 rcvbuf 2097152, sndbuf 262142 no drop 0
  (q2): recv 0, drop 0, congested 0 rcvbuf 2097152, sndbuf 262142 no drop 0
  (q3): recv 20128, drop 0, congested 0 rcvbuf 2097152, sndbuf 262142 no drop 0
  Mgmt packets not mapped to a VDC: 227551
  Mgmt multicast packets dropped: 92365
  Mgmt multicast packets delivered: 0
  Mgmt packets broadcast to each VDC: 23119
  Mgmt debugging packets copied: 0
  Mgmt IPv6 multicast packets delivered: 0
  Mgmt IPv6 link-local packets delivered: 0
  Mgmt LLDP packets received: 0
show system internal pktmgr interface ethernet <int> Command
Use this command in order to look at the packet rate as well as type of traffic (unicast or multicast) for CPU-punted traffic from an interface.
Use the show system internal pktmgr stats command in order to check if packets are reaching the packet manager in the ingress path and if packets are being sent out by the packet manager. This command can also help you determine if there are problems with mbuffers in either the receive or transmit path.
switch# show system internal pktmgr stats
Route Processor Layer-2 frame statistics

  No of packets passed by PM Policy database            876452
  No of packets dropped by PM Policy database           0
  No of packets bypassed by PM Policy database          424480
  No of packets dropped by PM originating from kernel   0

VPC Frame Statistics
  VPC Mgr reg state 1, im-ext-sdb-state 1
  Ingress BPDUs qualified for redirection 0
  Ingress BPDUs redirected to peer 0
  Egress BPDUs qualified for redirection 0
  Egress BPDUs dropped due to remote down 0
  Egress BPDUs redirected to peer 0
  Ingress pkts qualified for peergateway tunneling 0
  Ingress pkts tunneled to peer with peergateway conf 0
  Peer-gw pkts tunneled tx :
    From VPC+ leg 0, From VPC leg 0, From l2mp network 0
    From orphan port in VPC+ 0, from orphan port in VPC 0
    For ARP 0, IP 0, IPv6 0, unknown 0
  Total Tunneled packets received from peer 0
    Local delivery 0, Transmit down 0, peer-gw tunneled 0
  Tunnel rx packets drop due to local vpc leg down 0
  Peer-gw pkts tunneled rx :
    From VPC+ leg 0, VPC leg 0, From l2mp network 0
    From orphan port in VPC+ 0, from orphan port in VPC 0
    For ARP 0, IP 0, IPv6 0, unknown 0