Cisco Nexus 5000 Troubleshooting Guide
Troubleshooting System Management Issues

Table Of Contents

Troubleshooting System Management Issues

SNMP

SNMP memory usage continuously increasing

SNMP not responding

SNMP not responding and show snmp command reports SNMP has timed out

Not able to perform SNMP SET operation

SNMP on BRIDGE-MIB

Logging

System is not responsive

Syslog server not getting messages from DUT

CPU Processes

High CPU Utilization

Traps

Traps not received

DNS

DNS resolution not working correctly

Specified domain not removed from domain-list


Troubleshooting System Management Issues


The system management features of the Cisco Nexus 5000 Series switch allow you to monitor and manage your network for efficient device use, role-based access control, SNMP communications, diagnostics, and logging.

This chapter describes how to identify and resolve problems that can occur with system management and the Cisco Nexus 5000 Series switch.

This chapter includes the following sections:

SNMP

Logging

CPU Processes

Traps

DNS

SNMP

SNMP memory usage continuously increasing

The show proc mem | inc snmp command shows continuously increasing SNMP memory usage.

Possible Cause

SNMP memory usage increases when SNMP requests are processed from different monitoring stations. Typically, this situation stabilizes over time. If the memory increases continuously without stabilizing, then some of the SNMP requests are causing a memory leak.

Solution

Review the output from the show system internal snmp mem-stats detail command.

Take example snapshots with the following commands while processing SNMP requests:

show clock

show system internal snmp mem-stats detail

show tech snmp
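
Once several snapshots have been collected, the memory figures can be compared to see whether usage levels off or keeps climbing. The following is a minimal Python sketch of that comparison. It assumes the byte counts have already been copied by hand from successive snapshots; the function name keeps_growing and the sample values are hypothetical and do not come from the switch.

```python
from typing import Sequence


def keeps_growing(samples: Sequence[int], min_samples: int = 4) -> bool:
    """Return True if every snapshot is strictly larger than the previous one."""
    if len(samples) < min_samples:
        return False  # too few snapshots to judge a trend
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


# Hypothetical byte counts taken from successive snapshots (oldest first).
leaking = [41_200_000, 41_950_000, 42_700_000, 43_400_000, 44_150_000]
stable = [41_200_000, 42_100_000, 42_100_000, 42_050_000, 42_100_000]

print(keeps_growing(leaking))
print(keeps_growing(stable))
```

A strictly increasing series is only a hint of a leak. Usage that rises and then flattens matches the normal behavior described under Possible Cause.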

SNMP not responding

SNMP requests receive no response or a delayed response.

Possible Cause

If the switch CPU utilization is high during SNMP operations such as GET, GETNEXT, and WALK, the response may be very slow, or there may be no response at all, which results in a time-out.

Solution

While SNMP is not responding, check CPU utilization with the following commands:

show proc cpu history

show proc cpu sort

The output from these commands shows which Nexus 5000 component is using the greatest amount of CPU resources.

SNMP not responding and show snmp command reports SNMP has timed out

SNMP is not responding and the show snmp command reports that SNMP has timed out.

Possible Cause

The SNMP process might have exited, but the process did not crash.

Solution

Use the show system internal sysmgr service name snmpd command, which should show the state as SRV_STATE_HANDSHAKED.

Example:

Service "snmpd" ("snmpd", 74):
UUID = 0x1A, PID = 4131, SAP = 28
State: SRV_STATE_HANDSHAKED (entered at time Mon Jun 14 17:12:15 2010).
Restart count: 1
Time of last restart: Mon Jun 14 17:12:14 2010.
The service never crashed since the last reboot.
Tag = N/A
Plugin ID: 0
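
When this check is repeated across many switches, the state can be extracted from the saved command output with a short script. The following Python sketch assumes the output has the same layout as the example above. The function names service_state and snmpd_is_healthy are hypothetical.

```python
import re
from typing import Optional

# Output copied from the example above (truncated to the relevant lines).
SAMPLE = """Service "snmpd" ("snmpd", 74):
UUID = 0x1A, PID = 4131, SAP = 28
State: SRV_STATE_HANDSHAKED (entered at time Mon Jun 14 17:12:15 2010).
Restart count: 1
"""


def service_state(output: str) -> Optional[str]:
    """Return the token that follows 'State:', or None if no such line exists."""
    match = re.search(r"^\s*State:\s*(\S+)", output, re.MULTILINE)
    return match.group(1) if match else None


def snmpd_is_healthy(output: str) -> bool:
    """True only when the service reports SRV_STATE_HANDSHAKED."""
    return service_state(output) == "SRV_STATE_HANDSHAKED"


print(service_state(SAMPLE))
print(snmpd_is_healthy(SAMPLE))
```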

Not able to perform SNMP SET operation

The following error appears when trying to perform the SNMP SET operation:

bash-2.05b$ snmpset -v2c -c private 10.78.25.211 .1.3.6.1.4.1.9.9.305.1.1.6.0 i 1
Error in packet.
Reason: notWritable
 
   

Possible Cause

The SNMP community does not have write permission.

Solution

Check the output of the show snmp community command to ensure that the write permission is enabled.

Example:

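Community            Group / Access      context    acl_filter
private              network-operator
public               network-admin

Only "network-admin" has write permissions. In this example, the public community belongs to network-admin, so the SET operation can be issued with that community:

snmpset -v2c -c public 10.78.25.211 .1.3.6.1.4.1.9.9.305.1.1.6.0 i 1
enterprises.9.9.305.1.1.6.0 = 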
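
The community check can also be scripted against saved show snmp community output. The following Python sketch assumes the two-column layout of the example above (community name, then group) and follows the statement that only network-admin has write permissions. The function names parse_communities and can_write are hypothetical, and custom roles are not considered.

```python
from typing import Dict

# Output laid out like the example above.
SAMPLE = """Community            Group / Access      context    acl_filter
private              network-operator
public               network-admin
"""


def parse_communities(output: str) -> Dict[str, str]:
    """Map each community name to its group, skipping the header and blank lines."""
    groups = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] == "Community":
            continue
        groups[fields[0]] = fields[1]
    return groups


def can_write(output: str, community: str) -> bool:
    """True if the community is mapped to network-admin (assumed write-capable)."""
    return parse_communities(output).get(community) == "network-admin"


print(can_write(SAMPLE, "private"))
print(can_write(SAMPLE, "public"))
```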

SNMP on BRIDGE-MIB

The SNMP GET on BRIDGE-MIB operation does not return correct values and results in errors.

Possible Cause

The BRIDGE-MIB may not be supported.

Solution

Check the release notes to make sure that BRIDGE-MIB is supported on NX-OS Release 4.2(1) or later releases.

Logging

System is not responsive

System performance is significantly slower or the system is unresponsive.

Possible Cause

Some system resources may be over-utilized. For example, an incorrect logging level might generate many messages resulting in an impact on system resources.

Solution

Check the logging level on the chassis. If the logging level is set to a high value, such as 6 or 7, many messages are generated and performance can be impacted. Use the following commands to display the logging configuration and the amount of resources that are being used:

show proc cpu | inc syslogd

show proc cpu

show run | inc logging

show system resources
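
The reason that levels 6 and 7 generate so many messages is that a logging level admits its own severity and every more severe (lower-numbered) one. The following Python sketch illustrates this with the standard syslog severity numbering. The function name severities_logged is hypothetical.

```python
from typing import List

# Standard syslog severities, indexed by level (0 is the most severe).
SEVERITIES = [
    "emergencies",    # 0
    "alerts",         # 1
    "critical",       # 2
    "errors",         # 3
    "warnings",       # 4
    "notifications",  # 5
    "informational",  # 6
    "debugging",      # 7
]


def severities_logged(level: int) -> List[str]:
    """Return the severities that are logged when the logging level is `level`."""
    if not 0 <= level <= 7:
        raise ValueError("logging level must be between 0 and 7")
    return SEVERITIES[: level + 1]


print(severities_logged(2))
print(len(severities_logged(7)))
```

At level 7, all eight severities are logged, including debugging messages, which is why a lower level is usually preferable on a busy switch.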

Syslog server not getting messages from DUT

Although the syslog server is configured, it is not receiving messages from the DUT.

Possible Cause

The syslog server might not be accessible or the logging level might not be appropriate.

Solution

Check to see if the destination syslog server is accessible from VRF management. Use the ping <dest-ip> vrf management command to ping the server.

Check that the syslog configuration on the DUT has use-vrf management.

Example:

logging server 10.193.12.1 5 use-vrf management
 
   

Check that the appropriate logging level is enabled to send logging messages. Use the show logging info command. If the logging level is not appropriate, then set the appropriate level using the logging level <feature> <log-level> command.

CPU Processes

High CPU Utilization

CPU experiences brief high utilization.

Possible Cause

Brief high utilization can be caused by CPU multitasking.

Solution

Spikes of high CPU utilization on the Cisco Nexus 5000/5500 switch are normal activity.

The show system resources command displays the high level CPU utilization for the supervisor module. The show process cpu command with the sort option displays all of the processes sorted by the highest CPU utilization per process. The show process cpu history command displays the CPU history in three increments: 60 seconds, 60 minutes, 72 hours. Viewing the CPU history is valuable when correlating a network event with the past CPU utilization.

Cisco NX-OS takes advantage of preemptive CPU multitasking, so processes can take advantage of an idle CPU to complete tasks faster. Therefore the show process cpu history command might display CPU spikes that are not necessarily a problem. Additional investigation is required if the average CPU remains close to 100%.

Example:

 
   
switch# show processes cpu sort 
 
   
PID    Runtime(ms)  Invoked   uSecs  1Sec    Process
-----  -----------  --------  -----  ------  -----------
 3611     57354660  30766347   1864    7.0%  statsclient
 4011    110298193  27004447   4084    5.3%  fcpc
 3561     96792384  87683659   1103    3.5%  gatosusd
 3685          862      8678     99    1.8%  netstack
    1        39116    447596     87    0.0%  init
<text omitted>
 
   
switch# show processes cpu history 
                                                                
    41 11  11111111131 11  11   1  12811111121 11 1122 1111111 1
    808008808093532901815792389618988723121180936746257022006081
100                                                             
 90                                  #                          
 80                                  #                          
 70                                  #                          
 60                                  #                          
 50 #                                #                          
 40 #                                #                          
 30 #               #               ##      #        #          
 20 #       # # #  ##   #          ###      #   #  ###      #   
 10 ############################################################
    0....5....1....1....2....2....3....3....4....4....5....5....
              0    5    0    5    0    5    0    5    0    5    
<text omitted>
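
When the show processes cpu sort output is saved to a file, the busiest processes can be picked out with a short script. The following Python sketch assumes the column layout of the example above (PID, Runtime(ms), Invoked, uSecs, 1Sec, Process). The function name busiest and the 5.0 percent threshold are hypothetical.

```python
import re
from typing import List, Tuple

# Rows copied from the example above.
SAMPLE = """PID    Runtime(ms)  Invoked   uSecs  1Sec    Process
-----  -----------  --------  -----  ------  -----------
 3611     57354660  30766347   1864    7.0%  statsclient
 4011    110298193  27004447   4084    5.3%  fcpc
 3561     96792384  87683659   1103    3.5%  gatosusd
 3685          862      8678     99    1.8%  netstack
    1        39116    447596     87    0.0%  init
"""

# PID, runtime, invoked, uSecs, then the 1Sec percentage and the process name.
ROW = re.compile(r"^\s*\d+\s+\d+\s+\d+\s+\d+\s+([\d.]+)%\s+(\S+)")


def busiest(output: str, threshold: float) -> List[Tuple[str, float]]:
    """Return (process, percent) pairs at or above threshold, highest first."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match and float(match.group(1)) >= threshold:
            rows.append((match.group(2), float(match.group(1))))
    return sorted(rows, key=lambda row: row[1], reverse=True)


print(busiest(SAMPLE, 5.0))
```

A single sample reflects only one second of activity, so a process that appears here is a concern only if it stays at the top across repeated samples.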
 
   

Traps

Traps not received

Traps are not received by the SNMP host.

Possible Cause

The following are possible causes:

Traps might not be enabled.

The SNMP host might not be accessible.

A firewall might be blocking access.

An access list might be blocking UDP port 162.

Solution

Use the following commands to ensure that the trap is enabled and that the proper VRF is configured for the SNMP host:

snmp-server enable traps <trapname>

snmp-server host <x.x.x.x> use-vrf <vrf-name>

where x.x.x.x is the IP address of the trap receiving device.
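
To separate a switch-side problem from a firewall or access-list problem, it can help to confirm whether any UDP datagram reaches the trap-receiving device at all. The following Python sketch is a bare UDP listener meant to run on that device. It does not decode SNMP and only reports the sender and size of the first datagram. The function names open_listener and first_datagram are hypothetical.

```python
import socket
from typing import Optional, Tuple


def open_listener(port: int, host: str = "0.0.0.0") -> socket.socket:
    """Bind a UDP socket to host:port and return it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def first_datagram(sock: socket.socket, timeout: float) -> Optional[Tuple[str, int]]:
    """Return (sender_ip, byte_count) for the first datagram, or None on timeout."""
    sock.settimeout(timeout)
    try:
        data, (sender_ip, _sender_port) = sock.recvfrom(65535)
    except socket.timeout:
        return None
    return sender_ip, len(data)


# Example use on the trap receiver. Binding to port 162 usually requires
# administrator privileges and fails if another trap daemon already holds it.
#
#   with open_listener(162) as sock:
#       print(first_datagram(sock, timeout=60.0))
```

If the listener sees datagrams from the switch address, the path is open and the problem lies in the trap receiver software. If it times out, check the trap configuration, the VRF, and any firewall or access list along the path.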

DNS

DNS resolution not working correctly

When specifying a host name using DNS or VRF, the host name is not resolved and an error occurs.

Possible Cause

The DNS client is not configured correctly.

Solution

Use the following commands to configure the DNS client:

config t

vrf context management

ip host <name> <address1> [address2... address6]

ip domain-name <name> [use-vrf <vrf-name>]

ip domain-list <name> [use-vrf <vrf-name>]

ip name-server <server-address1> [server-address2... server-address6] [use-vrf <vrf-name>]

ip domain lookup

show hosts

copy running-config startup-config

Specified domain not removed from domain-list

When using the no ip domain-list <name> command to remove a specified domain from the domain-list, only the most recently added domain is removed.

Possible Cause

The no ip domain-list <name> command is not locating the specified domain.

Solution

There are two possible workarounds:

To remove a domain that is not the most recently added domain in the domain-list, use the no ip domain-list <name> command to temporarily remove each domain that was added after it, starting with the most recent, and then remove the desired domain. Afterward, add the temporarily removed domains back to the domain-list.

An alternative approach is to copy the startup-config and delete the desired domain with a text editor. Then you must load the edited startup-config back onto the device.
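
The first workaround amounts to an ordered sequence of commands, which can be generated rather than typed by hand. The following Python sketch assumes the domain-list is known in the order the domains were added (oldest first) and that the commands behave as described above. The function name removal_plan and the example domain names are hypothetical.

```python
from typing import List


def removal_plan(domains: List[str], target: str) -> List[str]:
    """Build the commands that remove `target` and keep every other domain.

    `domains` must list the domain-list entries oldest first.
    """
    if target not in domains:
        raise ValueError(f"{target} is not in the domain-list")
    newer = domains[domains.index(target) + 1:]
    # Remove the domains added after the target, most recent first.
    commands = [f"no ip domain-list {name}" for name in reversed(newer)]
    # The target is now the most recently added domain, so it can be removed.
    commands.append(f"no ip domain-list {target}")
    # Add the temporarily removed domains back in their original order.
    commands.extend(f"ip domain-list {name}" for name in newer)
    return commands


for command in removal_plan(["a.example", "b.example", "c.example"], "a.example"):
    print(command)
```

Review the generated commands before entering them in configuration mode, and include the use-vrf option if the domains were configured with it.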