Symptom : After an ISSU from Cisco NX-OS Release 5.2(6b) to Release 6.2(1), LLDP command-line interface (CLI) commands are not available. LLDP continues to run and traffic flows normally after the ISSU.
This symptom might be seen when feature-set fcoe was enabled on the original image and the LLDP commands were working in the original image. Following the ISSU to the Cisco NX-OS Release 6.2(1) image, the commands are not available.
Workaround : Following the ISSU, enter the feature lldp command on the switch to make the LLDP commands available on the switch.
switch(config)# feature lldp
switch(config)# show lldp ?
Symptom : On the MDS 9513 switch, when an MSM-18/4 module boots up, it sends a request to the supervisor module to mount the modflash on the MSM-18/4 module. If the response times out or returns an error, the following syslog messages are displayed:
2011 Jul 14 01:18:13 sw-dc5-br2-12 %LC_MNT_MGR-SLOT3-2-LC_MNT_MGR_ERROR: SUP mount failed. MTS receive timedout
2011 Jul 14 01:19:06 sw-dc5-br2-12 %PROC_MGR-SLOT3-2-ERR_MSG: ERROR: PID 1144 (lc_mnt_mgr) exited abnormally, exit status (0xa)
2011 Jul 14 01:19:06 sw-dc5-br2-12 %MODULE-2-MOD_MINORSWFAIL: Module 3 (serial: JAE1141ZB43) reported a failure in service lc_mnt_mgr
This issue might be seen when the supervisor module is unusually busy and cannot process the mount request from the MSM-18/4 module, or the actual mount command on the supervisor takes a long time.
Workaround : Reload the MSM-18/4 module in the same slot/module where the modflash mount failed. A request will be sent to the supervisor to mount the modflash.
Symptom : If beaconing is configured on some ports, they might stop blinking after a supervisor switchover or module reload.
Condition: This might be seen following a supervisor switchover or module reload.
Workaround : None.
Symptom: After an In Service Software Upgrade (ISSU), In Service Software Downgrade (ISSD), or supervisor switchover, devices fail to FLOGI into the switch, and the following error is logged in the syslog:
%FLOGI-1-MSG_FLOGI_REJECT_FCID_ERROR after upgrade/switchover
Condition: This situation occurs if one or all of the following occur:
1. The Max flogi key is greater than 65535. The key can get this high if there are repeated FLOGIs on an interface. After the key exceeds 65535, this issue occurs. However, this situation does not impact end devices.
2. If a supervisor switchover, such as ISSU, ISSD, or system switchover occurs when the key is greater than 65535, Fibre Channel Identifiers (FC IDs) can be dropped from the FLOGI table. The end devices continue to function normally until they are logged out and then attempt to log in again.
3. After both 1 and 2 above have occurred, if an end device is rebooted on the affected interface, that end device might not be able to log back in.
Workaround: You must first resolve the issue with the device on the interface with the Max flogi key over 65535, such as FLOGI rejects or port security, to prevent the FLOGI key from incrementing.
If the Max flogi key value is greater than 65535 before any supervisor switchover, ISSU, or ISSD, use the shutdown command and then the no shutdown command on the interface. Consequently, the Max flogi key value must be checked before any supervisor switchover. However, if the supervisor switchover has already occurred and logins are failing, you must follow one of these steps:
– Contact Cisco TAC to implement a nondisruptive recovery. This requires special files not accessible to customers.
– Suspend the VSAN and wait for 5 minutes and then unsuspend the VSAN of the affected devices on the switch. This action is disruptive to all devices in that VSAN connected to this switch.
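The pre-switchover check described in this workaround can be sketched in Python. This is only an illustrative sketch: it assumes the Max flogi key values have already been collected per interface into a dictionary (how they are read from the switch is not covered here), and the function name and data shape are hypothetical.

```python
FLOGI_KEY_LIMIT = 65535  # threshold stated in this caveat


def interfaces_needing_flap(max_flogi_keys):
    """Return the interfaces whose Max flogi key exceeds the limit.

    max_flogi_keys is a hypothetical mapping of interface name to the
    Max flogi key value observed for that interface.
    """
    return sorted(
        name for name, key in max_flogi_keys.items() if key > FLOGI_KEY_LIMIT
    )


if __name__ == "__main__":
    sample = {"fc1/1": 120, "fc1/2": 65536, "fc2/5": 70000}
    for name in interfaces_needing_flap(sample):
        print(f"{name}: shutdown, then no shutdown, before any switchover")
```

An interface reported by this check would still need its underlying problem (for example, FLOGI rejects or port security) resolved first, as noted above.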
More Information: For detailed information about this issue, see the General Upgrading Guidelines.
Symptom: One or more of the following symptoms are observed:
– A zone member goes offline.
– A Registered State Change Notification (RSCN) is not sent to zone members.
– The device-mapping entry, that is, the port world wide name (pWWN) associated with a device alias, is not displayed in the Dynamic Port VSAN Membership (DPVM) database.
– The show running-configuration ivr command does not display the changes when a device alias member in an Inter-VSAN routing zone (IVR zone) is renamed.
– The port world wide name (pWWN) associated with the device alias is found dissociated in the port security database.
– A device alias member is not found in the port security database.
Condition: This situation occurs when one or more of the following conditions are met:
– A user attempts to perform a device-alias operation in batch, such as renaming an offline device alias to an existing online device alias or vice versa.
– A device alias was renamed.
– A device alias was deleted and an existing device alias is renamed to the deleted device alias in the same commit.
– A device alias, which is not configured, resides in the DPVM database and an online device is renamed to the former.
– The IVR distribute option is enabled and the device alias is in enhanced mode, and the changes to a device alias are not updated in the IVR running configuration.
Workaround: Add the offline member to the device alias database, revert to the previous name if you have renamed a device alias, and flap the ports that are connected to the affected zone member.
Symptom: Although the http-server feature is enabled by default, Element Manager cannot be downloaded.
Condition: This situation occurs on the Cisco MDS 9710 Director running MDS NX-OS Release 6.2(3) software.
Workaround: You must enable the HTTP server by using the feature http-server command.
Symptom : In a virtual SAN (VSAN), the inter-switch link (ISL) might fail after entering the suspend command, followed by the no suspend command.
Condition: This situation occurs if the no suspend command is entered after a VSAN suspend operation.
Workaround: After entering the suspend command, wait 5 to 15 minutes and then use the no suspend command.
Symptom: Traffic between two Cisco MDS 9250i switches might stop when write acceleration is enabled during a traffic flow.
Condition: This situation occurs if there are more than 11 tunnels.
Workaround: Disable the write acceleration feature on all of the tunnels or move all tunnels into a port channel.
Symptom: Several control protocols are impacted because of the FCoE data traffic congestion in the traffic flows passing through or originating or terminating on a Cisco MDS 9250i switch that runs the default 7e network-qos policy.
Condition: This is a known limitation with 7e policy. With the 7e template, all control and data FCoE traffic is sent to a single queue. When congestion in the network is present, in addition to data packets, control packets also are impacted, which results in timeouts and drops for several control protocols. Control protocols might display errors.
Workaround: Use the 6e template throughout the fabric so that control and data traffic are placed in different queues and do not impact each other.
Symptom: After a switch upgrade to NX-OS 6.2(1) or later, a previously working AAA authenticated user who is configured for non network-operator privileges (such as network-admin) only receives network-operator privileges. This user is no longer able to configure the switch via CLI or SNMP.
The CLI user will show as having 'network-operator' role:
switch# show user-account fieldsupport
If the SNMP user exists, it will show as having 'network-operator' role:
switch# show snmp user fieldsupport
User Auth Priv(enforce) Groups
____ ____ _____________ ______
fieldsupport md5 des(no) network-operator
Condition: This issue only affects logins that meet all of the following conditions:
1) are logins to MDS switches
2) are authenticated remotely via RADIUS
3) have multiple vendor-specific attributes (VSAs) defined as a single Cisco-AV Pair, for example, shell and SNMP version 3 settings:
shell:roles="operations-user fieldsupport" snmpv3:auth=SHA priv=AES-128
This issue does not occur if the 'shell:roles' VSA is defined alone (even with multiple roles assigned).
Workaround: On the AAA server, create a separate RADIUS policy for NX-OS 6.2(x) users that splits Cisco-AV Pairs into true attribute pairs. For example:
Cisco-AVPair #1: shell:roles="operations-user fieldsupport"
Cisco-AVPair #2: snmpv3:auth=SHA priv=AES-128
Assign this policy conditionally on the requesting RADIUS client IP address (that is, a Cisco MDS switch mgmt0 IP address). Continue to use the original policy with the old format for RADIUS authentication requests from switches running NX-OS earlier than 6.2(1).
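The split described in this workaround can be illustrated with a short Python sketch. It is not part of any RADIUS server or NX-OS tooling; the function name is hypothetical, and the sketch assumes that each attribute begins with a `protocol:name=` token preceded by whitespace, as in the example string from this caveat.

```python
import re

# Assumption: a new attribute starts at a whitespace-delimited token of the
# form "protocol:name=", such as "shell:roles=" or "snmpv3:auth=".
ATTR_START = re.compile(r"(?<!\S)[\w-]+:[\w-]+=")


def split_av_pair(combined):
    """Split one combined Cisco-AV Pair string into separate attribute pairs."""
    starts = [m.start() for m in ATTR_START.finditer(combined)]
    if not starts:
        return [combined.strip()]
    bounds = starts + [len(combined)]
    return [combined[a:b].strip() for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    old_format = 'shell:roles="operations-user fieldsupport" snmpv3:auth=SHA priv=AES-128'
    for index, pair in enumerate(split_av_pair(old_format), start=1):
        print(f"Cisco-AVPair #{index}: {pair}")
```

Note that `priv=AES-128` stays with the `snmpv3:` attribute because it has no `protocol:` prefix of its own.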
If the RADIUS server does not support conditional assignment of policies by RADIUS client IP address then an alternate method is possible. Create a local user on the switch with local role assignment which will override the remotely supplied role using the following commands:
switch(config)# no username <userid>
switch(config)# username <userid> password ! role fieldsupport
Further Problem Description: This issue was introduced in NX-OS release 6.2(1) due to changes to make RADIUS VSA handling consistent across NX-OS platforms.
Symptom: On the Cisco MDS 9250i switch, if the Transmission Control Protocol (TCP) connections are set to 5, tape acceleration cannot be enabled.
Condition: The tcp-connections command is used to set the TCP connection to 5.
Workaround: Set TCP connections to 2.
Symptom: Inserting and removing an SFP in quick succession might cause the read operation to fail before completion with NACK errors, and the searching for supported speeds fails. This situation might prevent a port from coming up and the following error appears:
speed not supported by transceiver
Condition: This situation occurs with specific SFPs during removal and reinsertion.
Workaround: Avoid quickly removing and inserting an SFP. After removing an SFP, wait for a few seconds before reinserting it.
Symptom: The Switched Port Analyzer (SPAN) feature does not work after a member of a port channel is removed.
Condition: This situation occurs only when the monitor session source is a port channel.
Workaround: Restore the member in the port channel and then create a monitor session and add the port channel.
Symptom: An In-Service Software Upgrade (ISSU) or In-Service Software Downgrade (ISSD) of multiple Cisco MDS 9250i switches that are connected to one another might cause the VE/E links between them to go down.
Condition: This situation occurs if an ISSU or ISSD is simultaneously performed on Cisco MDS 9250i switches that are interconnected.
Workaround: Perform the ISSU or ISSD on one switch at a time. For example, after the upgrade or downgrade completes on one switch, move on to the next.
Symptom: On the Cisco MDS 9250i switch that is connected to a Cisco Nexus 5000 switch for virtual Fibre Channel links, an In-Service Software Upgrade (ISSU) operation becomes a disruptive process.
Condition: This situation occurs if the Cisco Nexus 5000 switch in N port virtualization (NPV) mode is connected to the Cisco MDS 9250i switch and the host, the target, or both are connected to the Cisco Nexus 5000 switch.
Workaround: Before an ISSU, set the FCoE fka-adv-period on the MDS 9250i switch to the maximum.
Symptom: If there is a Switched Port Analyzer (SPAN) session to inband and the switch is reloaded, the SPAN session becomes inactive, and packets are not spanned to the SPAN destination.
Condition: A Cisco MDS switch is reloaded when a SPAN session exists from an inband port.
Workaround: Delete the SPAN session and add the session again.
Symptom: During an In-Service Software Upgrade (ISSU) the Cisco MDS 9250i switch displays the following errors:
2013 Oct 10 14:09:07 sw234-9250i %ACLTCAM-2-ACL_TCAM_PHY_TCAM_WRITE_FAILED: Value write to hardware TCAM failed(ASIC: 1, Input TCAM, Address: 775, Num Entries: 1, Error: Broken pipe).
2013 Oct 10 14:09:07 sw234-9250i %ACLTCAM-2-ACL_TCAM_PHY_TCAM_WRITE_FAILED: Value write to hardware TCAM failed(ASIC: 2, Input TCAM, Address: 609, Num Entries: 1, Error: Broken pipe).
Condition: The Cisco MDS 9250i switch has 200 I/O Accelerator (IOA) disk flows.
More information: The Cisco MDS 9250i switch supports up to 180 disk flows.
Symptom: The Cisco MDS 9250i switch displays a powered-off power supply module as redundant.
Condition: Two power supply modules are operational and one is powered off.
Workaround: Ensure that all three supply modules are operational.
Symptom: For a Cisco MDS 9250i switch, an FCoE Ethernet Interface Monitor window displays Rx and Tx as 0.
Condition: This situation occurs when the interval for monitoring the Ethernet interface is set to 5 seconds in Data Center Network Manager (DCNM).
Workaround: Set the interval for monitoring an Ethernet interface to 10 seconds or higher.
Symptom: On a Cisco MDS 9000 switch, when a fan fails, the board LED and fan LED turn red.
Condition: This situation occurs when one of the fans fails.
More information: If a fan LED and the board LED are red, it is a fan failure. If only the board LED is red, it indicates that shutdown is in progress because of an over temperature condition.
Symptom: An operational FCIP tunnel resets randomly. Syslog messages such as the following are seen:
%ETHPORT-5-IF_DOWN_SOFTWARE_FAILURE: Interface IPStorage1/2 is down (Port software failure)
%PORT-5-IF_DOWN_SOFTWARE_FAILURE: %$VSAN 1%$ Interface fcip2 is down (Port software failure)
Condition: This issue occurs only on the Cisco MDS 9250i switch if IP compression is enabled on an FCIP tunnel and data is being sent from the port. All IP compression modes are affected.
Workaround: Disable IP compression on both ends of the affected FCIP link.
Symptom : A Bad IPv6 host address error appears when the snmp-server hostname is configured instead of the IP address. This issue occurs when the domain name and the nameserver IP address are configured.
Workaround : Configure the IP addresses instead of the hostname.
Symptom : The Cisco MDS 9710 Director does not allow the running configuration to be saved or the switch to be reloaded.
Condition: Active Fibre Channel Redirect (FC-Redirect) configurations are present in Cisco MDS 9710 Director.
Workaround : Remove the Cisco MDS 9710 Director from the fabric.
Symptom: The Cisco MDS 9250i switch incorrectly displays that the Ternary Content Addressable Memory (TCAM) is full even if enough memory is available.
Condition: This situation occurs when the IOA flows go into the security region and some TCAM entries are deleted and added.
Symptom : The security service crashes when configuring an SSH authentication key.
Configuring SSH keys multiple times within 10 minutes results in a HAP reset that resets the active supervisor.
Condition : This issue intermittently occurs when configuring an SSH authentication key.
Workaround : To avoid the supervisor reset, do not configure more than 2 SSH keys per 10 minutes.
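For scripted provisioning, the limit in this workaround (no more than 2 SSH keys per 10 minutes) could be enforced with a simple sliding-window guard. The following Python sketch is illustrative only; the class name is hypothetical and the caller supplies timestamps in seconds.

```python
from collections import deque


class KeyConfigThrottle:
    """Allow at most max_events operations in any window_seconds span."""

    def __init__(self, max_events=2, window_seconds=600):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._times = deque()

    def allow(self, now):
        """Return True and record the event if it fits within the limit."""
        # Discard events that have aged out of the window.
        while self._times and now - self._times[0] >= self.window_seconds:
            self._times.popleft()
        if len(self._times) < self.max_events:
            self._times.append(now)
            return True
        return False


if __name__ == "__main__":
    throttle = KeyConfigThrottle()
    for t in (0, 10, 20, 600):
        print(t, "allowed" if throttle.allow(t) else "wait before configuring")
```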
Symptom : An egress FCoE interface logs output discards during congestion even though pause frames are sent upstream on the ingress interface. Pause frames received on the egress interface do not prevent the output discards.
Affected ingress interfaces can be identified when the 'ENABLED' field is 1 in the output of the following module-level command:
show hardware internal qengine inst inst-num table vq_voq_td
where inst-num is the quotient of (port number - 1) / 4. For example, to check whether Ethernet1/1 is affected, use “slot 1” and “inst 0” as arguments to the above command:
switch# slot 1 show hardware internal qengine inst 0 table vq_voq_td | include "port|ENABLE|^0"
INDEX QUEUE PKT TYPE VL THRESHOLD ENABLE
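The inst-num arithmetic above can be sketched in Python. This is a minimal sketch, with hypothetical function names, that applies the integer quotient of (port number - 1) / 4 and builds the module-level command shown in this caveat.

```python
def qengine_inst(port_number, ports_per_inst=4):
    """Return inst-num: the integer quotient of (port number - 1) / 4."""
    if port_number < 1:
        raise ValueError("port numbers start at 1")
    return (port_number - 1) // ports_per_inst


def qengine_command(slot, port_number):
    """Build the module-level command for the given slot and port."""
    inst = qengine_inst(port_number)
    return f"slot {slot} show hardware internal qengine inst {inst} table vq_voq_td"


if __name__ == "__main__":
    # Ethernet1/1 is slot 1, port 1, which maps to inst 0.
    print(qengine_command(1, 1))
    # Ports 1-4 share inst 0; port 5 is the first port of inst 1.
    print(qengine_inst(4), qengine_inst(5))
```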
Condition : This issue only applies to interfaces with a “no drop” CoS, that is, FCoE interfaces. An interface will be affected by this issue only after a supervisor switchover (this includes ISSU/ISSD switchovers) and then the interface flaps for any reason (this includes moving the interface into a port channel).
For Nexus 7000/7700 switches, the first affected release is Cisco NX-OS Release 6.2(2).
For MDS 9500/9700 switches, the first affected release is Cisco MDS NX-OS release 6.2(7).
Workaround : To nondisruptively restore the “no drop” functionality, set the priority flow control to “on” and back to “auto” for each affected ingress interface. If the interface is a member of a port channel then the change should be done at the port channel interface level. For example:
switch(config-if)# interface port-channel 1
switch(config-if)# priority-flow-control mode on
switch(config-if)# priority-flow-control mode auto
The above workaround can only be applied to interfaces which are up. This will restore the potency of pause frames on the Ethernet interfaces. However, further port flaps will cause the issue to recur on the interface.
Further Problem Description: By default, FCoE traffic is in the no-drop class and can be affected by this issue. Also, congestion is usually found in networks designed to be oversubscribed or when slow drain devices are present in a network. To recover permanently and nondisruptively, follow these steps:
1. Apply the priority-flow-control mode on to all affected interfaces
2. Upgrade the system to a fixed version of NX-OS
3. Apply the priority-flow-control mode auto to all the previously affected interfaces.
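When many interfaces are affected, the configuration lines for steps 1 and 3 can be generated rather than typed by hand. The Python sketch below only builds the text of the commands shown in this caveat; the function name is hypothetical, the list of affected interfaces must be supplied by the operator, and the sketch does not connect to a switch.

```python
def pfc_mode_commands(interfaces, mode):
    """Build configuration lines that set priority flow control on each interface.

    interfaces: names as they appear after the "interface" keyword,
                for example "port-channel 1".
    mode: "on" (step 1) or "auto" (step 3).
    """
    if mode not in ("on", "auto"):
        raise ValueError("mode must be 'on' or 'auto'")
    lines = []
    for name in interfaces:
        lines.append(f"interface {name}")
        lines.append(f"priority-flow-control mode {mode}")
    return lines


if __name__ == "__main__":
    affected = ["port-channel 1"]  # example value taken from this caveat
    print("\n".join(pfc_mode_commands(affected, "on")))    # step 1, before the upgrade
    print("\n".join(pfc_mode_commands(affected, "auto")))  # step 3, after the upgrade
```

As noted in the workaround, these commands apply only to interfaces that are up, and port channel members should be changed at the port channel interface level.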
Symptom : Users remotely authenticated by RADIUS or TACACS+ cannot log in to the system after an ISSU. Also, the aaa group configuration has a deadtime greater than the maximum of 1440 minutes, for example:
switch# show running-config tacacs+
aaa group server tacacs+ TACACS_GROUP
Conditions : This issue only occurs for RADIUS or TACACS+ server groups.
Workaround : To recover after this issue has occurred:
1. Use a local account to log in, then reestablish a connection to the aaa servers with one of the following commands:
test aaa server tacacs+ a.b.c.d
test aaa server radius a.b.c.d
This must be done for all server addresses in the affected group.
2. Reconfigure the deadtime of the server group to a value within the range of 0 to 1440. After the deadtime is within range, it can be removed with the no deadtime command.
To prevent this issue before an upgrade, initialise the deadtime and save the config, then remove it and save the config again. For example, for TACACS+:
(config)# tacacs-server deadtime 1
(config)# aaa group server tacacs tacacsgroup
(config)# no tacacs-server deadtime 1
(config)# aaa group server tacacs tacacsgroup
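The 0 to 1440 minute deadtime range stated in this workaround can be checked before or after an upgrade with a small helper. This Python sketch is illustrative only: it assumes the deadtime values have already been extracted per server group into a dictionary, and the names used are hypothetical.

```python
DEADTIME_MIN = 0     # minutes, lower bound stated in this caveat
DEADTIME_MAX = 1440  # minutes, upper bound stated in this caveat


def deadtime_in_range(minutes):
    """Return True if the deadtime is within the supported range."""
    return DEADTIME_MIN <= minutes <= DEADTIME_MAX


def groups_out_of_range(group_deadtimes):
    """Return the server groups whose deadtime is outside the supported range.

    group_deadtimes is a hypothetical mapping of group name to deadtime
    in minutes.
    """
    return sorted(
        name for name, minutes in group_deadtimes.items()
        if not deadtime_in_range(minutes)
    )


if __name__ == "__main__":
    sample = {"TACACS_GROUP": 5000, "RADIUS_GROUP": 10}
    print(groups_out_of_range(sample))
```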
Symptom : The RSCN or ZONE service crashes with the following syslog message:
%SYSMGR-2-SERVICE_CRASHED: Service "rscn" (PID 5405) hasn't caught signal 11 (core will be saved).
%SYSMGR-2-SERVICE_CRASHED: Service "zone" (PID 5430) hasn't caught signal 6 (core will be saved)
A Cisco MDS 9700 switch can incur a switchover. However, in most cases the crash occurs again before the standby supervisor is available, and the dual-supervisor switch reloads.
----- reset reason for Supervisor-module 5 (from Supervisor in slot 5) ---
1) At 161169 usecs after Thu Dec dd hh:mm:ss 2014
Reason: Reset triggered due to HA policy of Reset
----- reset reason for Supervisor-module 6 (from Supervisor in slot 6) ---
1) At 422003 usecs after Thu Dec dd hh:mm:ss 2014
Reason: Reset triggered due to HA policy of Reset
Condition : This issue occurs only when "port" format RSCNs are configured and an RSCN is sent on the relevant VSAN. RSCNs are sent, for example, after activating zoneset changes or a link changing state. Further, only the following platforms are affected:
Cisco MDS 9710 Switch
Cisco MDS 9706 Switch
Cisco Nexus 7000 Switch
Cisco Nexus 7710 Switch
This issue does not occur when RSCNs are sent with "fabric" format.
Workaround : Use the default RSCN address format by removing the following lines from the switch configuration:
no zone rscn address-format port vsan
Note that some end devices may not support receiving RSCNs in this format.
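To find out whether a saved configuration uses the "port" RSCN format, the configuration text can be scanned for the relevant lines. The Python sketch below is illustrative only; the function name is hypothetical, and it assumes each configured line has the form `zone rscn address-format port vsan` followed by a VSAN identifier, which this caveat does not show explicitly.

```python
import re

# Assumption: the configured line ends with a VSAN identifier after "vsan".
PORT_FORMAT_RE = re.compile(r"^\s*zone rscn address-format port vsan\s+(\S+)\s*$")


def vsans_with_port_format_rscn(config_text):
    """Return the VSAN identifiers configured for 'port' format RSCNs."""
    vsans = []
    for line in config_text.splitlines():
        match = PORT_FORMAT_RE.match(line)
        if match:
            vsans.append(match.group(1))
    return vsans


if __name__ == "__main__":
    sample = "\n".join([
        "vsan database",
        "zone rscn address-format port vsan 10",
        "zone rscn address-format port vsan 20",
    ])
    for vsan in vsans_with_port_format_rscn(sample):
        print(f"no zone rscn address-format port vsan {vsan}")
```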
Further Problem Description: The zone server constructs incorrect data and can corrupt its own heap while creating the payload to put into MTS.
The crash can occur in either the zone server or RSCN, depending on which module runs into the issue first. The fix prevents both crashes.