Symptom: The end device does not have a valid fabric login even though the switch sends a FLOGI accept to it.
Condition: This issue only occurs on MDS 9700 switches on ports where:
1. end devices connect at 16 Gbps, and
2. DPVM is enabled on the port, and
3. the port VSAN and the dynamic VSAN are different.
Symptom: The Storage Media Encryption (SME) cluster creation fails and the following error message appears:
Log file is:\Documents and Settings\username\ciscosmartcardapplet.log
[Wed Mar 21 15:50:40 IST 2012]:Thread[thread applet-com.cisco.dcbu.web.client.sme.applet.SmartcardApplet-1,4,https://172.16.1.1/sme/jsp/-threadGroup]:Initializing...
[Wed Mar 21 15:50:42 IST 2012]:Thread[thread applet-com.cisco.dcbu.web.client.sme.applet.SmartcardApplet-1,4,https://172.16.1.1/sme/jsp/-threadGroup]:Module is
Module Name: gclib.dll
[Wed Mar 21 15:50:42 IST 2012]:Thread[Thread-11,4,https://172.16.1.1/sme/jsp/-threadGroup]:
[Wed Mar 21 15:50:42 IST 2012]:Thread[Thread-11,4,https://172.16.1.1/sme/jsp/-threadGroup]:Looking for smartcard reader...
[Wed Mar 21 15:50:51 IST 2012]:Thread[Thread-11,4,https://172.16.1.1/sme/jsp/-threadGroup]:Exception reading smartcard:CKR_OPERATION_NOT_INITIALIZED
[Wed Mar 21 15:50:51 IST 2012]:Thread[Thread-11,4,https://172.16.1.1/sme/jsp/-threadGroup]:<fieldset class="cuesStatusBoxError"
Condition: This issue occurs if you create an SME cluster with the recovery option set to 2 of 3 by using Mozilla Firefox 11.0 or Microsoft Internet Explorer 8 on the Microsoft Windows XP operating system, and then rekey the master key.
Symptom: When an In-Service Software Upgrade (ISSU) is performed from Cisco NX-OS Release 5.2(6b) to Release 6.2(1), LLDP command-line interface (CLI) commands are not available. LLDP continues to run and traffic flows normally after the ISSU; only the CLI commands are missing.
This symptom might be seen when feature-set fcoe was enabled and the LLDP commands were working on the original image. After the ISSU to the Cisco NX-OS Release 6.2(1) image, the commands are not available.
Workaround: After the ISSU, enter the feature lldp command to make the LLDP commands available on the switch.
switch(config)# feature lldp
switch(config)# show lldp ?
Symptom: Address Resolution Protocol (ARP) process fails for IP over Fibre Channel (IPFC).
Condition: This issue occurs when the Cisco MDS 9000 switches are connected through virtual Fibre Channel (vFC) interfaces.
Symptom: The Switched Port Analyzer (SPAN) functionality does not work after adding a VSAN filter and then removing the filter.
Condition: This situation occurs when the source is an F_port and the destination is a local SD_port.
Symptom: On the MDS 9513 switch, when an MSM-18/4 module boots up, it sends a request to the supervisor module to mount the modflash on the MSM-18/4 module. If the request times out or returns an error, the following syslog messages appear:
sw-dc5-br2-12 %LC_MNT_MGR-SLOT3-2-LC_MNT_MGR_ERROR: SUP mount failed. MTS receive timedout
sw-dc5-br2-12 %PROC_MGR-SLOT3-2-ERR_MSG: ERROR: PID 1144 (lc_mnt_mgr) exited abnormally, exit status (0xa)
sw-dc5-br2-12 %MODULE-2-MOD_MINORSWFAIL: Module 3 (serial: JAE1141ZB43) reported a failure in service lc_mnt_mgr
This issue might be seen when the supervisor module is too busy to process the mount request from the MSM-18/4 module, or when the mount operation on the supervisor takes a long time.
Workaround: Reload the MSM-18/4 module in the slot where the modflash mount failed. When the module boots up, it sends a new request to the supervisor to mount the modflash.
Symptom: If beaconing is configured on some ports, LEDs might stop blinking after a supervisor switchover or module reload.
Condition: This might be seen following a supervisor switchover or a module reload.
Workaround: None.
Symptom: In a virtual SAN (VSAN), the inter-switch link (ISL) might fail after the suspend command is entered, followed by the no suspend command.
Condition: This situation occurs if the no suspend command is entered after a VSAN suspend operation.
Workaround: After entering the suspend command, wait 5 to 15 minutes before entering the no suspend command.
Symptom: Traffic between two Cisco MDS 9250i switches might stop when the write acceleration feature is enabled during traffic flow.
Condition: This situation occurs if there are more than 11 tunnels.
Workaround: Disable the write acceleration feature on all of the tunnels or move all tunnels to a PortChannel.
Symptom: Several control protocols are impacted by FCoE data traffic congestion in flows that pass through, originate on, or terminate on a Cisco MDS 9250i switch running the default 7e network-qos policy.
Condition: This is a known limitation of the 7e policy. With the 7e template, all FCoE control and data traffic is sent to a single queue. When the network is congested, control packets are affected along with data packets, which results in timeouts and drops for several control protocols. Control protocols might display errors.
Workaround: Use the 6e template throughout the fabric so that the control and data traffic are placed in different queues and do not impact each other.
Symptom: Tape acceleration cannot be enabled on the Cisco MDS 9250i switch if the number of Transmission Control Protocol (TCP) connections is set to 5.
Condition: The tcp-connections command is used to set the number of TCP connections to 5.
Workaround: Set the number of TCP connections to 2.
Symptom: Inserting and removing an SFP in quick succession might cause the SFP read operation to fail with NACK errors before it completes, so the search for supported speeds fails. This situation might prevent the port from coming up, and the following error appears:
speed not supported by transceiver
Condition: This situation occurs with specific SFPs during removal and reinsertion.
Workaround: Avoid quickly removing and inserting an SFP. After removing an SFP, wait a few seconds before reinserting it.
Symptom: An In-Service Software Upgrade (ISSU) or In-Service Software Downgrade (ISSD) of multiple interconnected Cisco MDS 9250i switches might cause the VE/E links between them to go down.
Condition: This situation occurs if an ISSU or ISSD is performed simultaneously on interconnected Cisco MDS 9250i switches.
Workaround: Perform the ISSU or ISSD on one switch at a time. For example, after the upgrade or downgrade completes on one switch, move on to the next.
Symptom: During an In-Service Software Upgrade (ISSU), the Cisco MDS 9250i switch displays the following errors:
2013 Oct 10 14:09:07 sw234-9250i %ACLTCAM-2-ACL_TCAM_PHY_TCAM_WRITE_FAILED: Value write to hardware TCAM failed(ASIC: 1, Input TCAM, Address: 775, Num Entries: 1, Error: Broken pipe).
2013 Oct 10 14:09:07 sw234-9250i %ACLTCAM-2-ACL_TCAM_PHY_TCAM_WRITE_FAILED: Value write to hardware TCAM failed(ASIC: 2, Input TCAM, Address: 609, Num Entries: 1, Error: Broken pipe).
Condition: The Cisco MDS 9250i switch has 200 I/O Accelerator (IOA) disk flows.
More information: The Cisco MDS 9250i switch supports up to 180 disk flows.
Symptom: The Cisco MDS 9250i switch displays a powered-off power supply module as redundant.
Condition: Two power supply modules are operational and one is powered off.
Workaround: Ensure that all three power supply modules are operational.
Symptom: The Cisco MDS 9710 Director does not allow the copy running-config startup-config operation or a switch reload.
Condition: Active Fibre Channel Redirect (FC-Redirect) configurations are present in the Cisco MDS 9710 Director.
Workaround: Remove the Cisco MDS 9710 Director from the fabric.
Symptom: If Fibre Channel and Fibre Channel over Ethernet (FCoE) links between two switches share the same equal-cost multipath (ECMP) entries, traffic does not flow.
Workaround: VLAN-VSAN mapping should be numerically identical when ECMPs exist with a mix of FC and FCoE paths. For example, map VSAN 10 to VLAN 10, VSAN 11 to VLAN 11, and so on.
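Before mixing FC and FCoE paths, the numbering rule can be checked with a short script. The following is a minimal Python sketch, assuming you have already collected the FCoE VLAN-to-VSAN mappings into a dict by some other means; the input format and the function name are hypothetical.

```python
def non_identical_mappings(vlan_to_vsan):
    """Return (VLAN, VSAN) pairs whose numbers differ, sorted by VLAN.

    vlan_to_vsan: dict of FCoE VLAN ID -> VSAN ID (hypothetical input format).
    """
    return sorted(
        (vlan, vsan) for vlan, vsan in vlan_to_vsan.items() if vlan != vsan
    )


# VSAN 20 is mapped to VLAN 200, which breaks the identical-numbering rule.
mapping = {10: 10, 11: 11, 200: 20}
for vlan, vsan in non_identical_mappings(mapping):
    print(f"VLAN {vlan} -> VSAN {vsan}: numbers differ")
```

An empty result means every mapping uses the same number for the VLAN and the VSAN.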
Symptom: Zone set activation, a zone mode change, or a similar change operation fails in the Stage Fabric Configuration (SFC) stage with the following error:
Condition: This occurs when you configure a member of type symbolic-nodename with a name longer than 240 characters and then attempt to activate the zone set or perform a zone mode change.
Workaround: Configure zone members of type symbolic-nodename with names of 240 characters or fewer.
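Symbolic node names can be checked against the 240-character limit before activation. This is a minimal Python sketch under the assumption that the candidate names are available as a list of strings; the constant and function names are hypothetical.

```python
MAX_SYMBOLIC_NODENAME_LEN = 240  # longer names trigger the SFC failure described above


def oversized_nodenames(names):
    """Return the symbolic node names that exceed the 240-character limit."""
    return [name for name in names if len(name) > MAX_SYMBOLIC_NODENAME_LEN]


candidates = ["host-a", "x" * 241]
for name in oversized_nodenames(candidates):
    print(f"{len(name)}-character name must be shortened before activation")
```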
Symptom: The following priority flow control (PFC) incompatibility warnings are logged to syslog even though there are no compatibility issues between the FCoE peers on the specified links:
%ETH-QOS-2-QOSMGR_DCBXP_PFC_CMP_FAIL_MSG: Ethernet1/37 - qos config 'Priority-flow-control' not compatible with the peer
Condition: These messages are incorrectly logged for FCoE interfaces during an In-Service Software Upgrade (ISSU) or In-Service Software Downgrade (ISSD), even though there are no compatibility issues.
More Information: This is a cosmetic issue.
Symptom: On a Cisco MDS 9500 Series switch with dual supervisors, the ilc_helper process crashes after a switchover to the standby supervisor.
Condition: This situation occurs during Data Mobility Manager (DMM) operations after a switchover.
Workaround: Purge the module, reenable the Storage Service Interface (SSI) software, and configure DMM again.
Symptom: The RewriteEngineLoopback test fails and becomes error-disabled on the Cisco MDS 9710 Director with the 48-Port 10-Gigabit FCoE module.
Condition: An In-Service Software Upgrade (ISSU) is performed from Cisco NX-OS Release 6.2(1), 6.2(3), or 6.2(5) to Release 6.2(7), or sequentially from 6.2(1) to 6.2(3) and then to 6.2(7).
Workaround: If there are multiple fabric modules in the chassis, reload the fabric modules one after the other. For example, reload fabric module 1, wait for the module to finish reloading, wait another 2 to 3 minutes, and then reload the next fabric module. Do not reload all fabric modules at the same time. After all fabric modules have been reloaded, clear the diagnostic results by using the diagnostic clear result module command.
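If this sequence is automated, the key property is that only one fabric module is reloading at any time. The following Python sketch illustrates that ordering only; reload_module and is_online are hypothetical callbacks standing in for however you issue the reload and check module status, and are not a Cisco API. The 150-second settle time is an assumed midpoint of the 2 to 3 minute wait.

```python
import time


def rolling_reload(modules, reload_module, is_online,
                   settle_s=150.0, poll_s=10.0, sleep=time.sleep):
    """Reload fabric modules strictly one at a time.

    reload_module(mod) and is_online(mod) are hypothetical callbacks supplied by
    the caller; they are not part of any Cisco API.
    """
    for mod in modules:
        reload_module(mod)           # reload exactly one fabric module
        while not is_online(mod):    # wait until this module finishes reloading
            sleep(poll_s)
        sleep(settle_s)              # extra 2-3 minute settle time (150 s assumed)


# Dry run with stand-in callbacks that only log what would happen.
if __name__ == "__main__":
    rolling_reload(
        [1, 2],
        reload_module=lambda m: print(f"reload fabric module {m}"),
        is_online=lambda m: True,
        sleep=lambda s: print(f"wait {s:.0f} s"),
    )
```

The diagnostic results still need to be cleared with the diagnostic clear result module command once the loop has finished.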
Symptom: The Cisco MDS 9250i switch incorrectly reports that the Ternary Content Addressable Memory (TCAM) is full even when enough memory is available.
Condition: This situation occurs if the IOA flows go into the security region and some TCAM entries are deleted or added.
Symptom: Ethernet SPAN does not capture the egress traffic from a Fibre Channel node to the FCoE nodes.
Condition: This situation occurs when Ethernet SPAN is configured on a Fibre Channel node of an MDS 9000 Series switch and the corresponding FCoE node is receiving traffic from the Fibre Channel node.
Symptom: After the device alias configuration is changed, the I/O Accelerator (IOA) flows remain unchanged until the zone set is reactivated.
Condition: This situation occurs during a device-alias add, delete, or rename operation.
Symptom: Packets are dropped on egress of the Cisco MDS 48-Port 10-Gigabit Fibre Channel over Ethernet module because of the congestion timeout check.
Condition: If the congestion timeout value is set to 100 ms for edge ports on the Cisco MDS 9710 Director with the Cisco MDS 48-Port 10-Gigabit Fibre Channel over Ethernet module, it causes the FCoE ISL to flap continuously.
Workaround: Set the timeout value to 500 ms.
More Information: Only the 500 ms timeout value is supported.
Symptom: Fibre Channel trace does not work on the Cisco MDS 48-Port 10-Gigabit Fibre Channel over Ethernet (FCoE) module.
Workaround: Use Path Trace to trace the path between two domains or a domain and an end device.
Symptom: The server interface connected to the N_port virtualization (NPV) switch might flap if 5 or more FDISC rejects are sent from the core switch within 10 seconds.
Condition: This occurs if the core switch rejects many NPV FDISC logins within a short period, or if the server sends a burst of logins that are rejected.
Workaround: Make sure that no rejects occur for NPV FDISC logins on a server interface that is connected to the NPV switch.
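To spot the burst pattern described above (5 or more FDISC rejects within 10 seconds), a monitoring script could count rejects in a sliding window. This Python sketch assumes you already have reject timestamps in seconds for one server interface from some external source, such as parsed syslog; the class name and that input are hypothetical.

```python
from collections import deque


class FdiscRejectWindow:
    """Count FDISC rejects for one server interface in a sliding time window."""

    def __init__(self, threshold=5, window_s=10.0):
        self.threshold = threshold
        self.window_s = window_s
        self._times = deque()  # reject timestamps in seconds, oldest first

    def record_reject(self, t):
        """Record a reject at time t; return True once the flap threshold is reached."""
        self._times.append(t)
        # Forget rejects that fall outside the window ending at t.
        while self._times and t - self._times[0] >= self.window_s:
            self._times.popleft()
        return len(self._times) >= self.threshold


window = FdiscRejectWindow()
for t in (0.0, 1.5, 2.0, 4.0, 6.5):
    if window.record_reject(t):
        print(f"t={t}s: 5 rejects within 10 s, interface at risk of flapping")
```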
Symptom: During high CPU utilization, any access to the bootflash or CompactFlash might cause USDs to crash, which can result in a switch reload.
Condition: This situation occurs in scaled configurations or during high CPU utilization, when a copy to bootflash or an access to CompactFlash causes a heartbeat failure.
Workaround: Check the CPU utilization before accessing the bootflash, and access it only when utilization is low.
Further Problem Description: Copying an image to bootflash generates many IDE interrupts. While the CPU services these interrupts, it does not yield to other USDs, so they cannot communicate with the watchdog, which causes a USD to crash.
Symptom: On the Cisco MDS 9710 Director with the 48-Port 10-Gigabit Fibre Channel over Ethernet module, packets are dropped when the FCoE port generates pause frames during congestion.
Condition: This occurs when the Cisco MDS switches are located far apart and congestion occurs.
Symptom: On the Cisco MDS 48-Port 10-Gigabit FCoE module, ECMP fails with multiple FCoE ISLs.
Condition: This situation occurs if one of these conditions is true:
– Two switches with multiple FCoE ISLs are connected. For example, the host is connected to one switch and the target on another switch.
– Multiple IVR flows are configured for any of these host-target combinations:
FC host to FCoE target, FC host to FC target, and FCoE host to FC target
Symptom: The normal operation of Data Mobility Manager (DMM) might be affected as follows:
– DMM Initiator-Target (IT) flows for migration jobs cannot be configured.
– DMM IT flows might encounter I/O traffic issues if either the initiator or the target is physically moved to another switch.
Condition: This situation occurs if at least one of the switches in the fabric has the FCNS bulk notification feature enabled, or if the DMM-specific initiator or target is moved to a switch that has the FCNS bulk notification feature enabled.
Workaround: Disable the FCNS bulk notification feature on the appropriate switches.
Symptom: After a switch is upgraded to NX-OS 6.2(1) or later, a previously working AAA-authenticated user who is configured with privileges other than network-operator (such as network-admin) receives only network-operator privileges. This user can no longer configure the switch through the CLI or SNMP.
The CLI user is shown with the 'network-operator' role:
switch# show user-account fieldsupport
If the SNMP user exists, it is also shown with the 'network-operator' role:
switch# show snmp user fieldsupport
User Auth Priv(enforce) Groups
____ ____ _____________ ______
fieldsupport md5 des(no) network-operator
Condition: This issue only affects logins that meet all of the following conditions:
1) are logins to MDS switches
2) are authenticated remotely via RADIUS
3) have multiple vendor-specific attributes (VSAs) defined as a single Cisco-AV Pair, for example, shell and SNMP version 3 settings:
shell:roles="operations-user fieldsupport" snmpv3:auth=SHA priv=AES-128
This issue does not occur if the 'shell:roles' VSA is defined alone (even with multiple roles assigned).
Workaround: On the AAA server, create a separate RADIUS policy for NX-OS 6.2(x) users that splits Cisco-AV Pairs into true attribute pairs. For example:
Cisco-AVPair #1: shell:roles="operations-user fieldsupport"
Cisco-AVPair #2: snmpv3:auth=SHA priv=AES-128
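If many policies must be converted, the split can be scripted. The following is a minimal Python sketch, assuming each combined Cisco-AV Pair value has the form shown above and that a new attribute starts wherever whitespace is followed by a prefix such as snmpv3:. The function name is hypothetical, and whitespace inside a quoted value that happens to precede a word ending in a colon would be mis-split.

```python
import re

# A new attribute starts where whitespace is followed by a "prefix:" token
# such as "snmpv3:". Whitespace inside quoted values is not specially handled.
_ATTR_BOUNDARY = re.compile(r"\s+(?=[A-Za-z0-9]+:)")


def split_av_pair(value):
    """Split one combined Cisco-AV Pair value into separate attribute strings."""
    return _ATTR_BOUNDARY.split(value.strip())


combined = 'shell:roles="operations-user fieldsupport" snmpv3:auth=SHA priv=AES-128'
for i, attr in enumerate(split_av_pair(combined), start=1):
    print(f"Cisco-AVPair #{i}: {attr}")
```

Each resulting string would then be entered as its own Cisco-AVPair on the AAA server.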
Assign this policy conditionally on the requesting RADIUS client IP address (that is, a Cisco MDS switch mgmt0 IP address). Continue to use the original policy with the old format for RADIUS authentication requests from switches running NX-OS earlier than 6.2(1).
If the RADIUS server does not support assigning policies conditionally by RADIUS client IP address, an alternate method is available: create a local user on the switch with a locally assigned role, which overrides the remotely supplied role, by using the following commands:
switch(config)# no username <userid>
switch(config)# username <userid> password ! role fieldsupport
Further Problem Description: This issue was introduced in NX-OS release 6.2(1) due to changes to make RADIUS VSA handling consistent across NX-OS platforms.
Symptom: If a nondefault logging level is set for LLDP and a switchover occurs, the running configuration no longer shows the logging level lldp command (for example, logging level lldp 5).
Condition: This situation occurs when a nondefault LLDP logging level is set with the logging level lldp command and a switchover then occurs.
Symptom: One or more of the following symptoms are observed:
– A zone member goes offline.
– A Registered State Change Notification (RSCN) is not sent to zone members.
– The device-mapping entry (the port world wide name [pWWN] associated with a device alias) is not displayed in the Dynamic Port VSAN Membership (DPVM) database.
– The show running-configuration ivr command does not display the changes when a device alias member in an Inter-VSAN Routing zone (IVR zone) is renamed.
– In the port security database, the port world wide name (pWWN) is no longer associated with the device alias.
– A device alias member is not found in the port security database.
Condition: This situation occurs when one or more of the following conditions are met:
– A user attempts to perform a device-alias operation in batch, such as renaming an offline device alias to an existing online device alias or vice versa.
– A device alias was renamed.
– A device alias was deleted and an existing device alias is renamed to the deleted device alias in the same commit.
– A device alias that is not configured resides in the DPVM database, and an online device alias is renamed to that name.
– The IVR distribute option is enabled and the device alias is in enhanced mode, and the changes in a device alias are not updated to the IVR running configuration.
Workaround: Add the offline member to the device alias database, revert to the previous name if you have renamed a device alias, and flap the ports that are connected to the affected zone member.
Symptom: When you use the show topology and show fcs ie commands, some attributes, such as the switch name and management interface address, are not displayed.
Condition: This occurs when the initial discovery is incomplete, either because congestion causes packet drops or because the target switch does not respond.
Workaround: Enter the fcs start-discovery command, and then use the show topology and show fcs ie commands to display the topology details, including the switch name, management interface address, and vendor name.
Further Problem Description: When you use the show topology command, the switch name and the data for peer interfaces are displayed correctly.
Symptom: After an In Service Software Upgrade (ISSU), In Service Software Downgrade (ISSD), or supervisor switchover, devices fail to FLOGI into the switch, and the following error is logged in the syslog:
%FLOGI-1-MSG_FLOGI_REJECT_FCID_ERROR after upgrade/switchover
Condition: This situation occurs if one or more of the following occur:
1. The Max flogi key is greater than 65535. The key can get this high if there are repeated FLOGIs on an interface. The issue begins once the key exceeds 65535; however, this alone does not impact end devices.
2. If a supervisor switchover, such as an ISSU, ISSD, or system switchover, occurs when the key is greater than 65535, Fibre Channel Identifiers (FC IDs) can be dropped from the FLOGI table. The end devices continue to function normally until they log out and then attempt to log in again.
3. If both 1 and 2 have occurred and an end device on the affected interface is then rebooted, that end device might not be able to log back in.
Workaround: First resolve the problem with the device on the interface whose Max flogi key exceeds 65535, such as FLOGI rejects or port security issues, to stop the FLOGI key from incrementing.
If the Max flogi key value is greater than 65535 before a supervisor switchover, ISSU, or ISSD, enter the shutdown command followed by the no shutdown command on the interface. For this reason, check the Max flogi key value before any supervisor switchover. If the supervisor switchover has already occurred and logins are failing, use one of the following methods:
– Contact Cisco TAC to implement a nondisruptive recovery. This requires special files not accessible to customers.
– Suspend the VSAN and wait for 5 minutes and then unsuspend the VSAN of the affected devices on the switch. This action is disruptive to all devices in that VSAN connected to this switch.
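Checking the Max flogi key before a switchover can be scripted against saved CLI output. The Python sketch below assumes show flogi internal info output in which each interface starts with a line such as "Interface fc2/5: ..." and each VSAN entry contains "Vsan no: <n> Max flogi key: 0x<hex>(<dec>)"; the function name is hypothetical, the sample values are illustrative, and the format should be verified against your release.

```python
import re

_IFACE_RE = re.compile(r"^Interface\s+(\S+?):")
_KEY_RE = re.compile(r"Vsan no:\s*(\d+)\s+Max flogi key:\s*0x([0-9a-fA-F]+)")


def keys_over_limit(output, limit=65535):
    """Return (interface, vsan, key) for entries whose Max flogi key exceeds limit."""
    hits = []
    iface = None
    for raw in output.splitlines():
        line = raw.strip()
        iface_match = _IFACE_RE.match(line)
        if iface_match:
            iface = iface_match.group(1)  # remember the interface for following VSAN lines
            continue
        key_match = _KEY_RE.search(line)
        if key_match and iface is not None:
            key = int(key_match.group(2), 16)
            if key > limit:
                hits.append((iface, int(key_match.group(1)), key))
    return hits


sample = """Interface fc2/5: mode[F] Mode: F State: UP Vsan: 1
Vsan no: 1 Max flogi key: 0x1(1) num_fl[0x1]
Interface fc2/6: mode[F] Mode: F State: UP Vsan: 10
Vsan no: 10 Max flogi key: 0x10001(65537) num_fl[0x2]"""
for iface, vsan, key in keys_over_limit(sample):
    print(f"{iface} VSAN {vsan}: Max flogi key {key}, flap before switchover")
```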
More Information: For detailed information about this issue, see the General Upgrading Guidelines.
Symptom: The DCBX local information shows the LLS DCBX registration when the port is in the shut state, as follows:
Local DCBXP Control information:
Operation version: 00 Max version: 00 Seq no: 1 Ack no: 0
Subtype Version En/Will/Adv Config
006/001 is the LLS TLV.
Note: All the other DCBX features are usually deregistered on a down port and are not seen in the DCBX output. LLS is not deregistered on a shut port and continues to appear in the output.
Condition: This situation occurs when the port is in the shut state.
Symptom: While configuring the SNMP server, if you configure a host name instead of an IP address, the following error appears:
Condition: This occurs when the domain name and the name server IP address are configured and you specify a host name for the SNMP host.
Workaround: Instead of the host name, add the IP address of the host in the SNMP configuration.
Symptom: A user with the priv-14 role does not inherit rules from the priv-0 through priv-13 roles.
Condition: This situation occurs during a normal operation.
Workaround: Create users with roles from priv-0 to priv-13 and log in, or log in to the switch using AAA authentication where users with certain user levels are identified.
Symptom: Credit monitoring is disabled on MDS 9710 FC ports running NX-OS 6.2(7) when the attached device logs in (FLOGI) with 8 or fewer buffer-to-buffer credits. This can lead to other ports failing with:
%PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 1%$ Interface fc1/1 is down (Link failure Link Reset failed nonempty recv queue) port-channel11.
Condition: Applies to MDS 9710 running NX-OS 6.2(7) only.
This also applies to earlier NX-OS 6.2 releases if an In-Service Software Downgrade (ISSD) from NX-OS 6.2(7) to that earlier release was performed. In that case, credit monitoring remains disabled until the device logs in again.
Workaround: There are two known workarounds:
1. If possible, configure the attached device to advertise more than 8 B2B credits in its FLOGI.
2. A Tcl script is available that nondisruptively modifies the port settings on any or all F ports in the switch to enable credit monitoring.
Contact Cisco TAC to obtain this script.
Upgrading to Cisco MDS NX-OS Release 6.2(9) or later completely resolves the issue.
Further Problem Description: The following line card command shows that credit monitoring is disabled on ports 19 and 20:
module-1# show process creditmon statistics
******************Credit Monitor Info*******************
Port Mode Monitor Cr Loss Slow-Port-Detection
---- ---- ------- -------- ------- ------
Other symptoms that might occur because a port is stuck at zero remaining Tx credits:
%DIAG_PORT_LB-2-INT_PORT_LOOPBACK_TEST_FAIL: Module:1 Test:Internal PortLoopback failed 10 consecutive times. Faulty module: affected ports:affected ports:30 Error:Loopback test failed. Packets lost on the SUP in the transmit direction
%DIAG_PORT_LB-2-SNAKE_TEST_LOOPBACK_TEST_FAIL: Module:1 Test:SnakeLoopback failed 10 consecutive times. Faulty module: affected ports:affected ports:1-48 Error:Test Failed, Could not identify the Faulty Device
Symptom: The security service crashes when an SSH authentication key is configured.
Configuring SSH keys multiple times within 10 minutes results in a HAP reset that resets the active supervisor.
Condition: This issue occurs intermittently when an SSH authentication key is configured.
Workaround: To avoid the supervisor reset, do not configure more than 2 SSH keys within any 10-minute period.
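An automation tool that pushes SSH keys could enforce this limit itself. The following is a minimal Python sketch of a sliding-window guard, assuming timestamps in seconds; the class name is hypothetical and the guard only decides when it is safe to configure the next key.

```python
from collections import deque


class SshKeyChangeGuard:
    """Allow at most max_changes SSH key changes in any window_s-second window."""

    def __init__(self, max_changes=2, window_s=600.0):
        self.max_changes = max_changes
        self.window_s = window_s
        self._times = deque()  # timestamps (seconds) of recent key changes

    def _expire(self, now):
        # Drop changes that are at least window_s seconds old.
        while self._times and now - self._times[0] >= self.window_s:
            self._times.popleft()

    def allowed(self, now):
        """Return True if another key change at time `now` stays within the limit."""
        self._expire(now)
        return len(self._times) < self.max_changes

    def record(self, now):
        """Record a key change made at time `now`."""
        self._times.append(now)


guard = SshKeyChangeGuard()
for t in (0, 60, 120, 600):
    if guard.allowed(t):
        guard.record(t)
        print(f"t={t}s: configure key")
    else:
        print(f"t={t}s: wait before configuring another key")
```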
Symptom: An egress FCoE interface logs output discards during congestion even though pause frames are sent upstream on the ingress interface. Pause frames received on the egress interface do not prevent the output discards.
Affected ingress interfaces can be identified when the 'ENABLED' field is 1 in the output of the following module-level command:
show hardware internal qengine inst inst-num table vq_voq_td
where inst-num is the integer quotient of (port number - 1) / 4. For example, to check whether Ethernet1/1 is affected, use “slot 1” and “inst 0” as arguments to the above command:
switch# slot 1 show hardware internal qengine inst 0 table vq_voq_td | include "port|ENABLE|^0"
INDEX QUEUE PKT TYPE VL THRESHOLD ENABLE
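The inst-num calculation can be expressed as a small helper that also builds the module-level command shown above. This Python sketch assumes 1-based port numbers, as in Ethernet1/1; the function names are hypothetical.

```python
def qengine_instance(port):
    """Return inst-num for a 1-based front-panel port: (port - 1) // 4."""
    if port < 1:
        raise ValueError("port numbers start at 1")
    return (port - 1) // 4


def qengine_command(slot, port):
    """Build the vq_voq_td check command for one port on one slot."""
    inst = qengine_instance(port)
    return (f'slot {slot} show hardware internal qengine inst {inst} '
            f'table vq_voq_td | include "port|ENABLE|^0"')


# Ports 1-4 share instance 0, ports 5-8 share instance 1, and so on.
for port in (1, 4, 5, 48):
    print(port, qengine_command(1, port))
```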
Condition: This issue applies only to interfaces with a “no drop” CoS, that is, FCoE interfaces. An interface is affected only after a supervisor switchover (including ISSU/ISSD switchovers) followed by a flap of the interface for any reason (including moving the interface into a port channel).
For Nexus 7000/7700 switches, the first affected release is Cisco NX-OS Release 6.2(2).
For MDS 9500/9700 switches, the first affected release is Cisco MDS NX-OS release 6.2(7).
Workaround: To nondisruptively restore the “no drop” functionality, set priority flow control to “on” and back to “auto” for each affected ingress interface. If the interface is a member of a port channel, make the change on the port-channel interface. For example:
switch(config)# interface port-channel 1
switch(config-if)# priority-flow-control mode on
switch(config-if)# priority-flow-control mode auto
This workaround can be applied only to interfaces that are up. It restores the effectiveness of pause frames on the Ethernet interfaces; however, subsequent port flaps cause the issue to recur on the interface.
Further Problem Description: By default, FCoE traffic is in a no-drop class and can be affected by this issue. Congestion is usually found in networks that are designed to be oversubscribed or in which slow-drain devices are present. To recover permanently and nondisruptively, follow these steps:
1. Apply the priority-flow-control mode on command to all affected interfaces.
2. Upgrade the system to a fixed version of NX-OS.
3. Apply the priority-flow-control mode auto command to all the previously affected interfaces.
Symptom: FCIP tunnels do not fully utilize the available TCP window size. As a result, FCIP tunnels underperform and never achieve their configured bandwidth on higher-latency links. When the TCP send queue reaches the 2 MB threshold, the FCIP tunnel exerts flow control back to the FC ports that use the tunnel, which leads to Rx B2B credit depletion. You can see the current size of the send queue with the following command:
show ips stats tcp interface gige
Local Address Remote Address State Send-Q Recv-Q
22.214.171.124:65525 126.96.36.199:3225 ESTABLISH 2558928 0
Condition: This issue applies when:
– NX-OS versions from 6.2(5) to 6.2(9c) inclusive, and
– the platform is an MDS 9500 or MDS 9222i, and
– latency on FCIP tunnel is high enough that a TCP window size greater than 2 MB is required
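Whether a tunnel falls into this range can be estimated with the bandwidth-delay product: the TCP window needed to keep a link full is roughly bandwidth × round-trip time. The Python sketch below assumes "2 MB" means 2 MiB and, as a rough estimate only, that traffic splits evenly across tunnels that are each limited by their own 2 MB send queue; it does not model FCIP compression or protocol overhead, and the function names are hypothetical.

```python
import math

# Assumption: the "2 MB" send-queue threshold is read as 2 MiB.
SEND_QUEUE_LIMIT_BYTES = 2 * 1024 * 1024


def required_window_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return bandwidth_bps * rtt_s / 8


def tunnels_needed(bandwidth_bps, rtt_s, limit_bytes=SEND_QUEUE_LIMIT_BYTES):
    """Rough tunnel count so that no tunnel needs more than limit_bytes of window,
    assuming traffic is split evenly across the tunnels."""
    return max(1, math.ceil(required_window_bytes(bandwidth_bps, rtt_s) / limit_bytes))


# Example: 1 Gbps FCIP link with a 20 ms round-trip time.
bw, rtt = 1e9, 0.020
print(f"window needed: {required_window_bytes(bw, rtt) / 1e6:.1f} MB")
print(f"affected: {required_window_bytes(bw, rtt) > SEND_QUEUE_LIMIT_BYTES}")
print(f"tunnels (rough estimate): {tunnels_needed(bw, rtt)}")
```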
Workaround: Increase the number of FCIP tunnels.
Further Problem Description: The current TCP window size is displayed in the "Peer receive window" field of the show interface fcip command output:
Peer receive window: Current: 4162 KB, Maximum: 4220 KB, Scale: 11
Symptom: A system switchover takes too long (around 20 minutes), and the standby supervisor is reloaded before it reaches HA-standby state. The FLOGI process consumes 90 to 100 percent of CPU cycles during the PSS restore, so end device activity such as logins and logouts is not processed by the FLOGI process and eventually times out.
This can cause the FLOGI process to crash, and if the standby supervisor is not in HA-standby state, the switch reloads.
Condition: This issue occurs only if all of the following conditions are met:
– The end device performed continuous logins and logouts (not a port flap, but the end device flapping its login without the port going down) on a switch running an NX-OS release earlier than 6.2(7) or 5.2(8e).
– The Max flogi key is less than the flogi bitset (see Further Problem Description below).
– An ISSU is performed from a release earlier than NX-OS 6.2(7) or 5.2(8e) to a release later than NX-OS 6.2(7) or 5.2(8e).
– A supervisor switchover occurs.
Workaround: To recover, identify the affected ports and flap them.
Note: Contact Cisco TAC to efficiently identify affected ports.
Further Problem Description: When there are repeated FLOGIs and LOGOs on an interface, the 16-bit “Max Flogi key” count increments for every login-logout flap and, after reaching 64K, wraps around and starts again from zero. The “flogi bitset” count, on the other hand, increments for every login-logout flap until it reaches 255 and then stays at 255. When the “Max Flogi key” count is less than the “flogi bitset”, PSS recovery takes longer; this is a bug. This issue is fixed in Cisco MDS NX-OS Releases 6.2(7), 5.2(8e), and later (bug CSCub40020).
If the issue occurred on an earlier release and an ISSU to Cisco MDS NX-OS Release 6.2(7) or 5.2(8e) is then performed, the upgrade attempts to recover the “Max flogi key”. If the recovery leaves the Max flogi key lower than the flogi bitset value, PSS recovery during later switchovers takes longer. The PSS recovery time is directly proportional to the number of instances in which the “Max Flogi key” is less than the “flogi bitset”.
That is, in a scaled environment where more ports or VSANs are affected, PSS recovery takes much longer.
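The interaction between the two counters can be illustrated with a simplified model: a 16-bit key that wraps to zero and a bitset count that saturates at 255, both incremented once per login-logout flap. This Python sketch is an illustrative model of the description above, not the switch's actual implementation; the constant and function names are hypothetical.

```python
KEY_MODULUS = 1 << 16   # 16-bit Max flogi key wraps to zero after 65535
BITSET_CAP = 255        # flogi bitset count stops at 255


def counters_after(flaps):
    """Return (max_flogi_key, flogi_bitset) after a number of login-logout flaps."""
    return flaps % KEY_MODULUS, min(flaps, BITSET_CAP)


def slow_pss_recovery(flaps):
    """True when the key has fallen below the bitset, the slow-recovery case."""
    key, bitset = counters_after(flaps)
    return key < bitset


for flaps in (100, 65535, 65536, 65536 + 300):
    key, bitset = counters_after(flaps)
    print(flaps, key, bitset, slow_pss_recovery(flaps))
```

In this model, the key drops below the bitset only in the window just after the 16-bit wrap, which is why only some interfaces end up affected.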
Affected interfaces can be identified with the following command:
# show flogi internal info |inc fc|port-channel|key|bitset
Interface fc2/5: mode[F]  Mode: F State: UP Vsan: 1
Vsan no: 1 Max flogi key: 0x1(1) num_fl[0x1]
Interface port-channel3: mode[TF]  Mode: TF State: UP Vsan: 552
Vsan no: 1 Max flogi key: 0x40(64) num_fl[0x10]
Vsan no: 552 Max flogi key: 0x40(64) num_fl[0x5]
In the above example, interface fc2/5 in VSAN 1 and interface port-channel3 in VSAN 552 have a Max flogi key greater than the flogi bitset [0x1(1) > 0 and 0x40(64) > 4], but interface port-channel3 in VSAN 1 has a Max flogi key less than the highest flogi bitset value [0x40(64) < 255].
Symptom: Users remotely authenticated by RADIUS or TACACS+ cannot log in to the system after an ISSU. In addition, the aaa group configuration has a deadtime greater than the maximum of 1440 minutes, for example:
switch# show running-config tacacs+
aaa group server tacacs+ TACACS_GROUP
Conditions: This issue occurs only for RADIUS or TACACS+ server groups.
Workaround: To recover after this issue has occurred:
1. Log in with a local account, and then reestablish the connection to the AAA servers with one of the following commands:
test aaa server tacacs+ a.b.c.d
test aaa server radius a.b.c.d
This must be done for all server addresses in the affected group.
2. Reconfigure the deadtime of the server group to a value within the range of 0 to 1440. After the deadtime is within range, it can be removed with the no deadtime command.
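Before an upgrade, a saved running configuration can be scanned for server groups whose deadtime is outside the 0 to 1440 range. This Python sketch assumes that group sub-mode lines appear indented under an aaa group server line and that the deadtime is written as deadtime <minutes>; verify both against your own show running-config output. The function name and sample configuration are hypothetical.

```python
import re

MAX_DEADTIME_MIN = 1440
# Assumed format of the deadtime line inside an aaa group server sub-mode.
_DEADTIME_RE = re.compile(r"^\s+deadtime\s+(\d+)\s*$")


def out_of_range_deadtimes(running_config):
    """Return (group, minutes) for AAA server groups whose deadtime exceeds 1440."""
    bad = []
    group = None
    for line in running_config.splitlines():
        if line.startswith("aaa group server "):
            group = line.split()[-1]          # e.g. TACACS_GROUP
        elif line and not line[0].isspace():
            group = None                      # any other top-level line ends the group
        else:
            match = _DEADTIME_RE.match(line)
            if group and match and int(match.group(1)) > MAX_DEADTIME_MIN:
                bad.append((group, int(match.group(1))))
    return bad


config = """aaa group server tacacs+ TACACS_GROUP
    server 10.1.1.1
    deadtime 4000
aaa group server radius RAD_GROUP
    deadtime 10
"""
for group, minutes in out_of_range_deadtimes(config):
    print(f"{group}: deadtime {minutes} exceeds {MAX_DEADTIME_MIN}")
```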
To prevent this issue before an upgrade, initialize the deadtime and save the configuration, then remove it and save the configuration again. For example, for TACACS+:
(config)# tacacs-server deadtime 1
(config)# aaa group server tacacs+ tacacsgroup
(config)# no tacacs-server deadtime 1
(config)# aaa group server tacacs+ tacacsgroup
Symptom: Maximum FCIP throughput on the MDS 9250i is limited to less than line rate for 10-Gbps interfaces.
Condition: This issue applies only to 10-Gbps FCIP interfaces on the MDS 9250i platform.
Workaround: There is no workaround.
Symptom: The RSCN or zone service crashes with the following syslog messages:
%SYSMGR-2-SERVICE_CRASHED: Service "rscn" (PID 5405) hasn't caught signal 11 (core will be saved).
%SYSMGR-2-SERVICE_CRASHED: Service "zone" (PID 5430) hasn't caught signal 6 (core will be saved)
A Cisco MDS 9700 switch may switch over to the standby supervisor; however, in most cases the crash recurs before the standby is available, and the dual-supervisor switch reloads.
----- reset reason for Supervisor-module 5 (from Supervisor in slot 5) ---
1) At 161169 usecs after Thu Dec dd hh:mm:ss 2014
Reason: Reset triggered due to HA policy of Reset
----- reset reason for Supervisor-module 6 (from Supervisor in slot 6) ---
1) At 422003 usecs after Thu Dec dd hh:mm:ss 2014
Reason: Reset triggered due to HA policy of Reset
Condition : This issue occurs only when "port" format RSCNs are configured and an RSCN is sent on the relevant VSAN. RSCNs are sent, for example, after zoneset changes are activated or when a link changes state. In addition, only the following platforms are affected:
Cisco MDS 9710 Switch
Cisco MDS 9706 Switch
Cisco Nexus 7000 Switch
Cisco Nexus 7710 Switch
This issue does not occur when RSCNs are sent with "fabric" format.
Workaround : Revert to the default RSCN address format by removing the port address-format configuration with the following command:
no zone rscn address-format port vsan
Note that some end devices may not support receiving RSCNs in the default (fabric) format.
Further Problem Description: The zone server constructs this incorrect data and can corrupt its own heap while creating the payload to put into MTS.
The crash can occur in either the zone server or the RSCN service, depending on which process runs into the issue first. The fix prevents both.
Symptom : An FCSP-ESP enabled (encrypted) port that was working fails to come up after ISSU/ISSD followed by link flap.
Condition : This issue only affects FCSP encrypted ports on MDS 9700 DS-X9448-768K9 and MDS 9500 DS-X9248-256K9 and DS-X9232-256K9 switching modules after an ISSU or ISSD to an affected version of NX-OS.
Workaround : Only a switch reload will recover from this situation. The switch must be running a fixed release of NX-OS (NX-OS 6.2(11) or above) before the reload to prevent the issue from recurring after recovery.
None of the following steps alone recovers port functionality:
- Shutting down and re-enabling (shut/no shut) the affected port.
- Reloading the affected linecard.
- Removing and reconfiguring FCSP.
- Upgrading to NX-OS 6.2(11) or above.
Further Problem Description: Affected versions of NX-OS do not push the FCSP-ESP configuration to hardware. This results in a permanent FCSP-ESP configuration mismatch with the peer port, so the port does not come up.
Symptom : An ISL connected over a DWDM path does not reach link up state.
Condition : This issue only applies to MDS 9700 DS-X9448-768K9 modules used with some DWDM vendors.
Workaround : None.
Further Problem Description: The show interface command shows the link in the "Link failure or not-connected" state, with OLS/LRR and NOS counts increasing in both directions.
Symptom : An ISL does not initialize quickly across a DWDM connection. The link can take minutes, hours or even days to connect. Once connected, it is stable.
Condition : This issue only applies to DS-X9248-256K9 and DS-X9232-256K9 modules when connecting an ISL over a Tellabs 7100 DWDM path.
Workaround : None.
Further Problem Description: The show interface command shows the link in the "Link failure or not-connected" state, with OLS/LRR and NOS counts increasing in both directions.
Symptom : When the max-bandwidth-mbps command is entered on the MDS 9250i, the CLI shows 1 Gbps as the maximum configurable bandwidth, although the maximum supported speed is 10 Gbps.
Condition : When the IPS port speed on the MDS 9250i is set to 1 Gbps, the user must make sure not to configure 10 Gbps as the bandwidth, because the CLI shows 10 Gbps as the maximum allowed bandwidth even for a 1 Gbps port. The CLI currently does not check what the port speed is set to.
Therefore, when the port speed is 1 Gbps, set max-bandwidth-mbps to 1 Gbps (1000), not 10 Gbps.
Workaround : There is no workaround.
Further Problem Description: Reproducible
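Because the CLI does not compare max-bandwidth-mbps against the IPS port speed, that check is left to the user. The following is a minimal sketch of the check, assuming port speeds of 1 Gbps and 10 Gbps map to 1000 and 10000 Mbps. `check_max_bandwidth` and `PORT_SPEED_MBPS` are hypothetical names, not part of NX-OS.

```python
# Port speeds in Mbps (1 Gbps = 1000 Mbps, 10 Gbps = 10000 Mbps).
PORT_SPEED_MBPS = {"1G": 1000, "10G": 10000}


def check_max_bandwidth(port_speed: str, max_bandwidth_mbps: int) -> None:
    """Raise ValueError if max-bandwidth-mbps exceeds the configured IPS port speed.

    Hypothetical pre-configuration validation; NX-OS does not perform it.
    """
    limit = PORT_SPEED_MBPS[port_speed]
    if max_bandwidth_mbps > limit:
        raise ValueError(
            f"max-bandwidth-mbps {max_bandwidth_mbps} exceeds "
            f"{port_speed} port speed ({limit} Mbps)"
        )


check_max_bandwidth("1G", 1000)    # accepted
check_max_bandwidth("10G", 10000)  # accepted
try:
    check_max_bandwidth("1G", 10000)  # the case the CLI does not catch
except ValueError as err:
    print(err)
```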
Symptom : Callhome stops working and callhome tests fail.
Condition : Only destination profiles in full_txt format are configured.
Workaround : To avoid this defect, configure an additional destination profile that uses either short_txt or XML format.
To recover after the defect has been hit, perform a system switchover or reload the switch.
Symptom : MDS fabric switch running in NPV mode fails to generate port-monitor alerts.
Condition : Applies to all MDS fabric switches running in NPV mode using port-monitor.
Applies to all versions prior to NX-OS 6.2(13).
This issue occurs only under the following conditions:
- One or more upstream NP or TNP ports go down and then come back up.
- For each (T)NP port that flaps, one F port at the end of the port range is no longer scanned for port-monitor counter events. For example, if (T)NP port fc1/1 flaps, the last F port in use (for example, fc1/48) is no longer scanned for port-monitor counter events.
Workaround : There are two workarounds, one temporary and one permanent:
1 - Contact Cisco TAC for assistance with killing the port-monitor process. After the port-monitor process restarts, all ports are scanned again.
This workaround is only temporary: if an upstream (T)NP port flaps again, the problem recurs.
2 - Move the (T)NP ports to the end of the ports on the switch. For example, if there are four (T)NP uplinks on an MDS 9148 or MDS 9148S, move them to fc1/45-fc1/48. After this is done, the problem does not recur.
Further Problem Description: The fix is integrated into NX-OS 6.2(13) and later versions.
Symptom : This issue was originally seen on a FICON-enabled MDS 9513 during an ISSU from 6.2(11c) to 6.2(11e). The fcd process experienced an HAP reset due to heartbeat loss, which caused the ISSU to abort.
Condition : ISSU with FICON enabled. The specific triggers are not currently known.
Workaround : Debug plugins have been created to clear this problem. The following debug plugins are currently available:
-rw-r--r-- 1 venutumm eng 189171 2017-12-04 12:54 m9500_ficon_active_sup_dplug_6_2_11c.bin
-rw-r--r-- 1 venutumm eng 189169 2017-12-04 12:54 m9500_ficon_standby_sup_dplug_6_2_11c.bin
-rw-r--r-- 1 venutumm eng 180888 2017-12-04 12:54 m9500_ficon_active_sup_dplug_6_2_5a.bin
-rw-r--r-- 1 venutumm eng 180891 2017-12-04 12:54 m9500_ficon_standby_sup_dplug_6_2_5a.bin
-rw-r--r-- 1 venutumm eng 181784 2017-12-04 12:54 m9500_ficon_active_sup_dplug_6_2_5b.bin
-rw-r--r-- 1 venutumm eng 181786 2017-12-04 12:54 m9500_ficon_standby_sup_dplug_6_2_5b.bin
The above plugins are attached to this bug.
*For any other platform or release, contact engineering and the plugins will be created.
*Also refer to the workaround plan attached to the bug.
Further Problem Description: To determine whether the issue is present in a chassis or switch, perform the following steps:
First, issue the following command and check the first number in the output, which is the number of matching "Logical Path:" lines (the logical path count).
It is "2" in this example:
CUP9710# show ficon control-device sb3 | inc "Logical Path:" | wc
Second, issue the following command and check the output.
CUP9710# show system internal pss dump volatile:/dev/shm/fcd_sb3_runtime_ha | no-more
At the very end of that output, you will see the following:
# keys: 5 total_keys_size: 85 total_value_size: 40 deleted 0
avg_key_size: 17 avg_val_size: 8
The "# keys: 5" value in the above output is incorrect. It should equal the logical path count from the first step (2), so this switch has 3 leaked entries.
If the key count is large (a count above 10,000 is risky), avoid ISSU/ISSD until the unwanted entries are cleared.
Open a TAC case with the above outputs. TAC has debug plugins (dplugs) to clear these entries before an ISSU is attempted.
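The two-step check above reduces to simple arithmetic. The leaked entries are the PSS key count minus the logical path count, and ISSU/ISSD is risky when the key count exceeds 10,000. A minimal sketch, assuming both command outputs have been captured as text and, as the example implies, that the expected key count equals the logical path count. The helper names and the wc word and character counts in the sample are hypothetical.

```python
import re

# Key count above which the text considers ISSU/ISSD risky.
RISKY_KEY_COUNT = 10_000


def logical_path_count(wc_output: str) -> int:
    """Return the first field of wc output: the number of "Logical Path:" lines."""
    return int(wc_output.split()[0])


def pss_key_count(pss_dump: str) -> int:
    """Extract N from the '# keys: N' summary at the end of the PSS dump."""
    match = re.search(r"#\s*keys:\s*(\d+)", pss_dump)
    if match is None:
        raise ValueError("'# keys:' summary not found in PSS dump")
    return int(match.group(1))


def assess(wc_output: str, pss_dump: str) -> tuple[int, bool]:
    """Return (leaked_entries, issu_risky) from the two command outputs."""
    paths = logical_path_count(wc_output)
    keys = pss_key_count(pss_dump)
    leaked = max(keys - paths, 0)  # keys beyond the expected one per logical path
    return leaked, keys > RISKY_KEY_COUNT


# Hypothetical wc output; only the first field (2) comes from the example.
wc_sample = "      2      12      96"
# Summary lines copied from the PSS dump example above.
dump_sample = (
    "# keys: 5 total_keys_size: 85 total_value_size: 40 deleted 0\n"
    "avg_key_size: 17 avg_val_size: 8\n"
)

print(assess(wc_sample, dump_sample))  # (3, False)
```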