June 7, 2002
Products Affected
| Product |
|---|
| BPX8600 Networks |
| IGX8400 Networks |
Problem Description
After a trunk or node failure event, connections may reroute slowly or fail to reroute. Connections that fail to reroute can appear healthy even though they do not pass traffic.
Background
This anomaly has been observed only in networks with mixed 9.2 and 9.3 Cisco IOS® Software.
Problem Symptoms
Broken user connections are not listed in the dspcons -f output, yet they are not routed and do not pass data. Network users lose access to remote resources.
Workaround/Solution
The anomaly discussed in this field notice may be avoided by not maintaining a mixed switch software environment (9.2 and 9.3) in the same network for extended periods of time.
Solution: Upgrade entire network to release 9.3 and monitor bug ID CSCdx55308 (registered customers only) for further solutions.
There are two methods for recovery should connections fail during the upgrade:
- Recommended for small networks. For purposes of this field notice, "small network" means 20 or fewer IGX8400 and BPX8600 nodes. Connections that are not listed as failed but are not passing traffic can be recovered in most cases with the command rrtcon -pfail, which reroutes all permanently failed connections. In the context of this field notice, "permanently failed" refers to connections that failed because they did not reroute. The -pfail option of the rrtcon command exists only in release 9.3 and later releases. rrtcon is a SuperUser level command.
- Recommended for larger networks. For purposes of this field notice, "larger network" means more than 20 IGX8400 and BPX8600 nodes. The following algorithm has been implemented and is being tested in Cisco's Large Scale Network Test. It has been designed with the primary goal of reducing the cluster lock effect.
How To Upgrade Software
Use the steps below:
- Collect dsprrgps (display reroute groups) for the entire network.
- Sort dsprrgps output in decreasing order of connections requiring reroute.
- If there are still connections requiring reroute:
  - If this is the first time dsprrgps is run:
    - Turn off routing on all nodes in the network * (Use Service Level command off1 Cm_Rerouting option.)
  - For each node in the node list compiled in step 2:
    - Wait based on the quantity of conns that need to be rerouted (as determined by dsprrgps collection):
    - Disable routing on the node ** (Use Service Level command option.)
  - Collect dsprrgps and regenerate the list of nodes with unrouted connections.
- Enable routing on all nodes in the network. (Use Service Level command on1 Cm_Rerouting option.)
- End
Note: (*) This should be optimized to only turn off routing on those nodes with conns to reroute.
Note: (**) While this step will guarantee that "cluster lock" cannot occur, it may be too aggressive and reduce overall performance. Under investigation.
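The control flow of the steps above can be sketched in Python. This is only an illustration of the loop structure, not Cisco's implementation. All function and parameter names are hypothetical: `collect_pending` stands in for gathering dsprrgps counts per node, `set_rerouting` stands in for the on1/off1 Cm_Rerouting service-level commands, and `wait_for` stands in for the wait whose durations this notice does not give. The sketch also assumes that rerouting is re-enabled on each node before the wait, which the steps do not state explicitly, and it adds a `max_passes` safeguard that is not part of the notice.

```python
from typing import Callable, Dict, List


def nodes_by_pending_reroutes(pending: Dict[str, int]) -> List[str]:
    """Return nodes that still have connections to reroute, busiest first."""
    ranked = sorted(pending.items(), key=lambda item: item[1], reverse=True)
    return [name for name, count in ranked if count > 0]


def staged_reroute(
    node_names: List[str],
    collect_pending: Callable[[], Dict[str, int]],  # hypothetical: dsprrgps per node
    set_rerouting: Callable[[str, bool], None],     # hypothetical: on1/off1 Cm_Rerouting
    wait_for: Callable[[int], None],                # hypothetical: wait sized by conn count
    max_passes: int = 10,                           # safeguard, not part of the notice
) -> int:
    """Reroute one node at a time; return the number of passes performed."""
    pending = collect_pending()                      # collect dsprrgps
    order = nodes_by_pending_reroutes(pending)       # sort in decreasing order
    first_pass = True
    passes = 0
    while order and passes < max_passes:             # connections still need reroute
        if first_pass:
            for node in node_names:                  # turn off routing everywhere
                set_rerouting(node, False)
            first_pass = False
        for node in order:
            set_rerouting(node, True)                # ASSUMPTION: not stated in the notice
            wait_for(pending[node])                  # wait based on pending connections
            set_rerouting(node, False)               # disable routing on the node
        pending = collect_pending()                  # regenerate the node list
        order = nodes_by_pending_reroutes(pending)
        passes += 1
    for node in node_names:                          # enable routing on all nodes
        set_rerouting(node, True)
    return passes
```

Only one node has rerouting enabled at any moment inside the loop, which is the property the second note describes as guaranteeing that "cluster lock" cannot occur, at a possible cost in overall reroute performance.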
DDTS
To follow the bug ID link below and see detailed bug information, you must be a registered user and you must be logged in.
| DDTS | Description |
|---|---|
| CSCdx55308 (registered customers only) | Slow rerouting time after trunk event |
For More Information
If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC) by one of the following methods: