This document describes a problem, and its solution, in which MTP (Message Transfer Part) Level 3 User Adaptation Layer (M3UA) links either enter a congested state or flap after a major network outage or a software upgrade of the Cisco Aggregation Services Router (ASR) Serving GPRS (General Packet Radio Service) Support Node (SGSN). This normally happens in interoperability scenarios where the ASR 5000 node is connected to third-party nodes such as a Home Location Register (HLR) or a Radio Network Controller (RNC).
The underlying issue is that the ASR 5000 SGSN receives a low advertised window size at the Stream Control Transmission Protocol (SCTP) layer from the remote peer node, which can be a Signaling Transfer Point (STP), an HLR, or an RNC. The low window size can be seen in a packet capture trace, in the SCTP show command output, or in a monitor protocol trace on the SGSN. In the packet capture, the advertised window size in the SCTP SACK message has a value of zero or close to zero. When this happens, the SGSN raises an M3UA alarm in order to inform the peer node not to send packets from that peer endpoint. This causes the SCTP link to flap or enter a congested state. Because the SGSN itself advertises a normal window size, it continues to receive M3UA data from peer nodes, but those packets might get dropped in the waiting queue if the peer node never comes out of congestion.
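As a minimal sketch of how the advertised window could be checked in a capture, the following Python decodes the fixed header of an SCTP SACK chunk as laid out in RFC 4960 and flags a near-zero window. The function names and the `LOW_RWND_BYTES` threshold are illustrative assumptions, not part of the SGSN or of any Cisco tool; the sample values are taken from the output shown later in this document.

```python
import struct

SACK_CHUNK_TYPE = 3      # SCTP chunk type value for SACK (RFC 4960)
LOW_RWND_BYTES = 1500    # hypothetical threshold, roughly one Ethernet MTU


def parse_sack(chunk: bytes) -> dict:
    """Decode the 16-byte fixed part of an SCTP SACK chunk."""
    if len(chunk) < 16:
        raise ValueError("SACK chunk too short")
    # type(1) flags(1) length(2) cum_tsn_ack(4) a_rwnd(4) gap_blocks(2) dup_tsns(2)
    ctype, _flags, length, cum_tsn, a_rwnd, n_gaps, n_dups = struct.unpack(
        "!BBHIIHH", chunk[:16]
    )
    if ctype != SACK_CHUNK_TYPE:
        raise ValueError(f"not a SACK chunk (type {ctype})")
    return {
        "length": length,
        "cumulative_tsn_ack": cum_tsn,
        "a_rwnd": a_rwnd,
        "gap_ack_blocks": n_gaps,
        "duplicate_tsns": n_dups,
    }


def is_low_window(a_rwnd: int, threshold: int = LOW_RWND_BYTES) -> bool:
    """Return True when the peer's advertised window is at or near zero."""
    return a_rwnd < threshold


# Build a sample SACK that advertises a 32-byte window.
sample = struct.pack("!BBHIIHH", SACK_CHUNK_TYPE, 0, 16, 324019882, 32, 0, 0)
sack = parse_sack(sample)
print(sack["a_rwnd"], is_low_window(sack["a_rwnd"]))
```

A window of 1048576 bytes, which is what the SGSN advertises in the sample output, would not be flagged by this check.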
Sequence of Events that Lead to an M3UA Alarm in SGSN
SCTP sends a flow control start indication to M3UA.
M3UA sets the congestion active flag for the association and begins to poll SCTP periodically about its flow control status.
While an association is in flow control, M3UA queues future data requests for that association until QUEUE_SIZE is reached. At that point, future messages for the association are discarded. M3UA propagates the association congestion information to the individual remote peers that are part of the association.
SCTP sends a flow control stop indication to M3UA.
M3UA clears the congestion flag for the association and stops polling SCTP.
M3UA transmits anything in its congestion queue for that association to SCTP.
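The queueing behavior in this sequence can be sketched as a small state machine. This is an illustration under stated assumptions, not the SGSN implementation: the class and method names are hypothetical, the `QUEUE_SIZE` value is arbitrary (the real value is not given here), and the periodic polling of SCTP is left out for brevity.

```python
from collections import deque

QUEUE_SIZE = 2  # illustrative value only; the real limit is not stated


class AssociationSketch:
    """Toy model of M3UA congestion handling for one SCTP association."""

    def __init__(self, queue_size: int = QUEUE_SIZE):
        self.congested = False       # congestion active flag
        self.queue = deque()         # congestion queue
        self.queue_size = queue_size
        self.sent = []               # messages handed to SCTP
        self.discarded = []          # messages dropped once the queue is full

    def flow_control_start(self):
        # SCTP signals flow control start: set the congestion flag.
        self.congested = True

    def data_request(self, msg):
        if not self.congested:
            self.sent.append(msg)
        elif len(self.queue) < self.queue_size:
            self.queue.append(msg)       # hold until congestion clears
        else:
            self.discarded.append(msg)   # queue full: message is lost

    def flow_control_stop(self):
        # SCTP signals flow control stop: clear the flag and drain the queue.
        self.congested = False
        while self.queue:
            self.sent.append(self.queue.popleft())


assoc = AssociationSketch()
assoc.data_request("m1")          # sent immediately
assoc.flow_control_start()
for m in ("m2", "m3", "m4"):      # m2 and m3 are queued, m4 is discarded
    assoc.data_request(m)
assoc.flow_control_stop()
print(assoc.sent, assoc.discarded)
```

If the stop indication never arrives, which is the case when the peer keeps advertising a near-zero window, the queue stays full and every later message lands in `discarded`.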
Peer Server Id : 2
Peer Server Process Id: 1
Association State : ESTABLISHED
Flow Control Flag : TRUE
Peer INIT Tag : 17282
SGSN INIT Tag : 3011555404
Next TSN to Assign to Outgoing Data Chunk : 324019883
Lowest cumulative TSN acknowledged : 324019882
Cumulative Peer TSN arrived from peer : 2204328608
Last Peer TSN sent in the SACK : 2204328607
Self RWND : 1048576 <- SGSN sends this window size
Advertised RWND in received SACK : 32 <- peer sends this window size
Peer RWND(estimated) : 32 <- Estimated window also goes down which cause SGSN not able to send packets on wire
Retransmission counter : 0
Zero Window Probing Flag : FALSE
Last Tsn received during ZWnd Probing : 0
Bytes outstanding on all addresses of this association : 0
Congestion Queue Length : 0
Ordered TSN assignment Waiting QLen : 7690
Unordered TSN assignment Waiting QLen : 0
Total number of GAP ACKs Transmitted : 2
Total number of GAP ACKs Received : 2037
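For readers who collect this output from many associations, a check along the following lines could help spot the condition. It is a rough sketch only: the field labels are copied from the sample above, while the function names and the 1500-byte threshold are assumptions made for illustration.

```python
SAMPLE = """\
Flow Control Flag : TRUE
Self RWND : 1048576 <- SGSN sends this window size
Advertised RWND in received SACK : 32 <- peer sends this window size
Peer RWND(estimated) : 32
Ordered TSN assignment Waiting QLen : 7690
"""


def parse_fields(text: str) -> dict:
    """Split 'Label : value' lines into a dict, dropping '<-' annotations."""
    fields = {}
    for line in text.splitlines():
        line = line.split("<-")[0]       # strip trailing annotation, if any
        if ":" not in line:
            continue
        label, _, value = line.partition(":")
        fields[label.strip()] = value.strip()
    return fields


def looks_window_starved(fields: dict, min_rwnd: int = 1500) -> bool:
    """True when flow control is active and the peer window is tiny."""
    in_flow_control = fields.get("Flow Control Flag") == "TRUE"
    advertised = int(fields["Advertised RWND in received SACK"])
    return in_flow_control and advertised < min_rwnd


fields = parse_fields(SAMPLE)
print(looks_window_starved(fields), fields["Ordered TSN assignment Waiting QLen"])
```

A large waiting queue length together with a small advertised window, as in the sample, is the combination to look for.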
Whenever flaps or congestion occur continuously on the links, it indicates one of two conditions: either the peer node cannot process requests in time because the SGSN sends it an overwhelming number of requests, or the SGSN receives an overwhelming number of requests from the network due to network congestion or another network issue.
A workaround to get out of this condition is to block and then unblock the links affected by the congestion or flapping. Another option is to remove and then re-add the Peer Server Process (PSP) instance associated with the affected links.