This document describes the procedure to debug packet buffer depletion messages that can appear on different line cards in a Cisco 12000 Series router running Cisco IOS. It is far too common to see valuable time and resources wasted on replacing hardware that actually functions properly, due to a lack of knowledge of GSR buffer management.
The information in this document is based on these software and hardware versions:
Cisco 12000 Series Internet Router
Cisco IOS® Software Release that supports the Gigabit Switch Router
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
The GSR, or Cisco 12000 Series router, has a truly distributed architecture. This means that each line card (LC) runs its own copy of the Cisco IOS software image and has the intelligence to complete the packet forwarding decision on its own. Each line card does its own:
Packet buffer management
One of the most important operations during packet switching on the GSR is buffer management, which is done by the Buffer Management ASICs (BMAs) located on the line cards. Messages related to GSR buffer management, such as the QM-SANITY warnings, can show up in the router logs in production. The following sections discuss the different triggers that can cause these messages to appear in the router logs and the corrective actions that mitigate the problem. In some situations the condition can also lead to packet loss, which can manifest as protocol flaps and cause network impact.
To troubleshoot the QM-SANITY warnings, we need to understand the packet flow on a GSR line card. The figure below explains the main blocks of a C12k line card and the packet flow path.
The Line Card (LC) on a Cisco 12000 Series Internet Router has two types of memory:
Route or processor memory (Dynamic RAM - DRAM): This memory mainly enables the onboard processor to run Cisco IOS software and store network routing tables (Forwarding Information Base - FIB, adjacency).
Packet memory (Synchronous Dynamic RAM - SDRAM): Line card packet memory temporarily stores data packets awaiting switching decisions by the line card processor.
As the figure above shows, the GSR line card has a specialised packet buffer ASIC (Application-Specific Integrated Circuit), one in each direction of traffic flow, which provides access to the packet memory. These ASICs, also known as Buffer Management ASICs (BMAs), perform the packet buffering and buffer queue management functions on the line card. To support high forwarding rates, the packet memory in each direction is carved into memory pools of different sizes, designed to forward packets of varying MTU sizes.
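The carving into size-based pools can be pictured with a minimal Python sketch. It assumes that a packet is placed in the smallest buffer that can hold it, which is an assumption made for illustration and not a statement of the exact BMA logic. The four data sizes are taken from the sample show controllers tofab queues output later in this document; the function name `select_pool` is hypothetical.

```python
from bisect import bisect_left

# Buffer data sizes in bytes, copied from the sample tofab queue output
# in this document (pools 1 to 4).
POOL_DATA_SIZES = [80, 608, 1616, 4592]


def select_pool(packet_len, pool_sizes=POOL_DATA_SIZES):
    """Return the 1-based number of the smallest pool whose buffers fit the packet.

    Returns None when the packet is larger than the biggest buffer.
    Assumption: "smallest buffer that fits" is the selection rule.
    """
    index = bisect_left(pool_sizes, packet_len)
    if index == len(pool_sizes):
        return None
    return index + 1


if __name__ == "__main__":
    for length in (64, 80, 81, 1500, 4592, 9000):
        print(length, "->", select_pool(length))
```

With these sizes, a 1500-byte packet lands in pool 3 (1616-byte buffers), so depletion of a single pool only affects packets in the matching size range.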
The frames received by the Physical Layer Interface Module (PLIM) are Layer 2 processed and transferred by DMA to a local memory in the PLIM. Once the received data unit is complete, an ASIC in the PLIM contacts the ingress BMA and requests a buffer of the appropriate size. If the buffer is granted, the packet moves to the line card ingress packet memory. If no buffers are available, the packet is dropped and the ignored interface counter increments. The ingress packet processor performs feature processing on the packet, makes the forwarding decision, and moves the packet to the ToFab queue that corresponds to the egress line card. The Fabric Interface ASIC (FIA) segments the packet into Cisco cells, and the cells are transmitted to the switch fabric. The packets are then received from the switch fabric by the FIA on the egress line card and go to the FrFab queues, where they are reassembled, then to the egress PLIM, and are finally sent on the wire.
The decision of the FrFab BMA to select a buffer from a particular buffer pool is based on the decision made by the ingress line card switching engine. Because all queues across the box are of the same size and in the same order, the switching engine tells the transmitting LC to put the packet in the same-numbered queue as the one through which it entered the router.
While the packet is being switched, the queue size of the buffer pool on the ingress line card that was used to move the packet is decremented by one until the BMA in the egress line card returns the buffer. Note that buffer management is done entirely in hardware by the Buffer Management ASICs, and for flawless operation the BMAs must return each buffer to the pool from which it was sourced.
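The rule that a buffer must go back to the pool it came from can be illustrated with a toy accounting model in Python. This is only a sketch: the class `BufferPools`, its methods, and the small pool sizes are hypothetical and do not represent the ASIC implementation. It shows that the free counts stay at the carve size when buffers are returned correctly and drift apart when they are returned to another pool.

```python
class BufferPools:
    """Toy model of free-buffer counts per pool on one line card."""

    def __init__(self, carve_sizes):
        # carve_sizes: {pool number: number of buffers carved at boot}
        self.carved = dict(carve_sizes)
        self.free = dict(carve_sizes)

    def borrow(self, pool):
        """Take one buffer from a pool; return False if the pool is empty."""
        if self.free[pool] == 0:
            return False  # no buffer available, the packet would be dropped
        self.free[pool] -= 1
        return True

    def give_back(self, pool):
        """Return one buffer to a pool."""
        self.free[pool] += 1

    def drift(self):
        """Free count minus carve size per pool (all zero when idle and healthy)."""
        return {p: self.free[p] - self.carved[p] for p in self.carved}


pools = BufferPools({2: 10, 4: 5})  # hypothetical carve sizes

# Healthy case: every buffer goes back to the pool it was taken from.
for _ in range(3):
    pools.borrow(4)
    pools.give_back(4)
healthy = pools.drift()

# Faulty case: buffers taken from pool 4 are returned to pool 2.
for _ in range(3):
    pools.borrow(4)
    pools.give_back(2)
faulty = pools.drift()

print("healthy drift:", healthy)
print("faulty drift:", faulty)
```

In the faulty case pool 2 ends above its carve size and pool 4 below it, which is the same pattern as a pool that reports a current size larger than its carve size.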
There are three scenarios in which GSR packet buffer management can experience stress or failure that leads to packet loss. The three scenarios are described below.
The hardware queue management fails. This happens when the egress BMA fails to return the packet buffer or returns the packet buffer to the incorrect buffer pool. If buffers are returned to the incorrect pool, some buffer pools grow and others deplete over time, which eventually affects packets that need a buffer from the depleting pool. QM-SANITY warnings start to appear as the packet buffers deplete and cross the warning threshold.
Use the QM sanity debugs and the show controllers tofab queues command to check whether you are impacted by this condition. Refer to the troubleshooting section for how to enable the QM sanity debugs and thresholds.
This condition is generally caused by faulty hardware. Check the output of the command below on the router and look for parity errors or line card crashes. The fix is to replace the line card.
show context all
In the QM sanity debugs and the show controllers tofab queues output, Pool 2 is growing in size while Pool 4 is running low. This indicates that Pool 4 is losing buffers, which are being returned to Pool 2.
QM sanity debugs:
SLOT 5:Oct 25 04:41:03.286 UTC: Pool 1: Carve Size 102001: Current Size 73078
SLOT 5:Oct 25 04:41:03.286 UTC: Pool 2: Carve Size 78462: Current Size 181569
SLOT 5:Oct 25 04:41:03.286 UTC: Pool 3: Carve Size 57539: Current Size 6160
SLOT 5:Oct 25 04:41:03.286 UTC: Pool 4: Carve Size 22870: Current Size 67
show controllers tofab queues:
102001/102001 (buffers specified/carved), 39.1%, 80 byte data size
1 13542 13448 73078 262143
78462/78462 (buffers specified/carved), 30.0%, 608 byte data size
2 131784 131833 181569 262143
57539/57539 (buffers specified/carved), 22.0%, 1616 byte data size
3 184620 182591 6160 262143
23538/22870 (buffers specified/carved), 8.74%, 4592 byte data size
4 239113 238805 67 262143
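The comparison of carve size and current size in the QM sanity debug lines above can be automated with a short Python sketch. The function name `classify_pools` and the `low_ratio` cut-off are hypothetical choices made for this example; the cut-off is not the router's own warning threshold. The debug text is copied from the capture above.

```python
import re

DEBUG_LINES = """\
SLOT 5:Oct 25 04:41:03.286 UTC: Pool 1: Carve Size 102001: Current Size 73078
SLOT 5:Oct 25 04:41:03.286 UTC: Pool 2: Carve Size 78462: Current Size 181569
SLOT 5:Oct 25 04:41:03.286 UTC: Pool 3: Carve Size 57539: Current Size 6160
SLOT 5:Oct 25 04:41:03.286 UTC: Pool 4: Carve Size 22870: Current Size 67
"""

POOL_RE = re.compile(r"Pool (\d+): Carve Size (\d+): Current Size (\d+)")


def classify_pools(text, low_ratio=0.05):
    """Map pool number -> 'grown', 'low' or 'ok' from QM sanity debug text.

    low_ratio is a hypothetical cut-off for this sketch, not the
    router's configured warning threshold.
    """
    result = {}
    for pool, carve, current in POOL_RE.findall(text):
        carve, current = int(carve), int(current)
        if current > carve:
            status = "grown"  # holds more buffers than were carved for it
        elif current < carve * low_ratio:
            status = "low"    # almost no free buffers left
        else:
            status = "ok"
        result[int(pool)] = status
    return result


if __name__ == "__main__":
    print(classify_pools(DEBUG_LINES))
```

A pool whose current size exceeds its carve size can only have gained buffers from another pool, so a "grown" pool together with a "low" pool points at the mis-return condition rather than at plain congestion.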
Traffic congestion on the next-hop device or in the forwarding path. In this scenario the device to which the GSR feeds traffic cannot process it at the GSR's speed, and as a result the next-hop device sends pause frames towards the GSR to ask it to slow down. If flow control is enabled on the GSR PLIM, the router honours the pause frames and starts to buffer packets. Eventually the router runs out of buffers, which causes the QM-SANITY error messages and packet drops. QM-SANITY warnings start to appear as the packet buffers deplete and cross the warning threshold. Refer to the troubleshooting section for how to find the QM sanity thresholds.
Use the show interface output for the egress interface to check whether the router is impacted by this scenario. The capture below is an example of an interface that receives pause frames. The action plan is to investigate the cause of congestion on the next-hop device.
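How quickly the buffers run out under back-pressure can be estimated with simple arithmetic. The Python sketch below assumes constant packet rates and one buffer held per queued packet, both simplifications for illustration. The function name and all the rate and threshold figures in the example are hypothetical; only the carve size of 22870 is taken from the sample output in this document.

```python
def seconds_until_threshold(free_buffers, threshold_buffers, arrival_pps, drain_pps):
    """Estimate the seconds until the free-buffer count falls to the threshold.

    Each queued packet holds one buffer, so the free count shrinks by
    (arrival_pps - drain_pps) buffers per second. Returns None when the
    egress side drains at least as fast as packets arrive.
    """
    net_fill = arrival_pps - drain_pps
    if net_fill <= 0:
        return None
    headroom = max(free_buffers - threshold_buffers, 0)
    return headroom / net_fill


if __name__ == "__main__":
    # Hypothetical figures: 22870 free buffers, a threshold of 2287 buffers,
    # 50000 packets/s arriving and 40000 packets/s leaving while paused.
    print(seconds_until_threshold(22870, 2287, 50_000, 40_000))
```

Under these assumed rates the pool reaches the threshold in roughly two seconds, which is why sustained pause frames from a next-hop device show up quickly as buffer warnings.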
GigabitEthernet6/2 is up, line protocol is up
Small Factor Pluggable Optics okay
Hardware is GigMac 4 Port GigabitEthernet, address is 000b.455d.ee02 (bia 000b.455d.ee02)
0 output buffer failures, 0 output buffers swapped out
Oversubscription due to poor network design, traffic bursts, or a DoS attack. QM-SANITY warnings can occur under a sustained high-traffic condition in which more traffic is directed at the router than the line cards can handle.
To find the root cause, check the traffic rates on all the interfaces of the router. This reveals whether any high-speed links are congesting slower links.
Use the output of the “show interfaces” command.
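The rate check can be sketched in Python as a comparison of each interface's output rate with its bandwidth. This is a minimal illustration that assumes the rates and bandwidths have already been read from the interface counters by hand; the function name `saturated_interfaces`, the 90% limit, and all the sample figures are hypothetical.

```python
def saturated_interfaces(samples, limit=0.9):
    """Return the names of interfaces whose output rate is at or above limit * bandwidth.

    samples: {interface name: (output_rate_bps, bandwidth_kbit)}
    limit:   hypothetical utilisation cut-off used in this sketch
    """
    flagged = []
    for name, (rate_bps, bw_kbit) in samples.items():
        if rate_bps >= limit * bw_kbit * 1000:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical readings: output rate in bits/sec and bandwidth in Kbit/sec.
samples = {
    "GigabitEthernet6/1": (310_000_000, 1_000_000),
    "GigabitEthernet6/2": (968_000_000, 1_000_000),
}

if __name__ == "__main__":
    print(saturated_interfaces(samples))
```

An interface that stays near its line rate while the interfaces that feed it carry more traffic in total is a candidate for the slow link that the faster links congest.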
To check the current QM sanity level for an LC
Attach to the LC
Go to enable mode
Run the test fab command
Collect the output of “qm_sanity_info”
Select option q to exit the test fab command line
Exit from the LC
To configure QM sanity parameters
Change to configuration mode
Run hw-module slot <slot#> qm-sanity tofab warning freq <>
To enable/disable QM sanity debugs
Attach to LC
Go to enable mode
Run the test fab command
Run “qm_sanity_debug”. Run it again to stop the debugs
Select option q to exit the test fab command line
Exit from LC
To check the GSR Fabric Interface ASIC statistics
show controllers fia
To check the ToFab queues
show controllers tofab queues
To check the FrFab queues
show controllers frfab queues
The output below is taken from a working lab router to demonstrate the command outputs.