This document describes how to troubleshoot a GSR12000 device (running either IOS or IOS-XR) when the device is unreachable.
Cisco recommends that you have basic knowledge of the GSR12000 platform.
This document applies only to Cisco 12000 Series routers.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
Record the LED information, as shown in this table, before you recover or debug the node further.
PWR OK “GREEN” => PEM is good
Otherwise, one of these LEDs shows “AMBER”:
FAULT, OC (over current), TEMP (over temperature)
Note: Collect this information for all PEMs installed in the chassis.
There are two LEDs, ENABLED and FAIL, in each set: one set for each fabric card (2 CSC + 3 SFC) and one set for the alarm card itself.
GREEN indicates enabled
AMBER indicates a failure or an empty slot
There are two status LEDs: OK and FAIL
The OK LED indicates that the blower is good
The FAIL LED indicates a blower issue
Eng3 shows “IOX RUN” on its LED segment display in the stable state.
Eng5 has a faceplate LED that is GREEN in the stable state, or AMBER during boot or when IN RESET.
Slot 0 through
Active ACTV RP in stable state
Standby STBY RP in stable state
Record the console Ethernet LEDs
Pictorial View of Faceplate
Alarm card faceplate showing the different LEDs
Power Entry Module (PEM) faceplate showing PEM status LEDs
Flow chart based Router Debugging and Recovery
Flow Chart 1
Confirm the console connection details and that access to the terminal server is established.
Flow Chart 2
If the console access is not available, use this flowchart.
Flow Chart 3
When console access is unavailable and the LEDs are lit but display an incorrect status, use this flowchart.
PRP: RP ACTV, RP STBY
LC: IOS RUN (E3)/Green LED (E5)
Alarm card: GREEN
Fabric cards: GREEN (LED on alarm card)
PEM: Green LED
Blower: Green LED
Intermittent checks for accessibility: Check whether the display on any card has changed.
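The comparison of recorded LEDs against the expected stable-state indications can be sketched in a few lines of Python. This is only an illustration: the component names used as dictionary keys, the `find_mismatches` helper, and the sample observations are assumptions made for the example, not part of any Cisco tool. Both "IOS RUN" and "IOX RUN" are accepted for Engine 3 line cards because this document mentions both displays.

```python
# Expected stable-state LED indications, taken from the list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
EXPECTED = {
    "PRP": {"RP ACTV", "RP STBY"},
    "LC (E3)": {"IOS RUN", "IOX RUN"},  # this document mentions both displays
    "LC (E5)": {"GREEN"},
    "Alarm card": {"GREEN"},
    "Fabric cards": {"GREEN"},
    "PEM": {"GREEN"},
    "Blower": {"GREEN"},
}


def find_mismatches(observed):
    """Return (component, seen, expected) for each recorded LED that is off-nominal."""
    mismatches = []
    for component, seen in observed:
        expected = EXPECTED.get(component)
        if expected is None:
            continue  # unknown label: nothing to compare against
        if seen.upper() not in expected:
            mismatches.append((component, seen, sorted(expected)))
    return mismatches


# Hypothetical observations written down at the chassis.
observed = [
    ("PRP", "RP ACTV"),
    ("LC (E5)", "AMBER"),
    ("PEM", "Green"),
    ("Blower", "AMBER"),
]

for component, seen, expected in find_mismatches(observed):
    print(f"{component}: saw {seen}, expected one of {expected}")
```

Repeating the same comparison on each intermittent check makes it easy to spot a card whose display changed between visits.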
Command List 1: Output to capture when the active RP’s console is accessible.
admin show platform
admin show redundancy
admin show environment power-supply
show power-mgr detail
Run these commands to check process status, CPU usage, and packet manager status, and to identify the culprit process (if any). Collect the output of each command from the session.
show processes blocked
show processes cpu | ex 0%
show packet-memory summary
Collect this set of logs for the process identified above.
show processes <jid>
show processes threadname <jid>
follow jid <> iter <3>
admin show controllers fabricq drop
admin show controllers fabricq errors
admin show controllers fabricq output
admin show controllers fabricq queue
admin show controllers fabricq tofab
admin show controllers fabricq frfab
admin show controllers fabric (3 times)
show controller fia location <all slots> (3 times)
Mbus counters (capture 2-3 times)
admin show mbus can-error location all
admin show mbus counters location all
run mtool discover
admin show controllers fabric trace
admin show sysldr trace all
show fiad trace
show_psarb_trace (from shell)
If time permits, collect the show tech-support output (the logs are large).
admin show tech-support shelf-management file <qualified disk path>
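Several commands in Command List 1 are meant to be captured more than once. A minimal sketch of how such repeated captures could be ordered is shown here. It only builds a plan and does not connect to any device. The `build_plan` helper and the 30-second gap are assumptions for illustration; this document does not specify an interval. The `<all slots>` placeholder is left exactly as written above.

```python
# (command, number of captures) pairs taken from Command List 1.
REPEATED = [
    ("admin show controllers fabric", 3),
    ("show controller fia location <all slots>", 3),
    ("admin show mbus can-error location all", 3),  # source says 2-3 times
    ("admin show mbus counters location all", 3),   # source says 2-3 times
]


def build_plan(commands, gap_seconds=30):
    """Group the commands into rounds so that captures of one command are spaced in time.

    gap_seconds is a hypothetical value; pick one that suits the situation.
    """
    rounds = max(count for _, count in commands)
    plan = []
    for round_no in range(1, rounds + 1):
        for command, count in commands:
            if round_no <= count:
                plan.append((round_no, command))
        if round_no < rounds:
            plan.append((round_no, f"(wait {gap_seconds} s)"))
    return plan


for round_no, step in build_plan(REPEATED):
    print(f"round {round_no}: {step}")
```

Running the commands in rounds, rather than issuing one command three times back to back, gives each counter time to move between captures.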
Command List 2: Logs to be collected when only the standby’s console is accessible
Note: Collect these logs only if the active RP’s console is not accessible but the standby’s console is.
Procedure: Access the ksh (shell) of the standby, attach to the active’s ksh over the Mbus, and collect the logs from the active’s shell.
Enter <esc>ksh from the standby console, and then enter attach <active nodeid>
Basic logs to know card status, power status and console logs
envmon_show -m -p
admin show logging
Logs to check if fabric driver and QADs are healthy
IOS Command List: Output to capture when the console is accessible.
show monitor event-trace lci
show monitor event-trace agent-ctrl
show monitor event-trace board
show monitor event-trace fab
show gsr agent-ctrl
show gsr power-mgr details
show env power
show env internal
Capture these logs multiple times, with a time gap between captures.
execute-on all show controller fia
show controller fia
show controller errors fabric counters
show controller errors
show controller xbar
show controllers sca
show controllers clock
show controllers csc-fpga
show controllers fab-clk
show mbus counters
show mbus can
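Once the same command has been captured more than once, the useful information is which counters moved between captures. The sketch below shows one generic way to find them: it masks the digits on each line, pairs lines whose remaining text is identical, and reports the pairs whose numbers differ. The `changed_counters` helper is an assumption for illustration, and the two sample captures are made-up text, not real router output; actual command output has a different layout.

```python
import re

NUMBER = re.compile(r"\d+")


def changed_counters(before, after):
    """Pair lines by their text with digits masked out; report those whose numbers differ."""

    def index(capture):
        table = {}
        for line in capture.splitlines():
            key = NUMBER.sub("#", line).strip()
            if key and "#" in key:
                # Keep the first occurrence if the same masked text repeats.
                table.setdefault(key, [int(n) for n in NUMBER.findall(line)])
        return table

    old, new = index(before), index(after)
    return [
        (key, old[key], new[key])
        for key in old
        if key in new and old[key] != new[key]
    ]


# Made-up sample text for illustration only; not real device output.
first_capture = "crc errors: 10\nparity errors: 0\n"
second_capture = "crc errors: 14\nparity errors: 0\n"

for key, old_values, new_values in changed_counters(first_capture, second_capture):
    print(f"{key} changed from {old_values} to {new_values}")
```

Lines that carry timestamps or uptime values will also show up as changed, so review the result by eye before drawing conclusions from it.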