Introduction
This document describes how to troubleshoot a GSR12000 device (running either IOS or IOS-XR) when the device is unreachable.
Prerequisites
Requirements
Cisco recommends that you have basic knowledge of the GSR12000 platform.
Components Used
The information in this document applies only to Cisco 12000 Series routers.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
Troubleshoot
LED indication
Record the LED information, as shown in this table, before you recover or debug the node further.
Sl. No | Module | Info | LED Status
1 | Power shelf/PEMs | PWR OK is GREEN when the PEM is good. Otherwise, one of these LEDs shows AMBER: FAULT, OC (over current), TEMP (over temperature). Note: Collect this information for all PEMs installed in the chassis. | PEM1: PEM2: PEM3: PEM4:
2 | Alarm card | There are two LEDs, ENABLED and FAIL, in each set: one set for each fabric card (2 CSC + 3 SFC) and one set for the alarm card itself. GREEN indicates enabled. AMBER indicates a failure or an empty slot. | Alarm card: CSC0: CSC1: SFC0: SFC1: SFC2:
3 | Blower | There are two status LEDs, OK and FAIL. The OK LED indicates that the blower is good. The FAIL LED indicates a blower issue. | TOP: BOT:
4 | LC | Eng3 shows "IOX RUN" on the LED segment display during stable state. Eng5 has a faceplate LED that is GREEN in stable state, or AMBER during boot or when IN RESET. | Slot 0 through Slot 15
5 | RP | Active: ACTV RP in stable state. Standby: STBY RP in stable state. Record the console Ethernet LEDs. | ACTV: STBY:
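The LED record from the table can also be kept in a form that is easy to check later. This is a minimal sketch only: the module type names, the reading format, and the find_abnormal helper are hypothetical and are not part of any Cisco tool. The expected values are taken from the table above.

```python
# Hypothetical LED record checker. Module type names and the reading
# format are assumptions made for illustration only.
EXPECTED = {
    "PEM": {"GREEN"},      # PWR OK LED
    "ALARM": {"GREEN"},    # ENABLED LED on the alarm card
    "FABRIC": {"GREEN"},   # ENABLED LED per CSC/SFC on the alarm card
    "BLOWER": {"OK"},      # OK LED lit
    "LC_E3": {"IOX RUN"},  # Eng3 LED segment display
    "LC_E5": {"GREEN"},    # Eng5 faceplate LED
    "RP": {"ACTV", "STBY"},
}


def find_abnormal(readings):
    """Return 'label: observed' for every reading outside the expected set.

    readings is a list of (module_type, label, observed) tuples.
    """
    abnormal = []
    for module_type, label, observed in readings:
        if observed not in EXPECTED[module_type]:
            abnormal.append(f"{label}: {observed}")
    return abnormal


if __name__ == "__main__":
    sample = [
        ("PEM", "PEM1", "GREEN"),
        ("PEM", "PEM2", "AMBER"),
        ("BLOWER", "TOP", "OK"),
        ("LC_E5", "Slot 3", "AMBER"),
    ]
    for entry in find_abnormal(sample):
        print(entry)
```

Any entry that the helper reports is a candidate starting point for the flowcharts in the next section.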
Pictorial View of Faceplate
Alarm card faceplate showing the different LEDs

Power Entry Module (PEM) faceplate showing PEM status LEDs

Flowchart-Based Router Debugging and Recovery
Flow Chart 1
Confirm the console connection details and verify that the terminal server is accessible.

Flow Chart 2
If the console access is not available, use this flowchart.

Flow Chart 3
When console access is unavailable and the LEDs are lit but display an incorrect status, use this flowchart.

*Display LEDs
- PRP: RP ACTV, RP STBY
- LC: IOS RUN (E3)/Green LED (E5)
- Alarm card: GREEN
- Fabric cards: GREEN (LED on alarm card)
- PEM: Green LED
- Blower: Green LED
- Intermittent checks for accessibility:
- Check if display on any cards changed
Command List 1: Logs to collect when the active RP console is accessible.
admin show platform
admin show redundancy
admin show environment power-supply
show power-mgr detail
show logging
Run these commands to check process status, CPU usage, and packet manager status, and to identify the culprit process (if any). Then collect the outputs of the commands listed in this section for that process.
show processes blocked
show processes cpu | ex 0%
show packet-memory summary
Collect this set of logs for the process identified above.
show processes <jid>
show processes threadname <jid>
follow job <jid> iteration 3
Fabric logs
admin show controllers fabricq drop
admin show controllers fabricq errors
admin show controllers fabricq output
admin show controllers fabricq queue
admin show controllers fabricq tofab
admin show controllers fabricq frfab
admin show controllers fabric (3 times)
show controller fia location <all slots> (3 times)
Mbus counters (capture 2-3 times)
admin show mbus can-error location all
admin show mbus counters location all
run mtool discover
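Because the fabric and MBUS counters are captured several times, it can help to write out the full command sequence before you start. This is a minimal sketch: the build_capture_plan helper is hypothetical, and the choice of three rounds for the MBUS commands is an assumption within the 2-3 captures that this document recommends. The commands themselves are copied from the lists above.

```python
def build_capture_plan(groups):
    """Expand (commands, rounds) groups into one ordered list.

    Each group is repeated as a whole for the given number of rounds,
    so that counters from successive rounds can be compared.
    """
    plan = []
    for commands, rounds in groups:
        for round_no in range(1, rounds + 1):
            for command in commands:
                plan.append((round_no, command))
    return plan


# Commands copied from this document; round counts are a choice, not a rule.
FABRIC = ["admin show controllers fabric"]
MBUS = [
    "admin show mbus can-error location all",
    "admin show mbus counters location all",
]

if __name__ == "__main__":
    for round_no, command in build_capture_plan([(FABRIC, 3), (MBUS, 3)]):
        print(f"capture {round_no}: {command}")
```

The printed sequence can be pasted into a session log as a checklist, so that no capture round is skipped.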
PD traces
admin show controllers fabric trace
admin show sysldr trace all
show fiad trace
show_psarb_trace (from shell)
If time permits, also collect the show tech-support output (the logs are large).
admin show tech-support shelf-management file <qualified disk path>
Command List 2: Logs to collect when only the standby RP console is accessible.
Note: Collect these logs only if the active RP console is not accessible but the standby RP console is.
Procedure: Access the ksh shell of the standby RP, attach to the ksh of the active RP over the MBUS, and collect the logs from the active RP shell.
Enter <esc>ksh on the standby console, and then run attach <active nodeid>
Basic logs for card status, power status, and console messages
show_platform -a
envmon_show -m -p
show_logging -A
admin show logging
Logs to check if fabric driver and QADs are healthy
fabricq_lwm_show_command -v -a
fabricq_lwm_show_command -v -t
fabricq_lwm_show_command -v -f
fabricq_lwm_show_command -v -q
fabricq_lwm_show_command -v -d
fabricq_lwm_show_command -v -r
fabricq_lwm_show_command -v -o
fabricq_lwm_show_command -v -p
fabricq_lwm_show_command -v -c
fabricq_lwm_show_command -v -p
fabricq_lwm_show_command -v -s
qad_show -b -i
To check for MBUS issues (collect 2-3 times)
mtool discover
show_mbus can-stats
show_mbus can-error
Run these commands to check process status, CPU usage, and packet manager status, and to identify the culprit process (if any). Then collect the outputs of the commands listed in this section for that process.
show_processes -b
show_proc_cpu -c | grep -v -E 0%
packet_show summary
packet_show corrupt
Collect this set of logs for the process identified above.
sysmgr_show -o -p <jid in hex>
show_processes -T -p <jid in hex>
attach_process -j <jid> -i 3
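Two of these shell commands take the job ID in hexadecimal, while the third takes it as shown in the process listing. This is a minimal sketch of the conversion, assuming the JID was read as a decimal number. The shell_commands_for_jid helper is hypothetical. This document does not state whether the shell utilities expect a 0x prefix, so the prefix is left as a parameter and defaults to none.

```python
def shell_commands_for_jid(jid, hex_prefix=""):
    """Build the three per-process shell commands for a decimal JID.

    hex_prefix is an assumption: pass "0x" if the utilities require it.
    """
    hex_jid = hex_prefix + format(jid, "x")
    return [
        f"sysmgr_show -o -p {hex_jid}",
        f"show_processes -T -p {hex_jid}",
        f"attach_process -j {jid} -i 3",
    ]


if __name__ == "__main__":
    for command in shell_commands_for_jid(255):
        print(command)
```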
Collect PD traces
fiad_show_ltrace
show_psarb_trace
sysldr_show_ltrace
IOS Command List: Logs to collect when the console is accessible.
show logging
show tech
show gsr
show monitor event-trace lci
show monitor event-trace agent-ctrl
show monitor event-trace board
show monitor event-trace fab
show gsr agent-ctrl
show gsr power-mgr details
show env power
show env internal
Collect these logs multiple times, with a time gap between captures.
execute-on all show controller fia
show controller fia
show controller errors fabric counters
show controller errors
show controller xbar
show controllers sca
show controllers clock
show controllers csc-fpga
show controllers fab-clk
show mbus counters
show mbus can
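Repeated captures are useful mainly because counters that increment between them point to an active problem. This is a minimal sketch of one way to compare two saved captures of the same command. It does not assume any particular output format: it masks the digits on each line and reports lines whose text matches but whose numbers differ. The helper names are hypothetical. Lines that contain timestamps also show up as changed, and the two captures must have the same line layout.

```python
import re


def parse(capture):
    """Return (skeleton, numbers) per line; the skeleton has digits masked."""
    rows = []
    for line in capture.splitlines():
        numbers = [int(n) for n in re.findall(r"\d+", line)]
        skeleton = re.sub(r"\d+", "#", line).strip()
        rows.append((skeleton, numbers))
    return rows


def changed(before, after):
    """List lines whose text matches in both captures but whose numbers differ."""
    result = []
    for (skel_a, nums_a), (skel_b, nums_b) in zip(parse(before), parse(after)):
        if skel_a == skel_b and nums_a != nums_b:
            result.append((skel_a, nums_a, nums_b))
    return result


if __name__ == "__main__":
    # Made-up text for illustration; not real router output.
    first = "crc errors 5\nparity errors 0\n"
    second = "crc errors 9\nparity errors 0\n"
    for skeleton, old, new in changed(first, second):
        print(skeleton, old, "->", new)
```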