Monitoring Fabric Links

This chapter describes methods for monitoring fabric links and troubleshooting them to ensure optimal network performance.

Fabric Link Keepalive Monitoring

Table 1. Feature History Table

Feature Name: Fabric Link Keepalive Monitoring

Release Information: Release 24.2.11

Feature Description:

This feature allows you to monitor and identify the fabric links that are down due to failure to receive keep-alive messages.

If a fabric link doesn’t receive the keep-alive message, the Cisco IOS XR software performs a port-reset action and tries to activate the fabric link. This feature is enabled by default. You can also disable the maximum port-reset threshold value of five, in which case the link keeps flapping instead of being shut down. We recommend that you don't disable the threshold unless you have evaluated the impact on your traffic flow.

This feature introduces the hw-module fabric-tsmon-port-reset disable command, which disables the maximum port-reset threshold value.

Fabric links in a router are high-speed connections that interconnect the internal components of a modular router, facilitating efficient data transfer and communication within the routing infrastructure.

The connection between a line card (LC) and a fabric card (FC) in a modular network device is a high-speed interface that enables the forwarding of data packets from the network ports on the line card through the switch fabric on the fabric card for routing to their destination.

Keep-alive messages are the control messages exchanged between the LC-specific and FC-specific NPU devices.

Prior to Release 24.2.11, if a fabric link doesn’t receive the keep-alive message, it continues to flap until it becomes active or an administrator manually shuts it down.

From Release 24.2.11, if the fabric link doesn’t receive the keep-alive messages, the Cisco IOS XR software performs the following steps to establish the connection between the fabric links:

  1. Failure to receive keep-alive messages triggers a keep-alive (tsmon) interrupt.

  2. The Cisco IOS XR software performs a port reset and tries to recover the fabric link.

  3. If five interrupts occur within 24 hours, whether consecutively or intermittently, the Cisco IOS XR software permanently shuts down the fabric link on the sixth interrupt and displays the following syslog message:

    Router# Jan 19 11:31:37.899 UTC: npu_drvr[248]: %FABRIC-NPU_DRVR-3-NPU_CPA_GEN_ERR_INFO : TS_MON flap crossed threshold: Asic 2, link 180, event 20
    

    Note


    The syslog message is displayed only after the fabric link is permanently shut down, which is after the sixth interrupt.
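If you collect router logs centrally, you can detect this shutdown event by matching the syslog message. The following is a minimal Python sketch, not a Cisco tool. The regular expression and the function name parse_tsmon_shutdown are assumptions inferred from the single sample message above; the exact message format may vary between releases.

```python
import re

# Sample message copied from the text above, without the "Router# " prompt.
SAMPLE = (
    "Jan 19 11:31:37.899 UTC: npu_drvr[248]: "
    "%FABRIC-NPU_DRVR-3-NPU_CPA_GEN_ERR_INFO : "
    "TS_MON flap crossed threshold: Asic 2, link 180, event 20"
)

# Assumed pattern, derived only from the sample message shown in this section.
TSMON_RE = re.compile(
    r"%FABRIC-NPU_DRVR-3-NPU_CPA_GEN_ERR_INFO\s*:\s*"
    r"TS_MON flap crossed threshold:\s*"
    r"Asic (?P<asic>\d+), link (?P<link>\d+), event (?P<event>\d+)"
)


def parse_tsmon_shutdown(line):
    """Return the asic, link, and event numbers as ints, or None if no match."""
    match = TSMON_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


print(parse_tsmon_shutdown(SAMPLE))
# {'asic': 2, 'link': 180, 'event': 20}
```

A log line that doesn't contain the message returns None, so the function can be applied to every line of a log stream.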


By default, each port is configured with a threshold that allows for five interrupts within a 24-hour period. If a sixth interrupt occurs within this timeframe, the Cisco IOS XR software shuts down the flapping fabric link. In the section View Fabric Link Interrupt, you can see that the fabric link is shut down during the sixth tsmon interrupt within 24 hours.

You also have the option to disable the port-reset threshold value using the hw-module fabric-tsmon-port-reset disable command, which disables the fabric link shutdown action. For more information, see the section Disable the Port Reset Threshold Value.

After you disable the port-reset threshold value, the Cisco IOS XR software doesn't shut down the fabric link even after the interrupts cross the threshold value; instead, it keeps flapping the fabric link and logs the asic-errors for the interrupt.
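The threshold behavior described above can be illustrated with a small model. This is a sketch for reasoning about the counting logic only, not the Cisco implementation. The class name FabricLinkMonitor, its parameters, and the use of a sliding 24-hour window are assumptions; the text states only that five interrupts are allowed within a 24-hour period.

```python
from collections import deque
from datetime import datetime, timedelta


class FabricLinkMonitor:
    """Toy model of the tsmon port-reset threshold (hypothetical, not Cisco code)."""

    def __init__(self, threshold=5, window=timedelta(hours=24),
                 threshold_disabled=False):
        self.threshold = threshold            # interrupts allowed per window
        self.window = window                  # assumed sliding window
        self.threshold_disabled = threshold_disabled
        self.interrupts = deque()             # timestamps inside the window
        self.shut_down = False

    def on_tsmon_interrupt(self, now):
        """Record one interrupt and return the resulting action."""
        if self.shut_down:
            return "shutdown"                 # link stays down once shut
        # Forget interrupts that are older than the window.
        while self.interrupts and now - self.interrupts[0] >= self.window:
            self.interrupts.popleft()
        self.interrupts.append(now)
        if not self.threshold_disabled and len(self.interrupts) > self.threshold:
            self.shut_down = True
            return "shutdown"
        return "port-reset"


start = datetime(2024, 6, 3, 22, 0, 0)
times = [start + timedelta(minutes=i) for i in range(6)]

default = FabricLinkMonitor()
print([default.on_tsmon_interrupt(t) for t in times])
# ['port-reset', 'port-reset', 'port-reset', 'port-reset', 'port-reset', 'shutdown']

disabled = FabricLinkMonitor(threshold_disabled=True)
print([disabled.on_tsmon_interrupt(t) for t in times])
# ['port-reset', 'port-reset', 'port-reset', 'port-reset', 'port-reset', 'port-reset']
```

With the default settings the sixth interrupt shuts the link down; with threshold_disabled set, which models the effect of the hw-module fabric-tsmon-port-reset disable command, every interrupt results in another port reset.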

View Fabric Link Interrupt

If a fabric link flaps or shuts down due to failure to receive keep-alive messages, you can use the show asic-errors command to see the interrupt information:

Router# show asic-errors npu 8 all location 0/rp0/CPU0
************************************************************
*                       Link Errors                        *
************************************************************
8000, 8808-FC1, 0/FC4, npu[8]
Name            : slice[5].ifg[0].mac_pool8[1].rx_link_status_down.rx_link_status_down6
Block ID        : 0x288
Addr            : 0x100
Leaf ID         : 0x51002006
Thresh/period(s): 10/day
Error count     : 1
Last N errors   : 1
--------------------------------------------------------------
First N errors.
@Time, Error-Data
------------------------------------------
Jun  3 22:13:18.973101
        Error description: Id:193 Bit:0x6 Action:none LINK_DN: (Slc/Ifg/FstSer):(5/0/14) RxLnDnSt:1 RxRmtLnStDn:0 PcsLnStDn:1 PcsAlnStDn:1 HiBer:0 HiSerIntr:0 SigLos:0 1 0 0 0 0 0 0 FifoOverflow:0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  Link:174 NTS-22:13:18.973636 STS-22:13:18.973101 TEMP_SENSOR:43.5C
--------------------------------------------------------------
8000, 8808-FC1, 0/FC4, npu[8]
Name            : ts_mon.slice_interrupt_register[5].link_down_interrupt
Block ID        : 0x40f
Addr            : 0x105
Leaf ID         : 0x81e020a0
Thresh/period(s): 5/day
Error count     : 6
Last clearing   : Mon Jun  3 22:13:18 2024
Last N errors   : 6
--------------------------------------------------------------
First N errors.
@Time, Error-Data
------------------------------------------
Jun  3 22:13:18.945593
        Error description: Id:192 Bit:0x0 Action:port-reset LINK_FABRIC_DN: (Slc/Ifg/FstSer):(5/0/14) Link:174 NTS-22:13:18.945716 STS-22:13:18.945593   
-------------------------------------------------------------- 

The sample above shows only part of the actual output, which contains more details.
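To check many links at once, you can post-process saved show asic-errors output and flag entries whose error count exceeds the listed threshold. The following Python sketch assumes the field labels shown in the sample above (Name, Thresh/period(s), Error count); the function names parse_entries and over_threshold are hypothetical, and the embedded text is a shortened copy of the sample, not complete router output.

```python
import re

# Shortened copy of the sample output above (only the fields used here).
SAMPLE_OUTPUT = """\
8000, 8808-FC1, 0/FC4, npu[8]
Name            : slice[5].ifg[0].mac_pool8[1].rx_link_status_down.rx_link_status_down6
Thresh/period(s): 10/day
Error count     : 1
--------------------------------------------------------------
8000, 8808-FC1, 0/FC4, npu[8]
Name            : ts_mon.slice_interrupt_register[5].link_down_interrupt
Thresh/period(s): 5/day
Error count     : 6
"""

# Matches the three "label : value" lines of interest.
FIELD_RE = re.compile(r"^(Name|Thresh/period\(s\)|Error count)\s*:\s*(.+?)\s*$")


def parse_entries(text):
    """Collect name, threshold, and count for each error entry in the text."""
    entries = []
    current = None
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "Name":
            current = {"name": value}        # a Name line starts a new entry
            entries.append(current)
        elif current is not None and key == "Thresh/period(s)":
            current["threshold"] = int(value.split("/")[0])
        elif current is not None and key == "Error count":
            current["count"] = int(value)
    return entries


def over_threshold(entries):
    """Return the names of entries whose error count exceeds the threshold."""
    return [
        e["name"] for e in entries
        if "threshold" in e and e.get("count", 0) > e["threshold"]
    ]


print(over_threshold(parse_entries(SAMPLE_OUTPUT)))
# ['ts_mon.slice_interrupt_register[5].link_down_interrupt']
```

Only the ts_mon entry is reported, because its error count of 6 exceeds the threshold of 5 per day, while the link-status entry has 1 error against a threshold of 10.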

Disable the Port Reset Threshold Value

To disable the port-reset threshold value, use the hw-module fabric-tsmon-port-reset disable command.

You can't use the hw-module fabric-tsmon-port-reset disable command to activate fabric links that are already down because the port-reset action crossed the threshold. Fabric links that are down must be debugged with Cisco.


Caution


We recommend that you troubleshoot and resolve the reason for the fabric port shutdown instead of using the hw-module fabric-tsmon-port-reset disable command to prevent the fabric port shutdown.


Router# configure terminal
Router(config)# hw-module fabric-tsmon-port-reset disable 
Router(config)# commit