Introduction
This document describes how to understand and troubleshoot hardware resources on Catalyst 9000 series switches.
Prerequisites
Requirements
There are no specific requirements for this document.
Components Used
The information in this document is based on these software and hardware versions:
- Cisco Catalyst 9200, 9300, 9400, 9500 non-HP series switches on Cisco IOS® XE 16.x & 17.x software
- Cisco Catalyst 9500HP, 9600 series switches on Cisco IOS® XE 16.x & 17.x software
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Background Information
- Various features on Catalyst 9000 Series Switches consume limited hardware resources. These resources exist to accelerate the performance of those features, and to deliver the high forwarding rates expected from a switch.
- The size of these tables can vary from switch model to switch model, but the basic troubleshooting methodology remains the same.
- Commonly, the primary limited hardware resource in LAN switching is referred to as TCAM (TCAM is a memory technology especially suited to store Longest Prefix Match (LPM) information for fast lookup or other types of OR logic lookups).
- In Catalyst 9000 Series switches, multiple memory types are used beyond just TCAM, suited to specific needs of a given feature (HASH is another type of simplified memory. The MAC address table is an example of this memory type).
When you troubleshoot a feature that does not operate as expected, a good starting point is to confirm that hardware utilization is not beyond the scale of the switch in question. While switches can vary in the size of these tables, the verification and troubleshooting methodology remains mostly the same.
Note: This page is also a reference page where you can find information on various features, and how to check their hardware scale.
Note: Depending on the platform, the CLI sometimes includes the keyword switch and sometimes does not (show platform hardware fed switch <number|active|standby> fwd-asic resource tcam utilization versus show platform hardware fed <active> fwd-asic resource tcam utilization).
Terminology
EM | Exact Match | An entry in Hash memory that is a 1:1 match (host route, Directly Connected host)
LPM | Longest Prefix Match | Any route that is /31 or shorter (/32 routes are EM type)
TCAM | Ternary Content-Addressable Memory | A type of memory that stores and queries entries with three different inputs: 0, 1 and X. This type of memory must be used in cases where there can be multiple matches to the same entry, and the resulting Hash for each would not be unique. This table includes a mask or X value that allows it to know if it matches or does not match this entry.
CAM | Content-Addressable Memory | General term for hardware memory (Hash/TCAM)
RIB | Routing Information Base | The routing table seen in show ip route.
FIB | Forwarding Information Base | Simplified table with prefixes added by the RIB and ARP tables with a pointer to the ADJ table.
Directly Connected | Directly Connected Route | A locally connected host prefix (ARP adjacent).
Indirectly Connected | Indirectly Connected Route | A route that is via a remote next hop to reach.
ADJ | Adjacency (table) | Stores next hop information used for packet rewrite.
EM | Exact Match | Connected hosts, indirect /32 host prefixes
TCAM | Ternary Content-Addressable Memory | Indirect prefixes /31 or shorter
FED | Forward Engine Driver | The ASIC (hardware) layer
FMAN-FP | Forward Manager - Forwarding Plane | FMAN-FP manages software objects that add, delete, or modify FED information.
SI | Station Index | Station Index = packet rewrite information (RI = Rewrite Index) and outbound interface information (DI = Destination Index)
RI | Rewrite Index | MAC address rewrite information for layer 3 forwarding to the next hop adjacency.
DI | Destination Index | Index that points to the outbound interface.
UADP | Cisco Unified Access™ Data Plane | The ASIC architecture used in the switch.
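The difference between an exact match (hash) lookup and a ternary lookup can be illustrated with a small software model. This is only a sketch of the concept, not of how the ASIC implements it: the function names, table contents, and next-hop labels are hypothetical, and real TCAM compares all entries in parallel rather than in a loop.

```python
import ipaddress
from typing import Optional

def ternary_match(key: int, value: int, mask: int) -> bool:
    # Mask bit 1 = this bit must match; mask bit 0 = X (don't care).
    return (key & mask) == (value & mask)

def lookup(dst: str, exact: dict, ternary: list) -> Optional[str]:
    # EM (hash) lookup: a key maps to at most one entry.
    if dst in exact:
        return exact[dst]
    key = int(ipaddress.ip_address(dst))
    best_len, best_hop = -1, None
    # Ternary lookup: several entries can match the same key,
    # so the longest matching prefix is kept.
    for prefix, next_hop in ternary:
        net = ipaddress.ip_network(prefix)
        value, mask = int(net.network_address), int(net.netmask)
        if ternary_match(key, value, mask) and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop

# Hypothetical table contents for illustration only.
EXACT = {"10.1.1.10": "adj-host"}
TERNARY = [
    ("0.0.0.0/0", "adj-default"),
    ("10.0.0.0/8", "adj-a"),
    ("10.1.0.0/16", "adj-b"),
]

print(lookup("10.1.1.10", EXACT, TERNARY))  # adj-host (exact match)
print(lookup("10.1.2.3", EXACT, TERNARY))   # adj-b (longest of three matching entries)
```

The second lookup shows why a mask is needed: 10.1.2.3 matches three entries at once, which a plain hash keyed on the full address cannot express.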
ASIC Version Information (UADP 2.0 versus 3.0)
The key difference between the 2.0 and 3.0 versions of the Catalyst 9000 series ASICs is how FIB hardware is populated or used.
In UADP 3.0, a shared memory called EM/LPM is used for:
- host routes (/32 mask length) and directly connected (ARP adjacent) hosts
- /31 or shorter prefixes (where a mask comparison is required to make a forwarding decision)
In UADP 3.0, TCAM still exists for the FIB, but it is used only for special cases or exceptions where EM/LPM cannot be used.
- An example is when the IP address space is not contiguous, or multiple address spaces are used, and a merge into EM/LPM is not possible.
In UADP 2.0, memory is split into two sections, EM and TCAM:
- EM is used for /32 host routes and directly connected (ARP adjacent) hosts.
- TCAM is used for /31 or shorter prefixes where a prefix mask comparison is required.
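The placement rules above can be summarized in a short sketch. The function name, the ASIC labels, and the arp_adjacent flag are hypothetical names chosen for illustration; the sketch covers IPv4 only and does not model the UADP 3.0 exception cases that fall back to TCAM.

```python
import ipaddress

def fib_table(prefix: str, asic: str, arp_adjacent: bool = False) -> str:
    """Return the FIB memory a prefix is expected to use, per the rules above."""
    length = ipaddress.ip_network(prefix, strict=False).prefixlen
    is_host = arp_adjacent or length == 32
    if asic == "UADP 3.0":
        # Host routes and shorter prefixes share one memory.
        return "EM/LPM"
    if asic == "UADP 2.0":
        # Hosts use the hash (EM); anything needing a mask compare uses TCAM.
        return "EM" if is_host else "TCAM"
    raise ValueError(f"unknown ASIC label: {asic}")

print(fib_table("10.1.1.10/32", "UADP 2.0"))  # EM
print(fib_table("10.1.0.0/16", "UADP 2.0"))   # TCAM
print(fib_table("10.1.0.0/16", "UADP 3.0"))   # EM/LPM
```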
Compare these outputs between the two ASIC types:
In this example, the 9500-12Q has significantly more TCAM space, but the 9500-48Y4C (9500H) has an even greater scale of EM/LPM.
- LPM: the same logic applies to the TCAM of the 9500-12Q, but it is not specifically called out.
- The EM/LPM on 9500H indicates that this shared memory space is used for both Exact Match (EM) and LPM (prefix-based) entries. The system uses an optimized memory system to achieve scale, performance, and flexibility.
- The significantly reduced TCAM on 9500H exists to store special entries, particularly Hash Collisions (when a unique hash cannot be generated for a particular entry).
9500-48Y4C (9500H / High Performance - UADP 3.0 based switch)
Switch#show platform hardware fed active fwd-asic resource tcam utilization
Codes: EM - Exact_Match, I - Input, O - Output, IO - Input & Output, NA - Not Applicable
CAM Utilization for ASIC [0]
Table Subtype Dir Max Used %Used V4 V6 MPLS Other
------------------------------------------------------------------------------------------------------
IP Route Table EM/LPM I 212992 3 0.01% 2 0 1 0 <-- LPM matches now stored here
IP Route Table TCAM I 1536 15 0.02% 6 6 2 1 <-- Used for exception cases
9500-12Q (UADP 2.0 based switch)
Switch#show platform hardware fed active fwd-asic resource tcam utilization
Codes: EM - Exact_Match, I - Input, O - Output, IO - Input & Output, NA - Not Applicable
CAM Utilization for ASIC [0]
Table Subtype Dir Max Used %Used V4 V6 MPLS Other
------------------------------------------------------------------------------------------------------
IP Route Table EM I 49152 3 0.01% 2 0 1 0
IP Route Table TCAM I 65536 15 0.02% 6 6 2 1 <-- LPM matches are stored here in 2.0
Note: For more information on the UADP Architecture see Cisco Catalyst 9500 Architecture White Paper
General Hardware Validation Commands
These commands show high-level usage statistics for Hash, TCAM, Interface, and Rewrite resources.
- These resources are related, and exhaustion of a dependent resource can affect the ability to fully use other available resources.
- Changes to the outputs of these commands in the 17.x train make it much easier to read hardware utilization and diagnose specific issues.
Example: A switch can have available Hash / TCAM, but run out of adjacencies.
- Forwarding to some destination prefixes can be impacted, not because hardware cannot program the FIB, but because it cannot program a new rewrite entry.
show platform hardware fed <switch> active fwd-asic resource tcam utilization <-- Hash & TCAM
show platform hardware fed <switch> active fwd-asic resource utilization <-- SI/RI/DI/etc (other related resources)
show platform hardware fed <switch> active fwd-asic resource rewrite utilization <-- IP Adjacency, LISP Adjacency, Tunnel Adjacency, etc
### 17.x train CLI displays multiple resources in one place (these are not available in 16.x) ###
New CLI combines aspects of all 3 commands into one table for easier diagnosis of all resources related to IPv4
show platform hardware fed active fwd-asic resource features ip-adjacency utilization
Cisco IOS XE 17.x General Hardware Validation Commands
show platform hardware fed active fwd-asic resource tcam utilization command is the first place you want to look to evaluate if you have a hardware scale issue. (It displays information on a per-ASIC basis).
Codes:
- EM - Exact_Match <-- Consult Terminology table for definition
- I - Input, O - Output, IO - Input & Output, <-- If resource is directional it is noted
- NA - Not Applicable <-- If direction is not applicable
Switch#show platform hardware fed active fwd-asic resource tcam utilization
Codes: EM - Exact_Match, I - Input, O - Output, IO - Input & Output, NA - Not Applicable <-- Key for table abbreviations
CAM Utilization for ASIC [0] <-- Content Addressable Memory for ASIC 0
Table Subtype Dir Max Used %Used V4 V6 MPLS Other <-- CAM usage broken down per resource & memory type (EM versus TCAM)
------------------------------------------------------------------------------------------------------
Mac Address Table EM I 65536 18 0.03% 0 0 0 18
Mac Address Table TCAM I 1024 21 2.05% 0 0 0 21
L3 Multicast EM I 16384 0 0.00% 0 0 0 0
L3 Multicast TCAM I 1024 9 0.88% 3 6 0 0
L2 Multicast EM I 16384 0 0.00% 0 0 0 0
L2 Multicast TCAM I 1024 11 1.07% 3 8 0 0
IP Route Table EM I 49152 3 0.01% 2 0 1 0 <-- Data from RIB/FIB populated here
IP Route Table TCAM I 65536 15 0.02% 6 6 2 1 <-- Data from RIB/FIB populated here
QOS ACL TCAM IO 18432 85 0.46% 28 38 0 19
Security ACL TCAM IO 18432 129 0.70% 26 58 0 45
Netflow ACL TCAM I 1024 6 0.59% 2 2 0 2
PBR ACL TCAM I 2048 22 1.07% 16 6 0 0 <-- Data for PBR & NAT populated here
Netflow ACL TCAM O 2048 6 0.29% 2 2 0 2
Flow SPAN ACL TCAM IO 1024 13 1.27% 3 6 0 4
Control Plane TCAM I 512 276 53.91% 126 106 0 44
Tunnel Termination TCAM I 1024 18 1.76% 8 10 0 0
Lisp Inst Mapping TCAM I 2048 1 0.05% 0 0 0 1
Security Association TCAM I 512 4 0.78% 2 2 0 0
CTS Cell Matrix/VPN
Label EM O 8192 0 0.00% 0 0 0 0 <-- Outbound resource used to reach remote VPNv4 prefixes
CTS Cell Matrix/VPN
Label TCAM O 512 1 0.20% 0 0 0 1
Client Table EM I 4096 0 0.00% 0 0 0 0
Client Table TCAM I 256 0 0.00% 0 0 0 0
Input Group LE TCAM I 1024 0 0.00% 0 0 0 0
Output Group LE TCAM O 1024 0 0.00% 0 0 0 0
Macsec SPD TCAM I 1024 2 0.20% 0 0 0 2
CAM Utilization for ASIC [1]
<...snip...>
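When this output is collected from many switches, it can help to parse it and flag regions that are close to full. The sketch below assumes the 17.x column layout shown above; the function name, the dictionary keys, and the 50 percent threshold are hypothetical choices for illustration. It recomputes the percentage from Used and Max, strips the "<--" annotations used in this document, and joins table names that wrap onto two lines (such as CTS Cell Matrix/VPN Label).

```python
import re

ASIC = re.compile(r"^CAM Utilization for ASIC \[(\d+)\]")
ROW = re.compile(
    r"^(?P<table>.+?)\s+(?P<subtype>EM/LPM|EM|TCAM)\s+(?P<dir>IO|I|O|NA)\s+"
    r"(?P<max>\d+)\s+(?P<used>\d+)\s+[\d.]+%"
)
NAME_ONLY = re.compile(r"^[A-Za-z][\w /]*$")  # first half of a wrapped table name

def parse_tcam_utilization(text):
    rows, asic, pending = [], None, ""
    for raw in text.splitlines():
        line = raw.split("<--")[0].strip()  # drop document annotations
        m = ASIC.match(line)
        if m:
            asic, pending = int(m.group(1)), ""
            continue
        m = ROW.match(line)
        if m:
            max_, used = int(m.group("max")), int(m.group("used"))
            rows.append({
                "asic": asic,
                "table": (pending + " " + m.group("table")).strip(),
                "subtype": m.group("subtype"),
                "dir": m.group("dir"),
                "max": max_,
                "used": used,
                "pct": round(100 * used / max_, 2) if max_ else 0.0,
            })
            pending = ""
        else:
            pending = line if NAME_ONLY.match(line) else ""
    return rows

SAMPLE = """\
CAM Utilization for ASIC [0]
 Table                  Subtype      Dir      Max     Used    %Used       V4       V6     MPLS    Other
 ------------------------------------------------------------------------------------------------------
 IP Route Table         EM           I       49152        3    0.01%        2        0        1        0
 IP Route Table         TCAM         I       65536       15    0.02%        6        6        2        1
 Control Plane          TCAM         I         512      276   53.91%      126      106        0       44
 CTS Cell Matrix/VPN
 Label                  EM           O        8192        0    0.00%        0        0        0        0
"""

rows = parse_tcam_utilization(SAMPLE)
for r in rows:
    if r["pct"] >= 50.0:  # hypothetical alert threshold
        print(r["asic"], r["table"], r["subtype"], r["pct"])
```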
If hardware scale from the command show platform hardware fed active fwd-asic resource tcam utilization looks okay, check other dependent resources.
Note: There are many shared resources. These are just a few that are commonly used. (The appearance of this table does not change between 16.x and 17.x.)
Switch#show platform hardware fed active fwd-asic resource utilization
Resource Info for ASIC Instance: 0
Resource Name Allocated Free <-- Number available. If this is at max (or very close) possible issues can occur
------------------------------------------
RSC_DI 61 41805 <-- DI = Destination Index
RSC_RI 3 57317 <-- RI = Rewrite Index
RSC_RI_REP 10 49143 <-- RI_REP = Multicast Rewrite/Replication Index
RSC_SI 519 64849 <-- SI = Station Index
<...snip...>
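This output lists Allocated and Free counts but no percentage. A usage figure can be derived as Allocated / (Allocated + Free); that formula is an inference, chosen because it reproduces the Usage% values that the 17.x ip-adjacency output in this document prints for the same counters (for example, RSC_SI at 519 allocated and 64849 free gives 0.79). The function name and return shape below are hypothetical.

```python
import re

RESOURCE = re.compile(r"^(RSC_\w+)\s+(\d+)\s+(\d+)")

def parse_resource_utilization(text):
    """Map resource name -> (allocated, free, usage percent)."""
    result = {}
    for raw in text.splitlines():
        m = RESOURCE.match(raw.split("<--")[0].strip())
        if not m:
            continue
        allocated, free = int(m.group(2)), int(m.group(3))
        total = allocated + free
        pct = round(100 * allocated / total, 2) if total else 0.0
        result[m.group(1)] = (allocated, free, pct)
    return result

SAMPLE = """\
Resource Info for ASIC Instance: 0
Resource Name Allocated Free
------------------------------------------
RSC_DI 61 41805 <-- DI = Destination Index
RSC_RI 3 57317 <-- RI = Rewrite Index
RSC_RI_REP 10 49143
RSC_SI 519 64849
"""

for name, (allocated, free, pct) in parse_resource_utilization(SAMPLE).items():
    print(f"{name}: {allocated} allocated, {free} free, {pct}% used")
```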
Switch#show platform hardware fed switch active fwd-asic resource rewrite utilization
Resource Info for ASIC Instance: 0
Rewrite Data Allocated Free <-- Rewrite specific hardware resources
-------------------------------------------------------
PHF_EGRESS_destMacAddress 0 32000 <-- Destination MAC (Layer 3 next hop MAC rewrite)
IPV4_TUNNEL_SRC_IP_ADDR 0 16 <-- IPv4 Tunnel Source IP
IPV4_TUNNEL_DEST_IP_ADDR 0 256 <-- IPv4 Tunnel Destination IP
IPV4_GRE_TUNNEL_DEST_IP_ADDR 0 1024 <-- GRE specific tunnel Destination IP
GRE_HEADER 0 684
GRE_KEY 0 684 <-- GRE keys
NAT_L3_DEST_IPV4 0 7168 <-- NAT Layer 3 IPv4 Destination
NAT_DST_PORT_UNICAST 0 8192 <-- NAT Destination Ports
NAT_L3_SRC_IPV4 0 8192 <-- NAT Layer 3 IPv4 Source
NAT_SRC_PORT_UNICAST 0 8192 <-- NAT Source Ports
<...snip...>
Switch#show platform hardware fed active fwd-asic resource features ip-adjacency utilization
IPv4 unicast adjacency resource info
Resource Info for ASIC Instance: 0 [A:0, C:0] <-- Per-ASIC & Core [Asic 0, Core 0]
Shared Resource Name Allocated Free Usage% <-- Shared resources
----------------------------------------------------------------------------------
RSC_RI 3 57317 0.01 <-- RI = Rewrite Index
RSC_SI 519 64849 0.79 <-- SI = Station Index
<-- These are tables that maintain port map info, and other necessary details to send packets
<-- These resources are shared, and used by many features
Rewrite Data Allocated Free Usage% <-- Rewrite resources (Dest MAC)
-----------------------------------------------------------------------------------
PHF_EGRESS_destMacAddress 0 32000 0.00 <-- Destination MAC usage
<-- When a packet is sent to a next hop, it must be written with a destination MAC address
CAM Table Utilization Info Allocated Free Usage% <-- EM (Hash) & TCAM resources
------------------------------------------------------------------------------------
IP Route table Host/Network 0/ 0 0/32768 0.00/ 0.00
<-- Resource that programs prefixes, either local/host routes (EM/Hash) or Shorter /31 or less prefixes (TCAM)
Note: The 9500H and 9600 ASIC can store shorter prefix masks in Hash memory (called EM/LPM) rather than in TCAM. See the IPv4 specific scenario for more details.
Cisco IOS XE 16.x General Hardware Validation Commands
show platform hardware fed active fwd-asic resource tcam utilization command is the first place you want to look to evaluate if you have a hardware scale issue. (It displays information on a per-ASIC basis). You can see that in 16.x train the output is less granular, and some of the descriptions vary.
In most cases, the Table list is clear with a couple exceptions:
- Directly or indirectly connected routes. This label needed improvement, because it was not clear that "directly" means both ARP adjacent routes and /32 host routes, while "indirectly" means any route /31 or shorter.
- Policy Based Routing ACEs include NAT related configuration. Keep this in mind when NAT is the feature of concern.
Switch#show platform hardware fed switch active fwd-asic resource tcam utilization
CAM Utilization for ASIC [0]
Table Max Values Used Values
--------------------------------------------------------------------------------
Unicast MAC addresses 32768/1024 19/21
L3 Multicast entries 8192/512 0/9
L2 Multicast entries 8192/512 0/11
Directly or indirectly connected routes 24576/8192 3/19 <-- First value 24576 = EM / Second value 8192 = TCAM
QoS Access Control Entries 5120 85
Security Access Control Entries 5120 126
Ingress Netflow ACEs 256 8
Policy Based Routing ACEs 1024 22
Egress Netflow ACEs 768 8
Flow SPAN ACEs 1024 13
Control Plane Entries 512 255
Tunnels 512 17
Lisp Instance Mapping Entries 2048 3
Input Security Associations 256 4
SGT_DGT 8192/512 0/1
CLIENT_LE 4096/256 0/0
INPUT_GROUP_LE 1024 0
OUTPUT_GROUP_LE 1024 0
Macsec SPD 256 2
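The 16.x layout packs the EM and TCAM values into one column separated by a slash, so a parser has to split them. The sketch below is one way to do that; the function name and tuple layout are hypothetical, and labeling single-value rows as TCAM is an assumption based on how the same features appear in the 17.x table.

```python
import re

ROW_16X = re.compile(
    r"^(?P<table>.+?)\s+(?P<max>\d+(?:/\d+)?)\s+(?P<used>\d+(?:/\d+)?)$"
)

def parse_16x_utilization(text):
    """Map table name -> list of (memory, max, used, percent used)."""
    result = {}
    for raw in text.splitlines():
        line = raw.split("<--")[0].strip()  # drop document annotations
        m = ROW_16X.match(line)
        if not m:
            continue
        maxes = [int(v) for v in m.group("max").split("/")]
        useds = [int(v) for v in m.group("used").split("/")]
        if len(maxes) != len(useds):
            continue
        # Two values: first is EM, second is TCAM. One value: assumed TCAM.
        labels = ("EM", "TCAM") if len(maxes) == 2 else ("TCAM",)
        result[m.group("table")] = [
            (label, mx, us, round(100 * us / mx, 2) if mx else 0.0)
            for label, mx, us in zip(labels, maxes, useds)
        ]
    return result

SAMPLE = """\
CAM Utilization for ASIC [0]
Table Max Values Used Values
--------------------------------------------------------------------------------
Directly or indirectly connected routes 24576/8192 3/19 <-- EM / TCAM
Control Plane Entries 512 255
"""

for table, entries in parse_16x_utilization(SAMPLE).items():
    print(table, entries)
```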
Note: The commands listed here did not have a CLI change between the 16.x and 17.x code trains, so they are described only once, in the 17.x section of this document.
show platform hardware fed <switch> active fwd-asic resource utilization <-- SI/RI/DI/etc (other related resources)
show platform hardware fed <switch> active fwd-asic resource rewrite utilization <-- IP Adjacency, LISP Adjacency, Tunnel Adjacency, etc
Per-Feature Hardware Validation Commands
Scenario: IPv4 Prefixes
IPv4 hardware validation can be found on this page Understand IPv4 Hardware Resources on Catalyst 9000 Switches
Symptoms when the resource is beyond scale:
- Device or prefix reachability issues. While existing routes and devices can remain reachable, any new or updated prefixes are not reachable.
- Log messages indicate the hardware is not able to take new object updates.
- The object layer, which programs software objects into hardware, becomes congested.
- Entries are absent at the impacted hardware layer (in this case, the FIB).
IPv4 Syslogs
If you run out of a particular IPv4 FIB or Adjacency resource, SYSLOG messages are generated by the system.
IPv4 FIB Log Message | Definition | Recovery Action
%FED_L3_ERRMSG-3-RSRC_ERR: Switch 1 R0/0: fed: "Failed to allocate hardware resource for fib entry due to hardware resource exhaustion" error. | Hardware reserved for IPv4 FIB entries has run out of space (EM or TCAM). | Summarize routes or take some other action to reduce the scale of FIB entries (this can be exact match or TCAM, whichever one is exhausted).
%FED_L3_ERRMSG-3-RSRC_ERR: R0/0: fed: "Failed to allocate hardware resource for adj entry - rc:1" error. | The Adjacency table is exhausted. This is the table in hardware where next hop destination MAC addresses are stored. | Reduce the number of directly connected (ARP adjacent) hosts.
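Both messages in the table above share the same mnemonic and differ only in the message text ("fib entry" versus "adj entry"), so a log filter has to look at the text to tell them apart. A minimal sketch follows; the function name and the shortened cause strings are hypothetical, and the recovery text is taken from the table.

```python
import re

# (pattern, cause, recovery action) per the IPv4 syslog table.
RULES = [
    (re.compile(r"FED_L3_ERRMSG-3-RSRC_ERR.*fib entry"),
     "IPv4 FIB hardware (EM or TCAM) exhausted",
     "Summarize routes or otherwise reduce the scale of FIB entries"),
    (re.compile(r"FED_L3_ERRMSG-3-RSRC_ERR.*adj entry"),
     "Adjacency table exhausted",
     "Reduce the number of directly connected (ARP adjacent) hosts"),
]

def classify_l3_syslog(line):
    """Return (cause, recovery) for a known message, else None."""
    for pattern, cause, recovery in RULES:
        if pattern.search(line):
            return cause, recovery
    return None

FIB_MSG = ('%FED_L3_ERRMSG-3-RSRC_ERR: Switch 1 R0/0: fed: "Failed to allocate '
           'hardware resource for fib entry due to hardware resource exhaustion" error.')
ADJ_MSG = ('%FED_L3_ERRMSG-3-RSRC_ERR: R0/0: fed: "Failed to allocate '
           'hardware resource for adj entry - rc:1" error.')

for msg in (FIB_MSG, ADJ_MSG):
    print(classify_l3_syslog(msg))
```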
Scenario: ACL
ACL hardware validation can be found on this page Validate Security ACLs on Catalyst 9000 Switches
ACL Syslogs
If you run out of a particular Security ACL resource, a SYSLOG message is generated by the system (the interface, VLAN, label, and similar values can differ).
ACL Log Message | Definition | Recovery Action
%ACL_ERRMSG-4-UNLOADED: Switch 1 fed: Input <ACL> on interface <interface> could not be programmed in hardware and traffic can be dropped. | ACL is Unloaded (held in software). | Investigate TCAM scale. If beyond scale, redesign ACLs.
%ACL_ERRMSG-6-REMOVED: 1 fed: The unloaded configuration for Input <ACL> on interface <interface> has been removed for label <label>asic<number> | Unloaded ACL configuration is removed from interface. | ACL is already removed, no action to take.
%ACL_ERRMSG-6-RELOADED: 1 fed: Input <ACL> on interface <interface> has now been loaded into the hardware for label <label> on asic<number> | ACL is now installed in Hardware. | The ACL is now in hardware and the issue is resolved, no action to take.
%ACL_ERRMSG-3-ERROR: 1 fed: Input <ACL> IP ACL <NAME> configuration could not be applied on <interface> at bindorder <number> | Other type of ACL error (such as dot1x ACL install failure). | Confirm ACL configuration is supported, and TCAM is not beyond scale.
%ACL_ERRMSG-6-GACL_INFO: Switch 1 R0/0: fed: Logging is not supported for GACL | GACL has log option configured. | GACLs do not support the log option. Remove log statements from the GACL.
%ACL_ERRMSG-6-PACL_INFO: Switch 1 R0/0: fed: Logging is not supported for PACL | PACL has log option configured. | PACLs do not support the log option. Remove log statements from the PACL.
%ACL_ERRMSG-3-ERROR: Switch 1 R0/0: fed: Input IPv4 Group ACL implicit_deny:<name>: configuration could not be applied on Client MAC 0000.0000.0000 | (dot1x) ACL fails to apply on target port. | Confirm ACL configuration is supported, and TCAM is not beyond scale.
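Because UNLOADED is later cleared by either RELOADED or REMOVED, a log review can replay these three messages in order to find ACLs that are still held in software. The sketch below assumes the message layouts shown in the table; the ACL names, interface names, and label values in the sample are hypothetical, and accepting Output as a direction is an assumption (the table shows only Input).

```python
import re

ACL_EVENT = re.compile(
    r"%ACL_ERRMSG-\d-(?P<event>UNLOADED|RELOADED|REMOVED):"
    r".*?(?P<direction>Input|Output) (?P<acl>\S+) on interface (?P<intf>\S+)"
)

def still_unloaded(log_lines):
    """Return the set of (direction, acl, interface) not yet reloaded or removed."""
    pending = set()
    for line in log_lines:
        m = ACL_EVENT.search(line)
        if not m:
            continue
        key = (m.group("direction"), m.group("acl"), m.group("intf"))
        if m.group("event") == "UNLOADED":
            pending.add(key)
        else:
            # RELOADED or REMOVED clears the unloaded state.
            pending.discard(key)
    return pending

# Hypothetical log lines built from the message formats in the table.
LOGS = [
    "%ACL_ERRMSG-4-UNLOADED: Switch 1 fed: Input ACL_A on interface Gi1/0/1 "
    "could not be programmed in hardware and traffic can be dropped.",
    "%ACL_ERRMSG-4-UNLOADED: Switch 1 fed: Input ACL_B on interface Gi1/0/2 "
    "could not be programmed in hardware and traffic can be dropped.",
    "%ACL_ERRMSG-6-RELOADED: 1 fed: Input ACL_A on interface Gi1/0/1 "
    "has now been loaded into the hardware for label 5 on asic0",
]

print(still_unloaded(LOGS))  # {('Input', 'ACL_B', 'Gi1/0/2')}
```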
Scenario: NAT
NAT hardware validation can be found on this page Configure and Verify NAT on Catalyst 9000 Switches
NAT Syslogs
NAT Feature does not have any syslog that prints when hardware resources are out of scale. Cisco bug ID CSCvz46804 was filed as an enhancement to add these logs.
If you experience NAT issues and want to verify hardware resource usage, check show platform hardware fed switch active fwd-asic resource tcam utilization (the PBR ACL region is highly utilized when NAT TCAM is depleted).
Also verify you have configured NAT in accordance with the limitations noted here: Limitations of NAT
Scenario: MPLS
MPLS hardware validation can be found on this page Configure and Verify MPLS on Catalyst 9000 Switches
MPLS Syslogs
If you run out of a particular resource, such as MPLS labels, SYSLOG messages are generated by the system.
Key points to remember:
- MPLS LABEL is used for label disposition. (This resource is consumed when prefixes are learned from a local CE)
- LSPA is used for label imposition. (This resource is consumed when prefixes are learned from a remote PE)
MPLS Log Message | Definition | Recovery Action
%FED_L3_ERRMSG-3-RSRC_ERR: Switch 1 R0/0: fed: "Failed to allocate hardware resource for fib entry" error due to hardware resource exhaustion. | Hardware reserved for IP prefixes has run out of space (EM or TCAM). | Take one of these actions to reduce the number of prefixes learned by the local or remote PE: 1. Summarize prefixes at CE. 2. Change label allocation mode from per-prefix to per-vrf.
%FED_L3_ERRMSG-3-mpls_out_of_resource: Switch 1 R0/0: fed: "Out of resource for MPLS LABEL ENTRY" error. Failed to program local label:8205 (8192/8192) in hardware. | Local label Allocation: Hardware reserved for MPLS local labels has run out of space (EM or TCAM). | Take one of these actions to reduce number of labels used on local PE: 1. Summarize prefixes at local CE or local PE. 2. Change label allocation mode from per-prefix to per-vrf on the local PE.
%FED_L3_ERRMSG-3-MPLS_LENTRY_PAUSE: Switch 1 R0/0: fed: Critical limit reached for MPLS LABEL ENTRY resource. Lentry Create PAUSED. | Local label Allocation: Hardware reserved for MPLS local labels has run out of space (EM or TCAM). | Take one of these actions to reduce number of labels used on local PE: 1. Summarize prefixes at local CE or local PE. 2. Change label allocation mode from per-prefix to per-vrf on the local PE.
%FED_L3_ERRMSG-3-mpls_out_of_resource: Switch 1 R0/0: fed: "Out of resource for MPLS LSPA. Failed to program in hardware" error. | Remote label allocation: Hardware reserved for LSPA remote labels has run out of space. | Take one of these actions to reduce number of labels used on remote PE: 1. Summarize prefixes at remote CE or remote PE. 2. Change label allocation mode from per-prefix to per-vrf on the remote PE.
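The two mpls_out_of_resource messages name different resources, which tells you whether to act on the local or the remote PE. The sketch below extracts that from the message text. The function name and result keys are hypothetical, and reading the "(8192/8192)" pair as used/maximum is an assumption that the message itself does not confirm.

```python
import re

RESOURCE = re.compile(r"Out of resource for MPLS (LABEL ENTRY|LSPA)")
COUNTS = re.compile(r"local label:(\d+) \((\d+)/(\d+)\)")

# Per the key points: LABEL ENTRY = disposition (local), LSPA = imposition (remote).
ROLE = {
    "LABEL ENTRY": ("disposition", "local PE"),
    "LSPA": ("imposition", "remote PE"),
}

def parse_mpls_resource_error(line):
    m = RESOURCE.search(line)
    if not m:
        return None
    operation, where = ROLE[m.group(1)]
    info = {"resource": m.group(1), "operation": operation, "reduce_labels_on": where}
    c = COUNTS.search(line)
    if c:
        info["label"] = int(c.group(1))
        # Assumption: the pair is used/maximum entries.
        info["used"], info["max"] = int(c.group(2)), int(c.group(3))
    return info

LABEL_MSG = ('%FED_L3_ERRMSG-3-mpls_out_of_resource: Switch 1 R0/0: fed: '
             '"Out of resource for MPLS LABEL ENTRY" error. Failed to program '
             'local label:8205 (8192/8192) in hardware.')
LSPA_MSG = ('%FED_L3_ERRMSG-3-mpls_out_of_resource: Switch 1 R0/0: fed: '
            '"Out of resource for MPLS LSPA. Failed to program in hardware" error.')

print(parse_mpls_resource_error(LABEL_MSG))
print(parse_mpls_resource_error(LSPA_MSG))
```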
Scenario: QoS
QoS hardware validation can be found on this page Understand QoS Hardware Resources on Catalyst 9000 Switches
QoS Syslogs
If you run out of QoS related resources, SYSLOG messages are generated by the system:
QoS Related Syslog Message | Definition | Recovery Actions
%FED_QOS_ERRMSG-4-TCAM_OVERFLOW: Switch 1 R0/0: fed: "Failed to program" error TCAM for policy-map ingress_pmap2 on GigabitEthernet1/0/10. | Hardware (TCAM) reserved for QoS entries has run out of space. | Ensure you have a valid / supported configuration. Review the remainder of this document to validate the current scale utilization of your switch and possible steps to reduce it if it is overutilized.
%FED_QOS_ERRMSG-3-QUEUE_SCHEDULER_HW_ERROR: Switch 1 R0/0: fed: "Failed to configure queue scheduler" error for GigabitEthernet1/0/27. | Installation to hardware of QoS queue scheduler has failed. | Verify your configuration is supported. Review the QoS configuration guide for your specific platform and version of software. For 9200L only: Review Cisco bug ID CSCvz54607 and Cisco bug ID CSCvz76172.
%FED_QOS_ERRMSG-3-QUEUE_BUFFER_HW_ERROR: R0/0: fed: "Failed to configure default queue buffer" error. | Installation to hardware of QoS queue buffers has failed. | Verify your configuration is supported. Review the QoS configuration guide for your specific platform and version of software. Review Cisco bug ID CSCvs49401.
Related Information
Technical Support & Documentation - Cisco Systems
Cisco Catalyst 9200 Series Switches Data Sheet
Cisco Catalyst 9300 Series Switches Data Sheet
Cisco Catalyst 9400 Series Switches Data Sheets
Cisco Catalyst 9500 Series Switches Data Sheets
Cisco Catalyst 9600 Series Switches Data Sheet
Cisco Catalyst 9500 Architecture White Paper
Cisco Bug IDs
Cisco bug ID CSCvg60292 (When maximum routes in TCAM are hit, no routes are able to be installed in Hash table.)
Cisco bug ID CSCvx57822 (Hardware Tables need 90% utilization watermark.)
Cisco bug ID CSCvs49401
Cisco bug ID CSCvz54607
Cisco bug ID CSCvz76172