Introduction
This document describes faults F1545 through F1552, which are cosmetic faults related to the way ACI categorizes dropped packets.
Problem
By default, ACI has ongoing atomic counters enabled.
These counters check for drops directly between leaf switches, or from leaf switches to spine switches. The faults they raise carry the following rule attribute:
rule : on-going-atomic-counter
In this case, the fault codes are F1545, F1546, F1547, F1548, F1549, F1550, F1551, and F1552, which are raised by the leaf-to-leaf counters.
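When you filter fault lists by hand, it can help to test whether a code falls in this range. The following is a minimal sketch, assuming a bash-compatible shell. The function name is_ongoing_ac_fault is hypothetical and is not an APIC command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of APIC): succeeds (exit status 0) when the
# given fault code is one of the ongoing atomic counter codes F1545-F1552.
is_ongoing_ac_fault() {
  case "$1" in
    F154[5-9]|F155[0-2]) return 0 ;;  # F1545..F1549 and F1550..F1552
    *) return 1 ;;                    # any other fault code
  esac
}

# Usage example: classify a few fault codes.
for code in F1545 F1552 F1553 F100696; do
  if is_ongoing_ac_fault "$code"; then
    echo "$code: ongoing atomic counter fault"
  else
    echo "$code: other fault"
  fi
done
```

A shell case pattern must match the whole string, so a code such as F100696 is not mistaken for one of the atomic counter codes.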
Why are they reported?
These counters are valid only as long as the communication is strictly TEP to TEP (for example, no vPC). They were part of the first versions of ACI monitoring, but the design and implementation of ACI have long since made these faults irrelevant. Regardless of the version your fabric runs, these faults can be acknowledged, and they can be removed by disabling the feature.
Some packets are counted as drops but are not actual tenant traffic drops. For example:
- Leaf vPC pairs use a vPC pair virtual IP (vPC VIP) to transmit packets to other leaf switches through their individual tunnel interfaces. ACI uses tunnel interfaces for packet statistics. Because the vPC VIP is a virtual interface, there is no corresponding tunnel interface for it on the individual leaf switches, so there is no tunnel interface to track drops against. As a result, packets destined to a vPC VIP are counted as drops on the receiving vPC peer leaf switch.
- vPC control plane packets between leaf switches, and endpoint information exchange packets.
- Drops caused by contracts in place are also counted as drops, but these are expected (and harmless) because your contracts are doing their job.
Workaround
Disable the Ongoing Atomic Counter feature. Even if the faults are squelched, the number of ongoing atomic counter objects can lead to APIC performance issues in fabrics with a large number of nodes.
An enhancement to disable ongoing atomic counters by default is documented in Cisco bug ID CSCwh67235, and this is the default behavior in versions after 6.1.x.
To disable the feature, navigate to Operations > Visualization, click Settings, set Administrative State to Disabled, and then click Submit.
Be aware that even with ongoing atomic counters disabled, users can still (and must) use on-demand atomic counters, as described in the Configure Atomic Counter Policies documentation.
Ongoing atomic counters are not valid and contribute significantly to scale issues and faults. On-demand atomic counters are reactive and valid, and disabling ongoing counters does not prevent the use of on-demand counters.
If this feature is disabled, are real packet drops still reported?
ACI has many other fault counters. If there are real drops, they show up under those fault codes.
For example, F100696 (ingress forwarding drop packets). For more details, see Explanations of Packet Drop Faults in ACI.
Related issues
You can see slow APIC responses, or failures to respond to a single request, due to out-of-memory issues caused by dbgr objects exceeding the supported scale. This feature does not scale well in fabrics with a large number of nodes.
Common Verifications
Check for faults where the attribute rule contains "on-going-atomic-counter":
APIC# moquery -c faultInst -x 'query-target-filter=wcard(faultInst.rule,"on-going-atomic-counter")' | grep dn
dn : dbgs/ac/sdvpcpath-101-103-to-102-104/fault-F1546
dn : dbgs/ac/sdvpcpath-101-103-to-102-104/fault-F1548
dn : dbgs/ac/sdvpcpath-102-104-to-101-103/fault-F1548
dn : dbgs/ac/sdvpcpath-102-104-to-101-103/fault-F1546
dn : dbgs/ac/path-101-to-103/fault-F1545
dn : dbgs/ac/path-101-to-103/fault-F1547
dn : dbgs/ac/path-103-to-101/fault-F1545
dn : dbgs/ac/path-103-to-101/fault-F1546
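On a fabric with many nodes this list can be long, so a per-code and per-path-type summary may be easier to read. The following is a minimal sketch, assuming a bash-compatible shell and the dn format shown above, where vPC paths appear as sdvpcpath-* and single-TEP paths as path-*. The function names are hypothetical and are not APIC commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "dn : ..." lines (as printed by the moquery
# command above) on stdin and prints "code count" for each fault code.
count_ac_faults_by_code() {
  grep -o 'fault-F[0-9]*' | sed 's/^fault-//' | sort | uniq -c |
    awk '{ print $2, $1 }'   # reorder uniq -c output to "code count"
}

# Hypothetical helper: counts faults on vPC paths (dbgs/ac/sdvpcpath-*)
# versus single-TEP paths (dbgs/ac/path-*), based on the dn naming above.
count_ac_faults_by_path_type() {
  awk '/dbgs\/ac\/sdvpcpath-/ { vpc++ }
       /dbgs\/ac\/path-/      { tep++ }
       END { printf "vpc-path %d\ntep-path %d\n", vpc, tep }'
}

# Usage example with the output shown above.
sample=$(cat <<'EOF'
dn : dbgs/ac/sdvpcpath-101-103-to-102-104/fault-F1546
dn : dbgs/ac/sdvpcpath-101-103-to-102-104/fault-F1548
dn : dbgs/ac/sdvpcpath-102-104-to-101-103/fault-F1548
dn : dbgs/ac/sdvpcpath-102-104-to-101-103/fault-F1546
dn : dbgs/ac/path-101-to-103/fault-F1545
dn : dbgs/ac/path-101-to-103/fault-F1547
dn : dbgs/ac/path-103-to-101/fault-F1545
dn : dbgs/ac/path-103-to-101/fault-F1546
EOF
)
printf '%s\n' "$sample" | count_ac_faults_by_code
# prints: F1545 2, F1546 3, F1547 1, F1548 2 (one per line)
printf '%s\n' "$sample" | count_ac_faults_by_path_type
# prints: vpc-path 4, tep-path 4 (one per line)
```

On the APIC itself, the same functions could be fed directly by appending `| count_ac_faults_by_code` to the moquery command above, assuming the functions have been defined in that shell session.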
Check that the dbgr service is running:
APIC# ps -ef | egrep "dbgr.bin|STIME"
UID PID PPID C STIME TTY TIME CMD
ifc 15785 1 1 May23 ? 07:57:39 /mgmt//bin/svc_ifc_dbgr.bin --x
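If you want a one-line answer instead of reading the ps table, the check can be wrapped in a small function. This is a minimal sketch, assuming a bash-compatible shell and the standard ps -ef column layout shown above (PID in column 2, STIME in column 5). The function name dbgr_status is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "ps -ef" output on stdin. Prints the PID and
# STIME of the dbgr process, or prints "not running" and exits with status 1.
dbgr_status() {
  # The [s] bracket expression keeps the pattern from matching this awk
  # command's own entry in the process list.
  awk '/[s]vc_ifc_dbgr\.bin/ { found = 1; print "running: PID " $2 ", STIME " $5 }
       END { if (!found) { print "not running"; exit 1 } }'
}

# Usage example with the output shown above.
printf '%s\n' 'UID PID PPID C STIME TTY TIME CMD' \
  'ifc 15785 1 1 May23 ? 07:57:39 /mgmt//bin/svc_ifc_dbgr.bin --x' | dbgr_status
# prints: running: PID 15785, STIME May23
```

On a live system, the equivalent would be `ps -ef | dbgr_status`.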
Check the dbgr service logs for records matching the text "enough tokens". You can filter the results by date and count the occurrences. In this example, there are a total of 153506 occurrences on 2024-05-20:
APIC# zgrep "enough tokens" /var/log/dme/log/svc_ifc_dbgr*
svc_ifc_dbgr.bin.log.595460.gz:30038||2024-05-20T08:11:01.125937358+00:00||doer||INFO||co=doer:1:1:0x800000013811b8b:0||Dropping stimuli as doer does not have enough tokens||../common/src/framework/./core/proc/Doer.cc||1303
...
svc_ifc_dbgr.bin.log.595460.gz:30038||2024-05-20T08:11:03.126887965+00:00||doer||INFO||co=doer:19:1:0x98000000129a2c01:0||Dropping stimuli as doer does not have enough tokens||../common/src/framework/./core/proc/Doer.cc||1303
APIC# zgrep "enough tokens" /var/log/dme/log/svc_ifc_dbgr* | grep 2024-05-20 | wc -l
153506
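The command above counts one date at a time. To see the trend over several days at once, the date can be extracted from each record and counted. This is a minimal sketch, assuming a bash-compatible shell and the log format shown above, where fields are separated by "||" and the second field is an ISO-8601 timestamp. The function name count_token_drops_by_date is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads dbgr log lines on stdin and prints, per day,
# how many records contain "enough tokens". The first 10 characters of the
# second "||"-separated field are the date (YYYY-MM-DD). A "file.gz:" prefix
# added by zgrep stays in the first field, so it does not affect the result.
count_token_drops_by_date() {
  grep 'enough tokens' |
    awk -F'[|][|]' '{ print substr($2, 1, 10) }' |
    sort | uniq -c |
    awk '{ print $2, $1 }'   # reorder uniq -c output to "date count"
}

# Usage example with the two records shown above.
printf '%s\n' \
  'svc_ifc_dbgr.bin.log.595460.gz:30038||2024-05-20T08:11:01.125937358+00:00||doer||INFO||co=doer:1:1:0x800000013811b8b:0||Dropping stimuli as doer does not have enough tokens||../common/src/framework/./core/proc/Doer.cc||1303' \
  'svc_ifc_dbgr.bin.log.595460.gz:30038||2024-05-20T08:11:03.126887965+00:00||doer||INFO||co=doer:19:1:0x98000000129a2c01:0||Dropping stimuli as doer does not have enough tokens||../common/src/framework/./core/proc/Doer.cc||1303' |
  count_token_drops_by_date
# prints: 2024-05-20 2
```

On the APIC, the zgrep command above could be piped into `count_token_drops_by_date` instead of into `grep 2024-05-20 | wc -l`.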
References:
Cisco APIC Faults, Events, and System Messages Management Guide > Handling Expected Faults
Atomic Counters Guidelines and Restrictions
Configure Atomic Counter Policies
Explanations of Packet Drop Faults in ACI
Reference Bugs:
Cisco bug ID CSCwh67235 : Disable Ongoing Atomic Counters by Default
Cisco bug ID CSCuz99892 : TEP-to-TEP atomic counters are unreliable
Cisco bug ID CSCvp07545 : Receiving faults F1545 & F1547 faults in ACI fabric
Cisco bug ID CSCwf18707 : Raised fault if dbgAcPathA scalability is exceeded