Following are the guidelines for ACI switch upgrades and downgrades:
Rule 1 – Divide your leaf and spine switches into at least two groups
For example:
-
Group ODD: leaf 101, leaf 103, spine 1001
-
Group EVEN: leaf 102, leaf 104, spine 1002
Rule 2 – Determine how spine switches should be grouped
-
Always keep at least one MP-BGP route reflector (RR) spine switch up and running in each pod.
-
Always keep at least one spine switch with IPN connectivity up and running in each pod.
-
Never perform a graceful upgrade for a spine switch if the given pod has only one spine switch (in the case of multi-pod).
See Graceful Upgrade or Downgrade of ACI Switches for details.
For example:
| Update Group | Pod 1 | Pod 2 |
| ODD | leaf 101, leaf 103, leaf 105, spine 1001 (RR, IPN), spine 1003 | leaf 201, leaf 203, leaf 205, spine 2001 (RR, IPN), spine 2003 |
| EVEN | leaf 102, leaf 104, leaf 106, spine 1002 (RR, IPN), spine 1004 | leaf 202, leaf 204, leaf 206, spine 2002 (RR, IPN), spine 2004 |
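Where: RR marks an MP-BGP route reflector spine switch, and IPN marks a spine switch with IPN connectivity.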
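The two "always keep at least one" conditions in Rule 2 can be checked mechanically before you create the update groups. The following is a minimal sketch, not a Cisco tool: the `Spine` record and the `spine_grouping_problems` function are hypothetical names, and the inventory is typed in by hand from the example table rather than read from a Cisco APIC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spine:
    # Hypothetical record for this sketch; not an APIC object.
    node_id: int
    pod: int
    group: str
    is_rr: bool = False
    has_ipn: bool = False


def spine_grouping_problems(spines):
    """Return a list of problems; an empty list means the grouping passes."""
    problems = []
    groups = sorted({s.group for s in spines})
    pods = sorted({s.pod for s in spines})
    for group in groups:
        for pod in pods:
            in_pod = [s for s in spines if s.pod == pod]
            # Spines in this pod that stay up while `group` is upgraded.
            remaining = [s for s in in_pod if s.group != group]
            if any(s.is_rr for s in in_pod) and not any(s.is_rr for s in remaining):
                problems.append(f"group {group}: pod {pod} loses all RR spines")
            if any(s.has_ipn for s in in_pod) and not any(s.has_ipn for s in remaining):
                problems.append(f"group {group}: pod {pod} loses all IPN-connected spines")
    return problems


# Spine switches from the example table above.
EXAMPLE = [
    Spine(1001, 1, "ODD", True, True), Spine(1003, 1, "ODD"),
    Spine(1002, 1, "EVEN", True, True), Spine(1004, 1, "EVEN"),
    Spine(2001, 2, "ODD", True, True), Spine(2003, 2, "ODD"),
    Spine(2002, 2, "EVEN", True, True), Spine(2004, 2, "EVEN"),
]

print(spine_grouping_problems(EXAMPLE))
```

With the grouping from the table, the function reports no problems. Moving spine 1002 into group ODD would make it report that pod 1 loses both its RR and its IPN-connected spine switches while ODD is upgraded.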
Rule 3 – Determine how leaf switches should be grouped
For example:
| Update Group | Pod 1 | Pod 2 |
| ODD | leaf 101 (vPC 11, APIC1), leaf 103 (vPC 12, APIC2), leaf 105 (vPC 13), spine 1001 | leaf 201 (vPC 21, APIC3), leaf 203 (vPC 22), leaf 205 (vPC 23), spine 2001 |
| EVEN | leaf 102 (vPC 11, APIC1), leaf 104 (vPC 12, APIC2), leaf 106 (vPC 13), spine 1002 | leaf 202 (vPC 21, APIC3), leaf 204 (vPC 22), leaf 206 (vPC 23), spine 2002 |
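Where: vPC marks the vPC pair that the leaf switch belongs to, and APIC marks the Cisco APIC connected to the leaf switch.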
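In the example table, the two members of each vPC pair, and the two leaf switches attached to each Cisco APIC, always sit in different update groups. The sketch below checks a planned grouping for that property. It is illustrative only: `leaf_grouping_problems` and the tuple layout are hypothetical, and the data is copied by hand from the table.

```python
from collections import defaultdict


def leaf_grouping_problems(leaves):
    """leaves: (node_id, group, vpc_pair, apic) tuples; vpc_pair and apic may be None."""
    by_vpc = defaultdict(list)
    by_apic = defaultdict(list)
    for node_id, group, vpc_pair, apic in leaves:
        if vpc_pair is not None:
            by_vpc[vpc_pair].append((node_id, group))
        if apic is not None:
            by_apic[apic].append((node_id, group))

    problems = []
    for label, index in (("vPC pair", by_vpc), ("APIC", by_apic)):
        for key, members in sorted(index.items()):
            groups = [g for _, g in members]
            # A repeated group name means two related leaves upgrade together.
            if len(groups) != len(set(groups)):
                nodes = sorted(n for n, _ in members)
                problems.append(f"{label} {key}: leaves {nodes} share an update group")
    return problems


# Leaf switches from the example table above.
EXAMPLE_LEAVES = [
    (101, "ODD", 11, "APIC1"), (103, "ODD", 12, "APIC2"), (105, "ODD", 13, None),
    (102, "EVEN", 11, "APIC1"), (104, "EVEN", 12, "APIC2"), (106, "EVEN", 13, None),
    (201, "ODD", 21, "APIC3"), (203, "ODD", 22, None), (205, "ODD", 23, None),
    (202, "EVEN", 21, "APIC3"), (204, "EVEN", 22, None), (206, "EVEN", 23, None),
]

print(leaf_grouping_problems(EXAMPLE_LEAVES))
```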
Rule 4 – Understand the concurrent capacity in switch update groups
General
-
Each update/maintenance group should contain a maximum of 80 switch nodes.
-
The concurrent capacity setting determines how many switches in the same update/maintenance group are upgraded or downgraded
simultaneously. However, this setting does not let you choose which switches in the group are upgraded or downgraded at the
same time. We therefore recommend that you create separate update groups to upgrade or downgrade switches on different schedules
instead of relying on the concurrent capacity setting.
-
If both leaf nodes in the same vPC pair are in the same switch upgrade or downgrade group, only one of the leaf nodes is upgraded
or downgraded at a time regardless of the concurrent capacity.
-
Starting from Cisco APIC release 4.1(1), when graceful upgrade or downgrade is enforced and there are no other operational spine switches in the same
pod, the upgrade or downgrade is rejected regardless of the concurrent capacity setting.
Prior to Cisco APIC release 4.2(5):
-
Even in the same update group, switches are upgraded or downgraded only one pod at a time.
-
The default concurrent capacity per group is 20.
If you have more than 20 switches in the same group, you can use the upgrade scheduler to change the capacity to unlimited.
See the Upgrading the Leaf and Spine Switch Software Version section for details.
From Cisco APIC release 4.2(5):
-
Switches in the same update group are upgraded or downgraded simultaneously, regardless of pods.
-
The default concurrent capacity per group is unlimited.
The above enhancements from Cisco APIC release 4.2(5) take effect as soon as the Cisco APICs are upgraded to 4.2(5) or later. For instance, when the Cisco APICs are upgraded to 4.2(5) and the switches are still at release 13.2(10), the enhancements already apply when the
switches are upgraded from 13.2(10) to 14.2(5).
These enhancements help reduce the time it takes to upgrade your switches.
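The release-dependent defaults in Rule 4 can be summarized in a few lines of code. This is a sketch for planning scripts, under the assumption that release strings look like `4.2(5)`; `parse_release`, `default_concurrent_capacity`, `pods_upgraded_in_parallel`, and `check_group_size` are hypothetical helper names, not APIC APIs.

```python
import re

MAX_NODES_PER_GROUP = 80  # per-group maximum stated in Rule 4


def parse_release(text):
    """Turn a release string such as '4.2(5)' into a comparable tuple (4, 2, 5)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\((\d+)[a-z]*\)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {text!r}")
    return tuple(int(part) for part in match.groups())


def default_concurrent_capacity(apic_release):
    """Return 20 before APIC 4.2(5), or None (unlimited) from 4.2(5) onward."""
    return 20 if parse_release(apic_release) < (4, 2, 5) else None


def pods_upgraded_in_parallel(apic_release):
    """Before 4.2(5), switches in one group are handled one pod at a time."""
    return parse_release(apic_release) >= (4, 2, 5)


def check_group_size(node_count):
    """Raise if a planned update group exceeds the per-group maximum."""
    if node_count > MAX_NODES_PER_GROUP:
        raise ValueError(
            f"{node_count} nodes exceeds the {MAX_NODES_PER_GROUP}-node maximum"
        )


print(default_concurrent_capacity("4.2(4)"), default_concurrent_capacity("4.2(5)"))
```

Only the Cisco APIC release is consulted here because, as noted above, the 4.2(5) behavior applies as soon as the Cisco APICs run 4.2(5) or later, regardless of the current switch release.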
Rule 5 – Save time by downloading switch images beforehand
Even after you have downloaded Cisco APIC and switch images to the Cisco APIC's firmware repository, the switches still need to download the image from the Cisco APICs. In later releases, this operation can be performed separately from the actual upgrade procedure. This is called pre-download
and is equivalent to Step 7 in Workflow to Upgrade or Downgrade the Cisco ACI Fabric.
Prior to switch release 14.1(1):
Not supported. Switches download the image from Cisco APICs when the upgrade or downgrade is triggered.
Switch release 14.1(1) - 15.0(x):
-
Pre-download can be performed through the upgrade scheduler.
-
Following is the recommended procedure:
-
Create update groups with the scheduler set far into the future (such as 10 years in the future). This will trigger switches
to download the image from Cisco APICs immediately.
-
When it is time to start the upgrade in the maintenance window, edit the same groups and change the Upgrade Start Time to Now.
-
If the current version of the switches is 14.2(5) or later, the Cisco APIC GUI shows the progress of the pre-download.
Switch release 15.1(1) or later:
The above enhancement (pre-download) from switch release 14.1(1) takes effect only after both the Cisco APICs and the switches are upgraded or downgraded to the corresponding versions. For example, when the Cisco APICs are upgraded to 4.2(7) and the switches are still on 13.2(10), pre-download is not available to upgrade the switches from
13.2(10) to 14.2(7). On the other hand, when the Cisco APICs are upgraded to 5.2(1) and the switches are still on 14.2(7), pre-download is performed through the new Cisco APIC GUI using Begin Download for switch upgrades from 14.2(7) to 15.2(1).
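The two examples above suggest a simple decision rule, sketched below. Treat it as one reading of this section rather than documented behavior: it assumes that the current switch release decides whether pre-download is available at all, and that the Cisco APIC release decides whether the upgrade scheduler or Begin Download is used. The thresholds `4.1(1)` and `5.1(1)` are assumed to be the APIC counterparts of switch releases 14.1(1) and 15.1(1). `pre_download_method` and `parse_release` are hypothetical helpers.

```python
import re


def parse_release(text):
    """Turn a release string such as '14.2(7)' into a comparable tuple."""
    match = re.fullmatch(r"(\d+)\.(\d+)\((\d+)[a-z]*\)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {text!r}")
    return tuple(int(part) for part in match.groups())


def pre_download_method(apic_release, switch_release):
    """Classify how pre-download would be performed for the given releases.

    Assumption: both the APICs and the switches must already run a release
    that supports pre-download, as the paragraph above states.
    """
    apic = parse_release(apic_release)
    switch = parse_release(switch_release)
    if switch < (14, 1, 1) or apic < (4, 1, 1):
        return "not supported"
    if apic < (5, 1, 1):
        # Assumed: scheduler-based pre-download with a far-future start time.
        return "upgrade scheduler"
    return "Begin Download"


print(pre_download_method("4.2(7)", "13.2(10)"))
print(pre_download_method("5.2(1)", "14.2(7)"))
```

Both calls reproduce the outcomes described in the examples above: no pre-download for switches still on 13.2(10), and Begin Download when the Cisco APICs run 5.2(1) and the switches run 14.2(7).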
Graceful Upgrade or Downgrade of ACI Switches
If you want to isolate a switch from user traffic when performing an upgrade or downgrade procedure, it's helpful to become
familiar with the different terms and methods available to better understand what is supported and what is not supported in
these situations:
-
Graceful Insertion and Removal (GIR): The operation used to isolate a switch from user traffic.
-
Maintenance mode: Used to isolate a switch from user traffic for debugging purposes. You can put a switch in maintenance mode by enabling the Maintenance (GIR) field in the Fabric Membership page in the Cisco APIC GUI (right-click on a switch and choose Maintenance (GIR)).
A switch in maintenance mode is not considered part of the operational ACI fabric infra and does not accept regular Cisco APIC communications. Therefore, performing a firmware upgrade or downgrade for a switch in this state is not supported:
the process may fail or may get stuck in an incomplete status indefinitely.
-
Graceful Upgrade: Used to reload a switch after it is isolated from user traffic during an upgrade procedure. Switches are programmed to reboot
automatically at a certain point during the firmware upgrade process; with this option, GIR is performed automatically prior
to that reboot. You can find the Graceful Maintenance option (releases prior to release 5.1) or the Graceful Upgrade option (release 5.1 and later) for a switch in an update group in the Cisco APIC GUI.
Halting the procedure after the switch is isolated from user traffic but before it is reloaded, for example to verify that
user traffic is flowing through redundant paths, is currently not supported in ACI.
Guidelines for ACI Switch Graceful Upgrade
All guidelines from Guidelines for ACI Switch Upgrades and Downgrades also apply to Graceful Upgrade. However, this section provides more information on several guidelines that are specifically critical for Graceful Upgrade.
-
As suggested in Rule 2 – Determine how spine switches should be grouped, do not upgrade all spine switches in a pod at one time, especially when you are performing a Graceful Upgrade in a Multi-Pod setup.
Otherwise, the upgrade will fail, leaving the spine switches isolated from the fabric indefinitely. This is because, as part
of the Graceful Upgrade process, IPN connectivity is brought down explicitly on each spine switch being upgraded gracefully so that it can isolate
itself from the fabric. Upgrading all spine switches in a pod at once therefore causes the entire pod, including the spine switches themselves, to lose
communication with Cisco APICs and switches in other pods, with no means to self-recover.
For this reason, if you are performing a Graceful Upgrade, you must put the spine switches from the same pod into different maintenance/update groups so that they are upgraded
separately. If the pod has only one spine switch, you must disable the Graceful Upgrade (or Graceful Maintenance) option prior to the upgrade. If this procedure was not followed, refer to the workaround provided in CSCvn28063.
To avoid this issue, Cisco APIC release 4.1(1) introduced a safeguard that rejects the upgrade of the last spine switch in a pod when Graceful Upgrade is enforced. This blocking mechanism is also described in Rule 4 – Understand the concurrent capacity in switch update groups.
-
As suggested in Rule 3 – Determine how leaf switches should be grouped, you must put Cisco APIC-connected leaf switches into different maintenance/update groups so that two leaf switches connected to the same Cisco APIC are not upgraded at the same time.
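The spine guideline above can be checked before a Graceful Upgrade is triggered. The sketch below flags any pod whose spine switches would all be upgraded gracefully at the same time. It assumes every spine switch that is not being upgraded is operational, which a real pre-check would have to verify; `pods_blocking_graceful_upgrade` is a hypothetical helper, not the Cisco APIC safeguard itself.

```python
def pods_blocking_graceful_upgrade(spines_by_pod, upgrading):
    """Return the pods in which every spine switch is in `upgrading`.

    spines_by_pod: dict mapping pod ID to a list of spine node IDs.
    upgrading: node IDs about to be upgraded gracefully at the same time.
    """
    upgrading = set(upgrading)
    return sorted(
        pod
        for pod, spines in spines_by_pod.items()
        if spines and set(spines) <= upgrading
    )


# Pod 1 has two spine switches; pod 2 has only one.
fabric = {1: [1001, 1002], 2: [2001]}
print(pods_blocking_graceful_upgrade(fabric, {1001, 2001}))  # [2]
```

A non-empty result means the group must be split, or, for a pod with a single spine switch, that the Graceful Upgrade (or Graceful Maintenance) option must be disabled first.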
Leaf Reload in the Absence of Cisco APIC
In releases before Cisco ACI 6.2(1), Cisco ACI switches could operate independently and forward traffic even if they lost
connectivity to all APICs. However, if a switch reloaded in that situation, for example because of power loss or a kernel panic, it started in a stateless
state, tried to download its configuration from the APICs, and failed to establish a connection with any of them.
Starting with Cisco ACI release 6.2(1), Cisco APIC automatically triggers snapshots of a switch for predefined events, such
as when a node joins or rejoins the cluster or when you make changes to the configuration.
If an ungraceful reload occurs, the switch restores its operational state from the most recent snapshot and rejoins the fabric.
It then reconciles its configuration with the latest version on the APICs to return to a stable state.
Collecting Snapshots
When a node joins the APIC cluster for the first time, the APIC initiates a baseline snapshot collection on the switch. The
initial snapshot is taken 15 minutes after the node joins the cluster. Additional snapshots are captured every 15 minutes
until there are ten in total. Any configuration changes in APIC will trigger up to two snapshots within each 24-hour period,
after the initial set is complete.
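The baseline schedule described above (first snapshot 15 minutes after the node joins, then one every 15 minutes until there are ten) works out as follows. This is plain arithmetic over the stated interval and count; `baseline_snapshot_times` is a hypothetical helper and the join time is made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def baseline_snapshot_times(joined_at, interval_minutes=15, max_count=10):
    """Return the expected baseline snapshot times after a node joins."""
    step = timedelta(minutes=interval_minutes)
    return [joined_at + step * n for n in range(1, max_count + 1)]


# Illustrative join time only; not taken from the example output.
joined = datetime(2025, 10, 8, 3, 0, tzinfo=timezone.utc)
times = baseline_snapshot_times(joined)
print(times[0].isoformat())   # 2025-10-08T03:15:00+00:00
print(times[-1].isoformat())  # 2025-10-08T05:30:00+00:00
```

Under these numbers the baseline set completes 150 minutes after the node joins the cluster.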
Example of a Baseline Snapshot
The example below shows the output from the first snapshot.
completedBaselineSnapCount : 10
completedConfigSnapCount : 0
completedCount : 10
dn : uni/controller/nodeidentpol/nodep-TEP-1-101/configsnap
failureCount : 0
failureDescription : N/A
failureReason : NoFailure
intervalMinutes : 15
lastAttemptTs : 2025-10-08T05:38:04.272+00:00
lastSucessTs : 2025-10-08T05:38:04.272+00:00
lcOwn : local
maxCount : 10
modTs : 2025-10-08T05:39:04.064+00:00
monPolDn : uni/fabric/monfab-default
nextSnapshotTime : 2025-10-08T05:37:04.216+00:00
required : no
rn : configsnap
state : completed
status :
totalAttempts : 10
triggeredBy : baseline
Example of a Snapshot Taken 24 Hours After Node Configuration Change
The example below shows the snapshot output taken 24 hours after the node’s configuration change.
# fabric.ConfigSnapshot
childAction :
completedBaselineSnapCount : 5
completedConfigSnapCount : 1
completedCount : 6
dn : uni/controller/nodeidentpol/nodep-TEP-1-102/configsnap
failureCount : 0
failureDescription : N/A
failureReason : NoFailure
intervalMinutes : 1140
lastAttemptTs : 2025-10-08T05:39:04.280+00:00
lastSucessTs : 2025-10-08T05:39:04.280+00:00
lcOwn : local
maxCount : 2
modTs : 2025-10-08T18:49:37.995+00:00
monPolDn : uni/fabric/monfab-default
nextSnapshotTime : 2025-10-08T18:53:37.999+00:00
required : yes
rn : configsnap
state : untriggered
status :
totalAttempts : 6
triggeredBy : config
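Output in this `key : value` layout is easy to turn into a dictionary for scripted checks. The sketch below parses a few lines copied from the second example; `parse_snapshot_status` is a hypothetical helper, and how you obtain the text from a switch or Cisco APIC is outside its scope.

```python
def parse_snapshot_status(text):
    """Parse 'key : value' lines into a dict, skipping blanks and '#' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only; timestamps contain further colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# A subset of the fields from the second example above.
SAMPLE = """\
# fabric.ConfigSnapshot
completedBaselineSnapCount : 5
completedConfigSnapCount : 1
completedCount : 6
lastAttemptTs : 2025-10-08T05:39:04.280+00:00
state : untriggered
status :
triggeredBy : config
"""

fields = parse_snapshot_status(SAMPLE)
print(fields["triggeredBy"], fields["completedCount"])
```

In both examples above, `completedCount` equals the sum of `completedBaselineSnapCount` and `completedConfigSnapCount`. That is an observation from these two outputs, not a documented guarantee.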
Handling Different Reload Scenarios
Stateless Reload
There is no change in stateless reload behavior. During a stateless reload, the switch starts up without a stored state and
downloads policies from the APIC to resume operation. During the reload, any captured snapshots are deleted.
Graceful Reload
There is no change in graceful reload behavior. In a graceful reload, a snapshot is captured just before the switch reloads.
After the reload, the switch restores its state from the snapshot.
Ungraceful Reload
During an ungraceful reload, the switch attempts to restore its state from the most recent snapshot available.
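The three scenarios above differ mainly in where the switch gets its state after the reload. The sketch below restates that as a small lookup. The `Reload` enum and `state_source_after_reload` are hypothetical names, and the case of an ungraceful reload with no snapshot available returns `None` because this section does not describe it.

```python
from enum import Enum


class Reload(Enum):
    STATELESS = "stateless"
    GRACEFUL = "graceful"
    UNGRACEFUL = "ungraceful"


def state_source_after_reload(kind, snapshot_available=True):
    """Return where the switch restores its state from, per the text above."""
    if kind is Reload.STATELESS:
        # Captured snapshots are deleted; policies come from the APIC.
        return "apic"
    if kind is Reload.GRACEFUL:
        # A snapshot is captured just before the switch reloads.
        return "snapshot"
    # Ungraceful: the most recent snapshot is used if one exists.
    return "snapshot" if snapshot_available else None


for kind in Reload:
    print(kind.value, "->", state_source_after_reload(kind))
```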
Limitations
This limitation applies when a leaf switch reloads and Cisco APIC is unavailable.
-
When the port is brought up, a snapshot is taken every 24 hours. If you make a configuration change (such as adding a static
vPC) that is not yet included in the current snapshot, the downlink port may come up after a power cycle but experience traffic
loss.