Cisco APIC Getting Started Guide, Release 6.2(x)

Best practices for cluster management

Updated: January 12, 2026

Overview

This topic describes best practices for managing a Cisco APIC cluster. The practices cover verifying controller health, keeping firmware versions uniform, and following the correct procedures for adding, moving, or decommissioning controllers.

Following these practices helps prevent data loss, cluster instability, and unsupported operations during configuration, maintenance, and scaling.

Key best practices for cluster management

Always verify the health of all controllers before making any changes to the cluster. Confirm that every controller is fully fit and resolve any health issues before proceeding.

Ensure that all controllers in the cluster run the same firmware version before adding, configuring, or clustering devices. Do not cluster controllers running different firmware versions.
Maintain at least three active controllers in your cluster, and add standby controllers as needed. For scalability requirements, consult the Verified Scalability Guide to determine the required number of active controllers for your deployment.
Ignore cluster information from controllers that are not currently active in the cluster, as their data may be inaccurate.

Once you configure a cluster slot with a controller’s ChassisID, you must decommission that controller to make the slot available for reassignment.

Wait for all ongoing firmware upgrades to complete and verify the cluster is fully fit before making additional changes.
When moving a controller, always ensure the cluster is healthy. Select the controller you intend to move, shut it down, physically move and reconnect it, and then power it on. After the move, verify through the management interface that all controllers return to a fully fit state.
Move only one controller at a time to maintain cluster stability.
When transferring a controller to a different set of leaf switches or to a different port within the same leaf switch, ensure the cluster is healthy first. Decommission the controller before moving it, and then recommission it after the move.
Before configuring the cluster, confirm that all controllers run the same firmware version to prevent unsupported operations and cluster issues.

Delete any unused OOB EPGs associated with a controller. Assigning multiple EPGs to a controller is not supported and can cause the cluster workflow IP address to be overridden by policy.
Log record objects are stored only in one shard on a single controller. If you decommission or replace that controller, those logs are permanently lost.
When decommissioning a controller, all fault, event, and audit log history stored on it is deleted. If you replace all controllers, all log history is lost. Before migrating a controller, manually back up its log history to prevent data loss.
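
The health, firmware, and minimum-size checks above can be expressed as a simple pre-change gate. The following is a minimal sketch only: the `Controller` record, its field names, and the `"fully-fit"` string are hypothetical stand-ins for whatever inventory data you collect, not an APIC schema or API.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    # Hypothetical record; field names are illustrative, not an APIC schema.
    controller_id: int
    health: str       # assumed to read "fully-fit" when the controller is healthy
    firmware: str     # firmware version string as reported for the controller
    active: bool = True


def prechange_problems(controllers, min_active=3):
    """Return a list of problems; an empty list means the checks passed."""
    active = [c for c in controllers if c.active]
    problems = []
    # The guide recommends at least three active controllers.
    if len(active) < min_active:
        problems.append(
            f"only {len(active)} active controllers; need at least {min_active}"
        )
    # Every active controller must be fully fit before any change.
    for c in active:
        if c.health != "fully-fit":
            problems.append(
                f"controller {c.controller_id} is {c.health}, not fully-fit"
            )
    # All controllers must run the same firmware version.
    versions = sorted({c.firmware for c in controllers})
    if len(versions) > 1:
        problems.append("mixed firmware versions: " + ", ".join(versions))
    return problems
```

An empty result means only that these three checks passed. It does not replace verifying the cluster state in the management interface.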

Fabric Controller cluster size expansion

A Fabric Controller cluster size defines the total number of controllers operating together, determining redundancy, fault tolerance, and scalability within the cluster.

Defines the total number of controllers operating together within a Fabric Controller cluster.
Determines the level of redundancy and fault tolerance achievable by the cluster.
Influences the cluster’s ability to scale and handle increasing workloads.

Guidelines for expanding the APIC cluster size

Follow these guidelines to expand the APIC cluster size:

Schedule the cluster expansion at a time that ensures the fabric workload is not impacted.
If the health status of one or more APIC controllers in the cluster is not "fully fit," remedy the situation before proceeding.
Stage the new APIC controllers according to the instructions in their hardware installation guide. Verify in-band connectivity using a ping test.
Increase the cluster target size to match the sum of the existing and new controller counts. For example, if the existing controller count is three and you are adding three controllers, set the new cluster target size to six. The cluster adds each new controller sequentially until expansion is complete. If an existing controller becomes unavailable during the expansion process, cluster expansion will stop. Address the issue before continuing with cluster expansion.
The expansion may require more than ten minutes per appliance because the controllers must synchronize data when a new appliance is added. After the cluster successfully expands, the operational size and the target size will be equal. Allow the controllers to complete the cluster expansion before making additional changes to the cluster.
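
The target-size arithmetic and the completion condition described above can be sketched as follows. The function names are hypothetical and the code only models the rules in this section. It does not talk to a controller.

```python
def expansion_target_size(existing_count, added_count, all_fully_fit):
    """Return the new cluster target size: existing plus new controllers."""
    # Health problems must be remedied before the expansion starts.
    if not all_fully_fit:
        raise ValueError("resolve controller health issues before expanding")
    if existing_count < 1 or added_count < 1:
        raise ValueError("controller counts must be positive")
    return existing_count + added_count


def expansion_complete(operational_size, target_size):
    """Expansion is complete when the operational size equals the target size."""
    return operational_size == target_size
```

For the example in the text, `expansion_target_size(3, 3, True)` returns 6, and `expansion_complete(4, 6)` is false while controllers are still being added.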

Cluster size reductions

A cluster size reduction is a system management operation that decreases the number of controllers in a cluster by decommissioning and removing selected controllers, triggering cluster synchronization to maintain system stability.

Decreases the number of controllers in a cluster.
Requires the orderly decommissioning and removal of selected controllers.
Triggers cluster synchronization processes to maintain system stability.

Guidelines for reducing Cisco APIC cluster size

Follow these guidelines to reduce the Cisco Application Policy Infrastructure Controller (APIC) cluster size and decommission the Cisco APICs that are removed from the cluster:

Reducing the cluster size increases the load on the remaining Cisco APICs. Schedule the Cisco APIC size reduction at a time when the demands of the fabric workload will not be impacted by the cluster synchronization.
If the health status of one or more Cisco APICs in the cluster is not "fully fit," remedy that situation before proceeding.
Reduce the cluster target size to the new lower value. For example, if the existing cluster size is 6 and you will remove 3 controllers, reduce the cluster target size to 3.
Starting with the highest numbered controller ID in the existing cluster, decommission, power down, and disconnect each Cisco APIC, one at a time, until the cluster reaches the new lower target size.

Upon the decommissioning and removal of each controller, the Cisco APIC synchronizes the cluster.
Note
After decommissioning a Cisco APIC from the cluster, promptly power it down and disconnect it from the fabric to prevent its rediscovery. Before returning it to service, wipe it clean and reset it to factory defaults.

If the disconnection is delayed and a decommissioned controller is rediscovered, follow these steps to remove it:
1. Power down the Cisco APIC and disconnect it from the fabric.
2. In the list of Unauthorized Controllers, reject the controller.
3. Erase the controller from the GUI.
Cluster synchronization stops if an existing Cisco APIC becomes unavailable. Resolve this issue before attempting to proceed with the cluster synchronization.
Depending on the amount of data the Cisco APIC must synchronize upon the removal of a controller, the time required to decommission and complete cluster synchronization for each controller could be more than 10 minutes per controller.

Note

Failure to follow an orderly process to decommission and power down Cisco APICs from a reduced cluster can lead to unpredictable outcomes. Do not allow unrecognized Cisco APICs to remain connected to the fabric.

Note

Complete all of the necessary decommissioning steps, and allow the Cisco APIC to finish the cluster synchronization before making additional changes to the cluster.

Example of cluster size reduction

If a cluster originally contains six controllers and three are to be removed, administrators should set the cluster target size to three. Remove controllers one at a time, starting with the highest numbered controller ID, and follow established procedures to ensure reduction and synchronization are successful.
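
The removal order in this example can be computed mechanically. The sketch below assumes controller IDs are plain integers. The function name `decommission_order` is hypothetical and exists only to illustrate the "highest ID first" rule.

```python
def decommission_order(controller_ids, target_size):
    """List the controller IDs to decommission, highest numbered ID first."""
    ids = sorted(set(controller_ids), reverse=True)
    if not 1 <= target_size <= len(ids):
        raise ValueError("target size must be between 1 and the current size")
    # Remove from the top until only target_size controllers remain.
    return ids[: len(ids) - target_size]
```

For a six-controller cluster with IDs 1 through 6 and a target size of 3, this yields `[6, 5, 4]`. Each controller in the list is decommissioned, powered down, and disconnected before the next one is handled.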

Controller replacements in the cluster

A controller replacement is a cluster maintenance operation that substitutes a failed or decommissioned controller with a new or spare unit, requires using the same initial provisioning parameters and software image as the controller being replaced, and maintains cluster synchronization and operational continuity when performed according to established procedures.

Substitutes a failed or decommissioned controller with a new or spare unit.
Requires using the same initial provisioning parameters and software image as the controller being replaced.
Maintains cluster synchronization and operational continuity when performed according to established procedures.

Guidelines for replacing a Cisco controller in a cluster

When replacing a Cisco controller in a cluster, observe the following guidelines to ensure a safe and successful process:

Verify that all controllers have a Fully Fit health status before beginning the replacement.
Schedule the Cisco APIC controller replacement at a time when the demands of the fabric workload will not be impacted by the cluster synchronization.
Make note of the initial provisioning parameters and image used on the Cisco APIC controller that will be replaced. The same parameters and image must be used with the replacement controller. The Cisco APIC proceeds to synchronize the replacement controller with the cluster.

Note
Cluster synchronization stops if an existing Cisco APIC controller becomes unavailable. Resolve this issue before attempting to proceed with the cluster synchronization.
Perform the decommission from a Cisco APIC controller that remains in the cluster, not from the controller that is being decommissioned. For example, log in to Cisco APIC1 or APIC2 to shut down and decommission APIC3.
CIMC policy configuration: When replacing the standby APIC, delete the CIMC policy for both the standby and the active APIC. If you do not delete the CIMC policy, make sure that you update the CIMC policy for the active APIC after the replacement of the standby APIC is complete.

Perform the replacement procedure in the following order:

1. Make note of the configuration parameters and image of the APIC being replaced.
2. Decommission the APIC you want to replace (see Decommission a Cisco APIC in the cluster using the GUI).
3. Commission the replacement APIC using the same configuration and image as the APIC being replaced (see Commission a Cisco APIC in the cluster using the GUI).
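
Because the replacement must use the same provisioning parameters and image as the original, it can help to compare the recorded values before commissioning. The following is a sketch under stated assumptions: the parameters are held in plain dictionaries, and keys such as `image` and `fabric_name` are hypothetical labels for whatever values you noted in the first step.

```python
def replacement_mismatches(original, replacement):
    """Map each differing parameter to its (original, replacement) values."""
    keys = sorted(set(original) | set(replacement))
    return {
        key: (original.get(key), replacement.get(key))
        for key in keys
        # A key missing on either side counts as a mismatch (value None).
        if original.get(key) != replacement.get(key)
    }
```

An empty dictionary means the two records agree on every noted parameter. Any entry identifies a value to correct on the replacement controller before it is commissioned.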

Stage the replacement Cisco APIC controller according to the instructions in its hardware installation guide. Verify in-band connectivity with a ping test.

Note
If you do not decommission a Cisco APIC controller before attempting to replace it, the cluster cannot absorb the replacement controller. Also, before returning a decommissioned Cisco APIC controller to service, wipe it clean and reset it to factory defaults.
Depending on the amount of data the Cisco APIC must synchronize upon the replacement of a controller, the time required to complete the replacement could be more than 10 minutes per replacement controller. Upon successful synchronization of the replacement controller with the cluster, the Cisco APIC operational size and the target size will remain unchanged.

Note
Allow the Cisco APIC to complete the cluster synchronization before making additional changes to the cluster.
The UUID and fabric domain name persist in a Cisco APIC controller across reboots. However, a clean back-to-factory reboot removes this information. If a Cisco APIC controller is to be moved from one fabric to another, a clean back-to-factory reboot must be done before attempting to add such a controller to a different Cisco ACI fabric.
