ACI Upgrade/Downgrade Architecture

High Level Summary of APIC Upgrades and Downgrades

When you upgrade or downgrade an APIC cluster, a specific sequence of events occurs so that each APIC is upgraded or downgraded separately and the data on each upgraded or downgraded APIC is compatible with the target image. Most of these events happen in the background, so it is important to understand what to expect when you trigger an upgrade or downgrade of the APIC cluster.

  1. Image is uploaded to the firmware repository. The image is synced to all APIC cluster members.

  2. Upgrade or downgrade is triggered to a specific target version.

  3. Each APIC in the cluster goes through the process to install the new image in the first grub partition. This happens in parallel to speed up the upgrade or downgrade process.

  4. Once the image installation is completed, each APIC takes its turn to go through a data conversion process of the database files in a sequential order. When this occurs, the following events happen:

    1. The Data Management Engine (DME) processes shut down. This includes the nginx web server which services all API requests. Because of this, you will lose access to the UI/API, as well as any other backend application that runs on that APIC.

    2. The database files are converted from the initial version to the target version. The time this takes depends on the size of the database, which includes the configuration deployed on the ACI fabric, operational data, and record objects such as audit logs. Because of this, the total time to complete the conversion will vary between deployments.

      When your source version is APIC release 6.0(3) or newer, the database conversion process has been enhanced and users may notice a shorter wait time for this process compared to the previous releases.

      When your source version is APIC release 6.2(1) or newer, APIC 1 acts as the central orchestration point for the upgrade and performs the database conversion for all databases distributed across the APIC cluster at once. The other APICs perform the conversion only for the special databases that are local to them. This provides a more robust conversion process by reducing the risk of cluster synchronization issues.


      Note


      It’s critical that there is no disruptive action taken to the APIC at this stage, as it could result in data loss or partial configuration if this stage does not complete successfully. See Guidelines and Limitations for Upgrading or Downgrading for more information.


    3. The APIC will then reload after the database conversion process has completed successfully and will boot up on the version of software defined in the target version.

  5. After the APIC that performed the reload comes back online, the sequence of events outlined in step 4 happens on the next APIC in the cluster. In the meantime, the APIC that came back online initiates the post upgrade activities as the final check of the database. This process repeats until all members of the cluster have been upgraded or downgraded.

  6. Prior to Cisco APIC 6.0(6), the upgrade of the APIC cluster was considered complete when all APICs came back online and were Fully-Fit, regardless of the post upgrade activities. Starting from Cisco APIC 6.0(6), the status of the APIC cluster upgrade transitions to “Post Upgrade Pending” until the post upgrade activities are complete on each APIC node. Then, the upgrade status finally becomes “Completed”.


    Note


    The post upgrade activities on APIC should be successfully completed before proceeding with the switch upgrades. In general, it is recommended to run the pre-upgrade validation script before the APIC cluster upgrade and before the switch upgrades respectively because the upgrade of the APIC cluster and switches may not occur during the same maintenance window. However, prior to Cisco APIC 6.0(6), it is highly encouraged to run the script before the switch upgrades even when it takes place within the same maintenance window because the script checks not only the pre-upgrade validations but also the status of the post upgrade activities to make sure the fabric is ready to proceed with the switch upgrades after the APIC cluster upgrade.


  7. Starting from Cisco APIC 6.1(2), the same upgrade steps are performed on standby APIC nodes after all active APIC nodes are upgraded. See the Getting Started Guide for Cisco APIC 6.1 for details.
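
The status behavior described in step 6 can be sketched as a small decision function. This is an illustration only, not an APIC API: the function names, the boolean inputs, and the "In Progress" label are assumptions made for the example, while "Post Upgrade Pending" and "Completed" are the status values named in step 6. The sketch also assumes that the 6.0(6) threshold applies to the release the cluster runs after the upgrade.

```python
import re


def parse_release(release: str) -> tuple:
    """Parse a release string such as "6.0(6)" or "6.0(6c)" into (6, 0, 6)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\((\d+)[a-z]*\)", release.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    return tuple(int(part) for part in match.groups())


def cluster_upgrade_status(release: str, all_fully_fit: bool,
                           post_upgrade_done: bool) -> str:
    """Return the cluster upgrade status for the given (hypothetical) inputs."""
    if not all_fully_fit:
        return "In Progress"  # placeholder label, not taken from the guide
    if parse_release(release) < (6, 0, 6):
        # Before 6.0(6), post upgrade activities are not reflected in the status.
        return "Completed"
    return "Completed" if post_upgrade_done else "Post Upgrade Pending"


def ready_for_switch_upgrade(all_fully_fit: bool, post_upgrade_done: bool) -> bool:
    """Switch upgrades should wait for post upgrade activities on every release."""
    return all_fully_fit and post_upgrade_done
```

Note that on releases before 6.0(6) the status alone cannot confirm that the post upgrade activities have finished, which is why the pre-upgrade validation script is encouraged before the switch upgrades.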

Detailed Summary of APIC Upgrade

The following section provides a detailed summary of APIC upgrades.

Understanding APIC Upgrade and Downgrade Stages

Beginning with Cisco ACI release 6.2(1), the APIC upgrade process has been enhanced. This section describes the APIC upgrade process. The upgrade progresses through several stages, each building on the completion of the previous one.

  1. The upgrade process always starts on APIC1.

  2. APIC1 performs several cluster-wide upgrade stages and is the first APIC to be upgraded.

  3. When APIC1 is ready to upgrade itself, it hands off control to APIC2 using an internal API call.

  4. APIC1 then operates in node upgrade mode and completes its own upgrade.

  5. During this time, the UI is redirected to APIC2.

  6. After APIC1 completes its own upgrade, APIC2 hands control back to APIC1 and the UI is redirected back to APIC1.

  7. APIC1 continues upgrading the remaining APICs in the cluster in the same way.

  8. When all APICs have been upgraded to the target version, APIC1 performs the final cluster-wide stages and completes the upgrade cycle.
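
The hand-off between APIC1 and APIC2 in the steps above can be sketched as an ordered plan of which APIC is being upgraded and which APIC holds control at that moment. The function name and the list-of-pairs representation are assumptions made for illustration. The sketch requires at least two APICs because the steps above do not describe how a single-APIC cluster behaves.

```python
def orchestration_plan(apic_ids):
    """Return (node_being_upgraded, controlling_apic) pairs in upgrade order.

    apic_ids must be ordered with APIC1 first, for example [1, 2, 3].
    """
    if len(apic_ids) < 2:
        raise ValueError("this sketch needs at least two APICs")
    first, second = apic_ids[0], apic_ids[1]
    # APIC1 upgrades itself first, so APIC2 temporarily holds control.
    plan = [(first, second)]
    # APIC1 takes control back and upgrades every remaining APIC in turn.
    plan.extend((node, first) for node in apic_ids[1:])
    return plan


if __name__ == "__main__":
    for node, controller in orchestration_plan([1, 2, 3]):
        print(f"APIC{node} is upgraded while APIC{controller} holds control")
```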

The table below explains what occurs at each stage of the upgrade process.

Name                         Stage Level   Description
---------------------------  ------------  -----------
Catastrophic Failure Check   Cluster Wide  Initiate pre-upgrade validations to ensure upgrade can be performed.
Cluster Upgrade Preparation  Cluster Wide  Verify upgrade callbacks and set the cluster version to the requested upgrade version.
Upgrade Preparation          Node Level    Set up pre-upgrade state for all states.
Stage New OS                 Node Level    Sets up the environment for data conversion using the extracted target image.
Freeze Database              Node Level    Freeze database for data conversion and monitor permdown completion.
Stop APIC Services           Node Level    Shut down services.
Stage Database Conversion    Node Level    Set up for data conversion.
Database Conversion          Node Level    Performs data conversion.
Install New OS               Node Level    Install the new OS.
Reboot                       Node Level    Based on the upgrade, restart container or kexec the box to perform the reload operation.
Finalize OS Configuration    Node Level    Finalization of state done.
Cleanup                      Node Level    Verify post upgrade activity.
Validate Cluster Health      Node Level    Verify that all nodes are in a healthy state.
Finalize Database State      Node Level    Verify if cluster is healthy.
Validate Cluster Upgrade     Node Level    Perform post cluster upgrade checks.
Finalize Cluster State       Cluster Wide  Verify if post upgrade is completed on all shards and verify the APIC version.
Validate Cluster State       Cluster Wide  Reset upgrade callbacks on all APICs.
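
For scripts that track upgrade progress, the stage table can be held as an ordered list. The following is a minimal sketch: the stage names and levels are copied from the table, but the data structure and helper functions are hypothetical and are not part of any APIC API. It also assumes that the table lists the stages in the order they run.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    name: str
    level: str  # "Cluster Wide" or "Node Level"


STAGES = [
    Stage("Catastrophic Failure Check", "Cluster Wide"),
    Stage("Cluster Upgrade Preparation", "Cluster Wide"),
    Stage("Upgrade Preparation", "Node Level"),
    Stage("Stage New OS", "Node Level"),
    Stage("Freeze Database", "Node Level"),
    Stage("Stop APIC Services", "Node Level"),
    Stage("Stage Database Conversion", "Node Level"),
    Stage("Database Conversion", "Node Level"),
    Stage("Install New OS", "Node Level"),
    Stage("Reboot", "Node Level"),
    Stage("Finalize OS Configuration", "Node Level"),
    Stage("Cleanup", "Node Level"),
    Stage("Validate Cluster Health", "Node Level"),
    Stage("Finalize Database State", "Node Level"),
    Stage("Validate Cluster Upgrade", "Node Level"),
    Stage("Finalize Cluster State", "Cluster Wide"),
    Stage("Validate Cluster State", "Cluster Wide"),
]


def stages_at_level(level: str) -> list:
    """Return the names of all stages at the given level, in table order."""
    return [stage.name for stage in STAGES if stage.level == level]


def remaining_stages(current: str) -> list:
    """Return the stage names that follow `current` (ValueError if unknown)."""
    names = [stage.name for stage in STAGES]
    return names[names.index(current) + 1:]
```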

Default Interface Policies in the 5.2(4) release and later

When you upgrade to the 5.2(4) or later release, the Cisco Application Policy Infrastructure Controller (APIC) creates the following default interface policies automatically:

  • CDP (cdpIfPol)

    • system-cdp-disabled

    • system-cdp-enabled

  • LLDP (lldpIfPol)

    • system-lldp-disabled

    • system-lldp-enabled

  • LACP (lacpLagPol)

    • system-static-on

    • system-lacp-passive

    • system-lacp-active

  • Link Level (fabricHIfPol)

    • system-link-level-100M-auto

    • system-link-level-1G-auto

    • system-link-level-10G-auto

    • system-link-level-25G-auto

    • system-link-level-40G-auto

    • system-link-level-100G-auto

    • system-link-level-400G-auto

  • Breakout Port Group Map (infraBrkoutPortGrp)

    • system-breakout-10g-4x

    • system-breakout-25g-4x

    • system-breakout-100g-4x

During the upgrade, if a policy already exists with exactly the same name and exactly the same parameters as one of these policies, the system takes ownership of that policy and it becomes read-only. If the parameters differ, for example an existing system-cdp-disabled policy is set to "enabled," the policy remains a user policy. That is, a user can still modify it.
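
The ownership rule above can be expressed as a simple comparison. This is a sketch under stated assumptions: the function name is hypothetical, and the parameter dictionaries are reduced to a single illustrative attribute (adminSt), whereas real policies carry more parameters that would all have to match.

```python
# Illustrative subset only: real system defaults have more names and parameters.
SYSTEM_DEFAULTS = {
    "system-cdp-disabled": {"adminSt": "disabled"},
    "system-cdp-enabled": {"adminSt": "enabled"},
}


def becomes_system_owned(name: str, existing_params: dict) -> bool:
    """Return True if an existing policy would become a read-only system policy.

    Ownership is taken only when both the name and every parameter match.
    """
    defaults = SYSTEM_DEFAULTS.get(name)
    if defaults is None:
        return False  # the name is not one of the system default names
    return existing_params == defaults


if __name__ == "__main__":
    # Same name but a different setting: the policy stays a user policy.
    print(becomes_system_owned("system-cdp-disabled", {"adminSt": "enabled"}))
```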

High Level Summary of Switch Upgrade and Downgrade

When you upgrade or downgrade an ACI switch node, a specific sequence of events occurs on each device being upgraded or downgraded. Most of these events happen in the background, so it is important to understand what to expect when you trigger an upgrade or downgrade of an ACI switch node.

  1. The image is pushed from the APIC to the switch.

  2. The filesystem and bootflash of the switch are checked to ensure that there is enough space to extract the image.

  3. The image is extracted, and the primary grub partition is updated to the target version. The older version is moved into the recovery partition.

  4. The BIOS and EPLD images are upgraded if applicable.

  5. The switch will do a clean reload, and will re-join the ACI fabric running the newer version of software.

Starting with release 2.1(4), support was added for the third-party Micron Solid State Drive (SSD) firmware auto update. As part of the standard Cisco APIC software upgrade process, the switches will reboot when they upgrade. During that boot-time process, the system will also check the current SSD firmware and will automatically perform an upgrade to the SSD firmware, if necessary. If the system performs an SSD firmware upgrade, the switches will then go through another clean reboot afterward.

Detailed Summary of Switch Upgrade

The following sections provide a detailed summary of switch upgrades.

Understanding Switch Upgrade and Downgrade Stages

During an ACI switch node upgrade or downgrade, the upgrade or downgrade progress will advance based on the stages which have completed.

The following table provides more details on what happens at each stage of this upgrade or downgrade process:

Upgrade Progress  Install Stage                 Description
----------------  ----------------------------  -----------
0%                Firmware upgrade queued       Displayed when firmware is being downloaded to the switch from the APIC.
5%                Firmware upgrade in progress  Displayed when the upgrade installer is initiated, and the upgrade process has started.
45%               Firmware upgrade in progress  Displayed after the bootflash check has completed and the image extraction stage has begun.
60%               Firmware upgrade in progress  Image Extraction stage has completed and the grub partition is being updated with the new software information.
70%               Firmware upgrade in progress  The software has been updated on the switch.
80%               Firmware upgrade in progress  The EPLD and BIOS upgrade has begun.
95%               Firmware upgrade in progress  The EPLD and BIOS upgrade has completed, and switch reboot has been initiated.
100%              Upgraded Successfully         The switch has re-joined the fabric after the clean reload running target version of software.
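
When monitoring a switch upgrade, a reported percentage can be mapped back to the table above. The sketch below is illustrative: the function name is hypothetical, and it assumes that a value between two listed percentages corresponds to the most recent milestone that was reached, which the table does not state explicitly.

```python
import bisect

# (percent, install stage, description) rows copied from the table.
PROGRESS_STAGES = [
    (0, "Firmware upgrade queued",
     "Displayed when firmware is being downloaded to the switch from the APIC."),
    (5, "Firmware upgrade in progress",
     "Displayed when the upgrade installer is initiated, and the upgrade process has started."),
    (45, "Firmware upgrade in progress",
     "Displayed after the bootflash check has completed and the image extraction stage has begun."),
    (60, "Firmware upgrade in progress",
     "Image Extraction stage has completed and the grub partition is being updated with the new software information."),
    (70, "Firmware upgrade in progress",
     "The software has been updated on the switch."),
    (80, "Firmware upgrade in progress",
     "The EPLD and BIOS upgrade has begun."),
    (95, "Firmware upgrade in progress",
     "The EPLD and BIOS upgrade has completed, and switch reboot has been initiated."),
    (100, "Upgraded Successfully",
     "The switch has re-joined the fabric after the clean reload running target version of software."),
]


def describe_progress(percent: int) -> tuple:
    """Return (install stage, description) for the last milestone reached."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    thresholds = [row[0] for row in PROGRESS_STAGES]
    index = bisect.bisect_right(thresholds, percent) - 1
    return PROGRESS_STAGES[index][1:]
```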

Guidelines and Limitations for Upgrading or Downgrading

  • If at any point in time you believe the upgrade or downgrade has either stalled or failed, it is critical that you do not take any of the actions listed below:

    • Do not reload any Application Policy Infrastructure Controller (APIC) in the cluster.

    • Do not decommission any Cisco APIC in the cluster.

    • Do not change the firmware target version back to the original version.

    Instead, follow these guidelines:

    1. View the installer log files outlined in the Troubleshooting section if applicable (see APIC Installer Log Files and ACI Switch Installer Log Files). This will help in understanding if there is still activity ongoing on the devices being upgraded or downgraded.

    2. Collect the tech-support files outlined in the Troubleshooting section (see Collecting Tech-Support Files).

    3. Contact Cisco TAC if the upgrade or downgrade does not complete successfully and upload the tech-support files to the TAC case after it has been created.

  • The log record objects are stored only in one shard of a database on one of the Cisco APICs. Because of this, the log records are not accessible while the Cisco APIC is rebooting for an upgrade or downgrade, unlike other objects that can still be read through another Cisco APIC.

  • To upgrade to the Cisco APIC 6.0(2) release or later, you must perform the following procedure:

    1. Download the Cisco APIC 6.0(2) or later image and upgrade the APIC cluster to the downloaded release. Before this step is completed, do not download the Cisco Application Centric Infrastructure (ACI)-mode switch images to the Cisco APIC. The 6.0(2) release has both 32-bit and 64-bit switch images, but releases prior to 6.0(2) do not support 64-bit images. As a result, downloading the 64-bit images at this time might cause errors or unexpected results. However, if your Cisco APICs have the 5.2(8) release or later, except for the 6.0(1) release, you can download the switch images to the Cisco APIC before this step the same as you would with any other upgrade procedure prior to 6.0(2).

    2. Download both the 32-bit and 64-bit Cisco ACI-mode switch images to the Cisco APIC. Downloading only one of the images may result in errors during the upgrade process.

      Beginning with the 6.0(3) release, the switch determines which image to install from the Cisco APIC based on the available memory of the switch instead of based on a static mapping. If the available memory of the switch is less than or equal to 24 GB, the switch installs the 32-bit image. If the available memory of the switch is greater than or equal to 32 GB, the switch may be upgraded to the 32-bit image first, then upgraded again to the 64-bit image, which results in two reboots during the upgrade process.

      Modular spine switches install the 64-bit image regardless of the switch's available memory.

    3. Create the maintenance groups and trigger the upgrade procedure as usual. Cisco APIC automatically deploys the correct image to the respective switch during the upgrade process.

  • If you change the switch firmware group during the upgrade process, the upgrade process will not be completed and you may encounter some unexpected upgrade behavior.
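
The image-related rules in the 6.0(2) upgrade procedure above can be sketched as two checks. Everything here is a hedged illustration: the function names are hypothetical, the release parser accepts only strings shaped like "6.0(2)" or "6.0(2h)", and the memory rule is the one described for release 6.0(3) and later. The guidelines do not say what happens for memory sizes between 24 GB and 32 GB, so the sketch raises an error instead of guessing.

```python
import re


def parse_release(release: str) -> tuple:
    """Parse a release string such as "5.2(8)" or "6.0(2h)" into (6, 0, 2)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\((\d+)[a-z]*\)", release.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    return tuple(int(part) for part in match.groups())


def can_predownload_switch_images(current_apic_release: str) -> bool:
    """True if switch images may be downloaded before the APIC cluster upgrade.

    Allowed when the APICs already run 5.2(8) or later, except for 6.0(1).
    """
    version = parse_release(current_apic_release)
    return version >= (5, 2, 8) and version != (6, 0, 1)


def final_switch_image(memory_gb: float, modular_spine: bool = False) -> str:
    """Return the image a switch ends up running under the 6.0(3)+ memory rule."""
    if modular_spine:
        return "64-bit"  # regardless of available memory
    if memory_gb <= 24:
        return "32-bit"
    if memory_gb >= 32:
        # May pass through the 32-bit image first, which means two reboots.
        return "64-bit"
    raise ValueError("behavior between 24 GB and 32 GB is not specified")
```

Because Cisco APIC deploys the correct image automatically, a check like this is useful only for planning, for example to estimate which switches may reboot twice.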