Upgrade Tasks

Starting from Release 3.9, MSX can perform in-service upgrades. These upgrades minimize system downtime compared to prior releases. Note, however, that while MSX system components are being upgraded, you may experience occasional system stalls and may have to retry some actions.

Pre-Upgrade Tasks

Before upgrading MSX, you must run Ansible scripts that perform several actions, such as health checks, backing up keys and passwords, a data backup, and other tasks. This process has been structured so that MSX should continue to function even if some components are not perfectly upgraded. In the rare event of a catastrophic failure, the backups will enable you to revert to your previous MSX deployment.


Note

The following upgrade steps must be applied on top of MSX 3.10.0. After you upgrade MSX to 4.0.0, in Action Orchestrator, only default workflows will be present. Any workflows created in 3.9.3 will have to be recreated.


This section details actions that must be performed before initiating an MSX upgrade.

Health Check and Backups

Procedure


Step 1

Verify that the nodes and services in your existing deployment are healthy. This process takes approximately 5 minutes. If the platform health check indicates that any components are unhealthy, then you should recover those components prior to proceeding with the upgrade.

ansible-playbook checks/check-vms.yml 
Step 2

Run the following script from the 3.10 container to perform a data backup of the MSX instance that is in service. The data backup process allows you to take a backup of all persistent MSX data, such as Cassandra, Elasticsearch, and NSO data. This process is useful in cases where you might need to recover lost or corrupted MSX data from a backup. This process should take approximately 30 minutes.

ansible-playbook vms-backup.yml --extra-vars backup_tag=backup-tag-name 

If you omit the backup_tag variable, the playbook tags the backup with the current timestamp under the vms-backup folder. Passing --extra-vars backup_tag=specify-name-of-backup (for example: ansible-playbook vms-backup.yml --extra-vars backup_tag=pre-4.0.0-upgrade) gives the backup the same, predictable name on every run, which enables you to automate the restore process.

If the vms-backup.yml playbook fails, you should not proceed with the next steps. Instead, contact Cisco technical support for further guidance.
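
As a rough illustration, Steps 1 and 2 could be chained so that the backup never runs against an unhealthy deployment and a failed backup halts the upgrade. The pre_upgrade_backup function and the PLAYBOOK_CMD variable are hypothetical names introduced for this sketch; only the playbook names and the backup_tag variable come from the steps above.

```shell
# Hypothetical wrapper (not part of MSX): health check first, then the tagged backup.
# PLAYBOOK_CMD is an assumption added so the sketch can be dry-run, for example PLAYBOOK_CMD=echo.
pre_upgrade_backup() {
  local tag="${1:-pre-4.0.0-upgrade}"
  local cmd="${PLAYBOOK_CMD:-ansible-playbook}"

  # Step 1: stop if any node or service is unhealthy.
  if ! $cmd checks/check-vms.yml; then
    echo "health check failed: recover the unhealthy components first" >&2
    return 1
  fi

  # Step 2: stop if the data backup fails.
  if ! $cmd vms-backup.yml --extra-vars "backup_tag=$tag"; then
    echo "backup failed: do not proceed with the upgrade" >&2
    return 1
  fi
}
```

Run it from the Ansible folder of the 3.10 container, for example: pre_upgrade_backup pre-4.0.0-upgrade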

Step 3

Copy the files that are essential for the upgrade process. These files should not have changed from your first MSX install and they will be used for all future MSX versions. The commands for copying the files are:

cp /msx-3.10.0/ansible/ssh.cfg /msx-3.10.0/ansible/vms-backup/infra/
cp /msx-3.10.0/ansible/inventory/inventory /msx-3.10.0/ansible/vms-backup/infra/
cp /msx-3.10.0/ansible/group_vars/all/passwords.yml /msx-3.10.0/ansible/vms-backup/infra/
cp /msx-3.10.0/ansible/keys/id_rsa.pub /msx-3.10.0/ansible/vms-backup/infra/
cp /msx-3.10.0/ansible/keys/id_rsa /msx-3.10.0/ansible/vms-backup/infra/
cp /msx-3.10.0/ansible/group_vars/all/external_addresses.yml /msx-3.10.0/ansible/vms-backup/infra/
cp /msx-3.10.0/ansible/<openrc> /msx-3.10.0/ansible/vms-backup/infra/
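
The copy commands above can be wrapped so that a missing file is reported before the upgrade starts. This is a minimal sketch: stage_upgrade_files is a hypothetical helper name, and the openrc file is left out because its name is deployment-specific and must still be copied by hand.

```shell
# Hypothetical helper (not part of MSX): copy the Step 3 files into vms-backup/infra
# and fail on the first file that is missing.
stage_upgrade_files() {
  local ansible_dir="$1"                      # for example /msx-3.10.0/ansible
  local infra_dir="$ansible_dir/vms-backup/infra"
  local rel

  mkdir -p "$infra_dir" || return 1
  for rel in ssh.cfg inventory/inventory group_vars/all/passwords.yml \
             keys/id_rsa.pub keys/id_rsa group_vars/all/external_addresses.yml; do
    if [ ! -f "$ansible_dir/$rel" ]; then
      echo "missing: $ansible_dir/$rel" >&2
      return 1
    fi
    cp "$ansible_dir/$rel" "$infra_dir/" || return 1
  done
  echo "staged files in $infra_dir"
}
```

For example: stage_upgrade_files /msx-3.10.0/ansible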

Preparing the Container

Procedure


Step 1

Copy the msx-binaries-4.0.0-xxx.tar.gz and msx-container-4.0.0-xxx.tar.gz to the host where you plan on installing MSX.

Step 2

Use SSH to connect to the host where you will install MSX.

Step 3

Run this command on the host to load the MSX container image:


[root@CentOS localadmin] # docker load -i msx-container-4.0.0-xxx.tar.gz
Step 4

Extract msx-binaries-4.0.0-xxx.tar.gz to /vms-binaries-mount-source.

Step 5

Get the <docker_image_ID> using the command:


[root@CentOS localadmin] # docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
<none>              <none> 	    e94ad25e65a9         5 days ago          383 MB
Step 6

Spin up the 4.0.0 container with the vms-backup-mount-source folder attached as a volume:


docker run -it --name <4.0.0-Container_Name> -v </vms-binaries-mount-source>:/msx-4.0.0/ansible/repo -v <vms-backup-mount-source>:/msx-4.0.0/ansible/vms-backup <docker_image_ID> /bin/bash
Step 7

Copy the required files as shown:


cp /msx-4.0.0/ansible/vms-backup/infra/inventory /msx-4.0.0/ansible/inventory/inventory
cp /msx-4.0.0/ansible/vms-backup/infra/ssh.cfg /msx-4.0.0/ansible/ssh.cfg
cp /msx-4.0.0/ansible/vms-backup/infra/passwords.yml /msx-4.0.0/ansible/group_vars/all/passwords.yml
cp /msx-4.0.0/ansible/vms-backup/infra/external_addresses.yml /msx-4.0.0/ansible/group_vars/all/external_addresses.yml
cp /msx-4.0.0/ansible/vms-backup/infra/<openrc> /msx-4.0.0/ansible/<openrc>
 
Step 8

Create a folder to copy the keys into and perform the copy operation:


mkdir /msx-4.0.0/ansible/keys
cp /msx-4.0.0/ansible/vms-backup/infra/id_rsa /msx-4.0.0/ansible/keys
cp /msx-4.0.0/ansible/vms-backup/infra/id_rsa.pub /msx-4.0.0/ansible/keys
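
Steps 7 and 8 can be combined into one function, sketched below under the assumption that the infra folder holds the files staged before the upgrade. restore_upgrade_files is a hypothetical name. The chmod 600 call is an addition not listed in the steps: SSH clients refuse private keys that other users can read. The openrc file is again left to a manual copy.

```shell
# Hypothetical helper (not part of MSX): copy the staged files from
# vms-backup/infra into the new container's Ansible tree (Steps 7 and 8).
restore_upgrade_files() {
  local new_dir="$1"                          # for example /msx-4.0.0/ansible
  local infra="$new_dir/vms-backup/infra"

  mkdir -p "$new_dir/inventory" "$new_dir/group_vars/all" "$new_dir/keys" || return 1
  cp "$infra/inventory" "$new_dir/inventory/inventory" &&
  cp "$infra/ssh.cfg" "$new_dir/ssh.cfg" &&
  cp "$infra/passwords.yml" "$new_dir/group_vars/all/passwords.yml" &&
  cp "$infra/external_addresses.yml" "$new_dir/group_vars/all/external_addresses.yml" &&
  cp "$infra/id_rsa" "$infra/id_rsa.pub" "$new_dir/keys/" &&
  chmod 600 "$new_dir/keys/id_rsa"            # assumption: restrict the private key
}
```

For example: restore_upgrade_files /msx-4.0.0/ansible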
Step 9

(OpenStack Only) Source the V3 OpenStack configuration file to enable access to your OpenStack project, which will use the values within the openrc file:

source <path-to-openrc> 
Step 10

Navigate to the msx-4.0.0/ansible folder. All playbooks must be run from this folder.

cd /msx-4.0.0/ansible
Step 11

Upgrade ssh.cfg with the new version.

ansible-playbook utils/upgrade-sshcfg.yml
Step 12

Make the required changes in group_vars/all/main.yml, using the backed up main.yml as a reference.

Note 

Do not replace the MSX 4.0.0 variable file with the MSX 3.10.0 variable file. We recommend that you manually update the variables that were changed during the MSX 3.10.0 deployment, such as vms_domain and vms_subdomain.

Step 13

Create a vault_password file with the same password used in MSX 3.10.0 and export the path as specified:

export ANSIBLE_VAULT_PASSWORD_FILE=<path to vault_password file>
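
One possible way to create the file without leaving the password readable by other users is sketched below. setup_vault_password_file is a hypothetical name and the restrictive permissions are an assumption, not a documented MSX requirement; ANSIBLE_VAULT_PASSWORD_FILE is the variable from the step above.

```shell
# Hypothetical helper (not part of MSX): read the vault password from standard input,
# write it to a file only the owner can read, and export the path for Ansible.
setup_vault_password_file() {
  local file="$1" password

  IFS= read -r password || [ -n "$password" ] || return 1
  ( umask 077 && printf '%s\n' "$password" > "$file" ) || return 1
  chmod 600 "$file" || return 1
  export ANSIBLE_VAULT_PASSWORD_FILE="$file"
}
```

Call the function directly in the current shell (not at the end of a pipeline) so that the exported variable remains set, then type the same password that was used in MSX 3.10.0.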
Step 14

Generate any new passwords that were added in this release by running the following playbook.

ansible-playbook upgrade-passwords.yml --extra-vars @vms-backup/infra/passwords.yml
Step 15

Execute the MSX prerequisites playbook to prepare the new MSX 4.0.0 container for an upgrade. The playbook will perform password, security group-related, and other updates as required. This process should take approximately 5 minutes.

ansible-playbook upgrade-prereq.yml --extra-vars @vms-backup/infra/passwords.yml --skip-tags vnfoutside,csr_route_upgrade 

Upgrading Cisco MSX

At a high level, the process of upgrading MSX involves running old and new systems in parallel with all traffic being routed to the old system. When the Network Service Orchestrator is updated, any service updates or provisioning will be queued until the update is complete. Afterwards, traffic will be redirected to the new MSX version and the old instance will be deleted. The following section details the in-service upgrade process and provides additional explanations where necessary.

Procedure


Step 1

Upload the MSX isolated binaries to your S3 (AWS) / minio (OpenStack) bucket. This process should take approximately 15 minutes.

ansible-playbook upload-isolated-binaries.yml 
Step 2

(Optional) If you are installing the Datadog monitoring service to provide metrics for your infrastructure, first follow the procedure in Prerequisites for All Datadog Deployment Scenarios. Then run this command:

ansible-playbook upload-datadog-images.yml
Step 3

Edit each Service Pack deployment variables file so that it matches your deployment requirements. When you perform an upgrade, the previous changes to the file will be lost. Use the backup deployment variables file for each Service Pack to reinstate those changes. The Service Pack variable files are located at: /msx-4.0.0/ansible/group_vars/all/{servicepack_name}_variables.yml, where {servicepack_name} can be manageddevice, vbranch, sda, or sdwan.
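
To see which settings need to be reinstated, the backed-up variable files can be compared with the new ones. This is a minimal sketch: diff_servicepack_vars is a hypothetical name, and the folder that holds your backed-up files is an assumption you pass in as the first argument.

```shell
# Hypothetical helper (not part of MSX): show differences between the backed-up
# Service Pack variable files and the files shipped with the new release.
diff_servicepack_vars() {
  local backup_dir="$1" new_dir="$2" pack file

  for pack in manageddevice vbranch sda sdwan; do
    file="${pack}_variables.yml"
    # Skip Service Packs that are not present on both sides.
    if [ ! -f "$backup_dir/$file" ] || [ ! -f "$new_dir/$file" ]; then
      continue
    fi
    if ! diff -u "$backup_dir/$file" "$new_dir/$file"; then
      echo "review: $file differs from the backup" >&2
    fi
  done
  return 0
}
```

For example: diff_servicepack_vars <backup-folder> /msx-4.0.0/ansible/group_vars/all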

Step 4

Upgrade the Infra Services to update the infrastructure components for the Platform Microservices and Service Packs. This process should take approximately 20–40 minutes.

ansible-playbook upgrade-infra-services.yml 
Step 5

Run the Blue/Green platform Microservice update, which will upgrade the Platform and Service Packs. Both 3.10.x and 4.0.0 Microservices will be running in parallel; however, all traffic will be routed to 3.10.x at this stage. Additionally, all Microservices will be running in non-HA mode.

ansible-playbook blue-green-ms-upgrade.yml 
Step 6

Perform an upgrade of the Network Service Orchestrator (NSO) component. At this point, any updates to existing services or the provisioning of new services will be queued (and delayed) until the NSO upgrade has completed.

ansible-playbook bg-nso-upgrade.yml 
Note 

If you have an SD-WAN deployment with vManage connected, you must copy your external certificates and import them into the centralized MSX keystore. For more information, see Adding External Certificates to MSX.

Step 7

Run the switch-ms-routing.yml playbook. This will redirect traffic from MSX 3.10.x to 4.0.0, enable HA mode, and delete all older microservice instances once complete.

ansible-playbook switch-ms-routing.yml 
Step 8

Upgrade the MSX user interface for the Platform and Service Packs.

ansible-playbook upgrade-ui.yml 
Step 9

Upgrade Action Orchestrator and the Workflow engine. The Workflow engine is placed in maintenance mode, so no workflows can be created or updated during this time. This process should take approximately 40 minutes.

ansible-playbook upgrade-ao.yml
Step 10

Run the rebuild-inventory playbook to generate an inventory file for csrhub. The --tags parameter specifies the tasks that will be run within the playbook.

ansible-playbook utils/rebuild-inventory.yml --tags csrhub-inventory
Step 11

Upgrade OS and Kubernetes. This will update Kubernetes and the underlying Operating System kernel, security components, and any other required system updates. This upgrade will be performed one node at a time starting with the Kubernetes Master nodes.

ansible-playbook os-k8s-rolling-upgrade.yml 
Step 12

Run the hardening playbook to improve operating system security. Services are not interrupted while this playbook is being run.

ansible-playbook run-os-hardening.yml
Step 13

Update the NFV client secret to autogenerated (random) passwords.

ansible-playbook update-nfv-client-secret.yml
Step 14

Verify that the nodes and services in your upgraded deployment are healthy. This process should take approximately 5 minutes.

ansible-playbook checks/check-vms.yml 
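
Because each step depends on the one before it, a group of playbooks can be run through a small wrapper that stops at the first failure. The sketch below is an assumption about how an operator might script this, not a Cisco-provided tool: run_playbooks_in_order and PLAYBOOK_CMD are hypothetical names. It is intended for one stage or a short group of stages at a time, since some steps (such as the certificate import noted in Step 6) need manual work in between.

```shell
# Hypothetical helper (not part of MSX): run the given playbooks in order
# and stop at the first one that fails.
# PLAYBOOK_CMD is an assumption added so the sketch can be dry-run.
run_playbooks_in_order() {
  local cmd="${PLAYBOOK_CMD:-ansible-playbook}"
  local n=0 playbook

  for playbook in "$@"; do
    n=$((n + 1))
    echo "[$n/$#] $playbook"
    if ! $cmd "$playbook"; then
      echo "stopped at step $n ($playbook): fix the failure before resuming" >&2
      return 1
    fi
  done
  echo "all $# playbooks completed"
}
```

For example, from /msx-4.0.0/ansible: run_playbooks_in_order upgrade-infra-services.yml blue-green-ms-upgrade.yml bg-nso-upgrade.yml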

Verifying an Upgrade

Log in to the MSX portal and ensure that all Microservices and UI components listed in Settings > Component Versions show version 4.0.0. See Logging into the Portal.

Post-Upgrade Tasks

If you have upgraded from an MSX 3.10 or earlier instance with one shard, then your upgraded MSX 4.0.0 instance will also have one shard. If needed, you will be able to add additional shards after the upgrade. For more information, see Scaling NSO Shards.

Cleaning Up Post Upgrade

After you have completed updating MSX and verified that the system is performing as expected, you should run the following cleanup operations.

  1. Pruning Docker

    Running 'docker system prune' on all Kubernetes nodes removes unused or stopped Docker containers, networks, volumes, and images.

    ansible-playbook utils/docker-prune.yml 
  2. Cleaning the Registry

    You can optionally change the default retention level for the registry. To change the number of image versions that will be kept by the system, update the registry_retention_number as shown below. Older registry image versions will be deleted.

    Override the default retention level by passing in --extra-vars 'registry_retention_number=<number to keep>'.

    For example:

    ansible-playbook cleanup-registry.yml --extra-vars 'registry_retention_number=2'
  3. Clean up older AWS IAM roles. For more information, see Cleaning Up AWS IAM Roles.

Recovering from Upgrade Issues

In the case of a failed upgrade, you can roll the system back to a known good state, using the rollback process.

Before You Begin:

Ensure that the required conditions are met for restoring the backup in the target environment.

  • The following Ansible variables have the same values as in the original environment:

    vms_domain

    vms_subdomain

    deployment_mode

    deployment_mode_env

  • The group_vars/all/passwords.yml file is the same as in the original environment.

  • The target environment has the same set of service packs deployed.
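
The first two conditions can be checked mechanically. The sketch below assumes that the four variables appear as top-level "name: value" lines in a main.yml file and that each environment's main.yml and passwords.yml sit in one folder (group_vars/all); check_restore_preconditions is a hypothetical name. The set of deployed service packs still has to be compared by hand.

```shell
# Hypothetical helper (not part of MSX): compare the variables and passwords.yml
# of the original and target environments before restoring a backup.
# Arguments: the group_vars/all folder of each environment.
check_restore_preconditions() {
  local orig="$1" target="$2" key status=0

  for key in vms_domain vms_subdomain deployment_mode deployment_mode_env; do
    # Assumption: each variable is a top-level "key: value" line in main.yml.
    if [ "$(grep -E "^$key:" "$orig/main.yml")" != "$(grep -E "^$key:" "$target/main.yml")" ]; then
      echo "mismatch: $key" >&2
      status=1
    fi
  done

  if ! cmp -s "$orig/passwords.yml" "$target/passwords.yml"; then
    echo "mismatch: passwords.yml" >&2
    status=1
  fi
  return $status
}
```

A return status of 0 means that no mismatch was found in the checked items.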

Upgrade Notes for SAML-Based SSO

For enhanced security, the following SAML-based SSO parameters in MSX 4.0.0 are turned on by default:
  • security.auth.saml.want-authn-request-signed

  • security.auth.saml.encrypt-assertion

Due to this change, if you had configured SAML-based SSO in the previous versions of MSX, SSO may stop working after the upgrade.

To avoid this issue, perform the following procedure after the upgrade:

Procedure


Step 1

Import Service Provider metadata into MSX.

Download the service provider SAML metadata file.

To download Cisco SD-WAN (vManage) metadata, do the following:

  • In vManage, click Administration > Settings > Identity Provider Settings > Edit.

  • Click Enabled.

  • Click the "Click here to download the SAML metadata" link and save the content in a file (for example: vmanage_metadata.xml).

  1. Save the metadata file to the following location in the Kubernetes node:

    /data/vms/heapdumps/usermanagementservice/vmanage_metadata.xml

    The MSX User Management service maps the above Kubernetes location to the following path:

    /data/conf/vmanage_metadata.xml

    Specify the location (/data/conf/vmanage_metadata.xml) as the metadata file location while importing the file into MSX.

  2. Log in to the Cisco MSX Portal.

  3. In the main menu, go to Settings > SSO Configuration.

  4. In the Add SSO Clients section, select the SAML service provider client ID (vManage in this case) and click the Edit icon.

    In the Attributes section, specify the vManage metadata file location (/data/conf/vmanage_metadata.xml, from substep 1) in the Metadata Source field. The Metadata Source field takes a URL or a file path.

    Note 

    Metadata fields appear only if you have selected a SAML service provider client ID.

Step 2

[Applicable only for Cisco SD-WAN] Turn off the SAML security settings.

By default, the following SAML security parameters in MSX are set to true:

  • security.auth.saml.want-authn-request-signed

  • security.auth.saml.encrypt-assertion

For SAML service provider integration with MSX, if the above security parameters are set to true, the authentication request from the service provider must be signed, and the assertion sent back by MSX is encrypted.

To turn off these default settings, do the following:

  1. Log in to the Inception VM and access the Kubernetes master node.

    ssh -i id_rsa centos@_INCEPTION_FLOATING_IP_ADDRESS_ -t ssh _kubernetes-master-1_IP_ADDRESS_
  2. Use the following cURL commands:

    curl --request PUT -g -k -v -H "X-Consul-Token: {consul_token_value}" --data 'false' https://localhost:8500/v1/kv/userviceconfiguration/defaultapplication/security.auth.saml.want-authn-request-signed
    curl --request PUT -g -k -v -H "X-Consul-Token: {consul_token_value}" --data 'false' https://localhost:8500/v1/kv/userviceconfiguration/defaultapplication/security.auth.saml.encrypt-assertion
    Note 

    Replace {consul_token_value} with your actual Consul token value from the passwords.yml file.

  3. Restart the User Management microservice from the Kubernetes master node so that the above Consul entries take effect:

    kubectl -n vms delete -f /etc/kube-manifests/usermanagementservice-rc.yml
    kubectl -n vms create -f /etc/kube-manifests/usermanagementservice-rc.yml
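
The two Consul updates in substep 2 differ only in the key name, so they can be issued from a loop. This is a minimal sketch: disable_saml_request_security and CURL_CMD are hypothetical names, and the --fail option is an addition so that an HTTP error stops the loop. The URL, header, and payload are the ones shown in substep 2.

```shell
# Hypothetical helper (not part of MSX): set both SAML security keys to false in Consul.
# CURL_CMD is an assumption added so the sketch can be dry-run, for example CURL_CMD=echo.
disable_saml_request_security() {
  local token="$1" curl_cmd="${CURL_CMD:-curl}" key
  local base="https://localhost:8500/v1/kv/userviceconfiguration/defaultapplication"

  for key in want-authn-request-signed encrypt-assertion; do
    # --fail makes curl return a nonzero status on HTTP errors.
    $curl_cmd --fail --request PUT -g -k -H "X-Consul-Token: $token" \
      --data 'false' "$base/security.auth.saml.$key" || return 1
  done
}
```

Run it on the Kubernetes master node with your Consul token as the only argument, then restart the microservice as described in substep 3.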
    

Cisco MSX Utilities

Utility playbooks (located in the utils folder) may be provided by the development team to assist operators in performing specific tasks. This section will detail any available utilities for the current release.

Removing Packages

A remove-packages.yml utility can be used to remove specified packages.


Note

The remove-packages.yml playbook should be used with caution as the actions performed by this playbook cannot be undone. If you specify the wrong package_list parameter for this playbook, you can make your deployment inoperable.


Use this procedure to remove one or more packages:

Procedure


Step 1

Set the environment variable for the Ansible Vault password:

export ANSIBLE_VAULT_PASSWORD_FILE=<vault_pwd_path> 
Where,
<vault_pwd_path>=/msx-4.0.0/ansible/vault/password.txt
Step 2

Change directory to the Ansible folder.

cd /msx-4.0.0/ansible
Step 3

Run the playbook, which is in the utils folder. For the --extra-vars argument, specify any (comma-separated) packages that you would like to uninstall.

In our example, we will remove the openssh-clients package. This operation is required for FedRAMP deployments.

ansible-playbook utils/remove-packages.yml --extra-vars "package_list=openssh-clients" 
Step 4

You will be prompted for the Qualified Domain Name to confirm the environment in which you are performing this action. Type the QDN to confirm.

Step 5

Examine the Play Recap output to confirm that the desired actions were successfully completed.


Cleaning Up AWS IAM Roles


Note

Do not run this utility until all the environments in your AWS account have been upgraded to MSX 4.0.0.


A post-upgrade utility playbook has been created to clean up older AWS IAM roles after the names have automatically been transitioned to the “vms_subdomain.vms_domain” prefix naming convention. This convention ensures unique IAM role names, which enforce correct deployment access.

Use this utility with extreme caution as it will delete the old IAM roles/policies: “K8SMasterRole”, “K8SNodeRole”, and “InceptionRole”. Any server still using these older policies will stop working properly. Before using the utility, make sure that your servers are using the new AWS IAM role names.

To clean up the AWS IAM Roles:

  1. Change to the Ansible directory.

    cd /msx-4.0.0/ansible
  2. Set the environment variable for the Ansible Vault password:

    export ANSIBLE_VAULT_PASSWORD_FILE=<vault_pwd_path>

    where

    <vault_pwd_path>=/msx-4.0.0/ansible/vault/password.txt
  3. Run the cleanup utility.

    ansible-playbook utils/remove-obsolete-iam-role.yml