Post-Installation Tasks

This chapter contains the following topics:

Validating an Installation

Before You Begin

Verifying Network Service Orchestrator Configurations

Before logging in to the Portal, it is important to ensure that the Cisco Network Service Orchestrator (NSO) has been loaded with the correct configuration settings. You can verify the configuration settings using the following procedure.


Note

If you are deploying more than one service pack, there are multiple NSO instances. Perform these steps on each service pack-specific NSO node, for example, nso-manageddevice.


Procedure


Step 1

Log in to one of the Kubernetes master nodes.


# grep master inventory/inventory 
[kube-master]
  kubernetes-master-ctsai-east-2-1 ansible_host=<master_1_ip_address> ansible_user=centos ansible_become=true
  kubernetes-master-ctsai-east-2-2 ansible_host=<master_2_ip_address> ansible_user=centos ansible_become=true
  kubernetes-master-ctsai-east-2-3 ansible_host=<master_3_ip_address> ansible_user=centos ansible_become=true
# ssh -F ssh.cfg centos@<master_1_ip_address>
Step 2

Access the NSO node using this command:


kubectl -n vms exec -it nso-<servicepack_name>-0 -c nso-<servicepack_name> /bin/sh

For example:


$ kubectl -n vms exec -it nso-vbranch-0 -c nso-vbranch /bin/sh

Or


$ kubectl -n vms exec -it nso-manageddevice-0 -c nso-manageddevice /bin/sh
Step 3

Change to the vmsnso user:


su vmsnso
Step 4

Start the NSO CLI:


ncs_cli -u admin
Step 5

In configuration mode (run configure first), verify the following:

  • For NACM groups for vmsnso

    
    admin@ncs% show nacm groups
    group ncsadmin {
    user-name [ private vmsnso ];
    }
    group ncsoper {
    user-name [ public vmsnso ];
    }
    [ok][2017-01-26 17:19:59]
    
  • For the aaa user

    
    admin@ncs% show aaa authentication users user
    user vmsnso {
    uid 1000;
    gid 1000;
    password $6$XfC.UmxZoxMGq58Y$Re4XKlYNHm2Ws2WkjWL09H9VNGoJqNG7TzQhtVPDZfjTY6amBxiAdafKl7iu4HQM2/uPy/2irtu/vRvJANJb//;
    ssh_keydir '';
    homedir '';
    }
    [ok][2017-01-26 17:21:10]
    [edit]
    
Step 6

Exit configuration mode, and then verify that the service packs were installed successfully.


admin@ncs> show packages package oper-status 
Step 7

Verify the day0 device configurations.

Exit configuration mode, and then use one of the following commands to verify that the day0 configuration is updated with globals and provider: cfg-selector, or pnp day0-common (for the Managed Device service pack).

  • Using the cfg-selector command.

    
    admin@ncs> show configuration cfg-selector 
    globals {
        ncs-service-node  10.3x.1.xx;
        ip-address <fully-qualified-domain-name> of VMS portal;
        mgmt-ipv6-type    false;
        sa-encryption-key $4$SngMGroVL+76nI4dGb496GBHn1uWZILUVR0FTjturZSDMZ4thbtG5mcMftAfGszx;
    }
    provider VZ {
        variables {
        ...
        }
        service-assurance {
        ...
        }
        offering IWAN {
        ...
        }
  • Using the pnp day0-common command (for the Cisco MSX Managed Device service pack).

    
    admin@ncs> show configuration pnp day0-common
    day0-common config-mgmt {
    variable CPE_HOSTNAME {
    value "";
    }
    variable CPE_SNMP_V3_AUTH_PASS {
    value CiscoVMS100%; <-- actual value obtained from passwords.yml
    }
    variable CPE_SNMP_V3_PRIV_PASS {
    value CiscoVMS100%; <--- actual value obtained from passwords.yml
    }
    variable CPE_SNMP_V3_USER {
    value vmsuser;
    }
    variable DEV_CUSTOMER_DNS_1 {
    value 8.8.8.8;
    }
    variable DEV_CUSTOMER_DNS_2 {
    value 8.8.4.4;
    }
    variable DEV_MGMT_HUB1 {
    value 173.39.80.209;
    }
    variable DEV_MGMT_HUB2 {
    value ""; <-- populated only in case of Dual DC deployment
    }
    variable DEV_MGMT_IP_ADDRESS {
    value "10.255.0.1";
    }
    variable DEV_MGMT_LOCAL_KEY {
    value cisco123; <-- actual value obtained from passwords.yml
    }
    variable DEV_MGMT_REMOTE_IDENTITY {
    value cisco.com;
    }
    variable DEV_MGMT_REMOTE_KEY {
    value cisco123; <-- actual value obtained from passwords.yml
    }
    variable DEV_MGMT_ROUTE {
    value "0.0.0.0 0.0.0.0";
    }
    variable DEV_MGMT_TUNNEL_INTERFACE {
    value 0;
    }
    variable DEV_NAT_KEEPALIVE {
    value 60;
    }
    variable ONBOARDING_INTERFACE {
    value ""; <-- set to the onboarding interface (e.g. GigabitEthernet 0/0/1), which is selected when the device model is configured during the add-device flow in the portal UI/API
    }
    }
    
Step 8

Verify PNP server interface map settings.


admin@ncs% show pnp
server {
    port    443;
    use-ssl true;
}
interface-map "(C29[0-9][0-9])|(CISCO29[0-9][0-9])" {
    wan                 GigabitEthernet0/1;
    lan                 GigabitEthernet0/0;
    config-restore-file flash:day--1-config;
}
interface-map "(C39[0-9][0-9])|(CISCO39[0-9][0-9])" {
    wan                 GigabitEthernet0/1;
    lan                 GigabitEthernet0/0;
    config-restore-file flash:day--1-config;
}
interface-map ASR10[0-9][0-9] {
    wan                 GigabitEthernet0/0/1;
    lan                 GigabitEthernet0/0/2;
    config-restore-file bootflash:day--1-config;
}
interface-map ISR4[0-9][0-9][0-9] {
    wan                 GigabitEthernet0/0/1;
    lan                 GigabitEthernet0/0/2;
    config-restore-file bootflash:day--1-config;
}
[ok][2017-01-26 19:34:57]
Note 

For the Cisco MSX Managed Device service pack, there is no preconfigured pnp interface-map. Instead, the value of Tunnel0 source interface is obtained based on the value of the onboarding interface and the device model configured when adding a site/device in the portal UI provisioning flow.

Verification for Managed Device:


admin@ncs% show pnp
server {
port 443;
use-ssl true;
}
proxy-servers {
allow-any;
}
logging {
directory /var/log/ncs;
serial all;
}
Step 9

Verify that the provider name is set correctly by running show provider-infrastructure in configuration mode. For example:


admin@ncs% show provider-infrastructure 
provider-infrastructure CiscoSystems {
catalog vBranch;
}
[ok][2017-10-18 19:54:57]

To list the supported VNFs and physical CPEs, run show catalog in configuration mode. For example:


vmsnso@ncs% show catalog
catalog vBranch {
branch-cpe ENCS {
physical false;
read-timeout 90;
write-timeout 90;
enable-commit-queue false;
branch-cpe-template pnp-map-vCPE;
nfvis-tenant admin;
password $8$M5naF0NizWpvaJf8wqK5nGPtnX3PJyUs/AFn5EVt/tE=;
day0 {
file nfvis_day0.cfg;
}
cpe-onboarding {
device-type netconf;
port 830;
}
network GE0-0-SRIOV-1;
network GE0-0-SRIOV-2;
network GE0-1-SRIOV-1;
network GE0-1-SRIOV-2;
network LAN-SRIOV-1;
network LAN-SRIOV-2;
network LAN-SRIOV-3;
network LAN-SRIOV-4;
network LAN-SRIOV-5;
network LAN-SRIOV-6;
network int-mgmt-net;
network lan-net;
Step 10

Log in to the MSX Portal and verify that the service packs are now available. For more information, see Logging in to the Portal.


Validating the Status of MSX

After verifying the NSO configuration, verify that all microservices are up and running.

Use the procedure below to check the status of all microservices available in the MSX platform:

Procedure


Step 1

Change to the Ansible folder and run the following ad hoc Ansible command to check the Kubernetes pod status.


cd /msx-4.0.0/ansible
ansible kube-master -m command -a "kubectl get pod -n vms -o wide"
Step 2

Export the ANSIBLE_VAULT_PASSWORD_FILE environment variable, setting it to the path of the Ansible Vault password file.


export ANSIBLE_VAULT_PASSWORD_FILE=<vault_pwd_path>
Step 3

Verify the health status of the MSX Platform.


ansible-playbook checks/platform-health.yml
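
The pod listing from Step 1 can be long, so it may help to filter it down to the pods that need attention. The following is a minimal sketch, not part of the MSX tooling: the unhealthy_pods function and the sample pod names are hypothetical, and it assumes raw kubectl table output (NAME, READY, STATUS, ...) on standard input rather than output wrapped by Ansible.

```shell
# Print the name and status of every pod that is neither Running nor Completed.
# Expects `kubectl get pod` table output on stdin (NAME READY STATUS ...).
unhealthy_pods() {
  awk '$1 != "NAME" && NF >= 3 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Hypothetical sample listing; real pod names will differ.
sample='NAME            READY   STATUS             RESTARTS   AGE
nso-vbranch-0   1/1     Running            0          2d
example-svc-1   0/1     CrashLoopBackOff   7          2d'

report=$(printf '%s\n' "$sample" | unhealthy_pods)
echo "${report:-all pods healthy}"
```

On a Kubernetes master node, the same function could be fed directly, for example: kubectl get pod -n vms -o wide | unhealthy_pods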

Deleting the Deployer Container

Passwords in the deployer container are encrypted using Ansible Vault, so they are not stored in clear text at rest. However, anyone who can access the deployer container and knows the Ansible Vault password can extract the system passwords. This is by design, because the deployer needs the passwords to create the deployment. For this reason, the deployer container should be a transient entity. After a production deployment has been installed successfully, administrators should back up the necessary passwords, variables, and SSH keys, and store them in a safe place in accordance with their own best practices. Administrators should then delete the container to keep the system secure.

Scaling NSO Shards

By default, there are two NSO shards for new deployments. Each time the number of devices in your production environment exceeds a threshold of 500 per NSO, you will need to add a new shard to scale your NSO capabilities. If you are upgrading from a 3.9 or earlier MSX deployment and you had a single NSO shard, then you will still only have a single shard after the upgrade.

Before running the command to scale the NSOs for Managed Device or SD-Branch, complete the following steps:

Procedure


Step 1

Navigate to the msx-4.0.0 Ansible folder.

cd /msx-<version>/ansible 
Step 2

Create an Ansible vault password file with the same password used in MSX 4.0.0, and export the Ansible password file so that you can run the Ansible playbooks.

export ANSIBLE_VAULT_PASSWORD_FILE=<path_to_vault_password_file> 

Scaling Managed Device NSOs

If you need to add additional shards to increase the Managed Device capacity of your deployment, follow the procedures in this section. You may add one shard at a time, with no upper limit. It is best to add shards individually if you are not sure that you have enough CPU or memory resources for the additional shards.

To add a new Managed Device shard:

Procedure


Step 1

Open group_vars/all/manageddevice_variables.yml to edit the PNP_MGMT_ADDR_SUBNET_MASK_LIST_MANAGEDDEVICE variable.

Originally, PNP_MGMT_ADDR_SUBNET_MASK_LIST_MANAGEDDEVICE is set to [ '10.254.1.0/24', '10.254.8.0/24' ], with one subnet for each assigned shard. In this example, you add one more subnet for the new shard.

To add the shard, use [ '10.254.1.0/24', '10.254.8.0/24', '10.254.32.0/24' ].

Do not change the order of the existing subnets in the variable; otherwise, the playbook will behave incorrectly.

Step 2

Run the following command, with $NUM set to the number of shards that are currently running.

library/ansible-playrole nso-k8s/deploy kube_master "servicepack_name: manageddevice, existing_nso_shard: $NUM, schema_mode: NewInstall"

Where $NUM = 2 in the above example.
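
In the example above, $NUM matches the number of entries that the subnet list had before the new subnet was added. The sketch below illustrates that relationship only; count_list_entries is a hypothetical helper, and it assumes that each list entry corresponds to exactly one running shard, which you should confirm against your deployment.

```shell
# Count the entries in a YAML flow list such as "[ 'a', 'b' ]".
count_list_entries() {
  # Strip the brackets, split on commas, and count the non-empty items.
  printf '%s\n' "$1" | sed 's/[][]//g' | tr ',' '\n' | grep -c '[^[:space:]]'
}

# The original value of the variable, before the new subnet is appended.
original="[ '10.254.1.0/24', '10.254.8.0/24' ]"
NUM=$(count_list_entries "$original")
echo "existing_nso_shard: $NUM"
```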


Scaling SD-Branch NSOs

You may add one shard at a time, with no upper limit. It is best to add shards individually if you are unsure that you have enough CPU or memory resources for the additional shards.

To add a new SD-Branch shard:

Procedure


Step 1

Open group_vars/all/vbranch_variables.yml to edit the PNP_MGMT_ADDR_SUBNET_MASK_LIST_VBRANCH variable.

Originally, PNP_MGMT_ADDR_SUBNET_MASK_LIST_VBRANCH is set to [ '10.254.10.0/23', '10.254.10.0/23' ], with one entry for each assigned shard. In this example, you add one more entry for the new shard.

To add a third shard, repeat the same subnet: [ '10.254.10.0/23', '10.254.10.0/23', '10.254.10.0/23' ]. All NSO shards should use the same subnet for SD-Branch.

Step 2

Run the following command, with $NUM set to the number of shards that are currently running.

library/ansible-playrole nso-k8s/deploy kube_master "servicepack_name: vbranch, existing_nso_shard: $NUM, schema_mode: NewInstall"

Where $NUM = 2 in the above example.


Updating New NSO Shard Configurations

When a new NSO shard is created, you must configure the Catalog, VNFD, and NSD configurations for that shard.

You can use one of the following methods to configure an NSO shard:

  • Service Type APIs

  • NSO Shard ID (Deprecated APIs)

  • Manual NSO Manipulation for SD-Branch Service Pack

  • Manual NSO Manipulation for SD-WAN Service Pack

Service Type APIs

A set of APIs in the Orchestration microservice allows you to upload catalog entries, VNFDs, and other information by service type. If these APIs were used to upload the original configurations, then all users must run the PUT operations again with the same configuration. This distributes the data to all the new shards.

The following images display the available APIs for Catalogs, VNFDs, and NSDs:

  • Available APIs for Catalogs:

  • Available APIs for VNFDs:

    You can call the following API to get all VNFDs that were deployed earlier.


    Note

    With the service type APIs, when you perform a GET operation, you do not need to know which shard holds the object, because the data is centralized in the database.


    Once all the VNFDs are returned, you can iterate through the VNFDs and update MSX using the following API:

    • This will push the VNFD to all new shards and complete the configuration process.

  • Available APIs for NSDs:

NSO Shard ID (Deprecated APIs)


Note

If this is an older MSX installation that did not load the configurations using service type APIs, consider upgrading to use the service type APIs.


If you originally used the NSO shard ID APIs to upload catalogs and VNFDs, you can use the same APIs to push the configuration to the new NSO shards. You must configure each newly added NSO shard separately, using the same APIs and the specific shard ID. For example:

To update the new NSO shards with existing catalog entries:

  1. Get all the existing catalog entries from the original NSO shard, using the command:

    GET ../api/v1/{shardId}/config/catalog

    Note

    When you use the shard ID APIs to retrieve an object, check all the NSO shards. If all the NSO shards are synchronized, you need to query only one NSO shard.


  2. To retrieve an object from an NSO shard, first get the list of all available NSO shards from the Manage microservice using the following API:

    GET ../api/v2/orchestrators 

    The response contains an array of objects, each with a Shard ID attribute. Use the Shard ID to check each NSO shard and obtain the required object. For example:

    • To get the specific catalog from the NSO shard, use the following API:

      GET ../api/v1/{shardId}/config/catalog/{id}
    • To get all the existing catalog definitions from the original NSO shard, use the following API:

      GET ../api/v1/{shardId}/config/catalog/
  3. For each new shard, go through the returned list of catalog definitions and update the shard using the following API:

    POST ../api/v1/{shardId}/config/catalog

Note

Using the same process that you used for catalog entries, update the new shards with existing VNFD entries.


Manual NSO Manipulation for SD-Branch Service Pack

Use the manual NSO manipulation method to upload catalog entries and VNFDs for older installations that do not use service type APIs. If service type APIs were used to load the original configuration, you can continue to use the service type APIs.

To manually load the new NSO shard:

Procedure


Step 1

Save the catalog and VNFD configuration files from the existing shards.

Configuration for saving catalog and VNFD files:

vmsnso@ncs> configure
Entering configuration mode private
[ok][2020-08-27 17:42:45]
 
[edit]
vmsnso@ncs% show catalog | display xml | save catalog.xml
[ok][2020-08-27 17:42:48]
 
[edit]
vmsnso@ncs% show nfvo vnfd | display xml | save nfvo-vnfd.xml
[ok][2020-08-27 17:42:53]
Step 2

Copy the catalog and VNFD configuration files to the new shard NSO container.

Step 3

Log in to NSO and merge configurations to the new shard NSO.

For catalogs and VNFDs, use the following commands to merge. The example merges catalog.xml; repeat the same commands for nfvo-vnfd.xml.

vmsnso@ncs% load merge
Possible completions:
  <filename/terminal>  catalog.xml  nfvo-vnfd.xml  post-install.log
vmsnso@ncs% load merge catalog.xml
[ok][2020-08-27 17:43:14]
 
[edit]
vmsnso@ncs% commit
[ok][2020-08-27 17:43:16]
 
[edit]
vmsnso@ncs%

Manual NSO Manipulation for SD-WAN Service Pack

To patch multiple NSO configurations for the SD-WAN service pack on the AWS environment:

Procedure

Step 1

Deploy the tar file on the AWS environment.

For more information, see the DevNet document.

Depending on the user configuration, the SingleIP configuration, the DualIP configuration, or both can be pushed to the AWS environment.

Step 2

Log in to the Kubernetes master node.

Ensure that both the nsoConfigDualIP and nsoConfigSingleIP configuration files are present in the /home/centos folder.

Step 3

Copy the SingleIP and DualIP configuration files to the new NSO container using the following commands:

sudo kubectl cp nsoConfigSingleIP <new_shard_NSO_container>:/tmp
sudo kubectl cp nsoConfigDualIP <new_shard_NSO_container>:/tmp
Step 4

Log in to the NSO and navigate to the new NSO folder.

cd /tmp/nsoConfigDualIP
cd /tmp/nsoConfigSingleIP
Step 5

Perform load merge using the command:

sh loadmerge.sh

Data Backups

Cisco MSX includes two basic backup mechanisms:

  1. Manual backups: These backups run all at once and are typically run before an upgrade. The state of the backups is mostly consistent. For more information, see the Manual MSX Data Backups section.

  2. Cron-based backups: These backups provide a convenient ongoing backup mechanism in which you can back up individual databases on different schedules. Because of the different schedules, the data may not be consistent across databases. For more information, see the K8s Cron-based Backups section.


Note

MSX Operators should clean up any older backups according to the retention policies for their organization.


Manual MSX Data Backups

As a best practice, you should periodically create a backup of your system once it is in operation. Ensure that the certificates (ca.pem and ca-key.pem) are also backed up along with your other data. They are located at /etc/ssl/vms-certs on the Inception and kube-master nodes. Use the following playbook to perform the backup:

ansible-playbook vms-backup.yml --extra-vars backup_tag=msx-backup-tag

Note

Whenever you need to restore MSX, ensure that you use the same tag that was specified in the backup operation.
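
Because a restore must use the same tag as the backup, it can be convenient to generate the tag in a predictable way. The following is only a sketch: the date-stamped naming scheme is an assumption, not an MSX requirement, and the snippet prints the playbook command instead of running it.

```shell
# Build a date-stamped backup tag in UTC (assumed naming scheme).
backup_tag="msx-backup-$(date -u +%Y%m%d)"

# Print the command so that the tag can be reviewed and recorded before use.
echo "ansible-playbook vms-backup.yml --extra-vars backup_tag=$backup_tag"
```

Record the printed tag together with the backup so that it is available when you restore.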


Backing Up CockroachDB

When making a backup of the MSX system, in addition to running the vms-backup.yml playbook, you must run the following steps to back up CockroachDB.

Procedure


Step 1

Log in to the installer container.

Step 2

Create temporary AWS credentials to use for the backup process. This user should only have access to your S3 bucket. For more information, see Creating the AWS Credentials File.

Step 3

Back up your existing AWS credentials file.

Step 4

Insert the new temporary AWS credentials into your credentials file.

Step 5

Run the cockroachdb-backup.yml playbook.

cd /msx-4.0.0/ansible
export ANSIBLE_VAULT_PASSWORD_FILE=<path to pwd file>
ansible-playbook cockroachdb-backup.yml --extra-vars backup_target_dbs=serviceconfigmanager
Step 6

Verify that the roachdump job completed successfully.

ssh -F ssh.cfg centos@<kubernetes-master-1-VM_IP>
sudo su
kubectl get pods | grep roachdump

Verified command output example:

roachdump-9qprn                      0/1     Completed           0          83s
Step 7

View the job logs to ensure that the backup was successful.

kubectl logs roachdump-9qprn

Log command output example:

+ exec /bin/bash /backup.sh nfviallsps
2020-02-22T07:13:04Z INFO Starting backup of database nfviallsps
2020-02-22T07:13:04Z INFO Backup successfully completed. Beginning commpression.
2020-02-22T07:13:04Z INFO Backup successfully compressed. Sending to long term storage.
`/backup/2020-02-22T07:13:04Z-serviceconfigmanager.sql.gz` -> `backupstore/backups/2020-02-22T07:13:04Z-nfviallsps.sql.gz`
Total: 56 B, Transferred: 56 B, Speed: 493 B/s
2020-02-22T07:13:04Z INFO Backup successfully stored offline. Removing local copy.
2020-02-22T07:13:04Z INFO Backup job complete!
Step 8

Delete the roachdump job.

kubectl delete job roachdump
Step 9

Delete the roachdump manifest from the master node on which it resides.

rm /etc/kube-manifests/cockroachdb/roachdump-job.yml
Step 10

Replace your original AWS credentials file.

Step 11

Within AWS, delete the access key for the temporary S3 account.
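
The status check in Step 6 can also be scripted. The sketch below is a hypothetical helper, not part of the MSX playbooks: roachdump_completed reads kubectl get pods output on standard input and succeeds only if at least one roachdump pod is listed and every roachdump pod shows the Completed status.

```shell
# Return 0 only if at least one roachdump pod line is present on stdin
# and every such line shows STATUS "Completed" (third column).
roachdump_completed() {
  awk '/^roachdump-/ { seen = 1; if ($3 != "Completed") bad = 1 }
       END { exit !(seen && !bad) }'
}

# Sample line copied from the example output in Step 6.
sample='roachdump-9qprn                      0/1     Completed           0          83s'
if printf '%s\n' "$sample" | roachdump_completed; then
  echo "roachdump job completed"
fi
```

On the master node, the function could be used as: kubectl get pods | roachdump_completed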


Relocating CockroachDB Backups Out of MSX

The CockroachDB backup playbook saves the backup file to either S3 (AWS) or MinIO (OpenStack). Administrators should copy these backups outside of the MSX environment to ensure that they are available in the event of a complete environment failure. Additionally, administrators should safely store these backups in accordance with their operational best practices.

Relocating CockroachDB Backups (OpenStack)

This section explains the process of retrieving CockroachDB backups from MinIO and relocating those backups to a non-MSX location. Administrators should run these commands from the Master node.

Procedure

Step 1

Export the MC_HOST_backupstore environment variable.


export MC_HOST_backupstore=https://minio-admin:$(grep -A1 minio-admin /root/.mc/config.json | grep secretKey | awk '{print $2}' | sed -e 's/^"//' -e 's/",$//')@$(hostname):9000
Step 2

List the available CockroachDB backups.


/usr/local/bin/mc ls backupstore/backups

Output Example:


[2020-07-31 12:19:42 UTC]  1.7KiB 2020-07-31T12:19:41Z-serviceconfigmanager.sql.gz
Step 3

Copy any backups from MinIO to the Master node.


/usr/local/bin/mc cp backupstore/backups/<backup-filename> <dest>

Command Example:


/usr/local/bin/mc cp backupstore/backups/2020-07-31T12:19:41Z-serviceconfigmanager.sql.gz /home/centos/
Step 4

Copy the file from the Master node to a safe (non-MSX) location.
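
The export in Step 1 builds a MinIO client host URL of the form https://<access-key>:<secret-key>@<host>:<port>, reading the secret key from the mc configuration file. The sketch below runs the same grep, awk, and sed pipeline against a sample file to show what each stage extracts. The file contents are hypothetical; the real /root/.mc/config.json may be laid out differently, and the pipeline depends on the secretKey line directly following a line that contains minio-admin.

```shell
# Hypothetical fragment of an mc config.json, written to a temporary file.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
  "hosts": {
    "backupstore": {
      "url": "https://localhost:9000",
      "accessKey": "minio-admin",
      "secretKey": "EXAMPLE-SECRET",
      "api": "S3v4"
    }
  }
}
EOF

# Same pipeline as in Step 1, pointed at the sample file:
# grep -A1 keeps the matching line plus the line after it, awk takes the
# second field, and sed strips the leading quote and the trailing quote and comma.
secret=$(grep -A1 minio-admin "$cfg" | grep secretKey | awk '{print $2}' | sed -e 's/^"//' -e 's/",$//')
echo "$secret"
rm -f "$cfg"
```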


Relocating CockroachDB Backups (AWS)

This section explains the process of retrieving CockroachDB backups from S3 and relocating those backups to a non-MSX location. Administrators should run these commands from the /msx-<version>/ansible folder in the Installer container:

Procedure

Step 1

List the available CockroachDB backups.


aws s3 ls "s3://$(grep vms_subdomain: group_vars/all/main.yml | awk '{print $2}')-msx-bucket.$(grep vms_domain: group_vars/all/main.yml | awk '{print $2}')/backups/"

Command Example:


aws s3 ls s3://vms_subdomain-bucket-aws-msx-bucket.platform.ciscovms.com/backups/

Output Example:

2020-07-31 10:22:00       1744 2020-07-31T10:21:58Z-serviceconfigmanager.sql.gz 
Step 2

Copy any backups from S3 to the Installer container.


aws s3 cp "s3://$(grep vms_subdomain: group_vars/all/main.yml | awk '{print $2}')-msx-bucket.$(grep vms_domain: group_vars/all/main.yml | awk '{print $2}')/backups/<backup-filename>" <dest-path>
Command Example:

aws s3 cp s3://vms_subdomain-bucket.aws-msx-bucket.platform.ciscovms.com/backups/2020-07-31T10:21:58Z-serviceconfigmanager.sql.gz vms-backup/

Output Example:

download: s3://vms_subdomain-bucket.aws-msx-bucket.platform.ciscovms.com/backups/2020-07-31T10:21:58Z-serviceconfigmanager.sql.gz to vms-backup/2020-07-31T10:21:58Z-serviceconfigmanager.sql.gz 
Step 3

Copy the backups from the Installer container to a safe (non-MSX) location.
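
The aws commands in this procedure derive the bucket name from two values in group_vars/all/main.yml, joined as <vms_subdomain>-msx-bucket.<vms_domain>. The sketch below reproduces that composition against a temporary file so that the result can be inspected without calling AWS. The variable values are hypothetical placeholders.

```shell
# Hypothetical stand-in for group_vars/all/main.yml.
vars=$(mktemp)
cat > "$vars" <<'EOF'
vms_subdomain: example-sub
vms_domain: example.com
EOF

# Same grep and awk extraction as in the aws s3 commands above.
bucket="$(grep vms_subdomain: "$vars" | awk '{print $2}')-msx-bucket.$(grep vms_domain: "$vars" | awk '{print $2}')"
echo "s3://$bucket/backups/"
rm -f "$vars"
```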


Restoring the CockroachDB

Use the following procedure to locate and restore a CockroachDB backup.

Procedure


Step 1

Log in to the installer container.

Step 2

Change to the Ansible directory.

cd /msx-4.0.0/ansible
Step 3

Run the cockroachdb-list-backups playbook to get the name of any backups.

ansible-playbook cockroachdb-list-backups.yml

Command output example:

TASK [cockroachdb/restore : Display list of backups] *************************************************************************
task path: /msx-4.0.0/ansible/roles/cockroachdb/restore/tasks/list-backups.yml:23
Monday 27 July 2020  18:47:32 +0000 (0:00:01.304)       0:00:06.225 *********** 
ok: [kubernetes-master-dvd-1] => {}MSG:BACKUPS:
[2020-07-27 18:12:17 UTC]  1.7KiB 2020-07-27T18:12:15Z-serviceconfigmanager.sql.gz

If a backup does not exist, you will see the following error message:

MSG:NO BACKUPS

If the environment has been removed or destroyed, then the backups will not exist in MinIO (OpenStack) or S3 (AWS).

Step 4

Log in to the Kubernetes master node.

ssh -F ssh.cfg centos@<kubernetes-master-1-VM_IP>
sudo su
Step 5

Drop the CockroachDB database using the following commands.

kubectl exec -it cockroachdb-0 -c cockroachdb bash

./cockroach sql --certs-dir cockroach-certs
drop database serviceconfigmanager cascade; 
Step 6

To restore, choose a file from the list of backups.

Step 7

Log in to the installer container and run the restore playbook, specifying the backup file that you chose.

cd /msx-4.0.0/ansible
export ANSIBLE_VAULT_PASSWORD_FILE=<path to pwd file>
ansible-playbook cockroachdb-restore.yml --extra-vars '{"restoreTarget":{ "database": "serviceconfigmanager", "user":"serviceconfigmanager", "service":"serviceconfigmanager", "backupfile":"<backup file name>" }}'
Step 8

Verify that the restore operation was successful.

ssh -F ssh.cfg centos@<kubernetes-master-1-VM_IP>
sudo su
kubectl get po | grep roachrestore
Step 9

View the output to verify that the roachrestore job completed successfully.

roachrestore-ph65b                           0/1     Completed   0          67s
Step 10

View the log to make sure that the restore operation was successful.

kubectl logs roachrestore-ph65b

Verified command output example:

+ exec /bin/bash /restore.sh 2020-02-25T18:58:00Z-serviceconfigmanager.sql.gz
2020-02-25T19:17:58Z INFO Starting restore of database file 2020-02-25T18:58:00Z-serviceconfigmanager.sql.gz
`backupstore/ei-infra-aws-msx-bucket.qa.ciscovms.com/backups/2020-02-25T18:58:00Z-serviceconfigmanager.sql.gz` -> `/backup/2020-02-25T18:58:00Z-serviceconfigmanager.sql.gz`
Total: 2.56 KiB, Transferred: 2.56 KiB, Speed: 59.12 KiB/s
2020-02-25T19:17:59Z INFO Successfully retrieved backup archive
2020-02-25T19:17:59Z INFO Successfully extracted sql from archive
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 4
INSERT 1
INSERT 5
INSERT 3
ALTER TABLE
ALTER TABLE
2020-02-25T19:17:59Z INFO Restore job complete!

Recovering CockroachDB Backups Into MSX

If you need to recover from a backup that is no longer stored in S3 or MinIO, use the procedures in the following sections to push the backup into the MSX environment that needs to be restored.

Recovering CockroachDB Backups (OpenStack)

This section describes the process of recovering CockroachDB backups from a non-MSX location back into MSX (MinIO). Administrators should perform the following actions from the Master node.

Procedure

Step 1

Copy the backup file from your safe location to the Master node.

Step 2

Export the MC_HOST_backupstore environment variable.

export MC_HOST_backupstore=https://minio-admin:$(grep -A1 minio-admin /root/.mc/config.json | grep secretKey | awk '{print $2}' | sed -e 's/^"//' -e 's/",$//')@$(hostname):9000
Step 3

Copy any backups to MinIO from the Master node.

/usr/local/bin/mc cp <path-to-file> backupstore/backups/

Command Example:

/usr/local/bin/mc cp /home/centos/2020-08-03T11\:22\:26Z-serviceconfigmanager.sql.gz backupstore/backups/

Recovering CockroachDB Backups (AWS)

This section describes the process of recovering CockroachDB backups from a non-MSX location back into MSX (S3). Administrators should perform the following actions from the /msx-<version>/ansible folder in the Installer container.

Procedure

Step 1

Ensure you have the backup file available on the Installer container.

Step 2

Copy any backups to S3 from the Installer container.

aws s3 cp <path-to-file> "s3://$(grep vms_subdomain: group_vars/all/main.yml | awk '{print $2}')-msx-bucket.$(grep vms_domain: group_vars/all/main.yml | awk '{print $2}')/backups/"

Command Example:

aws s3 cp vms-backup/pre-upgrade/2020-07-31T10\:21\:58Z-serviceconfigmanager.sql.gz "s3://$(grep vms_subdomain: group_vars/all/main.yml | awk '{print $2}')-msx-bucket.$(grep vms_domain: group_vars/all/main.yml | awk '{print $2}')/backups/" 

Output Example:

upload: vms-backup/pre-upgrade/2020-07-31T10:21:58Z-serviceconfigmanager.sql.gz to s3://vms_subdomain-bucket.aws-msx-bucket.platform.ciscovms.com/backups/2020-07-31T10:21:58Z-serviceconfigmanager.sql.gz 
 

K8s Cron-based Backups

MSX includes some basic cron-based functionality that provides the ability to automate backups on a schedule. Backup jobs are configured using UTC, and timing can be customized per database. Backups include persistent data from Cassandra, CockroachDB, Elasticsearch, NSOs, and Action Orchestrator, which contains ArangoDB and Postgres.

Currently, MSX supports backing up to S3 protocol-based endpoints. This solution is tested with AWS S3 and MinIO cloud storage products.

By default, AWS deployments use the S3 bucket created as part of the deployment. This bucket typically holds the Yum repo and the Docker registry, but you can also define an alternative S3 bucket for backups.

Using AWS S3 Repos for on-prem Backups or Alternate AWS Buckets

This section covers two scenarios in which you can use AWS buckets for backups. The first scenario uses an AWS S3 bucket to back up on-prem deployments. The second scenario uses an AWS S3 bucket in an alternate region, for example, to provide geographic redundancy.

The procedure for setting up the AWS S3 buckets is as follows. Many of these steps involve Amazon products and are not documented in detail here; refer to the external links for details.

  1. Create a private S3 bucket in AWS. For instructions, see: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html

  2. Create an Identity and Access Management (IAM) user and an IAM role associated to that user. For instructions, see: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html

  3. Within the IAM role for the bucket, set bucket permissions similar to those in the following template. Make sure that you update BUCKET_NAME:

    {
      "Statement": [
        {
          "Action": [
            "s3:ListBucket",
            "s3:GetBucketLocation",
            "s3:ListBucketMultipartUploads",
            "s3:ListBucketVersions"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:s3:::BUCKET_NAME"
          ]
        },
        {
          "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:AbortMultipartUpload",
            "s3:ListMultipartUploadParts"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:s3:::BUCKET_NAME/*"
          ]
        }
      ],
      "Version": "2012-10-17"
    }
    
  4. Create an access key for the user. For instructions, see https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys

  5. Update group_vars/all/cron_backup_var.yml with the appropriate bucket details, as in the example below. Ensure that you have exported the S3_ACCESS_KEY_ID and S3_SECRET_ACCESS_KEY environment variables before running the playbook to create the cron jobs or before performing a restore operation.

    s3_backup:
      endpoint: 's3.us-east-2.amazonaws.com'
      region: 'us-east-2'
      bucket: 'backup-test'
      base_path: 'backups'
      protocol: 'https'
      access_key: "{{ lookup('env', 'S3_ACCESS_KEY_ID') }}"
      secret_key: "{{ lookup('env', 'S3_SECRET_ACCESS_KEY') }}"
      proxy_host: "{{ proxy | urlsplit('hostname') }}"
      proxy_port: "{{ proxy | urlsplit('port') }}"
      proxy_scheme: "{{ proxy | urlsplit('scheme') }}"
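
The access_key and secret_key entries above are read from the environment, so both variables must be present in the shell that runs the playbook. A minimal sketch of exporting and checking them follows; the values are placeholders that you would replace with the access key created in step 4.

```shell
# Placeholder values; substitute the real access key ID and secret access key.
export S3_ACCESS_KEY_ID='<access-key-id>'
export S3_SECRET_ACCESS_KEY='<secret-access-key>'

# Abort with an error message if either variable is unset or empty.
: "${S3_ACCESS_KEY_ID:?S3_ACCESS_KEY_ID must be set}"
: "${S3_SECRET_ACCESS_KEY:?S3_SECRET_ACCESS_KEY must be set}"
echo "S3 credentials are exported"
```

Avoid storing the real values in shell history or in files that are kept with the deployment.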
    

Using MinIO as an S3 Endpoint

MinIO has a gateway mode that can be set up to use NAS storage. For more information, see https://docs.min.io/docs/minio-gateway-for-nas.html

For any MinIO questions, you can read the documentation at https://docs.min.io

When you configure the MinIO endpoint with the https protocol, ensure that its certificate is not self-signed.

s3_backup:
  endpoint: '10.XX.X.X:9000'
  bucket: 'backups'
  protocol: 'http'
  access_key: "{{ lookup('env', 'S3_ACCESS_KEY_ID') }}"
  secret_key: "{{ lookup('env', 'S3_SECRET_ACCESS_KEY') }}"

Restoring Backups

This section explains how to restore the data backups.

Restoring Elasticsearch Backups

The following procedure is a general guide to one approach for restoring Elasticsearch data. For more details on performing snapshot and restore operations, see the official documentation at https://www.elastic.co/guide/en/elasticsearch/reference/6.8/modules-snapshots.html.

  1. List all the existing repositories (full backups) to find the backup to restore. If you want to restore data from 2020-12-08, you may need to restore the 2020-12-09 repository, depending on the time at which the backup jobs were configured to run. The cron backup jobs run in UTC.

    # curl 'http://es-logs:9200/_cat/repositories?v'
     
    id                       type
    es-backup-4.0.0-20201201   s3
    es-backup-4.0.0-20201202   s3
    es-backup-4.0.0-20201203   s3
    es-backup-4.0.0-20201204   s3
    es-backup-4.0.0-20201205   s3
    es-backup-4.0.0-20201206   s3
    es-backup-4.0.0-20201207   s3
    es-backup-4.0.0-20201208   s3
    es-backup-4.0.0-20201209   s3
    es-backup-4.0.0-20201210   s3
    es-backup-4.0.0-20201211   s3
    es-backup-4.0.0-20201212   s3
    es-backup-4.0.0-20201213   s3
    es-backup-4.0.0-20201214   s3
    
  2. List the snapshots (incremental backups) using the following curl command:

    curl 'http://es-logs:9200/_cat/snapshots/<repository-name>?v&s=id'

    # curl 'http://es-logs:9200/_cat/snapshots/es-backup-4.0.0-20201209?v&s=id'
     
    id                   status start_epoch start_time end_epoch  end_time duration indices successful_shards failed_shards total_shards
    snap-20201209001503 SUCCESS 1607472908  00:15:08   1607473311 00:21:51     6.7m       9                41             0           41
    
  3. Restore the specific indices using the following curl command:


    Note

    As a rule, before restoring indices in any Elasticsearch restore operation, you must either close or delete the existing indices. An index that is being actively written to cannot be restored over. For more information, see https://www.elastic.co/guide/en/elasticsearch/reference/6.8/indices-open-close.html.

    curl -X POST "es-logs:9200/_snapshot/<repository-name>/<snapshot-id>/_restore?pretty" -H 'Content-Type: application/json' -d'{ "indices": "<index-name>,<index-name>" }'

    # curl -X POST "es-logs:9200/_snapshot/es-backup-4.0.0-20201209/snap-20201209001503/_restore?pretty" -H 'Content-Type: application/json' -d'{ "indices": "logstash-2020.12.07,logstash-2020.12.08" }'
     
    {
      "accepted" : true
    }
    
    
  4. Monitor the restore progress by listing the indices and checking their details. The restore operation is complete when the health state of each restored index is green and its docs.count and pri.store.size columns are populated. Use the following curl command to view the indices and related details.

    # curl 'http://es-logs:9200/_cat/indices?v'
     
    health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    green  open   logstash-2020.11.20 2w5g8jzcRgCG_ofqFYTCFQ   5   1          0            0      2.5kb          1.2kb
    green  open   logstash-2020.11.21 mvxBmpOpRnCZaDE_pbNWnw   5   1          0            0      2.5kb          1.2kb
    green  open   logstash-2020.11.19 n20hdhl3TCSDQvuh2Q1tEQ   5   1          0            0      2.5kb          1.2kb
    green  open   logstash-2020.11.22 p-4wwbEFRIyloNDRszAtLA   5   1          0            0      2.5kb          1.2kb
    green  open   logstash-2020.12.13 FRmI-i5yQMeBmr-OAILLiw   5   1    6603982            0     10.4gb          5.2gb
    green  open   logstash-2020.12.12 YO89Yzb4TUqr8pr-9QVOUQ   5   1    6601900            0     10.4gb          5.2gb
    green  open   logstash-2020.11.18 0bp8BoFJTSaYQVJBuRo8xw   5   1       1564            0      1.6mb        835.9kb
    green  open   logstash-2020.11.23 W0Uf_Gf0T6aSipsG9wvxvw   5   1          0            0      2.5kb          1.2kb
    yellow open   logstash-2020.12.07 pwRCyQJDTE-XqPCEDskM7A   5   1                                                 
    green  open   logstash-2020.12.14 qF5mVL93SymbMJGOcy4mJw   5   1    3547310            0      7.4gb          3.4gb
    green  open   logstash-2020.11.25 Sq9fBztvS9u6qaK16Ryg5w   5   1          0            0      2.5kb          1.2kb
    green  open   logstash-2020.11.24 tZGLt6UEQneFdq4GosLgpQ   5   1       6752            0        5mb          2.5mb
    green  open   .kibana             V4nSOx93Qf642s8BiYqoyg   1   1          4            0     51.8kb         25.9kb
    yellow open   logstash-2020.12.08 VxQb2DVjQIaX6VV9PlQ3XA   5   1
    
    

    If necessary, you can obtain more detailed restore progress information by running the following curl command:

    curl -s 'http://es-logs:9200/_cat/recovery' | grep <index-name>

    # curl -s 'http://es-logs:9200/_cat/recovery' | grep 2020.12.07
     
    logstash-2020.12.07 0 3m    snapshot    done  n/a            n/a                         10.201.161.193 vms-logs-data-es-log-data-2 es-backup-4.0.0-20201209 snap-20201209001503 124 124 100.0% 124 1114905317 1114905317 100.0% 1114905317 0    0    100.0%
    logstash-2020.12.07 0 41.3s peer        index 10.201.161.193 vms-logs-data-es-log-data-2 10.201.76.193  vms-logs-data-es-log-data-1 n/a                      n/a                 124 114 91.9%  124 1114905316 513079220  46.0%  1114905316 0    0    100.0%
    logstash-2020.12.07 1 2.9m  snapshot    done  n/a            n/a                         10.201.76.193  vms-logs-data-es-log-data-1 es-backup-4.0.0-20201209 snap-20201209001503 74  74  100.0% 74  1111465590 1111465590 100.0% 1111465590 0    0    100.0%
    logstash-2020.12.07 1 52.9s peer        index 10.201.76.193  vms-logs-data-es-log-data-1 10.201.70.65   vms-logs-data-es-log-data-0 n/a                      n/a                 74  70  94.6%  74  1111465589 541903133  48.8%  1111465589 0    0    100.0%
    logstash-2020.12.07 2 51.6s peer        index 10.201.70.65   vms-logs-data-es-log-data-0 10.201.161.193 vms-logs-data-es-log-data-2 n/a                      n/a                 133 126 94.7%  133 1115393078 660291756  59.2%  1115393078 0    0    100.0%
    logstash-2020.12.07 2 3.1m  snapshot    done  n/a            n/a                         10.201.70.65   vms-logs-data-es-log-data-0 es-backup-4.0.0-20201209 snap-20201209001503 133 133 100.0% 133 1115393079 1115393079 100.0% 1115393079 0    0    100.0%
    logstash-2020.12.07 3 3m    snapshot    done  n/a            n/a                         10.201.161.193 vms-logs-data-es-log-data-2 es-backup-4.0.0-20201209 snap-20201209001503 111 111 100.0% 111 1112960462 1112960462 100.0% 1112960462 0    0    100.0%
    logstash-2020.12.07 3 2.1m  peer        done  10.201.161.193 vms-logs-data-es-log-data-2 10.201.76.193  vms-logs-data-es-log-data-1 n/a                      n/a                 111 111 100.0% 111 1112960461 1112960461 100.0% 1112960461 0    0    100.0%
    logstash-2020.12.07 4 3m    snapshot    done  n/a            n/a                         10.201.76.193  vms-logs-data-es-log-data-1 es-backup-4.0.0-20201209 snap-20201209001503 144 144 100.0% 144 1112799924 1112799924 100.0% 1112799924 0    0    100.0%
    logstash-2020.12.07 4 42.8s peer        index 10.201.76.193  vms-logs-data-es-log-data-1 10.201.70.65   vms-logs-data-es-log-data-0 n/a                      n/a                 144 131 91.0%  144 1112799923 408418291  36.7%  1112799923 0    0    100.0%
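The health check in step 4 can be repeated automatically. The following is a minimal sketch, assuming curl is available and the es-logs:9200 endpoint used above is reachable. The function names and the 30-second polling interval are arbitrary choices, and the filter looks only at the health column, so docs.count and pri.store.size should still be checked as described in step 4.

```shell
# Print the names of indices whose health is not yet green.
# Reads the output of `_cat/indices?v` on stdin and skips the header row.
pending_indices() {
  awk 'NR > 1 && $1 != "green" { print $3 }'
}

# Poll until every index reports green. Returns non-zero if the request
# itself fails, so an unreachable cluster is not mistaken for "all green".
wait_for_green() {
  while :; do
    out=$(curl -sf 'http://es-logs:9200/_cat/indices?v') || return 1
    pending=$(printf '%s\n' "$out" | pending_indices)
    [ -z "$pending" ] && return 0
    # $pending is left unquoted on purpose so the names print on one line.
    echo "still restoring:" $pending
    sleep 30
  done
}
```

For a one-off check, pipe the listing through the filter: curl -s 'http://es-logs:9200/_cat/indices?v' | pending_indices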
    
    

Restoring Cassandra Backups

The following sections are provided as a general guide to one approach for restoring Cassandra data. For more details on performing the restore operation, see the official documentation at https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsBackupSnapshotRestore.html.

Preparing the Backup to Restore From

Use the following procedure to prepare the Cassandra backup that you will be restoring from:

  1. Download the backup tgz file from the backup location.

  2. Copy the backup tgz file to the deployer container.

  3. Copy the backup tgz file to a master node.

  4. Copy the backup to the cassandra-0 node.

    kubectl cp cassandra-<version-date>.tgz cassandra-0:/tmp/
  5. Log in to cassandra-0.

    kubectl exec -it cassandra-0 -c cassandra -- /bin/bash
  6. Change the directory to /tmp (or whatever cassandra-0 directory you copied the backup file to).

  7. Extract the node backups from the combined tgz archive.
    tar zxvf cassandra-<version-date>.tgz
  8. Change directory to the created backup directory.
    cd /tmp/cassandra-<version-date>
  9. Make directories for the various nodes.
    mkdir cassandra-0
    mkdir cassandra-1
    mkdir cassandra-2
    
  10. Extract the tgz files to the directories that were created.
    tar zxvf cassandra-0--<version-date>.tgz -C cassandra-0
    tar zxvf cassandra-1--<version-date>.tgz -C cassandra-1
    tar zxvf cassandra-2--<version-date>.tgz -C cassandra-2
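Steps 9 and 10 can also be written as one loop. This is a sketch only: it assumes the three-node layout and the cassandra-<n>--<version-date>.tgz archive naming shown above, and the function name extract_node_backups is hypothetical.

```shell
# Create one directory per node and extract that node's archive into it.
# Run from the extracted backup directory; the argument is the
# <version-date> part of the archive names.
extract_node_backups() {
  version_date=$1
  for node in 0 1 2; do
    mkdir -p "cassandra-$node" || return 1
    tar zxf "cassandra-$node--$version_date.tgz" -C "cassandra-$node" || return 1
  done
}
```

The loop stops at the first archive that is missing or fails to extract.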
    

Restoring a Table from the Backups

You will likely need to truncate the table that you plan to restore. For an explanation of the reasons, see the official documentation at https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsBackupSnapshotRestore.html.

Use the following command to truncate a table:

cqlsh -u vmsdba -p <cassandra_pass> -e 'CONSISTENCY LOCAL_QUORUM; TRUNCATE <keyspace>.<table>;'

# cqlsh -u vmsdba -p somepassword -e 'CONSISTENCY LOCAL_QUORUM; TRUNCATE skyfall_idm.customeruser;'

Use the following command to load the backup data. Run it once for each of the three Cassandra node directories (cassandra-0, cassandra-1, and cassandra-2):

sstableloader -u vmsdba -pw <cassandra_pass> -d cassandra-0 cassandra-<node-number>/<keyspace>/<table>

The following is an example of loading the backup data for the skyfall_idm.customeruser table from each Cassandra node directory.

# sstableloader -u vmsdba -pw somepassword -d cassandra-0 cassandra-0/skyfall_idm/customeruser
 
WARN  19:14:20,981 Small cdc volume detected at /var/lib/cassandra/cdc_raw; setting cdc_total_space_in_mb to 1243.  You can override this in cassandra.yaml
WARN  19:14:21,119 Only 9.413GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /tmp/cassandra-4.0.0-2020-12-12/cassandra-0/skyfall_idm/customeruser/mc-67-big-Data.db  to [/10.201.222.136, /10.201.143.85, cassandra-0/10.201.60.134]
progress: [/10.201.222.136]0:1/1 100% [/10.201.143.85]0:1/1 100% [cassandra-0/10.201.60.134]0:0/1 0  % total: 66% 2.256KiB/s (avg: 2.256KiB/s)
progress: [/10.201.222.136]0:1/1 100% [/10.201.143.85]0:1/1 100% [cassandra-0/10.201.60.134]0:0/1 0  % total: 66% 0.000KiB/s (avg: 2.254KiB/s)
progress: [/10.201.222.136]0:1/1 100% [/10.201.143.85]0:1/1 100% [cassandra-0/10.201.60.134]0:1/1 100% total: 100% 3.349MiB/s (avg: 3.381KiB/s)
progress: [/10.201.222.136]0:1/1 100% [/10.201.143.85]0:1/1 100% [cassandra-0/10.201.60.134]0:1/1 100% total: 100% 0.000KiB/s (avg: 3.269KiB/s)
progress: [/10.201.222.136]0:1/1 100% [/10.201.143.85]0:1/1 100% [cassandra-0/10.201.60.134]0:1/1 100% total: 100% 0.000KiB/s (avg: 3.233KiB/s)
progress: [/10.201.222.136]0:1/1 100% [/10.201.143.85]0:1/1 100% [cassandra-0/10.201.60.134]0:1/1 100% total: 100% 0.000KiB/s (avg: 3.204KiB/s)
 
Summary statistics:
   Connections per host    : 1        
   Total files transferred : 3        
   Total bytes transferred : 19.702KiB
   Total duration          : 6151 ms  
   Average transfer rate   : 3.202KiB/s
   Peak transfer rate      : 3.381KiB/s
 
 
 
# sstableloader -u vmsdba -pw somepassword -d cassandra-0 cassandra-1/skyfall_idm/customeruser
WARN  19:14:37,936 Small cdc volume detected at /var/lib/cassandra/cdc_raw; setting cdc_total_space_in_mb to 1243.  You can override this in cassandra.yaml
WARN  19:14:38,085 Only 9.412GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /tmp/cassandra-4.0.0-2020-12-12/cassandra-1/skyfall_idm/customeruser/mc-61-big-Data.db /tmp/cassandra-4.0.0-2020-12-12/cassandra-1/skyfall_idm/customeruser/mc-62-big-Data.db  to [/10.201.222.136, /10.201.143.85, cassandra-0/10.201.60.134]
progress: [/10.201.222.136]0:1/2 85 % [/10.201.143.85]0:1/2 85 % [cassandra-0/10.201.60.134]0:1/2 85 % total: 85% 4.780KiB/s (avg: 4.780KiB/s)
progress: [/10.201.222.136]0:1/2 85 % [/10.201.143.85]0:1/2 85 % [cassandra-0/10.201.60.134]0:1/2 85 % total: 85% 0.000KiB/s (avg: 4.777KiB/s)
progress: [/10.201.222.136]0:1/2 85 % [/10.201.143.85]0:1/2 85 % [cassandra-0/10.201.60.134]0:1/2 85 % total: 85% 0.000KiB/s (avg: 4.776KiB/s)
progress: [/10.201.222.136]0:1/2 85 % [/10.201.143.85]0:2/2 100% [cassandra-0/10.201.60.134]0:1/2 85 % total: 90% 76.632KiB/s (avg: 5.027KiB/s)
progress: [/10.201.222.136]0:1/2 85 % [/10.201.143.85]0:2/2 100% [cassandra-0/10.201.60.134]0:2/2 100% total: 95% 826.219KiB/s (avg: 5.294KiB/s)
progress: [/10.201.222.136]0:2/2 100% [/10.201.143.85]0:2/2 100% [cassandra-0/10.201.60.134]0:2/2 100% total: 100% 175.651KiB/s (avg: 5.553KiB/s)
progress: [/10.201.222.136]0:2/2 100% [/10.201.143.85]0:2/2 100% [cassandra-0/10.201.60.134]0:2/2 100% total: 100% 0.000KiB/s (avg: 5.233KiB/s)
progress: [/10.201.222.136]0:2/2 100% [/10.201.143.85]0:2/2 100% [cassandra-0/10.201.60.134]0:2/2 100% total: 100% 0.000KiB/s (avg: 5.229KiB/s)
progress: [/10.201.222.136]0:2/2 100% [/10.201.143.85]0:2/2 100% [cassandra-0/10.201.60.134]0:2/2 100% total: 100% 0.000KiB/s (avg: 5.225KiB/s)
 
Summary statistics:
   Connections per host    : 1        
   Total files transferred : 6        
   Total bytes transferred : 23.024KiB
   Total duration          : 4408 ms  
   Average transfer rate   : 5.223KiB/s
   Peak transfer rate      : 5.553KiB/s
 
 
 
# sstableloader -u vmsdba -pw somepassword -d cassandra-0 cassandra-2/skyfall_idm/customeruser
WARN  19:14:51,994 Small cdc volume detected at /var/lib/cassandra/cdc_raw; setting cdc_total_space_in_mb to 1243.  You can override this in cassandra.yaml
WARN  19:14:52,127 Only 9.412GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /tmp/cassandra-4.0.0-2020-12-12/cassandra-2/skyfall_idm/customeruser/mc-61-big-Data.db /tmp/cassandra-4.0.0-2020-12-12/cassandra-2/skyfall_idm/customeruser/mc-62-big-Data.db  to [/10.201.222.136, /10.201.143.85, cassandra-0/10.201.60.134]
progress: [/10.201.222.136]0:0/2 0  % [/10.201.143.85]0:1/2 85 % [cassandra-0/10.201.60.134]0:1/2 85 % total: 57% 3.159KiB/s (avg: 3.159KiB/s)
progress: [/10.201.222.136]0:1/2 85 % [/10.201.143.85]0:1/2 85 % [cassandra-0/10.201.60.134]0:1/2 85 % total: 85% 2.456MiB/s (avg: 4.736KiB/s)
progress: [/10.201.222.136]0:1/2 85 % [/10.201.143.85]0:1/2 85 % [cassandra-0/10.201.60.134]0:1/2 85 % total: 85% 0.000KiB/s (avg: 4.731KiB/s)
progress: [/10.201.222.136]0:1/2 85 % [/10.201.143.85]0:1/2 85 % [cassandra-0/10.201.60.134]0:2/2 100% total: 90% 406.330KiB/s (avg: 4.995KiB/s)
progress: [/10.201.222.136]0:2/2 100% [/10.201.143.85]0:1/2 85 % [cassandra-0/10.201.60.134]0:2/2 100% total: 95% 1.476MiB/s (avg: 5.260KiB/s)
progress: [/10.201.222.136]0:2/2 100% [/10.201.143.85]0:2/2 100% [cassandra-0/10.201.60.134]0:2/2 100% total: 100% 628.697KiB/s (avg: 5.523KiB/s)
progress: [/10.201.222.136]0:2/2 100% [/10.201.143.85]0:2/2 100% [cassandra-0/10.201.60.134]0:2/2 100% total: 100% 0.000KiB/s (avg: 5.129KiB/s)
progress: [/10.201.222.136]0:2/2 100% [/10.201.143.85]0:2/2 100% [cassandra-0/10.201.60.134]0:2/2 100% total: 100% 0.000KiB/s (avg: 5.062KiB/s)
progress: [/10.201.222.136]0:2/2 100% [/10.201.143.85]0:2/2 100% [cassandra-0/10.201.60.134]0:2/2 100% total: 100% 0.000KiB/s (avg: 5.048KiB/s)
 
Summary statistics:
   Connections per host    : 1        
   Total files transferred : 6        
   Total bytes transferred : 23.024KiB
   Total duration          : 4562 ms  
   Average transfer rate   : 5.046KiB/s
   Peak transfer rate      : 5.523KiB/s
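The three sstableloader runs shown above can be wrapped in a loop. The following is a sketch under stated assumptions: CASSANDRA_PASS and DRY_RUN are hypothetical variables introduced here for illustration, and the function must be run from the directory that contains the cassandra-0, cassandra-1, and cassandra-2 directories.

```shell
# Load one table from the extracted backups of all three nodes.
# Set DRY_RUN=1 to print the commands instead of running them.
# CASSANDRA_PASS (hypothetical) holds the vmsdba password.
load_table() {
  keyspace=$1
  table=$2
  for node in 0 1 2; do
    ${DRY_RUN:+echo} sstableloader -u vmsdba -pw "$CASSANDRA_PASS" \
      -d cassandra-0 "cassandra-$node/$keyspace/$table" || return 1
  done
}
```

Running the function first with DRY_RUN=1 shows exactly which commands would be issued before any data is streamed.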

Rebuilding Cassandra Table Indices

As a final step, rebuild the table indices: first retrieve the index names of the table, and then use those names to rebuild the indices.

  1. Get the index names of the table.

    cqlsh -u vmsdba -p <cassandra_pass> -e 'DESCRIBE schema;' | grep "CREATE INDEX" | grep -i <keyspace>.<table> | awk '{print $3};'
    # cqlsh -u vmsdba -p somepassword -e 'DESCRIBE schema;' | grep "CREATE INDEX" | grep -i skyfall_idm.customeruser | awk '{print $3};'
     
    customeruser_roles
    customeruser_clientid_idx
    customeruser_deleted_idx
    customeruser_tenantidset_idx
    customeruser_pwdpolicyname
    
  2. Rebuild the indices.

    nodetool rebuild_index -- <keyspace> <table> <index>
    # nodetool rebuild_index -- skyfall_idm customeruser customeruser_roles
    # nodetool rebuild_index -- skyfall_idm customeruser customeruser_clientid_idx
    # nodetool rebuild_index -- skyfall_idm customeruser customeruser_deleted_idx
    # nodetool rebuild_index -- skyfall_idm customeruser customeruser_tenantidset_idx
    # nodetool rebuild_index -- skyfall_idm customeruser customeruser_pwdpolicyname
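The two steps above can be chained so that each index name is fed directly to nodetool. This is a minimal sketch: the function names and the DRY_RUN variable are hypothetical, and the filter mirrors the grep and awk pipeline from step 1, so it relies on the same DESCRIBE schema output format.

```shell
# Filter the output of `DESCRIBE schema;` (stdin) down to the index names
# defined on <keyspace>.<table>. grep -F treats the dot literally.
index_names() {
  grep "CREATE INDEX" | grep -iF "$1.$2" | awk '{ print $3 }'
}

# Rebuild every index named on stdin, one name per line.
# Set DRY_RUN=1 to print the nodetool commands instead of running them.
rebuild_indices() {
  while read -r index; do
    # </dev/null keeps the command from consuming the list of names.
    ${DRY_RUN:+echo} nodetool rebuild_index -- "$1" "$2" "$index" </dev/null || return 1
  done
}
```

A possible invocation is: cqlsh -u vmsdba -p <cassandra_pass> -e 'DESCRIBE schema;' | index_names <keyspace> <table> | rebuild_indices <keyspace> <table>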
    
    

Restoring Consul Data

The following sections are provided as a general guide to one approach for restoring Consul data. For more details on performing the restore operation, see the official documentation at https://www.consul.io/commands/kv/import.

Preparing a Consul Backup for Restoring

Use the following procedure to prepare the Consul data backup that you will be restoring from:

  1. Download the backup gz file from the backup location.

  2. Transfer the backup gz file to the deployer container.

  3. Transfer the backup gz file to a master node.

  4. Copy the Consul backup gz file to the Consul container on the kube-master.

    docker cp <consul-backup-tag>.json.gz consul:/tmp/
  5. Log in to the Consul container on the kube-master.

    docker exec -it consul bash
  6. Unzip the Consul backup file.

    gunzip -f /tmp/<consul-backup-tag>.json.gz
  7. Set the required Consul environment variables.

    export CONSUL_HTTP_TOKEN=<consul_master_token>
    
    export CONSUL_HTTP_SSL=true
    
    export CONSUL_HTTP_SSL_VERIFY=false

Restoring Consul Data

Use the following command to restore the Consul backup that you have prepared:

consul kv import @/tmp/<consul-backup-tag>.json

The following is an example of restoring a Consul backup:

#Download the backup gz file from the backup location to installer container

aws s3 cp s3://saitestcsr-msx-bucket.platform.ciscovms.com/backup-rclone/2021-01-07T22:50:06Z-consul-backup-4.0.0-1.8.2-159.json.gz /tmp/

#Transfer the backup gz file to a master node

scp -i keys/id_rsa /tmp/2021-01-07T22:50:06Z-consul-backup-4.0.0-1.8.2-159.json.gz centos@10.20.0.9:/tmp/

#Copy the consul backup gz file to the consul container

docker cp /tmp/2021-01-07T22:50:06Z-consul-backup-4.0.0-1.8.2-159.json.gz consul:/tmp/

#Log in to the consul container

docker exec -it consul bash

#from inside consul container

gunzip -f /tmp/2021-01-07T22\:50\:06Z-consul-backup-4.0.0-1.8.2-159.json.gz

export CONSUL_HTTP_TOKEN=308f8ce8-c3f8-5719-8772-f16425635f76

export CONSUL_HTTP_SSL=true

export CONSUL_HTTP_SSL_VERIFY=false

consul kv import @/tmp/2021-01-07T22:50:06Z-consul-backup-4.0.0-1.8.2-159.json

Restoring ArangoDB

The following sections are provided as a general guide to one approach for restoring ArangoDB data. For more details on performing the restore operation, see the official documentation at https://www.arangodb.com/docs/stable/programs-arangorestore.html.

Preparing the ArangoDB Backup for Restoring

Use the following procedure to prepare the ArangoDB backup that you will be restoring from:

  1. Download the backup gz file from the backup location.

  2. Transfer the backup gz file to the deployer container.

  3. Transfer the backup gz file to a master node.

  4. Copy the ArangoDB backup tar to the ArangoDB pod.

    kubectl -n vms cp <arangodb backup tar>.gz <pers-arangodb-sngl-pod>:/tmp/<arangodb backup tar>.gz
  5. Log in to the ArangoDB pod.

    kubectl -n vms exec -it <pers-arangodb-sngl-pod> /bin/sh
  6. Extract the ArangoDB backup tar.

    /bin/tar zxf /tmp/<arangodb backup tar>.gz -C /tmp/

Restoring ArangoDB Data

Use the following command to restore the ArangoDB backup that you have prepared:

for TENANT in $(ls /tmp/ | grep ^tenant_); do /usr/bin/arangorestore --server.database $TENANT --server.endpoint http+ssl://127.0.0.1:8529 --server.username root --create-database true --server.password <arangodb_password> --input-directory /tmp/$TENANT; done
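Before running the restore loop, it can help to preview which tenant databases it would process. The following sketch lists the tenant_* directories with a shell glob rather than by parsing ls output. The function name tenant_dirs is hypothetical, and it assumes, as the command above does, that the backup extracts into one tenant_* directory per database.

```shell
# List the tenant_* directories under a base directory (default /tmp),
# one name per line. Plain files named tenant_* are ignored.
tenant_dirs() {
  base=${1:-/tmp}
  for dir in "$base"/tenant_*/; do
    [ -d "$dir" ] || continue   # the glob matched nothing
    basename "$dir"
  done
}
```

An empty result means the loop above would restore nothing, which usually indicates that the archive was not extracted to /tmp.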

Post-Restore Cleanup

After you have performed your restore operation, clean up the following files:

rm -f /tmp/<arangodb backup tar>.gz
rm -rf /tmp/tenant*

The following is an example of an ArangoDB restore and cleanup operation:

#Download the backup gz file from the backup location to the installer container

aws s3 cp s3://saitestcsr-msx-bucket.platform.ciscovms.com/backup-rclone/20210108-000111-arangodb-backup-infra-ei-isolated-4.0.0-3.6.1-hardened-alpine3.11.6.json.gz /tmp/

#Transfer the backup gz file to a master node

scp -i keys/id_rsa /tmp/20210108-000111-arangodb-backup-infra-ei-isolated-4.0.0-3.6.1-hardened-alpine3.11.6.json.gz centos@10.20.0.9:/tmp/

#Copy the arangodb backup tar to the arangodb pod

kubectl -n vms cp /tmp/20210108-000111-arangodb-backup-infra-ei-isolated-4.0.0-3.6.1-hardened-alpine3.11.6.json.gz pers-arangodb-sngl-51xsckxs-bebf75:/tmp/20210108-000111-arangodb-backup-infra-ei-isolated-4.0.0-3.6.1-hardened-alpine3.11.6.json.gz

#Log in to the arangodb pod

kubectl -n vms exec -it pers-arangodb-sngl-51xsckxs-bebf75 /bin/sh

#Extract the arangodb tar

/bin/tar zxf /tmp/20210108-000111-arangodb-backup-infra-ei-isolated-4.0.0-3.6.1-hardened-alpine3.11.6.json.gz -C /tmp

#Restore arangodb data

for TENANT in $(ls /tmp/ | grep ^tenant_); do /usr/bin/arangorestore --server.database $TENANT --server.endpoint http+ssl://127.0.0.1:8529 --server.username root --create-database true --server.password 7hfCWS91cYxvOAr6ipsi --input-directory /tmp/$TENANT; done

#Post-restore cleanup

rm -f /tmp/20210108-000111-arangodb-backup-infra-ei-isolated-4.0.0-3.6.1-hardened-alpine3.11.6.json.gz

rm -rf /tmp/tenant*

Restoring Postgres Data

The following sections are provided as a general guide to one approach for restoring Postgres data. For more details on performing the restore operation, see the official documentation at https://www.postgresql.org/docs/9.1/backup-dump.html#BACKUP-DUMP-RESTORE.

Preparing the Postgres Backup for Restoring

Use the following procedure to prepare the Postgres backup that you will be restoring from:

  1. Download the backup gz file from the backup location.

  2. Transfer the backup gz file to the deployer container.

  3. Transfer the backup gz file to a master node.

  4. Copy the Postgres backup tar to suite-postgresql-0.

    kubectl -n vms cp <postgres backup tar>.gz suite-postgresql-0:/tmp/<postgres backup tar>.gz
  5. Log in to suite-postgresql-0.

    kubectl -n vms exec -it suite-postgresql-0 bash
  6. Extract the Postgres backup tar.

    /bin/tar zxf /tmp/<postgres backup tar>.gz -C /tmp/

Restoring Postgres Data

Use the following command to restore the Postgres backup that you have prepared:

cat /tmp/suite-cryptoservice | psql

Post-Restore Cleanup

After you have performed your restore operation, clean up the following files:

rm -f /tmp/<postgres backup tar>.gz /tmp/suite-cryptoservice

The following is an example of a Postgres restore and cleanup operation:

#Download the backup gz file from the backup location to installer container

aws s3 cp s3://saitestcsr-msx-bucket.platform.ciscovms.com/backup-rclone/20210108-010117-pgsqldb-backup-9.6.tar.gz /tmp/

#Transfer the backup gz file to a master node

scp -i keys/id_rsa /tmp/20210108-010117-pgsqldb-backup-9.6.tar.gz centos@10.20.0.9:/tmp/

#Copy postgres backup tar to suite-postgresql-0 pod

kubectl -n vms cp /tmp/20210108-010117-pgsqldb-backup-9.6.tar.gz suite-postgresql-0:/tmp/20210108-010117-pgsqldb-backup-9.6.tar.gz

#Log in to suite-postgresql-0 pod

kubectl -n vms exec -it suite-postgresql-0 bash

#extract postgres tar

/bin/tar zxf /tmp/20210108-010117-pgsqldb-backup-9.6.tar.gz -C /tmp

#restore postgres data

cat /tmp/suite-cryptoservice | psql

#post restore cleanup

rm -rf /tmp/20210108-010117-pgsqldb-backup-9.6.tar.gz /tmp/suite-cryptoservice

Restoring CockroachDB Data

The following sections are provided as a general guide to one approach for restoring CockroachDB data. For more details on performing the restore operation, see the official documentation at https://www.cockroachlabs.com/docs/v20.2/cockroach-dump#restore-a-table-from-a-backup-file.

Preparing the CockroachDB Backup for Restoring

Drop the existing database that you want to restore:

ansible -m shell -a 'kubectl -n vms exec cockroachdb-0 -c cockroachdb -- /cockroach/cockroach sql --certs-dir=/cockroach/cockroach-certs/ --execute "drop database <database_name> cascade;"' kube-master[0]

Create a new database, user, and password:

ansible-playbook cockroachdb-add-database.yml --extra-vars '{"newDatabase":{ "name": "<db_name>", "user":"<db_username>", "service":"<db_service>" }}'

Preparing the Backup to Restore From

Use the following procedure to prepare the CockroachDB backup that you will be restoring from:

  1. Download the backup gz file from the backup location.

  2. Transfer the backup gz file to the deployer container.

  3. Transfer the backup gz file to a master node.

  4. Copy the CockroachDB backup tar to cockroachdb-0.

    kubectl -n vms cp <cockroachdb backup tar>.gz cockroachdb-0:/tmp/<cockroachdb backup tar>.gz -c cockroachdb
  5. Log in to CockroachDB.

    kubectl -n vms exec -it cockroachdb-0 -c cockroachdb bash
  6. Create a dump directory that will be used to extract the cockroachdb tar.

    mkdir /tmp/dump
  7. Extract the CockroachDB backup tar.

    tar xvf /tmp/<cockroachdb backup tar>.gz -C /tmp/dump
  8. Unzip the database to restore.

    /bin/gzip -d -f /tmp/dump/<database_tag>.sql.gz
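The extracted archive may contain dumps for several databases. The following sketch locates the compressed dump for one database by name. It assumes the dump files follow the <timestamp>-<database_name>.sql.gz pattern seen in the example in this section, and the function name find_dump is hypothetical.

```shell
# Print the path of the compressed dump for one database inside the
# extracted backup directory, or fail if none is found.
find_dump() {
  dir=$1
  db=$2
  for file in "$dir"/*-"$db".sql.gz; do
    [ -f "$file" ] || continue   # the glob matched nothing
    echo "$file"
    return 0
  done
  echo "no dump found for $db in $dir" >&2
  return 1
}
```

For example, find_dump /tmp/dump <db_name> prints the file to pass to the gzip command in step 8.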

Restoring the Cockroach Database

Use the following command to restore the CockroachDB backup that you have prepared:

/cockroach/cockroach sql --host cockroachdb-public --certs-dir=/cockroach/cockroach-certs --database=<db_name> < /tmp/dump/<database_tag>.sql

Post-Restore Cleanup

After you have performed your restore operation, clean up the following files:

rm -f /tmp/<cockroachdb backup tar>.gz
rm -rf /tmp/dump

The following is an example of a CockroachDB restore and cleanup operation:

# Drop existing database

ansible -m shell -a 'kubectl -n vms exec cockroachdb-0 -c cockroachdb -- /cockroach/cockroach sql --certs-dir=/cockroach/cockroach-certs/ --execute "drop database serviceconfigmanager cascade;"' kube-master[0]

# Create database

ansible-playbook cockroachdb-add-database.yml --extra-vars '{"newDatabase":{ "name": "serviceconfigmanager", "user":"serviceconfigmanager", "service":"serviceconfigmanager" }}'

#Download the backup gz file from the backup location to installer container

aws s3 cp s3://saitestcsr-msx-bucket.platform.ciscovms.com/backup-rclone/20210108-010100-cockroachdb-backup-4.0.0-20.2.0-159.tar.gz /tmp/

#Transfer the backup gz file to a master node

scp -i keys/id_rsa /tmp/20210108-010100-cockroachdb-backup-4.0.0-20.2.0-159.tar.gz centos@10.20.0.9:/tmp/

#Copy cockroachdb backup tar to cockroachdb pod

kubectl -n vms cp /tmp/20210108-010100-cockroachdb-backup-4.0.0-20.2.0-159.tar.gz cockroachdb-0:/tmp/20210108-010100-cockroachdb-backup-4.0.0-20.2.0-159.tar.gz -c cockroachdb

#Login to cockroachdb pod

kubectl -n vms exec -it cockroachdb-0 -c cockroachdb bash

#Create dump dir to extract cockroachdb tar

mkdir /tmp/dump

#Extract cockroachdb tar

tar xvf /tmp/20210108-010100-cockroachdb-backup-4.0.0-20.2.0-159.tar.gz -C /tmp/dump

#Unzip database needed to restore

/bin/gzip -d -f /tmp/dump/2021-01-08T01:30:43Z-serviceconfigmanager.sql.gz

#Restoring database

/cockroach/cockroach sql --host cockroachdb-public --certs-dir=/cockroach/cockroach-certs --database=serviceconfigmanager < /tmp/dump/2021-01-08T01:30:43Z-serviceconfigmanager.sql

#Post restore cleanup

rm -f /tmp/20210108-010100-cockroachdb-backup-4.0.0-20.2.0-159.tar.gz

rm -rf /tmp/dump

Restoring NSO

When restoring NSO, keep in mind the following:

  • The backup tarball contains NSO cdb and streams files which will be restored.

  • The restore playbook assumes that the target NSO pod(s) do not exist.

  • After the cdb and streams are restored, the playbook does not start the target pods. Therefore, no target NSO pods exist either before or after the restore operation.

  • The NSO restore playbook lets you specify which shard to restore.

Use the following procedure to restore NSO:

  1. Choose a backup file from the S3 drive in the format of nso-TAG-3.x.y-yyyy-mm-dd.hh-mm-ss.tgz. For example, nso-infra-ei-isolated-4.0.0-2021-01-09.01-17-06.tgz.

  2. On the installer node, place the backup file under /{msx-version}/ansible/vms-backup. For example: /msx-4.0.0/ansible/vms-backup.

  3. Untar the tarball using tar xzvf {nso-TAG-vms-version}-yyyy-mm-dd.hh-mm-ss.tgz. For example:

    # cd /msx-4.0.0/ansible/vms-backup
    # tar xzvf nso-infra-ei-isolated-4.0.0-2021-01-09.01-17-06.tgz
    
    nso-infra-ei-isolated-4.0.0-2021-01-09.01-17-06/
    
    nso-infra-ei-isolated-4.0.0-2021-01-09.01-17-06/ncs-data-vol-nso-manageddevice-shard0-0.tgz
    
    nso-infra-ei-isolated-4.0.0-2021-01-09.01-17-06/ncs-streams-nso-manageddevice-shard0-0.tgz
    
    nso-infra-ei-isolated-4.0.0-2021-01-09.01-17-06/ncs-data-vol-nso-manageddevice-shard1-0.tgz
    
    nso-infra-ei-isolated-4.0.0-2021-01-09.01-17-06/ncs-streams-nso-manageddevice-shard1-0.tgz

To restore all the shards, run the following command:

library/ansible-playrole backup-restore/nso-restore kube-master[0] "servicepack_name: <SP_name>, backup_tag: nso-TAG-3.x.y-yyyy-mm-dd.hh-mm-ss, BR_mode: restore"

where 'nso-TAG-3.x.y-yyyy-mm-dd.hh-mm-ss' is the backup file name without the .tgz extension.
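The backup_tag value can be derived from the backup file name rather than typed by hand. A minimal sketch follows; the function name backup_tag is hypothetical.

```shell
# Derive the backup_tag value from a backup file name by stripping the
# directory part and the .tgz suffix.
backup_tag() {
  basename "$1" .tgz
}
```

For the example file used in this procedure, backup_tag nso-infra-ei-isolated-4.0.0-2021-01-09.01-17-06.tgz prints nso-infra-ei-isolated-4.0.0-2021-01-09.01-17-06.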

The following is an example of restoring all shards:

# cd /msx-4.0.0/ansible

# ls -l vms-backup

drwxr-xr-x 2 root root 4096 Dec 18 19:49 infra

drwxr-xr-x 2 root root 4096 Jan 9 01:17 nso-infra-ei-isolated-4.0.0-2021-01-09.01-17-06

# library/ansible-playrole backup-restore/nso-restore kube-master[0] "servicepack_name: manageddevice, backup_tag: nso-infra-ei-isolated-4.0.0-2021-01-09.01-17-06, BR_mode: restore"

To restore a specific shard, run the following command:

library/ansible-playrole backup-restore/nso-restore kube-master[0] "servicepack_name: <SP_name>, backup_tag: nso-TAG-3.x.y-yyyy-mm-dd.hh-mm-ss, BR_mode: restore, shardNumber: <num>"

where <num> is shard0, shard1, shard2, and so on.
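To see which shard names a backup actually contains, the extracted directory can be inspected. The following sketch assumes the ncs-data-vol-nso-<servicepack>-<shard>-0.tgz file naming shown in the tar listing in step 3 of this procedure. The function name list_shards is hypothetical.

```shell
# List the shard names found in an extracted NSO backup directory,
# one per line, based on the ncs-data-vol-* archive names.
list_shards() {
  for file in "$1"/ncs-data-vol-nso-*-shard*-*.tgz; do
    [ -f "$file" ] || continue   # the glob matched nothing
    basename "$file" |
      sed 's/^.*-\(shard[0-9][0-9]*\)-[0-9][0-9]*\.tgz$/\1/'
  done | sort -u
}
```

Each name printed is a value that can be passed as shardNumber.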

The following is an example of restoring shard1:

# cd /msx-4.0.0/ansible

# ls -l vms-backup

drwxr-xr-x 2 root root 4096 Dec 18 19:49 infra

drwxr-xr-x 2 root root 4096 Jan 9 01:17 nso-infra-ei-isolated-4.0.0-2021-01-09.01-17-06

# library/ansible-playrole backup-restore/nso-restore kube-master[0] "servicepack_name: manageddevice, backup_tag: nso-infra-ei-isolated-4.0.0-2021-01-09.01-17-06, BR_mode: restore, shardNumber: shard1"

Logging into the Portal

The portal passwords are stored in the passwords.yml file.

When the system generates random passwords automatically, it writes them to the passwords.yml file. To retrieve the credentials in this case, run this command:


ansible-vault view --vault-password-file vault group_vars/all/passwords.yml

Procedure


Step 1

Verify your DNS entries. If you have used Route53 to register the FQDN of the MSX portal to the ciscovms.com domain, ensure that the AWS Route53 entries have propagated to your server. You can verify this on the following website:


https://dnschecker.org/

The FQDN setting in your main.yml file (host.ciscovms.com) must match the values of your OpenStack instances, for example, edge-instance-host-1.

Step 2

Log in to the portal with the following URL:

https://<msx_subdomain>.<msx_domain>

or

https://<your_portal_fqdn>

Add this FQDN and IP address to your local machine's hosts file.
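A hosts file entry is a line containing the IP address followed by the FQDN. The following sketch appends such a line only if the FQDN is not already listed. The function name add_hosts_entry is hypothetical, and the file path is a parameter so that the function can be tried on a copy before it is pointed at /etc/hosts, which normally requires root privileges.

```shell
# Append "<ip> <fqdn>" to a hosts file unless a non-comment line already
# lists that FQDN as a host name.
add_hosts_entry() {
  ip=$1
  fqdn=$2
  hosts_file=${3:-/etc/hosts}
  # awk exits 0 when some non-comment line names the FQDN after the address.
  if awk -v name="$fqdn" '
      $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
      END { exit !found }
    ' "$hosts_file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$fqdn" >> "$hosts_file"
}
```

Calling the function twice with the same arguments leaves a single entry in the file.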


Integrating Google Maps

To use the Address Form and Google Maps, you need a Google Maps API key. If you have your own API key, you can enable Google Maps using the procedure below. If you do not have your own API key, ask Cisco TAC for one. The Cisco team will generate a key for you and allow your domain.

Use the following procedure to update the Google Maps API Key in MSX.

Procedure


Step 1

Log in to the kubernetes-master-1 node.


ssh -i id_rsa centos@_INCEPTION_FLOATING_IP_ADDRESS_ -t ssh _kubernetes-master-1_IP_ADDRESS_
Step 2

In the file /data/vms/skyfallui/gconfig.js, replace the GOOGLE_API_KEY line with the following:


var GOOGLE_API_KEY = 'AIzaSyGKCGans9q5vrZNtngc2D5vOIrpEXAMPLE'
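The replacement in Step 2 can be made with an editor or scripted. The following is a sketch only: the function name set_google_api_key is hypothetical, and it assumes that the existing line in gconfig.js starts with "var GOOGLE_API_KEY" and that the key contains no "|", "&", or backslash characters, which are special in this sed expression.

```shell
# Replace the GOOGLE_API_KEY line in a gconfig.js file with a new key.
set_google_api_key() {
  file=$1
  key=$2
  scratch=$(mktemp) || return 1
  # Write the edited copy first, then overwrite the original in place so
  # that its ownership and permissions are preserved.
  if sed "s|^var GOOGLE_API_KEY.*|var GOOGLE_API_KEY = '$key'|" "$file" > "$scratch"; then
    cat "$scratch" > "$file"
    status=$?
  else
    status=1
  fi
  rm -f "$scratch"
  return "$status"
}
```

A possible invocation is: set_google_api_key /data/vms/skyfallui/gconfig.js <your_api_key>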

Restricting Access to the Inception VM

The Inception VM allows SSH access from any source IP address. This is primarily for debugging purposes and is required for the deployment to succeed. To restrict this access after the deployment is complete, update the security group attached to the Inception VM so that it allows only the required source IP addresses.

Removing an MSX Installation

Procedure


Step 1

Export the ANSIBLE_VAULT_PASSWORD_FILE environment variable, set to the path of the vault password file.

export ANSIBLE_VAULT_PASSWORD_FILE=<path to the file>
Step 2

Invoke the following playbook:

For OpenStack:

ansible-playbook destroy-infra.yml

For AWS:

ansible-playbook destroy-infra-aws.yml

Warning: When you use any destroy playbook, it will delete all content within S3 or MinIO, which includes CockroachDB backups. Ensure that you have relocated those backups before running the destroy playbook if you need to restore your instance.

This playbook removes the MSX VMs, cleans up the cinder volumes, sec-groups, keys, floating IP addresses, and the neutron router, and de-registers the vms_subdomain from Route53. This playbook does not delete the CSR VM, or remove the Security Groups, Images, or Key Pairs from OpenStack.