Cloud Infrastructure

Cloud-hosted control component snapshots

The system takes regular snapshots of cloud-hosted Cisco SD-WAN Manager control components that we manage, based on the snapshot frequency. By default, the system takes a snapshot once a day, typically at midnight in the region of deployment, and retains the last seven snapshots. You can set the snapshot frequency to any value between once a day and once every four days. To learn more about snapshots, refer to Information About Snapshots in the Cisco Catalyst SD-WAN Portal Configuration Guide.

Open a Technical Assistance Center (TAC) support case to review the current snapshot setting, or use the Cisco Catalyst SD-WAN Portal to change it. You can retain a maximum of seven periodic snapshots.
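The interaction between snapshot frequency and the seven-snapshot retention limit determines how far back you can recover. The following is a minimal sketch of that arithmetic, not a Cisco tool: the function name `retained_snapshot_dates` is hypothetical, and it assumes a steady state in which seven periodic snapshots already exist and were taken exactly on schedule.

```python
from datetime import date, timedelta

MAX_RETAINED = 7  # from the text: the system retains the last seven snapshots


def retained_snapshot_dates(latest: date, frequency_days: int = 1) -> list[date]:
    """Return the dates of the retained periodic snapshots, newest first.

    Assumes snapshots were taken exactly every `frequency_days` days.
    """
    if not 1 <= frequency_days <= 4:
        raise ValueError("snapshot frequency must be between 1 and 4 days")
    return [latest - timedelta(days=i * frequency_days) for i in range(MAX_RETAINED)]


# With a snapshot every four days, the oldest retained snapshot is 24 days old.
dates = retained_snapshot_dates(date(2024, 3, 29), frequency_days=4)
print(dates[-1])  # 2024-03-05
```

A lower frequency extends how far back the oldest snapshot reaches, at the cost of a larger gap between consecutive recovery points.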


Note


Because Cisco SD-WAN Validator and Cisco Catalyst SD-WAN Controller are stateless, snapshots of these components are not captured. Use a Cisco SD-WAN Manager template to configure and save Cisco SD-WAN Validator and Cisco Catalyst SD-WAN Controller configuration settings.


Snapshots are stored in your cloud account and cannot be downloaded. However, you can download the config-db backup file from Cisco SD-WAN Manager and save the configurations, including the templates, with the command request nms configuration-db backup path.

Take an on-demand snapshot


Note


You can only use the on-demand snapshot process with fabrics with Cisco-hosted, cloud-based, dedicated, single-tenant control components. This is not applicable if you have a shared tenant fabric.


For any major planned change window for Cisco SD-WAN Manager, you can take an on-demand snapshot using the Cisco Catalyst SD-WAN Portal. You can request this by opening a TAC support case with the CloudOps team. Freeze configuration changes and allocate up to eight hours before the change window to allow the on-demand snapshot to be taken and completed. You can store only one on-demand snapshot at a time, for up to ten days from the creation date. If you create a new on-demand snapshot, the system removes and replaces the previous snapshot.
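The timing guidance above can be turned into a simple planning calculation. This sketch is illustrative only: `plan_on_demand_snapshot` is a hypothetical helper, and it assumes the snapshot is created at the moment the configuration freeze begins, which may not match when the snapshot is actually started in your case.

```python
from datetime import datetime, timedelta

FREEZE_LEAD = timedelta(hours=8)         # allocate up to eight hours before the window
ON_DEMAND_LIFETIME = timedelta(days=10)  # stored for up to ten days from creation


def plan_on_demand_snapshot(change_window_start: datetime) -> dict:
    """Return when to freeze configuration and the latest snapshot expiry.

    Assumption: the snapshot creation time equals the freeze start time.
    """
    freeze_start = change_window_start - FREEZE_LEAD
    return {
        "freeze_config_from": freeze_start,
        "snapshot_expires_by": freeze_start + ON_DEMAND_LIFETIME,
    }


plan = plan_on_demand_snapshot(datetime(2024, 6, 15, 22, 0))
print(plan["freeze_config_from"])   # 2024-06-15 14:00:00
print(plan["snapshot_expires_by"])  # 2024-06-25 14:00:00
```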

Mandatory maintenance of cloud-hosted control components

The CloudOps team may at times need to perform maintenance on your instances. In that case, the instances are rebooted before the cloud provider's maintenance window. This process allows the instances to be moved from a hardware node that requires maintenance to a new, healthy hardware node, which prevents disruption of service.

You receive notifications at your registered email address for your fabric in the CloudOps system. The registered email address is initially configured using your original Sales Order's End Customer Email Address field. You can update it by logging into the Cisco Catalyst SD-WAN Portal at https://ssp.sdwan.cisco.com. The registered email address is not derived from the Cisco SD-WAN Manager Settings page.


Note


You receive email notices about mandatory reboots of cloud-hosted control components hosted in Amazon Web Services (AWS) only.


You can reschedule the change window, as long as the requested date and time are before the cloud provider's maintenance window. You may not always receive advance notice, as the timing depends on the severity of the issue on the cloud provider's hardware node.
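The rescheduling rule reduces to a single comparison. The sketch below is only an illustration of that rule; `can_reschedule` and its parameters are hypothetical names, and both timestamps are assumed to be expressed in the same time zone.

```python
from datetime import datetime


def can_reschedule(requested: datetime, provider_window_start: datetime) -> bool:
    """A new reboot time is acceptable only if it precedes the provider's window."""
    return requested < provider_window_start


provider_window = datetime(2024, 7, 10, 2, 0)
print(can_reschedule(datetime(2024, 7, 9, 23, 0), provider_window))  # True
print(can_reschedule(datetime(2024, 7, 10, 3, 0), provider_window))  # False
```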

Cisco Catalyst SD-WAN disaster recovery guidelines

Cisco Catalyst SD-WAN disaster recovery (DR) is based on Cisco SD-WAN Manager disk volume snapshots and configuration database backups.

About backups and snapshots

The system takes configuration database backups and volume snapshots daily, typically around midnight at the location of the Cisco SD-WAN Manager instance. They are stored securely in the cloud.

Starting with Cisco SD-WAN Release 20.3.x, you can turn off the configuration database backup feature, make your own backups, and provide them to CloudOps when needed for recovery of the service.

Cisco SD-WAN Manager disk volume snapshots are taken every night, on-demand at your request, or at the start of major change windows. Each Cisco SD-WAN Manager has two or more disks, and a snapshot of each of the volumes is taken at the same time to form an overall backup of the Cisco SD-WAN Manager instance.

The completed snapshots from the region where the Cisco SD-WAN Manager is running are then copied over to the designated backup region, which is usually a different geographic area.

For example, Cisco SD-WAN Manager may be running in US-East with the backup region designated as US-West. The backup region is an identically configured region where the second Cisco Catalyst SD-WAN Validator and Cisco Catalyst SD-WAN Controller are already running.

Cisco Catalyst SD-WAN Validator and Cisco Catalyst SD-WAN Controller are stateless services that have their configuration managed by Cisco SD-WAN Manager or via CLI, so they are not backed up.

High availability for Cisco SD-WAN Manager is handled by a cluster with three or six nodes in the same availability zone and region. The backup region does not include a standby or active Cisco SD-WAN Manager service.

Cisco Catalyst SD-WAN Validator and Cisco Catalyst SD-WAN Controller services are deployed in both primary and backup regions. Both work in active mode. Device and policy information is pushed to both instances from Cisco SD-WAN Manager. When one region fails, Cisco Catalyst SD-WAN Controller and Cisco Catalyst SD-WAN Validator continue to function in the backup region.

Cisco Catalyst SD-WAN is designed for the data plane to continue to function even if all the control components fail. The graceful restart (GR) timer configuration enables this high availability of the data plane. The GR timer holds the routes advertised by Cisco Catalyst SD-WAN Controllers for 12 hours by default. Choose your GR timer value carefully to ensure your control components can be backed up in case of failures and to support learning the new routes from network changes.
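To reason about whether a given GR timer value covers an expected control-plane outage, a quick comparison helps. This is a minimal sketch under stated assumptions, not a description of device behavior: `routes_still_held` is a hypothetical helper, and it assumes the held routes are dropped as soon as the timer expires.

```python
from datetime import timedelta

DEFAULT_GR_TIMER = timedelta(hours=12)  # default from the text


def routes_still_held(outage: timedelta, gr_timer: timedelta = DEFAULT_GR_TIMER) -> bool:
    """Return True while the outage is shorter than the GR timer.

    Assumption: routes are considered lost once the timer has fully elapsed.
    """
    return outage < gr_timer


print(routes_still_held(timedelta(hours=5)))   # True
print(routes_still_held(timedelta(hours=13)))  # False
```

If the time needed to recover the control components can exceed the timer, the data plane may lose the held routes before the control components return, which is why the timer value deserves care.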

The configuration database-based recovery method allows the restoration of templates and policies only. In contrast, volume-based recovery includes collected statistics data.

Configuration database backup

Prior to Cisco vManage Release 20.3.1, the configuration database is backed up only if all these conditions are met:

  • Monitoring is enabled in the CloudInfra system. If the viptelatac user is unusable on the Cisco SD-WAN Manager for any reason, monitoring is disabled and you are notified with a request for correction.

  • The viptelatac user is usable on the Cisco SD-WAN Manager.

  • The configuration database size is less than 4 GB.

In Cisco vManage Release 20.3.1 and later, the configuration database is backed up only if all these conditions are met:

  • Monitoring is enabled in the CloudInfra system.


    Note


    In Cisco SD-WAN Manager, if the cloud service is disabled for any reason, monitoring is disabled on the CloudInfra system and you are notified with a request for correction.


  • The nms configuration-db daily-backup service is enabled in the Cisco SD-WAN Manager CLI.

  • Cloud Services, vMonitoring, and OTP are enabled in Cisco SD-WAN Manager Settings.

  • The configuration database size is less than 4 GB.
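The two sets of conditions above can be expressed as a single eligibility check. This sketch only mirrors the listed conditions; the `ManagerState` class, its field names, and `config_db_is_backed_up` are hypothetical and do not correspond to any Cisco API.

```python
from dataclasses import dataclass

MAX_DB_SIZE_GB = 4  # the database must be smaller than 4 GB in both cases


@dataclass
class ManagerState:
    release: tuple                        # for example (20, 3, 1)
    monitoring_enabled: bool              # monitoring in the CloudInfra system
    db_size_gb: float
    viptelatac_usable: bool = False       # checked before 20.3.1
    daily_backup_enabled: bool = False    # nms configuration-db daily-backup (20.3.1+)
    cloud_services_enabled: bool = False  # 20.3.1+
    vmonitoring_enabled: bool = False     # 20.3.1+
    otp_enabled: bool = False             # 20.3.1+


def config_db_is_backed_up(s: ManagerState) -> bool:
    """Return True only if every condition for the release is met."""
    common = s.monitoring_enabled and s.db_size_gb < MAX_DB_SIZE_GB
    if s.release < (20, 3, 1):
        return common and s.viptelatac_usable
    return common and all([
        s.daily_backup_enabled,
        s.cloud_services_enabled,
        s.vmonitoring_enabled,
        s.otp_enabled,
    ])


old = ManagerState(release=(20, 1, 2), monitoring_enabled=True,
                   db_size_gb=2.5, viptelatac_usable=True)
new = ManagerState(release=(20, 6, 1), monitoring_enabled=True, db_size_gb=2.5,
                   daily_backup_enabled=True, cloud_services_enabled=True,
                   vmonitoring_enabled=True, otp_enabled=False)
print(config_db_is_backed_up(old))  # True
print(config_db_is_backed_up(new))  # False, because OTP is disabled
```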

Volume snapshot-based recovery

After the CloudOps team determines that the Cisco SD-WAN Manager instance needs to be replaced with a backup, we can initiate the DR process.

For DR in the same region, we select the same region and datacenter as the current Cisco SD-WAN Manager instance location. We specify the date and time of the snapshot based on requirements and availability.

Once DR triggers, the system first shuts down the existing Cisco SD-WAN Manager instance.

The system then uses the volume snapshots to create a new cloud instance with the same set of disks, instance size specifications, private subnets, security access list, and isolated environment that the original Cisco SD-WAN Manager had. Once the instance is up, the system swaps the public IPs from the old Cisco SD-WAN Manager instance to the new Cisco SD-WAN Manager instance.

The new running Cisco SD-WAN Manager instance has new private IPs but the same public IPs, software version, configuration, and data as when the snapshot was taken.

Cisco SD-WAN Manager is configured with the information necessary to join the fabric. You can use the same FQDN or URL to log in to the Cisco SD-WAN Manager instance as before.

In the unlikely case where the primary region of Cisco SD-WAN Manager has failed and is unavailable, we use the exact same process for DR to the backup region, except that the backup cloud region is selected.

When the new Cisco SD-WAN Manager instance runs in the backup region, the system does not swap public IPs between regions. Each cloud region has its own public IP pool, and addresses from one region's pool cannot be assigned to instances in another region.

Thus, the new DR Cisco SD-WAN Manager instance in the backup region has new public IPs. The system updates the FQDN or DNS with the new public IP of the Cisco SD-WAN Manager.

In this case, you may need to update the enterprise end firewall with the new public IP of the Cisco SD-WAN Manager.

Configuration database-based recovery

If we cannot take a volume snapshot, we use the configuration database recovery process. We create a new Cisco SD-WAN Manager instance and use the configuration database backup to restore the original configuration files. With this method, the statistics database of the original Cisco SD-WAN Manager instance is not restored. This method restores your templates and policies configuration. The new Cisco SD-WAN Manager instance in this case has both new public IPs and new private IPs.

We update the FQDN or DNS of the Cisco SD-WAN Manager to use the new public IP of the new instance.

In this case, you may need to update the enterprise end firewall with the new public IP of the Cisco SD-WAN Manager.

The process for using a configuration database backup for DR is identical for same-region and backup-region recovery.
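The recovery paths described in the volume snapshot and configuration database sections differ mainly in which addresses change and whether statistics survive. The sketch below summarizes those outcomes as a lookup; `recovery_outcome` and its keys are hypothetical names, and the new-private-IP result for volume recovery into the backup region is inferred from the statement that the same process is used there.

```python
def recovery_outcome(method: str, region: str) -> dict:
    """Summarize what changes after DR.

    method: "volume" or "config-db"; region: "same" or "backup".
    """
    if method not in ("volume", "config-db"):
        raise ValueError("method must be 'volume' or 'config-db'")
    if region not in ("same", "backup"):
        raise ValueError("region must be 'same' or 'backup'")

    # Only same-region volume recovery swaps the old public IPs to the new instance.
    keeps_public_ips = method == "volume" and region == "same"
    return {
        "new_private_ips": True,  # inferred for volume recovery in the backup region
        "new_public_ips": not keeps_public_ips,
        "dns_updated": not keeps_public_ips,
        "firewall_update_may_be_needed": not keeps_public_ips,
        "statistics_restored": method == "volume",
    }


print(recovery_outcome("volume", "same")["new_public_ips"])          # False
print(recovery_outcome("volume", "backup")["new_public_ips"])        # True
print(recovery_outcome("config-db", "same")["statistics_restored"])  # False
```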

For process details, refer to the section Restore a Cisco SD-WAN Manager Instance from Backup in the Recover Cisco Catalyst SD-WAN Manager Troubleshooting TechNote.