Cisco Catalyst SD-WAN disaster recovery (DR) is based on Cisco SD-WAN Manager disk volume snapshots and configuration database backups.
About backups and snapshots
The system takes configuration database backups and volume snapshots daily, typically around midnight at the location of the
Cisco SD-WAN Manager instance. Both are stored securely in the cloud.
Starting with Cisco SD-WAN Release 20.3.x, you can turn off the configuration database backup feature, make your own backups, and provide them to CloudOps
when needed for recovery of the service.
Cisco SD-WAN Manager disk volume snapshots are taken every night, on demand at your request, or at the start of major change windows. Each Cisco SD-WAN Manager instance has two or more disks, and a snapshot of each volume is taken at the same time to form an overall backup of the instance.
The completed snapshots from the region where the Cisco SD-WAN Manager is running are then copied over to the designated backup region, which is usually a different geographic area.
For example, Cisco SD-WAN Manager may be running in US-East with the backup region designated as US-West. The backup region is an identically configured region
where the second Cisco Catalyst SD-WAN Validator and Cisco Catalyst SD-WAN Controller are already running.
Cisco Catalyst SD-WAN Validator and Cisco Catalyst SD-WAN Controller are stateless services that have their configuration managed by Cisco SD-WAN Manager or via CLI, so they are not backed up.
High availability for Cisco SD-WAN Manager is handled by a cluster with three or six nodes in the same availability zone and region. The backup region does not include
a standby or active Cisco SD-WAN Manager service.
Cisco Catalyst SD-WAN Validator and Cisco Catalyst SD-WAN Controller services are deployed in both primary and backup regions. Both work in active mode. Device and policy information is pushed
to both instances from Cisco SD-WAN Manager. When one region fails, Cisco Catalyst SD-WAN Controller and Cisco Catalyst SD-WAN Validator continue to function in the backup region.
Cisco Catalyst SD-WAN is designed for the data plane to continue to function even if all the control components fail. The graceful restart (GR) timer
configuration enables this high availability of the data plane. The GR timer holds the routes advertised by Cisco Catalyst SD-WAN Controllers for 12 hours by default. Choose your GR timer value carefully: it must be long enough for your control components to be recovered after a failure,
yet short enough that devices learn new routes from network changes in a timely manner.
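The effect of the GR timer can be pictured with a small Python sketch. It is an illustrative model only, not Cisco code: the function name and the strict "shorter than" comparison are assumptions, and the only value taken from the text is the 12-hour default.

```python
from datetime import timedelta

# Default graceful restart (GR) hold time stated in the text above.
DEFAULT_GR_TIMER = timedelta(hours=12)


def data_plane_keeps_routes(outage: timedelta,
                            gr_timer: timedelta = DEFAULT_GR_TIMER) -> bool:
    """Hypothetical helper: True while a control-plane outage is shorter
    than the GR timer, so the last routes learned from the controllers
    are still being held."""
    return outage < gr_timer


# A 3-hour outage fits inside the default timer; a 13-hour outage does not.
print(data_plane_keeps_routes(timedelta(hours=3)))
print(data_plane_keeps_routes(timedelta(hours=13)))
```

A longer timer gives CloudOps more time to restore the control components, but the held routes do not reflect network changes made during the outage.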
The configuration database-based recovery method allows the restoration of templates and policies only. In contrast, volume-based
recovery includes collected statistics data.
Configuration database backup
Prior to Cisco vManage Release 20.3.1, the configuration database is backed up only if all these conditions are met:
- Monitoring is enabled in the CloudInfra system. If the viptelatac user is unusable on the Cisco SD-WAN Manager for any reason, monitoring is disabled and you are notified with a request for correction.
- The viptelatac user is usable on the Cisco SD-WAN Manager.
- The configuration database size is less than 4 GB.
In Cisco vManage Release 20.3.1 and later, the configuration database is backed up only if all these conditions are met:
- Monitoring is enabled in the CloudInfra system.
  Note: In Cisco SD-WAN Manager, if the cloud service is disabled for any reason, monitoring is disabled on the CloudInfra system and you are notified with a request for correction.
- The nms configuration-db daily-backup service is enabled in the Cisco SD-WAN Manager CLI.
- Cloud Services, vMonitoring, and OTP are enabled in Cisco SD-WAN Manager Settings.
- The configuration database size is less than 4 GB.
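The conditions for Cisco vManage Release 20.3.1 and later can be read as a simple all-must-pass checklist. The Python sketch below models that logic. It does not call any Cisco API: the class, field, and function names are hypothetical, and reading "4 GB" as 4 * 1024**3 bytes is an assumption.

```python
from dataclasses import dataclass
from typing import List

# "Less than 4 GB" comes from the text; the byte interpretation is an assumption.
MAX_CONFIG_DB_BYTES = 4 * 1024**3


@dataclass
class BackupPrerequisites:
    """Hypothetical view of the settings listed above (not a Cisco data model)."""
    monitoring_enabled: bool
    daily_backup_service_enabled: bool
    cloud_services_enabled: bool
    vmonitoring_enabled: bool
    otp_enabled: bool
    config_db_bytes: int


def unmet_conditions(p: BackupPrerequisites) -> List[str]:
    """Return the conditions that block the daily backup (empty list = eligible)."""
    checks = [
        (p.monitoring_enabled, "monitoring is disabled in the CloudInfra system"),
        (p.daily_backup_service_enabled, "nms configuration-db daily-backup is disabled"),
        (p.cloud_services_enabled, "Cloud Services is disabled"),
        (p.vmonitoring_enabled, "vMonitoring is disabled"),
        (p.otp_enabled, "OTP is disabled"),
        (p.config_db_bytes < MAX_CONFIG_DB_BYTES, "configuration database is 4 GB or larger"),
    ]
    return [reason for ok, reason in checks if not ok]


state = BackupPrerequisites(True, False, True, True, True, 5 * 1024**3)
for reason in unmet_conditions(state):
    print("Backup skipped:", reason)
```

Returning the list of failed conditions, rather than a single boolean, mirrors the way the text treats each condition as independently correctable.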
Volume snapshot-based recovery
After the CloudOps team determines that the Cisco SD-WAN Manager instance needs to be replaced with a backup, we can initiate the DR process.
For DR in the same region, we select the same region and datacenter as the current Cisco SD-WAN Manager instance location. We specify the snapshot date and time based on requirements and availability.
Once DR is triggered, the system first shuts down the existing Cisco SD-WAN Manager instance.
The system then uses the volume snapshots to create a new cloud instance with the same set of disks, instance size specifications,
private subnets, security access list, and isolated environment that the original Cisco SD-WAN Manager had. Once the instance is up, the system swaps the public IPs from the old Cisco SD-WAN Manager instance to the new Cisco SD-WAN Manager instance.
The new running Cisco SD-WAN Manager instance has new private IPs but the same public IPs, software version, configuration, and data as when the snapshot was
taken.
Cisco SD-WAN Manager is configured with the information necessary to join the fabric. You can use the same FQDN or URL to log in to the Cisco SD-WAN Manager instance as before.
In the unlikely case where the primary region of Cisco SD-WAN Manager has failed and is unavailable, we use the exact same process for DR to the backup region, except that the backup cloud region
is selected.
When the new Cisco SD-WAN Manager instance runs in the backup region, the system does not swap public IPs between regions. Each cloud region has its own public
IP pool, and addresses from one region's pool cannot be assigned to instances in another region.
Thus, the new DR Cisco SD-WAN Manager instance in the backup region has new public IPs. The system updates the FQDN or DNS with the new public IP of the Cisco SD-WAN Manager.
In this case, you may need to update the enterprise end firewall with the new public IP of the Cisco SD-WAN Manager.
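The difference between same-region and backup-region volume snapshot recovery comes down to which addresses survive. The Python sketch below summarizes the outcomes described above as a small model. The type and function names are hypothetical and the region labels are placeholders. Treating private IPs as new in the backup region is an inference from the statement that the same process is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOutcome:
    """Hypothetical summary of what you can expect after volume snapshot DR."""
    keeps_public_ip: bool
    keeps_private_ip: bool
    dns_update_needed: bool
    firewall_review_needed: bool


def volume_snapshot_outcome(primary_region: str, target_region: str) -> RecoveryOutcome:
    """Model the address changes for same-region vs. backup-region recovery."""
    same_region = primary_region == target_region
    return RecoveryOutcome(
        keeps_public_ip=same_region,              # public IPs are swapped only within a region
        keeps_private_ip=False,                   # the new instance always gets new private IPs
        dns_update_needed=not same_region,        # FQDN/DNS is repointed to the new public IP
        firewall_review_needed=not same_region,   # your firewall may reference the old public IP
    )


print(volume_snapshot_outcome("US-East", "US-East"))
print(volume_snapshot_outcome("US-East", "US-West"))
```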
Configuration database-based recovery
If we cannot take a volume snapshot, we use the configuration database recovery process. We create a new Cisco SD-WAN Manager instance and use the configuration database backup to restore the original configuration files. With this method, the statistics
database of the original Cisco SD-WAN Manager instance is not restored. This method restores only your template and policy configuration. The new Cisco SD-WAN Manager instance in this case has both new public IPs and new private IPs.
We update the FQDN or DNS of the Cisco SD-WAN Manager to use the new public IP of the new instance.
In this case, you may need to update the enterprise end firewall with the new public IP of the Cisco SD-WAN Manager.
The process for using a configuration database backup for DR is identical for same-region and backup-region recovery.
For process details, refer to the section Restore a Cisco SD-WAN Manager Instance from Backup in the Recover Cisco Catalyst SD-WAN Manager Troubleshooting TechNote.
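As a compact recap of the two recovery methods, the following Python sketch models which method is used and what each one restores. It is a simplification under stated assumptions: the names are hypothetical, and the single `snapshot_usable` flag stands in for the CloudOps decision about whether a volume snapshot can be used.

```python
from dataclasses import dataclass
from enum import Enum


class Method(Enum):
    VOLUME_SNAPSHOT = "volume snapshot"
    CONFIG_DATABASE = "configuration database"


@dataclass(frozen=True)
class RestoreScope:
    """Hypothetical summary of what a recovery method brings back."""
    method: Method
    templates_and_policies: bool
    statistics: bool


def plan_recovery(snapshot_usable: bool) -> RestoreScope:
    """Prefer volume snapshot recovery; fall back to the configuration database."""
    if snapshot_usable:
        # Volume-based recovery includes collected statistics data.
        return RestoreScope(Method.VOLUME_SNAPSHOT, True, True)
    # Configuration database recovery restores templates and policies only.
    return RestoreScope(Method.CONFIG_DATABASE, True, False)


plan = plan_recovery(snapshot_usable=False)
print(plan.method.value, "- statistics restored:", plan.statistics)
```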