Data Protection Overview
The HX Data Platform disaster recovery feature allows you to protect virtual machines from a disaster by setting up replication of running VMs between a pair of network-connected clusters. Protected virtual machines running on one cluster replicate to the other cluster in the pair, and vice versa. The two paired clusters are typically located at a distance from each other, with each cluster serving as the disaster recovery site for virtual machines running on the other cluster.
Once protection has been set up on a VM, HX Data Platform periodically takes a replication snapshot of the running VM on the local cluster and replicates (copies) the snapshot to the paired remote cluster. In the event of a disaster at the local cluster, you use the most recently replicated snapshot of each protected VM to recover and run the VM at the remote cluster. Each cluster that serves as a disaster recovery site for another cluster must be sized with adequate spare resources so that, after a disaster, it can run the newly recovered virtual machines in addition to its normal workload.
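The sizing requirement above can be expressed as a simple headroom check. The following is a minimal sketch, not an HX Data Platform tool: the `Resources` type, the function name, and the choice of CPU and memory as the sized resources are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Illustrative resource totals; the units (GHz, GiB) are assumptions."""
    cpu_ghz: float
    memory_gib: float


def can_host_recovered_vms(capacity: Resources,
                           normal_workload: Resources,
                           recovered_vms: Resources) -> bool:
    """Return True if the disaster recovery site can run its normal
    workload plus the VMs recovered from the paired cluster."""
    cpu_needed = normal_workload.cpu_ghz + recovered_vms.cpu_ghz
    memory_needed = normal_workload.memory_gib + recovered_vms.memory_gib
    return (cpu_needed <= capacity.cpu_ghz
            and memory_needed <= capacity.memory_gib)
```

Because each cluster in the pair acts as the recovery site for the other, a check of this kind would be run in both directions.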
Each virtual machine can be individually protected by assigning it protection attributes, chief among which is the replication interval (schedule). The shorter the replication interval, the fresher the replicated snapshot data is likely to be when it is time to recover the VM after a disaster. Replication intervals can range between 15 minutes and 24 hours.
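As an illustration of the stated range, the sketch below validates a proposed replication interval. The constant and function names are hypothetical, and the product may accept only specific values within this range; only the 15-minute and 24-hour bounds come from the text.

```python
MIN_INTERVAL_MINUTES = 15        # lower bound stated in the text
MAX_INTERVAL_MINUTES = 24 * 60   # upper bound stated in the text (24 hours)


def validate_interval(minutes: int) -> int:
    """Return the interval unchanged if it lies within the documented
    range; raise ValueError otherwise."""
    if not MIN_INTERVAL_MINUTES <= minutes <= MAX_INTERVAL_MINUTES:
        raise ValueError(
            f"replication interval must be between {MIN_INTERVAL_MINUTES} "
            f"and {MAX_INTERVAL_MINUTES} minutes, got {minutes}"
        )
    return minutes
```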
A new per-cluster grouping construct, called a Protection Group, groups protected VMs and assigns them the same protection attributes. A VM can be protected simply by adding it to a protection group for which attributes have already been defined.
Setting up replication requires two existing clusters running HX Data Platform version 2.5 or higher. This setup can be completed online.
Replication takes place between two clusters, and both clusters must be of the same type: either both all flash or both hybrid. Mixed configurations are not supported, so replication cannot be set up between one all flash cluster and one hybrid cluster.
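The two prerequisites above (version 2.5 or higher on both clusters, and matching cluster types) can be sketched as a pre-pairing check. The `ClusterInfo` type, its field names, and the type labels are hypothetical. The sketch also assumes plain dotted numeric version strings; real release strings may carry suffixes that this parser does not handle.

```python
from dataclasses import dataclass
from typing import List, Tuple

MIN_VERSION = (2, 5)  # minimum HX Data Platform version stated in the text


@dataclass(frozen=True)
class ClusterInfo:
    name: str
    storage_type: str  # assumed labels: "all-flash" or "hybrid"
    version: str       # assumed plain dotted form, e.g. "2.5" or "3.0.1"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "3.0.1" into (3, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pairing_problems(a: ClusterInfo, b: ClusterInfo) -> List[str]:
    """Return a list of reasons the two clusters cannot be paired.
    An empty list means both documented prerequisites are met."""
    problems = []
    for cluster in (a, b):
        if parse_version(cluster.version) < MIN_VERSION:
            problems.append(
                f"{cluster.name}: version {cluster.version} is below 2.5")
    if a.storage_type != b.storage_type:
        problems.append(
            f"{a.name} is {a.storage_type} but {b.name} is {b.storage_type}")
    return problems
```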
First, each cluster is set up for replication networking. This involves using HX Connect to provide a set of IP addresses that the local cluster nodes use to replicate to the remote cluster. As part of the process, HX Connect creates VLANs through UCS Manager for dedicated replication network use.
Next, the two clusters and their corresponding existing datastores must be explicitly paired. The pairing setup can be completed using HX Connect from one of the two clusters. This requires administrative credentials for the other cluster.
Finally, virtual machines can be protected (or have their existing protection attributes modified) by using HX Connect at the cluster where they are currently active.
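The ordering of the three setup steps (replication networking, pairing, then protecting VMs) can be modeled as a small guard. This is only a conceptual sketch of the ordering from one cluster's point of view. The class and method names are hypothetical and do not correspond to any HX Connect or stcli interface.

```python
from enum import IntEnum


class SetupStage(IntEnum):
    NOT_CONFIGURED = 0
    NETWORK_CONFIGURED = 1
    PAIRED = 2


class ReplicationSetup:
    """Tracks how far replication setup has progressed on one cluster."""

    def __init__(self) -> None:
        self.stage = SetupStage.NOT_CONFIGURED
        self.protected_vms = set()

    def configure_network(self) -> None:
        # Step 1: replication networking. Never moves the stage backwards.
        self.stage = max(self.stage, SetupStage.NETWORK_CONFIGURED)

    def pair(self) -> None:
        # Step 2: pairing requires replication networking to be in place.
        if self.stage < SetupStage.NETWORK_CONFIGURED:
            raise RuntimeError("configure replication networking first")
        self.stage = SetupStage.PAIRED

    def protect(self, vm_name: str) -> None:
        # Step 3: VMs can be protected only after the clusters are paired.
        if self.stage < SetupStage.PAIRED:
            raise RuntimeError("pair the clusters before protecting VMs")
        self.protected_vms.add(vm_name)
```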
HX Connect can be used to monitor the status of both incoming and outgoing replications at a cluster.
After a disaster, a protected VM can be recovered and run at the cluster that serves as the disaster recovery site for that VM, using the stcli command line tool invoked on any node within the cluster.
Replication and Recovery Considerations
The following is a list of considerations when configuring virtual machine replication and performing disaster recovery of virtual machines.
All replication and recovery tasks, except monitoring, can only be performed by users with administrator privileges on the local cluster. For tasks involving a remote cluster, both the local and remote users provided must have administrator privileges and be configured with vCenter SSO on their respective clusters.
Ensure you have sufficient space on the remote cluster to support your replication schedule. The protected virtual machines are replicated (copied) to the remote cluster at every scheduled interval. Although storage capacity methods (deduplication, compression) are applied, each replicated virtual machine does consume some storage space.
Not having sufficient storage space on the remote cluster can cause the remote cluster to reach capacity usage maximums. If you see Out of Space errors, see Handling Out of Space Errors. Pause all replication schedules until you have appropriately adjusted the space available on the HX Cluster. Always ensure your cluster capacity consumption is below the space utilization warning threshold.
Replication protection is between two HX Clusters.
Both clusters must be of the same type, either both all flash or both hybrid.
Replication between, to, or from Edge clusters is not supported.
Do not reboot any nodes in the HX Cluster during any restore, replication, or recovery operation.
Protected virtual machines are recovered with thin provisioned disks irrespective of how disks were specified in the originally protected virtual machine.
If you have protected a VM that includes storage on a non-HX datastore, attempts to periodically replicate this VM fail. Either unprotect this VM, or remove its non-HX storage.
Similarly, do not move protected VMs from HX datastores to non-HX datastores. If a VM is moved to a non-HX datastore through storage vMotion, unprotect the VM, then reapply the protection.
Hierarchy of recovered virtual machines.
Recovery of a replicated virtual machine keeps the replication hierarchy as is.
Test recovery consolidates the replication hierarchy of the virtual machine into a common cloned base disk. All of the data is in the cloned common disk. However, because the test runs as a clone with a new vm-uuid, the test recovery starts a new replication virtual machine hierarchy.
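Two of the considerations above lend themselves to simple pre-checks: whether a VM has storage on a non-HX datastore, and whether capacity consumption has reached the space utilization warning threshold. The sketch below uses hypothetical function names and takes the threshold as a caller-supplied fraction, because the text does not state its value.

```python
from typing import Iterable, Set


def non_hx_datastores(vm_datastores: Iterable[str],
                      hx_datastores: Set[str]) -> Set[str]:
    """Return the datastores used by a VM that are not HX datastores.
    A non-empty result means periodic replication of the VM would fail."""
    return set(vm_datastores) - hx_datastores


def exceeds_warning_threshold(used_bytes: int,
                              capacity_bytes: int,
                              warning_fraction: float) -> bool:
    """Return True if consumption is at or above the warning threshold.
    warning_fraction is an assumed input, for example 0.7 for 70 percent."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return used_bytes / capacity_bytes >= warning_fraction
```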
HX Data Platform uses several forms of snapshot technology. Each snapshot satisfies a specific use case and has specific traits. They are not interchangeable.
A ReadyClone―a copy of an existing VM, similar to a standard clone. The existing VM is called the host VM. When the cloning operation is complete, the ReadyClone is a separate guest VM.
A Native Snapshot―a backup feature that saves versions (states) of working VMs. VMs can be reverted to native snapshots.
A Replication Snapshot―created as part of the VM replication protection. At the scheduled time, a replication snapshot is taken of a running VM. This snapshot is then replicated (copied) to the remote cluster.
A Recovery Test Snapshot―a temporary snapshot used to verify that the recovery system is working.
A Recovered VM―the restored VM, created by restoring the most recent replication snapshot from the recovery cluster.
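Recovery uses the most recent replication snapshot of each protected VM. As a conceptual sketch of that selection, the following picks the newest snapshot per VM from a list. The `ReplicationSnapshot` type and its fields are hypothetical and do not reflect how HX Data Platform stores snapshot metadata.

```python
from datetime import datetime
from typing import Dict, Iterable, NamedTuple


class ReplicationSnapshot(NamedTuple):
    vm_name: str
    replicated_at: datetime  # when the snapshot reached the remote cluster


def latest_snapshot_per_vm(
        snapshots: Iterable[ReplicationSnapshot]
) -> Dict[str, ReplicationSnapshot]:
    """Map each VM name to its most recently replicated snapshot."""
    latest: Dict[str, ReplicationSnapshot] = {}
    for snap in snapshots:
        current = latest.get(snap.vm_name)
        if current is None or snap.replicated_at > current.replicated_at:
            latest[snap.vm_name] = snap
    return latest
```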
Data Protection Terms
Failover―The process of shifting the location of the working VMs in the event of a disaster on the source cluster. In other words, the primary cluster fails and the secondary cluster assumes the working VMs.
Local cluster―The cluster where the replication configuration task is performed. It is paired with a remote cluster. Typically used in the context of replication.
Migration―Used at any time for routine system maintenance and management, when there is no disaster. In a migration, similar to recovery, a recent replication snapshot copy of the VM becomes the working VM. The pairing between the source and target cluster remains.
Primary cluster―Primary and secondary are another set of terms similar to source vs. target, except that the primary is simply where the working VMs currently reside. If the VMs are migrated to the secondary, the secondary becomes the primary. Typically used in the context of failover.
Protected virtual machine―A VM that has replication configured, either individually or through a protection group.
Protection group―A means to apply the same replication configuration on a group of VMs.
Recovery test―A maintenance task that ensures the recovery process would be successful in the event of a disaster.
Remote cluster―The remote cluster is the other half of the replication pair with a local cluster. Typically used in context of replication.
Replication pair―Two clusters that provide a remote location for storing replication snapshots of the local VMs.
Replication snapshot―Part of the replication protection mechanism. A type of snapshot taken of the protected VM. Copied from the source cluster to the target cluster.
Secondary cluster―The counterpart to the primary cluster, similar to source vs. target. The secondary is the cluster that assumes the working VMs if the primary fails. If the VMs are migrated to the secondary, the secondary becomes the primary. Typically used in the context of failover.
Source cluster―The source cluster is where the VMs reside. Typically used in context with recovery.
Target cluster―The target cluster is where the VM replication snapshot copies reside. Typically used in context with recovery.
A protected virtual machine resides on a datastore in the source cluster of a replication pair and has a replication schedule configured, either as an individual VM or through a protection group.
Protection group―A mechanism for assigning replication schedules to a group of VMs
Interval―Part of the replication schedule configuration. This is how often a protected VM's replication snapshot is taken and copied to the target cluster.
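The primary and secondary terms defined above describe roles rather than fixed clusters: whichever cluster currently holds the working VMs is the primary. A minimal sketch of that role swap follows. The class and method names are hypothetical and serve only to illustrate the terminology.

```python
from dataclasses import dataclass


@dataclass
class ReplicationPair:
    primary: str    # cluster where the working VMs currently reside
    secondary: str  # the other cluster in the pair

    def move_working_vms(self) -> None:
        """Swap roles after the working VMs move to the secondary,
        whether through failover or migration."""
        self.primary, self.secondary = self.secondary, self.primary
```

For example, a pair created as `ReplicationPair("site-a", "site-b")` reports `site-b` as the primary after one call to `move_working_vms()`. The pairing itself is unchanged.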