Data Protection Overview
The HX Data Platform disaster recovery feature allows you to protect virtual machines from a disaster by setting up replication of running VMs between a pair of network connected clusters. Protected virtual machines running on one cluster replicate to the other cluster in the pair, and vice versa. The two paired clusters typically are located at a distance from each other, with each cluster serving as the disaster recovery site for virtual machines running on the other cluster.
Once protection has been set up on a VM, HX Data Platform periodically takes a replication snapshot of the running VM on the local cluster and replicates (copies) the snapshot to the paired remote cluster. In the event of a disaster at the local cluster, you use the most recently replicated snapshot of each protected VM to recover and run the VM at the remote cluster. Each cluster that serves as a disaster recovery site for another cluster must be sized with adequate spare resources so that, after a disaster, it can run the newly recovered virtual machines in addition to its normal workload.
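The sizing requirement can be expressed as a simple capacity check. The following is a minimal sketch, not part of HX Data Platform: the `Resources` type, the `can_host_recovered_vms` function, and the choice of CPU, memory, and storage as the tracked resources are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource figures, in any consistent units."""
    cpu_ghz: float
    memory_gb: float
    storage_gb: float


def can_host_recovered_vms(capacity: Resources,
                           normal_load: Resources,
                           recovered_load: Resources) -> bool:
    """Return True if the disaster recovery site can run its normal
    workload plus the recovered VMs without exceeding its capacity."""
    return (
        normal_load.cpu_ghz + recovered_load.cpu_ghz <= capacity.cpu_ghz
        and normal_load.memory_gb + recovered_load.memory_gb <= capacity.memory_gb
        and normal_load.storage_gb + recovered_load.storage_gb <= capacity.storage_gb
    )
```

A real sizing exercise would also account for headroom and failure tolerance, which this sketch ignores.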
Each virtual machine can be individually protected by assigning it protection attributes, chief among which is the replication interval (schedule). The shorter the replication interval, the fresher the replicated snapshot data is likely to be when it is time to recover the VM after a disaster. Replication intervals can range between 15 minutes and 24 hours.
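To make the interval range concrete, here is a minimal sketch that checks a proposed interval against the 15 minute to 24 hour bounds and estimates how old the newest replicated snapshot could be at recovery time. The function names are hypothetical, and the age estimate assumes each snapshot finishes transferring before the next one is taken. The product may accept only specific values within the range, which this sketch does not model.

```python
MIN_INTERVAL_MINUTES = 15        # lower bound stated in the text
MAX_INTERVAL_MINUTES = 24 * 60   # upper bound stated in the text (24 hours)


def validate_interval(minutes: int) -> int:
    """Return the interval unchanged if it lies within the documented range."""
    if not MIN_INTERVAL_MINUTES <= minutes <= MAX_INTERVAL_MINUTES:
        raise ValueError(
            f"interval must be between {MIN_INTERVAL_MINUTES} and "
            f"{MAX_INTERVAL_MINUTES} minutes, got {minutes}"
        )
    return minutes


def worst_case_snapshot_age(interval_minutes: int, transfer_minutes: int) -> int:
    """Assumed model: a disaster strikes just before the next snapshot
    finishes replicating, so the newest usable snapshot is one full
    interval plus one transfer time old."""
    return validate_interval(interval_minutes) + transfer_minutes
```

Under this model, a 60 minute interval with a 10 minute transfer gives a worst-case snapshot age of 70 minutes.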
A new per-cluster grouping construct, called a Protection Group, groups protected VMs and assigns them the same protection attributes. A VM can be protected simply by adding it to a protection group for which attributes have already been defined.
Setting up replication requires two existing clusters running HX Data Platform version 2.5 or higher. Both clusters must be on the same HX Data Platform version. This setup can be completed online.
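The version prerequisites can be sketched as a small check. This is an illustration only: `parse_version` and `replication_prerequisites_met` are hypothetical helpers, and the sketch assumes plain dotted numeric version strings. Real release strings may carry suffixes that this parser does not handle.

```python
MIN_VERSION = (2, 5)  # minimum HX Data Platform version stated in the text


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted numeric string such as '2.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def replication_prerequisites_met(local_version: str, remote_version: str) -> bool:
    """Both clusters must run the same version, and that version must be
    2.5 or higher."""
    local = parse_version(local_version)
    remote = parse_version(remote_version)
    return local == remote and local >= MIN_VERSION
```

Comparing integer tuples rather than raw strings keeps a version such as 2.10 ordered after 2.5.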
First, each cluster is set up for replication networking. This involves using HX Connect to provide a set of IP addresses that the local cluster nodes use to replicate to the remote cluster. As part of the process, HX Connect creates VLANs through UCS Manager for dedicated replication network use.
Next, the two clusters and their corresponding existing datastores must be explicitly paired. The pairing can be completed using HX Connect from either of the two clusters, and requires administrative credentials for the other cluster.
Finally, virtual machines can be protected (or have their existing protection attributes modified) by using HX Connect at the cluster where they are currently active.
HX Connect can be used to monitor the status of both incoming and outgoing replications at a cluster.
After a disaster, a protected VM can be recovered and run at the cluster that serves as the disaster recovery site for that VM, using the stcli command line tool invoked on any node within the cluster.
Replication and Recovery Considerations
The following is a list of considerations when configuring virtual machine replication and performing disaster recovery of virtual machines.
Administrator―All replication and recovery tasks, except monitoring, can only be performed by users with administrator privileges on the local cluster. For tasks involving a remote cluster, both the local and remote users provided must have administrator privileges and be configured with vCenter SSO on their respective clusters.
Storage Space―Ensure you have sufficient space on the remote cluster to support your replication schedule. The protected virtual machines are replicated (copied) to the remote cluster at every scheduled interval. Although storage capacity methods (deduplication, compression) are applied, each replicated virtual machine still consumes some storage space.
Not having sufficient storage space on the remote cluster can cause the remote cluster to reach capacity usage maximums. If you see Out of Space errors, see Handling Out of Space Errors. Pause all replication schedules until you have appropriately adjusted the space available on the HX Cluster. Always ensure your cluster capacity consumption is below the space utilization warning threshold.
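The capacity guidance amounts to keeping consumption below a warning threshold. The following is a minimal sketch with hypothetical names. The threshold is passed in as a parameter because the text does not state its value, and the 0.7 used in the usage note is an arbitrary example, not the product's threshold.

```python
def capacity_below_warning(used_tb: float, capacity_tb: float,
                           warning_fraction: float) -> bool:
    """Return True if used capacity is strictly below the warning threshold.

    warning_fraction is the threshold expressed as a fraction of total
    capacity (an assumed parameter; consult the cluster for the real value).
    """
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    if not 0 < warning_fraction <= 1:
        raise ValueError("warning_fraction must be in (0, 1]")
    return used_tb / capacity_tb < warning_fraction
```

For example, `capacity_below_warning(60.0, 100.0, 0.7)` returns True, while `capacity_below_warning(80.0, 100.0, 0.7)` returns False and would suggest freeing space before replication continues.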
Unsupported Clusters―Replication protection is between two HX Clusters.
Replication between, to, or from Edge clusters is not supported.
Rebooting Nodes―Do not reboot any nodes in the HX Cluster during any restore, replication, or recovery operation.
Thin Provision―Protected virtual machines are recovered with thin provisioned disks irrespective of how disks were specified in the originally protected virtual machine.
Protection Group Limitations―
The maximum number of VMs allowed in a protection group is 32.
Do not add VMs with ISOs or floppies to protection groups.
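The two protection group limitations above can be sketched as a pre-check before adding a VM to a group. The `VM` type, its `has_iso` and `has_floppy` fields, and the `can_add_to_group` function are hypothetical names for illustration; they are not HX Data Platform APIs.

```python
from dataclasses import dataclass

MAX_VMS_PER_GROUP = 32  # limit stated in the text


@dataclass(frozen=True)
class VM:
    """Hypothetical VM description used only for this sketch."""
    name: str
    has_iso: bool = False
    has_floppy: bool = False


def can_add_to_group(current_members: list[VM], candidate: VM) -> tuple[bool, str]:
    """Return (allowed, reason) for adding candidate to a protection group."""
    if len(current_members) >= MAX_VMS_PER_GROUP:
        return False, "protection group already holds the maximum of 32 VMs"
    if candidate.has_iso or candidate.has_floppy:
        return False, "VM has an ISO or floppy attached"
    return True, "ok"
```

Returning a reason alongside the boolean lets a caller report which limitation blocked the request.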
Non-HX Datastores―If you have protected a VM that includes storage on a non-HX datastore, attempts to periodically replicate this VM fail. Either unprotect this VM, or remove its non-HX storage.
Similarly, do not move protected VMs from HX datastores to non-HX datastores. If a VM is moved to a non-HX datastore through storage vMotion, unprotect the VM, then reapply the protection.
Hierarchy of Recovered Virtual Machines―
Recovering a virtual machine keeps its replication hierarchy as is.
Test recovery consolidates the replication hierarchy of the virtual machine into a common cloned base disk. All of the data is in the cloned common disk. However, because the test recovery runs as a clone with a new vm-uuid, it starts a new replication virtual machine hierarchy.
Snapshot Memory Option―Do not include the snapshot memory option when configuring virtual machines for data protection replication or recovery.
A memory snapshot includes the memory and power state of the VM. This type of snapshot takes longer to complete. See VMware vSphere best practices documentation for additional information.
Snapshot Types―HX Data Platform uses several forms of snapshot technology. Each snapshot satisfies a specific use case and has specific traits. They are not interchangeable.
ReadyClone―A copy of an existing VM, similar to a standard clone. The existing VM is called the host VM. When the cloning operation is complete, the ReadyClone is a separate guest VM.
Native Snapshot―A backup feature that saves versions (states) of working VMs. VMs can be reverted to native snapshots.
Replication Snapshot―Created as part of VM replication protection. At the scheduled time, a replication snapshot is taken of a running VM. This snapshot is then replicated (copied) to the remote cluster.
Recovery Test Snapshot―A temporary snapshot used to verify that the recovery system is working.
Recovered VM―The restored VM, created by restoring the most recent replication snapshot from the recovery cluster.
Data Protection Terms
Failover―Part of the manual VM recovery process in the event of a disaster on the source cluster. Failover, in this context, is converting a replication snapshot on the target cluster into a working VM.
Interval―Part of the replication schedule configuration. This is how often the protected VM's replication snapshot is taken and copied to the target cluster.
Local cluster―One of a VM replication cluster pair. The cluster you are currently logged into through HX Connect. From the local cluster, you configure replication protection for locally resident VMs. The VMs are then replicated to the paired remote cluster.
Migration―A routine system maintenance and management task where a recent replication snapshot copy of the VM becomes the working VM. The replication pair of source and target cluster does not change.
Primary cluster―An alternative name for the source cluster in VM disaster recovery.
Protected virtual machine―A VM that has replication configured. Protected VMs:
Reside on a datastore in the local cluster of a replication pair.
Have a replication schedule configured either individually or through a protection group.
Protection group―A means to apply the same replication configuration on a group of VMs.
Recovery process―Manual process to recover protected VMs in the event the source cluster fails or a disaster occurs.
Recovery test―A maintenance task that ensures the recovery process would be successful in the event of a disaster.
Remote cluster―One of a VM replication cluster pair. The remote cluster receives the replication snapshots from the local cluster's protected VMs.
Replication pair―Two clusters that together provide a remote cluster location for storing replication snapshots of local cluster VMs.
Each cluster in a replication pair can be both a local and a remote cluster. Both clusters in a replication pair can have resident VMs. Each cluster is local to its resident VMs. Each cluster is remote to the VMs that reside on the paired local cluster.
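Because the local and remote roles are relative to where a VM resides, a small model can make the relationship explicit. This is a sketch under assumed names (`ReplicationPair`, `role_for`, and the cluster labels are hypothetical), not a representation of how HX Data Platform stores pairing information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPair:
    """Two paired clusters, identified here by hypothetical names."""
    cluster_a: str
    cluster_b: str

    def role_for(self, cluster: str, vm_resident_on: str) -> str:
        """Return the role ('local' or 'remote') that `cluster` plays
        for a VM residing on `vm_resident_on`."""
        members = {self.cluster_a, self.cluster_b}
        if cluster not in members or vm_resident_on not in members:
            raise ValueError("both clusters must belong to the pair")
        return "local" if cluster == vm_resident_on else "remote"
```

With `pair = ReplicationPair("site-a", "site-b")`, the call `pair.role_for("site-b", "site-a")` returns "remote", while `pair.role_for("site-b", "site-b")` returns "local": the same cluster holds both roles, depending on the VM in question.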
Replication snapshot―Part of the replication protection mechanism. A type of snapshot taken of the protected VM. Copied from the local cluster to the remote cluster.
Secondary cluster―An alternative name for the target cluster in VM disaster recovery.
Source cluster―One of a VM recovery cluster pair. The source cluster is where the protected VMs reside.
Target cluster―One of a VM recovery cluster pair. The target cluster receives the replication snapshots from the source cluster's VMs. The target cluster is used to recover the VMs in the event of a disaster on the source cluster.