A comprehensive business continuity plan mandates the deployment of multiple data centers located far enough apart to protect against regional power failures and disasters. Synchronous remote data replication is the appropriate solution for organizations seeking the fastest possible data recovery, minimal data loss, and protection against database integrity problems. However, application performance is affected by the distance and latency between the data centers, which restricts the location of the data centers.
The main drawback of synchronous replication is its distance limitation. Fibre Channel, the primary enterprise storage transport protocol, is itself limited in distance only by its physical-layer flow-control mechanism (buffer-to-buffer credits). Latency, however, becomes a problem because propagation delay grows with distance. Propagation delay can significantly slow down a system by forcing it to wait for confirmation of each storage operation at both the local and the remote sites. As a result, the practical distance for synchronous replication is about 100 to 200 km (60 to 120 miles), depending on the application's response-time tolerance and other factors.
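To put the distance figures above in terms of delay, the minimal sketch below converts site separation into propagation delay. It assumes the common rule of thumb that light in optical fiber travels about 200 km per millisecond (roughly 5 microseconds per kilometer one way). The function names are illustrative. Real links add switching, equipment, and cable-routing delays that are not modeled here.

```python
# Assumption: light in optical fiber covers about 200 km per millisecond,
# i.e. roughly 5 microseconds per kilometer one way. Equipment, switching,
# and cable-routing delays are not modeled.
FIBER_KM_PER_MS = 200.0


def one_way_delay_ms(distance_km: float) -> float:
    """Propagation delay from one site to the other."""
    return distance_km / FIBER_KM_PER_MS


def round_trip_ms(distance_km: float) -> float:
    """Round-trip time (RTT): to the remote site and back."""
    return 2 * one_way_delay_ms(distance_km)


if __name__ == "__main__":
    for km in (50, 100, 200):
        print(f"{km:>4} km: one-way {one_way_delay_ms(km):.2f} ms, RTT {round_trip_ms(km):.2f} ms")
```

Under this assumption, the RTT is about 1 ms at 100 km and about 2 ms at 200 km. Every storage operation that must wait on the remote site pays this delay at least once.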
Cisco® MDS 9000 I/O Accelerator (IOA) software offers traffic acceleration services capable of dramatically reducing application response time and increasing the distance for synchronous replication solutions. This document describes a joint Cisco and Hitachi Data Systems (HDS) test and shows how Cisco MDS 9000 IOA enhances the performance of the Hitachi TrueCopy Synchronous replication solution.
Hitachi TrueCopy Remote Replication software provides a continuous, nondisruptive, host-independent remote data replication solution for disaster recovery or data migration over any distance. Within a metropolitan area, Hitachi TrueCopy Synchronous software provides a no-data-loss, rapid-restart solution. Hitachi TrueCopy Synchronous software is available for every Hitachi storage platform. For enterprise environments, Hitachi TrueCopy Synchronous software, combined with Hitachi Universal Replicator and the Hitachi Universal Storage Platform, supports three advanced data center configurations for optimal data protection.
The recovery point objective (RPO, the maximum acceptable data loss, measured in time) and the recovery time objective (RTO, the time the business needs to recover from a disaster) determine the type of replication deployed. Synchronous data replication provides the best RPO, but it requires that the data centers be located close enough together to maintain consistent application performance. Hitachi TrueCopy is offered with several options for different service-level requirements: synchronous mirroring for zero data exposure, and asynchronous mode for unlimited distance.
In the case of Hitachi TrueCopy Synchronous, the TrueCopy initiator waits until the TrueCopy target successfully stores the data and acknowledges the current outstanding I/O. This process helps ensure that both data centers have the same data at the same time, enabling zero data loss and the best RPO. However, the distance that synchronous replication allows between data centers may not meet the requirements of the business continuity and disaster recovery (BC/DR) plan.
Asynchronous replication supports greater distances because the I/O operation completes at the application level without waiting for the secondary storage array to complete its I/O operation. As a result, the secondary data center lags behind the primary data center by some amount of time, and the data at the two data centers is not identical at any given point in time. This approach implies that data not yet replicated may be lost if a failure occurs.
Therefore, comprehensive BC/DR plans that depend on Hitachi TrueCopy Synchronous replication to meet their RPO and RTO requirements need a method to reduce application response time or increase distance.
Figure 1 shows how latency in the communication channel affects the time taken to complete a Small Computer System Interface (SCSI) write operation. Latency adds at least two round-trip times (RTTs) between the host and the target to the net execution time of the operation. The first RTT covers the write command traveling to the target and the Transfer Ready (XferReady) message returning. The second RTT covers the data traveling to the target and the status returning.
Cisco MDS 9000 Family traffic acceleration services mitigate the effect of latency by proxying XferReady, as shown in Figure 2. The local switch answers the write command with XferReady immediately, so the data can be sent on the line without waiting for the target's XferReady message to travel all the way back to the host. To preserve data integrity, the status message is not proxied, so the host still receives confirmation only from the target. The net effect is that application response time can be reduced by up to 50 percent at the same distance, or the distance between data centers can be increased by up to 100 percent with the same latency.
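The round-trip arithmetic behind the 50 percent and 100 percent figures can be sketched as a simple lower-bound model. Natively, a write waits for two RTTs. With XferReady proxied, the command and data go out back to back, so only the wait for the remote status remains, which is one RTT. The model assumes the same rule of thumb of about 5 microseconds per kilometer in fiber and ignores data serialization time. The function names and the `service_ms` parameter (a hypothetical stand-in for array and switch processing time) are illustrative assumptions, not part of any Cisco tool.

```python
# Assumption: light in fiber travels about 200 km per millisecond
# (roughly 5 microseconds per km one way); equipment delay is ignored.
FIBER_KM_PER_MS = 200.0


def rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay between the two sites."""
    return 2 * distance_km / FIBER_KM_PER_MS


def round_trips(accelerated: bool) -> int:
    # Native write: command/XferReady round trip, then data/status round trip.
    # With XferReady proxied locally, command and data go out back to back and
    # only the wait for the remote status remains: one round trip.
    return 1 if accelerated else 2


def write_ms(distance_km: float, accelerated: bool, service_ms: float = 0.0) -> float:
    """Lower bound on SCSI write completion time over distance."""
    return service_ms + round_trips(accelerated) * rtt_ms(distance_km)


def max_distance_km(budget_ms: float, accelerated: bool) -> float:
    """Largest distance whose round-trip overhead fits in budget_ms."""
    per_rtt_ms = budget_ms / round_trips(accelerated)
    return per_rtt_ms * FIBER_KM_PER_MS / 2


if __name__ == "__main__":
    print(write_ms(100, accelerated=False), write_ms(100, accelerated=True))  # 2.0 1.0
    print(max_distance_km(2.0, False), max_distance_km(2.0, True))  # 100.0 200.0
```

Any nonzero `service_ms` makes the relative saving smaller than 50 percent, which is why the figures above are stated as "up to."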
Cisco MDS 9000 Family switches offer two traffic acceleration services that can reduce application response time and extend the distance for Hitachi TrueCopy Synchronous replication solutions.
Fibre Channel over IP Write Acceleration (FCIP-WA) software offers a cost-effective write acceleration solution for replication over FCIP and is included as part of the Cisco MDS SAN Extension license.
Cisco MDS 9000 IOA is a comprehensive, next-generation, transport-independent acceleration solution for both Fibre Channel and FCIP that offers increased scalability. Its highly resilient cluster technology, with integrated compression and an error-recovery mechanism, delivers industry-leading I/O acceleration.
Figure 3 shows a typical collapsed-core deployment of Hitachi TrueCopy Synchronous replication with Cisco MDS 9000 IOA. Cisco MDS 9000 IOA is deployed on the same switch to which both the application hosts and the storage are connected. This topology was used for the tests described in this document.
Alternatively, IOA can be deployed in a core-edge or edge-core-edge topology by using the Cisco MDS 9000 Family Fibre Channel Redirect (FCR) function, which transparently reroutes traffic in the switch fabric to the Cisco MDS 9000 IOA engine.
Figure 3. Hitachi TrueCopy Deployment with Cisco MDS 9000 IOA (Only SAN A Is Shown)
Although Cisco MDS 9000 IOA eliminates one RTT associated with the write command and improves application throughput, the actual performance improvement depends on the following factors:
• Distance between data centers (sites)
• Size of I/O write operations
• Number of outstanding I/O operations maintained by the applications
Greater distance between the data centers increases application response time, so any RTT saving has a positive effect on throughput. Larger I/O sizes somewhat mask the effect of distance. Because each larger I/O keeps the physical link busy longer, the round-trip delay makes up a smaller share of the total time, and distance may not affect application performance as much. However, I/O sizes are generally dictated by the application and its performance requirements.
The effect of distance is more severe if the number of outstanding I/O operations between the initiator and the target is small. In the case of Hitachi TrueCopy Synchronous replication, there is only one outstanding I/O operation, which means that the initiator does not send the next write command until it gets a response from the previous write command. In asynchronous replication, there can be many outstanding I/O operations. This means that multiple write operations may be performed at the same time, better utilizing the physical link. Hence, the performance improvement for Hitachi TrueCopy Synchronous replication is greater than for asynchronous replication.
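The role of outstanding I/O operations can be illustrated with a simple throughput bound based on Little's law. Throughput cannot exceed the data in flight divided by the time each write stays in flight, and it also cannot exceed the link rate. The sketch below is a hypothetical model, not a description of how TrueCopy schedules writes. It assumes about 5 microseconds per kilometer in fiber, ignores serialization and target service time, and treats KB as 1024 bytes. The 8 Gbps link rate is an arbitrary example value.

```python
# Assumption: ~5 microseconds per km one way in fiber (200 km per ms).
FIBER_KM_PER_MS = 200.0


def rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay between the two sites."""
    return 2 * distance_km / FIBER_KM_PER_MS


def throughput_mbps(io_size_kib: float, outstanding_ios: int, distance_km: float,
                    round_trips: int, link_mbps: float) -> float:
    """Upper bound on replication throughput in megabits per second.

    Little's law: throughput = data in flight / time each write is in flight.
    Serialization and target service time are ignored; the link rate caps the result.
    """
    response_ms = round_trips * rtt_ms(distance_km)
    if response_ms == 0:
        return link_mbps  # no propagation delay: only the link rate limits throughput
    bits_in_flight = outstanding_ios * io_size_kib * 1024 * 8
    demand_mbps = bits_in_flight / (response_ms / 1000) / 1e6
    return min(demand_mbps, link_mbps)


if __name__ == "__main__":
    # One outstanding 256 KiB write at 100 km over a hypothetical 8 Gbps link,
    # first with two RTTs per write (native), then with one (accelerated).
    for trips in (2, 1):
        print(trips, round(throughput_mbps(256, 1, 100, trips, 8000.0)))
    # Many outstanding writes saturate the link even without acceleration.
    print(throughput_mbps(256, 8, 100, 2, 8000.0))
```

With a single outstanding write, removing one RTT doubles the bound, from roughly 1,049 Mbps to roughly 2,097 Mbps in this example. With eight outstanding writes, the link rate is reached even with two RTTs. This matches the reasoning above: acceleration matters most when few I/O operations are outstanding.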
To demonstrate the performance characteristics of Cisco MDS 9000 IOA, tests were performed using Iometer with SCSI write-only traffic. Performance was measured in terms of relative increases in application throughput and decreases in application response time over different distances and I/O sizes. The tests accelerated the traffic flow between two HDS Universal Storage Platform V storage arrays running Hitachi TrueCopy Synchronous replication for 16 remote data facility (RDF) pairs.
Figure 4 shows the effect of distance on application throughput. It demonstrates that Cisco MDS 9000 IOA mitigates the negative performance impact of increased distance on application throughput and helps reduce application response time for a given distance. Note that similar improvements are seen with Hitachi TrueCopy Synchronous deployed with FCIP-WA.
Figure 4. Cisco MDS 9000 Family Acceleration Services Mitigate Effect of Distance on Application Throughput and Latency (I/O Size = 256 KB)
Figure 5 shows how Cisco MDS 9000 IOA performs with varying distances and I/O sizes. With Cisco MDS 9000 IOA, the performance improvement is greater with larger I/O sizes; however, the improvement is significant even at the smaller I/O sizes (50 to 70 percent improvements in application throughput compared to the case without Cisco MDS 9000 IOA).
Figure 5. Performance Improvement Depends on I/O Size
• One Cisco MDS 9000 IOA engine is required per site per fabric. However, two engines per site per fabric are recommended for increased redundancy and capacity. The number of Cisco MDS 9000 IOA engines required also depends on application throughput. Each engine can provide 10 Gbps of raw application throughput.
• Native Fibre Channel is typically the transport of choice for the data center interconnect (DCI) when running synchronous replication between storage arrays. Depending on the application's response-time requirements, FCIP may also be used.
• Multiple DCI paths provide resiliency against physical link failures. PortChannels (logical links composed of more than one physical link) provide the best link availability.
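The sizing guideline in the first item above can be expressed as a short calculation. The sketch below is a hypothetical helper, not a Cisco sizing tool. It takes the peak application throughput to be accelerated at one site in one fabric and divides it by the 10 Gbps per-engine figure. It then rounds up and applies the recommended minimum of two engines. The two-site, two-fabric default (SAN A and SAN B) is an assumption for illustration. The helper also does not reserve headroom for carrying the full load after an engine failure.

```python
import math

ENGINE_GBPS = 10.0            # raw application throughput per IOA engine (from the text)
RECOMMENDED_MIN_ENGINES = 2   # two engines per site per fabric recommended for redundancy


def engines_per_site_per_fabric(peak_gbps: float, redundant: bool = True) -> int:
    """Hypothetical sizing helper for one site in one fabric."""
    needed = max(1, math.ceil(peak_gbps / ENGINE_GBPS))  # at least one engine is required
    return max(needed, RECOMMENDED_MIN_ENGINES) if redundant else needed


def total_engines(peak_gbps: float, sites: int = 2, fabrics: int = 2) -> int:
    """Engines across all sites and fabrics (assumed: two sites, SAN A and SAN B)."""
    return sites * fabrics * engines_per_site_per_fabric(peak_gbps)


if __name__ == "__main__":
    for gbps in (4, 25):
        print(gbps, engines_per_site_per_fabric(gbps), total_engines(gbps))
```

For example, 4 Gbps of peak throughput needs only one engine for capacity but two for redundancy, and 25 Gbps needs three engines per site per fabric.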
The tests clearly show that Cisco MDS 9000 IOA enables increased distance between data centers for Hitachi TrueCopy Synchronous replication by reducing the latency penalty of distance. It therefore relaxes location restrictions and gives organizations more flexibility in choosing data center sites.