Interconnecting Cloud Data Centers can be a complex undertaking for Enterprises and SPs. Enabling business-critical applications to operate across, or migrate between, metro/geo sites impacts each tier of the Cloud Data Center, as described in Figure 2-1. Customers require a validated end-to-end DCI solution that integrates Cisco's best-in-class products at each tier to address the most common Business Continuity and workload mobility functions. To support workloads that move between geographically diverse data centers, VMDC DCI provides Layer 2 extensions that preserve IP addressing, extended tenancy and network containers, a range of stateful L4-L7 services, extended hypervisor geo-clusters, geo-distributed virtual switches, distributed storage clusters, synchronous and asynchronous storage replication, geo-extensions to service orchestration tools, IP path optimization to redirect users to moved VMs and workloads, and support across multiple hypervisors.
The cumulative impact of interconnecting data centers is significant and potentially costly for SPs and Enterprises. The lack of technical guidance and best practices for an end-to-end business continuity solution is a pain point for customers that are not staffed to sift through these technical issues on their own. In addition, multiple vendors and business disciplines are required to design and deploy a successful business continuity and workload mobility solution. VMDC DCI simplifies the design and deployment process by providing a validated reference design for each tier of the Cloud Data Center.
Figure 2-1 Extending Cloud Data Centers Across Infrastructure Tiers
The VMDC DCI design uses the following definitions to assess the overall cost of recovery time resulting from workload mobility or execution of a recovery plan:
The Business Criticality of an application defines an acceptable RPO and RTO target in the event of a planned or unplanned outage (Figure 2-2).
Figure 2-2 RPO and RTO Definitions
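The distinction between the two objectives can be sketched in a few lines of code. This is an illustrative model only; the timestamps are hypothetical, and no VMDC tooling is implied:

```python
from datetime import datetime, timedelta

def achieved_rpo(last_good_copy, failure):
    """RPO achieved: the data-loss window between the last consistent
    copy of the data and the moment of failure."""
    return failure - last_good_copy

def achieved_rto(failure, service_restored):
    """RTO achieved: the downtime window between the failure and the
    moment service is restored at the recovery site."""
    return service_restored - failure

# Hypothetical timeline: last replica taken at 11:45, failure at 12:00,
# service restored at the recovery site at 12:30.
failure = datetime(2014, 1, 1, 12, 0)
loss_window = achieved_rpo(datetime(2014, 1, 1, 11, 45), failure)
downtime = achieved_rto(failure, datetime(2014, 1, 1, 12, 30))
print(loss_window, downtime)  # 0:15:00 0:30:00
```

A tighter RPO shrinks the replication interval (how much data can be lost); a tighter RTO shrinks the recovery interval (how long the service may be down). The two are set, and paid for, independently.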
Achieving necessary recovery objectives involves diverse operations teams and an underlying Cloud infrastructure that has been built to provide business continuity and workload mobility. Each application and infrastructure component has unique mechanisms for dealing with mobility, outages, and recovery. The challenge of an end-to-end cloud data center solution is to combine these methods in a coherent way so as to optimize the recovery/mobility process across metro and geo sites, and reduce the overall complexity for operations teams. This is the ultimate goal of the VMDC DCI solution.
A critical component of a successful DCI strategy is to align the business criticality of an application with a commensurate infrastructure design that can meet those application requirements. Defining how an application or service outage will impact the business helps to define an appropriate redundancy and mobility strategy. A necessary first step in this process is to map each application to a specific Criticality Level, as described in Figure 2-3.
Figure 2-3 Application Criticality Levels
Industry-standard application criticality levels range from Mission Imperative (C1), in which any outage results in immediate cessation of a primary business function and no downtime or data loss is acceptable, to Business Administrative (C5), in which a sustained outage has little to no impact on a primary business function. Applications representing more business-critical functions (C1-C3) typically have more stringent RTO/RPO targets than those toward the bottom of the spectrum (C4-C5). Most SP and Enterprise Cloud Providers have applications mapping to each Criticality Level. In a typical Enterprise distribution, roughly 20% of applications are Mission Imperative or Mission Critical (C1, C2), and the remainder fall into the lower categories of Business Critical, Business Operational, and Business Administrative (C3-C5). The VMDC Cloud Data Center must therefore accommodate different criticality levels and provide Business Continuity and workload mobility capabilities to support varied RPO/RTO targets.
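As an illustration, the criticality levels above can be modeled as a simple lookup table. The level names follow Figure 2-3, but the specific RTO/RPO minute values below are assumed placeholders that each operator would replace with its own SLA targets:

```python
# Illustrative mapping of criticality levels to recovery targets.
# Level names follow Figure 2-3; the minute values are assumed
# placeholders, not targets mandated by the VMDC DCI design.
CRITICALITY_TARGETS = {
    "C1": ("Mission Imperative",      0,    0),    # (name, RTO min, RPO min)
    "C2": ("Mission Critical",        15,   15),
    "C3": ("Business Critical",       60,   60),
    "C4": ("Business Operational",    240,  240),
    "C5": ("Business Administrative", 1440, 1440),
}

def targets_for(level):
    """Return (name, rto_minutes, rpo_minutes) for a criticality level."""
    return CRITICALITY_TARGETS[level]
```

A table of this kind is the input to capacity planning: each level's targets drive the replication method, LAN extension, and recovery-site resources that level requires.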
It is important to note that even a relatively short outage (less than one hour) can have a significant business impact on enterprises and service providers. Figure 2-4 describes typical Recovery Point Objective (RPO) requirements for different enterprises. In this study, 53% of Enterprises experience significant revenue loss or business impact from an outage of just one hour of Tier-1 data (Mission Critical data), and 48% of the same enterprises experience significant revenue loss or business impact from an outage of less than 3 hours of Tier-2 data (Business Critical data). Even tighter RPO requirements apply to SP Cloud Providers. Enterprise and SP Cloud Providers therefore have a strong incentive to implement Business Continuity and workload mobility functions to protect critical workloads and support normal IT operations. VMDC DCI provides a validated framework to achieve these goals within Private Clouds, Public Clouds, and Virtual Private Clouds.
Figure 2-4 Typical Enterprise RPO Requirements1
VMDC DCI implements a reference architecture that meets two of the most common RPO/RTO targets identified across Enterprise Private Clouds and SP Private/Public Clouds. The two RPO/RTO target use cases are described in Figure 2-5. The first use case covers an RTO/RPO target of 0 to 15 minutes, which addresses Criticality Levels C1 and C2. Achieving near-zero RTO/RPO requires significant infrastructure investment, including synchronous storage replication, live VM migration with extended clusters, LAN extensions, and metro services optimizations. It also typically requires 100% duplicate resources at the recovery site, making it the most capital-intensive business continuity/workload mobility option. The second use case covers an RPO/RTO target of more than 15 minutes, which addresses Criticality Levels C3 and C4. Achieving this target is less costly, less complex, and can utilize a many-to-one resource sharing model at the recovery site.
Figure 2-5 Validated RPO/RTO Targets
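The 15-minute boundary between the two use cases suggests a simple selection rule. The sketch below is an illustrative reading of Figure 2-5, not a prescriptive planning tool:

```python
def select_dci_model(rto_minutes, rpo_minutes):
    """Map an application's recovery targets to one of the two
    validated VMDC DCI design models (illustrative sketch)."""
    if max(rto_minutes, rpo_minutes) <= 15:
        # C1/C2: synchronous replication, live mobility, duplicate resources
        return "Active-Active metro"
    # C3/C4: asynchronous replication, cold mobility, shared recovery resources
    return "Active-Backup metro/geo"

print(select_dci_model(0, 0))     # Active-Active metro
print(select_dci_model(60, 240))  # Active-Backup metro/geo
```

In practice the boundary is economic as much as technical: the stricter the target, the closer the design must move toward 100% duplicated, synchronously replicated resources.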
To cover both of these recovery targets, the VMDC DCI design must support two operational models. The first, the Active-Active metro design, consists of two physical sites spanning a metro distance and operating as a single logical data center. The second, a more traditional Active-Backup metro/geo design, consists of two independent data centers that provide recovery and workload mobility functions across both metro and geo distances. A brief description of both VMDC DCI options is provided below.
The active-active metro design is described in Figure 2-6. This model provides DCI extensions between two metro sites, operating together as a single Logical Data Center. This design accommodates the most stringent RTO/RPO targets for Business Continuity and Workload Mobility. This model supports applications that require live workload mobility, near zero RTO/RPO, stateful services, and a synchronous storage cluster across a metro distance.
Figure 2-6 Active-Active Metro Design
Applications mapped to this infrastructure may be distributed across metro sites and may also use live workload mobility between them. Distributed applications and live workload mobility typically require stretched clusters, LAN extensions, and synchronous storage replication, as described in Figure 2-7. DCI extensions must also support stateful L4-L7 services during workload moves, preservation of network QoS and tenancy across sites, and virtual switching across sites. A single operational domain with Service Orchestration is typically used to manage and orchestrate multiple data centers in this model.
Figure 2-7 Distributed Clusters and Live Workload Mobility
The key VMDC DCI design choices for the Active-Active metro design are described in Figure 2-8.
Figure 2-8 Active-Active Metro Design Choices
The second model, the Active-Backup metro/geo design, represents a more traditional primary/backup redundancy design, where two independent data centers provide recovery and workload mobility functions across both metro and geo distances, as described in Figure 2-9. This model addresses less stringent RTO/RPO targets, where applications use cold workload mobility/recovery: applications and their corresponding network services are restarted at the recovery location.
Figure 2-9 Active-Backup Metro/Geo Design
This Business Continuity and Workload Mobility design is best suited for moving or migrating "stopped workloads" between Cloud data centers, as described in Figure 2-10. The less stringent RPO/RTO requirements enable the participating data centers to span a geo distance of more than 200 km. In this model, LAN extension between data centers is optional, but may be necessary for operators that need to preserve IP addressing for applications and services. In addition, asynchronous data replication is used to achieve the less stringent RPO/RTO targets.
Figure 2-10 Migrating Stopped Workloads
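A rough feasibility check shows how an RPO target constrains the DCI link for scheduled, batch-style asynchronous replication. The sizing model below is a deliberately simplified assumption: it ignores protocol overhead, compression, and writes that occur during the transfer:

```python
def rpo_feasible(delta_gb, link_mbps, rpo_minutes):
    """For scheduled (batch) asynchronous replication: the delta
    accumulated since the last transfer must ship within one RPO
    window, otherwise data older than the RPO target is at risk.
    Simplified model; overhead and concurrent writes are ignored."""
    transfer_seconds = delta_gb * 8 * 1000 / link_mbps  # GB -> megabits
    return transfer_seconds <= rpo_minutes * 60

# Hypothetical numbers: 100 GB of changed data over a 1 Gb/s link
# takes ~800 s, inside a 15-minute (900 s) RPO window.
print(rpo_feasible(100, 1000, 15))  # True
print(rpo_feasible(100, 100, 15))   # False: ~8000 s on a 100 Mb/s link
```

The same arithmetic run in reverse gives the minimum link speed for a given change rate, which is one reason looser RPO targets are markedly cheaper to meet over geo distances.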
The key VMDC DCI design choices for the Active-Backup metro/geo design are described in Figure 2-11.
Figure 2-11 Active-Backup Metro/Geo Design Choices
It is important to note that BOTH of these design options are typically required by Enterprises and SPs to address their wide range of applications in a cost-efficient way. Therefore, VMDC DCI integrates the Active-Active metro design and the Active-Backup metro/geo design into a single Cloud data center that provides Business Continuity and Workload Mobility for a wide range of applications and RPO/RTO targets.
Based on the recent survey cited in Figure 2-12, almost half of all Enterprises have their primary backup facility within a 250-mile distance. Most Enterprises can therefore implement both metro and geo business continuity and workload mobility models across their current data center locations. Large Tier 1 Service Providers and Enterprises typically span longer distances and many regions.
Figure 2-12 Typical Enterprise Geo-Redundancy2
Top level use cases validated in VMDC DCI are mapped to one of the following design choices:
VMDC DCI used the following design parameters in the Active-Active metro design.
Live Workload Mobility can Solve Specific Business Problems
Hypervisor tools utilized to implement Live Workload Mobility
Metro Data Center Infrastructure to support Live Workload Mobility
– Simplified LAN extensions using Overlay Transport Virtualization (OTV) to preserve IP addressing of applications and support live migrations
– Virtual switches distributed across metro data centers
– Tenant Containers spanning multiple sites
– Maintain traffic QoS and packet markings across metro networks
– Support for a combination of services hosted on physical appliances, as well as virtual services hosted on the UCS
– Minimize traffic tromboning between metro data centers
– Multiple UCS systems across metro DCs to support workload mobility
– Distributed storage clusters spanning metro data centers
Figure 2-13 shows a typical live migration of an active workload. Each tier of the data center is impacted by this use case.
Figure 2-13 Live Workload Mobility
VMDC DCI used the following design parameters in the Active-Backup metro/geo design.
Cold Workload Mobility can solve specific Business problems
Hypervisor tools utilized to implement Cold Workload Mobility
Metro/Geo Data Center Infrastructure to support Cold Workload Mobility
– Simplified LAN extensions using Overlay Transport Virtualization (OTV) to preserve IP addressing of applications
– Multiple UCS systems utilized to house moved workloads at the recovery site
– Create new tenant containers at recovery site to support the moved workloads
– New network containers and services created at new site
– Traffic tromboning between metro DCs can be avoided in many cases
– Virtual Volumes siloed to each DC
Figure 2-14 shows the different infrastructure components involved in the cold migration of a stopped workload. Each tier of the data center is impacted by this use case.
Figure 2-14 Components of Stopped Workload Cold Migration
Top level use case components validated in VMDC DCI are mapped to one of the following design choices:
The Active-Active metro design used in the VMDC DCI system is shown in Figure 2-15. The physical sites are separated by a metro distance of 75 km. Layer 2 LAN extensions are included to support multi-site hypervisor clusters, stretched network containers, and preservation of IP addressing for workloads. Storage is extended between sites to support active-active clusters and synchronous storage replication. Asynchronous storage replication between sites is also provided for less business-critical applications.
Figure 2-15 Active-Active Metro Design Topology
The Active-Backup metro/geo design validated in the VMDC DCI system is shown in Figure 2-16. The physical sites are separated by a geo distance of 1000 km. Layer 2 LAN extensions are optional. Storage is contained within each site. Asynchronous storage replication provides long-distance data replication between sites.
Figure 2-16 Active-Backup Metro/Geo Design Topology
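The 75 km metro and 1000 km geo distances translate directly into replication latency. The sketch below uses the common rule of thumb of roughly 5 microseconds of one-way propagation delay per kilometer of fiber; actual round-trip time also includes equipment and queuing delay, so these figures are lower bounds:

```python
def fiber_rtt_ms(distance_km, propagation_us_per_km=5.0):
    """Round-trip propagation delay over fiber. ~5 us/km one way is a
    common rule of thumb (light in glass travels at roughly 2/3 of c).
    Equipment, serialization, and queuing delay are excluded."""
    return 2 * distance_km * propagation_us_per_km / 1000.0

print(fiber_rtt_ms(75))    # 0.75 ms at metro distance
print(fiber_rtt_ms(1000))  # 10.0 ms at geo distance
```

At the 75 km metro distance, the sub-millisecond RTT added to every synchronous write is tolerable; at 1000 km, each synchronous write would stall for at least 10 ms before acknowledgment, which is why the Active-Backup geo design relies on asynchronous replication.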
Table 2-1 and Table 2-2 list product components for Cisco and Partners, respectively.
Cisco UCS B200 M3/M2