FlexPod is a leading converged infrastructure platform, supporting a broad range of enterprise workloads and use cases including virtualization hypervisors and bare-metal operating systems. FlexPod is a pre-validated architecture adhering to best practices and built on the Cisco Unified Computing System (Cisco UCS), the Cisco Nexus® family of switches, and NetApp storage systems. The FlexPod architecture is highly modular, delivers a baseline configuration, and also has the flexibility to be sized and optimized to accommodate many different use cases and the requirements of the next-generation data center.
FlexPod SolidFire (FlexPod SF) is a specific variant of the FlexPod family, leveraging NetApp SolidFire storage while adhering to configuration best practices for the Nexus and Cisco UCS components. The FlexPod architecture can serve as the foundation for both scale-up (adding resources within a FlexPod unit) and scale-out (adding FlexPod units). The FlexPod SF with Red Hat OpenStack Platform 10 validated design is an extension to the FlexPod portfolio, providing guidance for OpenStack environments.
Red Hat OpenStack Platform 10, engineered from Red Hat-hardened OpenStack Newton code, delivers a stable release for production-scale environments. Red Hat OpenStack Platform 10 adopters have the advantage of immediate access to bug fixes and critical security patches, tight integration with Red Hat’s enterprise security features including SELinux, and a steady release cadence between OpenStack versions.
Cisco Unified Computing System is a next-generation datacenter platform that unifies computing, networking, storage access, and virtualization into a single cohesive system, which makes Cisco UCS an ideal platform for an OpenStack architecture. The combination of the Cisco UCS platform and the Red Hat OpenStack Platform architecture accelerates your IT transformation by enabling faster deployments, greater flexibility of choice, efficiency, and lower risk. Furthermore, the Cisco Nexus family of switches provides the network foundation for the next-generation datacenter.
Automation, virtualization, cost, and ease of deployment are the key criteria for meeting growing IT challenges. Virtualization is a critical strategic driver for reducing the Total Cost of Ownership (TCO), achieving better hardware and software resource utilization, hosting more applications, and improving security. The platform should be flexible, reliable, and cost-effective for multiple enterprise applications.
This document describes the design framework used to deploy Red Hat OpenStack Platform 10 on Cisco and NetApp® FlexPod® converged infrastructure with the NetApp SolidFire® solution. NetApp SolidFire is a block-based all-flash storage solution designed for highly virtualized and automated environments. This implementation provides a simple yet fully integrated and validated infrastructure to deploy VMs of various sizes to suit your application needs. FlexPod provides a very agile platform that allows server hardware to be abstracted into the Cisco UCS service profile framework.
FlexPod SF is equipped with programmable interfaces that allow automation of the Red Hat OpenStack Platform deployment. The Red Hat OpenStack Platform Director is an automation tool for installing and managing a complete OpenStack environment. It is based primarily on TripleO (OpenStack on OpenStack). TripleO takes ownership of installing a fully operational OpenStack environment, including new OpenStack components that provision and control the bare-metal systems used as OpenStack nodes. This provides a simple method for installing a complete Red Hat OpenStack Platform environment with a rich user experience.
The NetApp SolidFire Cinder driver is seamlessly integrated within the Red Hat OpenStack Platform, enabling effortless automated storage provisioning. The driver also exposes underlying enterprise storage features, such as per-volume QoS and seamless scalability, to OpenStack Cinder volumes. Additionally, all the infrastructure hosts are booted from SAN (iSCSI) for non-disruptive operations and upgrade capabilities.
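As a minimal sketch of how per-volume QoS surfaces through Cinder, the snippet below builds the QoS-related extra specs for a volume type. The qos:minIOPS, qos:maxIOPS, and qos:burstIOPS keys follow the SolidFire driver's documented convention; the helper function itself is hypothetical and included only for illustration.

```python
# Hypothetical helper: build Cinder volume-type extra specs carrying
# SolidFire per-volume QoS settings. Key names follow the SolidFire
# Cinder driver convention; the validation logic is illustrative.

def solidfire_qos_extra_specs(min_iops, max_iops, burst_iops):
    """Return extra specs for a Cinder volume type with SolidFire QoS."""
    if not (min_iops <= max_iops <= burst_iops):
        raise ValueError("expected minIOPS <= maxIOPS <= burstIOPS")
    return {
        "qos:minIOPS": str(min_iops),     # guaranteed performance floor
        "qos:maxIOPS": str(max_iops),     # sustained performance ceiling
        "qos:burstIOPS": str(burst_iops), # short-term burst ceiling
    }

specs = solidfire_qos_extra_specs(500, 5000, 8000)
print(specs["qos:burstIOPS"])  # 8000
```

In a live deployment these extra specs would be attached to a volume type (for example, via the Cinder API), so every volume created from that type carries the same guaranteed QoS envelope on the SolidFire cluster.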
As shown in Figure 1, FlexPod SF is deployed with Cisco UCS B-Series servers, Nexus switches, and NetApp SolidFire SF9608 nodes.
Figure 1 FlexPod SF Components
The audience for this document includes, but is not limited to, sales engineers, field consultants, professional services, IT managers, partner engineers, IT architects, and customers who want to take advantage of an infrastructure that is built to deliver IT efficiency and enable IT innovation. The reader of this document is expected to have the necessary training and background to install and configure Red Hat Enterprise Linux, the Cisco Unified Computing System, Cisco Nexus switches, and NetApp storage, as well as a high-level understanding of OpenStack components. External references are provided where applicable, and it is recommended that the reader be familiar with these documents.
Readers are also expected to be familiar with the infrastructure, network, and security policies of the customer installation.
This document describes Red Hat OpenStack Platform 10, which is based on the OpenStack “Newton” release, built on FlexPod SF from Cisco and NetApp. This document discusses design choices and best practices for deploying the shared infrastructure comprised of Cisco UCS, Nexus, and the NetApp SolidFire All-Flash Array.
The FlexPod SF solution is designed with the next generation of Fabric Interconnects, which provide 40GbE ports. The solution uses 40GbE network connectivity on the compute side and a 10GbE storage network. The NetApp SolidFire storage array provides boot disks for all the servers, Cinder volumes, and LUNs for the OpenStack Swift service.
FlexPod SF is designed for customers experiencing specific challenges that are not satisfied by traditional architectures. The five principles of a successful next-generation data center (NGDC) include the following:
· The ability to scale-out offers seamless, transparent resource expansion without the cost and complexity of traditional infrastructure migrations.
· Quality-of-service resource controls must be used across the entire infrastructure or any guarantee is only as good as the weakest link.
· Automation across the stack is vital in the NGDC, where speed and innovation rule.
· Data assurance is achieved when enterprise architects plan for failure while reducing its likelihood.
· Global efficiencies are created by leveraging pools of compute and storage and allocating them for common workloads.
The FlexPod® solution portfolio combines NetApp® storage systems, Cisco® Unified Computing System servers, and Cisco Nexus fabric into a single, flexible architecture. FlexPod architectures can scale up for greater performance and capacity or scale out for environments that require consistent, multiple deployments. FlexPod provides the following:
· Converged infrastructure of compute, network, and storage components from Cisco and NetApp
· A validated enterprise-class IT platform
· Rapid deployment for business-critical applications
· Reduced cost, minimized risk, and increased flexibility and business agility
· Scale up or out for future growth
· Predictable performance and capacity scaling with NetApp SolidFire
This solution is based on the OpenStack “Newton” release, hardened and streamlined by Red Hat OpenStack Platform 10. The advantages of the Cisco Unified Computing System, the NetApp SolidFire storage system, and Red Hat OpenStack Platform combine to deliver an OpenStack Infrastructure as a Service (IaaS) environment that is quick and easy to deploy.
FlexPod SF with Red Hat OpenStack Platform helps IT organizations accelerate cloud deployments while retaining control and choice over their environments with open and interoperable cloud solutions. FlexPod SF with Red Hat OpenStack Platform 10 offers high availability from both a hardware and a software perspective (OpenStack services running in an HA manner). All the storage requirements for the OpenStack deployment (storage for Cinder and Swift) are satisfied by NetApp storage systems.
The solution offers redundant architecture from a compute, network, and storage perspective. The solution consists of the following key components:
· Cisco Unified Computing System (Cisco UCS)
· Cisco UCS Manager
· Cisco UCS 6300 Series Fabric Interconnects
· Cisco UCS 2304 IO Modules (Fabric Extenders)
· Cisco B200 M5 Servers
· Cisco VIC 1340
· Cisco Nexus 9300 Series Switches
· NetApp SolidFire SF9608 All-Flash Array
· Red Hat Enterprise Linux 7.4
· Red Hat OpenStack Platform 10 (Newton)
FlexPod SF System Overview
FlexPod SF is a best-practice, next-generation datacenter architecture that includes these components:
· Cisco Unified Computing System (Cisco UCS)
· Cisco Nexus switches
· NetApp SolidFire storage arrays
These components are connected and configured according to best practices of both Cisco and NetApp, and provide the ideal platform for running a variety of enterprise workloads with confidence. As previously mentioned, the reference architecture covered in this document leverages the Cisco Nexus 9000 Series switch and NetApp SolidFire. One of the key benefits of FlexPod SF is the ability to maintain consistency while scaling, both up and out. Each of the component families shown in Figure 1 (Cisco Unified Computing System, Cisco Nexus, and NetApp storage systems) offers platform and resource options to scale the infrastructure up or down, while supporting the same features and functionality that are required under the configuration and connectivity best practices of FlexPod SF.
As customers transition toward shared infrastructure or cloud computing, they face a number of challenges such as initial transition hiccups, return on investment (ROI) analysis, infrastructure management, and future growth planning. The FlexPod SF architecture is designed to help with proven guidance and measurable value. By introducing standardization, FlexPod SF helps customers mitigate the risk and uncertainty involved in planning, designing, and implementing a new datacenter infrastructure. The result is a more predictable and adaptable architecture capable of meeting and exceeding customers' IT demands. FlexPod SF benefits NGDC use cases by providing:
· Multitenant and multi-app shared infrastructure delivering guaranteed performance
· Programmable infrastructure producing cloud-like agility
· Support for applications born in the cloud
· Flash-enabled storage for performance intensive workloads and consolidation
Cisco and NetApp have thoroughly validated and verified the FlexPod SF solution architecture and its many use cases while creating a portfolio of detailed documentation, information, and references to assist customers in transforming their datacenters to this shared infrastructure model. This portfolio includes, but is not limited to the following items:
· Best practice architectural design
· Workload sizing and scaling guidance
· Implementation and deployment instructions
· Technical specifications (rules for FlexPod configuration dos and don’ts)
· Frequently asked questions (FAQs)
· Cisco Validated Designs (CVDs) and NetApp Verified Architectures (NVAs) focused on a variety of use cases
Cisco and NetApp have also built a robust and experienced support team focused on FlexPod solutions, from customer account and technical sales representatives to professional services and technical support engineers. The Cooperative Support Program extended by NetApp, Cisco, and Red Hat provides customers and channel service partners with direct access to technical experts who collaborate across vendors and have access to shared lab resources to resolve potential issues. FlexPod supports tight integration with virtualized and cloud infrastructures, making it a logical choice for long-term investment. The following IT initiatives are addressed by the FlexPod solution.
FlexPod SF is a pre-validated infrastructure that brings together compute, storage, and network to simplify, accelerate, and minimize the risk associated with datacenter builds and application rollouts. These integrated systems provide a standardized approach in the datacenter that facilitates staff expertise, application onboarding, and automation as well as operational efficiencies relating to compliance and certification.
FlexPod SF is a highly available and scalable infrastructure that IT can evolve over time to support multiple physical and virtual application workloads. FlexPod SF has no single point of failure at any level, from the server through the network, to the storage. The fabric is fully redundant and scalable, and provides seamless traffic failover, should any individual component fail at the physical or virtual layer.
FlexPod SF using the NetApp SolidFire Element OS is a scale-out, all-flash storage platform designed for large-scale infrastructure. SolidFire is unique in its ability to manage storage performance independent of capacity. FlexPod SF guarantees the performance of not just one application, but of thousands of applications within a single storage platform.
FlexPod SF addresses four primary design principles:
· Application availability: Makes sure that services are accessible and ready to use.
· Scalability: Addresses increasing demands with appropriate resources.
· Flexibility: Provides new services or recovers resources without requiring infrastructure modifications.
· Manageability: Facilitates efficient infrastructure operations through open standards and APIs.
IT organizations are rapidly adopting a cloud services model for all IT services. OpenStack is an open-source virtualized infrastructure and management framework that offers a variety of compute, storage, and networking services with a common API layer. This framework helps customers deliver self-service capabilities to end users and streamlines management and operations for IT staff. The Red Hat OpenStack Platform is an industry-leading distribution of OpenStack that delivers the power and flexibility of OpenStack along with enterprise-class support.
FlexPod SF is the ideal physical platform for cloud infrastructures, offering unmatched flexibility, scalability, and nondisruptive operations. SolidFire storage systems were designed for cloud architectures from their inception. They are fully integrated with OpenStack to allow the management of all capabilities from the OpenStack management interfaces.
Features such as robust quality of service (QoS), rapid cloning, and dynamic account creation allow administrators to manage storage resources effectively while still allowing end users the flexibility to provision and manage their own environments. Cisco UCS servers have high-density CPU and memory options and policy-based management that enables highly efficient provisioning and management of compute resources. Cisco Nexus switches include high-bandwidth interconnect capabilities, and both Cisco UCS and Cisco Nexus systems have certified OpenStack Neutron drivers to enable dynamic network configuration from the OpenStack interface.
NetApp and Cisco have partnered with Red Hat to integrate the installation of all necessary drivers and services into Red Hat OpenStack, enabling streamlined deployment and lifecycle management using OpenStack Director. Once deployment is complete, FlexPod SF with Red Hat OpenStack Platform 10 delivers a highly-flexible and automated infrastructure.
The OpenStack framework is primarily used to deliver IaaS. This reference architecture showcases the following features and capabilities that enable IaaS for production as well as development and test environments:
· Block storage services using Cinder
· SolidFire storage using the iSCSI protocol
· QoS integrated with OpenStack
· Cinder storage services catalog using extra specifications and volume types
· Storage multitenancy integrated with OpenStack
· Thin provisioning
· Rapid cloning
· Seamless scalability
· High availability (HA) deployment of OpenStack services using Pacemaker as per best practices
· Integration with Red Hat OpenStack Director for lifecycle management
· Live migration of Nova compute instances
· Neutron network integration with Cisco UCS and Nexus OpenStack drivers
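One of the capabilities listed above, the Cinder storage services catalog, works by matching a volume type's extra specifications against the capabilities each backend reports. The sketch below is a simplified, illustrative model of that matching step; the backend names and capability keys are invented for the example, though volume_backend_name is the standard Cinder filter key.

```python
# Simplified sketch of Cinder-style scheduling: pick the backends whose
# reported capabilities satisfy every extra spec on a volume type.
# Backend names and most capability values here are illustrative.

def matching_backends(extra_specs, backends):
    """Return the names of backends satisfying every extra spec."""
    return [
        name for name, caps in backends.items()
        if all(caps.get(key) == value for key, value in extra_specs.items())
    ]

backends = {
    "solidfire-1": {"volume_backend_name": "solidfire", "thin_provisioning": "true"},
    "generic-1":   {"volume_backend_name": "lvm", "thin_provisioning": "false"},
}

gold_type_specs = {"volume_backend_name": "solidfire"}
print(matching_backends(gold_type_specs, backends))  # ['solidfire-1']
```

In a real deployment the Cinder scheduler performs this filtering automatically: a tenant simply requests a volume of a given type, and the request lands on the SolidFire backend that advertises the matching capabilities.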
Cisco Unified Computing System™ (Cisco UCS) is a next-generation data center platform that integrates computing, networking, storage access, and virtualization resources into a cohesive system designed to reduce total cost of ownership (TCO) and increase business agility. The system integrates a low-latency, lossless 10 or 40 Gigabit Ethernet unified network fabric with enterprise-class, x86-architecture servers. The system is an integrated, scalable, multi-chassis platform in which all resources are managed through a unified management domain.
The Cisco Unified Computing System consists of the following components:
· Compute - The system is based on an entirely new class of computing system that incorporates rack-mount and blade servers based on the latest Intel x86 processors.
· Network - The system is integrated onto a low-latency, lossless, 10Gbps or 40Gbps unified network fabric. This network foundation consolidates Local Area Networks (LANs), Storage Area Networks (SANs), and high-performance computing networks, which are separate networks today. The unified fabric lowers costs by reducing the number of network adapters, switches, and cables, and by decreasing the power and cooling requirements.
· Virtualization - The system unleashes the full potential of virtualization by enhancing the scalability, performance, and operational control of virtual environments. Cisco security, policy enforcement, and diagnostic features are now extended into virtualized environments to better support changing business and IT requirements.
· Storage access - The system provides consolidated access to both SAN storage and Network Attached Storage (NAS) over the unified fabric, and is also an ideal system for Software-Defined Storage (SDS). Combining the benefits of a single framework to manage both compute and storage servers in a single pane, Quality of Service (QoS) can be implemented if needed to throttle I/O in the system. In addition, server administrators can pre-assign storage-access policies to storage resources for simplified storage connectivity and management, leading to increased productivity.
· Management - The system uniquely integrates all system components, enabling the entire solution to be managed as a single entity by Cisco UCS Manager. Cisco UCS Manager has an intuitive graphical user interface (GUI), a command-line interface (CLI), and a powerful scripting library module for Microsoft PowerShell built on a robust application programming interface (API) to manage all system configuration and operations.
Cisco Unified Computing System fuses access layer networking and servers. This high-performance, next-generation server system provides a data center with a high degree of workload agility and scalability.
Cisco UCS Manager
Cisco Unified Computing System Manager (Cisco UCS Manager) provides unified, embedded management for all software and hardware components in Cisco UCS. Using SingleConnect technology, it manages, controls, and administers multiple chassis for thousands of virtual machines. Administrators use the software to manage the entire Cisco Unified Computing System as a single logical entity through an intuitive GUI, a command-line interface (CLI), or an XML API. Cisco UCS Manager resides on a pair of Cisco UCS Fabric Interconnects using a clustered, active-standby configuration for high availability.
Cisco UCS Manager offers a unified embedded management interface that integrates server, network, and storage. It performs auto-discovery to detect, inventory, manage, and provision system components that are added or changed. It offers a comprehensive XML API for third-party integration, exposes 9,000 points of integration, and facilitates custom development for automation, orchestration, and new levels of system visibility and control.
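To give a flavor of the XML API mentioned above, the sketch below builds an aaaLogin request and extracts the session cookie from a response. The aaaLogin method and its inName, inPassword, and outCookie attributes follow the published Cisco UCS Manager XML API; the response string shown is a canned example, not live output from a Fabric Interconnect.

```python
# Sketch of a Cisco UCS Manager XML API login exchange. The aaaLogin
# method and attribute names follow the published XML API; the sample
# response below is fabricated for illustration.
import xml.etree.ElementTree as ET

def build_login_request(username, password):
    """Serialize an aaaLogin request document."""
    elem = ET.Element("aaaLogin", inName=username, inPassword=password)
    return ET.tostring(elem, encoding="unicode")

def parse_login_cookie(response_xml):
    """Pull the session cookie out of an aaaLogin response."""
    return ET.fromstring(response_xml).get("outCookie")

request = build_login_request("admin", "secret")
sample_response = '<aaaLogin response="yes" outCookie="1361917364/abc123"/>'
print(parse_login_cookie(sample_response))  # 1361917364/abc123
```

In practice the request would be POSTed over HTTPS to the Fabric Interconnect's /nuova endpoint, and the returned cookie would accompany every subsequent configuration or query call.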
Service profiles benefit both virtualized and non-virtualized environments and increase the mobility of non-virtualized servers, such as when moving workloads from server to server or taking a server offline for service or upgrade. Profiles can also be used in conjunction with virtualization clusters to bring new resources online easily, complementing existing virtual machine mobility.
For more Cisco UCS Manager Information, refer to: http://www.cisco.com/c/en/us/products/servers-unified-computing/ucs-manager/index.html.
Cisco UCS Fabric Interconnects
The Fabric Interconnects provide a single point of connectivity and management for the entire system. Typically deployed as an active-active pair, the system’s fabric interconnects integrate all components into a single, highly available management domain controlled by Cisco UCS Manager. The fabric interconnects manage all I/O efficiently and securely at a single point, resulting in deterministic I/O latency regardless of a server or virtual machine’s topological location in the system.
Cisco UCS 6200 Fabric Interconnects
Cisco UCS 6200 Series Fabric Interconnects support the system’s 80-Gbps unified fabric with low-latency, lossless, cut-through switching that supports IP, storage, and management traffic using a single set of cables. The fabric interconnects feature virtual interfaces that terminate both physical and virtual connections equivalently, establishing a virtualization-aware environment in which blade servers, rack servers, and virtual machines are interconnected using the same mechanisms. The Cisco UCS 6248UP is a 1-RU Fabric Interconnect that features up to 48 universal ports that can support 1/10 Gigabit Ethernet, Fibre Channel over Ethernet, or native Fibre Channel connectivity.
For more information, visit the following link: http://www.cisco.com/c/en/us/products/servers-unified-computing/ucs-6200-series-fabric-interconnects/index.html.
Figure 2 Cisco UCS 6248UP Fabric Interconnect
Cisco UCS 6300 Series Fabric Interconnects
Cisco UCS 6300 Series Fabric Interconnects provide a 40Gbps unified fabric with double the switching capacity of the Cisco UCS 6200 Series and support higher workload density. When combined with the newer 40Gbps Cisco UCS 2300 Series Fabric Extenders (see the following section for more information), they provide 40GbE/FCoE port connectivity that enables an end-to-end 40GbE/FCoE solution. The unified ports now support 16G FC for high-speed FC connectivity to the SAN.
Two 6300 Series Fabric Interconnect models have been introduced, supporting Ethernet, FCoE, and FC ports.
The Cisco UCS 6332 Fabric Interconnect is a 1RU, top-of-rack Ethernet and FCoE switch with 32 fixed 40-Gigabit QSFP+ ports, offering up to 2.56 Tbps throughput. This Fabric Interconnect is targeted at IP storage deployments requiring high-performance 40 Gbps FCoE connectivity to Cisco MDS switches.
Figure 3 Cisco UCS 6332 Fabric Interconnect – Front and Rear
Cisco UCS 5108 Blade Server Chassis
The Cisco UCS 5100 Series Blade Server Chassis is a crucial building block of the Cisco Unified Computing System, delivering a scalable and flexible blade server chassis. The Cisco UCS 5108 Blade Server Chassis is six rack units (6RU) high and can mount in an industry-standard 19-inch rack. A single chassis can house up to eight half-width Cisco UCS B-Series Blade Servers and can accommodate both half-width and full-width blade form factors. Four single-phase, hot-swappable power supplies are accessible from the front of the chassis. These power supplies are 92 percent efficient and can be configured to support non-redundant, N+1 redundant, and grid-redundant configurations. The rear of the chassis contains eight hot-swappable fans, four power connectors (one per power supply), and two I/O bays for Cisco UCS 2000 Series Fabric Extenders. Two fabric extenders can be deployed to provide higher uplink bandwidth with redundancy. A passive mid-plane provides up to 80 Gbps of I/O bandwidth per server slot and can support future 40 GbE standards.
The Cisco UCS 5108 blade server chassis uses a unified fabric and fabric-extender technology to simplify and reduce cabling by eliminating the need for dedicated chassis management and blade switches. The unified fabric also reduces TCO by reducing the number of network interface cards (NICs), host bus adapters (HBAs), switches, and cables that need to be managed, cooled, and powered. This architecture enables a single Cisco UCS domain to scale up to 20 chassis with minimal complexity.
For more information, please refer to the following link: http://www.cisco.com/c/en/us/products/servers-unified-computing/ucs-5100-series-blade-server-chassis/index.html.
Figure 4 Cisco UCS 5108 Blade Chassis
The enterprise-class Cisco UCS B200 M4 Blade Server extends the capabilities of the Cisco Unified Computing System portfolio in a half-width blade form factor. The Cisco UCS B200 M4 uses the power of the latest Intel® Xeon® E5-2600 v3 and v4 Series processor family CPUs with up to 1536 GB of RAM (using 64 GB DIMMs), two solid-state drives (SSDs) or hard disk drives (HDDs), and up to 80 Gbps throughput connectivity. The Cisco UCS B200 M4 Blade Server mounts in a Cisco UCS 5100 Series blade server chassis or Cisco UCS Mini blade server chassis. It has 24 total slots for registered ECC DIMMs (RDIMMs) or load-reduced DIMMs (LR DIMMs) for up to 1536 GB total memory capacity. It supports one connector for the Cisco VIC 1340 or 1240 adapters, which provide Ethernet and FCoE.
For more information, see http://www.cisco.com/c/en/us/products/servers-unified-computing/ucs-b200-m4-blade-server/index.html.
The Cisco UCS B200 M5 server is a half-width blade. Up to eight servers can reside in the 6-Rack-Unit (6RU) Cisco UCS 5108 Blade Server Chassis, offering one of the highest densities of servers per rack unit of blade chassis in the industry. You can configure the Cisco UCS B200 M5 to meet your local storage requirements without having to buy, power, and cool components that you do not need. The main features of the Cisco UCS B200 M5 include:
· Up to two Intel Xeon Scalable CPUs with up to 28 cores per CPU
· 24 DIMM slots for industry-standard DDR4 memory at speeds up to 2666 MHz, with up to 3 TB of total memory when using 128-GB DIMMs
· Modular LAN On Motherboard (mLOM) card with the Cisco UCS Virtual Interface Card (VIC) 1340, a 2-port, 40 Gigabit Ethernet, Fibre Channel over Ethernet (FCoE)-capable mLOM mezzanine adapter
· Support for up to two optional GPUs
For more information about the Cisco UCS B200 M5, see https://www.cisco.com/c/en/us/products/collateral/servers-unified-computing/ucs-b-series-blade-servers/datasheet-c78-739296.html.
Figure 5 Cisco UCS B200 M5 Blade Server
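The memory capacities quoted for the B200 M4 and M5 follow directly from slot count times DIMM size, which the short check below confirms; the helper function is purely illustrative.

```python
# Sanity check of the quoted B200 memory maximums: both the M4 and M5
# have 24 DIMM slots, so max memory = 24 x largest supported DIMM.
DIMM_SLOTS = 24

def max_memory_gb(dimm_gb, slots=DIMM_SLOTS):
    """Maximum memory in GB for a given DIMM size."""
    return slots * dimm_gb

print(max_memory_gb(64))   # 1536 (B200 M4 with 64-GB DIMMs)
print(max_memory_gb(128))  # 3072, i.e. the 3 TB quoted for the B200 M5
```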
Cisco UCS Fabric Extenders
The Cisco UCS Fabric Extender multiplexes and forwards all traffic from servers in a blade server chassis to a parent Cisco UCS Fabric Interconnect over 10Gbps or 40Gbps unified fabric links. All traffic, including traffic between servers on the same chassis or between virtual machines on the same server, is forwarded to the parent fabric interconnect, where network profiles and policies are maintained and managed by Cisco UCS Manager. Fabric extender technology was developed by Cisco. Up to two fabric extenders can be deployed in a Cisco UCS chassis.
The Cisco UCS Fabric Extender family comprises the Cisco UCS 2200 and 2300 Series fabric extenders for the blade server chassis. The three available models are as follows:
· Cisco UCS 2204XP Fabric Extender has four 10 Gigabit Ethernet, FCoE-capable, SFP+ ports that connect the blade chassis to the fabric interconnect. Each Cisco UCS 2204XP has sixteen 10 Gigabit Ethernet ports connected through the mid-plane to each half-width slot in the chassis. Typically configured in pairs for redundancy, two fabric extenders provide up to 80 Gbps of I/O to the chassis.
· Cisco UCS 2208XP Fabric Extender has eight 10 Gigabit Ethernet, FCoE-capable, Enhanced Small Form-Factor Pluggable (SFP+) ports that connect the blade chassis to the fabric interconnect. Each Cisco UCS 2208XP has thirty-two 10 Gigabit Ethernet ports connected through the mid-plane to each half-width slot in the chassis. Typically configured in pairs for redundancy, two fabric extenders provide up to 160 Gbps of I/O to the chassis.
Figure 6 Cisco UCS 2204XP/2208XP Fabric Extender
· The Cisco UCS 2304 Fabric Extender has four 40 Gigabit Ethernet, FCoE-capable, Quad Small Form-Factor Pluggable (QSFP+) ports that connect the blade chassis to the fabric interconnect. Each Cisco UCS 2304 has eight 40 Gigabit Ethernet ports connected through the mid-plane to the half-width slots in the chassis. Typically configured in pairs for redundancy, two fabric extenders provide up to 320 Gbps of I/O to the chassis.
Figure 7 Cisco UCS 2304 Fabric Extender
For more information, see Cisco UCS 2200 and 2300 Series Fabric Extenders.
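The per-chassis I/O figures quoted for each fabric extender model follow from uplink count times link speed times the number of fabric extenders, as the short check below illustrates (the table of models is assembled from the figures in the bullets above).

```python
# Verify the quoted per-chassis bandwidth: with two fabric extenders,
# chassis I/O = uplink ports per FEX x link speed (Gbps) x FEX count.
FEX_MODELS = {
    "2204XP": {"uplinks": 4, "gbps": 10},  # 4 x 10GbE SFP+ uplinks
    "2208XP": {"uplinks": 8, "gbps": 10},  # 8 x 10GbE SFP+ uplinks
    "2304":   {"uplinks": 4, "gbps": 40},  # 4 x 40GbE QSFP+ uplinks
}

def chassis_bandwidth_gbps(model, fex_count=2):
    """Aggregate chassis I/O bandwidth for a redundant FEX pair."""
    fex = FEX_MODELS[model]
    return fex["uplinks"] * fex["gbps"] * fex_count

for model in FEX_MODELS:
    print(model, chassis_bandwidth_gbps(model))
```

This reproduces the 80, 160, and 320 Gbps figures cited for the 2204XP, 2208XP, and 2304, respectively.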
The Cisco UCS Virtual Interface Card (VIC) 1340 is a 40-Gbps Ethernet or 4 x 10-Gbps Ethernet, FCoE-capable modular LAN on motherboard (mLOM) adapter designed exclusively for the Cisco UCS B-Series Blade Servers. When used in combination with an optional port expander, the Cisco UCS VIC 1340 provides an additional 40 Gbps of uplink bandwidth. The Cisco UCS VIC 1340 enables a policy-based, stateless, agile server infrastructure that can present over 256 PCIe standards-compliant interfaces to the host, which can be dynamically configured as either network interface cards (NICs) or host bus adapters (HBAs). In addition, the Cisco UCS VIC 1340 supports Cisco® Virtual Machine Fabric Extender (VM-FEX) technology, which extends Cisco UCS Fabric Interconnect ports to virtual machines, simplifying server virtualization deployment and management.
For more information, see http://www.cisco.com/c/en/us/products/interfaces-modules/ucs-virtual-interface-card-1340/index.html.
The Cisco Unified Computing System is revolutionizing the way servers are managed in the datacenter. The following are the unique differentiators of Cisco UCS and Cisco UCS Manager:
1. Embedded Management —In Cisco UCS, the servers are managed by the embedded firmware in the Fabric Interconnects, eliminating the need for any external physical or virtual devices to manage the servers.
2. Unified Fabric —In Cisco UCS, from the blade server chassis or rack servers to the FI, a single Ethernet cable is used for LAN, SAN, and management traffic. This converged I/O reduces the number of cables, SFPs, and adapters, reducing the capital and operational expenses of the overall solution.
3. Auto Discovery —By simply inserting a blade server in the chassis or connecting a rack server to the fabric interconnect, discovery and inventory of the compute resource occurs automatically without any management intervention. The combination of unified fabric and auto-discovery enables the wire-once architecture of Cisco UCS, where the compute capability of Cisco UCS can be extended easily while keeping the existing external connectivity to LAN, SAN, and management networks.
4. Policy Based Resource Classification —When a compute resource is discovered by Cisco UCS Manager, it can be automatically classified to a given resource pool based on policies defined. This capability is useful in multi-tenant cloud computing. This CVD showcases the policy based resource classification of Cisco UCS Manager.
5. Combined Rack and Blade Server Management —Cisco UCS Manager can manage Cisco UCS B-Series blade servers and Cisco UCS C-Series rack servers under the same Cisco UCS domain. This feature, along with stateless computing, makes compute resources truly hardware form-factor agnostic.
6. Model based Management Architecture —Cisco UCS Manager Architecture and management database is model based and data driven. An open XML API is provided to operate on the management model. This enables easy and scalable integration of Cisco UCS Manager with other management systems.
7. Policies, Pools, Templates —The management approach in Cisco UCS Manager is based on defining policies, pools, and templates instead of cluttered configuration, which enables a simple, loosely coupled, data-driven approach to managing compute, network, and storage resources.
8. Loose Referential Integrity —In Cisco UCS Manager, a service profile, port profile, or policy can refer to other policies or logical resources with loose referential integrity: a referred policy need not exist when the referring policy is authored, and a referred policy can be deleted even while other policies refer to it. This allows subject matter experts from different domains, such as network, storage, security, server, and virtualization, to work independently of each other while collaborating on a complex task.
9. Policy Resolution —In Cisco UCS Manager, a tree of organizational units can be created that mimics real-life tenant and organization relationships. Policies, pools, and templates can be defined at different levels of the organization hierarchy. A policy referring to another policy by name is resolved to the closest matching policy in the organization hierarchy. If no policy with the specified name is found up to the root organization, the special policy named "default" is searched for instead. This policy resolution practice enables automation-friendly management APIs and provides great flexibility to owners of different organizations.
10. Service Profiles and Stateless Computing —A service profile is a logical representation of a server, carrying its various identities and policies. This logical server can be assigned to any physical compute resource as long as it meets the resource requirements. Stateless computing enables procurement of a server within minutes, a process that used to take days in legacy server management systems.
11. Built-in Multi-Tenancy Support —The combination of policies, pools, and templates, loose referential integrity, policy resolution in the organization hierarchy, and a service profile-based approach to compute resources makes Cisco UCS Manager inherently friendly to the multi-tenant environments typically observed in private and public clouds.
12. Extended Memory —The enterprise-class Cisco UCS B200 M5 blade server extends the capabilities of the Cisco Unified Computing System portfolio in a half-width blade form factor. The Cisco UCS B200 M5 harnesses the power of the latest Intel® Xeon® processor family CPUs with up to 3 TB of total memory when using 128 GB DIMMs, allowing the huge VM-to-physical-server ratios required in many deployments, or the large-memory operations required by certain architectures such as big data.
13. Virtualization Aware Network —VM-FEX technology makes the access network layer aware of host virtualization. This keeps the compute and network domains cleanly separated, with the virtual network managed through port profiles defined by the network administration team. VM-FEX also offloads the hypervisor CPU by performing switching in hardware, freeing hypervisor CPU cycles for virtualization tasks. VM-FEX technology is well integrated with VMware vCenter, Linux KVM, and Hyper-V SR-IOV to simplify cloud management.
14. Simplified QoS —Even though Fibre Channel and Ethernet are converged in the Cisco UCS fabric, built-in support for QoS and lossless Ethernet makes the convergence seamless. Network Quality of Service (QoS) is simplified in Cisco UCS Manager by representing all system classes in one GUI panel.
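The open XML API called out in differentiator 6 is exercised through plain HTTP POSTs of small XML documents. The sketch below builds the two most common payloads only: aaaLogin, which opens a session and returns a cookie, and configResolveClass, which queries the management model by class (here computeBlade, for blade inventory). The credentials and cookie are placeholders, and the transport (an HTTPS POST to the UCS Manager endpoint) is intentionally omitted.

```python
import xml.etree.ElementTree as ET

def aaa_login(username, password):
    # aaaLogin opens a UCS Manager XML API session; the response carries outCookie.
    el = ET.Element("aaaLogin", inName=username, inPassword=password)
    return ET.tostring(el, encoding="unicode")

def resolve_class(cookie, class_id):
    # configResolveClass queries the management model by class, e.g. computeBlade.
    el = ET.Element("configResolveClass", cookie=cookie,
                    classId=class_id, inHierarchical="false")
    return ET.tostring(el, encoding="unicode")

# Placeholder values; real payloads are POSTed over HTTPS to UCS Manager.
print(aaa_login("admin", "password"))
print(resolve_class("<cookie-from-login>", "computeBlade"))
```

Because every object in the model is addressable the same way, higher-level tools can be layered on this API without screen-scraping or device-specific logic.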
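The name-based policy resolution described in differentiator 9 can be modeled as a lookup that walks from the referring organization up toward root, and then repeats the walk for the special "default" policy. This is an illustrative model only, not UCS Manager code; the organization paths and policy names below are invented.

```python
def resolve_policy(org_path, name, policies):
    # policies maps (org-path, policy-name) -> policy definition.
    # First look for the named policy from the referring org up to root,
    # then fall back to the special "default" policy the same way.
    for wanted in (name, "default"):
        parts = org_path.split("/")
        while parts:
            hit = policies.get(("/".join(parts), wanted))
            if hit is not None:
                return hit
            parts.pop()
    return None

policies = {
    ("root", "default"): "root-default-bios",
    ("root/tenantA", "gold-bios"): "tenantA-gold-bios",
}
print(resolve_policy("root/tenantA/app1", "gold-bios", policies))  # tenantA-gold-bios
print(resolve_policy("root/tenantB", "gold-bios", policies))       # root-default-bios
```

The closest-match behavior lets a tenant override a policy locally while all other organizations transparently inherit the root-level default.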
Cisco UCS for OpenStack
Cloud-enabled applications can run on organization premises, in public clouds, or on a combination of the two (hybrid cloud) for greater flexibility and business agility. Finding a platform that supports all these scenarios is essential. With Cisco UCS, IT departments can take advantage of technological advancements and lower the cost of their OpenStack deployments.
1. Open Architecture—A market-leading, open alternative to expensive, proprietary environments, the simplified architecture of Cisco UCS running OpenStack software delivers greater scalability, manageability, and performance at a significant cost savings compared to traditional systems, both in the datacenter and the cloud. Using industry-standard x86-architecture servers and open source software, IT departments can deploy cloud infrastructure today without concern for hardware or software vendor lock-in.
2. Accelerated Cloud Provisioning—Cloud infrastructure must be able to flex on demand, providing infrastructure to applications and services on a moment’s notice. Cisco UCS simplifies and accelerates cloud infrastructure deployment through automated configuration. The abstraction of server identity, personality, and I/O connectivity from the hardware allows these characteristics to be applied on demand. Every aspect of a server’s configuration, from firmware revisions and BIOS settings to network profiles, can be assigned through Cisco UCS service profiles. Cisco service profile templates establish policy-based configuration for server, network, and storage resources and can be used to logically preconfigure these resources even before they are deployed in the cloud infrastructure.
3. Simplicity at Scale—With IT departments challenged to deliver more applications and services in shorter time frames, the architectural silos that result from an ad hoc approach to capacity scaling with traditional systems pose a barrier to successful cloud infrastructure deployment. Start with the computing and storage infrastructure needed today, and then scale easily by adding components. Because servers and storage systems integrate into the unified system, they do not require additional supporting infrastructure or expert knowledge. The system simply, quickly, and cost-effectively presents more computing power and storage capacity to cloud infrastructure and applications.
4. Virtual Infrastructure Density—Cisco UCS enables cloud infrastructure to meet ever-increasing guest OS memory demands on fewer physical servers. The system’s high-density design increases consolidation ratios for servers, saving the capital, operating, physical space, and licensing costs that would be needed to run virtualization software on larger servers. With the Cisco UCS B200 M4 and the latest Intel Xeon E5-2600 v3 series processors supporting up to 1536 GB of RAM (using 64 GB DIMMs), OpenStack deployments can host more applications using less-expensive servers without sacrificing performance.
5. Simplified Networking—In OpenStack environments, the underlying infrastructure can become a sprawling complex of networked systems. Unlike traditional server architecture, Cisco UCS provides greater network density with less cabling and complexity. Cisco’s unified fabric integrates Cisco UCS servers with a single high-bandwidth, low-latency network that supports all system I/O. This approach simplifies the architecture and reduces the number of I/O interfaces, cables, and access-layer switch ports compared to the requirements for traditional cloud infrastructure deployments. This unification can reduce network complexity by up to a factor of three, and the system’s wire-once network infrastructure increases agility and accelerates deployment with zero-touch configuration.
6. Installation Confidence—Organizations that choose OpenStack for their cloud can take advantage of the Red Hat OpenStack Platform Director. This software performs the work needed to install a validated OpenStack deployment. Unlike other solutions, this approach provides a highly available, highly scalable architecture for OpenStack services.
7. Easy Management—Cloud infrastructure can be extensive, so it must be easy and cost effective to manage. Cisco UCS Manager provides embedded management of all software and hardware components in Cisco UCS. Cisco UCS Manager resides as embedded software on the Cisco UCS fabric interconnects, fabric extenders, servers, and adapters. No external management server is required, simplifying administration and reducing capital expenses for the management environment.
The Cisco Nexus 9000 Series delivers proven high performance and density, low latency, and exceptional power efficiency in a broad range of compact form factors. Operating in Cisco NX-OS Software mode or in Application Centric Infrastructure (ACI) mode, these switches are ideal for traditional or fully automated data center deployments.
Figure 8 Cisco Nexus 9000 Series Switch
The Cisco Nexus 9000 Series Switches offer both modular and fixed 10/40/100 Gigabit Ethernet switch configurations with scalability up to 30 Tbps of non-blocking performance with less than five-microsecond latency, 1152 x 10 Gbps or 288 x 40 Gbps non-blocking Layer 2 and Layer 3 Ethernet ports, and wire-speed VXLAN gateway, bridging, and routing.
Cisco Nexus 93180YC-EX switches are used in the validation of this solution and offer the following benefits:
· High-performance, non-blocking architecture in a 1RU form factor, with 1.8 Tbps of bandwidth and latency of less than 2 microseconds
· 48 fixed 1/10/25-Gbps SFP+ ports and 6 fixed 40/100-Gbps QSFP+ ports for uplink connectivity
· Built-in hardware sensors for Cisco Tetration Analytics enable line-rate data collection and rich telemetry. Visibility into real-time buffer utilization per port and per queue enable monitoring of micro-bursts and application traffic patterns.
· 1+1 redundant hot-swappable power supplies and 3+1 redundant hot-swappable fan trays
· Simplified Operations through PXE and POAP support, automation and configuration management with DevOps tools (Puppet, Chef, Ansible) and Open NX-OS, API support for HTTP(S) management through RPC, JSON, or XML, and Python-based programmatic on-box or off-box access to the switch
· Investment Protection through 10/25-Gbps access connectivity and 40/100-Gbps uplink connectivity that facilitate migration to faster speeds. Use of Cisco 40-Gbps bidirectional transceivers enables reuse of existing 10 Gigabit Ethernet multimode cabling for 40 Gigabit Ethernet.
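As a sketch of the NX-OS programmability listed above, the following builds the JSON-RPC payload that NX-API expects for running a CLI command. The payload is only constructed and printed here; on a switch with NX-API enabled it would be POSTed to the switch's /ins endpoint with content type application/json-rpc, and the switch address and command are placeholders.

```python
import json

def nxapi_cli(cmd):
    # NX-API accepts a list of JSON-RPC 2.0 requests; the "cli" method
    # executes a single show/config command and returns structured output.
    return json.dumps([{
        "jsonrpc": "2.0",
        "method": "cli",
        "params": {"cmd": cmd, "version": 1},
        "id": 1,
    }])

print(nxapi_cli("show version"))
```

Returning structured JSON instead of screen text is what makes the switch consumable by Puppet, Chef, Ansible, and custom Python tooling.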
For more information, see http://www.cisco.com/c/en/us/support/switches/nexus-93180yc-ex-switch/model.html#~tab-documents.
In summary, FlexPod SF with Red Hat OpenStack Platform 10, leveraging Cisco Nexus Cloud Scale switches, can not only improve the response time and performance of application traffic, including storage traffic, but also provide innovative features and capabilities important for the next-generation data center.
NetApp SolidFire adds to the FlexPod portfolio of storage solutions and delivers scale-out, all-flash storage for the next-generation data center. NetApp SolidFire is a programmable all-flash array that enables storage automation for the non-expert in storage. Starting with a minimum of four nodes, customers can scale their storage cluster as needed based on tenant or workload requirements, all while configuring individual performance guarantees for each tenant or workload. NetApp SolidFire is the preferred storage array for service providers, enterprises managing multiple workloads or tenants on common infrastructure, and customers looking for simple-to-use storage resources.
The NetApp SolidFire SF9608 storage appliance is an iSCSI-based, all-flash storage solution built on the Cisco UCS C220 M4S server platform. Leveraging industry-leading QoS and scale-out capabilities, NetApp SolidFire can deliver guaranteed performance to hundreds of applications on a single, highly available storage array. SolidFire storage nodes leverage high-bandwidth 10GbE connectivity to provide up to 20Gbps of throughput per storage node. The shared-nothing, scale-out architecture of NetApp SolidFire allows you to scale storage capacity and performance by adding nodes to the storage array without downtime.
Figure 9 illustrates the front and rear of the SolidFire SF9608 node.
Figure 9 NetApp SolidFire SF9608
The SolidFire SF9608 comes in a 1RU form factor with the following specifications:
· 75,000 IOPS per node available for workloads
· 7.6TB of raw capacity per node
· CPU: 2 x 2.6 GHz CPUs (Intel Xeon E5-2640 v3)
· Memory: 256GB of RAM
· 8 x 960GB SSD drives (non-SED)
The SF9608 provides the following physical interfaces:
· 2 x 1 GbE interfaces (Bond1G): Optionally used as the in-band management interface of the node, and/or the API endpoint
· 2 x 10 GbE interfaces (Bond10G): Used for iSCSI storage and intra-cluster communication
· 1 x 1 GbE interface (CIMC): Out-of-Band management of the node
Each pair of 1GbE and 10GbE interfaces is bundled into a single logical interface (Bond1G, Bond10G) to provide redundancy and higher aggregate uplink bandwidth. By default, both bonds are set to Active/Passive bond mode. LACP bond mode is recommended and is used on the 10G storage uplinks.
A SolidFire storage array, also known as a SolidFire cluster, is composed of a minimum of four SF9608 storage nodes, scalable up to 40 nodes in a single storage cluster. Within the cluster, each SolidFire node’s CPU, networking, and storage contribute to the available resources of the overall cluster. As nodes are added to or removed from the cluster, the cluster’s overall available resources scale linearly.
Initially, all SolidFire storage nodes exist as stand-alone nodes, but they cannot serve data until they are joined to a cluster. After the proper network settings and proposed cluster name are configured on each node, a cluster can be created and nodes can be added. With initial cluster creation, one of the nodes from the cluster group is elected to be the cluster master. The cluster master is responsible for overall cluster management and decision making as well as redirecting iSCSI initiators to the proper member storage node(s) for volume access.
Administrators and hosts can access the cluster using virtual IP addresses. Any node in the cluster can host the virtual IP addresses. The Management Virtual IP (MVIP) enables cluster management through a 1GbE connection, while the Storage Virtual IP (SVIP) enables host access to storage through a 10GbE connection.
These virtual IP addresses enable consistent connections regardless of the size or makeup of the cluster. If a node hosting a virtual IP address fails, another node in the cluster begins hosting the virtual addresses.
NetApp SolidFire leverages the iSCSI storage protocol, a standard way of encapsulating SCSI commands on a traditional TCP/IP network. NetApp SolidFire is a block-only array by design because iSCSI is a simple-to-use, standards-based method of communicating between hosts and storage. When iSCSI standards change or the performance of Ethernet networks improves, the iSCSI storage protocol benefits without the need for any changes.
One iSCSI feature that NetApp SolidFire takes advantage of is iSCSI login redirection. When a host communicates with storage by using iSCSI, a login request is the first step in establishing a session.
This login request includes the following information:
· Host initiator iSCSI Qualified Name (IQN)
· iSCSI target IQN
· Authentication information
· Parameters that allow the host and storage to negotiate
An iSCSI session is established when an iSCSI initiator successfully connects with an iSCSI target and logs in. With NetApp SolidFire, the iSCSI target IQN is assigned on a per-volume basis. Although all storage nodes have a management IP and a storage IP, NetApp SolidFire advertises a single SVIP address for all storage traffic of the cluster. As part of the iSCSI login process, the storage can respond that the target volume has been moved to a different address and therefore it cannot proceed with the negotiation process. The host then reissues the login request to the new address in a process that requires no host-side reconfiguration. This process is known as iSCSI login redirection.
iSCSI login redirection is a key part of the NetApp SolidFire cluster. When a host login request is received, the node decides which member of the cluster should handle the traffic based on the IOPS and capacity requirements for the volume. Volumes are distributed across the SolidFire cluster and are redistributed if a single node is handling too much traffic for its volumes or a new node is added. Multiple copies of a given volume are allocated across the array. In this manner, if a node failure is followed by volume redistribution, there is no effect on host connectivity beyond a logout and login with redirection to the new location. With iSCSI redirection, NetApp SolidFire is a self-healing, scale-out architecture capable of nondisruptive upgrades and operations.
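The redirection flow above can be illustrated with a toy model (this is not Element OS code): the cluster keeps a placement map of target IQNs to member nodes and answers each login aimed at the SVIP with a redirect to the node currently hosting the volume. The IQN and addresses below are invented.

```python
def handle_login(volume_placement, target_iqn):
    # volume_placement maps a per-volume target IQN to the storage IP of
    # the node currently serving that volume. The cluster answers a login
    # aimed at the SVIP with a redirect to that node.
    node = volume_placement.get(target_iqn)
    if node is None:
        return {"status": "login-failed"}
    return {"status": "redirect", "target-address": node}

placement = {"iqn.example:cluster.vol1.42": "10.0.10.11"}
print(handle_login(placement, "iqn.example:cluster.vol1.42"))
```

Because the host always dials the same SVIP and simply follows redirects, volumes can move between nodes (for rebalancing or failure recovery) without any host-side reconfiguration.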
Each SF9608 node contains 8 SSD drives that are used to store a portion of the data for the cluster. Data is distributed so that individual drive or node failures do not affect availability or performance. Two types of drives exist in a SolidFire storage node:
· Metadata drives: Store compressed information that defines and tracks each volume, clone, or NetApp Snapshot® within a cluster.
· Block drives: Store the compressed, de-duplicated data blocks for server application volumes.
The drive in slot 1 acts as both the boot and metadata drive. Note that if the slot 1 drive fails, the SF node becomes unresponsive and requires immediate replacement of the failed drive. However, the cluster works around the failed node and provides access to the same data through other nodes. From a client perspective, iSCSI timeouts trigger failover to other nodes to maintain access to data.
Volumes (provisioned banks of storage) in a SolidFire storage cluster are assigned to an account and are accessed through a unique iSCSI target IQN. Volumes possess the following characteristics:
· Each volume has a unique iSCSI IQN address and a single LUN on that target.
· When a volume is provisioned, it consumes only a small amount of space on the metadata drives.
· Each volume is associated with a single tenant account (or owner) at the time of creation. The account is used for security control, iSCSI access, and reporting purposes.
· SolidFire uses Advanced Format (4K) blocks internally. Volumes can be accessed using 4K native (4Kn) sectors for compatible operating systems, or 512 emulation (512e) for legacy operating systems. When partitions are properly aligned, there is no performance effect from using 512e mode.
· NetApp SolidFire Helix® promotes data availability because the metadata for an individual volume is stored on a metadata drive and replicated to a secondary metadata drive for redundancy.
Accounts are a multitenancy construct that allows for logical partitioning of the available storage resources. There are two types of accounts on a SolidFire cluster:
· Cluster admin accounts. These accounts are used to administer and manage a SolidFire cluster.
· Tenant accounts. These accounts are used to control access to SolidFire volumes from iSCSI initiators. When a unique account user name is created, CHAP initiator and target passwords are defined, and are used to secure volume access based on the account assigned to the volume.
A maximum of 2000 volumes can be assigned to a tenant account. A volume can have only one tenant account assigned to it.
As part of the base configuration, you should create an infrastructure “administrator” tenant account to host boot and other infrastructure-related volumes.
A volume access group is a collection of volumes that users can access using iSCSI initiators. Access groups are created by mapping iSCSI initiator IQNs to a collection of volumes. Each IQN in the access group can access each volume in the group without requiring CHAP authentication.
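Tenant accounts and volume access groups are created through the SolidFire Element API, a JSON-RPC interface reached at the cluster MVIP. The sketch below only constructs the request payloads for the AddAccount and CreateVolumeAccessGroup methods; the account name, secrets, initiator IQN, and volume ID are placeholders, and the API version segment of the endpoint varies by Element OS release.

```python
import json

def element_rpc(method, params):
    # Payload for the Element JSON-RPC API; real requests are POSTed
    # over HTTPS to the cluster MVIP's json-rpc endpoint.
    return json.dumps({"method": method, "params": params, "id": 1})

# Tenant account whose CHAP secrets gate volume access (placeholder values).
print(element_rpc("AddAccount", {
    "username": "tenant-a",
    "initiatorSecret": "init-secret-placeholder",
    "targetSecret": "tgt-secret-placeholder",
}))

# Access group granting the listed initiator CHAP-less access to volume 42.
print(element_rpc("CreateVolumeAccessGroup", {
    "name": "tenant-a-vag",
    "initiators": ["iqn.1994-05.com.redhat:compute-host1"],
    "volumes": [42],
}))
```

The same payload-building pattern applies to the rest of the Element API, which is what makes the array straightforward to automate.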
A SolidFire cluster allows QoS to be dynamically configured on a per-volume basis. Per-volume QoS settings can be used to control storage performance based on customer-defined SLAs.
There are three configurable parameters that define the QoS:
· Min IOPS. The minimum number of sustained inputs and outputs per second (IOPS) that are provided by the SolidFire cluster to a volume. The Min IOPS configured for a volume is the guaranteed level of performance for a volume. Per-volume performance does not drop below this level.
· Max IOPS. The maximum number of sustained IOPS that are provided by the SolidFire cluster to a particular volume.
· Burst IOPS. The maximum number of IOPS allowed in a short burst scenario. If a volume has been running below its Max IOPS, burst credits are accumulated. When performance demands spike to maximum levels, short bursts of IOPS beyond Max IOPS are allowed on the volume.
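The interplay of the three settings can be illustrated with a simple credit model. This is an illustration of the documented behavior, not the actual Element OS scheduler: running below Max IOPS banks credits (capped here at Burst IOPS), and a later spike may spend them to exceed Max IOPS, up to Burst IOPS.

```python
def allowed_iops(requested, max_iops, burst_iops, credits):
    # Returns (granted IOPS, remaining burst credits) for one interval.
    if requested <= max_iops:
        # Running below Max IOPS accumulates credits, capped for illustration.
        credits = min(credits + (max_iops - requested), burst_iops)
        return requested, credits
    # A spike may spend credits to exceed Max IOPS, never beyond Burst IOPS.
    extra = min(requested, burst_iops) - max_iops
    spend = min(extra, credits)
    return max_iops + spend, credits - spend

granted, credits = allowed_iops(5000, max_iops=10000, burst_iops=15000, credits=0)
print(granted, credits)   # 5000 5000
granted, credits = allowed_iops(14000, 10000, 15000, credits)
print(granted, credits)   # 14000 1000
```

In the second interval the volume briefly runs at 14,000 IOPS, above its 10,000 Max IOPS, by spending credits earned while it was quiet.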
With NetApp SolidFire, dedicated storage silos are a thing of the past. SolidFire’s combination of advanced QoS, security, and flexible networking features facilitate mixed-workload and multi-tenant environments on a single SolidFire cluster while guaranteeing individual application performance.
Secure multi-tenancy is achieved through the following features:
· Secure Authentication: CHAP for secure volume access; LDAP for secure access to the cluster for management and reporting.
· Volume Access Groups (VAGs): Optionally, VAGs can be utilized in lieu of CHAP authentication, mapping any number of iSCSI initiator IQNs to one or more volumes. To access a volume in a VAG, the initiator’s IQN must be in the allowed IQN list for the group of volumes.
· Tenant VLANs: At the network level, end-to-end network security between iSCSI initiators and SolidFire storage arrays is facilitated by using VLANs. For any VLAN created to isolate a workload or tenant, SolidFire creates a separate iSCSI target SVIP that is accessible only through the specific VLAN.
· VRF-Enabled VLANs: To further support security and scalability in the data center, SolidFire allows one to enable any tenant VLAN for “VRF-like” functionality. This feature adds two key capabilities:
- L3 routing to a tenant SVIP – This feature allows you to situate iSCSI initiators on a separate network/VLAN from that of the SolidFire storage array.
- Overlapping/Duplicate IP subnets – Provides the ability to add a template to tenant environments, allowing each respective tenant VLAN to be assigned IP addresses from the same IP subnet. This can be useful in service provider environments where scale and preservation of IP space is important.
The SolidFire storage system leverages key features to increase the overall storage efficiency and performance. These features are performed inline, are always on, and require no manual configuration by the user.
· Deduplication. The system only stores unique 4K blocks. Any duplicate 4K blocks are automatically associated to an already stored version of the data. Data is on block drives and is mirrored by using the SolidFire Helix feature. This system significantly reduces capacity consumption and write operations within the system.
· Compression. Compression is performed inline before data is written to NVRAM. Data is compressed and stored in 4K blocks and, once compressed, it remains compressed in the system. This significantly reduces capacity consumption, write operations, and bandwidth consumption across the cluster.
· Thin provisioning. This capability provides the right amount of storage at the time it is needed, eliminating the capacity consumption caused by overprovisioned or underutilized volumes.
· SolidFire Helix®. The metadata for an individual volume is stored on a metadata drive and replicated to a secondary metadata drive for redundancy.
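The always-on deduplication and compression described above can be illustrated with a toy 4K-block store (purely illustrative; Element OS uses its own hashing and data layout): duplicate blocks are stored once as references, and each unique block is compressed before it is kept.

```python
import hashlib
import zlib

BLOCK = 4096  # SolidFire works internally on 4K blocks

def store(data, block_store):
    # block_store maps content digest -> compressed block; refs is the
    # per-volume list of digests that reconstructs the data in order.
    refs = []
    for i in range(0, len(data), BLOCK):
        chunk = data[i:i + BLOCK]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in block_store:          # deduplication: unique blocks only
            block_store[digest] = zlib.compress(chunk)  # inline compression
        refs.append(digest)
    return refs

blocks = {}
refs = store(b"A" * BLOCK * 3 + b"B" * BLOCK, blocks)
print(len(refs), len(blocks))  # 4 2
```

Four logical blocks reduce to two stored (and compressed) blocks, which is why duplicate-heavy workloads such as boot volumes see large capacity and write-amplification savings.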
Red Hat OpenStack Platform Director manages the provisioning of Red Hat OpenStack Platform components on a set of machines. It provides a web-based graphical user interface for managing the installation, configuration, and scaling of OpenStack environments, discovering bootable hosts and mapping OpenStack services to them through the web interface. Director uses DHCP, DNS, and PXE services to perform OpenStack deployment on remote hosts.
Red Hat Enterprise Linux 7.4 lays the foundation for the open hybrid cloud and serves enterprise workloads across converged infrastructure. Red Hat Enterprise Linux 7.4 works on four platforms: Bare metal servers, virtual machines (VM), OpenStack based Infrastructure-as-a-Service (IaaS), and Platform-as-a-Service (PaaS) clouds. These, in turn, can be used together to form a robust, powerful datacenter and cloud environment for business. While Red Hat Enterprise Linux 7.4 still uses Kernel Virtual Machine (KVM) for datacenter and cloud virtualization, it also adopts container technology so that users can get even more applications working on the same server hardware. Red Hat Enterprise Linux 7.4 provides many stability and performance upgrades.
Red Hat OpenStack Platform provides an Infrastructure-as-a-Service (IaaS) foundation for public, private, or hybrid cloud computing environments on top of Red Hat Enterprise Linux. Red Hat OpenStack Platform meets enterprise requirements with the ability to scale extensively and provide a fault-tolerant, highly available environment.
OpenStack is made up of many different moving parts. Because of its open nature, anyone can add additional components to OpenStack to meet their requirements. The Red Hat OpenStack Platform IaaS cloud is implemented by a collection of interacting services that control its computing, storage, and networking resources.
Red Hat OpenStack Platform offers its IaaS cloud as a collection of interacting services that control compute, networking, and storage resources. The cloud can be managed either through a GUI-based dashboard or through a rich set of CLI commands that allow a cloud administrator to control, provision, automate, and orchestrate OpenStack resources. OpenStack also offers an extensive REST-based API that can be leveraged to integrate the platform with other automation and orchestration tools.
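As an example of the REST-based API, authenticating against the Identity service is a POST of a JSON document to the Keystone v3 /v3/auth/tokens endpoint; the returned token then authorizes calls to the other services. The sketch below builds that body only, and the user, project, and password values are placeholders.

```python
import json

def keystone_auth(user, password, project, domain="Default"):
    # Keystone v3 password authentication body, scoped to a project.
    return json.dumps({
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {"user": {
                    "name": user,
                    "domain": {"name": domain},
                    "password": password,
                }},
            },
            "scope": {"project": {
                "name": project,
                "domain": {"name": domain},
            }},
        }
    })

body = keystone_auth("admin", "placeholder-password", "admin")
print(json.loads(body)["auth"]["identity"]["methods"])  # ['password']
```

Every other service call carries the issued token in the X-Auth-Token header, which is how the CLI, dashboard, and third-party orchestration tools all share one authorization model.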
Figure 10 provides a high-level view of OpenStack core services architecture.
Figure 10 OpenStack Core Services
Table 1 describes each component that forms interacting cloud services for OpenStack.
Table 1 OpenStack Core Services
· Dashboard (horizon): Web browser-based dashboard that you use to manage OpenStack services.
· Identity (keystone): Centralized service for authentication and authorization of OpenStack services and for managing users, projects, and roles.
· OpenStack Networking (neutron): Provides connectivity between the interfaces of OpenStack services.
· Block Storage (cinder): Manages persistent block storage volumes for virtual machines.
· Compute (nova): Manages and provisions virtual machines running on hypervisor nodes.
· Image (glance): Registry service that you use to store resources such as virtual machine images and volume snapshots.
· Object Storage (swift): Allows users to store and retrieve files and arbitrary data.
· Telemetry (ceilometer): Provides measurements of cloud resources.
· Orchestration (heat): Template-based orchestration engine that supports automatic creation of resource stacks.
For more information, see https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/product_guide/ch-rhosp-software.
The Red Hat OpenStack Platform director is a toolset for installing and managing a complete OpenStack environment. It is based primarily on the OpenStack project TripleO, which is an abbreviation for "OpenStack-On-OpenStack". This project takes advantage of OpenStack components to install a fully operational OpenStack environment; this includes new OpenStack components that provision and control bare metal systems to use as OpenStack nodes. This provides a simple method for installing a complete Red Hat OpenStack Platform environment that is both lean and robust. The Red Hat OpenStack Platform director uses two main concepts: an Undercloud and an Overcloud. The Undercloud installs and configures the Overcloud.
Red Hat OpenStack Platform 10 employs several technologies to implement high-availability. High availability is offered in different ways for controller, compute, and storage nodes in your OpenStack configuration. To investigate how high availability is implemented, log into each node and run commands, as described in the following sections. The resulting output shows you the high availability services and processes running on each node.
Most of the coverage of high availability (HA) in this document relates to controller nodes. There are three primary HA technologies used on Red Hat OpenStack Platform 10 controller nodes:
· Pacemaker: By configuring virtual IP addresses, services, and other features as resources in a cluster, Pacemaker makes sure that the defined set of OpenStack cluster resources are running and available. When a service or an entire node in a cluster fails, Pacemaker can restart the service, take the node out of the cluster, or reboot the node. Requests to most of these services are routed through HAProxy.
· HAProxy: When you configure more than one controller node with the director in Red Hat OpenStack Platform 10, HAProxy is configured on those nodes to load balance traffic to some of the OpenStack services running on those nodes.
· Galera: Red Hat OpenStack Platform uses the MariaDB Galera Cluster to manage database replication.
Highly available services in OpenStack run in one of two modes:
· Active/active: In this mode, the same service runs on multiple controller nodes, and traffic can either be distributed across the nodes running the requested service by HAProxy or directed to a particular controller through a single IP address. In some cases, HAProxy distributes traffic to active/active services in a round-robin fashion. Performance can be improved by adding more controller nodes.
· Active/passive: Services that are not capable of or reliable enough to run in active/active mode are run in active/passive mode. This means that only one instance of the service is active at a time.
For Galera, HAProxy uses stick-table options to make sure incoming connections are directed to a single backend service. Galera master-master mode can deadlock when services are accessing the same data from multiple Galera nodes at once.
As you begin exploring the high availability services described in this document, keep in mind that the director system (referred to as the undercloud) is itself running OpenStack. The purpose of the undercloud (director system) is to build and maintain the systems that become your working OpenStack environment. The environment you build from the undercloud is referred to as the overcloud. To reach your overcloud, this document has you log in to your undercloud and then choose which overcloud node you want to investigate.
To make the right decisions to store and protect your data, it is important to understand the various types of storage that you may come across in the context of an OpenStack cloud.
If you deploy only the Compute service (nova), cloud users do not have access to any form of persistent storage by default. When a user terminates a VM, the associated ephemeral disks are lost along with their data.
Persistent Storage
As the name suggests, persistent storage allows your saved data and storage resources to exist even if an associated instance is removed. OpenStack supports the types of persistent storage listed in the table below.
Table 2 Types of Persistent Storage in OpenStack
Block storage
Also called volume storage, users can use block storage volumes for their VM instance boot volumes and attached secondary storage volumes. Unlike ephemeral volumes, block volumes retain their data when they are remounted on another VM. Cinder provides block storage services in an OpenStack cloud. It enables access to the underlying storage hardware’s block device through block storage drivers. This results in improved performance and allows users to consume any feature or technology supported by the underlying storage hardware, such as deduplication, compression, and thin provisioning. To learn more about Cinder and block storage, see https://wiki.openstack.org/wiki/Cinder.
Object storage
The OpenStack object storage service (Swift) allows users to access binary objects through a REST API, which is useful for managing large datasets in a highly scalable, highly available manner. To learn more about Swift and object storage, see http://docs.openstack.org/developer/swift/.
File share systems
A share is a remote, mountable file system that can be shared among multiple hosts at the same time. The OpenStack file share service (Manila) is responsible for providing the required set of services for the management of shared file systems in a multitenant cloud.
Table 3 Different Storage Types in an OpenStack Cloud
· Ephemeral storage (OpenStack project: Nova). Used for running the operating system of a VM. Deleted when the instance is deleted.
· Block storage (OpenStack project: Cinder). A block device that can be formatted and used with a file system. Used for providing additional block storage for VM instances. Not deleted when the instance is deleted; persists until deleted by the user.
· Object storage (OpenStack project: Swift). Used for storing and managing large datasets that may include VM images. Persists until deleted by the user.
The OpenStack Block Storage service provides management of persistent block storage resources. In addition to providing secondary attached persistent storage, you can write images into a Cinder volume for Nova to use as a bootable, persistent root volume for an instance.
As a management service, Cinder controls the provisioning and lifecycle management of block storage volumes. It does not reside in the I/O (data) path between the hypervisor and the storage controller, as depicted in Figure 11.
Figure 11 OpenStack Cinder Workflow
SolidFire is integrated with OpenStack through the Cinder driver and the standard iSCSI protocol. SolidFire’s Cinder driver is elegant and straightforward, allowing customers to configure OpenStack for SolidFire quickly, without the need for additional configuration libraries or add-ons. The maturity of the driver reveals itself in its easy setup for users and the completeness of the feature set once setup is complete:
· Clone full environment with no impact to storage capacity
· Guarantee performance through full Quality of Service (QoS)
· Speed deployment with Glance image caching, and boot-from-volume capabilities
· Live-migrate instances between OpenStack compute servers completely non-disruptively
· Easily triage issues through
- 1:1 ID mapping between Cinder Vol-ID and SolidFire volume name
- 1:1 mapping and automated creation of tenant/project IDs as SolidFire accounts
· Automated configuration of the SolidFire’s Cinder driver available through the Puppet classes of TripleO
SolidFire Cinder drivers are included with all recent OpenStack distributions and are integrated with Red Hat OpenStack Platform Director to enable seamless implementation during Overcloud deployment. Heat templates are used to complete the configuration, in which the cinder.conf file is populated with the storage service name, IP address, and login credentials. All Nova compute nodes are then automatically configured to use the Cinder service as specified during Overcloud deployment.
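The resulting backend stanza in cinder.conf looks roughly like the following (a sketch; the section name, IP address, and credentials are hypothetical placeholders for the values populated by the Heat templates):

```
[solidfire]
volume_driver = cinder.volume.drivers.solidfire.SolidFireDriver
volume_backend_name = solidfire
san_ip = 172.16.21.10
san_login = osadmin
san_password = <password>
```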
The SolidFire Cinder driver enables associating QoS attributes such as minIOPS, maxIOPS, and burstIOPS (as described in the section above) with volumes through the volume types and QoS specs available in OpenStack.
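As an illustration of how these attributes compose, the following Python sketch merges volume-type QoS specs over cluster defaults. This is a simplified model, not the driver's actual code; the key names follow the SolidFire convention, but the default values are invented for the example.

```python
# Hypothetical cluster-wide defaults; real values depend on the deployment.
DEFAULT_QOS = {"minIOPS": 50, "maxIOPS": 15000, "burstIOPS": 15000}

def build_qos(qos_specs):
    """Merge QoS specs from a Cinder volume type over the cluster defaults."""
    qos = dict(DEFAULT_QOS)
    for key in ("minIOPS", "maxIOPS", "burstIOPS"):
        if key in qos_specs:
            qos[key] = int(qos_specs[key])  # specs arrive as strings
    # Sanity check: burst must cover max, and max must cover min.
    assert qos["minIOPS"] <= qos["maxIOPS"] <= qos["burstIOPS"]
    return qos

# Example: a "gold" volume type with explicit QoS specs.
gold = build_qos({"minIOPS": "1000", "maxIOPS": "5000", "burstIOPS": "8000"})
print(gold)
```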
The OpenStack image service (Glance) provides discovery, registration, and delivery services for VM, disk, and server images. Glance provides a RESTful API that allows the querying of VM image metadata as well as the retrieval of the actual image. A stored image can be used as a template to start up new servers quickly and consistently, rather than provisioning multiple servers, installing a server operating system, and individually configuring additional services. Such an image can also be used to store and catalog an unlimited number of backups.
To make sure that VM images are consistent and highly available for the Glance service, Swift storage is used as the backing store for Glance. By default, Swift is configured as the backend for Glance out of the box in Red Hat OpenStack Platform.
Swift is a highly available, distributed, and consistent (ensured by the Swift replication mechanism) object store. Object storage does not present a traditional file system. Instead, object storage is a distributed storage system for static, unstructured data objects such as VM images, photo storage, e-mail storage, backups, and archives. Swift is also configured as a default backend for storing VM images for Glance.
Customers can use NetApp SolidFire storage to provide block storage (LUNs) for the Swift service and then scale horizontally by adding additional SF9608 nodes as the object store grows. NetApp SolidFire hosts the Swift data using the iSCSI protocol. The three OpenStack controller nodes are also used as Swift nodes and handle account, container, and object services for Swift. In addition, these three nodes also serve as proxy servers for the Swift service.
Swift uses zoning to isolate the cluster into separate partitions and isolate the cluster from failures. Swift data is replicated across the cluster in zones that are as unique as possible. Typically, zones are established using the physical attributes of the cluster, including geographical locations, separate networks, equipment racks, storage subsystems, or even single drives. Zoning allows the cluster to function and tolerate equipment failures without data loss or loss of connectivity to the remaining cluster.
By default, Swift replicates data three times across the cluster. Swift replicates data across zones in a unique way that promotes HA and high durability. Swift chooses a server in an unused zone before it chooses an unused server in a zone that already has a replica of the data.
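The zone-aware placement preference described above can be sketched in Python. This is a simplified model, not Swift's actual ring-builder code; the server and zone names are hypothetical.

```python
def place_replicas(servers, replica_count=3):
    """servers: list of (server_name, zone) tuples. Returns chosen servers.

    First pass prefers servers in zones that hold no replica yet; a second
    pass falls back to unused servers in already-used zones.
    """
    chosen, used_zones, used_servers = [], set(), set()
    for server, zone in servers:
        if len(chosen) == replica_count:
            break
        if zone not in used_zones:
            chosen.append(server)
            used_zones.add(zone)
            used_servers.add(server)
    for server, zone in servers:
        if len(chosen) == replica_count:
            break
        if server not in used_servers:
            chosen.append(server)
            used_servers.add(server)
    return chosen

# Four servers across three zones: one replica lands in each zone.
cluster = [("s1", "z1"), ("s2", "z1"), ("s3", "z2"), ("s4", "z3")]
print(place_replicas(cluster))
```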
Table 4 and Table 5 describe the solution components and the hardware and software releases used for solution validation.
Table 4 Validated Hardware Versions
Cisco UCS Manager
Cisco UCS B200 M5 Server
Cisco UCS Fabric Interconnects 6332UP
Cisco UCS VIC 1340
Cisco eNIC Driver
Cisco Nexus 93180YC
NetApp SolidFire SF9608
Table 5 Validated Software Versions
Red Hat Enterprise Linux
Red Hat OpenStack Platform
10 (Newton Based)
Red Hat OpenStack Platform Director
ML2 Drivers (Red Hat Certified)
NetApp SolidFire Cinder driver
Neutron driver for UCSM
Neutron driver for Nexus
The FlexPod SF with Red Hat OpenStack Platform 10 physical topology is made up of the following components.
Table 6 Solution Components
Cisco UCS B200 M5 Blade Servers (4 per chassis)
Cisco UCS 5108 Chassis
Cisco UCS 2304XP IO Modules (2 per chassis)
Cisco UCS 6332UP Fabric Interconnect
Cisco Nexus 93180YC-EX
NetApp SolidFire SF9608 All-Flash Array
This section describes the physical topology of this solution. Figure 12 illustrates all of the connected devices.
Figure 12 High-Level Diagram Showing All Connected Devices
Figure 13 demonstrates the FlexPod SF with Red Hat OpenStack Platform 10.0 compute, network, and storage design overview. The infrastructure is fully redundant and highly available end-to-end. The solution also incorporates NetApp technologies and features to further enhance design efficiencies.
Figure 13 FlexPod SF with Red Hat OpenStack Platform 10.0 Design Overview
The Cisco VIC 1340, a next-generation 2-port, 40 Gigabit Ethernet, Fibre Channel over Ethernet (FCoE)-capable modular LAN on motherboard (mLOM) mezzanine adapter, is designed for both the M4 and M5 generations of Cisco UCS B200 B-Series Blade Servers.
For more information, see http://www.cisco.com/c/en/us/products/interfaces-modules/ucs-virtual-interface-card-1340/index.html.
Cisco UCS Fabric Extenders, also known as IO Modules (IOMs), operate as remote line cards to the Fabric Interconnect. Each Cisco UCS chassis is equipped with a pair of Cisco UCS Fabric Extenders. The IOM has network-facing interfaces (NIFs), which connect the IOM to the Fabric Interconnect, and host-facing interfaces (HIFs), which connect the IOM to the adapters on the blades. All interfaces are 10Gb DCE (Data Center Ethernet) in the Cisco UCS 2200 Series FEX. There are two IOMs in this series, each providing a different number of interfaces: the 2204XP and the 2208XP. The Cisco UCS 2208XP has eight 10 Gigabit Ethernet, FCoE-capable ports that connect the blade chassis to the fabric interconnect. The Cisco UCS 2204XP has four external ports with identical characteristics to connect to the fabric interconnect. Each Cisco UCS 2208XP has thirty-two 10 Gigabit Ethernet ports connected through the midplane to the eight half-width slots (4 per slot) in the chassis, while the 2204XP has 16 such ports (2 per slot).
The Cisco UCS 2304 Fabric Extender brings the unified fabric into the blade server enclosure, providing multiple 40 Gigabit Ethernet connections between blade servers and the fabric interconnect. It is a third-generation I/O Module (IOM) that shares the same form factor as the second-generation Cisco UCS 2200 Series Fabric Extenders and is backward compatible with the shipping Cisco UCS 5108 Blade Server Chassis. The Cisco UCS 2304 can be inserted into an existing blade chassis.
For more information about Cisco UCS 2304 FEX, see https://www.cisco.com/c/en/us/products/collateral/servers-unified-computing/ucs-6300-series-fabric-interconnects/datasheet-c78-675243.html.
Selection of the FEX, VIC, and mezzanine cards plays an important role in determining the aggregate traffic throughput to and from a server. Figure 14 shows an overview of the backplane connectivity of the Cisco VIC 1340 architecture. The number of KR lanes indicates the 10GbE paths available to the chassis and therefore to the blades. As shown in Figure 14, traffic aggregation differs depending on the models of I/O modules and VICs. The 2204XP enables 2 KR lanes per half-width blade slot, whereas the 2208XP enables all four. Similarly, the number of KR lanes varies based on the selection of VIC 1240/1340, VIC 1240/1340 with Port Expander, or VIC 1280/1380.
Figure 14 Cisco UCS VIC 1340 Architecture
In this FlexPod design, validated IO components are listed in Table 7.
Table 7 I/O Components
In FlexPod SF with Red Hat OpenStack Platform 10.0, Cisco UCS B200 M5 servers have been validated with Cisco VIC 1340 and Cisco 2304XP Fabric Extender.
Each adapter in Cisco UCS is a dual-port adapter that connects to both fabrics (A and B). The two fabrics in Cisco UCS provide failover protection in the event of planned or unplanned component downtime in one of the fabrics. Typically, host software such as NIC teaming for Ethernet or multipath I/O (MPIO) for Fibre Channel provides failover across the two fabrics (Figure 15). A vNIC in Cisco UCS is a host-presented PCI device that is centrally managed by Cisco UCS Manager. The fabric-based failover feature, which you enable by selecting the high-availability vNIC option in the service profile definition, allows network interface virtualization (NIV)-capable adapters and the fabric interconnects to provide active-standby failover for Ethernet vNICs without any NIC-teaming software on the host. For unicast traffic failover, the fabric interconnect in the new path sends Gratuitous Address Resolution Protocols (GARPs). This process refreshes the forwarding tables on the upstream switches.
Figure 15 Fabric Failover
For multicast traffic, the new active fabric interconnect sends an Internet Group Management Protocol (IGMP) Global Leave message to the upstream multicast router. The upstream multicast router responds by sending an IGMP query that is flooded to all vNICs. The host OS responds to these IGMP queries by rejoining all relevant multicast groups. This process forces the hosts to refresh the multicast state in the network in a timely manner.
Cisco UCS fabric failover is an important feature because it reduces the complexity of defining NIC teaming software for failover on the host. It does this transparently in the fabric based on the network property that is defined in the service profile.
Cisco UCS Physical Connectivity to Nexus 9000
Cisco UCS Fabric Interconnects are connected to the Cisco Nexus 93180YC-EX switches as shown in Figure 16. The connectivity is configured with two port channels, one from each Cisco UCS Fabric Interconnect to the Cisco Nexus 93180YC-EX switches. These port channels carry all the storage and data traffic. In this validated design, two uplinks from each fabric interconnect to the leaf switches have been utilized for an aggregate bandwidth of 40 Gbps (4 x 10 Gbps). However, the number of links can be increased based on the customer's throughput requirements.
Figure 16 Cisco UCS Physical Connectivity to Cisco Nexus 9000 Series Switch
The Cisco Unified Computing System can be configured to discover a chassis using Discrete Mode or Port-Channel Mode (Figure 17). In Discrete Mode, each FEX KR connection, and therefore each server connection, is tied or pinned to a network fabric connection homed to a port on the Fabric Interconnect. In the presence of a failure on the external link, all KR connections are disabled within the FEX I/O module. In Port-Channel Mode, the failure of a network fabric link allows for redistribution of flows across the remaining port channel members. Port-Channel Mode is therefore less disruptive to the fabric and hence is recommended in FlexPod designs.
Figure 17 Chassis Discovery Policy – Discrete vs. Port Channel
FlexPod SF accommodates a myriad of traffic types (iSCSI, NFS, VM traffic, and so on) and is capable of absorbing traffic spikes and protecting against traffic loss. Cisco UCS and Nexus QoS system classes and policies deliver this functionality. In this validation effort, FlexPod SF was configured to support jumbo frames with an MTU size of 9000. Enabling jumbo frames allows the FlexPod SF environment to optimize throughput between devices while simultaneously reducing the consumption of CPU resources.
When setting jumbo frames, it is mandatory that MTU settings are applied uniformly across the stack to prevent packet drops and adverse performance.
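On the Nexus 9000 switches, the jumbo MTU is typically applied through a network-qos policy similar to the following sketch (the policy name is arbitrary; verify against your NX-OS release):

```
policy-map type network-qos jumbo
  class type network-qos class-default
    mtu 9216
system qos
  service-policy type network-qos jumbo
```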
Cisco Nexus 9000 Series Modes of Operation
The Cisco Nexus 9000 family of switches supports two modes of operation: NX-OS standalone mode and Application Centric Infrastructure (ACI) fabric mode. In standalone mode, the switch performs as a typical Cisco Nexus switch with increased port density, low latency, and 40GbE connectivity. In fabric mode, the administrator can take advantage of Cisco ACI. The Cisco Nexus 9000 standalone mode FlexPod design consists of a pair of Cisco Nexus 9000 Series top-of-rack switches. When leveraging ACI fabric mode, the Cisco Nexus 9500 and 9300 switches are deployed in a spine-leaf architecture. Although the reference architecture covered in this document does not leverage ACI, it lays the foundation for customers to migrate to Cisco ACI by leveraging the Cisco Nexus 9000 switches. Cisco ACI is a holistic architecture with centralized automation and policy-driven application profiles. Cisco ACI delivers software flexibility with the scalability of hardware performance. Key characteristics of ACI include:
· Simplified automation by an application-driven policy model
· Centralized visibility with real-time, application health monitoring
· Open software flexibility for DevOps teams and ecosystem partner integration
· Scalable performance and multi-tenancy in hardware
The future of networking with Cisco ACI is about providing a network that is deployed, monitored, and managed in a fashion that supports DevOps and rapid application change. Cisco ACI does so through the reduction of complexity and a common policy framework that can automate provisioning and managing of resources.
In FlexPod SF with Red Hat OpenStack Platform 10.0, a pair of Cisco Nexus 9000 Series switches is used in standalone NX-OS mode.
In this validated architecture, a pair of Cisco Nexus 9000 switches is deployed in NX-OS standalone mode. The Cisco Nexus 9000 best practices used in the validation of the FlexPod SF with Red Hat OpenStack Platform 10.0 architecture are summarized below:
· Cisco Nexus 9000 features enabled
- Link Aggregation Control Protocol (LACP part of 802.3ad)
- Cisco Virtual Port Channeling (vPC) for link and device resiliency
- Enable Cisco Discovery Protocol (CDP) for infrastructure visibility and troubleshooting
- Interface VLAN for configuring the Layer-3 SVI interfaces for the VLANs designated for forwarding.
· vPC considerations
- Define a unique domain ID
- Set the priority of the intended vPC primary switch lower than the secondary (default priority is 32768)
- Establish peer keepalive connectivity. It is recommended to use the out-of-band management network (mgmt0) or a dedicated switched virtual interface (SVI)
- Enable vPC auto-recovery feature
- Enable peer-gateway. Peer-gateway allows a vPC switch to act as the active gateway for packets that are addressed to the router MAC address of the vPC peer allowing vPC peers to forward traffic
- Enable IP arp synchronization to optimize convergence across the vPC peer link. Note: Cisco Fabric Services over Ethernet (CFSoE) is responsible for synchronization of configuration, Spanning Tree, MAC and VLAN information, which removes the requirement for explicit configuration. The service is enabled by default.
- A minimum of two 10 Gigabit Ethernet connections are required for vPC
- All port channels should be configured in LACP active mode
· Spanning tree considerations
- The spanning tree priority was not modified. Peer-switch (part of vPC configuration) is enabled which allows both switches to act as root for the VLANs
- Loopguard is disabled by default
- BPDU guard and filtering are enabled by default
- Bridge assurance is only enabled on the vPC Peer Link.
- Ports facing the NetApp storage controller and UCS are defined as "edge" trunk ports
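The vPC recommendations above translate into an NX-OS configuration along these lines (a sketch; the domain ID, role priority, keepalive addresses, and port-channel number are hypothetical):

```
vpc domain 10
  peer-switch
  role priority 10
  peer-keepalive destination 192.168.10.2 source 192.168.10.1
  peer-gateway
  auto-recovery
  ip arp synchronize
!
interface port-channel10
  switchport mode trunk
  spanning-tree port type network
  vpc peer-link
```

Note that `spanning-tree port type network` enables bridge assurance on the peer link, matching the spanning-tree consideration above.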
For configuration details, see the Cisco Nexus 9000 Series Switches Configuration Guide.
The FlexPod Cisco Nexus 9000 standalone design is an end-to-end IP-based storage solution that supports SAN access using iSCSI. The solution provides a 10GbE-enabled, 40G-capable fabric defined by Ethernet uplinks from the Cisco UCS Fabric Interconnects and NetApp storage devices connected to the Cisco Nexus switches. The Cisco Nexus 9000 standalone design does not employ a dedicated SAN switching environment and requires no direct Fibre Channel connectivity, as iSCSI is the SAN protocol leveraged.
There are no local disks in this architecture. Red Hat Enterprise Linux 7.4 is deployed on boot LUNs provided by NetApp SolidFire through the iSCSI protocol.
As illustrated in Figure 17, link aggregation technologies play an important role, providing improved aggregate bandwidth and link resiliency across the solution stack. The NetApp storage controllers, Cisco Unified Computing System, and Cisco Nexus 9000 platforms support active port channeling using 802.3ad standard Link Aggregation Control Protocol (LACP). Port channeling is a link aggregation technique offering link fault tolerance and traffic distribution (load balancing) for improved aggregate bandwidth across member ports. In addition, the Cisco Nexus 9000 series features virtual PortChannel (vPC) capabilities. vPC allows links that are physically connected to two different Cisco Nexus 9000 Series devices to appear as a single "logical" port channel to a third device, essentially offering device fault tolerance. vPC addresses aggregate bandwidth, link, and device resiliency. The Cisco UCS Fabric Interconnects and NetApp SolidFire controllers benefit from the Cisco Nexus vPC abstraction, gaining link and device resiliency as well as full utilization of a non-blocking Ethernet fabric.
This dedicated uplink design leverages IP-based storage-capable NetApp SolidFire controllers. From a storage traffic perspective, both standard LACP and the Cisco vPC link aggregation technologies play an important role in the FlexPod distinct uplink design. Figure 17 illustrates the use of dedicated 10GbE uplinks between the Cisco UCS Fabric Interconnects and the Cisco Nexus 9000 unified switches. vPC links between the Cisco Nexus 9000 and the NetApp storage controllers' 10GbE provide a robust connection between host and storage.
vPC Peer Link
vPC requires a "peer link" which is documented as port channel 10 Figure 17. In addition to the vPC peer-link, vPC peer keepalive link is a required component of a vPC configuration. The peer keepalive link allows each vPC enabled switch to monitor the health of its peer. This link accelerates convergence and reduces the occurrence of split-brain scenarios.
OpenStack Modular Layer 2 (ML2) allows the separation of network segment types from the device-specific implementation of those segment types. The ML2 architecture consists of multiple 'type drivers' and 'mechanism drivers'. Type drivers manage the common aspects of a specific type of network, while mechanism drivers manage the device-specific mechanisms used to implement those network types. This solution uses the following mechanism drivers:
· Cisco UCS Manager
· Cisco Nexus
· Open vSwitch, Linux bridge
The Cisco Nexus driver for OpenStack Neutron allows customers to easily build their Infrastructure-as-a-Service (IaaS) networks using the industry’s leading networking platform, delivering performance, scalability, and stability with the familiar manageability and control you expect from Cisco® technology. ML2 Nexus drivers dynamically provision OpenStack managed VLANs on Cisco Nexus switches. They configure the trunk ports with dynamically created VLANs solving the logical port count issue on the Nexus switches. They provide better manageability of the network infrastructure.
ML2 Cisco UCS Manager drivers dynamically provision OpenStack-managed VLANs on the Fabric Interconnects and configure VLANs on controller and compute node vNICs. The Cisco UCS Manager plugin talks to the Cisco UCS Manager application running on the Fabric Interconnect, which is part of an ecosystem for Cisco UCS servers consisting of Fabric Interconnects and IO modules. The ML2 Cisco UCS Manager driver does not support configuration of Cisco UCS servers whose service profiles are attached to service profile templates. This prevents the same VLAN configuration from being pushed to all the service profiles based on that template. The plugin can be used after the service profile has been unbound from the template.
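With the networking-cisco mechanism drivers in place, the ML2 configuration contains entries along these lines (a sketch; the type drivers shown, the IP address, and the credentials are hypothetical placeholders for deployment-specific values):

```
[ml2]
type_drivers = vlan,vxlan,flat
tenant_network_types = vlan
mechanism_drivers = openvswitch,cisco_ucsm,cisco_nexus

[ml2_cisco_ucsm]
ucsm_ip = 10.10.10.5
ucsm_username = admin
ucsm_password = <password>
```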
Traditional storage area networks (SANs) were designed to use FC to guarantee high-bandwidth, low-latency connectivity between servers and storage devices. Redundant switches were implemented to provide HA for storage connectivity and to make sure there is no single point of failure for storage access. Separate network infrastructures permit performance and availability, but at the cost of additional hardware and management overhead.
Ethernet networking has evolved and can now deliver much higher throughput with lower latency than ever before. Lossless Ethernet protocols, advanced bandwidth management and HA features like virtual port channels (vPCs) deliver the reliability and resiliency necessary for mission-critical workloads. With these new capabilities, IT organizations are moving away from dedicated FC SAN infrastructures to Ethernet-based storage infrastructures, and protocols such as iSCSI.
SolidFire Element OS was specifically designed to leverage today's high-bandwidth Ethernet network architectures and the iSCSI Ethernet storage protocol. SolidFire uses a 10Gb iSCSI Ethernet network to connect multiple storage nodes and create a clustered storage array. HA within the overall storage solution is facilitated by a combination of physical connectivity redundancy and SolidFire's self-healing Double Helix technology.
NetApp SolidFire SF9608 nodes are connected to the Cisco Nexus 93180YC-EX switches using vPCs, bonding two physical 10G ports on each SolidFire node into a single logical interface (Bond10G). These port channels carry all IP-based ingress and egress storage data traffic and cluster traffic for the NetApp SolidFire nodes.
Figure 18 Network and Storage Physical Connectivity Overview
By default, separate onboard 1G ports are utilized for in-band management traffic and API automation; however, in-band management can be moved to the Bond10G interface if converged networking is desired. The connections and network types used in this solution are described below.
· In-Band Management network. A network used for the administration of nodes and the API endpoint
· Out-of-Band Management network. Network used to administer and monitor the SolidFire node hardware
· Storage network. A network used by clients to access storage via the iSCSI protocol
· Ports. Physical 10GbE Port 1 and Port 2 (identified in SolidFire Element OS as Eth0 and Eth1) are used for storage network. Physical 1GbE Port 1 and Port 2 (identified in SolidFire Element OS as Eth3 and Eth4) are used for in-band management network.
Storage Infrastructure Connectivity
The storage infrastructure design makes sure that there is no single point of failure in the storage path. SolidFire storage leverages the iSCSI storage protocol, and all storage traffic traverses the dedicated 10G switches for fast, reliable delivery.
SolidFire Storage Node Connectivity
SolidFire nodes leverage dual-homed NIC connectivity for both storage and in-band management to provide redundancy, enhanced fault tolerance, and increased bandwidth for storage traffic:
· Storage (vPC):
- 1 x 10G link to storage switch N9K-A
- 1 x 10G link to storage switch N9K-B
· In-band management (active/passive bond):
- 1 x 1G link to management switch N9K-A
- 1 x 1G link to management switch N9K-B
iSCSI Redirect and Storage Virtual IP
The highly-available storage virtual IP (SVIP) is the single point of access for all iSCSI traffic. The iSCSI redirect process is a key component of SolidFire’s highly available, scale out architecture. By leveraging iSCSI redirect, SolidFire can ensure that no single node is a point of failure or data loss.
The initiator sends the iSCSI login request to the volume via the cluster SVIP, which is logically hosted by the cluster master node.
The SolidFire cluster master responds with an iSCSI redirect back to the initiator, directing it to the node that actually hosts the volume. The initiator then retries its login request against the designated node's per-node storage IP (SIP), establishing a direct iSCSI connection to that node.
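The login and redirect exchange can be modeled in a few lines of Python. This is a simplified illustration, not an actual iSCSI implementation; the volume names and IP addresses are hypothetical.

```python
# Hypothetical mapping of volumes to the per-node storage IPs (SIPs) that own them.
VOLUME_MAP = {"vol-1001": "10.10.30.11", "vol-1002": "10.10.30.12"}
SVIP = "10.10.30.10"  # cluster storage virtual IP, hosted by the cluster master

def login(volume, address=SVIP):
    """Model an iSCSI login: the SVIP answers with a redirect to the owning node."""
    if address == SVIP:
        return ("redirect", VOLUME_MAP[volume])
    return ("session-established", address)

# First attempt hits the SVIP and is redirected; the retry lands on the node SIP.
status, target = login("vol-1001")
if status == "redirect":
    status, target = login("vol-1001", target)
print(status, target)
```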
Figure 19 illustrates how the iSCSI redirect process works with a SolidFire cluster.
Figure 19 iSCSI Redirect Process with SolidFire Cluster
By leveraging iSCSI redirect, SolidFire can maintain its linear scale-out ability while maintaining high availability. As additional storage nodes are added, network bandwidth is added in addition to storage capacity, allowing client access to scale along with capacity.
If a cluster node suffers a failure, iSCSI redirect is used to redirect the initiator to the specific node that hosts the initiator’s secondary copy of its volume, while transparently recreating yet another copy in the background, quickly restoring the volume to a fully redundant state.
In a typical controller-based storage system additional drive shelves can be added, but the controllers and their client network bandwidth will eventually become a bottleneck.
Routing between nodes within a cluster is not supported.
NetApp strongly recommends that you keep all nodes on the same switch to minimize latency.
The infra_boot account on SolidFire contains all the volumes associated with the base infrastructure. All the boot LUNs for the hosts (controllers, computes, and Director) are associated with the infra_boot account. The NetApp SolidFire Cinder driver creates a new account corresponding to each project in OpenStack. Therefore, all the Cinder volumes associated with a certain tenant (project) in OpenStack are associated with a corresponding account on SolidFire, thereby ensuring end-to-end multi-tenancy.
Figure 20 NetApp SolidFire storage layout for OpenStack
Red Hat OpenStack Platform Director deploys OpenStack services to hosts and maps them to the selected roles, such as controller and compute. Figure 21 outlines the network topology, where networks are isolated in separate VLANs. Each network interface is configured with vNIC failover within Cisco UCS.
Figure 21 Cisco UCS and NetApp SolidFire OpenStack Network Layout Topology
Table 8 Network Details
PXE / Provisioning
Network for provisioning and configuring controller and compute hosts. Director uses this network to boot hosts using PXE, deploy, and configure hosts.
In-band management network for the compute and controller hosts.
Network carrying iSCSI traffic for boot LUNs and LUNs for Swift
Network carrying Cinder storage traffic
OpenStack Object Storage (swift) uses this network to synchronize data objects between participating replica nodes.
Tenant Private network
Network carrying tenant’s VM traffic and provides traffic isolation among tenants
Represents a public network for internet connectivity / Floating network
Network shared across other datacenter entities into OpenStack environment
Network carrying OpenStack management, cluster management, and administrative API traffic
The following subsections describe how the NetApp SolidFire storage array provides low-latency, all-flash storage for the components in this solution.
To enable stateless booting (no local drives required), the OpenStack nodes are booted from the boot LUNs provisioned from the NetApp SolidFire storage array. Volumes (60GB in size) are created with default QoS settings and are associated with the infrastructure account. All the controller and compute nodes connect to the respective target IQNs for these volumes and login using the cluster SVIP.
The NetApp SolidFire Cinder driver communicates with the SolidFire cluster using JSON-RPC over an HTTPS (TCP) connection to the SolidFire cluster's MVIP interface. The MVIP interface can move throughout the cluster in response to load and/or failure events.
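The shape of such a request can be sketched as follows. The method name follows the Element API convention, but no connection is made here, and the endpoint mentioned in the comment is a hypothetical placeholder.

```python
import json

def rpc_payload(method, params, request_id=1):
    """Build a JSON-RPC request body of the kind the driver sends to the cluster."""
    return json.dumps({"method": method, "params": params, "id": request_id})

body = rpc_payload("GetClusterInfo", {})
print(body)
# A real client would POST this body over HTTPS to the cluster MVIP endpoint
# (for example, https://<MVIP>/json-rpc/<version>) with cluster admin credentials.
```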
The SolidFire Cinder driver can perform all volume manipulations, including create, delete, attach, detach, extend, snapshot, clone and others. The SolidFire driver also implements the Cinder functionality for QoS designations on volumes through the volume-type specifications. SolidFire can carry tenant information into the storage subsystem and create unique iSCSI CHAP credentials for each tenant. The SolidFire Cinder driver creates a new Account corresponding to each tenant / project in OpenStack and associates all the Cinder volumes (SolidFire volumes) with this account.
The volume UUID in the SolidFire cluster corresponds to the Cinder volume UUID, enabling easier troubleshooting.
SolidFire image caching is used to eliminate copying Glance images to volumes every time a bootable volume is needed. SolidFire image caching is built into the OpenStack Cinder driver and is enabled by default.
Upon the first copy operation from Glance to a volume (if the property is set), the image cache copies the image to a volume stored under the OpenStack administrative user and then, using the lightweight cloning operation on the SolidFire cluster, clones a copy for the user who requested the volume. On subsequent image-to-volume operations for the same image, the admin user's volume is checked to make sure it is still current; if so, a clone operation quickly presents the next volume. This reduces provisioning times for instances and improves efficiency for development and test environments.
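The caching decision can be modeled in a few lines of Python. This is a simplified illustration of the logic described above, not the driver's actual code; the identifiers and timestamps are hypothetical.

```python
cache = {}  # image_id -> (image_updated_at, admin-owned template volume)

def bootable_volume(image_id, image_updated_at):
    """Return a bootable volume for an image, cloning from the cache when current."""
    entry = cache.get(image_id)
    if entry and entry[0] == image_updated_at:
        return f"clone-of-{entry[1]}"            # fast path: lightweight clone
    template = f"template-{image_id}"
    cache[image_id] = (image_updated_at, template)  # slow path: copy happens once
    return f"clone-of-{template}"

v1 = bootable_volume("img-9", "2024-01-01")  # first request populates the cache
v2 = bootable_volume("img-9", "2024-01-01")  # later requests clone the template
print(v1, v2)
```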
An additional cluster SVIP on a different VLAN is created to segregate the infrastructure storage traffic from the Cinder storage traffic.
NetApp SolidFire storage serves as the storage medium for Swift. As part of this design, a 1TB SolidFire volume (associated with the infrastructure account) is attached to each of the Swift nodes to store the account, container, and object data. The default replica count of three makes sure that the storage is still available and accessible in the event of two Swift node failures. The inherent data-efficiency mechanisms of SolidFire nodes, like deduplication and compression, make sure that the actual storage utilization is optimized despite the three-replica count. Furthermore, SolidFire Helix data protection makes sure that all the data is reconstructed regardless of drive or other component failures in the storage subsystem.
Scaling the Swift Storage
The Swift cluster can be easily scaled by adding additional storage to each of the Swift controller nodes. Additional volumes are provisioned from the SolidFire array and attached to each of the Swift controller nodes to store the account, container, and object data.
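As an illustrative sketch, once a new SolidFire volume has been attached and mounted on a Swift node (for example, at /srv/node/sdc1), the device is added to each ring and the rings are rebalanced with the standard `swift-ring-builder` workflow. The IP address, device name, and weight below are placeholder values:

```shell
# Illustrative only: add a newly mounted device to the Swift rings
# and rebalance (placeholder IP, device, and weight).
swift-ring-builder account.builder   add r1z1-172.16.2.11:6002/sdc1 100
swift-ring-builder container.builder add r1z1-172.16.2.11:6001/sdc1 100
swift-ring-builder object.builder    add r1z1-172.16.2.11:6000/sdc1 100

swift-ring-builder account.builder   rebalance
swift-ring-builder container.builder rebalance
swift-ring-builder object.builder    rebalance
```

The rebuilt ring files are then distributed to all Swift nodes so that data placement reflects the new capacity.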
FlexPod is the optimal shared infrastructure foundation to deploy a variety of IT workloads. It is built on leading computing, networking, storage, and infrastructure software components. FlexPod SF with Red Hat OpenStack Platform 10 is a single platform built from unified computing, fabric, and storage technologies from Cisco and NetApp that is both flexible and scalable. The FlexPod SF with Red Hat OpenStack Platform 10 validated design provides a production-grade OpenStack deployment, supported by industry leaders, to meet the unique needs of your business. With this solution, we are responding to the increased customer demand for OpenStack on validated converged infrastructure.
To summarize, these are the major benefits of implementing FlexPod SF with Red Hat OpenStack Platform 10:
· Converged Infrastructure based on Cisco Unified Datacenter
· Investment protection in high-density, flexible, and high-performance datacenter environments
· Non-disruptive scale-up or scale-out of the infrastructure
· Highly available and supported OpenStack platform on Red Hat optimized distribution
· End-to-end hardware-level redundancy using Cisco UCS, Cisco Nexus switches, and NetApp high-availability features
· Pre-validated design based on best practices to achieve timely, repeatable, and consistent deployments
Muhammad Afzal, Engineering Architect, Cisco UCS Solutions Engineering, Cisco Systems, Inc.
Muhammad Afzal is an Engineering Architect and Technical Marketing Engineer at Cisco Systems in Cisco UCS Product Management and Datacenter Solutions Engineering. He is currently responsible for designing, developing, and producing validated converged architectures while working collaboratively with product partners. Previously, Afzal was a lead architect for various cloud and data center solutions in the Solution Development Unit at Cisco. Prior to that, Afzal was a Solutions Architect in Cisco's Advanced Services group, where he worked closely with Cisco's large enterprise and service provider customers delivering data center and cloud solutions. Afzal holds an MBA in finance and a BS in computer engineering.
Amit Borulkar, Technical Marketing Engineer, NetApp, Inc.
Amit is a reference architect with Converged Infrastructure Solutions at NetApp. At NetApp, he focuses on OpenStack reference architectures on FlexPod, container orchestration frameworks such as Kubernetes, and infrastructure automation across the stack with tools such as Puppet and Ansible. He holds a Master's degree in Computer Science from North Carolina State University, where his research interests included distributed systems and cloud computing.
The authors of this CVD would like to thank the following for their valuable contribution to this document:
· Shanthi Kumar Adloori, Cisco Systems, Inc.
· Carol Bouchard, Cisco Systems, Inc.
· Sandhya Dasu, Cisco Systems, Inc.
· Christopher Reno, NetApp
· Mark Cates, NetApp
· Thomas Hanvey, NetApp
· Steven Reichard, Red Hat
· Guil Barros, Red Hat