Reference Architecture for 500-Seat Citrix XenApp 7.5 Deployment on Cisco UCS C240-M3 Rack Servers with On-Board SAS Storage and LSI Nytro MegaRAID Controller

June 2014



Contents

1 Executive Summary
2 Solution Overview and Components
  2.1 Reference Architecture Overview
  2.2 Cisco UCS Platform and C-Series Servers
  2.3 LSI Nytro MegaRAID Controller
  2.4 Citrix XenServer
  2.5 Citrix XenApp 7.5
3 Test Setup and Configurations
  3.1 500-Seat Knowledge Worker/600-Seat Task Worker Test Configuration
  3.2 Testing Methodology
  3.3 Testing Procedure
4 Login VSI Test Results
  4.1 Results: Single-Server Recommended Maximum Density
  4.2 Cisco UCS C240-M3 Full Scale Results for Citrix XenApp 7.5
5 Conclusion
6 References
  6.1 Cisco Reference Documents
  6.2 Citrix Reference Documents


1 Executive Summary

One of the biggest barriers to entry for desktop virtualization (DV) is the capital expense of deploying it to small offices and branch offices. For small and medium-sized customers, deploying a DV system for 500 users or fewer is currently cost prohibitive.

To overcome the entry point barrier, we have developed a self-contained DV solution that can host up to 500 Citrix XenApp 7.5 Hosted Shared Desktops (HSDs) on three managed Cisco UCS C240 M3 Rack Servers and provide system fault tolerance both at the server level and for the following required infrastructure virtual machines (VMs):

Citrix XenServer 6.2 Hypervisors (3)

Microsoft Server 2012 R2 Infrastructure Virtual Machines (8)

Microsoft Active Directory Domain Controllers (2)

Microsoft SQL Server 2012 R2 (2)

Microsoft DFS File Servers for User Data and User Profiles (2) 1.6 TB

Citrix XenCenter 6.2 (1)

Citrix XenApp 7.5 Desktop Studio (2)

Citrix XenApp 7.5 RDS Virtual Machines (24)

The Cisco UCS components used to validate the configuration are:

Cisco UCS 6248UP 48-port Fabric Interconnects (2)

Cisco UCS Manager 2.2(1d) or later

Cisco UCS C240 M3 Rack Servers (3)

- Intel® Xeon® E5-2697 v2 12-core 2.7 GHz processors (2)

- 256 GB 1866 MHz DIMMs (16 x 16GB)

- Cisco UCS Virtual Interface Card (VIC) 1225

- LSI Nytro MegaRAID 200G Controller

- Cisco® 600-GB 10,000 RPM hot swappable SAS drives (12)

- Cisco 650-watt power supply (2)

Note: All of the infrastructure VMs were hosted on two of the three Cisco UCS C240 M3 Rack Servers. Each rack server hosted eight XenApp 7.5 HSD VMs.

We utilized the unique capabilities of the Nytro MegaRAID 200G controller cache to support our XenApp 7.5 Machine Creation Service (MCS) differencing disks. These disposable disks incur high IOPS during the lifecycle of the Hosted Shared Desktop sessions.

Configuration of the controller flash and SAS drives is accomplished through the Nytro MegaRAID BIOS Config Utility configuration wizard, which is accessed during the Cisco UCS C240 M3 rack server’s boot sequence by pressing the Ctrl-H key sequence when the controller BIOS loads. (See Section 3, “Test Setup and Configurations,” for details on the test configuration.)

Our configuration provides an excellent virtual desktop end-user experience for 500 Medium/Knowledge Worker Hosted Shared Desktop sessions, as measured by our test tool, Login VSI, at a breakthrough price point with server, infrastructure, and user file fault tolerance.

If your environment is primarily Light/Task Worker focused, we demonstrated support for over 300 Light/Task Worker HSD sessions per rack server, as measured by Login VSI. The solution configuration above could comfortably host 600 Light/Task Worker HSD sessions with the same levels of fault tolerance described for the 500 Medium/Knowledge Workers.

With options to use lower-bin processors, such as the Intel Xeon E5-2680 v2, a lower entry price point can be achieved at a slightly lower XenApp virtual machine density.

As with any solution deployed to users with data storage requirements, a backup solution must be deployed to ensure the integrity of the user data. Such a solution is outside the scope of this paper.

The intended audience for this paper includes customer, partner, and integrator solution architects, professional services, IT managers, and others who are interested in deploying this reference architecture.

The paper describes the reference architecture and its components, and provides test results, best practice recommendations, and sizing guidelines where applicable. While the reference architecture can deploy seamless applications as well as HSDs, the test configuration validated the architecture with 500-seat HSD Knowledge Worker and 600-seat HSD Task Worker workloads.

2 Solution Overview and Components

2.1 Reference Architecture Overview

There are multiple approaches to application and desktop virtualization. The best method for any deployment depends on the specific business requirements and the types of tasks that the user population will typically perform. This reference architecture represents a straightforward and cost-effective strategy for implementing two virtualization models using XenApp—Streamed Applications and Hosted Shared Desktops—which are defined as follows:

Streamed Applications: Streamed desktops and applications run entirely on the user‘s local client device and are streamed from the XenApp server on demand. The user interacts with the application or desktop directly, but desktop and application resources are available only while the device is connected to the network.

Hosted Shared Desktops: A hosted, server-based desktop is a desktop with which the user interacts through a delivery protocol. With Hosted Shared Desktops, multiple users simultaneously share a single installed instance of a server operating system, such as Microsoft Windows Server 2012. Each user receives a desktop “session” and works in an isolated memory space. Session virtualization leverages server-side processing.

For a more detailed introduction to the different XenApp deployment models, see http://www.citrix.com/products/xenapp/how-it-works/application-virtualization.html.

In this reference architecture (Figure 1), the combination of Citrix and Cisco technologies transforms the delivery of Microsoft Windows apps and desktops into cost-effective, highly secure services that users can access on any device, anywhere. The solution strives to reduce complexity in design and implementation, enabling a simple, centrally managed virtualization infrastructure.

The managed Cisco UCS C240 M3 design provides single-wire connectivity and leverages Cisco UCS Service Profiles to ensure that both ongoing maintenance and capacity expansion are seamless and simplified.

Figure 1. Citrix XenApp 7.5 on Managed Cisco UCS C240 M3 500-User Solution Architecture

The Cisco and Citrix components are described in more detail in the following pages.

2.2 Cisco UCS Platform and C-Series Servers

The Cisco Unified Computing System is the first converged data center platform that combines industry-standard, x86-architecture servers with networking and storage access into a single converged system. The system is entirely programmable using unified, model-based management to simplify and speed deployment of enterprise-class applications and services.

The Cisco UCS C-Series Rack Servers extend Cisco UCS innovations—including a standards-based unified network fabric, Cisco VN-Link virtualization support, and Cisco Extended Memory Technology—to an industry-standard rack form factor. Figure 2 shows the components integrated in a Cisco UCS platform.

Figure 2. The Cisco UCS Platform and Components

Cisco UCS C-Series servers can be deployed either as standalone servers or as part of a full Cisco Unified Computing System. Organizations can deploy servers incrementally—using as many or as few as needed—on a schedule that meets their timing and budget. In addition, the Cisco UCS “wire-once” approach simplifies the ability to change I/O configurations without needing to install adapters or re-cable racks and switches.

2.2.1 Cisco UCS Fabric Interconnects

The Cisco UCS 6200 Series Fabric Interconnects are a core part of the Cisco Unified Computing System, providing both network connectivity and management capabilities for the system. The Cisco UCS 6200 Series offers line-rate, low-latency, lossless 10 Gigabit Ethernet, Fibre Channel over Ethernet (FCoE), and Fibre Channel functions.

The Cisco UCS 6200 Series provides the management and communication backbone for the Cisco UCS B-Series Blade Servers, Cisco UCS C-Series Rack Servers and 5100 Series Blade Server Chassis. All chassis, and therefore all blades, and all Rack Servers attached to the Cisco UCS 6200 Series Fabric Interconnects become part of a single, highly available management domain. In addition, by supporting unified fabric, the Cisco UCS 6200 Series provides both the LAN and SAN connectivity for all servers within its domain.

From a networking perspective, the Cisco UCS 6200 Series uses a cut-through architecture, supporting deterministic, low-latency, line-rate 10 Gigabit Ethernet on all ports, switching capacity of 2 terabits per second (Tbps), and 320 Gbps of bandwidth per chassis, independent of packet size and enabled services. The product family supports Cisco® low-latency, lossless 10 Gigabit Ethernet unified network fabric capabilities, which increase the reliability, efficiency, and scalability of Ethernet networks. The fabric interconnect supports multiple traffic classes over a lossless Ethernet fabric from the blade through the interconnect. Significant TCO savings come from an FCoE-optimized server design in which network interface cards (NICs), host bus adapters (HBAs), cables, and switches can be consolidated.

The Cisco UCS 6200 Series hosts and runs Cisco UCS Manager in a highly available configuration, enabling the fabric interconnects to fully manage all Cisco UCS elements. Connectivity to the Cisco UCS 5100 Series blade chassis is maintained through the Cisco UCS 2100 or 2200 Series Fabric Extenders in each blade chassis. Connectivity to the Cisco UCS C-Series Rack Servers is maintained by Cisco VIC 1225 adapters. The Cisco UCS 6200 Series interconnects support out-of-band management through a dedicated 10/100/1000-Mbps Ethernet management port as well as in-band management. Cisco UCS Manager typically is deployed in a clustered active-passive configuration on redundant fabric interconnects connected through dual 10/100/1000 Ethernet clustering ports.

Cisco UCS 6248UP 48-Port Fabric Interconnect

The Cisco UCS 6248UP 48-Port Fabric Interconnect is a one-rack-unit (1RU), 10 Gigabit Ethernet, Cisco Data Center Ethernet, and FCoE interconnect providing more than 1 Tbps of throughput with low latency. It has 32 fixed SFP+ ports that support Fibre Channel, 10 Gigabit Ethernet, Cisco Data Center Ethernet, and FCoE.

One expansion module slot can provide up to sixteen additional Fibre Channel, 10 Gigabit Ethernet, Cisco Data Center Ethernet, and FCoE SFP+ ports.

Cisco UCS 6248UP 48-Port Fabric Interconnects were used in this study.

2.2.2 Cisco UCS C240 M3 Rack Server

The Cisco UCS C240 M3 Rack Server (Figure 3) is designed for both performance and expandability across a wide range of storage-intensive infrastructure workloads. The server broadens the Cisco UCS portfolio, supplying a 2RU form factor that houses up to two Intel Xeon E5-2600 or E5-2600 v2 series processors, 24 DIMM slots, 24 disk drives, and four 1 Gigabit Ethernet LAN-on-motherboard (LOM) ports.

Figure 3. Cisco UCS C240 M3 Rack Server

The Cisco UCS C-Series servers are available with the latest Intel Xeon processor families, keeping pace with recent Intel processor innovations. The Intel Xeon E5-2600 and E5-2600 v2 processors, for example, ramp frequency and add security and availability features. With the increased performance provided by these processors, the Cisco UCS C-Series servers offer an improved price/performance ratio that optimizes this solution’s cost-effectiveness.

The server's local disk configuration delivers balanced performance and expandability to meet workload requirements. Supporting up to 12 LFF (Large Form Factor) or 24 SFF (Small Form Factor) internal drives, the Cisco UCS C240 M3 can be configured with optional 10,000-RPM and 15,000-RPM SAS drives to deliver a high rate of I/O operations. High-capacity SATA drives provide economical, large-capacity storage. Or, for workloads that demand extremely fast access to small amounts of data, SSDs are a third option for on-board storage. A variety of RAID controller options are available to increase disk performance and reliability. In addition to local storage options, shared storage solutions are available from Cisco partners.

Many organizations are attracted to rack servers like the Cisco UCS C-Series because of the wide range of I/O options available as PCI Express (PCIe) adapters. A broad spectrum of PCIe options is available for the C-Series, including interfaces supported directly by Cisco as well as adapters from third parties.

2.2.3 Cisco VIC 1225

The Cisco UCS C240 M3 server interfaces with the Cisco UCS Virtual Interface Card (VIC) 1225 (Figure 4). The Cisco VIC is a dual-port Enhanced Small Form-Factor Pluggable (SFP+) 10 Gigabit Ethernet and Fibre Channel over Ethernet (FCoE)-capable PCI Express (PCIe) card designed exclusively for Cisco UCS C-Series Rack Servers.

Figure 4. Cisco UCS VIC 1225 CNA

The Cisco UCS VIC 1225 incorporates next-generation converged network adapter (CNA) technology. It enables a policy-based, stateless, agile server infrastructure that can present up to 256 PCIe standards-compliant interfaces to the host that can be dynamically configured as either network interface cards (NICs) or host bus adapters (HBAs). In addition, the Cisco UCS VIC 1225 supports Cisco Data Center Virtual Machine Fabric Extender (VM-FEX) technology, which extends the Cisco UCS fabric interconnect ports to virtual machines, simplifying server virtualization deployment.

Figure 5. Cisco UCS VIC 1225 CNA Architecture.

2.3 LSI Nytro MegaRAID Controller

Cisco UCS C-Series Rack Servers offer various RAID and caching configuration options, including an on-board controller. To enhance I/O performance (especially for virtualization workloads such as those in this reference architecture), Cisco also offers the LSI Nytro MegaRAID SAS controller as an option.

LSI Nytro MegaRAID cards accelerate application performance by removing I/O bottlenecks. Each card features flash memory to cache read and write I/O operations to local direct-attached storage (DAS). The cards combine on-board flash memory with RAID data protection and hard disk drive (HDD) connectivity, creating a tiered approach to storage. With intelligent caching of frequently used data to on-board flash, the cards combine the low-latency benefits of flash with the cost and capacity benefits of HDDs. (For more information, see http://www.lsi.com/products/flash-accelerators/pages/default.aspx#tab/product-family-tab-2).

For this reference architecture, the Cisco UCS C-Series servers were configured with LSI Nytro MegaRAID Controller NMR 8110-4i cards, which feature 200 GB of on-board eMLC flash. The controller enables a low-latency, high-performance caching and data protection solution for direct-attached storage (DAS). It supports up to 128 SATA and/or SAS hard drives with data transfer rates of up to 6 Gb/s per port. The card complies with the PCI Express 3.0 x8 specification for high-bandwidth applications, and all RAID levels are supported.

Figure 6. LSI Nytro MegaRAID Controller 8110-4i

In this reference architecture, the Nytro MegaRAID controller cache was used to support I/O operations for Machine Creation Service (MCS) differencing disks that XenApp 7.5 uses for app and desktop provisioning. MCS assigns each virtual desktop a dedicated virtual hard disk (vDisk), essentially a write cache, where any delta changes (writes) to the default shared desktop image are recorded. A vDisk is thinly provisioned and attached to each new virtual desktop. During the lifecycle of the XenApp Hosted Shared Desktop, these non-persistent disks incur high IOPS. Using the Nytro MegaRAID controller as a fast cache helps to accelerate IOPs to these vDisks.

2.4 Citrix XenServer

For this reference architecture, the hypervisor selected was Citrix® XenServer® because it provides a highly reliable, cost-effective, and secure open source virtualization platform that offers near-native performance and best-in-class VM density. XenServer provides a rich set of management and automation capabilities, a simple and affordable pricing model, and optimizations for virtual desktop and cloud computing, making it ideal for this reference architecture.

Citrix XenServer is built on the open source Xen® hypervisor technology, widely acknowledged as one of the fastest and most secure virtualization server technologies in the industry. Citrix XenCenter, a Windows-based graphical management console, supports efficient management of the XenServer environment, making it easy to deploy, manage, and monitor virtual machines.

XenServer installation is fast because of an intuitive, wizard-driven installation utility. It’s possible to install XenServer and begin virtualizing workloads in as little as ten minutes. More information on Citrix XenServer is available on the product web site: http://www.citrix.com/products/xenserver/overview.html.

2.5 Citrix XenApp 7.5

The Citrix XenApp 7.5 release delivers these benefits:

Mobilizing Microsoft Windows application delivery, bringing thousands of corporate applications to mobile devices with a native-touch experience and high performance

Reducing costs with simplified and centralized management and automated operations

Securing data by centralizing information and effectively controlling access

With single sign-on and seamless session roaming, XenApp helps IT provide optimal freedom for users without restricting Bring Your Own (BYO) device programs to using solely Windows-based devices.

Figure 7. What’s new in Citrix XenApp 7.5

2.5.1.1 Simplified Deployment and Management

XenApp 7.5 features an easy-to-use Studio console that simplifies the deployment of virtual apps and desktops. XenApp 7.5 supplies a unified set of management consoles that simplify and automate even large-scale deployments, including multi-site and hybrid-cloud implementations. EdgeSight provides powerful monitoring and advanced analytics, and the Director helpdesk console makes it easy to customize and configure EdgeSight reports, trend analyses, and alerts.

Machine Creation Services (MCS) uses a master image within the environment to manage virtual machines, simplifying target device updates since it deploys a single master image. MCS creates virtual servers and desktops from a master image on demand. Using MCS optimizes storage utilization and provides a pristine virtual machine to users each time they log on. MCS is fully integrated and administrated in Citrix Studio.

2.5.1.2 Hybrid Cloud Provisioning

XenApp 7.5 enables Hybrid Cloud Provisioning, allowing deployments to seamlessly flex or grow from the datacenter to Amazon Web Services (AWS) or Citrix CloudPlatform-based public or private cloud services.

2.5.1.3 High-Definition User Experience (HDX) Technology

High-Definition User Experience (HDX) technology in this release is optimized to improve the user experience for hosted Windows apps on mobile devices. Specific enhancements include:

HDX Mobile technology, designed to cope with the variability and packet loss inherent in today’s mobile networks. HDX technology supports deep compression and redirection, taking advantage of advanced codec acceleration and an industry-leading H.264-based compression algorithm. The technology enables dramatic improvements in frame rates while requiring significantly less bandwidth. HDX technology offers users a rich multimedia experience and optimized performance for voice and video collaboration.

HDX Touch technology enables mobile navigation capabilities similar to native apps, without rewrites or porting of existing Windows applications. Optimizations support native menu controls, multi-touch gestures, and intelligent sensing of text-entry fields, providing a native application look and feel.

HDX 3D Pro uses advanced server-side GPU resources for compression and rendering of the latest OpenGL and DirectX professional graphics apps. GPU support includes both dedicated user and shared user workloads.

The Citrix XenApp 7.5 release is built on the same FlexCast Management Architecture as Citrix XenDesktop 7.5, and features the same built-in configuration, provisioning, and management tools with cloud-style automation and scalability. Starting with the Citrix XenDesktop 7 release, Citrix unified the XenDesktop and XenApp software releases, allowing IT organizations to deploy XenDesktop to deliver a mix of full virtualized desktops (VDI) or hosted shared desktops and applications based on Remote Desktop Services (RDS). Recognizing that some companies do not deploy VDI desktops at all—relying instead on XenApp and RDS technologies—Citrix has conveniently packaged and released XenApp 7.5 as a separate product for virtual application and desktop session hosting.

2.5.2 Citrix Design Fundamentals

Citrix XenApp 7.5 includes significant enhancements to help customers deliver Windows apps and desktops as mobile services while reducing management complexity and minimizing the cost of delivering applications to large groups of users. Enhancements in this release include:

The latest generation unified FlexCast Management Architecture.

New and improved management interfaces. Like XenDesktop 7 and later releases, XenApp 7.5 includes two new purpose-built management consoles—one for automating workload provisioning and app publishing and the second for real-time monitoring of the infrastructure.

Enhanced HDX technologies. Since mobile technologies and devices are increasingly prevalent, Citrix has engineered new and improved HDX technologies to improve the user experience for hosted Windows apps and desktops delivered on laptops, tablets, and smartphones.

Unified App Store. The XenApp 7.5 release includes a self-service Windows app store, Citrix StoreFront, which provides a single, simple, and consistent aggregation point for user services. IT can publish apps, desktops, and data services to the StoreFront, from which users can search and subscribe to services.

Built-in provisioning through Machine Creation Services (MCS). XenApp 7.5 enables XenApp server provisioning using MCS, which has been used in XenDesktop deployments for many years.

2.5.2.1 FlexCast Management Architecture

Figure 8 shows an overview of the unified FlexCast Management Architecture and its underlying components, which are also described below.

Figure 8. FlexCast Management Architecture used by XenApp 7.5 and XenDesktop 7 and above


Citrix Receiver. Running on user endpoints, Receiver provides users with self-service access to XenApp resources. Receiver combines ease of deployment and use, supplying fast, secure access to hosted applications, desktops, and data. Receiver also provides on-demand access to Windows, Web, and Software-as-a-Service (SaaS) applications. Using Citrix Receiver, users have a device-native experience on endpoints, regardless of the device type: Windows, Mac, Linux, iOS, Android, ChromeOS, or Blackberry.

Citrix StoreFront. StoreFront authenticates users and manages catalogs of desktops and applications. Users can search StoreFront catalogs and subscribe to published services via Citrix Receiver. StoreFront replaces Web Interface.

Citrix Studio. Using the new and improved Studio interface instead of the Delivery Services Console, administrators can easily configure and manage XenApp deployments. Studio provides wizards to guide the process of setting up an environment, creating delivery groups for applications and shared desktops, and assigning resources to users, automating provisioning and application publishing. Administrative tasks can also be customized and delegated to match site operational requirements.

Delivery Controller. The Delivery Controller is responsible for distributing applications and desktops, managing user access, and optimizing connections to applications. Each site has one or more delivery controllers.

Server OS Machines. These are virtual or physical machines (based on a Windows Server operating system) that deliver RDS applications or hosted shared desktops to users.

Desktop OS Machines. Deployed through XenDesktop (and not XenApp), these are virtual or physical machines (based on a Windows Desktop operating system) that deliver personalized VDI desktops that run on a desktop operating system.

Remote PC. XenDesktop with Remote PC allows IT to centrally deploy secure remote access to all Windows PCs on the corporate network. It is a comprehensive solution that delivers fast, secure remote access to all the corporate apps and data on an office PC from any device.

Virtual Delivery Agent. A Virtual Delivery Agent is installed on each virtual or physical machine (within the server or desktop OS) and manages each user connection for application and desktop services. The agent allows OS machines to register with the Delivery Controllers and governs the HDX connection between these machines and Citrix Receiver.

Citrix Director. Citrix Director is a powerful administrative tool that helps administrators quickly troubleshoot and resolve issues. It supports real-time assessment, site health and performance metrics, and end user experience monitoring. Citrix EdgeSight® reports are available from within the Director console and provide historical trending and correlation for capacity planning and service level assurance.

3 Test Setup and Configurations

3.1 500-Seat Knowledge Worker/600-Seat Task Worker Test Configuration

Figure 9 shows the test configuration built for the solution.

Figure 9. Test Configuration

The configuration included these components:

Hardware components

- 3 x Cisco UCS C240-M3 Rack Servers (dual Intel Xeon E5-2697 v2 processors @ 2.7 GHz) with 256 GB of memory (16 x 16-GB 1866-MHz DIMMs) and 1 x Cisco VIC 1225 Converged Network Adapter

- 1 x LSI Nytro MegaRAID Controller NMR 8110-4i card per UCS C240-M3 Server

- 12 x 600-GB 10,000 RPM hot swappable hard disk drives

- 2 x Cisco 6248UP Fabric Interconnects

- Load generators: 2 x Cisco UCS B250 M2 blade servers (dual Intel Xeon X5680 processors @ 3.33 GHz) with 192 GB of memory (48 x 4-GB DIMMs @ 1333 MHz) and 1 x Cisco UCS M81KR (Palo) Converged Network Adapter (optional; not required for the solution)

Software components

- Cisco UCS firmware 2.2(1d)

- Citrix XenApp 7.5

- 24 x Microsoft Windows Server 2012 R2 (64-bit) Remote Desktop Services virtual machines, each with 5 vCPUs and 24 GB of dynamic memory

3.1.1 Controller Cache, Policies, and Setup

The LSI Nytro MegaRAID Controller NMR 8110-4i incorporates 200 GB of eMLC flash capacity and 1 GB of 1333-MHz DDR3 SDRAM for RAID cache assist. The controller provides read/write caching, with extra protection against power failure available through an optional Battery Backup Unit (BBU).

The LSI Nytro MegaRAID BIOS configuration wizard configures the controller flash and its caching and I/O policies. The wizard is accessed during the server boot sequence by pressing the Ctrl-H key sequence when the controller BIOS loads. Documentation for the configuration utility is available at: http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c/sw/raid/configuration/guide/RAID_GUIDE/MegaRAID.html. The wizard defines the read, caching, and write policies as follows:

3.1.1.1 Read Policies

Adaptive Read-Ahead—This policy specifies that the controller uses read-ahead if the two most recent disk accesses occurred in sequential sectors. If all read requests are random, the algorithm reverts to No Read-Ahead, but all requests are still evaluated for possible sequential operation. Data that is read-ahead of the current request is kept in the controller cache.

Read-Ahead—The controller reads ahead all the data until the end of the stripe from the disk.

Normal—Only the requested data is read; the controller does not read ahead any data.

3.1.1.2 Caching Policies

Direct IO—All read data is transferred directly to host memory bypassing the RAID controller cache. Any read-ahead data is cached. All write data is transferred directly from host memory, bypassing the RAID controller cache if Write-Through cache mode is set.

Cached IO—All read and write data passes through the controller cache memory on its way to or from host memory, including write data in Write-Through mode.

Note: Direct IO is recommended; cached I/O should be left disabled.

3.1.1.3 Write Policies

Write-Through—Write-Through is a caching strategy in which data is written to the disks before a completion status is returned to the host operating system. It is considered more secure, since a power failure is less likely to cause undetected write data loss when no battery-backed cache is present. Write-Through is recommended for RAID 0, 1, and 10 to provide optimum performance for streaming/sequential-access workloads: when Direct IO mode is set, data moves directly from the host to the disks and the controller avoids copying the data into cache as an intermediary step, improving overall performance for streaming workloads. (This policy is recommended when using Nytro RAIDCache.)

Write-Back—Write-Back is a caching strategy in which a completion status is sent to the host operating system as soon as the data is written to the RAID cache; the data is written to disk when it is forced out of controller cache memory. Write-Back is more efficient if the temporal and/or spatial locality of the requests is smaller than the controller cache size, and it is also more efficient in environments with “bursty” write activity. Battery-backed cache can be used to protect against data loss as a result of a power failure or system crash. Write-Back is recommended for RAID 0, 1, and 10 because it provides optimum performance for transactional (random, real-world) benchmarks. Write-Back is also recommended for RAID 5 and 6 because it can improve the performance of RAID 5 and 6 data redundancy generation. (Note: This policy is recommended when using Nytro Flash Cache.)

Disk Cache—Enabling the disk cache increases throughput and performance for write operations and access. It is recommended to ensure continuous power to the hard disks with an upstream UPS; if the system is UPS-protected, enabling the disk cache for performance reasons is recommended. Without protected power, any data that has not yet been written from the disk cache to the platters can be lost if a power failure occurs.

3.1.2 Server Storage Volume Configuration

A differentiator for this solution is the strategic use of flash on the LSI Nytro MegaRAID Controller 8110-4i to accelerate I/O to the local storage, which consisted of twelve 600-GB 10,000-RPM SAS drives. To configure the controller card flash and the SAS drives for the solution, it was necessary to first perform three high-level operations:

Creating Drive Groups.

Adding Drive Groups to Spans.

Creating Virtual Drives, and setting the RAID Level, controller settings, and size. (Note that multiple virtual drives can be configured on a single drive group.)

Drive Groups were configured as follows to support the boot/infrastructure volumes, user data, and the MCS differencing disks used by the XenApp 7.5 virtual machines:

Table 1. Physical Drive Group Configuration

Drive Groups    RAID Configuration    Physical Drives    Purpose
N/A             0                     Backplane          Nytro MegaRAID NytroCache
0               5                     0-3                Boot/Infrastructure Volumes
1               0                     4-7                1st Group for RAID 10 Volume
2               0                     8-11               2nd Group for RAID 10 Volume

Drive Groups were added to Spans as shown in Table 2:

Table 2. Spans

Drive Group    Span
0              Nytro MegaRAID Cache Span
1              Boot/Infrastructure Span
2              Floating Assignment Linked Clone Span
3              Floating Assignment Linked Clone Span

We then created three Virtual Drives from the Drive Groups and Spans created above. Only Virtual Drive 2 utilizes the Nytro MegaRAID NytroCache for XenApp 7.5 MCS disks.

Table 3. Virtual Drive Configuration

Drive Groups    RAID Configuration    Virtual Drive    Capacity    Purpose
0               5                     0                20 GB       Boot
0               5                     1                1.6 TB      Infrastructure/User Files
1, 2            10                    2                2.16 TB     XenApp 7.5 MCS disks

The final configuration appears as shown in the Figure 10 screen shot from the Nytro MegaRAID BIOS Configuration Utility:

Figure 10. Final Configuration

3.2 Testing Methodology

All validation testing was conducted on-site in the Cisco Performance Solutions Labs in San Jose, CA with support from Citrix. While XenApp supports both seamless application delivery as well as hosted shared desktops, validation and testing focused on a Hosted Shared Desktop (HSD) workload.

Performance metrics were evaluated during the entire workload lifecycle: XenApp virtual machine boot-up, user logon and virtual desktop acquisition (ramp-up), user workload execution (steady state), and user logoff. Test metrics were analyzed from the hypervisor, virtual desktop, storage, and load generation software to assess the overall success of an individual test cycle. A test cycle was considered passing only if all of the planned test users completed the ramp-up and steady state phases (described below) and all metrics remained within permissible thresholds.

Three successfully completed test cycles were conducted for each test phase, and the results were found to be relatively consistent from one test to the next. Multiple test phases—both single-server and full scale tests—were conducted.

3.2.1 Load Generation and User Workload Simulation with Login VSI

Within the test environment, load generators were used to put demand on the system, simulating multiple users accessing the XenApp 7.5 environment and executing a typical end-user workflow. To generate this load, Login VSI 3.7 was used to establish the end-user connections to the XenApp 7.5 environment, provide unique user credentials to the Citrix StoreFront server, initiate the workloads, and evaluate the end-user experience.

Login VSI measures in-session response time, providing an objective way to validate the expected user experience, even during periods of peak resource demand such as a login storm.

Login VSI is the industry-standard load testing tool for benchmarking Server Based Computing (SBC) and Virtual Desktop Infrastructure (VDI) environments. It is completely platform and protocol independent and hence allows customers to easily replicate testing results in their own environments. Login VSI calculates an index (known as Login VSImax) based on the number of simultaneous sessions that can be run on a single machine before performance degrades beyond an acceptable level. Additional information is available at http://www.loginvsi.com.

3.2.2 Login VSI 3.7

Login VSI 3.7 tracks user experience statistics, looping through specific operations and measuring response times at regular intervals. Response times are used to determine Login VSImax, the maximum number of users that the test environment can support before performance degrades consistently. For validation of this reference architecture, we used both Login VSI 3.7 Light and Medium workloads.

3.2.2.1 Medium Workload

The Login VSI medium workload simulates a knowledge worker that runs generic applications such as Microsoft Office, Internet Explorer (including a Flash video applet), printing, and Adobe Acrobat Reader. Like real users, the medium workload leaves multiple applications open at the same time.

Once a session has been started, the medium workload repeats every 12 minutes. During each loop the response time is measured every 2 minutes. The medium workload opens up to 5 apps simultaneously, and approximately 2 minutes of idle time is included to simulate real-world users. The type rate is 160 ms for each character.

Each loop opens and runs these applications:

Outlook 2007/2010: Browsing 10 messages.

Internet Explorer: One instance is left open (BBC.co.uk); another instance browses to Wired.com, Lonelyplanet.com, and the Flash-heavy gettheglass.com.

Word 2007/2010: One instance is opened to measure response time; one instance is opened to review and edit a document.

Bullzip PDF Printer and Acrobat Reader: The Word document is printed to PDF and reviewed.

Excel 2007/2010: A very large randomized sheet is opened.

PowerPoint 2007/2010: A presentation is reviewed and edited.

7-zip: Using the command line version, the output of the session is zipped.

Figure 11 shows a graphical representation of the medium workload.

Figure 11. Login VSI Medium Workload Timing Chart

3.2.2.2 Light Workload

The light workload (Figure 12) runs fewer applications and in most cases closes them right after using them, resulting in lower memory and CPU consumption. The light workload emulates a task worker (such as a call center employee) running Microsoft Internet Explorer, Word, and Outlook. Only two applications are opened simultaneously, and the total idle time is about 1 minute 45 seconds.

Figure 12. Login VSI Light Workload Timing Chart

You can obtain additional information on Login VSI testing and workloads from http://www.loginvsi.com.

3.3 Testing Procedure

3.3.1 Test Run Setup

The test run setup for both the single-server and multi-server tests was the same. Setup involved the following steps:

1. Place Delivery Group(s) in maintenance mode

2. Reboot XenApp Delivery Controller(s)

3. Reboot Citrix StoreFront Server(s)

4. Reboot XenServer Workload Host(s)

5. Reboot Login VSI Launchers

Rebooting the Citrix infrastructure components and the XenApp workload hosts puts the environment in a known, clean state for each test run.

3.3.2 Test Run Protocol

To simulate real-world environments, Cisco requires that the log-on and start-work sequence, known as ramp-up, be completed within 30 minutes. Additionally, Cisco requires that all launched sessions become active within 2 minutes of the launch of the last session.
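
To make these two pass/fail requirements concrete, the following Python sketch checks them against per-session timestamps. The function name, the timestamp representation, and the interpretation of the 30-minute window (measured across the session launch times) are assumptions for illustration; this is not part of the Login VSI tooling.

from datetime import datetime, timedelta

def ramp_up_ok(sessions,
               max_launch_window=timedelta(minutes=30),
               max_activation_lag=timedelta(minutes=2)):
    # sessions: list of (launch_time, active_time) datetime pairs, one per session.
    # Requirement 1: all sessions are launched within the 30-minute ramp-up window.
    # Requirement 2: all sessions become active within 2 minutes of the last launch.
    first_launch = min(launch for launch, _ in sessions)
    last_launch = max(launch for launch, _ in sessions)
    last_active = max(active for _, active in sessions)
    return (last_launch - first_launch <= max_launch_window and
            last_active - last_launch <= max_activation_lag)

# Example: 500 sessions launched over an 1800-second window, each active ~60 s later.
t0 = datetime(2014, 6, 1, 9, 0, 0)
demo = [(t0 + timedelta(seconds=3.6 * i), t0 + timedelta(seconds=3.6 * i + 60))
        for i in range(500)]
print(ramp_up_ok(demo))   # True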


To begin the testing, we started performance monitoring scripts to record resource consumption for the hypervisor, virtual desktop, storage, and load generation software. At the beginning of each test run, we took the desktops out of maintenance mode, started the virtual machines, and waited for them to register. The Login VSI launchers initiated the desktop sessions and began user logins, which constitutes the ramp-up phase. Once all users were logged in, the steady state portion of the test began in which Login VSI executed the application workload.

The following protocol was used for each test cycle:

1. Start logging on XenServer host servers.

2. Take the HSD Delivery Group out of maintenance mode, start the HSD virtual machines, and then wait for them to register (approximately 5 minutes).

3. Wait an additional 5 minutes for the hypervisor to settle down.

4. Start the Login VSI 3.7 test configured with an 1800-second parallel launching window and 900-second auto-logoff for all sessions. For the single-server tests 14 launchers were used.

5. Wait and verify that all sessions have become active (approximately 30 minutes).

6. Wait for the auto-logoff timer to expire (approximately 15 minutes); the timer is set to allow the sessions to complete at least one full loop, at which point Login VSI places the logoff.txt file on the VSI share to initiate logoff.

7. Wait until all sessions have logged off (approximately 15 minutes).

8. Stop logging.

3.3.3 Success Criteria

Multiple metrics were evaluated during each test run, but the success criteria for considering a single test run as pass or fail were based on the end-user experience metrics captured by the Login VSI 3.7 testing. Login VSImax evaluates the user response time under increasing user load and assesses the successful start-to-finish execution of all the initiated desktop sessions and applications. It provides a server, storage, and network stress test, using end-user response times as the indicator for when performance becomes unacceptable. We use that measure as our starting point for identifying our maximum recommended workload (virtual desktops or hosted shared desktops).

3.3.4 Calculating Login VSImax

VSImax represents the maximum number of users the environment can handle before serious degradation of the end-user experience occurs. It requires the systems under test to be stressed past the point of normal operating parameters. VSImax is calculated based on the response times of individual users as recorded during the workload execution. The maximum user response time has a threshold of 4000 ms, and all users' response times are expected to remain below 4000 ms in order to assume that the user interaction with the virtual desktop is at a functional level. VSImax is reached when the response time reaches or exceeds 4000 ms for six consecutive occurrences. If VSImax is reached, that indicates the point at which the user experience has significantly degraded. The response time is generally an indicator of the host CPU resources, but this specific method of analyzing the user experience provides an objective method of comparison that can be aligned to host CPU performance.


Calculating VSImax

Typically the desktop workload is scripted in a 12-14 minute loop once a simulated Login VSI user is logged on. After the loop is finished, it restarts automatically. Within each loop, the response times of seven specific operations are measured at regular intervals: six times within each loop. The response times of these seven operations are used to establish VSImax.

These operations measured with Login VSI exercise considerably different subsystems such as CPU (user and kernel), memory, disk, the OS in general, the application itself, printing, and the Graphics Device Interface (GDI). These operations are deliberately short by nature. When such operations are consistently long, it indicates the system is saturated because of excessive queuing on some resource, and the average response times escalate. This effect is clearly visible to end users: when such operations consistently take multiple seconds, the user will regard the system as slow and unresponsive.

For these tests, Cisco used the VSImax Dynamic model exclusively.

VSImax Dynamic

Because baseline response times can vary depending on the virtualization technology used, using a dynamically calculated threshold based on weighted measurements provides greater accuracy for cross-vendor comparisons. For this reason, we configured the Login VSI software to calculate and report a VSImax Dynamic response time.

VSImax Dynamic is calculated when the response times are consistently above a dynamically calculated threshold. Seven individual measurements are weighted to support this approach:

Copy new doc from the document pool in the home drive: 100%

Microsoft Word with a document: 33.3%

Starting the “File Open” dialogue: 100%

Starting “Notepad”: 300%

Starting the “Print” dialogue: 200%

Starting the “Search and Replace” dialogue: 400%

Compress the document into a zip file with 7-zip command line: 200%

A sample of the VSImax Dynamic response time calculation is displayed below:

Figure 13. VSImax Dynamic Results

Login VSImax Dynamic Response Time

The average VSImax response time is calculated based on the number of active Login VSI users logged on to the system. For the VSImax value to be reached, the average VSImax response times need to be consistently higher than a dynamically calculated threshold.

To determine this dynamic threshold, first the average baseline response time is calculated. This is done by averaging the baseline response time of the first 15 Login VSI users on the system.

The formula for the dynamic threshold is: Avg. Baseline Response Time x 125% + 3000. As a result, when the baseline response time is 1800, the VSImax threshold will now be 1800 x 125% + 3000 = 5250ms.
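
The threshold formula and the weighted measurements listed above lend themselves to a short worked example. The following Python sketch is a minimal illustration only, not Login VSI code: the data structures and function names are hypothetical, the way the seven weighted operation times are combined into a single response time is an assumption, and the six-consecutive-samples saturation rule is borrowed from the 4000 ms model described in Section 3.3.4.

# Illustrative sketch of the VSImax Dynamic calculation (assumptions noted above).

# Weights for the seven measured operations, from the list in this section.
WEIGHTS = {
    "copy_new_doc_from_pool": 1.000,   # 100%
    "word_with_document": 0.333,       # 33.3%
    "file_open_dialog": 1.000,         # 100%
    "notepad_start": 3.000,            # 300%
    "print_dialog": 2.000,             # 200%
    "search_replace_dialog": 4.000,    # 400%
    "zip_7zip_cli": 2.000,             # 200%
}

def weighted_response_ms(sample):
    # One plausible reading: scale each measured operation time (ms) by its
    # weight and average the weighted values into a single response time.
    return sum(sample[op] * w for op, w in WEIGHTS.items()) / len(WEIGHTS)

def dynamic_threshold_ms(baseline_samples):
    # Threshold = average baseline response time x 125% + 3000 ms, where the
    # baseline is averaged over the samples of the first 15 users.
    baseline = sum(weighted_response_ms(s) for s in baseline_samples) / len(baseline_samples)
    return baseline * 1.25 + 3000.0

def vsimax_dynamic(samples_by_session_count, baseline_samples, consecutive=6):
    # Return the active-session count at which the weighted response time has
    # stayed above the dynamic threshold for `consecutive` successive samples
    # (the "consistently above" rule), or None if the test never saturates.
    threshold = dynamic_threshold_ms(baseline_samples)
    streak = 0
    for sessions, sample in samples_by_session_count:
        streak = streak + 1 if weighted_response_ms(sample) > threshold else 0
        if streak >= consecutive:
            return sessions
    return None

With a computed baseline of 1800 ms, dynamic_threshold_ms returns 1800 x 1.25 + 3000 = 5250 ms, matching the worked example above.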

Especially when application virtualization is used, the baseline response time can vary widely by vendor and streaming strategy. Using the VSImax Dynamic model provides a more level playing field when comparing application virtualization or antivirus applications. The resulting VSImax Dynamic scores still align with saturation at the CPU, memory, or disk level even when the baseline response times are relatively high.

Determining VSImax

The Login VSI analyzer automatically identifies VSImax. In the example below, VSImax is 98. The analyzer also automatically detects “stuck sessions” and corrects the final VSImax score.

Vertical axis: Response Time in milliseconds

Horizontal axis: Total Active Sessions

Figure 14. Sample Login VSI Analyzer Graphic Output

Login VSI Chart

Red line: Maximum Response (worst response time of an individual measurement within a single session)

Orange line: Average Response Time for each level of active sessions

Blue line: the VSImax average.

Green line: Minimum Response (best response time of an individual measurement within a single session)

For a test to be considered a success, the total number of users in the test run had to log in, become active, run at least one test loop, and log out without reaching VSImax.

4 Login VSI Test Results

This section provides the validation results of the Login VSI testing within the environment for the single-server and multiple-server configurations. We conducted three test runs during each test cycle to verify the consistency of our results. The test phases included the following:

1. Determine the recommended maximum density. This phase validated single-server scalability under a maximum recommended density with the RDS load. The maximum recommended load for a single server occurred when CPU or memory utilization peaked at 90-95%, while the end user response times remained below 4000ms. This testing is used to determine the server N+1 count for the solution.

2. Validate solution at full scale. This phase validated multiple-server scalability on the set of servers in the configuration.

The recommended density phase on a single server was executed with the Medium and then the Light Login VSI workloads independently. The full scale phase was executed for the Medium workload only.

4.1 Results: Single-Server Recommended Maximum Density

The first test sequence for each workload was to determine the VSImax v3.7 value for a single server running Hosted Shared Desktop sessions on Windows Server 2012 R2 with XenApp 7.5. We initially tested different combinations of XenApp 7.5 server VM counts and virtual CPU (vCPU) assignments, determining the maximum density in each case.

The best performance was achieved when the number of vCPUs assigned to the VMs did not exceed the number of hyper-threaded cores available on the server; in other words, not overcommitting CPU resources provided the best user experience. For the Intel E5-2697 v2 processors, this means that 24 cores with hyper-threading enabled provide 48 vCPUs. The highest density was observed at eight VMs per Cisco UCS C240 M3 Rack Server, each with five vCPUs and 24 GB of RAM assigned.

We used the Login VSImax values determined for the medium and light workloads, which correspond to the Knowledge Worker and Task Worker user groups, to guide us to our recommended maximum density for each use case.

It is important to identify the recommended maximum session density that a single server can support while providing an excellent end-user experience without overtaxing server resources. We use this value to identify the minimum number of servers required to support the solution, and we then add one additional server to provide optimal server performance under normal operating conditions. This ensures that the solution can tolerate a server failure and still support the full design workload (N+1 server fault tolerance).
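
The sizing logic described above reduces to simple arithmetic. The Python sketch below uses the densities measured in this study (250 Knowledge Worker and 325 Task Worker sessions per server, Sections 4.1.1 and 4.1.2 below) together with the vCPU allocation from the previous paragraphs; the function names are illustrative assumptions, not part of any Cisco or Citrix tooling.

import math

# 2 x Intel Xeon E5-2697 v2 (12 cores each) with Hyper-Threading -> 48 logical CPUs.
def host_vcpus(sockets=2, cores_per_socket=12, hyperthreading=True):
    return sockets * cores_per_socket * (2 if hyperthreading else 1)

# Best results were observed when total assigned vCPUs do not exceed logical CPUs.
def no_cpu_overcommit(vms_per_host=8, vcpus_per_vm=5):
    return vms_per_host * vcpus_per_vm <= host_vcpus()   # 8 x 5 = 40 <= 48 -> True

# Minimum servers for the target seat count, plus one spare for N+1 fault tolerance.
def servers_required(target_sessions, sessions_per_server, fault_tolerant=True):
    n = math.ceil(target_sessions / sessions_per_server)
    return n + 1 if fault_tolerant else n

print(no_cpu_overcommit())           # True
print(servers_required(500, 250))    # 500 Knowledge Workers at 250/server -> 3 servers
print(servers_required(600, 325))    # 600 Task Workers at 325/server -> 3 servers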

4.1.1 Medium Workload Single Server Maximum Recommended Density

For the single server Medium Workload, guided by VSImax scores, we determined that 250 user sessions per host gave us optimal end user experience and good server utilization metrics. Performance charts for this workload are shown below in Figures 15 and 16.

Figure 15. XenApp 7.5 on XenServer 6.2 Recommended Maximum—Medium Workload—250 Sessions

Figure 16. XenApp 7.5 on XenServer 6.2 Recommended Maximum—Medium Workload—CPU Utilization

4.1.2 Light Workload Single Server Maximum Recommended Density

For the single server Light Workload, guided by VSImax scores, we determined that 325 user sessions per host gave us optimal end user experience and good server utilization metrics. Performance charts for this workload are shown in Figures 17 and 18.

Figure 17. XenApp 7.5 on XenServer 6.2 Recommended Maximum—Light Workload—325 Sessions

Figure 18. XenApp 7.5 on XenServer 6.2 Recommended Maximum—Light Workload—CPU Utilization

4.2 Cisco UCS C240-M3 Full Scale Results for Citrix XenApp 7.5

Using all three Cisco UCS C240 M3 Rack Servers (two hosting all of the required infrastructure VMs plus eight Citrix XenApp 7.5 virtual machines each, and one hosting eight Citrix XenApp 7.5 virtual machines only), we performed 500-session Login VSI Medium Workload tests to validate that the solution delivered an excellent end-user experience at scale.

4.2.1 Scale Results with Medium Workload

The 500-seat, three-server configuration provided excellent results. The Login VSI Index Average and Average Response times tracked well below 2 seconds throughout the run, indicating outstanding end-user experience throughout the test.

Figure 19. XenApp 7.5 on XenServer 6.2 Scale Result—Medium Workload—500 Sessions

Figures 20 through 25 provide performance data on one of the Cisco UCS C240 M3 servers in the scale test. The graphs are representative of the other two servers in the three-server test.

Figure 20. XenApp 7.5 on XenServer 6.2 Scale Result—Medium Workload—500 Sessions -Server CPU

Figure 21. XenApp 7.5 on XenServer 6.2 Scale Result—Medium Workload—500 Sessions -Server IOPS

Figure 22. XenApp 7.5 on XenServer 6.2 Scale Result—Medium Workload—500 Sessions -Server IO Thruput Mbps

Figure 23. XenApp 7.5 on XenServer 6.2 Scale Result—Medium Workload—500 Sessions -Server IO Wait

Figure 24. XenApp 7.5 on XenServer 6.2 Scale Result—Medium Workload—500 Sessions -Server IO Latency

Figure 25. XenApp 7.5 on XenServer 6.2 Scale Result—Medium Workload—500 Sessions -Server IO Average Queue Length

5 Conclusion

This reference architecture provides an extremely simple, low-cost, fault tolerant Cisco UCS-managed infrastructure for deploying a small office/branch office configuration for hosted applications or hosted shared desktops. The combination of Citrix XenApp 7.5 and the Cisco UCS platform (with its simple “wire-once” cabling and powerful compute and networking technologies) makes it easy to provision XenApp 7.5 Hosted Shared Desktops and applications.

Desktop virtualization provides significant advantages: it empowers user mobility, centralizes and protects corporate data and intellectual property, and simplifies management while reducing IT costs. Citrix XenApp 7.5 delivers a high-definition user experience for mission-critical Windows applications, data, and desktops, whether centralized in the datacenter or installed on a local PC. XenApp, built on HDX technologies, enables high-definition real-time video and interactive collaboration even when accessed from hundreds of miles away, making it easy for remote and branch workers to have the same experience and resources as employees at headquarters.

6 References

6.1 Cisco Reference Documents

Cisco Unified Computing System Manager Home Page

http://www.cisco.com/en/US/products/ps10281/index.html

Cisco UCS C240 M3 Rack Server Resources

http://buildprice.cisco.com/catalog/ucs/models/C240M3

http://www.cisco.com/en/US/products/ps10493/index.html

Cisco UCS 6200 Series Fabric Interconnects

http://www.cisco.com/en/US/products/ps11544/index.html

Download Cisco UCS Manager and C-Series Software Version 2.2(1d)—ACCOUNT REQUIRED

http://software.cisco.com/download/release.html?mdfid=283612660&softwareid=283655658&release=2.2(2c)&relind=AVAILABLE&rellifecycle=&reltype=latest

Download Cisco UCS Central Software Version 1.1(2a)—ACCOUNT REQUIRED

http://software.cisco.com/download/release.html?mdfid=284308174&softwareid=284308194&release=1.1(2a)&relind=AVAILABLE&rellifecycle=&reltype=latest&i=rm

LSI Nytro MegaRAID Controllers

http://www.lsi.com/downloads/Public/Nytro/docs/DB07-000134-06_LSI_Nytro_MegaRAID%20%28NMR%20r1.7%29_Application_Acceleration_RelNotes.pdf

http://www.lsi.com/downloads/Public/Nytro/downloads/Nytro%20XM/Tech%20Pubs/LSI_Nytro_MegaRAID_Application_Acceleration_Card_QIG.pdf

6.2 Citrix Reference Documents

Citrix Product Downloads

http://www.citrix.com/downloads/xenapp.html

Citrix Knowledge Center

http://support.citrix.com

Citrix XenApp eDocs

http://support.citrix.com/proddocs/topic/xenapp/ps-library-wrapper.html

Login VSI

http://www.loginvsi.com/documentation/

Login VSImax v3.7

http://www.loginvsi.com/documentation/index.php?title=VSImax

Printed in USA    C11-732049-00    06/14