Cisco UCS Integrated Infrastructure for Big Data and Analytics with Cloudera for Data Science at Scale

Updated: October 21, 2019



Building a 28-Node Cluster


About the Cisco Validated Design Program

The Cisco Validated Design (CVD) program consists of systems and solutions designed, tested, and documented to facilitate faster, more reliable, and more predictable customer deployments. For more information, go to:

http://www.cisco.com/go/designzone.

ALL DESIGNS, SPECIFICATIONS, STATEMENTS, INFORMATION, AND RECOMMENDATIONS (COLLECTIVELY, "DESIGNS") IN THIS MANUAL ARE PRESENTED "AS IS," WITH ALL FAULTS. CISCO AND ITS SUPPLIERS DISCLAIM ALL WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OR ARISING FROM A COURSE OF DEALING, USAGE, OR TRADE PRACTICE. IN NO EVENT SHALL CISCO OR ITS SUPPLIERS BE LIABLE FOR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, OR INCIDENTAL DAMAGES, INCLUDING, WITHOUT LIMITATION, LOST PROFITS OR LOSS OR DAMAGE TO DATA ARISING OUT OF THE USE OR INABILITY TO USE THE DESIGNS, EVEN IF CISCO OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

THE DESIGNS ARE SUBJECT TO CHANGE WITHOUT NOTICE. USERS ARE SOLELY RESPONSIBLE FOR THEIR APPLICATION OF THE DESIGNS. THE DESIGNS DO NOT CONSTITUTE THE TECHNICAL OR OTHER PROFESSIONAL ADVICE OF CISCO, ITS SUPPLIERS OR PARTNERS. USERS SHOULD CONSULT THEIR OWN TECHNICAL ADVISORS BEFORE IMPLEMENTING THE DESIGNS. RESULTS MAY VARY DEPENDING ON FACTORS NOT TESTED BY CISCO.

CCDE, CCENT, Cisco Eos, Cisco Lumin, Cisco Nexus, Cisco StadiumVision, Cisco TelePresence, Cisco WebEx, the Cisco logo, DCE, and Welcome to the Human Network are trademarks; Changing the Way We Work, Live, Play, and Learn and Cisco Store are service marks; and Access Registrar, Aironet, AsyncOS, Bringing the Meeting To You, Catalyst, CCDA, CCDP, CCIE, CCIP, CCNA, CCNP, CCSP, CCVP, Cisco, the Cisco Certified Internetwork Expert logo, Cisco IOS, Cisco Press, Cisco Systems, Cisco Systems Capital, the Cisco Systems logo, Cisco Unified Computing System (Cisco UCS), Cisco UCS B-Series Blade Servers, Cisco UCS C-Series Rack Servers, Cisco UCS S-Series Storage Servers, Cisco UCS Manager, Cisco UCS Management Software, Cisco Unified Fabric, Cisco Application Centric Infrastructure, Cisco Nexus 9000 Series, Cisco Nexus 7000 Series. Cisco Prime Data Center Network Manager, Cisco NX-OS Software, Cisco MDS Series, Cisco Unity, Collaboration Without Limitation, EtherFast, EtherSwitch, Event Center, Fast Step, Follow Me Browsing, FormShare, GigaDrive, HomeLink, Internet Quotient, IOS, iPhone, iQuick Study, LightStream, Linksys, MediaTone, MeetingPlace, MeetingPlace Chime Sound, MGX, Networkers, Networking Academy, Network Registrar, PCNow, PIX, PowerPanels, ProConnect, ScriptShare, SenderBase, SMARTnet, Spectrum Expert, StackWise, The Fastest Way to Increase Your Internet Quotient, TransPath, WebEx, and the WebEx logo are registered trademarks of Cisco Systems, Inc. and/or its affiliates in the United States and certain other countries.

All other trademarks mentioned in this document or website are the property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and any other company. (0809R)

Table of Contents

Executive Summary

Solution Overview

Introduction

Audience

Purpose of this Document

Solution Summary

Scaling the Solution

Cloudera Data Science Workbench

NVIDIA GPU

Technology Overview

Cisco UCS Integrated Infrastructure for Big Data and Analytics

Cisco UCS 6300 Series Fabric Interconnects

Cisco UCS C-Series Rack-Mount Servers

Cisco UCS Virtual Interface Cards (VICs)

Cisco UCS Manager

NVIDIA CUDA

Cloudera (CDH 5.13.0)

Cloudera Data Science Workbench

Docker Containers

Kubernetes

Solution Design

Requirements

Rack and PDU Configuration

Port Configuration on Fabric Interconnects

Server Configuration and Cabling for Cisco UCS C240 M5

Software Distributions and Versions

Cloudera (CDH 5.13.0)

Red Hat Enterprise Linux (RHEL)

Software Versions

Fabric Configuration

Performing Initial Setup of Cisco UCS 6332 Fabric Interconnects

Configure Fabric Interconnect A

Configure Fabric Interconnect B

Logging Into Cisco UCS Manager

Upgrading Cisco UCS Manager Software to Version 3.2(2b)

Adding a Block of IP Addresses for KVM Access

Enabling Uplink Ports

Configuring VLANs

Enabling Server Ports

Creating Pools for Service Profile Templates

Creating an Organization

Creating MAC Address Pools

Creating a Server Pool

Creating Policies for Service Profile Templates

Creating Host Firmware Package Policy

Creating QoS Policies

Creating the Local Disk Configuration Policy

Creating Server BIOS Policy

Creating the Boot Policy

Creating Power Control Policy

Create Server BIOS Policy

Creating a Service Profile Template

Configuring the Storage Provisioning for the Template

Configuring Network Settings for the Template

Configuring the vMedia Policy for the Template

Configuring Server Boot Order for the Template

Configuring Server Assignment for the Template

Configuring Operational Policies for the Template

Installing Red Hat Enterprise Linux 7.4

Post OS Install Configuration

Setting Up Password-less Login

Configuring /etc/hosts

Creating a Red Hat Enterprise Linux (RHEL) 7.4 Local Repo

Creating the Red Hat Repository Database

Setting up ClusterShell

Installing httpd

Set Up All Nodes to use the RHEL Repository

Configuring DNS

Upgrading the Cisco Network Driver for VIC1387

Installing xfsprogs

NTP Configuration

Enabling Syslog

Setting ulimit

Disabling SELinux

Set TCP Retries

Disabling the Linux Firewall

Disable Swapping

Disable Transparent Huge Pages

Disable IPv6 Defaults

Configuring Data Drives on Name Node And Other Management Nodes

Configuring Data Drives on Data Nodes

Configuring the Filesystem for NameNodes and Datanodes

Cluster Verification

Installing Cloudera

Prerequisites for CDH Installation

Cloudera Manager Repository

Setting Up the Local Parcels for CDH 5.13.0

Downloading Parcels

Setting Up the MariaDB Database for Cloudera Manager

Cloudera Manager Installation

Setting Up the Cloudera Manager Server Database

Installing Cloudera Manager

Starting The Cloudera Manager Server

Installing Cloudera Enterprise Data Hub (CDH5)

Setting up the Database

Starting the Cluster Services

Scaling the Cluster

Enabling High Availability

HDFS High Availability

YARN High Availability

Setting up YARN HA

Configuring Yarn (MR2 Included) and HDFS Services

Apache Kafka Installation and Configuration

Configuring Spark

Tuning Resource Allocation for Spark

Submitting a Job

Shuffle Performance Improvement

Improving Serialization Performance

Spark SQL Tuning

Compression for Hive

Changing the Log Directory for All Applications

Cloudera Data Science Workbench (CDSW)

Installing Prerequisites for CUDA

GCC Installation

Install Kernel Headers and Installation Packages

Install dkms

Install NVIDIA GPU Drivers

Installing CUDA

Installation Prerequisites for CDSW

Set Up a Wildcard DNS Subdomain

Supported JDK Version

IP Tables and Security on CDSW Nodes

Disable SELinux

Configure Block Devices

Download and Install CDSW with Cloudera Manager

Installing Apache Spark 2 on YARN

Add the Cloudera Data Science Workbench Service

Create the Administrator Account

Non-Kerberized Clusters

Using GPUs for Cloudera Data Science Workbench Workloads

Enable GPU Support in Cloudera Data Science Workbench

Create a Custom CUDA-Capable Engine Image

Allocate GPUs for Sessions and Jobs

Bill of Materials

About the Authors

Acknowledgements

Executive Summary

The technology that implements big data systems has matured to the point where it is in wide use across all industries addressing a wide variety of complex business problems; yet even as the technology has matured, the rate of data growth has increased. In addition, machine learning is gaining in prominence. Machine learning is a set of techniques for sophisticated pattern matching that came out of research into Artificial Intelligence. Machine learning needs all the data from the big data systems plus very high-performance computing.

Businesses are now faced with a new set of challenges: making this data available to the diverse set of people who need it, publishing their results so the organization can make use of them, enabling the automated production of those results, managing the data for compliance and governance, and doing all of this in an efficient way that scales as the data continues to grow.

New, better solutions to old problems, and new applications, with new revenue streams, are now within grasp, but require approaches to hardware and software where agility is a primary design driver. IT organizations with traditional infrastructure and software solutions struggle to respond to changing business conditions and to manage this infrastructure. These environments require administrators to spend excessive time configuring new server, storage, and network resources in order to keep up with the scale demanded by growing computing and storage needs.

In addition, the introduction of machine learning into the mix adds new requirements. Many machine learning tasks, especially deep learning tasks, require the use of GPUs, which are specialized, very high-performance processors that are massively parallel in nature. GPUs are installed on the servers, and it is critically important that these high-performance processors also scale with the data growth.

Cisco UCS Integrated Infrastructure for Big Data and Analytics is an optimal choice where world class performance and reliability are base requirements. It is the strong foundation upon which solutions are built. The architecture has the designed-in ability to scale from a small starting solution to thousands of servers and hundreds of petabytes of storage, all managed from a single pane.

Cloudera Enterprise combines distributed data processing with machine learning and analytics into a single scalable platform that enables businesses to tackle their most complex problems. Cloudera is a leading provider of Apache Hadoop distributions and its integrated ecosystem of projects. These tools are used by companies around the world to tackle problems as diverse as predictive maintenance on fleets of hundreds of thousands of vehicles to real-time processing of petabyte-scale data for market surveillance and compliance on stock exchanges.

Together, Cisco and Cloudera combine to create a dependable deployment system to address today’s most challenging problems.

Solution Overview

Introduction

Both big data and machine learning technology have progressed to the point where they are being implemented in production systems running 24x7. There exists a very clear need for a proven, dependable, high-performance platform for the ingestion, processing, storage and analysis of the data, as well as the seamless dissemination of the output, results and insights of the analysis.

This solution implements the Cisco UCS Integrated Infrastructure for Big Data and Analytics, a world-class platform specifically designed for demanding workloads that is both easy to scale and easy to manage, even as the requirements grow to thousands of servers and petabytes of storage; and the Cloudera Data Science Workbench, an integrated set of tools designed to enable flexible, fast access to the entire data store.

Many companies, recognizing the immense potential of big data and machine learning technology, are gearing up to leverage these new capabilities, building out departments and increasing hiring. However, these efforts face a new set of challenges:

· making the data available to the diverse set of people who need it,

· enabling access to high-performance computing resources (GPUs) that also scale with the data growth,

· allowing people to work with the data using the environments they are familiar with,

· publishing their results so the organization can make use of them,

· enabling the automated production of those results,

· managing the data for compliance and governance,

· scaling the system as the data grows, and

· managing and administering the system in an efficient, cost-effective way.

This solution is based on the Cisco UCS Integrated Infrastructure for Big Data and Analytics and includes computing, storage, connectivity, and unified management capabilities to help companies manage the immense amount of data being collected. It is built on the Cisco Unified Computing System (Cisco UCS) infrastructure, using Cisco UCS 6332 Fabric Interconnects and Cisco UCS C-Series Rack Servers. This architecture is specifically designed for performance and linear scalability for big data and machine learning workloads.

Audience

The intended audience of this document includes sales engineers, field consultants, professional services, IT managers, partner engineering, and customers who want to deploy the Cloudera Distribution with Apache Hadoop (CDH 5.13.0) and Cloudera Data Science Workbench (CDSW 1.3.0) on the Cisco UCS Integrated Infrastructure for Big Data and Analytics (Cisco UCS M5 rack-mount servers).

Purpose of this Document

This document describes the architecture and deployment procedures for Cloudera 5.13.0 with Cloudera Data Science Workbench 1.3.0 on a 28-node Cisco UCS C240 M5 cluster based on Cisco UCS Integrated Infrastructure for Big Data and Analytics.

Solution Summary

This CVD describes in detail the process of installing Cloudera 5.13.0 with Cloudera Data Science Workbench (CDSW 1.3.0) and the configuration details of the cluster. The current version of Cisco UCS Integrated Infrastructure for Big Data and Analytics offers the following configurations depending on the compute and storage requirements as shown in Table 1.

Table 1 Cisco UCS Integrated Infrastructure for Big Data and Analytics Configuration Options

	Performance (UCS-SP-C240M5-A2)	Capacity (UCS-SPC240M5L-S1)	High Capacity (UCS-SP-S3260-BV)
Servers	16 x Cisco UCS C240 M5 Rack Servers with SFF drives	16 x Cisco UCS C240 M5 Rack Servers with LFF drives	8 x Cisco UCS S3260 Storage Servers
CPU	2 x Intel Xeon Processor Scalable Family 6132 (2 x 14 cores, 2.6 GHz)	2 x Intel Xeon Processor Scalable Family 4110 (2 x 8 cores, 2.1 GHz)	2 x Intel Xeon Processor Scalable Family 6132 (2 x 14 cores, 2.6 GHz)
Memory	6 x 32 GB 2666 MHz (192 GB)	6 x 32 GB 2666 MHz (192 GB)	6 x 32 GB 2666 MHz (192 GB)
Boot	M.2 with 2 x 240-GB SSDs	M.2 with 2 x 240-GB SSDs	M.2 with 2 x 240-GB SSDs
Storage	26 x 1.8 TB 10K rpm SFF SAS HDDs or 12 x 1.6 TB Enterprise Value SATA SSDs	12 x 8 TB 7.2K rpm LFF SAS HDDs + 2 SFF rear hot-swappable 1.6 TB Enterprise Value SATA SSDs	24 x 6 TB 7.2K rpm LFF SAS HDDs
VIC	40 Gigabit Ethernet (Cisco UCS VIC 1387)	40 Gigabit Ethernet (Cisco UCS VIC 1387)	40 Gigabit Ethernet (Cisco UCS VIC 1387)
Storage Controller	Cisco 12-Gbps SAS Modular RAID Controller with 4-GB flash-based write cache (FBWC) or Cisco 12-Gbps Modular SAS Host Bus Adapter (HBA)	Cisco 12-Gbps SAS Modular RAID Controller with 2-GB flash-based write cache (FBWC) or Cisco 12-Gbps Modular SAS Host Bus Adapter (HBA)	Cisco 12-Gbps SAS Modular RAID Controller with 4-GB flash-based write cache (FBWC)
Network Connectivity	Cisco UCS 6332 Fabric Interconnect	Cisco UCS 6332 Fabric Interconnect	Cisco UCS 6332 Fabric Interconnect
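As a rough sizing aid, the raw HDD capacity of each option in Table 1 can be estimated from the server and drive counts. The sketch below uses only the HDD variants from the table. It assumes that every listed drive holds HDFS data, that the HDFS replication factor is left at its default of 3 (not stated in this section), and it ignores filesystem and temporary-space overhead, so the results are upper bounds rather than validated figures.

```python
from fractions import Fraction

# (servers, HDDs per server, drive size in TB), copied from Table 1.
CONFIGS = {
    "Performance": (16, 26, Fraction("1.8")),
    "Capacity": (16, 12, Fraction(8)),
    "High Capacity": (8, 24, Fraction(6)),
}

# Assumption: HDFS default replication factor; adjust if the cluster differs.
REPLICATION_FACTOR = 3


def raw_capacity_tb(servers, drives, drive_tb):
    """Total raw disk capacity across all servers, in TB."""
    return servers * drives * drive_tb


def replicated_capacity_tb(raw_tb, replication=REPLICATION_FACTOR):
    """Capacity left for unique data once every block is stored `replication` times."""
    return raw_tb / replication


for name, (servers, drives, drive_tb) in CONFIGS.items():
    raw = raw_capacity_tb(servers, drives, drive_tb)
    unique = replicated_capacity_tb(raw)
    print(f"{name}: {float(raw):.1f} TB raw, about {float(unique):.1f} TB of unique data")
```

Exact fractions are used so that 1.8 TB drives do not introduce floating-point rounding in the totals.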

Table 2 lists the configuration details for Cloudera Data Science Workbench. These servers provide the high-performance GPU compute capacity.

Table 2 Cisco UCS Integrated Infrastructure for Big Data and Analytics for CDSW

	Starter	High Performance
Servers	4 x Cisco UCS C240 M5 Rack Servers	4 x Cisco UCS C480 M5 Rack Servers
CPU	2 x Intel Xeon Processor Scalable Family 6132 (2 x 14 cores, 2.6 GHz)	2 x Intel Xeon Processor Scalable Family 6132 (2 x 14 cores, 2.6 GHz)
Memory	12 x 32 GB DDR4 (384 GB)	24 x 32 GB DDR4 (768 GB)
Boot	M.2 with 2 x 240-GB SSDs	M.2 with 2 x 240-GB SSDs
Storage	4 x 1.6 TB Enterprise Value SATA SSDs	8 x 1.6 TB Enterprise Value SATA SSDs
VIC	40 Gigabit Ethernet (Cisco UCS VIC 1387)	40 Gigabit Ethernet (Cisco UCS VIC 1387)
Storage Controller	Cisco 12-Gbps SAS Modular RAID Controller with 4-GB flash-based write cache (FBWC) or Cisco 12-Gbps Modular SAS Host Bus Adapter (HBA)	Cisco 12-Gbps SAS Modular RAID Controller with 4-GB flash-based write cache (FBWC) or Cisco 12-Gbps Modular SAS Host Bus Adapter (HBA)
Network Connectivity	Cisco UCS 6332 Fabric Interconnect	Cisco UCS 6332 Fabric Interconnect
GPU	2 x NVIDIA TESLA V100	6 x NVIDIA TESLA V100
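The GPU capacity of the two CDSW options in Table 2 follows directly from the server and per-server GPU counts. The short sketch below works through that arithmetic. The 16 GB and 32 GB memory sizes are the two Tesla V100 configurations named in the NVIDIA GPU section; which one is installed is not stated here, so both bounds are shown.

```python
# Server and GPU counts copied from Table 2.
CDSW_OPTIONS = {
    "Starter": {"servers": 4, "gpus_per_server": 2},
    "High Performance": {"servers": 4, "gpus_per_server": 6},
}

# Tesla V100 memory configurations in GB (16 GB and 32 GB variants).
V100_MEMORY_GB = (16, 32)


def total_gpus(option):
    """Number of GPUs across all CDSW servers for one option."""
    return option["servers"] * option["gpus_per_server"]


def gpu_memory_range_gb(option):
    """Aggregate GPU memory (low, high) in GB for the two V100 variants."""
    count = total_gpus(option)
    return tuple(count * size for size in V100_MEMORY_GB)


for name, option in CDSW_OPTIONS.items():
    low, high = gpu_memory_range_gb(option)
    print(f"{name}: {total_gpus(option)} GPUs, {low} to {high} GB of GPU memory")
```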

Figure 1 depicts a 28-node starter cluster. Rack #1 has 16 Cisco UCS C240 M5 servers. Each link in the figure represents a 40 Gigabit Ethernet link from each of the 16 servers directly connected to a Fabric Interconnect. Rack #2 has 12 Cisco UCS C240 M5 servers. Every server is connected to both Fabric Interconnects.

Figure 1 28 Node Starter Cluster Configuration for CDSW


Note: Power requirements per rack must be calculated as the exact values will change based on the power needs of the GPUs.

Figure 2 shows an alternate configuration for cases where more GPU capacity is needed. Four of the Cisco UCS C240 M5 servers from the previous figure are replaced with Cisco UCS C480 M5 servers. These servers support up to six GPUs each.

Figure 2 28 Node High Performance Cluster Configuration with additional GPU capacity


Note: Power requirements per rack must be calculated as the exact values will change based on the power needs of the GPUs.

Scaling the Solution

Figure 3 shows how to scale the solution. Each pair of Cisco UCS 6332 Fabric Interconnects has 28 Cisco UCS C240 M5 servers connected to it. This allows for four uplinks from each Fabric Interconnect to the Cisco Nexus 9332 switch. Six pairs of 6332 FIs can connect to a single switch with four uplink ports each. With 28 servers per FI pair, a total of 168 servers can be supported. Additionally, the solution can scale to thousands of nodes with the Cisco Nexus 9500 Series family of switches.
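The port arithmetic behind these numbers can be sketched as follows. The 32-port count comes from the Figure 4 caption for the Cisco UCS 6332. The server-to-uplink ratio is a derived figure that the document itself does not state, and it assumes every port runs at 40 Gigabit Ethernet.

```python
FI_PORTS = 32             # ports on one Cisco UCS 6332 (per the Figure 4 caption)
SERVERS_PER_FI_PAIR = 28  # every server connects to both FIs in its pair
FI_PAIRS_PER_SWITCH = 6   # FI pairs attached to one Cisco Nexus 9332


def uplinks_per_fi(ports=FI_PORTS, servers=SERVERS_PER_FI_PAIR):
    """Ports left for uplinks after each server takes one port on the FI."""
    return ports - servers


def servers_supported(pairs=FI_PAIRS_PER_SWITCH, servers=SERVERS_PER_FI_PAIR):
    """Total servers across all FI pairs."""
    return pairs * servers


def oversubscription(servers=SERVERS_PER_FI_PAIR, uplinks=None):
    """Server-facing ports per uplink port on one FI (derived, equal port speeds assumed)."""
    if uplinks is None:
        uplinks = uplinks_per_fi()
    return servers / uplinks


print(uplinks_per_fi())     # 4
print(servers_supported())  # 168
print(oversubscription())   # 7.0
```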

Figure 3 Scaling the Solution


Cloudera Data Science Workbench

Cloudera Data Science Workbench (CDSW) is a web application that allows data scientists to use a variety of open source languages and libraries to directly and securely access the data in the Hadoop cluster. Direct access to the big data cluster means no more working with small subsets of the data on desktop systems; no sampling is required as the entire data set is available for use directly by the user. Further, users are not restricted to a single environment. Many popular open source libraries and languages are supported, including R, Python and Scala, which means users become productive faster with no need for retraining and no time lost learning a new programming language.


CDSW addresses a key challenge: every team or user may require a different language, library, or framework in order to be productive, while the organization requires reproducibility and collaboration. By making the entire set of data in the cluster available to the user, CDSW eliminates the problem that what works on small samples or extracts of the data on a user’s desktop computer may not scale across a large cluster. Cloudera Data Science Workbench gives data scientists the flexibility and simplicity they need to be productive and innovative at scale.

In addition, CDSW enables seamless access to high-performance processors in the form of GPUs. CDSW uses a lightweight container architecture to rapidly and securely provide the environment and resources to the users.

Cloudera Data Science Workbench is directly aimed at helping data scientists build and test new analyses and analytics projects as quickly as possible in a secure manner, even in large-scale environments. This flexibility improves the efficiency of the exploration process, a key requirement for moving rapidly from idea to answer. Most analytics problems, especially those with transformative power, are not standard analyses and require advanced models and iterative methods. Experimentation and innovation are the heart and soul of data science, but security is needed for compliance and governance.

Data has become one of the most strategic assets in the organization. Leveraging the data to drive the business forward is the primary motivation for building an enterprise data hub to support advanced analytics. Typically, when forced to make a choice between the security of the data and the flexibility to access it, security wins, locking away the data from the people who most need it. CDSW addresses this issue by providing full authentication and access controls against data in the cluster, including complete Kerberos integration. It offers data science teams per-project isolation and reproducibility with no effort.

Cloudera Data Science Workbench allows you to automate analytics workloads with a built-in job and pipeline scheduling system that supports real-time monitoring, job history, and email alerts. Jobs can be configured to run on a recurring schedule and to send alerts for successful and failed runs. Multiple jobs can be scheduled together to create an automated pipeline; for example, the first job performs data acquisition, the next performs data cleansing, the next runs analytics, and so on.
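The pipeline idea described above (jobs run in order, each run is recorded in a history, and an alert is raised on success or failure) can be illustrated with a small stand-alone sketch. This is not the CDSW API: the `Job` and `Pipeline` classes, the `alert` callback, and the job names are all hypothetical and exist only to show the control flow.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Job:
    """A named unit of work (hypothetical; not a CDSW class)."""
    name: str
    run: Callable[[], None]


@dataclass
class Pipeline:
    """Runs jobs in order and stops at the first failure."""
    jobs: List[Job]
    history: List[Tuple[str, str]] = field(default_factory=list)

    def execute(self, alert: Callable[[str], None] = print) -> bool:
        for job in self.jobs:
            try:
                job.run()
            except Exception as exc:
                # Record the failure, alert, and skip all downstream jobs.
                self.history.append((job.name, "failed"))
                alert(f"ALERT: job '{job.name}' failed: {exc}")
                return False
            self.history.append((job.name, "succeeded"))
            alert(f"OK: job '{job.name}' succeeded")
        return True


# Example stages mirroring the text: acquisition, cleansing, analytics.
pipeline = Pipeline([
    Job("data-acquisition", lambda: None),
    Job("data-cleansing", lambda: None),
    Job("analytics", lambda: None),
])
pipeline.execute()
print(pipeline.history)
```

Stopping at the first failure reflects the dependency between stages: cleansing output is meaningless if acquisition did not complete.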

Collaboration and sharing of results are implemented via project sharing (either globally or to specific users) and project forking. To share results, CDSW enables publishing output for viewing via a browser, and even makes the console log itself available for viewing both during and after the run. Cloudera Data Science Workbench is a web application with no desktop footprint, making it very easy to administer and maintain.

NVIDIA GPU

Graphics Processing Units, or GPUs, are specialized processors designed to render images, animation, and video for computer displays. They perform this task by running many operations simultaneously. While the number and kinds of operations they can do are limited, they make up for it by being able to run many thousands in parallel. As the graphics capabilities of GPUs increased, it soon became apparent that the massive parallelism of GPUs could be put to other uses besides rendering graphics.

The NVIDIA® GPU used in this document, the NVIDIA Tesla® V100, is an advanced data center GPU built to accelerate AI, HPC, and graphics. It is powered by the NVIDIA Volta architecture and comes in 16 GB and 32 GB configurations.

NVIDIA GPUs bring two key advantages to the table. First, they make possible solutions that were simply not computationally possible before. Second, by providing the same processing power as scores of traditional CPUs they reduce the requirements for rack space, power, networking and cooling in the data center.

Technology Overview

Cisco UCS Integrated Infrastructure for Big Data and Analytics

The Cisco UCS Integrated Infrastructure for Big Data and Analytics solution for Cloudera is a highly scalable architecture designed to meet a variety of scale-out application demands, with seamless data integration and management integration capabilities. It is built using the components described in this section.

Cisco UCS 6300 Series Fabric Interconnects

Cisco UCS 6300 Series Fabric Interconnects provide high-bandwidth, low-latency connectivity for servers, with integrated, unified management provided for all connected devices by Cisco UCS Manager. Deployed in redundant pairs, Cisco fabric interconnects offer the full active-active redundancy, performance, and exceptional scalability needed to support the large number of nodes that are typical in clusters serving big data applications. Cisco UCS Manager enables rapid and consistent server configuration using service profiles, automating ongoing system maintenance activities such as firmware updates across the entire cluster as a single operation. Cisco UCS Manager also offers advanced monitoring with options to raise alarms and send notifications about the health of the entire cluster.

The Cisco UCS 6300 Series Fabric Interconnects are a core part of Cisco UCS, providing low-latency, lossless 10 and 40 Gigabit Ethernet, Fibre Channel over Ethernet (FCoE), and Fibre Channel functions with management capabilities for the entire system. All servers attached to the Fabric Interconnects become part of a single, highly available management domain.

Figure 4 Cisco UCS 6332 UP 32-Port Fabric Interconnect


Cisco UCS C-Series Rack-Mount Servers

The Cisco UCS C240 M5 Rack-Mount Server (Figure 5) is a 2-socket, 2-Rack-Unit (2RU) rack server offering industry-leading performance and expandability. It supports a wide range of storage and I/O-intensive infrastructure workloads, from big data and analytics to collaboration. Cisco UCS C-Series Rack Servers can be deployed as standalone servers or as part of a Cisco Unified Computing System (Cisco UCS) managed environment to take advantage of Cisco’s standards-based unified computing innovations that help reduce customers’ Total Cost of Ownership (TCO) and increase their business agility.

In response to ever-increasing computing and data-intensive real-time workloads, the enterprise-class Cisco UCS C240 M5 server extends the capabilities of the Cisco UCS portfolio in a 2RU form factor. It incorporates the Intel® Xeon® Scalable processors, supporting up to 20 percent more cores per socket, twice the memory capacity, and five times more Non-Volatile Memory Express (NVMe) PCI Express (PCIe) Solid-State Disks (SSDs) compared to the previous generation of servers. These improvements deliver significant performance and efficiency gains that will improve your application performance. The Cisco UCS C240 M5 delivers outstanding levels of storage expandability with exceptional performance, along with the following:

· Latest Intel Xeon Scalable CPUs with up to 28 cores per socket

· Up to 24 DDR4 DIMMs for improved performance

· Up to 26 hot-swappable Small-Form-Factor (SFF) 2.5-inch drives, including 2 rear hot-swappable SFF drives (up to 10 support NVMe PCIe SSDs on the NVMe-optimized chassis version), or 12 Large-Form-Factor (LFF) 3.5-inch drives plus 2 rear hot-swappable SFF drives

· Support for 12-Gbps SAS modular RAID controller in a dedicated slot, leaving the remaining PCIe Generation 3.0 slots available for other expansion cards

· Modular LAN-On-Motherboard (mLOM) slot that can be used to install a Cisco UCS Virtual Interface Card (VIC) without consuming a PCIe slot, supporting dual 10- or 40-Gbps network connectivity

· Dual embedded Intel x550 10GBASE-T LAN-On-Motherboard (LOM) ports

· Modular M.2 or Secure Digital (SD) cards that can be used for boot  

Figure 5 Cisco UCS C240 M5 Rack-Mount Server


The Cisco UCS C480 M5 Rack-Mount Server is a storage- and I/O-optimized enterprise-class rack server that delivers industry-leading performance for in-memory databases, big data analytics, virtualization, Virtual Desktop Infrastructure (VDI), and bare-metal applications. The Cisco UCS C480 M5 (Figure 6) delivers outstanding levels of expandability and performance for standalone or Cisco Unified Computing System™ (Cisco UCS) managed environments in a 4RU form factor. Because of its modular design, you pay for only what you need. It offers these capabilities:

· Latest Intel® Xeon® Scalable processors with up to 28 cores per socket and support for two- or four-processor configurations

· 2666-MHz DDR4 memory and 48 DIMM slots for up to 6 terabytes (TB) of total memory

· 12 PCI Express (PCIe) 3.0 slots

- Six x8 full-height, full-length slots

- Six x16 full-height, full-length slots

· Flexible storage options with support for up to 32 Small-Form-Factor (SFF) 2.5-inch SAS, SATA, and PCIe NVMe disk drives

· Cisco® 12-Gbps SAS Modular RAID Controller in a dedicated slot

· Internal Secure Digital (SD) and M.2 boot options

· Dual embedded 10 Gigabit Ethernet LAN-On-Motherboard (LOM) ports

Figure 6 Cisco UCS C480 M5 Rack-Mount Server


Cisco UCS Virtual Interface Cards (VICs)

Cisco UCS Virtual Interface Cards (VICs) are unique to Cisco. They incorporate next-generation converged network adapter (CNA) technology from Cisco and offer dual 10- and 40-Gbps ports designed for use with Cisco UCS servers. Optimized for virtualized networking, these cards deliver high performance and bandwidth utilization, and support up to 256 virtual devices.

The Cisco UCS Virtual Interface Card 1387 offers dual-port Enhanced Quad Small Form-Factor Pluggable (QSFP+) 40 Gigabit Ethernet and Fibre Channel over Ethernet (FCoE) in a modular-LAN-on-motherboard (mLOM) form factor. The mLOM slot can be used to install a Cisco VIC without consuming a PCIe slot, providing greater I/O expandability.

Figure 7 Cisco UCS VIC 1387


Cisco UCS Manager

Cisco UCS Manager resides within the Cisco UCS 6300 Series Fabric Interconnect. It makes the system self-aware and self-integrating, managing all of the system components as a single logical entity. Cisco UCS Manager can be accessed through an intuitive graphical user interface (GUI), a command-line interface (CLI), or an XML application-programming interface (API). Cisco UCS Manager uses service profiles to define the personality, configuration, and connectivity of all resources within Cisco UCS, radically simplifying provisioning of resources so that the process takes minutes instead of days. This simplification allows IT departments to shift their focus from constant maintenance to strategic business initiatives.

NVIDIA CUDA

GPUs are very good at running the same operation on different data simultaneously. This is often referred to as single instruction, multiple data, or SIMD. This is exactly what is needed to render graphics, but many other computing problems can benefit from this approach. As a result, NVIDIA created CUDA. CUDA is a parallel computing platform and programming model that makes it possible to use a GPU for many general-purpose computing tasks via commonly used programming languages like C and C++.
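The "same operation on different data" model can be illustrated without a GPU. In the sketch below, a kernel function describes the work for a single element, and a launcher applies it to every index. On a GPU each index would be handled by its own thread in parallel; here the loop only simulates that sequentially on the CPU. The `saxpy_kernel` and `launch` names are illustrative and are not part of CUDA.

```python
def saxpy_kernel(i, a, x, y, out):
    """Work for one element: the same instruction applied to the data at index i."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Apply the kernel to indices 0..n-1.

    On a GPU these n invocations would run in parallel across threads;
    this loop is a sequential stand-in for illustration only.
    """
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)

launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```

Because no element depends on any other, the invocations can run in any order or all at once, which is the property that lets a GPU spread them across thousands of threads.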

In addition to the general-purpose computing capabilities that CUDA enables, there is a dedicated CUDA library for deep learning called the CUDA Deep Neural Network library, or cuDNN. cuDNN makes it easier to implement deep learning architectures that take full advantage of the GPU’s capabilities.
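To make the SIMD idea concrete, the following CPU-only Python sketch applies one operation (a scaled vector addition, often called SAXPY) to every element of two lists. It is illustrative only and does not use CUDA; on a GPU, each index would typically be handled by its own thread. The function name `saxpy` and the sample values are assumptions made for this example.

```python
def saxpy(a, x, y):
    """Return a * x[i] + y[i] for every index i (same instruction, different data)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Each element is computed independently of the others, which is what
    # lets a GPU process them simultaneously instead of one after another.
    return [a * xi + yi for xi, yi in zip(x, y)]


result = saxpy(2, [1, 2, 3, 4], [10, 20, 30, 40])
print(result)  # [12, 24, 36, 48]
```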

Cloudera (CDH 5.13.0)

Built on the transformative Apache Hadoop open source software project, Cloudera Enterprise is a hardened distribution of Apache Hadoop and related projects designed for the demanding requirements of enterprise customers. Cloudera is the leading contributor to the Hadoop ecosystem, and has created a rich suite of complementary open source projects that are included in Cloudera Enterprise.

The integration and the entire solution are thoroughly tested and fully documented. By taking the guesswork out of building a Hadoop deployment, CDH provides a streamlined path to solving real business problems.

Cloudera Enterprise with Apache Hadoop is:

· Unified – one integrated system, bringing diverse users and application workloads to one pool of data on common infrastructure; no data movement required

· Secure – perimeter security, authentication, granular authorization, and data protection

· Governed – enterprise-grade data auditing, data lineage, and data discovery

· Managed – native high-availability, fault-tolerance and self-healing storage, automated backup and disaster recovery, and advanced system and data management

· Open – Apache-licensed open source to ensure that both data and applications remain copyrighted, and an open platform that connects with existing investments in technology and skills

Figure 8 Cloudera Data Hub

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_11.png

Cloudera provides a scalable, flexible, integrated platform that makes it easy to manage rapidly increasing volumes and varieties of data in any enterprise. Industry-leading Cloudera products and solutions enable enterprises to deploy and manage Apache Hadoop and related projects, manipulate and analyze data, and keep that data secure and protected.

Cloudera provides the following products and tools:

· CDH—The Cloudera distribution of Apache Hadoop and other related open-source projects, including Spark. CDH also provides security and integration with numerous hardware and software solutions.

· Apache Spark—An integrated part of CDH and supported with Cloudera Enterprise, Spark is an open standard for flexible in-memory data processing for batch, real-time, and advanced analytics. Through its One Platform Initiative, Cloudera is committed to adopting Spark as the default data execution engine for analytic workloads.

· Cloudera Manager—A sophisticated application used to deploy, manage, monitor, and diagnose issues with CDH deployments. Cloudera Manager provides the Admin Console, a web-based user interface that makes administration of any enterprise data simple and straightforward. It also includes the Cloudera Manager API, which can be used to obtain cluster health information and metrics, as well as configure Cloudera Manager.

· Cloudera Navigator—An end-to-end data management tool for the CDH platform. Cloudera Navigator enables administrators, data managers, and analysts to explore the large amounts of data in Hadoop. The robust auditing, data management, lineage management, and life cycle management in Cloudera Navigator allow enterprises to adhere to stringent compliance and regulatory requirements.
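As an illustration of the Cloudera Manager API mentioned above, the following minimal sketch builds (but does not send) an HTTP request for the list of clusters that a Cloudera Manager server manages. The host name and credentials are hypothetical placeholders, port 7180 is the usual default for the Admin Console, and the API version `v18` is assumed here to correspond to Cloudera Manager 5.13; verify it against the `/api/version` endpoint of the actual deployment.

```python
import base64
import urllib.request

API_VERSION = "v18"  # assumption: API version shipped with Cloudera Manager 5.13


def build_clusters_request(host, user, password, port=7180):
    """Build (but do not send) a GET request for the list of managed clusters."""
    url = f"http://{host}:{port}/api/{API_VERSION}/clusters"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    # Cloudera Manager accepts HTTP basic authentication for API calls.
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical host name and credentials, for illustration only.
req = build_clusters_request("cm-host.example.com", "admin", "admin")
print(req.full_url)
# To send the request against a live server: urllib.request.urlopen(req).read()
```

Building the request separately from sending it keeps the sketch safe to run on a workstation that has no access to the cluster.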

Cloudera Data Science Workbench

Cloudera Data Science Workbench (CDSW) is a web application that allows data scientists to use a variety of open source languages and libraries to directly and securely access the data in the Hadoop cluster. Direct access to the big data cluster means no more working with small subsets of the data on desktop systems; no sampling is required because the entire data set is available for use directly by the user. Further, users are not restricted to a single environment. Many popular open source libraries and languages are supported, including R, Python, and Scala, as well as ML/DL frameworks such as TensorFlow, Theano, and PyTorch. In addition, CDSW enables access to available GPU resources for deep learning workloads.

Docker Containers

Cloudera Data Science Workbench makes use of container technology. Containers are conceptually similar to virtual machines, but instead of virtualizing the hardware, a container virtualizes the operating system. With a VM, there is an entire operating system sitting on top of the hypervisor. Containers dispense with this time-consuming and resource-hungry requirement by sharing the host system’s kernel. As a result, a container is far smaller, and its lightweight nature means it can be instantiated quickly. In fact, containers can be instantiated so quickly that new application architectures are possible.

Docker is an open-source project based on Linux containers. It uses Linux kernel features like namespaces and control groups to create containers. These features are not new, but Docker has taken these concepts and improved them in the following ways:

· Ease of use: Docker makes it easier for anyone (developers, systems admins, architects, and others) to take advantage of containers in order to quickly build and test portable applications. It allows anyone to package an application on their development system, which can then run unmodified on any cloud or bare metal server. The basic idea is to create a “build once, run anywhere” system.

· Speed: Docker containers are very fast with a small footprint. Ultimately, containers are just sandboxed environments running on the kernel so they take up few resources. You can create and run a Docker container in seconds. Compare this to a VM which takes much longer because it has to boot up a full virtual operating system every time.

· Modularity: Docker makes it easy to take an application and break its functionality into separate individual containers. These containers can then be spun up and run as needed. This is particularly useful for cases where an application needs to hold and lock a particular resource, like a GPU, and then release it once it’s done using it. Modularity also enables each component (that is, each container) to be updated independently.

· Scalability: modularity enables scalability. With different parts of the system running in different containers it becomes possible, and with Docker, it becomes easy to connect these containers together to create an application, which can then be scaled out as needed.

Kubernetes

Applications built using container technology provide a great deal of flexibility in terms of their architecture, deployment and scaling. Since containers provide VM-like separation of concerns but with far less overhead they allow system developers to package different services of the same application into separate containers. These containers can then be deployed in a very flexible manner including across clusters of physical and virtual machines. This builds the ability to scale directly into the application architecture. This ability in turn requires a tool to aid in deploying, managing and scaling container-based applications.

Kubernetes is an open source project specifically designed for deploying and managing multi-container applications at scale. Kubernetes automates and simplifies the following tasks:

· Deploying multi-container applications. With the application split into separate containers for different services, Kubernetes manages the deployment of the containers both at initial startup and in real-time as needed by the application.

· Scaling containers. Applications need to spin up and down containers to suit demand, to balance incoming load, and make better use of physical resources. Kubernetes provides the mechanisms for doing these things in a completely automated way.

· Updating applications. One advantage of container-based application development is the ability to independently change, improve and fix individual containers. Kubernetes has mechanisms for allowing graceful updates to new versions of container images, including rollbacks if something does not go as planned.

Kubernetes manages application status and any replication and load balancing needs. It also handles hardware resource allocation including GPUs. Kubernetes also has facilities for maximizing the use of hardware resources including memory, storage I/O, and network bandwidth. Applications can have soft and hard limits set on their resource usage. For example, many small applications that use minimal resources can be run together on the same hardware while resource hungry applications can be placed on different hardware and scale out as needed.
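The soft and hard limits described above correspond to the `requests` and `limits` fields of a container's resource specification. The sketch below builds an example pod manifest as a Python dictionary and prints it as JSON. The pod name, container name, image, and sizes are hypothetical, and the GPU resource name `nvidia.com/gpu` assumes that the NVIDIA device plugin is in use; the manifest is not the one that CDSW generates internally.

```python
import json

# Hypothetical pod manifest; the names and the image are placeholders.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "training-session"},
    "spec": {
        "containers": [
            {
                "name": "engine",
                "image": "example.com/ml-engine:latest",
                "resources": {
                    # Requests: the amount reserved for scheduling (soft limit).
                    "requests": {"cpu": "2", "memory": "8Gi"},
                    # Limits: the ceiling enforced at run time (hard limit).
                    # GPUs are whole devices and are specified under limits.
                    "limits": {"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": 1},
                },
            }
        ]
    },
}

print(json.dumps(pod, indent=2))
```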

Solution Design

Requirements

This CVD describes architecture and deployment procedures for Cloudera (CDH 5.13.0) and Cloudera Data Science Workbench (CDSW 1.3.0) on a 28-node cluster based on Cisco UCS Integrated Infrastructure for Big Data and Analytics. The solution goes into detail configuring CDH 5.13.0 on the infrastructure, as well as the complete installation and configuration of CDSW 1.3.0 and all of its dependencies.

The cluster configuration consists of the following:

· Two Cisco UCS 6332UP Fabric Interconnects

· 28 UCS C240 M5 Rack-Mount servers

· 8 NVIDIA GPUs

· Two Cisco R42610 standard racks

· Four Vertical Power distribution units (PDUs) (Country Specific)

Rack and PDU Configuration

Each rack has two vertical PDUs. The first rack contains two Cisco UCS 6332UP Fabric Interconnects and 16 Cisco UCS C240 M5 Rack Servers. The second rack contains 12 Cisco UCS C240 M5 Rack Servers. In both racks, every device is connected to each of the two vertical PDUs for redundancy, which ensures availability during a power source failure.

Note: Please contact your Cisco representative for country specific information.

Table 3 describes the rack configurations.

Table 3 Rack Configurations

Rack Unit (Cisco 42U Rack)	First Rack	Rack Unit (Cisco 42U Rack)	Second Rack
42	Cisco UCS FI 6332UP	42	Unused
41	Cisco UCS FI 6332UP	41	Unused
40	Unused	40	Unused
39	Unused	39	Unused
38	Unused	38	Unused
37	Unused	37	Unused
36	Unused	36	Unused
35	Unused	35	Unused
34	Unused	34	Unused
33	Unused	33
32	Cisco UCS C240 M5	32	Unused
31	Cisco UCS C240 M5	31
30	Cisco UCS C240 M5	30	Unused
29	Cisco UCS C240 M5	29
28	Cisco UCS C240 M5	28	Unused
27	Cisco UCS C240 M5	27
26	Cisco UCS C240 M5	26	Unused
25	Cisco UCS C240 M5	25
24	Cisco UCS C240 M5	24	Cisco UCS C240 M5
23	Cisco UCS C240 M5	23
22	Cisco UCS C240 M5	22	Cisco UCS C240 M5
21	Cisco UCS C240 M5	21
20	Cisco UCS C240 M5	20	Cisco UCS C240 M5
19	Cisco UCS C240 M5	19
18	Cisco UCS C240 M5	18	Cisco UCS C240 M5
17	Cisco UCS C240 M5	17
16	Cisco UCS C240 M5	16	Cisco UCS C240 M5
15	Cisco UCS C240 M5	15
14	Cisco UCS C240 M5	14	Cisco UCS C240 M5
13	Cisco UCS C240 M5	13
12	Cisco UCS C240 M5	12	Cisco UCS C240 M5
11	Cisco UCS C240 M5	11
10	Cisco UCS C240 M5	10	Cisco UCS C240 M5
9	Cisco UCS C240 M5	9
8	Cisco UCS C240 M5	8	Cisco UCS C240 M5
7	Cisco UCS C240 M5	7	with 2x NVIDIA GPUs
6	Cisco UCS C240 M5	6	Cisco UCS C240 M5
5	Cisco UCS C240 M5	5	with 2x NVIDIA GPUs
4	Cisco UCS C240 M5	4	Cisco UCS C240 M5
3	Cisco UCS C240 M5	3	with 2x NVIDIA GPUs
2	Cisco UCS C240 M5	2	Cisco UCS C240 M5
1	Cisco UCS C240 M5	1	with 2x NVIDIA GPUs
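The layout in Table 3 can be sanity-checked with a few lines of arithmetic. The sketch below assumes that each Cisco UCS C240 M5 occupies 2 rack units and each Fabric Interconnect occupies 1 rack unit, as the table shows; the variable and function names are chosen for this example only.

```python
RACK_HEIGHT_U = 42
SERVER_HEIGHT_U = 2  # each Cisco UCS C240 M5 spans two rack units in Table 3
FI_HEIGHT_U = 1      # each Fabric Interconnect spans one rack unit in Table 3

RACKS = {
    "first": {"servers": 16, "fabric_interconnects": 2},
    "second": {"servers": 12, "fabric_interconnects": 0},
}


def used_units(contents):
    """Rack units occupied by the servers and Fabric Interconnects in one rack."""
    return (contents["servers"] * SERVER_HEIGHT_U
            + contents["fabric_interconnects"] * FI_HEIGHT_U)


for name, contents in RACKS.items():
    used = used_units(contents)
    print(f"{name} rack: {used}U used, {RACK_HEIGHT_U - used}U unused")

total_servers = sum(contents["servers"] for contents in RACKS.values())
print(f"total servers: {total_servers}")
```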

Port Configuration on Fabric Interconnects

Port Type	Port Number
Network	29-32
Server	1-28
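The port assignment above can be expressed as a small lookup, which is convenient when cabling is checked by a script. This is a minimal sketch; the function name `port_role` is an assumption made for this example.

```python
def port_role(port):
    """Return the role of a Fabric Interconnect port per the table above."""
    if 1 <= port <= 28:
        return "server"
    if 29 <= port <= 32:
        return "network"
    raise ValueError(f"port {port} is outside the range 1-32 used in this design")


# Group the ports by role to confirm the counts.
roles = {}
for port in range(1, 33):
    roles.setdefault(port_role(port), []).append(port)

print({role: len(ports) for role, ports in roles.items()})
```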

Server Configuration and Cabling for Cisco UCS C240 M5

The Cisco UCS C240 M5 rack server is equipped with 2 x Intel Xeon Processor Scalable Family 6132 CPUs (2 x 14 cores, 2.6 GHz), 192 GB of memory, a Cisco UCS Virtual Interface Card 1387, a Cisco 12-Gbps SAS Modular RAID Controller with 4-GB FBWC, 26 x 1.8-TB 10K rpm SFF SAS HDDs or 12 x 1.6-TB Enterprise Value SATA SSDs, and an M.2 module with 2 x 240-GB SSDs for boot.
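The per-server figures above translate into cluster-wide totals as sketched below. The calculation assumes that all 28 servers use the HDD option and contribute every data drive to HDFS, and that HDFS uses its default replication factor of 3; real usable capacity is lower because some nodes run management services and some space is reserved for the operating system and temporary data.

```python
NODES = 28
CORES_PER_NODE = 2 * 14      # two 14-core CPUs per server
MEMORY_GB_PER_NODE = 192
HDDS_PER_NODE = 26
HDD_GB = 1800                # 1.8-TB drives, in decimal gigabytes
HDFS_REPLICATION = 3         # assumption: HDFS default replication factor

total_cores = NODES * CORES_PER_NODE
total_memory_gb = NODES * MEMORY_GB_PER_NODE
raw_storage_gb = NODES * HDDS_PER_NODE * HDD_GB
# Upper bound on unique data once every block is stored three times.
max_unique_data_gb = raw_storage_gb // HDFS_REPLICATION

print(f"cores: {total_cores}")
print(f"memory: {total_memory_gb} GB")
print(f"raw HDD capacity: {raw_storage_gb} GB")
print(f"upper bound on unique data at 3x replication: {max_unique_data_gb} GB")
```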

For information on physical connectivity, single-wire management, and cluster setup, see:

https://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/c-series_integration/ucsm3-2/b_C-Series-Integration_UCSM3-2/b_C-Series-Integration_UCSM3-2_chapter_010.html?bookSearch=true

Software Distributions and Versions

The required software distributions and versions are listed below.

Cloudera (CDH 5.13.0)

The Cloudera Distribution for Apache Hadoop version used is 5.13.0. For more information visit www.cloudera.com.

Red Hat Enterprise Linux (RHEL)

The operating system supported is Red Hat Enterprise Linux 7.4. For more information visit http://www.redhat.com.

Software Versions

The software versions tested and validated in this document are shown in Table 4.

Table 4 Software Versions

Layer	Component	Version or Release
Compute	Cisco UCS C240-M5	C240M5.3.1.2a.0.09
Network	Cisco UCS 6332	UCS 3.2(2b) A
	Cisco UCS VIC1387 Firmware	4.2.2(a)
	Cisco UCS VIC1387 Driver	2.3.0.44
Storage	SAS Expander	65.02.13.00
	Cisco 12G Modular Raid controller	50.1.0-07.26
	LSI MegaRAID SAS Driver	07.703.06.00
Software	Red Hat Enterprise Linux Server	7.4 (x86_64)
	Cisco UCS Manager	3.2(2b)
	CDH	5.13.0
	CDSW	1.3.0
GPU	CUDA	8.1
GPU	GPU Driver	390

Note: The latest drivers can be downloaded from the link below:
https://software.cisco.com/download/home/283862063/type/283853158/release/3.1%25283%2529.

Note: The latest supported RAID controller driver is already included with the RHEL 7.4 operating system.

Note: Cisco UCS C240 M5 Rack Servers with Intel Scalable Processor Family CPUs are supported from Cisco UCS firmware 3.2 onwards.
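The minimum-firmware note above can be checked programmatically. The following sketch parses a Cisco UCS release string in the form used in Table 4, such as 3.2(2b), and compares the major and minor numbers against 3.2. The function names are assumptions made for this example, and the parser handles only this one string format.

```python
import re


def parse_ucs_release(release):
    """Split a release string such as '3.2(2b)' into (major, minor, maintenance, letter)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\((\d+)([a-z]*)\)", release.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    major, minor, maintenance, letter = match.groups()
    return int(major), int(minor), int(maintenance), letter


def supports_c240_m5(release):
    """True if the release is 3.2 or later, per the note above."""
    major, minor, _, _ = parse_ucs_release(release)
    return (major, minor) >= (3, 2)


print(supports_c240_m5("3.2(2b)"))
```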

Fabric Configuration

This section provides details for configuring a fully redundant, highly available Cisco UCS 6332 fabric configuration.

· Initial setup of the Fabric Interconnect A and B

· Connect to Cisco UCS Manager using the virtual IP address in a web browser

· Launch Cisco UCS Manager.

· Enable server, uplink and appliance ports.

· Start discovery process.

· Create pools and policies for service profile template

· Create Service Profile template and 28 Service profiles

· Associate Service Profiles to servers

Performing Initial Setup of Cisco UCS 6332 Fabric Interconnects

This section describes the initial setup of the Cisco UCS 6332 Fabric Interconnects A and B.

Configure Fabric Interconnect A

1. Connect to the console port on the first Cisco UCS 6332 Fabric Interconnect.

2. At the prompt to enter the configuration method, enter console to continue.

3. If asked to either perform a new setup or restore from backup, enter setup to continue.

4. Enter y to continue to set up a new Fabric Interconnect.

5. Enter y to enforce strong passwords.

6. Enter the password for the admin user.

7. Enter the same password again to confirm the password for the admin user.

8. When asked if this fabric interconnect is part of a cluster, answer y to continue.

9. Enter A for the switch fabric.

10. Enter the cluster name for the system name.

11. Enter the Mgmt0 IPv4 address.

12. Enter the Mgmt0 IPv4 netmask.

13. Enter the IPv4 address of the default gateway.

14. Enter the cluster IPv4 address.

15. To configure DNS, answer y.

16. Enter the DNS IPv4 address.

17. Answer y to set up the default domain name.

18. Enter the default domain name.

19. Review the settings that were printed to the console, and if they are correct, answer yes to save the configuration.

20. Wait for the login prompt to make sure the configuration has been saved.

Configure Fabric Interconnect B

1. Connect to the console port on the second Cisco UCS 6332 Fabric Interconnect.

2. When prompted to enter the configuration method, enter console to continue.

3. The installer detects the presence of the partner Fabric Interconnect and adds this fabric interconnect to the cluster. Enter y to continue the installation.

4. Enter the admin password that was configured for the first Fabric Interconnect.

5. Enter the Mgmt0 IPv4 address.

6. Answer yes to save the configuration.

7. Wait for the login prompt to confirm that the configuration has been saved.

For more information on configuring Cisco UCS 6332 Series Fabric Interconnect, refer to:

https://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/ucs-manager/GUI-User-Guides/Getting-Started/3-2/b_UCSM_Getting_Started_Guide_3_2/b_UCSM_Getting_Started_Guide_3_2_chapter_0100.html
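Before entering the addressing values at the console prompts described above, it can help to check them for consistency. The sketch below uses Python's standard `ipaddress` module to confirm that the two Mgmt0 addresses, the cluster address, and the default gateway are distinct and fall within the same subnet. The addresses shown come from a documentation range and are placeholders; the function name is an assumption made for this example.

```python
import ipaddress


def validate_fi_addressing(mgmt_a, mgmt_b, cluster_ip, netmask, gateway):
    """Return a list of problems found in the planned management addressing."""
    network = ipaddress.ip_network(f"{mgmt_a}/{netmask}", strict=False)
    planned = {
        "Mgmt0 (Fabric Interconnect A)": mgmt_a,
        "Mgmt0 (Fabric Interconnect B)": mgmt_b,
        "cluster address": cluster_ip,
        "default gateway": gateway,
    }
    problems = []
    seen = {}
    for label, value in planned.items():
        address = ipaddress.ip_address(value)
        if address not in network:
            problems.append(f"{label} {address} is outside {network}")
        if address in seen:
            problems.append(f"{label} duplicates {seen[address]} ({address})")
        seen[address] = label
    return problems


# Placeholder values from the 192.0.2.0/24 documentation range.
print(validate_fi_addressing(
    "192.0.2.11", "192.0.2.12", "192.0.2.10", "255.255.255.0", "192.0.2.1"))
```

An empty list means that no inconsistency was detected in the planned values.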

Logging Into Cisco UCS Manager

To log into Cisco UCS Manager, complete the following steps:

1. Open a Web browser and navigate to the Cisco UCS 6332 Fabric Interconnect cluster address.

2. Click the Launch link to download the Cisco UCS Manager software.

3. If prompted to accept security certificates, accept as necessary.

4. When prompted, enter admin for the username and enter the administrative password.

5. Click Login to log in to the Cisco UCS Manager.
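The same login can be performed through the XML API that Cisco UCS Manager exposes. The following minimal sketch builds the request body for the `aaaLogin` method and the URL it is posted to; it does not send anything. The cluster address and credentials are placeholders, and the details should be confirmed against the Cisco UCS Manager XML API documentation for the installed release.

```python
import xml.etree.ElementTree as ET


def build_login_body(username, password):
    """Return the XML body for an aaaLogin request."""
    element = ET.Element("aaaLogin", {"inName": username, "inPassword": password})
    return ET.tostring(element, encoding="unicode")


def api_url(cluster_address):
    """Return the XML API endpoint for a Cisco UCS Manager cluster address."""
    return f"https://{cluster_address}/nuova"


# Placeholder address and credentials, for illustration only.
print(api_url("192.0.2.10"))
print(build_login_body("admin", "example-password"))
```

A successful login response carries a session cookie in its `outCookie` attribute, which subsequent API requests pass back to identify the session.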

Upgrading Cisco UCS Manager Software to Version 3.2(2b)

This document assumes the use of Cisco UCS 3.2(2b). Refer to the Cisco UCS 3.2 release documentation to upgrade the Cisco UCS Manager software and the Cisco UCS 6332 Fabric Interconnect software to version 3.2(2b). Also, make sure the Cisco UCS C-Series version 3.2(2b) software bundles are installed on the Fabric Interconnects.

Adding a Block of IP Addresses for KVM Access

To create a block of KVM IP addresses for server access in the Cisco UCS environment, complete the following steps:

1. Select the LAN tab at the top of the left window.

2. Select Pools > root > IP Pools > IP Pool ext-mgmt.

3. Right-click IP Pool ext-mgmt.

4. Select Create Block of IPv4 Addresses.

Figure 9 Adding a Block of IPv4 Addresses for KVM Access Part 1

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_12.jpg

5. Enter the starting IP address of the block and number of IPs needed, as well as the subnet and gateway information.

Figure 10 Adding Block of IPv4 Addresses for KVM Access Part 2

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_13.png

6. Click OK to create the IP block.

7. Click OK in the message box.
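The starting address and size entered in the steps above determine the last address of the block. The sketch below computes that range and checks that the block stays inside the subnet and does not include the gateway. It assumes one KVM address per server (28 in this design); the addresses are placeholders from a documentation range, and the function name is an assumption made for this example.

```python
import ipaddress


def kvm_block(start, size, netmask, gateway):
    """Return (first, last) addresses of a KVM IP block after basic checks."""
    if size < 1:
        raise ValueError("size must be at least 1")
    first = ipaddress.ip_address(start)
    last = first + (size - 1)
    network = ipaddress.ip_network(f"{start}/{netmask}", strict=False)
    gateway_ip = ipaddress.ip_address(gateway)
    if last not in network or last == network.broadcast_address:
        raise ValueError(f"block ending at {last} does not fit in {network}")
    if first <= gateway_ip <= last:
        raise ValueError(f"gateway {gateway_ip} falls inside the block")
    return str(first), str(last)


# Placeholder values: 28 addresses, one per server.
print(kvm_block("192.0.2.101", 28, "255.255.255.0", "192.0.2.1"))
```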

Enabling Uplink Ports

To enable uplink ports, complete the following steps:

1. Select the Equipment tab on the top left of the window.

2. Select Equipment > Fabric Interconnects > Fabric Interconnect A (primary) > Fixed Module.

3. Expand the Unconfigured Ethernet Ports section.

4. Select ports 29-32, which are connected to the uplink switch, right-click, then select Reconfigure > Configure as Uplink Port.

5. Select Show Interface and select 40GB for Uplink Connection.

6. A pop-up window appears to confirm your selection. Click Yes then OK to continue.

7. Select Equipment > Fabric Interconnects > Fabric Interconnect B (subordinate) > Fixed Module.

8. Expand the Unconfigured Ethernet Ports section.

9. Select ports 29-32, which are connected to the uplink switch, right-click, then select Reconfigure > Configure as Uplink Port.

10. Select Show Interface and select 40GB for Uplink Connection.

11. A pop-up window appears to confirm your selection. Click Yes then OK to continue.

Figure 11 Enabling Uplink Ports Part 1

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_14.jpg

Figure 17 Enabling Uplink Ports Part 2

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_15.jpg

Figure 18 Enabling Uplink Ports Part 3

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_16.jpg

Configuring VLANs

VLANs are configured as shown in Table 5.

Table 5 VLAN Configurations

VLAN	NIC Port	Function
VLAN13	eth0	Data

The NIC carries the data traffic on VLAN13. A single vNIC is used in this configuration, and the Fabric Failover feature of the Fabric Interconnects handles any physical port failure, so the transition is seamless from an application perspective.

To configure VLANs in the Cisco UCS Manager GUI, complete the following steps:

1. Select the LAN tab in the left pane in the UCSM GUI.

2. Select LAN > LAN Cloud > VLANs.

3. Right-click the VLANs under the root organization.

4. Select Create VLANs to create the VLAN.

Figure 19 Creating a VLAN

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_17.jpg

5. Enter vlan13 for the VLAN Name.

6. Keep multicast policy as <not set>.

7. Select Common/Global for vlan13.

8. Enter 13 in the VLAN IDs field for the Create VLAN IDs.

9. Click OK and then, click Finish.

10. Click OK in the success message box.

Figure 20 Creating VLAN for Data

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_18.png

11. Click OK and then, click Finish.
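The VLAN created through the GUI above can also be described as an XML API request. The sketch below builds a `configConfMo` body that creates a `fabricVlan` object; it is based on the Cisco UCS object model as commonly documented and should be treated as an assumption to verify against the XML API reference for the installed release. The session cookie is a placeholder for the value returned by a prior login, and the request is not sent.

```python
import xml.etree.ElementTree as ET


def build_create_vlan_body(cookie, name, vlan_id):
    """Return a configConfMo body that creates a global VLAN (illustrative)."""
    if not 1 <= vlan_id <= 4094:
        # 802.1Q allows IDs 1-4094; Cisco UCS reserves some IDs in addition.
        raise ValueError("VLAN ID must be between 1 and 4094")
    dn = f"fabric/lan/net-{name}"
    root = ET.Element(
        "configConfMo", {"cookie": cookie, "dn": dn, "inHierarchical": "false"})
    in_config = ET.SubElement(root, "inConfig")
    ET.SubElement(
        in_config,
        "fabricVlan",
        {"dn": dn, "id": str(vlan_id), "name": name, "status": "created"},
    )
    return ET.tostring(root, encoding="unicode")


# "example-cookie" stands in for the session cookie returned by a login call.
print(build_create_vlan_body("example-cookie", "vlan13", 13))
```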

Enabling Server Ports

To enable server ports, complete the following steps:

1. Select the Equipment tab on the top left of the window.

2. Select Equipment > Fabric Interconnects > Fabric Interconnect A (primary) > Fixed Module.

3. Expand the Unconfigured Ethernet Ports section.

4. Select all the ports that are connected to the servers, right-click them, and select Reconfigure > Configure as a Server Port.

5. A pop-up window appears to confirm your selection. Click Yes then OK to continue.

6. Select Equipment > Fabric Interconnects > Fabric Interconnect B (subordinate) > Fixed Module.

7. Expand the Unconfigured Ethernet Ports section.

8. Select all the ports that are connected to the servers, right-click them, and select Reconfigure > Configure as a Server Port.

9. A pop-up window appears to confirm your selection. Click Yes, then OK to continue.

Figure 21 Enabling Server Ports

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_19.jpg

After server discovery, ports 29-32 are network ports and ports 1-28 are server ports.

Figure 22 Port Status After Server Discovery

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_20.jpg

Creating Pools for Service Profile Templates

Creating an Organization

Organizations are used as a means to arrange and restrict access to various groups within the IT organization, thereby enabling multi-tenancy of the compute resources. This document does not assume the use of Organizations; however, the necessary steps are provided for future reference.

To configure an organization within the Cisco UCS Manager GUI, complete the following steps:

1. Click the Quick Action icon in the top right corner of the right pane in the Cisco UCS Manager GUI.

2. Select Create Organization from the options.

3. Enter a name for the organization.

4. (Optional) Enter a description for the organization.

5. Click OK.

6. Click OK in the success message box.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_21.jpg

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_22.jpg

Creating MAC Address Pools

To create MAC address pools, complete the following steps:

1. Select the LAN tab on the left of the window.

2. Select Pools > root > MAC Pools

3. Right-click MAC Pools under the root organization.

4. Select Create MAC Pool to create the MAC address pool. Enter ucs for the name of the MAC pool.

5. (Optional) Enter a description of the MAC pool.

6. Select Assignment Order Sequential.

7. Click Next.

8. Click Add.

9. Specify a starting MAC address.

10. Specify a size for the MAC address pool that is sufficient to support the available server resources.

11. Click OK.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_23.jpg

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_24.jpg

Figure 23 Specifying first MAC Address and Size

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_25.jpg

12. Click Finish.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_26.jpg

13. When the message box displays, click OK.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_27.jpg
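The starting MAC address and pool size chosen in the steps above determine the last address in the pool. The sketch below computes that range. The starting address 00:25:B5:00:00:00 and the size of 256 are example values only, and the function names are assumptions made for this example; with a single vNIC per server, this design needs at least 28 addresses.

```python
def mac_to_int(mac):
    """Convert 'AA:BB:CC:DD:EE:FF' to an integer."""
    return int(mac.replace(":", ""), 16)


def int_to_mac(value):
    """Convert an integer back to colon-separated uppercase notation."""
    raw = f"{value:012X}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


def mac_pool_range(start, size):
    """Return (first, last) MAC addresses of a sequential pool."""
    if size < 1:
        raise ValueError("size must be at least 1")
    first = mac_to_int(start)
    last = first + size - 1
    if last > 0xFFFFFFFFFFFF:
        raise ValueError("pool runs past the end of the MAC address space")
    return int_to_mac(first), int_to_mac(last)


# Example values only; choose a start and size that suit the environment.
print(mac_pool_range("00:25:B5:00:00:00", 256))
```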

Creating a Server Pool

A server pool contains a set of servers. These servers typically share the same characteristics. Those characteristics can be their location in the chassis, or an attribute such as server type, amount of memory, local storage, type of CPU, or local drive configuration. You can manually assign a server to a server pool, or use server pool policies and server pool policy qualifications to automate the assignment.

To configure the server pool within the Cisco UCS Manager GUI, complete the following steps:

1. Select the Servers tab in the left pane in the UCS Manager GUI.

2. Select Pools > root.

3. Right-click the Server Pools.

4. Select Create Server Pool.

5. Enter your required name (ucs) for the Server Pool in the name text box.

6. (Optional) Enter a description for the server pool.

7. Click Next > to add the servers.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_28.jpg

8. Select all the Cisco UCS C240 M5 servers to be added to the server pool (ucs), then click >> to add them to the pool.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_29.jpg

9. Click Finish.

10. Click OK and then click Finish.

Creating Policies for Service Profile Templates

Creating Host Firmware Package Policy

Firmware management policies allow the administrator to select the corresponding packages for a given server configuration. These include adapters, BIOS, board controllers, FC adapters, HBA options, and storage controller properties as applicable.

To create a firmware management policy for a given server configuration using the Cisco UCS Manager GUI, complete the following steps:

1. Select the Servers tab in the left pane in the UCS Manager GUI.

2. Select Policies > root.

3. Right-click Host Firmware Packages.

4. Select Create Host Firmware Package.

5. Enter the required Host Firmware package name (ucs).

6. Select Simple radio button to configure the Host Firmware package.

7. Select the appropriate Rack package that has been installed.

8. Click OK to complete the creation of the Host Firmware Package.

9. Click OK.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_30.png

Creating QoS Policies

To create the QoS policy for a given server configuration using the Cisco UCS Manager GUI, complete the following steps:

Platinum Policy

1. Select the LAN tab in the left pane in the Cisco UCS Manager GUI.

2. Select Policies > root.

3. Right-click QoS Policies.

4. Select Create QoS Policy.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_31.jpg

5. Enter Platinum as the name of the policy.

6. Select Platinum from the drop-down list.

7. Keep the Burst(Bytes) field set to default (10240).

8. Keep the Rate(Kbps) field set to default (line-rate).

9. Keep Host Control radio button set to default (none).

10. When the pop-up window appears, click OK to complete the creation of the Policy.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_32.jpg

Setting Jumbo Frames

To set Jumbo frames and enable QoS, complete the following steps:

1. Select the LAN tab in the left pane in the Cisco UCS Manager GUI.

2. Select LAN Cloud > QoS System Class.

3. In the right pane, select the General tab

4. In the Platinum row, enter 9216 for MTU.

5. Select the Enabled check box next to Platinum.

6. In the Best Effort row, select none for weight.

7. In the Fibre Channel row, select none for weight.

8. Click Save Changes.

9. Click OK.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_33.jpg
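The benefit of jumbo frames can be estimated by comparing how much of each frame on the wire is application payload. The sketch below assumes IPv4 and TCP headers without options, untagged Ethernet framing, and a host MTU of 9000 bytes; the host MTU is an assumption for this example, and the 9216-byte MTU set on the Platinum class above leaves headroom for it.

```python
ETHERNET_OVERHEAD = 38  # preamble 8 + header 14 + FCS 4 + inter-frame gap 12
IP_TCP_HEADERS = 40     # IPv4 header 20 + TCP header 20, no options


def tcp_payload_efficiency(mtu):
    """Fraction of on-the-wire bytes that carry TCP payload for a given MTU."""
    if mtu <= IP_TCP_HEADERS:
        raise ValueError("MTU is too small to carry any TCP payload")
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {tcp_payload_efficiency(mtu):.2%} payload")
```

Larger frames also reduce the number of packets that hosts must process for the same volume of data, which matters for bulk transfers such as HDFS replication.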

Creating the Local Disk Configuration Policy

To create local disk configuration in the Cisco UCS Manager GUI, complete the following steps:

1. Select the Servers tab on the left pane in the Cisco UCS Manager GUI.

2. Go to Policies > root.

3. Right-click Local Disk Config Policies.

4. Select Create Local Disk Configuration Policy.

5. Enter ucs as the local disk configuration policy name.

6. Change the Mode to Any Configuration. Check the Protect Configuration box.

7. Keep the FlexFlash State field as default (Disable).

8. Keep the FlexFlash RAID Reporting State field as default (Disable).

9. Click OK to complete the creation of the Local Disk Configuration Policy.

10. Click OK.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_34.jpg

Creating Server BIOS Policy

The BIOS policy feature in Cisco UCS automates the BIOS configuration process. The traditional method of setting the BIOS is manual and often error-prone. Creating a BIOS policy and assigning it to a server or group of servers makes the BIOS settings configuration transparent and consistent.

Note: BIOS settings can have a significant performance impact, depending on the workload and the applications. The BIOS settings listed in this section are optimized for best performance and can be adjusted based on the application, performance, and energy efficiency requirements.

To create a server BIOS policy using the Cisco UCS Manager GUI, complete the following steps:

1. Select the Servers tab in the left pane in the UCS Manager GUI.

2. Select Policies > root.

3. Right-click BIOS Policies.

4. Select Create BIOS Policy.

5. Enter your preferred BIOS policy name (ucs).

6. Change the BIOS settings as shown in the following figures.

7. The only changes that need to be made are in the Processor and RAS Memory settings.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_35.png

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_36.png

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_37.png

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_38.png

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_39.png

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_40.png

Creating the Boot Policy

To create boot policies within the Cisco UCS Manager GUI, complete the following steps:

1. Select the Servers tab in the left pane in the UCS Manager GUI.

2. Select Policies > root.

3. Right-click the Boot Policies.

4. Select Create Boot Policy.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_41.jpg

5. Enter ucs as the boot policy name.

6. (Optional) enter a description for the boot policy.

7. Keep the Reboot on Boot Order Change check box unchecked.

8. Keep Enforce vNIC/vHBA/iSCSI Name check box checked.

9. Keep Boot Mode Default (Legacy).

10. Expand Local Devices > Add CD/DVD and select Add Local CD/DVD.

11. Expand Local Devices and select Add Local Disk.

12. Expand vNICs and select Add LAN Boot and enter eth0.

13. Click OK to add the Boot Policy.

14. Click OK.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_42.png

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_43.jpg

Creating Power Control Policy

To create Power Control policies within the Cisco UCS Manager GUI, complete the following steps:

1. Select the Servers tab in the left pane in the Cisco UCS Manager GUI.

2. Select Policies > root.

3. Right-click the Power Control Policies.

4. Select Create Power Control Policy.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_44.jpg

5. Enter ucs as the Power Control policy name.

6. (Optional) Enter a description for the power control policy.

7. Select Performance for Fan Speed Policy.

8. Select No cap for Power Capping selection.

9. Click OK to create the Power Control Policy.

10. Click OK.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_45.jpg

Create Server BIOS Policy

To create a server BIOS policy for the Cisco UCS environment, follow these steps:

1. In Cisco UCS Manager, click the Servers tab in the navigation pane.

2. Select Policies > root > Sub-Organization > UCS-HDP > BIOS Policies.

3. Right-click BIOS Policies.

4. Select Create BIOS Policy.

5. Enter C240M5-BIOS as the BIOS policy name.

Figure 24 BIOS Configuration

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_46.png

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_47.jpg

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_48.jpg

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_49.jpg

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_50.jpg

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_51.jpg

Creating a Service Profile Template

To create a Service Profile Template, complete the following steps:

1. Select the Servers tab in the left pane in the Cisco UCS Manager GUI.

2. Right-click Service Profile Templates.

3. Select Create Service Profile Template.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_52.jpg

The Create Service Profile Template window appears.

To identify the service profile template, complete the following steps:

1. Name the service profile template as ucs. Select the Updating Template radio button.

2. In the UUID section, select Hardware Default as the UUID pool.

3. Click Next to continue to the next section.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_53.jpg

Configuring the Storage Provisioning for the Template

To configure storage policies, complete the following steps:

1. Go to the Local Disk Configuration Policy tab, and select ucs for the Local Storage.

2. Click Next to continue to the next section.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_54.png

3. Click Next once the Networking window appears to go to the next section.

Configuring Network Settings for the Template

1. Keep the Dynamic vNIC Connection Policy field at the default.

2. Select the Expert radio button for the option How would you like to configure LAN connectivity?

3. Click Add to add a vNIC to the template.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_55.jpg

4. The Create vNIC window displays. Name the vNIC as eth0.

5. Select ucs in the Mac Address Assignment pool.

6. Select the Fabric A radio button and check the Enable failover check box for the Fabric ID.

7. Check the VLAN13 check box for VLANs and select the Native VLAN radio button.

8. Select MTU size as 9000.

9. Select adapter policy as Linux.

10. Select QoS Policy as Platinum.

11. Keep the Network Control Policy as Default.

12. Click OK.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_56.png

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_57.jpg

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_58.png

Note: Optionally, network bonding can be set up on the vNICs of each host for redundancy as well as for increased throughput.

13. Click Next to continue with SAN Connectivity.

14. Select no vHBAs for How would you like to configure SAN Connectivity?

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_59.png

15. Click Next to continue with Zoning.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_60.png

16. Click Next to continue with vNIC/vHBA placement.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_61.jpg

17. Click Next to configure vMedia Policy.

Configuring the vMedia Policy for the Template

1. Click Next once the vMedia Policy window appears to go to the next section.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_62.png

Configuring Server Boot Order for the Template

To set the boot order for the servers, complete the following steps:

1. Select ucs in the Boot Policy name field.

2. Review to make sure that all of the boot devices were created and identified.

3. Verify that the boot devices are in the correct boot sequence.

4. Click OK.

5. Click Next to continue to the next section.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_63.png

6. In the Maintenance Policy window, apply the maintenance policy.

7. Keep the Maintenance policy at no policy used by default. Click Next to continue to the next section.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_64.png

Configuring Server Assignment for the Template

In the Server Assignment window, to assign the servers to the pool, complete the following steps:

1. Select ucs for the Pool Assignment field.

2. Select the power state to be Up.

3. Keep the Server Pool Qualification field set to <not set>.

4. Check the Restrict Migration check box.

5. Select ucs in Host Firmware Package.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_65.png

Configuring Operational Policies for the Template

In the Operational Policies Window, complete the following steps:

1. Select ucs in the BIOS Policy field.

2. Select ucs in the Power Control Policy field.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_66.png

3. Click Finish to create the Service Profile template.

4. Click OK in the pop-up window to proceed.

5. Select the Servers tab in the left pane of the Cisco UCS Manager GUI.

6. Go to Service Profile Templates > root.

7. Right-click Service Profile Templates ucs.

8. Select Create Service Profiles From Template.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_67.jpg

The Create Service Profiles from Template window appears.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_68.jpg

Association of the Service Profiles will take place automatically.

Installing Red Hat Enterprise Linux 7.4

The following section provides detailed procedures for installing Red Hat Enterprise Linux 7.4 using Software RAID (OS based Mirroring) on Cisco UCS C240 M5 servers. There are multiple ways to install the Red Hat Linux operating system. The installation procedure described in this deployment guide uses KVM console and virtual media from Cisco UCS Manager.

Note: This requires the RHEL 7.4 DVD/ISO for the installation.

To install the Red Hat Linux 7.4 operating system, complete the following steps:

1. Log in to the Cisco UCS 6332 Fabric Interconnect and launch the Cisco UCS Manager application.

2. Select the Equipment tab.

3. In the navigation pane expand Rack-Mounts and then Servers.

4. In the right pane, click the KVM Console >>.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_69.jpg

5. Click OK in the KVM Console – Select IP Address pop-up window.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_70.jpg

6. Click the link to launch the KVM console.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_71.jpg

7. Point the cursor over the top right corner and select the Virtual Media tab.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_72.jpg

8. Click Activate Virtual Devices in the Virtual Media tab.

9. Click the Virtual Media tab again to select CD/DVD.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_73.jpg

10. Select Map Drive in the Virtual Disk Management window.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_74.jpg

11. Browse to the Red Hat Enterprise Linux Server 7.4 installer ISO image file.

Note: The Red Hat Enterprise Linux 7.4 DVD is assumed to be on the client machine.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_75.jpg

12. Click Open to add the image to the list of virtual media.

13. In the KVM window, select the KVM tab to monitor during boot.

14. In the KVM window, select the Macros > Static Macros > Ctrl-Alt-Del button in the upper left corner.

15. Click OK.

16. Click OK to reboot the system.

17. Press F6 key on the keyboard to select install media.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_76.jpg

Note: Press F6 on your keyboard as soon as the screen above appears; otherwise the server has to be rebooted again.

18. On reboot, the machine detects the presence of the Red Hat Enterprise Linux Server 7.4 install media.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_77.jpg

19. Select the Install Red Hat Enterprise Linux 7.4.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_78.jpg

20. Skip the Media test and start the installation. Select language of installation and click Continue.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_79.jpg

21. Select Date and time, which pops up another window as shown below:

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_80.jpg

22. Select the location on the map, set the time, and click Done.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_81.jpg

23. Click Installation Destination.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_82.jpg

24. This opens a new window with the boot disks. Make the selection, and choose I will configure partitioning. Click Done.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_83.jpg

25. This opens a new window for creating the partitions. Click the + sign to add a new partition as shown below: a boot partition of size 2048 MB.

26. Click Add mount point to add the partition.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_84.jpg

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_85.jpg

27. Change the Device type to RAID, make sure the RAID Level is RAID1 (Redundancy), and click Update Settings to save the changes.

28. Click the + sign to create the swap partition of size 2048 MB as shown below.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_86.jpg

29. Change the Device type to RAID and RAID level to RAID1 (Redundancy) and click Update Settings.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_87.jpg

30. Click + to add the / partition. The size can be left empty so that it uses the remaining capacity. Click Add mount point.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_88.jpg

31. Change the Device type to RAID and RAID level to RAID1 (Redundancy). Click Update Settings.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_89.png

32. Click Done to go back to the main screen and continue the Installation.

33. Click Software Selection.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_90.jpg

34. Select Infrastructure Server and select the Add-Ons as noted below. Click Done.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_91.jpg

35. Click Network and Hostname and configure Hostname and Networking for the Host.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_92.jpg

36. Type in the hostname as shown below.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_93.jpg

37. Click Configure to open the Network Connectivity window. Click IPv4 Settings.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_94.jpg

38. Change the Method to Manual and click Add to enter the IP Address, Netmask, and Gateway details.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_95.png

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_96.png

39. Click Save and update the hostname and turn Ethernet ON. Click Done to return to the main menu.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_97.png

40. Click Begin Installation in the main menu.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_98.jpg

41. Select Root Password in the User Settings.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_99.jpg

42. Enter the Root Password and click Done.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_100.jpg

43. When the installation is complete, reboot the system.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_101.jpg

44. Repeat steps 1 to 43 to install Red Hat Enterprise Linux 7.4 on the remaining servers.

Note: The OS installation and configuration of the nodes described above can be automated through PXE boot or third-party tools.

The hostnames and their corresponding IP addresses are shown in Table 6.

Table 6 Hostnames and IP Addresses

Hostname	eth0
rhel1	10.13.1.50
rhel2	10.13.1.51
rhel3	10.13.1.52
rhel4	10.13.1.53
rhel5	10.13.1.54
rhel6	10.13.1.55
rhel7	10.13.1.56
rhel8	10.13.1.57
rhel9	10.13.1.58
rhel10	10.13.1.59
rhel11	10.13.1.60
rhel12	10.13.1.61
rhel13	10.13.1.62
rhel14	10.13.1.63
rhel15	10.13.1.64
rhel16	10.13.1.65
…	…
rhel24	10.13.1.73
cdsw1.cisco.com	10.13.1.250
cdsw2.cisco.com	10.13.1.251
cdsw3.cisco.com	10.13.1.252
cdsw4.cisco.com	10.13.1.253

Note: Cloudera does not recommend multi-homing configurations, so please assign only one network to each node.

Post OS Install Configuration

Choose one of the nodes of the cluster or a separate node as the Admin Node for management such as CDH installation, cluster parallel shell, creating a local Red Hat repo and others. In this document, we use rhel1 for this purpose.

Setting Up Password-less Login

To manage all of the cluster's nodes from the admin node, password-less login needs to be set up. It assists in automating common tasks with ClusterShell (clush, a cluster-wide parallel shell) and shell scripts without having to use passwords.

After Red Hat Linux is installed on all the nodes in the cluster, complete the following steps to enable password-less login across all the nodes:

1. Login to the Admin Node (rhel1).

#ssh 10.13.1.50

2. Run the ssh-keygen command to create both public and private keys on the admin node.

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_102.png

3. Run the following commands from the admin node to copy the public key id_rsa.pub to all the nodes of the cluster. ssh-copy-id appends the keys to the remote host's .ssh/authorized_keys file.

#for IP in {50..73}; do echo -n "$IP -> "; ssh-copy-id -i ~/.ssh/id_rsa.pub 10.13.1.$IP; done

#for IP in {250..253}; do echo -n "$IP -> "; ssh-copy-id -i ~/.ssh/id_rsa.pub 10.13.1.$IP; done

4. Enter yes for Are you sure you want to continue connecting (yes/no)?

5. Enter the password of the remote host.
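Once the keys are copied, it can be useful to confirm that no node still prompts for a password. The following is a minimal sketch rather than part of the validated procedure; the function names are hypothetical, and it relies on the standard OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting.

```shell
#!/bin/sh
# Sketch: report nodes that do not yet accept key-based login.
# Assumption: the address ranges match the ssh-copy-id loops above.

ip_range() {  # ip_range FIRST LAST -> prints 10.13.1.FIRST .. 10.13.1.LAST
  i=$1
  while [ "$i" -le "$2" ]; do
    echo "10.13.1.$i"
    i=$((i + 1))
  done
}

cluster_ips() {
  ip_range 50 73
  ip_range 250 253
}

check_login() {
  failed=0
  for ip in $(cluster_ips); do
    # BatchMode=yes disables password prompts, so a failure means
    # key-based login is not working for this address.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$ip" true 2>/dev/null; then
      echo "$ip ok"
    else
      echo "$ip FAILED"
      failed=1
    fi
  done
  return "$failed"
}

# Usage on the admin node (not run automatically here):
#   check_login
```

Any address reported as FAILED needs the ssh-copy-id step repeated.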

Configuring /etc/hosts

Set up /etc/hosts on the Admin node; this is a prerequisite for the DNS setup shown in a later section.

To create the host file on the admin node, complete the following steps:

1. Populate the host file with the IP addresses and corresponding hostnames of the Admin node (rhel1) and the other nodes.

2. On the Admin Node (rhel1), edit the file as follows:

#vi /etc/hosts

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

10.13.1.50 rhel1

10.13.1.51 rhel2

10.13.1.52 rhel3

10.13.1.53 rhel4

10.13.1.54 rhel5

10.13.1.55 rhel6

10.13.1.56 rhel7

…

10.13.1.73 rhel24

10.13.1.250 cdsw1.cisco.com

10.13.1.251 cdsw2.cisco.com

10.13.1.252 cdsw3.cisco.com

10.13.1.253 cdsw4.cisco.com
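Typing 28 entries by hand is error-prone, so the cluster entries can also be generated. The following is a minimal sketch, assuming that the rows elided above follow the same sequential pattern as the listed ones (rhelN at 10.13.1.(49+N) and cdswN.cisco.com at 10.13.1.(249+N)); the function name gen_hosts is hypothetical.

```shell
#!/bin/sh
# Sketch: print the cluster entries for /etc/hosts.
# Assumption: addresses are sequential, matching the ranges 50..73 and
# 250..253 used elsewhere in this guide.

gen_hosts() {
  i=1
  while [ "$i" -le 24 ]; do
    printf '10.13.1.%d rhel%d\n' "$((49 + i))" "$i"
    i=$((i + 1))
  done
  i=1
  while [ "$i" -le 4 ]; do
    printf '10.13.1.%d cdsw%d.cisco.com\n' "$((249 + i))" "$i"
    i=$((i + 1))
  done
}

# Print the entries for review; append them to /etc/hosts below the
# localhost lines once they look correct.
gen_hosts
```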

Creating a Red Hat Enterprise Linux (RHEL) 7.4 Local Repo

To create a repository using RHEL DVD or ISO on the admin node (in this deployment rhel1 is used for this purpose), create a directory with all the required RPMs, run the createrepo command and then publish the resulting repository.

1. Log on to rhel1. Create a directory that would contain the repository.

#mkdir -p /var/www/html/rhelrepo

2. Copy the contents of the Red Hat DVD to /var/www/html/rhelrepo

3. Alternatively, if you have access to a Red Hat ISO image, copy the ISO file to rhel1.

#scp rhel-server-7.4-x86_64-dvd.iso rhel1:/root/

4. Log back into rhel1, create the mount directory, and mount the ISO image.

#mkdir -p /mnt/rheliso

#mount -t iso9660 -o loop /root/rhel-server-7.4-x86_64-dvd.iso /mnt/rheliso/

5. Copy the contents of the ISO to the /var/www/html/rhelrepo directory.

#cp -r /mnt/rheliso/* /var/www/html/rhelrepo

6. On rhel1 create a .repo file to enable the use of the yum command.

#vi /var/www/html/rhelrepo/rheliso.repo

[rhel7.4]

name=Red Hat Enterprise Linux 7.4

baseurl=http://10.13.1.50/rhelrepo

gpgcheck=0

enabled=1

7. Copy rheliso.repo file from /var/www/html/rhelrepo to /etc/yum.repos.d on rhel1.

#cp /var/www/html/rhelrepo/rheliso.repo /etc/yum.repos.d/

Note: Based on this repo file, yum requires httpd to be running on rhel1 for other nodes to access the repository.

8. To use the repository files on rhel1 without httpd, edit the baseurl of the repo file /etc/yum.repos.d/rheliso.repo to point to the repository location in the file system.

Note: This step is needed to install software on the Admin Node (rhel1) using the repo (such as httpd, createrepo, and so on).

#vi /etc/yum.repos.d/rheliso.repo

[rhel7.4]

name=Red Hat Enterprise Linux 7.4

baseurl=file:///var/www/html/rhelrepo

gpgcheck=0

enabled=1
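The two repo files above differ only in their baseurl. As a minimal sketch (the function name make_repo is hypothetical), both variants can be produced from one template so that they cannot drift apart:

```shell
#!/bin/sh
# Sketch: emit the contents of rheliso.repo for a given baseurl.

make_repo() {  # make_repo BASEURL
  cat <<EOF
[rhel7.4]
name=Red Hat Enterprise Linux 7.4
baseurl=$1
gpgcheck=0
enabled=1
EOF
}

# Variant served over HTTP to the other nodes:
make_repo "http://10.13.1.50/rhelrepo"

# Variant read from the local file system on rhel1:
make_repo "file:///var/www/html/rhelrepo"
```

Redirect the output of each call to the corresponding file, for example /var/www/html/rhelrepo/rheliso.repo for the HTTP variant.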

Creating the Red Hat Repository Database

1. Install the createrepo package on admin node (rhel1). Use it to regenerate the repository database(s) for the local copy of the RHEL DVD contents.

#yum -y install createrepo

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_103.jpg

2. Run createrepo on the RHEL repository to create the repo database on admin node

#cd /var/www/html/rhelrepo

#createrepo .

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_104.png

Setting up ClusterShell

ClusterShell (or clush) is the cluster-wide shell that runs commands on several hosts in parallel. To set up the ClusterShell, complete the following steps:

1. From a system connected to the Internet, download ClusterShell (clush) and install it on rhel1. ClusterShell is available from the EPEL (Extra Packages for Enterprise Linux) repository.

#wget http://rpm.pbone.net/index.php3/stat/4/idpl/31529309/dir/redhat_el_7/com/clustershell-1.7-1.el7.noarch.rpm.html

#scp clustershell-1.7-1.el7.noarch.rpm rhel1:/root/

#yum -y install clustershell-1.7-1.el7.noarch.rpm

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_105.jpg

2. Edit /etc/clustershell/groups.d/local.cfg file to include hostnames for all the nodes of the cluster. This set of hosts is taken when running clush with the ‘-a’ option.

3. For the 28-node cluster in this CVD, set the groups file as follows:

#vi /etc/clustershell/groups.d/local.cfg

all: rhel[1-24],cdsw1.cisco.com, cdsw2.cisco.com, cdsw3.cisco.com, cdsw4.cisco.com

Note: For more information and documentation on ClusterShell, visit https://github.com/cea-hpc/clustershell/wiki/UserAndProgrammingGuide.

Note: ClusterShell does not work against a node until that node has been reached over ssh at least once, because the node must already be listed in the known_hosts file.

Installing httpd

Setting up RHEL repo on the admin node requires httpd. To set up RHEL repository on the admin node, complete the following steps:

1. Install httpd on the admin node to host repositories.

The Red Hat repository is hosted over HTTP on the admin node, which is accessible by all the hosts in the cluster.

#yum -y install httpd

2. Add ServerName and make the necessary changes to the server configuration file.

#vi /etc/httpd/conf/httpd.conf

ServerName 10.13.1.50:80

3. Start httpd

#service httpd start

#chkconfig httpd on

Set Up All Nodes to use the RHEL Repository

Note: Based on this repo file yum requires httpd to be running on rhel1 for other nodes to access the repository.

1. Copy the rheliso.repo to all the nodes of the cluster.

#clush -w rhel[1-24],cdsw[1-4].cisco.com -b -c /var/www/html/rhelrepo/rheliso.repo --dest=/etc/yum.repos.d/

2. Also copy the /etc/hosts file to all nodes.

#clush -a -b -c /etc/hosts --dest=/etc/hosts

3. Purge the yum caches:

#clush -a -B yum clean all

#clush -a -B yum repolist

Note: Although the suggested configuration is to disable SELinux as shown below, if SELinux needs to be enabled on the cluster for any reason, run the following command to make sure that httpd is able to read the yum repo files.

#chcon -R -t httpd_sys_content_t /var/www/html/

Configuring DNS

This section details setting up DNS using dnsmasq as an example based on the /etc/hosts configuration setup in the earlier section.

To set up DNS across all the nodes in the cluster, complete the following steps:

1. Disable NetworkManager on all nodes.

#clush -a -b service NetworkManager stop

#clush -a -b chkconfig NetworkManager off

2. Update /etc/resolv.conf file to point to Admin Node.

#vi /etc/resolv.conf

nameserver 10.13.1.50

Note: Cloudera CDSW requires Wildcard entry dns configurations as detailed in section Cloudera Data Science Workbench.

Note: Alternatively, use #systemctl start NetworkManager.service to start the service, #systemctl stop NetworkManager.service to stop it, and #systemctl disable NetworkManager.service to prevent it from being started automatically at boot time.

3. Install and start dnsmasq on the Admin node.

#yum -y install dnsmasq

#service dnsmasq start

#chkconfig dnsmasq on

4. Deploy /etc/resolv.conf from the admin node (rhel1) to all the nodes via the following clush command:

#clush -a -B -c /etc/resolv.conf

Note: A clush copy without --dest copies to the same directory location as the source-file directory.

5. Make sure DNS is working by running the following command on the Admin node and on any data node:

[root@rhel2 ~]# nslookup rhel1

Server: 10.13.1.50

Address: 10.13.1.50#53

Name: rhel1

Address: 10.13.1.50

Note: nslookup is provided by the bind-utils package; run yum install -y bind-utils if the utility is not already installed.
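Checking a single name confirms that dnsmasq answers, but not that every cluster name resolves. The following is a minimal sketch with hypothetical function names; it assumes that nslookup is installed and that the elided hostnames follow the rhelN pattern. It relies on nslookup returning a non-zero exit status when a query fails.

```shell
#!/bin/sh
# Sketch: check that every cluster hostname resolves through DNS.
# Assumption: hostnames are rhel1..rhel24 and cdsw1..cdsw4.cisco.com.

cluster_names() {
  i=1
  while [ "$i" -le 24 ]; do
    echo "rhel$i"
    i=$((i + 1))
  done
  i=1
  while [ "$i" -le 4 ]; do
    echo "cdsw$i.cisco.com"
    i=$((i + 1))
  done
}

check_dns() {
  missing=0
  for name in $(cluster_names); do
    # A non-zero exit status from nslookup is treated as "unresolved".
    if ! nslookup "$name" >/dev/null 2>&1; then
      echo "unresolved: $name"
      missing=$((missing + 1))
    fi
  done
  echo "$missing name(s) unresolved"
  [ "$missing" -eq 0 ]
}

# Usage on any node after /etc/resolv.conf is deployed (not run here):
#   check_dns
```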

Upgrading the Cisco Network Driver for VIC1387

The latest Cisco Network driver is required for performance and updates. The latest drivers can be downloaded from the link below:

https://software.cisco.com/download/home/286318800/type/283853158/release/3.1%25283%2529

In the ISO image, the required driver kmod-enic-2.3.0.44-rhel7u4.el7.x86_64.rpm can be located at \Linux\Network\Cisco\VIC\RHEL\RHEL7.4.

1. From a node connected to the Internet, download, extract and transfer kmod-enic-2.3.0.44-rhel7u4.el7.x86_64.rpm to rhel1 (admin node).

2. Install the rpm on all nodes of the cluster using the following clush commands. For this example, the rpm is assumed to be in the present working directory of rhel1.

[root@rhel1 ~]# clush -a -b -c kmod-enic-2.3.0.44-rhel7u4.el7.x86_64.rpm

[root@rhel1 ~]# clush -a -b "rpm -ivh kmod-enic-2.3.0.44-rhel7u4.el7.x86_64.rpm"

3. Make sure that the installed version of the kmod-enic driver is being used on all nodes by running the command "modinfo enic" on all nodes:

[root@rhel1 ~]# clush -a -B "modinfo enic | head -5"

4. It is also recommended to download the kmod-megaraid driver for higher performance. The RPM can be found in the same package at \Linux\Storage\LSI\Cisco_Storage_12G_SAS_RAID_controller\RHEL\RHEL7.4

Installing xfsprogs

From the admin node rhel1, run the command shown below to install xfsprogs on all the nodes for the xfs filesystem.

#clush -a -B yum -y install xfsprogs

NTP Configuration

The Network Time Protocol (NTP) is used to synchronize the time of all the nodes within the cluster. The Network Time Protocol daemon (ntpd) sets and maintains the system time of day in synchronism with the timeserver located in the admin node (rhel1). Configuring NTP is critical for any Hadoop Cluster. If server clocks in the cluster drift out of sync, serious problems will occur with HBase and other services.

#clush -a -b "yum -y install ntp"

Note: Installing an internal NTP server keeps your cluster synchronized even when an outside NTP server is inaccessible.

1. Configure /etc/ntp.conf on the admin node only with the following contents:

#vi /etc/ntp.conf

driftfile /var/lib/ntp/drift

restrict 127.0.0.1

restrict -6 ::1

server 127.127.1.0

fudge 127.127.1.0 stratum 10

includefile /etc/ntp/crypto/pw

keys /etc/ntp/keys

2. Create /root/ntp.conf on the admin node and copy it to all nodes:

#vi /root/ntp.conf

server 10.13.1.50

driftfile /var/lib/ntp/drift

restrict 127.0.0.1

restrict -6 ::1

includefile /etc/ntp/crypto/pw

keys /etc/ntp/keys

3. Copy ntp.conf file from the admin node to /etc of all the nodes by executing the following commands in the admin node (rhel1)

#for SERVER in {50..73}; do scp /root/ntp.conf 10.13.1.$SERVER:/etc/ntp.conf; done

#for SERVER in {250..253}; do scp /root/ntp.conf 10.13.1.$SERVER:/etc/ntp.conf; done

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_108.png

4. Run the following to synchronize the time and restart the NTP daemon on all nodes.

#clush -a -b "service ntpd stop"

#clush -a -b "ntpdate rhel1"

#clush -a -b "service ntpd start"

5. Make sure that the NTP daemon restarts across reboots:

#clush -a -b "systemctl enable ntpd"

Alternatively, the newer Chrony service can be installed; it synchronizes clocks more quickly in mobile and virtual systems.

6. Install the Chrony service:

# yum install -y chrony

7. Activate the Chrony service at boot:

# systemctl enable chronyd

8. Start the Chrony service:

# systemctl start chronyd

The Chrony configuration is in the /etc/chrony.conf file, configured similar to /etc/ntp.conf.
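As an illustration of how the two ntp.conf roles above might map onto Chrony, the following minimal sketch prints a candidate /etc/chrony.conf for the admin node and for the other nodes. It is not part of the validated configuration: the function name is hypothetical, the 10.13.1.0/24 subnet is an assumption derived from the address table, and the makestep and rtcsync lines are commonly used settings rather than requirements of this design.

```shell
#!/bin/sh
# Sketch: print a chrony.conf that mirrors the ntp.conf roles above.
# Assumption: the cluster subnet is 10.13.1.0/24.

chrony_conf() {  # chrony_conf admin|client
  if [ "$1" = "admin" ]; then
    # Serve the local clock to the cluster even without an upstream source.
    echo "local stratum 10"
    echo "allow 10.13.1.0/24"
  else
    # Follow the admin node (rhel1).
    echo "server 10.13.1.50 iburst"
  fi
  echo "driftfile /var/lib/chrony/drift"
  # Step the clock if the offset exceeds 1 second during the first 3 updates.
  echo "makestep 1.0 3"
  # Keep the hardware clock in sync with the system clock.
  echo "rtcsync"
}

echo "# admin node (rhel1)"
chrony_conf admin
echo "# all other nodes"
chrony_conf client
```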

Enabling Syslog

Syslog must be enabled on each node to preserve logs regarding killed processes or failed jobs. Newer implementations such as syslog-ng and rsyslog may be in use instead, which makes it harder to be sure that a syslog daemon is present. One of the following commands should suffice to confirm that the service is properly configured:

#clush -B -a rsyslogd -v

#clush -B -a service rsyslog status

Setting ulimit

On each node, ulimit -n specifies the maximum number of files (file descriptors) that can be open simultaneously. With the default value of 1024, a busy node can run out of file descriptors, and the system may then appear to be out of disk space and show no inodes available. This value should be set to 64000 on every node.

Higher values are unlikely to result in an appreciable performance gain.

1. To set the ulimit on Red Hat, edit /etc/security/limits.conf on the admin node rhel1 and add the following lines:

root soft nofile 64000

root hard nofile 64000

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_109.jpg

2. Copy the /etc/security/limits.conf file from admin node (rhel1) to all the nodes using the following command.

#clush -a -b -c /etc/security/limits.conf --dest=/etc/security/

3. Make sure that the /etc/pam.d/su file contains the following settings:

#%PAM-1.0

auth sufficient pam_rootok.so

# Uncomment the following line to implicitly trust users in the "wheel" group.

#auth sufficient pam_wheel.so trust use_uid

# Uncomment the following line to require a user to be in the "wheel" group.

#auth required pam_wheel.so use_uid

auth include system-auth

account sufficient pam_succeed_if.so uid = 0 use_uid quiet

account include system-auth

password include system-auth

session include system-auth

session optional pam_xauth.so

Note: The ulimit values are applied to new shells; running the command in a shell that was opened earlier shows the old values.
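To confirm the limit from a freshly opened shell, a small check such as the following can be used. This is a minimal sketch; the function name check_nofile is hypothetical, and 64000 is the target value from this section.

```shell
#!/bin/sh
# Sketch: compare an open-file limit with a target value.

check_nofile() {  # check_nofile CURRENT TARGET
  if [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]; then
    echo "ok: nofile=$1"
  else
    echo "low: nofile=$1 (want $2)"
    return 1
  fi
}

# Report the limit of the current shell; "|| true" keeps the script
# going when the limit is below the target.
check_nofile "$(ulimit -n)" 64000 || true
```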

Disabling SELinux

SELinux must be disabled during the install procedure and cluster setup. SELinux can be enabled after installation and while the cluster is running.

SELinux can be disabled by editing /etc/selinux/config and changing the SELINUX line to SELINUX=disabled. The following commands disable SELinux on all nodes.

#clush -a -b "sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config"

#clush -a -b "setenforce 0"

Note: The above command may fail if SELinux is already disabled.

If the change does not take effect, reboot the machine. The SELinux status can be checked using:

#clush -a -b sestatus
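The sed expression above only rewrites a line that reads SELINUX=enforcing, so a node currently set to permissive would be left unchanged. The following minimal sketch, with a hypothetical function name and GNU sed assumed, handles both values; it is demonstrated on a scratch file rather than on /etc/selinux/config.

```shell
#!/bin/sh
# Sketch: set SELINUX=disabled whether the current value is enforcing
# or permissive. Assumes GNU sed (for -i and -E), as shipped with RHEL.

disable_selinux_in() {  # disable_selinux_in FILE
  sed -i -E 's/^SELINUX=(enforcing|permissive)[[:space:]]*$/SELINUX=disabled/' "$1"
}

# Demonstration on a scratch copy; SELINUXTYPE is left untouched.
tmp=$(mktemp)
printf 'SELINUX=permissive\nSELINUXTYPE=targeted\n' > "$tmp"
disable_selinux_in "$tmp"
cat "$tmp"
rm -f "$tmp"
```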

Set TCP Retries

Adjusting the tcp_retries2 parameter for the system network enables faster detection of failed nodes. Given the advanced networking features of UCS, this is a safe and recommended change (failures observed at the operating system layer are most likely serious rather than transitory). On each node, setting the number of TCP retries to 5 can help detect unreachable nodes with less latency.

1. Edit the file /etc/sysctl.conf on the admin node rhel1 and add the following line:

net.ipv4.tcp_retries2=5

2. Copy the /etc/sysctl.conf file from admin node (rhel1) to all the nodes using the following command:

#clush -a -b -c /etc/sysctl.conf --dest=/etc/

3. Load the settings from the default sysctl file /etc/sysctl.conf by running:

#clush -B -a sysctl -p

Disabling the Linux Firewall

The default Linux firewall settings are far too restrictive for any Hadoop deployment. Since the UCS Big Data deployment will be in its own isolated network, there is no need for that additional firewall.

#clush -a -b "firewall-cmd --zone=public --add-port=80/tcp --permanent"

#clush -a -b "firewall-cmd --reload"

#clush -a -b "systemctl disable firewalld"

Disable Swapping

1. To reduce swapping, run the following on all nodes. The vm.swappiness variable defines how readily swap is used; the default is 60.

#clush -a -b "echo 'vm.swappiness=1' >> /etc/sysctl.conf"

2. Load the settings from the default sysctl file /etc/sysctl.conf.

#clush -a -b "sysctl -p"

Disable Transparent Huge Pages

Disabling Transparent Huge Pages (THP) reduces elevated CPU usage caused by THP.

#clush -a -b "echo never > /sys/kernel/mm/transparent_hugepage/enabled"

#clush -a -b "echo never > /sys/kernel/mm/transparent_hugepage/defrag"

1. The above commands must be run after every reboot, so add them to /etc/rc.local so that they are executed automatically at each boot.

2. On the Admin node, run the following commands:

#rm -f /root/thp_disable

#echo "echo never > /sys/kernel/mm/transparent_hugepage/enabled" >> /root/thp_disable

#echo "echo never > /sys/kernel/mm/transparent_hugepage/defrag" >> /root/thp_disable

3. Copy the file to each node:

#clush -a -b -c /root/thp_disable

4. Append the content of the file thp_disable to /etc/rc.local:

#clush -a -b "cat /root/thp_disable >> /etc/rc.local"
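To verify the setting, note that the kernel marks the active THP mode with square brackets, for example "always madvise [never]". The following minimal sketch, with a hypothetical function name, extracts the bracketed value so that it can be compared across nodes.

```shell
#!/bin/sh
# Sketch: extract the active THP mode, which the kernel marks with
# [brackets] in the sysfs file.

thp_mode() {  # thp_mode "LINE"
  echo "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Report the local setting when the sysfs file is available.
f=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$f" ]; then
  echo "THP enabled setting: $(thp_mode "$(cat "$f")")"
fi
```

After the steps above, the reported value is expected to be never on every node.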

Disable IPv6 Defaults

1. Disable IPv6 as the addresses used are IPv4.

#clush -a -b "echo 'net.ipv6.conf.all.disable_ipv6 = 1' >> /etc/sysctl.conf"

#clush -a -b "echo 'net.ipv6.conf.default.disable_ipv6 = 1' >> /etc/sysctl.conf"

#clush -a -b "echo 'net.ipv6.conf.lo.disable_ipv6 = 1' >> /etc/sysctl.conf"

2. Load the settings from default sysctl file /etc/sysctl.conf.

#clush -a -b "sysctl -p"
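Because the sysctl settings in these sections are appended with >>, running a step twice leaves duplicate lines in /etc/sysctl.conf. The following minimal sketch shows one way to make such an update repeatable; the function name set_sysctl is hypothetical, GNU sed is assumed, and the demonstration uses a scratch file rather than /etc/sysctl.conf.

```shell
#!/bin/sh
# Sketch: set KEY = VALUE in a sysctl-style file without duplicating lines.
# Assumes GNU sed (for -i and -E), as shipped with RHEL.

set_sysctl() {  # set_sysctl FILE KEY VALUE
  file=$1 key=$2 value=$3
  # Escape dots so that they match literally in the patterns below.
  pat=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -Eq "^[[:space:]]*$pat[[:space:]]*=" "$file"; then
    # Key already present: rewrite the existing line in place.
    sed -i -E "s|^[[:space:]]*$pat[[:space:]]*=.*|$key = $value|" "$file"
  else
    # Key absent: append a new line.
    echo "$key = $value" >> "$file"
  fi
}

# Demonstration: calling the function twice still leaves a single line.
tmp=$(mktemp)
set_sysctl "$tmp" net.ipv6.conf.all.disable_ipv6 1
set_sysctl "$tmp" net.ipv6.conf.all.disable_ipv6 1
cat "$tmp"
rm -f "$tmp"
```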

Configuring Data Drives on Name Node And Other Management Nodes

This section describes the steps to configure non-OS disk drives as RAID1 using the StorCli command. All the drives are going to be part of a single RAID1 volume. This volume can be used for staging any client data to be loaded to HDFS. This volume will not be used for HDFS data.

1. Download storcli from https://www.broadcom.com/support/download-search/?pg=&pf=&pn=&po=&pa=&dk=storcli

2. Extract the zip file and copy storcli-007.0504.0000.0000-1.noarch.rpm from the linux directory.

3. Transfer the storcli rpm and its dependencies to the Admin node.

#scp storcli-007.0504.0000.0000-1.noarch.rpm rhel1:/root/

4. Copy storcli rpm to all the nodes using the following commands:

#clush -a -b -c /root/storcli-007.0504.0000.0000-1.noarch.rpm --dest=/root/

5. Run the following command to install storcli on all the nodes:

#clush -a -b "rpm -ivh storcli-007.0504.0000.0000-1.noarch.rpm"

6. Run the following commands to copy storcli64 to the root directory:

#cd /opt/MegaRAID/storcli/

#cp storcli64 /root/

Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_Cloudera_28node_112.png

7. Copy storcli64 to all the nodes using the following commands:

#clush -a -b -c /root/storcli64 --dest=/root/

8. Run the following script as root user on rhel1 to rhel3 to create the virtual drives for the management nodes.

#vi /root/raid1.sh

./storcli64 -cfgldadd r1[$1:1,$1:2,$1:3,$1:4,$1:5,$1:6,$1:7,$1:8,$1:9,$1:10,$1:11,$1:12,$1:13,$1:14,$1:15,$1:16,$1:17,$1:18,$1:19,$1:20,$1:21,$1:22,$1:23,$1:24,$1:25,$1:26] wb ra nocachedbadbbu strpsz1024 -a0

The above script requires enclosure ID as a parameter.

9. Run the following commands to get the enclosure ID and to make the script executable.

#./storcli64 pdlist -a0 | grep Enc | grep -v 252 | awk '{print $4}' | sort | uniq -c | awk '{print $2}'

#chmod 755 raid1.sh

10. Run the raid1.sh script, passing the enclosure ID obtained in the previous step as its parameter.

#./raid1.sh <EnclosureID>

WB: Write back

RA: Read Ahead

NoCachedBadBBU: Do not write cache when the BBU is bad.

Strpsz1024: Strip Size of 1024K

Note: The command above will not override any existing configuration. To clear and reconfigure existing configurations refer to Embedded MegaRAID Software Users Guide available at www.broadcom.com.
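The drive list in raid1.sh spells out slots 1 through 26 by hand. As a sketch, the same comma-separated list can be generated from the enclosure ID and a slot count. The helper name build_drive_list is hypothetical, and the enclosure ID 66 used below is only a placeholder value.

```shell
# Hypothetical helper (assumption): print "ENC:1,ENC:2,...,ENC:N".
build_drive_list() {
    local enc="$1" slots="$2" list="" slot=1
    while [ "$slot" -le "$slots" ]; do
        # Prefix a comma only when the list already has an entry.
        list="${list:+${list},}${enc}:${slot}"
        slot=$((slot + 1))
    done
    printf '%s\n' "$list"
}

# Placeholder enclosure ID 66 with 4 slots
build_drive_list 66 4   # prints 66:1,66:2,66:3,66:4

# Inside raid1.sh the list could then be substituted as, for example:
#   ./storcli64 -cfgldadd "r1[$(build_drive_list "$1" 26)]" wb ra nocachedbadbbu strpsz1024 -a0
```

Generating the list keeps the slot count in one place, which helps when a management node has a different number of drives.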

Cloudera recommends the following disk configuration for the master nodes:

· At least 10 physical disks in following configuration

· 2 x RAID1 OS (Root disk)

· 4 x RAID 10 (DB filesystems)

· 2 x RAID 1 HDFS NameNode metadata

· 1 x JBOD - ZooKeeper

· 1 x JBOD - Quorum JournalNode
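As a quick arithmetic check, the per-role disk counts in the list above add up to the stated minimum of 10 physical disks. The variable names in this sketch are illustrative only.

```shell
# Disk counts taken from the recommendation above (variable names are illustrative)
os=2; db=4; nn_meta=2; zookeeper=1; journalnode=1
total=$((os + db + nn_meta + zookeeper + journalnode))
echo "physical disks required: ${total}"   # prints: physical disks required: 10
```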

Configuring Data Drives on Data Nodes

To configure non-OS disk drives as individual RAID0 volumes using StorCli command, complete the following steps. These volumes are going to be used for HDFS Data.

1. Issue the following command from the admin node to create the virtual drives with individual RAID 0 configurations on all the data nodes.

#clush -w rhel[4-24] -B ./storcli64 -cfgeachdskraid0 WB RA direct NoCachedBadBBU strpsz1024 -a0

WB: Write back

RA: Read Ahead

NoCachedBadBBU: Do not write cache when the BBU is bad.

Strpsz1024: Strip Size of 1024K

Note: For the Cloudera Data Science Workbench nodes, create a RAID 1 volume as shown for the management nodes.

Note: The command above will not override existing configurations. To clear and reconfigure existing configurations refer to Embedded MegaRAID Software Users Guide available at www.broadcom.com.

Configuring the Filesystem for NameNodes and Datanodes

The following script formats and mounts the available volumes on each node, whether it is a NameNode or a DataNode. The OS boot partition is skipped. All drives are mounted by UUID as /data/disk1, /data/disk2, and so on. To configure the filesystem for NameNodes and DataNodes, complete the following steps:

1. On the Admin node, create a file containing the following script.

2. To create partition tables and file systems on the local disks supplied to each of the nodes, run the following script as the root user on each node.

Note: The script assumes that no partitions already exist on the data volumes. If there are partitions, delete them before running the script. This process is documented in the Note at the end of this section.

#vi /root/driveconf.sh

#!/bin/bash

[[ "-x" == "${1}" ]] && set -x && set -v && shift 1

count=1

# Rescan all SCSI hosts so newly created volumes are visible

for X in /sys/class/scsi_host/host?/scan

do

echo '- - -' > ${X}

done

# Collect candidate block devices (sda..sdz, then sdaa..sdzz)

for X in /dev/sd?

do

list+=$(echo $X " ")

done

for X in /dev/sd??

do

list+=$(echo $X " ")

done

for X in $list

do

echo "========"

echo $X

echo "========"

if [[ -b ${X} && `/sbin/parted -s ${X} print quit|/bin/grep -c boot` -ne 0 ]]

then

echo "$X bootable - skipping."

continue

else

Y=${X##*/}1

echo "Formatting and Mounting Drive => ${X}"

/sbin/mkfs.xfs -f ${X}

(( $? )) && continue

#Identify UUID

UUID=`blkid ${X} | cut -d " " -f2 | cut -d "=" -f2 | sed 's/"//g'`

/bin/mkdir -p /data/disk${count}

(( $? )) && continue

echo "UUID of ${X} = ${UUID}, mounting ${X} using UUID on /data/disk${count}"

/bin/mount -t xfs -o inode64,noatime,nobarrier -U ${UUID} /data/disk${count}

(( $? )) && continue

echo "UUID=${UUID} /data/disk${count} xfs inode64,noatime,nobarrier 0 0" >> /etc/fstab

((count++))

fi

done

Note: Do not run this script on the Cloudera Data Science Workbench nodes

3. Run the following command to copy driveconf.sh to all the nodes:

#chmod 755 /root/driveconf.sh

#clush -a -B -c /root/driveconf.sh

4. Run the following command from the admin node to run the script across all data nodes

#clush -a -B /root/driveconf.sh

5. Run the following from the admin node to list the partitions and mount points

#clush -a -B df -h

#clush -a -B mount

#clush -a -B cat /etc/fstab
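driveconf.sh takes the UUID from the second space-separated field of the blkid output, which assumes UUID is the first tag on the line. The sketch below shows a position-independent parse, the fstab line the script writes, and a count of /data/diskN entries for comparing nodes. All three function names and the sample blkid and fstab content are illustrative assumptions, not part of the validated script.

```shell
# Print the value of the UUID="..." tag wherever it appears on a blkid output line.
uuid_from_blkid_line() {
    printf '%s\n' "$1" | sed -n 's/.*[[:space:]]UUID="\([^"]*\)".*/\1/p'
}

# Build the fstab line in the same form driveconf.sh appends.
fstab_entry() {
    local uuid="$1" index="$2"
    echo "UUID=${uuid} /data/disk${index} xfs inode64,noatime,nobarrier 0 0"
}

# Count UUID-based /data/diskN entries in an fstab-format file.
count_data_mounts() {
    local fstab="${1:-/etc/fstab}"
    awk '$1 ~ /^UUID=/ && $2 ~ /^\/data\/disk[0-9]+$/ { n++ } END { print n + 0 }' "$fstab"
}

# Made-up sample line in which LABEL precedes UUID
sample='/dev/sdb: LABEL="data" UUID="0a1b2c3d-1111-2222-3333-444455556666" TYPE="xfs"'
uuid="$(uuid_from_blkid_line "$sample")"

# Write two sample entries to a scratch file and count them
scratch_fstab="$(mktemp)"
fstab_entry "$uuid" 1 >> "$scratch_fstab"
fstab_entry "$uuid" 2 >> "$scratch_fstab"
echo "data mounts: $(count_data_mounts "$scratch_fstab")"
```

On the nodes themselves, `blkid -s UUID -o value <device>` returns only the UUID value and avoids parsing the line at all.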

Note: If any partitions need to be deleted, use the following steps.

6. Run the mount command ('mount') to identify which drive is mounted to which device /dev/sd<?>

7. Unmount the drive whose partition is to be deleted and run fdisk to delete it, as shown below.

Note: Care should be taken not to delete the OS partition as this will wipe out the OS.

#mount

#umount /data/disk1   # disk1 shown as example

#(echo d; echo w;) | sudo fdisk /dev/sd<?>
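Before running fdisk, it is worth confirming that the target device no longer appears in the mount table. The following is a minimal sketch; is_mounted is a hypothetical helper that reads a file in /proc/mounts format, and the sample mount table is made up for illustration.

```shell
# Hypothetical helper (assumption): succeed when DEVICE appears as a mount
# source in a /proc/mounts-format file. The device name must match exactly.
is_mounted() {
    local dev="$1" mounts="${2:-/proc/mounts}"
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# Made-up mount table for demonstration
sample_mounts="$(mktemp)"
printf '%s\n' \
    '/dev/sda2 / xfs rw 0 0' \
    '/dev/sdb /data/disk1 xfs rw,noatime 0 0' > "$sample_mounts"

for dev in /dev/sdb /dev/sdc; do
    if is_mounted "$dev" "$sample_mounts"; then
        echo "$dev is still mounted; unmount it before running fdisk"
    else
        echo "$dev is not mounted"
    fi
done
```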

Cluster Verification

This section describes the steps to create the script cluster_verification.sh that helps to verify the CPU, memory, NIC, and storage adapter settings across the cluster on all nodes. This script also checks additional prerequisites such as NTP status, SELinux status, ulimit settings, JAVA_HOME settings and JDK version, IP address and hostname resolution, Linux version and firewall settings.

1. Create the script cluster_verification.sh as shown, on the Admin node (rhel1).

#vi cluster_verification.sh

#!/bin/bash

shopt -s expand_aliases

# Setting Color codes

green='\e[0;32m'

red='\e[0;31m'

NC='\e[0m' # No Color

echo -e "${green} === Cisco UCS Integrated Infrastructure for Big Data and Analytics Cluster Verification === ${NC}"

echo ""

echo -e "${green} ==== System Information ==== ${NC}"

echo ""

echo -e "${green}System ${NC}"

clush -a -B " `which dmidecode` |grep -A2 '^System Information'"

echo ""

echo -e "${green}BIOS ${NC}"

clush -a -B " `which dmidecode` | grep -A3 '^BIOS I'"

echo ""

echo -e "${green}Memory ${NC}"

clush -a -B "cat /proc/meminfo | grep -i ^memt | uniq"

echo ""

echo -e "${green}Number of Dimms ${NC}"

clush -a -B "echo -n 'DIMM slots: '; `which dmidecode` |grep -c '^[[:space:]]*Locator:'"

clush -a -B "echo -n 'DIMM count is: '; `which dmidecode` | grep 'Size' | grep -c 'MB'"

clush -a -B " `which dmidecode` | awk '/Memory Device$/,/^$/ {print}' | grep -e '^Mem' -e Size: -e Speed: -e Part | sort -u | grep -v -e 'NO DIMM' -e 'No Module Installed' -e Unknown"

echo ""

# probe for cpu info #

echo -e "${green}CPU ${NC}"

clush -a -B "grep '^model name' /proc/cpuinfo | sort -u"

echo ""

clush -a -B "`which lscpu` | grep -v -e op-mode -e ^Vendor -e family -e Model: -e Stepping: -e BogoMIPS -e Virtual -e ^Byte -e '^NUMA node(s)'"

echo ""

# probe for nic info #

echo -e "${green}NIC ${NC}"

clush -a -B "`which ifconfig` | egrep '(^e|^p)' | awk '{print \$1}' | xargs -l `which ethtool` | grep -e ^Settings -e Speed"

echo ""

clush -a -B "`which lspci` | grep -i ether"

echo ""

# probe for disk info #

echo -e "${green}Storage ${NC}"

clush -a -B "echo 'Storage Controller: '; `which lspci` | grep -i -e raid -e storage -e lsi"

echo ""

clush -a -B "dmesg | grep -i raid | grep -i scsi"

echo ""

clush -a -B "lsblk -id | awk '{print \$1,\$4}'|sort | nl"

echo ""

echo -e "${green} ================ Software ======================= ${NC}"

echo ""

echo -e "${green}Linux Release ${NC}"

clush -a -B "cat /etc/*release | uniq"

echo ""

echo -e "${green}Linux Version ${NC}"

clush -a -B "uname -srvm | fmt"

echo ""

echo -e "${green}Date ${NC}"

clush -a -B date

echo ""

echo -e "${green}NTP Status ${NC}"

clush -a -B "ntpstat 2>&1 | head -1"

echo ""

echo -e "${green}SELINUX ${NC}"

clush -a -B "echo -n 'SElinux status: '; grep ^SELINUX= /etc/selinux/config 2>&1"

echo ""

clush -a -B "echo -n 'CPUspeed Service: '; `which service` cpuspeed status 2>&1"

clush -a -B "echo -n 'CPUspeed Service: '; `which chkconfig` --list cpuspeed 2>&1"

echo ""

echo -e "${green}Java Version${NC}"

clush -a -B 'java -version 2>&1; echo JAVA_HOME is ${JAVA_HOME:-Not Defined!}'

echo ""

echo -e "${green}Hostname Lookup${NC}"

clush -a -B " ip addr show"

echo ""

echo -e "${green}Open File Limit${NC}"

clush -a -B 'echo -n "Open file limit(should be >32K): "; ulimit -n'

2. Change permissions to executable.

#chmod 755 cluster_verification.sh

3. Run the Cluster Verification tool from the admin node. This can be run before starting Hadoop to identify any discrepancies in Post OS Configuration between the servers or during troubleshooting of any cluster / Hadoop issues.

#./cluster_verification.sh
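The verification script prints raw values and leaves the comparison to the reader. As a sketch, two of its checks can be turned into pass/fail tests. The function names are hypothetical, the 32768 threshold is taken from the "should be >32K" text in the script, and the SELinux sample file is made up.

```shell
# Succeed when the open-file limit exceeds 32K (32768) or is unlimited.
open_file_limit_ok() {
    local limit="$1"
    [ "$limit" = "unlimited" ] && return 0
    [ "$limit" -gt 32768 ]
}

# Print the value of the SELINUX= line from a file in /etc/selinux/config format.
selinux_mode() {
    sed -n 's/^SELINUX=//p' "$1"
}

# Check the limit of the current shell
if open_file_limit_ok "$(ulimit -n)"; then
    echo "open file limit: OK"
else
    echo "open file limit: too low"
fi

# Parse a made-up SELinux config file
sample_selinux="$(mktemp)"
printf '%s\n' '# sample file' 'SELINUX=disabled' 'SELINUXTYPE=targeted' > "$sample_selinux"
echo "SELinux mode: $(selinux_mode "$sample_selinux")"
```

Functions like these could be copied to the nodes and invoked through clush in the same way as driveconf.sh, so that each node reports a verdict rather than a raw number.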

Installing Cloudera

Cloudera’s Distribution including Apache Hadoop (CDH) is an enterprise-grade, hardened Hadoop distribution. CDH combines Apache Hadoop and several related projects into a single tested and certified product. It offers the latest innovations from the open source community with the testing and quality expected of enterprise software.

Prerequisites for CDH Installation

This section details the prerequisites for CDH installation such as setting up CDH Repo.

Cloudera Manager Repository

1. From a host connected to the Internet, download the Cloudera repositories as shown below and transfer them to the admin node.

#mkdir -p /tmp/clouderarepo/

2. Download Cloudera Manager Repository.

#cd /tmp/clouderarepo/

#wget http://archive.cloudera.com/cm5/redhat/7/x86_64/cm/cloudera-manager.repo

#reposync --config=./cloudera-manager.repo --repoid=cloudera-manager

This downloads the Cloudera Manager RPMs needed for the Cloudera repository.

3. Run the following command to move the RPMs

4. Copy the repository directory to the admin node (rhel1)

#scp -r /tmp/clouderarepo/ rhel1:/var/www/html/

5. On admin node (rhel1) run create repo command.

#cd /var/www/html/clouderarepo/

#createrepo --baseurl http://10.13.1.50/clouderarepo/cloudera-manager/ /var/www/html/clouderarepo/cloudera-manager


Note: Visit http://10.13.1.50/clouderarepo/ to verify the files.

6. Create the Cloudera Manager repo file with following contents:

#vi /var/www/html/clouderarepo/cloudera-manager/cloudera-manager.repo

[cloudera-manager]

name=Cloudera Manager

baseurl=http://10.13.1.50/clouderarepo/cloudera-manager/

gpgcheck=0

enabled=1

7. Copy the file cloudera-manager.repo into /etc/yum.repos.d/ on the admin node to enable it to find the packages that are locally hosted.

#cp /var/www/html/clouderarepo/cloudera-manager/cloudera-manager.repo /etc/yum.repos.d/

8. From the admin node copy the repo files to /etc/yum.repos.d/ of all the nodes of the cluster.

#clush -a -B -c /etc/yum.repos.d/cloudera-manager.repo
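The repo file from step 6 can also be generated instead of typed in an editor. The following is a sketch only; write_repo_file is a hypothetical helper, and the example writes to a scratch file so that nothing under /var/www/html or /etc/yum.repos.d is touched.

```shell
# Hypothetical helper (assumption): write a Cloudera Manager repo file
# with the given base URL to the given path.
write_repo_file() {
    local path="$1" baseurl="$2"
    cat > "$path" <<EOF
[cloudera-manager]
name=Cloudera Manager
baseurl=${baseurl}
gpgcheck=0
enabled=1
EOF
}

# Demonstration using the admin node address from this guide
repo_tmp="$(mktemp)"
write_repo_file "$repo_tmp" "http://10.13.1.50/clouderarepo/cloudera-manager/"
cat "$repo_tmp"
```

Keeping the base URL as a parameter makes it easier to reuse the same steps if the admin node address differs from 10.13.1.50.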

Setting Up the Local Parcels for CDH 5.13.0

From a host connected to the Internet, download the appropriate CDH 5.13.0 parcels for RHEL7.4 from http://archive.cloudera.com/cdh5/parcels/ and place them in the directory "/var/www/html/CDH5.13.0parcels" on the Admin node.

The following are the relevant files for RHEL7.4:

· CDH-5.13.0-1.cdh5.13.0.p0.29-el7.parcel

· CDH-5.13.0-1.cdh5.13.0.p0.29-el7.parcel.sha1 and

· manifest.json
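The .sha1 file listed above can be used to check that the parcel was transferred intact. The sketch below assumes the .sha1 file holds the hex digest as its first whitespace-separated field and that sha1sum is available on the host. verify_parcel_sha1 is a hypothetical helper, demonstrated here on a scratch file rather than a real parcel.

```shell
# Hypothetical helper (assumption): succeed when the SHA-1 of PARCEL matches
# the first field of the first line in SHA_FILE.
verify_parcel_sha1() {
    local parcel="$1" sha_file="$2" expected actual
    expected="$(awk 'NR == 1 { print $1 }' "$sha_file")"
    actual="$(sha1sum "$parcel" | awk '{ print $1 }')"
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Demonstration with a scratch file standing in for the parcel
demo="$(mktemp)"
printf 'demo parcel contents\n' > "$demo"
sha1sum "$demo" | awk '{ print $1 }' > "${demo}.sha1"

if verify_parcel_sha1 "$demo" "${demo}.sha1"; then
    echo "checksum OK"
else
    echo "checksum MISMATCH"
fi
```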

Downloading Parcels

From a host connected to the Internet, download the Cloudera parcels as shown below and transfer them to the admin node.

#mkdir -p /tmp/clouderarepo/CDH5.13.0parcels

1. Download parcels:

#cd /tmp/clouderarepo/CDH5.13.0parcels

#wget http://archive.cloudera.com/cdh5/parcels/5.13.0/CDH-5.13.0-1.cdh5.13.0.p0.29-el7.parcel

#wget http://archive.cloudera.com/cdh5/parcels/5.13.0/CDH-5.13.0-1.cdh5.13.0.p0.29-el7.parcel.sha1

#wget http://archive.cloudera.com/cdh5/parcels/5.13.0/manifest.json

2. Edit the /tmp/clouderarepo/CDH5.13.0parcels/manifest.json file and remove the entries that are not meant for RHEL7.4. The resulting content, which can be copied and pasted, is shown below.

Note: Make sure the content starts and ends with the outermost braces.

{

"lastUpdated": 15075981980000,

"parcels": [

{

"parcelName": "CDH-5.13.0-1.cdh5.13.0.p0.29-el7.parcel",

"components": [

{

"pkg_version": "0.7.0+cdh5.13.0+0",