FlexPod Datacenter with Red Hat OpenShift Bare Metal IaC Configuration with Cisco UCS X-Series Direct Deployment Guide

Bias-Free Language

The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product. Learn more about how Cisco is using Inclusive Language.


Published: May 2025


About the Cisco Validated Design Program

The Cisco Validated Design (CVD) program consists of systems and solutions designed, tested, and documented to facilitate faster, more reliable, and more predictable customer deployments. For more information, go to: http://www.cisco.com/go/designzone.

Executive Summary

The FlexPod Datacenter solution is a validated design for deploying Cisco and NetApp technologies and products to build shared private and public cloud infrastructure. Cisco and NetApp have partnered to deliver a series of FlexPod solutions that enable strategic data center platforms. The success of the FlexPod solution is driven through its ability to evolve and incorporate both technology and product innovations in the areas of management, compute, storage, and networking. This document explains the deployment details of Red Hat OpenShift on FlexPod Bare Metal Infrastructure. Some of the key advantages of FlexPod Datacenter with Red Hat OpenShift Bare Metal are:

●    Consistent Configuration: having a standard method for deploying Red Hat OpenShift on FlexPod Bare Metal infrastructure provides a consistent platform to run containers and virtualized workloads including CPU and GPU accelerated AI/ML workloads, software and models, and OpenShift Virtualization, all side by side on the same infrastructure.

●    Simpler and programmable infrastructure: the entire underlying infrastructure can be configured using infrastructure as code delivered using Ansible.

●    End-to-End 100Gbps Ethernet: utilizing the 5th Generation Cisco UCS VICs and the 5th Generation Cisco UCS S9108 Fabric Interconnects (FIs) to deliver 100Gbps Ethernet from the server through the network to the storage.

●    Cisco Intersight Management: Cisco Intersight Managed Mode (IMM) is used to manage the Cisco UCS S9108 FIs and Cisco UCS X-Series Servers. Additionally, Cisco Intersight integrates with NetApp Active IQ Unified Manager and Cisco Nexus switches as described in the following sections.

●    Built for investment protections: design ready for future technologies such as liquid cooling and high-Wattage CPUs; CXL (Compute Express Link)-ready.

In addition to the FlexPod-specific hardware and software innovations, the integration of the Cisco Intersight cloud platform with NetApp Active IQ Unified Manager and Cisco Nexus switches delivers monitoring and orchestration capabilities for different layers (storage and networking) of the FlexPod infrastructure. Implementation of this integration at this point in the deployment process would require Cisco Intersight Assist and NetApp Active IQ Unified Manager to be deployed outside of the FlexPod.

For information about the FlexPod design and deployment details, including the configuration of various elements of design and associated best practices, refer to Cisco Validated Designs for FlexPod, here: https://www.cisco.com/c/en/us/solutions/design-zone/data-center-design-guides/flexpod-design-guides.html.

Solution Overview

This chapter contains the following:

●   Introduction

●   Audience

●   Purpose of this Document

●   New in this Release

Introduction

The FlexPod Datacenter with Red Hat OpenShift on Bare Metal configuration represents a cohesive and flexible infrastructure solution that combines computing hardware, networking, and storage resources into a single, integrated architecture. Designed as a collaborative effort between Cisco and NetApp, this converged infrastructure platform is engineered to deliver high levels of efficiency, scalability, and performance, suitable for a multitude of datacenter workloads. By standardizing on a validated design, organizations can accelerate deployment, reduce operational complexities, and confidently scale their IT operations to meet evolving business demands. The FlexPod architecture leverages Cisco's Unified Computing System (UCS) servers, Cisco Nexus networking, and NetApp's innovative storage systems, providing a robust foundation for both virtualized and non-virtualized environments.

Audience

The intended audience of this document includes but is not limited to IT architects, sales engineers, field consultants, professional services, IT managers, partner engineering, and customers who want to take advantage of an infrastructure built to deliver IT efficiency and enable IT innovation.

Purpose of this Document

This document provides deployment guidance around bringing up the FlexPod Datacenter with Red Hat OpenShift on Bare Metal infrastructure. This configuration is built as a tenant on top of FlexPod Base and assumes FlexPod Base has already been configured. This document introduces various design elements and explains various considerations and best practices for a successful deployment.

New in this Release

The following design elements distinguish this version of FlexPod from previous models:

●    IaC Configuration of Red Hat OpenShift Bare Metal as a tenant on top of FlexPod Base. This document is the first example of a FlexPod tenant on top of FlexPod Base that aligns with the tenant defined in FlexPod Zero Trust Framework Design Guide.

●    Configuration of a platform that will support both Containerized Applications, such as AI applications and Virtual Machines on the same platform.

Deployment Hardware and Software

This chapter contains the following:

●   Design Requirements

●   Physical Topology

●   Software Revisions

Design Requirements

The FlexPod Datacenter with Cisco UCS and Cisco Intersight meets the following general design requirements:

●   Resilient design across all layers of the infrastructure with no single point of failure

●   Scalable design with the flexibility to add compute capacity, storage, or network bandwidth as needed

●   Modular design that can be replicated to expand and grow as the needs of the business grow

●   Flexible design that can support different models of various components with ease

●   Simplified design with the ability to integrate and automate with external automation tools

●   Cloud-enabled design which can be configured, managed, and orchestrated from the cloud using GUI or APIs

To deliver a solution which meets all these design requirements, various solution components are connected and configured as covered in the upcoming sections.

Physical Topology

The FlexPod Datacenter with Red Hat OpenShift on Bare Metal infrastructure configuration is built using the following hardware components:

●   Cisco UCS X9508 Chassis with six Cisco UCS X210C M7 Compute Nodes and two Cisco UCS X440p PCIe Nodes, each containing two NVIDIA L40S GPUs

●   Fifth-generation Cisco UCS S9108 Fabric Interconnects to support 100GbE and 25GbE connectivity from various components

●   High-speed Cisco NX-OS-based Nexus 93600CD-GX switching design to support 100GE and 400GE connectivity

●   NetApp AFF C800 end-to-end NVMe storage with 25G or 100G Ethernet and (optional) 32G Fibre Channel connectivity

The software components of this solution consist of:

●   Cisco Intersight to deploy, maintain, and support the Cisco UCS server components

●   Cisco Intersight SaaS platform to maintain and support the FlexPod components

●   Cisco Intersight Assist Virtual Appliance to help connect NetApp ONTAP and Cisco Nexus switches with Cisco Intersight

●   NetApp Active IQ Unified Manager to monitor and manage the storage and for NetApp ONTAP integration with Cisco Intersight

●   Red Hat OpenShift which provides a platform for both containers and VMs

FlexPod Datacenter with Red Hat OpenShift on Bare Metal Infrastructure with Cisco UCS X-Series Direct Topology

Figure 1 shows various hardware components and the network connections for this IP-based FlexPod design.

Figure 1.          FlexPod Datacenter Physical Topology for IP-based Storage Access


The reference hardware configuration includes:

●   Two Cisco Nexus 93600CD-GX Switches in Cisco NX-OS mode provide the switching fabric. Other Cisco Nexus Switches are also supported.

●   Two Cisco UCS S9108 Fabric Interconnects (FIs) in the chassis provide the chassis connectivity. At least two 100 Gigabit Ethernet ports from each FI, configured as a Port-Channel, are connected to each Nexus 93600CD-GX switch. 25 Gigabit Ethernet connectivity is also supported, as are other Cisco UCS FI models used with Intelligent Fabric Modules (IFMs) in the chassis.

●   One Cisco UCS X9508 Chassis contains six Cisco UCS X210C M7 servers and two Cisco UCS X440p PCIe Nodes, each with two NVIDIA L40S GPUs. Other configurations of servers with and without GPUs are also supported.

●   One NetApp AFF C800 HA pair connects to the Cisco Nexus 93600CD-GX Switches using two 100 GE ports from each controller configured as a Port-Channel. 25 Gigabit Ethernet connectivity is also supported as well as other NetApp AFF, ASA, and FAS storage controllers.

Red Hat OpenShift on Bare Metal Server Configuration

A simple Red Hat OpenShift cluster consists of at least five servers – three Control-Plane Nodes and two or more Worker Nodes where applications and VMs are run. In this lab validation, three Worker Nodes were utilized. Based on OpenShift published requirements, the three Control-Plane Nodes were configured with 64GB RAM, and the three Worker Nodes were configured with 768GB RAM to handle containerized applications and VMs.

An alternative configuration, where all servers have the same amount of memory and CPUs, is to combine the control-plane and worker roles on the first three servers and then to assign only the worker role to the remaining servers. This configuration requires a minimum of three servers, and notes throughout the document explain deviations in the process for this configuration.

Each Node was booted from M.2. Both a single M.2 module and two M.2 modules with RAID1 are supported. Also, the servers paired with X440p PCIe Nodes were configured as Workers. From a networking perspective, both the Control-Plane Nodes and the Workers were configured with a single vNIC with UCS Fabric Failover in the Bare Metal or Management VLAN. The Workers were configured with extra vNICs to allow storage attachment. Each Worker had two additional vNICs with the iSCSI A and B VLANs configured as native to allow iSCSI persistent storage attachment and future iSCSI boot. These same vNICs also had the NVMe-TCP A and B VLANs assigned as allowed VLANs, so that tagged VLAN interfaces for NVMe-TCP could be defined on the Workers. Finally, each Worker had one additional vNIC with the OpenShift NFS VLAN configured as native to provide NFS persistent storage.
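Once OpenShift is installed, this vNIC layout appears on a Worker as a set of Ethernet devices (eno5 for the bare metal/management network, eno6 and eno7 for iSCSI plus the tagged NVMe-TCP VLAN interfaces, and eno8 for NFS, matching the interface names used later in this guide). A quick way to review the layout on a node (a hedged sketch):

ssh core@<worker-node-ip>
ip -br link    # lists the eno5-eno8 vNICs presented by the UCS VIC
ip -br addr    # shows the addresses on the iSCSI, NVMe-TCP VLAN, and NFS interfaces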

VLAN Configuration

Table 1 lists VLANs configured for setting up the FlexPod environment along with their usage.

Table 1.      VLAN Usage

VLAN ID | Name | Usage | IP Subnet used in this deployment
2* | Native-VLAN | Use VLAN 2 as native VLAN instead of default VLAN (1) |
1020* | OOB-MGMT-VLAN | Out-of-band management VLAN to connect management ports for various devices | 10.102.0.0/24; GW: 10.102.0.254
1022 | OCP-BareMetal-MGMT | Routable OpenShift Bare Metal VLAN used for OpenShift cluster and node management | 10.102.2.0/24; GW: 10.102.2.254
3012 | OCP-iSCSI-A | Used for OpenShift iSCSI Persistent Storage | 192.168.12.0/24
3022 | OCP-iSCSI-B | Used for OpenShift iSCSI Persistent Storage | 192.168.22.0/24
3032 | OCP-NVMe-TCP-A | Used for OpenShift NVMe-TCP Persistent Storage (Optional) | 192.168.32.0/24
3042 | OCP-NVMe-TCP-B | Used for OpenShift NVMe-TCP Persistent Storage (Optional) | 192.168.42.0/24
3052 | OCP-NFS | Used for OpenShift NFS RWX Persistent Storage (Not Available with NetApp ASA) | 192.168.52.0/24

Note:      *VLANs configured in FlexPod Base.

Note:      S3 object storage was also used in this environment but requires a routable subnet. In order to avoid having two default gateways on the OpenShift nodes, S3 was placed on the OCP-BareMetal-MGMT subnet and VLAN. A separate VLAN and subnet was not defined for S3.

Table 2 lists the VMs or bare metal servers necessary for deployment as outlined in this document.

Table 2.      Virtual Machines

Virtual Machine Description | VLAN | IP Address | Comments
OCP AD1 | 1022 | 10.102.2.249 | Hosted on pre-existing management infrastructure within the FlexPod
OCP AD2 | 1022 | 10.102.2.250 | Hosted on pre-existing management infrastructure within the FlexPod
OCP Installer | 1022 | 10.102.2.10 | Hosted on pre-existing management infrastructure within the FlexPod
NetApp Active IQ Unified Manager | 1021 | 10.102.1.97 | Hosted on pre-existing management infrastructure within the FlexPod
Cisco Intersight Assist Virtual Appliance | 1021 | 10.102.1.96 | Hosted on pre-existing management infrastructure within the FlexPod

Software Revisions

Table 3 lists the software revisions for various components of the solution.

Table 3.      Software Revisions

Layer | Device | Image Bundle | Comments
Compute | Cisco UCS Fabric Interconnect S9108 | 4.3(5.240191) |
Compute | Cisco UCS X210C M7 | 5.3(5.250001) |
Network | Cisco Nexus 93600CD-GX NX-OS | 10.4(4)M |
Storage | NetApp AFF C800 | ONTAP 9.16.1 | Latest patch release
Software | Red Hat OpenShift | 4.17 |
Software | NetApp Trident | 25.02.1 |
Software | NetApp DataOps Toolkit | 2.5.0 |
Software | Cisco Intersight Assist Appliance | 1.1.1-1 | 1.1.1-0 initially installed and then automatically upgraded
Software | NetApp Active IQ Unified Manager | 9.16 |
Software | NVIDIA L40S GPU Driver | 550.144.03 |

FlexPod Cabling

The information in this section is provided as a reference for cabling the physical equipment in a FlexPod environment. To simplify cabling requirements, a cabling diagram was used.

The cabling diagram in this section contains the details for the prescribed and supported configuration of the NetApp AFF C800 running NetApp ONTAP 9.16.1.

Note:      For any modifications of this prescribed architecture, consult the NetApp Interoperability Matrix Tool (IMT).

Note:      This document assumes that out-of-band management ports are plugged into an existing management infrastructure at the deployment site. These interfaces will be used in various configuration steps.

Note:      Be sure to use the cabling directions in this section as a guide.

The NetApp storage controller and disk shelves should be connected according to best practices for the specific storage controller and disk shelves. For disk shelf cabling, refer to NetApp Support.

Figure 2 details the cable connections used in the validation lab for the FlexPod topology based on the Cisco UCS S9108 fabric interconnect directly in the chassis. Two 100Gb links connect each Cisco UCS Fabric Interconnect to the Cisco Nexus Switches and each NetApp AFF controller to the Cisco Nexus Switches. Additional 1Gb management connections will be needed for one or more out-of-band network switches that sit apart from the FlexPod infrastructure. Each Cisco UCS fabric interconnect and Cisco Nexus switch is connected to the out-of-band network switches, and each AFF controller has a connection to the out-of-band network switches. Layer 3 network connectivity is required between the Out-of-Band (OOB) and In-Band (IB) Management Subnets.

Figure 2.          FlexPod Cabling with Cisco UCS S9108 X-Series Direct Fabric Interconnects


FlexPod Base Configuration

The OpenShift Tenant is intended to be built on top of FlexPod Base and can coexist with other tenants. If FlexPod Base has not been installed on the FlexPod, use FlexPod Datacenter Base Configuration using IaC with Cisco IMM and NetApp ONTAP to install FlexPod Base. Note that the OpenShift Tenant is an IP-only solution, but other tenants utilizing Fibre Channel could be installed on the FlexPod. When FlexPod Base is installed, use it to configure all the available FlexPod components.

One part of the FlexPod Base installation is installing an Ansible VM or machine that is used to run the Ansible playbooks. For FlexPod Base, in the .ansible.cfg file in the user directory, “jinja2_native=True” was set. For running the FlexPod OpenShift Tenant scripts, this parameter needs to be commented out as shown below.

cat ~/.ansible.cfg

[defaults]

interpreter_python=/usr/bin/python3.11

#jinja2_native=True

Clone the GitHub Repository

The Ansible playbooks for this solution are maintained in a public GitHub repository; the first step in the process is to clone the repository named FlexPod-IMM-OpenShift (https://github.com/ucs-compute-solutions/FlexPod-IMM-OpenShift.git) to a new, empty folder on the Ansible workstation. Cloning the repository creates a local copy, which is then used to run the playbooks that have been created for this solution.

Step 1.      From the Ansible workstation, change directories to the folder where the Ansible collections are located – something like /home/admin/ansible.

Step 2.      Clone the GitHub collection using the following command:

git clone https://github.com/ucs-compute-solutions/FlexPod-IMM-OpenShift.git

Step 3.      Change directories to the new folder named FlexPod-IMM-OpenShift.

Network Switch Configuration

This chapter contains the following:

●   Cisco Nexus Switch Ansible Configuration

This chapter provides a detailed procedure for using an Ansible playbook to configure the Cisco Nexus 93600CD-GX switches for use in a FlexPod with Red Hat OpenShift on Bare Metal environment.

Note:      The following procedures describe how to configure the Cisco Nexus switches for use in the OpenShift Bare Metal FlexPod environment. This procedure assumes the use of Cisco Nexus 9000 10.4(4)M.

●   The following procedure includes the setup of NTP distribution on the bare metal VLAN. The interface-vlan feature and ntp commands are used to set this up.

●   This procedure adds the tenant VLANs to the appropriate port-channels.

Cisco Nexus Switch Ansible Configuration

Procedure 1.       Configure the Cisco Nexus switches from the Ansible workstation

Step 1.      Add Nexus switch ssh keys to /home/admin/.ssh/known_hosts. Adjust known_hosts as necessary if errors occur:

ssh admin@<nexus-A-mgmt0-ip>

exit

ssh admin@<nexus-B-mgmt0-ip>

exit
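Alternatively (a hedged sketch; substitute the mgmt0 IP addresses of your switches), the switch host keys can be collected non-interactively with ssh-keyscan:

ssh-keyscan -H <nexus-A-mgmt0-ip> <nexus-B-mgmt0-ip> >> /home/admin/.ssh/known_hosts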

Step 2.      Edit the following variable files to ensure proper Cisco Nexus variables are entered:

●     FlexPod-IMM-OpenShift/group_vars/all.yml

●     FlexPod-IMM-OpenShift/group_vars/secrets.yml

●     FlexPod-IMM-OpenShift/group_vars/nexus.yml

●     FlexPod-IMM-OpenShift/inventory

●     FlexPod-IMM-OpenShift/host_vars/n9kA.yml

●     FlexPod-IMM-OpenShift/host_vars/n9kB.yml

Note:      Port-channel numbers in FlexPod-IMM-OpenShift/group_vars/nexus.yml should be the same as those set up in FlexPod Base.

Step 3.      From FlexPod-IMM-OpenShift, run the Setup_Nexus.yml Ansible playbook:

ansible-playbook ./Setup_Nexus.yml -i inventory

Step 4.      The following commands can be used to see the switch configuration and status:

show run

show vpc
show vlan

show port-channel summary

show ntp peer-status

show cdp neighbors

show lldp neighbors

show run int

show int

show udld neighbors

show int status

NetApp ONTAP Storage Configuration

This chapter contains the following:

●     NetApp ONTAP Storage Ansible Configuration

This chapter provides a detailed procedure for using an Ansible playbook to configure the NetApp AFF C800 storage for use in a FlexPod with Red Hat OpenShift on Bare Metal environment.

Note:      The following procedures describe how to configure the NetApp ONTAP storage for use in the OpenShift Bare Metal FlexPod environment. This procedure assumes the use of NetApp AFF C800 running ONTAP 9.16.1 software version.

●   The following procedure includes the creation of a dedicated IPspace for the OpenShift tenant, the creation of the relevant broadcast domains and VLANs, and the addition of the VLANs to the corresponding broadcast domains.

●   This procedure creates an SVM for the OpenShift tenant and creates/enables the required services (NFS, iSCSI, and so on) on the SVM.

●   The following procedure includes the creation of logical interfaces (LIFs) for storage access.

Note:      The ONTAP Ansible playbook also provides ONTAP S3 configuration for the OpenShift tenant SVM.

NetApp ONTAP Storage Ansible Configuration

Procedure 1.       Configure the NetApp ONTAP Storage for the OpenShift Tenant

Step 1.      Edit the following variable files to ensure proper NetApp ONTAP storage variables are entered:

●     FlexPod-IMM-OpenShift/group_vars/all.yml

●     FlexPod-IMM-OpenShift/group_vars/secrets.yml

●     FlexPod-IMM-OpenShift/group_vars/ontap

●     FlexPod-IMM-OpenShift/inventory

●     FlexPod-IMM-OpenShift/vars/ontap_main.yml

Step 2.      From FlexPod-IMM-OpenShift, run the Setup_ONTAP.yml Ansible playbook with the associated tag for this section:

ansible-playbook ./Setup_ONTAP.yml -i inventory -t ontap_config

Note:      Use the -vvv flag to see a detailed execution output log.
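For example, appending the flag to the command above:

ansible-playbook ./Setup_ONTAP.yml -i inventory -t ontap_config -vvv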

Cisco Intersight Managed Mode Configuration

This chapter contains the following:

●   Set up Cisco Intersight Resource Group

●   Set up Cisco Intersight Organization

●   Add Intersight IMM Pools and OpenShift VLANs

●   Add Intersight IMM Server Policies

●   Add Intersight IMM Server Profile Templates

●   Clone and Adjust Server Profile Templates

●   Derive Server Profiles

The Cisco Intersight platform is a management solution delivered as a service with embedded analytics for Cisco and third-party IT infrastructures. The Cisco Intersight Managed Mode (also referred to as Cisco IMM or Intersight Managed Mode) is an architecture that manages Cisco Unified Computing System (Cisco UCS) fabric interconnect–attached systems through a Redfish-based standard model. Cisco Intersight managed mode standardizes both policy and operation management for Cisco UCS C-Series M7 and Cisco UCS X210c M7 compute nodes used in this deployment guide.

Cisco UCS B-Series M6 servers, connected and managed through Cisco UCS FIs, are also supported by IMM. For a complete list of supported platforms, go to: https://www.cisco.com/c/en/us/td/docs/unified_computing/Intersight/b_Intersight_Managed_Mode_Configuration_Guide/b_intersight_managed_mode_guide_chapter_01010.html

Procedure 1.       Set up Cisco Intersight Resource Group

In this procedure, a Cisco Intersight resource group for the Red Hat OpenShift tenant is created where resources will be logically grouped. In FlexPod Base, a Resource Group for the entire FlexPod was set up; in our lab example, this was AA02-rg. In this deployment, a tenant resource group (AA02-OCP-rg) is created to host all the tenant resources, but you can choose to create multiple resource groups for granular control of the resources.

Step 1.      Log into Cisco Intersight.

Step 2.      Select System.

Step 3.      Click Resource Groups on the left.

Step 4.      Click + Create Resource Group in the top-right corner.

Step 5.      Provide a name for the Resource Group (for example, AA02-OCP-rg).

Step 6.      Under Resources, select Custom.

Step 7.      Select all resources that are connected to this Red Hat OpenShift FlexPod tenant.

Note:      If more than one FlexPod tenant is sharing the FIs, a subset of the servers can be assigned to the Resource Group.


Step 8.      Click Create.

Procedure 2.       Set Up Cisco Intersight Organization

In this procedure, an Intersight organization for the Red Hat OpenShift tenant is created where all Cisco Intersight Managed Mode configurations including policies are defined. Just as with Resource Groups, an organization was created in FlexPod Base; in our lab validation, the FlexPod Base Organization was AA02, and the OpenShift Tenant Organization is AA02-OCP.

Step 1.      Log into the Cisco Intersight portal.

Step 2.      Select System.

Step 3.      Click Organizations on the left.

Step 4.      Click + Create Organization in the top-right corner.

Step 5.      Provide a name for the organization (for example, AA02-OCP), optionally select Share Resources with Other Organizations, and click Next.

Step 6.      Select the Resource Group created in the last step (for example, AA02-OCP-rg) and click Next.

Step 7.      Click Create.


Procedure 3.       Add Intersight IMM Pools and OpenShift VLANs

This procedure adds the necessary Intersight IMM Pools and adds the OpenShift VLANs to the Fabric Interconnects.

Step 1.      Edit the following variable files to ensure proper Cisco Intersight IMM variables are entered:

●     FlexPod-IMM-OpenShift/group_vars/all.yml

●     FlexPod-IMM-OpenShift/group_vars/secrets.yml

●     FlexPod-IMM-OpenShift/group_vars/ucs.yml

●     FlexPod-IMM-OpenShift/SecretKey.txt

●     FlexPod-IMM-OpenShift/roles/UCS-IMM/create_pools/defaults/main.yml

Step 2.      From FlexPod-IMM-OpenShift, run the Setup_IMM_Pools.yml Ansible playbook.

ansible-playbook ./Setup_IMM_Pools.yml

Procedure 4.       Add Intersight IMM Server Policies

The Setup_IMM_Server_Policies.yml playbook is designed to be run more than once if you have servers with different CPU types (Intel or AMD) or server generations (M6, M7, or M8). The different settings will generate different BIOS policies for each type of machine. It is important to run the Setup_IMM_Server_Policies.yml and Setup_IMM_Server_Profile_Templates.yml playbooks in succession before changing the CPU type or server generation in the FlexPod-IMM-OpenShift/group_vars/ucs.yml file and running both playbooks again.

Step 1.      Edit the following variable files to ensure proper Cisco Intersight IMM variables are entered:

●     FlexPod-IMM-OpenShift/group_vars/all.yml

●     FlexPod-IMM-OpenShift/group_vars/secrets.yml

●     FlexPod-IMM-OpenShift/group_vars/ucs.yml

●     FlexPod-IMM-OpenShift/SecretKey.txt

●     FlexPod-IMM-OpenShift/roles/create_server_policies/defaults/main.yml

Step 2.      From FlexPod-IMM-OpenShift, run the Setup_IMM_Server_Policies.yml Ansible playbook:

ansible-playbook ./Setup_IMM_Server_Policies.yml

Procedure 5.       Add Intersight IMM Server Profile Templates

The Setup_IMM_Server_Profile_Templates.yml playbook is designed to be run immediately after the Setup_IMM_Server_Policies.yml playbook to create Server Profile Templates for a particular CPU type and server generation. Both a blade (X- or B-Series) and a rack (C-Series) server profile template will be created.

Step 1.      Edit the following variable files to ensure proper Cisco Intersight IMM variables are entered:

●     FlexPod-IMM-OpenShift/group_vars/all.yml

●     FlexPod-IMM-OpenShift/group_vars/secrets.yml

●     FlexPod-IMM-OpenShift/group_vars/ucs.yml

●     FlexPod-IMM-OpenShift/SecretKey.txt

●     FlexPod-IMM-OpenShift/roles/create_server_profile_template/defaults/main.yml

Step 2.      From FlexPod-IMM-OpenShift, run the Setup_IMM_Server_Profile_Templates.yml Ansible playbook:

ansible-playbook ./Setup_IMM_Server_Profile_Templates.yml

Step 3.      If you have additional servers with different CPU types or different generations, go back to Add Intersight IMM Server Policies and run the two playbooks again.

Procedure 6.       Clone and Adjust Server Profile Templates

The server profile templates created above assume that each server has two M.2 cards and an M.2 RAID controller. If you have any servers with just one M.2 card, you can clone a template created by the Ansible playbooks and adjust it for one M.2 card.

In this example, one Cisco UCS X210c M7 has only one M.2 card, and that machine is used as a Worker node.

Step 1.      In Cisco Intersight under Configure > Templates > UCS Server Profile Templates, click the ellipsis (…) to the right of the <prefix>-Worker-Intel-M7-Blade-SPT template and select Clone.

Step 2.      Make sure the correct Destination Organization is selected and click Next.

Step 3.      Adjust the Clone Name (for example, <prefix>-Worker-Intel-M7-Blade-1M.2-SPT) and Description as needed and click Clone.

Step 4.      From the Templates window, click the ellipsis (…) to the right of the newly created clone and click Edit.

Step 5.      Click Next until you get to Storage Configuration. Place the mouse over the M.2-RAID-Storage-Policy and click the X to delete the M.2-RAID-Storage-Policy.

Step 6.      Click Next and Close to save this template.

Complete the Cisco UCS IMM Setup

Procedure 1.       Derive Server Profiles

Step 1.      From the Configure > Templates page, to the right of the OCP-Control-Plane template, click the ellipsis (…) and select Derive Profiles.

Note:      If using combined control-plane and worker nodes, use the OCP-Worker template for all nodes.

Step 2.      Under the Server Assignment, select Assign Now and select the three Cisco UCS X210c M7 servers that will be used as OpenShift Control-Plane Nodes.


Step 3.      Click Next.

Step 4.      For the Profile Name Prefix, enter the first part of the OpenShift Control-Plane Node hostnames (for example, control). Set Start Index for Suffix to 0 (zero). The three server Names should now correspond to the OpenShift Control-Plane Node hostnames.


Step 5.      Click Next.

Step 6.      Click Derive to derive the OpenShift Control-Plane Node Server Profiles.

Step 7.      Select Profiles on the left and then select the UCS Server Profiles tab.

Step 8.      Select the three OpenShift Control-Plane Node profiles and then click the ellipsis (…) at the top or bottom of the list and select Deploy.

Step 9.      Select Reboot Immediately to Activate and click Deploy.

Step 10.  Repeat this process to create three OpenShift Worker Node Server Profiles using the OCP-Worker-Template.

OpenShift Installation and Configuration

This chapter contains the following:

●     OpenShift – Installation Requirements

●     Prerequisites

●     Network Requirements

●     Deploy NetApp Trident

●     NetApp DataOps Toolkit

●     Add an Additional Administrative User to the OpenShift Cluster

●     Back up Cluster etcd

●     Add a Worker Node to an OpenShift Cluster

●     Deploy a Sample Containerized Application

OpenShift 4.17 is deployed on the Cisco UCS infrastructure as M.2 booted bare metal servers. The Cisco UCS X210C M7 servers need to be equipped with an M.2 controller (SATA or NVMe) card and either 1 or 2 identical M.2 drives. Three control-plane nodes and three worker nodes are deployed in the validation environment and additional worker nodes can easily be added to increase the scalability of the solution. This document will guide you through the process of using the Assisted Installer to deploy OpenShift 4.17.

OpenShift – Installation Requirements

The Red Hat OpenShift Assisted Installer provides support for installing OpenShift on bare metal nodes. This guide provides a methodology to achieving a successful installation using the Assisted Installer.

Prerequisites

The FlexPod for OpenShift utilizes the Assisted Installer for OpenShift installation; therefore, when provisioning and managing the FlexPod infrastructure, you must provide all the supporting cluster infrastructure and resources, including an installer VM or host, networking, storage, and individual cluster machines.

The following supporting cluster resources are required for the Assisted Installer installation:

●     The control plane and compute machines that make up the cluster

●     Cluster networking

●     Storage for the cluster infrastructure and applications

●     The Installer VM or Host

Network Requirements

The following infrastructure services need to be deployed to support the OpenShift cluster. During the validation of this solution, these services were provided as VMs on the hypervisor of choice; alternatively, you can use existing DNS and DHCP services available in the data center.

There are various infrastructure services prerequisites for deploying OpenShift 4.17. These prerequisites are as follows:

●     DNS and DHCP services – these services were configured on Microsoft Windows Server VMs in this validation

●     NTP Distribution was done with the Cisco Nexus switches

●     Specific DNS entries for deploying OpenShift – added to the DNS server

●     A Linux VM for initial automated installation and cluster management – a Rocky Linux 9 / RHEL 9 VM with appropriate packages

NTP

Each OpenShift node in the cluster must have access to at least two NTP servers.

NICs

vNICs configured on the Cisco UCS servers based on the design previously discussed.

DNS

Clients access the OpenShift cluster nodes over the bare metal network. Configure a subdomain or subzone where the canonical name extension is the cluster name.

The following domain and OpenShift cluster names are used in this deployment guide:

●     Base Domain: flexpodb4.cisco.com

●     OpenShift Cluster Name: ocp

The DNS domain name for the OpenShift cluster should be the cluster name followed by the base domain, for example, ocp.flexpodb4.cisco.com.

Table 4 lists the information for fully qualified domain names used during validation. The API and Nameserver addresses begin with canonical name extensions. The hostnames of the control plane and worker nodes are exemplary, so you can use any host naming convention you prefer. A sample zone-file snippet containing these records is shown after the table.

Table 4.      DNS FQDN Names Used

Usage | Hostname | IP Address
API | api.ocp.flexpodb4.cisco.com | 10.102.2.228
Ingress LB (apps) | *.apps.ocp.flexpodb4.cisco.com | 10.102.2.229
control0 | control0.ocp.flexpodb4.cisco.com | 10.102.2.211
control1 | control1.ocp.flexpodb4.cisco.com | 10.102.2.212
control2 | control2.ocp.flexpodb4.cisco.com | 10.102.2.213
worker0 | worker0.ocp.flexpodb4.cisco.com | 10.102.2.214
worker1 | worker1.ocp.flexpodb4.cisco.com | 10.102.2.215
worker2 | worker2.ocp.flexpodb4.cisco.com | 10.102.2.216
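As an illustration only, the following is a hedged sketch of a BIND-style zone-file snippet for ocp.flexpodb4.cisco.com using the addresses from Table 4; your DNS server (for example, Microsoft Windows Server DNS) and record syntax may differ:

; records in the ocp.flexpodb4.cisco.com zone
api       IN  A  10.102.2.228
*.apps    IN  A  10.102.2.229
control0  IN  A  10.102.2.211
control1  IN  A  10.102.2.212
control2  IN  A  10.102.2.213
worker0   IN  A  10.102.2.214
worker1   IN  A  10.102.2.215
worker2   IN  A  10.102.2.216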

DHCP

For the bare metal network, a network administrator must reserve several IP addresses, including:

●     One IP address for the API endpoint

●     One IP address for the wildcard Ingress endpoint

●     One IP address for each control-plane node (DHCP server assigns to the node)

●     One IP address for each worker node (DHCP server assigns to the node)

Note:      Obtain the MAC addresses of the bare metal Interfaces from the UCS Server Profile for each node to be used in the DHCP configuration to assign reserved IP addresses (reservations) to the nodes. The KVM IP address also needs to be gathered for the control-plane and worker nodes from the server profiles.

Procedure 1.       Gather MAC Addresses of Node Bare Metal Interfaces

Step 1.      Log into Cisco Intersight.

Step 2.      Select Configure > Profiles > Server Profile (for example, ocp-worker2).

Step 3.      In the center pane, select Inventory > Network Adapters > Network Adapter (for example, UCSX-ML-V5D200G).

Step 4.      In the center pane, select Interfaces.

Step 5.      Record the MAC address for NIC Interface eno5.

Step 6.      Select the General tab and select Identifiers in the center pane.

Step 7.      Record the Management IP assigned out of the OCP-BareMetal-IP-Pool.

Table 5 lists the IP addresses used for the OpenShift cluster including bare metal network IPs and UCS KVM Management IPs for IPMI or Redfish access.

Table 5.      Host BMC Information

Hostname | IP Address | UCS KVM Mgmt. IP Address | BareMetal MAC Address (eno5)
control0.ocp.flexpodb4.cisco.com | 10.102.2.211 | 10.102.2.241 | 00:25:B5:A2:0A:80
control1.ocp.flexpodb4.cisco.com | 10.102.2.212 | 10.102.2.240 | 00:25:B5:A2:0A:81
control2.ocp.flexpodb4.cisco.com | 10.102.2.213 | 10.102.2.239 | 00:25:B5:A2:0A:82
worker0.ocp.flexpodb4.cisco.com | 10.102.2.214 | 10.102.2.243 | 00:25:B5:A2:0A:83
worker1.ocp.flexpodb4.cisco.com | 10.102.2.215 | 10.102.2.244 | 00:25:B5:A2:0A:85
worker2.ocp.flexpodb4.cisco.com | 10.102.2.216 | 10.102.2.242 | 00:25:B5:A2:0A:87

Step 8.      From Table 5, enter the hostnames, IP addresses, and MAC addresses as reservations in your DHCP and DNS server(s) or configure the DHCP server to dynamically update DNS.
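For example, a hedged sketch of an ISC DHCP (dhcpd.conf) reservation for the first control-plane node, using the values from Table 5; Windows Server DHCP reservations achieve the same result through the management console:

host control0 {
  hardware ethernet 00:25:B5:A2:0A:80;
  fixed-address 10.102.2.211;
  option host-name "control0.ocp.flexpodb4.cisco.com";
}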

Step 9.      You will also need to connect VLAN interfaces for up to all six storage VLANs to your DHCP server(s) and assign IPs in the storage networks on those interfaces. Then create a DHCP scope for each storage VLAN and subnet, where the IPs assigned by the scope do not overlap with storage LIF IPs. Either enter the nodes in the DNS server or configure the DHCP server to forward entries to the DNS server. For the cluster nodes, create reservations to map the hostnames to the desired IP addresses.

Step 10.  Set up either a VM or a spare server as an OCP-Installer machine with the network interface connected to the Bare Metal VLAN, install either Red Hat Enterprise Linux (RHEL) 9.5 or Rocky Linux 9.5 “Server with GUI,” and create an administrator user. Once the VM or host is up and running, update it and install and configure XRDP. Also, install Google Chrome onto this machine. Connect to this host with a Windows Remote Desktop client as the admin user.
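A hedged sketch of the post-install steps on a Rocky Linux 9 installer machine (package and repository names are assumptions; on RHEL 9, enable the EPEL repository by its documented method before installing xrdp):

sudo dnf -y update
sudo dnf -y install epel-release        # EPEL provides the xrdp package
sudo dnf -y install xrdp
sudo systemctl enable --now xrdp
sudo firewall-cmd --permanent --add-port=3389/tcp
sudo firewall-cmd --reload
sudo dnf -y install https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm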

Procedure 2.       Install Red Hat OpenShift using the Assisted Installer

Use the following steps to install OpenShift from the OCP-Installer VM.

Step 1.      From the Installer desktop, open a terminal session and create an SSH key pair to use to communicate with the OpenShift hosts:

ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519

Step 2.      Copy the public SSH key to the user directory:

cp ~/.ssh/id_ed25519.pub ~/

Step 3.      Add the private key to the ssh-agent:

eval "$(ssh-agent)"                              

ssh-add ~/.ssh/id_ed25519

Step 4.      Launch Chrome and connect to https://console.redhat.com/openshift/cluster-list. Log into your Red Hat account.

Step 5.      Click Create cluster to create an OpenShift cluster.

Step 6.      Select Datacenter and then select Bare Metal (x86_64).

Step 7.      Select Interactive to launch the Assisted Installer.

Step 8.      Provide the cluster name and base domain. Select the latest OpenShift 4.17 version. Scroll down and click Next.


Step 9.      It is not necessary to install any Operators at this time; they can be added later. Click Next.

Step 10.  Click Add hosts.

Step 11.  Under Provisioning type, from the drop-down list select the Minimal image file. Under SSH public key, click Browse and browse to, select, and open the id_ed25519.pub file. The contents of the public key should now appear in the box. Click Generate Discovery ISO.


Step 12.  If your Cisco UCS Servers have the Intersight Advantage license installed, click Add hosts from Cisco Intersight. If you do not have the Advantage license or you do not wish to use the Cisco Intersight Integration, skip to Step 15.


Step 13.  A Cisco Intersight tab will appear in Chrome. Log into Intersight and select the appropriate account. Select the appropriate Organization (AA02-OCP). Click the pencil icon to select the servers for the OpenShift installation. In the list on the right, select the servers to install OpenShift onto and click Save. In the lower right-hand corner, click Execute. The Workflow will mount the Discovery ISO from the Red Hat Cloud and reboot the servers into the Discovery ISO.


Step 14.  Back in the Red Hat Hybrid Cloud Console, click Close to close the Add hosts popup. Skip to Step 22 below.

Step 15.  Click Download Discovery ISO to download the Discovery ISO into the Downloads directory. Click Close when the download is done.

Step 16.  Copy the Discovery ISO to an http server. Use a web browser to get a copy of the URL for the Discovery ISO.

Step 17.  Use Chrome to connect to Cisco Intersight and log into the Intersight account previously set up.

Step 18.  Go to Configure > Policies and edit the Virtual Media policy attached to your OpenShift server profiles. Once on the Policy Details page, click Add Virtual Media.

Step 19.  In the Add Virtual Media dialogue, leave CDD selected and select HTTP/HTTPS. Provide a name for the mount and add the URL for File Location.


Step 20.  Click Add. Click Save & Deploy then click Save & Proceed. It is not necessary to reboot the hosts to add the vMedia mount. Click Deploy. Wait for each of the six servers to complete deploying the profile.

Step 21.  Go to Configure > Profiles > UCS Server Profiles. Once all six server profiles have a status of OK, click the to the right of each profile and select Server Actions > Power > Power Cycle then Power Cycle to reboot each of the six servers. If the M.2 drives or virtual drives are blank, the servers should boot from the Discovery ISO. This can be monitored with the vKVM if desired.

Step 22.  Once all six servers have booted “RHEL CoreOS (Live)” from the Discovery ISO, they will appear in the Assisted Installer under Host discovery. Use the drop-down lists under Role to assign the appropriate server roles. Scroll down and click Next.

Note:      If using combined control-plane and worker nodes, enable Run workloads on control plane nodes. When the “Control plane node” role is selected, it will also include the “Worker” role.


Step 23.  Expand each node and verify that CoreOS and OpenShift are being installed to sda (the M.2 device). Click Next.

Step 24.  Under Network Management, make sure Cluster-Managed Networking is selected. Under Machine network, from the drop-down list select the subnet for the BareMetal VLAN. Enter the API IP for the api.cluster.basedomain entry in the DNS servers. For the Ingress IP, enter the IP for the *.apps.cluster.basedomain entry in the DNS servers.


Step 25.  Scroll down. All nodes should have a status of Ready. Click Next.


Step 26.  Review the information and click Install cluster to begin the cluster installation.


Step 27.  On the Installation progress page, expand the Host inventory. The installation will take 30-45 minutes. When installation is complete, all nodes will show a Status of Installed.


Step 28.  Select Download kubeconfig to download the kubeconfig file. In a terminal window, setup a cluster directory and save credentials:

cd
mkdir <clustername> # for example, ocp
cd <clustername>
mkdir auth
cd auth
mv ~/Downloads/kubeconfig ./
mkdir ~/.kube
cp kubeconfig ~/.kube/config

Step 29.  In the Assisted Installer, click the icon to copy the kubeadmin password:

echo <paste password> > ./kubeadmin-password

Step 30.  In a new tab in Chrome, connect to https://access.redhat.com/downloads/content/290. Download the OpenShift Linux Client for the version of OpenShift that you installed:

cd ..
mkdir client
cd client
ls ~/Downloads
mv ~/Downloads/oc-x.xx.x-linux.tar.gz ./
tar xvf oc-x.xx.x-linux.tar.gz
ls
sudo mv oc /usr/local/bin/
sudo mv kubectl /usr/local/bin/
oc get nodes

Step 31.  To enable oc tab completion for bash, run the following:

oc completion bash > oc_bash_completion
sudo mv oc_bash_completion /etc/bash_completion.d/

Step 32.  If you used the Cisco UCS Integration in the OpenShift installation process, connect to Cisco Intersight and from Configure > Profiles > UCS Server Profiles, select all OpenShift Server Profiles. Click the ellipsis (…) at either the top or bottom of the column and select Deploy. It is not necessary to reboot the servers; leave Reboot Immediately to Activate unselected and click Deploy.

Step 33.  If you did not use the Cisco UCS Integration in the OpenShift installation process, in Cisco Intersight, edit the Virtual Media policy and remove the link to the Discovery ISO. Click Save & Deploy and then click Save & Proceed. Do not select “Reboot Immediately to Activate.” Click Deploy. The virtual media mount will be removed from the servers without rebooting them.

Step 34.  In Chrome or Firefox, in the Assisted Installer page, click Launch OpenShift Console to launch the OpenShift Console. Use kubeadmin and the kubeadmin password to login. On the left, go to Compute > Nodes to see the status of the OpenShift nodes.


Step 35.  In the Red Hat OpenShift console, go to Compute > Bare Metal Hosts. For each Bare Metal Host, click the ellipses to the right of the host and select Edit Bare Metal Host. Select Enable power management. Using Table 5, fill in the BMC Address. For an IPMI connection to the server, use the BMC IP Address. For a redfish connection to the server, use redfish://<BMC IP>/redfish/v1/Systems/<server Serial Number> and make sure to check Disable Certificate Verification. Also, make sure the Boot MAC Address matches the MAC address in Table 5. For the BMC Username and BMC Password, use what was entered into the Cisco Intersight Local User policy. Click Save to save the changes. Repeat this step for all Bare Metal Hosts.

Note:      If using redfish to connect to the server, it is critical to check the box Disable Certificate Verification.

Step 36.  Go to Compute > Bare Metal Hosts. Once all hosts have been configured, the Status should show “Externally provisioned,” and the Management Address should be populated. You can now manage power on the OpenShift hosts from the OpenShift console.


Step 37.  Enable dynamic resource allocation for kubelet. On your installer VM, create a directory for resource allocation, place the following YAML files in it, and run the commands to create the configuration:

cat worker-kubeletconfig.yaml

apiVersion: machineconfiguration.openshift.io/v1

kind: KubeletConfig

metadata:

  name: dynamic-node

spec:

  autoSizingReserved: true

  machineConfigPoolSelector:

    matchLabels:

      pools.operator.machineconfiguration.openshift.io/worker: ""

cat control-plane-kubeletconfig.yaml

apiVersion: machineconfiguration.openshift.io/v1

kind: KubeletConfig

metadata:

  name: dynamic-node-control-plane

spec:

  autoSizingReserved: true

  machineConfigPoolSelector:

    matchLabels:

      pools.operator.machineconfiguration.openshift.io/master: ""

oc create -f worker-kubeletconfig.yaml
oc create -f control-plane-kubeletconfig.yaml

Step 38.  To set up NTP on the worker and control-plane nodes, and NVMe-TCP on the worker nodes, run the following:

cd

cd <cluster-name> # For example, ocp

mkdir machine-configs

cd machine-configs

curl https://mirror.openshift.com/pub/openshift-v4/clients/butane/latest/butane --output butane

chmod +x butane

Step 39.  Build the following files in the machine-configs directory with variations for your network:

cat 99-control-plane-chrony-conf-override.bu

variant: openshift

version: 4.17.0

metadata:

  name: 99-control-plane-chrony-conf-override

  labels:

    machineconfiguration.openshift.io/role: master

storage:

  files:

    - path: /etc/chrony.conf

      mode: 0644

      overwrite: true

      contents:

        inline: |

          driftfile /var/lib/chrony/drift

          makestep 1.0 3

          rtcsync

          logdir /var/log/chrony

          server 10.102.2.3 iburst

          server 10.102.2.4 iburst

 

cat 99-worker-chrony-conf-override.bu

variant: openshift

version: 4.17.0

metadata:

  name: 99-worker-chrony-conf-override

  labels:

    machineconfiguration.openshift.io/role: worker

storage:

  files:

    - path: /etc/chrony.conf

      mode: 0644

      overwrite: true

      contents:

        inline: |

          driftfile /var/lib/chrony/drift

          makestep 1.0 3

          rtcsync

          logdir /var/log/chrony

          server 10.102.2.3 iburst

          server 10.102.2.4 iburst

cat 99-worker-nvme-discovery.bu

variant: openshift

version: 4.17.0

metadata:

  name: 99-worker-nvme-discovery

  labels:

    machineconfiguration.openshift.io/role: worker

openshift:

  kernel_arguments:

    - loglevel=7

storage:

  files:

    - path: /etc/nvme/discovery.conf

      mode: 0644

      overwrite: true

      contents:

        inline: |

          --transport=tcp --traddr=192.168.32.51 --trsvcid=8009
          --transport=tcp --traddr=192.168.32.52 --trsvcid=8009
          --transport=tcp --traddr=192.168.42.51 --trsvcid=8009

          --transport=tcp --traddr=192.168.42.52 --trsvcid=8009

Step 40.  Create .yaml files from the butane files with butane, then load the configurations into OpenShift:

./butane 99-control-plane-chrony-conf-override.bu -o ./99-control-plane-chrony-conf-override.yaml
./butane 99-worker-chrony-conf-override.bu -o ./99-worker-chrony-conf-override.yaml
./butane 99-worker-nvme-discovery.bu -o ./99-worker-nvme-discovery.yaml

oc create -f 99-control-plane-chrony-conf-override.yaml

oc create -f 99-worker-chrony-conf-override.yaml

oc create -f 99-worker-nvme-discovery.yaml

Note:      If using combined control-plane and worker nodes, 99-control-plane-nvme-discovery.bu and 99-control-plane-nvme-discovery.yaml files will need to be created and loaded into OpenShift.
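A minimal sketch of those files, assuming the same NVMe-TCP discovery targets as the worker version above; only the metadata name and the role label change:

cat 99-control-plane-nvme-discovery.bu

variant: openshift
version: 4.17.0
metadata:
  name: 99-control-plane-nvme-discovery
  labels:
    machineconfiguration.openshift.io/role: master
openshift:
  kernel_arguments:
    - loglevel=7
storage:
  files:
    - path: /etc/nvme/discovery.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          --transport=tcp --traddr=192.168.32.51 --trsvcid=8009
          --transport=tcp --traddr=192.168.32.52 --trsvcid=8009
          --transport=tcp --traddr=192.168.42.51 --trsvcid=8009
          --transport=tcp --traddr=192.168.42.52 --trsvcid=8009

./butane 99-control-plane-nvme-discovery.bu -o ./99-control-plane-nvme-discovery.yaml
oc create -f 99-control-plane-nvme-discovery.yaml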

Step 41.  To enable iSCSI and multipathing on the workers, create the 99-worker-ontap-iscsi.yaml and upload as a machine config:

cat 99-worker-ontap-iscsi.yaml

apiVersion: machineconfiguration.openshift.io/v1

kind: MachineConfig

metadata:

  name: 99-worker-ontap-iscsi

  labels:

    machineconfiguration.openshift.io/role: worker

spec:

  config:

    ignition:

      version: 3.2.0

    storage:

      files:

      - contents:

          source: data:text/plain;charset=utf-8;base64,IyBkZXZpY2UtbWFwcGVyLW11bHRpcGF0aCBjb25maWd1cmF0aW9uIGZpbGUKCiMgRm9yIGEgY29tcGxldGUgbGlzdCBvZiB0aGUgZGVmYXVsdCBjb25maWd1cmF0aW9uIHZhbHVlcywgcnVuIGVpdGhlcjoKIyAjIG11bHRpcGF0aCAtdAojIG9yCiMgIyBtdWx0aXBhdGhkIHNob3cgY29uZmlnCgojIEZvciBhIGxpc3Qgb2YgY29uZmlndXJhdGlvbiBvcHRpb25zIHdpdGggZGVzY3JpcHRpb25zLCBzZWUgdGhlCiMgbXVsdGlwYXRoLmNvbmYgbWFuIHBhZ2UuCgpkZWZhdWx0cyB7Cgl1c2VyX2ZyaWVuZGx5X25hbWVzIHllcwoJZmluZF9tdWx0aXBhdGhzIG5vCn0KCmJsYWNrbGlzdCB7Cn0K

          verification: {}

        filesystem: root

        mode: 600

        overwrite: true

        path: /etc/multipath.conf

    systemd:

      units:

        - name: iscsid.service

          enabled: true

          state: started

        - name: multipathd.service

          enabled: true

          state: started

  osImageURL: ""

oc create -f 99-worker-ontap-iscsi.yaml

Note:      If using combined control-plane and worker nodes, the 99-control-plane-ontap-iscsi.yaml file will need to be created and loaded into OpenShift.

Note:      The Base64-encoded source above is the following file (/etc/multipath.conf) encoded. It is necessary to set “find_multipaths” to no.

cat multipath.conf

# device-mapper-multipath configuration file

 

# For a complete list of the default configuration values, run either:

# # multipath -t

# or

# # multipathd show config

 

# For a list of configuration options with descriptions, see the

# multipath.conf man page.

 

defaults {

        user_friendly_names yes

        find_multipaths no

}

 

blacklist {

}
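If you customize this file, the Base64 string in the MachineConfig source URL can be regenerated with coreutils (a simple sketch):

base64 -w0 multipath.conf
# paste the output after "data:text/plain;charset=utf-8;base64," in 99-worker-ontap-iscsi.yaml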

Step 42.  Over the next 20-30 minutes each of the nodes will go through the “Not Ready” state and reboot. You can monitor this by going to Compute > MachineConfigPools in the OpenShift Console. Wait until both pools have an Update status of “Up to date.”


Step 43.  The Kubernetes NMState Operator will be used to configure the storage networking interfaces on the workers (and also virtual machine connected interfaces if OpenShift Virtualization is installed). In the OpenShift Console, go to Operators > OperatorHub. In the search box, enter NMState and Kubernetes NMState Operator should appear. Click Kubernetes NMState Operator.


Step 44.  Click Install. Leave all the defaults in place and click Install again. The operator will take a few minutes to install.

Step 45.  Once the operator is installed, click View Operator.

Step 46.  Select the NMState tab. On the right, click Create NMState. Leave all defaults in place and click Create. The nmstate will be created. You will also need to refresh the console because additional items will be added under Networking.


Step 47.  In an NMState directory on the ocp-installer machine, create the following YAML files:

cat eno6.yaml

apiVersion: nmstate.io/v1

kind: NodeNetworkConfigurationPolicy

metadata:

  name: ocp-iscsi-a-policy

spec:

  nodeSelector:

    node-role.kubernetes.io/worker: ''

  desiredState:

    interfaces:

    - name: eno6

      description: Configuring eno6 on workers

      type: ethernet

      state: up

      ipv4:

        dhcp: true

        enabled: true

      ipv6:

        enabled: false

cat eno7.yaml

apiVersion: nmstate.io/v1

kind: NodeNetworkConfigurationPolicy

metadata:

  name: ocp-iscsi-b-policy

spec:

  nodeSelector:

    node-role.kubernetes.io/worker: ''

  desiredState:

    interfaces:

    - name: eno7

      description: Configuring eno7 on workers

      type: ethernet

      state: up

      ipv4:

        dhcp: true

        enabled: true

      ipv6:

        enabled: false

cat eno8.yaml   # If configuring NFS

apiVersion: nmstate.io/v1

kind: NodeNetworkConfigurationPolicy

metadata:

  name: ocp-nfs-policy

spec:

  nodeSelector:

    node-role.kubernetes.io/worker: ''

  desiredState:

    interfaces:

    - name: eno8

      description: Configuring eno8 on workers

      type: ethernet

      state: up

      ipv4:

        dhcp: true

        enabled: true

      ipv6:

        enabled: false

cat eno6.3032.yaml   # If configuring NVMe-TCP

apiVersion: nmstate.io/v1

kind: NodeNetworkConfigurationPolicy

metadata:

  name: ocp-nvme-tcp-a-policy

spec:

  nodeSelector:

     node-role.kubernetes.io/worker: ''

  desiredState:

    interfaces:

    - name: eno6.3032

      description: VLAN 3032 using eno6

      type: vlan

      state: up
      ipv4:

        dhcp: true

        enabled: true

      ipv6:

        enabled: false

      vlan:

        base-iface: eno6

        id: 3032

cat eno7.3042.yaml  # If configuring NVMe-TCP

apiVersion: nmstate.io/v1

kind: NodeNetworkConfigurationPolicy

metadata:

  name: ocp-nvme-tcp-b-policy

spec:

  nodeSelector:

     node-role.kubernetes.io/worker: ''

  desiredState:

    interfaces:

    - name: eno7.3042

      description: VLAN 3042 using eno7

      type: vlan

      state: up
      ipv4:

        dhcp: true

        enabled: true

      ipv6:

        enabled: false

      vlan:

        base-iface: eno7

        id: 3042

Step 48.  Add the Node Network Configuration Policies to the OpenShift cluster:

oc create -f eno6.yaml
oc create -f eno7.yaml
oc create -f eno8.yaml # If configuring NFS
oc create -f eno6.3032.yaml # If configuring NVMe-TCP
oc create -f eno7.3042.yaml # If configuring NVMe-TCP

Step 49.  The policies should appear under Networking > NodeNetworkConfigurationPolicy.


Note:      If using combined control-plane and worker nodes, since all nodes have the worker role, the node selector will apply these policies to all nodes.

Step 50.  Using ssh core@<node IP>, connect to each of the worker nodes and use the ifconfig -a and chronyc sources commands to verify the correct network and NTP setup of the servers.
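For example (a hedged sketch; worker0's IP address is taken from Table 5, and ip -br addr can be used in place of ifconfig if the latter is not present on the node):

ssh core@10.102.2.214
ip -br addr        # confirm eno6/eno7/eno8 and the eno6.3032/eno7.3042 VLAN interfaces received addresses
chronyc sources    # confirm the NTP servers from the chrony MachineConfigs are reachable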

Procedure 3.       Install the NVIDIA GPU Operator (optional)

If you have GPUs installed in your Cisco UCS servers, you need to install the Node Feature Discovery (NFD) Operator to detect NVIDIA GPUs and the NVIDIA GPU Operator to make these GPUs available to containers and virtual machines.

Step 1.      In the OpenShift web console, click Operators > OperatorHub.

Step 2.      Type Node Feature in the Filter box and then click the Node Feature Discovery Operator that shows Red Hat in the upper right corner of the tile. Click Install.

Step 3.      Do not change any settings and click Install.

Step 4.      When the Install operator is ready for use, click View Operator.

Step 5.      In the bar to the right of Details, click NodeFeatureDiscovery.

Step 6.      Click Create NodeFeatureDiscovery.

Step 7.      Click Create.

Step 8.      When the nfd-instance has a status of Available, Upgradeable, select Compute > Nodes.

Step 9.      Select a node that has one or more GPUs and then select Details.

Step 10.  The following label should be present on the host:

Related image, diagram or screenshot

Note:      This label should appear on all nodes with GPUs.
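The same check can be made from the CLI (a minimal sketch, assuming default NFD labeling; 10de is the NVIDIA PCI vendor ID):

oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true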

Step 11.  Return to Operators > OperatorHub.

Step 12.  Type NVIDIA in the Filter box and then click on the NVIDIA GPU Operator. Click Install.

Step 13.  Do not change any settings and click Install.

Step 14.  When the Install operator is ready for use, click View Operator.

Step 15.  In the bar to the right of Details, click ClusterPolicy.

Step 16.  Click Create ClusterPolicy.

Step 17.  Do not change any settings and scroll down and click Create. This will install the latest GPU driver.

Step 18.  Wait for the gpu-cluster-policy Status to become Ready.

Step 19.  Connect to a terminal window on the OCP-Installer machine. Type the following commands. The output shown is for two servers that are equipped with GPUs:

oc project nvidia-gpu-operator
Now using project "nvidia-gpu-operator" on server "https://api.ocp.flexpodb4.cisco.com:6443".

oc get pods
NAME                                                  READY   STATUS      RESTARTS        AGE
gpu-feature-discovery-jmlbr                           1/1     Running     0               6m45s
gpu-feature-discovery-l2l6n                           1/1     Running     0               6m41s
gpu-operator-6656d9fbf-wkkfm                          1/1     Running     0               11m
nvidia-container-toolkit-daemonset-gb8d9              1/1     Running     0               6m45s
nvidia-container-toolkit-daemonset-t4xdf              1/1     Running     0               6m41s
nvidia-cuda-validator-lc8zr                           0/1     Completed   0               4m33s
nvidia-cuda-validator-zxvnx                           0/1     Completed   0               4m39s
nvidia-dcgm-exporter-k6tnp                            1/1     Running     2 (4m7s ago)    6m41s
nvidia-dcgm-exporter-vb66w                            1/1     Running     2 (4m20s ago)   6m45s
nvidia-dcgm-hfgz2                                     1/1     Running     0               6m45s
nvidia-dcgm-qwm46                                     1/1     Running     0               6m41s
nvidia-device-plugin-daemonset-nr6m7                  1/1     Running     0               6m41s
nvidia-device-plugin-daemonset-rpvwr                  1/1     Running     0               6m45s
nvidia-driver-daemonset-416.94.202407231922-0-88zcr   2/2     Running     0               7m42s
nvidia-driver-daemonset-416.94.202407231922-0-bvph6   2/2     Running     0               7m42s
nvidia-node-status-exporter-bz79d                     1/1     Running     0               7m41s
nvidia-node-status-exporter-jgjbd                     1/1     Running     0               7m41s
nvidia-operator-validator-8fxqr                       1/1     Running     0               6m41s
nvidia-operator-validator-tbqtc                       1/1     Running     0               6m45s

Step 20.  Connect to one of the nvidia-driver-daemonset containers and view the GPU status:

oc exec -it nvidia-driver-daemonset-416.94.202407231922-0-88zcr -- bash
[root@nvidia-driver-daemonset-417 drivers]# nvidia-smi
Thu Mar  6 13:13:33 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40S                    On  |   00000000:38:00.0 Off |                    0 |
| N/A   28C    P8             34W /  350W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L40S                    On  |   00000000:D8:00.0 Off |                    0 |
| N/A   27C    P8             35W /  350W |       1MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
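With the GPU Operator in place, a container can request a GPU through the nvidia.com/gpu extended resource. The following is a minimal sketch of a test pod; the pod name and container image tag are illustrative only:

cat gpu-test.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
  namespace: default
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubi9   # illustrative image tag
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1

oc create -f gpu-test.yaml
oc logs -f gpu-test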

Procedure 4.       Enable the GPU Monitoring Dashboard (Optional)

Step 1.      Using https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/enable-gpu-monitoring-dashboard.html, enable the GPU Monitoring Dashboard to monitor GPUs in the OpenShift Web-Console.

Deploy NetApp Trident

NetApp Trident is an open-source, fully supported storage orchestrator for containers and Kubernetes distributions. It is designed to meet the persistence demands of containerized applications using industry-standard interfaces, such as the Container Storage Interface (CSI). With Trident, microservices and containerized applications can take advantage of enterprise-class storage services provided by the NetApp portfolio of storage systems. More information about Trident can be found here: NetApp Trident Documentation. NetApp Trident can be installed using different methods. In this solution, NetApp Trident version 25.2.1 is installed using the Trident Operator (installed from OperatorHub).

Trident Operator is a component used to manage the lifecycle of Trident. The operator simplifies the deployment, configuration, and management of Trident. The Trident operator is supported with OpenShift version 4.10 and above.

Note:      In this solution, we validated NetApp Trident with the ontap-nas driver and ontap-nas-flexgroup driver using the NFS protocol. We also validated the ontap-san driver for iSCSI and NVMe-TCP. Make sure to install only the backends and storage classes for the storage protocols you are using.

Procedure 1.       Install the NetApp Trident Operator

In this implementation, NetApp Trident Operator version 25.2.1 or later is installed.

Step 1.      In the OpenShift web console, click Operators > OperatorHub.

Step 2.      Type Trident in the Filter box and then click the NetApp Trident operator. Click Continue to accept the warning about Community Operators. Click Install.

Step 3.      Verify that at least Version 25.2.1 is selected. Click Install.

Step 4.      Once the operator is installed and ready for use, click View Operator.

Step 5.      In the bar to the right of Details, click Trident Orchestrator.

Step 6.      Click Create TridentOrchestrator. Click Create. Wait for the Status to become Installed.

Related image, diagram or screenshot
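For reference, the console action in Step 6 creates a TridentOrchestrator custom resource similar to the following minimal sketch (the values reflect the defaults used in this deployment):

apiVersion: trident.netapp.io/v1
kind: TridentOrchestrator
metadata:
  name: trident
spec:
  debug: false
  namespace: trident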

Step 7.      On the installer VM, check the Trident OpenShift pods after installation:

oc get pods -n trident
NAME                                  READY   STATUS    RESTARTS   AGE
trident-controller-5df9c4b4b5-sdlft   6/6     Running   0          7m57s
trident-node-linux-7pjfj              2/2     Running   0          7m57s
trident-node-linux-j4k92              2/2     Running   0          7m57s
trident-node-linux-kzb6n              2/2     Running   0          7m57s
trident-node-linux-q7ndq              2/2     Running   0          7m57s
trident-node-linux-tl2z8              2/2     Running   0          7m57s
trident-node-linux-vtfr6              2/2     Running   0          7m57s

Procedure 2.       Obtain tridentctl

Step 1.      From the OpenShift directory, download Trident software from GitHub and untar the .gz file to obtain the trident-installer folder:

mkdir trident
cd trident

wget https://github.com/NetApp/trident/releases/download/v25.02.1/trident-installer-25.02.1.tar.gz
tar -xvf trident-installer-25.02.1.tar.gz

Step 2.      Copy tridentctl to a directory in the system PATH (in this example, /bin):

sudo cp trident-installer/tridentctl /bin/
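To confirm that the tridentctl client matches the Trident version running in the cluster, run a quick version check:

tridentctl -n trident version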

Note:      If the NetApp Trident deployment fails and does not bring up the pods to Running state, use the tridentctl logs -l all -n trident command for debugging.

Note:      Before configuring the backends that Trident needs to use for user apps, go to: https://docs.netapp.com/us-en/trident/trident-reference/objects.html#kubernetes-customresourcedefinition-objects to understand the storage environment parameters and its usage in Trident.

Procedure 3.       Configure the Storage Backends in Trident

Step 1.      Configure the connections to the SVM on the NetApp storage array created for the OpenShift installation. For more options regarding storage backend configuration, go to https://docs.netapp.com/us-en/trident/trident-use/backends.html.

Step 2.      Create a backends directory and create the following backend definition files in that directory. Each backend definition includes a volume name template (nameTemplate) parameter so that the ONTAP volume created for each persistent volume receives a name that includes the backend name, the namespace, and the persistent volume claim (PVC) name (RequestName).

Note:      Customizable volume names are compatible with ONTAP on-premises drivers only. Also, these volume names do not apply to existing volumes.

Note:      The following backend configuration files use the StoragePrefix attribute in the name template. The default value of StoragePrefix is “trident”.

cat backend_NFS.yaml
---
version: 1
storageDriverName: ontap-nas
backendName: ocp-nfs-backend
managementLIF: 10.102.2.50
dataLIF: 192.168.52.51
svm: OCP-SVM
username: vsadmin
password: <password>
useREST: true
defaults:
  spaceReserve: none
  exportPolicy: default
  snapshotPolicy: default
  snapshotReserve: '5'
  nameTemplate: "{{.config.StoragePrefix}}_{{.config.BackendName}}_{{.volume.Namespace}}_{{.volume.RequestName}}"

cat backend_NFS_flexgroup.yaml
---
version: 1
storageDriverName: ontap-nas-flexgroup
backendName: ocp-nfs-flexgroup
managementLIF: 10.102.2.50
dataLIF: 192.168.52.51
svm: OCP-SVM
username: vsadmin
password: <password>
useREST: true
defaults:
  spaceReserve: none
  exportPolicy: default
  snapshotPolicy: default
  snapshotReserve: '5'
  nameTemplate: "{{.config.StoragePrefix}}_{{.config.BackendName}}_{{.volume.Namespace}}_{{.volume.RequestName}}"

cat backend_iSCSI.yaml
---
version: 1
storageDriverName: ontap-san
backendName: ocp-iscsi-backend
managementLIF: 10.102.2.50
svm: OCP-SVM
sanType: iscsi
useREST: true
username: vsadmin
password: <password>
defaults:
  spaceReserve: none
  spaceAllocation: 'false'
  snapshotPolicy: default
  snapshotReserve: '5'
  nameTemplate: "{{.config.StoragePrefix}}_{{.config.BackendName}}_{{.volume.Namespace}}_{{.volume.RequestName}}"

cat backend_NVMe.yaml
---
version: 1
backendName: ocp-nvme-backend
storageDriverName: ontap-san
managementLIF: 10.102.2.50
svm: OCP-SVM
username: vsadmin
password: <password>
sanType: nvme
useREST: true
defaults:
  spaceReserve: none
  snapshotPolicy: default
  snapshotReserve: '5'
  nameTemplate: "{{.config.StoragePrefix}}_{{.config.BackendName}}_{{.volume.Namespace}}_{{.volume.RequestName}}"
 

Step 3.      Activate the storage backends for all storage protocols in your FlexPod:

tridentctl -n trident create backend -f backend_NFS.yaml
tridentctl -n trident create backend -f backend_NFS_flexgroup.yaml
tridentctl -n trident create backend -f backend_iSCSI.yaml
tridentctl -n trident create backend -f backend_NVMe.yaml
tridentctl -n trident get backend

+-------------------+---------------------+--------------------------------------+--------+------------+-----
|       NAME        |   STORAGE DRIVER    |                 UUID                 | STATE  | USER-STATE | VOLU
+-------------------+---------------------+--------------------------------------+--------+------------+-----
| ocp-nfs-backend   | ontap-nas           | 6bcb2421-a148-40bb-b7a4-9231e58efc2a | online | normal     |
| ocp-nfs-flexgroup | ontap-nas-flexgroup | 68428a01-c5e6-4676-8cb5-e5521fc04bc7 | online | normal     |
| ocp-iscsi-backend | ontap-san           | bbf1664d-1615-42d3-a5ed-1b8aed995a42 | online | normal     |
| ocp-nvme-backend  | ontap-san           | 2b6861a2-6980-449a-b718-97002079e7f3 | online | normal     |
+-------------------+---------------------+--------------------------------------+--------+------------+-----

Step 4.      Create the following Storage Class files:

cat storage-class-ontap-nfs.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ontap-nfs
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.trident.netapp.io
parameters:
  backendType: "ontap-nas"
  provisioningType: "thin"
  snapshots: "true"
allowVolumeExpansion: true

cat storage-class-ontap-nfs-flexgroup.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ontap-nfs-flexgroup
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: csi.trident.netapp.io
parameters:
  backendType: "ontap-nas-flexgroup"
  provisioningType: "thin"
  snapshots: "true"
allowVolumeExpansion: true

cat storage-class-ontap-iscsi.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ontap-iscsi
parameters:
  backendType: "ontap-san"
  sanType: "iscsi"
  provisioningType: "thin"
  snapshots: "true"
allowVolumeExpansion: true
provisioner: csi.trident.netapp.io

cat storage-class-ontap-nvme.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ontap-nvme-tcp
parameters:
  backendType: "ontap-san"
  sanType: "nvme"
  provisioningType: "thin"
  snapshots: "true"
allowVolumeExpansion: true
provisioner: csi.trident.netapp.io

Step 5.      Create the storage classes:

oc create -f storage-class-ontap-nfs.yaml
oc create -f storage-class-ontap-nfs-flexgroup.yaml
oc create -f storage-class-ontap-iscsi.yaml
oc create -f storage-class-ontap-nvme.yaml

Related image, diagram or screenshot

Step 6.      Create a VolumeSnapshotClass file:

cat ontap-volumesnapshot-class.yaml
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ontap-snapclass
driver: csi.trident.netapp.io
deletionPolicy: Delete

Step 7.      Create the VolumeSnapshotClass using the above file.

oc create -f ontap-volumesnapshot-class.yaml
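The storage classes and the volume snapshot class can be verified from the CLI:

oc get storageclass
oc get volumesnapshotclass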

Step 8.      Create a test PersistentVolumeClaim (PVC). In the OpenShift console, click Storage > PersistentVolumeClaims. Select an appropriate project (for example, default) or create a new project and select it. On the right, click Create PersistentVolumeClaim.

Step 9.      Select a StorageClass and give the PVC a name. Select an Access mode (RWO or RWX for NFS classes, and RWO for iSCSI or NVMe-TCP classes). Set a size and select a Volume mode (normally Filesystem). Click Create to create the PVC. For illustration, we created a test PVC using “ontap-nvme-tcp” storage class.

Related image, diagram or screenshot

Step 10.  Wait for the PVC to have a status of Bound. The PVC can now be attached to a container.

Related image, diagram or screenshot
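For reference, an equivalent test PVC can also be created from the CLI. The following is a minimal sketch; the PVC name, size, and storage class are illustrative:

cat test-pvc.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: ontap-nvme-tcp

oc create -f test-pvc.yaml
oc get pvc -n default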

Step 11.  Create a NetApp volume snapshot of the PVC by clicking the three dots to the right of the PVC and selecting Create snapshot. Adjust the snapshot name and click Create. The snapshot will appear under VolumeSnapshots and can also be seen in NetApp ONTAP System Manager under the corresponding PV with a modified name.

Related image, diagram or screenshot

Related image, diagram or screenshot

Note:      Make sure the volume name for the PV matches the volume name mapping from the backend configuration in the above screenshot.

Step 12.  Delete the test PVC and snapshot by first selecting the snapshot under Storage > VolumeSnapshots, clicking the three dots to the right of the snapshot, and selecting Delete VolumeSnapshot followed by Delete. Then select the PVC under Storage > PersistentVolumeClaims, click the three dots to the right of the PVC, select Delete PersistentVolumeClaim, and click Delete.

NetApp DataOps Toolkit

The version 2.5.0 toolkit is currently compatible with Kubernetes versions 1.20 and above, and OpenShift versions 4.7 and above.

The toolkit is currently compatible with Trident versions 20.07 and above. Additionally, the toolkit is compatible with the following Trident backend types used in this validation:

●   ontap-nas

●   ontap-nas-flexgroup

More operations and capabilities about NetApp DataOps Toolkit are available and documented here: https://github.com/NetApp/netapp-dataops-toolkit

Prerequisites

The NetApp DataOps Toolkit for Kubernetes requires that Python 3.8 or above be installed on the local host. Additionally, the toolkit requires that pip for Python3 be installed on the local host. For more details regarding pip, including installation instructions, refer to the pip documentation.

Procedure 1.       NetApp DataOps Toolkit Installation

Step 1.      To install the NetApp DataOps Toolkit for Kubernetes on the OCP-Installer VM, run the following command:

sudo dnf install python3.11
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3.11 get-pip.py
rm get-pip.py
python3.11 -m pip install netapp-dataops-k8s

The NetApp DataOps Toolkit can be used to create a JupyterLab workspace, clone a workspace, create a snapshot of a JupyterLab workspace, and so on.

Note:      You can use NetApp DataOps Toolkit to create Jupyter notebooks in this solution. For more information, go to: Create a new JupyterLab workspace.
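For example, existing workspaces can be listed and a workspace snapshot created from the command line (a sketch; verify the exact subcommands and options against the toolkit documentation linked above):

netapp_dataops_k8s_cli.py list jupyterlabs --namespace=default
netapp_dataops_k8s_cli.py create jupyterlab-snapshot --workspace-name=sd-xl --namespace=default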

Add an Additional Administrative User to the OpenShift Cluster

It is recommended to add a permanent administrative user to the OpenShift cluster to provide an alternative to logging in with the “temporary” kubeadmin user. This section shows how to build and install an HTPasswd user. Other identity providers are also available.

Procedure 1.       Add the admin User

Step 1.      On the OCP-Installer VM in the auth directory where the kubeadmin-password and kubeconfig files are stored, create an admin.htpasswd file by typing:

htpasswd -c -B -b ./admin.htpasswd admin <password>

Adding password for user admin

Step 2.      Using Chrome or Firefox on the OCP-Installer VM, connect to the OpenShift console with the kubeadmin user. In the blue banner near the top of the page, click cluster OAuth configuration.

Step 3.      Use the Add pulldown under Identity providers to select HTPasswd. Click Browse and browse to the admin.htpasswd file created above. Highlight the file and click Select. Click Add. The htpasswd should now show up as an Identity provider.

Related image, diagram or screenshot
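Alternatively, the same identity provider can be added from the CLI by storing the htpasswd file in a secret and referencing it from the cluster OAuth resource (a sketch based on the standard OpenShift HTPasswd procedure; the secret name is illustrative):

oc create secret generic htpass-secret --from-file=htpasswd=./admin.htpasswd -n openshift-config

cat oauth-htpasswd.yaml
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: htpasswd
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpass-secret

oc apply -f oauth-htpasswd.yaml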

Step 4.      Click View authentication conditions for reconfiguration status and wait for the status to become Available.

Step 5.      Log out of the cluster and log back in with htpasswd and the admin user. Click Skip tour and log out of the cluster.

Step 6.      Log back into the cluster with kube:admin and the kubeadmin user. Select User Management > Users, then select the admin user. Select the RoleBindings tab and click Create binding.

Step 7.      Select Cluster-wide role binding and name the RoleBinding admin-cluster-admin. From the drop-down list under Role name, select the cluster-admin role. Click Create.

Related image, diagram or screenshot
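The same role binding can also be created from the CLI with a single command (a sketch; the admin user name matches the htpasswd user created above):

oc adm policy add-cluster-role-to-user cluster-admin admin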

Step 8.      Select User Management > Users, then select the admin user. Select the RoleBindings tab. Click the ellipses to the right of the user-settings RoleBinding to delete that RoleBinding, leaving only the cluster-admin RoleBinding.

Step 9.      You can now log out of the cluster and log back in with htpasswd and the admin user. On the top left select the Administrator role. You now have full cluster-admin access to the cluster.

Back up Cluster etcd

etcd is the key-value store for OpenShift, which persists the state of all resource objects.

For more information, see: https://docs.openshift.com/container-platform/4.16/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html.

Procedure 1.       Back up etcd using a script

Assuming that the OCP-Installer VM is backed up regularly, regular OpenShift etcd backups can be taken and stored on the OCP-Installer VM.

Step 1.      Create a directory on the OCP-Installer VM and create a directory inside this directory to store the backups:

cd
cd ocp
mkdir etcd-backup
cd etcd-backup
mkdir etcd-backups

Note:      For more robust storage of etcd backups, an NFS volume can be created on the NetApp storage and mounted as etcd-backups in the example above.

Step 2.      The following script can be created and made executable to create and save the etcd backup:

cat etcd-backup-script

#! /usr/bin/bash
# Take an etcd backup on the control-plane node
ssh core@<control0 ip> sudo /usr/local/bin/cluster-backup.sh /home/core/assets/backup
# Make the backup files readable so they can be copied off the node
ssh core@<control0 ip> sudo chmod 644 /home/core/assets/backup/*
# Copy the backup files to the OCP-Installer VM and clean up the node
scp core@<control0 ip>:/home/core/assets/backup/* /home/admin/ocp/etcd-backup/etcd-backups/
ssh core@<control0 ip> sudo rm /home/core/assets/backup/*
# Restrict permissions on the local copies and delete backups older than 30 days
chmod 600 /home/admin/ocp/etcd-backup/etcd-backups/*
find /home/admin/ocp/etcd-backup/etcd-backups -type f -mtime +30 -delete

Note:      This script deletes backups over 30 days old.

Step 3.      Using sudo, add execution of this script to /etc/crontab:

cat /etc/crontab

SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root

# For details see man 4 crontabs

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name  command to be executed
  0  2  *  *  * admin      /home/admin/ocp/etcd-backup/etcd-backup-script

Note:      This example backs up etcd data daily at 2:00 am.

Step 4.       If an etcd restore is needed, copy the appropriate backup files from the OCP-Installer VM back to a working control-plane node:

ssh core@<control0 ip> sudo scp admin@<ocp installer vm IP>:/home/admin/ocp/etcd-backup/etcd-backups/snapshot_2024-11-12_165737.db /home/core/assets/backup/

ssh core@<control0 ip> sudo scp admin@<ocp installer vm IP>:/home/admin/ocp/etcd-backup/etcd-backups/static_kuberesources_2024-11-12_170543.tar.gz /home/core/assets/backup/

Step 5.      To recover the cluster, see https://docs.openshift.com/container-platform/4.17/hosted_control_planes/hcp_high_availability/hcp-recovering-etcd-cluster.html#hcp-recovering-etcd-cluster.

Add a Worker Node to an OpenShift Cluster

It is often necessary to scale up an OpenShift cluster by adding worker nodes to the cluster. This set of procedures describes the steps to add a node to the cluster. These procedures require a Cisco UCS Server connected to a set of Fabric Interconnects with all VLANs in the Server Profile configured.

Procedure 1.       Deploy a Cisco UCS Server Profile

Deploy a Cisco UCS Server Profile in Cisco Intersight.

Step 1.      Depending on the type of server added (Cisco UCS X-Series or Cisco UCS C-Series), clone the existing OCP-Worker template and create and adjust the template according to the server type.

Step 2.      From the Configure > Templates page, click the three dots to the right of the OCP-Worker template set up above and select Derive Profiles.

Step 3.      Under the Server Assignment, select Assign Now and select the Cisco UCS server that will be added to the cluster as a Worker Node. Click Next.

Step 4.      Assign the Server Profile an appropriate Name (for example, ocp-worker3) and select the appropriate Organization. Click Next.

Step 5.      Click Derive.

Step 6.      From the Infrastructure Service > Profiles page, click the three dots to the right of the just-created profile and select Deploy. Select Reboot Immediately to Activate and click Deploy.

Step 7.      Wait until the profile deploys and activates.

Step 8.      Click the server profile, go to the Configuration > Identifiers and Inventory tabs, and note the server’s management IP, serial number, and the MAC address of network interface eno5.

Procedure 2.       Create the Bare Metal Host (BMH)

Step 1.      On the OCP-Installer VM, create the following yaml file (the example shown is for worker node worker3.<domain-name>.<base-domain>):

cat bmh.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: worker3-bmc-secret
  namespace: openshift-machine-api
type: Opaque
data:
  username: ZmxleGFkbWlu
  password: SDFnaJQwbJQ=
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker3.ocp.flexpodb4.cisco.com
  namespace: openshift-machine-api
spec:
  online: True
  bootMACAddress: 00:25:B5:A2:0A:1B
  bmc:
    address: redfish://10.102.2.238/redfish/v1/Systems/WZP27020EG1
    credentialsName: worker3-bmc-secret
    disableCertificateVerification: True
  customDeploy:
    method: install_coreos
  externallyProvisioned: true

Note:      The username and password shown in this file are base64 encoded and can be obtained by typing “echo -ne <username> | base64”. In this case typing “echo -ne flexadmin | base64” yielded ZmxleGFkbWlu.

Note:      Also note the bmc address. In this case redfish is used to connect to the server. The URL has the server serial number at the end of the URL. If you would like to use IPMI over LAN instead of redfish, just put the server’s management IP for the bmc address.

Step 2.      Create the Bare Metal Host by typing the following:

oc project openshift-machine-api
oc create -f bmh.yaml

Step 3.      Verify that the BMH is created by selecting Compute > Bare Metal Hosts in the OpenShift Console.

Related image, diagram or screenshot

Note:      With this method of creating the BMH, the server is not inspected and some details such as Serial Number, Network Interfaces, and Disks are not retrieved from the server, but the Power Management functions do work.
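The BMH can also be listed from the CLI (bmh is the short name for the BareMetalHost resource):

oc get bmh -n openshift-machine-api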

Step 4.      In the OpenShift Console, select Compute > MachineSets. Click the three dots to the right of the worker MachineSet and choose Edit Machine count. Use the plus sign to increase the count by one. Click Save.
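The machine count can also be increased from the CLI (a sketch; substitute the MachineSet name and the new count):

oc get machinesets -n openshift-machine-api
oc scale machineset <machineset-name> -n openshift-machine-api --replicas=<new count>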

Step 5.      Click Compute > Machines. A new machine in the Provisioning phase should now appear in the list.

Related image, diagram or screenshot

Procedure 3.       Install Red Hat CoreOS on the New Worker

Step 1.      Connect to the Red Hat Hybrid Cloud Console here: https://console.redhat.com/openshift/overview and log in with your Red Hat credentials. On the left, select Cluster List. Under Cluster List, click your cluster to open it.

Step 2.      Select the Add Hosts tab. Click Add hosts.

Step 3.      Do not change the field settings and click Next.

Step 4.      For Provisioning type, select Full image file. Browse to and select the SSH public key file used in the original cluster installation. Click Generate Discovery ISO.

Step 5.      If your Cisco UCS Servers have the Intersight Advantage license installed, follow the procedure from Step 12 to use the Cisco Intersight workflow to boot the server with the Discovery ISO. Then skip to Step 15.

Step 6.      Click Download Discovery ISO. The file will download to your machine. Click Close.

Note:      This is a slightly different ISO than the one used to install the cluster and must be downloaded to successfully add a node.

Step 7.      Place the downloaded Discovery ISO on your HTTP or HTTPS web server and use a web browser to obtain the URL of the ISO.

Step 8.       In Cisco Intersight, edit the Virtual Media Policy that is part of the server profile. On the Policy Details page, select Add Virtual Media.

Step 9.      In the Add Virtual Media dialogue, leave CDD selected and select HTTP/HTTPS. Provide a name for the mount and add the URL for File Location.

Related image, diagram or screenshot

Step 10.  Click Add.

Step 11.  Click Save.

Step 12.  Under Infrastructure Service > Profiles, click the three dots to the right of the newly added worker server profile and select Deploy. Select only the bottom checkbox and select Deploy.

Note:      It is not necessary to redeploy the remaining server profiles. The Inconsistent status will be resolved after CoreOS is installed on the newly added worker.

Step 13.  Click the three dots to the right of the newly added worker profile and select Server Actions > Power > Power Cycle. In the popup, click Power Cycle. The reboot from the Discovery ISO can be monitored with a vKVM Console (Server Actions > Launch vKVM).

Step 14.  Once the server has booted from the Discovery ISO, return to the Red Hat Hybrid Cloud Console. The newly added worker should appear in a few minutes. Wait for the Status to become Ready.

Related image, diagram or screenshot

Step 15.  Click Install ready hosts. The installation of CoreOS will take several minutes.

Note:      Once the CoreOS installation completes (Status of Installed), the server will reboot, boot CoreOS, and reboot a second time.

Step 16.  In Cisco Intersight, edit the vMedia policy and remove the virtual media mount. Go to the Infrastructure Service > Profiles page and deploy the profile to the newly added worker without rebooting the host. The Inconsistent state on the remaining profiles should be cleared.

Step 17.  In the OpenShift Console, select Compute > Nodes. Once the server reboots have completed, the newly added worker will appear in the list as Discovered. Click Discovered and then select Approve. Click Not Ready and select Approve.

Step 18.  To link the Bare Metal Host to the Machine, select Compute > Machines. For the newly-added machine in the Provisioning Phase, note the last five characters in the machine name (for example, bqz2k).

Related image, diagram or screenshot

Step 19.  Select Compute > Bare Metal Hosts. Select the BMH above the newly added BMH (for example, worker2). Select the YAML tab. Select and copy the entire consumerRef field right underneath the externallyProvisioned field.

Related image, diagram or screenshot

Step 20.  Select Compute > Bare Metal Hosts. Select the BMH for the newly added BMH (for example, worker3). Select the YAML tab. Place the cursor at the end of the externallyProvisioned: true line and press Enter to insert a new line. Backspace to the beginning of the line and then paste in the consumerRef field from the previous step. Replace the last five characters in the name field with the five characters noted above (for example, bqz2k).

Related image, diagram or screenshot

Step 21.  Click Save. Click Compute > Machines. The newly added machine should now be in the Provisioned Phase.

Related image, diagram or screenshot
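Alternatively, Steps 19 through 21 can be performed with a single oc patch command instead of editing the YAML in the console (a sketch; replace the Machine name placeholder with the machine name noted in Step 18):

oc -n openshift-machine-api patch bmh worker3.ocp.flexpodb4.cisco.com --type merge -p '{"spec":{"consumerRef":{"apiVersion":"machine.openshift.io/v1beta1","kind":"Machine","name":"<machine name from Step 18>","namespace":"openshift-machine-api"}}}'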

Step 22.  To link this machine to the node, click this newly added machine and select the YAML tab. Under spec, select and copy the entire providerID line.

Related image, diagram or screenshot

Step 23.  Select Compute > Nodes. Select the newly-added node and select the YAML tab. Scroll down to find the spec field. Select and delete the {} to the right of spec: and press Enter to add a line. Paste in the providerID field with a two space indention and click Save.

Note:      The OpenShift node objects update frequently; if an update has occurred, it will be necessary to reload the YAML tab. After reloading, you may need to make the changes again.

Related image, diagram or screenshot
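Alternatively, the providerID can be added to the node from the CLI, which avoids the console YAML reload issue mentioned in the note above (a sketch; paste the providerID value copied from the machine):

oc patch node worker3.ocp.flexpodb4.cisco.com --type merge -p '{"spec":{"providerID":"<providerID value copied from the machine>"}}'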

Step 24.  Select Compute > Bare Metal Hosts. The newly-added BMH should now be linked to a node.

Related image, diagram or screenshot

Step 25.  Select Compute > Machines. The newly-added machine should now be in the Provisioned as node Phase and should be linked to the node.

Related image, diagram or screenshot

Deploy a Sample Containerized Application

To demonstrate the installation of Red Hat OpenShift on Bare Metal on FlexPod Datacenter, a sample containerized application can be installed and run. In this case, Stable Diffusion XL 1.0 will be run utilizing the Intel CPUs. If you have NVIDIA GPUs installed, refer to FlexPod Datacenter with Generative AI Inferencing - Cisco for details on deploying Stable Diffusion utilizing an NVIDIA GPU. This installation uses the NetApp DataOps Toolkit, installed above, to deploy a JupyterLab workspace and then Intel OpenVINO to run Stable Diffusion XL.

Procedure 1.       Deploy Jupyter Notebook

Step 1.      From the OCP-Installer VM, run the following command to deploy a JupyterLab workspace with a 90Gi NFS-backed persistent volume, no GPUs, and the latest PyTorch container available at the time of validation:

netapp_dataops_k8s_cli.py create jupyterlab --workspace-name=sd-xl -c ontap-nfs --size=90Gi --nvidia-gpu=0 -i nvcr.io/nvidia/pytorch:25.03-py3

Step 2.      Enter and verify a password for the notebook. The notebook is created in the ‘default’ namespace. The deployment will take a few minutes to reach Ready state:

Setting workspace password (this password will be required in order to access the workspace)...

Enter password:

Verify password:

 

Creating persistent volume for workspace...

Creating PersistentVolumeClaim (PVC) 'ntap-dsutil-jupyterlab-sd-xl' in namespace 'default'.

PersistentVolumeClaim (PVC) 'ntap-dsutil-jupyterlab-sd-xl' created. Waiting for Kubernetes to bind volume to PVC.

Volume successfully created and bound to PersistentVolumeClaim (PVC) 'ntap-dsutil-jupyterlab-sd-xl' in namespace 'default'.

 

Creating Service 'ntap-dsutil-jupyterlab-sd-xl' in namespace 'default'.

Service successfully created.

 

Creating Deployment 'ntap-dsutil-jupyterlab-sd-xl' in namespace 'default'.

Deployment 'ntap-dsutil-jupyterlab-sd-xl' created.

Waiting for Deployment 'ntap-dsutil-jupyterlab-sd-xl' to reach Ready state.

Deployment successfully created.

 

Workspace successfully created.

To access workspace, navigate to http://10.102.2.211:31809

Step 3.      Once the Workspace is successfully created, use a Web browser on a machine with access to the Baremetal subnet to connect to the provided URL. Log in with the password provided.

Related image, diagram or screenshot

Step 4.      Click the Terminal icon to launch a terminal in the PyTorch container. The Stable Diffusion XL 1.0 model by default is stored in /root/.cache. To redirect this to the persistent storage (mounted on /workspace), run the following commands:

mkdir /workspace/.cache
cp -R /root/.cache/* /workspace/.cache/
rm -rf /root/.cache
ln -s /workspace/.cache /root/.cache

Step 5.      Install Diffusers and OpenVINO:

pip install --upgrade diffusers transformers scipy accelerate

pip install optimum[openvino]
pip install openvino==2024.6.0

Step 6.      Click the + icon to add a window and select Python File. Add the following:

from optimum.intel import OVStableDiffusionXLPipeline

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipeline = OVStableDiffusionXLPipeline.from_pretrained(model_id)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k resolution"
image = pipeline(prompt).images[0]

image.save("astronaut_intel.png")

Step 7.      Right-click untitled.py and select Rename Python File. Name the file Run-SDXL.py and choose Rename. Click the x to the right of Run-SDXL.py to close the file and click Save.

Step 8.      In the Terminal window, run Stable Diffusion XL by typing python Run-SDXL.py. On the first run, the Stable Diffusion XL model will be downloaded to persistent storage. Subsequent runs will take less time.

Step 9.      Once the run is complete, double-click the astronaut_intel.png file from the list on the left.

Related image, diagram or screenshot

Step 10.  From the OpenShift console, on the left click Workloads > Pods. In the center pane, from the drop-down list select the default Project.

Related image, diagram or screenshot

Step 11.  On the left, select Deployments. In the center pane, select the JupyterLab Deployment and then select the YAML tab. This information can be used as a guide for creating a YAML file to deploy a pod from the command line with oc. The YAML can also be modified to customize the deployment. If you edit the deployment, you will need to delete the corresponding pod so that the container is re-created, and you will then need to re-add the symbolic link and reinstall the Python libraries with pip.
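As a starting point, the deployment YAML can be exported to a file with oc and then edited (the deployment name matches the workspace created earlier):

oc -n default get deployment ntap-dsutil-jupyterlab-sd-xl -o yaml > jupyterlab-deployment.yaml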

About the Authors

John George, Technical Marketing Engineer, Cisco Systems, Inc.

John has been involved in designing, developing, validating, and supporting the FlexPod Converged Infrastructure since it was developed more than 13 years ago. Before his role with FlexPod, he supported and administered a large, worldwide training network and VPN infrastructure. John holds a master’s degree in Computer Engineering from Clemson University.

Kamini Singh, Technical Marketing Engineer, Hybrid Cloud Infra & OEM Solutions, NetApp

Kamini Singh is a Technical Marketing engineer at NetApp. She has more than five years of experience in data center infrastructure solutions. Kamini focuses on FlexPod hybrid cloud infrastructure solution design, implementation, validation, automation, and sales enablement. Kamini holds a bachelor’s degree in Electronics and Communication and a master’s degree in Communication Systems.

Acknowledgements

For their support and contribution to the design, validation, and creation of this Cisco Validated Design, the authors would like to thank:

●   Archana Sharma, Technical Marketing Engineer, Cisco Systems, Inc.

●   Paniraja Koppa, Technical Marketing Engineer, Cisco Systems, Inc.

Feedback

For comments and suggestions about this guide and related guides, join the discussion on Cisco Community at https://cs.co/en-cvds.

CVD Program

ALL DESIGNS, SPECIFICATIONS, STATEMENTS, INFORMATION, AND RECOMMENDATIONS (COLLECTIVELY, "DESIGNS") IN THIS MANUAL ARE PRESENTED "AS IS," WITH ALL FAULTS. CISCO AND ITS SUPPLIERS DISCLAIM ALL WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OR ARISING FROM A COURSE OF DEALING, USAGE, OR TRADE PRACTICE. IN NO EVENT SHALL CISCO OR ITS SUPPLIERS BE LIABLE FOR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, OR INCIDENTAL DAMAGES, INCLUDING, WITHOUT LIMITATION, LOST PROFITS OR LOSS OR DAMAGE TO DATA ARISING OUT OF THE USE OR INABILITY TO USE THE DESIGNS, EVEN IF CISCO OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

THE DESIGNS ARE SUBJECT TO CHANGE WITHOUT NOTICE. USERS ARE SOLELY RESPONSIBLE FOR THEIR APPLICATION OF THE DESIGNS. THE DESIGNS DO NOT CONSTITUTE THE TECHNICAL OR OTHER PROFESSIONAL ADVICE OF CISCO, ITS SUPPLIERS OR PARTNERS. USERS SHOULD CONSULT THEIR OWN TECHNICAL ADVISORS BEFORE IMPLEMENTING THE DESIGNS. RESULTS MAY VARY DEPENDING ON FACTORS NOT TESTED BY CISCO.

CCDE, CCENT, Cisco Eos, Cisco Lumin, Cisco Nexus, Cisco StadiumVision, Cisco TelePresence, Cisco WebEx, the Cisco logo, DCE, and Welcome to the Human Network are trademarks; Changing the Way We Work, Live, Play, and Learn and Cisco Store are service marks; and Access Registrar, Aironet, AsyncOS, Bringing the Meeting To You, Catalyst, CCDA, CCDP, CCIE, CCIP, CCNA, CCNP, CCSP, CCVP, Cisco, the Cisco Certified Internetwork Expert logo, Cisco IOS, Cisco Press, Cisco Systems, Cisco Systems Capital, the Cisco Systems logo, Cisco Unified Computing System (Cisco UCS), Cisco UCS B-Series Blade Servers, Cisco UCS C-Series Rack Servers, Cisco UCS S-Series Storage Servers, Cisco UCS X-Series, Cisco UCS Manager, Cisco UCS Management Software, Cisco Unified Fabric, Cisco Application Centric Infrastructure, Cisco Nexus 9000 Series, Cisco Nexus 7000 Series. Cisco Prime Data Center Network Manager, Cisco NX-OS Software, Cisco MDS Series, Cisco Unity, Collaboration Without Limitation, EtherFast, EtherSwitch, Event Center, Fast Step, Follow Me Browsing, FormShare, GigaDrive, HomeLink, Internet Quotient, IOS, iPhone, iQuick Study, LightStream, Linksys, MediaTone, MeetingPlace, MeetingPlace Chime Sound, MGX, Networkers, Networking Academy, Network Registrar, PCNow, PIX, PowerPanels, ProConnect, ScriptShare, SenderBase, SMARTnet, Spectrum Expert, StackWise, The Fastest Way to Increase Your Internet Quotient, TransPath, WebEx, and the WebEx logo are registered trademarks of Cisco Systems, Inc. and/or its affiliates in the United States and certain other countries. (LDW_P3)

All other trademarks mentioned in this document or website are the property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and any other company. (0809R)

Learn more