The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product. Learn more about how Cisco is using Inclusive Language.
Published: May 2025
In partnership with:
About the Cisco Validated Design Program
The Cisco Validated Design (CVD) program consists of systems and solutions designed, tested, and documented to facilitate faster, more reliable, and more predictable customer deployments. For more information, go to: http://www.cisco.com/go/designzone.
The FlexPod Datacenter solution is a validated design for deploying Cisco and NetApp technologies and products to build shared private and public cloud infrastructure. Cisco and NetApp have partnered to deliver a series of FlexPod solutions that enable strategic data center platforms. The success of the FlexPod solution is driven through its ability to evolve and incorporate both technology and product innovations in the areas of management, compute, storage, and networking. This document explains the deployment details of Red Hat OpenShift on FlexPod Bare Metal Infrastructure. Some of the key advantages of FlexPod Datacenter with Red Hat OpenShift Bare Metal are:
● Consistent Configuration: having a standard method for deploying Red Hat OpenShift on FlexPod Bare Metal infrastructure provides a consistent platform to run containers and virtualized workloads including CPU and GPU accelerated AI/ML workloads, software and models, and OpenShift Virtualization, all side by side on the same infrastructure.
● Simpler and programmable infrastructure: the entire underlying infrastructure can be configured using infrastructure as code delivered using Ansible.
● End-to-End 100Gbps Ethernet: utilizing the 5th Generation Cisco UCS VICs and the 5th Generation Cisco UCS S9108 Fabric Interconnects (FIs) to deliver 100Gbps Ethernet from the server through the network to the storage.
● Cisco Intersight Management: Cisco Intersight Managed Mode (IMM) is used to manage the Cisco UCS S9108 FIs and Cisco UCS X-Series Servers. Additionally, Cisco Intersight integrates with NetApp Active IQ Unified Manager and Cisco Nexus switches as described in the following sections.
● Built for investment protection: the design is ready for future technologies such as liquid cooling and high-wattage CPUs, and is CXL (Compute Express Link)-ready.
In addition to the FlexPod-specific hardware and software innovations, the integration of the Cisco Intersight cloud platform with NetApp Active IQ Unified Manager and Cisco Nexus switches delivers monitoring and orchestration capabilities for different layers (storage and networking) of the FlexPod infrastructure. Implementation of this integration at this point in the deployment process would require Cisco Intersight Assist and NetApp Active IQ Unified Manager to be deployed outside of the FlexPod.
For information about the FlexPod design and deployment details, including the configuration of various elements of design and associated best practices, refer to Cisco Validated Designs for FlexPod, here: https://www.cisco.com/c/en/us/solutions/design-zone/data-center-design-guides/flexpod-design-guides.html.
Solution Overview
This chapter contains the following:
● Audience
The FlexPod Datacenter with Red Hat OpenShift on Bare Metal configuration represents a cohesive and flexible infrastructure solution that combines computing hardware, networking, and storage resources into a single, integrated architecture. Designed as a collaborative effort between Cisco and NetApp, this converged infrastructure platform is engineered to deliver high levels of efficiency, scalability, and performance, suitable for a multitude of datacenter workloads. By standardizing on a validated design, organizations can accelerate deployment, reduce operational complexities, and confidently scale their IT operations to meet evolving business demands. The FlexPod architecture leverages Cisco's Unified Computing System (UCS) servers, Cisco Nexus networking, and NetApp's innovative storage systems, providing a robust foundation for both virtualized and non-virtualized environments.
The intended audience of this document includes but is not limited to IT architects, sales engineers, field consultants, professional services, IT managers, partner engineering, and customers who want to take advantage of an infrastructure built to deliver IT efficiency and enable IT innovation.
This document provides deployment guidance around bringing up the FlexPod Datacenter with Red Hat OpenShift on Bare Metal infrastructure. This configuration is built as a tenant on top of FlexPod Base and assumes FlexPod Base has already been configured. This document introduces various design elements and explains various considerations and best practices for a successful deployment.
The following design elements distinguish this version of FlexPod from previous models:
● IaC Configuration of Red Hat OpenShift Bare Metal as a tenant on top of FlexPod Base. This document is the first example of a FlexPod tenant on top of FlexPod Base that aligns with the tenant defined in FlexPod Zero Trust Framework Design Guide.
● Configuration of a platform that will support both Containerized Applications, such as AI applications and Virtual Machines on the same platform.
Deployment Hardware and Software
This chapter contains the following:
The FlexPod Datacenter with Cisco UCS and Cisco Intersight meets the following general design requirements:
● Resilient design across all layers of the infrastructure with no single point of failure
● Scalable design with the flexibility to add compute capacity, storage, or network bandwidth as needed
● Modular design that can be replicated to expand and grow as the needs of the business grow
● Flexible design that can support different models of various components with ease
● Simplified design with the ability to integrate and automate with external automation tools
● Cloud-enabled design which can be configured, managed, and orchestrated from the cloud using GUI or APIs
To deliver a solution which meets all these design requirements, various solution components are connected and configured as covered in the upcoming sections.
The FlexPod Datacenter with Red Hat OpenShift on Bare Metal infrastructure configuration is built using the following hardware components:
● Cisco UCS X9508 Chassis with six Cisco UCS X210C M7 Compute Nodes and two Cisco UCS X440p PCIe Nodes, each PCIe Node containing two NVIDIA L40S GPUs
● Fifth-generation Cisco UCS S9108 Fabric Interconnects to support 100GbE and 25GbE connectivity from various components
● High-speed Cisco NX-OS-based Nexus 93600CD-GX switching design to support 100GE and 400GE connectivity
● NetApp AFF C800 end-to-end NVMe storage with 25G or 100G Ethernet and (optional) 32G Fibre Channel connectivity
The software components of this solution consist of:
● Cisco Intersight to deploy, maintain, and support the Cisco UCS server components
● Cisco Intersight SaaS platform to maintain and support the FlexPod components
● Cisco Intersight Assist Virtual Appliance to help connect NetApp ONTAP and Cisco Nexus switches with Cisco Intersight
● NetApp Active IQ Unified Manager to monitor and manage the storage and for NetApp ONTAP integration with Cisco Intersight
● Red Hat OpenShift which provides a platform for both containers and VMs
FlexPod Datacenter with Red Hat OpenShift on Bare Metal Infrastructure with Cisco UCS X-Series Direct Topology
Figure 1 shows various hardware components and the network connections for this IP-based FlexPod design.
The reference hardware configuration includes:
● Two Cisco Nexus 93600CD-GX Switches in Cisco NX-OS mode provide the switching fabric. Other Cisco Nexus Switches are also supported.
● Two Cisco UCS S9108 Fabric Interconnects (FIs) in the chassis provide the chassis connectivity. At least two 100 Gigabit Ethernet ports from each FI, configured as a Port-Channel, are connected to each Nexus 93600CD-GX switch. 25 Gigabit Ethernet connectivity is also supported, as are other Cisco UCS FI models, which would be used with Intelligent Fabric Modules (IFMs) in the chassis.
● One Cisco UCS X9508 Chassis contains six Cisco UCS X210C M7 servers and two Cisco UCS X440p PCIe Nodes, each with two NVIDIA L40S GPUs. Other configurations of servers with and without GPUs are also supported.
● One NetApp AFF C800 HA pair connects to the Cisco Nexus 93600CD-GX Switches using two 100 GE ports from each controller configured as a Port-Channel. 25 Gigabit Ethernet connectivity is also supported, as are other NetApp AFF, ASA, and FAS storage controllers.
Red Hat OpenShift on Bare Metal Server Configuration
A simple Red Hat OpenShift cluster consists of at least five servers – three Control-Plane Nodes and two or more Worker Nodes where applications and VMs are run. In this lab validation three Worker Nodes were utilized. Based on OpenShift published requirements, the three Control Plane Nodes were configured with 64GB RAM, and the three Worker Nodes were configured with 768GB RAM to handle containerized applications and VMs.
An alternative configuration, where all servers have the same amount of memory and CPU, is to combine the control-plane and worker roles on the first three servers and assign only the worker role to the remaining servers. This configuration requires a minimum of three servers, and notes throughout the document explain deviations in the process for this configuration.
Each node was booted from M.2; both a single M.2 module and two M.2 modules with RAID1 are supported. The servers paired with X440p PCIe Nodes were configured as Workers. From a networking perspective, both the Control-Plane Nodes and the Workers were configured with a single vNIC with UCS Fabric Failover in the Bare Metal or Management VLAN. The Workers were configured with additional vNICs to allow storage attachment. Each Worker had two additional vNICs with the iSCSI A and B VLANs configured as native to allow iSCSI persistent storage attachment and future iSCSI boot. These same vNICs also had the NVMe-TCP A and B VLANs assigned as allowed VLANs, so that tagged VLAN interfaces for NVMe-TCP could be defined on the Workers. Finally, each Worker had one additional vNIC with the OpenShift NFS VLAN configured as native to provide NFS persistent storage.
VLAN Configuration
Table 1 lists VLANs configured for setting up the FlexPod environment along with their usage.
VLAN ID | Name | Usage | IP Subnet used in this deployment
2* | Native-VLAN | Use VLAN 2 as native VLAN instead of default VLAN (1) |
1020* | OOB-MGMT-VLAN | Out-of-band management VLAN to connect management ports for various devices | 10.102.0.0/24; GW: 10.102.0.254
1022 | OCP-BareMetal-MGMT | Routable OpenShift Bare Metal VLAN used for OpenShift cluster and node management | 10.102.2.0/24; GW: 10.102.2.254
3012 | OCP-iSCSI-A | Used for OpenShift iSCSI Persistent Storage | 192.168.12.0/24
3022 | OCP-iSCSI-B | Used for OpenShift iSCSI Persistent Storage | 192.168.22.0/24
3032 | OCP-NVMe-TCP-A | Used for OpenShift NVMe-TCP Persistent Storage (Optional) | 192.168.32.0/24
3042 | OCP-NVMe-TCP-B | Used for OpenShift NVMe-TCP Persistent Storage (Optional) | 192.168.42.0/24
3052 | OCP-NFS | Used for OpenShift NFS RWX Persistent Storage (Not Available with NetApp ASA) | 192.168.52.0/24
Note: *VLANs configured in FlexPod Base.
Note: S3 object storage was also used in this environment but requires a routable subnet. In order to avoid having two default gateways on the OpenShift nodes, S3 was placed on the OCP-BareMetal-MGMT subnet and VLAN. A separate VLAN and subnet was not defined for S3.
Table 2 lists the VMs or bare metal servers necessary for deployment as outlined in this document.
Virtual Machine Description | VLAN | IP Address | Comments
OCP AD1 | 1022 | 10.102.2.249 | Hosted on pre-existing management infrastructure within the FlexPod
OCP AD2 | 1022 | 10.102.2.250 | Hosted on pre-existing management infrastructure within the FlexPod
OCP Installer | 1022 | 10.102.2.10 | Hosted on pre-existing management infrastructure within the FlexPod
NetApp Active IQ Unified Manager | 1021 | 10.102.1.97 | Hosted on pre-existing management infrastructure within the FlexPod
Cisco Intersight Assist Virtual Appliance | 1021 | 10.102.1.96 | Hosted on pre-existing management infrastructure within the FlexPod
Table 3 lists the software revisions for various components of the solution.
Layer | Device | Image Bundle | Comments
Compute | Cisco UCS Fabric Interconnect S9108 | 4.3(5.240191) |
Compute | Cisco UCS X210C M7 | 5.3(5.250001) |
Network | Cisco Nexus 93600CD-GX NX-OS | 10.4(4)M |
Storage | NetApp AFF C800 | ONTAP 9.16.1 | Latest patch release
Software | Red Hat OpenShift | 4.17 |
Software | NetApp Trident | 25.02.1 |
Software | NetApp DataOps Toolkit | 2.5.0 |
Software | Cisco Intersight Assist Appliance | 1.1.1-1 | 1.1.1-0 initially installed and then automatically upgraded
Software | NetApp Active IQ Unified Manager | 9.16 |
Software | NVIDIA L40S GPU Driver | 550.144.03 |
The information in this section is provided as a reference for cabling the physical equipment in a FlexPod environment. To simplify cabling requirements, a cabling diagram was used.
The cabling diagram in this section contains the details for the prescribed and supported configuration of the NetApp AFF C800 running NetApp ONTAP 9.16.1.
Note: For any modifications of this prescribed architecture, consult the NetApp Interoperability Matrix Tool (IMT).
Note: This document assumes that out-of-band management ports are plugged into an existing management infrastructure at the deployment site. These interfaces will be used in various configuration steps.
Note: Be sure to use the cabling directions in this section as a guide.
The NetApp storage controller and disk shelves should be connected according to best practices for the specific storage controller and disk shelves. For disk shelf cabling, refer to NetApp Support.
Figure 2 details the cable connections used in the validation lab for the FlexPod topology based on the Cisco UCS S9108 fabric interconnect directly in the chassis. Two 100Gb links connect each Cisco UCS Fabric Interconnect to the Cisco Nexus Switches and each NetApp AFF controller to the Cisco Nexus Switches. Additional 1Gb management connections will be needed for one or more out-of-band network switches that sit apart from the FlexPod infrastructure. Each Cisco UCS fabric interconnect and Cisco Nexus switch is connected to the out-of-band network switches, and each AFF controller has a connection to the out-of-band network switches. Layer 3 network connectivity is required between the Out-of-Band (OOB) and In-Band (IB) Management Subnets.
The OpenShift Tenant is intended to be built on top of FlexPod Base and can coexist with other tenants. If FlexPod Base has not been installed on the FlexPod, use FlexPod Datacenter Base Configuration using IaC with Cisco IMM and NetApp ONTAP to install FlexPod Base. Note that the OpenShift Tenant is an IP-only solution, but other tenants utilizing Fibre Channel can be installed on the FlexPod. When installing FlexPod Base, use it to configure all the available FlexPod components.
One part of the FlexPod Base installation is installing an Ansible VM or machine that is used to run the Ansible playbooks. For FlexPod Base, in the .ansible.cfg file in the user directory, “jinja2_native=True” was set. For running the FlexPod OpenShift Tenant scripts, this parameter needs to be commented out as shown below.
cat ~/.ansible.cfg
[defaults]
interpreter_python=/usr/bin/python3.11
#jinja2_native=True
The Ansible playbooks for this solution are available in a public GitHub repository. The first step in the process is to clone the repository named FlexPod-IMM-OpenShift (https://github.com/ucs-compute-solutions/FlexPod-IMM-OpenShift.git) to a new empty folder on the Ansible workstation. Cloning the repository creates a local copy, which is then used to run the playbooks that have been created for this solution.
Step 1. From the Ansible workstation, change directories to the folder where the Ansible collections are located – something like /home/admin/ansible.
Step 2. Clone the GitHub repository using the following command:
git clone https://github.com/ucs-compute-solutions/FlexPod-IMM-OpenShift.git
Step 3. Change directories to the new folder named FlexPod-IMM-OpenShift.
Network Switch Configuration
This chapter contains the following:
● Cisco Nexus Switch Ansible Configuration
This chapter provides a detailed procedure for using an Ansible playbook to configure the Cisco Nexus 93600CD-GX switches for use in a FlexPod with Red Hat OpenShift on Bare Metal environment.
Note: The following procedures describe how to configure the Cisco Nexus switches for use in the OpenShift Bare Metal FlexPod environment. This procedure assumes the use of Cisco Nexus 9000 10.4(4)M.
● The following procedure includes the setup of NTP distribution on the bare metal VLAN. The interface-vlan feature and ntp commands are used to set this up.
● This procedure adds the tenant VLANs to the appropriate port-channels.
Cisco Nexus Switch Ansible Configuration
Procedure 1. Configure the Cisco Nexus switches from the Ansible workstation
Step 1. Add Nexus switch ssh keys to /home/admin/.ssh/known_hosts. Adjust known_hosts as necessary if errors occur:
ssh admin@<nexus-A-mgmt0-ip>
exit
ssh admin@<nexus-B-mgmt0-ip>
exit
Step 2. Edit the following variable files to ensure proper Cisco Nexus variables are entered:
● FlexPod-IMM-OpenShift/group_vars/all.yml
● FlexPod-IMM-OpenShift/group_vars/secrets.yml
● FlexPod-IMM-OpenShift/group_vars/nexus.yml
● FlexPod-IMM-OpenShift/inventory
● FlexPod-IMM-OpenShift/host_vars/n9kA.yml
● FlexPod-IMM-OpenShift/host_vars/n9kB.yml
Note: Port-channel numbers in FlexPod-IMM-OpenShift/group_vars/nexus.yml should be the same as those set up in FlexPod Base.
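For reference, the configuration applied to each switch by the playbook should resemble the following sketch. The VLAN IDs come from Table 1; the port-channel number (10 here) is a placeholder for the uplink and vPC peer-link port-channels created in FlexPod Base, and the NTP SVI address is an example from this validation (10.102.2.3 on switch A and 10.102.2.4 on switch B, matching the chrony servers configured later in this document). Your numbers and addresses will differ.
feature interface-vlan

vlan 1022
  name OCP-BareMetal-MGMT
vlan 3012
  name OCP-iSCSI-A
vlan 3022
  name OCP-iSCSI-B
vlan 3032
  name OCP-NVMe-TCP-A
vlan 3042
  name OCP-NVMe-TCP-B
vlan 3052
  name OCP-NFS

interface Vlan1022
  no shutdown
  ip address 10.102.2.3/24

interface port-channel10
  switchport trunk allowed vlan add 1022,3012,3022,3032,3042,3052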
Step 3. From FlexPod-IMM-OpenShift, run the Setup_Nexus.yml Ansible playbook:
ansible-playbook ./Setup_Nexus.yml -i inventory
Step 4. The following commands can be used to see the switch configuration and status:
show run
show vpc
show vlan
show port-channel summary
show ntp peer-status
show cdp neighbors
show lldp neighbors
show run int
show int
show udld neighbors
show int status
NetApp ONTAP Storage Configuration
This chapter contains the following:
● NetApp ONTAP Storage Ansible Configuration
This chapter provides a detailed procedure for using an Ansible playbook to configure the NetApp AFF C800 storage for use in a FlexPod with Red Hat OpenShift on Bare Metal environment.
Note: The following procedures describe how to configure the NetApp ONTAP storage for use in the OpenShift Bare Metal FlexPod environment. This procedure assumes the use of NetApp AFF C800 running ONTAP 9.16.1 software version.
● The following procedure includes the creation of dedicated IPspace for the OpenShift tenant, then creating relevant broadcast-domains, VLANs, adding VLANs to corresponding broadcast-domains.
● This procedure creates an SVM for OpenShift tenant and creates/enables the required services (NFS, iSCSI etc.) on the SVM.
● The following procedure includes the creation of logical interfaces (LIFs) for storage access
Note: The ONTAP Ansible playbook also provides ONTAP S3 configuration for the OpenShift tenant SVM.
NetApp ONTAP Storage Ansible Configuration
Procedure 1. Configure the NetApp ONTAP Storage for the OpenShift Tenant
Step 1. Edit the following variable files to ensure proper NetApp ONTAP storage variables are entered:
● FlexPod-IMM-OpenShift/group_vars/all.yml
● FlexPod-IMM-OpenShift/group_vars/secrets.yml
● FlexPod-IMM-OpenShift/group_vars/ontap
● FlexPod-IMM-OpenShift/inventory
● FlexPod-IMM-OpenShift/vars/ontap_main.yml
Step 2. From FlexPod-IMM-OpenShift, run the Setup_ONTAP.yml Ansible playbook with the associated tag for this section:
ansible-playbook ./Setup_ONTAP.yml -i inventory -t ontap_config
Note: Use the -vvv option to see a detailed execution output log.
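After the playbook completes, the tenant storage configuration can be spot-checked from the ONTAP CLI. The commands below are standard ONTAP 9 commands; the IPspace and SVM names are placeholders and should be replaced with the names defined in vars/ontap_main.yml.
network ipspace show
network port broadcast-domain show -ipspace <ocp-ipspace>
vserver show -vserver <ocp-svm>
vserver nfs show -vserver <ocp-svm>
vserver iscsi show -vserver <ocp-svm>
network interface show -vserver <ocp-svm>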
Cisco Intersight Managed Mode Configuration
This chapter contains the following:
● Set up Cisco Intersight Resource Group
● Set up Cisco Intersight Organization
● Add Intersight IMM Pools and OpenShift VLANs
● Add Intersight IMM Server Policies
● Add Intersight IMM Server Profile Templates
● Clone and Adjust Server Profile Templates
The Cisco Intersight platform is a management solution delivered as a service with embedded analytics for Cisco and third-party IT infrastructures. The Cisco Intersight Managed Mode (also referred to as Cisco IMM or Intersight Managed Mode) is an architecture that manages Cisco Unified Computing System (Cisco UCS) fabric interconnect–attached systems through a Redfish-based standard model. Cisco Intersight managed mode standardizes both policy and operation management for Cisco UCS C-Series M7 and Cisco UCS X210c M7 compute nodes used in this deployment guide.
Cisco UCS B-Series M6 servers, connected and managed through Cisco UCS FIs, are also supported by IMM. For a complete list of supported platforms, go to: https://www.cisco.com/c/en/us/td/docs/unified_computing/Intersight/b_Intersight_Managed_Mode_Configuration_Guide/b_intersight_managed_mode_guide_chapter_01010.html
Procedure 1. Set up Cisco Intersight Resource Group
In this procedure, a Cisco Intersight resource group for the Red Hat OpenShift tenant is created where resources will be logically grouped. In FlexPod Base, a Resource Group for the entire FlexPod was set up; in our lab example, this was AA02-rg. In this deployment, a tenant resource group (AA02-OCP-rg) is created to host all the tenant resources, but you can choose to create multiple resource groups for granular control of the resources.
Step 1. Log into Cisco Intersight.
Step 2. Select System.
Step 3. Click Resource Groups on the left.
Step 4. Click + Create Resource Group in the top-right corner.
Step 5. Provide a name for the Resource Group (for example, AA02-OCP-rg).
Step 6. Under Resources, select Custom.
Step 7. Select all resources that are connected to this Red Hat OpenShift FlexPod tenant.
Note: If more than one FlexPod tenant is sharing the FIs, a subset of the servers can be assigned to the Resource Group.
Step 8. Click Create.
Procedure 2. Set Up Cisco Intersight Organization
In this procedure, an Intersight organization for the Red Hat OpenShift tenant is created where all Cisco Intersight Managed Mode configurations, including policies, are defined. Just as with Resource Groups, an organization was created in FlexPod Base; in our lab validation, the FlexPod Base organization was AA02, and the OpenShift tenant organization is AA02-OCP.
Step 1. Log into the Cisco Intersight portal.
Step 2. Select System.
Step 3. Click Organizations on the left.
Step 4. Click + Create Organization in the top-right corner.
Step 5. Provide a name for the organization (for example, AA02-OCP), optionally select Share Resources with Other Organizations, and click Next.
Step 6. Select the Resource Group created in the last step (for example, AA02-OCP-rg) and click Next.
Step 7. Click Create.
Procedure 3. Add Intersight IMM Pools and OpenShift VLANs
This procedure adds the necessary Intersight IMM Pools and adds the OpenShift VLANs to the Fabric Interconnects.
Step 1. Edit the following variable files to ensure proper Cisco Intersight IMM variables are entered:
● FlexPod-IMM-OpenShift/group_vars/all.yml
● FlexPod-IMM-OpenShift/group_vars/secrets.yml
● FlexPod-IMM-OpenShift/group_vars/ucs.yml
● FlexPod-IMM-OpenShift/SecretKey.txt
● FlexPod-IMM-OpenShift/roles/UCS-IMM/create_pools/defaults/main.yml
Step 2. From FlexPod-IMM-OpenShift, run the Setup_IMM_Pools.yml Ansible playbook.
ansible-playbook ./Setup_IMM_Pools.yml
Procedure 4. Add Intersight IMM Server Policies
The Setup_IMM_Server_Policies.yml playbook is designed to be run more than once if you have servers with different CPU types (Intel or AMD) or server generations (M6, M7, or M8). The different settings will generate different BIOS policies for each type of machine. It is important to run the Setup_IMM_Server_Policies.yml playbook and the Setup_IMM_Server_Profile_Templates.yml playbooks in succession before changing the CPU type or server generation in the FlexPod-IMM-OpenShift/group_vars/ucs.yml file and running both playbooks again.
Step 1. Edit the following variable files to ensure proper Cisco Intersight IMM variables are entered:
● FlexPod-IMM-OpenShift/group_vars/all.yml
● FlexPod-IMM-OpenShift/group_vars/secrets.yml
● FlexPod-IMM-OpenShift/group_vars/ucs.yml
● FlexPod-IMM-OpenShift/SecretKey.txt
● FlexPod-IMM-OpenShift/roles/create_server_policies/defaults/main.yml
Step 2. From FlexPod-IMM-OpenShift, run the Setup_IMM_Server_Policies.yml Ansible playbook:
ansible-playbook ./Setup_IMM_Server_Policies.yml
Procedure 5. Add Intersight IMM Server Profile Templates
The Setup_IMM_Server_Profile_Templates.yml playbook is designed to be run immediately after the Setup_IMM_Server_Policies.yml playbook is run to create Server Profile Templates for a particular CPU type and server generation. Both a blade (X- or B-Series) and rack (C-Series) server profile template will be created.
Step 1. Edit the following variable files to ensure proper Cisco Intersight IMM variables are entered:
● FlexPod-IMM-OpenShift/group_vars/all.yml
● FlexPod-IMM-OpenShift/group_vars/secrets.yml
● FlexPod-IMM-OpenShift/group_vars/ucs.yml
● FlexPod-IMM-OpenShift/SecretKey.txt
● FlexPod-IMM-OpenShift/roles/create_server_profile_template/defaults/main.yml
Step 2. From FlexPod-IMM-OpenShift, run the Setup_IMM_Server_Profile_Templates.yml Ansible playbook:
ansible-playbook ./Setup_IMM_Server_Profile_Templates.yml
Step 3. If you have additional servers with different CPU types or different generations, go back to Add Intersight IMM Server Policies and run the two playbooks again.
Procedure 6. Clone and Adjust Server Profile Templates
The server profile templates created above assume that each server has two M.2 cards and an M.2 RAID controller. If you have any servers with just one M.2 card, you can clone a template created by the Ansible playbooks and adjust it for one M.2 card.
In this example, one Cisco UCS X210c M7 has only one M.2 card, and that machine is used as a Worker node.
Step 1. In Cisco Intersight under Configure > Templates > UCS Server Profile Templates, click the … to the right of the <prefix>-Worker-Intel-M7-Blade-SPT template and select Clone.
Step 2. Make sure the correct Destination Organization is selected and click Next.
Step 3. Adjust the Clone Name (for example, <prefix>-Worker-Intel-M7-Blade-1M.2-SPT) and Description as needed and click Clone.
Step 4. From the Templates window, click the … to the right of the newly created clone and click Edit.
Step 5. Click Next until you get to Storage Configuration. Place the mouse over the M.2-RAID-Storage-Policy and click the X to delete the M.2-RAID-Storage-Policy.
Step 6. Click Next and Close to save this template.
Complete the Cisco UCS IMM Setup
Procedure 1. Derive Server Profiles
Step 1. From the Configure > Templates page, to the right of the OCP-Control-Plane template, click … and select Derive Profiles.
Note: If using combined control-plane and worker nodes, use the OCP-Worker template for all nodes.
Step 2. Under the Server Assignment, select Assign Now and select the three Cisco UCS X210c M7 servers that will be used as OpenShift Control-Plane Nodes.
Step 3. Click Next.
Step 4. For the Profile Name Prefix, enter the first part of the OpenShift Control-Plane Node hostnames (for example, control). Set Start Index for Suffix to 0 (zero). The three server Names should now correspond to the OpenShift Control-Plane Node hostnames.
Step 5. Click Next.
Step 6. Click Derive to derive the OpenShift Control-Plane Node Server Profiles.
Step 7. Select Profiles on the left and then select the UCS Server Profiles tab.
Step 8. Select the three OpenShift Control-Plane Node profiles and then click the … at the top or bottom of the list and select Deploy.
Step 9. Select Reboot Immediately to Activate and click Deploy.
Step 10. Repeat this process to create three OpenShift Worker Node Server Profiles using the OCP-Worker-Template.
OpenShift Installation and Configuration
This chapter contains the following:
● OpenShift – Installation Requirements
● Add an Additional Administrative User to the OpenShift Cluster
● Add a Worker Node to an OpenShift Cluster
● Deploy a Sample Containerized Application
OpenShift 4.17 is deployed on the Cisco UCS infrastructure as M.2 booted bare metal servers. The Cisco UCS X210C M7 servers need to be equipped with an M.2 controller (SATA or NVMe) card and either 1 or 2 identical M.2 drives. Three control-plane nodes and three worker nodes are deployed in the validation environment and additional worker nodes can easily be added to increase the scalability of the solution. This document will guide you through the process of using the Assisted Installer to deploy OpenShift 4.17.
OpenShift – Installation Requirements
The Red Hat OpenShift Assisted Installer provides support for installing OpenShift on bare metal nodes. This guide provides a methodology to achieving a successful installation using the Assisted Installer.
The FlexPod for OpenShift utilizes the Assisted Installer for OpenShift installation; therefore, when provisioning and managing the FlexPod infrastructure, you must provide all the supporting cluster infrastructure and resources, including an installer VM or host, networking, storage, and individual cluster machines.
The following supporting cluster resources are required for the Assisted Installer installation:
● The control plane and compute machines that make up the cluster
● Cluster networking
● Storage for the cluster infrastructure and applications
● The Installer VM or Host
The following infrastructure services need to be deployed to support the OpenShift cluster. During the validation of this solution, these services were provided by VMs running on a hypervisor of choice; you can also use existing DNS and DHCP services available in the data center.
There are various infrastructure service prerequisites for deploying OpenShift 4.17. These prerequisites are as follows:
● DNS and DHCP services – these services were configured on Microsoft Windows Server VMs in this validation
● NTP Distribution was done with the Cisco Nexus switches
● Specific DNS entries for deploying OpenShift – added to the DNS server
● A Linux VM for initial automated installation and cluster management – a Rocky Linux 9 / RHEL 9 VM with appropriate packages
NTP
Each OpenShift node in the cluster must have access to at least two NTP servers.
NICs
The vNICs are configured on the Cisco UCS servers based on the design previously discussed.
DNS
Clients access the OpenShift cluster nodes over the bare metal network. Configure a subdomain or subzone where the canonical name extension is the cluster name.
The following domain and OpenShift cluster names are used in this deployment guide:
● Base Domain: flexpodb4.cisco.com
● OpenShift Cluster Name: ocp
The DNS domain name for the OpenShift cluster should be the cluster name followed by the base domain, for example, ocp.flexpodb4.cisco.com.
Table 4 lists the information for fully qualified domain names used during validation. The API and Nameserver addresses begin with canonical name extensions. The hostnames of the control plane and worker nodes are exemplary, so you can use any host naming convention you prefer.
Usage | Hostname | IP Address
API | api.ocp.flexpodb4.cisco.com | 10.102.2.228
Ingress LB (apps) | *.apps.ocp.flexpodb4.cisco.com | 10.102.2.229
control0 | control0.ocp.flexpodb4.cisco.com | 10.102.2.211
control1 | control1.ocp.flexpodb4.cisco.com | 10.102.2.212
control2 | control2.ocp.flexpodb4.cisco.com | 10.102.2.213
worker0 | worker0.ocp.flexpodb4.cisco.com | 10.102.2.214
worker1 | worker1.ocp.flexpodb4.cisco.com | 10.102.2.215
worker2 | worker2.ocp.flexpodb4.cisco.com | 10.102.2.216
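This validation used Microsoft Windows Server DNS. If you maintain the zone in BIND instead, the equivalent A records for Table 4 would look roughly like the following sketch (zone file syntax shown for illustration only):
$ORIGIN flexpodb4.cisco.com.
api.ocp        IN  A  10.102.2.228
*.apps.ocp     IN  A  10.102.2.229
control0.ocp   IN  A  10.102.2.211
control1.ocp   IN  A  10.102.2.212
control2.ocp   IN  A  10.102.2.213
worker0.ocp    IN  A  10.102.2.214
worker1.ocp    IN  A  10.102.2.215
worker2.ocp    IN  A  10.102.2.216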
DHCP
For the bare metal network, a network administrator must reserve several IP addresses, including:
● One IP address for the API endpoint
● One IP address for the wildcard Ingress endpoint
● One IP address for each control-plane node (DHCP server assigns to the node)
● One IP address for each worker node (DHCP server assigns to the node)
Note: Obtain the MAC addresses of the bare metal Interfaces from the UCS Server Profile for each node to be used in the DHCP configuration to assign reserved IP addresses (reservations) to the nodes. The KVM IP address also needs to be gathered for the control-plane and worker nodes from the server profiles.
Procedure 1. Gather MAC Addresses of Node Bare Metal Interfaces
Step 1. Log into Cisco Intersight.
Step 2. Select Configure > Profiles > Server Profile (for example, ocp-worker2).
Step 3. In the center pane, select Inventory > Network Adapters > Network Adapter (for example, UCSX-ML-V5D200G).
Step 4. In the center pane, select Interfaces.
Step 5. Record the MAC address for NIC Interface eno5.
Step 6. Select the General tab and select Identifiers in the center pane.
Step 7. Record the Management IP assigned out of the OCP-BareMetal-IP-Pool.
Table 5 lists the IP addresses used for the OpenShift cluster including bare metal network IPs and UCS KVM Management IPs for IPMI or Redfish access.
Hostname | IP Address | UCS KVM Mgmt. IP Address | BareMetal MAC Address (eno5)
control0.ocp.flexpodb4.cisco.com | 10.102.2.211 | 10.102.2.241 | 00:25:B5:A2:0A:80
control1.ocp.flexpodb4.cisco.com | 10.102.2.212 | 10.102.2.240 | 00:25:B5:A2:0A:81
control2.ocp.flexpodb4.cisco.com | 10.102.2.213 | 10.102.2.239 | 00:25:B5:A2:0A:82
worker0.ocp.flexpodb4.cisco.com | 10.102.2.214 | 10.102.2.243 | 00:25:B5:A2:0A:83
worker1.ocp.flexpodb4.cisco.com | 10.102.2.215 | 10.102.2.244 | 00:25:B5:A2:0A:85
worker2.ocp.flexpodb4.cisco.com | 10.102.2.216 | 10.102.2.242 | 00:25:B5:A2:0A:87
Step 8. From Table 5, enter the hostnames, IP addresses, and MAC addresses as reservations in your DHCP and DNS server(s) or configure the DHCP server to dynamically update DNS.
Step 9. You will also need to present VLAN interfaces for up to all six storage VLANs to your DHCP server(s) and assign each interface an IP address in the corresponding storage subnet. Then create a DHCP scope for each storage VLAN and subnet, making sure the IPs assigned by the scope do not overlap with the storage LIF IPs. Either enter the nodes in the DNS server or configure the DHCP server to forward entries to the DNS server. For the cluster nodes, create reservations to map the hostnames to the desired IP addresses.
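The DHCP services in this validation ran on Windows Server. For illustration only, a bare metal reservation (Step 8) and a storage-VLAN scope (Step 9) would look roughly like the following in ISC dhcpd syntax, using addresses from Table 1 and Table 5; the DNS server addresses are placeholders, and the scope range is chosen to avoid the storage LIF IPs:
subnet 10.102.2.0 netmask 255.255.255.0 {
  option routers 10.102.2.254;
  option domain-name-servers <dns-server-1>, <dns-server-2>;
  host control0 {
    hardware ethernet 00:25:b5:a2:0a:80;
    fixed-address 10.102.2.211;
    option host-name "control0";
  }
}

subnet 192.168.12.0 netmask 255.255.255.0 {
  range 192.168.12.101 192.168.12.199;
}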
Step 10. Set up either a VM or a spare server as an OCP-Installer machine with its network interface connected to the Bare Metal VLAN, install either Red Hat Enterprise Linux (RHEL) 9.5 or Rocky Linux 9.5 “Server with GUI,” and create an administrator user. Once the VM or host is up and running, update it, then install and configure XRDP. Also, install Google Chrome onto this machine. Connect to this host with a Windows Remote Desktop client as the admin user.
Procedure 2. Install Red Hat OpenShift using the Assisted Installer
Use the following steps to install OpenShift from the OCP-Installer VM.
Step 1. From the Installer desktop, open a terminal session and create an SSH key pair to use to communicate with the OpenShift hosts:
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
Step 2. Copy the public SSH key to the user directory:
cp ~/.ssh/id_ed25519.pub ~/
Step 3. Add the private key to the ssh-agent:
eval "$(ssh-agent)"
ssh-add ~/.ssh/id_ed25519
Step 4. Launch Chrome and connect to https://console.redhat.com/openshift/cluster-list. Log into your Red Hat account.
Step 5. Click Create cluster to create an OpenShift cluster.
Step 6. Select Datacenter and then select Bare Metal (x86_64).
Step 7. Select Interactive to launch the Assisted Installer.
Step 8. Provide the cluster name and base domain. Select the latest OpenShift 4.17 version. Scroll down and click Next.
Step 9. It is not necessary to install any Operators at this time; they can be added later. Click Next.
Step 10. Click Add hosts.
Step 11. Under Provisioning type, from the drop-down list select the Minimal image file. Under SSH public key, click Browse and browse to, select, and open the id_ed25519.pub file. The contents of the public key should now appear in the box. Click Generate Discovery ISO.
Step 12. If your Cisco UCS Servers have the Intersight Advantage license installed, click Add hosts from Cisco Intersight. If you do not have the Advantage license or you do not wish to use the Cisco Intersight Integration, skip to Step 15.
Step 13. A Cisco Intersight tab will appear in Chrome. Log into Intersight and select the appropriate account. Select the appropriate Organization (AA02-OCP). Click the pencil icon to select the servers for the OpenShift installation. In the list on the right, select the servers to install OpenShift onto and click Save. In the lower right-hand corner, click Execute. The Workflow will mount the Discovery ISO from the Red Hat Cloud and reboot the servers into the Discovery ISO.
Step 14. Back in the Red Hat Hybrid Cloud Console, click Close to close the Add hosts popup. Skip to Step 22 below.
Step 15. Click Download Discovery ISO to download the Discovery ISO into the Downloads directory. Click Close when the download is done.
Step 16. Copy the Discovery ISO to an http server. Use a web browser to get a copy of the URL for the Discovery ISO.
Step 17. Use Chrome to connect to Cisco Intersight and log into the Intersight account previously set up.
Step 18. Go to Configure > Policies and edit the Virtual Media policy attached to your OpenShift server profiles. Once on the Policy Details page, click Add Virtual Media.
Step 19. In the Add Virtual Media dialogue, leave CDD selected and select HTTP/HTTPS. Provide a name for the mount and add the URL for File Location.
Step 20. Click Add. Click Save & Deploy then click Save & Proceed. It is not necessary to reboot the hosts to add the vMedia mount. Click Deploy. Wait for each of the six servers to complete deploying the profile.
Step 21. Go to Configure > Profiles > UCS Server Profiles. Once all six server profiles have a status of OK, click the … to the right of each profile and select Server Actions > Power > Power Cycle then Power Cycle to reboot each of the six servers. If the M.2 drives or virtual drives are blank, the servers should boot from the Discovery ISO. This can be monitored with the vKVM if desired.
Step 22. Once all six servers have booted “RHEL CoreOS (Live)” from the Discovery ISO, they will appear in the Assisted Installer under Host discovery. Use the drop-down lists under Role to assign the appropriate server roles. Scroll down and click Next.
Note: If using combined control-plane and worker nodes, enable Run workloads on control plane nodes. When the “Control plane node” role is selected, it will also include the “Worker” role.
Step 23. Expand each node and verify that CoreOS and OpenShift are being installed to sda (the M.2 device). Click Next.
Step 24. Under Network Management, make sure Cluster-Managed Networking is selected. Under Machine network, from the drop-down list select the subnet for the BareMetal VLAN. Enter the API IP for the api.cluster.basedomain entry in the DNS servers. For the Ingress IP, enter the IP for the *.apps.cluster.basedomain entry in the DNS servers.
Step 25. Scroll down. All nodes should have a status of Ready. Click Next.
Step 26. Review the information and click Install cluster to begin the cluster installation.
Step 27. On the Installation progress page, expand the Host inventory. The installation will take 30-45 minutes. When installation is complete, all nodes will show a Status of Installed.
Step 28. Select Download kubeconfig to download the kubeconfig file. In a terminal window, set up a cluster directory and save credentials:
cd
mkdir <clustername> # for example, ocp
cd <clustername>
mkdir auth
cd auth
mv ~/Downloads/kubeconfig ./
mkdir ~/.kube
cp kubeconfig ~/.kube/config
Step 29. In the Assisted Installer, click the icon to copy the kubeadmin password:
echo <paste password> > ./kubeadmin-password
Step 30. In a new tab in Chrome, connect to https://access.redhat.com/downloads/content/290. Download the OpenShift Linux Client for the version of OpenShift that you installed:
cd ..
mkdir client
cd client
ls ~/Downloads
mv ~/Downloads/oc-x.xx.x-linux.tar.gz ./
tar xvf oc-x.xx.x-linux.tar.gz
ls
sudo mv oc /usr/local/bin/
sudo mv kubectl /usr/local/bin/
oc get nodes
Step 31. To enable oc tab completion for bash, run the following:
oc completion bash > oc_bash_completion
sudo mv oc_bash_completion /etc/bash_completion.d/
Step 32. If you used the Cisco UCS Integration in the OpenShift installation process, connect to Cisco Intersight and from Configure > Profiles > UCS Server Profiles, select all OpenShift Server Profiles. Click the … at either the top or bottom of the column and select Deploy. It is not necessary to reboot the servers, only select the second or lower check box and click Deploy.
Step 33. If you did not use the Cisco UCS Integration in the OpenShift installation process, in Cisco Intersight, edit the Virtual Media policy and remove the link to the Discovery ISO. Click Save & Deploy and then click Save & Proceed. Do not select “Reboot Immediately to Activate.” Click Deploy. The virtual media mount will be removed from the servers without rebooting them.
Step 34. In Chrome or Firefox, in the Assisted Installer page, click Launch OpenShift Console to launch the OpenShift Console. Use kubeadmin and the kubeadmin password to login. On the left, go to Compute > Nodes to see the status of the OpenShift nodes.
Step 35. In the Red Hat OpenShift console, go to Compute > Bare Metal Hosts. For each Bare Metal Host, click the ellipses to the right of the host and select Edit Bare Metal Host. Select Enable power management. Using Table 5, fill in the BMC Address. For an IPMI connection to the server, use the BMC IP Address. For a redfish connection to the server, use redfish://<BMC IP>/redfish/v1/Systems/<server Serial Number> and make sure to check Disable Certificate Verification. Also, make sure the Boot MAC Address matches the MAC address in Table 5. For the BMC Username and BMC Password, use what was entered into the Cisco Intersight Local User policy. Click Save to save the changes. Repeat this step for all Bare Metal Hosts.
Note: If using redfish to connect to the server, it is critical to check the box Disable Certificate Verification.
Step 36. Go to Compute > Bare Metal Hosts. Once all hosts have been configured, the Status should show “Externally provisioned,” and the Management Address should be populated. You can now manage power on the OpenShift hosts from the OpenShift console.
Step 37. Enable automatic resource allocation for the kubelet on the nodes. On your installer VM, create a directory for the kubelet configuration, place the following YAML files in it, and run the commands to create the configuration:
cat worker-kubeletconfig.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: dynamic-node
spec:
  autoSizingReserved: true
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""

cat control-plane-kubeletconfig.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: dynamic-node-control-plane
spec:
  autoSizingReserved: true
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/master: ""
oc create -f worker-kubeletconfig.yaml
oc create -f control-plane-kubeletconfig.yaml
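To confirm the two KubeletConfigs were created and to watch the resulting MachineConfig rollout across the node pools, the standard OpenShift client commands below can be used (shown for reference; output will vary):
oc get kubeletconfig
oc get mcp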
Step 38. To setup NTP on the worker and control-plane nodes, and NVMe-TCP on the worker nodes, run the following:
cd
cd <cluster-name> # For example, ocp
mkdir machine-configs
cd machine-configs
curl https://mirror.openshift.com/pub/openshift-v4/clients/butane/latest/butane --output butane
chmod +x butane
Step 39. Build the following files in the machine-configs directory with variations for your network:
cat 99-control-plane-chrony-conf-override.bu
variant: openshift
version: 4.17.0
metadata:
  name: 99-control-plane-chrony-conf-override
  labels:
    machineconfiguration.openshift.io/role: master
storage:
  files:
    - path: /etc/chrony.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          driftfile /var/lib/chrony/drift
          makestep 1.0 3
          rtcsync
          logdir /var/log/chrony
          server 10.102.2.3 iburst
          server 10.102.2.4 iburst

cat 99-worker-chrony-conf-override.bu
variant: openshift
version: 4.17.0
metadata:
  name: 99-worker-chrony-conf-override
  labels:
    machineconfiguration.openshift.io/role: worker
storage:
  files:
    - path: /etc/chrony.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          driftfile /var/lib/chrony/drift
          makestep 1.0 3
          rtcsync
          logdir /var/log/chrony
          server 10.102.2.3 iburst
          server 10.102.2.4 iburst

cat 99-worker-nvme-discovery.bu
variant: openshift
version: 4.17.0
metadata:
  name: 99-worker-nvme-discovery
  labels:
    machineconfiguration.openshift.io/role: worker
openshift:
  kernel_arguments:
    - loglevel=7
storage:
  files:
    - path: /etc/nvme/discovery.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          --transport=tcp --traddr=192.168.32.51 --trsvcid=8009
          --transport=tcp --traddr=192.168.32.52 --trsvcid=8009
          --transport=tcp --traddr=192.168.42.51 --trsvcid=8009
          --transport=tcp --traddr=192.168.42.52 --trsvcid=8009
Step 40. Create .yaml files from the butane files with butane, then load the configurations into OpenShift:
./butane 99-control-plane-chrony-conf-override.bu -o ./99-control-plane-chrony-conf-override.yaml
./butane 99-worker-chrony-conf-override.bu -o ./99-worker-chrony-conf-override.yaml
./butane 99-worker-nvme-discovery.bu -o ./99-worker-nvme-discovery.yaml
oc create -f 99-control-plane-chrony-conf-override.yaml
oc create -f 99-worker-chrony-conf-override.yaml
oc create -f 99-worker-nvme-discovery.yaml
Note: If using combined control-plane and worker nodes, 99-control-plane-nvme-discovery.bu and 99-control-plane-nvme-discovery.yaml files will need to be created and loaded into OpenShift.
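For the combined-role case, a minimal sketch of the 99-control-plane-nvme-discovery.bu file is shown here; it is derived from the worker file above by changing only the name and role label, and it assumes the same ONTAP NVMe-TCP discovery addresses. Generate the YAML with butane and load it with oc create in the same way.
cat 99-control-plane-nvme-discovery.bu
variant: openshift
version: 4.17.0
metadata:
  name: 99-control-plane-nvme-discovery
  labels:
    machineconfiguration.openshift.io/role: master
openshift:
  kernel_arguments:
    - loglevel=7
storage:
  files:
    - path: /etc/nvme/discovery.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          --transport=tcp --traddr=192.168.32.51 --trsvcid=8009
          --transport=tcp --traddr=192.168.32.52 --trsvcid=8009
          --transport=tcp --traddr=192.168.42.51 --trsvcid=8009
          --transport=tcp --traddr=192.168.42.52 --trsvcid=8009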
Step 41. To enable iSCSI and multipathing on the workers, create the 99-worker-ontap-iscsi.yaml and upload as a machine config:
cat 99-worker-ontap-iscsi.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-ontap-iscsi
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,IyBkZXZpY2UtbWFwcGVyLW11bHRpcGF0aCBjb25maWd1cmF0aW9uIGZpbGUKCiMgRm9yIGEgY29tcGxldGUgbGlzdCBvZiB0aGUgZGVmYXVsdCBjb25maWd1cmF0aW9uIHZhbHVlcywgcnVuIGVpdGhlcjoKIyAjIG11bHRpcGF0aCAtdAojIG9yCiMgIyBtdWx0aXBhdGhkIHNob3cgY29uZmlnCgojIEZvciBhIGxpc3Qgb2YgY29uZmlndXJhdGlvbiBvcHRpb25zIHdpdGggZGVzY3JpcHRpb25zLCBzZWUgdGhlCiMgbXVsdGlwYXRoLmNvbmYgbWFuIHBhZ2UuCgpkZWZhdWx0cyB7Cgl1c2VyX2ZyaWVuZGx5X25hbWVzIHllcwoJZmluZF9tdWx0aXBhdGhzIG5vCn0KCmJsYWNrbGlzdCB7Cn0K
            verification: {}
          filesystem: root
          mode: 600
          overwrite: true
          path: /etc/multipath.conf
    systemd:
      units:
        - name: iscsid.service
          enabled: true
          state: started
        - name: multipathd.service
          enabled: true
          state: started
  osImageURL: ""
oc create -f 99-worker-ontap-iscsi.yaml
Note: If using combined control-plane and worker nodes, the 99-control-plane-ontap-iscsi.yaml file will need to be created and loaded into OpenShift.
Note: The base64-encoded source above is the following file (/etc/multipath.conf) encoded. It is necessary to set “find_multipaths” to no.
cat multipath.conf
# device-mapper-multipath configuration file
# For a complete list of the default configuration values, run either:
# # multipath -t
# or
# # multipathd show config
# For a list of configuration options with descriptions, see the
# multipath.conf man page.
defaults {
    user_friendly_names yes
    find_multipaths no
}
blacklist {
}
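If you modify multipath.conf, the file can be re-encoded into the source string (the text after “base64,” in the MachineConfig above) with a standard command such as:
base64 -w 0 multipath.conf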
Step 42. Over the next 20-30 minutes each of the nodes will go through the “Not Ready” state and reboot. You can monitor this by going to Compute > MachineConfigPools in the OpenShift Console. Wait until both pools have an Update status of “Up to date.”
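The same rollout can also be monitored from the installer VM; each MachineConfigPool reports UPDATED as True once it has finished applying the new configuration:
watch oc get mcp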
Step 43. The Kubernetes NMState Operator will be used to configure the storage networking interfaces on the workers (and also virtual machine connected interfaces if OpenShift Virtualization is installed). In the OpenShift Console, go to Operators > OperatorHub. In the search box, enter NMState and Kubernetes NMState Operator should appear. Click Kubernetes NMState Operator.
Step 44. Click Install. Leave all the defaults in place and click Install again. The operator will take a few minutes to install.
Step 45. Once the operator is installed, click View Operator.
Step 46. Select the NMState tab. On the right, click Create NMState. Leave all defaults in place and click Create. The nmstate will be created. You will also need to refresh the console because additional items will be added under Networking.
Step 47. In an NMState directory on the ocp-installer machine, create the following YAML files:
cat eno6.yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ocp-iscsi-a-policy
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ''
  desiredState:
    interfaces:
      - name: eno6
        description: Configuring eno6 on workers
        type: ethernet
        state: up
        ipv4:
          dhcp: true
          enabled: true
        ipv6:
          enabled: false

cat eno7.yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ocp-iscsi-b-policy
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ''
  desiredState:
    interfaces:
      - name: eno7
        description: Configuring eno7 on workers
        type: ethernet
        state: up
        ipv4:
          dhcp: true
          enabled: true
        ipv6:
          enabled: false

cat eno8.yaml # If configuring NFS
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ocp-nfs-policy
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ''
  desiredState:
    interfaces:
      - name: eno8
        description: Configuring eno8 on workers
        type: ethernet
        state: up
        ipv4:
          dhcp: true
          enabled: true
        ipv6:
          enabled: false

cat eno6.3032.yaml # If configuring NVMe-TCP
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ocp-nvme-tcp-a-policy
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ''
  desiredState:
    interfaces:
      - name: eno6.3032
        description: VLAN 3032 using eno6
        type: vlan
        state: up
        ipv4:
          dhcp: true
          enabled: true
        ipv6:
          enabled: false
        vlan:
          base-iface: eno6
          id: 3032

cat eno7.3042.yaml # If configuring NVMe-TCP
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ocp-nvme-tcp-b-policy
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ''
  desiredState:
    interfaces:
      - name: eno7.3042
        description: VLAN 3042 using eno7
        type: vlan
        state: up
        ipv4:
          dhcp: true
          enabled: true
        ipv6:
          enabled: false
        vlan:
          base-iface: eno7
          id: 3042
Step 48. Add the Node Network Configuration Policies to the OpenShift cluster:
oc create -f eno6.yaml
oc create -f eno7.yaml
oc create -f eno8.yaml # If configuring NFS
oc create -f eno6.3032.yaml # If configuring NVMe-TCP
oc create -f eno7.3042.yaml # If configuring NVMe-TCP
Step 49. The policies should appear under Networking > NodeNetworkConfigurationPolicy.
Note: If using combined control-plane and worker nodes, since all nodes have the worker role, the node selector will apply these policies to all nodes.
Step 50. Using ssh core@<node IP>, connect to each of the worker nodes and use the ifconfig -a and chronyc sources commands to verify the correct network and NTP setup of the servers.
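The policy rollout can also be verified from the installer VM using the NMState custom resources; each enactment should report an Available status for every worker (standard resource short names shown):
oc get nncp
oc get nnce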
Procedure 3. Install the NVIDIA GPU Operator (optional)
If you have GPUs installed in your Cisco UCS servers, you need to install the Node Feature Discovery (NFD) Operator to detect NVIDIA GPUs and the NVIDIA GPU Operator to make these GPUs available to containers and virtual machines.
Step 1. In the OpenShift web console, click Operators > OperatorHub.
Step 2. Type Node Feature in the Filter box and then click the Node Feature Discovery Operator with Red Hat in the upper right corner. Click Install.
Step 3. Do not change any settings and click Install.
Step 4. When the Install operator is ready for use, click View Operator.
Step 5. In the bar to the right of Details, click NodeFeatureDiscovery.
Step 6. Click Create NodeFeatureDiscovery.
Step 7. Click Create.
Step 8. When the nfd-instance has a status of Available, Upgradeable, select Compute > Nodes.
Step 9. Select a node that has one or more GPUs and then select Details.
Step 10. The following label should be present on the host: feature.node.kubernetes.io/pci-10de.present=true (10de is the PCI vendor ID for NVIDIA).
Note: This label should appear on all nodes with GPUs.
Step 11. Return to Operators > OperatorHub.
Step 12. Type NVIDIA in the Filter box and then click on the NVIDIA GPU Operator. Click Install.
Step 13. Do not change any settings and click Install.
Step 14. When the Install operator is ready for use, click View Operator.
Step 15. In the bar to the right of Details, click ClusterPolicy.
Step 16. Click Create ClusterPolicy.
Step 17. Do not change any settings and scroll down and click Create. This will install the latest GPU driver.
Step 18. Wait for the gpu-cluster-policy Status to become Ready.
Step 19. Connect to a terminal window on the OCP-Installer machine. Type the following commands. The output shown is for two servers that are equipped with GPUs:
oc project nvidia-gpu-operator
Now using project "nvidia-gpu-operator" on server "https://api.ocp.flexpodb4.cisco.com:6443".
oc get pods
NAME READY STATUS RESTARTS AGE
gpu-feature-discovery-jmlbr 1/1 Running 0 6m45s
gpu-feature-discovery-l2l6n 1/1 Running 0 6m41s
gpu-operator-6656d9fbf-wkkfm 1/1 Running 0 11m
nvidia-container-toolkit-daemonset-gb8d9 1/1 Running 0 6m45s
nvidia-container-toolkit-daemonset-t4xdf 1/1 Running 0 6m41s
nvidia-cuda-validator-lc8zr 0/1 Completed 0 4m33s
nvidia-cuda-validator-zxvnx 0/1 Completed 0 4m39s
nvidia-dcgm-exporter-k6tnp 1/1 Running 2 (4m7s ago) 6m41s
nvidia-dcgm-exporter-vb66w 1/1 Running 2 (4m20s ago) 6m45s
nvidia-dcgm-hfgz2 1/1 Running 0 6m45s
nvidia-dcgm-qwm46 1/1 Running 0 6m41s
nvidia-device-plugin-daemonset-nr6m7 1/1 Running 0 6m41s
nvidia-device-plugin-daemonset-rpvwr 1/1 Running 0 6m45s
nvidia-driver-daemonset-416.94.202407231922-0-88zcr 2/2 Running 0 7m42s
nvidia-driver-daemonset-416.94.202407231922-0-bvph6 2/2 Running 0 7m42s
nvidia-node-status-exporter-bz79d 1/1 Running 0 7m41s
nvidia-node-status-exporter-jgjbd 1/1 Running 0 7m41s
nvidia-operator-validator-8fxqr 1/1 Running 0 6m41s
nvidia-operator-validator-tbqtc 1/1 Running 0 6m45s
Step 20. Connect to one of the nvidia-driver-daemonset containers and view the GPU status:
oc exec -it nvidia-driver-daemonset-416.94.202407231922-0-88zcr -- bash
[root@nvidia-driver-daemonset-417 drivers]# nvidia-smi
Thu Mar 6 13:13:33 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03 Driver Version: 550.144.03 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA L40S On | 00000000:38:00.0 Off | 0 |
| N/A 28C P8 34W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA L40S On | 00000000:D8:00.0 Off | 0 |
| N/A 27C P8 35W / 350W | 1MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Procedure 4. Enable the GPU Monitoring Dashboard (Optional)
Step 1. Using https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/enable-gpu-monitoring-dashboard.html, enable the GPU Monitoring Dashboard to monitor GPUs in the OpenShift Web-Console.
NetApp Trident is an open-source, fully supported storage orchestrator for containers and Kubernetes distributions. It was designed to help meet the containerized applications’ persistence demands using industry-standard interfaces, such as the Container Storage Interface (CSI). With Trident, microservices and containerized applications can take advantage of enterprise-class storage services provided by the NetApp portfolio of storage systems. More information about Trident can be found here: NetApp Trident Documentation. NetApp Trident can be installed via different methods. In this solution, we discuss installing NetApp Trident version 25.02.1 using the Trident Operator (installed from OperatorHub).
Trident Operator is a component used to manage the lifecycle of Trident. The operator simplifies the deployment, configuration, and management of Trident. The Trident operator is supported with OpenShift version 4.10 and above.
Note: In this solution, we validated NetApp Trident with the ontap-nas driver and ontap-nas-flexgroup driver using the NFS protocol. We also validated the ontap-san driver for iSCSI and NVMe-TCP. Make sure to install only the backends and storage classes for the storage protocols you are using.
Procedure 1. Install the NetApp Trident Operator
In this implementation, NetApp Trident Operator version 25.2.1 or later is installed.
Step 1. In the OpenShift web console, click Operators > OperatorHub.
Step 2. Type Trident in the Filter box and then click the NetApp Trident operator. Click Continue to accept the warning about Community Operators. Click Install.
Step 3. Verify that at least Version 25.2.1 is selected. Click Install.
Step 4. Once the operator is installed and ready for use, click View Operator.
Step 5. In the bar to the right of Details, click Trident Orchestrator.
Step 6. Click Create TridentOrchestrator. Click Create. Wait for the Status to become Installed.
Step 7. On the installer VM, check the Trident OpenShift pods after installation:
oc get pods -n trident
NAME READY STATUS RESTARTS AGE
trident-controller-5df9c4b4b5-sdlft 6/6 Running 0 7m57s
trident-node-linux-7pjfj 2/2 Running 0 7m57s
trident-node-linux-j4k92 2/2 Running 0 7m57s
trident-node-linux-kzb6n 2/2 Running 0 7m57s
trident-node-linux-q7ndq 2/2 Running 0 7m57s
trident-node-linux-tl2z8 2/2 Running 0 7m57s
trident-node-linux-vtfr6 2/2 Running 0 7m57s
Procedure 2. Obtain tridentctl
Step 1. From the OpenShift directory, download the Trident installer from GitHub and extract the .tar.gz file to obtain the trident-installer folder:
mkdir trident
cd trident
wget https://github.com/NetApp/trident/releases/download/v25.02.1/trident-installer-25.02.1.tar.gz
tar -xvf trident-installer-25.02.1.tar.gz
Step 2. Copy tridentctl to /usr/local/bin:
sudo cp trident-installer/tridentctl /usr/local/bin/
Note: If the NetApp Trident deployment fails and does not bring up the pods to Running state, use the tridentctl logs -l all -n trident command for debugging.
Note: Before configuring the backends that Trident needs to use for user apps, go to: https://docs.netapp.com/us-en/trident/trident-reference/objects.html#kubernetes-customresourcedefinition-objects to understand the storage environment parameters and its usage in Trident.
Procedure 3. Configure the Storage Backends in Trident
Step 1. Configure the connections to the SVM on the NetApp storage array created for the OpenShift installation. For more options regarding storage backend configuration, go to https://docs.netapp.com/us-en/trident/trident-use/backends.html.
Step 2. Create a backends directory and create the following backend definition files in that directory. Each backend definition includes a volume name template (nameTemplate) so that the ONTAP volume created for each persistent volume is named using the backend name, the namespace, and the persistent volume claim (PVC) name (RequestName).
Note: Customizable volume names are compatible with ONTAP on-premises drivers only. These volume names do not apply to existing volumes.
Note: The following backend definition files use the "StoragePrefix" attribute in the name template. The default value of StoragePrefix is "trident". For example, with the ocp-nfs-backend backend and a PVC named test-pvc in the default namespace, the resulting ONTAP volume name would be similar to trident_ocp_nfs_backend_default_test_pvc (characters that are not valid in ONTAP volume names, such as hyphens, are replaced).
cat backend_NFS.yaml
---
version: 1
storageDriverName: ontap-nas
backendName: ocp-nfs-backend
managementLIF: 10.102.2.50
dataLIF: 192.168.52.51
svm: OCP-SVM
username: vsadmin
password: <password>
useREST: true
defaults:
  spaceReserve: none
  exportPolicy: default
  snapshotPolicy: default
  snapshotReserve: '5'
  nameTemplate: "{{.config.StoragePrefix}}_{{.config.BackendName}}_{{.volume.Namespace}}_{{.volume.RequestName}}"
cat backend_NFS_flexgroup.yaml
---
version: 1
storageDriverName: ontap-nas-flexgroup
backendName: ocp-nfs-flexgroup
managementLIF: 10.102.2.50
dataLIF: 192.168.52.51
svm: OCP-SVM
username: vsadmin
password: <password>
useREST: true
defaults:
  spaceReserve: none
  exportPolicy: default
  snapshotPolicy: default
  snapshotReserve: '5'
  nameTemplate: "{{.config.StoragePrefix}}_{{.config.BackendName}}_{{.volume.Namespace}}_{{.volume.RequestName}}"
cat backend_iSCSI.yaml
---
version: 1
storageDriverName: ontap-san
backendName: ocp-iscsi-backend
managementLIF: 10.102.2.50
svm: OCP-SVM
sanType: iscsi
useREST: true
username: vsadmin
password: <password>
defaults:
  spaceReserve: none
  spaceAllocation: 'false'
  snapshotPolicy: default
  snapshotReserve: '5'
  nameTemplate: "{{.config.StoragePrefix}}_{{.config.BackendName}}_{{.volume.Namespace}}_{{.volume.RequestName}}"
cat backend_NVMe.yaml
---
version: 1
backendName: ocp-nvme-backend
storageDriverName: ontap-san
managementLIF: 10.102.2.50
svm: OCP-SVM
username: vsadmin
password: <password>
sanType: nvme
useREST: true
defaults:
  spaceReserve: none
  snapshotPolicy: default
  snapshotReserve: '5'
  nameTemplate: "{{.config.StoragePrefix}}_{{.config.BackendName}}_{{.volume.Namespace}}_{{.volume.RequestName}}"
Step 3. Activate the storage backends for all storage protocols in your FlexPod:
tridentctl -n trident create backend -f backend_NFS.yaml
tridentctl -n trident create backend -f backend_NFS_flexgroup.yaml
tridentctl -n trident create backend -f backend_iSCSI.yaml
tridentctl -n trident create backend -f backend_NVMe.yaml
tridentctl -n trident get backend
+-------------------+---------------------+--------------------------------------+--------+------------+-----
| NAME | STORAGE DRIVER | UUID | STATE | USER-STATE | VOLU
+-------------------+---------------------+--------------------------------------+--------+------------+-----
| ocp-nfs-backend | ontap-nas | 6bcb2421-a148-40bb-b7a4-9231e58efc2a | online | normal |
| ocp-nfs-flexgroup | ontap-nas-flexgroup | 68428a01-c5e6-4676-8cb5-e5521fc04bc7 | online | normal |
| ocp-iscsi-backend | ontap-san | bbf1664d-1615-42d3-a5ed-1b8aed995a42 | online | normal |
| ocp-nvme-backend | ontap-san | 2b6861a2-6980-449a-b718-97002079e7f3 | online | normal |
+-------------------+---------------------+--------------------------------------+--------+------------+-----
Step 4. Create the following Storage Class files:
cat storage-class-ontap-nfs.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ontap-nfs
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.trident.netapp.io
parameters:
  backendType: "ontap-nas"
  provisioningType: "thin"
  snapshots: "true"
allowVolumeExpansion: true
cat storage-class-ontap-nfs-flexgroup.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ontap-nfs-flexgroup
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: csi.trident.netapp.io
parameters:
  backendType: "ontap-nas-flexgroup"
  provisioningType: "thin"
  snapshots: "true"
allowVolumeExpansion: true
cat storage-class-ontap-iscsi.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ontap-iscsi
parameters:
  backendType: "ontap-san"
  sanType: "iscsi"
  provisioningType: "thin"
  snapshots: "true"
allowVolumeExpansion: true
provisioner: csi.trident.netapp.io
cat storage-class-ontap-nvme.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ontap-nvme-tcp
parameters:
  backendType: "ontap-san"
  sanType: "nvme"
  provisioningType: "thin"
  snapshots: "true"
allowVolumeExpansion: true
provisioner: csi.trident.netapp.io
Step 5. Create the storage classes:
oc create -f storage-class-ontap-nfs.yaml
oc create -f storage-class-ontap-nfs-flexgroup.yaml
oc create -f storage-class-ontap-iscsi.yaml
oc create -f storage-class-ontap-nvme.yaml
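The storage classes can then be listed to confirm that they were created and that ontap-nfs is marked as the default class:
oc get storageclass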
Step 6. Create a VolumeSnapshotClass file:
cat ontap-volumesnapshot-class.yaml
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ontap-snapclass
driver: csi.trident.netapp.io
deletionPolicy: Delete
Step 7. Create the VolumeSnapshotClass using the above file.
oc create -f ontap-volumesnapshot-class.yaml
Step 8. Create a test PersistentVolumeClaim (PVC). In the OpenShift console, click Storage > PersistentVolumeClaims. Select an appropriate project (for example, default) or create a new project and select it. On the right, click Create PersistentVolumeClaim.
Step 9. Select a StorageClass and give the PVC a name. Select an Access mode (RWO or RWX for NFS classes, and RWO for iSCSI or NVMe-TCP classes). Set a size and select a Volume mode (normally Filesystem). Click Create to create the PVC. For illustration, we created a test PVC using the "ontap-nvme-tcp" storage class.
Step 10. Wait for the PVC to have a status of Bound. The PVC can now be attached to a container.
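Alternatively, the test PVC can be created from the command line on the OCP-Installer VM. The following is a minimal sketch; the file name test-pvc.yaml, the PVC name test-pvc, and the 10Gi size are illustrative values, and the ontap-nvme-tcp storage class created above is assumed:
cat test-pvc.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: ontap-nvme-tcp
oc create -f test-pvc.yaml
oc get pvc -n default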
Step 11. Create a NetApp volume snapshot of the PVC by clicking the … to the right of the PVC and selecting Create snapshot. Adjust the snapshot name and click Create. The snapshot will appear under VolumeSnapshots and can also be seen in NetApp ONTAP System Manager under the corresponding PV with a modified name.
Note: Make sure the volume name for the PV matches the volume name mapping from the backend configuration above.
Step 12. Delete the test PVC and snapshot by first selecting the Snapshot under Storage > VolumeSnapshots and clicking the … to the right of the snapshot and selecting Delete VolumeSnapshot followed by Delete. Select the PVC under Storage > PersistentVolumeClaims and click the … to the right of the PVC and select Delete PersistentVolumeClaim and click Delete.
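The snapshot creation and cleanup can also be performed from the command line. The following sketch assumes the illustrative test-pvc PVC and the ontap-snapclass VolumeSnapshotClass created above:
cat test-pvc-snapshot.yaml
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test-pvc-snapshot
  namespace: default
spec:
  volumeSnapshotClassName: ontap-snapclass
  source:
    persistentVolumeClaimName: test-pvc
oc create -f test-pvc-snapshot.yaml
oc get volumesnapshot -n default
oc delete volumesnapshot test-pvc-snapshot -n default
oc delete pvc test-pvc -n default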
NetApp DataOps Toolkit version 2.5.0 is currently compatible with Kubernetes versions 1.20 and above and OpenShift versions 4.7 and above.
The toolkit is currently compatible with Trident versions 20.07 and above. Additionally, the toolkit is compatible with the following Trident backend types used in this validation:
● ontap-nas
● ontap-nas-flexgroup
More operations and capabilities about NetApp DataOps Toolkit are available and documented here: https://github.com/NetApp/netapp-dataops-toolkit
Prerequisites
The NetApp DataOps Toolkit for Kubernetes requires that Python 3.8 or above be installed on the local host. Additionally, the toolkit requires that pip for Python3 be installed on the local host. For more details regarding pip, including installation instructions, refer to the pip documentation.
Procedure 1. NetApp DataOps Toolkit Installation
Step 1. To install the NetApp DataOps Toolkit for Kubernetes on the OCP-Installer VM, run the following commands:
sudo dnf install python3.11
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3.11 get-pip.py
rm get-pip.py
python3.11 -m pip install netapp-dataops-k8s
The NetApp DataOps Toolkit is used to create a JupyterLab workspace, clone a JupyterLab workspace, create a snapshot of a JupyterLab workspace, and so on.
Note: You can use NetApp DataOps Toolkit to create Jupyter notebooks in this solution. For more information, go to: Create a new JupyterLab workspace.
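For example, existing JupyterLab workspaces can be listed and a snapshot of a workspace can be created with commands similar to the following (a sketch; verify the subcommand names against the toolkit documentation):
netapp_dataops_k8s_cli.py list jupyterlabs
netapp_dataops_k8s_cli.py create jupyterlab-snapshot --workspace-name=<workspace-name>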
Add an Additional Administrative User to the OpenShift Cluster
It is recommended to add a permanent administrative user to the OpenShift cluster to provide an alternative to logging in with the "temporary" kubeadmin user. This section shows how to create and install an HTPasswd identity provider with an admin user. Other identity providers are also available.
Procedure 1. Add the admin User
Step 1. On the OCP-Installer VM in the auth directory where the kubeadmin-password and kubeconfig files are stored, create an admin.htpasswd file by typing:
htpasswd -c -B -b ./admin.htpasswd admin <password>
Adding password for user admin
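If additional users are needed, they can be appended to the same file. Note that the -c option creates (or overwrites) the file, so it is omitted when adding users; the user name shown is illustrative:
htpasswd -B -b ./admin.htpasswd <additional-user> <password>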
Step 2. Using Chrome or Firefox on the OCP-Installer VM, connect to the OpenShift console with the kubeadmin user. In the blue banner near the top of the page, click cluster OAuth configuration.
Step 3. Use the Add pulldown under Identity providers to select HTPasswd. Click Browse and browse to the admin.htpasswd file created above. Highlight the file and click Select. Click Add. The htpasswd should now show up as an Identity provider.
Step 4. Click View authentication conditions for reconfiguration status and wait for the status to become Available.
Step 5. Log out of the cluster and log back in with htpasswd and the admin user. Click Skip tour and log out of the cluster.
Step 6. Log back into the cluster with kube:admin and the kubeadmin user. Select User Management > Users, then select the admin user. Select the RoleBindings tab and click Create binding.
Step 7. Select Cluster-wide role binding and name the RoleBinding admin-cluster-admin. From the drop-down list under Role name, select the cluster-admin role. Click Create.
Step 8. Select User Management > Users, then select the admin user. Select the RoleBindings tab. Click the ellipses to the right of the user-settings RoleBinding to delete that RoleBinding, leaving only the cluster-admin RoleBinding.
Step 9. You can now log out of the cluster and log back in with htpasswd and the admin user. At the top left, select the Administrator role. You now have full cluster-admin access to the cluster.
etcd is the key-value store for OpenShift, which persists the state of all resource objects.
For more information, see: https://docs.openshift.com/container-platform/4.16/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html.
Procedure 1. Back up etcd using a script
Assuming that the OCP-Installer VM is backed up regularly, OpenShift etcd backups can be taken on a schedule and stored on the OCP-Installer VM.
Step 1. On the OCP-Installer VM, create an etcd-backup directory under the ocp directory, and create an etcd-backups directory inside it to store the backups:
cd
cd ocp
mkdir etcd-backup
cd etcd-backup
mkdir etcd-backups
Note: For more robust storage of etcd backups, an NFS volume can be created on the NetApp storage and mounted as etcd-backups in the example above.
Step 2. The following script can be created and made executable to create and save the etcd backup:
cat etcd-backup-script
#! /usr/bin/bash
# Run the cluster backup script on a control-plane node
ssh core@<control0 ip> sudo /usr/local/bin/cluster-backup.sh /home/core/assets/backup
# Make the backup files readable so they can be copied off the node
ssh core@<control0 ip> sudo chmod 644 /home/core/assets/backup/*
# Copy the backup files to the OCP-Installer VM and remove them from the node
scp core@<control0 ip>:/home/core/assets/backup/* /home/admin/ocp/etcd-backup/etcd-backups/
ssh core@<control0 ip> sudo rm /home/core/assets/backup/*
# Restrict permissions on the local copies and prune backups older than 30 days
chmod 600 /home/admin/ocp/etcd-backup/etcd-backups/*
find /home/admin/ocp/etcd-backup/etcd-backups -type f -mtime +30 -delete
Note: This script deletes backups over 30 days old.
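The script can be made executable and run once manually to confirm that backup files land in the etcd-backups directory before it is scheduled in cron:
chmod +x /home/admin/ocp/etcd-backup/etcd-backup-script
/home/admin/ocp/etcd-backup/etcd-backup-script
ls -l /home/admin/ocp/etcd-backup/etcd-backups/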
Step 3. Using sudo, add execution of this script to /etc/crontab:
cat /etc/crontab
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
# For details see man 4 crontabs
# Example of job definition:
# .---------------- minute (0 - 59)
# | .------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# | | | | |
# * * * * * user-name command to be executed
0 2 * * * admin /home/admin/ocp/etcd-backup/etcd-backup-script
Note: This example backs up etcd data daily at 2:00 am.
Step 4. In the event that an etcd restore is needed, the appropriate backup files need to be copied from the OCP-Installer VM back to a working control-plane node:
ssh core@<control0 ip> sudo scp admin@<ocp installer vm IP>:/home/admin/ocp/etcd-backup/etcd-backups/snapshot_2024-11-12_165737.db /home/core/assets/backup/
ssh core@<control0 ip> sudo scp admin@<ocp installer vm IP>:/home/admin/ocp/etcd-backup/etcd-backups/static_kuberesources_2024-11-12_170543.tar.gz /home/core/assets/backup/
Step 5. To recover the cluster, see https://docs.openshift.com/container-platform/4.17/hosted_control_planes/hcp_high_availability/hcp-recovering-etcd-cluster.html#hcp-recovering-etcd-cluster.
Add a Worker Node to an OpenShift Cluster
It is often necessary to scale up an OpenShift cluster by adding worker nodes to the cluster. This set of procedures describes the steps to add a node to the cluster. These procedures require a Cisco UCS Server connected to a set of Fabric Interconnects with all VLANs in the Server Profile configured.
Procedure 1. Deploy a Cisco UCS Server Profile
Deploy a Cisco UCS Server Profile in Cisco Intersight.
Step 1. Depending on the type of server added (Cisco UCS X-Series or Cisco UCS C-Series), clone the existing OCP-Worker template and create and adjust the template according to the server type.
Step 2. From the Configure > Templates page, to the right of the OCP-Worker template setup above, click the … and select Derive Profiles.
Step 3. Under the Server Assignment, select Assign Now and select the Cisco UCS server that will be added to the cluster as a Worker Node. Click Next.
Step 4. Assign the Server Profile an appropriate Name (for example, ocp-worker3) and select the appropriate Organization. Click Next.
Step 5. Click Derive.
Step 6. From the Infrastructure Service > Profiles page, to the right of the just-created profile, click the … and select Deploy. Select Reboot Immediately to Activate and click Deploy.
Step 7. Wait until the profile deploys and activates.
Step 8. Click the server profile, go to the Configuration > Identifiers and Inventory tabs, and note the server's management IP address, serial number, and the MAC address of network interface eno5.
Procedure 2. Create the Bare Metal Host (BMH)
Step 1. On the OCP-Installer VM, create the following YAML file (the example shown is for worker node worker3.<domain-name>.<base-domain>):
cat bmh.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: worker3-bmc-secret
  namespace: openshift-machine-api
type: Opaque
data:
  username: ZmxleGFkbWlu
  password: SDFnaJQwbJQ=
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker3.ocp.flexpodb4.cisco.com
  namespace: openshift-machine-api
spec:
  online: true
  bootMACAddress: 00:25:B5:A2:0A:1B
  bmc:
    address: redfish://10.102.2.238/redfish/v1/Systems/WZP27020EG1
    credentialsName: worker3-bmc-secret
    disableCertificateVerification: true
  customDeploy:
    method: install_coreos
  externallyProvisioned: true
Note: The username and password shown in this file are base64 encoded and can be obtained by typing “echo -ne <username> | base64”. In this case typing “echo -ne flexadmin | base64” yielded ZmxleGFkbWlu.
Note: Also note the bmc address. In this case redfish is used to connect to the server. The URL has the server serial number at the end of the URL. If you would like to use IPMI over LAN instead of redfish, just put the server’s management IP for the bmc address.
Step 2. Create the Bare Metal Host by typing the following:
oc project openshift-machine-api
oc create -f bmh.yaml
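The BareMetalHost and its BMC secret can also be checked from the command line (bmh is the short name for the BareMetalHost resource):
oc get bmh -n openshift-machine-api
oc get secret worker3-bmc-secret -n openshift-machine-api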
Step 3. Verify that the BMH is created by selecting Compute > Bare Metal Hosts in the OpenShift Console.
Note: With this method of creating the BMH, the server is not inspected and some details such as Serial Number, Network Interfaces, and Disks are not retrieved from the server, but the Power Management functions do work.
Step 4. In the OpenShift Console, select Compute > MachineSets. Click the … to the right of the worker MachineSet and choose Edit Machine count. Use the plus sign to increase the count by one. Click Save.
Step 5. Click Compute > Machines. A new machine in the Provisioning phase should now appear in the list.
Procedure 3. Install Red Hat CoreOS on the New Worker
Step 1. Connect to the Red Hat Hybrid Cloud Console here: https://console.redhat.com/openshift/overview and log in with your Red Hat credentials. On the left, select Cluster List. Under Cluster List, click your cluster to open it.
Step 2. Select the Add Hosts tab. Click Add hosts.
Step 3. Do not change the field settings and click Next.
Step 4. For Provisioning type, select Full image file. Browse to and select the SSH public key file used in the original cluster installation. Click Generate Discovery ISO.
Step 5. If your Cisco UCS Servers have the Intersight Advantage license installed, follow the procedure from Step 12 to use the Cisco Intersight workflow to boot the server with the Discovery ISO. Then skip to Step 15.
Step 6. Click Download Discovery ISO. The file will download to your machine. Click Close.
Note: This is a slightly different ISO than the one used to install the cluster and must be downloaded to successfully add a node.
Step 7. Place the downloaded Discovery ISO on your HTTP or HTTPS web server and use a web browser to obtain the URL of the ISO.
Step 8. In Cisco Intersight, edit the Virtual Media Policy that is part of the server profile. On the Policy Details page, select Add Virtual Media.
Step 9. In the Add Virtual Media dialogue, leave CDD selected and select HTTP/HTTPS. Provide a name for the mount and add the URL for File Location.
Step 10. Click Add.
Step 11. Click Save.
Step 12. Under Infrastructure Service > Profiles, click the three dots to the right of the newly added worker server profile and select Deploy. Select only the bottom checkbox and select Deploy.
Note: It is not necessary to redeploy the remaining server profiles. The Inconsistent status will be resolved after CoreOS is installed on the newly added worker.
Step 13. Click the … to the right of the newly added worker profile and select Server Actions > Power > Power Cycle. In the popup, click Power Cycle. The reboot from the Discovery ISO can be monitored with a vKVM Console (Server Actions > Launch vKVM).
Step 15. Click Install ready hosts. The installation of CoreOS will take several minutes.
Note: Once the CoreOS installation completes (Status of Installed), the server will reboot, boot CoreOS, and reboot a second time.
Step 16. In Cisco Intersight, edit the vMedia policy and remove the virtual media mount. From the Infrastructure Service > Profiles page, redeploy the profile for the newly added worker without rebooting the host. The Inconsistent state on the remaining profiles should be cleared.
Step 17. In the OpenShift Console, select Compute > Nodes. Once the server reboots have completed, the newly added worker will appear in the list as Discovered. Click Discovered and then select Approve. Click Not Ready and select Approve.
Step 18. To link the Bare Metal Host to the Machine, select Compute > Machines. For the newly-added machine in the Provisioning Phase, note the last five characters in the machine name (for example, bqz2k).
Step 19. Select Compute > Bare Metal Hosts. Select the BMH above the newly added BMH (for example, worker2). Select the YAML tab. Select and copy the entire consumerRef field right underneath the externallyProvisioned field.
Step 20. Select Compute > Bare Metal Hosts. Select the BMH for the newly added BMH (for example, worker3). Select the YAML tab. Place the cursor at the end of the externallyProvisioned: true line and press Enter to insert a new line. Backspace to the beginning of the line and then paste in the consumerRef field from the previous step. Replace the last five characters in the name field with the five characters noted above (for example, bqz2k).
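After pasting and editing, the end of the spec section of the new BMH should look similar to the following sketch; the machine name is illustrative and must match the actual machine name noted earlier:
spec:
  externallyProvisioned: true
  consumerRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: Machine
    name: <machineset-name>-bqz2k
    namespace: openshift-machine-api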
Step 21. Click Save. Click Compute > Machines. The newly added machine should now be in the Provisioned Phase.
Step 22. To link this machine to the node, click this newly added machine and select the YAML tab. Under spec, select and copy the entire providerID line.
Step 23. Select Compute > Nodes. Select the newly-added node and select the YAML tab. Scroll down to find the spec field. Select and delete the {} to the right of spec: and press Enter to add a line. Paste in the providerID field with a two space indention and click Save.
Note: The OpenShift nodes update frequently, and it will be necessary if an update has occurred to reload the YAML tab. After reloading, you may need to make the changes again.
Step 24. Select Compute > Bare Metal Hosts. The newly-added BMH should now be linked to a node.
Step 25. Select Compute > Machines. The newly-added machine should now be in the Provisioned as node Phase and should be linked to the node.
Deploy a Sample Containerized Application
To demonstrate the installation of Red Hat OpenShift on Bare Metal on FlexPod Datacenter, a sample containerized application can be installed and run. In this case Stable Diffusion XL 1.0 will be run utilizing the Intel CPUs. If you have NVIDIA GPUs installed, refer to FlexPod Datacenter with Generative AI Inferencing - Cisco for details on deploying Stable Diffusion utilizing an NVIDIA GPU. This installation will use NetApp DataOps Toolkit, installed above, to install a Jupyter Notebook and then Intel OpenVino to run Stable Diffusion XL.
Procedure 1. Deploy Jupyter Notebook
Step 1. From the OCP-Installer VM, run the following command to deploy a JupyterLab workspace with a 90Gi NFS persistent volume, no GPUs, and the latest PyTorch container available at this time:
netapp_dataops_k8s_cli.py create jupyterlab --workspace-name=sd-xl -c ontap-nfs --size=90Gi --nvidia-gpu=0 -i nvcr.io/nvidia/pytorch:25.03-py3
Step 2. Enter and verify a password for the notebook. The notebook is created in the ‘default’ namespace. The deployment will take a few minutes to reach Ready state:
Setting workspace password (this password will be required in order to access the workspace)...
Enter password:
Verify password:
Creating persistent volume for workspace...
Creating PersistentVolumeClaim (PVC) 'ntap-dsutil-jupyterlab-sd-xl' in namespace 'default'.
PersistentVolumeClaim (PVC) 'ntap-dsutil-jupyterlab-sd-xl' created. Waiting for Kubernetes to bind volume to PVC.
Volume successfully created and bound to PersistentVolumeClaim (PVC) 'ntap-dsutil-jupyterlab-sd-xl' in namespace 'default'.
Creating Service 'ntap-dsutil-jupyterlab-sd-xl' in namespace 'default'.
Service successfully created.
Creating Deployment 'ntap-dsutil-jupyterlab-sd-xl' in namespace 'default'.
Deployment 'ntap-dsutil-jupyterlab-sd-xl' created.
Waiting for Deployment 'ntap-dsutil-jupyterlab-sd-xl' to reach Ready state.
Deployment successfully created.
Workspace successfully created.
To access workspace, navigate to http://10.102.2.211:31809
Step 3. Once the Workspace is successfully created, use a Web browser on a machine with access to the Baremetal subnet to connect to the provided URL. Log in with the password provided.
Step 4. Click the Terminal icon to launch a terminal in the PyTorch container. The Stable Diffusion XL 1.0 model by default is stored in /root/.cache. To redirect this to the persistent storage (mounted on /workspace), run the following commands:
mkdir /workspace/.cache
cp -R /root/.cache/* /workspace/.cache/
rm -rf /root/.cache
ln -s /workspace/.cache /root/.cache
Step 5. Install Diffusers and OpenVino:
pip install --upgrade diffusers transformers scipy accelerate
pip install optimum[openvino]
pip install openvino==2024.6.0
Step 6. Click the + icon to add a window and select Python File. Add the following:
from optimum.intel import OVStableDiffusionXLPipeline
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipeline = OVStableDiffusionXLPipeline.from_pretrained(model_id)
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k resolution"
image = pipeline(prompt).images[0]
image.save("astronaut_intel.png")
Step 7. Right-click untitled.py and select Rename Python File. Name the file Run-SDXL.py and choose Rename. Click the x to the right of Run-SDXL.py to close the file and click Save.
Step 8. In the Terminal window, run Stable Diffusion XL by typing python Run-SDXL.py. On the first run, the Stable Diffusion XL model will be downloaded to persistent storage. Subsequent runs will take less time.
Step 9. Once the run is complete, double-click the astronaut_intel.png file from the list on the left.
Step 10. From the OpenShift console, on the left click Workloads > Pods. In the center pane, from the drop-down list select the default Project.
Step 11. On the left, select Deployments. In the center pane, select the JupyterLab Deployment and then select the YAML tab. This information can be used as a guide for creating a YAML file to deploy a pod from the command line with "oc". The YAML can also be modified to customize the deployment. If you edit the deployment, you will need to delete the corresponding pod so that a new container is spun up, and then re-create the symbolic link and reinstall the Python libraries with pip.
John George, Technical Marketing Engineer, Cisco Systems, Inc.
John has been involved in designing, developing, validating, and supporting the FlexPod Converged Infrastructure since it was developed more than 13 years ago. Before his role with FlexPod, he supported and administered a large, worldwide training network and VPN infrastructure. John holds a master’s degree in Computer Engineering from Clemson University.
Kamini Singh, Technical Marketing Engineer, Hybrid Cloud Infra & OEM Solutions, NetApp
Kamini Singh is a Technical Marketing engineer at NetApp. She has more than five years of experience in data center infrastructure solutions. Kamini focuses on FlexPod hybrid cloud infrastructure solution design, implementation, validation, automation, and sales enablement. Kamini holds a bachelor’s degree in Electronics and Communication and a master’s degree in Communication Systems.
Acknowledgements
For their support and contribution to the design, validation, and creation of this Cisco Validated Design, the authors would like to thank:
● Archana Sharma, Technical Marketing Engineer, Cisco Systems, Inc.
● Paniraja Koppa, Technical Marketing Engineer, Cisco Systems, Inc.
Feedback
For comments and suggestions about this guide and related guides, join the discussion on Cisco Community at https://cs.co/en-cvds.
CVD Program
ALL DESIGNS, SPECIFICATIONS, STATEMENTS, INFORMATION, AND RECOMMENDATIONS (COLLECTIVELY, "DESIGNS") IN THIS MANUAL ARE PRESENTED "AS IS," WITH ALL FAULTS. CISCO AND ITS SUPPLIERS DISCLAIM ALL WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OR ARISING FROM A COURSE OF DEALING, USAGE, OR TRADE PRACTICE. IN NO EVENT SHALL CISCO OR ITS SUPPLIERS BE LIABLE FOR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, OR INCIDENTAL DAMAGES, INCLUDING, WITHOUT LIMITATION, LOST PROFITS OR LOSS OR DAMAGE TO DATA ARISING OUT OF THE USE OR INABILITY TO USE THE DESIGNS, EVEN IF CISCO OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
THE DESIGNS ARE SUBJECT TO CHANGE WITHOUT NOTICE. USERS ARE SOLELY RESPONSIBLE FOR THEIR APPLICATION OF THE DESIGNS. THE DESIGNS DO NOT CONSTITUTE THE TECHNICAL OR OTHER PROFESSIONAL ADVICE OF CISCO, ITS SUPPLIERS OR PARTNERS. USERS SHOULD CONSULT THEIR OWN TECHNICAL ADVISORS BEFORE IMPLEMENTING THE DESIGNS. RESULTS MAY VARY DEPENDING ON FACTORS NOT TESTED BY CISCO.
CCDE, CCENT, Cisco Eos, Cisco Lumin, Cisco Nexus, Cisco StadiumVision, Cisco TelePresence, Cisco WebEx, the Cisco logo, DCE, and Welcome to the Human Network are trademarks; Changing the Way We Work, Live, Play, and Learn and Cisco Store are service marks; and Access Registrar, Aironet, AsyncOS, Bringing the Meeting To You, Catalyst, CCDA, CCDP, CCIE, CCIP, CCNA, CCNP, CCSP, CCVP, Cisco, the Cisco Certified Internetwork Expert logo, Cisco IOS, Cisco Press, Cisco Systems, Cisco Systems Capital, the Cisco Systems logo, Cisco Unified Computing System (Cisco UCS), Cisco UCS B-Series Blade Servers, Cisco UCS C-Series Rack Servers, Cisco UCS S-Series Storage Servers, Cisco UCS X-Series, Cisco UCS Manager, Cisco UCS Management Software, Cisco Unified Fabric, Cisco Application Centric Infrastructure, Cisco Nexus 9000 Series, Cisco Nexus 7000 Series. Cisco Prime Data Center Network Manager, Cisco NX-OS Software, Cisco MDS Series, Cisco Unity, Collaboration Without Limitation, EtherFast, EtherSwitch, Event Center, Fast Step, Follow Me Browsing, FormShare, GigaDrive, HomeLink, Internet Quotient, IOS, iPhone, iQuick Study, LightStream, Linksys, MediaTone, MeetingPlace, MeetingPlace Chime Sound, MGX, Networkers, Networking Academy, Network Registrar, PCNow, PIX, PowerPanels, ProConnect, ScriptShare, SenderBase, SMARTnet, Spectrum Expert, StackWise, The Fastest Way to Increase Your Internet Quotient, TransPath, WebEx, and the WebEx logo are registered trademarks of Cisco Systems, Inc. and/or its affiliates in the United States and certain other countries. (LDW_P3)
All other trademarks mentioned in this document or website are the property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and any other company. (0809R)