SMI Cluster Manager Operations

Overview

The Subscriber Microservices Infrastructure (SMI) Kubernetes (K8s) Cluster Manager allows you to deploy the virtual machines (VMs), provisioned through vCenter, in your VMware environment. You can deploy the SMI K8s Cluster Manager in either All-in-one (AIO) or Multimode configurations. In the Multimode configuration, the SMI K8s Cluster Manager can manage multiple clusters, with the minimum being three clusters. Also, the in-built Ingress support in K8s allows you to access the K8s cluster over HTTP.

The SMI K8s cluster includes the following VMs:

  • Control Plane – The stateless control plane VMs.

  • ETCD – The database for the K8s metadata. It is separated from the control plane VMs.

  • OAM – The OAM nodes reserved for CEE. These nodes perform monitoring and logging.

  • APP – The SMI applications such as SMF, PCF, and so on.

The subsequent sections provide more information about the different components in the SMI K8s Cluster Manager.

Sync API

The Sync API is responsible for deploying the VMs and for provisioning the base OS, the applications, and the application host OS. It also provisions K8s and its associated add-ons in both the VMware and OpenStack environments.

For application provisioning, the Sync API allows you to install the applications and configure them through their own APIs. All the 5G applications come equipped with their own Ops Centers. The SMI K8s Cluster Manager allows you to deploy these application Ops Centers on top of it.


Note


The SMI supports only the UTC time zone by default. Use this time zone for all your deployment and installation activities related to the SMI.


Offline Images

All the applications (5G and Cisco Cable) based on the SMI framework are provided as helm charts (templating for K8s) and docker images. The Helm charts provide the metadata and docker images provide the application containers in the K8s cluster.

For the offline deployment, the docker images and helm charts are provided as tarballs for loading into the K8s cluster.
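
For example, a software definition can point at such a tarball with a file URL once it has been copied onto the Cluster Manager. The following is a minimal sketch in the style of the samples later in this document; the package name and path are illustrative:

  software cnf cee
   url file:////data/software/local-file/cee-2020-02-0-i04.tar.gz #illustrative package name and local path
   sha256 <sha256_hash>
  exit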

For online deployment, anyone with access to devhub.cisco.com can pull these docker images and helm charts through the CLI. Also, you can customize specific applications that are loaded into the SMI K8s Cluster Manager using the customization (docker) image.

Command Line Interface

The SMI K8s Cluster Manager Command Line Interface (CLI) consists of two major components: Environment and Clusters.

  • Environment – The environment defines the deployment environment to be used. It has two options: vCenter or Manual. For OpenStack environments, select the Manual option in the CLI to skip deploying the VMs onto the cluster.

  • Cluster – You can link the SMI K8s Cluster Manager to a specific cluster through the CLI. In addition, you can also define cluster configurations such as the virtual IP address, size, Kubernetes add-ons, and so on.

To view the current running configuration, use the show running-config command in the CLI.
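
For example, for the bare metal samples used later in this document, the environment portion of the output might look like the following minimal sketch (names are illustrative):

  SMI Cluster Manager# show running-config environments
  environments bare-metal
   ucs-server
  exit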

Operating the SMI Cluster Manager on Bare Metal

The SMI Cluster Manager manages the configuration and life-cycle of the Kubernetes Cluster and the VNF Nodes (UPF) deployed on Bare Metal servers. The subsequent sections describe the operations involved in deploying the remote clusters on Bare Metal (Cisco UCS) servers.

Deploying Remote Clusters

This section describes the procedure to deploy a remote Kubernetes and UPF cluster using the SMI Cluster Manager on Cisco UCS servers.

Deploying Kubernetes Cluster

You can deploy a Kubernetes Cluster when a Cluster Manager (HA or AIO or Inception) is already available. To deploy a Kubernetes Cluster:

  1. Setup the cluster configuration.

    • The following is a sample UCS Configuration for deploying a Kubernetes Cluster:


      Important


      When upgrading to Ubuntu 22.04, use routes gateway_address instead of gateway4 gateway_address in your network configuration.


      software cnf cee 
       url <repo_url> 
       user <user_name> 
       password <password> 
       sha256 <sha256_hash> 
      exit 
      
       # associating to Bare Metal environment
      environments bare-metal 
       ucs-server 
       exit 
      
      # General cluster configuration
      clusters <cluster_name> 
        environment bare-metal 
        addons ingress bind-ip-address <bind_ip_address> 
        addons cpu-partitioner enabled 
        configuration master-virtual-ip <master_vip> 
        configuration master-virtual-ip-interface <master_vip_interface_name> #For example, eno1 
        configuration allow-insecure-registry true 
       node-defaults initial-boot default-user <username> #For example, cloud-user 
       node-defaults initial-boot default-user-ssh-public-key  
        "<SSH_Public_Key>"
       node-defaults initial-boot default-user-password <password> 
       node-defaults netplan template  
       node-defaults initial-boot netplan ethernets eno1
        dhcp4    false
        dhcp6    false
        gateway4 <gateway_ipv4address>
        nameservers search [ <domain_name> ]
        nameservers addresses [ <ipv4address>...<ipv4address> ]
       exit 
       node-defaults k8s ssh-username <username> 
       node-defaults k8s ssh-connection-private-key 
       "<SSH_Private_Key>" 
       #initial-boot section of node-defaults
         node-defaults ucs-server host initial-boot networking static-ip netmask <ipv4_address> 
         node-defaults ucs-server host initial-boot networking static-ip gateway <ipv4_address> 
         node-defaults ucs-server host initial-boot networking static-ip dns <ipv4_address> 
         node-defaults ucs-server cimc user <username> 
         node-defaults ucs-server cimc password <password> 
         node-defaults ucs-server cimc remote-management sol enabled 
         node-defaults ucs-server cimc remote-management sol baud-rate <baud_rate> 
         node-defaults ucs-server cimc remote-management sol comport <com_port>  
         node-defaults ucs-server cimc remote-management sol ssh-port <ssh_port> 
         node-defaults ucs-server cimc networking ntp enabled 
         node-defaults ucs-server cimc networking ntp servers <ntp_server_url> 
         exit 
      
       node-defaults os proxy https-proxy <proxy_server> 
       node-defaults os proxy no-proxy <proxy_servers> 
       node-defaults os ntp enabled 
       node-defaults os ntp servers <ntp_server_url> #For example, ntp.esl.cisco.com
       #For example, node-defaults os proxy https-proxy http://proxy-wsa.esl.cisco.com:80
       exit 
        
      #node configuration
        ucs-server host initial-boot networking static-ip ipv4-address <ipv4address> 
        ucs-server cimc ip-address <ipv4address> 
        ucs-server cimc storage-adaptor create-virtual-drive true 
        exit 
       
      # control plane node configuration
      nodes <control_plane_node_name> #For example, control-plane-1 
        k8s node-type control-plane 
        k8s ssh-ip <ipv4address>  
        k8s node-labels <node_label/node_type> #For example, smi.cisco.com/oam 
        exit 
        ucs-server host initial-boot networking static-ip ipv4-address <ipv4address> 
        ucs-server cimc ip-address <ipv4address> 
        ucs-server cimc storage-adaptor create-virtual-drive true 
        exit 
       nodes <control_plane_node_name> #For example, control-plane-2 
        k8s node-type control-plane 
        k8s ssh-ip <ipv4address> 
        k8s node-labels <node_label/node_type> #For example, smi.cisco.com/oam 
        exit 
        ucs-server host initial-boot networking static-ip ipv4-address <ipv4address>  
        ucs-server cimc ip-address <ipv4address> 
        ucs-server cimc storage-adaptor create-virtual-drive true 
        exit 
       nodes <control_plane_node_name> #For example, control-plane-3 
        k8s node-type control-plane 
        k8s ssh-ip <ipv4address> 
        k8s node-labels <node_label/node_type>  #For example, smi.cisco.com/node-type oam 
        exit 
        ucs-server host initial-boot networking static-ip ipv4-address <ipv4address>  
        ucs-server cimc ip-address <ipv4address> 
        ucs-server cimc storage-adaptor create-virtual-drive true 
        exit 
       ops-centers cee <ops_center_name> #For example, cee 
        repository-local <repo_name> cee-2020-02-0-i04 
       exit 
      exit 
      
  2. Login to the Cluster Manager CLI and enter the configuration mode.

    • Add the Kubernetes Cluster configuration to deploy the Kubernetes Cluster.


      Note


      A sample Kubernetes Cluster Configuration is provided here.


  3. Commit the configuration.

  4. Run the synchronization.

    clusters cluster_name actions sync run debug true 
  5. Monitor the progress of the synchronization.

    monitor sync-logs cluster_name 

    Note


    The synchronization takes approximately 30 minutes to complete. The time taken for synchronization depends on network speed, VM power, and so on.


The node names are added to /etc/hosts as part of the sync process. You can connect to the nodes using the node name from the control plane node.
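
For example, assuming the default user and SSH keys configured under node-defaults, you can open a session to another node by name from the control plane node (node and user names are illustrative):

  cloud-user@control-plane-1:~$ ssh control-plane-2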

Deploying Remote UPF Clusters

The SMI Cluster Manager uses the Kernel Based Virtual Machine (KVM) to deploy the User Plane Function (UPF) on Bare Metal Servers. The subsequent sections describe the procedures involved in installing the KVM and UPF.

Installing Kernel Based Virtual Machines

The SMI Cluster Manager utilizes the Kernel Based Virtual Machine (KVM) – a virtualization technology – to deploy the KVM nodes.

To deploy the KVM:

  1. Setup the KVM configuration.

    The following is a sample KVM configuration.


    Important


    When upgrading to Ubuntu 22.04, use routes gateway_address instead of gateway4 gateway_address in your network configuration.


    software upf <version_number> #For example, v748 
     url <repo_url> 
     user <username> 
     password <password> 
     sha256 <sha256_hash> #For example, 9141df47188fb795f3805fd61abf8784d52d2916f32014f564572a6cbbf7c545 
     description "<description>" #For example, "UPF software version v748" 
     exit 
    
     # associating to Bare Metal environment
    environments bare-metal 
     ucs-server 
     exit 
    
    # General cluster configuration
    clusters <cluster_name> 
      environment bare-metal 
      addons ingress bind-ip-address <bind_ip_address> 
      addons cpu-partitioner enabled 
      configuration master-virtual-ip <master_vip> 
      configuration master-virtual-ip-interface <master_vip_interface_name> #For example, eno1 
      configuration allow-insecure-registry true 
      node-defaults ssh-username <username> 
      node-defaults initial-boot default-user <username> 
      node-defaults initial-boot default-user-ssh-public-key  
      "<SSH_Public_key>"
      node-defaults ssh-connection-private-key  
      "-----BEGIN OPENSSH PRIVATE KEY-----
        <SSH_Private_Key>    
      -----END OPENSSH PRIVATE KEY-----\n"
       node-defaults initial-boot default-user-password <password> 
       node-defaults initial-boot netplan ethernets <ethernet_interface> #For example, eno1 
        dhcp4 false 
        dhcp6 false 
        gateway4 <gateway_ipv4_address> 
     nameservers search <domain_name> 
        nameservers addresses <nameserver_ipv4_addresses> 
        exit 
    
     # initial-boot section of node-defaults
       node-defaults ucs-server host initial-boot networking static-ip netmask <ipv4_address> 
       node-defaults ucs-server host initial-boot networking static-ip gateway <ipv4_address> 
       node-defaults ucs-server host initial-boot networking static-ip dns <ipv4_address> 
       node-defaults ucs-server cimc user <username> 
       node-defaults ucs-server cimc password <password> 
       node-defaults ucs-server cimc remote-management sol enabled 
       node-defaults ucs-server cimc remote-management sol baud-rate <baud_rate> 
       node-defaults ucs-server cimc remote-management sol comport <com_port>  
       node-defaults ucs-server cimc remote-management sol ssh-port <ssh_port> 
       node-defaults ucs-server cimc networking ntp enabled 
       node-defaults ucs-server cimc networking ntp servers <ntp_server_url> 
       exit 
     
       node-defaults os proxy https-proxy <proxy_server> 
       node-defaults os proxy no-proxy <proxy_servers> 
       node-defaults os ntp enabled 
       node-defaults os ntp servers <ntp_server_url> 
       #For monitoring the LCM's IP range (Optional)
       node-defaults kvm monitoring local-ip-address-range <ipv4address/subnet>  
       exit 
       node-type-defaults kvm 
         os netplan-additions bridges <bridge_name> #For example, ex4000 
         addresses <ipv4_address/subnet> 
         exit 
       exit 
    
    #node configuration
     nodes <node_name> #For example, kvm-1 
      ssh-ip <ssh_ipv4address> 
      type kvm 
      vms <vm_name> #For example, upf1 
        upf software <software_version> #For example, v748 
        upf networking management ip <ipv4addess>      
        upf networking management netmask <ipv4address> 
        upf networking management gateway <ipv4address> 
        upf networking management interface-type bridge 
        upf networking management bridge name <bridge_name> #For example, ex4000 
        type upf 
        exit 
      vms <vm_name> #For example, upf2 
        upf software <software_version> #For example, v748 
        upf networking management ip  <ipv4address> 
        upf networking management netmask <ipv4address> 
        upf networking management gateway <ipv4address> 
        upf networking management interface-type bridge 
        upf networking management bridge name <bridge_name> #For example, ex4000 
        type upf 
        exit 
      
    #Default VM configuration
    vm-defaults upf day0 username <username> 
      vm-defaults upf day0 password <password> 
      ucs-server host initial-boot networking static-ip ipv4-address <ipv4address> 
      ucs-server cimc ip-address <ipv4address> 
      ucs-server cimc storage-adaptor create-virtual-drive true 
       
      initial-boot netplan ethernets <interface_name> #For example, eno1 
       addresses <ipv4address/subnet> 
       exit 
     exit 
     
    # control-plane node configuration 
      ucs-server host initial-boot networking static-ip ipv4-address <ipv4address>  
      ucs-server cimc ip-address <ipv4address> 
      ucs-server cimc storage-adaptor create-virtual-drive true 
        initial-boot netplan ethernets <interface_name> #For example, eno1 
       addresses <ipv4address/subnet> 
      #Configure secure boot (optional)
      ucs-server cimc bios configured-boot-mode Uefi 
      ucs-server cimc bios uefi-secure-boot yes 
      exit 
     exit 
       ucs-server host initial-boot networking static-ip ipv4-address <ipv4address>  
      ucs-server cimc ip-address <ipv4address> 
      ucs-server cimc storage-adaptor create-virtual-drive true 
      initial-boot netplan ethernets <interface_name> #For example, eno1 
       addresses <ipv4address/subnet> 
      exit 
     exit 
     ops-centers cee <ops_center_name> #For example, cee 
      repository-local <repo_name> cee-2020-02-0-i04 
     exit 
    exit 
    

    Note


    For the Clusters on Edge deployments, you must define the ucs-server host initial-boot networking parameter. This reduces latency in bringing up the ISO media in CIMC.

    The following is an example configuration for deploying UPF on remote sites:

    ucs-server host initial-boot networking static-ip ipv4-address <IPv4address> 
    ucs-server host initial-boot networking static-ip netmask <IPv4address> 
    ucs-server host initial-boot networking static-ip gateway <IPv4address> 
    ucs-server host initial-boot networking static-ip dns <IPv4address> 

    If the ucs-server host initial-boot networking parameter is not defined, the CIMC can time out (and throw errors) while trying to download an SMI hard drive image instead of embedding it into the ISO file (30 MB versus 300 MB).


  2. Login to the Cluster Manager CLI and enter configuration mode.

    • Add the KVM configuration to deploy the KVM clusters.


      Note


      A sample configuration to deploy KVM clusters is provided here.
    • Commit and exit the configuration.

    • Validate the cluster configuration in the Cluster Manager.

      Example:

      clusters cndp-testbed-cm actions validate-config run log-level DEBUG vmware-checks false k8s-node-checks false
      This will run validation.  Are you sure? [no,yes] yes
      message 2020-05-06 19:50:38.597 INFO __main__: Verifying ntp config ......
      ...
      ...
      2020-05-06 19:50:45.723 INFO __main__: You have not run all checks together. Run clusters cndp-testbed-cm actions validate run
      
      valid TRUE
      
  3. Configure KVM using the following configuration:

    configure
      clusters <cluster_name>
        vm-defaults
        node-defaults kvm monitoring local-ip-address-range ip-address-and-prefix
        node-type-defaults kvm
        nodes <node_name>
          type <k8s/kvm>
          vms <name>
          ssh-ip <kvm_IP>
        exit
      exit
     
  4. Configure the KVM and VM parameters using the following options:

    • day 0 – Specifies the configuration applicable to the VM when it is first created. After initial creation (day 0), changes here will not apply unless the VM is deleted and redeployed.

      • username – Specifies the StarOS administrator username

      • password – Specifies the StarOS administrator password

      • syslog-ip – Specifies the StarOS logging syslog IP

    • networking – Specifies the configuration for the management interface.


      Note


      Other networks are expected to be provisioned as day 1 configuration by talking directly to the VM.
      • IP_Address - Specifies the IP address.

      • Netmask - Specifies the Netmask.

      • Gateway - Specifies the Gateway.

      • interface-type - Only bridge is supported for now.

      • bridge name - Specifies the bridge name. For more details, see the bridge section defined in KVM configuration.

      • domain-name - Specifies the domain name.

      • name-servers - Specifies the name servers.

    • ntp-servers – Specifies the NTP (Network Time Protocol) settings for VM.

    • software – Specifies the link to version of software to deploy.

    • nodes [name] vms - Currently, two VMs are allowed per node to align with NUMA.

      • type - Provides the ability to deploy UPF, VPC-DI control function (CF), or service function (SF) VMs. UPF is the default VM type.

  • os enable-passthrough [true_or_false] - To use PCI passthrough to pass the NIC to the VM, enable-passthrough must be set to true. By default, VMs use SR-IOV to configure the network interfaces.

  • os num-vfs-per-pf [vf_num] - Specifies the number of VFs created for each PF. The default value is 16. A configuration sketch for these OS settings follows this list.
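
    The following is a minimal sketch of how these two OS settings might be applied, written in the same style as the node-defaults os commands in the sample configuration earlier; the exact scope (cluster-wide node-defaults versus per node) and the values are assumptions for illustration only:

     node-defaults os enable-passthrough true   #scope and value are assumptions
     node-defaults os num-vfs-per-pf 16         #default value shown for illustration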

  5. Run the synchronization.

    clusters cluster_name actions sync run debug true 
  6. Monitor the progress of the cluster synchronization.

    monitor sync-logs cluster_name 
  7. Connect to the VM node through the console in KVM after the synchronization is complete.

    virsh console <vm_name> serial1 --force 

    or SSH to VM if the bridge is configured locally

    ssh username@management_ip 

    Note


    For authentication, you can use the username and password configured in the Day 0 configuration. Also, the SMI Cluster Manager is not required for operating the deployed VMs.

NOTES:

  • vm-defaults - Allows configuring the necessary values across all the VMs. It is available at cluster and node level.

  • kvm monitoring local-ip-address-range ip-address-and-prefix – Specifies the IP configuration for LCM.

  • node-type-defaults kvm - Allows configuring the necessary values across all the KVMs.

  • type <k8s/kvm> - Specifies the node type. Only nodes of type kvm can run VMs.

  • vms <name> - Specifies the VM configuration.

  • ssh-ip <kvm_IP> - Specifies the KVM IPv4 address

Upgrading the UPF Clusters

You can upgrade the UPF Clusters using the SMI Cluster Manager. To upgrade, use the following configurations:

  1. Login to the Inception Cluster Manager CLI and enter the Global Configuration mode.

  2. To upgrade, add a new software definition for the software.

    configure 
      software upf <upf_software_version> 
      url <repo_url> 
      user <user_name> 
      password <password> 
      sha256 <SHA256_hash_key> 
      exit 
    
    Example:
    SMI Cluster Manager# config 
    SMI Cluster Manager(config)# software upf <upf_software_version>
    SMI Cluster Manager(config)# url <repo_url>
    SMI Cluster Manager(config)# user <username>
    SMI Cluster Manager(config)# password "<password>"
    SMI Cluster Manager(config)# sha256 <sha256_hash_key>
    SMI Cluster Manager(config)# exit
    SMI Cluster Manager(config)# 
  3. Commit the changes.

  4. Trigger the Cluster synchronization.

    configure 
      clusters <cluster_name> actions sync run debug true 
    Example:
    Cluster Manager# config 
    Cluster Manager(config)# clusters cndp-testbed actions sync run debug true
  5. Monitor the upgrade progress

    monitor sync-logs <cluster_name> 
    Example:
    Cluster Manager# monitor sync-logs cndp-testbed-cm
  6. Login to the UPF Ops Center after the Cluster synchronization completes.

    https://cli.upf-ops-center.<ipv4_address>.<domain_name>/ 
  7. Verify the software version using the following command.

    show system ops-center 

    Example:

    SMI Cluster Manager# show system ops-center

NOTES:

  • software upf <upf_software_version> - Specifies the UPF software package.

  • url <repo_url> - Specifies the HTTP/HTTPS/file URL of the software.

  • user <user_name> - Specifies the username for HTTP/HTTPS authentication.

  • password <password> - Specifies the password used for downloading the software package.

  • sha256 <SHA256_hash_key> - Specifies the SHA256 hash of the downloaded software.

Deploying the CEE Clusters

You can deploy the CEE Clusters using the SMI Cluster Manager. To deploy a CEE Cluster, use the following configurations:


Note


From the SMI Cluster Manager perspective, the product refers to Common Execution Environment (CEE). The deployment procedure mentioned in the subsequent section is specific to CEE. However, you can follow the same procedure to deploy 5G Network Functions (SMF or PCF) using the SMI Cluster Manager.


  1. Setup the Cluster Manager Configuration.

    The following is a sample UCS configuration for deploying CEE Clusters.

    software cnf <software_version> #For example, cee-2020-02-0-i04 
     url <repo_url> 
     user <username> 
     password <password> 
     sha256 <sha256_hash> 
    exit 
    environments bare-metal 
     ucs-server 
    exit 
    clusters <cluster_name> #For example, cndp-testbed 
      ops-centers cee <app_name> #For example, cee 
      repository-local <repo_name> #For example, cee-2020-02-0-i04 
     exit 
    exit 
  2. Login to the Cluster Manager CLI, enter the Global Configuration mode, and add the CEE cluster configuration.

  3. Commit the changes.

  4. Trigger the Cluster synchronization.

    configure 
      clusters <cluster_name> actions sync run debug true 
    Example:
    Cluster Manager# config 
    Cluster Manager(config)# clusters cndp-testbed actions sync run debug true
  5. Monitor the upgrade progress

    monitor sync-logs <cluster_name> 
    Example:
    Cluster Manager# monitor sync-logs cndp-testbed-cm
  6. Login to the CEE control plane node after the Cluster synchronization completes.

    https://cli.cee-cee-ops-center.<ipv4_address>.<domain_name>/ 

NOTES:

  • software cnf <cnf_software_version> - Specifies the Cloud Native Function software package.

  • url <repo_url> - Specifies the HTTP/HTTPS/file URL of the software.

  • user <user_name> - Specifies the username for HTTP/HTTPS authentication.

  • password <password> - Specifies the password used for downloading the software package.

  • sha256 <SHA256_hash_key> - Specifies the SHA256 hash of the downloaded software.

Upgrading the CEE Clusters

You can upgrade the CEE Clusters using the SMI Cluster Manager. To upgrade, use the following configurations:

  1. Login to the Inception Cluster Manager CLI and enter the Global Configuration mode.

  2. To upgrade, add a new software definition for the software.

    configure 
      software cnf <cnf_software_version> 
      url <repo_url> 
      user <user_name> 
      password <password> 
      sha256 <SHA256_hash_key> 
      exit 
    
    Example:
    SMI Cluster Manager# config 
    SMI Cluster Manager(config)# software cnf <cnf_software_version>
    SMI Cluster Manager(config)# url <repo_url>
    SMI Cluster Manager(config)# user <username>
    SMI Cluster Manager(config)# password "<password>"
    SMI Cluster Manager(config)# sha256 <sha256_hash_key>
    SMI Cluster Manager(config)# exit
    SMI Cluster Manager(config)# 
  3. Commit the changes.

  4. Trigger the Cluster synchronization.

    configure 
      clusters <cluster_name> actions sync run debug true 
    Example:
    Cluster Manager# config 
    Cluster Manager(config)# clusters cndp-testbed actions sync run debug true
  5. Monitor the upgrade progress

    monitor sync-logs <cluster_name> 
    Example:
    Cluster Manager# monitor sync-logs cndp-testbed-cm
  6. Login to the CEE control plane node after the Cluster synchronization completes.

    https://cli.cee-ops-center.<ipv4_address>.<domain_name>/ 
  7. Verify the software version using the following command.

    show system ops-center 

    Example:

    SMI Cluster Manager# show system ops-center

NOTES:

  • software cnf <cnf_software_version> - Specifies the CNF software package.

  • url <repo_url> - Specifies the HTTP/HTTPS/file URL of the software.

  • user <user_name> - Specifies the username for HTTP/HTTPS authentication.

  • password <password> - Specifies the password used for downloading the software package.

  • sha256 <SHA256_hash_key> - Specifies the SHA256 hash of the downloaded software.

Cluster Synchronization Lock

In Cluster Manager HA mode with multiple remote clusters, you can lock cluster synchronization to prevent operational errors on the undesired cluster. Once you enable the lock, cluster sync will be rejected and can be triggered only after the lock is disabled.

When the cluster is locked, you can still run a cluster sync with the actions sync run debug true command only when the synchronization phase is set to server-check.

To enable or disable cluster synchronization, use the following command in Config mode:

[ no ] clusters cluster_name sync-lock 
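
For example, a minimal sketch of enabling and later disabling the lock for an illustrative cluster name:

  SMI Cluster Manager# config
  SMI Cluster Manager(config)# clusters cndp-testbed sync-lock
  SMI Cluster Manager(config)# commit
  Commit complete.
  SMI Cluster Manager(config)# end

  SMI Cluster Manager# config
  SMI Cluster Manager(config)# no clusters cndp-testbed sync-lock
  SMI Cluster Manager(config)# commit
  Commit complete.
  SMI Cluster Manager(config)# end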

Deploy the VPC-DI cluster

The VPC-DI deployment is essential to build a scalable, flexible, resilient, and cost-effective network that can support a wide range of services, including Mobile IoT and Mobility. Deploying CUPS as a virtualized network function (VNF) on SMI allows you to take advantage of the benefits of virtualization and cloudification.

Table 1. Feature History

Feature Name: Lifecycle management of VPC-DI

Release: 2025.02.1

Feature Description: This release provides the necessary support to deploy the VPC-DI VMs on SMI. It offers new software types and VM configurations for KVM deployments, exposing VM sizing parameters.

Command enhanced:

clusters vpc-di nodes node_name vms { cf | sf | upf }


Note


This release supports only the deployment of CUPS CP.


Before you begin
  • The VPC-DI software package (including the qvpc-di-cf.qcow2 and qvpc-di-xf.qcow2 images) must be available in a location accessible by the SMI Cluster Deployer.

  • The host profile configuration must be created and available. For more information, see the Configurable Host Profile Support section.

Procedure

Step 1

Select the software type as VPC-DI for KVM deployment and configure the VPC-DI software in the SMI Cluster Deployer.

Example:
SMI Cluster Deployer# show running-config software vpc-di
software vpc-di 21.28.mh26.96898
 url                            file:////data/software/local-file/qvpc-di-21.28.mh26.96898.qcow2.tgz
 user                           admin
 password                       $8$iHVaBKTYwrEhFnBf7R+67ChgdnJ1FbK25vQ9nNP3EkM=
 accept-self-signed-certificate true
 sha256                         48b9ed1abc39a8fff4340002fc49f6d4b4cde5188632a2162df808ed10f1407c
exit

Step 2

Configure the cluster, nodes, and VMs. Define the VM type as cf (control function) or sf (service function). The VPC-DI software package has two images in the bundle: the qvpc-di-cf image is for the CF, and the qvpc-di-xf image is for the SF.

Example:
Sample CF configuration:
SMI Cluster Deployer# show running-config clusters vpc-di-xxx nodes node1
clusters vpc-di-xxx
 nodes node1
  ssh-ip 209.165.201.1
  type   kvm
  vms cf1
   vpc-di software 21.28.mh26.96898
   vpc-di networking management ip 209.165.201.2
   vpc-di networking management netmask 255.255.255.0
   vpc-di networking management gateway 209.165.201.3
   vpc-di networking management interface-type bridge
   vpc-di networking management bridge name ex4000
   vpc-di day0 username admin
   vpc-di day0 password test
   vpc-di day0 syslog-ip 209.165.201.4
   ## key must be in pem format
   vpc-di day0 user-key "$8$OGub4KW9QtlJepOdkD35Jv3Egp7bWAsP3t8uCHxS4ECk2j1rCdaAWmqXuXefR/BRW9Xp0hQU\n5LeTrRDpVx0iyer/
lVmGbLWqSG6NPgNwG9t+YjItzYCvf8kdvS0HYvHTUyq+hS1YbLUHdv+x\nPyaIIe818UXxPiEaHDob3S3l8M6goasXhrRe/d++C54CDD2wt37bhjS2vny3m+a3E3UL5Pkr
\nruPRdXn9G2zFd1esttXfA3+x5mHpa72N7uIWH7jUWLEY5/5jBdZGuC83/zBfgiA4eNYVl6Gd\nzpmkCytdR1ZmgvJYjvrIVGdc2G7c9dJekJU4qxZlsXINeI9H5rhKzqkg4/
7mAJipewOpDdug\nCROB+qj7KQO1RQUpBFp59Sq/NQCsWka+hbAeBVuzHT5ip4JZYm+2cqO9TmV2Ae+KiM8KSMFB\ncMyWxCmgC5bMAq31Vs5/9PvxCxe/vdvkVMV6OyKIwww
XcFMM7MhGejjjBrvqJZwr1eDsaQNt\nQF6Iga6a66XjbcaUSDgHBPkZWm3/QpwDt5nVZ29pSfim+i3HuucM/HTU6G2NdJadbgmpZ5G\n44mOCMoMZRo23uqsP/tfJpAXt8rC7SH9KAn5bc
3uoM8kM59kFjNqM+m3fGw1entmzF1BPXg3\n4e/0RlvzM3I4piANJTVXbqHUHAHWGFFKVEx/kT4wml3qo9VinybqyFRkGtkS8iGcHyyVYrf9\nzwfApCWAuAce7BASczZeH1ITzck89LS76Kx
EqftMLXAUf4DUw3z+p5Pek+Ofojyny5IuaSSj\n3cL1R8hOkTWoOJ0HGWNPTU4bR8eaoKVshpOZJVAq4misPOwwjHr4Qx8XTRkuBgyibDKs4BZ3\nRKI88HOkINNo7UblhE34op52KY+2Lwb3Gmw
31yvFIYl23P6Lho2/HBz6/QDOLU0+r1rTMO/w\n6xjQRS3qgEFcjpelFNAsBTlUrnzsfEADRX4mAfyww6kfp3+VZM8CciHXktbl16NiCjsWEGKO\n9ta/MrnLMKeBjLvLB5Pfpn3PsIk0bd7m+L7Sjra/
tdOPjFzUULkwMG4gVwx1cSvCPg1VgqeZ\nGtG7d8Ss0F6wqsT9SpnHpmvLDDthifZnL+5JyYkHOHi7B6u6i330jxkiLUU3Fd4MCfMd677/\nqOUST/ksnmmaVHaFZiFTuPV/mppgasGU4p6f9oHI
4gA0fWyuLqhuyZ7NxPf86WkDyhvw87dD\ndr5AEZvkh9vz8TYlZsljX/u8cBihMQfqNbHDWwMN039l19v3V9mUnvNjQNdkVia8UegRqJX/\nxKCzLfDOQCBatAimjhPPB88jZAiIwzPepujfQSEMTe+S/
Qbynm5HHgHZ8U5CSGdW9ggyZ3uT\nrkiz9tJRKvSNukeNhNrOh0D85AWMj4bs0umOZnxD6CeudMQFkRLMj/84rMVq0yCtVshpdYey\ne1WK+aX22+4byV4vw0KhXxEhTJ3/TAJT007ypKMNbndL6b1vOcJH+
LoRnqePNkKQuTay7sQj\n7fpqgAZEiqAwjn3AUxdYECUL8/i4st/POUJ6NhzE6mCawLWy0mc4iJDMrlPqgC5LBSk9R6QG\n3ArZzv6aELkgl0FUhqfFXeWVWmPpJCBHhYblUt2aI4kkun4ZQWJ0yBF3bB4Xdh
AgnjBI+ZHQ\nVUgJcVrv1gt5oXVafAbjgLeeVfKLGivTkP+48cREwYZT9ydbhpCL6XB706a7ODFKlWG0g/K9\ntxDqYaKNbjN32t602Gi2q2S2ym2zb2LOb6c5l9UFgyH5GJqpUGiRHF4C5ZMRC+52ueK0ax52
\nDnvkMGa74Ovv9PTvlPudIyKeiCKQEv+Ls5hpsmbokfEuiKsTsetAgLf+perHLI3jAFDWi8iX\n0V4dPGf2Et0/tJeNNOZwltSgu+zZEZs4v230m1OAR3313z4ChYF4oxwzHmcthzwIWUafsV0g\nN4BGJ1DZghY
3RJBn/EzT6RgQL3BjWTMeKUa9eClWwaEYSnmjejkxnbl57ki1eegFZOW6XAcd\nTS5ixcEsjzV+UezU6ZDXjn2sNY9vqlgIUK0lG0z524RAWiaHffNGit4TNm+f98qKDx/dGDzJ\nN9M+5zfalGZyhh+FU4o1ltz
VQcz+hWb0FEUZMJ+3exgvR8qPHYWmDJ0MiqIypX33zolxG7jh\nEsEK0q+rZYYPPwTq/zSWollXARkocA/9R0DVpjoxPM5pRqPEnxTKMnXRAAh9F+QiTw3bBpvj\nRIFN9436ufX4qr/A+lIw+LxTe00HYJU5W/
HxH60UcFFQTJRjZuN7lNWysy0eWXoahq1L1evo\nCLNIRFkfi+LMVJDI60aW3/6M/z030W+04Rdp4nKwN126jIGh99sW7F+aSmRCXS22i/Gcu95D\nctDLgJ9kWtRYMtZ9g5Qe3VZo4XLHV6zFiTCz5pY3FEsf
K1szuX0cqXL+etQkIhWoNtaYrLP3\nrbITXhgs8eZA77upuEsato2z0+LjDUgKKIolzReQ/AWUFgYv8ddop18vY4xMcjZCKVJStekR\nJ7/mPeqOpbecVeO2MCgGvtjCNvOiYyH7nP+oWgaZLAesFseDOtsZpIkdk
Nhc44ho0wLKmLC4\nqbywV4M/eBvGOtao5GvrQi4ZrnBp3Yt4b4SfdKkwgj2JhoYGy1BtPgbbojV3ez78AyqufGsq\nKDjlgLf0tPL5quT1BIjW4s30AJx0duhr05vc0GcuEMbnRqWRIXSYPQfwh8uWDgxh8/gCcmAI\
n16NMz61ncxlY3bm5vze8nICmLs8N54nWxTsX6AazLiXZ3CdJuOec+koy8zvqkHRfOBdj6Wv3\nSmWuhxF5HC3/kpTHho2i1uSgQrO/Su4B8OlmLSPhlcjjyZp0ttuFcXGDe9oSUQIVyk52QsT8\nhq31oLojjdtB6/
X+FAiyY76MIJwjYU4IIBe9h3zh41YExrzlPt/LHLYw7jVNI/ckdo+Xg6ii\nC1UKwe3nL+yTWbawwZueyFv1VPMX/iGrF9rd7/2m7wz7g61V9Ke81GtLFdPNwzRn/+cNppjy\nTFGYdYBk6StJVO1CRJpwcGPPhIadx
9Pvo5StPTH71EKq2jpRd2t+Ie/54l0vIft40BHX43oe\n3hav2OujEPNecrn0bnpBQZwVu0A35vZqcwymK7Yfezaq1fLSoFvkRLsMNwIle5Sqh9Oxz39M\nhFQaMj0GewfjOX35eHQmOu1YTL5lt7Sfdkg5HUa4a4tN
3WcM1wqWVboQ4OqvnA4Fc1tSog0W\n3RfNFpyvxU8QQMIbcRrKtnerd6TxwJCbvXXHIlKwodnPv7PYDPwT5AKK9EndZ1ifmyw="
   vpc-di day0 user-key-pub "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDSLkJJBNE9OE/FJXrz5a0IX9SeY61POky7/1moJ4/kHbQsFU7auvw+VIpFasfuQxw
JgYSosoPa5vnXMsRwYyQV589xBT4FgG3/ojtvWH+esWITCqYWz+4B1zhw1LH6bI98F5B6DK7/Glpi4/5p8uVKSNEBrseIBO2+X2B5Q0G08ZsCrceCIqW1Gn5noSYR96ogQytY0o
DcasUDyUdvXPl4Doic5lRLsy6KmpvNY0jT7flBm0tf4vaM0tkTPxrfPfjJ7geMdfsZvO4oZhuW0Xk7UldSmk4Rnaoi8exgTIqj
Fkw7FUnrrpNjITGl0DIwPdbLgmEsWWS4OJyUhF08sSU7nn3T1XyGNJwvun/9zy/uyUp99+/D++wGf0SVQ+dyFt3YO4xiB/ZRBWg6j1APlFYv/+
6vUh0cvtRLARMavo42uXZQ+JW+t2kMvZbFRCB59QOTIVAfJONHR+4fttVHNxBgTadSr6zlWOKms9WtbpCtnIDl8HVQX5QU8GSbsBABgME= user@user-dev"
   type cf
  exit
  initial-boot netplan vlans vlan1006
   addresses [ 209.165.201.1/24 ]
  exit
  initial-boot netplan vlans vlan522
   addresses [ 209.165.201.4/25 ]
  exit
  os additional-ssh-ips [ 209.165.201.4 ]
 exit
Example:
Sample SF configuration
SMI Cluster Deployer# show running-config clusters vpc-di-xxx nodes node1
clusters vpc-di-xxx
 nodes node1
  ssh-ip 209.165.201.1
  type   kvm
  vms sf1
   vpc-di vm-sizing disk-size disk-root-gb 20
   vpc-di boot-parameters "CARDSLOT=3\nDI_INTERNAL_VLANID=200\n"
   vpc-di networking service-interface-numbers 4
   vpc-di networking di-net-pf-list pf-name ens1f0
   exit
   vpc-di networking di-net-pf-list pf-name ens3f0
   exit
   vpc-di networking service-net-pf-list pf-name ens1f1
   exit
   vpc-di networking service-net-pf-list pf-name ens3f1
   exit
  exit
 exit
exit

Step 3

Expose the VM sizing parameters, such as VM NUMA, CPU count, cpuset, memory, and disk size (hdd1 and hdd2).

Example:
SMI Cluster Deployer# show running-config clusters vpc-di-xxx nodes node1 vms cf1 vpc-di vm-sizing
clusters vpc-di-xxx
 nodes node1
  vms cf1
   vpc-di vm-sizing disk-size disk-root-gb 20
   vpc-di vm-sizing disk-size disk-data-gb 160
   vpc-di vm-sizing numa mode strict
   vpc-di vm-sizing numa nodeset 0-1
   vpc-di vm-sizing cpu count 62
   vpc-di vm-sizing cpu cpuset 2-32, 34-64
   vpc-di vm-sizing cpu emulatorpin 0,32
   vpc-di vm-sizing cpu placement static
   vpc-di vm-sizing memory size 100
  exit
 exit
exit

Use the provided configuration for VM deployment and ignore the VM flavor.

Step 4

Configure the number of VPC-DI VMs. Typically, a maximum of four VMs can be configured per KVM node.

Note


In the 2025.02.1 release, you can configure only one VM per node.

Step 5

Disable hyper-threading in the host profile.

Step 6

Perform the KVM VM lifecycle management in the same way as for the UPF VM.

Step 7

Trigger the cluster synchronization operation.

Example:
SMI Cluster Manager# clusters vpc-di-xxx actions sync run

Life Cycle Management of the Virtual Machines

The SMI Cluster Manager supports the complete Life Cycle Management (LCM) of the deployed VMs, for example, UPF, VPC-DI control function (CF), and service function (SF). The different processes in the LCM of a VM include:

  • VM Provisioning

  • VM Removal

  • VM Recovery

  • VM Recovery Configuration

  • Adding a New Node

  • Removing an Existing Node

  • Server Return Material Authorizations (RMAs)

The subsequent sections describe all the processes involved in LCM of the VM.

Provisioning the Virtual Machine

Provisioning a Virtual Machine involves creating and configuring a VM based on the requirements. In Bare Metal deployments, the VM is provisioned using the KVM technology. For more details on provisioning the VM using KVM, see the Installing Kernel Based Virtual Machines section.

Removing the Virtual Machine

You can remove a VM from the KVM cluster through the ConfD CLI. To remove the VM, use the following configuration:

  1. Login to the ConfD CLI.

  2. Remove the VM using the following configuration:

    clusters <kvm_cm_name> 
    nodes <node_name> 
    no vms <vm_name> 
    commit 
  3. Run the following command to trigger the cluster synchronization and remove the VM from cluster.

    clusters cluster_name actions sync run debug true 
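
    For example, a minimal sketch of removing a VM named upf2 from node kvm-1 and triggering the sync (cluster, node, and VM names are illustrative):

    SMI Cluster Manager# config
    SMI Cluster Manager(config)# clusters kvm-cluster-1 nodes kvm-1
    SMI Cluster Manager(config-nodes-kvm-1)# no vms upf2
    SMI Cluster Manager(config-nodes-kvm-1)# commit
    Commit complete.
    SMI Cluster Manager(config-nodes-kvm-1)# end
    SMI Cluster Manager# clusters kvm-cluster-1 actions sync run debug true
    This will run sync.  Are you sure? [no,yes] yes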

NOTES:

  • clusters <kvm_cm_name> - Specifies the KVM cluster.

  • nodes <node_name> - Specifies the VM node.

  • no vms <vm_name> - Removes the specified VM.

Recovering the Virtual Machine

The staros_monitor service is responsible for monitoring the deployed VMs. You can verify the status of the systemd service using the following command:
systemctl status staros_monitor 

The staros_monitor service restarts the VM in the following conditions:

  • When the VM is stopped.

  • When the VM is running but not responding to the ping command.


Note


For troubleshooting any issues, you can view the logs captured in the /var/log/staros_monitor.log file. For the UPF VM, you can configure the failure occurrence and ping interval through the /data/upf/monitor/staros_monitor.cfg file. You must restart the staros_monitor service for the changes to take effect.
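
For example, a minimal sketch of applying a monitor configuration change from the KVM host shell (assumes shell access to the KVM node and sudo privileges; the keys inside the configuration file are not shown here):

  # edit the monitor settings (failure occurrence, ping interval)
  sudo vi /data/upf/monitor/staros_monitor.cfg
  # restart the monitor so the new settings take effect
  sudo systemctl restart staros_monitor
  # confirm the service is active and follow its log
  systemctl status staros_monitor
  tail -f /var/log/staros_monitor.log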


Configuring the Virtual Machine Recovery
To monitor the VMs locally, a bridge called kvm-monitoring is created on the host. You can configure the local and VNFM IPs of the VM using the following command:
node-defaults kvm monitoring local-ip-address-range <local_IP_address_range> 

The following are the IP assignments when you configure the local IP address range to 192.168.0.0/24:

  • Local Bridge: 192.168.0.1

  • VM assigned to NUMA node 0: 192.168.0.2

  • VM assigned to NUMA node 1: 192.168.0.3
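
For example, a minimal sketch of setting this range at the cluster level, which produces the address assignments listed above (the cluster name is illustrative):

  SMI Cluster Manager# config
  SMI Cluster Manager(config)# clusters cndp-testbed
  SMI Cluster Manager(config-clusters-cndp-testbed)# node-defaults kvm monitoring local-ip-address-range 192.168.0.0/24
  SMI Cluster Manager(config-clusters-cndp-testbed)# commit
  Commit complete.
  SMI Cluster Manager(config-clusters-cndp-testbed)# end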

NOTES:

  • kvm monitoring - Specifies the locally created bridge for monitoring the VMs.

  • <local_IP_address_range> - Specifies the local IP address range. The default value is 192.168.1.0/24

Server RMAs

For Return Material Authorization (RMA) of servers, you can remove the old node and replace it with a new one. When the hardware is fixed, you can synchronize the cluster and deploy the VM. For more information on replacing the Bare Metal nodes, see the SMI Cluster RMA section.

UCS Firmware Upgrade

The firmware upgrade process runs on the UCS server's Cisco Integrated Management Controller (CIMC). After triggering the upgrade, a synchronization process connects to the CIMC at regular (pre-configured) intervals to verify the progress of the upgrade. The upgrade process continues to run even if there is a loss of connectivity between the Cluster Manager running the synchronization process and the UCS server performing the firmware upgrade.

The UCS firmware upgrade process involves:

  1. The Cluster synchronization is triggered with the appropriate firmware version added to the configuration.

  2. The Cluster manager (CM) authenticates with the CIMC.

  3. The CM compares the current running version of the firmware with the firmware version inside the ISO image.


    Note


    The CM proceeds with the upgrade if there is a discrepancy between the versions. Otherwise, the CM skips the upgrade process.
  4. The CIMC receives an API call to begin the firmware upgrade from the CM. Also, the API call includes the link for the ISO image the CIMC server needs to download. The ISO image is located in the CM.

  5. The CIMC triggers the server reboot and starts downloading the ISO image.


    Note


    The firmware upgrade aborts if there is no access to the ISO image. Consequently, the server boots up and the sync process fails.


  6. The CM starts verifying the status of the firmware (for a maximum time period of 120 minutes) with the CIMC.

  7. The CIMC successfully downloads the ISO image and mounts it as media on the server. The server boots from the media and starts the firmware upgrade process.

  8. The server restarts multiple times before finishing with the upgrade process. The CIMC upgrades and disconnects access to any Graphical User Interface (GUI).


    Note


    The sync process waits for the CIMC to come online. It reconnects and re-authenticates with the CIMC.


  9. The firmware upgrade process completes and the server reboots. The server boots from the media again and firmware validation begins. If the validation is successful, the firmware process completes and the server boots. If a failure is identified, the firmware upgrade is marked as failed and the sync process fails as well.


    Note


    If the validation is successful the firmware upgrade process ends and boots the server. If the validation fails, the failure is identified and the firmware upgrade and sync process is marked as a failure.


  10. The CM sync process receives a success answer from the CIMC. The sync process continues to the next step.

Updating the UCS Server Firmware

The SMI Cluster Manager allows updating the UCS server firmware remotely or locally in a data center. This section describes the procedures involved in updating the UCS Server firmware using the SMI Cluster Manager.

To update the UCS firmware, use the following configurations:

  1. Setup the UCS configuration.

    The following is a sample UCS configuration:

    software ucs software_version 
     url repo_url 
     sha512 file_checksum 
    exit 
    software ucs software_version 
     url repo_url 
     sha512 file_checksum 
    exit 
    software upf software_version 
     url repo_url 
     sha256 file_checksum 
    exit 
    environments ucs 
     ucs-server 
    exit 
    ... 
     nodes node_name 
      ssh-ip ipv4address 
      type kvm 
      vms node_name 
       upf networking management ip ipv4address 
      exit 
      ucs-server cimc ip-address ipv4address 
      ucs-server software software_version 
      initial-boot netplan vlans vlan_name 
       addresses ipv4address/subnet 
      exit 
     exit 
     nodes node_name 
      ssh-ip ipv4address 
      type kvm 
      vms node_name 
       upf networking management ip ipv4address 
      exit 
      ucs-server cimc ip-address ipv4address 
      initial-boot netplan vlans vlan_name 
       addresses ipv4address/subnet 
      exit 
     exit 

    Note


    Cisco Integrated Management Controller (CIMC) manages the life cycle of all the components - including firmware updates - in UCS C Series servers. The UCS firmware is updated through the CIMC API.


  2. Log in to SMI Cluster Manager CLI and enter the configuration mode.

    1. Add the UCS firmware configuration.


      Note


      A sample UCS configuration is provided here.


    2. Commit and exit the configuration.

  3. Configure the UCS firmware.

    1. To configure the firmware version for UCS servers inside a cluster, use the following configuration:

      configure 
        node-defaults 
          ucs-server software software_version 
          exit 
    2. To configure the firmware version for UCS servers at the node level, use the following configuration:

      configure 
        nodes node_name 
          ucs-server software software_version 
          exit 

    Note


    All the UCS firmware configuration parameters are applicable only for UCS based environments. You can add multiple versions of the firmware software under the software section in the UCS configuration. It is recommended to have a minimum of two versions (new and old) provisioned. You can download the UCS firmware software from https://software.cisco.com/


  4. Trigger the cluster synchronization, using the following command.

    clusters cluster_name actions sync run debug true 
    Example:
    SMI-CM# clusters test actions sync run debug true
    This will run sync.  Are you sure? [no,yes] yes
  5. Monitor the progress of the cluster synchronization, using the following command.

    monitor sync-logs cluster_name 

    Note


    • The cluster synchronization process time depends on the number of UCS servers planned for an upgrade. The average time taken for updating a single UCS server ranges from 30-50 minutes.

    • You can select the upgrade strategy manually using the following command:

      clusters cluster_name actions sync run upgrade-strategy { rolling | concurrent } 

NOTES:

  • node-defaults ucs-server software software_version - Specifies the default version of UCS firmware to be applied to all UCS servers inside a cluster.

  • nodes node_name ucs-server software software_version - Specifies the version of UCS firmware to be applied to all UCS servers at the node level.

  • clusters cluster_name - Specifies the name of the cluster.

  • actions sync run - Triggers the cluster synchronization

  • upgrade-strategy - Specifies the upgrade strategy. rolling specifies the rolling upgrade strategy. concurrent specifies the concurrent upgrade strategy. The default upgrade strategy is auto.

Failure Scenarios

The UCS firmware upgrade process may fail for multiple reasons. The following are some of the most common reasons:

  1. Hardware Issues - Faulty components, power, and cabling issues. It most often requires Return Merchandise Authorization (RMA), component re-seating or visual inspection of the hardware.

  2. Software Failure - This happens when certain components fail to apply the new software version. These components will skip the upgrade process and fall back to the previous or backup version.

  3. Unsupported Hardware - The upgrade fails, if the new firmware does not support some of the components.

  4. Software Misconfiguration - If an inappropriate firmware package is used, or an attempt is made to upgrade to an incorrect or unsupported firmware version, the upgrade process fails.

  5. Timeout - If the upgrade takes more than 2 hours to complete, the sync process times out and fails.


Note


The CIMC performs a list of prerequisites before proceeding with the firmware installation. When all the prerequisites are successful, the CIMC proceeds with the upgrade. There is no dry-run option available for the firmware upgrade.


Failure Impact

The impact of the firmware upgrade failure falls into the following three categories:

  1. Non-Service Impacting - The server continues to work. This is the most common behavior. The reasons for the failure are software misconfiguration, software failure, or a failed prerequisites verification. In such a scenario, it’s recommended to troubleshoot the failure, fix the problem, and upgrade the firmware again either automatically (using the same sync process) or manually. A restart of the CIMC may also reset and unblock the issue.

  2. Partially Service Impacting – The server continues to work with some performance, connectivity, or stability issues. Unsupported hardware, an incorrect firmware version, or an issue introduced with the new firmware may have caused this. In such a scenario, it’s recommended to perform a bug scrub or validation to ensure that the OS version, drivers, firmware, and hardware combination is officially supported by Cisco. You can perform the validation here:

    https://ucshcltool.cloudapps.cisco.com/public/

  3. Service Impacting – The server fails to boot or becomes inaccessible. If the firmware upgrade failure affects the CIMC access, then the server must either be sent for an RMA or physical access may be required. A hardware issue or a fatal error (which needs further investigation by the TAC team) might have caused this.


Note


When you are unable to identify the root cause, it is recommended to open a TAC case and involve a UCS expert to help further troubleshoot the issue.

Troubleshooting

If one or multiple servers fail to upgrade the firmware, you must follow these steps to isolate and identify the origin or root cause of the issue:

  1. Verify the output of the sync process. It contains the high-level details of the root cause and the exact step where the firmware upgrade failed.

  2. You can also use the CLI to check the status of the last firmware upgrade for a node or multiple nodes in a cluster.

  3. Based on the error, the root cause may be one of the following:

    1. An incorrect UCS CIMC password was provided

    2. An incorrect ISO was provided or the ISO was corrupt

    3. The CIMC failed to download the ISO image. A firewall, security-group, or Cluster-Manager (based on the deployment) may have caused this issue

    4. The CIMC failed to trigger, run, or validate the firmware upgrade

  4. Access the CIMC server and verify the server console, SEL, and alerts or issues for more details (if required).

  5. Based on the sync logs and the CIMC data, identify the problem and its resolution. If required, perform a CIMC reboot.

  6. If you can identify the error and resolve the issue, re-run the upgrade either automatically (through the sync process) or manually. You must have the ISO image to perform this operation.

  7. If you are unable to find the root cause or resolve the issue, open a TAC case and involve a UCS expert.

UCS Server Health Check

SMI allows you to perform extended health checks and monitoring of the deployed UCS servers. The method is capable of communicating with supported UCS C-series servers.

The server health check is a reliable method to check and report the server health before executing a cluster-sync operation. This check reduces the disruptions caused by cluster-sync failures.

The existing health reporting system within the REST (Redfish) API of the UCS servers is leveraged to obtain information from multiple hardware platforms.
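
SMI collects this data automatically during the health check, but the same information can be inspected manually through the standard Redfish entry points if needed. The following is a minimal sketch; the CIMC address, credentials, and resource identifier are placeholders, and curl is assumed to be available:

  # list the systems exposed by the CIMC Redfish service
  curl -k -u <cimc_user>:<cimc_password> https://<cimc_ip>/redfish/v1/Systems
  # inspect a specific system resource for health status and faults
  curl -k -u <cimc_user>:<cimc_password> https://<cimc_ip>/redfish/v1/Systems/<system_id>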

The sync-log reports the following information:

  • TASK section displays the following details:

    • Host information—Hostname, Serial Number, Model, Firmware version, HTTPS Certificate validity, and Current Faults

    • Summary information—Processors, Memory, Fans, Storage, PCI Devices, Power Supplies, and MLOM

  • Current System Faults section lists the faults present on the server at sync time.

    The format of the fault entries is "Date Raised Severity (fault identifier number) Component: Description: What action is required". Note that all faults have an action.

    For example: "2024-03-06 12:11:21 Critical (F0185) DDR4_P1_E1_ECC: DIMM 9 is inoperable : Check or replace DIMM"

  • Checking for recent POST events section in the log helps with debugging. The System Event Log (SEL) is used to display host reboots and uptimes.

    The POST Time column displays the time taken for the host to complete the Power On Self Test (POST). The short uptimes indicate host reboot loops, and longer uptimes indicate firmware upgrades or hardware component replacements.

  • System Warnings section accumulates the data returned from the server.

    The warnings provide additional information on faults or other issues. They also contain the current firmware recommendations. If the versions do not match, the warning indicates a future upgrade and does not necessarily require immediate action.

  • Warnings in the following scenarios:

    • Any health check such as thermal health and memory check that is not OK

    • Mismatched disk sizes

    • HDD found when SSD is expected

    • Power Policy or Fan Policy is not as expected

    • The following warnings are displayed for firmware checks on M5 servers only:

      • UCS team recommends firmware versions of at least 4.3.2.230270 for M5 servers

      • If the firmware version is less than or equal to 4.1.2: Firmware version 4.0(4h) is not supported

      • If potential compatibility issues are identified with the firmware version: Firmware Version: 4.0(4h) - Potential compatibility issue

  • Failures in the following scenarios when:

    • The server reports Critical health status

    • The CIMC reports Critical health status

    • The Host is powered off

    • No storage controllers are found

    • No drives are found

    • Any DIMM failure

    • The server cannot be reached for reasons such as bad IP/credentials and so on

Overriding Health Check Sync Failure

The health check is performed prior to the software download tasks to provide a quick exit if there are actionable issues. The check is non-intrusive and performs only read operations.

If you want to proceed with a sync knowing there are health issues with the server, you can execute the following CLI command to override the check failure:

clusters cluster_name nodes node_name ucs-server ignore-health { false | true }

If ignore-health is set to true, the sync proceeds past the health check; however, the cluster sync may still fail because of the underlying server issues.
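
For example, a minimal sketch with illustrative cluster and node names:

  SMI Cluster Manager# config
  SMI Cluster Manager(config)# clusters cndp-testbed nodes controlplane1 ucs-server ignore-health true
  SMI Cluster Manager(config)# commit
  Commit complete.
  SMI Cluster Manager(config)# end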

Configuring Server-Check Synchronization Phase

SMI supports an additional server-check synchronization phase to review the condition of the servers anytime.

To execute the server-check sync phase, use the following command in Cluster Configuration mode:

clusters cluster_name actions sync run sync-phase server-check

To view the sync-log report, use the following command:

monitor sync-logs cluster_name

SMI Cluster RMA

This section describes procedures involved in replacing a Bare Metal node within a SMI Cluster in the event of node failure or planned maintenance.

The NSO must subscribe to the alert notification stream to receive the k8-node-not-ready alert when a node becomes unresponsive due to an unplanned outage. The unresponsive node must be removed from the cluster and placed in maintenance mode before continuing. See the Notifications and Alerts Reference section for more details on the alert notifications and references.


Important


CNDP does not support cluster upgrade and RMA in the same cluster sync.


Prerequisites for RMA Process

For GR deployments, the node-monitor pods start automatically. During the RMA procedure, the node-monitor pod automatically shuts down the rack if a multi-compute failure is detected when the node is drained and deleted.

Before starting RMA process, perform the following:

  1. Switch the role for both the instances to the other rack using the geo switch-role role command and make sure the target rack for RMA is in the STANDBY_ERROR role for both the instances.

  2. Disable the node-monitor pod.

    1. Take the backup of daemonsets.

      kubectl get daemonsets node-monitor -n cn -o yaml > node-monitor.yaml

    2. Delete node-monitor pods.

      kubectl delete daemonsets node-monitor -n cn

  3. Continue with RMA procedure. For more information, see the link.

  4. Once RMA procedure is complete, check if the node-monitor pods are already spawned.

    kubectl get pods -n cn -o wide | grep node-monitor

    If the node-monitor pods have not started, restart them.

    kubectl create -f node-monitor.yaml

    Note


    The node-monitor.yaml file is the same as the one backed up in Step 2.a.


  5. Correct the role for the instances accordingly.


Note


For both earlier and current SMI versions:

  • If you are replacing hardware components that contain firmware, such as an mLOM card, during an RMA procedure, you must run the HUU (Host Upgrade Utility) before adding the repaired or replaced node back to the cluster, to ensure that the component is compatible with the system before syncing the node back into service.

  • As part of RMA, if you remove a node from the cluster and before you return it to the manufacturer, you must purge all data on the device as per instructions provided by the hardware vendor.


Unified RMA Procedure for Planned and Failure Events on Bare Metal

For the SMI version 2020.02.2.41 and later, the RMA (Return Merchandise Authorization) procedure for planned maintenance and any unplanned node failure events is unified for the SMI Bare Metal stacked cluster. The same procedure is applicable for both primary and worker nodes. This feature simplifies the automation and MOP requirements for both NSO and user.


Note


For information on the RMA procedures for all previous versions of SMI Bare Metal, refer to the following sections in this guide:

  • Control Plane Bare Metal Node Failure - Unplanned

  • Control Plane Bare Metal Node Maintenance - Planned


Unified RMA Procedure for the Control Plane and Worker Nodes

This section describes the unified RMA procedure applicable to the following scenarios:

  • Replace a working control plane Bare Metal node or worker node for maintenance.

  • Replace a failed Bare Metal control plane node or worker node in a stacked cluster.

Notes:

  • If you are performing RMA for a control plane node, you must ensure that the majority of control plane nodes are still available during the RMA process.

  • Disable auto-sync before you perform the RMA procedure.

Use the following steps to replace a working or failed control plane or worker node:

  1. Drain and remove the node which is sent for maintenance, using the following command:

    clusters cluster_name nodes node_name actions sync drain remove-node true 

    Example:

    SMI Cluster Deployer# clusters kali-stacked nodes controlplane1 actions sync drain remove-node true                  
    This will run drain on the node, disrupting pods running on the node.  Are you sure? [no,yes] yes
    message accepted
  2. For a planned maintenance scenario, shut down the node if it is still running.

  3. Assign the node to maintenance mode in the cluster configuration using the following CLI commands:

    config 
      clusters cluster_name 
         nodes node_name 
            maintenance true 
         commit 
      end 

    Example:

    SMI Cluster Deployer(config)# clusters kali-stacked 
    SMI Cluster Deployer(config-clusters-kali-stacked)# nodes controlplane1 
    SMI Cluster Deployer(config-nodes-controlplane1)# maintenance true 
    SMI Cluster Deployer(config-nodes-controlplane1)# commit
    Commit complete.
    SMI Cluster Deployer(config-nodes-controlplane1)# end
    SMI Cluster Deployer# 
  4. The node is ready for the RMA process.


    Note


    If the remaining nodes must be upgraded or updated, run a cluster sync in this state. However, it's not a part of the RMA process.


  5. Add the node back to the cluster when it is repaired or replaced and available.


    Note


    If the remaining nodes have been upgraded to a new SMI release while this node was under maintenance, it is recommended to clear the boot drive and delete the virtual drive on the node. This ensures that the virtual drive is in a clean state, with no data from its previous state, before you add the node back. However, removing the virtual drive is not required for a new replacement node.


  6. Attach the new Bare Metal node and remove it from the maintenance mode in the cluster configuration using the following commands:

    config 
      clusters cluster_name 
         nodes node_name 
            maintenance false 
         commit 
      end 

    Example:

    SMI Cluster Deployer(config)# clusters kali-stacked 
    SMI Cluster Deployer(config-clusters-kali-stacked)# nodes controlplane1 
    SMI Cluster Deployer(config-nodes-controlplane1)# maintenance false 
    SMI Cluster Deployer(config-nodes-controlplane1)# commit
    Commit complete.
    SMI Cluster Deployer(config-nodes-controlplane1)# end
    SMI Cluster Deployer# 
  7. Run the cluster synchronization using the following command:

    clusters cluster_name actions sync run debug true 

    Example:

    SMI Cluster Deployer# clusters kali-stacked actions sync run debug true
    This will run sync.  Are you sure? [no,yes] yes
    message accepted
  8. Monitor the cluster synchronization using the following command:

    monitor sync-logs cluster_name 

    Example:

    SMI Cluster Deployer# monitor sync-logs kali-stacked 
    2020-09-30 01:50:02.159 DEBUG cluster_sync.kali-stacked: Cluster name: kali-stacked 
    2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: Force VM Redeploy: false 
    2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: Force partition Redeploy: false 
    2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: reset_k8s_nodes: false 
    2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: purge_data_disks: false 
    2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: upgrade_strategy: auto 
    2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: sync_phase: all 
    2020-09-30 01:50:02.160 DEBUG cluster_sync.kali-stacked: debug: true 
    .
    .
    .
    .
    2020-09-30 01:53:27.638 DEBUG cluster_sync.kali-stacked: Cluster sync successful 
    2020-09-30 01:53:27.638 DEBUG cluster_sync.kali-stacked: Ansible sync done 
    2020-09-30 01:53:27.638 INFO cluster_sync.kali-stacked: _sync finished.  Opening lock 
  9. To verify the status of the cluster, use the following command:

    clusters cluster_name actions k8s cluster-status 

    Example:

    pods-desired-count 99
    pods-ready-count 99
    pods-desired-are-ready true
    etcd-healthy true
    all-ok true

NOTES:

  • clusters cluster_name - Specifies the K8s cluster.

  • nodes node_name - Specifies the control plane or worker Bare Metal node.

  • maintenance true/false - Assigns the Bare Metal node to, or removes it from, maintenance mode.

  • actions sync run debug true - Synchronizes the cluster configuration.

  • actions k8s cluster-status - Displays the status of the cluster.

SMI Bare Metal Stacked Cluster

In an SMI Bare Metal deployment, a stacked cluster is a cluster containing three Bare Metal control plane nodes, with the primary control plane, OAM, and other applications hosted on the same three nodes. A healthy stacked cluster needs at least two working control plane nodes to perform all of its functions. If two control plane nodes fail, the entire cluster fails and you must redeploy it. You can run the K8s cluster synchronization from the current Active node of the Cluster Manager HA.

The subsequent sections provide information about handling the stacked cluster in the event of a Bare Metal node failure (unplanned) or planned maintenance.

Control Plane Bare Metal Node Failure - Unplanned
This section describes the procedure to replace a failed primary Bare Metal control plane 1 node in a stacked cluster. When the primary control plane 1 Bare Metal node fails, the node status changes to NotReady. You can use the following command to view the status of the nodes in the cluster:
kubectl get nodes 
In the following example, the status of the primary control plane 1 node changes to NotReady after it fails.
user1-cloud@kali-stacked-control-plane:~$ kubectl get nodes
NAME                         STATUS     ROLES           AGE    VERSION
kali-stacked-control-plane1   NotReady   control-plane   136m   v1.21.0
kali-stacked-control-plane2   Ready      control-plane   10h    v1.21.0
kali-stacked-control-plane3   Ready      control-plane   10h    v1.21.0

All the pods on the failed primary control plane 1 Bare Metal node remain either terminated or in the pending state. You can verify the status of the pods using the kubectl get pods command, as shown in the following example:

user1-cloud@kali-stacked-controlplane3:~$ kubectl get pods -A
NAMESPACE           NAME                                             READY   STATUS    RESTARTS   AGE
kube-system         calico-kube-controllers-5d7fff4bc6-lxkpc         1/1     Running   0          7h26m
kube-system         calico-node-tx7zg                                1/1     Running   0          10h
kube-system         calico-node-v6m7v                                1/1     Running   0          10h
kube-system         coredns-66d57f55d9-6dnsn                         1/1     Running   0          136m
kube-system         coredns-66d57f55d9-rdtbd                         1/1     Running   0          136m
kube-system         etcd-kali-stacked-controlplane2                  1/1     Running   0          10h
kube-system         etcd-kali-stacked-controlplane3                  1/1     Running   0          10h
kube-system         kube-apiserver-kali-stacked-controlplane2        1/1     Running   0          10h
kube-system         kube-apiserver-kali-stacked-controlplane3        1/1     Running   0          10h

To replace the failed primary control plane 1 Bare Metal node:

  1. Delete the failed primary control plane 1 Bare Metal node using the following command:

    kubectl delete node node_name 

    Example:

    user1-cloud@kali-stacked-controlplane3:~$ kubectl delete node kali-stacked-controlplane1
    node "kali-stacked-controlplane1" deleted
    
  2. Assign the primary control plane 1 Bare Metal node to maintenance mode in the cluster configuration using the following commands:

    configure 
      clusters cluster_name 
      nodes controlplane1 
      maintenance true 
      commit 
      end 

    Example:

    SMI Cluster Deployer(config)# clusters kali-stacked 
    SMI Cluster Deployer(config-clusters-kali-stacked)# nodes controlplane1 
    SMI Cluster Deployer(config-nodes-controlplane1)# maintenance true 
    SMI Cluster Deployer(config-nodes-controlplane1)# commit
    Commit complete.
    SMI Cluster Deployer(config-nodes-controlplane1)# end
    SMI Cluster Deployer# 
  3. The node is ready for the RMA process.


    Note


    If the remaining nodes need to be upgraded or NFs need to be synchronized, run a cluster sync in this state. However, it's not a part of the RMA process.


  4. Add the node back to the cluster when it is repaired or replaced and available.


    Note


    If you add a node after it is repaired, ensure that the disks are clean by clearing the boot drive and deleting the virtual drive on the node. This ensures that the virtual drive is in a clean state, with no data from its previous state, before you add the node back. However, removing the virtual drive is not required for a new replacement node.


  5. Attach the new primary control plane 1 Bare Metal node and remove it from the maintenance mode in the cluster configuration using the following commands:

    configure 
      clusters cluster_name 
      nodes controlplane1 
      maintenance false 
      commit 
      end 

    Example:

    SMI Cluster Deployer(config)# clusters kali-stacked 
    SMI Cluster Deployer(config-clusters-kali-stacked)# nodes controlplane1 
    SMI Cluster Deployer(config-nodes-controlplane1)# maintenance false 
    SMI Cluster Deployer(config-nodes-controlplane1)# commit
    Commit complete.
    SMI Cluster Deployer(config-nodes-controlplane1)# end
    SMI Cluster Deployer# 
  6. Run the cluster synchronization using the following command:

    clusters cluster_name actions sync run debug true 

    Example:

    SMI Cluster Deployer# clusters kali-stacked actions sync run debug true
    This will run sync.  Are you sure? [no,yes] yes
    message accepted
  7. Verify the status of the cluster using the following command:

    clusters cluster_name actions k8s cluster-status 

    Example:

    SMI Cluster Deployer# clusters kali-stacked actions k8s cluster-status     
    pods-desired-count 40
    pods-ready-count 39
    pods-desired-are-ready true
    etcd-healthy true
    all-ok true

NOTES:

  • clusters cluster_name - Specifies the K8s cluster.

  • nodes controlplane1 - Specifies the primary control plane 1 Bare Metal node.

  • maintenance true/false - Assigns the primary control plane 1 Bare Metal node to, or removes it from, maintenance mode.

  • actions sync run debug true - Synchronizes the cluster configuration.

  • actions k8s cluster-status - Displays the status of the cluster.

Control Plane Bare Metal Node Maintenance - Planned

This section describes the procedure to replace a working primary control plane Bare Metal node for maintenance. To replace a primary control plane Bare Metal node for maintenance, use the following procedure:

  1. Drain and remove the control plane node which is sent for maintenance, using the following command:

    clusters cluster_name nodes controlplane1 actions sync drain remove-node true 

    Example:

    SMI Cluster Deployer# clusters kali-stacked nodes controlplane1 actions sync drain remove-node true                  
    This will run drain on the node, disrupting pods running on the node.  Are you sure? [no,yes] yes
    message accepted
  2. Verify the status of the control plane node using the following command:

    clusters cluster_name nodes controlplane1 actions sync logs 

    Example:

    SMI Cluster Deployer# clusters kali-stacked nodes controlplane1 actions sync logs
    logs 2020-09-30 01:46:14.498 DEBUG cluster_sync.kali-stacked.controlplane1: Cluster name: kali-stacked 
    2020-09-30 01:46:14.498 DEBUG cluster_sync.kali-stacked.controlplane1: Node name: controlplane1 
    2020-09-30 01:46:14.499 DEBUG cluster_sync.kali-stacked.controlplane1: debug: false 
    2020-09-30 01:46:14.499 DEBUG cluster_sync.kali-stacked.controlplane1: remove_node: true
    .
    .
    .
    .
    2020-09-30 01:46:53.028 DEBUG cluster_sync.kali-stacked.controlplane1: Cluster sync successful 
    2020-09-30 01:46:53.029 DEBUG cluster_sync.kali-stacked.controlplane1: Ansible sync done 
    2020-09-30 01:46:53.029 INFO cluster_sync.kali-stacked.controlplane1: _sync finished.  Opening lock
  3. Shut down the node.

  4. Assign the primary control plane Bare Metal node to maintenance mode in the cluster configuration using the following commands:

    configure 
      clusters cluster_name 
      nodes controlplane1 
      maintenance true 
      commit 
      end 

    Example:

    SMI Cluster Deployer(config)# clusters kali-stacked 
    SMI Cluster Deployer(config-clusters-kali-stacked)# nodes controlplane1 
    SMI Cluster Deployer(config-nodes-controlplane1)# maintenance true 
    SMI Cluster Deployer(config-nodes-controlplane1)# commit
    Commit complete.
    SMI Cluster Deployer(config-nodes-controlplane1)# end
    SMI Cluster Deployer# 
  5. The node is ready for the RMA process.


    Note


    If the remaining nodes need to be upgraded or NFs need to be synchronized, run a cluster sync in this state. However, it's not a part of the RMA process.


  6. Add the node back to the cluster when it is repaired or replaced and available.


    Note


    If you add a node after it is repaired, ensure that the disks are clean by clearing the boot drive and deleting the virtual drive on the node. This ensures that the virtual drive is in a clean state, with no data from its previous state, before you add the node back. However, removing the virtual drive is not required for a new replacement node.


  7. Attach the new primary control plane Bare Metal node and remove it from the maintenance mode in the cluster configuration using the following commands:

    configure 
      clusters cluster_name 
      nodes controlplane1 
      maintenance false 
      commit 
      end 

    Example:

    SMI Cluster Deployer(config)# clusters kali-stacked 
    SMI Cluster Deployer(config-clusters-kali-stacked)# nodes controlplane1 
    SMI Cluster Deployer(config-nodes-controlplane1)# maintenance false 
    SMI Cluster Deployer(config-nodes-controlplane1)# commit
    Commit complete.
    SMI Cluster Deployer(config-nodes-controlplane1)# end
    SMI Cluster Deployer# 
  8. Run the cluster synchronization using the following command:

    clusters cluster_name actions sync run debug true 

    Example:

    SMI Cluster Deployer# clusters kali-stacked actions sync run debug true
    This will run sync.  Are you sure? [no,yes] yes
    message accepted

NOTES:

  • clusters cluster_name - Specifies the K8s cluster.

  • nodes controlplane1 - Specifies the primary control plane 1 Bare Metal node.

  • maintenance true/false - Assigns the primary control plane 1 Bare Metal node to, or removes it from, maintenance mode.

  • actions sync run debug true - Synchronizes the cluster configuration.

  • actions k8s cluster-status - Displays the status of the cluster.

Stacked Cluster with Worker Nodes

For scaling the NFs, you can add a number of worker nodes along with the stacked control plane nodes. After adding the worker nodes to the stacked control plane nodes, you can distribute the NFs to the worker nodes.

This section describes procedures involved in replacing a Bare Metal worker node within a SMI Cluster in the event of node failure or planned maintenance.

Worker Bare Metal Node Failure - Unplanned

This section describes the procedure to replace a failed primary Bare Metal worker node in a stacked cluster. When the primary worker Bare Metal node fails, the node status changes to NotReady.

Verify the status of the nodes in the cluster using the following command:
kubectl get nodes 
In the following example, the status of the primary worker node changes to NotReady after it fails.
user1-cloud@kali-stacked-controlplane1:~$ kubectl get nodes
NAME                        STATUS     ROLES           AGE    VERSION
kali-stacked-cmts-worker1   NotReady   <none>          38m    v1.21.0
kali-stacked-cmts-worker2   Ready      <none>          38m    v1.21.0
kali-stacked-cmts-worker3   Ready      <none>          38m    v1.21.0
kali-stacked-controlplane1  Ready      control-plane   125m   v1.21.0
kali-stacked-controlplane2  Ready      control-plane   13h    v1.21.0
kali-stacked-controlplane3  Ready      control-plane   13h    v1.21.0

To replace the failed primary Bare Metal worker node in the Cluster:

  1. Delete the failed primary worker Bare Metal node using the following command:

    kubectl delete node node_name 

    Example:

    user1-cloud@kali-stacked-controlplane3:~$ kubectl delete node kali-stacked-cmts-worker1
    node "kali-stacked-cmts-worker1" deleted
    
  2. Assign the primary worker Bare Metal node to maintenance mode in the cluster configuration using the following commands:

    configure 
      clusters cluster_name 
      nodes worker_node 
      maintenance true 
      commit 
      end 

    Example:

    SMI Cluster Deployer(config)# clusters kali-stacked
    SMI Cluster Deployer(config-clusters-kali-stacked)# nodes cmts-worker1   
    SMI Cluster Deployer(config-nodes-cmts-worker1)# maintenance true     
    SMI Cluster Deployer(config-nodes-cmts-worker1)# commit
    Commit complete.
    SMI Cluster Deployer(config-nodes-cmts-worker1)# end
  3. The node is ready for the RMA process.


    Note


    If the remaining nodes need to be upgraded or NFs need to be synchronized, run a cluster sync in this state. However, it's not a part of the RMA process.


  4. Add the node back to the cluster when it is repaired or replaced and available.


    Note


    If you add a node after it is repaired, ensure that the disks are clean by clearing the boot drive and deleting the virtual drive on the node. This ensures that the virtual drive is in a clean state, with no data from its previous state, before you add the node back. However, it is not necessary to remove the virtual drive for a new replacement node.


  5. Attach the new primary worker Bare Metal node and remove it from the maintenance mode in the cluster configuration using the following commands:

    configure 
      clusters cluster_name 
         nodes worker_node 
            maintenance false 
            commit 
            end 

    Example:

    SMI Cluster Deployer(config)# clusters kali-stacked 
    SMI Cluster Deployer(config-clusters-kali-stacked)# nodes cmts-worker1 
    SMI Cluster Deployer(config-nodes-cmts-worker1)# maintenance false 
    SMI Cluster Deployer(config-nodes-cmts-worker1)# commit
    Commit complete.
    SMI Cluster Deployer(config-nodes-cmts-worker1)# end
    SMI Cluster Deployer# 
  6. Run the cluster synchronization using the following command:

    clusters cluster_name actions sync run debug true 

    Example:

    SMI Cluster Deployer# clusters kali-stacked actions sync run debug true
    This will run sync.  Are you sure? [no,yes] yes
    message accepted
  7. Optionally, increase the cluster synchronization success rate by setting a threshold for the number of worker node failures that the system can tolerate. When server failures occur during cluster synchronization, the failed nodes are removed from the remaining sync actions so that the rest of the cluster can be provisioned. (See the illustrative example after this procedure.)

    Use this command to isolate the failed worker nodes:

    clusters cluster_name actions sync upgrade-strategy concurrent [ isolate-on-hardware-failure { true | false } node-tolerance node_tolerance_value ] 
    • The node_tolerance_value value must be in the range of 0 to 9.

    • Setting isolate-on-hardware-failure to false or node-tolerance to 0 prevents isolation.

  8. Verify the status of the cluster using the following command:

    clusters cluster_name actions k8s cluster-status 

    Example:

    SMI Cluster Deployer# clusters kali-stacked actions k8s cluster-status 
    pods-desired-count 67
    pods-ready-count 67
    pods-desired-are-ready true
    etcd-healthy true
    all-ok true
  9. Verify the status of the pods redeployed on the added worker node using the following command:

    clusters cluster_name nodes  worker_node actions k8s pod-status show-pod-details 
    Example:
    SMI Cluster Deployer# clusters kali-stacked nodes cmts-worker1 actions k8s pod-status show-pod-details 
    Value for 'show-pod-details' [false,true]: true
    pods {
        name calico-node-67gs6
        namespace kube-system
        owner-kind DaemonSet
        owner-name calico-node
        ready true
    }
    pods {
        name coredns-f9fd979d6-b2gsb
        namespace kube-system
        owner-kind ReplicaSet
        owner-name coredns-f9fd979d6
        ready true
    }
    pods {
        name kube-proxy-5m9qh
        namespace kube-system
        owner-kind DaemonSet
        owner-name kube-proxy
        ready true
    }
    pods {
        name maintainer-2nxlq
        namespace kube-system
        owner-kind DaemonSet
        owner-name maintainer
        ready true
    }
    pods {
        name charts-cee-2020-02-0-i21-4
        namespace registry
        owner-kind StatefulSet
        owner-name charts-cee-2020-02-0-i21
        ready true
    }
    pods {
        name charts-cluster-deployer-2020-02-0-i22-5
        namespace registry
        owner-kind StatefulSet
        owner-name charts-cluster-deployer-2020-02-0-i22
        ready true
    }
    pods {
        name registry-cee-2020-02-0-i21-5
        namespace registry
        owner-kind StatefulSet
        owner-name registry-cee-2020-02-0-i21
        ready true
    }
    pods {
        name registry-cluster-deployer-2020-02-0-i22-5
        namespace registry
        owner-kind StatefulSet
        owner-name registry-cluster-deployer-2020-02-0-i22
        ready true
    }
    pods {
        name software-unpacker-3
        namespace registry
        owner-kind StatefulSet
        owner-name software-unpacker
        ready true
    }
    pods {
        name keepalived-jrj4g
        namespace smi-vips
        owner-kind DaemonSet
        owner-name keepalived
        ready true
    }
    pods-count 10
    pods-available-to-drain-count 6

Note


You can follow the same procedure to replace one or more failed worker nodes in the cluster.
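
The following is a minimal illustration of the worker-node isolation command from step 7, assuming the kali-stacked cluster used in the earlier examples and a tolerance of two worker-node failures:

SMI Cluster Deployer# clusters kali-stacked actions sync upgrade-strategy concurrent isolate-on-hardware-failure true node-tolerance 2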


NOTES:

  • clusters cluster_name - Specify the K8s cluster.

  • nodes worker_node - Specify the primary worker Bare Metal node.

  • maintenance true/false - Assign the worker Bare Metal node to, or remove it from, maintenance mode.

  • actions sync run debug true - Synchronize the cluster configuration.

  • actions k8s cluster-status - Display the status of the cluster.

Worker Bare Metal Node Maintenance - Planned

This section describes the procedure to replace a working primary worker Bare Metal node for maintenance. To replace a primary worker Bare Metal node for maintenance, use the following procedure:

  1. Drain and remove the worker node which is sent for maintenance, using the following command:

    clusters cluster_name nodes worker_node actions sync drain remove-node true 

    Example:

    [installer-controlplane] SMI Cluster Deployer# clusters kali-stacked nodes cmts-worker1 actions sync drain remove-node true 
    This will run drain on the node, disrupting pods running on the node.  Are you sure? [no,yes] yes
    message accepted
  2. Verify the status of the worker node using the following command:

    clusters cluster_name nodes worker_node actions sync logs 

    Example:

    [installer-controlplane] SMI Cluster Deployer# clusters kali-stacked nodes cmts-worker1 actions sync logs 
    logs 2020-10-06 20:01:48.023 DEBUG cluster_sync.kali-stacked.cmts-worker1: Cluster name: kali-stacked 
    2020-10-06 20:01:48.024 DEBUG cluster_sync.kali-stacked.cmts-worker1: Node name: cmts-worker1 
    2020-10-06 20:01:48.024 DEBUG cluster_sync.kali-stacked.cmts-worker1: debug: false 
    2020-10-06 20:01:48.024 DEBUG cluster_sync.kali-stacked.cmts-worker1: remove_node: true 
    .
    .
    .
    .
    2020-10-06 20:02:30.057 DEBUG cluster_sync.kali-stacked.cmts-worker1: Cluster sync successful 
    2020-10-06 20:02:30.058 DEBUG cluster_sync.kali-stacked.cmts-worker1: Ansible sync done 
    2020-10-06 20:02:30.058 INFO cluster_sync.kali-stacked.cmts-worker1: _sync finished.  Opening lock 
  3. Shut down the node.

  4. Assign the primary worker Bare Metal node to maintenance mode in the cluster configuration using the following commands:

    configure 
      clusters cluster_name 
      nodes worker_node 
      maintenance true 
      commit 
      end 

    Example:

    [installer-controlplane] SMI Cluster Deployer# config 
    Entering configuration mode terminal
    [installer-controlplane] SMI Cluster Deployer(config)# clusters kali-stacked 
    [installer-controlplane] SMI Cluster Deployer(config-clusters-kali-stacked)# nodes cmts-worker1 
    [installer-controlplane] SMI Cluster Deployer(config-nodes-cmts-worker1)# maintenance true 
    [installer-controlplane] SMI Cluster Deployer(config-nodes-cmts-worker1)# commit
    Commit complete.
    [installer-controlplane] SMI Cluster Deployer(config-nodes-cmts-worker1)# end
  5. The node is ready for the RMA process.


    Note


    If the remaining nodes need to be upgraded or NFs need to be synchronized, run a cluster sync in this state. However, it's not a part of the RMA process.


  6. Add the node back to the cluster when it is repaired or replaced and available.


    Note


    If you add a node after it is repaired, ensure that the disks are clean by clearing the boot drive and deleting the virtual drive on the node. This ensures that the virtual drive is in a clean state, with no data from its previous state, before you add the node back. However, removing the virtual drive is not required for a new replacement node.


  7. Attach the new primary worker Bare Metal node and remove it from the maintenance mode in the cluster configuration using the following commands:

    configure 
      clusters cluster_name 
      nodes worker_node 
      maintenance false 
      commit 
      end 

    Example:

    SMI Cluster Deployer# config 
    Entering configuration mode terminal
    [installer-controlplane] SMI Cluster Deployer(config)# clusters kali-stacked 
    [installer-controlplane] SMI Cluster Deployer(config-clusters-kali-stacked)# nodes cmts-worker1 
    [installer-controlplane] SMI Cluster Deployer(config-nodes-cmts-worker1)# maintenance false 
    [installer-controlplane] SMI Cluster Deployer(config-nodes-cmts-worker1)# commit
    Commit complete.
    [installer-controlplane] SMI Cluster Deployer(config-nodes-cmts-worker1)# end
  8. Run the cluster synchronization using the following command:

    clusters cluster_name actions sync run debug true 

    Example:

    SMI Cluster Deployer# clusters kali-stacked actions sync run debug true
    This will run sync.  Are you sure? [no,yes] yes
    message accepted

NOTES:

  • clusters cluster_name - Specifies the K8s cluster.

  • nodes worker_node - Specifies the primary worker Bare Metal node.

  • maintenance true/false - Assigns the worker Bare Metal node to, or removes it from, maintenance mode.

  • actions sync run debug true - Synchronizes the cluster configuration.

  • actions k8s cluster-status - Displays the status of the cluster.

KVM Bare Metal Node Failure

For KVM Bare Metal node failures, you must replace the failed or faulty KVM Bare Metal node and run a cluster synchronization. When the cluster synchronizes successfully, the SMI Cluster Manager retains the Day 0 configuration from its own database. For the Day 1 configuration, however, the SMI Cluster Manager retains its database from the Network Services Orchestrator (NSO).
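
As a minimal illustration of the synchronization step (reusing the abc-cluster-15 KVM cluster name that appears in the VM action examples later in this section), the synchronization is the same sync action used throughout this guide:

SMI Cluster Deployer# clusters abc-cluster-15 actions sync run debug true
This will run sync.  Are you sure? [no,yes] yes
message accepted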


Note


The SMI does not support any other data backup or restore methods currently.


Node Failures in SMI Cluster Manager HA

Unlike a typical Kubernetes cluster, the SMI Bare Metal HA consists of two AIO Bare Metal nodes running independently, with their data synchronized using an underlying Distributed Replicated Block Device (DRBD) driver. The data synchronization happens in a specific manner with restrictions. Also, DRBD utilizes the keepalived master for switchovers. You can run the synchronization for the SMI Cluster Manager HA from the Inception Server.
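
As a generic illustration only (standard DRBD tooling, not an SMI-specific command), you can inspect the replication state of the DRBD-backed data on either node, assuming shell access to the AIO nodes:

# Shows DRBD resources, their roles (Primary/Secondary), and sync state
sudo drbdadm status
# Older DRBD releases expose the same information through the proc interface
cat /proc/drbd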

The subsequent sections provide more information about replacing failed nodes in SMI Bare Metal Cluster Manager HA mode.

Active Bare Metal Node Failure - Unplanned

When an Active node fails in a Cluster Manager HA model, the Standby node becomes the Active node and continues to upgrade the remote clusters or the NFs. To replace the failed Active node:

  1. Remove the Active node.


    Note


    Upgrade or synchronize the remote clusters using the Standby node.

    For more information on upgrading the SMI Cluster Manager in HA mode, see Upgrading SMI Cluster Manager in HA section in UCC SMI Cluster Manager Deployment Guide.


  2. Update the Cluster Manager HA configuration to swap the backup and control-plane node types when the Active node is up and running.

    nodes active 
      k8s node-type backup #originally this would be control-plane type in the fresh deployment 
     exit 
     nodes standby 
      k8s node-type control-plane #originally this would be backup type in fresh deployment  
    exit 
  3. Run the cluster synchronization.

    clusters cluster_name actions sync run debug true 

    Example:

    SMI Cluster Deployer# clusters trysjc-ha actions sync run debug true
    This will run sync.  Are you sure? [no,yes] yes
    message accepted
  4. Verify the status of the cluster and pods when the Active and Standby nodes are up and running.

    clusters cluster_name actions k8s cluster-status 
    kubectl get pods -n namespace 
  5. Log in to the Ops Center service and verify whether the cluster configuration and CEE data are retained.

  6. Once the active node is back from maintenance, update the Cluster Manager HA configuration to swap the backup and control-plane node types.

    nodes active 
      k8s node-type control-plane #originally this would be control-plane type in the fresh deployment 
     exit 
     nodes standby 
      k8s node-type backup #originally this would be backup type in fresh deployment  
    exit 

NOTES:

  • clusters cluster_name - Specifies the K8s cluster.

  • actions sync run debug true - Synchronizes the cluster configuration.

  • actions k8s cluster-status - Displays the status of the cluster.

Active Bare Metal Maintenance - Planned

This section describes the procedure to replace an Active node for maintenance. To replace an Active node for maintenance, use the following procedure:

  1. Shut down the Active node.


    Note


    Upgrade or synchronize the remote clusters using the Standby node.

    For more information on upgrading the SMI Cluster Manager in HA mode, see Upgrading SMI Cluster Manager in HA section in UCC SMI Cluster Manager Deployment Guide.


  2. Update the Cluster Manager HA configuration to swap the backup and control plane node types when the Active node is up and running.

    nodes active 
      k8s node-type backup #originally this would be control-plane type in the fresh deployment 
     exit 
     nodes standby 
      k8s node-type control-plane #originally this would be backup type in fresh deployment  
    exit 
  3. Run the cluster synchronization.

    clusters cluster_name actions sync run debug true 

    Example:

    SMI Cluster Deployer# clusters trysjc-ha actions sync run debug true
    This will run sync.  Are you sure? [no,yes] yes
    message accepted
  4. Verify the status of the cluster and pods when the Active and Standby nodes are up and running.

    clusters cluster_name actions k8s cluster-status 
    kubectl get pods -n namespace 
  5. Log in to the Ops Center service and verify whether the cluster configuration and CEE data are retained.

NOTES:

  • clusters cluster_name - Specifies the K8s cluster.

  • actions sync run debug true - Synchronizes the cluster configuration.

  • actions k8s cluster-status - Displays the status of the cluster.

Standby Bare Metal Node Failure or Maintenance

This section describes the procedure to replace a Standby node that has either failed (unplanned) or been removed for maintenance (planned). To replace a Standby node, use the following procedure:

  1. Shut down the Standby node.


    Note


    Upgrade or synchronize the remote clusters using the Standby node.

    For more information on upgrading the SMI Cluster Manager in HA mode, see the Upgrading SMI Cluster Manager in HA section in the UCC SMI Cluster Manager Deployment Guide.


  2. Add the Standby node to the same HA configuration when the Standby node is up and running.

  3. Run the cluster synchronization.

    clusters cluster_name actions sync run debug true 

    Example:

    SMI Cluster Deployer# clusters trysjc-ha actions sync run debug true
    This will run sync.  Are you sure? [no,yes] yes
    message accepted
  4. Verify the status of the cluster and pods when the Active and Standby nodes are up and running.

    clusters cluster_name actions k8s cluster-status 
    kubectl get pods -n namespace 
  5. Log in to the Ops Center service and verify whether the cluster configuration and CEE data are retained.

NOTES:

  • clusters cluster_name - Specifies the K8s cluster.

  • actions sync run debug true - Synchronizes the cluster configuration.

  • actions k8s cluster-status - Displays the status of the cluster.

SMI Cluster Manager AIO Node Failure or Planned Maintenance

When the SMI Cluster Manager is deployed in All-In-One (AIO) mode, the CEE replicates the data across all the three nodes (control plane, worker and etcd).

Also, the SMI Cluster Manager AIO model does not support the backup and recovery of operational data. Therefore, you must do the following to prevent the loss of monitoring data when a double node or triple node failure occurs:

  1. Back up the CEE configuration for the Ops Center using the standard Ops Center backup and restore functionality.

  2. Back up the Cluster Manager configuration for the Ops Center (remote cluster configuration) using the standard Ops Center backup and restore functionality.

  3. For planned maintenance or unplanned outage in the SMI AIO mode, you can perform a cluster synchronization and restore the backed-up configuration.

  4. Export the metrics (KPIs) from bulk stats regularly.

  5. Gather Alerts in an external system.

Disaster Recovery

This section describes the procedure to redeploy the SMI Cluster Manager HA, when both the Active and Standby nodes fail or when the entire site or cluster is down.

Active and Standby Node Failure - Unplanned

When both the Active and Standby nodes fail in an SMI Cluster Manager HA model, you can redeploy the SMI Cluster Manager in HA mode using the Inception Server. To redeploy the SMI Cluster Manager HA using the Inception Server, use the following procedure:

  1. Log in to the Inception Server CLI and perform a cluster synchronization.

    clusters cluster_name actions sync run debug true 

    Example:

    SMI Cluster Deployer# clusters trysjc-ha actions sync run debug true
    This will run sync.  Are you sure? [no,yes] yes
    message accepted

    Note


    In the event of an unplanned outage, you cannot restore the data because the SMI does not back up data periodically in the background. However, in the event of planned maintenance, you can restore the data from the storage node manually. You can restore the data using the scp command on the storage node:

    scp -i ../test_key2.pem -r host_name@<ipv4address>:/data 

  2. Copy the backed up data from the storage node and verify whether the NFs, CM and CEE configurations are present.


    Note


    Restore the backed up data only when the Active and Standby nodes and all the pods within these nodes are up and running.


NOTES:

  • clusters cluster_name - Specifies the K8s cluster.

  • actions sync run debug true - Synchronizes the cluster configuration.

Restoring an Entire Site or Cluster

When an entire site or cluster is down, you can redeploy the SMI Cluster Manager HA with the NF, CEE, and remote cluster configurations using the Inception Server. To restore an entire site or cluster, use the following procedure:

  1. Redeploy the SMI Cluster Manager HA through Inception server. For more information, see Active and Standby Node Failure - Unplanned section.

  2. Verify whether the NFs, CM and CEE configurations are present.

CEE Data Replication

The CEE replicates the data across all the three nodes (control plane, worker, and etcd). Also, the SMI Cluster Manager AIO model does not support the backup and recovery of operational data. Therefore, you must do the following to prevent the loss of monitoring data when a double node or triple node failure occurs:

  1. Back up the CEE configuration for the Ops Center using the standard Ops Center backup and restore functionality.

  2. Back up the Cluster Manager configuration for the Ops Center (remote cluster configuration) using the standard Ops Center backup and restore functionality.

  3. For planned maintenance or unplanned outage in the SMI AIO mode, you can perform a cluster synchronization and restore the backed-up configuration.

  4. Export the metrics (KPIs) from bulk stats regularly.

  5. Gather Alerts in an external system.

Notifications

The SMI comes equipped with different notification streams to verify the event status (cluster or node synchronization) or synchronization state of the current cluster or nodes in it.

The show notification stream command provides notification details for alert notifications, license status, and system status.

show notification stream ? 
Possible completions:
  alert-notification   Notifications for alerts
  license-status       Notifications for smart agent licensing will be generated here
  system-status        Notifications for system status

The NSO must subscribe to the alert-notification notification stream to receive alerts.

You can access the notification streams using the following show command in the SMI Cluster Manager CLI:

show notification stream {node-state | node-status | sync-state | sync-status} 
Sync-State Notification

You can access the synchronization state of a given cluster using the sync-state notification. After every cluster synchronization, a sync-state notification is registered as sent.

The following example displays the deployment of a stacked cluster:

SMI Cluster Deployer# show notification stream sync-state
notification 
 eventTime 2020-10-07T18:58:27.368+00:00
 sync-state-notification 
  cluster <stacked_cluster_name>
  state DEPLOYED
 !
!
notification 
 eventTime 2020-10-07T19:07:56.114+00:00
 sync-state-notification 
  cluster <stacked_cluster_name>
  state DEPLOYED
 !
!
notification 
 eventTime 2020-10-07T19:46:54.531+00:00
 sync-state-notification 
  cluster <stacked_cluster_name>
  state DEPLOYED
 !
!
notification 
 eventTime 2020-10-08T15:59:56.697+00:00
 sync-state-notification 
  cluster <stacked_cluster_name>
  state DEPLOYED
 !
!
Node-State Notification

You can view the node state in a cluster deployment synchronization process using the node-state notification.

The following example displays the state of control plane and worker nodes that have joined the stacked cluster. The node state is displayed as JOINING , if the process is running. Otherwise, the node state is displayed as JOINED .

SMI Cluster Deployer# show notification stream node-state
notification 
 eventTime 2020-10-07T19:46:54.531+00:00
 node-state-notification 
  cluster <stacked_cluster_name>
  node cmts-worker3
  state JOINED
 !
!
notification 
 eventTime 2020-10-07T19:46:54.532+00:00
 node-state-notification 
  cluster <stacked_cluster_name>
  node controlplane2
  state JOINED
 !
!
notification 
 eventTime 2020-10-07T19:46:54.534+00:00
 node-state-notification 
  cluster <stacked_cluster_name>
  node cmts-worker2
  state JOINED
 !
!
notification 
 eventTime 2020-10-07T19:46:54.535+00:00
 node-state-notification 
  cluster <stacked_cluster_name>
  node controlplane3
  state JOINED
 !
!
notification 
 eventTime 2020-10-07T19:46:54.535+00:00
 node-state-notification 
  cluster <stacked_cluster_name>
  node controlplane1
  state JOINED
 !
!
notification 
 eventTime 2020-10-07T19:46:54.54+00:00
 node-state-notification 
  cluster <stacked_cluster_name>
  node cmts-worker1
  state JOINED
 !
!
Sync-Status Notification

You can view the progress of the cluster synchronization process using the sync-status notification.

The following example displays the different states (SYNC-START, ERROR and DONE) of the cluster synchronization process:

SMI Cluster Deployer# show notification stream sync-status
notification 
 eventTime 2020-10-08T16:05:53.326+00:00
 sync-status-notification 
  cluster <stacked_cluster_name>
  status ERROR
 !
!
notification 
 eventTime 2020-10-08T16:12:48.042+00:00
 sync-status-notification 
  cluster <stacked_cluster_name>
  status SYNC-START
 !
!
notification 
 eventTime 2020-10-08T16:26:23.444+00:00
 sync-status-notification 
  cluster <stacked_cluster_name>
  status ERROR
 !
!
notification 
 eventTime 2020-10-08T16:26:23.446+00:00
 sync-status-notification 
  cluster <stacked_cluster_name>
  status ERROR
 !
!
notification 
 eventTime 2020-10-08T16:31:42.551+00:00
 sync-status-notification 
  cluster <stacked_cluster_name>
  status SYNC-START
 !
!
notification 
 eventTime 2020-10-08T16:43:33.498+00:00
 sync-status-notification 
  cluster <stacked_cluster_name>
  status DONE
 !
!
notification 
 eventTime 2020-10-08T20:04:21.386+00:00
 sync-status-notification 
  cluster kali-stacked
  status SYNC-START
 !
!
notification 
 eventTime 2020-10-08T20:04:21.386+00:00
 sync-status-notification 
  cluster kali-stacked
  status SYNC-START
 !
!
notification 
 eventTime 2020-10-08T20:27:32.155+00:00
 sync-status-notification 
  cluster kali-stacked
  status DONE
 !
!
Node-Status Notification

You can view the status of the node synchronization using the node-status notification. The following example displays the status of the nodes in a stacked cluster:

SMI Cluster Deployer# show notification stream node-status
notification 
 eventTime 2020-09-25T19:59:56.728+00:00
 node-status-notification 
cluster kali-vm3
node controlplane
status DONE
 !
!
notification 
 eventTime 2020-09-29T16:36:05.962+00:00
 node-status-notification 
cluster kali
node etcd1
status DONE
 !
!
notification 
 eventTime 2020-09-29T20:54:18.415+00:00
 node-status-notification 
cluster kali-stacked
node controlplane1
status DONE
 !
!
notification 
 eventTime 2020-09-30T01:28:57.355+00:00
 node-status-notification 
cluster kali-stacked
node controlplane1
status DONE
 !
!
notification 
 eventTime 2020-09-30T01:46:53.034+00:00
 node-status-notification 
cluster kali-stacked
node controlplane1
status DONE
 !
!
notification 
 eventTime 2020-09-30T04:17:25.701+00:00
 node-status-notification 
cluster kali-stacked
node controlplane1
status ERROR
 !
!
notification 
 eventTime 2020-10-06T20:02:30.108+00:00
 node-status-notification 
cluster kali-stacked
node cmts-worker1
status DONE
 !
!
VM Status Alerts

The NSO needs to know the state of the deployed NF (Network Function) and receives notifications for the following states:

  • DEPLOYED

  • ALIVE

  • UNDEPLOYED

  • ERROR

  • RECOVERING

  • RECOVERY_FAILED


Note


The SMI supports VM status notification for UPF only.


The following parameters are introduced in the notification streams for the VM status notifications:

Table 2. VM Status Notification

  • vm-state-notification stream - This stream has the following details:

    • cluster_name: the cluster that holds the KVM nodes

    • node_name: KVM node name

    • vm_name: UPF name

    • state: VM state

    • message: carries useful information, such as an error message, or the mgmt-ip when the state of the VM is ALIVE

  • alert-notification stream - This stream has the following labels:

    • node_name: KVM node name

    • vm_name: UPF name

    • state: VM state

    • message: carries useful information, such as an error message, or the mgmt-ip when the state is ALIVE

Depending on the life cycle stage of the VM, notifications are generated from either the Cluster Manager or the CEE Ops-Center and sent to the NSO. When the VM is deleted or redeployed, the UNDEPLOYED notification is sent to the Cluster Manager notification stream. All other notifications are generated by the Alert Manager and then sent to the CEE Ops-Center notification stream.

Cluster Manager Notification

The Cluster Manager sends vm-state-notification for the UNDEPLOYED VM state.

CEE Ops-Center Notification

The CEE Ops-Center alert-notification sends the following alerts for different VM states:

  • vm-deployed: minor - DEPLOYED

  • vm-alive: minor – ALIVE (alert lasts for a short time and disappears automatically)

  • vm-error: major - ERROR

  • vm-recovering: warning - RECOVERING

  • vm-recovery_failed: critical - RECOVERY_FAILED

All required fields are included in alert labels for notification from alert-notification. All VM alerts are viewable on the Grafana dashboard.

VM Action Notifications

Delete Action: When the delete VM action is triggered, the CM sends notifications that the VM is deleted. For the delete action, the VM states are UNDEPLOYED and ERROR.

clusters abc-cluster-15 nodes kvm-1 vms upf1 actions delete

Redeploy Action: When the VM is in the RECOVERY_FAILED state, the NSO sends a request to redeploy the VM. A redeploy action performs both the delete action and the sync action.

clusters abc-cluster-15 nodes kvm-1 vms upf1 actions redeploy

Redeploy Action Notification: The redeploy action sends a notification to the CM. The redeploy VM action has the following four states: UNDEPLOYED, ERROR, REDEPLOYED, and REDEPLOY_ERROR.

show notification stream vm-state 

notification
eventTime 2021-02-23T21:27:28.692+00:00
vm-state-notification
cluster_name cndp-testbed
node_name kvm-1
vm_name upf2
state UNDEPLOYED
message
!
!
notification
eventTime 2021-02-23T21:29:18.699+00:00
vm-state-notification
cluster_name cndp-testbed
node_name kvm-1
vm_name upf2
state REDEPLOYED
message
!
!
Configuring the Alert Notification in CEE

The user must configure alert notifications when deploying the UPF VMs. Log in to the CEE CLI and add the following configuration:


config
bulk-stats prune-interval-days 3
prometheus kvm-metrics defaults private-key "-----BEGIN OPENSSH PRIVATE KEY-----LGXtil23N4YV=\n-----END OPENSSH PRIVATE KEY-----\n"
prometheus kvm-metrics defaults user cloud-user
prometheus kvm-metrics monitor-server 10.194.62.41
hostname abc-bm-15-controlplane
exit

Note


The user must replace the IP address, hostname, private key, and user details with site-specific values.


Sample Notification from the Alert Notification Stream

notification
eventTime 2021-01-08T03:28:54.501+00:00
smi-alert-notification
starts-at 2021-01-08T03:28:24.493874101Z
ends-at 0001-01-01T00:00:00Z
alert-status firing
smi-alert-notification alert-label
name alertname
value vm-recovery-failed
!
smi-alert-notification alert-label
name cluster
value test-cee-kvm_cee-voice
!
smi-alert-notification alert-label
name hostname
value test-bm-15-controlplane
!
smi-alert-notification alert-label
name instance
value metrics-proxy-test-bm-15-controlplane:9100
!
smi-alert-notification alert-label
name job
value metrics-proxy
!
smi-alert-notification alert-label
name message
value 10.1.1.3
!
smi-alert-notification alert-label
name monitor
value prometheus
!
smi-alert-notification alert-label
name node_name
value controlplane
!
smi-alert-notification alert-label
name replica
value test-cee-kvm_cee-voice
!
smi-alert-notification alert-label
name severity
value critical
!
smi-alert-notification alert-label
name state
value RECOVERY_FAILED
!
smi-alert-notification alert-label
name vm_name
value upf2
!
smi-alert-notification alert-annotation
name summary
value upf2 failed to recover.
!
smi-alert-notification alert-annotation
name type
value Equipment Alarm
!
!
!

Configurable Host Profile Support

This section describes the procedure to configure BIOS using host profiles on Cisco UCS servers.

Adding a Host Profile

This section describes the sequence of operations to add a host profile to the software configuration on the CM.

  1. To use host profiles, add the .tgz file to the software configuration on the CM by using the following CLI configuration, and then run a cluster sync.

    config 
        software 
            host-profile host-profile_name url software_url user user_name password password 

    NOTES:

    • host-profile host-profile_name - Specify the name of the host profile, which is used to configure the BIOS settings for a UCS server.

    • url software_url - Specify the HTTP, HTTPS, or file URL of the software. File format must be "file:///absolute/path/to/file".

      Must be a URI in the pattern (http:|https:|file:/)//.*.

    • user user_name - Specify the user name for HTTP/HTTPS authentication.

      Must be a string.

    • password password - Specify the password for downloading the software package.

      Must be an aes-cfb-128-encrypted string.

    The following CLI commands form a sample configuration.
    software host-profile cndp-upf
     url             https://www.abc.com/ucs/profile/cndp-upf-profile.tgz
     user            smi-readonly.gen
     password        $8$abc
     sha256          b01dc4d47926e7a35e352c2f2d1c9f8b280fd44a89d0657281ff858387669753
    exit
  2. To add the host-profile name in the cluster configuration, use the following CLI configuration.

    config 
        clusters cluster_name 
            nodes node_name 
                host-profile host-profile_name 
                commit 
                exit 

    NOTES:

    • clusters cluster_name - Specify the name used to uniquely identify the cluster. Must be an alphanumeric string, and can contain the hyphen (-) character, however it must not start with a hyphen.

      Must be a string.

    • nodes node_name - Specify the name of the node. name can be an alphanumeric string containing the hyphen (-). A host name cannot start with a hyphen (-). For example, kashaio-123.

      Must be a string.

    • host-profile host-profile_name - Specify the name of the host profile, which is used to configure the BIOS settings for a UCS server.

    The following CLI commands form a sample configuration.
    config
    clusters example-upf nodes kvm-1 host-profile cndp-upf
    commit
    top
    exit

    To return to the default settings, remove the host-profile setting from the node. The following CLI commands form a sample configuration.

    config
    no clusters example-upf nodes kvm-1 host-profile
    commit        
    exit

BIOS Settings

The BIOS profile allows access to all the settings on all the CIMC variants.

The following sample configuration shows the default values when a host-profile is not configured for the node. You can create a host profile once you have the necessary tokens.

profiles:
  bios:
    name: cndp_default_settings
    description: "Default CIMC BIOS settings for CNDP"
    pids:
      UCSC-C220-M5SX:
        description: "Default CIMC BIOS settings for UCSC-C220-M5SX (the original)"
        tokens:
          cpuPerformance: hpc
          cpuEnergyPerformance: balanced-performance
          eppProfile: Performance
          intelHyperThreadingTech: disabled
          packageCstateLimit: C0 C1 State
          usbPortInternal: disabled
          usbPortKvm: enabled
          usbPortRear: disabled
          usbPortSdCard: disabled
        
      UCSC-C220-M6S:
        description: "Default CIMC BIOS settings for UCSC-C220-M6S"
        tokens:
          cpuPerformance: hpc
          cpuEnergyPerformance: balanced-performance
          eppProfile: Performance
          intelHyperThreadingTech: disabled
          packageCstateLimit: C0 C1 State
          usbPortInternal: disabled
          usbPortRear: disabled

      UCSC-C220-M7S:
        description: "Default CIMC BIOS settings for UCSC-C220-M7S"
        tokens:
          CPUPerformance: HPC
          CpuEngPerfBias: Balanced Performance
          EPPProfile: Performance
          IntelHyperThread: Disabled
          PackageCstateLimit: C0 C1 State
          UsbPortRear: Disabled

      UCSC-C240-M4SX:
        tokens:
          cpuPerformance: enterprise
          cpuEnergyPerformance: balanced-performance
          ## not supported eppProfile: Performance
          intelHyperThreadingTech: Disabled
          packageCstateLimit: C0/C1
          usbPortInternal: Disabled
          usbPortKvm: Enabled
          usbPortRear: Disabled
          usbPortSdCard: Disabled

To add a host profile, see the Adding a Host Profile section.

Hyper Threading Support on KVM

Hyper Threading (HT) allows more than one thread to run on each core, enabling more work to be done in parallel.

To enable or disable HT, see the BIOS Settings section.

When HT is enabled on the UCS server, the total CPU number doubles, which impacts the following:

  • Isolation CPU setting

  • VM CPU allocation

  • UPF CPU Worker Count setting
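
As a quick, generic check (standard Linux commands, not an SMI-specific step), you can confirm on the UCS node whether HT is currently enabled; two threads per core indicates that HT is enabled:

cloud-user@user-bm-15-controlplane:~$ lscpu | grep -E 'Thread\(s\) per core|^CPU\(s\):'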

Isolation CPU Pinning

When HT is disabled, one CPU from each socket is reserved for the base OS. The configuration is as follows:

cloud-user@user-bm-15-controlplane:~$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz
root=LABEL=cloudimg-rootfs ro intel_iommu=on isolcpus=1-19,21-39 hugepagesz=1G
            hugepages=312 console=ttyS0,115200 console=tty1

When HT is enabled, the isolcpus parameter is set as follows:

isolcpus=1-19,21-39,41-59,61-79

CPU Worker Count

The VPP_CPU_WORKER_CNT parameter is set in the day 0 configuration. It depends on a number of other parameters, such as the UPF flavor configuration, SR-IOV, the HT setting, and the vCPU allocation.

The VPP_CPU_WORKER_CNT is adjusted according to the UPF flavor configuration. In the UCS environment, the UPF VM can be deployed with different flavors, such as full VM (1 UPF), half VM (2 UPFs), and quarter VM (4 UPFs).

The SR-IOV provides capabilities to configure multiple VFs per PF.

The NIC allows the number of Rx and Tx queues to be configured per VF. The maximum number of Rx and Tx queues depends on the SR-IOV or PCI_PT configuration. A maximum of 16 Rx and Tx queues can be configured per VF in SR-IOV.

The VPP_CPU_WORKER_CNT configuration overrides the default value for PCI_PT and SR-IOV.

When HT is enabled, the CPU number is doubled and the VPP_CPU_WORKER_CNT is also doubled.

The following tables summarize the worker count details for the different types of UCS servers.

Table 3. Parameter Values for UCS M5 Server with 48 CPU Cores
Deployment Flavor   Hyperthreading   Number of UPFs   Worker Count   Rx Queues   Tx Queues   vCPUs
Full                Enabled          1                16             16          16          92
Full                Disabled         1                8              8           9           46
Half                Enabled          2                16             16          16          46
Half                Disabled         2                8              8           9           23
Quarter             Enabled          4                8              8           9           22
Quarter             Disabled         4                4              4           5           11

Table 4. Parameter Values for UCS M6 and M7 Servers
Deployment Flavor   Hyperthreading   Number of UPFs   Worker Count   Rx Queues   Tx Queues   vCPUs
Full                Enabled          1                48             48          48          124
Full                Disabled         1                24             24          24          62
Half                Enabled          2                24             24          24          62
Half                Disabled         2                12             12          12          31
Quarter             Enabled          4                12             12          12          30 / 32
Quarter             Disabled         4                6              6           6           15 / 16

You can check the worker count configuration as follows:

isoinfo -R -i /data/upf/libvirt/xml/upf1/upf1-cfg.iso -x /staros_param.cfg
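
For example, to display only the worker-count setting from that parameter file (assuming the same ISO path as in the command above), you can pipe the output through grep:

isoinfo -R -i /data/upf/libvirt/xml/upf1/upf1-cfg.iso -x /staros_param.cfg | grep VPP_CPU_WORKER_CNT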

Emulator Pinning

You can set the emulator pinning during the VM CPU allocation:

<emulatorpin cpuset="0,40"/> <emulatorpin cpuset="20,60"/>

To verify the VM CPU allocation, run the following command:

virsh dumpxml upf2

The KVM node must be redeployed if the HT setting is changed (for example, from enabled to disabled).

CPU Isolation

Table 5. Feature History

Feature Name: CPU Isolation

Release Information: 2023.04

Feature Description: SMI provides a higher level of CPU isolation for VPP workers to support cnUPF. Using the host profile, SMI defines isolcpu to isolate CPUs from the kernel scheduler.

SMI provides a higher level of CPU isolation for VPP workers. With CPU isolation, no other processes can be scheduled on "isolcpu" CPUs where VPP workers are pinned.

SMI uses the host profile to define isolcpu that isolates CPUs from the kernel scheduler. It does not prevent K8s containers from changing their affinities to run on isolcpus. Depending on the deployment, SMI also provides the flexibility to use VPP workers and session managers for CPU isolation.

How it Works

CPU isolation utilizes a containerd Node Resource Interface (NRI) plugin (v0.3.0) that subscribes to pod and container lifecycle events.

NRI is a common framework for plugging extensions into OCI-compatible container runtimes. It provides basic mechanisms for plugins to track the state of containers and to make limited changes to their configuration.

Using the NRI Plugin

The following built-in rules apply for the plugin:

  • The CPU isolator ignores containers in pods whose names have the prefix "vpc-".

  • If the annotation smi.cisco.com/cpuset exists, the CPU isolator adjusts the CPU set using its value (see the annotation sketch after this list).

  • Otherwise, the value of the environment variable CPUSET_KUBEPODS is used.
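For illustration, the following is a minimal sketch of the smi.cisco.com/cpuset annotation on a pod. The pod name, image, and CPU set value are hypothetical; only the annotation key comes from the rules above:

apiVersion: v1
kind: Pod
metadata:
  name: example-app                      # hypothetical pod name
  annotations:
    smi.cisco.com/cpuset: "0-19,40-59"   # CPU set the isolator applies to this pod's containers
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0  # hypothetical image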

The NRI plugin starts and operates as follows:

  • At startup, the CPU set available to K8s is retained as all CPUs.

  • During a CreateContainer event, the plugin adjusts the container's CPU set based on the following conditions:

    • If it is a VPC container, the plugin does nothing so that VPP workers can be pinned to isolated CPUs.

    • For other, non-VPC containers, the plugin creates a customized CPU set that excludes the isolated CPUs.

Sample Configuration

The CM calculates the CPU set of kubepods and passes it as the environment variable CPUSET_KUBEPODS when creating the static pod for the CPU isolator.

The following is a sample host profile configuration that defines isolcpu:

---
profile_version: 2
profiles:
  bios:
    name: test_bios_settings
    description: "CIMC BIOS settings"
    pids:
      UCSC-C220-M5SX:
        description: "Copy of Default CIMC BIOS settings for UCSC-C220-M5SX"
        tokens:
          cpuPerformance: hpc
          cpuEnergyPerformance: balanced-performance
          eppProfile: Performance
          packageCstateLimit: C0 C1 State
          usbPortInternal: disabled
          usbPortKvm: enabled
          usbPortRear: disabled
          usbPortSdCard: disabled
  linux:
    cisco_c220_m5_dual_numa:
      # name: no longer used
      description: "Cisco M5 Dual Numa Test Profle"
      grub:
        dedicate-cpus:
          numa-0: '0%'
          numa-1: '26-39'
        hugepages:
          numa-0:
            2m: 0%
            1g: 0%
          numa-1:
            2m: 20%
            1g: 50
        addition-grub-cmd-line: "intel_iommu=on intel_pstate=disable intel_idle.max_cstate=1 pcie_aspm.policy=performance idle=poll clocksource=tsc tsc=reliable skew_tick=1 nosoftlockup"
      sysctl:
        net.ipv4.neigh.default.gc_thresh1: 8192
        net.ipv4.neigh.default.gc_thresh2: 32768
        net.ipv4.neigh.default.gc_thresh3: 65536
        net.ipv6.neigh.default.gc_thresh1: 8192
        net.ipv6.neigh.default.gc_thresh2: 32768
        net.ipv6.neigh.default.gc_thresh3: 65536
      sysfs:
        /sys/module/nvme_core/parameters/io_timeout: 4294967295
      sriov:
      - match:
          pf-name:  enp94s*
        vfs-per-pf: 4
        dpdk-vfs-per-pf: 2
        dpdk-bind: vfio-pci
      - match:
          pf-name:  enp216s*
        vfs-per-pf: 16
 
    cisco_c220_m5_single_numa:
      description: "Cisco M5 Single Numa Test Profle"
      grub:
        dedicate-cpus:
          numa-0: '26-39'
        hugepages:
          numa-0:
            2m: 20%
            1g: 0%
        # single core needs intel_iommu=off
        addition-grub-cmd-line: "intel_iommu=off intel_pstate=disable intel_idle.max_cstate=1 pcie_aspm.policy=performance idle=poll clocksource=tsc tsc=reliable skew_tick=1 nosoftlockup"
      sysctl:
        net.ipv4.neigh.default.gc_thresh1: 8192
        net.ipv4.neigh.default.gc_thresh2: 32768
        net.ipv4.neigh.default.gc_thresh3: 65536
        net.ipv6.neigh.default.gc_thresh1: 8192
        net.ipv6.neigh.default.gc_thresh2: 32768
        net.ipv6.neigh.default.gc_thresh3: 65536
      sysfs:
        /sys/module/nvme_core/parameters/io_timeout: 4294967295
      sriov:
      - match:
          pf-name:  enp94s*
        vfs-per-pf: 4
        dpdk-vfs-per-pf: 2
        dpdk-bind: vfio-pci
      - match:
          pf-name:  enp216s*
        vfs-per-pf: 16

Single-Root Input/Output Virtualization

Overview

The Single Root I/O Virtualization (SR-IOV) specification is a Peripheral Component Interconnect (PCI) passthrough hardware standard that regulates device assignment. PCI passthrough enables the virtual machines to bypass the hypervisor and virtual switch layer and communicate directly with the PCI devices residing on the host. SR-IOV is an extension of the PCI passthrough specification that allows a single physical resource to be shared natively with multiple guests. In the PCI configuration space, a single physical device that has SR-IOV capability enabled is virtualized as multiple devices. Each device is a secured and distinct unit with isolation at the resource level, which means it has its own set of storage and configurations. This mechanism provides data transfer at near wire speed along with low latency.

The SMI on Bare Metal platform supports both the PCI passthrough and SR-IOV standards in the form of plugins. In the CNDP-KVM-UPF implementation, the VM and network devices are configured by default to support SR-IOV Virtual Functions (VFs). If you want to pass the network resources to the network function (UPF) through the PCI passthrough standard, enable PCI passthrough for the network device.

How it Works

The SMI on Bare Metal platform lets the NF (UPF) select either PCI passthrough or SR-IOV as the network resource management plugin. PCI passthrough allows the NF to have exclusive access to physical PCI devices, such as the network card, through a VM. The PCI devices may not be physically attached to the guest host; however, they provide an efficient and seamless interaction to the NF.

SR-IOV enables the virtual machines to have parallel access to the physical network cards installed on the hypervisor. Traditionally, the physical devices had a one-to-one mapping with the VMs, which required several hardware resources and a large memory capacity. SR-IOV introduces Virtual Functions (VFs), which are lightweight Peripheral Component Interconnect Express (PCIe) functions. PCIe is an interface standard that is responsible for connecting high-speed components. The VFs are the self-sufficient components required for processing input/output requests between VMs. Each VF is derived from a Physical Function (PF). The number of VFs that a VM supports is based on the underlying hardware device's capacity. For instance, the hypervisor can map one or more VFs to a virtualized guest. The VF's configuration space is mapped to the space that is presented to the virtualized guest, which is essentially the space that the hypervisor allocates to the virtualized guest.

The following are sample XML configurations for the PCI passthrough and SR-IOV standards:

PCI passthrough:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x5e' slot='0x00' function='0x0'/>
  </source>
</hostdev>

SR-IOV:

<interface type='hostdev' managed='yes'>
  <mac address='02:11:11:22:22:04'/>
  <driver name='vfio'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x5e' slot='0x0a' function='0x1'/>
  </source>
</interface>
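As an illustrative host-level check (the PF name enp94s0f0 is hypothetical; substitute your own interface), you can verify how many VFs are configured on a physical function and list the VF PCI devices:

cat /sys/class/net/enp94s0f0/device/sriov_numvfs
lspci | grep -i 'Virtual Function'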

Enabling Support for PCI Passthrough

This section describes how to enable a VM to support PCI Passthrough.

If the VM uses the SR-IOV mechanism, then ensure that the VF number for the associated PF and MAC address is available.

Use the following configuration to configure the PCI passthrough option on the VM:

cluster cluster_name 
   nodes node_name 
      type kvm 
         os enable-passthrough  {true|false} 
         os num-vfs-per-pf number_of_vf_per_pf 
   exit 
exit 

NOTES:

  • cluster cluster_name – Specify the name of the cluster.

  • nodes node_name – Specify the name of the node.

  • os enable-passthrough {true|false} – Configures the PCI passthrough capability for the VM. Default value is "false".

  • os num-vfs-per-pf number_of_vf_per_pf – Specifies the number of VFs for each physical function (PF). The default value is 16.
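The following is a minimal sketch of this configuration with sample values; the cluster and node names are hypothetical:

cluster bm-upf-cluster 
   nodes kvm-1 
      type kvm 
         os enable-passthrough true 
         os num-vfs-per-pf 16 
   exit 
exit 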

IPSec Support for SMF N4 Interfaces

Feature Summary and Revision History

Summary Data

Applicable Product(s) or Functional Area: KVM-based application deployment support
Applicable Platforms: Bare Metal, OpenStack, VMware
Feature Default Setting: Disabled – Configuration Required
Related Changes in this Release: Not Applicable
Related Documentation: UCC SMI Operations Guide

Revision History

Revision Details: Added support for the following functionality: IPSec Monitoring; Configuring IPSec certificates under strongSwan configuration.
Release: 2023.01.0

Revision Details: First introduced.
Release: 2020.02.2.47

Feature Description

This feature introduces strongSwan, a keying daemon, which uses the Internet Key Exchange (IKE) protocols, IKEv1 and IKEv2, to establish security associations (SA) between two peers in a network. Such an IKE session is denoted as IKE_SA in this chapter. IKE provides strong authentication for both peers and derives unique cryptographic session keys. Besides authentication and key material, IKE also provides the means to exchange configuration information and to negotiate IPsec SAs, which are often called CHILD_SAs. IPsec SAs define which network traffic is to be secured and how it is encrypted and authenticated.

The strongSwan feature is available as an add-on from the Cluster Manager (CM). Use the CM Ops-Center to configure this add-on. In the current release, the SMI uses strongSwan version 5.9.3.

SMI allows monitoring of IPSec certificates: it sends certificate expiry alerts and updates certificates through the strongSwan configuration.

Configuration Parameters

This section describes the configuration parameters available for the strongSwan add-on feature. Use the CM Ops-Center to configure these parameters.

  • name : Specifies the name of the connection, which can be used for connection specific operations, for example, up or down.

  • auto { ignore | add | route | start } : Specifies the operation, if any, that should be automatically performed at IPsec startup. The add option loads a connection without starting it, whereas route loads a connection and installs kernel traps. If traffic is detected between the leftsubnet and rightsubnet, a connection is established. The start option loads a connection and brings it up immediately. The ignore option ignores the connection and is the same as deleting a connection from the config file.

    The default value is ignore.

  • keyexchange { ikev1 | ikev2 } : Specifies the method of key exchange and the protocol to use to initialize the connection.

  • type { tunnel | transport | transport_proxy | passthrough | drop } : Specifies the type of the connection. Currently, the accepted values are tunnel, signifying a host-to-host, host-to-subnet, or subnet-to-subnet tunnel. The transport option signifies a host-to-host transport mode, whereas the transport_proxy option signifies the special Mobile IPv6 transport proxy mode. The passthrough option signifies that no IPsec processing should be done at all and drop signifies that packets must be discarded.

  • left or right { ip address ip_address | fqdn fqdn | %any | %any4 | %any6 | range | subnet } : Specifies the IP address or FQDN of the participant public-network interface. The value %any for the local endpoint signifies an address to be filled in (by automatic keying) during negotiation. If the local peer initiates the connection setup, then the routing table is queried to determine the correct local IP address. If the local peer is responding to a connection setup, then any IP address that is assigned to a local interface is accepted. The value %any4 restricts address selection to IPv4 addresses and %any6 restricts address selection to IPv6 addresses.

  • leftsubnet or rightsubnet ip subnet : Specifies the private subnet behind the left or right participant, expressed as network/netmask.

  • leftid or rightid id : Specifies how the left or right participant must be identified for authentication. The default value is the value of left or right, or the subject of the configured certificate. It must match the full subject DN or one of the subjectAltName extensions contained in the certificate.

  • leftsendcert { never | no | ifasked | always | yes } : Defines whether the local peer sends its certificate to the remote peer. With the ifasked option, the certificate is sent only when the remote peer sends a certificate request (CR) payload.

  • leftauth or rightauth { pubkey | psk | eap | xauth } : Specifies the authentication method to use locally (left) or require from the remote (right) side. The acceptable values are pubkey for public key encryption (RSA/ECDSA), psk for pre-shared key authentication, eap to use the Extensible Authentication Protocol, and xauth for IKEv1 eXtended Authentication.

    Pubkey is the default option.

  • psk pre-shared key : Specifies the required setting if leftauth or rightauth is configured as psk.

  • esp { cipher suites | aes128-sha256 } : A comma-separated list of ESP encryption or authentication algorithms is used for the connection, for example, aes128-sha256. The notation is encryption-integrity[-dhgroup][-esnmode]. For IKEv2, multiple algorithms (separated by -) of the same type can be included in a single proposal. IKEv1 only includes the first algorithm in a proposal.

    aes128-sha256 is the default option.

  • ike { cipher suites | aes128-sha256-modp3072 } : A comma-separated list of IKE/ISAKMP SA encryption or authentication algorithms is used, for example, aes128-sha256-modp3072. The notation is encryption-integrity[-prf]-dhgroup. In IKEv2, multiple algorithms and proposals might be included, such as aes128-aes256-sha1-modp3072-modp2048 or 3des-sha1-md5-modp1024.

  • ikelifetime { time time | 3h } : Specifies how long the keying channel of a connection (ISAKMP or IKE SA) must last before being renegotiated.

  • lifetime { time time | 1h } : Specifies how long a particular instance of a connection should last, from successful negotiation to expiry.

  • dpdaction { none | clear | hold | restart } : Specifies the action to be taken when a dead peer is detected.

    none is the default value.

  • dpddelay { time time | 30s } : Defines the time interval at which INFORMATIONAL exchanges are sent to the peer. These are sent only if no other traffic is received.

  • dpdtimeout { time time | 150s } : Defines the timeout interval after which all the connections to a peer are deleted in case of inactivity.

  • inactivity time time : Defines the timeout interval after which a CHILD_SA is closed if it did not send or receive any traffic.

  • closeaction { none | clear | hold | restart } : Defines the action to take if the remote peer unexpectedly closes a CHILD_SA (see dpdaction for the description of the different options). Do not use closeaction if the peer uses reauthentication or uniqueids checking, because these events might trigger the defined action when it is not desired.

  • nodes list_of_node_names : Specifies the node names on which IPSec connection must be established.

  • serverCert server_certificate : Specifies the content of Server certificate in the pem format to be used for this connection.


    Note


    This keyword is not supported under strongSwan configuration.


  • serverPrivKey server_private_key : Specifies the content of server private key in the pem format to be used for this connection.


    Note


    This keyword is not supported under strongSwan configuration.


  • serverPrivKeyPassphrase passphrase : Specifies the passphrase used to encrypt the server-priv-key value.

  • server-secret : Pass an existing TLS secret for this connection.
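The following is a minimal sketch that combines several of these parameters into one connection under the CM Ops-Center configuration. The connection name, IP addresses, subnets, pre-shared key, and node names are hypothetical, and the exact keyword syntax can vary by release; use it only as a starting point:

clusters test-aio
 strongswan connections n4-a-to-b
  auto          start
  keyexchange   ikev2
  type          tunnel
  left          209.165.201.10
  right         209.165.201.20
  leftsubnet    209.165.202.128/27
  rightsubnet   209.165.202.160/27
  leftauth      psk
  rightauth     psk
  psk           "example-pre-shared-key"
  esp           aes128-sha256
  ike           aes128-sha256-modp3072
  dpdaction     restart
  nodes         [ protocol1 protocol2 ]
 exit
exit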

Installing strongSwan

This section describes how to install the strongSwan feature.

Install strongSwan as an Add-on from the CM

Use the following steps to install strongSwan as an add-on from the CM Ops-Center:

  1. Use the following CLI commands to enable the strongSwan add-on:

    clusters cluster_name addons strongswan enabled

  2. Set all the strongSwan parameters for connection (refer to the Configuration Parameters section for more details on available parameters).

  3. Trigger the cluster sync operation.


    Note


    The strongSwan pods run on all the nodes; however, traffic is accepted only on the nodes that are configured using the "nodes" parameter in the CM Ops-Center. strongSwan does not accept or send any traffic on non-configured nodes.


Configuring IPSec Certificates

To configure IPSec certificates under strongSwan configuration, use the following procedure:

  1. Create TLS associated secret for server and CA certificate.

    Note: Create strongSwan-related secrets inside the smi-strongswan namespace.

    Example:

    [test-cm-controlplane] SMI Cluster Deployer# show running-config clusters secrets ca-cert
    clusters test-aio
     secrets ca-cert smi-strongswan 134-ca
      certificate "-----BEGIN CERTIFICATE-----\nMIIDqzCzQubm..................1Ac1L+s4M3ug==\n-----END CERTIFICATE-----\n"
     exit
     secrets ca-cert smi-strongswan 135-ca
      certificate "-----BEGIN CERTIFICATE-----\nMIIFqzCCA5Og..................9XdMDiQANHgf7w\n-----END CERTIFICATE-----\n"
     exit
     secrets ca-cert smi-strongswan ca-1
      certificate "-----BEGIN CERTIFICATE-----\nMIID0TCCArmg..................UNvF0nAmIX0qxg4\n-----END CERTIFICATE-----\n"
     exit
     secrets ca-cert smi-strongswan ca-2
      certificate "-----BEGIN PRIVATE KEY-----\nMIIEvQIBADAN..................tbNDzGAnF29nus=\n-----END PRIVATE KEY-----\n"
     exit
    exit
  2. Refer to the secrets in the strongSwan configuration. The strongSwan configuration shows the available TLS secrets and certificates.

    Example:

    [test-cm-controlplane] SMI Cluster Deployer# show running-config clusters test-aio strongswan connections server-secret
    clusters test-aio
     strongswan connections a-to-b
      server-secret a-to-b
     exit
    exit
     
    [test-cm-controlplane] SMI Cluster Deployer# show running-config clusters test-aio strongswan ca-certs
    clusters test-aio
    strongswan ca-certs [ 134-ca 135-ca ]
    exit

Operating the SMI Cluster Manager on vCenter VMware

This section describes how to instantiate the K8s Cluster on the SMI Cluster Manager in a VMware vSphere environment. The SMI Cluster Manager is the de-facto orchestrator for the K8s Cluster. It requires add-ons such as Nginx Ingress, Istio, and so on. Also, the Ops Center is required for all the Cisco Cloud products using the SMI.

SMI supports datacenters at the root level and within folders. If the datacenter is within a folder, the entire path from the root to the datacenter is specified in the datacenter-path field, as shown in the sketch after this paragraph. If the vSphere cluster is organized within folders, SMI can auto-detect the cluster as long as the name is unique within the datacenter.
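For example, if a datacenter named DC1 sits under a vCenter folder named nested-dcs (both names are hypothetical), the environment could reference it as follows; the exact path format depends on your vCenter hierarchy:

environments laas 
   vcenter datacenter-path nested-dcs/DC1 
   vcenter datacenter DC1 
exit 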

The SMI Cluster Manager currently provisions the following:

  1. Base OS, creation of nodes and virtual machines in VMware vSphere virtualization environment.

  2. Deployment of K8s multi-control plane cluster with Calico pod networking.

  3. Deployment of K8s addon applications.

  4. Application provisioning. You can either deploy the Ops Center alone or the entire product.

  5. Product based Host OS customization for nodes with specific labels.

Supported Configurations

The SMI Cluster Manager supports the following three VM configurations:

Table 6. Functional Test HA
Node Type       CPU      Cores Per Socket   RAM      Data Disk   Home Disk   Root Disk
Control Plane   2 CPU    1                  16 GB    20 GB       5 GB        100 GB
ETCD            1 CPU    1                  8 GB     20 GB       5 GB        100 GB
Worker          8 CPU    1                  32 GB    200 GB      5 GB        100 GB

Important


These configurations are applicable only for testing in a lab environment. They are not supported in a production environment.
Table 7. Functional Test AIO
Node Type       CPU      Cores Per Socket   RAM      Data Disk   Home Disk   Root Disk
Control Plane   6 CPU    1                  32 GB    200 GB      5 GB        100 GB

Important


These configurations are applicable only for testing in a lab environment. They are not supported in a production environment.

Note


Individual NFs are deployed as K8s workers through SMI. They each have their own VM recommendations. Refer to the NF documentation for details.


Table 8. Production
Node Type       CPU      Cores Per Socket   RAM      Data Disk   Home Disk   Root Disk
Control Plane   2 CPU    2                  16 GB    20 GB       5 GB        100 GB
ETCD            2 CPU    2                  16 GB    20 GB       5 GB        100 GB
Worker          36 CPU   36                 164 GB   200 GB      5 GB        100 GB

Prerequisites

The prerequisites for instantiating the K8s Cluster on SMI Cluster Manager are:

  1. SMI Cluster Manager AIO.

  2. Local web server that hosts the product's offline TAR ball version(s).

Requirements

SMI Cluster Manager (all versions are supported).

Components Used

The following component is used for deploying and upgrading the product in the offline environment:

  • SMI Cluster Manager.

Configuring the vCenter Environment

To configure the vCenter environment, use the following configuration:

  1. Configure the vCenter environment with the required configuration parameters through the SMI Cluster Manager CLI. A sample configuration is shown below:

    environments laas 
       vcenter server vcenter_server_ipv4_address  
       vcenter port vcenter_port  
       vcenter allow-self-signed-cert true (to allow self signed certs) 
       vcenter user vcenter_username 
       vcenter password vcenter_password 
       vcenter datastore vcenter_host_datastore (the corresponding vcenter host datastore) 
       vcenter cluster vcenter_cluster (the vcenter cluster containing the above host) 
       vcenter datacenter-path datacenter_path 
       vcenter datacenter vcenter_datacenter 
       vcenter host vcenter_host_ipv4_address 
       vcenter nics network_ID 
       exit 
    exit 
    

    Important


    You can add each vCenter environment to one or more K8s Cluster configurations. For VMs managed in the OpenStack environment, you can use the following configuration:
    environments openstack 
       manual 
    exit 
    
  2. Configure the cluster essentials such as node defaults, which include the initial boot, K8s, operating system NTP, and node configurations. In a multi-node environment, a minimum of 3 control plane, 3 etcd, and 3 OAM (worker or product) nodes are required. The number of worker nodes and their type depend on the product that is being installed. For more information about the worker nodes and labels, see the relevant product documentation. The following example shows a cluster configuration that is not specific to any product.


    Note


    Based on the customer requirements, you can choose to either include or exclude the following cluster configurations:


    clusters <cluster_name> 
     
             # associating an existing vcenter environment 
             environment <vcenter_environment> #Example:laas 
     
             # General cluster configuration 
             configuration master-virtual-ip <keepalived_ipv4_address>  
             configuration master-virtual-ip-cidr <netmask_of_additional_master_virtual_ip> #Default is 32   
             configuration master-virtual-ip-interface <interface_name>  
             configuration additional-master-virtual-ip <ipv4_address>  
             configuration additional-master-virtual-ip-cidr <netmask_of_additional_master_virtual_ip> #Default is 32  
             configuration additional-master-virtual-ip-interface <interface_name> 
             configuration virtual-ip-vrrp-router-id <virtual_router_id> #To support multiple instances of VRRP in the same subnet 
             configuration pod-subnet <pod_subnet> #To avoid conflict with already existing subnets  
             configuration size <functional_test_ha/functional_test_aio/production>  
             configuration allow-insecure-registry <true> #To allow insecure registries 
     
            # istio and nginx ingress addons 
             addons ingress bind-ip-address <keepalived_ipv4_address>  
             addons istio enabled 
     
             # vsphere volume provider configuration 
             addons vsphere-volume-provider server <vcenter_server_ipv4_address>  
             addons vsphere-volume-provider server-port <vcenter_port> 
             addons vsphere-volume-provider allow-insecure <true> #To allow self signed certs  
             addons vsphere-volume-provider user <vcenter_username> 
             addons vsphere-volume-provider password <vcenter_password> 
             addons vsphere-volume-provider datacenter <vcenter_datacenter> 
             addons vsphere-volume-provider datastore <vcenter_nfs_storage> #Corresponding vcenter nfs storage 
             addons vsphere-volume-provider network <network_id> 
             addons vsphere-volume-provider folder <cluster_folder_containing_the_VMs> 
     
             # Openstack volume provider configuration 
             addons openstack-volume-provider username <username>  
             addons openstack-volume-provider password <password>  
             addons openstack-volume-provider auth-url <auth_url>  
             addons openstack-volume-provider tenant-id <tenant_id>  
             addons openstack-volume-provider domain-id <domain_id> 
     
             # initial-boot section of node-defaults for vmware 
             node-defaults initial-boot default-user <default_username>  
             node-defaults initial-boot default-user-ssh-public-key <public_ssh_key> 
             node-defaults initial-boot netplan template 
    
     
             # initial-boot section of node-defaults for VMs managed in Openstack 
             node-defaults initial-boot default-user <default_user> 
             node-defaults netplan template 
               #jinja2:variable_start_string:'__DO_NOT_ESCAPE__' , variable_end_string:'__DO_NOT_ESCAPE__' 
               # 
     
             #k8s related config of node-defaults 
             node-defaults k8s ssh-username <default_k8s_ssh_username>  
             node-defaults k8s ssh-connection-private-key 
                     -----BEGIN RSA PRIVATE KEY----- 
                     <SSH_Private_Key> 
                     -----END RSA PRIVATE KEY----- 
     
               # os related config of node-defaults 
               node-defaults os proxy https-proxy <https_proxy>  
               node-defaults os proxy no-proxy <no_proxy_info>  
               node-defaults os ntp servers <local_ntp_server> 
               exit 
     
               # node configuration of multinode cluster. vmware related info overrides the defaults provided in the environment 'laas' associated with the cluster 
     
          nodes node_name #For example, etcd1 
             k8s node-type etcd 
             k8s ssh-ip ipv4address 
             k8s node-ip ipv4address 
             vmware datastore datastore_name 
             vmware host host_name 
             vmware performance latency-sensitivity normal 
             vmware performance memory-reservation false 
             vmware performance cpu-reservation false 
             vmware sizing ram-mb ram_size_in_mb 
             vmware sizing cpus cpu_size 
             vmware sizing disk-root-gb disk_root_size_in_gb 
             vmware nics network_ID 
          exit 
       exit 
       nodes node_name #For example, etcd2 
           k8s node-type etcd 
           k8s ssh-ip ipv4address 
           k8s node-ip ipv4address 
           vmware datastore datastore_name 
           vmware host host_name 
           vmware performance latency-sensitivity normal 
           vmware performance memory-reservation false 
           vmware performance cpu-reservation false 
           vmware sizing ram-mb ram_size_in_mb 
           vmware sizing cpus cpu_size 
           vmware sizing disk-root-gb disk_root_size_in_gb 
           vmware nics network_ID 
         exit 
       exit 
       nodes node_name #For example, etcd3 
           k8s node-type etcd 
           k8s ssh-ip ipv4address 
           k8s node-ip ipv4address 
           vmware datastore datastore_name 
           vmware host host_name 
           vmware performance latency-sensitivity normal 
           vmware performance memory-reservation false 
           vmware performance cpu-reservation false 
           vmware sizing ram-mb ram_size_in_mb 
           vmware sizing cpus cpu_size 
           vmware sizing disk-root-gb disk_root_size_in_gb 
           vmware nics network_ID 
         exit 
       exit 
       nodes node_name #For example, controlplane1 
           k8s node-type control-plane 
           k8s ssh-ip ipv4address 
           k8s node-ip ipv4address 
           vmware datastore datastore_name 
           vmware host host_name 
           vmware performance latency-sensitivity normal 
           vmware performance memory-reservation false 
           vmware performance cpu-reservation false 
           vmware sizing ram-mb ram_size_in_mb 
           vmware sizing cpus cpu_size 
           vmware sizing disk-root-gb disk_root_size_in_gb 
           vmware nics network_ID 
          exit 
       exit 
       nodes node_name #For example, controlplane2 
          k8s node-type control-plane 
          k8s ssh-ip ipv4address 
          k8s node-ip ipv4address 
          vmware datastore datastore_name 
          vmware host host_name 
          vmware performance latency-sensitivity normal 
          vmware performance memory-reservation false 
          vmware performance cpu-reservation false 
          vmware sizing ram-mb ram_size_in_mb 
          vmware sizing cpus cpu_size 
          vmware sizing disk-root-gb disk_root_size_in_gb 
          vmware nics network_ID 
          exit 
        exit 
       nodes node_name #For example, controlplane3 
          k8s node-type control-plane 
          k8s ssh-ip ipv4address 
          k8s node-ip ipv4address 
          vmware datastore datastore_name 
          vmware host host_name 
          vmware performance latency-sensitivity normal 
          vmware performance memory-reservation false 
          vmware performance cpu-reservation false 
          vmware sizing ram-mb ram_size_in_mb 
          vmware sizing cpus cpu_size 
          vmware sizing disk-root-gb disk_root_size_in_gb 
          vmware nics network_ID 
          exit 
       exit 
       nodes node_name #For example, oam1 
          k8s node-type worker 
          k8s ssh-ip ipv4address 
          k8s node-ip ipv4address 
          k8s node-labels node_labels 
          exit 
          vmware datastore datastore_name 
          vmware host host_name 
          vmware performance latency-sensitivity normal 
          vmware performance memory-reservation false 
          vmware performance cpu-reservation false 
          vmware sizing ram-mb ram_size_in_mb 
          vmware sizing cpus cpu_size 
          vmware sizing disk-root-gb disk_root_size_in_gb 
          vmware nics network_ID 
          exit 
       exit 
       nodes node_name #For example, oam2 
          k8s node-type worker 
          k8s ssh-ip ipv4address 
          k8s node-ip ipv4address 
          k8s node-labels node_labels 
          exit 
          vmware datastore datastore_name 
          vmware host host_name 
          vmware performance latency-sensitivity normal 
          vmware performance memory-reservation false 
          vmware performance cpu-reservation false 
          vmware sizing ram-mb ram_size_in_mb 
          vmware sizing cpus cpu_size 
          vmware sizing disk-root-gb disk_root_size_in_gb 
          vmware nics network_ID 
          exit 
       exit 
       nodes node_name #For example, oam3 
          k8s node-type worker 
          k8s ssh-ip ipv4address 
          k8s node-ip ipv4address 
          k8s node-labels node_labels 
          exit 
          vmware datastore datastore_name 
          vmware host host_name 
          vmware performance latency-sensitivity normal 
          vmware performance memory-reservation false 
          vmware performance cpu-reservation false 
          vmware sizing ram-mb ram_size_in_mb 
          vmware sizing cpus cpu_size 
          vmware sizing disk-root-gb disk_root_size_in_gb 
          vmware nics network_ID 
          exit 
       exit 
       nodes node_name #For example, session-data1 
          k8s node-type worker 
          k8s ssh-ip ipv4address 
          k8s node-ip ipv4address 
          k8s node-labels node_labels #For example, smi.cisco.com/cdl-ep true 
          exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-index-1 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-index-2 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-slot-1 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-slot-2 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-slot-3 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-slot-4 true 
           exit 
           k8s node-labels node_labels/node_type #For example, smi.cisco.com/node-type db 
           exit 
           k8s node-labels node_labels/vm_type #For example, smi.cisco.com/vm-type session 
          exit 
          vmware datastore datastore_name 
          vmware host host_name 
          vmware performance latency-sensitivity normal 
          vmware performance memory-reservation false 
          vmware performance cpu-reservation false 
          vmware sizing ram-mb ram_size_in_mb 
          vmware sizing cpus cpu_size 
          vmware sizing disk-root-gb disk_root_size_in_gb 
          vmware nics network_ID 
          exit 
        exit 
        nodes node_name #For example, session-data2 
          k8s node-type worker 
          k8s ssh-ip ipv4address 
          k8s node-ip ipv4address 
          k8s node-labels node_labels #For example, smi.cisco.com/cdl-ep true 
          exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-index-1 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-index-2 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-slot-1 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-slot-2 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-slot-3 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-slot-4 true 
           exit 
           k8s node-labels node_labels/node_type #For example, smi.cisco.com/node-type db 
           exit 
           k8s node-labels node_labels/vm_type #For example, smi.cisco.com/vm-type session 
          exit 
          vmware datastore datastore_name 
          vmware host host_name 
          vmware performance latency-sensitivity normal 
          vmware performance memory-reservation false 
          vmware performance cpu-reservation false 
          vmware sizing ram-mb ram_size_in_mb 
          vmware sizing cpus cpu_size 
          vmware sizing disk-root-gb disk_root_size_in_gb 
          vmware nics network_ID 
          exit 
       exit 
       nodes node_name #For example, session-data3 
          k8s node-type worker 
          k8s ssh-ip ipv4address 
          k8s node-ip ipv4address 
          k8s node-labels node_labels #For example, smi.cisco.com/cdl-ep true 
          exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-index-3 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-index-4 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-slot-5 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-slot-6 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-slot-7 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-slot-8 true 
           exit 
           k8s node-labels node_labels/node_type #For example, smi.cisco.com/node-type db 
           exit 
           k8s node-labels node_labels/vm_type #For example, smi.cisco.com/vm-type session 
          exit 
          vmware datastore datastore_name 
          vmware host host_name 
          vmware performance latency-sensitivity normal 
          vmware performance memory-reservation false 
          vmware performance cpu-reservation false 
          vmware sizing ram-mb ram_size_in_mb 
          vmware sizing cpus cpu_size 
          vmware sizing disk-root-gb disk_root_size_in_gb 
          vmware nics network_ID 
          exit 
        exit 
        nodes node_name #For example, session-data4 
          k8s node-type worker 
          k8s ssh-ip ipv4address 
          k8s node-ip ipv4address 
          k8s node-labels node_labels #For example, smi.cisco.com/cdl-ep true 
          exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-index-3 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-index-4 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-slot-5 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-slot-6 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-slot-7 true 
           exit 
           k8s node-labels node_labels #For example, smi.cisco.com/cdl-slot-8 true 
           exit 
           k8s node-labels node_labels/node_type #For example, smi.cisco.com/node-type db 
           exit 
           k8s node-labels node_labels/vm_type #For example, smi.cisco.com/vm-type session 
          exit 
          vmware datastore datastore_name 
          vmware host host_name 
          vmware performance latency-sensitivity normal 
          vmware performance memory-reservation false 
          vmware performance cpu-reservation false 
          vmware sizing ram-mb ram_size_in_mb 
          vmware sizing cpus cpu_size 
          vmware sizing disk-root-gb disk_root_size_in_gb 
          vmware nics network_ID 
          exit 
       exit 
    exit 
               # Virtual IPs 
              virtual-ips <name> #Example: rxdiam 
    
                vrrp-interface <interface_name> 
                vrrp-router-id <router_id> 
    
                ipv4-addresses <ipv4_address> 
                  mask <netmask> 
                  broadcast <broadcast_ipv4_address> 
                  device <interface_name> 
                exit 
                # nodes associated with the virtual-ip 
                hosts <node_name> #Example: smi-cluster-core-protocol1 
                  priority <priority_value> 
                exit 
                hosts <node_name> #Example: smi-cluster-core-protocol2 
                  priority <priority_value> 
                exit 
              exit 
               # Secrets for product registry 
              secrets docker-registry <secret_name> 
                docker-server <server_name or docker_registry> 
                docker-username <username> 
                docker-password <password> 
                docker-email <email> 
                namespace <k8s_namespace> #Example: cee-voice 
              exit 
              ops-centers <app_name> <instance_name> #Example: cee data 
                repository <artifactory_url>  
    
    
    
    
                username <username> 
                password <password> 
    
                initial-boot-parameters use-volume-claims <true/false> #True to use persistent volumes and vice versa 
                initial-boot-parameters first-boot-password <password> #First boot password for product opscenter 
                initial-boot-parameters auto-deploy <true/false> #Auto deploys all the services of the product else deploys the opscenter only 
                initial-boot-parameters single-node <true/false> #True for single node and false for multi node deployments 
                initial-boot-parameters image-pull-secrets <docker_registry_secrets_name> 
                exit 
              exit 
    

Important


For clusters managed within the OpenStack environment, you can exclude the initial-boot section of node-defaults configuration parameters. Also, replace the K8s vSphere-volume-provider configuration with K8s openstack-volume-provider configuration.

Triggering the Cluster Synchronization

You can trigger the cluster synchronization to complete the vCenter configuration. To synchronize the cluster, use the following configurations:

  1. Trigger the cluster synchronization.

    configure 
         clusters cluster_name actions sync run 
         clusters cluster_name actions sync run debug true 
         clusters cluster_name actions sync logs 
         monitor sync-logs cluster_name 
         clusters cluster_name actions sync status 
         exit 

    Example:

    SMI Cluster Manager# clusters test1 actions sync run 
    SMI Cluster Manager# clusters test1 actions sync run debug true 
    SMI Cluster Manager# clusters test1 actions sync logs 
    SMI Cluster Manager# monitor sync-logs test1 
    SMI Cluster Manager# clusters test1 actions sync status 
  2. Upgrade the cluster using the following configuration.

    
    configure 
         clusters cluster_name actions sync run upgrade-strategy concurrent 
     clusters cluster_name actions sync run upgrade-strategy concurrent debug true reset-k8s-nodes true 
         clusters cluster_name actions sync run upgrade-strategy rolling 
         clusters cluster_name actions sync run upgrade-strategy rolling debug true reset-k8s-nodes true 
         exit 
    

    Example:

    SMI Cluster Manager# clusters test1 actions sync run upgrade-strategy concurrent 
    SMI Cluster Manager# clusters test1 actions sync run upgrade-strategy concurrent debug true reset-k8s-nodes true 
    SMI Cluster Manager# clusters test1 actions sync run upgrade-strategy rolling 
    SMI Cluster Manager# clusters test1 actions sync run upgrade-strategy rolling debug true reset-k8s-nodes true 
    
  3. Redeploy the nodes using the following configuration.

    
    configure 
         clusters cluster_name actions sync run upgrade-strategy rolling force-vm-redeploy true debug true 
         clusters cluster_name actions sync run force-vm-redeploy true purge-data-disks true 
         exit 
    

    Example:

    SMI Cluster Manager# clusters test1 actions sync run upgrade-strategy rolling force-vm-redeploy true debug true 
    SMI Cluster Manager# clusters test1 actions sync run force-vm-redeploy true purge-data-disks true 
    

Notes:

  • clusters cluster_name – Specifies the information about the nodes to be deployed. cluster_name is the name of the cluster.

  • actions – Specifies the actions performed on the cluster.

  • sync run – Triggers the cluster synchronization.

  • sync logs – Shows the current cluster synchronization logs.

  • sync status – Shows the current status of the cluster synchronization.

  • debug true – Enters the debug mode.

  • monitor sync-logs – Monitors the cluster synchronization process.

  • upgrade-strategy concurrent – This strategy is similar to the existing cluster synchronization where everything is upgraded at once.

    • reset-k8s-nodes – Resets the K8s on the node instead of deleting and redeploying them all at once.

  • upgrade-strategy rolling – The rolling upgrade is a new upgrade strategy where upgrades are performed node-by-node.


    Note


    You can use the rolling upgrade strategy to upgrade only the K8s cluster and product. If there are no changes in the product charts, the upgrade fails. For upgrading one node at a time, see Upgrading Node-by-Node section.


    • reset-k8s-nodes – Resets the K8s on that specific node instead of redeploying it.

  • force-vm-redeploy true – Traverses through each node (one at a time) to delete and upgrade the nodes. The redeploying process is similar to a fresh installation of nodes except for the retention of the data directory, which holds information about the previous installation. Redeploying the node involves:

    • Making API calls for draining and replacing the VMs.

    • Synchronizing (through the Sync API) the node.

    • Verifying the cluster and pod status before proceeding to the next node.

  • purge-data-disks true – Removes the data disks and performs a fresh installation. You can use this option for corrupted etcds. For instance, if you have replaced two etcd nodes and ended up with one that still has the old data, you can purge the disks and reset the cluster completely.

Upgrading Node-by-Node (OpenStack)

You can upgrade the nodes within a K8s cluster one at a time using the node-by-node upgrade process. This upgrade process is limited to the node level; you cannot run the cluster synchronization and the node-by-node synchronization in tandem. It is possible to run the synchronization simultaneously on two independent nodes that have been replaced and are ready to go back into the cluster. Also, you can run the parts of the synchronization specific to a node (control plane, etcd, or worker) without running a cluster-wide synchronization. To begin the upgrade process, you need to drain and synchronize a combination of etcd, control plane, and worker nodes.

To upgrade node-by-node, use the following configurations:

  1. Drain the node.

    clusters cluster_name nodes node_name actions sync drain 

    Example:

    SMI Cluster Manager# clusters test1 nodes etcd1 actions sync drain 
    
  2. Verify the status of the drained node.

    clusters cluster_name nodes node_name actions sync status 

    Example:

    SMI Cluster Manager# clusters test1 nodes etcd1 actions sync status 
    
  3. Verify the pod status of the drained node.

    clusters cluster_name nodes node_name actions k8s pod-status 

    Example:

    SMI Cluster Manager# clusters test1 nodes etcd1 actions k8s pod-status 
    
  4. Run a synchronization on the node.

    clusters cluster_name nodes node_name actions sync run 

    Example:

    SMI Cluster Manager# clusters test1 nodes etcd1 actions sync run 
    

    Note


    The host OS is upgraded during cluster synchronization automatically on Bare Metal deployments.


  5. Verify the cluster status. Ensure that the parameter all-ok is true before proceeding to the next node.

    clusters cluster_name actions k8s cluster-status 

    Example:

    SMI Cluster Manager# clusters test1 actions k8s cluster-status 
    	pods-desired-count 26 
    	pods-ready-count 26 
    	pods-desired-are-ready true 
    	etcd-healthy true 
    	all-ok true 
    
  6. Verify the status of the pods on a specific node before proceeding to the next node.

    clusters cluster_name nodes node_name actions k8s pod-status show-pod-details 

    Example:

    SMI Cluster Manager# clusters test1 nodes cmts-worker1 actions k8s pod-status show-pod-details 
    	Value for 'show-pod-details' [false,true]: true 
    	pods { 
    		name calico-node-c65zf 
    		namespace kube-system 
    		owner-kind DaemonSet 
    		owner-name calico-node 
    		ready true 
    	} 
    	pods { 
    		name coredns-6db4464669-k6pqz 
    		namespace kube-system 
    		owner-kind ReplicaSet 
    		owner-name coredns-6db4464669 
    		ready true 
    	} 
    	pods { 
    		name kube-proxy-tfxcq 
    		namespace kube-system 
    		owner-kind DaemonSet 
    		owner-name kube-proxy 
    		ready true 
    	} 
    	pods { 
    		name nginx-ingress-controller-6f8f8c4cc7-q5b7c 
    		namespace nginx-ingress 
    		owner-kind ReplicaSet 
    		owner-name nginx-ingress-controller-6f8f8c4cc7 
    		ready true 
    	} 
    	pods { 
    		name keepalived-cmbnf 
    		namespace smi-vips 
    		owner-kind DaemonSet 
    		owner-name keepalived 
    		ready true 
    	} 
    pods-count 5 
    pods-available-to-drain-count 2 
    

Notes:

  • nodes node_name – Specifies the nodes present in the cluster. node_name is the name of the node.

  • sync drain – Drains or cordons the selected node in preparation for an upgrade.

  • sync status – Shows the status of the drained node.

  • k8s pod-status – Shows the status of the k8s pods scheduled on the node.

  • sync run – Upgrades or synchronizes the node.

  • k8s cluster-status – Shows an overall status of the cluster including pod and etcd based statistics.

  • show-pod-details true – Shows the list of pods in addition to the counts.

Deploying and Upgrading the Products in Offline Environments

Using the SMI Cluster Manager, you can download the product's offline TAR ball and host its charts and corresponding images in the local registries. The SMI Cluster Manager supports the deployment of the product's Ops Center and all the applications and services associated with it. This section describes the procedures involved in deploying and upgrading the products in offline environments using the SMI Cluster Manager.

Prerequisites

The prerequisites for deploying and upgrading the products in offline environments are:

  1. SMI Cluster Manager AIO.

  2. Local repositories that host the product's offline TAR ball version(s).

Requirements

SMI Cluster Manager (all versions are supported).

Components Used

The following component is used to orchestrate the K8s Cluster and load the products:

  • The SMI Cluster Manager.

Deploying the Products in Offline Environments

This section describes the procedures involved in deploying the products in an offline environment using the SMI Cluster Manager.


Note


From the SMI Cluster Manager perspective, the product refers to Common Execution Environment (CEE). The deployment procedure mentioned in the subsequent section is specific to CEE. However, you can follow the same procedure to deploy 5G Network Functions (SMF or PCF) using the SMI Cluster Manager.


Deploying the Product

To deploy the product, perform the following:

  1. Use the following configuration to install the product
    configure 
      software cnf software_name 
        url HTTP_HTTPS_File_URL 
        user username 
        password password 
        sha256 sha256_hash 
        exit 
  2. Link the CEE into the desired cluster in the ops-centers section.
    configure 
      clusters cluster_name ops-center app_name instance_name 
      repository-local cnf_repo 
      exit 
  3. Download the TAR ball from the URL.

    software-packages download URL 

    Example:

    SMI Cluster Manager# software-packages download http://<ipv4address>:<port_number>/packages/cee-2019-08-21.tar 
    
  4. Verify whether the TAR balls are loaded.

    software-packages list 

    Example:

    SMI Cluster Manager# software-packages list 
    [ cee-2019-08-21 ] 
    [ sample ] 
    
  5. Configure the necessary Ops Center parameters in the required cluster to deploy the product.

    configure 
       cluster cluster_name 
          ops-centers app_name instance_name 
             repository url 
             netconf-ip ipv4_address 
             netconf-port port 
             ssh-ip ipv4_address 
             ssh-port port 
             ingress-hostname  <ipv4_address>.<customer_specific_domain_name> 
             initial-boot-parameters use-volume-claims true/false 
             initial-boot-parameters first-boot-password password 
             initial-boot-parameters auto-deploy true/false 
             initial-boot-parameters single-node true/false 
             initial-boot-parameters image-pull-secrets 
             exit 
       exit 
    

    Example:

    SMI Cluster Manager# config 
    SMI Cluster Manager(config)# clusters test2 
    SMI Cluster Manager(config-clusters-test2)# ops-centers cee data 
    SMI Cluster Manager(config-ops-centers-cee/data)# repository http://charts.<ipv4address>.<domain_name>/cee-2019-08-21/ 
    SMI Cluster Manager(config-ops-centers-cee/data)# initial-boot-parameters use-volume-claims false 
    SMI Cluster Manager(config-ops-centers-cee/data)# initial-boot-parameters first-boot-password Cisco@123 
    SMI Cluster Manager(config-ops-centers-cee/data)# initial-boot-parameters auto-deploy true 
    SMI Cluster Manager(config-ops-centers-cee/data)# initial-boot-parameters single-node false 
    SMI Cluster Manager(config-ops-centers-cee/data)# exit 
    SMI Cluster Manager(config-clusters-test2)# exit 
    SMI Cluster Manager(config)# 
    
  6. Configure the secrets, if your local registry contains secrets.

    configure 
       cluster cluster_name 
          secrets docker-registry secret_name 
             docker-server server_name 
             docker-username username 
             docker-password password 
             docker-email email 
             namespace k8s namespace 
             commit 
             exit 
          exit 
    

    Example:

    SMI Cluster Manager# config 
    SMI Cluster Manager(config)# clusters test2 
    SMI Cluster Manager(config-clusters-test2)# secrets docker-registry sec1 
    SMI Cluster Manager(config-docker-registry-sec1)# docker-server serv1 
    SMI Cluster Manager(config-docker-registry-sec1)# docker-username user1 
    SMI Cluster Manager(config-docker-registry-sec1)# docker-password Cisco@123 
    SMI Cluster Manager(config-docker-registry-sec1)# docker-email reg@cisco.com 
    SMI Cluster Manager(config-docker-registry-sec1)# namespace ns1 
    SMI Cluster Manager(config-docker-registry-sec1)# exit 
    SMI Cluster Manager(config-clusters-test2)# exit 
    SMI Cluster Manager(config)# 
    
  7. Run the cluster synchronization.

    clusters cluster_name actions sync run 
    

    Example:

    SMI Cluster Manager# clusters test2 actions sync run 
    

Notes:

  • software-packages download url – Specifies the software packages to be downloaded through HTTP/HTTPS.

  • software-packages list – Specifies the list of available software packages.

  • ops-centers app_name instance_name – Specifies the product Ops Center and instance. app_name is the application name. instance_name is the name of the instance.

  • repository url - Specifies the local registry URL for downloading the charts.

  • netconf-ip ipv4_address – Specifies the Ops Center netconf IPv4 address.

  • netconf-port port – Specifies the Ops Center netconf port number.

  • ssh-ip ipv4_address – Specifies the SSH IPv4 address for the Ops Center.

  • ssh-port port - Specifies the SSH port number for the Ops Center.

  • ingress-hostname <ipv4_address>.<customer_specific_domain_name> – Specifies the ingress hostname to be set to the Ops Center. <customer_specific_domain_name> specifies the customer's domain name.

  • initial-boot-parameters – Specifies the initial boot parameters for deploying the helm charts.

    • use-volume-claims true/false – Specifies the usage of persistent volumes. Set this option to True to use persistent volumes. The default value is true.

    • first-boot-password password – Specifies the first boot password for the product's Ops Center.

    • auto-deploy true/false – Auto deploys all the services of the product. Set this option to false to deploy only the product's Ops Center.

    • single-node true/false – Specifies the product deployment on a single node. Set this option to false for multi node deployments.

    • image-pull-secrets – Specifies the docker registry secret name to be used.

  • secrets docker-registry secret_name – Specifies the secret name for your docker registry.

    • docker-server server_name – Specifies the docker server name.

    • docker-username username – Specifies the docker registry user name.

    • docker-password password – Specifies the docker registry password.

    • docker-email email – Specifies the docker registry email.

    • namespace namespace – Specifies the docker registry namespace.

NOTES:

  • software cnf software_name - Specifies Cisco's Cloud Native software. software_name is the name of the Cloud Native software.

    • url HTTP_HTTPS_File_URL - Specifies the repository URL.

    • user username - Specifies the username for HTTP/HTTPS authentication.

    • password password - Specifies the password used for downloading the software package.

    • sha256 sha256_hash - Specifies the SHA256 hash of the software download.

To deploy the product, perform the following:

  1. Log in to the SMI Cluster Manager CLI using the ingress URL.

    
    https://cli.smi-cluster-manager.<IP_address>.<customer_specific_domain_name> 
    

    NOTES:

    • customer_specific_domain_name - Specifies the customer's domain name.

  2. Use the following configuration to install the CEE.
    configure 
      software cnf software_name 
        url HTTP_HTTPS_File_URL 
        user username 
        password password 
        sha256 sha256_hash 
        exit 
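
    For instance, a representative configuration might look like the following sketch. The software name, URL, credentials, and SHA256 hash are illustrative placeholders, not values from your deployment:

    configure 
      software cnf cee 
        url https://<repo_server>/<path>/cee.tar 
        user <username> 
        password <password> 
        sha256 <sha256_hash> 
        exit 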

    NOTES:

    • software cnf software_name - Specifies the Cisco's Cloud Native software. software_name is the name of the Cloud Native software.

      • url HTTP_HTTPS_File_URL - Specifies the repository URL.

      • user username - Specifies the username for HTTP/HTTPS authentication.

      • password password - Specifies the password used for downloading the software package.

      • sha256 sha256_hash - Specifies the SHA256 hash of the software download.

  3. Link the product (CEE or Network Functions) to the desired cluster under ops-centers.
    configure 
      clusters cluster_name ops-centers app_name instance_name 
      repository-local cnf_repo 
      exit 
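
    For instance, reusing the example cluster test2 and the cee data Ops Center from the other procedures in this section, and an illustrative local repository name, the configuration might look like this:

    configure 
      clusters test2 ops-centers cee data 
      repository-local <cnf_repo_name> 
      exit 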

    NOTES:

    • repository-local cnf_repo - Specifies the CNF repository.

  4. Run the cluster synchronization to deploy the CEE Ops Center and wait for the synchronization to complete.
    clusters cluster_name actions sync run 

    NOTES:

    • clusters cluster_name actions sync run - Synchronizes the committed changes to the cluster.

  5. Verify the cluster synchronization through cluster sync status or log commands.

    clusters cluster_name actions sync status 
    clusters cluster_name actions sync logs 

    NOTES:

    • clusters cluster_name actions sync status - Displays the status of the cluster synchronization.

    • clusters cluster_name actions sync logs - Displays the logs generated during the cluster synchronization process.

Verifying the Product Deployment

You can verify the status of the product deployment through the product CLI. To verify, use the following commands:

  1. Log in to the SMI product CLI. For example, CEE.

  2. Verify whether the charts are loaded in the specific instance (verify the namespace).

    show helm charts 
    

    Example:

    cee# show helm charts 
    CHART         INSTANCE                          STATUS    VERSION     REVISION RELEASE                                   NAMESPACE 
    ------------------------------------------------------------------------------------------------------------------------------------------- 
    cee-ops-center         cee-global-ops-center             deployed  2023.02.1.d249  1   0.7.0-2023-02-1-0513-230331051211-dec612f  cee-global 
    cnat-monitoring        cee-global-cnat-monitoring        deployed  2023.02.1.d249  1   0.7.0-2023-02-1-0031-230331183330-58ec41c  cee-global 
    product-documentation  cee-global-product-documentation  deployed  2023.02.1.d249  1   0.8.0-2023-02-1-0131-230321085503-2699cb5  cee-global 
    pv-manager             cee-global-pv-manager             deployed  2023.02.1.d249  1   0.3.0-2023-02-1-0029-230320155437-e484272  cee-global 
    smi-autoheal           cee-global-smi-autoheal           deployed  2023.02.1.d249  1   0.2.0-2023-02-1-0030-230330084451-99684bf  cee-global 
    smi-show-tac           cee-global-smi-show-tac           deployed  2023.02.1.d249  1   0.4.0-2023-02-1-0189-230331050005-81130f1  cee-global 
    storage-provisioner    cee-global-storage-provisioner    deployed  2023.02.1.d249  1   0.3.0-2023-02-1-0120-230320160505-1597fdb  cee-global 
    telegraf-monitoring    cee-global-telegraf-monitoring    deployed  2023.02.1.d249  1   0.1.0-2023-02-1-0048-230330084426-9b02da0  cee-global 
  3. Verify the status of the system.

    show system status 
    

    Example:

    cee# show system status 
    system status deployed true 
    system status percent-ready 100.0 
    

Notes:

  • show helm charts – Displays the helm release details.

  • show system status – Displays the status of the system.

Upgrading the Products in Offline Environments

Using the SMI Cluster Manager, you can upgrade the product's Ops Center and all the applications and services associated with it. This section describes the procedures involved in upgrading the products in offline environment using the SMI Cluster Manager.

Deploying the Helm Charts in Sequential Order

Table 9. Feature History

Feature Name

Release Information

Feature Description

Sequential Deployment of Helm Charts in Ops-Center

2024.02.1

This feature provides NF users the flexibility to manage the deployment of helm charts by introducing chart priority groups.

It allows users to control the order in which the helm charts are deployed, ensuring that pods related to high-priority charts are available before deploying the lower-priority charts.

This feature introduces the following new CLI commands:

  • helm ordered_deployment enable true —Use this command to enable sequential deployment.

  • helm ordered_deployment chart chart_name priority_group priority_id —Use this command to order the helm charts.

    Prior to executing this command, you must enable sequential deployment.

Important

 

When you configure this feature through NSO, you might observe that helm charts are removed from the device. This issue happens because the chart names loaded in the Ops-Center are not synced to NSO, which causes an illegal reference error.

It is recommended that you sync the configuration from the device (using sync-from CLI) before initiating the ordered deployment configuration.
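
For reference, a device-level sync-from in the NSO CLI typically looks like the following (the device name is a placeholder for your NF device entry):

  admin@ncs# devices device <nf_device_name> sync-from 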

SMI supports the sequential helm chart deployment feature in the base ops-center through ops-center configuration. NFs must use the appropriate ops-center branch and an ops-center-yang-compiler version of at least 7.71400.7.

To deploy the helm charts in a sequential manner, use the following steps:

  1. Before upgrading the NF product through cluster sync, disable the config sync at the NF Ops-Center.

    [tb2-smi-blr-s1-c1/global] cee# system synch stop 
    Wed Mar  27 04:05:54.301 UTC+00:00
    [tb2-smi-blr-s1-c1/global] cee#
    Message from confd-api-manager at 2024-03-27 04:05:57...
    Synchronization process is not running - config changes are not processed.  Enable synch with: system synch start
    [tb2-smi-blr-s1-c1/global] cee# 

    Note


    You are required to perform the config sync stop or start operation only when:

    • the sequential deployment feature is not available in the existing ops-center.

    • the helm charts are added or deleted in the new NF software.


  2. Upgrade the NF software through cluster sync. Note that the cluster sync upgrades only the ops-center. The rest of the charts are not upgraded because config sync is stopped.

    [tb2-smi-blr-s1-c1/global] cee# show helm charts
    Wed Mar  27 04:08:14.528 UTC+00:00
    CHART          INSTANCE         STATUS    VERSION   REVISION  RELEASE      NAMESPACE
    ------------------------------------------------------------------------------------------------------
    cee-ops-center    cee-global-ops-center      deployed  2024.02.1.d181 10  0.7.0-2024-02-1 cee-global
    cnat-monitoring   cee-global-cnat-monitoring deployed  2024.02.1.i05  9   0.7.0-2024-02-1 cee-global
    product-documentation cee-global-product-documentation deployed 2024.02.1.i05  9  0.8.0-2024-02-1 cee-global
    pv-manager        cee-global-pv-manager      deployed  2024.02.1.i05  10  0.3.0-2024-02-1 cee-global
    smi-autoheal      cee-global-smi-autoheal    deployed  2024.02.1.i05  10  0.2.0-2024-02-1 cee-global
    smi-show-tac      cee-global-smi-show-tac    deployed  2024.02.1.i05  9   0.4.0-2024-02-1 cee-global
    storage-provisioner cee-global-storage-provisioner deployed 2024.02.1.i05  9  0.3.0-2024-02-1 cee-global
    telegraf-monitoring cee-global-telegraf-monitoring deployed 2024.02.1.i05  9  0.1.0-2024-02-1 cee-global
  3. Configure the parameters that are used to sequentialize the deployment of helm charts.

    [tb2-smi-blr-s1-c1/global] cee(config)# helm ordered_deployment
    Possible completions:
    chart                Chart specific deployment configuration
    continue_on_timeout  Continue/Stop after Chart bringup timeout
    enable               Enable/Disable Ordered Helm Chart Deployment On Upgrade
    timeout              Wait interval (in seconds) for Chart bringup (all pods Ready) 
    [tb2-smi-blr-s1-c1/global] cee(config)# helm ordered_deployment enable true
    Wed Mar  27 04:23:10.852 UTC+00:00 
    [tb2-smi-blr-s1-c1/global] cee(config)# helm ordered_deployment chart
    Possible completions:
    cnat-monitoring kvm-metrics      product-documentation      pv-manager      smi-autoheal      smi-logs-forwarder      smi-show-tac      storage-provisioner      telegraf-monitoring 
    [tb2-smi-blr-s1-c1/global] cee(config)# helm ordered_deployment chart cnat-monitoring priority_group 20
    Wed Mar  27 04:24:10.852 UTC+00:00
    [tb2-smi-blr-s1-c1/global] cee(config)# helm ordered_deployment chart storage-provisioner priority_group 10
     

    NOTES:

    • helm ordered_deployment enable { false | true } : Enable or disable the sequential deployment of helm charts. Default: false

    • helm ordered_deployment chart chart_name priority_group priority_id : Specify the chart name and the chart priority group ID. The priority group ID decides the helm chart deployment order. The priority group configuration is specific to the helm charts and not the sub-charts.

      The priority_id is an integer ranging from 1 to 99 with 1 being the highest priority and 99 being the lowest priority. Default: 99

      The chart assigned priority group ID 1 is deployed before the charts assigned priority group IDs 2, 3, 4, and so on. The remaining charts are not deployed until all the pods associated with the higher-priority chart are up. Charts in the same priority group are deployed simultaneously.


      Note


      You must use helm ordered_deployment enable true command before configuring helm ordered_deployment chart name priority_group priority_id command.


      If ordered_deployment is not enabled or if priority_group is not defined for the charts, the NF follows the default deployment behavior, which brings up all the charts in parallel.

    • helm ordered_deployment continue_on_timeout : Set this configuration to true or false to continue or stop the helm chart deployment after the chart bring-up timeout. Default: true

    • helm ordered_deployment timeout time_interval : Specify the timeout interval, in seconds, for which the deployment waits for all the pods of a chart to come up.

      The time_interval is an integer ranging from 0 to 3600. Default: 600

      If a failure is encountered while deploying a higher priority group chart, the deployment stops or continues depending on the continue_on_timeout parameter.

    • If the sequential deployment feature is not available in any of the ops-centers, enable this feature and perform the NF software upgrade.

    • If there are any new helm charts available through the NF software upgrade, update the ordered deployment configuration parameters for the new charts. Similarly, if there are any charts to be removed as part of software upgrade, make sure to first remove the chart from ordered-deployment configuration and then proceed with the software upgrade through cluster sync.

  4. Enable the config sync. The charts are then upgraded according to the ordered deployment configuration.

    [tb2-smi-blr-s1-c1/global] cee# system synch start
    Wed Mar  27 04:23:10.852 UTC+00:00
    [tb2-smi-blr-s1-c1/global] cee# 
    [tb2-smi-blr-s1-c1/global] cee# show helm charts
    Wed Mar  27 04:24:54.336 UTC+00:00
    CHART          INSTANCE         STATUS    VERSION   REVISION  RELEASE      NAMESPACE
    ------------------------------------------------------------------------------------------------------
    cee-ops-center    cee-global-ops-center      deployed  2024.02.1.d181 10  0.7.0-2024-02-1 cee-global
    cnat-monitoring   cee-global-cnat-monitoring deployed  2024.02.1.i05  9   0.7.0-2024-02-1 cee-global
    product-documentation cee-global-product-documentation deployed 2024.02.1.i05  9  0.8.0-2024-02-1 cee-global
    pv-manager        cee-global-pv-manager      deployed  2024.02.1.i05  10  0.3.0-2024-02-1 cee-global
    smi-autoheal      cee-global-smi-autoheal    deployed  2024.02.1.i05  10  0.2.0-2024-02-1 cee-global
    smi-show-tac      cee-global-smi-show-tac    deployed  2024.02.1.i05  9   0.4.0-2024-02-1 cee-global
    storage-provisioner cee-global-storage-provisioner deployed 2024.02.1.i05  9  0.3.0-2024-02-1 cee-global
    telegraf-monitoring cee-global-telegraf-monitoring deployed 2024.02.1.i05  9  0.1.0-2024-02-1 cee-global
    [tb2-smi-blr-s1-c1/global] cee#
    cloud-user@tb2-smi-blr-s1-c1-master1:~$ helm ls -n cee-global
    NAME         NAMESPACE     REVISION  UPDATED               STATUS     CHART                           APP VERSION
    cee-global-cnat-monitoring  cee-global   10  2024-03-27 04:24:17.178670318 +0000 UTC   deployed  cnat-monitoring-0.7.0-2024-02-1-0032-240321173925-2c8a968   2024.02.1.d181
    cee-global-ops-center       cee-global   10  2024-03-27 04:07:39.542317449 +0000 UTC   deployed  cee-ops-center-0.7.0-2024-02-1-0547-240326113917-f8d2d25    2024.02.1.d181
    cee-global-product-documentation cee-global  10  2024-03-27 04:24:33.537482349 +0000 UTC  deployed product-documentation-0.8.0-2024-02-1-0149-240216180844-dd04f43 2024.02.1.d181
    cee-global-pv-manager       cee-global  11    2024-03-27 04:24:33.491414622 +0000 UTC   deployed  pv-manager-0.3.0-2024-02-1-0036-240216180715-0677f38       2024.02.1.d181
    cee-global-smi-autoheal     cee-global  11    2024-03-27 04:24:33.495848154 +0000 UTC   deployed  smi-autoheal-0.2.0-2024-02-1-0040-240216180748-13dd903     2024.02.1.d181
    cee-global-smi-show-tac     cee-global  10    2024-03-27 04:24:33.550657024 +0000 UTC   deployed  smi-show-tac-0.4.0-2024-02-1-0207-240321173848-84f417c     2024.02.1.d181
    cee-global-storage-provisioner cee-global  10 2024-03-27 04:24:11.514250682 +0000 UTC   deployed  storage-provisioner-0.3.0-2024-02-1-0127-240216180827-d0e9159 2024.02.1.d181
    cee-global-telegraf-monitoring cee-global  10 2024-03-27 04:24:33.494447625 +0000 UTC   deployed  telegraf-monitoring-0.1.0-2024-02-1-0064-240216180755-97cf3c1 2024

    Check the UPDATED timestamps to confirm that the charts were deployed sequentially.

Upgrading the Product

To upgrade the product, perform the following:

  1. Download the latest TAR ball from the URL.

    software-packages download URL 

    Example:

    SMI Cluster Manager# software-packages download http://<ipv4address>:<port_number>/packages/cee-2019-08-21.tar 
    
  2. Verify whether the TAR balls are loaded.

    software-packages list 

    Example:

    SMI Cluster Manager# software-packages list 
    [ cee-2019-08-21 ] 
    [ sample ] 
    
  3. Update the repository URL to point to the correct product chart release for the upgrade.

    configure 
       cluster cluster_name 
          ops-centers app_name instance_name 
             repository url 
             exit 
          exit 
    

    Example:

    SMI Cluster Manager# config 
    SMI Cluster Manager(config)# clusters test2 
    SMI Cluster Manager(config-clusters-test2)# ops-centers cee data 
    SMI Cluster Manager(config-ops-centers-cee/data)# repository http://charts.<ipv4address>.<domain_name>/cee-2019-08-21/ 
    SMI Cluster Manager(config-ops-centers-cee/data)# exit 
    SMI Cluster Manager(config-clusters-test2)# exit 
    
  4. Configure the secrets, if your local registry contains secrets.

    configure 
       cluster cluster_name 
          secrets docker-registry secret_name 
             docker-server server_name 
             docker-username username 
             docker-password password 
             docker-email email 
             namespace k8s namespace 
             commit 
             exit 
          exit 
    

    Example:

    SMI Cluster Manager# config 
    SMI Cluster Manager(config)# clusters test2 
    SMI Cluster Manager(config-clusters-test2)# secrets docker-registry sec1 
    SMI Cluster Manager(config-docker-registry-sec1)# docker-server serv1 
    SMI Cluster Manager(config-docker-registry-sec1)# docker-username user1 
    SMI Cluster Manager(config-docker-registry-sec1)# docker-password Cisco@123 
    SMI Cluster Manager(config-docker-registry-sec1)# docker-email reg@cisco.com 
    SMI Cluster Manager(config-docker-registry-sec1)# namespace ns1 
    SMI Cluster Manager(config-docker-registry-sec1)# exit 
    SMI Cluster Manager(config-clusters-test2)# exit 
    SMI Cluster Manager(config)# 
    
  5. Run the cluster synchronization.

    clusters cluster_name actions sync run 
    

    Example:

    SMI Cluster Manager# clusters test2 actions sync run 
    

Verifying the Product Upgrade

You can verify the status of the product upgrade through the product CLI. To verify, use the following commands:

  1. Log in to the SMI product CLI. For example, CEE.

  2. Verify whether the charts are loaded in the specific instance (verify the namespace).

    show helm charts 
    

    Example:

    cee# show helm charts 
    CHART         INSTANCE                          STATUS    VERSION     REVISION RELEASE                                   NAMESPACE 
    ------------------------------------------------------------------------------------------------------------------------------------------- 
    cee-ops-center         cee-global-ops-center             deployed  2023.02.1.d249  1   0.7.0-2023-02-1-0513-230331051211-dec612f  cee-global 
    cnat-monitoring        cee-global-cnat-monitoring        deployed  2023.02.1.d249  1   0.7.0-2023-02-1-0031-230331183330-58ec41c  cee-global 
    product-documentation  cee-global-product-documentation  deployed  2023.02.1.d249  1   0.8.0-2023-02-1-0131-230321085503-2699cb5  cee-global 
    pv-manager             cee-global-pv-manager             deployed  2023.02.1.d249  1   0.3.0-2023-02-1-0029-230320155437-e484272  cee-global 
    smi-autoheal           cee-global-smi-autoheal           deployed  2023.02.1.d249  1   0.2.0-2023-02-1-0030-230330084451-99684bf  cee-global 
    smi-show-tac           cee-global-smi-show-tac           deployed  2023.02.1.d249  1   0.4.0-2023-02-1-0189-230331050005-81130f1  cee-global 
    storage-provisioner    cee-global-storage-provisioner    deployed  2023.02.1.d249  1   0.3.0-2023-02-1-0120-230320160505-1597fdb  cee-global 
    telegraf-monitoring    cee-global-telegraf-monitoring    deployed  2023.02.1.d249  1   0.1.0-2023-02-1-0048-230330084426-9b02da0  cee-global 
  3. Verify the status of the system.

    show system status 
    

    Example:

    cee# show system status 
    system status deployed true 
    system status percent-ready 100.0 
    

Notes:

  • show helm charts – Displays the helm release details.

  • show system status – Displays the status of the system.

Rollback to an Earlier Version

To rollback to an earlier version of the product, perform the following:

  1. Update the repository URL to point to an earlier version of the product chart release to roll back the product.

    configure 
       cluster cluster_name 
          ops-centers app_name instance 
             repository url 
             exit 
          exit 
    

    Example:

    SMI Cluster Manager# config 
    SMI Cluster Manager(config)# clusters test2 
    SMI Cluster Manager(config-clusters-test2)# ops-centers cee data 
    SMI Cluster Manager(config-ops-centers-cee/data)# repository http://charts.<ipv4address>.<domain_name>/cee-2019-07-22/ 
    SMI Cluster Manager(config-ops-centers-cee/data)# exit 
    SMI Cluster Manager(config-clusters-test2)# exit 
    
  2. Configure the secrets, if your local registry contains secrets.

    configure 
       cluster cluster_name 
          secrets docker-registry secret_name 
             docker-server server_name 
             docker-username username 
             docker-password password 
             docker-email email 
             namespace k8s namespace 
             commit 
             exit 
          exit 
    

    Example:

    SMI Cluster Manager# config 
    SMI Cluster Manager(config)# clusters test2 
    SMI Cluster Manager(config-clusters-test2)# secrets docker-registry sec1 
    SMI Cluster Manager(config-docker-registry-sec1)# docker-server serv1 
    SMI Cluster Manager(config-docker-registry-sec1)# docker-username user1 
    SMI Cluster Manager(config-docker-registry-sec1)# docker-password Cisco@123 
    SMI Cluster Manager(config-docker-registry-sec1)# docker-email reg@cisco.com 
    SMI Cluster Manager(config-docker-registry-sec1)# namespace ns1 
    SMI Cluster Manager(config-docker-registry-sec1)# exit 
    SMI Cluster Manager(config-clusters-test2)# exit 
    SMI Cluster Manager(config)# 
    
  3. Run the cluster synchronization.

    clusters cluster_name actions sync run 
    

    Example:

    SMI Cluster Manager# clusters test2 actions sync run 
    

Viewing Pod Details

You can view the details of the current pods through CEE Ops Center. To view the pod details, use the following command (in CEE Ops Center CLI):

cluster pods instance_name pod_name detail 

Note


  • cluster – Specifies the K8s cluster.

  • pods – Specifies the current pods in the cluster.

  • instance_name – Specifies the name of the instance.

  • pod_name – Specifies the name of the pod.

  • detail – Displays the details of the specified pod.


The following example displays the details of the pod named alertmanager-0 in the cee-data instance.

Example:
cee# cluster pods cee-data alertmanager-0 detail
details apiVersion: "v1"
kind: "Pod"
metadata:
  annotations:
    alermanager.io/scrape: "true"
    cni.projectcalico.org/podIP: "192.168.1.137/32"
    config-hash: "5532425ef5fd02add051cb759730047390b1bce51da862d13597dbb38dfbde86"
  creationTimestamp: "2020-02-26T06:09:13Z"
  generateName: "alertmanager-"
  labels:
    component: "alertmanager"
    controller-revision-hash: "alertmanager-67cdb95f8b"
    statefulset.kubernetes.io/pod-name: "alertmanager-0"
  name: "alertmanager-0"
  namespace: "cee"
  ownerReferences:
  - apiVersion: "apps/v1"
    kind: "StatefulSet"
    blockOwnerDeletion: true
    controller: true
    name: "alertmanager"
    uid: "82a11da4-585e-11ea-bc06-0050569ca70e"
  resourceVersion: "1654031"
  selfLink: "/api/v1/namespaces/cee/pods/alertmanager-0"
  uid: "82aee5d0-585e-11ea-bc06-0050569ca70e"
spec:
  containers:
  - args:
    - "/alertmanager/alertmanager"
    - "--config.file=/etc/alertmanager/alertmanager.yml"
    - "--storage.path=/alertmanager/data"
    - "--cluster.advertise-address=$(POD_IP):6783"
    env:
    - name: "POD_IP"
      valueFrom:
        fieldRef:
          apiVersion: "v1"
          fieldPath: "status.podIP"
    image: "<path_to_alert_manager_image>"
    imagePullPolicy: "IfNotPresent"
    name: "alertmanager"
    ports:
    - containerPort: 9093
      name: "web"
      protocol: "TCP"
    resources: {}
    terminationMessagePath: "/dev/termination-log"
    terminationMessagePolicy: "File"
    volumeMounts:
    - mountPath: "/etc/alertmanager/"
      name: "alertmanager-config"
    - mountPath: "/alertmanager/data/"
      name: "alertmanager-store"
    - mountPath: "/var/run/secrets/kubernetes.io/serviceaccount"
      name: "default-token-kbjnx"
      readOnly: true
  dnsPolicy: "ClusterFirst"
  enableServiceLinks: true
  hostname: "alertmanager-0"
  nodeName: "for-smi-cdl-1b-worker94d84de255"
  priority: 0
  restartPolicy: "Always"
  schedulerName: "default-scheduler"
  securityContext:
    fsGroup: 0
    runAsUser: 0
  serviceAccount: "default"
  serviceAccountName: "default"
  subdomain: "alertmanager-service"
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: "NoExecute"
    key: "node-role.kubernetes.io/oam"
    operator: "Equal"
    value: "true"
  - effect: "NoExecute"
    key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    tolerationSeconds: 300
  - effect: "NoExecute"
    key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: "alertmanager"
    name: "alertmanager-config"
  - emptyDir: {}
    name: "alertmanager-store"
  - name: "default-token-kbjnx"
    secret:
      defaultMode: 420
      secretName: "default-token-kbjnx"
status:
  conditions:
  - lastTransitionTime: "2020-02-26T06:09:02Z"
    status: "True"
    type: "Initialized"
  - lastTransitionTime: "2020-02-26T06:09:06Z"
    status: "True"
    type: "Ready"
  - lastTransitionTime: "2020-02-26T06:09:06Z"
    status: "True"
    type: "ContainersReady"
  - lastTransitionTime: "2020-02-26T06:09:13Z"
    status: "True"
    type: "PodScheduled"
  containerStatuses:
  - containerID: "docker://821ed1a272d37e3b4c4c9c1ec69b671a3c3fe6eb4b42108edf44709b9c698ccd"
    image: "<path_to_alert_manager_image>"
    imageID: "docker-pullable:/<path_to_alert_manager_directory>@sha256:c4bf05aa677a050fba9d86586b04383ca089bd784d2cb9e544b0d6b7ea899d9b"
    lastState: {}
    name: "alertmanager"
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2020-02-26T06:09:05Z"
  hostIP: "<host_ipv4address>"
  phase: "Running"
  podIP: "<pod_ipv4address>"
  qosClass: "BestEffort"
  startTime: "2020-02-26T06:09:02Z"
cee#

Configuring the Local NTP Server with Authentication and Tracking

This section describes how to configure the local NTP server with authentication and tracking.

Configuring the Local NTP Server

On a network with multiple systems, it is recommended to set up a single system as the NTP server for all the other local systems. Cloud providers follow the same model and run their own NTP pools within their data centers. The benefits of this model include reduced load on the external connections and remote NTP servers, and proper synchronization of the local systems with each other even when the external connection or the remote servers go down.

You can enable a local server in the configuration file.

  1. To enable the local server, specify the networks and subnets from which client connections are accepted. In addition, you can create an access list and test it on the server using the following chronyc command:
    accheck address

    For instance, you can use the following configuration to allow connections from 192.168.2.0/24 and all of the 10.0.0.0/8 subnet:

    allow 192.168.2.0/24
    allow 10.0.0.0/8
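
    Putting this together, a minimal /etc/chrony/chrony.conf excerpt for a local server might look like the following sketch. The upstream pool hostname is a placeholder; keep the rest of your distribution defaults:

    # Upstream time source for the local NTP server (placeholder hostname)
    pool ntp.example.com iburst
    # Serve NTP to the local cluster subnets
    allow 192.168.2.0/24
    allow 10.0.0.0/8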
  2. Edit the configuration file and restart the chrony service for the configuration to take effect, as shown in the following example.

    user1@cluster-manager:~$ vi /etc/chrony/chrony.conf
    user1@cluster-manager:~$ sudo vi /etc/chrony/chrony.conf
    user1@cluster-manager:~$ sudo systemctl daemon-reload
    user1@cluster-manager:~$ sudo systemctl restart chrony 
    user1@cluster-manager:~$ sudo systemctl status chrony 
     chrony.service - chrony, an NTP client/server
       Loaded: loaded (/lib/systemd/system/chrony.service; enabled; vendor preset: enabled)
       Active: active (running) since Tue 2019-11-19 17:54:26 UTC; 10s ago
         Docs: man:chronyd(8)
               man:chronyc(1)
               man:chrony.conf(5)
      Process: 14237 ExecStartPost=/usr/lib/chrony/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
      Process: 14196 ExecStart=/usr/lib/systemd/scripts/chronyd-starter.sh $DAEMON_OPTS (code=exited, status=0/SUCCESS)
     Main PID: 14233 (chronyd)
        Tasks: 1 (limit: 4915)
       CGroup: /system.slice/chrony.service
               └─14233 /usr/sbin/chronyd
    
    Nov 19 17:54:26 cluster-manager systemd[1]: Starting chrony, an NTP client/server...
    Nov 19 17:54:26 cluster-manager chronyd[14233]: chronyd version 3.2 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SECHASH +SIGND +ASYNCDNS +IPV6 -DEBUG)
    Nov 19 17:54:26 cluster-manager chronyd[14233]: Frequency -12.134 +/- 0.024 ppm read from /var/lib/chrony/chrony.drift
    Nov 19 17:54:26 cluster-manager systemd[1]: Started chrony, an NTP client/server.
    Nov 19 17:54:31 cluster-manager chronyd[14233]: Selected source 171.68.38.65
  3. Verify the connection to the server using the chronyc activity command. The chronyc clients command lists the clients connected to the server. The following example shows the server connection and the connected clients.

    user1@cluster-manager:~$ sudo chronyc activity
    200 OK
    2 sources online
    0 sources offline
    0 sources doing burst (return to online)
    0 sources doing burst (return to offline)
    0 sources with unknown address
    user1@cluster-manager:~$ sudo chronyc clients
    Hostname                      NTP   Drop Int IntL Last     Cmd   Drop Int  Last
    ===============================================================================
    

    Note


    No clients are displayed in the list because none are configured yet.


  4. Enter the local server in the Cluster Manager configuration.

  5. Run the cluster synchronization as follows.

     configure
        node-defaults os ntp servers 192.168.2.56 
        exit 
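
     After committing this configuration, trigger the synchronization from the Cluster Manager as in the earlier procedures (the cluster name is a placeholder):

     SMI Cluster Manager# clusters <cluster_name> actions sync run 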
  6. Verify the clients on the local server after all the nodes synchronize successfully.

    user1@cluster-manager:~$ sudo chronyc clients
    Hostname                      NTP   Drop Int IntL Last     Cmd   Drop Int  Last
    ===============================================================================
    192.168.2.109                  11      0   6   -     7       0      0   -     -
    192.168.2.110                   9      0   6   -    49       0      0   -     -
    192.168.2.111                   8      0   6   -    52       0      0   -     -
    192.168.2.107                   4      0   1   -    59       0      0   -     -
    192.168.2.108                   4      0   1   -    59       0      0   -     -
    192.168.2.106                   4      0   1   -    58       0      0   -     -
    192.168.2.51                    4      0   1   -    58       0      0   -     -
    192.168.2.53                    4      0   1   -    58       0      0   -     -
    192.168.2.52                    4      0   1   -    58       0      0   -     -
  7. Alternatively, you can verify the sources on any of the nodes in the cluster and track the status of the synchronization using the chronyc sources and chronyc tracking commands.

    user1@kali-worker3:~$ sudo chronyc sources
    210 Number of sources = 1
    MS Name/IP address         Stratum Poll Reach LastRx Last sample               
    ===============================================================================
    
    192.168.2.56                  2   6   377    25    -13us[  -18us] +/- 1106us
    
    user1-cloud@kali-worker3:~$ sudo chronyc tracking
    Reference ID : AC161238 (192.168.2.56)
    Stratum : 3
    Ref time (UTC) : Tue Nov 19 19:07:07 2019
    System time : 0.000000037 seconds slow of NTP time
    Last offset : +0.000035999 seconds
    RMS offset : 0.000020778 seconds
    Frequency : 13.682 ppm slow
    Residual freq : +0.084 ppm
    Skew : 0.228 ppm
    Root delay : 0.001795322 seconds
    Root dispersion : 0.000163470 seconds
    Update interval : 64.2 seconds
    Leap status : Normal

Configuring the Local NTP Server with Authentication

Generally, all cloud providers set up authentication for their local servers. To verify the authentication status of the local servers, use the chronyc command as follows:

user1@kali-worker3:~$ sudo chronyc ntpdata 192.168.2.56 | grep Authenticated
Authenticated   : No

To secure your local servers:

  1. Select the key-id and n-bit Secure Hash Algorithm (SHA) key.

    • The following example shows the default key generation.

      user1@cluster-manager:~$ sudo chronyc keygen
      1 SHA1 HEX:959623F106595B9E75BE328C265CA9C86560D88E
    • The following example shows the key generation with key-id 27 and 512 bit SHA key.

      user1@cluster-manager:~$ sudo chronyc keygen 27 SHA512 512
      27 SHA512 HEX:80E68E6AEB1B994217282568AF2A0EA8E4731F6CDC5CC5635C799676864BD68B4317FA897B54F10DCFE8F5F36
      7E03626ACD0A5048BAA8E1A615A44C4FCF731B3
  2. Add the keys to the /etc/chrony/chrony.keys file to configure the authentication.
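
    Each line of /etc/chrony/chrony.keys carries the key ID, hash type, and key exactly as printed by chronyc keygen, for example (the key value is a placeholder):

    27 SHA512 HEX:<key_generated_by_chronyc_keygen>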

  3. Restart the Chrony as follows.

    user1@cluster-manager:~$ sudo systemctl daemon-reload
    user1-cloud@cluster-manager:~$ sudo systemctl restart chrony 
    user1-cloud@cluster-manager:~$ sudo systemctl status chrony
    ● chrony.service - chrony, an NTP client/server
       Loaded: loaded (/lib/systemd/system/chrony.service; enabled; vendor preset: enabled)
       Active: active (running) since Tue 2019-11-19 19:29:08 UTC; 8s ago
         Docs: man:chronyd(8)
               man:chronyc(1)
               man:chrony.conf(5)
      Process: 20452 ExecStartPost=/usr/lib/chrony/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
      Process: 20406 ExecStart=/usr/lib/systemd/scripts/chronyd-starter.sh $DAEMON_OPTS (code=exited, status=0/SUCCESS)
     Main PID: 20445 (chronyd)
        Tasks: 1 (limit: 4915)
       CGroup: /system.slice/chrony.service
               └─20445 /usr/sbin/chronyd
    
    Nov 19 19:29:08 cluster-manager systemd[1]: Starting chrony, an NTP client/server...
    Nov 19 19:29:08 cluster-manager chronyd[20445]: chronyd version 3.2 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SECHASH +SIGND +ASYNCDNS +IPV6 -DEBUG)
    Nov 19 19:29:08 cluster-manager chronyd[20445]: Frequency -12.095 +/- 0.044 ppm read from /var/lib/chrony/chrony.drift
    Nov 19 19:29:08 cluster-manager systemd[1]: Started chrony, an NTP client/server.
    Nov 19 19:29:13 cluster-manager chronyd[20445]: Selected source 171.68.38.65
  4. Update the NTP configuration in the cluster manager.

  5. Run the cluster synchronization as follows.

    node-defaults os ntp servers 192.168.2.56
      key-id   27
      sha-type SHA512
      sha-key  80E68E6AEB1B994217282568AF2A0EA8E4731F6CDC5CC5635C799676864BD68B4317FA897B54F10DCFE8F5F36
    7E03626ACD0A5048BAA8E1A615A44C4FCF731B3
     exit
  6. Verify the authentication status on all the nodes connected to the local server after synchronization.

    user1@kali-worker3:~$ sudo chronyc ntpdata 192.168.2.56 | grep Authenticated
    Authenticated   : Yes

K8s Certificates Auto-Renewal

Certificate Management with Kubeadm

In kubeadm v1.21.0, client certificates generated by kubeadm expire after one year, while the root certificates expire after ten years. This feature enables monitoring and automatic renewal of kubeadm certificates from the CM or CEE before the expiry date. The CEE triggers an alert to notify the user of any certificate that is going to expire within 30 days.

The smi-cluster-maintainer pod monitors the K8s certificates and automates the renewal process, regardless of the cluster sync.

How it Works

This section describes the sequence of operation for the feature.

  1. The certificates on the CM-managed K8s clusters, control planes, workers, and external ETCD nodes are checked every 12 hours.

  2. If any certificate is expiring in 60 days on the nodes, then the auto-renew process is triggered.

    • If the renewal is successful, the subsequent checks show all the certificates as valid.

    • If the renewal is unsuccessful, then the auto-renew process is re-initiated for the next cycle or iteration of validating the certificates.

  3. If any certificate is expiring in 30 days on the nodes, then the auto-renew process is triggered along with sending an alert to the user.

    In such cases, manual intervention might be required to renew the certificates that are nearing their expiry date; see the kubeadm example after the note below.

    The Kubernetes certificate expiry alert is shown below.

    Rules:

    • Alert: kube_certificate_expiring

      • Annotations:

        • Type: Kubernetes Certificate Expiring Alarm

        • Summary: "Kubernetes certificate {{ $labels.cert_path }} on host: {{ $labels.node_name }} is expiring in {{ $labels.days_to_expiry }} days."

      • Expression:

         | 
              kube_certificate_expiring != 0 
      • Labels:

        • Severity: critical


Note


The certificate auto-renewal process must restart the api-server. You might experience a temporary k8s API downtime during the certificate auto-renewal process.
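
If you need to intervene manually, the standard kubeadm workflow on the affected control plane node is a reasonable starting point. The following is a sketch that assumes SSH access to the node and kubeadm 1.20 or later; confirm the exact procedure for your release before running it.

  # Check when each kubeadm-managed certificate expires
  sudo kubeadm certs check-expiration
  # Renew all renewable certificates, then restart the control plane components
  sudo kubeadm certs renew all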


Ops Center Converged Core Naming Convention Support

Feature Description

The SMF Ops Center name has changed to 'cn' from 'SMF' to accommodate the Ops Centers for SMF, cnSGW, or a combination of both. This name change breaks backward compatibility for the pods with earlier versions in relation to namespaces and IMS nodes.

This feature provides a new optional CLI configuration 'app-name-override' which ensures backward compatibility. The package name is derived by using this new field and is used in the URL to download the correct package. The previous method to derive namespaces is used to ensure that the existing namespaces remain unchanged.

Configuring the Ops Center Converged Core Naming Convention Support

This section describes a sample CLI command configuration for configuring the feature.

ops-centers smf data
  app-name-override       cn
  repository-local        smfi25
  sync-default-repository true
  netconf-ip              209.165.200.224
  netconf-port            2026
  ssh-ip                  209.165.200.224
  ssh-port                2028
  ingress-hostname        209.165.200.224.nip.io
  initial-boot-parameters use-volume-claims false
  initial-boot-parameters first-boot-password $8$5WnH/gUgKtPAPfXdU8CaURcKnaQOAc9imkIBHnjZLM=
  initial-boot-parameters auto-deploy true
  initial-boot-parameters single-node true
exit

NOTES:

  • <app-name> : Specifies the application name. In the above code sample, it is smf.

  • <instance> : Specifies the instance name. In the above code sample, it is data.

  • app-name-override <override_value> : Specifies the override value. In the above code sample, it is cn.

By default, the Ops Center package name used in the download URL is derived as smf-ops-center. In this release, if the app-name-override value is set to 'cn', the package name is derived as cn-ops-center instead.


Note


This feature ensures that you keep the existing namespaces, yet use the converged ops center name 'cn' from the override field to ensure backward compatibility for the NF pods, which were deployed in previous releases.


Docker Subnet Override Support

Feature Description

By default, Docker uses the subnet range, 172.17.0.0/16 for container networking. If the same subnet range or an IP address from the range is already being used by some other resource in the same cluster environment, it might lead to a conflict.

This feature enables the user to configure and override the default value for the Docker subnet used by the SMI Cluster Manager (CM) or Inception VM. For the CM, this configuration is set by using the CM Ops-Center, whereas the Inception VM uses the deploy.yaml file to achieve the same configuration.

The deploy.yaml file is enhanced with an additional configuration parameter that has a sub-parameter, docker-address-pools. This section of the YAML file specifies a base CIDR range to use and a size for the subnets to reserve from that range for new networks.

Configuring the Docker Subnet Override

This section describes the configuration details for the Docker subnet override feature.

Use the following command to configure the Docker subnet override feature.

configuration docker-address-pools pool-name docker_bridge_address_pool_name [ base docker_bridge_subnet | size size ] 

base docker_bridge_subnet

Specify the docker bridge subnet.

Must be a string in the ipv4-address-and-prefix-length pattern.

-Or-

Must be a string in the ipv6-address-and-prefix-length pattern.

Default Value: 172.17.0.0/16.

pool-name docker_bridge_address_pool_name

Specify the pool name of the docker bridge address pool.

Must be a string.

size size

Specify the size. For example, 16, 24, etc.

Must be an integer in the range of 8-24.

Default Value: 24.
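
For example, an illustrative pool that moves Docker networking away from the default 172.17.0.0/16 range could look like the following (the pool name and ranges are placeholders, not recommended values):

configuration docker-address-pools pool-name local-pool base 172.31.0.0/16 size 24 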

IPSec Support for SMF N4 Interfaces

Feature Summary and Revision History

Summary Data

Applicable Product (s) or Functional Area

KVM-based application deployment support

Applicable Platforms

Bare Metal, OpenStack, VMware

Feature Default Setting

Disabled – Configuration Required

Related Changes in this Release

Not Applicable

Related Documentation

UCC SMI Operations Guide

Revision History

Revision Details

Release

Added support for the following functionality:

  • IPSec Monitoring

  • Configuring IPSec certificates under strongSwan configuration

2023.01.0

First introduced.

2020.02.2.47

Feature Description

This feature introduces strongSwan, a keying daemon, which uses the Internet Key Exchange (IKE) protocols, IKEv1 and IKEv2, to establish security associations (SA) between two peers in a network. Such an IKE session is denoted as IKE_SA in this chapter. The IKE provides strong authentication for both peers and derives unique cryptographic session keys. Besides authentication and key material, IKE also provides the means to exchange configuration information and to negotiate IPsec SAs, which are often called as CHILD_SAs. IPsec SAs define which network traffic is to be secured and how it has to be encrypted and authenticated.

The strongSwan feature is available as an add-on from the Cluster Manager (CM). Use the CM Ops-Center to configure this add-on. In the current release, the SMI uses strongSwan version 5.9.3.

SMI allows monitoring of IPSec certificates: it sends certificate expiry alerts and updates certificates through the strongSwan configuration.

Configuration Parameters

This section describes the configuration parameters available for the strongSwan add-on feature. Use the CM Ops-Center to configure these parameters.

  • name : Specifies the name of the connection, which can be used for connection specific operations, for example, up or down.

  • auto { ignore | add | route | start } : Specifies the operation, if any, that should be automatically performed at IPsec startup. The add option loads a connection without starting it, whereas route loads a connection and installs kernel traps. If traffic is detected between the leftsubnet and rightsubnet, a connection is established. The start option loads a connection and brings it up immediately. The ignore option ignores the connection and is the same as deleting a connection from the config file.

    The default value is ignore.

  • keyexchange { ikev1 | ikev2 } : Specifies the method of key exchange and the protocol to use to initialize the connection.

  • type { tunnel | transport | transport_proxy | passthrough | drop } : Specifies the type of the connection. Currently, the accepted values are tunnel, signifying a host-to-host, host-to-subnet, or subnet-to-subnet tunnel. The transport option signifies a host-to-host transport mode, whereas the transport_proxy option signifies the special Mobile IPv6 transport proxy mode. The passthrough option signifies that no IPsec processing should be done at all and drop signifies that packets must be discarded.

  • left or right { ip address ip_address | fqdn fqdn | %any | %any4 | %any6 | range | subnet } : Specifies the IP address or FQDN of the participant public-network interface. The value %any for the local endpoint signifies an address to be filled in (by automatic keying) during negotiation. If the local peer initiates the connection setup, then the routing table is queried to determine the correct local IP address. If the local peer is responding to a connection setup, then any IP address that is assigned to a local interface is accepted. The value %any4 restricts address selection to IPv4 addresses and %any6 restricts address selection to IPv6 addresses.

  • leftsubnet or rightsubnet ip subnet : Specifies the private subnet behind the left participant, expressed as either network or netmask.

  • leftid or rightid id : Specifies how the left or right participant must be identified for authentication. The default values are left or right or the subject of the certificate configured. It must match the full subject DN or one of the subjectAltName extensions contained in the certificate.

  • leftsendcert { never | no | ifasked | always | yes } : Defines whether a peer must send a certificate request (CR) payload in order to get a certificate in return.

  • leftauth or rightauth { pubkey | psk | eap | xauth } : Specifies the authentication method to use locally (left) or require from the remote (right) side. The acceptable values are pubkey for public key encryption (RSA/ECDSA), psk for pre-shared key authentication, eap to use the Extensible Authentication Protocol, and xauth for IKEv1 eXtended Authentication.

    Pubkey is the default option.

  • psk pre-shared key : Specifies the required setting if leftauth or rightauth is configured as psk.

  • esp { cipher suites | aes128-sha256 } : A comma-separated list of ESP encryption or authentication algorithms is used for the connection, for example, aes128-sha256. The notation is encryption-integrity[-dhgroup][-esnmode]. For IKEv2, multiple algorithms (separated by -) of the same type can be included in a single proposal. IKEv1 only includes the first algorithm in a proposal.

    aes128-sha256 is the default option.

  • ike { cipher suites | aes128-sha256-modp3072 } : A comma-separated list of IKE/ISAKMP SA encryption or authentication algorithms is used, for example, aes128-sha256-modp3072. The notation is encryption-integrity[-prf]-dhgroup. In IKEv2, multiple algorithms and proposals might be included, such as aes128-aes256-sha1-modp3072-modp2048 or 3des-sha1-md5-modp1024.

  • ikelifetime { time time | 3h } : Specifies how long the keying channel of a connection (ISAKMP or IKE SA) must last before being renegotiated.

  • lifetime { time time | 1h } : Specifies how long a particular instance of a connection should last, from successful negotiation to expiry.

  • dpdaction { none | clear | hold | restart } : Specifies the action to be taken when dead peer is detected.

    none is the default value.

  • dpddelay { time time | 30s } : Defines the period time interval with which INFORMATIONAL exchanges are sent to the peer. These are only sent if no other traffic is received.

  • dpdtimeout { time time | 150s } : Defines the timeout interval after which, all the connections to a peer are deleted in case of inactivity.

  • inactivity time time : Defines the timeout interval after which, a CHILD_SA is closed if it did not send or receive any traffic.

  • closeaction { none | clear | hold | restart } : Defines the action to take if the remote peer unexpectedly closes a CHILD_SA (see dpdaction for the description of the different options). Do not use closeaction if the peer uses reauthentication or uniqueids checking, because these events might trigger the defined action when it is not desired.

  • nodes list_of_node_names : Specifies the node names on which IPSec connection must be established.

  • serverCert server_certificate : Specifies the content of Server certificate in the pem format to be used for this connection.


    Note


    This keyword is not supported under strongSwan configuration.


  • serverPrivKey server_private_key : Specifies the content of server private key in the pem format to be used for this connection.


    Note


    This keyword is not supported under strongSwan configuration.


  • serverPrivKeyPassphrase passphrase : Specifies the passphrase used to encrypt the server-priv-key value.

  • server-secret : Pass an existing TLS secret for this connection.
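
The following is a minimal, illustrative connection configuration that combines a few of these parameters under the CM Ops-Center. The cluster name, connection name, addresses, subnets, and node names are placeholders, and the exact set of required parameters depends on your deployment:

clusters sample-cluster
 addons strongswan enabled
 strongswan connections sample-n4-conn
  auto        start
  keyexchange ikev2
  type        tunnel
  left        209.165.200.225
  right       209.165.201.1
  leftsubnet  209.165.200.224/27
  rightsubnet 209.165.201.0/27
  leftauth    psk
  rightauth   psk
  psk         <pre_shared_key>
  nodes       [ worker1 worker2 ]
 exit
exit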

Installing strongSwan

This section describes how to install the strongSwan feature.

Install strongSwan as an Add-on from the CM

Use the following steps to install strongSwan as an add-on from the CM Ops-Center:

  1. Use the following CLI commands to enable the strongSwan add-on:

    clusters cluster_name addons strongswan enabled

  2. Set all the strongSwan parameters for connection (refer to the Configuration Parameters section for more details on available parameters).

  3. Trigger the cluster sync operation.


    Note


    The strongSwan pods run on all the nodes, however traffic is accepted only on those nodes, which are configured by using the "nodes" parameter in the CM Ops-Center. strongSwan does not accept or send any traffic on non-configured nodes.


Configuring IPSec Certificates

To configure IPSec certificates under strongSwan configuration, use the following procedure:

  1. Create TLS associated secret for server and CA certificate.

    Note: Create strongSwan-related secrets inside the smi-strongswan namespace.

    Example:

    [test-cm-controlplane] SMI Cluster Deployer# show running-config clusters secrets ca-cert
    clusters test-aio
     secrets ca-cert smi-strongswan 134-ca
      certificate "-----BEGIN CERTIFICATE-----\nMIIDqzCzQubm..................1Ac1L+s4M3ug==\n-----END CERTIFICATE-----\n"
     exit
     secrets ca-cert smi-strongswan 135-ca
      certificate "-----BEGIN CERTIFICATE-----\nMIIFqzCCA5Og..................9XdMDiQANHgf7w\n-----END CERTIFICATE-----\n"
     exit
     secrets ca-cert smi-strongswan ca-1
      certificate "-----BEGIN CERTIFICATE-----\nMIID0TCCArmg..................UNvF0nAmIX0qxg4\n-----END CERTIFICATE-----\n"
     exit
     secrets ca-cert smi-strongswan ca-2
      certificate "-----BEGIN PRIVATE KEY-----\nMIIEvQIBADAN..................tbNDzGAnF29nus=\n-----END PRIVATE KEY-----\n"
     exit
    exit
  2. Refer the secrets in strongSwan configuration. The strongSwan configuration shows the available TLS and certificates.

    Example:

    [test-cm-controlplane] SMI Cluster Deployer# show running-config clusters test-aio strongswan connections server-secret
    clusters test-aio
     strongswan connections a-to-b
      server-secret a-to-b
     exit
    exit
     
     [test-cm-controlplane] SMI Cluster Deployer# show running-config clusters test-aio strongswan ca-certs
    clusters test-aio
    strongswan ca-certs [ 134-ca 135-ca ]
    exit

Parallel Node Upgrade with Deployment Zone Strategy

Feature Summary and Revision History

Summary Data

Applicable Product (s) or Functional Area

KVM-based application deployment support

K8s-based application deployment support

Applicable Platforms

Bare Metal, OpenStack, VMware

Feature Default Setting

Disabled – Configuration Required

Related Changes in this Release

Not Applicable

Related Documentation

UCC SMI Operations Guide

Revision History

Revision Details

Release

First introduced.

2020.02.3.10

Feature Description

The current in-service upgrade strategy only supports upgrading one node at a time. For bigger clusters, more than six nodes, this upgrade strategy leads to longer upgrade periods, which mostly exceed the maintenance window (MW) limits.

This feature enables you to perform parallel upgrades for multiple nodes concurrently for faster in-service upgrades without impacting the availability and replication for any NF.

Architecture

The following figures show the high-level design of the group upgrade flow for K8s and KVM clusters.
Figure 1. Upgrade Flow for K8s Clusters
Figure 2. Upgrade Flow for KVM Clusters

How it Works

This section describes how the feature works.

This feature enables users to group servers into upgrade groups, which are similar to availability zones. The nodes in each upgrade group are upgraded in parallel (the maximum number of parallel nodes supported is 15).

The upgrade groups are upgraded in a sequential manner. For example, the control plane groups are not upgraded concurrently with the worker groups, but one at a time.

Requirements and Limitations

Some requirements and limitations associated with this feature are as follows.

  • If the feature is disabled, the SMI reverts to the previous method of performing consecutive upgrade for the nodes.

  • If the feature is enabled, the upgrade group configuration becomes mandatory for all nodes.

  • A majority of the control plane or etcd nodes must be running at all times (for example, two out of three control planes should always be running).

  • All the worker or KVM nodes should be distributed among the upgrade zones in a manner that ensures that a majority of the nodes is never upgraded at the same time.

  • The upgrade groups feature applies to the control plane, worker, KVM, and etcd node types, but does not apply to the CM-HA nodes.

  • For the K8s clusters, the nodes include the upgrade group name as a new label. This label enables the NFs to use the affinity and anti-affinity rules to achieve proper HA and replication. The NFs can use the upgrade-zone provisioned node label or use custom defined ones to enable the application to align with the affinity rule.

Configuring the Deployment Zone Strategy

This section describes how to configure the upgrade groups for different nodes.

Use the following command to configure the upgrade groups for different nodes.
configuration enable-upgrade-zones true 
   upgrade-zones zone_name 
   exit 
   nodes node_name 
   upgrade-zone zone_name 
   exit 

Note


In this release, the zone upgrade strategy is applicable for only the auto option for cluster upgrade-strategy. See the following example configuration:
clusters foo actions sync run upgrade-strategy
Possible completions:
auto concurrent rolling 
When upgrade-strategy is set to auto and calculated as rolling, Cluster Manager evaluates the upgrade zone configuration and performs a zone-based upgrade. If the upgrade-strategy is auto and calculated as concurrent, then it performs a concurrent upgrade regardless of the initial configuration.

Configuration Example:

clusters ott-bm1-c1
 
configuration enable-upgrade-zones true
 
upgrade-zones zone1
exit
upgrade-zones zone2
exit
upgrade-zones zone3
exit
 
nodes mm1-controlplane1
upgrade-zone zone1
exit
nodes mm1-controlplane2
upgrade-zone zone2
exit
nodes mm1-controlplane3
upgrade-zone zone3
exit
nodes mm1-etcd1
upgrade-zone zone1
exit
nodes mm1-etcd2
upgrade-zone zone2
exit
nodes mm1-etcd3
upgrade-zone zone3
exit
nodes mm1-worker1
upgrade-zone zone1
exit
nodes mm1-worker2
upgrade-zone zone2
exit
nodes mm1-worker3
upgrade-zone zone3
exit
nodes mm1-worker4
upgrade-zone zone1
exit
nodes mm1-worker5
upgrade-zone zone2
exit
nodes mm1-worker6
upgrade-zone zone3
exit
nodes mm1-worker7
upgrade-zone zone1
exit
commit
end

Path Based Routing for Inception Server

Feature Summary and Revision History

Summary Data

Applicable Product (s) or Functional Area

KVM-based application deployment support

K8s-based application deployment support

Applicable Platforms

Bare Metal, OpenStack, VMware

Feature Default Setting

Disabled – Configuration Required

Related Changes in this Release

Not Applicable

Related Documentation

UCC SMI Operations Guide

Revision History

Revision Details

Release

First introduced.

2022.02.1

Feature Description

This feature enables the SMI to support path-based URL routing for its NGINX routing (external traffic) in the Inception VM, instead of the traditional host-based approach, for the following ingresses:

  • cli.smi-deployer.deployer.example.com

  • restconf.smi-deployer.deployer.example.com

Configuring the Path Based Routing for Inception Server

This section describes how to enable the path based routing for Inception server.

Use the following argument in the deploy script to enable path based ingress for RESTCONF and CLI:

-i or --path-based-ingress

Configuration Example:

./deploy -p 209.165.200.224 -f Passwd@123 -i
After you enable the path based ingress, SSH and RESTCONF of inception server are accessible using the following URLs:

API: https://209.165.200.224/smi-deployer/restconf
If you provide the hostname in "--external-zone-name" along with the path based ingress argument, then the entire hostname is replaced with the provided host name.

Configuration Example:


./deploy -p 209.165.200.224 -f Csco@123 -i --external-zone-name abc.com
After you enable the path based ingress, SSH and RESTCONF of inception server are accessible using the following URLs:
SSH (cli): ssh admin@127.0.0.1 -p 2022

API: https://abc.com/smi-deployer/restconf

CA Signed Certificate for Path-based Ingress

Feature Summary and Revision History

Summary Data

Applicable Product (s) or Functional Area

KVM-based application deployment support

K8s-based application deployment support

Applicable Platforms

Bare Metal, OpenStack, VMware

Feature Default Setting

Disabled – Configuration Required

Related Changes in this Release

Not Applicable

Related Documentation

UCC SMI Operations Guide

Revision History

Revision Details

Release

Added support for provisioning CA certificates.

2023.01.0

First introduced.

2022.02.1

Feature Description

This feature enables you to configure certificates signed by your own CA or external CA for path-based ingress URLs.

You can provision the certificates used for both REST APIs and K8s APIs through cluster manager and Ops-center. The recommended method to configure a certificate and its corresponding private key is to provision the certificate as a TLS secret using the existing yang container.

Certificate Expiry Check

The provisioned certificates must be monitored for expiry. The kube-certificate-expiring alert is raised automatically in advance so that you can renew and update the certificate and key. (A sample openssl check is shown after the following list.)

The alerts have the following severity levels:

  • 30 days before expiry—Raise alert with Info severity

  • 20 days before expiry—Raise alert with Major severity

  • 15 days before expiry—Raise alert with Critical severity
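
You can also check the expiry of a provisioned certificate manually with openssl before an alert is raised. The following is a minimal sketch; the file name is illustrative:

openssl x509 -in certificate.pem -noout -enddate
openssl x509 -in certificate.pem -noout -checkend 2592000   # exits non-zero if the certificate expires within 30 days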

Configuring Certificate for Path-based Ingress

This section describes how to configure TLS and CA certificates for path-based ingress.

Configuring TLS Certificate

Use the following procedure from cluster deployer to configure the certificates for path-based ingress.

  1. Create a secret.

    Use the following sample configuration to populate a certificate and its corresponding private key. The provided certificate and private key are stored as a K8s TLS secret on the cluster under the specified namespace. (A sample openssl sketch for generating a certificate and key appears after this procedure.)

    cluster cluster_name 
       secrets tls namespace secret_name 
          private-key private_key_content 
          certificate certificate_content 
          exit 
       exit 
    Example:
    clusters sample-cluster
      secrets tls cee-global sample-secret
        private-key "$8$9n3U7OLEclVQoDpp/4VqkSLkeSmFbjx/
    Mt6eEGN4EWoKPY1r9nqSWSZ40advmhDFsPFQZWfM\nhq/wpRzHXBZGp/
    dNtNO+wpaQuxsT3CmkmRKFIHviUn4bEwBKfTCCsw7a5+66q3rm5vX4/nSw\
    nNy4DrgTu4iFDzVYVKAYzoxWGzCqhKIaSqELjsW7gchEowC\n
        certificate "-----BEGIN CERTIFICATE-----\nMIID0zCCArugAw
    IBAgIUPHTzpMTVUNVDQzJ/FM9tfCsAG2AwDQYJKoZIhvcNAQEL
    \nBQAwaDELMAkGA1UEBhMCVVMxCzAJBgNVBAgMAkNBMQsw
    \n-----END CERTIFICATE-----\n"
        exit
      exit
    exit
  2. Configure path-based ingress secret.

    Use the following sample configuration to add the secret name for path-based ingresses.
    clusters <cluster_name> 
       ops-centers <opscenter_name> <instance_name> 
          initial-boot-parameters path-based-ingress true 
          initial-boot-parameters path-based-ingress-secret <secret_name> 
          exit 
       exit 
    exit 

    Note


    You must set path-based-ingress to true to enable the path-based-ingress-secret option.


    Example:
    clusters sample-cluster
     ops-centers cee global
      initial-boot-parameters path-based-ingress true
      initial-boot-parameters path-based-ingress-secret sample-secret
      exit
     exit
    exit
  3. Run cluster sync to create and configure the secret as well as configure ingress to use the secret.
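
For lab or test setups, you can generate the certificate and private key referenced in step 1 with openssl. The following is a minimal sketch for a self-signed pair; the file names and subject are illustrative, and in production you would instead use a certificate signed by your CA:

openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout sample-key.pem -out sample-cert.pem \
  -subj "/C=US/ST=CA/L=SF/O=sample-signed.cisco.com/CN=10.x.x.x"

The contents of sample-key.pem and sample-cert.pem are then pasted into the private-key and certificate fields of the secret.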

Verifying the Certificate for Path-based Ingress Configuration

This section describes how to verify the certificate for path-based ingress configuration.

Use the following CLI command to get the ingress in YAML and verify the configured secret name:
kubectl get ing -n <namespace> <ingress-name> -o yaml
Command Output Example:
cloud-user@sample-aio-controlplane:~$ kubectl get ing -n cee-global cli-ingress-cee-global-ops-center -o yaml
 
apiVersion: networking.k8s.io/v1
kind: Ingress
...
spec:
  rules:
  - host: 10.x.x.x
    http:
      paths:
      - backend:
          service:
            name: ops-center-cee-global-ops-center
            port:
              number: 7681
        path: /cee-global/cli
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - 10.x.x.x.
    secretName: sample-secret
Run the curl command and check the "Server certificate:" section of the output to verify that the configured certificate is being used.
cloud-user@satya-aio-controlplane:~$ curl -k -v https://10.x.x.x.nip.io/cee-global/cli
*   Trying 10.x.x.x...
* TCP_NODELAY set
* Connected to 10.x.x.x.nip.io (10.x.x.x) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Unknown (8):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Client hello (1):
* TLSv1.3 (OUT), TLS Unknown, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=CA; L=SF; O=sample-signed.cisco.com; CN=10.x.x.x
*  start date: Jul 12 04:19:56 2022 GMT
*  expire date: Jul 11 04:19:56 2024 GMT
*  issuer: C=US; ST=CA; L=SF; O=sample-signed.cisco.com; CN=10.x.x.x
*  SSL certificate verify result: self signed certificate (18), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* Using Stream ID: 1 (easy handle 0x56498909f550)
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
> GET /cee-global/cli HTTP/2
> Host: 10.x.x.x
> User-Agent: curl/7.58.0
> Accept: */*
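
Alternatively, you can inspect only the served certificate with openssl instead of reading the full curl trace. The following is a minimal sketch; the address is illustrative:

echo | openssl s_client -connect 10.x.x.x:443 -servername 10.x.x.x 2>/dev/null | openssl x509 -noout -subject -issuer -dates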

Configuring CA Certificate

To configure the CA certificate, use the following configuration in Ops-center:

secrets ca-cert secret_name 
   certificate certificate_content 
   exit 

To configure the CA certificate, use the following configuration in cluster-manager:

cluster cluster_name 
   secrets ca-cert namespace secret_name 
      private-key private_key_content 
      certificate certificate_content 
      exit 
   exit 

NOTES:

  • If you add invalid or expired certificate content, you are prompted to correct the configuration.

  • The CA certificate is stored as a generic (Opaque) secret type.

  • The secrets are monitored and automatically healed if the data is deleted by mistake.
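
To confirm that the CA certificate secret exists after the cluster sync, you can query it with kubectl. The following is a minimal sketch; the namespace and secret name are illustrative:

kubectl get secret -n cee-global sample-ca-secret -o yaml

The type field in the output shows Opaque for a CA certificate secret.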

OnDemand LDAP Connectivity Check

Feature Summary and Revision History

Summary Data

Applicable Product (s) or Functional Area

KVM-based application deployment support

K8s-based application deployment support

Applicable Platforms

Bare Metal, OpenStack, VMware

Feature Default Setting

Disabled – Configuration Required

Related Changes in this Release

Not Applicable

Related Documentation

UCC CEE Configuration and Administration Guide

UCC SMI Operations Guide

Revision History

Revision Details

Release

First introduced.

2022.02.1

Feature Description

The SMI Ops Center supports external authentication using LDAP. The LDAP configuration can be set up in the SMI Ops Center using the CLI or the RESTCONF APIs.

This feature enables you to validate a new LDAP configuration before adding it to the system, and to validate an existing LDAP configuration.

How it Works

This section describes how the feature works.

How to Validate a New Configuration

The steps to validate a new LDAP configuration are as follows.

  1. Log in to the SMI Ops Center.

  2. Provide the new LDAP configuration inputs to validate (see the following example).
    [pv/global] cee# smildap validate-security-config validate-new-security-config { ?
    Possible completions:
      base-dn                LDAP Base DN
      bind-dn                LDAP Bind DN
      group-attr             Group attribute
      group-mapping          LDAP group to application security mapping
      ldap-filter            LDAP Filter - use %s to sub username
      ldap-server-url        LDAP Server URL (https://tools.ietf.org/html/rfc2255)
      ldap-username-domain   LDAP Username Domain
      password               Password
      username               Existing User name in LDAP server
  3. Validate the new LDAP configuration (see the following example configuration).

    cee(config)# smildap validate-security-config validate-new-security-config 
    { base-dn dc=smi-lab,dc=com bind-dn cn=%s,ou=people,dc=smi-lab,dc=com group-attr 
    memberOf group-mapping { group admin ldap-group group1 } username user5 password 
    Passwd@123 ldap-filter cn=%s ldap-server-url ldap://209.165.200.224 }
    Mon Jun  20 05:02:24.635 UTC+00:00
    message accept "admin" external-user-group 1117 1117 /tmp

How to Validate an Existing LDAP Configuration

Use the following example configuration to validate an existing LDAP configuration.

cee# smildap validate-security-config validate-current-security-config
 
Mon Jun  20 05:07:41.765 UTC+00:00
 
Value for 'username' (<string>): user5
 
Value for 'password' (<string>): ********
 
message accept "admin" external-user-group 1117 1117 /tmp
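
You can also verify the same LDAP parameters independently from any host with network access to the LDAP server, for example with ldapsearch. The following is a minimal sketch that reuses the values from the examples above; the tool must be installed separately, and the filter and attributes depend on your directory layout:

ldapsearch -x -H ldap://209.165.200.224 \
  -D "cn=user5,ou=people,dc=smi-lab,dc=com" -W \
  -b "dc=smi-lab,dc=com" "(cn=user5)" memberOf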

Provisioning Local Users

A new YANG model is introduced in SMI to support user management in compliance with Cisco Secure Development Life-cycle (CSDL) requirements.

Note


This new YANG model is applicable to SMI Cluster Manager and all other product Ops Centers.


User Management

This chapter describes how to create and manage local users using the Ops Center CLI (for both the products and SMI Cluster Manager Ops Center).


Important


Users with administrator privileges can add, modify, and delete other users and groups. All the other users only have privileges to change their own password.


Adding a User

To add a new user, use the following configurations:

configure
  smiuser add-user username username password password
  exit 

NOTES:

  • smiuser add-user : Adds a new local user.

  • username username : Specifies the name of the user.

    username must be an alphanumeric string.
  • password password : Specifies the password. The password must meet the following criteria:

    • Minimum 8 characters in length.

    • Contain at least one lowercase character.

    • Contain at least one uppercase character.

    • Contain at least one numeric character.

    • Contain at least one special character, which includes the following:

      • [ '~', '@', '#', '%', '^', '&', '*', '(', ')', '_', '+', '`', '-', '=', '[', ']', ':', '"', ';', '\'', '|', '<', '>', '?', ',', '.', '/', '$']


        Important


        Your password must not contain the '{' and '}' special characters.


      • Password must not start with '$'.

    • Password must not be simple or based on a dictionary word.

    • Do not reuse passwords.

      Use the following command to configure the number of passwords to keep in history:

      password requisite pam_pwhistory.so debug enforce_for_root remember=12
    • The minimum number of days allowed between password changes is seven.

The following example adds a new user called 'user1' and assigns the password for the new user.

cee# configure terminal
  smiuser add-user username user1 password Cisco@123
message User added

The following example adds a new user called 'user2' and assigns the password for the new user.

cee# configure terminal
  smiuser add-user username user2 password Cisco@12345
message User added

In the following example, when an existing user name (user2) is added as a new user, the Ops Center displays an error message.

cee# configure terminal
  smiuser add-user username user2 password Cisco@12345
message User already exists

Creating Unprivileged Users with SSH Key

The SMI Cluster Manager allows creating unprivileged users on cluster nodes with SSH key access. These users will remain even after the SMI Cluster Manager is upgraded. Also, the SMI Cluster Manager considers the users created with the comment smi.user to be managed by the Cluster Manager. If an existing user, who is not an smi.user, is added to the configuration, the SMI Cluster Manager throws an error during cluster synchronization to prevent damaging or blocking communication to the system.

To add an SSH key and password to a user on all the nodes, use the following configuration (see the sketch after the NOTES for a worked example):

configure 
  node-defaults os users username 
    password password 
    authorized-keys key_name 
    algorithm ssh_algorithm 
    key-data  key_data 
    exit 
  authorized-keys key_name 
   algorithm ssh_algorithm 
   key-data key_data 
   exit 
 exit 

To add an SSH key and password to a user on a specific node, use the following configuration:

configure 
  node node_name os users username 
    password password 
    authorized-keys key_name 
    algorithm ssh_algorithm 
    key-data  key_data 
    exit 
   authorized-keys key_name 
    algorithm ssh_algorithm 
    key-data key_data 
    exit 
  exit 

NOTES:

  • node-defaults os users username - Specifies the default value applicable to all the nodes for the selected user. username is the name of the user to be created.

  • node node_name os users username - Specifies the default value applicable to the specific node for the selected user. node_name is the name of the specific node. username is the name of the user to be created.

  • password password - Specifies the password used for authentication.

  • authorized-keys key_name - Specifies the name of the SSH key.

  • algorithm ssh_algorithm - Specifies the SSH algorithm used for generating the SSH key. For example, SSH-RSA or SSH-Ed25519 algorithm.

  • key-data key_data - Specifies the generated SSH key.
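
The following is a minimal sketch of the end-to-end flow: generate a key pair on your workstation and then reference its public part in the user configuration. The user name, key name, and key data are illustrative, and the exact key-data format expected (with or without the algorithm prefix and comment) may vary by release:

ssh-keygen -t ed25519 -f ~/.ssh/smi_lab_user

configure
  node-defaults os users labuser
    authorized-keys labkey
     algorithm ssh-ed25519
     key-data AAAAC3NzaC1lZDI1NTE5AAAA...
     exit
    exit
  commit
  end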

Deleting a User

To delete a user, use the following configuration:

configure
  smiuser delete-user username username
  exit 

Note


  • smiuser delete-user - Deletes a local user.

  • username username - Specifies the name of the user.

    username must be an alphanumeric string.

The following example deletes a user called 'user2'.

cee# configure terminal
  smiuser delete-user username user2
message User deleted

In the following example, when a non-existing user is deleted, the Ops Center displays an error message.

cee# configure terminal
  smiuser delete-user username user2
message User does not exist

Modifying the Password

To modify the password (for self), use the following configuration:
configure
  smiuser change-self-password current_password current_password new_password new_password 
  confirm_password new_password password_expire_days number_of_days
  exit
 

Note


  • smiuser change-self-password - Modifies the password for the current user.

  • current_password current_password - Specifies the current password of the user.

  • new_password new_password - Assigns a new password for the user. For information on the password policy, see the Adding a User section.

  • confirm_password new_password - Specifies the newly assigned password one more time.

  • password_expire_days number_of_days - (Optional) Specifies the number of days after which the password expires. The default value is 180 days.


The following example updates the password for the current user.

cee# configure terminal
  smiuser change-self-password current_password Cisco@123 new_password Cisco@345 
  confirm_password Cisco@345 password_expire_days 180
message Password updated successfully

The following example updates the password for the current user without assigning the password expiry date.

cee# configure terminal
  smiuser change-self-password current_password Cisco@123 new_password Cisco@345 confirm_password Cisco@345
message Password updated successfully

Reset the Administrator Password

You can reset the administrator password if you have access to the K8s Cluster through kubectl command-line utility.

To reset the administrator password:

  1. Enter the Ops Center Pod's EXEC mode.

  2. Use the following command to reset the administrator password.

    kubectl exec -it <pod_name> -n <pod_namespace> /usr/local/bin/reset-admin 
  3. Enter the new password when prompted.

NOTES:

  • kubectl exec -it - Executes a command inside a container. -i passes the standard input stream to the container and -t allocates a TTY.

  • <pod_name> -n - Specifies the name of the Pod (see the sketch after these notes for locating the pod name). -n specifies the namespace scope for this CLI request.

  • <pod_namespace> - Specifies the namespace of the Pod.

  • /usr/local/bin/reset-admin - Resets the administrator password.
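
If you do not know the Ops Center pod name, you can locate it first. The following is a minimal sketch; the namespace and the resulting pod name are illustrative and vary by product and instance:

kubectl get pods -n cee-global | grep ops-center

The pod name returned by this command is the <pod_name> value used in the reset command above.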

Modifying the Password for Other Users

You can modify the password for other users using the following configuration:

configure
  smiuser change-password username username current_password current_password new_password new_password 
  confirm_password new_password password_expire_days number_of_days
  exit
 

Note


  • smiuser change-password - Modifies the password for a user.

  • username username - Specifies the name of the user.

    username must be an alphanumeric string.
  • current_password current_password - Specifies the current password of the user.

  • new_password new_password - Assigns a new password for the user. For information on the password policy, see the Adding a User section.

  • confirm_password new_password - Specifies the newly assigned password one more time.

  • password_expire_days number_of_days - (Optional) Specifies the number of days after which the password expires. The default value is 180 days.


The following example updates the password for the user called 'user1'.

cee# configure terminal
  smiuser change-password username user1 current_password Cisco@123 new_password Cisco@345 confirm_password Cisco@345 password_expire_days 180
message Password updated successfully

The following example updates the password for the user called 'user1' without assigning the optional password expiry date.

cee# configure terminal
  smiuser change-password username user1 current_password Cisco@123 new_password Cisco@345 confirm_password Cisco@345
message Password updated successfully

The following example updates the password for the user called 'user1' with an existing password.

cee# configure terminal
  smiuser change-password username user1 current_password Cisco@345 new_password Cisco@345 confirm_password Cisco@345
message Password has been already used

The following example updates the password for the user called 'user1' with different values for new password and confirm password parameters.

cee# configure terminal
  smiuser change-password username user1 current_password Cisco@345 new_password Cisco@345 confirm_password Cisco@567
message Passwords do not match

Updating the Password Length

To update the length of the password, use the following configuration:

configure
smiuser update-password-length length number_of_characters 
 exit 

Note


  • smiuser update-password-length - Updates the length of the password.

  • length number_of_characters - Specifies the minimum length of the password. number_of_characters must be a numeric value.


The following example updates the minimum length of the password to 10 characters.

cee# configure terminal
  smiuser update-password-length length 10
message Password updated successfully

Group Management

This chapter describes how to create and manage user groups using the Ops Center CLI (of both the products and SMI Cluster Manager).

Adding a User Group

To add a user group, use the following configuration:

configure
smiuser add-group groupname group_name 
 exit 

Note


  • smiuser add-group - Adds a new user group.

  • groupname group_name - Specifies the name of the user group. group_name must be an alphanumeric value.


The following example adds a new user group called 'group1'.

cee# configure terminal
  smiuser add-group groupname group1
message Group added

In the following example, when a user group that already exists is added, the Ops Center displays an error message.

cee# configure terminal
  smiuser add-group groupname group1
message Group already exists

Deleting a User Group

To delete a user group, use the following configuration:
configure
smiuser delete-group groupname group_name 
 exit 

Note


  • smiuser delete-group - Deletes a user group.

  • groupname group_name - Specifies the name of the user group. group_name must be an alphanumeric value.


The following example deletes a user group called 'group2'.

cee# configure terminal
  smiuser delete-group groupname group2
message Group deleted

In the following example, when a user group that does not exist is deleted, the Ops Center displays an error message.

cee# configure terminal
  smiuser delete-group groupname group2
message Group does not exist

Assigning a User to a User Group

To assign a user to a user group, use the following configuration:

configure
  smiuser assign-user-group username username group group_name 
   exit 

Note


  • smiuser assign-user-group - Assigns a user to a user group.

  • username username - Specifies the name of the user. username must be an alphanumeric value.

  • group group_name - Specifies the name of the user group. group_name must be an alphanumeric value.


The following example assigns a user called 'user1' to a group called 'group1'.

cee# configure terminal
  smiuser assign-user-group username user1 group group1
message User assigned to group successfully

The following example assigns a non-existing user to an existing group.

cee# configure terminal
  smiuser assign-user-group username user20 group group1
message User does not exist

The following example assigns a non-existing group to an existing user.

cee# configure terminal
  smiuser assign-user-group username user1 group group10
message Group does not exist

Unassigning a User from a User Group

To unassign a user from a user group, use the following configuration:

configure
  smiuser unassign-user-group username username group group_name 
   exit 

Note


  • smiuser unassign-user-group - Removes a user from a user group.

  • username username - Specifies the name of the user. username must be an alphanumeric value.

  • group group_name - Specifies the name of the user group. group_name must be an alphanumeric value.


The following example removes a user from a group.

cee# configure terminal
  smiuser unassign-user-group username user1 group group1
message User un-assigned from group successfully

The following example removes a non-existing user from a group.

cee# configure terminal
  smiuser unassign-user-group username user10 group group1
message User is not a member of this group

The following example removes a user from a non-existing group.

cee# configure terminal
  smiuser unassign-user-group username user1 group group10
message Group does not exist

SMI User and Audit Tracking Commands

This section provides the list of user and audit tracking commands used in SMI.


Note


  • All these commands are executed on the node's terminal. Users with sudo access can execute these commands.

  • The wtmp and btmp files must be present in the /var/log directory for the system to store log information.


User Tracking Commands

The following commands are used for user tracking in SMI.

Table 10. User Tracking Commands

Command

Description

last -F

Displays the list of user login sessions along with the full date and time of each login.

Note

 
  • All the user information (login and logout details) is stored in the /var/log/wtmp file. The command fetches all the user information stored in this file from the time the wtmp file was created.

  • The users in the current session are displayed as still logged in.

last reboot

Displays the number of times the system was rebooted.

lastb

Displays the list of bad login attempts recorded in the /var/log/btmp file.

lastlog

Displays each user's last login information - login name, port, and last login time - recorded in the /var/log/lastlog file.

Audit Tracking Commands

The following commands are used for audit tracking in SMI.

Table 11. Audit Tracking Commands

Command

Description

sudo aureport -au -i | more

Displays a summary of audit daemon logs report.

sudo cat /var/log/auth.log | grep "Failed password"

Displays a list of failed SSH log in attempts.

sudo journalctl _SYSTEMD_UNIT=ssh.service | egrep "Failed|Failure"

Displays a list of all the failed login attempts.

sudo journalctl -q _TRANSPORT=audit

Displays audit logs for SSH, SFTP and SCP.

TCP and UDP Open Ports

This section lists the TCP and UDP services and the corresponding open ports in the Kubernetes cluster nodes (control plane, worker and etcd).

The following table lists the TCP and UDP services and the corresponding open ports for the Primary control plane node.

Table 12. Primary Control Plane Node - Open Ports
Pod Description Port
kubelet kubelet is the lowest level component in Kubernetes. It’s responsible for what’s running on an individual machine. You can think of it as a process watcher like supervisord but focused on running containers. It has one job: given a set of containers to run, make sure they are all running. 10248, 10250
kube-proxy kube-proxy is a network proxy that runs on each node in your cluster, implementing part of the Kubernetes Service concept. kube-proxy maintains network rules on nodes. These network rules allow network communication to your Pods from network sessions inside or outside of your cluster. 10249, 443, 10256
calico-node A node resource representing a node running Calico. When adding a host to a Calico cluster, a node resource needs to be created which contains the configuration for the calico/node instance running on the host. 9099
kube-controller The Kubernetes controller manager is a daemon that embeds the core control loops shipped with Kubernetes. The controller is a control loop that watches the shared state of the cluster through the apiserver and makes changes attempting to move the current state towards the desired state. 10257
kube-scheduler kube-scheduler is the default scheduler for Kubernetes and runs as part of the control plane. A scheduler watches for newly created Pods that have no Node assigned. For every Pod that the scheduler discovers, the scheduler becomes responsible for finding the best Node for that Pod to run on 10259
bird BIRD is an open source BGP client that is used to exchange routing information between hosts. The routes that Felix programs into the kernel for endpoints are picked up by BIRD and distributed to BGP peers on the network, which provides inter-host routing. If configured, there will be two BIRD processes running in the calico/node container. One for IPv4 (bird) and one for IPv6 (bird6). 3179
systemd-resolv systemd-resolved is a system service that provides network name resolution to local applications. 53
sshd sshd (SSH Daemon) is the daemon program for ssh. Together these programs replace rlogin and rsh and provide secure encrypted communications between two untrusted hosts over an insecure network. 22
kube-apiserver kube-apiserver is the REST API that validates and configures data for API objects such as pods, services, and replication controllers. 6443
node_exporter Node Exporter is a Prometheus exporter for hardware and OS metrics with pluggable metric collectors. It allows you to measure various machine resources such as memory, disk and CPU utilization. 9100
chronyd chronyd provides support to work out the gain or loss rate of the 'real-time clock', i.e. the clock that maintains the time when the computer is turned off. It can use this data when the system boots to set the system time from a corrected version of the real-time clock. 323, 123

The following table lists the TCP and UDP services and the corresponding open ports for Secondary control plane node.

Table 13. Secondary Control Plane Node - Open Ports
Pod Description Port
kubelet kubelet is the lowest level component in Kubernetes. It’s responsible for what’s running on an individual machine. You can think of it as a process watcher like supervisord but focused on running containers. It has one job: given a set of containers to run, make sure they are all running. 10248, 36189, 10250
kube-proxy kube-proxy is a network proxy that runs on each node in your cluster, implementing part of the Kubernetes Service concept. kube-proxy maintains network rules on nodes. These network rules allow network communication to your Pods from network sessions inside or outside of your cluster. 10249, 10256
calico-node A node resource representing a node running Calico. When adding a host to a Calico cluster, a node resource needs to be created which contains the configuration for the calico/node instance running on the host. 9099
kube-controller The Kubernetes controller manager is a daemon that embeds the core control loops shipped with Kubernetes. The controller is a control loop that watches the shared state of the cluster through the apiserver and makes changes attempting to move the current state towards the desired state. 10257
kube-scheduler kube-scheduler is the default scheduler for Kubernetes and runs as part of the control plane. A scheduler watches for newly created Pods that have no Node assigned. For every Pod that the scheduler discovers, the scheduler becomes responsible for finding the best Node for that Pod to run on 10259
bird BIRD is an open source BGP client that is used to exchange routing information between hosts. The routes that Felix programs into the kernel for endpoints are picked up by BIRD and distributed to BGP peers on the network, which provides inter-host routing. If configured, there will be two BIRD processes running in the calico/node container. One for IPv4 (bird) and one for IPv6 (bird6). 3179
systemd-resolv systemd-resolved is a system service that provides network name resolution to local applications. 53
sshd sshd (SSH Daemon) is the daemon program for ssh. Together these programs replace rlogin and rsh, and provide secure encrypted communications between two untrusted hosts over an insecure network. 22
kube-apiserver kube-apiserver is the REST API that validates and configures data for API objects such as pods, services, and replication controllers. 6443
node_exporter Node Exporter is a Prometheus exporter for hardware and OS metrics with pluggable metric collectors. It allows you to measure various machine resources such as memory, disk and CPU utilization. 9100
chronyd chronyd provides support to work out the gain or loss rate of the 'real-time clock', i.e. the clock that maintains the time when the computer is turned off. It can use this data when the system boots to set the system time from a corrected version of the real-time clock. 323, 123

The following table lists the TCP and UDP services and the corresponding open ports for etcd node.

Table 14. ETCD Node - Open Ports
Pod Description Port
kubelet kubelet is the lowest level component in Kubernetes. It’s responsible for what’s running on an individual machine. You can think of it as a process watcher like supervisord but focused on running containers. It has one job: given a set of containers to run, make sure they are all running. 10248, 10250, 10255
etcd etcd is a distributed key-value store, which accepts TLS traffic, non-TLS traffic or both TLS and non-TLS traffic. 2379, 2380, 2381
systemd-resolv systemd-resolved is a system service that provides network name resolution to local applications. 53
sshd sshd (SSH Daemon) is the daemon program for ssh. Together these programs replace rlogin and rsh, and provide secure encrypted communications between two untrusted hosts over an insecure network. 22
chronyd chronyd provides support to work out the gain or loss rate of the 'real-time clock', i.e. the clock that maintains the time when the computer is turned off. It can use this data when the system boots to set the system time from a corrected version of the real-time clock. 323, 123
node-exporter Exports the node metrics to Prometheus to be viewable on the Grafana dashboard in the Host details and summary dashboards. 9100

The following table lists the TCP and UDP services and the corresponding open ports for worker node.

Table 15. Worker Node - Open Ports
Pod Description Port
kubelet kubelet is the lowest level component in Kubernetes. It’s responsible for what’s running on an individual machine. You can think of it as a process watcher like supervisord but focused on running containers. It has one job: given a set of containers to run, make sure they are all running. 10248, 10250
kube-proxy kube-proxy is a network proxy that runs on each node in your cluster, implementing part of the Kubernetes Service concept. kube-proxy maintains network rules on nodes. These network rules allow network communication to your Pods from network sessions inside or outside of your cluster. 10249, 10256
calico-node A node resource representing a node running Calico. When adding a host to a Calico cluster, a node resource needs to be created which contains the configuration for the calico/node instance running on the host. 9099
bird BIRD is an open source BGP client that is used to exchange routing information between hosts. The routes that Felix programs into the kernel for endpoints are picked up by BIRD and distributed to BGP peers on the network, which provides inter-host routing. If configured, there will be two BIRD processes running in the calico/node container. One for IPv4 (bird) and one for IPv6 (bird6). 3179
systemd-resolv systemd-resolved is a system service that provides network name resolution to local applications. 53
sshd sshd (SSH Daemon) is the daemon program for ssh. Together these programs replace rlogin and rsh, and provide secure encrypted communications between two untrusted hosts over an insecure network. 22
node_exporter Node Exporter is a Prometheus exporter for hardware and OS metrics with pluggable metric collectors. It allows you to measure various machine resources such as memory, disk and CPU utilization. 9100
chronyd chronyd provides support to work out the gain or loss rate of the 'real-time clock', i.e. the clock that maintains the time when the computer is turned off. It can use this data when the system boots to set the system time from a corrected version of the real-time clock. 323, 123
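
To check which of these ports are actually listening on a given node, you can run a socket query on that node. The following is a minimal sketch; the filtered port numbers are illustrative:

sudo ss -tulpn | grep -E ':6443|:10250|:2379'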

Configurable Option to Control Ping Properties

SMI monitors applications such as the UPF that are deployed in KVM-based clusters. Monitoring, in part, is performed through pings sent at regular intervals to verify that the applications are alive. If an application is unresponsive for a certain number of times, then the monitoring service attempts to restart it.

The ping interval and the number of failure occurrences are now configurable using the following commands:

node-defaults kvm monitoring ping-interval <#_seconds> 
node-defaults kvm monitoring failure-occurrence <#_instances> 

<#_seconds> is any integer value. The minimum value is 3 and the default value is 10 seconds.

<#_instances> is any integer value. The minimum value is 3 and the default value is 10 times.

Upon changing these values and running a sync, the monitoring service is restarted and the new configuration is applied.
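
The following is a minimal configuration sketch, assuming the commands are entered under a cluster in the deployer CLI; the cluster name and values are illustrative, and both values must be 3 or higher:

config
  clusters sample-cluster
    node-defaults kvm monitoring ping-interval 5
    node-defaults kvm monitoring failure-occurrence 5
    commit
    end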

IFTASK Forwarder type

In past releases, SMI provided support for the Vector Packet Processing (VPP) data plane development kit (DPDK) forwarder for use with StarOS-based applications such as the 5G User Plane Function.

SMI now supports the use of the DPDK Internal Forwarder (IFTASK) for use with StarOS-based applications.

IFTASK support is enabled by the forwarder-type parameter as part of the day 0 configuration:

nodes control-plane

[ no ] vm-defaults upf day0 forwarder-type { IFTASK | VPP } 

If the forwarder-type command is not issued, VPP is used as the forwarder type by default.

Once IFTASK has been set, the no variant of this command can be used to re-enable the VPP forwarder type.

When the forwarder type is set to IFTASK, the following additional parameters are used as part of the UPF day 0 configuration:

  • IFTASK_SERVICE_TYPE=0

  • IFTASK_CORES=44

  • IFTASK_MCDMA_CORES=50

These parameters are hard-coded and set automatically by SMI during the deployment process.
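
The following is a minimal sketch of selecting the IFTASK forwarder on a control plane node definition; the cluster and node names are illustrative:

config
  clusters sample-cluster
    nodes control-plane1
      vm-defaults upf day0 forwarder-type IFTASK
      exit
    commit
    end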

Customer Data Recovery and Backup

Feature Description

Since the serverless Amazon Aurora database (DB) service is not available across regions, it is necessary to have a disaster recovery process for customer data when the DB in an active region fails. The disaster recovery procedure mainly involves recreating the same DB snapshot in a different backup region and making all the other clusters connect to the restored DB.

Data Recovery and Backup Procedure

This section describes the customer data recovery and backup procedure.

Prerequisite

You must enable continuous automatic backup of ConfigDB Aurora DB on an active cluster.

AWS Backup Overview

Amazon Web Services (AWS) Backup is a fully managed service that makes it easy to centralize and automate data protection across AWS services, in the cloud, and on premises. Using this service, you can configure backup policies and monitor activity for your AWS resources in one place. It automates and consolidates backup tasks that were previously performed service by service, and removes the need to create custom scripts and manual processes.


Note


To use the AWS Backup service, you must opt in to have the AWS Backup service back up the assigned resources.


Supported Resources

The supported resources include Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Block Store (Amazon EBS) volumes, Amazon Relational Database Service (Amazon RDS) databases (including Amazon Aurora clusters), Amazon DynamoDB tables, Amazon Neptune databases, Amazon DocumentDB (with MongoDB compatibility) databases, Amazon Elastic File System (Amazon EFS) file systems, Amazon FSx for Lustre file systems, Amazon FSx for Windows File Server file systems, and AWS Storage Gateway volumes.

Backup Rules

Define a backup rule to specify the backup schedule, backup window, destination regions, and lifecycle rules.

  • Frequency - hourly, daily, weekly, monthly

  • Backup window - default or custom

  • Retention period - days, weeks, months, years

  • Destination regions - List of available regions


Note


The recommended configuration is a backup every 1 hour, a retention period of 7 days, and at least 3 destination regions.


Resource Assignment

After you create the rule, assign the desired services to perform the backup. In this case, the Aurora DB is assigned to the rule.

DB Recovery Procedure

This section describes the steps required to recover data.

  1. Create an AWS SMI substrate in the specified region provided to you with the AWS account, for example, us-west-2, so that you can deploy the SMI cluster.

  2. Deploy the SMI cluster and initiate all the required SMI cluster components except the ConfigDB database (that is, ConfigFE, ConfigBE, and the monitoring clusters) in the specified region provided with your AWS account.

  3. After a resource (ConfigDB) is backed up at least once in the active region, for example, us-east-2, it is considered protected and is available to be restored using the AWS Backup dashboard under protected resources in each region. Each resource has specific steps to restore from the backup. Initiate the ConfigDB in the backup region, for example, us-west-2.

  4. Configure and initiate the active state on the backup cluster. After the Aurora DB is restored, configure the ConfigDB to point to the newly restored resource ARN and secret ARN so that the ConfigDB can access the DB.

    After the DB is connected, ensure that all the other clusters connect to ConfigDB and are fully active.

  5. Configure the vault KMS and storage using information about the newly restored DB.

  6. Verify the vault encryption and decryption. For example, the encrypted fields on the us-east-2 are decrypted on the us-west-2 region.

CIMC Certificate Renewal

The Cisco® Integrated Management Controller (IMC) is a baseboard management controller that provides embedded server management for Cisco UCS® C-Series Rack Servers and Cisco UCS S-Series Storage Servers. The Cisco IMC enables system management in the data center and across distributed locations.

The CIMC certificates are valid only for 3 years. If the certificate expires in less than 90 days, it must be renewed.

To renew the CIMC certificate, use the following configuration:

config 
   clusters cluster_name 
      node-defaults ucs-server cimc certificate rehydrate { true | false } 
      exit 

NOTES:

  • When the certificate is renewed, the CIMC drops connections for 15 to 60 seconds while the host key is updated.

  • The default setting is false. When set to true, the certificate is renewed if it expires in less than 90 days.

  • Every cluster synchronization log displays the expiry date of the certificate.
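
The following is a minimal sketch of enabling the renewal and applying it with a cluster sync; the cluster name is illustrative:

config
  clusters sample-cluster
    node-defaults ucs-server cimc certificate rehydrate true
    exit
  commit
  end
clusters sample-cluster actions sync run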

XFS File System

Table 16. Feature History

Feature Name

Release Information

Feature Description

Configurable XFS File System

2023.04

SMI supports the XFS file system for the /data partition used by MongoDB in new deployments.

SMI utilizes the XFS file system to install the /data partition used by MongoDB.

By default, all partitions are formatted using ext4.


Note


XFS works only with new deployments.


How it Works

The following steps describe how to enable data partition using XFS:

  1. When a partition is defined under os, run cluster sync to create a file named smi-fs that contains the customized file system type information.

    smi-fs is a valid shell file containing variable assignments (all instances of - are converted to _).

    The file is created under:

    • /scripts in cloud-init iso file for VMware

    • /smi/cloud-init/ in smi-install-iso file for bare metal

    For example:

    # cat /smi/cloud-init/smi-fs
    smi_fs_smi_state=FS_TYPE=xfs

    The above variable assignment indicates that the smi-state partition must be formatted using XFS as the file system type.

    The smi_fs_ prefix is added to the variable name to avoid any name conflict.

  2. During a fresh installation, the initialize script (shared-iso-files/templates/initialize) in the bootable ISO creates the file system using the information from smi-fs.

  3. After reboot, another initialize script (shared-iso-files/templates/base-image-initialize) which is installed under /etc/initramfs-tools/smi-init/initialize updates /etc/fstab with the correct entries.


    Note


    If a separate disk is used for /data, the initialization of data partition is done in base-image-initialize.


Configuring XFS

To format OS partition with the XFS option, use the following sample configuration:

Cluster Level:

config 
   clusters cluster_name 
      node-defaults os partition smi-data 
         fs-type { ext4 | xfs } 
         exit 

Node Level:

config 
   clusters cluster_name 
      nodes node_name 
         os partition smi-data 
         fs-type { ext4 | xfs } 
         exit 

NOTES:

  • smi-data —Use when there is a separate disk for the /data directory (VMware, OpenStack, and so on).

  • fs-type { ext4 | xfs } —Specify the ext4 or XFS file system.
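
The following is a minimal sketch for a new deployment that formats the /data partition with XFS at the cluster level; the cluster name is illustrative:

config
  clusters sample-cluster
    node-defaults os partition smi-data
      fs-type xfs
      exit
  commit
  end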

Verifying the Configuration

To verify the configuration, use the following commands:

  • smi-state is formatted with xfs when the following configuration is used:

    os partition smi-state
       fs-type xfs
       exit
  • fstab :

    $ cat /etc/fstab | grep xfs
    LABEL=smi-state /mnt/stateful_partition xfs defaults 0 0
  • blkid :

    $ blkid | grep xfs
    /dev/sda5: LABEL="smi-state" UUID="9e717cb9-c1fb-4190-9283-aa20afe24d3a" TYPE="xfs" 
    PARTLABEL="smi-state" PARTUUID="7251ecd2-9623-4a37-88f8-b969b836634d"

Cluster Access for OS Users

SMI supports access to the SMI cluster for OS users upon login, using the configurable addons secure-access { enabled | disabled } CLI command in the Cluster Configuration mode. By default, this command is disabled to reduce resource usage in the K8s cluster.

A Helm chart deploys DaemonSets onto the master nodes. Only the access controller pod on the active master runs and manages user access.

To enable or disable the access of OS users to the SMI cluster, use the following configuration:

config 
   clusters cluster_name 
      addons secure-access { enabled | disabled } 
      end 
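
The following is a minimal sketch that enables the add-on and applies it with a cluster sync; the cluster name is illustrative:

config
  clusters sample-cluster
    addons secure-access enabled
    commit
    end
clusters sample-cluster actions sync run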

CPU Frequency Scaling with Governor Settings

Table 17. Feature History

Feature Name

Release Information

Feature Description

CPU Frequency Scaling

2025.01.0

SMI provides configuration options to improve CPU performance by adjusting the CPU scaling governor settings. This feature allows the operator to switch between "powersave" and "performance" modes, with "powersave" as the default setting.

The operator changes the governor setting to "performance" to increase system performance during demanding workloads.

Commands Introduced:

  • clusters cluster_name nodes node_name os tuned base-profile { balanced | latency-performance }

  • clusters cluster_name node-defaults os tuned base-profile { balanced | latency-performance }

  • clusters cluster_name node-type-defaults node_type os tuned base-profile { balanced | latency-performance }

SMI offers configuration options to enhance CPU performance by adjusting the CPU scaling governor settings. Operators can switch between "powersave" (default) and "performance" modes, opting for "performance" during demanding workloads to increase system performance.

How the CPU Governor scaling works

The feature leverages the tuned utility, which is already integrated into the SMI base image, to manage CPU scaling governor settings along with other system parameters. tuned provides predefined profiles that are optimized for various workloads and can merge multiple profiles to customize the system configuration further.

Key Functionalities:

  • Profile Configuration: The feature allows configuration of a base-profile from tuned to manage CPU scaling and other system parameters.

  • Profile Precedence: When both a base-profile and a custom cndpprofile are applied, the custom profile takes precedence to ensure specific system tuning needs are met.

  • Environment Adaptation: By default, tuned applies profiles based on the environment; for example, "balanced" for bare-metal and so on.

Configure the CPU Governor Scaling

Use the following procedure from cluster deployer to configure scaling governor and enhance CPU performance.

Procedure


Step 1

Enter the Global Configuration mode.

Step 2

Enable the installation of tuned using the following command:

Example:

clusters cluster-name node-defaults os tuned enabled 

Step 3

Set the tuned base-profile to latency-performance on all the nodes using the following command:

Example:

clusters cluster-name node-defaults os tuned base-profile latency-performance 

By default, in the bare-metal setup, the profile is set to balanced, which sets the cpu_scaling_governor to powersave.

Step 4

Save and commit the configuration.

Step 5

Run the cluster synchronization using the following command:

Example:

clusters cluster_name actions sync run debug true 

Example:

SMI Cluster Deployer# clusters kali-stacked actions sync run debug true
This will run sync. Are you sure? [no,yes] yes
message accepted

This command runs the synchronization and applies the tuned latency-performance profile to all the nodes.
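
To confirm the result on a node after the sync completes, you can check the active tuned profile and the resulting scaling governor directly on that node. The following is a minimal sketch; the output depends on the hardware and the applied profile:

tuned-adm active
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor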