Solutions - Deployment Guide: Cisco AI PODs with Canonical Kubernetes and Canonical MLOps Stack White Paper

Maximize the performance of Cisco AI PODs with Canonical Kubernetes and Canonical MLOps stack

Accelerating AI/ML deployment with Canonical Kubernetes

This guide will show AI/ML solution architects, data scientists, and infrastructure engineers how to deploy Cisco AI PODs with Canonical Kubernetes and Canonical MLOps stack on Cisco AI PODs. Together, these solutions form a production-ready and scalable platform for ML workloads, delivering the cost and scale benefits of open source on a trusted enterprise platform:

● Pre-optimized resource use on Cisco AI PODs

● Streamlined Canonical MLOps lifecycle management for a quick transition from prototype to production

● Five years of support for LTS versions of Canonical Kubernetes and the Canonical MLOps stack, with the option of lengthening support to 15 years

● Security built into every layer of the stack through Cisco Secure AI Factory with NVIDIA, from infrastructure-level protection with Cisco Hypershield^™ and BlueField DPUs to model-layer defense with Cisco AI Defense

● Cloud-managed operations through Cisco Intersight^® and Nexus^® Dashboard, enabling policy-driven GPU orchestration and zero-touch provisioning of AI infrastructure at scale

Solution overview

Related image, diagram or screenshot

A powerful platform for MLOps, delivered by Canonical and Cisco

Canonical Kubernetes and Canonical MLOps integrate with Cisco AI PODs to deliver a turnkey environment for enterprise AI. The platform replaces fragmented tools with a unified platform that handles everything from GPU orchestration to model lifecycle management, significantly reducing the time it takes to scale AI/ML initiatives from pilot to production.

Cisco AI PODs: purpose-built infrastructure for enterprise AI

Cisco AI PODs are a pre-validated, modular AI infrastructure solution that combines Cisco UCS^® servers with NVIDIA GPUs, Cisco Nexus high-performance Ethernet networking, and integrated storage into a single, rack-scale deployment unit. As a core building block of the Cisco Secure AI Factory with NVIDIA, each Cisco AI POD is engineered to compress deployment timelines from months to weeks, giving your teams a production-ready foundation for training, fine-tuning, and inference workloads without the burden of assembling and validating each component independently.

At the compute layer, Cisco UCS C880A M8 Rack Servers deliver up to 8 B300 Ultra Tensor Core GPUs interconnected through NVLink, providing the GPU-to-GPU bandwidth that large-scale model training demands. Dual Intel^® Xeon^® 6^th Gen CPUs supply the host-side compute and memory capacity needed for data preprocessing, checkpointing, and serving workloads in parallel.

The networking fabric uses Cisco N9000 Series switches running at 400GbE/800GbE, built on Cisco^® Silicon One^® and NVIDIA Spectrum X. Cisco is the only networking vendor whose silicon is included in the NVIDIA Spectrum-X Ethernet platform through a shared-licensing partnership, combining Cisco’s congestion-aware per-packet load balancing with NVIDIA’s adaptive routing and out-of-order packet handling. This delivers the low-latency, lossless east/west traffic performance that distributed GPU training requires.

Security is built into Cisco AI PODs from the ground up rather than layered on after deployment. Cisco Hypershield provides AI-native, kernel-level enforcement through eBPFs, while NVIDIA BlueField-3 DPUs run the Cisco Hybrid Mesh Firewall at 400G line-rate stateful inspection without consuming CPU or GPU cycles. Cisco AI Defense adds model-layer protection: discovery of shadow AI assets, validation of model behavior, and runtime guardrails for prompt injection and data exfiltration. This security-first approach addresses the governance and compliance requirements that enterprise AI deployments face from day one.

Operationally, Cisco Intersight provides cloud-managed lifecycle management with policy-driven GPU orchestration, enabling real-time allocation of GPU resources to workloads and eliminating the manual provisioning overhead that slows down scaling. Cisco Nexus Dashboard delivers unified automation and observability across the network fabric, with topology-aware visualization and congestion analytics purpose-built for AI traffic patterns.

Canonical Kubernetes for GPU optimization:

Getting the most out of your hardware often entails a high degree of manual tweaking. Through Canonical’s partnership with Cisco, Canonical Kubernetes is optimized to deliver the full potential of the NVIDIA GPUs included in Cisco AI PODs, right out of the box. This reduces the operational overhead of GPU configuration, allowing teams to partition and scale hardware resources dynamically while maintaining strict workload isolation.

Streamlining the AI lifecycle with Canonical’s MLOps stack solutions:

Canonical’s MLOps stack builds on this robust foundation by providing the essential tools and workflows for defining, managing, and automating the entire AI workload lifecycle. Canonical’s Charmed Kubeflow, Charmed MLFlow, and Charmed Feast run seamlessly together to help ensure a smooth journey from data preparation and model training to deployment and monitoring.

Canonical’s MLOps stack simplifies complex operations by:

● Integrated tooling: seamlessly combines best practice open-source MLOps tools into a cohesive platform

● Workflow automation: Modular charmed operators encapsulate complex operational logic. This creates a repeatable framework for model building and deployment, minimizing manual overhead and accelerating time to market.

● Unified management: provides an integrated platform for managing ML experiments, artifacts, and production deployments

Enhanced networking with Cilium CNI — powered by Cisco Isovalent:

For high-performance, secure, and observable networking, this solution uses Cilium as the Container Network Interface (CNI), powered by Cisco Isovalent^® Cilium, based on eBPF technology, provides:

● Superior performance: offers high-throughput and low-latency networking crucial for distributed ML workloads that rely on fast communication between nodes and GPUs

● Advanced security: implements fine-grained network policies based on application identity, securing inter-service communication within your AI workloads

● Deep observability: provides rich network telemetry and tracing, simplifying troubleshooting and performance tuning for complex Kubernetes environments

This combination of Cisco’s purpose-built AI infrastructure, Canonical’s enterprise Kubernetes and MLOps expertise, and Cilium’s networking delivers a production-grade platform for enterprises looking to move AI projects from prototype to production at scale.

Cisco acquired Isovalent in 2024, bringing the Cilium and Tetragon open-source projects under Cisco’s engineering umbrella. Cilium is already the default CNI for Canonical Kubernetes, Google Kubernetes Engine, Amazon EKS, and Microsoft Azure AKS. With the Isovalent acquisition, Cisco now offers direct engineering support and enterprise-grade SLAs for Cilium deployments, tightly integrated with the broader Cisco Secure AI Factory networking and security stack. For AI workloads running on Cisco AI PODs, this means that the same vendor providing the physical network fabric, the GPU compute, and the infrastructure security also owns and maintains the Kubernetes networking layer, eliminating the multi-vendor finger-pointing that typically plagues troubleshooting in production AI clusters.

Deployment guide

1. Prerequisites and setup

1.1 Hardware and networking configuration

In this demo, For this demo, SSH access was given to a pre-provisioned Ubuntu 24.04 machine on CiscoAI PODs.

The Cisco AI POD used in this deployment consists of a Cisco UCS C885A M8 Rack Server equipped with 8 NVIDIA HGX H200 Tensor Core GPUs and dual AMD EPYC processors (96 cores each). The backend network fabric uses Cisco Nexus 9364E-GX2A switches running at 400 GbE, configured for RoCE (RDMA over converged Ethernet) to support the low-latency GPU-to-GPU communication required by distributed training workloads. Ubuntu 24.04 LTS was pre-installed on the server through Cisco Intersight’s OS provisioning workflow, which automates bare-metal deployment with policy-driven configuration profiles. Network configuration, including VLAN assignments, MTU settings for jumbo frames (9216 bytes), and LLDP discovery, was applied through Nexus Dashboard templates. The storage fabric was configured for NVMe over Fabrics (NVMeoF) to provide high-throughput data access to the training pipeline. All firmware versions for UCS BIOS, GPU vBIOS, and NIC firmware were validated against the Cisco Secure AI Factory with NVIDIA reference architecture compatibility matrix prior to deployment.

1.2 NVIDIA driver and CUDA installation

Install the recommended stable proprietary driver and the NVIDIA CUDA toolkit using the standard Ubuntu package manager.

sudo apt update

sudo ubuntu-drivers install --gpgpu nvidia:535-server

sudo apt install nvidia-cuda-toolkit

Next, perform a reboot to ensure that the new kernel module is loaded.

The installation can be verified by checking the driver version and running the “nvcc” compiler version check.

sudo nvidia-smi

nvcc --version

Related image, diagram or screenshot

1.3 inotify setup

Canonical Kubernetes uses inotify to interact with the file system. This may lead to situations where large Canonical Kubernetes deployments (such as Charmed Kubeflow and Charmed MLflow exceed the default inotify limits. Therefore, you can increase the limits, and retain the increased limits across the infrastructure, using the following commands:

sudo sysctl fs.inotify.max_user_instances=1280

sudo sysctl fs.inotify.max_user_watches=655360

echo "fs.inotify.max_user_instances=1280" >> sudo tee -a /etc/sysctl.conf

echo "fs.inotify.max_user_watches=655360" >> sudo tee -a /etc/sysctl.conf

2. Canonical Kubernetes deployment

The deployment uses Canonical's official Kubernetes snap, customized for the Cisco AI PODs' hardware capabilities.

2.1 Core cluster deployment

You can perform the Canonical Kubernetes deployment and bootstrap using the following snap command:

sudo snap install k8s --classic

sudo k8s bootstrap

2.2 Post-deployment verification

Upon completion of the bootstrap, you should check the status of the cluster:

sudo k8s status --wait-ready

Related image, diagram or screenshot

sudo k8s kubectl get nodes

Related image, diagram or screenshot

3. GPU and networking testing

3.1 NVIDIA GPU integration

Deploy the NVIDIA GPU Operator using its official Helm chart. This will automate the management and lifecycle of necessary components (such as drivers and device plug-ins) for GPU support in Kubernetes.

1. Addition of the NVIDIA GPU Operator Helm Repository: Add the official NVIDIA GPU Operator Helm Repository to the local Helm configuration:

sudo k8s helm repo add nvidia https://nvidia.github.io/gpu-operator

sudo k8s helm repo update

2. NIVIDI GPU Operator installation: Install the NVIDIA GPU Operator into the gpu-operator namespace:

sudo k8s helm install --generate-name --wait -n gpu-operator --create-namespace nvidia/gpu-operator --version=v24.9.2 --set mig.strategy=mixed

3. Verification: Monitor the cluster resources in order to confirm:

● Successful deployment of the driver and device plug-in pods

● Recognition of Kubernetes by the available NVIDIA GPUs

sudo k8s kubectl get pods -n gpu-operator

sudo k8s kubectl get nodes -o json | grep nvidia.com/gpu

Related image, diagram or screenshot

3.1.1 Basic CUDA verification

Perform the basic CUDA verification step by configuring a test pod to execute a simple CUDA operation.

As an output, you’ll see that the test pod was deployed using the nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2 image, requesting two GPU resources, which successfully ran and confirmed GPU functionality from within the containers.

apiVersion: v1

kind: Pod

metadata:

spec:

restartPolicy: OnFailure

containers:

- name: cuda-vector-add

image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2"

resources:

limits:

nvidia.com/gpu: 2

Related image, diagram or screenshot

3.1.2 Tensorflow benchmark

Perform the tensorflow benchmarking step by configuring a test pod to execute the tensorflow benchmarks.

As an output, you’ll see that the test pod was deployed using the nvcr.io/nvidia/tensorflow:24.06-tf2-py3 image, requesting two GPU resources, which successfully ran and confirmed GPU functionality with tensorflow from within containers.

apiVersion: v1

kind: Pod

metadata:

spec:

restartPolicy: Never

containers:

- name: tf-benchmarks

image: "nvcr.io/nvidia/tensorflow:24.06-tf2-py3" # Always check for latest OCI release tag: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow/tags

command: ["/bin/sh", "-c"]

args: ["cd /workspace && git clone https://github.com/tensorflow/benchmarks/ && cd /workspace/benchmarks/scripts/tf_cnn_benchmarks && python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --use_fp16"]

resources:

limits:

nvidia.com/gpu: 2

ubuntu@cpn-ful-ai-c245-gpuwn4:~$ cat cuda_test.yaml

apiVersion: v1

kind: Pod

metadata:

spec:

restartPolicy: OnFailure

containers:

- name: cuda-vector-add

image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2"

resources:

limits:

nvidia.com/gpu: 2

sudo k8s kubectl logs tf-gpu-bench-full

Related image, diagram or screenshot

3.3. Networking configuration

Cilium is the default Container Network Interface (CNI) for the Canonical Kubernetes distribution, providing integrated networking and network policy enforcement. Therefore, no additional installation or configuration steps are required.

The MetalLB plug-in should be enabled and configured to allow access to Kubeflow:

sudo k8s enable load-balancer

sudo k8s set load-balancer.cidrs=10.2.0.1-10.2.0.254

3.3.1 Cilium CNI testing

Perform the Cilium testing using the official Cilium single-node connectivity check.

This requires making adjustments to the yaml file, as coredns is used in Canonical Kubernetes instead of kube-dns.

wget https://raw.githubusercontent.com/cilium/cilium/refs/tags/v1.17.1/examples/kubernetes/connectivity-check/connectivity-check-single-node.yaml.

sed -i 's/kube-dns/coredns/g' connectivity-check-single-node.yaml

sudo k8s kubectl create namespace cilium-test

sudo k8s kubectl apply -f connectivity-check-single-node.yaml

The previous command deploys the test pods and the necessary Cilium network policies:

Related image, diagram or screenshot

The test is successful if all created pods are in running state:

sudo k8s kubectl get pods -n cilium-test

Related image, diagram or screenshot

4. Canonical’s MLOps stack

4.1 Introduction to Canonical’s MLOps stack

Canonical’s MLOps stack provides a robust, production-ready platform for machine learning and deep learning workloads on Kubernetes. The suite is based on two key components, both deployed and managed using Juju:

● Charmed Kubeflow: Canonical's enterprise-ready MLOps platform

● Charmed MLflow: a lightweight platform for managing the ML lifecycle, including experimentation, reproducibility, and deployment

Deploying these charmed operators through Juju brings you the power of Kubernetes while simplifying the complex operational tasks associated with running a full-featured MLOps environment.

4.2 Installation

In this guide, the installation of Canonical’s MLOps stack is performed using the Juju operator lifecycle manager. This approach automates the deployment, scaling, integration, and management of the operators.

4.2.1 Juju setup

Install Juju configured Juju connect to the existing Canonical Kubernetes cluster:

sudo snap install juju

sudo k8s config | juju add-k8s ai-pod --client

juju bootstrap ai-pod

juju add-model kubeflow

4.2.2 Charmed Kubeflow deployment

Deploy the Charmed Kubeflow bundle from the 1.10/stable channel:

juju deploy kubeflow --trust --channel=1.10/stable

Wait until the deployed Juju applications are in an active state:

juju status --watch 5s

Related image, diagram or screenshot

4.2.3 Charmed MLflow deployment

Deploy Charmed MLflow separately and integrate it with Kubeflow:

juju deploy mlflow --channel=2.22/stable –trust

The integration with Kubeflow is done using the Resource dispatcher operator, which enables all Kubeflow users to access the MLflow model registry from their namespaces:

juju deploy resource-dispatcher --channel 2.0/stable --trust

#Integrate mlflow to the resource dispatcher

juju integrate mlflow-server:secrets resource-dispatcher:secrets

juju integrate mlflow-server:pod-defaults resource-dispatcher:pod-defaults

juju integrate mlflow-minio:object-storage kserve-controller:object-storage

juju integrate kserve-controller:service-accounts resource-dispatcher:service-accounts

juju integrate kserve-controller:secrets resource-dispatcher:secrets

#Integrate mlflow to the kubeflow dashboard

juju integrate mlflow-server:ingress istio-pilot:ingress

juju integrate mlflow-server:dashboard-links kubeflow-dashboard:links

Wait until the deployed Juju applications are in an active state (notice how now also mlflow-* applications are deployed):

juju status --watch 5s

Related image, diagram or screenshot

4.2.4 Accessing the Kubeflow dashboard

Expose the Kubeflow dashboard using the existing MetalLB load balancer configuration. You can determine the access IP by checking the external IP of the istio ingress gateway service.

Once that is complete, set up Dex authentication to allow access to a first user:

juju config dex-auth static-username=admin

juju config dex-auth static-password=admin

sudo k8s kubectl get svc -n kubeflow | grep istio-ingressgateway

You can verify that access to the dashboard has been successful through the assigned load-balancer IP.

Related image, diagram or screenshot

4.2.5 Jupyter notebook creation

From the Notebooks tab, select New Notebook to create a new notebook.

For this showcase, we're creating a Jupyter notebook using the kubeflownotebookswg/jupyter-pytorch-cuda-full:v1.9.2 image, for compatibility with the available Cuda version on the machine (12.2).

Related image, diagram or screenshot

As our notebook needs to be integrated with MLFlow, from Advanced Options, go to Configurations and allow access to Kubeflow Pipelines, MinIO, and MLflow from the dropdown menu:

Related image, diagram or screenshot

This will properly setup environment variables in the created jupyter notebook.

After clicking on the Launch button and waiting for the notebook status to become green, clicking on Connect will redirect to the notebook.

Related image, diagram or screenshot

4.3 Test showcase: MLOps pipeline setup

4.3.1 Kubeflow + MLflow model training

In the Jupyter notebook, install the prerequisites:

pip install mlflow==2.15.1 kserve==0.13.1 tenacity

Import the necessary packages:

import kfp

import mlflow

import os

import requests

from kfp.dsl import Input, Model, component

from kfp.dsl import InputPath, OutputPath, pipeline, component

from kserve import KServeClient

from mlflow.tracking import MlflowClient

from tenacity import retry, stop_after_attempt, wait_exponential

Create a pipeline component that downloads and saves the model as .csv:

@component(

base_image="python:3.11",

packages_to_install=["requests==2.32.3", "pandas==2.2.2"]

)

def download_dataset(url: str, dataset_path: OutputPath('Dataset')) -> None:

import requests

import pandas as pd

response = requests.get(url)

response.raise_for_status()

from io import StringIO

dataset = pd.read_csv(StringIO(response.text), header=0, sep=";")

dataset.to_csv(dataset_path, index=False)

Create a pipeline component that processes the dataset:

@component(

base_image="python:3.11",

packages_to_install=["requests==2.32.3", "pandas==2.2.2"]

)

def download_dataset(url: str, dataset_path: OutputPath('Dataset')) -> None:

import requests

import pandas as pd

response = requests.get(url)

response.raise_for_status()

from io import StringIO

dataset = pd.read_csv(StringIO(response.text), header=0, sep=";")

dataset.to_csv(dataset_path, index=False)

Create a pipeline component that trains the model. This starts an MLFlow training session, which will result in the model being trained and stored on MinIO, ready to be deployed:

@component(

base_image="python:3.11",

packages_to_install=["pandas==2.2.2", "scikit-learn==1.5.1", "mlflow==2.15.1", "pyarrow==15.0.2", "boto3==1.34.162"]

)

def train_model(dataset: InputPath('Dataset'), run_name: str, model_name: str) -> str:

import os

import mlflow

import pandas as pd

from sklearn.linear_model import ElasticNet

from sklearn.model_selection import train_test_split

df = pd.read_parquet(dataset)

target_column = "quality"

train_x, test_x, train_y, test_y = train_test_split(

df.drop(columns=[target_column]),

df[target_column], test_size=0.25,

random_state=42, stratify=df[target_column]

)

mlflow.sklearn.autolog()

with mlflow.start_run(run_name=run_name) as run:

mlflow.set_tag("author", "kf-testing")

lr = ElasticNet(alpha=0.5, l1_ratio=0.5, random_state=42)

lr.fit(train_x, train_y)

mlflow.sklearn.log_model(lr, "model", registered_model_name=model_name)

model_uri = f"{run.info.artifact_uri}/model"

print(model_uri)

return model_uri

Now create a pipeline component that deploys the trained model and serves it using Kserve, reserving a GPU for the served model:

@component(

base_image="python:3.11",

packages_to_install=["kserve==0.13.1", "kubernetes==26.1.0", "tenacity==9.0.0"]

)

def deploy_model_with_kserve(model_uri: str, isvc_name: str) -> str:

from kubernetes.client import V1ObjectMeta, V1ResourceRequirements

from kserve import (

constants,

KServeClient,

V1beta1InferenceService,

V1beta1InferenceServiceSpec,

V1beta1PredictorSpec,

V1beta1SKLearnSpec,

)

from tenacity import retry, wait_exponential, stop_after_attempt

isvc = V1beta1InferenceService(

api_version=constants.KSERVE_V1BETA1,

kind=constants.KSERVE_KIND,

metadata=V1ObjectMeta(

name=isvc_name,

annotations={"sidecar.istio.io/inject": "false"},

spec=V1beta1InferenceServiceSpec(

predictor=V1beta1PredictorSpec(

service_account_name="kserve-controller-s3",

sklearn=V1beta1SKLearnSpec(

storage_uri=model_uri

resources=V1ResourceRequirements(

limits={"nvidia.com/gpu": "1"}

)

client = KServeClient()

client.create(isvc)

@retry(

wait=wait_exponential(multiplier=2, min=1, max=10),

stop=stop_after_attempt(30),

reraise=True,

)

def assert_isvc_created(client, isvc_name):

assert client.is_isvc_ready(isvc_name), f"Failed to create Inference Service {isvc_name}."

assert_isvc_created(client, isvc_name)

isvc_resp = client.get(isvc_name)

isvc_url = isvc_resp['status']['address']['url']

print("Inference URL:", isvc_url)

return isvc_url

The pipeline can be configured to enable GPU allocation requests for a specific task:

def add_gpu_request(task: dsl.PipelineTask) -> dsl.PipelineTask:

"""Add a request field for a GPU to the container created by the PipelineTask object."""

return task.add_node_selector_constraint(accelerator="nvidia.com/gpu").set_accelerator_limit(

limit=1

)

Now, put all components together and define the pipeline:

ISVC_NAME = "wine-regressor5"

MLFLOW_RUN_NAME = "elastic_net_models"

MLFLOW_MODEL_NAME = "wine-elasticnet"

mlflow_tracking_uri = os.getenv('MLFLOW_TRACKING_URI')

mlflow_s3_endpoint_url = os.getenv('MLFLOW_S3_ENDPOINT_URL')

aws_access_key_id = os.getenv('AWS_ACCESS_KEY_ID')

aws_secret_access_key = os.getenv('AWS_SECRET_ACCESS_KEY')

@pipeline(name='download-preprocess-train-deploy-pipeline')

def download_preprocess_train_deploy_pipeline(url: str):

download_task = download_dataset(url=url)

preprocess_task = preprocess_dataset(

dataset=download_task.outputs['dataset_path']

)

train_task = train_model(

dataset=preprocess_task.outputs['output_file'], run_name=MLFLOW_RUN_NAME, model_name=MLFLOW_MODEL_NAME

).set_env_variable(name='MLFLOW_TRACKING_URI', value=mlflow_tracking_uri)\

.set_env_variable(name='MLFLOW_S3_ENDPOINT_URL', value=mlflow_s3_endpoint_url)\

.set_env_variable(name='AWS_ACCESS_KEY_ID', value=aws_access_key_id)\

.set_env_variable(name='AWS_SECRET_ACCESS_KEY', value=aws_secret_access_key)

train_task = add_gpu_request(train_task)

deploy_task = deploy_model_with_kserve(

model_uri=train_task.output, isvc_name=ISVC_NAME

).set_env_variable(name='AWS_SECRET_ACCESS_KEY', value=aws_secret_access_key)

Now, let's run it:

client = kfp.Client()

url = 'https://raw.githubusercontent.com/canonical/kubeflow-examples/main/e2e-wine-kfp-mlflow/winequality-red.csv'

kfp.compiler.Compiler().compile(download_preprocess_train_deploy_pipeline, 'download_preprocess_train_deploy_pipeline.yaml')

run = client.create_run_from_pipeline_func(download_preprocess_train_deploy_pipeline, arguments={'url': url}, enable_caching=False)

The pipeline run will now appear in the Run tab in the Kubeflow UI:

Related image, diagram or screenshot

Once the pipeline is completed, the output of the last step will be the URL of the deployed model using Kserve:

Related image, diagram or screenshot

Which we can query from the Jupyter notebook:

kserve_client = KServeClient()

isvc_resp = kserve_client.get(ISVC_NAME)

inference_service_url = isvc_resp['status']['address']['url']

print("Inference URL:", inference_service_url)

input_data = {

"instances": [

[7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4],

[7.8, 0.88, 0.0, 2.6, 0.098, 25.0, 67.0, 0.9968, 3.2, 0.68, 9.8]

]

}

response = requests.post(f"{inference_service_url}/v1/models/{ISVC_NAME}:predict", json=input_data)

print(response.text)

Related image, diagram or screenshot

Check that the GPU has correctly been assigned to the Kserve pod:

The pipeline compiled in the Jupyter notebook is shown in the UIs as a Kubeflow run and as an MLflow experiment, used for tracking parameters, metrics, artifacts, data, and environment configuration. Additionally, the ElasticNet regression model is also stored in the MLflow model registry, which enables model versioning, aliasing, tracking, and annotations.

To view the MLflow tracking UI, select MLflow from the Kubeflow central dashboard sidebar. Within Experiments, information about each experiment is available, including used datasets, hyperparameters, and model metrics:

Related image, diagram or screenshot

Clicking on the model, you can see information related to registered models, including descriptions, tags, and versions:

Related image, diagram or screenshot

Conclusion

When scaling your AI/ML workloads, it’s important you have a platform that empowers you to move quickly and efficiently.

Cisco AI PODs offer a robust, pre-integrated hardware layer including networking, built on a validated reference architecture. Canonical Kubernetes and Canonical’s MLOps stack make it easy to set up your system quickly and helps to ensure your GPUs are working at their full potential.

For more information

● Cisco Secure AI Factory with NVIDIA

● Cisco AI PODs Data Sheet

● Cisco UCS C885A M8 Rack Server Data Sheet

● Cisco Nexus Hyperfabric AI Infrastructure Data Sheet

● Cisco Hypershield Solution Overview

● Cisco AI Defense Data Sheet

● Cisco Intersight Platform

To find out more about what you can do with Canonical’s solutions, see the following further reading:

● Canonical Kubernetes documentation

● Charmed Kubeflow documentation

● Charmed MLflow documentation

Annex A: Cilium CNI testing yaml

---

metadata:

labels:

topology: any

component: network-check

traffic: internal

quarantine: "false"

type: autocheck

spec:

template:

metadata:

labels:

spec:

hostNetwork: false

containers:

- name: echo-a-container

env:

- name: PORT

value: "8080"

ports:

- containerPort: 8080

image: quay.io/cilium/json-mock:v1.3.2@sha256:bc6c46c74efadb135bc996c2467cece6989302371ef4e3f068361460abaf39be

imagePullPolicy: IfNotPresent

terminationMessagePolicy: FallbackToLogsOnError

readinessProbe:

timeoutSeconds: 7

exec:

command:

- curl

- -sS

- --fail

- --connect-timeout

- "5"

- -o

- /dev/null

- localhost:8080

livenessProbe:

timeoutSeconds: 7

exec:

command:

- curl

- -sS

- --fail

- --connect-timeout

- "5"

- -o

- /dev/null

- localhost:8080

selector:

matchLabels:

replicas: 1

apiVersion: apps/v1

kind: Deployment

---

metadata:

labels:

topology: any

component: services-check

traffic: internal

quarantine: "false"

type: autocheck

spec:

template:

metadata:

labels:

spec:

hostNetwork: false

containers:

- name: echo-b-container

env:

- name: PORT

value: "8080"

ports:

- containerPort: 8080

hostPort: 40000

image: quay.io/cilium/json-mock:v1.3.2@sha256:bc6c46c74efadb135bc996c2467cece6989302371ef4e3f068361460abaf39be

imagePullPolicy: IfNotPresent

terminationMessagePolicy: FallbackToLogsOnError

readinessProbe:

timeoutSeconds: 7

exec:

command:

- curl

- -sS

- --fail

- --connect-timeout

- "5"

- -o

- /dev/null

- localhost:8080

livenessProbe:

timeoutSeconds: 7

exec:

command:

- curl

- -sS

- --fail

- --connect-timeout

- "5"

- -o

- /dev/null

- localhost:8080

selector:

matchLabels:

replicas: 1

apiVersion: apps/v1

kind: Deployment

---

metadata:

labels:

topology: any

component: services-check

traffic: internal

quarantine: "false"

type: autocheck

spec:

template:

metadata:

labels:

spec:

hostNetwork: true

containers:

- name: echo-b-host-container

env:

- name: PORT

value: "21000"

ports: []

image: quay.io/cilium/json-mock:v1.3.2@sha256:bc6c46c74efadb135bc996c2467cece6989302371ef4e3f068361460abaf39be

imagePullPolicy: IfNotPresent

terminationMessagePolicy: FallbackToLogsOnError

readinessProbe:

timeoutSeconds: 7

exec:

command:

- curl

- -sS

- --fail

- --connect-timeout

- "5"

- -o

- /dev/null

- localhost:21000

livenessProbe:

timeoutSeconds: 7

exec:

command:

- curl

- -sS

- --fail

- --connect-timeout

- "5"

- -o

- /dev/null

- localhost:21000

affinity:

podAffinity:

requiredDuringSchedulingIgnoredDuringExecution:

- labelSelector:

matchExpressions:

- key: name

operator: In

values:

- echo-b

topologyKey: kubernetes.io/hostname

selector:

matchLabels:

replicas: 1

apiVersion: apps/v1

kind: Deployment

---

metadata:

labels:

topology: any

component: network-check

traffic: internal

quarantine: "false"

type: autocheck

spec:

template:

metadata:

labels:

spec:

hostNetwork: false

containers:

- name: pod-to-a-container

ports: []

image: quay.io/cilium/alpine-curl:v1.5.0@sha256:7b286939730d8af1149ef88dba15739d8330bb83d7d9853a23e5ab4043e2d33c

imagePullPolicy: IfNotPresent

command:

- /bin/ash

- -c

- sleep 1000000000

terminationMessagePolicy: FallbackToLogsOnError

readinessProbe:

timeoutSeconds: 7

exec:

command:

- curl

- -sS

- --fail

- --connect-timeout

- "5"

- -o

- /dev/null

- echo-a:8080/public

livenessProbe:

timeoutSeconds: 7

exec:

command:

- curl

- -sS

- --fail

- --connect-timeout

- "5"

- -o

- /dev/null

- echo-a:8080/public

selector:

matchLabels:

replicas: 1

apiVersion: apps/v1

kind: Deployment

---

metadata:

labels:

topology: any

component: network-check

traffic: external

quarantine: "false"

type: autocheck

spec:

template:

metadata:

labels:

spec:

hostNetwork: false

containers:

- name: pod-to-external-1111-container

ports: []

image: quay.io/cilium/alpine-curl:v1.5.0@sha256:7b286939730d8af1149ef88dba15739d8330bb83d7d9853a23e5ab4043e2d33c

imagePullPolicy: IfNotPresent

command:

- /bin/ash

- -c

- sleep 1000000000

terminationMessagePolicy: FallbackToLogsOnError

readinessProbe:

timeoutSeconds: 7

exec:

command:

- curl

- -sS

- --fail

- --connect-timeout

- "5"

- -o

- /dev/null

- https://1.1.1.1

livenessProbe:

timeoutSeconds: 7

exec:

command:

- curl

- -sS

- --fail

- --connect-timeout

- "5"

- -o

- /dev/null

- https://1.1.1.1

selector:

matchLabels:

replicas: 1

apiVersion: apps/v1

kind: Deployment

---

metadata:

labels:

topology: any

component: policy-check

traffic: internal

quarantine: "false"

type: autocheck

spec:

template:

metadata:

labels:

spec:

hostNetwork: false

containers:

- name: pod-to-a-denied-cnp-container

ports: []

image: quay.io/cilium/alpine-curl:v1.5.0@sha256:7b286939730d8af1149ef88dba15739d8330bb83d7d9853a23e5ab4043e2d33c

imagePullPolicy: IfNotPresent

command:

- /bin/ash

- -c

- sleep 1000000000

terminationMessagePolicy: FallbackToLogsOnError

readinessProbe:

timeoutSeconds: 7

exec:

command:

- ash

- -c

- '! curl -s --fail --connect-timeout 5 -o /dev/null echo-a:8080/private'

livenessProbe:

timeoutSeconds: 7

exec:

command:

- ash

- -c

- '! curl -s --fail --connect-timeout 5 -o /dev/null echo-a:8080/private'

selector:

matchLabels:

replicas: 1

apiVersion: apps/v1

kind: Deployment

---

metadata:

labels:

topology: any

component: policy-check

traffic: internal

quarantine: "false"

type: autocheck

spec:

template:

metadata:

labels:

spec:

hostNetwork: false

containers:

- name: pod-to-a-allowed-cnp-container

ports: []

image: quay.io/cilium/alpine-curl:v1.5.0@sha256:7b286939730d8af1149ef88dba15739d8330bb83d7d9853a23e5ab4043e2d33c

imagePullPolicy: IfNotPresent

command:

- /bin/ash

- -c

- sleep 1000000000

terminationMessagePolicy: FallbackToLogsOnError

readinessProbe:

timeoutSeconds: 7

exec:

command:

- curl

- -sS

- --fail

- --connect-timeout

- "5"

- -o

- /dev/null

- echo-a:8080/public

livenessProbe:

timeoutSeconds: 7

exec:

command:

- curl

- -sS

- --fail

- --connect-timeout

- "5"

- -o

- /dev/null

- echo-a:8080/public

selector:

matchLabels:

replicas: 1

apiVersion: apps/v1

kind: Deployment

---

metadata:

labels:

topology: any

component: policy-check

traffic: external

quarantine: "false"

type: autocheck

spec:

template:

metadata:

labels:

spec:

hostNetwork: false

containers:

- name: pod-to-external-fqdn-allow-google-cnp-container

ports: []

image: quay.io/cilium/alpine-curl:v1.5.0@sha256:7b286939730d8af1149ef88dba15739d8330bb83d7d9853a23e5ab4043e2d33c

imagePullPolicy: IfNotPresent

command:

- /bin/ash

- -c

- sleep 1000000000

terminationMessagePolicy: FallbackToLogsOnError

readinessProbe:

timeoutSeconds: 7

exec:

command:

- curl

- -sS

- --fail

- --connect-timeout

- "5"

- -o

- /dev/null

- www.google.com

livenessProbe:

timeoutSeconds: 7

exec:

command:

- curl

- -sS

- --fail

- --connect-timeout

- "5"

- -o

- /dev/null

- www.google.com

selector:

matchLabels:

replicas: 1

apiVersion: apps/v1

kind: Deployment

---

metadata:

labels:

topology: intra-node

component: nodeport-check

traffic: internal

quarantine: "false"

type: autocheck

spec:

template:

metadata:

labels:

spec:

hostNetwork: false

containers:

- name: pod-to-b-intra-node-nodeport-container

ports: []

image: quay.io/cilium/alpine-curl:v1.5.0@sha256:7b286939730d8af1149ef88dba15739d8330bb83d7d9853a23e5ab4043e2d33c

imagePullPolicy: IfNotPresent

command:

- /bin/ash

- -c

- sleep 1000000000

terminationMessagePolicy: FallbackToLogsOnError

readinessProbe:

timeoutSeconds: 7

exec:

command:

- curl

- -sS

- --fail

- --connect-timeout

- "5"

- -o

- /dev/null

- echo-b-host-headless:31414/public

livenessProbe:

timeoutSeconds: 7

exec:

command:

- curl

- -sS

- --fail

- --connect-timeout

- "5"

- -o

- /dev/null

- echo-b-host-headless:31414/public

affinity:

podAffinity:

requiredDuringSchedulingIgnoredDuringExecution:

- labelSelector:

matchExpressions:

- key: name

operator: In

values:

- echo-b

topologyKey: kubernetes.io/hostname

selector:

matchLabels:

replicas: 1

apiVersion: apps/v1

kind: Deployment

---

metadata:

labels:

topology: any

component: network-check

traffic: internal

quarantine: "false"

type: autocheck

spec:

ports:

- name: http

port: 8080

type: ClusterIP

selector:

apiVersion: v1

kind: Service

---

metadata:

labels:

topology: any

component: services-check

traffic: internal

quarantine: "false"

type: autocheck

spec:

ports:

- name: http

port: 8080

nodePort: 31414

type: NodePort

selector:

apiVersion: v1

kind: Service

---

metadata:

labels:

topology: any

component: services-check

traffic: internal

quarantine: "false"

type: autocheck

spec:

ports:

- name: http

port: 8080

type: ClusterIP

selector:

clusterIP: None

apiVersion: v1

kind: Service

---

metadata:

labels:

topology: any

component: services-check

traffic: internal

quarantine: "false"

type: autocheck

spec:

ports: []

type: ClusterIP

selector:

clusterIP: None

apiVersion: v1

kind: Service

---

metadata:

labels:

topology: any

component: policy-check

traffic: internal

quarantine: "false"

type: autocheck

spec:

endpointSelector:

matchLabels:

egress:

- toPorts:

- ports:

- port: "53"

protocol: ANY

toEndpoints:

- matchLabels:

k8s:io.kubernetes.pod.namespace: kube-system

k8s:k8s-app: coredns

- matchLabels:

k8s:io.kubernetes.pod.namespace: kube-system

k8s:k8s-app: node-local-dns

- toPorts:

- ports:

- port: "5353"

protocol: UDP

toEndpoints:

- matchLabels:

k8s:io.kubernetes.pod.namespace: openshift-dns

k8s:dns.operator.openshift.io/daemonset-dns: default

apiVersion: cilium.io/v2

kind: CiliumNetworkPolicy

---

metadata:

labels:

topology: any

component: policy-check

traffic: internal

quarantine: "false"

type: autocheck

spec:

endpointSelector:

matchLabels:

egress:

- toPorts:

- ports:

- port: "8080"

protocol: TCP

toEndpoints:

- matchLabels:

- toPorts:

- ports:

- port: "53"

protocol: ANY

toEndpoints:

- matchLabels:

k8s:io.kubernetes.pod.namespace: kube-system

k8s:k8s-app: coredns

- matchLabels:

k8s:io.kubernetes.pod.namespace: kube-system

k8s:k8s-app: node-local-dns

- toPorts:

- ports:

- port: "5353"

protocol: UDP

toEndpoints:

- matchLabels:

k8s:io.kubernetes.pod.namespace: openshift-dns

k8s:dns.operator.openshift.io/daemonset-dns: default

apiVersion: cilium.io/v2

kind: CiliumNetworkPolicy

---

metadata:

labels:

topology: any

component: policy-check

traffic: external

quarantine: "false"

type: autocheck

spec:

endpointSelector:

matchLabels:

egress:

- toFQDNs:

- matchPattern: '*.google.com'

- toPorts:

- ports:

- port: "53"

protocol: ANY

rules:

dns:

- matchPattern: '*'

toEndpoints:

- matchLabels:

k8s:io.kubernetes.pod.namespace: kube-system

k8s:k8s-app: coredns

- matchLabels:

k8s:io.kubernetes.pod.namespace: kube-system

k8s:k8s-app: node-local-dns

- toPorts:

- ports:

- port: "5353"

protocol: UDP

rules:

dns:

- matchPattern: '*'

toEndpoints:

- matchLabels:

k8s:io.kubernetes.pod.namespace: openshift-dns

k8s:dns.operator.openshift.io/daemonset-dns: default

apiVersion: cilium.io/v2

kind: CiliumNetworkPolicy

Deployment Guide: Cisco AI PODs with
Canonical Kubernetes and Canonical MLOps Stack
White Paper

Available Languages

Download Options

Bias-Free Language

Available Languages

Download Options

Table of Contents

Learn more