Performance Tuning for Cisco UCS C225 M6 and C245 M6 Rack Servers with 3rd Gen AMD EPYC Processors White Paper

White Paper

Updated: November 9, 2022


Document purpose and scope

The BIOS tests and initializes the hardware components of a system and boots the operating system from a storage device. A typical computational system has several BIOS settings that control the system’s behavior. Some of these settings are directly related to the performance of the system.

This document explains the BIOS settings that apply to Cisco Unified Computing System (Cisco UCS) servers with 3rd Gen AMD EPYC processors: the Cisco UCS C225 M6 and C245 M6 Rack Servers. It describes how to optimize the BIOS settings on these servers to meet requirements for best performance and energy efficiency.

This document also discusses the BIOS settings that can be selected for various workload types on Cisco UCS C225 M6 and C245 M6 servers that use 3rd Gen AMD EPYC CPUs. Understanding the BIOS options will help you select appropriate values to achieve optimal system performance.

This document does not discuss the BIOS options for specific firmware releases of Cisco UCS servers. The settings demonstrated here are generic.

What you will learn

The process of setting performance options in your system BIOS can be daunting and confusing, and some of the options you can choose are obscure. For most options, you must choose between optimizing a server for power savings or for performance. This document provides some general guidelines and suggestions to help you achieve optimal performance from your Cisco UCS C225 M6 and C245 M6 servers that use 3rd Gen AMD EPYC family CPUs.

AMD EPYC 7003 Series processors

The AMD EPYC 7003 Series processors are built with innovative Zen 3 cores and the AMD Infinity architecture. The AMD EPYC system-on-a-chip (SoC) offers a consistent set of features across 8 to 64 cores. Each 3rd Gen EPYC processor consists of up to eight Core Complex Dies (CCDs) and an I/O Die (IOD). Each CCD contains one Core Complex (CCX), so each CCD contains up to eight Zen 3 cores. Using AMD Infinity Fabric, the CCDs connect to the IOD to access memory, I/O, and each other. Each socket supports up to eight memory channels, 4 TB of high-speed memory, and 128 lanes of PCIe Gen 4.

In 2-socket systems, two EPYC 7003 Series SoCs are connected through their corresponding Infinity Fabric, or External Global Memory interconnect (xGMI), links. This design creates a high-bandwidth, low-latency interconnect between the two processors.

AMD EPYC 7003 Series processors are built to the specifications listed in Table 1.

Table 1.        AMD EPYC 7003 Series specifications

 

 

Core process technology: 7 nanometer (nm)

Maximum number of cores: 64

Maximum memory speed: 3200 mega-transfers per second (MT/s)

Maximum memory channels: 8 per socket

Maximum memory capacity: 4 TB per socket

PCIe: 128 lanes of PCIe Gen 4 (maximum)


For more information about the AMD EPYC 7003 Series processors microarchitecture, see Overview of AMD EPYC 7003 Series Processors Microarchitecture.

Non-Uniform Memory Access (NUMA) topology

The AMD EPYC 7003 Series processors use a Non-Uniform Memory Access (NUMA) memory architecture. The four logical quadrants in the IOD of an AMD EPYC 7003 Series processor, as described in the previous section, allow the processor to be partitioned into different NUMA domains.

Using system BIOS settings, users can optimize this NUMA topology for their specific operating environments and workloads with the NUMA Nodes per Socket (NPS) BIOS setting. Each server can be configured with NPS = 1 (NPS1), NPS = 2 (NPS2), NPS = 4 (NPS4), or NPS = 0 (NPS0; not recommended), with an additional option to configure the Layer 3 cache as NUMA (L3CAN). Not all AMD EPYC 7003 Series processors can be set to NPS4. Some CPUs have only six CCDs per socket, in which case only NPS1 and NPS2 are recommended.

AMD Rome and Milan 64-core CPUs support three NUMA domain configurations, which can be selected in the BIOS as shown in Figure 1:

      1 node per socket (entire node)

      2 nodes per socket (black dotted line)

      4 nodes per socket (red dotted line)


Figure 1. AMD Rome and Milan processor block diagram with NUMA domains

NPS1

NPS1 indicates a single NUMA node per socket (or CPU). This setting configures all memory channels on the processor into a single NUMA domain: that is, all the cores on the processor, all memory connected to it, and all PCIe devices connected to the processor are in one NUMA domain. Memory is then interleaved across all eight memory channels on the processor.

NPS2

In NPS2, the processor is partitioned into two NUMA domains. Half the cores and half the memory channels of each processor are grouped together into one NUMA domain, and the remaining cores and memory channels are grouped into a second domain. Memory is interleaved across the four memory channels in each NUMA domain.

NPS4

NPS4 partitions the processor into four NUMA domains. Each logical quadrant of the processor is configured as its own NUMA domain. Memory is interleaved across the two memory channels in each quadrant. PCIe devices will be local to one of the four NUMA domains on the processor depending on the quadrant of the IOD that has the PCIe root for that device.

NPS0 (not recommended)

NPS0 interleaves memory accesses across all memory channels on a two-socket system. This configuration should generally be avoided, because it adds intersocket latency to every memory access.

Layer 3 cache as NUMA Domain

In addition to the NPS settings, one more BIOS option for changing NUMA configurations is available. With the Layer 3 Cache as NUMA (L3CAN) option, each Layer 3 cache (one per CCD) is exposed as its own NUMA node. For example, a single processor with eight CCDs would have 8 NUMA nodes: one for each CCD. In this case, a two-socket system would have a total of 16 NUMA nodes.
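The NUMA layout that results from the NPS and L3CAN settings can be verified from the operating system after boot. The following Linux commands are a minimal sketch, assuming the numactl package is installed; the reported node count should match the configured NPS value, or the number of CCDs per socket when Layer 3 Cache as NUMA is enabled.

# Report the number of NUMA nodes and the CPUs assigned to each node
lscpu | grep -i numa

# Show per-node CPU lists, memory sizes, and internode distances
numactl --hardware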

BIOS tuning scenarios

This document focuses on two main scenarios: tuning the BIOS for general-purpose workloads and tuning the BIOS for enterprise workloads.

Tuning for general-purpose workloads

With the latest multiprocessor, multicore, and multithreading technologies in conjunction with current operating systems and applications, today's Cisco UCS servers based on the 3rd Gen AMD EPYC processor deliver the highest levels of performance, as demonstrated by the numerous industry-standard benchmark publications from the Standard Performance Evaluation Corporation (SPEC).

Cisco UCS servers with standard settings already provide an optimal ratio of performance to energy efficiency. However, through BIOS settings you can further optimize the system for higher performance at the cost of energy efficiency. Basically, this optimization runs all the components in the system at the maximum possible speed and prevents the energy-saving options from slowing down the system. In general, optimization for greater performance is in most cases associated with increased consumption of electrical power. This document explains how to configure the BIOS settings to achieve optimal computing performance.

Tuning for enterprise workloads

With the evolution of computer architecture, performance has reached results that were unimaginable a few years ago. However, the complexity of modern computer architectures requires end users and developers to know how to write code. It also requires them to know how to configure and deploy software for a specific architecture to get the most out of it.

Performance tuning is difficult and general recommendations are problematic. This document tries to provide insights into optimal BIOS settings and OS tunings that have an impact on overall system performance. This document does not provide generic rule-of-thumb values to be used for performance tuning. The finest tuning of the parameters described requires a thorough understanding of the enterprise workloads and the Cisco UCS platform on which they run.

Cisco UCS BIOS options

This section describes the options you can configure in the Cisco UCS BIOS.

Processor configuration

You can configure the processor and BIOS settings through the Cisco Integrated Management Controller (IMC) for standalone systems, through Cisco UCS Manager, and through the Cisco Intersight platform for systems in Cisco Intersight Managed Mode. The BIOS settings for memory, processor, and power are on the Memory, Power/Performance, and Processor tabs. The BIOS tokens specific to AMD servers in Cisco Intersight Managed Mode are highlighted in red. Figures 2 through 4 show the BIOS settings available through the Cisco IMC, Figures 5 through 9 show the BIOS settings available through the Cisco Intersight platform, and Figure 10 shows the BIOS settings available through Cisco UCS Manager for 3rd Gen AMD EPYC processors.

Figure 2. BIOS tokens for Memory available for configuration through Cisco IMC (screen 1)

Figure 3. BIOS tokens for Power and Performance available for configuration through Cisco IMC (screen 2)

Figure 4. BIOS tokens for Processor available for configuration through Cisco IMC (screen 3)

Figure 5. BIOS tokens for 3rd Gen AMD EPYC processor available for configuration through Cisco Intersight platform

Figure 6. BIOS tokens for 3rd Gen AMD EPYC processor available for configuration through Cisco Intersight platform

Figure 7. BIOS tokens for 3rd Gen AMD EPYC processor available for configuration through Cisco Intersight platform

Figure 8. BIOS tokens for 3rd Gen AMD EPYC processor available for configuration through Cisco Intersight platform

Figure 9. BIOS tokens for 3rd Gen AMD EPYC processor available for configuration through Cisco Intersight platform

Figure 10. BIOS tokens for Processor available for configuration through Cisco UCS Manager

Note:      The “platform default” settings in the Intersight BIOS policy are equivalent to the “Auto” settings in the F2 BIOS/Cisco IMC.
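In addition to the graphical interfaces shown above, the BIOS tokens on standalone servers can also be read programmatically through the Cisco IMC Redfish API. The commands below are a minimal sketch only, assuming a reachable Cisco IMC at <cimc-ip> with valid credentials; the exact resource identifiers and attribute names vary by server model and firmware release, so verify them against the Redfish service on your own system.

# Discover the system resource exposed by the Cisco IMC Redfish service
curl -k -u admin:password https://<cimc-ip>/redfish/v1/Systems

# Read the current BIOS attributes for the discovered system resource
curl -k -u admin:password https://<cimc-ip>/redfish/v1/Systems/<system-id>/Bios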

Processor settings

This section describes the processor options you can configure.

AMD Infinity Fabric settings

You can configure the Infinity Fabric settings described in this section.

xGMI settings: Connection between sockets

In a two-socket system, the processors are interconnected through socket-to-socket xGMI links, part of the Infinity Fabric that connects all the components of the SoC together.

NUMA-unaware workloads may need maximum xGMI bandwidth because of extensive cross-socket communication. NUMA-aware workloads may want to minimize xGMI power because they do not have a lot of cross-socket traffic and prefer to use the increased CPU boost. The xGMI lane width can be reduced from x16 to x8 or x2, or an xGMI link can be disabled if power consumption is too high.

xGMI link configuration and 4-link xGMI max speed (Cisco xGMI max Speed)

You can set the number of xGMI links and maximum speed for the xGMI link. Setting this value to a lower speed can save uncore power that can be used to increase core frequency or reduce overall power. It also decreases cross-socket bandwidth and increases cross-socket latency. The Cisco UCS C245 M6 server supports four xGMI links with a maximum speed of 18 Gbps, and the Cisco UCS C225 M6 server supports three xGMI links with a maximum speed of 16 Gbps.

The Cisco xGMI Max Speed setting allows you to configure the xGMI link configuration and the 4-link or 3-link xGMI maximum speed. Enabling Cisco xGMI Max Speed sets the xGMI Link Configuration to 4 (3 on the Cisco UCS C225 M6 server) and sets the 4-Link xGMI Max Speed to 18 Gbps; on the Cisco UCS C225 M6 server, the 3-Link xGMI Max Speed is set to 16 Gbps. Disabling Cisco xGMI Max Speed applies the default values.

Cisco UCS C245 M6 and C225 M6 xGMI configuration

      Cisco UCS C245 M6 server supports up to 4 xGMI links with max speed up to 18 Gbps

      Cisco UCS C225 M6 server supports up to 3 xGMI links with max speed up to 16 Gbps

      xGMI link configuration is applicable to two-socket systems only.

Table 2 summarizes the settings.

Table 2.        xGMI link settings

Setting

Options

Cisco xGMI Max Speed

  Disabled (default)
  Enabled

xGMI Link Configuration

  Auto
  1
  2
  3
  4

4-Link xGMI Max Speed

  Auto (13 Gbps)
  10.667 Gbps
  13 Gbps
  16 Gbps
  18 Gbps

3-Link xGMI Max Speed

  Auto (13 Gbps)
  10.667 Gbps
  13 Gbps
  16 Gbps

Dynamic Link Width Management

xGMI Dynamic Link Width Management (DLWM) saves power during periods of low socket-to-socket data traffic by reducing the number of active xGMI lanes per link from 16 to 8. However, under certain scenarios involving low bandwidth but latency-sensitive traffic, the transition from a low-power xGMI state to a full-power xGMI state can adversely affect latency. Setting xGMI Link Width Control to manual and specifying a Force Link Width value eliminate any such latency jitter. Applications that are known to be insensitive to both socket-to-socket bandwidth and latency can set a forced link width of 8 (or 2 on certain platforms) to save power, which can divert more power to the cores for CPU boost.

The DLWM feature is optimized to trade power between CPU core-intensive workloads (SPEC CPU) and I/O bandwidth-intensive workloads (kernel IP forward or iPerf). When link activity is above a threshold, DLWM will increase lane width from x8 to x16 at the cost of some delay, because the I/O die must disconnect the links, retrain them at the new speed, and release the system back to functionality. Table 3 summarizes the settings.

Table 3.        DLWM settings

Setting

Options

DLWM

  Auto: This setting is enabled when two CPUs are installed.
  Disable: The xGMI link width is fixed.

 

Power states

You can configure the power state settings described in this section.

Algorithm Performance Boost (APBDIS) and SoC P-states

Enable or disable Algorithm Performance Boost (APB). In the default state, the Infinity Fabric selects between a full-power and low-power fabric clock and memory clock based on fabric and memory use. However, in certain scenarios involving low-bandwidth but latency-sensitive traffic (and memory latency checkers), the transition from low power to full power can adversely affect latency. Setting APBDIS to 1 (to disable APB) and specifying a fixed Infinity Fabric P-state of 0 will force the Infinity Fabric and memory controllers into full-power mode, eliminating any such latency jitter. Certain processors and memory population options result in a scenario in which setting a fixed Infinity Fabric P-state of 1 will reduce memory latency at the expense of memory bandwidth. This setting may benefit applications known to be sensitive to memory latency. Table 4 summarizes the settings.

Table 4.        APBDIS and SoC P-state settings

Setting

Options

APBDIS

  Auto (0)
  0: Dynamically switch Infinity Fabric P-state based on link use.
  1: Enable fixed Infinity Fabric P-state control.

Fixed SOC P-State

  Auto
  P0: Highest-performing Infinity Fabric P-state
  P1: Next-highest-performing Infinity Fabric P-state
  P2: Next highest-performing Infinity Fabric P-state
  P3: Lowest Infinity Fabric power P-state

Data Fabric (DF) C-states

Much like CPU cores, the Infinity Fabric can go into lower power states while idle. However, there is a delay when changing back to full-power mode, which causes some latency jitter. For a low-latency workload or one with bursty I/O, you can disable the Data Fabric (DF) C-states feature to achieve more performance with the trade-off of higher power consumption. Table 5 summarizes the settings.

Table 5.        Data fabric C-state settings

Setting

Options

DF C-States

  Disabled: Do not allow Infinity Fabric to go to a low-power state when the processor has entered Cx states.
  Enabled (Auto): Allow Infinity Fabric to go to a low-power state when the processor has entered Cx states.

 

NUMA and memory settings

You can configure the NUMA and memory settings described in this section.

NUMA nodes per socket (NPS)

This setting lets you specify the number of desired NUMA nodes per socket (NPS) and enables a trade-off between reducing local memory latency for NUMA-aware or highly parallelizable workloads and increasing per-core memory bandwidth for non-NUMA-friendly workloads. Socket interleave (NPS0) will attempt to interleave the two sockets together into one NUMA node. 3rd Gen AMD EPYC processors support a varying number of NUMA NPS values depending on the internal NUMA topology of the processor. NPS2 and NPS4 may not be options on certain processors or with certain memory populations.

In one-socket servers, the number of NUMA nodes per socket can be 1, 2, or 4, though not all values are supported by every processor. Performance for applications that are highly NUMA optimized can be improved by setting the number of NUMA nodes per socket to a supported value greater than 1.

The default configuration (one NUMA domain per socket) is recommended for most workloads. NPS4 is recommended for high-performance computing (HPC) and other highly parallel workloads. When using 200-Gbps network adapters, NPS2 may be preferred to provide a compromise between memory latency and memory bandwidth for the network interface card (NIC). This setting is independent of the Advanced Configuration and Power Interface (ACPI) Static Resource Affinity Table (SRAT) Layer 3 (L3) Cache as NUMA Domain setting. When ACPI SRAT L3 Cache as NUMA Domain is enabled, this setting then determines the memory interleaving granularity. With NPS1, all eight memory channels are interleaved. With NPS2, every four channels are interleaved with each other. With NPS4, every pair of channels is interleaved. Table 6 summarizes the settings.

Table 6.        NUMA NPS settings

Setting

Options

NUMA Nodes per Socket

  Auto (NPS1)
  NPS0: Interleave memory accesses across all channels in both sockets (not recommended).
  NPS1: Interleave memory accesses across all eight channels in each socket; report one NUMA node per socket (unless L3 Cache as NUMA is enabled).
  NPS2: Interleave memory accesses across groups of four channels (ABCD and EFGH) in each socket; report two NUMA nodes per socket (unless L3 Cache as NUMA is enabled).
  NPS4: Interleave memory accesses across pairs of two channels (AB, CD, EF, and GH) in each socket; report four NUMA nodes per socket (unless L3 Cache as NUMA is enabled).

ACPI SRAT L3 Cache as NUMA domain

When the ACPI SRAT L3 Cache As NUMA Domain (L3CAN) setting is enabled, each Layer 3 cache (one per CCD) is exposed as its own NUMA node. For example, a single processor with 8 CCDs would have 8 NUMA nodes: one for each CCD. A dual-processor system would have a total of 16 NUMA nodes.

This setting can improve performance for highly NUMA-optimized workloads if workloads or components of workloads can be pinned to cores in a CCX and if they can benefit from sharing a Layer 3 cache. When this setting is disabled, NUMA domains are identified according to the NUMA NPS parameter setting.

Some operating systems and hypervisors do not perform Layer 3–aware scheduling, and some workloads benefit from having Layer 3 declared as a NUMA domain. Table 7 summarizes the settings.

Table 7.        ACPI SRAT Layer 3 Cache as NUMA domain settings

Setting

Options

ACPI SRAT L3 Cache As NUMA Domain

  Auto (Disabled)
  Disable: Do not report each Layer 3 cache as a NUMA domain to the OS.
  Enable: Report each Layer 3 cache as a NUMA domain to the OS.

Memory interleaving

Memory interleaving is a technique that CPUs use to increase the memory bandwidth available for an application. Without interleaving, consecutive memory blocks, often cache lines, are read from the same memory bank. Software that reads consecutive memory thus will need to wait for a memory transfer operation to complete before starting the next memory access. With memory interleaving enabled, consecutive memory blocks are in different banks and so can all contribute to the overall memory bandwidth that a program can achieve.

AMD recommends that all eight memory channels per CPU socket be populated with all channels having equal capacity. This approach enables the memory subsystem to operate in eight-way interleaving mode, which should provide the best performance in most cases. Table 8 summarizes the settings.

Table 8.        Memory interleaving settings

Setting

Options

AMD Memory Interleaving

  Auto: Interleaving is enabled with supported memory DIMM configuration.
  Disable: No interleaving is performed.
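To confirm that the recommended balanced DIMM population is in place before relying on interleaving, the installed DIMMs can be listed from the operating system. The following Linux command is a sketch only, assuming the dmidecode utility is installed and run with root privileges; locator strings differ between platforms.

# List installed DIMMs with their sizes, speeds, and slot locators
dmidecode -t memory | grep -E "Size|Speed|Locator"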

Transparent Secure Memory Encryption (TSME)

Transparent Secure Memory Encryption (TSME) provides hardware memory encryption of all data stored on system DIMMs. This encryption is invisible to the OS. The impact of this encryption is a small increase in memory latency. Table 9 summarizes the settings.

Table 9.        TSME settings

Setting

Options

TSME

  Auto (Enabled)
  Disabled: Disable transparent secure memory encryption.
  Enabled: Enable transparent secure memory encryption.

Memory Power Down Enable settings

The Memory Power Down Enable feature allows the DIMMs to operate at low power. Disable this feature for low-latency use cases. Table 10 summarizes the settings.

Table 10.     Memory Power Down Enable settings

Setting

Options

Memory Power Down Enable

  Auto (Enabled)
  Enabled: Allow DIMMs to operate at lower power states.
  Disabled

Note:      The Memory Power Down Enable setting is available in the F2 BIOS only.

Power efficiency settings

You can configure the power efficiency settings described in this section.

Efficiency mode: Core Clock Dynamic Power Management (CCLKDPM)

When enabled, the SoC efficiency mode maximizes performance-per-watt by opportunistically reducing the core clocks using a dynamic power management algorithm. This internal algorithm to maximize the performance per watt is targeted at throughput-based server workloads that exhibit a stable load below the SoC maximum capabilities. The default setting, Auto, maximizes performance of the SoC. Table 11 summarizes the settings.

Table 11.     Efficiency mode settings

Setting

Options

EfficiencyModeEn

  Auto (disabled): Optimize core clock dynamic power management (CCLKDPM) for maximum performance.
  Enabled: Optimize core clock dynamic power management for power efficiency.

Processor power and performance determinism settings

You can configure the processor power and performance determinism settings described in this section.

Determinism slider

The Determinism slider allows you to select either uniform performance across identically configured systems in a data center, by setting the server to the Performance setting, or the maximum performance of any individual system, with varying performance across the data center, by setting the server to the Power setting. When the Determinism slider is set to Performance, be sure that the configurable thermal design power (cTDP) and package power limit (PPL) are set to the same value. The default (Auto) setting for most processors is the Performance determinism mode, allowing the processor to operate at a lower power level with consistent performance. For maximum performance, set the Determinism slider to Power. Table 12 summarizes the settings.

Table 12.     Determinism settings

Setting

Options

Determinism Slider

  Auto: This setting is equal to the Performance option.
  Power: Ensure maximum performance levels for each CPU in a large population of identically configured CPUs by throttling CPUs only when they reach the same cTDP.
  Performance: Ensure consistent performance levels across a large population of identically configured CPUs by throttling some CPUs to operate at a lower power level.

 

Processor cooling and power dissipation limit settings

You can configure the processor cooling and power dissipation settings described in this section.

cTDP control

Configurable thermal design power, or cTDP setting, allows you to modify the platform CPU cooling limit, and the package power limit, or PPL setting, allows you to modify the CPU power dissipation limit.

Many platforms configure cTDP to the maximum setting supported by the installed CPU. Most platforms also configure the PPL to the same value as the cTDP. If performance determinism is desired, these two values must be set to the same value. Otherwise, you can set PPL to a value lower than cTDP to reduce the system operating power. The CPU will control CPU boost to keep socket power dissipation at or below the specified PPL value. Table 13 summarizes the settings.

Table 13.     cTDP settings

Setting

Options

cTDP Control

  Auto: Use platform and CPU SKU max TDP.
  Manual: Set customized configurable TDP.

cTDP

  Values 85 to 280: Set configurable TDP (in watts).

Package Power Limit Control

  Manual: Set a customized PPL.
  Auto: Use the platform and processor default PPL.

Package Power Limit

  Values 85 to 280: Set the PPL (in watts).

Note:      cTDP settings are configurable in F2 BIOS only.

I/O memory management unit (IOMMU)

The I/O memory management unit (IOMMU) provides several benefits and is required when using the x2APIC interrupt controller mode. Enabling the IOMMU allows devices (such as the EPYC integrated SATA controller) to present separate interrupt requests (IRQs) for each attached device instead of one IRQ for the subsystem. The IOMMU also allows operating systems to provide additional protection for direct memory access (DMA)–capable I/O devices, and it helps filter and remap interrupts from peripheral devices. Table 14 summarizes the settings.

Table 14.     IOMMU settings

Setting

Options

IOMMU

  Auto (Enabled)
  Disabled: Disable IOMMU support.
  Enabled: Enable IOMMU support.
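On Linux, you can verify from the kernel boot log whether the AMD IOMMU was detected and initialized after changing this token. A minimal check:

# AMD-Vi messages indicate that the AMD IOMMU was found and enabled
dmesg | grep -i -e "AMD-Vi" -e "iommu"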

Processor core settings

You can configure the processor core settings described in this section.

Layer 1 and Layer 2 stream hardware prefetchers

Most workloads benefit from the use of the Layer 1 and Layer 2 stream hardware prefetchers (L1 Stream HW Prefetcher and L2 Stream HW Prefetcher) to gather data and keep the core pipeline busy. However, some workloads are very random in nature and will actually achieve better overall performance if one or both of the prefetchers are disabled. By default, both prefetchers are enabled. Table 15 summarizes the settings.

Table 15.     Layer 1 and Layer 2 stream hardware prefetcher settings

Setting

Options

L1 Stream HW Prefetcher

  Auto (Enabled)
  Disable: Disable prefetcher.
  Enable: Enable prefetcher.

L2 Stream HW Prefetcher

  Auto (Enabled)
  Disable: Disable prefetcher.
  Enable: Enable prefetcher.

Simultaneous multithreading (SMT) settings: SMT Mode

You can set the simultaneous multithreading (SMT) option to enable or disable logical processor cores on processors that support the AMD SMT mode option. When the SMT mode is set to Auto (enabled), each physical processor core operates as two logical processor cores and allows multithreaded software applications to process threads in parallel within each processor.

Some workloads, including many HPC workloads, see performance-neutral or even performance-negative results when SMT is enabled. Some applications are licensed per enabled hardware thread, not per physical core. For those reasons, disabling SMT on your EPYC 7003 Series processor may be desirable. In addition, some operating systems have not enabled support for the x2APIC mode of the EPYC 7003 Series processor, which is required to support more than 255 threads. If you are running an operating system that does not support AMD’s x2APIC implementation and have two 64-core processors installed, you will need to disable SMT. Table 16 summarizes the settings.

You should test the SMT option both enabled and disabled in your specific environment. If you are running a single-threaded application, you should disable SMT.

Table 16.     SMT settings

Setting

Options

SMT Control

  Auto: Use 2 hardware threads per core.
  Disable: Use a single hardware thread per core.
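After changing SMT Control, you can confirm the resulting thread topology from the operating system. The following Linux commands are a sketch; the sysfs SMT interface assumes a reasonably recent kernel.

# 2 threads per core indicates SMT is active; 1 indicates it is disabled
lscpu | grep -i "thread(s) per core"

# Kernel view of the SMT state (on, off, forceoff, or notsupported)
cat /sys/devices/system/cpu/smt/control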

Core Performance Boost option

The Core Performance Boost feature allows the processor to transition to a higher frequency than the CPU’s base frequency based on the availability of power, thermal headroom, and the number of active cores in the system. Core performance boost can cause jitter due to frequency transitions of the processor cores.

Some workloads do not need to be able to run at the maximum core frequency to achieve acceptable levels of performance. To obtain better power efficiency, you can set a maximum core boost frequency. This setting does not allow you to set a fixed frequency; it only limits the maximum boost frequency. If BoostFmax is set to something higher than the boost algorithms allow, the SoC will not go beyond the allowable frequency that the algorithms support. Actual boost performance depends on many factors and other settings mentioned in this document. Table 17 summarizes the settings.

Table 17.     Core performance boost settings

Setting

Options

Core Performance Boost

  Auto (enabled): Allow the processor to transition to a higher frequency (turbo frequency) than the CPU’s base frequency.
  Disabled: Disable the CPU core boost frequency.

Global C-state control

C-states are a processor’s CPU core inactive power states. C0 is the operational state in which instructions are processed, and higher numbered C-states (C1, C2, etc.) are low-power states in which the core is idle. The Global C-state setting can be used to enable and disable C-states on the server. By default, the Global C-state control is set to Auto, which enables cores to enter lower power states and can cause jitter due to frequency transitions of the processor cores. When this setting is disabled, the CPU cores will operate at the C0 and C1 states. Table 18 summarizes the settings.

C-states are exposed through ACPI objects and can be dynamically requested by software. Software can request a C-state change either by executing a HALT instruction or by reading from a particular I/O address. The actions taken by the processor when entering the low-power C-state can also be configured by software. The 3rd Gen AMD EPYC processor’s core is designed to support as many as three AMD-specified C-states: I/O-based C0, C1, and C2.

Table 18.     Global C-state settings

Setting

Options

Global C-State Control

  Auto (enabled): Enable I/O-based C-states.
  Disabled: Disable I/O-based C-states.
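The idle states actually exposed to the operating system can be inspected with the cpupower utility after changing Global C-State Control. A minimal sketch, assuming the cpupower (kernel-tools) package is installed:

# List the idle states (C-states) available to each core and their exit latencies
cpupower idle-info

# Observe per-core C-state residency while a workload is running
cpupower monitor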

APIC settings

In general, interrupt delivery is faster when you use the x2APIC mode instead of the older xAPIC mode. However, not all operating systems support AMD’s x2APIC implementation, so you need to check for support before enabling this mode. If your operating system supports x2APIC mode, this mode is recommended even in configurations with fewer than 256 logical processors. Table 19 summarizes the settings.

Table 19.     APIC settings

Setting

Options

Local APIC Mode

  xAPIC: Use xAPIC. This option scales to only 255 hardware threads.
  x2APIC: Use x2APIC. This option scales beyond 255 hardware threads but is not supported by some older OS versions.
  Auto: Use x2APIC only if the system contains 256 hardware threads; otherwise, use xAPIC.
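On Linux, the interrupt controller mode selected at boot can be confirmed from the kernel log, which is a quick way to verify that the OS is actually running in x2APIC mode:

# Look for the x2APIC mode reported during boot
dmesg | grep -i x2apic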

Preferred I/O settings

The Preferred I/O and Enhanced Preferred I/O settings allow devices on a single PCIe bus to achieve improved DMA write performance. Table 20 summarizes the settings.

Table 20.     Preferred I/O settings

Setting

Options

Preferred I/O

  Auto: Disabled
  Manual: Enable preferred I/O for the bus number specified by the Preferred I/O Bus setting.

Preferred I/O Bus

  Values 00h to FFh: Specify the bus numbers for the devices for which you want to enable preferred I/O.

Enhanced Preferred I/O

  Auto
  Disabled
  Enabled

Note:      The Preferred I/O setting is available in the F2 BIOS only.

Virtualization (SVM) settings

The secure virtual machine (SVM) mode enables processor virtualization features and allows a platform to run multiple operating systems and applications in independent partitions. The AMD SVM mode can be set to either of the following values:

      Disabled: The processor does not permit virtualization.

      Enabled: The processor allows multiple operating systems in independent partitions.

If your application scenario does not require virtualization, disable AMD virtualization technology. With virtualization disabled, also disable the AMD IOMMU option, because it can cause differences in memory access latency. Table 21 summarizes the settings.

Table 21.     Virtualization option settings

Setting

Options

SVM

  Enabled
  Disabled

Fan control policy

Fan policy enables you to control the fan speed to reduce server power consumption and noise levels. Prior to the use of fan policy, the fan speed increased automatically when the temperature of any server component exceeded the set threshold. To help ensure that the fan speeds were low, the threshold temperatures of components were usually set to high values. Although this behavior suited most server configurations, it did not address the following situations:

      Maximum CPU performance: For high performance, certain CPUs must be cooled substantially below the set threshold temperature. This cooling requires very high fan speeds, which results in increased power consumption and noise levels.

      Low power consumption: To help ensure the lowest power consumption, fans must run very slowly and, in some cases, stop completely on servers that allow this behavior. But slow fan speeds can cause servers to overheat. To avoid this situation, you need to run fans at a speed that is moderately faster than the lowest possible speed.

You can choose the following fan policies:

      Balanced: This is the default policy. This setting can cool almost any server configuration, but it may not be suitable for servers with PCIe cards, because these cards overheat easily.

      Low Power: This setting is well suited for minimal-configuration servers that do not contain any PCIe cards.

      High Power: This setting can be used for server configurations that require fan speeds ranging from 60 to 85 percent. This policy is well suited for servers that contain PCIe cards that easily overheat and have high temperatures. The minimum fan speed set with this policy varies for each server platform, but it is approximately in the range of 60 to 85 percent.

      Maximum Power: This setting can be used for server configurations that require extremely high fan speeds ranging between 70 and 100 percent. This policy is well suited for servers that contain PCIe cards that easily overheat and have extremely high temperatures. The minimum fan speed set with this policy varies for each server platform, but it is approximately in the range of 70 to 100 percent.

      Acoustic: The fan speed is reduced to reduce noise levels in acoustic-sensitive environments. Rather than regulating energy consumption and preventing component throttling as in other modes, the Acoustic option could result in short-term throttling to achieve a lowered noise level. Applying this fan control policy may result in short-duration transient performance impacts.

Note:      This policy is configurable for standalone Cisco UCS C-Series M6 servers using the Cisco IMC console and the Cisco IMC supervisor. From the Cisco IMC web console, choose Compute > Power Policies > Configured Fan Policy > Fan Policy.

For Cisco Intersight managed C-Series M6 servers, this policy is configurable using fan policies.

BIOS settings for Cisco UCS C225 M6 and C245 M6 servers

Table 22 lists the BIOS token names, defaults, and supported values for the Cisco UCS C245 M6 and C225 M6 servers with the AMD processor.

Table 22.     BIOS token names and values

Name | Default value | Supported values

Core Performance Boost | Auto | Auto, Disabled

Global C-state Control | Auto | Auto, Enabled, Disabled

L1 Stream HW Prefetcher | Auto | Auto, Enabled, Disabled

L2 Stream HW Prefetcher | Auto | Auto, Enabled, Disabled

Determinism Slider | Auto | Auto, Power, Performance

Memory Interleaving | Auto | Auto, None, Channel, Die, Socket

NUMA Nodes per Socket | Auto | Auto, NPS0, NPS1, NPS2, NPS4

IOMMU | Auto | Auto, Enabled, Disabled

Efficiency Mode Enable | Auto | Auto, Enabled

SMT Mode | Auto | Auto, Enabled, Disabled

SVM Mode | Enabled | Enabled, Disabled

Downcore Control | Auto | Auto, Two (1+1), Two (2+0), Three (3+0), Four (4+0), Five (5+0), Six (6+0), Seven (7+0)

APBDIS | Auto | Auto, 0, 1

Fixed SOC P-State | Auto | Auto, P0, P1, P2, P3

Cisco xGMI Max Speed | Disabled | Enabled, Disabled

ACPI SRAT L3 Cache as NUMA Domain | Auto | Auto, Enabled, Disabled

BIOS recommendations for various general-purpose workloads

This section summarizes the BIOS settings recommended to optimize general-purpose workloads:

      Computation-intensive workloads

      I/O-intensive workloads

      Energy-efficient workloads

      Low-latency workloads

The following sections describe each workload.

Computation-intensive workloads

For computation-intensive workloads, the goal is to distribute the work for a single job across multiple CPUs to reduce the processing time as much as possible. To do this, you need to run portions of the job in parallel. Each process, or thread, handles a portion of the work and performs the computations concurrently. The CPUs typically need to exchange information rapidly, requiring specialized communication hardware.

Computation-intensive workloads generally benefit from processors and memory that can achieve the maximum turbo frequency for any individual core at any time. Processor power management settings can be applied to help ensure that any component frequency increase can be readily achieved. Computation-intensive workloads are general-purpose workloads, so optimizations are performed generically to increase processor core and memory speed, and performance tunings that typically benefit from faster computing time are used.

I/O-intensive workloads

I/O-intensive optimizations are configurations that depend on maximum throughput between I/O and memory. Processor utilization–based power management features that affect performance on the links between I/O and memory are disabled.

Energy-efficient workloads

Energy-efficient optimizations are the most common balanced performance settings. They benefit most application workloads while also enabling power management settings that have little impact on overall performance. The settings that are applied for energy-efficient workloads increase general application performance rather than power efficiency. Processor power management settings can affect performance when virtualization operating systems are used. Hence, these settings are recommended for customers who do not typically tune the BIOS for their workloads.

Low-latency workloads

Workloads that require low latency, such as financial trading and real-time processing, require servers to provide a consistent system response. Low-latency workloads are for customers who demand the least amount of computational latency for their workloads. Maximum speed and throughput are often sacrificed to lower overall computational latency. Processor power management and other management features that might introduce computational latency are disabled.

To achieve low latency, you need to understand the hardware configuration of the system under test. Important factors affecting response times include the number of cores, the processing threads per core, the number of NUMA nodes, the CPU and memory arrangements in the NUMA topology, and the cache topology in a NUMA node. BIOS options are generally independent of the OS, and a properly tuned low-latency operating system is also required to achieve deterministic performance.
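On Red Hat–based distributions, the tuned daemon provides a prepackaged latency-oriented profile that complements the low-latency BIOS settings described above. The commands below are a sketch only, assuming the tuned package is installed; profile names can vary between releases.

# List the available profiles and apply the latency-oriented one
tuned-adm list
tuned-adm profile latency-performance

# Confirm which profile is active
tuned-adm active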


 

Summary of BIOS settings optimized for general-purpose workloads

Table 23 summarizes BIOS settings optimized for general-purpose workloads.

Table 23.     BIOS recommendations for computation-intensive, I/O-intensive, energy-efficient, and low-latency workloads

BIOS options | BIOS values (platform default) | Computation intensive | I/O intensive | Energy efficient | Low latency

Memory

NUMA Nodes per Socket | Auto (NPS1) | NPS4 | NPS4 | NPS1 | Auto

IOMMU | Auto (Enabled) | Auto* | Auto | Auto | Disabled*

Power/Performance

SMT Mode | Auto (Enabled) | Auto | Auto | Auto | Disabled

Core Performance Boost | Auto (Enabled) | Auto | Auto | Auto | Disabled

Global C-State Control | Auto (Enabled) | Auto | Auto | Auto | Disabled

L1 Stream HW Prefetcher | Auto (Enabled) | Auto | Auto | Disabled | Auto

L2 Stream HW Prefetcher | Auto (Enabled) | Auto | Auto | Disabled | Auto

Determinism Slider | Auto (Performance) | Power | Power | Performance | Power

Efficiency Mode Enable | Auto (Disabled) | Auto | Auto | Auto | Disabled

Processor

SVM Mode | Enabled | Enabled* | Enabled | Enabled | Enabled*

ACPI SRAT L3 Cache As NUMA Domain | Auto (Disabled) | Enabled | Auto | Auto | Auto

APBDIS | Auto (0) | 1 | 1 | 0 | Auto

Fixed SOC P-State | Auto (P3) | P0 | P0 | P3 | Auto

Cisco xGMI Max Speed | Disabled | Enabled | Enabled | Disabled | Disabled

* If your application scenario does not require virtualization, disable AMD virtualization technology. With virtualization disabled, also disable the AMD IOMMU option, because it can cause differences in memory access latency. See the AMD performance tuning guides for more information.

Additional BIOS recommendations for enterprise workloads

This section summarizes optimal BIOS settings for enterprise workloads:

      Virtual desktop infrastructure (VDI) workloads

      Virtualization (virtual server infrastructure [VSI]) workloads

      HPC workloads

The following sections describe each enterprise workload.

Virtualization workloads

AMD Virtualization Technology provides manageability, security, and flexibility in IT environments that use software-based virtualization solutions. With this technology, a single server can be partitioned and can be projected as several independent servers, allowing the server to run different applications on the operating system simultaneously. It is important to enable AMD Virtualization Technology in the BIOS to support virtualization workloads.

CPUs that support hardware virtualization allow the processor to run multiple operating systems in virtual machines. This feature involves some overhead, because the performance of a virtualized operating system is lower than that of the native OS.

For more information, see AMD’s VMware vSphere Tuning Guide.  

High-performance computing workloads

HPC refers to cluster-based computing that uses multiple individual nodes that are connected and that work in parallel to reduce the amount of time required to process large data sets that would otherwise take exponentially longer to run on any one system. HPC workloads are computation intensive and typically also network-I/O intensive. HPC workloads require high-quality CPU components and high-speed, low-latency network fabrics for their Message Passing Interface (MPI) connections.

Computing clusters include a head node that provides a single point for administering, deploying, monitoring, and managing the cluster. Clusters also have an internal workload management component, known as the scheduler, that manages all incoming work items (referred to as jobs). Typically, HPC workloads require large numbers of nodes with nonblocking MPI networks so that they can scale. Scalability of nodes is the single most important factor in determining the achieved usable performance of a cluster.

HPC requires a high-bandwidth I/O network. When you enable direct cache access (DCA) support, network packets go directly into the Layer 3 processor cache instead of the main memory. This approach reduces the number of HPC I/O cycles generated by HPC workloads when certain Ethernet adapters are used, which in turn increases system performance.
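With NPS4, HPC runtimes generally perform best when each rank or process is bound to the cores and memory of a single NUMA node. The following Linux commands are a sketch only, assuming the numactl package is installed; ./hpc_app is a hypothetical placeholder for your benchmark or MPI rank binary, and most MPI launchers provide equivalent binding options of their own.

# Show the NUMA nodes created by the NPS4 setting
numactl --hardware

# Bind a process to the CPUs and memory of NUMA node 0 (./hpc_app is a placeholder)
numactl --cpunodebind=0 --membind=0 ./hpc_app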

For more information, see AMD’s High-Performance Computing (HPC) Tuning Guide.


 

Summary of BIOS settings recommended for enterprise workloads

Table 24 summarizes the BIOS tokens and settings recommended for various enterprise workloads.

Table 24.     BIOS recommendations for VDI, Virtualization, and HPC enterprise workloads

BIOS options | BIOS values (platform default) | VDI | Virtualization | HPC

Memory

NUMA Nodes per Socket | Auto (NPS1) | NPS4 | NPS4 or NPS1* | NPS4

IOMMU | Auto (Enabled) | Auto | Auto | Disabled

Power/Performance

SMT Mode | Auto (Enabled) | Auto | Auto | Disabled*

Core Performance Boost | Auto (Enabled) | Auto | Auto | Auto

Global C-State Control | Auto (Enabled) | Disabled | Auto | Auto

L1 Stream HW Prefetcher | Auto (Enabled) | Auto | Auto | Auto

L2 Stream HW Prefetcher | Auto (Enabled) | Auto | Auto | Auto

Determinism Slider | Auto (Performance) | Power | Power | Power

Efficiency Mode Enable | Auto (Disabled) | Disabled | Auto | Auto

Processor

SMT Mode | Auto (Enabled) | Auto | Auto | Disabled

SVM Mode | Enabled | Enabled | Enabled | Disabled

ACPI SRAT L3 Cache As NUMA Domain | Auto (Disabled) | Enabled | Enabled or Auto* | Auto

APBDIS | Auto (0) | 1 | 1 | 1

Fixed SOC P-State | Auto | P0 | P0 | P0

Cisco xGMI Max Speed | Disabled | Enabled | Disabled | Enabled

* If your workloads have few vCPUs per virtual machine (that is, less than a quarter of the number of cores per socket), then the following settings tend to provide the best performance:

      NUMA NPS (nodes per socket) = 4

      LLC As NUMA turned on

If your workload virtual machines have a large number of vCPUs (that is, greater than half the number of cores per socket), then the following settings tend to provide the best performance:

      NUMA NPS (nodes per socket) = 1

      LLC As NUMA turned off

For more information, see the VMware vSphere Tuning Guide.

Operating system tuning guidance for high performance

The Microsoft Windows, VMware ESXi, Red Hat Enterprise Linux, and SUSE Linux operating systems include many power management features that are enabled by default. You therefore must tune the operating system to achieve the best performance.

For additional performance documentation, see the AMD EPYC performance tuning guides.

Linux (Red Hat and SUSE)

The CPUfreq governor defines the power characteristics of the system CPU, which in turn affects CPU performance. Each governor has its own unique behavior, purpose, and suitability in terms of workload.

The performance governor forces the CPU to use the highest possible clock frequency. This frequency is statically set and does not change. Therefore, this particular governor offers no power-savings benefit. It is suitable only for hours of heavy workload, and even then, only during times in which the CPU is rarely (or never) idle. The default governor is “ondemand,” which allows the CPU to achieve the maximum clock frequency when the system load is high and the minimum clock frequency when the system is idle. Although this setting allows the system to adjust power consumption according to system load, it does so at the expense of latency from frequency switching.

The performance governor can be set using the cpupower command:

cpupower frequency-set -g performance
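To confirm the change, the active governor and the current frequency limits can be queried with the same utility:

# Verify the active governor, driver, and frequency limits
cpupower frequency-info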

For additional information, see the following links:

    Red Hat Enterprise Linux: Set the performance CPUfreq governor.

    SUSE Enterprise Linux Server: Set the performance CPUfreq governor.

Microsoft Windows Server 2016, 2019, and 2022

For Microsoft Windows Server 2016, 2019, and 2022, the Balanced (recommended) power plan is used by default. This setting enables energy conservation, but it can cause increased latency (slower response time for some tasks) and performance problems for CPU-intensive applications. For maximum performance, set the power plan to High Performance.
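The power plan can also be changed from an elevated command prompt with the built-in powercfg utility. The GUID shown below is the standard identifier for the High Performance plan on Windows Server; list the plans first to confirm the GUID on your system.

REM List the available power plans and their GUIDs
powercfg /list

REM Activate the High Performance plan
powercfg /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c

REM Confirm the active plan
powercfg /getactivescheme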

For additional information, see the following link:

    Microsoft Windows and Hyper-V: Set the power policy to High Performance.

VMware ESXi

In VMware ESXi, host power management is designed to reduce the power consumption of ESXi hosts while they are powered on. Set the power policy to High Performance to achieve the maximum performance.
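The policy can also be changed from the ESXi Shell with esxcli, as sketched below; the advanced option path and accepted values may vary slightly between ESXi releases, and the vSphere Client remains the most common way to apply this setting.

# Show the current host CPU power policy
esxcli system settings advanced list -o /Power/CpuPolicy

# Set the policy to High Performance
esxcli system settings advanced set -o /Power/CpuPolicy -s "High Performance"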

For additional information, see the following links:

    VMware ESXi: Set the power policy to High Performance.

Conclusion

When tuning system BIOS settings for performance, you need to consider a number of processor and memory options. If best performance is your goal, be sure to choose options that optimize for performance in preference to power savings. Also experiment with other options, such as memory interleaving and SMT. Most important, assess the impact of any settings on the performance that your applications need.

For more information

For more information about the Cisco UCS C245 M6 and Cisco UCS C225 M6 Rack Servers with the AMD processor, see the following resources:

      Cisco UCS C245 M6 Rack Server:

    https://www.cisco.com/c/en/us/products/collateral/servers-unified-computing/ucs-c-series-rack-servers/nb-06-ucs-c245-m6-rack-serv-aag-cte-en.html

      Cisco UCS C225 M6 Rack Server:

    https://www.cisco.com/c/en/us/products/collateral/servers-unified-computing/ucs-c-series-rack-servers/nb-06-ucs-c225-m6-rack-serv-aag-cte-en.html

      AMD EPYC tuning guides:

    https://developer.amd.com/resources/epyc-resources/epyc-tuning-guides/

    https://documentation.suse.com/sbp/all/html/SBP-AMD-EPYC-3-SLES15SP2/index.html

    https://www.amd.com/system/files/documents/virtual-desktop-infrastructure-tuning-guide-amd-epyc7003-series-processors.pdf

    https://www.amd.com/system/files/documents/nvme-tuning-guide-amd-epyc7003-series-processors.pdf

    https://www.amd.com/system/files/documents/hadoop-tuning-guide-amd-epyc7003-processors.pdf

    https://www.amd.com/system/files/documents/high-performance-computing-tuning-guide-amd-epyc7003-series-processors.pdf

    https://www.amd.com/system/files/documents/vmware-vsphere-tuning-guide-amd-epyc7003-series-processors.pdf

    https://www.amd.com/system/files/documents/microsoft-sql-server-tuning-guide-amd-epyc7003-series-processors.pdf

    https://www.amd.com/system/files/documents/overview-amd-epyc7003-series-processors-microarchitecture.pdf

 

 

 

 
