Cisco Catalyst SD-WAN Network Configuration Guide, Releases 26.x and Later

Per-packet load balancing

Describes how PPL improves network bandwidth and performance by distributing packets across multiple WAN links and reordering them at the destination, unlike traditional load balancing methods.


Per-packet load balancing (PPL), also called adaptive single-flow distribution, is a bandwidth aggregation feature that

  • distributes packets from a single flow across multiple WAN links,

  • maximizes the use of all available paths, and

  • maintains data integrity by reordering packets at the destination.

Traditional load balancing vs PPL

Traditional load balancing sends each data flow over a single network link, so one flow can never use more bandwidth than that link provides. PPL distributes the traffic of a single flow across multiple available links, which increases usable bandwidth and improves performance.

Reorder packets

When a device distributes packets over multiple links, packets may arrive at the destination out of order. PPL reorders these packets at the receiver on a best-effort basis and releases buffered packets when either a time threshold or a memory threshold is reached.
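The best-effort reordering described above can be illustrated with a small receive buffer. This is a minimal sketch, not Cisco's implementation: it assumes each packet carries a per-flow sequence number, and the class name `ReorderBuffer` and the parameters `max_wait` (time threshold) and `max_bytes` (memory threshold) are hypothetical. In-order packets are released immediately; out-of-order packets wait until the gap fills or one of the thresholds forces the buffer to give up on the missing packets.

```python
from dataclasses import dataclass, field


@dataclass
class ReorderBuffer:
    """Hypothetical best-effort reorder buffer keyed by an assumed sequence number."""
    max_wait: float                 # time threshold in seconds (illustrative value)
    max_bytes: int                  # memory threshold in bytes (illustrative value)
    next_seq: int = 0               # next sequence number expected in order
    pending: dict = field(default_factory=dict)  # seq -> (arrival_time, payload)
    buffered_bytes: int = 0

    def _drain_in_order(self, out: list) -> None:
        # Release every buffered packet that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            _, payload = self.pending.pop(self.next_seq)
            self.buffered_bytes -= len(payload)
            out.append(payload)
            self.next_seq += 1

    def receive(self, seq: int, payload: bytes, now: float) -> list:
        """Accept one packet and return the packets that can be released now."""
        out: list = []
        if seq < self.next_seq:
            # Arrived after its slot was skipped: release it immediately (best effort).
            out.append(payload)
            return out
        if seq in self.pending:
            return out  # duplicate packet; ignore it
        self.pending[seq] = (now, payload)
        self.buffered_bytes += len(payload)
        self._drain_in_order(out)
        out.extend(self.expire(now))
        return out

    def expire(self, now: float) -> list:
        """Stop waiting for missing packets once the time or memory threshold is exceeded."""
        out: list = []
        while self.pending:
            oldest = min(arrival for arrival, _ in self.pending.values())
            if now - oldest < self.max_wait and self.buffered_bytes <= self.max_bytes:
                break
            # Skip the gap: jump to the lowest buffered sequence number and release from there.
            self.next_seq = min(self.pending)
            self._drain_in_order(out)
        return out
```

In this sketch a device would also call `expire(now)` from a periodic timer, so that buffered packets are released even when no new packets arrive for the flow.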


Key concepts for understanding PPL

Explains how PPL efficiently distributes traffic across multiple WAN tunnels by dynamically selecting the best paths based on latency, flowlets, and network policies.

Sender and receiver

The sender is the WAN edge device that sends PPL traffic. The receiver is the WAN edge device that receives this traffic and may reorder packets if they arrive out of order.

Traffic selection

Traffic selection defines which traffic from the service VPNs uses PPL, giving you fine-grained control through network policies. You configure it with data policies applied in the "from-service" direction; any traffic that matches the policy's match clause becomes eligible for PPL.

Inter-tunnel latency measurement

PPL measures the latency of all available tunnels and uses the lowest-latency tunnel as a reference point. It compares each other tunnel against this reference and includes in traffic distribution only the tunnels that fall within a defined latency range. This method, a Cisco proprietary algorithm, helps ensure efficient and reliable path selection for your network traffic.
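The reference-and-range idea can be sketched in a few lines. Because the actual algorithm is Cisco proprietary, this is only a simplified illustration of the concept as described: the function name `eligible_tunnels`, the parameter `max_spread_ms`, and the example tunnel names are hypothetical.

```python
def eligible_tunnels(latency_ms: dict[str, float], max_spread_ms: float) -> list[str]:
    """Return tunnels whose latency is within max_spread_ms of the fastest tunnel.

    Simplified illustration only; it is not the proprietary Cisco algorithm.
    """
    if not latency_ms:
        return []
    # The lowest-latency tunnel serves as the reference point.
    reference = min(latency_ms.values())
    # Keep only tunnels whose latency difference from the reference is within range.
    return sorted(name for name, lat in latency_ms.items() if lat - reference <= max_spread_ms)
```

For example, with measured latencies of 20 ms, 25 ms, and 80 ms and a 10 ms range, only the first two tunnels would carry PPL traffic in this sketch.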

Flow splitting

Instead of distributing individual packets across tunnels, PPL groups packets into flowlets, which are small sets of consecutive packets from the same flow. PPL starts a new flowlet either after a set number of packets has been sent or when it detects a natural break (packet gap) in the traffic.

Packet gap

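Packet gap is the method PPL uses to detect natural breaks in a traffic flow. When the idle period between two bursts of packets exceeds the maximum latency difference among the available tunnels, PPL starts a new flowlet and sends it on a different tunnel. Because the gap is longer than that latency difference, packets in the new flowlet are not expected to overtake packets from the previous flowlet, which limits reordering at the receiver.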
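The two flowlet triggers (a packet-count limit and a packet gap) can be sketched as a single pass over packet arrival times. This is an illustrative sketch only: the function name `split_into_flowlets` and the parameters `max_packets` and `gap_threshold` are hypothetical, and in PPL the gap threshold would correspond to the maximum latency difference among the available tunnels.

```python
def split_into_flowlets(arrival_times: list[float], max_packets: int,
                        gap_threshold: float) -> list[list[int]]:
    """Group packet indices into flowlets (hypothetical sketch).

    A new flowlet starts when the current one already holds max_packets packets,
    or when the idle time since the previous packet exceeds gap_threshold.
    """
    flowlets: list[list[int]] = []
    last_time = None
    for index, t in enumerate(arrival_times):
        start_new = (
            not flowlets                        # first packet of the flow
            or len(flowlets[-1]) >= max_packets  # packet-count trigger
            or t - last_time > gap_threshold     # packet-gap trigger
        )
        if start_new:
            flowlets.append([])
        flowlets[-1].append(index)
        last_time = t
    return flowlets
```

For example, seven packets arriving at 0, 1, 2, 50, 51, 52, and 53 ms with `max_packets=3` and a 10 ms gap threshold produce the flowlets `[[0, 1, 2], [3, 4, 5], [6]]`.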

Candidate tunnel group (CTG)

The CTG is the group of tunnels available for PPL. Tunnels are selected first by "local colors" and then filtered using metrics such as latency and loss. A CTG can contain a maximum of eight tunnels.
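The two-stage selection can be sketched as a color filter followed by a metric filter, capped at the documented maximum of eight tunnels. The `Tunnel` fields, the function `build_ctg`, and its threshold parameters are hypothetical; so is the tie-break, because the text does not say which tunnels are kept when more than eight qualify. This sketch simply keeps the lowest-latency ones.

```python
from dataclasses import dataclass

MAX_CTG_SIZE = 8  # documented limit: at most eight tunnels per CTG


@dataclass(frozen=True)
class Tunnel:
    name: str          # hypothetical tunnel identifier
    local_color: str   # color of the local end, for example "mpls" or "biz-internet"
    latency_ms: float
    loss_pct: float


def build_ctg(tunnels: list[Tunnel], allowed_colors: set[str],
              max_latency_ms: float, max_loss_pct: float) -> list[Tunnel]:
    """Select a candidate tunnel group (illustrative thresholds, not Cisco defaults)."""
    # Stage 1: keep only tunnels whose local color is allowed.
    by_color = [t for t in tunnels if t.local_color in allowed_colors]
    # Stage 2: drop tunnels that exceed the latency or loss limits.
    healthy = [t for t in by_color
               if t.latency_ms <= max_latency_ms and t.loss_pct <= max_loss_pct]
    # Assumption: when more than eight tunnels qualify, prefer the lowest latency.
    healthy.sort(key=lambda t: t.latency_ms)
    return healthy[:MAX_CTG_SIZE]
```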

Path selection

Path selection determines which tunnel in the CTG carries each flowlet. PPL distributes flowlets across the usable tunnels in the CTG in round-robin order.
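Round-robin assignment of flowlets to CTG tunnels can be sketched with `itertools.cycle` from the Python standard library. The function name `assign_flowlets` and the string identifiers in the example are hypothetical.

```python
from itertools import cycle


def assign_flowlets(flowlets: list[str], ctg: list[str]) -> list[tuple[str, str]]:
    """Pair each flowlet with the next tunnel in the CTG, in round-robin order."""
    if not ctg:
        raise ValueError("candidate tunnel group is empty")
    tunnels = cycle(ctg)  # endlessly repeats t1, t2, t3, t1, ...
    return [(flowlet, next(tunnels)) for flowlet in flowlets]
```

With five flowlets and three tunnels, the fourth flowlet wraps back to the first tunnel, so each tunnel carries roughly the same number of flowlets.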


Sequence of events for PPL

The sender device receives traffic from service-side VPNs and checks it against the PPL traffic data policy. If the traffic matches, the sender splits the flow into flowlets and forwards them over the usable tunnels in the candidate tunnel group (CTG). The receiver device collects the flowlets, reorders packets as needed, and forwards the reordered flows into the appropriate service VPNs.

See the illustration to understand the sequence of events.


Benefits of PPL in Cisco SD-WAN

Benefits of PPL:
  • Adaptability: Adjusts quickly to changes in network conditions to keep performance steady.

  • Bandwidth and resource utilization: Combines the capacity of multiple WAN links for higher data transfer rates.

    Uses all available network paths to make optimal use of bandwidth.

  • Performance: Increases speed and efficiency for elephant flows (large, long-lived flows).

    Supports applications that need more bandwidth than a single link can provide.


Use cases for PPL

Scenario 1: Managing AI traffic between offices and data centers

AI tasks often involve sending large amounts of information from local offices to a central data center for processing. At the same time, users need quick, real-time responses from AI tools. This traffic is unpredictable and comes in sudden bursts. When all this data is forced through a single network path, it can create network congestion, leading to slow AI performance and delayed data uploads.

How does PPL help?

PPL breaks these large data streams into smaller, manageable pieces and sends them across all available network connections at once. This prevents any single connection from becoming overloaded.

Scenario 2: High-volume workload transfers and backups in enterprise environments

Enterprise environments routinely perform large-scale data transfers for activities such as workload upgrades, Windows operating system or application updates, data center backups, and disaster recovery operations. These scenarios often require moving significant amounts of data between endpoints. Such transfers can easily exceed the capacity of individual WAN circuits, especially when using multiple mid-sized links.

How does PPL help?

PPL splits large single flows into flowlets and distributes them across all available WAN circuits. This approach maximizes total usable bandwidth, keeps any single link from being overwhelmed, and helps deprioritized or non-critical traffic, such as updates and backups, complete efficiently without impacting business-critical applications.


Supported platforms for PPL

These tables outline the supported hardware models and their minimum DRAM requirements for the receiver and sender PPL roles.

Table 1. Receiver PPL supported platforms

Model              Minimum DRAM
C8500-12X4QC       32 GB
C8500-20X6C        64 GB
C8570-G2           32 GB
C8300-2N2S-6T      16 GB
C8300-2N2S-4T2X    16 GB

Table 2. Sender PPL supported platforms

Model              Minimum DRAM
C8300-2N2S-4T2X    16 GB
C8300-2N2S-6T      16 GB
C8500L-8S4X        15 GB
C8475-G2           32 GB
C8455-G2           32 GB
C8500-12X4QC       16 GB
C8500-12X          16 GB
C8550-G2           32 GB

Note

Receiver platforms can act as senders, but sender platforms cannot act as receivers.


Restrictions for PPL

Restrictions for PPL:

  • Maximum senders: A receiver supports up to 256 senders.

  • Maximum receivers: A sender supports up to 512 receivers.

  • PPL data policy: You can configure PPL data policy action only in the ‘from-service’ direction.

    Do not use packet duplication (packet dup) or forward error correction (FEC) actions with PPL.

  • SLA classes: Configure no more than 14 SLA classes so that one remains reserved for PPL. If 15 SLA classes are already configured, remove one and reload the device before configuring PPL.

  • Multicast traffic: Multicast traffic is not supported with PPL.

  • Service insertion: Service insertion is not supported with PPL.

  • Parallel routing configurations: When PPL is enabled, do not configure the sender with parallel routing decisions through data policy, such as next-hop or remote-TLOC actions.
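Several of the numeric and policy restrictions above lend themselves to a pre-deployment check. This is a hypothetical planning sketch, not a Cisco tool or API: the `PplPlan` fields and the action labels such as `"packet-dup"` and `"remote-tloc"` are invented for illustration and are not CLI keywords. It covers only a subset of the list (peer limits, SLA class count, policy direction, and conflicting actions); multicast and service insertion are not modeled.

```python
from dataclasses import dataclass, field

# Limits taken from the restrictions list above.
MAX_SENDERS_PER_RECEIVER = 256
MAX_RECEIVERS_PER_SENDER = 512
MAX_SLA_CLASSES_WITH_PPL = 14  # one SLA class stays reserved for PPL

# Hypothetical action labels; these are not actual CLI keywords.
ACTIONS_INCOMPATIBLE_WITH_PPL = {"packet-dup", "fec"}
SENDER_ONLY_INCOMPATIBLE = {"next-hop", "remote-tloc"}


@dataclass
class PplPlan:
    role: str                # "sender" or "receiver"
    peer_count: int          # receivers (for a sender) or senders (for a receiver)
    sla_class_count: int
    policy_direction: str    # direction in which the PPL data policy is applied
    other_actions: set[str] = field(default_factory=set)


def check_ppl_restrictions(plan: PplPlan) -> list[str]:
    """Return readable problems; an empty list means no modeled restriction is violated."""
    problems = []
    limit = MAX_RECEIVERS_PER_SENDER if plan.role == "sender" else MAX_SENDERS_PER_RECEIVER
    if plan.peer_count > limit:
        problems.append(f"{plan.role} has {plan.peer_count} peers; the limit is {limit}")
    if plan.sla_class_count > MAX_SLA_CLASSES_WITH_PPL:
        problems.append("more than 14 SLA classes; remove one and reload before enabling PPL")
    if plan.policy_direction != "from-service":
        problems.append("the PPL data policy must use the from-service direction")
    blocked = set(ACTIONS_INCOMPATIBLE_WITH_PPL)
    if plan.role == "sender":
        blocked |= SENDER_ONLY_INCOMPATIBLE  # parallel routing restriction applies to senders
    conflicts = sorted(plan.other_actions & blocked)
    if conflicts:
        problems.append("incompatible actions: " + ", ".join(conflicts))
    return problems
```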