This document describes the Transmission Control Protocol (TCP) Optimization feature on Cisco IOS® XE SD-WAN routers, introduced in the 16.12 release (August 2019). It covers prerequisites, the problem description, the solution, the differences in TCP optimization algorithms between Viptela OS (vEdge) and XE SD-WAN (cEdge), configuration, verification, and a list of related documents.
There are no specific requirements for this document.
The information in this document is based on Cisco IOS® XE SD-WAN.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
High latency on a WAN link between two SD-WAN sites causes poor application performance. You have critical TCP traffic that must be optimized.
When you use the TCP Optimization feature, you improve the average TCP throughput for critical TCP flows between two SD-WAN sites.
This section gives an overview of TCP Optimization and the differences between the algorithms used on cEdge and vEdge.
The XE SD-WAN implementation (cEdge) uses the Bottleneck Bandwidth and Round-trip propagation time (BBR) algorithm.
Viptela OS (vEdge) uses a different, older algorithm called CUBIC.
CUBIC mainly takes packet loss into consideration and is widely implemented across client operating systems: Windows, Linux, macOS, and Android already have CUBIC built in. In some cases, where you have old clients that run a TCP stack without CUBIC, enabling TCP optimization on vEdge brings improvements. One example where vEdge TCP CUBIC optimization has been beneficial is on submarines with old client hosts and WAN links that experience significant delays and drops. Note that only vEdge 1000 and vEdge 2000 support TCP CUBIC.
BBR focuses mainly on round-trip time and latency, not on packet loss. If you send packets from the US West Coast to the East Coast, or even to Europe, across the public internet, in the majority of cases you do not see any packet loss; the public internet generally performs well in that respect. What you do see is delay/latency, and this is the problem addressed by BBR, which was developed by Google in 2016.
In a nutshell, BBR builds a model of the network: it inspects each acknowledgment (ACK) and updates its estimates of maximum bandwidth (BW) and minimum round-trip time (RTT). Sending is then controlled based on that model: probe for max BW and min RTT, pace near the estimated BW, and keep the amount of data in flight near the bandwidth-delay product (BDP). The main goal is high throughput with a small bottleneck queue.
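As a worked illustration of the BDP target (the link numbers are arbitrary examples, not measurements from this document):

```
BDP = max BW x min RTT
    = 50 Mbit/s x 0.080 s
    = 4,000,000 bits
    = 500,000 bytes (~500 KB of data kept in flight)
```

With CUBIC, the in-flight amount instead grows until loss occurs, which on a long, clean path means first filling the bottleneck queue; BBR aims to stop near this BDP value.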
This slide from Mark Claypool shows the region where CUBIC operates:
BBR operates in a better region, shown in this slide, also from Mark Claypool:
If you want to read more about the BBR algorithm, several publications about BBR are linked at the top of the bbr-dev mailing list home page.
Platform & Algorithm      | Key input parameter
--------------------------|-----------------------------------------------
cEdge (XE SD-WAN): BBR    | Critical TCP traffic between two SD-WAN sites
vEdge (Viptela OS): CUBIC | Old clients without any TCP optimization
Supported XE SD-WAN Platforms
In the XE SD-WAN SW Release 16.12.1d, these cEdge platforms support TCP Optimization with BBR:
CSR1000v with 8 vCPU and min. 8 GB RAM
All platforms with less than 8 GB of DRAM are currently not supported.
All platforms with 4 or fewer data cores are currently not supported.
TCP Optimization does not support MTU 2000.
IPv6 traffic is currently not supported.
Optimization of Direct Internet Access (DIA) traffic to a third-party BBR server is not supported; you need cEdge SD-WAN routers on both sides.
In the data center scenario, currently only one Service Node (SN) is supported per Control Node (CN).
Note: ASR1k does not currently support TCP Optimization. However, there is a solution for the ASR1k, where it sends TCP traffic via an AppNav tunnel (GRE-encapsulated) to an external CSR1kv for optimization. Currently (Nov. 2019), only one CSR1kv as an external service node is supported. This is described later in the configuration section.
This table summarizes caveats per release and the supported hardware platforms:

- Single Edge Router: in 16.12.1, AppQoE FIA is not enabled on the internet interface; support for multiple SNs is still needed.
- Multiple Edge Routers: in 16.12.1, AppQoE FIA is not enabled on the internet interface; needs flow symmetry or AppNav flow sync, not tested with 16.12.1; a vManage enhancement is needed to accept multiple SN IPs.
- Full Mesh Network
- TCP Opt with BBR: in 16.12.1, only the 4300 & CSR; later, all platforms but the ISR1100.
A concept of SN and CN is used for TCP Optimization:
- The SN is a daemon responsible for the actual optimization of TCP flows.
- The CN, also known as the AppNav Controller, is responsible for traffic selection and transport to/from the SN.
- SN and CN can run on the same router or be separated onto different nodes.
There are two main use cases:
Branch use case with SN and CN running on the same ISR4k router.
Data Center use case, where CN runs on ASR1k and SN runs on a separate CSR1000v virtual router.
Both use cases are described in this section.
Use Case 1. Configure TCP Optimization on a Branch (all in one cEdge)
This image shows the overall internal architecture for a single standalone option at a branch:
Step 1. In order to configure TCP optimization, create a feature template for TCP Optimization in vManage. Navigate to Configuration > Templates > Feature Templates > Other Templates > AppQoE as shown in the image.
Step 2. Add the AppQoE feature template to the appropriate device template under Additional Templates:
Here is the CLI preview of the Template Configuration:
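As a rough sketch (not the exact vManage output), the AppQoE template for the all-in-one branch case renders to configuration along these lines; the VirtualPortGroup number, group names, and 192.3.3.x addresses are illustrative assumptions:

```
! CN and SN run on the same router; they communicate over a
! Virtual Port Group interface (addresses are examples)
interface VirtualPortGroup2
 ip address 192.3.3.1 255.255.255.0
 service-insertion appqoe
!
! AppNav controller (CN) side of the Virtual Port Group
service-insertion appnav-controller-group appqoe ACG-APPQOE
 appnav-controller 192.3.3.1
!
! Service node (SN) side of the Virtual Port Group
service-insertion service-node-group appqoe SNG-APPQOE
 service-node 192.3.3.2
!
! Tie CN and SN groups together and enable AppQoE
service-insertion appqoe
 appnav-controller-group ACG-APPQOE
 service-node-group SNG-APPQOE
 enable
```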
Use Case 2. Configure TCP Optimization in Data Center with an External SN
The main difference from the branch use case is the physical separation of SN and CN. In the all-in-one branch use case, connectivity is provided within the same router through a Virtual Port Group interface. In the data center use case, there is an AppNav GRE-encapsulated tunnel between the ASR1k acting as CN and the external CSR1kv running as SN. There is no need for a dedicated link or cross-connect between the CN and the external SN; simple IP reachability is enough.
There is one AppNav (GRE) tunnel per SN. For future use, when multiple SNs are supported, it is recommended to use a /28 subnet for the network between CN and SN.
Two NICs are recommended on a CSR1kv acting as SN. A second NIC for connectivity to the SD-WAN controllers is needed if the SN must be configured/managed by vManage. If the SN is configured/managed manually, the second NIC is optional.
This image shows a data center ASR1k running as CN and a CSR1kv as Service Node (SN):
The topology for the data center use case with ASR1k and external CSR1k is shown here:
This AppQoE feature template shows ASR1k configured as Controller:
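In CLI terms, the controller side of this template renders roughly as follows; the group names and 192.168.2.x addresses are assumptions for illustration, and the SN address now points at the external CSR1kv rather than a local Virtual Port Group:

```
! ASR1k as CN: local controller address on the CN<->SN network
service-insertion appnav-controller-group appqoe ACG-APPQOE
 appnav-controller 192.168.2.1
!
! External SN (CSR1kv) reachable over plain IP; one AppNav GRE
! tunnel is built per configured service node
service-insertion service-node-group appqoe SNG-APPQOE
 service-node 192.168.2.2
!
! Tie CN and SN groups together and enable AppQoE
service-insertion appqoe
 appnav-controller-group ACG-APPQOE
 service-node-group SNG-APPQOE
 enable
```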
CSR1k configured as external Service Node is shown here:
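A minimal sketch of the corresponding SN-side configuration on the external CSR1kv (group name and address are the same illustrative assumptions as on the controller side):

```
! CSR1kv as external SN: declare the service-node role and the
! local address that the CN's AppNav tunnel terminates on
service-insertion service-node-group appqoe SNG-APPQOE
 device-role service-node
 service-node 192.168.2.2
 enable
```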
This is the failover behavior in the data center use case when the external CSR1kv acting as SN fails:
TCP sessions that already exist are lost because the TCP session on SN is terminated.
New TCP sessions are sent to the final destination, but TCP traffic is not optimized (bypass).
No blackholing for interesting traffic in case of SN failure.
Failover detection is based on the AppNav heartbeat, which is sent once per second. After three or four missed heartbeats, the tunnel is declared down.
Failover in the branch use case is similar: in case of SN failure, traffic is sent unoptimized directly to the destination.
Use this section in order to confirm that your configuration works properly.
Verify TCP Optimization on the CLI with this command to see a summary of the optimized flows:
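On XE SD-WAN 16.12, this is typically the AppQoE summary command (the device prompt is illustrative; output fields vary by release, so they are not reproduced here):

```
cEdge#show sdwan appqoe tcp-opt summary
```

Per-flow details can then be inspected with the other `show sdwan appqoe` subcommands available on the release in use.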