This document describes how to troubleshoot issues with Optimal Gateway Selection (OGS). OGS is a feature that can be used in order to determine which gateway has the lowest Round Trip Time (RTT) and connect to that gateway. One can use the OGS feature in order to minimize latency for Internet traffic without user intervention. With OGS, Cisco AnyConnect Secure Mobility Client (AnyConnect) identifies and selects which secure gateway is best for connection or reconnection. OGS begins upon first connection or upon a reconnection at least four hours after the previous disconnection. More information can be found in the Administrator's guide.
How does OGS work?
A simple Internet Control Message Protocol (ICMP) ping request does not work because many Cisco Adaptive Security Appliance (ASA) firewalls are configured in order to block ICMP packets to prevent discovery. Instead, the client sends three HTTP/443 requests to each headend that appears in a merge of all profiles. These HTTP probes are referred to as OGS pings in the logs, but, as explained earlier, they are not ICMP pings. In order to ensure that a (re)connection does not take too long, OGS selects the previous gateway by default if it does not receive any OGS ping results within seven seconds. (Look for OGS ping results in the log.)
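The selection rule described above can be sketched in a few lines. This is only an illustration, not AnyConnect's actual implementation: the function name `select_gateway`, the input dictionary, and the choice to score each gateway by its lowest probe are assumptions (the lowest-of-three scoring is consistent with the log example shown in Step 3).

```python
from typing import Dict, List, Optional


def select_gateway(probe_rtts_ms: Dict[str, List[int]],
                   previous_gateway: Optional[str]) -> Optional[str]:
    """Pick the gateway with the lowest RTT, or fall back to the previous one.

    probe_rtts_ms maps each headend to the RTTs (in ms) of the probes that
    completed before the seven-second deadline. An empty mapping models the
    case where no OGS ping results arrived in time.
    """
    # Score each gateway by its best (lowest) probe; ignore gateways with no results.
    scores = {gw: min(rtts) for gw, rtts in probe_rtts_ms.items() if rtts}
    if not scores:
        # No results within the deadline: keep the previous gateway.
        return previous_gateway
    return min(scores, key=scores.get)
```

For example, `select_gateway({"gw1.example.com": [300, 280, 290], "gw2.example.com": [219, 218, 132]}, "gw1.example.com")` returns `"gw2.example.com"`, while an empty result set returns the previous gateway.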
Once the calculation is finished, the results are stored in the preferences_global file. In some cases, this data has not been stored in the file. Refer to Cisco bug ID CSCtj84626 for more details.
OGS caching works on a combination of the DNS domain and the individual DNS server IP addresses. It works as follows:
- Location A has a DNS domain of locationa.com, and two DNS server IP addresses - ip1 and ip2. Each domain/IP combination creates a cache key that points to an OGS cache entry. For example:
- locationa.com|ip1 -> ogscache1
- locationa.com|ip2 -> ogscache1
- If AnyConnect then connects to a physically-different network, the same buildup of domain/IP combinations is created and checked against the cached list. If there are any matches at all, that OGS cache value is used, and the client is still considered to be at location A.
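The cache lookup described above can be modeled as follows. This is a sketch under stated assumptions: the `domain|ip` key format mirrors the example in this section, but the function names and the use of a plain dictionary are hypothetical and do not reflect how AnyConnect stores its cache internally.

```python
from typing import Dict, Iterable, List, Optional


def cache_keys(dns_domain: str, dns_servers: Iterable[str]) -> List[str]:
    """Build one cache key per domain/DNS-server combination."""
    return [f"{dns_domain}|{ip}" for ip in dns_servers]


def lookup_location(cache: Dict[str, str],
                    dns_domain: str,
                    dns_servers: Iterable[str]) -> Optional[str]:
    """Return the cached OGS entry if any domain/IP combination matches."""
    for key in cache_keys(dns_domain, dns_servers):
        if key in cache:
            # A single matching key is enough to treat this as the same location.
            return cache[key]
    return None
```

With a cache of `{"locationa.com|ip1": "ogscache1", "locationa.com|ip2": "ogscache1"}`, a network that reports domain locationa.com and DNS servers ip2 and ip3 still resolves to `ogscache1`, because one combination matches.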
Here are some failure scenarios users might encounter:
When Connectivity to the Gateway is Lost
When OGS is used and connectivity to the gateway to which the users are connected is lost, AnyConnect connects to the servers in the backup server list and not to the next OGS host. The order of operations is as follows:
- OGS contacts only the primary servers in order to determine the optimal one.
- Once determined, the connection algorithm is:
- Attempt to connect to the optimal server.
- If that fails, try the optimal server's backup server list.
- If that fails, try each server that remains in the OGS selection list, ordered by its selection results.
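The connection order above can be expressed as a short function. This is an illustrative sketch only: `connection_order`, its parameters, and the duplicate-skipping behavior are assumptions, not a description of the client's code.

```python
from typing import Dict, List


def connection_order(ranked_servers: List[str],
                     backup_servers: Dict[str, List[str]]) -> List[str]:
    """Return the order in which servers would be attempted.

    ranked_servers: primary servers sorted best-first by their OGS results.
    backup_servers: maps a primary server to its backup server list.
    """
    if not ranked_servers:
        return []
    optimal = ranked_servers[0]
    # 1. optimal server, 2. its backup list, 3. remaining OGS candidates in rank order.
    candidates = [optimal] + backup_servers.get(optimal, []) + ranked_servers[1:]
    order: List[str] = []
    for server in candidates:
        if server not in order:  # assumption: do not retry the same server twice
            order.append(server)
    return order
```

For instance, with `ranked_servers = ["gw2", "gw1", "gw3"]` and `backup_servers = {"gw2": ["bk1", "bk2"]}`, the resulting order is gw2, bk1, bk2, gw1, gw3.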
Resume After a Suspend
In order for OGS to run after a resume, AnyConnect must have had a connection established when the machine was put to sleep. OGS after a resume is only performed after the network environment test occurs, which is meant to confirm that network connectivity is available. This test includes a DNS connectivity subtest.
However, if the DNS server drops type A requests with an IP address in the query field, as opposed to replying with "name not found" (the more common case, always encountered during tests), then Cisco bug ID CSCti20768 "DNS query of type A for IP address, should be PTR to avoid timeout" applies.
TCP Delayed-ACK Window Size Selects Incorrect Gateway
When ASA versions prior to Version 9.1(3) are used, captures on the client show a persistent delay in the SSL handshake. The client sends its ClientHello, and then the ASA sends its ServerHello. This is normally followed by a Certificate message (and an optional Certificate Request) and a ServerHelloDone message. The anomaly is two-fold:
- The ASA does not immediately send the Certificate message after the ServerHello. The client window size is 64,860 bytes, which is more than enough to hold the entire response from the ASA.
- The client does not ACK the ServerHello immediately, so the ASA retransmits the ServerHello after ~120ms, at which point the client ACKs the data. Then the Certificate message is sent. It is almost as though the client waits for more data.
This happens because of the interaction between TCP slow-start and TCP delayed-ACK. Prior to ASA Version 9.1(3), the ASA uses a slow-start window size of 1, whereas the Windows client uses a delayed-ACK value of 2. This means that the ASA only sends one data packet until it gets an ACK, but it also means that the client does not send an ACK until it receives two data packets. The ASA times out after 120ms and retransmits the ServerHello, after which the client ACKs the data and the connection continues. This behavior was changed by Cisco bug ID CSCug98113 so that the ASA uses a slow start window size of 2 by default instead of 1.
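A deliberately simplified model makes the interaction easier to see. The function below is hypothetical and ignores details such as the client's own delayed-ACK timer. It only encodes the rule described above: if the sender's initial window is smaller than the number of segments the receiver waits for before it sends an ACK, the exchange stalls until the sender's retransmission timer (about 120 ms in the captures described here) fires.

```python
def first_flight_delay_ms(initial_window: int,
                          ack_frequency: int,
                          retransmit_timeout_ms: int = 120) -> int:
    """Extra handshake delay caused by slow-start meeting delayed-ACK.

    initial_window: segments the sender transmits before it waits for an ACK.
    ack_frequency: segments the receiver waits for before it sends an ACK.
    """
    if initial_window >= ack_frequency:
        # The receiver sees enough segments to ACK immediately.
        return 0
    # The sender waits for an ACK that does not come until it retransmits.
    return retransmit_timeout_ms
```

In this model, a slow-start window of 1 against a delayed-ACK value of 2 yields a 120 ms stall, whereas a window of 2 (the default after CSCug98113) or a delayed-ACK value of 1 yields no extra delay.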
This can impact OGS calculation when:
- Different gateways run different ASA versions.
- Clients have different delayed-ACK window sizes.
In such situations, the delay introduced by the delayed-ACK could be sufficient to cause the client to select the wrong ASA. Problems can also persist whenever the slow-start window size on the ASA and the delayed-ACK value on the client differ. The workaround is to adjust the delayed-ACK window size on the Windows client:
- Start the Registry Editor.
- Identify the GUID of the interface on which you want to disable the delayed-ACK. In order to do this, navigate to:
HKEY_LOCAL_MACHINE > SOFTWARE > Microsoft > Windows NT > CurrentVersion > NetworkCards > (number).
Look at each number listed under NetworkCards. On the right-hand side, the Description should list the Interface (for example, Intel(R) Wireless WiFi Link 5100AGN) and the ServiceName should list the corresponding GUID.
- Locate and then click this registry subkey:
- On the Edit menu, point to New, and then click DWORD Value.
- Name the new value TcpAckFrequency, and assign it a value of 1.
- Quit Registry Editor.
- Restart Windows for this change to take effect.
Typical User Example
In the most common use case, a user at home runs OGS for the first time, and OGS records the DNS settings and the OGS ping results in the cache (which defaults to a 14-day timeout). When the user returns home the next evening, OGS detects the same DNS settings, finds them in the cache, and skips the OGS ping test. Later, when the user goes to a hotel or restaurant that offers Internet service, OGS detects different DNS settings, runs the OGS ping tests, selects the best gateway, and records the results in the cache.
The processing is identical when it resumes from a suspended or hibernated state, if the OGS and AnyConnect resume settings allow for it.
Step 1. Clear the OGS Cache in Order to Force a Reevaluation
In order to clear the OGS cache and reevaluate the RTT for available gateways, simply delete the Global AnyConnect Preferences file from the PC. The location of the file varies based on the Operating System (OS):
- Windows Vista and Windows 7
C:\ProgramData\Cisco\Cisco AnyConnect VPN Client\preferences_global.xml
- Windows XP
C:\Documents and Settings\AllUsers\Application Data\Cisco\Cisco AnyConnect VPN
- Mac OS X
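Deleting the preferences file can also be scripted. The helper below is a minimal sketch: the function name `clear_ogs_cache` is hypothetical, the caller supplies the path for the relevant OS, and deleting the file might require administrative privileges on some systems.

```python
from pathlib import Path


def clear_ogs_cache(preferences_file: str) -> bool:
    """Delete the global AnyConnect preferences file if it exists.

    Returns True if a file was removed, False if there was nothing to remove.
    """
    path = Path(preferences_file)
    if path.is_file():
        path.unlink()
        return True
    return False
```

On Windows Vista or Windows 7, for example, the call would be `clear_ogs_cache(r"C:\ProgramData\Cisco\Cisco AnyConnect VPN Client\preferences_global.xml")`. OGS then reevaluates the RTT on the next connection attempt.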
Step 2. Capture the Server Probes During the Connection Attempt
- Start Wireshark on the test machine.
- Start a connection attempt on AnyConnect.
- Stop the Wireshark capture once the connection is complete.
Step 3. Verify the Gateway Selected by OGS
In order to verify why OGS selected a particular gateway, complete these steps:
- Initiate a new connection.
- Run AnyConnect DART:
- Launch AnyConnect, and click Advanced.
- Click Diagnostics.
- Click Next.
- Click Next.
- Examine the DART results found in the newly created DartBundle_XXXX_XXXX.zip file on the desktop.
- Navigate to Cisco AnyConnect Secure Mobility Client > AnyConnect.txt.
- Note the time the OGS probes started for a particular server from this DART log:
Date : 10/04/2013
Time : 14:21:27
Type : Information
Source : acvpnui
Description : Function: CHeadendSelection::CSelectionThread::Run
OGS starting thread named gw2.cisco.com
Usually they should be around the same time, but in case the captures are large, the time stamp helps narrow down which packets are the HTTP probes and which ones are the actual connection attempt.
- Once AnyConnect sends three probes to the server, this message is generated with the results for each of the probes:
It is important to pay attention to these three values, because they must match the capture results.
Date : 10/04/2013
Time : 14:31:37
Type : Information
Source : acvpnui
Description : Function: CHeadendSelection::CSelectionThread::logThreadPingResults
OGS ping results for gw2.cisco.com: (219 218 132 )
- Look for the message that contains "*** OGS Selection Results ***" in order to see the evaluated RTT and whether the most recent connection attempt used a cached RTT or a new calculation.
Here is an example:
Date : 10/04/2013
Time : 12:29:38
Type : Information
Source : vpnui
Description : Function: CHeadendSelection::logPingResults
*** OGS Selection Results ***
OGS performed for connection attempt. Last server: 'gw2.cisco.com'
Results obtained from OGS cache. No ping tests were performed.
Server Address RTT (ms)
gw2.cisco.com 132 <========= As seen, 132 was the lowest delay
of the three probes from the previous DART log
Selected 'gw2.cisco.com' as the optimal server.
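When the DART log is large, the probe results can be extracted with a short script. This sketch assumes that the "OGS ping results" lines look exactly like the example in this step; the regular expression and the function name `parse_ping_results` are illustrative and might need adjustment for other AnyConnect versions.

```python
import re
from typing import Dict, List

# Matches lines such as: OGS ping results for gw2.cisco.com: (219 218 132 )
PING_RESULTS = re.compile(r"OGS ping results for (\S+): \(([\d\s]*)\)")


def parse_ping_results(log_text: str) -> Dict[str, List[int]]:
    """Map each server to the list of probe RTTs (ms) found in the log text."""
    results: Dict[str, List[int]] = {}
    for match in PING_RESULTS.finditer(log_text):
        server, values = match.groups()
        results[server] = [int(v) for v in values.split()]
    return results
```

Applied to the sample line above, the function returns `{"gw2.cisco.com": [219, 218, 132]}`, and `min()` of that list gives 132, the value reported in the selection results.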
Step 4. Validate the OGS Calculations Run by AnyConnect
Inspect the capture for the TCP/SSL probes used in order to calculate RTT. See how long the HTTPS request takes over a single TCP connection. Each probe request should use a different TCP connection. In order to do this, open the capture in Wireshark, and repeat these steps for each of the servers:
- Use the ip.addr filter in order to isolate the packets sent to each of the servers into their own capture. In order to do this, navigate to Edit, and select Mark All Displayed Packets. Then navigate to File > Save As, select the Marked packets only option, and click Save:
- In this new capture, navigate to View > Time Display Format > Date and Time of Day:
- Identify the first HTTP SYN packet in this capture that was sent when the OGS probe was sent based on the DART logs as identified in Step 3.3.2. It is important to remember that, for the first server, the first HTTP request is not a server probe. It is easy to mistake the first request for a server probe, and thus arrive at values completely different from what OGS reports. This problem is highlighted here:
- In order to more easily identify each of the probes, right-click the HTTP SYN for the first probe, and then select Colorize Conversation as shown here:
Repeat this process for the SYNs on all of the probes. As shown in the previous image, the first two probes are depicted in different colors. The advantage of colorizing the TCP conversations is to easily spot retransmissions or other such oddities per probe.
- In order to change the time display, navigate to View > Time Display Format > Seconds Since Epoch:
Select Milliseconds, because that is the level of precision that OGS uses.
- Calculate the time difference between the HTTP SYN and the FIN/ACK, as shown in the diagram of Step 4. Repeat this process for each of the three probes, and compare the values to those shown in the DART logs in Step 3.3.3.
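The time difference can be computed directly from the two timestamps that Wireshark displays in the Seconds Since Epoch format. The helper below is a sketch: the function name and the sample timestamps are hypothetical, and `Decimal` is used only to avoid floating-point rounding on large epoch values.

```python
from decimal import Decimal


def probe_rtt_ms(syn_epoch: str, fin_ack_epoch: str) -> int:
    """Milliseconds between a probe's SYN and its FIN/ACK.

    Both arguments are epoch timestamps copied as text from Wireshark.
    """
    delta_seconds = Decimal(fin_ack_epoch) - Decimal(syn_epoch)
    # Round to whole milliseconds, the precision that OGS reports.
    return int((delta_seconds * 1000).to_integral_value())
```

For example, `probe_rtt_ms("1380896497.100", "1380896497.232")` returns 132, which can then be compared with the corresponding value in the DART log.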
If the RTT values calculated from the captures match the values seen in the DART logs, but it still seems that the wrong gateway is selected, then the cause is one of two problems:
- There is an issue on the headend. If this is the case, there might be too many retransmissions from one particular headend, or any other such oddities seen in the probes. A closer analysis of the exchange is required.
- There is a problem with the Internet Service Provider (ISP). If this is the case, there might be fragmentation or large delays seen for one particular headend.
Q: Does OGS work with load-balancing?
A: Yes. OGS is only aware of the cluster master name, and uses that in order to judge the nearest headend.
Q: Does OGS work with the proxy settings defined in the browser?
A: OGS does not support automatic proxy detection or Proxy Auto-Config (PAC) files, but it does support a hard-coded proxy server. When automatic proxy detection is configured, OGS does not run, and this message is logged: "OGS will not be performed because automatic proxy detection is configured."