AnyConnect Optimal Gateway Selection Troubleshoot Guide

Document ID: 116721

Updated: Dec 23, 2013

Contributed by Cisco TAC Engineers.

Introduction

This document describes how to troubleshoot issues with Optimal Gateway Selection (OGS). OGS is a feature that can be used in order to determine which gateway has the lowest Round Trip Time (RTT) and connect to that gateway. One can use the OGS feature in order to minimize latency for Internet traffic without user intervention. With OGS, Cisco AnyConnect Secure Mobility Client (AnyConnect) identifies and selects which secure gateway is best for connection or reconnection. OGS begins upon first connection or upon a reconnection at least four hours after the previous disconnection. More information can be found in the Administrator's guide.

Tip: OGS works best with the latest AnyConnect client and ASA software Version 9.1(3) or later.

How does OGS work?

A simple Internet Control Message Protocol (ICMP) ping request does not work because many Cisco Adaptive Security Appliance (ASA) firewalls are configured in order to block ICMP packets to prevent discovery. Instead, the client sends three HTTP/443 requests to each headend that appears in a merge of all profiles. These HTTP probes are referred to as OGS pings in the logs, but, as explained earlier, they are not ICMP pings. In order to ensure that a (re)connection does not take too long, OGS selects the previous gateway by default if it does not receive any OGS ping results within seven seconds. (Look for OGS ping results in the log.)

Note: AnyConnect should send an HTTP request to port 443, because what matters is that a response arrives, not that the response is successful. Unfortunately, the fix for proxy handling sends all requests as HTTPS. See Cisco bug ID CSCtg38672 - OGS should ping with HTTP requests.

Note: If there are no headends in the cache, AnyConnect first sends one HTTP request in order to determine if there is an authentication proxy, and if it can handle the request. It is only after this initial request that it begins the OGS pings in order to probe the server.

  • OGS determines the user location based on the network information, such as the Domain Name System (DNS) suffix and the DNS server IP address. The RTT results, along with this location, are stored in the OGS cache.

  • OGS location entries are cached for 14 days. Enhancement CSCtk66531 was filed to make these settings user-configurable.

  • OGS is not run again from this location until 14 days after the location entry is first cached. During this time, it uses the cached entry and the RTTs determined for that location. This means that when AnyConnect starts again, it does not perform OGS again; instead, it uses the optimal gateway order in the cache for that location. In the Diagnostic AnyConnect Reporting Tool (DART) logs, this message is seen:

    ******************************************
    Date : 10/04/2013
    Time : 14:00:44
    Type : Information
    Source : acvpnui

    Description : Function: ClientIfcBase::startAHS
    File: .\ClientIfcBase.cpp
    Line: 2785
    OGS was already performed, previous selection will be used.

    ******************************************
  • RTT is determined with a TCP exchange to the Secure Sockets Layer (SSL) port of the gateway to which the user will try to connect as specified by the host entry in the AnyConnect profile.

    Note: Unlike the HTTP-ping, which does a simple HTTP post and then displays the RTT and the result, OGS computations are slightly more complicated. AnyConnect sends three probes for each server, and calculates the delay between the HTTP SYN that it sends out and the FIN/ACK for each of these probes. It then uses the lowest of the deltas in order to compare the servers and make its selection. So, even though HTTP-pings are a fairly good indication of which server the AnyConnect will choose, they might not necessarily tally. There is more information about this in the rest of the document.

  • Currently, OGS only runs the checks if the user comes out of a suspend, and the threshold has been exceeded. OGS does not connect to a different ASA if the ASA the user is connected to crashes or becomes unavailable. OGS contacts only the primary servers in the profile in order to determine the optimal one.
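The selection rule described in this section (three probes per headend, the lowest delta per headend is compared, and the previous gateway is used when no results arrive in time) can be sketched as follows. This is only an illustration and not AnyConnect's actual code: the function name and the dictionary layout are assumptions, and the individual gw1 probe values are made up. The gw2 values are the ones shown in the DART sample in Step 3.

```python
from typing import Dict, List, Optional


def select_gateway(probe_rtts_ms: Dict[str, List[int]],
                   previous_gateway: Optional[str] = None) -> Optional[str]:
    """Pick the gateway whose best (lowest) probe RTT is the smallest.

    probe_rtts_ms maps a gateway name to the RTTs (in ms) of its probes.
    Gateways without any completed probe are ignored. If no gateway
    returned a result (for example, nothing arrived within the
    seven-second window), fall back to the previously used gateway.
    """
    best_per_gateway = {
        name: min(rtts) for name, rtts in probe_rtts_ms.items() if rtts
    }
    if not best_per_gateway:
        return previous_gateway
    return min(best_per_gateway, key=best_per_gateway.get)


results = {
    "gw1.cisco.com": [310, 302, 405],  # hypothetical probe values
    "gw2.cisco.com": [219, 218, 132],  # probe values from the DART sample
}
print(select_gateway(results))  # gw2.cisco.com
```

Because only the lowest of the three deltas is compared, a single slow probe does not penalize a headend, which is one reason why a manual HTTP-ping might not tally with the OGS result.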

OGS Cache

Once the calculation is finished, the results are stored in the preferences_global file. There have been issues with this data not being stored in the file before.

Refer to Cisco bug ID CSCtj84626 for more details.
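As a rough illustration of the 14-day cache lifetime described earlier, the check that decides whether a cached location entry can still be used might look like the sketch below. The function name is hypothetical, and the exact behavior at the 14-day boundary is an assumption; the real client logic is not documented here.

```python
from datetime import datetime, timedelta

CACHE_LIFETIME = timedelta(days=14)  # cache lifetime stated in this document


def cache_entry_is_valid(first_cached_at: datetime, now: datetime) -> bool:
    """Return True while the location entry is younger than 14 days.

    Assumption: the entry expires exactly 14 days after it was first cached.
    """
    return now - first_cached_at < CACHE_LIFETIME


cached = datetime(2013, 10, 4, 14, 0, 44)
print(cache_entry_is_valid(cached, datetime(2013, 10, 10)))  # True
print(cache_entry_is_valid(cached, datetime(2013, 10, 19)))  # False
```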

Location Determination

OGS caching works on a combination of the DNS domain and the individual DNS server IP addresses. It works as follows:

  • Location A has a DNS domain of locationa.com, and two DNS server IP addresses - ip1 and ip2. Each domain/IP combination creates a cache key that points to an OGS cache entry. For example:
    • locationa.com|ip1 -> ogscache1
    • locationa.com|ip2 -> ogscache1
  • If AnyConnect then connects to a physically-different network, the same buildup of domain/IP combinations is created and checked against the cached list. If there are any matches at all, that OGS cache value is used, and the client is still considered to be at location A.
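The key scheme described above can be sketched as follows. This is a minimal illustration that assumes the cache behaves like a dictionary keyed by "domain|ip" strings; the function names and the extra address ip9 are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def location_keys(dns_domain: str, dns_servers: Iterable[str]) -> List[str]:
    """Build one 'domain|ip' key per DNS server."""
    return [f"{dns_domain}|{ip}" for ip in dns_servers]


def lookup_location(cache: Dict[str, str], dns_domain: str,
                    dns_servers: Iterable[str]) -> Optional[str]:
    """Return the cache entry of the first matching key, or None."""
    for key in location_keys(dns_domain, dns_servers):
        if key in cache:
            return cache[key]
    return None


# Location A: both keys point to the same cache entry.
cache = {key: "ogscache1"
         for key in location_keys("locationa.com", ["ip1", "ip2"])}

# A different network that shares only one DNS server still matches.
print(lookup_location(cache, "locationa.com", ["ip2", "ip9"]))  # ogscache1
print(lookup_location(cache, "locationb.com", ["ip2"]))         # None
```

A single matching domain/IP combination is enough for the client to be treated as being at location A, which is why two physically different networks that share a DNS domain and a DNS server reuse the same cached RTTs.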

Failure Scenarios

Here are some failure scenarios users might encounter:

When Connectivity to the Gateway is Lost

When OGS is used, if connectivity to the gateway to which the users are connected is lost, then AnyConnect connects to the servers in the backup server list and not to the next OGS host. The order of operations is as follows:

  1. OGS contacts only the primary servers in order to determine the optimal one.
  2. Once determined, the connection algorithm is:
    1. Attempt to connect to the optimal server.
    2. If that fails, try the optimal server's backup server list.
    3. If that fails, try each server that remains in the OGS selection list, ordered by its selection results.
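The connection order above can be sketched as a small function. This is an illustration under assumptions, not the client's implementation: the function name, the data layout, and the backup host name gw2-backup.example.com are hypothetical. The RTT values are taken from the OGS Selection Results sample in Step 3.

```python
from typing import Dict, List


def connection_order(ogs_rtts_ms: Dict[str, int],
                     backup_servers: Dict[str, List[str]]) -> List[str]:
    """Return the servers in the order in which connections are attempted.

    1. The optimal (lowest RTT) primary server.
    2. The backup server list of that optimal server.
    3. The remaining primary servers, ordered by their OGS results.
    """
    ranked = sorted(ogs_rtts_ms, key=ogs_rtts_ms.get)
    if not ranked:
        return []
    optimal, remaining = ranked[0], ranked[1:]
    order: List[str] = []
    for server in [optimal, *backup_servers.get(optimal, []), *remaining]:
        if server not in order:  # skip duplicates across the lists
            order.append(server)
    return order


rtts = {"gw1.cisco.com": 302, "gw2.cisco.com": 132,
        "gw3.cisco.com": 506, "gw4.cisco.com": 877}
backups = {"gw2.cisco.com": ["gw2-backup.example.com"]}  # hypothetical
print(connection_order(rtts, backups))
```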

Note: When the administrator configures the backup server list, the current profile editor only allows the administrator to enter the Fully Qualified Domain Name (FQDN) for the backup server, but not the user-group as is possible for the primary server.

Cisco bug ID CSCud84778 has been filed in order to correct this. In the meantime, enter the complete URL in the host address field for the backup server, and it should work: https://<ip-address>/usergroup.

Resume After a Suspend

In order for OGS to run after a resume, AnyConnect must have had a connection established when the machine was put to sleep. OGS after a resume is only performed after the network environment test occurs, which is meant to confirm that network connectivity is available. This test includes a DNS connectivity subtest.

However, if the DNS server drops type A requests with an IP address in the query field, as opposed to replying with "name not found" (the more common case, always encountered during tests), then Cisco bug ID CSCti20768 "DNS query of type A for IP address, should be PTR to avoid timeout" applies. 

TCP Delayed-ACK Window Size Selects Incorrect Gateway

When ASA versions prior to Version 9.1(3) are used, captures on the client show a persistent delay in the SSL handshake. The client sends its ClientHello, then the ASA sends its ServerHello. This is normally followed by a Certificate message (and an optional Certificate Request) and a ServerHelloDone message. The anomaly is two-fold:

  1. The ASA does not immediately send the Certificate message after the ServerHello. The client window size is 64,860 bytes, which is more than enough to hold the entire response from the ASA.

  2. The client does not ACK the ServerHello immediately, so the ASA retransmits the ServerHello after ~120ms, at which point the client ACKs the data. Then the Certificate message is sent. It is almost as though the client waits for more data.

This happens because of the interaction between TCP slow-start and TCP delayed-ACK. Prior to ASA Version 9.1(3), the ASA uses a slow-start window size of 1, whereas the Windows client uses a delayed-ACK value of 2. This means that the ASA only sends one data packet until it gets an ACK, but it also means that the client does not send an ACK until it receives two data packets. The ASA times out after 120ms and retransmits the ServerHello, after which the client ACKs the data and the connection continues. This behavior was changed by Cisco bug ID CSCug98113 so that the ASA uses a slow-start window size of 2 by default instead of 1.

This can impact OGS calculation when:

  • Different gateways run different ASA versions.
  • Clients have different delayed-ACK window sizes.

In such situations, the delay introduced by the delayed-ACK could be sufficient to cause the client to select the wrong ASA. If this value differs between the client and the ASA, there could still be problems. In such situations, the workaround is to adjust the Delayed Acknowledgements window size.
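A deliberately simplified model shows how this delay can skew the selection. The sketch assumes that a probe is stalled once, for roughly the 120 ms retransmit delay cited above, whenever the ASA slow-start window is smaller than the client delayed-ACK value. The function name and the network RTT values are hypothetical.

```python
RETRANSMIT_DELAY_MS = 120  # approximate ASA retransmit delay cited above


def observed_rtt_ms(network_rtt_ms: int, asa_slow_start_window: int,
                    client_ack_frequency: int) -> int:
    """Model the RTT that OGS would measure for one probe.

    Assumption: one stall of RETRANSMIT_DELAY_MS is added when the ASA
    sends fewer packets than the client waits for before it ACKs.
    """
    stalled = asa_slow_start_window < client_ack_frequency
    return network_rtt_ms + (RETRANSMIT_DELAY_MS if stalled else 0)


# The nearer gateway runs a pre-9.1(3) ASA (window 1); the farther one
# runs 9.1(3) or later (window 2). The client delays ACKs (value 2).
near = observed_rtt_ms(50, asa_slow_start_window=1, client_ack_frequency=2)
far = observed_rtt_ms(90, asa_slow_start_window=2, client_ack_frequency=2)
print(near, far)  # 170 90: the farther gateway appears faster

# With the delayed-ACK value set to 1 on the client, the stall disappears.
print(observed_rtt_ms(50, asa_slow_start_window=1, client_ack_frequency=1))  # 50
```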

Windows

  1. Start the Registry Editor.

  2. Identify the GUID of the interface on which you want to disable the delayed-ACK. In order to do this, navigate to:
    HKEY_LOCAL_MACHINE > SOFTWARE > Microsoft > Windows NT > CurrentVersion > NetworkCards > (number).
    Look at each number listed under NetworkCards. On the right-hand side, the Description should list the Interface (for example, Intel(R) Wireless WiFi Link 5100AGN) and the ServiceName should list the corresponding GUID.

  3. Locate and then click this registry subkey:
    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\<Interface GUID>

  4. On the Edit menu, point to New, and then click DWORD Value.

  5. Name the new value TcpAckFrequency, and assign it a value of 1.

  6. Quit Registry Editor.

  7. Restart Windows for this change to take effect.
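For administrators who prefer to script this change, the sketch below builds the equivalent reg.exe command for the registry value described in steps 3 through 5. It only constructs and prints the command; the function name and the interface GUID shown are hypothetical placeholders, and the real GUID must come from step 2.

```python
INTERFACES_KEY = (r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip"
                  r"\Parameters\Interfaces")


def tcp_ack_frequency_command(interface_guid: str) -> list:
    """Build the reg.exe arguments that set TcpAckFrequency to 1."""
    key = INTERFACES_KEY + "\\" + interface_guid
    return [
        "reg", "add", key,
        "/v", "TcpAckFrequency",  # value name from step 5
        "/t", "REG_DWORD",        # DWORD value, as in step 4
        "/d", "1",                # ACK every segment (no delayed-ACK)
        "/f",                     # overwrite without a prompt
    ]


# Placeholder GUID for illustration only.
command = tcp_ack_frequency_command("{01234567-89AB-CDEF-0123-456789ABCDEF}")
print(" ".join(command))
```

On a Windows machine, the printed command can be run from an elevated command prompt, or the list can be passed to subprocess.run. A restart is still required, as noted in step 7.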

Note: Enhancement request CSCum19065 has been filed to make TCP tuning parameters configurable on the ASA.

Typical User Example

In the most common use case, a user at home runs OGS for the first time, and OGS records the DNS settings and the OGS ping results in the cache (which defaults to a 14-day timeout). When the user returns home the next evening, OGS detects the same DNS settings, finds them in the cache, and skips the OGS ping test. Later, when the user goes to a hotel or restaurant that offers Internet service, OGS detects different DNS settings, runs the OGS ping tests, selects the best gateway, and records the results in the cache.

The processing is identical when it resumes from a suspended or hibernated state, if the OGS and AnyConnect resume settings allow for it.

Troubleshoot OGS

Step 1. Clear the OGS Cache in Order to Force a Reevaluation

In order to clear the OGS cache and reevaluate the RTT for available gateways, simply delete the Global AnyConnect Preferences file from the PC. The location of the file varies based on the Operating System (OS):

  • Windows Vista and Windows 7
    C:\ProgramData\Cisco\Cisco AnyConnect VPN Client\preferences_global.xml
  • Windows XP
    C:\Documents and Settings\All Users\Application Data\Cisco\Cisco AnyConnect VPN Client\preferences_global.xml
  • Mac OS X
    /opt/cisco/vpn/.anyconnect_global
  • Linux
    /opt/cisco/vpn/.anyconnect_global
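The file locations listed above can be wrapped in a small helper in order to clear the cache from a script. This is a sketch under assumptions: the function names are hypothetical, the OS is detected with Python's platform module, and the Windows XP path uses the standard "All Users" profile folder name (spelled with a space). The script needs sufficient privileges to delete the file.

```python
import platform
from pathlib import Path

WINDOWS_XP_PATH = (r"C:\Documents and Settings\All Users\Application Data"
                   r"\Cisco\Cisco AnyConnect VPN Client\preferences_global.xml")
WINDOWS_PATH = (r"C:\ProgramData\Cisco\Cisco AnyConnect VPN Client"
                r"\preferences_global.xml")
UNIX_PATH = "/opt/cisco/vpn/.anyconnect_global"  # Mac OS X and Linux


def preferences_path(system: str, release: str = "") -> str:
    """Map platform.system() / platform.release() values to the file path."""
    if system == "Windows":
        return WINDOWS_XP_PATH if release == "XP" else WINDOWS_PATH
    if system in ("Darwin", "Linux"):
        return UNIX_PATH
    raise ValueError(f"unsupported operating system: {system}")


def clear_ogs_cache(path: str) -> bool:
    """Delete the preferences file; return True if a file was removed."""
    target = Path(path)
    if target.is_file():
        target.unlink()
        return True
    return False


if __name__ == "__main__":
    path = preferences_path(platform.system(), platform.release())
    print(path)  # call clear_ogs_cache(path) to actually delete the file
```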

Step 2. Capture the Server Probes During the Connection Attempt

  1. Start Wireshark on the test machine.
  2. Start a connection attempt on AnyConnect.
  3. Stop the Wireshark capture once the connection is complete.

    Tip: Since the capture is only used in order to test OGS, it is best to stop the capture as soon as AnyConnect selects a gateway. It is best to not go through a complete connection attempt, because that can cloud the packet capture.

Step 3. Verify the Gateway Selected by OGS

In order to verify why OGS selected a particular gateway, complete these steps:

  1. Initiate a new connection.
  2. Run AnyConnect DART:
    1. Launch AnyConnect, and click Advanced.
    2. Click Diagnostics.
    3. Click Next.
    4. Click Next.


  3. Examine the DART results found in the newly created DartBundle_XXXX_XXXX.zip file on the desktop.
    1. Navigate to Cisco AnyConnect Secure Mobility Client > AnyConnect.txt.

    2. Note the time the OGS probes started for a particular server from this DART log:
      ******************************************

      Date : 10/04/2013
      Time : 14:21:27
      Type : Information
      Source : acvpnui

      Description : Function: CHeadendSelection::CSelectionThread::Run
      File: .\AHS\HeadendSelection.cpp
      Line: 928
      OGS starting thread named gw2.cisco.com

      ******************************************



      The probe start times in the log and in the capture are usually close together, but when the captures are large, the time stamp helps narrow down which packets are the HTTP probes and which ones are the actual connection attempt.

    3. Once AnyConnect sends three probes to the server, this message is generated with the results for each of the probes:
      ******************************************

      Date : 10/04/2013
      Time : 14:31:37
      Type : Information
      Source : acvpnui

      Description : Function: CHeadendSelection::CSelectionThread::logThreadPingResults
      File: .\AHS\HeadendSelection.cpp
      Line: 1137
      OGS ping results for gw2.cisco.com: (219 218 132 )

      ******************************************
      It is important to pay attention to these three values, because they must match the capture results.

    4. Look for the message that contains "*** OGS Selection Results***" in order to see the evaluated RTT, and if the most recent connection attempt was the result of a cached RTT or a new calculation.

      Here is an example:
      ******************************************

      Date        : 10/04/2013
      Time        : 12:29:38
      Type        : Information
      Source      : vpnui

      Description : Function: CHeadendSelection::logPingResults
      File: .\AHS\HeadendSelection.cpp
      Line: 589
      *** OGS Selection Results ***
      OGS performed for connection attempt. Last server: 'gw2.cisco.com'

      Results obtained from OGS cache. No ping tests were performed.

      Server Address     RTT (ms)
      gw1.cisco.com     302
      gw2.cisco.com     132 <========= As seen, 132 was the lowest delay of the three probes from the previous DART log
      gw3.cisco.com     506
      gw4.cisco.com     877


      Selected 'gw2.cisco.com' as the optimal server.

      ******************************************
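When the DART log is large, the probe results can be pulled out with a short script instead of by hand. The sketch below assumes the "OGS ping results for ..." line format shown in the sample above; the function name is hypothetical, and other AnyConnect versions might word the message differently.

```python
import re
from typing import Dict, List

# Matches lines such as: OGS ping results for gw2.cisco.com: (219 218 132 )
PING_RESULTS = re.compile(r"OGS ping results for (\S+): \(([\d\s]*)\)")


def parse_ping_results(log_text: str) -> Dict[str, List[int]]:
    """Return the probe RTTs (in ms) per server found in the log text."""
    results: Dict[str, List[int]] = {}
    for host, values in PING_RESULTS.findall(log_text):
        results[host] = [int(value) for value in values.split()]
    return results


sample = "OGS ping results for gw2.cisco.com: (219 218 132 )"
parsed = parse_ping_results(sample)
print(parsed)                        # {'gw2.cisco.com': [219, 218, 132]}
print(min(parsed["gw2.cisco.com"]))  # 132
```

The lowest value per server is the one that should appear in the RTT column of the OGS Selection Results message.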

Step 4. Validate the OGS Calculations Run by AnyConnect 

Inspect the capture for the TCP/SSL probes used in order to calculate RTT. See how long the HTTPS request takes over a single TCP connection. Each probe request should use a different TCP connection. In order to do this, open the capture in Wireshark, and repeat these steps for each of the servers:

  1. Use the ip.addr filter in order to isolate the packets sent to each of the servers into their own capture. In order to do this, navigate to Edit, and select Mark All Displayed Packets. Then navigate to File > Save As, select the Marked packets only option, and click Save:



  2. In this new capture, navigate to View > Time Display Format > Date and Time of Day:



  3. Identify the first HTTP SYN packet in this capture that was sent when the OGS probe was sent based on the DART logs as identified in Step 3.3.2. It is important to remember that, for the first server, the first HTTP request is not a server probe. It is easy to mistake the first request for a server probe, and thus arrive at values completely different from what OGS reports. This problem is highlighted here:



  4. In order to more easily identify each of the probes, right-click the HTTP SYN for the first probe, and then select Colorize Conversation as shown here:



    Repeat this process for the SYNs on all of the probes. As shown in the previous image, the first two probes are depicted in different colors. The advantage of colorizing the TCP conversations is to easily spot retransmissions or other such oddities per probe.

  5. In order to change the time display, navigate to View > Time Display Format > Seconds Since Epoch:



    Select Milliseconds, because that is the level of precision that OGS uses.

  6. Calculate the time difference between the HTTP SYN and the FIN/ACK, as shown in the diagram of Step 4. Repeat this process for each of the three probes, and compare the values to those shown in the DART logs in Step 3.3.3.
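The arithmetic in step 6 can be sketched as follows. The epoch time stamps are hypothetical capture values, the function names are assumptions, and the small tolerance used for the comparison is also an assumption, added in order to absorb rounding between the capture and the log.

```python
from typing import List


def probe_duration_ms(syn_epoch_s: float, fin_ack_epoch_s: float) -> int:
    """Time between the SYN and the FIN/ACK of one probe, in milliseconds."""
    return round((fin_ack_epoch_s - syn_epoch_s) * 1000)


def matches_dart(capture_ms: List[int], dart_ms: List[int],
                 tolerance_ms: int = 2) -> bool:
    """Compare the capture-derived values with the DART log values."""
    if len(capture_ms) != len(dart_ms):
        return False
    return all(abs(c - d) <= tolerance_ms
               for c, d in zip(sorted(capture_ms), sorted(dart_ms)))


# Hypothetical (SYN, FIN/ACK) time stamps in seconds since epoch.
probes = [(1380896487.500, 1380896487.719),
          (1380896487.800, 1380896488.018),
          (1380896488.100, 1380896488.232)]
durations = [probe_duration_ms(syn, fin) for syn, fin in probes]
print(durations)                                 # [219, 218, 132]
print(matches_dart(durations, [219, 218, 132]))  # True
```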

Analysis

If the RTT values calculated from the captures match the values seen in the DART logs, but it still seems like the wrong gateway is selected, then the cause is one of two problems:

  • There is an issue on the headend. If this is the case, there might be too many retransmissions from one particular headend, or any other such oddities seen in the probes. A closer analysis of the exchange is required.

  • There is a problem with the Internet Service Provider (ISP). If this is the case, there might be fragmentation or large delays seen for one particular headend.

Q&A

Q: Does OGS work with load-balancing?

A: Yes. OGS is only aware of the cluster master name, and uses that in order to judge the nearest headend.

Q: Does OGS work with the proxy settings defined in the browser?

A: OGS supports a hard-coded proxy server, but it does not support auto proxy or Proxy Auto-Config (PAC) files. When automatic proxy detection is configured, OGS is not performed, and this message is logged: "OGS will not be performed because automatic proxy detection is configured."
