
Cisco Aironet 1130 AG Series

WGB Roaming: Internal Details and Configuration

Document ID: 113198

Updated: Aug 26, 2011


Introduction

Cisco Workgroup Bridge (WGB) is a very useful tool for the design and deployment of a wireless network because it allows non-wireless devices to gain mobility. A WGB involves many details on roaming, security, access, and so on, that affect deployment scenarios depending on your needs.

In code versions 12.4(25d)JA and later, Cisco introduced a set of commands and changes in order to optimize the use of WGB in high speed roaming environments.

This document covers different aspects of how a WGB works, including roaming algorithm decision points, and how to configure it for the intended usage model.

Prerequisites

Requirements

Cisco recommends that you have knowledge of these topics:

  • Cisco Wireless LAN solution

  • Cisco Workgroup Bridge

Components Used

This document is not restricted to specific software and hardware versions.

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

Conventions

Refer to Cisco Technical Tips Conventions for more information on document conventions.

What is a Work Group Bridge?

A WGB is basically an access point (AP) configured to act as a wireless client towards an infrastructure, and to provide Layer 2 connectivity for the devices connected to its Ethernet interface.

A typical WGB deployment has these components:

  • WGB device, normally with at least one radio and one Ethernet interface

  • A wireless infrastructure, normally called root AP, which can be either Autonomous or Unified.

  • One or more wired client devices connected to the WGB. This document does not cover mixed role scenarios (one radio as WGB, one radio as root on same AP).

There are three main types of WGB:

  • Cisco WGB: Cisco WGB is any Cisco IOS® - based AP configured as WGB (1130, 1240, 1250, etc). This mode uses the IAPP protocol to inform the network infrastructure of the devices that the WGB has learned on its Ethernet interface. In this case, the Wireless LAN Controller (WLC) or root AP has Layer 2 visibility of the devices "hanging" from the WGB.

  • Non Cisco WGB: This is a third party device acting as a WGB, connecting one or more wired devices to the wireless infrastructure. These do not support IAPP, and either allow only a single wired device, or provide a MAC address translation mechanism, hiding all their wired clients behind a single 802.11 MAC address. These types of devices need special handling on Address Resolution Protocol (ARP) and DHCP frames if the infrastructure is a WLC due to the security checks and frame handling done on controllers.

  • Cisco AP configured as "Universal WGB": This is a mode that suppresses the IAPP mechanism, so the WGB can be used towards either a Cisco infrastructure or third party root APs. In this case, the WGB takes the address of its Ethernet client, limiting the number of devices behind it to one.

The next section focuses on the scenario of a Cisco WGB used either towards autonomous or WLC infrastructure.

Usage Scenarios

Typical WGB use examples include:

  • Connecting a wired printer to the network

  • Different manufacturing deployments, where it is not feasible or practical to run a cable to the wired device

  • In-vehicle deployments, where the WGB provides connectivity from a car, metro train, etc, to an outdoor wireless network

  • Wired cameras

Each example has its own requirements in terms of:

  • Bandwidth needed to support the application that will run on top of the wireless infrastructure

  • Roaming delay tolerance - How long does it take for the WGB to move from the current AP to the next one while the device is moving?

  • Forwarding time tolerance - How many frames are lost on each roam?

A printer does not move much, so roaming requirements are lower. A train-mounted WGB, on the other hand, needs fine tuning of the roaming component in order to ensure correct behavior while it is moving around.

A video stream can have a large bandwidth requirement, so it needs high wireless data rates. However, a telemetry application might only need a few frames from time to time.

It is important that the requirements are properly defined from the beginning, as they affect not only the configuration of the WGB, but also how the wireless infrastructure has to be designed. For example, AP placement, distance, power levels, enabled rates, etc, all affect roaming characteristics. Therefore, all are a crucial point if high speed roaming is needed.

In general, you must know these details:

  • What is the needed bandwidth for the application?

  • What is the roaming delay tolerance?

  • Can the application properly handle network disconnections? Is there an additional backup mechanism?

  • Can the application handle packet loss properly? (Even on the best wireless design, you must expect a percentage of packet loss.)

This document does not address the details on how to design a RF environment for high speed roaming/outdoor. Refer to the Outdoor Mesh deployment guide.

Roaming

For a wireless device, roaming is a very critical part of its functionality.

Basically, roaming means the capability to go from one AP to another, both belonging to the same wireless infrastructure.

As roaming needs a change from the current AP to the next, there is a resultant disconnection or time without service. This disconnection can be short, for example, less than 200 ms on voice deployments, or much longer, even seconds, if the security policy enforces a full authentication on each roam event.

Roaming is needed so the device can find a new parent with hopefully better signal, and it can continue to access the network infrastructure properly. At the same time, too many roams can cause multiple disconnections or time without service, which affects access. It is important for a mobile device, such as a WGB, to have a good roaming algorithm with enough configuration capabilities to adapt to different RF environments and data needs.

Elements of Roaming

  • Triggers: Each client implementation has one or more triggers or events that, when met, cause the device to move to another parent AP. Examples: beacon loss (the device no longer hears the regular beacons from the AP), packet retries, signal level, no data received, deauthentication frame received, low data rate in use, etc. The possible triggers can differ from one client implementation to another because they are not fully standardized. Simpler devices might have a poor trigger set, which causes bad roams (sticky clients) or unnecessary roams. The WGB supports all of the triggers listed here.

  • Scan time: The wireless device (WGB) spends some time searching for potential parents. This normally implies going to different channels and either actively probing or passively listening for APs. While the radio scans, the WGB is not forwarding data. From this scan time, the WGB can build a valid set of parents that can be roamed to.

  • Parent selection: After scan time, the WGB can check the potential parents, select the best one and trigger the association/authentication process. Sometimes, the decision point can be to remain on the current parent if there is not a significant benefit from a roaming event (remember that roaming too much can be bad).

  • Association/Authentication: The WGB proceeds to associate to the new AP, which normally covers both 802.11 authentication and association phases, plus completing the security policy configured on the SSID (WPA 2-PSK, CCKM, None, etc.).

  • Traffic Forwarding Restore: After roaming, the WGB informs the network infrastructure of its known wired clients through IAPP updates. After this point, the traffic between the wired clients and the network resumes.
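
The elements above can be read as a simple budget: once a trigger fires, the wired clients are without service until scanning, association/authentication, and forwarding restore have all completed. The following is a minimal sketch of that arithmetic. The class name, field names, and the sample durations are hypothetical and are not measurements from this document; parent selection is assumed to take negligible time.

```python
from dataclasses import dataclass


@dataclass
class RoamEvent:
    """Durations, in milliseconds, of the phases that follow a roaming trigger."""
    scan_ms: float                # radio is off-channel searching for parents
    auth_ms: float                # 802.11 auth/association plus the security policy
    forwarding_restore_ms: float  # until IAPP updates are processed by the parent

    def outage_ms(self) -> float:
        # Service is interrupted from the trigger until forwarding resumes.
        return self.scan_ms + self.auth_ms + self.forwarding_restore_ms


# Hypothetical figures, chosen only to exercise the arithmetic.
example = RoamEvent(scan_ms=39.0, auth_ms=25.0, forwarding_restore_ms=10.0)
print(example.outage_ms())  # 74.0
```

Comparing this total against the roaming delay tolerance of the application is one way to decide which phase deserves tuning first.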

Configuration Guide - Security policies

One important aspect of roaming on mobile devices is the security policy that will be implemented on the infrastructure. There are several options, each one with good and bad points. These are the most important ones:

  • Open: Basically no security. This is the fastest and simplest of all policies. Its main problem is that it neither restricts unauthorized access to the infrastructure nor protects against attacks, which limits its usage to very specific scenarios. For example, mines where no external attacks are possible due to the sheer nature of the deployment.

  • MAC address authentication: Basically the same level of security as open, as MAC address spoofing is a trivial attack. Not recommended due to the added time to complete the MAC validation, which slows down roaming.

  • WPA2-PSK: Offers a good level of encryption (AES-CCMP), but authentication security depends on the quality of the pre-shared key. For security, a random password of at least 12 characters is recommended. As with any pre-shared key method, the key is used on multiple devices, so if the key is compromised, it must be changed on all equipment. The roaming speed is acceptable, as it is done in 6 frame exchanges, and you can calculate the upper and lower time bounds for it to complete because it does not involve any external equipment (no RADIUS server, etc). In general, this method is the preferred one after balancing problems and benefits.

  • WPA2 with 802.1x: This improves on the previous method by using a per device/user credential, which can be individually changed. The main problem is that this method does not roam well when the device is moving fast or when short roaming times are needed. In general, this uses the same 6 frames plus the EAP exchange, which takes 4 or more frames depending on the EAP type selected and the certificate sizes. Normally, this takes between 10 and 20 frames, plus the added delay of RADIUS server processing.

  • WPA2+CCKM: This mechanism offers good protection. It uses 802.1x for the initial authentication, then does a quick exchange of just 2 frames on each roam event, which gives a very quick roaming time. The main problem is that in case of a failed roam, it falls back to 802.1x, and starts using CCKM again only after it authenticates. If the application on top of the WGB can tolerate an occasional long roaming time in case of problems, CCKM can be the best option versus PSK.
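
To compare the policies at a glance, the frame counts quoted in the list can be turned into a rough exchange-time range. This is only a sketch: the per-frame time is a hypothetical parameter that must be measured in your own RF environment, and the result excludes RADIUS server processing and any retransmissions.

```python
# Frames exchanged per roam (low, high), as quoted in the list above.
FRAMES_PER_ROAM = {
    "WPA2-PSK": (6, 6),
    "WPA2 with 802.1x": (10, 20),
    "WPA2+CCKM": (2, 2),
}


def exchange_time_ms(method: str, per_frame_ms: float) -> tuple:
    """Return (low, high) airtime estimate for the security exchange of one roam.

    per_frame_ms is an assumed average time per frame; it is not a value
    taken from this document.
    """
    low, high = FRAMES_PER_ROAM[method]
    return (low * per_frame_ms, high * per_frame_ms)


for name in FRAMES_PER_ROAM:
    print(name, exchange_time_ms(name, per_frame_ms=2.0))
```

With the assumed 2 ms per frame, CCKM needs about a third of the PSK exchange time, while a full 802.1x authentication can take several times longer than PSK even before RADIUS delays are added.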

This document does not cover not-recommended technologies that have security issues such as LEAP, WPA-TKIP, WEP, etc.

Configuring WPA2-PSK

On the WGB, this is fairly simple to configure. You need SSID definition and the proper encryption on the radio.

dot11 ssid wgbpsk
vlan 32
authentication open 
authentication key-management wpa version 2
wpa-psk ascii YourReallySecurePSK!
no ids mfp client
 
interface Dot11Radio0
ssid wgbpsk
encryption mode ciphers aes-ccm
station-role workgroup-bridge

Your SSID name and pre-shared key have to match your network infrastructure.

Configuring WPA2 with 802.1x

This basically builds on top of the previous configuration, with the addition of EAP profiles and the authentication method:

dot11 ssid wlan1
authentication open eap eap 
authentication network-eap eap 
authentication key-management wpa version 2
dot1x credentials wgb
dot1x eap profile eapfast
no ids mfp client
eap profile eapfast 

!--- This covers the EAP method type used on your network.

method fast
!
!     
dot1x credentials wgb 

!--- This is your WGB username/password.

username cisco
password 7 1511021F0725  
 
interface Dot11Radio0
encryption mode ciphers aes-ccm 
ssid wlan1

Configuring WPA2 with CCKM

This is only one step on top of WPA2 with 802.1x, with just one minor change: use the cckm keyword for key management in the SSID configuration. This assumes the WLAN is configured for CCKM only on the WLC side:

dot11 ssid wlan1
authentication open eap eap 
authentication network-eap eap 
authentication key-management cckm
dot1x credentials wgb
dot1x eap profile eapfast
no ids mfp client

Validation of the method used

A quick check on the WGB can report the encryption and key management in use, for example, in CCKM:

wgb-1260#sh dot11 associations al
Address           : 0024.97f2.75a0     Name             : lap1140-etsi-1
IP Address        : 192.168.40.10      Interface        : Dot11Radio 0
Device            : LWAPP-Parent      Software Version : NONE 
CCX Version       : 5                  Client MFP       : Off

State             : EAP-Assoc          Parent           : -                  
SSID              : wlan1                           
VLAN              : 0
Hops to Infra     : 0                  Association Id   : 1
Tunnel Address    : 0.0.0.0
Key Mgmt type     : CCKM               Encryption       : AES-CCMP
 
Current Rate      : m7.-               Capability       : WMM ShortHdr ShortSlot
Supported Rates   : 48.0 54.0 m0. m1. m2. m3. m4. m5. m6. m7.
Voice Rates       : disabled           Bandwidth        : 20 MHz 
Signal Strength   : -59  dBm           Connected for    : 72 seconds
Signal to Noise   : 41  dB            Activity Timeout : 8 seconds
Power-save        : Off                Last Activity    : 7 seconds ago
Apsd DE AC(s)     : NONE

Packets Input     : 12064              Packets Output   : 136       
Bytes Input       : 2892798            Bytes Output     : 19514     
Duplicates Rcvd   : 87                 Data Retries     : 8         
Decrypt Failed    : 0                  RTS Retries      : 0         
MIC Failed        : 0                  MIC Missing      : 0         
Packets Redirected: 0                  Redirect Filtered: 0 

Configuring Roaming

On the WGB, you can modify several parameters that affect the roaming algorithm.

Packet retries

By default, the WGB retransmits a frame up to 64 times. If the frame is not properly acknowledged (ACK) by the parent, the WGB assumes that the parent is no longer valid and starts a scan/roaming process. Think of this as an "async" roaming trigger because it can fire at any moment that a transmission fails.

The command to configure this goes inside the dot11 interface, and it takes these options:

packet retries NUM [drop]

NUM: A value between 1 and 128, with a default of 64. A good number for a quick roaming trigger is usually 32. Using a lower number is not advisable in most RF environments.

drop: If not present, the WGB starts a roaming event when maximum retries are reached. When present, the WGB does not start a new roaming event and relies on other triggers, such as beacon loss and signal.
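
When the same setting is pushed to many WGBs from a script, it helps to validate the value before generating the command. The helper below is a hypothetical sketch (the function name is not part of any Cisco tool); it only encodes the syntax and the 1 to 128 range described above.

```python
def packet_retries_command(num: int = 64, drop: bool = False) -> str:
    """Build the interface-level command 'packet retries NUM [drop]'."""
    if not 1 <= num <= 128:
        raise ValueError("NUM must be between 1 and 128")
    # The optional 'drop' keyword disables roaming on retry exhaustion.
    return f"packet retries {num}" + (" drop" if drop else "")


print(packet_retries_command(32))             # packet retries 32
print(packet_retries_command(32, drop=True))  # packet retries 32 drop
```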

RSSI Monitoring

WGB can implement a pro-active signal scan for the current parent and start a new roaming process when the signal falls below an expected level.

This process takes two parameters:

  • A timer, which wakes up the check process every X seconds

  • RSSI level, which is used to start a roaming process if the current signal is below it.

For example:

interface Dot11Radio0
mobile station period 4 threshold 75

The period should not be lower than the time the WGB takes to complete an authentication process, in order to prevent a "roaming loop" in some conditions or to avoid too aggressive a roaming behavior. In general, it should be tested to see what accommodates the application needs.

For PSK, it can be lower than for EAP-based methods (typical values are 2 and 4 seconds, respectively, for very aggressive applications).

The RSSI level is expressed as a positive integer, although it is basically a normal measured level in -dBm. You should use a slightly higher number than the minimum needed to keep your data rate working properly. For example, if your desired minimum rate is 6 Mbps, a threshold RSSI of -87 dBm should be sufficient. For 48 Mbps, you need -70 dBm, etc.
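
Because the threshold is entered as a positive integer while RF surveys report negative dBm values, the sign conversion is an easy place to make a mistake. The following sketch shows the conversion and builds the command in the form used in the example above. The function names are hypothetical helpers, not part of IOS.

```python
def rssi_threshold(min_rssi_dbm: int) -> int:
    """Convert a measured level such as -70 dBm into the positive value the command expects."""
    if min_rssi_dbm >= 0:
        raise ValueError("expected a negative dBm level, for example -70")
    return -min_rssi_dbm


def mobile_station_period(period_s: int, min_rssi_dbm: int) -> str:
    """Build 'mobile station period X threshold Y' from a period in seconds and a dBm level."""
    return f"mobile station period {period_s} threshold {rssi_threshold(min_rssi_dbm)}"


print(mobile_station_period(4, -75))  # mobile station period 4 threshold 75
```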

Note: This command can also trigger a "roaming by data rate change", which is too aggressive. It must be used together with the minimum-rate command for good results.

Minimum Data Rate

Starting with 12.4(25d)JA, Cisco added a configurable parameter to control when the WGB should trigger a new roaming event if the current data rate to the parent is below a given value.

This is helpful to ensure a desired lower bound on speed is kept in order to support video or voice applications.

Before this command was available, the WGB triggered a roaming frequently when the rate was found to be lower than the previous time. Basically on time X+1, if the rate was lower than previous X time, the WGB started a roaming process. On the logs you would see these messages:

*Mar  1 00:36:43.490: %DOT11-4-UPLINK_DOWN: Interface Dot11Radio1, parent lost: Had to lower data rate

This is too aggressive, and normally the only solution was to configure a single data rate both on the WGB and on the parent APs.

Now, the recommended way is to always configure this command, whenever a mobile station period command is used:

interface Dot11Radio0
mobile station minimum-rate 2.0

With this, a new roaming process is only triggered if the current rate is lower than the configured value. This reduces unnecessary roams and helps keep an expected rate value.

Note: The message "Had to lower data rate" is still expected to occur with this configuration, but now it should only be seen if the WGB was transmitting at a rate lower than the configured one when the mobile station period check was triggered.
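
The difference between the older behavior and the minimum-rate behavior can be modeled as a single periodic check. This is an illustrative model only, written from the description in this section; it is not the IOS implementation, and the function and parameter names are assumptions.

```python
def periodic_check(rssi_dbm, threshold, current_rate_mbps,
                   minimum_rate_mbps=None, previous_rate_mbps=None):
    """Return the reason to roam, or None to stay on the current parent.

    threshold is the positive value from 'mobile station period X threshold Y'.
    minimum_rate_mbps=None models the behavior before 'mobile station minimum-rate'.
    """
    if rssi_dbm < -threshold:
        return "Signal strength too low"
    if minimum_rate_mbps is None:
        # Older behavior: any rate lower than at the previous check triggers a roam.
        if previous_rate_mbps is not None and current_rate_mbps < previous_rate_mbps:
            return "Had to lower data rate"
    elif current_rate_mbps < minimum_rate_mbps:
        # With minimum-rate: only a rate below the configured floor triggers a roam.
        return "Had to lower data rate"
    return None


# A drop from 54 to 24 Mbps roams without minimum-rate, but not with a 6 Mbps floor.
print(periodic_check(-60, 70, 24.0, previous_rate_mbps=54.0))
print(periodic_check(-60, 70, 24.0, minimum_rate_mbps=6.0, previous_rate_mbps=54.0))
```

In this model, normal rate shifting between 54 and 24 Mbps no longer causes a roam once a floor is configured, which is the reduction in unnecessary roams described above.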

Scan Channels

The WGB scans all "country channels" while doing a roaming event. This means that, depending on the radio domain, the WGB can scan channels 1 to 11 or 1 to 13 on the 2.4 GHz band.

Each scanned channel takes some time. On 802.11bg this is around 10 to 13 ms. On 802.11a, it can be up to 150 ms if the channel is DFS enabled (in which case the WGB does not probe, but only does a passive scan).

A good optimization is to restrict the scanned channels to use only the ones in service by the infrastructure. This is especially important on 802.11a, as the channel list is large, and the time per channel can be long if DFS is in use.
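
The benefit of a restricted list follows directly from the per-channel figures above. The sketch below multiplies the channel count by the per-channel dwell time; it is a simplification that assumes the same dwell time on every channel and ignores any channel change overhead.

```python
def scan_time_ms(channels, per_channel_ms):
    """Estimate total scan time as channel count multiplied by dwell time per channel."""
    return len(channels) * per_channel_ms


all_2g_channels = list(range(1, 12))  # channels 1 to 11
restricted = [1, 6, 11]

# 802.11bg dwell time is quoted above as roughly 10 to 13 ms per channel.
print(scan_time_ms(all_2g_channels, 10), scan_time_ms(all_2g_channels, 13))  # 110 143
print(scan_time_ms(restricted, 10), scan_time_ms(restricted, 13))            # 30 39
```

Under these assumptions, scanning only three channels cuts the scan phase from roughly 110 to 143 ms down to 30 to 39 ms.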

There are three points to consider when designing a channel plan for WGB/Roaming:

  • For the 2.4 GHz band, try to stick to channels 1/6/11 to minimize adjacent channel interference. Any other channel plan, such as one with 4 channels, tends to be difficult to engineer properly from an RF point of view without increasing interference.

  • Using a single channel setup for all APs is a good idea from a scan point of view, because it eliminates the radio channel change time from the scan time. This only makes sense if the total number of clients to support is very low, and there are no high bandwidth requirements. Be aware that few environments can benefit from this option, so use it with care.

  • For the 5 GHz band, if your local regulations allow it, using indoor non-DFS channels (36 to 48) allows a faster scan time, as the WGB can actively probe each one instead of passively listening for a longer time.

The channel plan in use for your deployment might need to accommodate other requirements. Use the general RF design recommendations.

In order to configure the scan channel list:

interface Dot11Radio0
mobile station scan 1 6 11

Note: Mobile station only shows up when using the WGB role on the radio.

Note: Make sure your WGB scan list matches your infrastructure channel list. If not, the WGB will not find your available APs.

Configure Timers

Starting with 12.4(25a)JA, there are several new commands to optimize recovery timers when a problem is found. These are only available when the AP is in WGB mode.

wgb-1260(config)#workgroup-bridge timeouts ?

  assoc-response  Association Response time-out value
  auth-response   Authentication Response time-out value
  client-add      client-add time-out value
  eap-timeout     EAP Timeout value
  iapp-refresh    IAPP Refresh time-out value

In the case of assoc-response, auth-response, and client-add, these indicate how long the WGB waits for the parent AP to answer before it considers the AP dead and tries the next candidate. The default value is 5 seconds, which is too long for some applications. The minimum value is 800 ms, which is recommended for most mobile applications.

For eap-timeout, the WGB sets a maximum time to wait until the full EAP authentication process is completed. This works from an EAP supplicant point of view in order to restart the process if the EAP authenticator does not answer. The default value is 60 seconds. Be careful to never configure a value lower than the actual time needed to complete a full 802.1x authentication. Normally, setting this to 2 to 4 seconds is correct for most deployments.

For iapp-refresh, the WGB by default generates an IAPP bulk update to the parent AP after roaming in order to inform it of the known wired clients. There is a second retransmission around 10 seconds after association. This timer allows a "fast retry" of the IAPP bulk update after association in order to overcome the possibility that the first IAPP update was lost due to RF, or because encryption keys were not yet installed on the parent AP. For fast roaming scenarios, 100 ms can be used. However, be careful if there is a large number of WGBs in use, because this significantly increases the total number of IAPP frames sent to the infrastructure after each roam.

Example of aggressive values:

workgroup-bridge timeouts eap-timeout 4
workgroup-bridge timeouts iapp-refresh 100
workgroup-bridge timeouts auth-response 800
workgroup-bridge timeouts assoc-response 800
workgroup-bridge timeouts client-add 800

These have been successfully tested on mobile WGB deployment scenarios.
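
The effect of the response timers is easiest to see as time lost on candidates that never answer. The sketch below assumes that each unresponsive candidate costs exactly one timeout before the WGB moves to the next one; that is a simplification of the behavior described above, and the function name is hypothetical.

```python
MIN_RESPONSE_TIMEOUT_MS = 800       # minimum value stated in this section
DEFAULT_RESPONSE_TIMEOUT_MS = 5000  # default of 5 seconds stated in this section


def time_lost_ms(unresponsive_candidates: int, timeout_ms: int) -> int:
    """Time spent waiting on candidates that never answer, one timeout each."""
    if timeout_ms < MIN_RESPONSE_TIMEOUT_MS:
        raise ValueError("response timeouts cannot be set below 800 ms")
    return unresponsive_candidates * timeout_ms


print(time_lost_ms(2, DEFAULT_RESPONSE_TIMEOUT_MS))  # 10000
print(time_lost_ms(2, MIN_RESPONSE_TIMEOUT_MS))      # 1600
```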

Other WGB optimizations

There are other minor changes to take into consideration for WGB deployment scenarios:

Radio Related

  • Reduce RTS retries: rts retries 32. This can save some RF time in aggressive scenarios. Normally this is not needed.

  • Antenna type: If using a single antenna (no diversity), you should configure the radio to improve general performance:

antenna transmit right-a
antenna receive right-a

Antenna diversity is desirable, but not always possible when physically installing antennas on the vehicle. Proper antenna selection is critical for roaming. As little as 2 dB can be a huge difference on general roaming average times.

Log Related

  • In order to save some milliseconds, reduce the console logging level to errors only: logging console errors. Do not disable it completely because that can negatively affect the roaming performance in some conditions.

  • Ideally, use telnet or ssh from the Ethernet side to collect debugs or logs. This has a much lower impact on performance in comparison to logging debugs over console: logging monitor debugging.

  • The command to understand what is occurring from the WGB roaming point of view is debug dot11 dot11 0 trace print uplink. This has low impact on the CPU, but do not enable other debug options unless instructed because each one might increase the total roaming time.

  • Try to use SNTP when possible. This keeps the WGB time in sync, which is extremely helpful for troubleshooting.

MFP usage

  • MFP can be useful from a security point of view. However, a drawback is that on roaming failure scenarios, the WGB does not accept de-auth frames from the AP parent to trigger a new roaming if the encryption key between both of them has gone wrong for any reason.

  • On these rare failure scenarios, the WGB can take up to 5 seconds to trigger a new scan, if the current parent can be heard with good RF signal. There is a "catch-all" detection mechanism that WGB can trigger if no valid data frames are received during that time.

  • By default, the WGB tries to use the client MFP if the SSID has WPA2 AES in use.

  • It is recommended to disable client MFP if fast recovery times are needed (WGB to react to non-protected deauth frames). This is a compromise between security needs and fast recovery times. The decision depends on what is more important for the deployment scenario.

dot11 ssid wgbpsk  
   no ids mfp client

EAP-TLS on WGB and "clock save interval"

Refer to the Synchronize IOS Supplicant Clocks and Save Time Setting to NVRAM section of Release Notes for Cisco Aironet Access Points and Bridges for Cisco IOS Release 12.4(21a)JY.

Keep in mind that a Universal WGB (uWGB) might never get a chance to do an SNTP sync because it is typically associated with the attached MAC address, and the uWGB BVI does not have network access. Therefore, in the case of a uWGB, it is recommended, at a minimum, to get a good clock sync saved in NVRAM at deployment. If the attached Ethernet device can act as an NTP source (and is itself kept updated as a client through its uWGB connection), then it is possible to have the uWGB SNTP sync from it, which makes it an effective NTP reflection point.

Full Configuration Example

no service pad
service timestamps debug datetime msec
service timestamps log datetime msec
service password-encryption
!
hostname wgb-1260
!
logging rate-limit console 9
logging console errors
!
clock timezone CET 1
no ip domain lookup
!
!
dot11 syslog
!
!
dot11 ssid wgbpsk
   vlan 32
   authentication open 
   authentication key-management wpa version 2
   wpa-psk ascii 7 060506324F41584B56
   no ids mfp client
!
!
!
!         
!
!
username Cisco password 7 13261E010803
!
!
bridge irb
!
!
interface Dot11Radio0
no ip address
no ip route-cache
!
encryption mode ciphers aes-ccm 
!
ssid wgbpsk
!
antenna transmit right-a
antenna receive right-a
  packet retries 32
station-role workgroup-bridge
rts retries 32
mobile station scan 2412 2437 2462
mobile station minimum-rate 6.0 
mobile station period 3 threshold 70
bridge-group 1
!

interface GigabitEthernet0
no ip address
no ip route-cache
duplex auto
speed auto
no keepalive
bridge-group 1
!
interface BVI1
ip address 192.168.32.67 255.255.255.0
no ip route-cache
!
ip default-gateway 192.168.32.1
no ip http server
no ip http secure-server

bridge 1 route ip

sntp server 192.168.32.1
clock save interval 1
workgroup-bridge timeouts eap-timeout 4
workgroup-bridge timeouts iapp-refresh 100
workgroup-bridge timeouts auth-response 800
workgroup-bridge timeouts assoc-response 800
workgroup-bridge timeouts client-add 800

Debug Analysis

If any issues occur, it is important to capture the output of the debug dot11 dot11 0 trace print uplink command as a first step. This provides a good view of what is occurring with the roaming process.

This is an example in which the current parent is selected as the candidate:

 Sep 27 11:42:38.797: %DOT11-4-UPLINK_DOWN: Interface Dot11Radio0, parent lost: Signal strength too low
 Sep 27 11:42:38.797: CDD051F1-0 Uplink: Lost AP, Signal strength too low

This is the low signal trigger being met. It depends on the mobile station period X threshold Y command. The first message is always sent to the console; the second is part of the uplink debug traces. It is not a problem, but part of the normal WGB process.

 Sep 27 11:42:38.798: CDD052C7-0 Uplink: Wait for driver to stop

The Uplink process forces a radio queue purge before starting a channel scan. This step can take from a few milliseconds to several seconds depending on channel utilization and queue depth. Data frames are not timed out. Voice frames have a time comparison done, and thus should be dropped faster. Some delay might be observed in noisy environments.

 Sep 27 11:42:38.798: CDD05371-0 Uplink: Enabling active scan
 Sep 27 11:42:38.799: CDD05386-0 Uplink: Scanning

This is the actual channel scan taking place. It parks the radio approximately 10 to 13 ms per configured channel.

 Sep 27 11:42:38.802: CDD064CD-0 Uplink: Rcvd response from 0021.d835.ade0 channel 1 3695

This is the list of probe responses received. The first number is the channel, and the second is the time in microseconds taken to receive the response.

 Sep 27 11:42:38.808: CDD078F1-0 Uplink: Compare1 0021.d835.ade0 - Rssi 58dBm, Hops 0, Count 0, load 0
 Sep 27 11:42:38.809: CDD07929-0 Uplink: Compare2 0021.d835.cce0 - Rssi 46dBm, Hops 0, Count 0, load 0

This is the actual comparison between candidates. The fields are described in the Parent Compare information section.

 Sep 27 11:42:38.809: CDD07BDB-0 Uplink: Same as previous, send null data packet

This is the parent selection. In this example, the best candidate is the current parent, so the WGB remains on it.

 Sep 27 11:42:38.809: CDD07BF7-0 Uplink: Done
 Sep 27 11:42:38.808: %DOT11-4-UPLINK_ESTABLISHED: Interface Dot11Radio0,
 Associated To AP AP1 0021.d835.ade0 [None WPAv2 PSK]Roaming completed.

This is the point where the roaming is "finished". Traffic resumes as soon as IAPP frames are processed by the parent.
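
When many roam events have to be reviewed, the uplink trace can be post-processed offline. The following sketch parses the trace lines shown above, lists the probe responses, and measures the time from "Lost AP" to "Done" using the log timestamps. The regular expressions are written against this sample only and other software versions may print different formats; the function names are hypothetical, and the timestamp arithmetic ignores rollover at midnight.

```python
import re

TRACE = """\
Sep 27 11:42:38.797: CDD051F1-0 Uplink: Lost AP, Signal strength too low
Sep 27 11:42:38.798: CDD052C7-0 Uplink: Wait for driver to stop
Sep 27 11:42:38.798: CDD05371-0 Uplink: Enabling active scan
Sep 27 11:42:38.799: CDD05386-0 Uplink: Scanning
Sep 27 11:42:38.802: CDD064CD-0 Uplink: Rcvd response from 0021.d835.ade0 channel 1 3695
Sep 27 11:42:38.809: CDD07BF7-0 Uplink: Done
"""

LINE = re.compile(r"(\d\d):(\d\d):(\d\d)\.(\d{3}): \S+ Uplink: (.*)$")
PROBE = re.compile(r"Rcvd response from (\S+) channel (\d+) (\d+)")


def parse_uplink(trace):
    """Return (milliseconds since midnight, message) for each uplink trace line."""
    events = []
    for raw in trace.splitlines():
        m = LINE.search(raw)
        if m is None:
            continue  # console messages and unrelated lines are skipped
        h, mi, s, ms = (int(g) for g in m.groups()[:4])
        events.append((((h * 60 + mi) * 60 + s) * 1000 + ms, m.group(5).strip()))
    return events


def probe_responses(events):
    """Return (AP address, channel, microseconds) for each probe response seen."""
    found = []
    for _, text in events:
        m = PROBE.match(text)
        if m:
            found.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return found


def roam_duration_ms(events):
    """Milliseconds between the 'Lost AP' trigger and the 'Done' message."""
    start = next(t for t, text in events if text.startswith("Lost AP"))
    end = next(t for t, text in events if text == "Done")
    return end - start


events = parse_uplink(TRACE)
print(probe_responses(events))
print(roam_duration_ms(events))
```

For the sample trace, the measured interval is 12 ms. Keep in mind that traffic resumes only after the IAPP frames are processed by the parent, which this trace does not show.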

Parent Compare information

 Sep 27 14:16:47.590: F515B1FF-0 Uplink: Compare1 0021.d835.7620 - Rssi 60dBm, Hops 0, Count 0, load 3
 Sep 27 14:16:47.591: F515B238-0 Uplink: Compare2 0021.d835.e8b0 - Rssi 58dBm, Hops 0, Count -1, load 0

The Compare1 line prints the actual association count minus 1 (so that the WGB itself is not counted) if the "current" AP is still the one the WGB is associated to, followed by the actual hops and load.

The Compare2 line prints the differences. This is why it is possible to see a negative number. If the tested AP has a higher value than the current one, the difference is shown as a negative number.

Depending on current association count, load, signal difference, mobile threshold value, the WGB might or might not select a new parent.

The comparison is always between two APs, with the selected AP replacing the current for next iteration. Therefore, some of the decisions can be due to RSSI on one loop, or due to other factors on the next test.
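
To review many comparison rounds, the Compare lines can be extracted into structured records. The sketch below only parses the fields as printed; it does not try to reproduce the selection decision. The pattern is written against the sample lines in this section, and the function name and dictionary keys are hypothetical.

```python
import re

COMPARE = re.compile(
    r"Compare(\d) (\S+) - Rssi (\d+)dBm, Hops (-?\d+), Count (-?\d+), load (-?\d+)"
)


def parse_compare(line):
    """Return the fields of a Compare1/Compare2 trace line, or None if it does not match."""
    m = COMPARE.search(line)
    if m is None:
        return None
    step, ap, rssi, hops, count, load = m.groups()
    return {
        "step": int(step),    # 1 for Compare1, 2 for Compare2
        "ap": ap,             # address of the AP on this line
        "rssi": int(rssi),    # printed as a positive number in the trace
        "hops": int(hops),
        "count": int(count),  # can be negative on Compare2 lines
        "load": int(load),
    }


SAMPLE = [
    "Sep 27 14:16:47.590: F515B1FF-0 Uplink: Compare1 0021.d835.7620 - Rssi 60dBm, Hops 0, Count 0, load 3",
    "Sep 27 14:16:47.591: F515B238-0 Uplink: Compare2 0021.d835.e8b0 - Rssi 58dBm, Hops 0, Count -1, load 0",
]

for line in SAMPLE:
    print(parse_compare(line))
```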
