

Configuring MICA Modem Recovery

Document ID: 7042



Contents

Introduction
Prerequisites
      Requirements
      Components Used
      Conventions
Modem Failure Overview
Configuring Modem Recovery
      Modem Recovery Configuration Examples
Using Modem Recovery
      Automatic Identification of Bad Modems
      Reloading Firmware
      Explaining the Configuration
Related Information

Introduction

Field analysis has shown that modems in production sometimes stop working. In most cases, reloading the firmware resets the modem and returns it to operation. The objective of the modem recovery feature is to have the network access server (NAS) identify modems that have gone out of operation and automatically reload their digital signal processor (DSP) firmware with minimal impact on end users and NAS capacity.

For information on configuring recovery for NextPort Software Port Entities (SPEs), refer to the document Configuring NextPort SPE Recovery.

Prerequisites

Requirements

There are no specific requirements for this document.

Components Used

Modem recovery first appeared in IOS Software version 12.0(1)T. We recommend using IOS Software versions 12.1(7) mainline, 12.1(7)AA, 12.2(1) mainline, or later. Make sure that the IOS image you have loaded supports modem or SPE recovery.

Note: As of 12.2(1), the modem recovery commands are no longer supported for MICA modems on the AS5800. For 12.2(1) and higher (on the AS5800 only), use the spe recovery commands for both MICA and NextPort. Use the table below to determine whether you should use the modem recovery command or the spe recovery command.

Platform                  Modem Type   IOS Version                  Recovery Command
AS5200, AS5300            MICA         12.0(1)T and higher          modem recovery
AS5800                    MICA         12.0(1)T-12.1(5)T            modem recovery
AS5800                    MICA         12.2(1) and higher           spe recovery
AS5800                    NextPort     12.1(3)T and higher          spe recovery
AS5350, AS5400, AS5850    NextPort     12.1(1)XD, 12.2 or higher    spe recovery

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

Conventions

For more information on document conventions, refer to the Cisco Technical Tips Conventions.

Modem Failure Overview

MICA modems are implemented with one DSP for each two ports, and one control processor (CP) per six ports. The set of six MICA modems controlled by one CP is known as a "hex"; a Hex Modem Module (HMM) consists of one hex and a Dual Modem Module (DMM) contains two hexes. From time to time, a DSP or a CP will fail. When this happens, all subsequent modem calls into that DSP or CP fail to train up.

Because each DSP in a MICA module services two adjacent ports, the show modem command output may display a pair of adjacent modems with significantly more trainup failures or "no answers" than the other modems. To recover from such problems, you must reload portware into the affected HMM/DMM.

Note: The Cisco 3600 router has no modem recovery feature. To recover modems on that platform, you need to reload the Cisco IOS® Software.

Several indications in the output of the show modem command can show that a modem is in need of recovery.

  • Look for a pair of modems showing a high number of failed calls compared to the other modems. For a DSP that has stopped working, the failed call count will be very similar for both of the modems it services.

  • A pair of modems showing nearly identical "No Answer" counts.

  • If modem recovery is configured correctly, a pair of modems will be marked Bad (B) after the "modem recovery threshold" number of calls fail.

The above three indications should be used to track down a failing DSP. Shown below is an example of conditions one and three being present in the show modem output.

           AvgHold Inc calls Out calls Busied Failed  No   Succ
    Mdm     Time   Succ Fail Succ Fail   Out   Dial Answer  Pct
   * 1/0  01:35:55 82   5     0    0     1      0     0     94%
   * 1/1  01:06:10 100  8     0    0     1      0     0     93%
   * 1/2  01:05:39 103  11    0    0     1      0     0     90%
     1/3  01:03:16 111  6     0    0     1      0     0     95%
   * 1/4  01:07:21 100  7     0    0     1      0     0     93%
     1/5  00:50:12 121  8     0    0     1      0     0     94%
     1/6  01:00:56 117  6     0    0     0      0     0     95%
     1/7  00:56:55 108  10    0    0     0      0     0     92%
   B 1/8  01:10:17 93   15    0    0     0      0     0     86%
   B 1/9  01:06:25 96   15    0    0     0      0     0     86%
     1/10 01:07:02 103  2     0    0     0      0     0     98%
     1/11 01:10:02 101  6     0    0     0      0     0     94%
   * 1/12 01:04:02 109  8     0    0     1      0     0     93%
   * 1/13 01:09:50 101  7     0    0     1      0     0     94%
     1/14 01:30:57 90   4     0    0     1      0     0     96%
     1/15 01:14:26 94   9     0    0     1      0     0     91%
     1/16 00:55:59 110  11    0    0     1      0     0     91%
     1/17 01:15:37 94   2     0    0     1      0     0     98%
     1/18 01:17:59 97   4     0    0     1      0     0     96%
     1/19 01:00:16 111  8     0    0     1      0     0     93%
     1/20 01:31:12 79   10    0    0     1      0     0     89%
     1/21 01:55:25 71   8     0    0     1      0     0     90%
     1/22 01:55:29 80   4     0    0     1      0     0     95%

Note: Modems 1/8 and 1/9 above are marked bad because of their abnormally low success percentage.

Shown below is an example of conditions one and two being present in the show modem output.

           AvgHold Inc calls Out calls Busied Failed  No   Succ
    Mdm     Time   Succ Fail Succ Fail   Out   Dial Answer  Pct
   ...
   ...
   1/3/48 00:03:03  14  63    0    0     0      0     62    18%
   1/3/49 00:03:43  9   65    0    0     0      0     62    12%
   1/3/50 00:32:42  30  1     0    0     0      0     0     97%
   1/3/51 00:32:52  29  1     0    0     0      0     0     97%
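
The three indications can also be checked mechanically. The following Python sketch is an illustration only and is not part of IOS. The function names, the 1.5x failure factor, and the 10 percent "No Answer" tolerance are assumptions chosen for this example, and the pairing rule (ports 2k and 2k+1 share a DSP) is inferred from the examples above. It parses rows in the show modem layout shown above and lists adjacent pairs that deserve a closer look.

```python
import re
from statistics import median

# One data row of "show modem" output, in the column order shown above:
# flag, modem, hold time, inc succ, inc fail, out succ, out fail,
# busied out, failed dial, no answer, success percentage.
ROW = re.compile(
    r"^\s*(?P<flag>[*B])?\s*(?P<mdm>\d+(?:/\d+)+)\s+\d+:\d+:\d+\s+"
    r"(?P<inc_ok>\d+)\s+(?P<inc_fail>\d+)\s+\d+\s+\d+\s+"
    r"\d+\s+\d+\s+(?P<no_answer>\d+)\s+\d+%\s*$"
)


def parse_show_modem(text):
    """Return one dict per modem row; headers and '...' lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        slot, port = m["mdm"].rsplit("/", 1)
        rows.append({
            "mdm": m["mdm"],
            "slot": slot,
            "port": int(port),
            "bad": m["flag"] == "B",
            "inc_fail": int(m["inc_fail"]),
            "no_answer": int(m["no_answer"]),
        })
    return rows


def suspect_pairs(rows, fail_factor=1.5, no_answer_tolerance=0.10):
    """List (modem_a, modem_b, reasons) for adjacent pairs that look unhealthy.

    fail_factor and no_answer_tolerance are hypothetical tuning values.
    """
    if not rows:
        return []
    typical_fail = median(r["inc_fail"] for r in rows)

    # Assumption: ports 2k and 2k+1 in the same slot share one DSP.
    by_dsp = {}
    for r in rows:
        by_dsp.setdefault((r["slot"], r["port"] // 2), []).append(r)

    suspects = []
    for pair in by_dsp.values():
        if len(pair) != 2:
            continue
        a, b = sorted(pair, key=lambda r: r["port"])
        reasons = []
        low = min(a["inc_fail"], b["inc_fail"])
        if low > 0 and low >= fail_factor * typical_fail:
            reasons.append("high failed calls")
        na, nb = a["no_answer"], b["no_answer"]
        if na > 0 and nb > 0 and abs(na - nb) <= no_answer_tolerance * max(na, nb):
            reasons.append("similar no-answer counts")
        if a["bad"] and b["bad"]:
            reasons.append("marked bad")
        if reasons:
            suspects.append((a["mdm"], b["mdm"], reasons))
    return suspects
```

Run against the second example above, this sketch reports modems 1/3/48 and 1/3/49 for both a high failed call count and matching "No Answer" counts. A median over only a few rows is a crude baseline, so treat the result as a hint for where to look, not as a diagnosis.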

Configuring Modem Recovery

Once you have installed an IOS version that supports modem recovery, configure a modem recovery scheme that meets the needs of your installation. To decide how best to configure it, you need to understand your usage patterns and policies. This involves answering questions such as:

  • At what rate are the modems failing? Two DSPs per day? Two DSPs per hour?

  • When is your daily usage at its lowest?

  • What is your policy regarding clearing active calls? Are you willing to drop calls as needed, in order to reload DSPs, to prevent other callers from getting busy signals?

Modem Recovery Configuration Examples

Here are some example configurations for modem recovery. For more information, refer to Modem Management Commands.

Note: On NextPort systems, the modem recovery commands are replaced by new Software Port Entity (SPE) commands. For information on configuring recovery for NextPort SPEs refer to the document Configuring NextPort SPE Recovery.

modem recovery command          spe recovery command
modem recovery threshold        spe recovery port-threshold
modem recovery action           spe recovery port-action
modem recovery maintenance      spe download maintenance

Note: If your IOS does not support the modem recovery command then use the corresponding spe recovery command (from the table above) in place of the example modem recovery command.

Highly aggressive modem recovery

This example assumes that the administrator finds that modems need frequent recovery. Hence, modems must be recovered quickly and continuously to prevent clients from encountering busy signals. In this case, it is assumed that the administrator is willing to drop active calls as needed.

Note: An SPE unit is defined as the smallest software downloadable unit. For MICA, an SPE is either 6 or 12 modems, depending on whether the MICA module is single or double density.

Command

Explanation

modem recovery threshold 10

When a modem suffers 10 consecutive trainup failures, it will be taken out of service and marked as pending recovery.

modem recovery action download

Recovery will involve the SPE having fresh code downloaded.

modem recovery maintenance schedule immediate

If an SPE is marked pending recovery, it will be scheduled for a download immediately rather than waiting for a maintenance time.

modem recovery maintenance max-download 3

A maximum of three SPEs will be in maintenance mode at once.

modem recovery maintenance window 60

Any idle ports will be busied out during the maintenance window. If all active calls on the SPE drop during the maintenance window, the SPE will reload immediately; otherwise, the NAS waits up to 60 minutes before performing the maintenance action.

modem recovery maintenance action drop-call

The action to perform when the window expires is to disconnect active calls and reload the SPE.

Moderately aggressive modem recovery

In this case, the administrator finds that modems need recovering at a rate of only 2 or 3 DSPs per day. The customer has several spare DSPs per chassis, and usage is lowest from 02:00 to 04:00, so it is acceptable to delay recovery until that time.

Command

Explanation

modem recovery threshold 10

We keep this at 10 to maintain the degree of certainty that the modem is indeed bad.

modem recovery action download

Recovery will still be downloading fresh code to the SPE.

modem recovery maintenance time 02:00

Rather than starting immediately, however, this administrator is willing to wait until a lull in usage.

modem recovery maintenance stop-time 04:00

The usage slowdown only lasts until 4 AM.

modem recovery maintenance max-download 5

The maximum number of SPEs tied up is raised to 5 because during the maintenance window fewer modems are needed to service incoming requests.

modem recovery maintenance window 90

For the same reason, the administrator can also afford to wait 90 minutes for the remaining calls to drop.

modem recovery maintenance action drop-call

If an SPE needs to be recovered, this administrator is still willing to disconnect a user to make sure the modems are available during peak usage hours.

Conservative modem recovery

In this case, the administrator has many spare modems in the chassis, and is unwilling to drop user calls.

Command

Explanation

modem recovery threshold 7

In this case the threshold is lowered to 7 so that a failing modem is marked bad after fewer consecutive failures.

modem recovery action download

Downloading fresh code to the SPE is once again the specified action.

modem recovery maintenance time 02:00

This administrator is also willing to wait until the number of users is smallest.

modem recovery maintenance stop-time 05:00

This maintenance window lasts until 5 AM.

modem recovery maintenance max-download 5

The maximum number of SPEs would still be 5 because during the maintenance window fewer modems are needed to service incoming requests.

modem recovery maintenance window 120

This defines a longer maintenance window (120 minutes).

modem recovery maintenance action reschedule

Since this NAS has spare modems, the administrator can afford to have some of the modems out of service during peak usage hours. There is no need to disconnect users.

Because the NAS has many spare modems, the administrator configures a threshold of "7". This detects a failed modem quickly, but at the cost of some modems being marked for download prematurely. During the maintenance period, if the calls fail to drop, the recovery is automatically rescheduled rather than dropping existing calls.

Using Modem Recovery

Automatic Identification of Bad Modems

MICA modems have been shown to maintain a healthy call success rate (CSR) of 90 to 95 percent under normal usage. This means that 90 to 95 percent of all calls which are allocated to a modem successfully connect, link, train up, negotiate, and finally enter a Steady State where the client and NAS modems can transfer data. The 5 to 10 percent failure rate can be attributed to numerous client side issues, such as incompatible clients and clients disconnecting. These client side issues cannot be viewed as a problem with the MICA modem. So, in the worst case scenario, you can expect that about 1 call in 10 attempts will fail. Thus, basic statistics tell us:

The probability of 1 failed call attempt is: 1/10

The probability of 2 consecutive failed call attempts is: 1/10 x 1/10

The probability of 3 consecutive failed call attempts is: 1/10 x 1/10 x 1/10

The probability of "n" consecutive failed call attempts is: (1/10) raised to the power of "n"

As such, even with a normal CSR of 90 percent, and provided that call failures are independent of one another, the probability that a good modem keeps failing to enter a steady state (after a call has been allocated to it) drops by a factor of ten with each additional consecutive failed call attempt. Therefore, under this analysis, with the modem recovery threshold set to ten, it is extremely unlikely that a modem with ten consecutive failures is operating properly.
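
The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that assumes each call attempt fails independently with probability 1 - CSR; the function name is made up for this example.

```python
def consecutive_failure_probability(csr, n):
    """Probability that a healthy modem fails n call attempts in a row.

    csr is the call success rate as a fraction (0.90 for 90 percent).
    Assumes every attempt fails independently with probability 1 - csr.
    """
    if not 0.0 <= csr <= 1.0:
        raise ValueError("csr must be between 0 and 1")
    if n < 0:
        raise ValueError("n must not be negative")
    return (1.0 - csr) ** n


# Compare the thresholds used in this document at the worst-case 90 percent CSR.
for threshold in (1, 2, 3, 7, 10, 30):
    p = consecutive_failure_probability(0.90, threshold)
    print(f"threshold {threshold:2d}: {p:.3e}")
```

At a 90 percent CSR, a threshold of 10 corresponds to a chance of roughly 1 in 10 billion that a healthy modem is flagged on any given run of ten calls, and a threshold of 7 to roughly 1 in 10 million.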

Reloading Firmware

As mentioned earlier, MICA modems are implemented in a modular fashion whereby six (HMM) or 12 (DMM) modems are allocated to a single CP overseeing the operation of the DSPs. This was done to minimize costs and complexity. An unfortunate consequence of this design is that the NAS is unable to download DSP firmware to a single modem, but requires all 6 or 12 modems to be reloaded at the same time. This issue is not significant when booting the NAS because no active calls are being processed at that time. However, this issue becomes significant when trying to load firmware for either recovery or upgrade purposes. The objective is to reload the modem module with a minimal impact to the end users and the NAS operation. For this purpose, you have two tools at your disposal:

  • Module "busyout": This locks all modems on the module as busy, so that no new calls are allocated to any of the modems until the "busyout" is removed, which is usually after the modem module is reloaded. Existing calls on modems are not affected when the module is in the "busyout" state.

  • Hourly utilization analysis: Modem usage is actually quite predictable. There are telecommuters who use modems between 7:00 AM and 6:00 PM who provide a consistent call volume throughout the business day. Then, there are nightly surfers who surf the Web between 6:00 PM and 2:00 AM. As a result, modem usage between 2:00 AM and 7:00 AM is typically at its lowest.

The "busyout" tool is currently widely used for firmware upgrades. However, this tool has a significant drawback. If the module is left in a "busyout" state until all calls drop, then a single modem call that does not disconnect can prevent new calls from being accepted on that module. For example, if you have one active call in a module of twelve modems and the remaining eleven modems do not have existing calls, this can seriously impact a NAS's ability to perform at top capacity. This is because you have removed these eleven modems from service.

To avoid this, the modem recovery feature uses a firmware reload algorithm which will attempt to reload the module firmware with the least possible impact and still retain a good chance of getting the firmware downloaded to the modems. This is done by:

  • The firmware download taking place as soon as possible without requiring a "busyout". If any modem on a given module is in either a recovery pending or upgrade pending state (seen as a "P" state), and there are no active calls left on that module, the recovery mechanism will download the module right away. This mechanism should do most of the downloads in a safe and controlled fashion without requiring any modems to be in "busyout" state to accomplish the objective.

  • Having "busyout" scheduled during the off hours, so modem recovery maintenance can be performed on the modules. This is necessary for a NAS which is heavily loaded throughout the day, because no new calls should be allocated to a modem module, and the active calls should have a chance to drop normally, before the download proceeds. However, unlike the regular "busyout", the modem recovery mechanism only puts the module in "busyout" for a specified window of time. If the window expires before the download can take place, the mechanism cannot keep the module in "busyout" any longer and must take a different action to return the modem capacity to the NAS. Also, you must manage the number of modem modules which can be in the "busyout" state at the same time. Even during the off hours, you should not "busyout" more than 20 percent of the modem modules at one time.
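
The two rules above can be summarized as a small decision function. The Python sketch below is only a model of the behavior described in this document, not the IOS implementation. The function name, the minute-based inputs, and the result labels are assumptions; the three action names are the maintenance action options described in this document.

```python
def maintenance_outcome(call_end_minutes, window_minutes, action):
    """Model what happens to one module placed in recovery "busyout".

    call_end_minutes: for each active call on the module, the minute
        (counted from the start of the busyout) at which it would end
        on its own. An empty list means the module is already idle.
    window_minutes: length of the maintenance window.
    action: what to do if the window expires with calls still up:
        "disable", "reschedule", or "drop-call".

    Returns (result, minute, dropped_calls).
    """
    if action not in ("disable", "reschedule", "drop-call"):
        raise ValueError("unknown maintenance action: " + action)

    # With no calls left, the firmware download starts right away.
    last_call_end = max(call_end_minutes, default=0)
    if last_call_end <= window_minutes:
        return ("reloaded", last_call_end, 0)

    # The window expired with calls still active.
    still_active = sum(1 for t in call_end_minutes if t > window_minutes)
    if action == "drop-call":
        return ("reloaded", window_minutes, still_active)
    if action == "reschedule":
        return ("rescheduled", window_minutes, 0)
    return ("disabled", window_minutes, 0)
```

For example, with a 60 minute window, a module whose last call ends after 45 minutes reloads at minute 45 with no dropped calls. A module with a call that would last 200 minutes either reloads at minute 60 and drops that call (drop-call) or returns to service untouched (reschedule or disable).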

Explaining the Configuration

The "busyout" behavior is managed with the modem recovery maintenance configuration, which includes the time when recovery starts (default is 3:00 AM), the window (the maximum "busyout" duration for a single module), and max-download (the maximum number of modem modules which can be in the "busyout" state at the same time during the window; the default is 20 percent of NAS capacity and is dynamically calculated).

Consider the following settings on a NAS with 10 modem modules (all needing to be reloaded) and the following configuration:

modem recovery maintenance time 00:00 (hh:mm) 
modem recovery maintenance window 60 (minutes) 
modem recovery maintenance max-download 2 (number of modules) 
   TIME
 00:00 01:00 02:00 03:00 04:00 05:00 06:00
   |     |     |     |     |     |     |
   ------------------------------------------------------------------->
   ^     ^     ^     ^     ^     ^
   |     |     |     |     |     |
   |     |     |     |     |     - should be finished at 5 AM
   |     |     |     |     - window to download last 2 modules
   |     |     |     - window to download next 2 modules
   |     |     - window to download next 2 modules
   |     - window to download next 2 modules
   - window to download first 2 modules

In the above case, the NAS will be in a recovery maintenance "busyout" state for at most five hours. This is a very unlikely situation, but can easily be handled by the recovery maintenance process.
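
The timeline above follows from simple arithmetic: the modules are handled in batches of max-download, and in the worst case every batch uses its whole window. The Python sketch below reproduces that calculation. The function names are made up for this example, and the sketch assumes that a new batch starts only when the previous window ends and that no stop-time is configured.

```python
import math


def _hhmm(minutes):
    """Format minutes since midnight as hh:mm, wrapping after 24 hours."""
    return f"{(minutes // 60) % 24:02d}:{minutes % 60:02d}"


def worst_case_schedule(modules, max_download, window_minutes, start="00:00"):
    """Return (batches, finish_time) for the worst-case recovery timeline.

    batches is a list of (window_start, first_module, last_module),
    with modules numbered from 1.
    """
    if modules < 0 or max_download < 1 or window_minutes < 1:
        raise ValueError("invalid schedule parameters")
    hours, mins = (int(part) for part in start.split(":"))
    start_minute = hours * 60 + mins

    batch_count = math.ceil(modules / max_download)
    batches = []
    for i in range(batch_count):
        first = i * max_download + 1
        last = min((i + 1) * max_download, modules)
        batches.append((_hhmm(start_minute + i * window_minutes), first, last))

    finish = _hhmm(start_minute + batch_count * window_minutes)
    return batches, finish


batches, finish = worst_case_schedule(modules=10, max_download=2, window_minutes=60)
for window_start, first, last in batches:
    print(f"{window_start}  modules {first}-{last}")
print("finished by", finish)
```

With the settings above (10 modules, max-download 2, window 60, start 00:00), the sketch yields five batches and a finish time of 05:00, which matches the diagram.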

The following table explains the configuration commands available for fine tuning the modem recovery process:

modem recovery threshold

Number of consecutive call attempts which must fail to train up before the modem is considered faulty. The default is 30.

modem recovery action

Once a modem has been determined to be faulty, the configured action will take place on the modem. The following choices are possible:

  • disable: Mark the modem bad (B).

  • none: Ignore the recovery threshold and continue to accept calls.

  • download: Set the modem into a recovery pending state, which stops the modem from accepting new calls. (This is the default.)

modem recovery maintenance

Every 24 hours, the modem recovery maintenance process will wake up and attempt to recover any modems which are in the recovery pending state.

modem recovery maintenance time

The actual time of day when the modem recovery maintenance process wakes up and starts recovering MICA modems. (The default is 3:00 am).

modem recovery maintenance window

When a MICA module attempts to reload its portware, it must avoid taking down any modem connections which may exist. Because of this, the recovery process sets all modems not currently in use to the recovery pending state. If any modems on the module are active, the recovery process waits for the calls to terminate normally.

To avoid the capacity problems caused by attempting recovery for an excessively long period, a maintenance window is configured. The modem recovery must take place within that timeframe; otherwise, a given action is performed on that module when the window expires. (The default window is 60 minutes.)

modem recovery maintenance action

When the modem recovery maintenance window expires, one of the following actions will be performed on the modem module awaiting recovery:

  • disable: Mark the originally faulty modem as being bad and return all other modems back into service.

  • reschedule: Leave the originally faulty modem as needing recovery and return all other modems back into service. (This is the default).

  • drop-call: Force the recovery by dropping any active calls remaining on modems within the module.

modem recovery maintenance max-download

When the modem recovery maintenance process starts, it attempts to recover all modems in the recovery pending state. This can potentially be all modules on a given system. To avoid taking down all modems on a given system, only a maximum number of module recoveries can take place at one time. The default is dynamically calculated to be 20 percent of the modules on a given system. This command allows this value to be overridden.

modem recovery maintenance schedule

Determines when module recovery is attempted, as described below:

  • immediate: Attempt to recover module right away.

  • pending: Mark the modem recovery pending and wait until the maintenance window. (This is the default.)

modem recovery maintenance stop-time

Time of day to stop all pending recovery maintenance tasks. Some customers have specific maintenance times which can be fine tuned. If you prefer that the maintenance window not exceed a certain time of day, this option can be useful.

There is no default stop-time.
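
The defaults listed above can be collected in one place and turned into configuration lines. The Python sketch below is a convenience for drafting a profile, not a Cisco tool. The class and field names are made up for this example, and it emits only command forms that appear in this document; verify each line against the command reference for your IOS release before applying it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModemRecoveryProfile:
    """Modem recovery settings, pre-filled with the defaults in this document."""
    threshold: int = 30
    action: str = "download"                # disable | none | download
    schedule: str = "pending"               # immediate | pending
    time: str = "03:00"                     # used when schedule is "pending"
    stop_time: Optional[str] = None         # no default stop-time
    max_download: Optional[int] = None      # None: leave the dynamic 20 percent
    window: int = 60                        # minutes
    maintenance_action: str = "reschedule"  # disable | reschedule | drop-call

    def commands(self) -> List[str]:
        """Render the profile as a list of modem recovery commands."""
        if self.action not in ("disable", "none", "download"):
            raise ValueError("unknown action: " + self.action)
        if self.schedule not in ("immediate", "pending"):
            raise ValueError("unknown schedule: " + self.schedule)
        if self.maintenance_action not in ("disable", "reschedule", "drop-call"):
            raise ValueError("unknown maintenance action: " + self.maintenance_action)

        lines = [
            f"modem recovery threshold {self.threshold}",
            f"modem recovery action {self.action}",
        ]
        # Rendering choice for this sketch: an immediate schedule replaces
        # the time-of-day line, as in the highly aggressive example.
        if self.schedule == "immediate":
            lines.append("modem recovery maintenance schedule immediate")
        else:
            lines.append(f"modem recovery maintenance time {self.time}")
        if self.stop_time is not None:
            lines.append(f"modem recovery maintenance stop-time {self.stop_time}")
        if self.max_download is not None:
            lines.append(f"modem recovery maintenance max-download {self.max_download}")
        lines.append(f"modem recovery maintenance window {self.window}")
        lines.append(f"modem recovery maintenance action {self.maintenance_action}")
        return lines


# The conservative example from this document.
conservative = ModemRecoveryProfile(
    threshold=7, time="02:00", stop_time="05:00",
    max_download=5, window=120, maintenance_action="reschedule",
)
print("\n".join(conservative.commands()))
```

Keeping the profile in one structure makes it easy to compare the aggressive, moderate, and conservative variants side by side before committing any of them to a NAS.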


Related Information



Updated: Jan 25, 2008 Document ID: 7042