Troubleshooting Guide
Appendix C - Overload Control

Overload Control


Revised: July 22, 2009, OL-8723-19

Overload Control Phases

Overload is a switch condition in which system resources cannot keep up with system tasks. It is usually caused by an increase in call traffic or in messages indirectly related to call traffic.

The Overload Control feature supports the BTS Call Agent (CA) and Feature Server (FS). Overload Control detects, controls, and manages overload from all types of networks (SIP, SS7, ISDN, MGCP, H.323) in the three phases listed in Table C-1.

Table C-1 Overload Control Phases

Overload Control Phase
Actions

1. Detection

Measures and compares factors to threshold values.

Determines system congestion and machine congestion level (MCL).

Detects BTS machine congestion conditions in 5 levels: none, mild, moderate, severe, emergency.

2. Control

Decreases overload. The action is configurable and varies with MCL, but it usually means rejecting a percentage of incoming calls.


Caution To control overload you must edit configuration files. Changes to these files significantly impact system performance. Edit them only under the direction of a Cisco engineer.

3. Management

Affects the following switch areas:

Alarms

Logs

Billing

Measurements


Detecting Overload

In the detection phase, Overload Control computes an MCL for each of three types of factors. The highest factor MCL dictates the MCL for the entire system. The three factors are:

Critical processes CPU usage: olm.cfg specifies a UNIX "nice" value (cpu_nice_value) that identifies critical processes. BTS uses this value to calculate CPU utilization for critical processes over a short period of time (2 seconds minimum). Values for each process should be at or below the configured value.

Critical process queue lengths: olm.cfg defines critical queue lengths for BTS processes such as BCM, MGA, SGA, SIA, ISA, and H3A. You can define multiple critical queues for any BTS process (up to 32 factors in total). BTS monitors the proportion of each critical IPC queue that is in use.

IPC buffer pool usage: BTS monitors the proportion of buffers in use in the IPC buffer pool. The higher the usage, the greater the congestion and the higher the factor MCL.

BTS detects its own MCL in five levels:

MCL0—No congestion and no need for any abatement.

MCL1—Mild congestion. Call rejection starts as configured in olm.cfg.

MCL2—Moderate congestion. Call rejection increases as configured in olm.cfg.

MCL3—Severe congestion. Call rejection increases still more as configured in olm.cfg.

MCL4—Emergency congestion. BTS rejects all calls including emergency calls.

Computing MCL

BTS computes each factor level as the average (mean) of that factor's most recent samples. The number of samples averaged (slots, set by slot_array_size) can be configured per factor, from 3 to 10 slots. Each factor level is then compared with the configured thresholds to set that factor's MCL. In the example in Table C-2, the thresholds are set to 70, 80, 90, and 95 percent.

Table C-2 MCL Thresholds

Onset/Abatement Threshold    Factor Level (%)    MCL
-                            0-69                MCL0
level_1_threshold = 70       70-79               MCL1
level_2_threshold = 80       80-89               MCL2
level_3_threshold = 90       90-94               MCL3
level_4_threshold = 95       95-100              MCL4
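To make the computation concrete, the following Python sketch models one factor as a sliding window of samples whose mean is compared with the four thresholds, and takes the system MCL as the highest factor MCL. It is an illustration under stated assumptions, not the BTS code: the FactorMonitor class and system_mcl function are hypothetical names, the thresholds are the example values from Table C-2, and a factor is assumed to reach an MCL when its level is greater than or equal to that level's threshold.

```python
from collections import deque

# Example thresholds from Table C-2 (level_1_threshold .. level_4_threshold).
EXAMPLE_THRESHOLDS = (70, 80, 90, 95)


class FactorMonitor:
    """Hypothetical model of one overload factor (CPU, queue, or IPC buffers)."""

    def __init__(self, thresholds=EXAMPLE_THRESHOLDS, slot_array_size=10):
        if not 3 <= slot_array_size <= 10:
            raise ValueError("slot_array_size must be between 3 and 10")
        self.thresholds = thresholds
        # Sliding window: once full, each new sample pushes out the oldest one.
        self.slots = deque(maxlen=slot_array_size)

    def add_sample(self, percent):
        self.slots.append(percent)

    def factor_level(self):
        # The factor level is the mean of the samples currently in the slots.
        return sum(self.slots) / len(self.slots) if self.slots else 0.0

    def factor_mcl(self):
        # The factor MCL is the number of thresholds the level has reached (0-4).
        level = self.factor_level()
        return sum(1 for threshold in self.thresholds if level >= threshold)


def system_mcl(monitors):
    # The highest factor MCL dictates the MCL of the whole system.
    return max((m.factor_mcl() for m in monitors), default=0)
```

With slot_array_size=3, samples of 60, 85, and 95 percent average to 80, which gives MCL2. A further sample of 100 pushes the oldest sample out of the window and raises the mean to about 93.3, which gives MCL3.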


Reducing Overload

When the system MCL rises above MCL0, Overload Control reduces overload as follows:

Selectively reject new calls at the signaling adapters: A percentage of calls and messages is rejected at the current MCL, as configured in olm.cfg. Emergency calls are not rejected at MCL1 through MCL3, but all calls, including emergency calls, are rejected at MCL4.

Tell the network to stop sending traffic: This starts when BTS is mildly congested (at MCL1) and continues through all higher MCL levels until the overload condition abates to MCL0. This action applies only to the following types of networks:

SS7 sends the Automatic Congestion Level (ACL) parameter in ISUP release messages.

H.323 sends a Resource Availability Indicator (RAI) message.

SIP sends a 500 or 503 response with a Retry-After header.

CA stops sending triggers to POTS FS: When the FS is congested, the following occurs:

The FS notifies the CA once of its congested status.

The CA sends only emergency triggers to the FS while it manages the FS's congestion abatement.
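olm.cfg sets a reject rate for each of MCL1 through MCL3 (level_1_reject_rate to level_3_reject_rate in the sample file). This guide does not say how the signaling adapters pick which calls to reject, so the Python sketch below assumes a simple deterministic credit counter that rejects exactly N of every 100 new calls at an N percent rate. The CallAdmission class is hypothetical; only the emergency-call behavior and the reject-everything rule at MCL4 come from this guide.

```python
class CallAdmission:
    """Hypothetical sketch of new-call admission under Overload Control."""

    def __init__(self, reject_rates):
        # Maps MCL 1-3 to the percentage of new calls to reject,
        # for example the level_N_reject_rate values from olm.cfg.
        self.reject_rates = reject_rates
        self._credit = 0  # accumulated rejection "debt", in percent

    def admit(self, mcl, is_emergency=False):
        """Return True to accept a new call, False to reject it."""
        if mcl >= 4:
            return False  # MCL4: reject all calls, including emergency calls
        if mcl <= 0 or is_emergency:
            return True   # no congestion, or an emergency call at MCL1-MCL3
        self._credit += self.reject_rates.get(mcl, 0)
        if self._credit >= 100:
            self._credit -= 100
            return False  # enough credit has accumulated: reject this call
        return True
```

At the sample rate of 50 percent for MCL2, every second new call is rejected. An emergency call is accepted at MCL3 but rejected at MCL4. A counter spreads rejections evenly; a random draw per call would also meet the configured percentage on average.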

Slowing Overload Reduction

A sudden reduction in abatement may cause MCL to increase rapidly again. To prevent this MCL "bouncing", the system MCL decreases by only one level at a time, regardless of how low the computed MCL falls. This lets the system MCL decrease gracefully over a number of intervals.

Damping also slows changes in the system MCL. Damping values are in milliseconds and define the minimum amount of time the system remains at an MCL level. You can configure the damping time for each MCL level in olm.cfg using:

level_1_damping_time

level_2_damping_time

level_3_damping_time

level_4_damping_time
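The one-level-at-a-time decrease and the damping times can be modeled together as a small state machine. This Python sketch rests on assumptions the guide does not state: increases are applied immediately (damping only delays decreases), time is passed in explicitly in milliseconds, and the administrative MCL set with control machine-congestion-level acts as a floor, so the effective MCL is the greater of the two. The SystemMcl class is a hypothetical name, and the damping values in the example are those in the sample olm.cfg.

```python
class SystemMcl:
    """Hypothetical model of system MCL step-down with per-level damping."""

    def __init__(self, damping_ms):
        # Maps MCL 1-4 to level_N_damping_time, in milliseconds.
        self.damping_ms = damping_ms
        self.mcl = 0
        self.entered_at_ms = 0

    def update(self, computed_mcl, now_ms, admin_mcl=0):
        """Apply one computed MCL sample and return the effective MCL."""
        if computed_mcl > self.mcl:
            # Assumption: increases take effect immediately.
            self._enter(computed_mcl, now_ms)
        elif computed_mcl < self.mcl:
            held_ms = now_ms - self.entered_at_ms
            if held_ms >= self.damping_ms.get(self.mcl, 0):
                # Step down one level only, however low computed_mcl is.
                self._enter(self.mcl - 1, now_ms)
        # Effective MCL: the greater of the damped MCL and the admin MCL.
        return max(self.mcl, admin_mcl)

    def _enter(self, mcl, now_ms):
        self.mcl = mcl
        self.entered_at_ms = now_ms
```

With the sample damping times, a drop in computed MCL from 3 to 0 takes the system MCL down through 2 and 1 before it reaches 0, spending at least 800, 1200, and 1600 ms at MCL3, MCL2, and MCL1 respectively.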

Configuring

This section explains how to perform the following tasks:

Setting the Minimum System MCL

Configuring the SIP Response Code

Configuring Emergency Call Handling

Configuring SIP Message Handling


Note These tasks include examples of CLI commands that illustrate how to provision the specific feature. For a complete list of all CLI tables and tokens, refer to the Cisco BTS 10200 Softswitch Command Line Interface Reference Guide.


Setting the Minimum System MCL


Warning Manually setting the minimum MCL affects call processing exactly as if the system had reached that MCL because of actual overload or congestion. Use it for test purposes only.


To set the minimum system MCL, enter a command similar to the following:

control machine-congestion-level platform_id=CA146; mcl=2;

MACHINE CONGESTION LEVEL ON CALL AGENT CA146 IS... -> 

ADMIN MCL -> NO_CONGESTION(0)
COMPUTED MCL -> NO_CONGESTION(0)
EFFECTIVE MCL -> NO_CONGESTION(0)
FEATURE SERVER CONGESTION -> 
FSAIN205 IS NOT CONGESTED
FSPTC235 IS NOT CONGESTED
REASON -> ADM executed successfully
RESULT -> ADM configure result in success

Reply : Success: at 2006-02-28 09:54:27 by btsadmin

Configuring the SIP Response Code

When rejecting a SIP message during overload, you can use either of the following:

500 Server Internal Error

503 Service Unavailable

To set the response code, enter a command similar to the following. The default value is 503.

add ca_config type=SIA-OC-REJECTION-RESP; datatype=integer; value=500; 

Configuring Emergency Call Handling

To identify emergency calls, the BTS checks the following:

Called-party number for all incoming calls against the EMERGENCY-NUMBER-LIST

Calling party category (CPC) in ISUP calls

If the BTS determines that a call is an emergency call and the MCL is 1, 2, or 3, the BTS gives the call priority and does not reject it. If the MCL is 4, the BTS rejects all calls, including emergency calls.

To add a number to the EMERGENCY-NUMBER-LIST, enter a command similar to the following:

add emergency-number-list digit_string=911;

Reply : Success: at 2006-02-28 09:48:40 by btsadmin
MNT add successful
Transaction 934823299797597704 was processed.

To display the EMERGENCY-NUMBER-LIST, enter:

show emergency-number-list;

DIGIT_STRING=911

Reply : Success: at 2006-02-28 09:48:45 by btsadmin
Entry 1 of 1 returned.

To delete a number from the EMERGENCY-NUMBER-LIST, enter a command similar to the following:

delete emergency-number-list digit_string=911;

Reply : Success: at 2006-02-28 09:52:20 by btsadmin
MNT delete successful
Transaction 934823480106794504 was processed.

Configuring SIP Message Handling

When processing an incoming SIP message, the BTS checks the MCL of the CA. It uses the following factors to decide whether to accept or reject the message:

SIP Message Type

Call type (normal or emergency)

Configured rejection percentage

Current MCL status

SIP Message Types

Message Rejection: INVITE

If overloaded, the BTS rejects a percentage of incoming INVITE messages. The percentage rejected is based on sia.cfg. Only new INVITE messages are checked for acceptance; re-INVITE messages are always accepted.

Message Rejection: REGISTER

If overloaded, the BTS rejects a configured percentage of REGISTER messages.

Message Rejection: REFER

If overloaded, the BTS rejects a percentage of incoming REFER messages.

Message Rejection: SUBSCRIBE

If overloaded, the BTS rejects a configured percentage of out-of-dialog SUBSCRIBE messages. The BTS also rejects SUBSCRIBE messages without call contexts. The BTS does not reject SUBSCRIBE messages received in an INVITE dialog.

Message Rejection: OPTIONS

If overloaded, the BTS rejects OPTIONS messages. No configuration is required; all OPTIONS messages are rejected from MCL1 through MCL4.

Message: Unsolicited NOTIFY Suppression

If overloaded, the BTS does not send unsolicited NOTIFY messages (MWI requests) to endpoints. However, the BTS continues to receive and process unsolicited NOTIFY requests even when overloaded.

UDP Messages

The BTS drops UDP messages, such as STUN messages, that are smaller than the configured size.
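The per-method rules above can be summarized as a decision function. In this Python sketch, the Action enum and classify function are hypothetical; APPLY_REJECT_RATE means the configured rejection percentage decides the outcome. Behavior the guide does not describe is an assumption and is marked in the comments: methods not listed are accepted, and REGISTER, REFER, and out-of-dialog SUBSCRIBE remain subject to the percentage at MCL4.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"                        # process normally
    APPLY_REJECT_RATE = "apply_reject_rate"  # configured percentage decides
    REJECT = "reject"                        # reject unconditionally


def classify(method, mcl, in_dialog=False, is_emergency=False):
    """Classify an incoming SIP request at the given MCL (hypothetical sketch).

    in_dialog is True for requests inside an existing INVITE dialog,
    for example a re-INVITE or an in-dialog SUBSCRIBE.
    """
    if mcl <= 0:
        return Action.ACCEPT
    method = method.upper()
    if method == "INVITE":
        if in_dialog:
            return Action.ACCEPT             # re-INVITEs are always accepted
        if mcl >= 4:
            return Action.REJECT             # MCL4: all new calls, emergency included
        return Action.ACCEPT if is_emergency else Action.APPLY_REJECT_RATE
    if method == "OPTIONS":
        return Action.REJECT                 # all OPTIONS rejected at MCL1-MCL4
    if method == "SUBSCRIBE":
        return Action.ACCEPT if in_dialog else Action.APPLY_REJECT_RATE
    if method in ("REGISTER", "REFER"):
        return Action.APPLY_REJECT_RATE      # assumption: same rule at MCL4
    return Action.ACCEPT                     # assumption: other methods unaffected
```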

Message Rejection Logic

When the BTS rejects an incoming SIP request, it responds with 500 or 503. Use the CLI to set the response code (see Configuring the SIP Response Code).

The BTS includes a Retry-After header in its response. The value of this header, in seconds, tells the endpoint that the BTS will not accept further requests for the specified time. For example, "Retry-After: 5" means the endpoint should not send its next request to the BTS until 5 seconds have passed.
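For illustration, the following Python sketch builds such a rejection. Per RFC 3261, a response copies the Via, From, Call-ID, and CSeq headers from the request and copies the To header, adding a tag if the request's To header has none. The build_rejection function, its parameters, and the simplification of one Via value per request are assumptions of this sketch; the BTS builds its responses internally.

```python
import secrets

REASON_PHRASES = {500: "Server Internal Error", 503: "Service Unavailable"}


def build_rejection(request_headers, code=503, retry_after=5, to_tag=None):
    """Build a SIP overload rejection with a Retry-After header (sketch)."""
    if code not in REASON_PHRASES:
        raise ValueError("overload rejections use 500 or 503")
    lines = [f"SIP/2.0 {code} {REASON_PHRASES[code]}"]
    # Copy these headers from the request (RFC 3261, section 8.2.6.2).
    for name in ("Via", "From", "To", "Call-ID", "CSeq"):
        value = request_headers.get(name)
        if value is None:
            continue
        if name == "To" and ";tag=" not in value:
            # The response To header must carry a tag.
            value += ";tag=" + (to_tag or secrets.token_hex(4))
        lines.append(f"{name}: {value}")
    # Ask the endpoint not to send another request for retry_after seconds.
    lines.append(f"Retry-After: {retry_after}")
    lines.append("Content-Length: 0")
    return "\r\n".join(lines) + "\r\n\r\n"
```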

Operating

This section explains how to perform the following tasks:

Viewing MCL

Editing the OLM.CFG File

It also explains how this feature affects the following operational area:

Measurements

Viewing MCL

To display the MCL, enter a command similar to the following:

status machine-congestion-level platform_id=CA146;

MACHINE CONGESTION LEVEL ON CALL AGENT CA146 IS... -> 

ADMIN MCL -> NO_CONGESTION(0)
COMPUTED MCL -> NO_CONGESTION(0)
EFFECTIVE MCL -> NO_CONGESTION(0)
FEATURE SERVER CONGESTION -> 
FSAIN205 IS NOT CONGESTED
FSPTC235 IS NOT CONGESTED
REASON -> ADM executed successfully
RESULT -> ADM configure result in success

Reply : Success: at 2006-02-28 09:54:27 by btsadmin

If platform_id specifies an FS (for example, FSPTC235), the output shows the MCL of that FS. If platform_id specifies a CA (for example, CA146), the output also includes the congestion status of the FSs as seen by the CA. If platform_id is omitted, the command displays the MCL of all platforms on the system.
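If you capture this output for monitoring scripts, a parser along the following lines can extract the three MCL values and the FS congestion flags. This Python sketch is hypothetical: it assumes the line formats shown in the sample output above, and it assumes a congested FS is reported as "<FS> IS CONGESTED", a form this guide does not show.

```python
import re

# "ADMIN MCL -> NO_CONGESTION(0)" and similar lines.
MCL_LINE = re.compile(r"^(ADMIN|COMPUTED|EFFECTIVE) MCL -> (\w+)\((\d+)\)\s*$")
# "FSAIN205 IS NOT CONGESTED"; the congested form is an assumption.
FS_LINE = re.compile(r"^(\S+) IS (NOT )?CONGESTED\s*$")


def parse_mcl_status(text):
    """Extract MCL values and FS congestion flags from status output."""
    result = {"mcl": {}, "fs_congested": {}}
    for raw in text.splitlines():
        line = raw.strip()
        match = MCL_LINE.match(line)
        if match:
            result["mcl"][match.group(1).lower()] = int(match.group(3))
            continue
        match = FS_LINE.match(line)
        if match:
            result["fs_congested"][match.group(1)] = match.group(2) is None
    return result
```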

Editing the OLM.CFG File

The Overload Manager (OLM) section appears in the platform.cfg file. Overload Control uses the following configuration files:

Table C-3 Configuration Files Used by Overload Control

Name
Location
Description

olm.cfg

/opt/OptiCall/ca/bin/

Specifies the parameters that control OLM

Exists in separate versions for Call Agent and each Feature Server

Stores configuration information (to compute MCL) in global data section

Shown in Appendix A


Caution Changes to olm.cfg significantly impact system performance. Edit it only under the direction of a Cisco engineer.

sia.cfg

(SIP Adapter)

/opt/OptiCall/ca/bin/

Has SIP timer values used during overload


Caution These values can be changed only by a Cisco engineer.


Sample olm.cfg

[MACROS]
#Dynamic defaults
#These macros can be used as default token values to make this file easier to maintain.
#i.e. for all OBJECTS that use them, the values for all can be changed in this one place.

#Slots
SLOTS=10

#Thresholds
THRESH1=50
THRESH2=70
THRESH3=90
THRESH4=95

#Alarm Step Size
STEP=5
[DEFAULTS]
#Static defaults
#These values take effect if no corresponding entry is made for a given OBJECT when the
#token is called for.

#Thresholds
level_1_threshold=50            #The value at which MCL level 1 is triggered for this factor
level_2_threshold=70            #The value at which MCL level 2 is triggered for this factor
level_3_threshold=90            #The value at which MCL level 3 is triggered for this factor
level_4_threshold=95            #The value at which MCL level 4 is triggered for this factor

#Alarm Step Size
info_alarm_step_size=0          #The number of percentage point steps that will cause an
                                  #info alarm for this factor
#Slots
slot_array_size=10              #The number of slots in the array of factor levels over
                                  #which the mean is taken

[SUPEROBJECT]
#Global data
olm_sampling_interval=2         #In milliseconds, how often OLM "wakes up" and performs
                                  #its computations

olm_printing_cycle=0            #In cycles of olm_sampling_interval how often to print
                                  #table at INFO3

level_1_reject_rate=10          #Percentage of new calls to reject when the system
                                  #reaches MCL 1
level_2_reject_rate=50          #Percentage of new calls to reject when the system
                                  #reaches MCL 2
level_3_reject_rate=90          #Percentage of new calls to reject when the system
                                  #reaches MCL 3

level_1_damping_time=1600       #The minimum amount of time (ms) that can be spent at MCL1
level_2_damping_time=1200       #The minimum amount of time (ms) that can be spent at MCL2
level_3_damping_time=800        #The minimum amount of time (ms) that can be spent at MCL3
level_4_damping_time=400        #The minimum amount of time (ms) that can be spent at MCL4

alarm_damping_time=60           #The minimum amount of time(s) before alarm 112 is changed

[OBJECT 1]
#Defines the section of the configuration file relating to CPU Utilization.

factor_type=cpu_utilization   #The type of this factor
cpu_collection_interval=5     #In seconds, the value passed to the gosGetCpuUsage()
                                #platform function.
cpu_nice_value=12             #The Unix nice value that is used to identify critical
                                #processes.

#Slots
slot_array_size=${SLOTS}      #The number of slots in the array of factor levels over
                                #which the mean is taken.
#Thresholds in %
level_1_threshold=85          #The value at which MCL level 1 is triggered for this factor
level_2_threshold=90          #The value at which MCL level 2 is triggered for this factor
level_3_threshold=95          #The value at which MCL level 3 is triggered for this factor
level_4_threshold=98          #The value at which MCL level 4 is triggered for this factor

#Alarm Step Size %
info_alarm_step_size=${STEP}  #The number of percentage point steps that will cause an
                                #info alarm for this factor
[OBJECT 2]
#Defines the section of the configuration file pertaining to IPC BufferPool Utilization.
factor_type=ipc_buff_pool     #The type of this factor

#Slots
slot_array_size=${SLOTS}      #The number of slots in the array of factor levels over
                                #which the mean is taken.
#Thresholds in %
level_1_threshold=${THRESH1}  #The value at which MCL level 1 is triggered for this factor
level_2_threshold=${THRESH2}  #The value at which MCL level 2 is triggered for this factor
level_3_threshold=${THRESH3}  #The value at which MCL level 3 is triggered for this factor
level_4_threshold=${THRESH4}  #The value at which MCL level 4 is triggered for this factor

#Alarm Step Size %
info_alarm_step_size=${STEP}  #The number of percentage point steps that will cause an
                                #info alarm for this factor
[OBJECT 3]
#Defines a section of the configuration file pertaining to monitoring of Critical Queue Sizes.
factor_type=critical_queue    #The type of this factor
process_name=BCM              #The 3 or 4 character process name.
thread_type=1                 #The numeric thread type associated with the queue.
thread_instance=1             #The numeric thread instance number associated with
                                #the queue being monitored.
#Slots
slot_array_size=${SLOTS}      #The number of slots in the array of factor levels over
                                #which the mean is taken.
#Thresholds in %
level_1_threshold=${THRESH1}  #The value at which MCL level 1 is triggered for this factor
level_2_threshold=${THRESH2}  #The value at which MCL level 2 is triggered for this factor
level_3_threshold=${THRESH3}  #The value at which MCL level 3 is triggered for this factor
level_4_threshold=${THRESH4}  #The value at which MCL level 4 is triggered for this factor
#Alarm Step Size %
info_alarm_step_size=${STEP}  #The number of percentage point steps that will cause an
                                #info alarm for this factor
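The comments in the sample describe two kinds of defaults: [MACROS] entries are referenced as ${NAME} in token values, and [DEFAULTS] entries apply when an OBJECT has no entry for a token. The Python sketch below resolves token values that way, which can help when reviewing a copy of olm.cfg offline. It is not the OLM loader; parse_olm_cfg and resolve are hypothetical names, and the sketch assumes "#" always starts a comment.

```python
import re

MACRO_REF = re.compile(r"\$\{(\w+)\}")


def parse_olm_cfg(text):
    """Parse olm.cfg-style text into {section_name: {token: raw_value}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        # Drop comments, including indented comment continuation lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections.setdefault(current, {})
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            sections[current][key.strip()] = value.strip()
    return sections


def resolve(sections, obj, token):
    """Return a token's value for an OBJECT, falling back to [DEFAULTS]."""
    macros = sections.get("MACROS", {})
    value = sections.get(obj, {}).get(token)
    if value is None:
        value = sections.get("DEFAULTS", {}).get(token)
    if value is None:
        return None
    # Expand ${NAME} references; an undefined macro raises KeyError.
    return MACRO_REF.sub(lambda m: macros[m.group(1)], value)
```

For the sample file, resolve(sections, "OBJECT 2", "slot_array_size") expands ${SLOTS} to "10", and a token that OBJECT 2 does not define falls back to its [DEFAULTS] value.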

Measurements

These tables list new, modified, or deleted measurements.


Note See the Measurements section of the BTS 10200 Operations and Maintenance Guide for a complete list of all traffic measurements.


Call Processing Measurements

Table C-4 lists the new call processing measurements provided to support this feature.

Table C-4 Call Processing Measurements Used by Overload Control

Measurement
Description

CALLP_OLM_OFFERED

The total number of calls offered to OLM

CALLP_OLM_ACCEPT

The total number of calls accepted by OLM

CALLP_OLM_REJECT

The total number of calls rejected by OLM

CALLP_OLM_ACCEPT_MCL0

Calls accepted by OLM at MCL0

CALLP_OLM_ACCEPT_MCL1

Calls accepted by OLM at MCL1

CALLP_OLM_ACCEPT_MCL2

Calls accepted by OLM at MCL2

CALLP_OLM_ACCEPT_MCL3

Calls accepted by OLM at MCL3

CALLP_OLM_REJECT_MCL1

Calls rejected by OLM at MCL1

CALLP_OLM_REJECT_MCL2

Calls rejected by OLM at MCL2

CALLP_OLM_REJECT_MCL3

Calls rejected by OLM at MCL3

CALLP_OLM_REJECT_MCL4

Calls rejected by OLM at MCL4

CALLP_OLM_REJECT_EMERGENCY

Emergency calls rejected at MCL4

CALLP_OLM_MCL1_COUNT

Total number of MCL1 occurrences

CALLP_OLM_MCL2_COUNT

Total number of MCL2 occurrences

CALLP_OLM_MCL3_COUNT

Total number of MCL3 occurrences

CALLP_OLM_MCL4_COUNT

Total number of MCL4 occurrences

CALLP_OLM_ISUP_MSG_DUMPED

Number of ISUP messages dumped at MCL4 by layer 3/4 interface (MIM) due to system overload.


Service Interaction Manager Measurements

Table C-5 lists the new Service Interaction Manager measurements provided to support this feature.

Table C-5 Service Interaction Manager Measurements used by Overload Control

Measurement
Description

SIM_OC_TRIG_FILTERED

The number of triggers dropped when the FS is overloaded (SIM uses a single counter to track trigger filtering for all FSs). SIM updates this counter every time it filters a trigger due to congestion on an FS.

SIM_OC_EMG_TRIG_FORCED

The number of emergency triggers (that is, TRIGGER_911) forced when the FS is overloaded (SIM uses a single counter to track the number of emergency triggers forced for all FSs). SIM updates this counter every time it forces an emergency trigger (TRIGGER_911) to an FS.

SIM_OC_TRIG_FORCED

The number of triggers forced when the FS is overloaded (SIM uses a single counter to track the number of forced triggers for all FSs). SIM updates this counter every time it forces a trigger.


Traffic Measurements Monitor Counters

Table C-6 lists the new Traffic Measurements Monitor (TMM) measurements provided to support this feature.

Table C-6 TMM Counters used by Overload Control

Measurement
Description

SIA_OC_RX_INVITE_REJECT

The total number of incoming INVITE messages rejected by SIA due to overload.

SIA_OC_RX_REGISTER_REJECT

The total number of incoming REGISTER messages rejected by SIA due to overload.

SIA_OC_RX_REFER_REJECT

The total number of incoming REFER messages rejected by SIA due to overload.

SIA_OC_RX_SUBSCRIBE_REJECT

The total number of incoming SUBSCRIBE messages rejected by SIA due to overload.

SIA_OC_RX_UNSOL_NOTIFY_SUPP

The total number of unsolicited NOTIFY requests suppressed (not sent to endpoints) due to overload.

SIA_OC_RX_OPTIONS_REJECT

The total number of incoming OPTIONS messages rejected by SIA due to overload.


Miscellaneous Measurements

Table C-7 lists additional measurements added to support Overload Control.

Table C-7 Miscellaneous Measurements used by Overload Control

Measurement
Description

ISUP_CONG_CALL_REJECTED

The number of calls rejected due to congestion, counted per trunk group. This measurement is implemented for SGA.

POTS_OC_DP_RECEIVED

The number of Detection Points (DPs) reported during periods of congestion. This counter is pegged by the FS.

H323_OC_SETUP_REJECTED

The total number of incoming H.225 Setup messages rejected by the BTS due to overload.

MEAS_ISA_OC_SETUP_REJECTED

The number of ISDN calls rejected due to system overload.

MEAS_MGA_OC_CALL_REJECTED

The number of MGCP calls rejected due to system overload.


Troubleshooting

This section lists the Events and Alarms added to support this feature.

Events and Alarms

The FS sends an alarm when:

MCL changes

An individual critical factor reaches its threshold

The CA sends an Informational alarm when:

It receives a congested notification

It receives an abatement notification from an FS

Informational alarms are sent at fixed 25 percent increments. A configurable parameter, info_alarm_step_size, is defined for each factor in olm.cfg; set it to a value that allows sufficient warning. The default for info_alarm_step_size is 5, which produces informational alarms for a factor at 5, 10, 15 percent, and so on.

Congestion Status—Maintenance (112)

The Congestion Status alarm (major) reports changes in the system MCL ("System MCL Level"). This is the effective MCL, that is, the greater of the computed MCL and the administrative MCL.

When a new MAINTENANCE(112) alarm appears, older MAINTENANCE(112) alarms clear. When the system MCL falls to 0, the current alarm clears.

Dampen this alarm using alarm_damping_time in olm.cfg. The value of alarm_damping_time (in seconds) is the minimum amount of time that must pass after an MCL change before the alarm is issued.

For additional information, refer to the "MAINTENANCE (112)" section on page 7-60.

CPU Load of Critical Processes—Maintenance(113)

The CPU Load of Critical Processes alarm (info) indicates that the CPU utilization factor has crossed a multiple of info_alarm_step_size in olm.cfg. The alarm reports "Factor Level" and "Factor MCL". It is issued each time the factor crosses a step boundary in either direction, but it is not issued again until the factor reaches the next higher or lower step.

For additional information, refer to the "MAINTENANCE (113)" section on page 7-60.
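The step rule shared by the MAINTENANCE(113), (114), and (115) alarms can be sketched as follows. This Python sketch is an interpretation, not the BTS implementation: the InfoAlarmTracker class is hypothetical, "passing the next higher or lower level" is read as reaching the next step boundary above or below the last reported one, and a step size of 0 (as in the [DEFAULTS] section of the sample olm.cfg) is assumed to disable the alarms.

```python
class InfoAlarmTracker:
    """Hypothetical model of info alarms raised at info_alarm_step_size steps."""

    def __init__(self, step_size):
        self.step = step_size
        self.boundary = 0  # last step boundary reported in an alarm

    def update(self, level):
        """Return the boundary to report in an alarm, or None for no alarm."""
        if self.step <= 0:
            return None  # assumption: a step size of 0 disables info alarms
        up = self.boundary + self.step
        down = self.boundary - self.step
        if level >= up:
            # Rising: report the highest boundary at or below the new level.
            self.boundary = (level // self.step) * self.step
            return self.boundary
        if down >= 0 and level <= down:
            # Falling: report the lowest boundary at or above the new level.
            self.boundary = -(-level // self.step) * self.step
            return self.boundary
        return None
```

With a step size of 5, a factor that rises to 11 percent raises one alarm at 10. Movements between 6 and 14 percent raise nothing further until the factor reaches 15 or falls to 5, which keeps a factor hovering near one boundary from flooding the alarm log.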

Queue Length of Critical Processes—Maintenance(114)

The Queue Length of Critical Processes alarm (info) indicates that a defined critical process queue length factor has crossed a multiple of info_alarm_step_size in olm.cfg. The alarm reports process_name, followed by 1 byte for "Factor Level" and 1 byte for "Factor MCL". It is issued each time the factor crosses a step boundary in either direction, but it is not issued again until the factor reaches the next higher or lower step.

For additional information, refer to the "MAINTENANCE (114)" section on page 7-61.

IPC Buffer Usage Level—Maintenance(115)

The IPC Buffer Usage Level alarm (info) indicates that the IPC buffer usage factor has crossed a multiple of info_alarm_step_size in olm.cfg. The alarm reports "Factor Level" and "Factor MCL". It is issued each time the factor crosses a step boundary in either direction, but it is not issued again until the factor reaches the next higher or lower step.

For additional information, refer to the "MAINTENANCE (115)" section on page 7-61.

CA Reports the Congestion Level of FS—Maintenance(116)

The CA Reports the Congestion Level of FS alarm (info) indicates that the CA received a congestion or abatement notification from an FS.

For additional information, refer to the "MAINTENANCE (116)" section on page 7-62.

Logs

Use the INFO log levels to get different amounts of information about the alarms:

INFO1: Included with each alarm.

INFO3: Prints the factor table controlled by olm.cfg (at the interval set by olm_printing_cycle), giving a system overview.

INFO4: Provides extra detail.

INFO5: Shows the exact details of the factor MCL computations.