Cisco BTS 10200 Softswitch

Field Notice: Potential Restart of s7a Process When Inbound PAM Message is Received


August 26, 2004


Products Affected

BTS SS7-ISUP - 3.5

Problem Description

When an inbound PAM message is received, the s7a process can core dump because of an uninitialized length variable in an omni ISUP library. The BTS application restarts the s7a process automatically. However, in some cases offnet calls may fail on some of the CICs managed by the restarted process, because the s7a process and the omni stack disagree about CIC registration.

Background

The omni SS7 stack is the BTS10200's interface to the PSTN. The s7a process manages CIC registration with the omni SS7 stack and exchanges SS7 messages with the stack. There are four s7a processes in the BTS10200 Call Agent. Each of the four acts independently and is responsible for managing its own unique group of SS7 circuits.

Problem Symptoms

When an s7a process is restarted, a Signaling 119 alarm is issued. If the secondary problem with CIC registration is encountered, offnet calls that attempt to use some of the circuits managed by the restarted s7a process may fail. You may receive complaints from subscribers about offnet call failures and see error messages similar to the following in the Call Agent trace log:

[***ERROR*** 15:54:09.243 S7A4 00-0 Event ] "a7n1_isup_4: aISUPputUmsg() failed, prim=258 (base+3) err=26 [Cic Is not Registerd:693 appl Id=30 

The key string in this message is "Cic Is not Registerd" (spelled as it appears in the log); other data may vary.

Offnet calls using the circuits managed by the other s7a processes are not affected.
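
As a minimal sketch of checking for the symptom described above, the following shell function counts the lines in a trace log that contain the key string. The function name count_unregistered_cic is hypothetical, and the log file path is left to the caller because this notice does not state where the Call Agent trace logs are stored. It assumes a POSIX grep that supports -F and -c.

```shell
#!/bin/sh
# Hypothetical helper (not part of the BTS software): count trace-log lines
# that contain the key string quoted in this notice.
# $1 = path to a Call Agent trace log (location is site-specific; assumption).
count_unregistered_cic() {
    # -F: match the pattern as a fixed string, -c: print the number of matching lines.
    # The misspelling "Registerd" is intentional; it is how the log prints it.
    grep -c -F 'Cic Is not Registerd' "$1"
}
```

A non-zero count for a log written after a Signaling 119 alarm suggests that the CIC registration problem has occurred.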

Workaround/Solution

Cisco is pursuing a patch from the third-party SS7 stack vendor as the solution to this issue.

Cisco has identified the following workaround to keep this issue from happening in the field. The workaround changes the behavior of the BTS application software so that an automatic failover is performed instead of a restart of the s7a process.

A recovery procedure for systems that experience the issue before the workaround is applied follows the workaround.

Overview of workaround:

A parameter is added to the platform.cfg file on both sides of the call agent, and both sides are restarted so that the change takes effect. One side is always active, so the only impact on call processing is to attempted originations for a few seconds while switchovers are in progress. Active stable calls are not affected by the switchovers.

Note: The example steps in this procedure use call agent instance number CA146. Replace CA146 with the appropriate instance number for your system.

Steps to apply the workaround:

The assumed initial conditions are that the primary side is active and the secondary side is in the standby state, as shown in this example output from nodestat:

    | CA146 : ACTIVE |
    | NORMAL PRIMARY 2004/08/18-09:52:33 |
    | Replication Status: REPLICATING |

  1. Navigate to the call agent bin directory and save a copy of the platform.cfg file.

    prica50# pwd 
    /opt/OptiCall/CA146/bin 
    prica50# 
    prica50# cp -p platform.cfg platform.cfg.orig 
    prica50# 
    

    This step must be done on both the primary and secondary sides.

  2. Edit the platform.cfg file and add "-restart_s7a 0" to the Args line of the s7m process. This parameter should be added immediately before the "-s7a_bin" parameter. After the edit is complete, use the diff command to confirm the changes were made properly.

    prica50# vi platform.cfg 
    
    < make the described changes to the Args line for ProcName=S7M > 
    
    prica50# ls -l platform* 
    -rw-r--r-- 1 oamp staff 39247 Aug 18 10:59 platform.cfg 
    -rw-r--r-- 1 oamp staff 39231 Aug 10 12:51 platform.cfg.orig 
    prica50# 
    prica50# diff platform.cfg platform.cfg.orig 
    345c345 
    < Args=-node a7n1 -lname a7n1_ctrl -nproc 4 -noreset -restart_s7a 0 -s7a_bin s7a.CA146 
    --- 
    > Args=-node a7n1 -lname a7n1_ctrl -nproc 4 -noreset -s7a_bin s7a.CA146 
    prica50# 
    

    This step must be done on both the primary and secondary sides.

  3. Restart the secondary call agent application to make the change to platform.cfg take effect.

    secca50# platform stop -i CA146 
    Once the platform is down, restart it. 
    secca50# platform start -nocopy -i CA146 
    
  4. Force the secondary side to active via the CLI.

    CLI> control call-agent id=CA146; target-state=forced-standby-active
    
  5. Restart the primary call agent application to make the change to platform.cfg take effect.

    prica50# platform stop -i CA146 
    Once the platform is down, restart it. 
    prica50# platform start -nocopy -i CA146 
    
  6. Optional but recommended: return the primary side to active and release the force.

    CLI> control call-agent id=CA146; target-state=forced-active-standby 
    CLI> control call-agent id=CA146; target-state=normal 
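
The manual edit in step 2 can also be sketched with sed, which may help when the same change has to be made identically on both sides. This is an illustration only, not a Cisco-supplied tool: the function name add_restart_flag is hypothetical, and it assumes that the S7M Args line is the only Args line in platform.cfg that contains "-s7a_bin". The function writes the modified file to standard output and leaves its input untouched, so the result can still be checked with diff as in step 2.

```shell
#!/bin/sh
# Hypothetical helper: print platform.cfg with "-restart_s7a 0" inserted
# immediately before "-s7a_bin" on the Args line that carries it.
# Assumption: only the S7M process entry has an Args line with -s7a_bin.
# $1 = path to the unmodified platform.cfg (for example platform.cfg.orig).
add_restart_flag() {
    # Lines that already contain -restart_s7a are left alone, so running the
    # function on an already-edited file changes nothing.
    sed '/Args=.*-s7a_bin/{
/-restart_s7a/!s/ -s7a_bin/ -restart_s7a 0 -s7a_bin/
}' "$1"
}

# Example use, with the file names from step 1 and step 2:
#   add_restart_flag platform.cfg.orig > platform.cfg
#   diff platform.cfg platform.cfg.orig
```

The diff output should show a single changed Args line, matching the example in step 2.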
    
    

RECOVERY PROCEDURE (if workaround has not been applied):

Note: Do not use this procedure if the workaround has been applied!

If the signaling 119 alarm is issued, the active call agent application should be stopped immediately using the command platform stop -i CAxxx. This will force a switchover to the secondary side.

Note: The switchover cannot be done via CLI>control call-agent.

Example:

A signaling 119 alarm is issued:

ID=1092426787515 
TYPE=SIGNALING 
NUMBER=119 
DESCRIPTION=S7A process faulty 
SEVERITY=CRITICAL 
ALARM_STATUS=ON 
TIMESTAMP=2004-08-13 15:53:07 
ORIGIN=S7A4S7A4.PRIMARY.CA146 <<<<< alarm was issued by the primary side 
THREAD=S7A4S7A4.PRIMARY 
COMPONENT_ID=S7A4 
DW1=S7A4 exited abnormally

The alarm was issued by the primary side, which was active. Stop the primary call agent to force a failover to the secondary side using the command platform stop -i CAxxx, replacing xxx with the instance number of the system. In the alarm above, the instance number is 146.

When the primary side is stopped, the secondary side will become active and offnet calls should be processed normally.

Restart the primary side to restore redundancy, using the command platform start -nocopy -i CAxxx.
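
To illustrate how the side and instance number are read from the alarm in the example above, here is a small awk-based sketch. The function name alarm_origin is hypothetical. It assumes that the alarm record is available as KEY=VALUE lines in the layout shown above and that the ORIGIN value always ends in <side>.<instance>, which is inferred from the single example in this notice. It only reports what to stop; it does not run any platform command.

```shell
#!/bin/sh
# Hypothetical helper: read one alarm record (KEY=VALUE lines) on stdin and,
# if it is a Signaling 119 alarm, print "<side> <instance>",
# for example "PRIMARY CA146". Exits non-zero for any other record.
alarm_origin() {
    awk -F= '
        { split($2, v, " ") }                 # v[1] = value without trailing notes
        $1 == "TYPE"   { type = v[1] }
        $1 == "NUMBER" { number = v[1] }
        $1 == "ORIGIN" {
            # Assumed layout: <process>.<side>.<instance>
            n = split(v[1], part, ".")
            side = part[n - 1]
            inst = part[n]
        }
        END {
            if (type == "SIGNALING" && number == "119" && inst != "")
                print side, inst
            else
                exit 1
        }
    '
}
```

For the example alarm, the function reports the primary side and instance CA146, which corresponds to running platform stop -i CA146 on the primary side.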

DDTS

To follow the bug ID link below and see detailed bug information, you must be a registered user and you must be logged in.

DDTS: CSCef49172 (registered customers only)

Title: Inbound PAM message can trigger core dump in Omni library

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC) by one of the following methods:
