Guest

Cisco BTS 10200 Softswitch

Field Notice: BTS 10200 Softswitch EVT Process Causes Deferrors in Oracle DB


February 7, 2005


Products Affected

BTS10200 R3.5.X

Problem Description

An unusually high number of database 3 alarms, errors in Deferror queue, are seen due to a configuration issue with the Event (EVT) process. An example of an alarm is as follows:

ID=1107072602939 
TYPE=DATABASE 
NUMBER=3 
DESCRIPTION=There are errors in EMS database DefError queue; contact DBA 
SEVERITY=CRITICAL ALARM_STATUS=ON 
TIMESTAMP=2005-01-31 22:40:02 
ORIGIN=DATABASE 
THREAD=UNSPECIFIED THREAD 
COMPONENT_ID=EM01 
DW1=ems-1 
DW2=optical1 
DW3=1_transactions_in_error

Background

The Event Process runs on both the active and secondary EMS. Every 30 minutes, these two processes delete old entries from the Oracle tables EVENT_LOG and ALARM_LOG. However, the Event Process on the primary side is deleting entries on the active DB while the Event Process on the standby is deleting entries on the standby DB. The problem arises when both deletions, on active and on standby, occur within 2-3 seconds of each other and cause Oracle Replication to generate conflicting transaction errors.

Problem Symptoms

The dbadm tool can be used to display replication error transactions (Deferror) as follows:

  1. Login to the EMS as root

  2. su - oracle

  3. dbadm -r get_deferror

    Oracle error ORA-01403: no data found is the typical output from get_deferror when the error transactions are due to the EVT configuration issue described in this field notice.

  4. dbadm -r get_defcall | more

This will show the REP_DELETE operation on EVENT_LOG or ALARM_LOG from replication error queue.

Workaround/Solution

Modify the Platform Programs table in the EMS to indicate that EVT should only be started when the EM01 goes ACTIVE.

On both EMS's:

. /opt/ems/bin/Runtime.sh [ENTER] 
mysql -u root -pmc68000 -h `hostname` -P 3307 -C Ems [ENTER] 
mysql> UPDATE PlatformPrograms set manage='Y' WHERE id='EVT';[enter] 
mysql>exit 

kill `ps -ef | grep -v grep | grep EVT | awk '{print $2}'` 

on both EMS's:

 
pkill smg3

Verify on primary the evt is still running

ps -ef|grep -i evt

Verify on secondary evt is not running

ps -ef|grep -i evt

This notice applies to RLS 3.x customers only and it is being tracked by defect CSCeg42249.

Additional Information

Checking Deferrors

In Oracle, there is a tool called dbadm which will be used for this procedure. When executing the procedure you will have output generated for specific Oracle queues as depicted below. The dbadm tool should never be used to truncate anything other than the deferror queue. The command dbadm -A truncate_def should never be used on a live system as this command truncates not only the deferror queue but also the deftran and deftrandest queues.

On the standby EMS:

su - oracle 
dbadm - C rep 

optical2:test-ems-2:/opt/orahome$ dbadm -C rep 

OPTICAL2::Deftrandest is empty? YES 
OPTICAL2::dba_repcatlog is empty? YES 
OPTICAL2::Deferror is empty? NO <==== 
OPTICAL2::Deftran is empty? NO 
OPTICAL2::Has no broken job? YES 
OPTICAL2::JQ Lock is empty? YES 

OPTICAL1::Deftrandest is empty? YES 
OPTICAL1::dba_repcatlog is empty? YES 
OPTICAL1::Deferror is empty? YES 
OPTICAL1::Deftran is empty? YES 
OPTICAL1::Has no broken job? YES 
OPTICAL1::JQ Lock is empty? YES 

These queues will be defined as the following:

Deferror - Contains the ID of each transaction that could not be applied. You can use this ID to locate the queued calls associated with this transaction. These calls are stored in the DEFCALL view. In other words, deferror is used for storing the ID of each replication transaction that could not be applied because of data conflict. This queue should always be empty.

Deftran - Is used for recording all deferred transactions in the deferred transactions queue on the current BTS system. It is normal to see that this queue is not empty.

Deftrandest - Lists the destinations for each deferred transaction in the deferred transactions queue on the current BTS system. It is normal to see that this queue is not empty.

Dba_repcatlog - This is an internal debugging queue that should be empty. If it is not, then Cisco needs to be called.

Has no broken job - This is an internal debugging queue that should be empty. If it is not, then Cisco needs to be called.

JQ Lock - This is an internal debugging queue that should be empty. If it is not, then Cisco needs to be called.

Example #1 --- Not OK to truncate deferror queue

On the standby EMS:

su - oracle 
dbadm -C rep 

optical2:test-ems-2:/opt/orahome$ dbadm -C rep 

OPTICAL2::Deftrandest is empty? YES 
OPTICAL2::dba_repcatlog is empty? YES 
OPTICAL2::Deferror is empty? NO <==== 
OPTICAL2::Deftran is empty? NO 
OPTICAL2::Has no broken job? YES 
OPTICAL2::JQ Lock is empty? YES 

OPTICAL1::Deftrandest is empty? YES 
OPTICAL1::dba_repcatlog is empty? YES 
OPTICAL1::Deferror is empty? YES 
OPTICAL1::Deftran is empty? YES 
OPTICAL1::Has no broken job? YES 
OPTICAL1::JQ Lock is empty? YES

Notice the deferrors present.

Go to the side that has the errors, in this case optical2/standby ems

Do the following on side with deferrors

su - oracle 

optical2:test-ems-2:/opt/oracle/admin/rep$ dbadm -r get_deferror 
Connected. 

List of replication error transactions with start_time:

TRAN_ID FROM_DB START_TIME ERROR_MSG 
--------------- -------- ---------------- -------------------------------------- 
6.95.2954 OPTICAL1 05Jan29 14:44:12 ORA-01403: no data found 

This gives the transaction id

Do the following and look for that transaction id

optical2:test-ems-2:/opt/oracle/admin/rep$ dbadm -r get_defcall_order 
Connected. 

Replication transaction operations (calls) in the order of start_time:

CALLNO TRAN_ID PACKAGENAME PROCNAME START_TIME 
------- ---------- ------------------------- --------------- ----------------- 
0 6.95.2954 SECURITYLEVELS$RP REP_UPDATE 005Jan29 14:44:12 

Notice that this deferror is not for a event or alarm table. At this point stop and open a case with Cisco.

Example #2 --- OK to truncate deferror queue

On the standby EMS:

su - oracle 
dbadm -C rep 

optical2:secems05:/opt/orahome$ dbadm -C rep 

OPTICAL2::Deftrandest is empty? YES 
OPTICAL2::dba_repcatlog is empty? YES 
OPTICAL2::Deferror is empty? NO <==== 
OPTICAL2::Deftran is empty? NO 
OPTICAL2::Has no broken job? YES 
OPTICAL2::JQ Lock is empty? YES 

OPTICAL1::Deftrandest is empty? YES 
OPTICAL1::dba_repcatlog is empty? YES 
OPTICAL1::Deferror is empty? YES 
OPTICAL1::Deftran is empty? YES 
OPTICAL1::Has no broken job? YES 
OPTICAL1::JQ Lock is empty? YES 

In this example there are deferrors on side B, this is denoted by OPTICAL2(side B).

Go to side b and issue the following commands

su - oracle 
dbadm -r get_deferror 

Sun Microsystems Inc. SunOS 5.8 Generic Patch October 2001 
optical2:secems05:/opt/orahome$ dbadm -r get_deferror 
Connected. 

List of replication error transactions with start_time:

TRAN_ID FROM_DB START_TIME ERROR_MSG 
--------------- -------- ---------------- -------------------------------------- 
4.29.671 OPTICAL1 04Nov24 15:13:44 ORA-01403: no data found 
2.64.671 OPTICAL1 04Nov24 15:13:49 ORA-01403: no data found 

This gives the transaction id

Do the following and look for that transaction id

optical2:secems05:/opt/orahome$ dbadm -r get_defcall_order 
Connected. 

Replication transaction operations (calls) in the order of start_time:

CALLNO TRAN_ID PACKAGENAME PROCNAME START_TIME 
------- ---------- ------------------------- --------------- ----------------- 
0 4.29.671 EVENT_LOG$RP REP_DELETE 004Nov24 15:13:44 
0 2.64.671 EVENT_LOG$RP REP_DELETE 004Nov24 15:13:49

Both of these transactions are for events, therefore you can truncate deferror with the following commands. Login to the side that is reporting deferrors, in this case OPTICAL2(side B)

su - oracle 
dbadm -A truncate_deferror 

optical2:secems05:/opt/orahome$ dbadm -A truncate_deferror 

***************************************************************** 
You are about to execute the following process: 

==> Truncate replication Deferred Error Queue tables. 

database: optical2 
hostname: secems05 

***************************************************************** 

Do you want to continue? [y/n] y 
INFO: Truncating replication deferred error queue tables... 
INFO: Truncating table SYSTEM.DEF$_ERROR 
INFO: Truncating table SYSTEM.DEF$_AQERROR 
INFO: Deferred error queue tables are truncated.

Once the deferrors are truncated, issue the following commands to check the queue

optical2:secems05:/opt/orahome$ dbadm -C rep 

OPTICAL2::Deftrandest is empty? YES 
OPTICAL2::dba_repcatlog is empty? YES 
OPTICAL2::Deferror is empty? YES <==== 
OPTICAL2::Deftran is empty? YES 
OPTICAL2::Has no broken job? YES 
OPTICAL2::JQ Lock is empty? YES 

OPTICAL1::Deftrandest is empty? YES 
OPTICAL1::dba_repcatlog is empty? YES 
OPTICAL1::Deferror is empty? YES 
OPTICAL1::Deftran is empty? YES 
OPTICAL1::Has no broken job? YES 
OPTICAL1::JQ Lock is empty? YES

DDTS

To follow the bug ID link below and see detailed bug information, you must be a registered user and you must be logged in.

DDTS

Description

CSCeg42249 (registered customers only)

EVT Process causes Deferrors in Oracle DB

For More Information

If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC) by one of the following methods:

Receive Email Notification For New Field Notices

Product Alert Tool - Set up a profile to receive email updates about reliability, safety, network security, and end-of-sale issues for the Cisco products you specify.