Introduction
This document describes a step by step procedure to resolve database replication or synchronization issue in Prime Network by rebuilding standby database from Primary database.
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
- Use this procedure to rebuild secondary database only if SWITCHOVER_STATUS of primary database is into UNRESOLVABLE GAP.
- Ensure that the primary database is in READ WRITEand secondary database is in READ ONLYor READ ONLY WITH APPLYmodes.
- Ensure that you identify correct primary and secondary Prime Central gateway/database.
Components Used
The information in this document is based on these software and hardware versions:
- Prime Central release 1.2 and above
- Oracle Database 11G release
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
Background Information
1. Use this command to know switchover_status of primary database:
SQL> select switchover_status from v$database
SWITCHOVER_STATUS
--------------------
UNRESOLVABLE GAP
Note: Prime Central Geo HA switchover fails abruptly leaving the Prime Central GEO HA system and/or database role status corrupt (both primay or both standby) and then you need to rebuild either primary or secondary depending upon last active/standby status.
Note: For all other cases, open SR with Cisco TAC to resolve database replication issue.
2. Use this command to know the current mode of primary and secondary database:
SQL> select open_mode from v$database;
3. Use this command to know ORACLE_SID of primary and secondary database:
On Primary Gateway from oracle user:
echo $ORACLE_SID -> output should be “primedb”
On Secondary Gateway from oracle user:
echo $ORACLE_SID -> output should be “primstdb”
Problem
Prime Central GEO HA switch and/or failover/failback procedure fails when Active and Standby databses become out of sync to each other. This results in both the primary and standby databses to either become active or standby at the same time.
Troubleshoot
Before you follow the solution you can perform basic trobleshooting steps:
1. Check the network connectivity and/or latency related issues between Primary and Secondary Prime Network Gateway.
2. Check this database log on Primary to find any database related ORA errors:
<database_home_directory>/diag/rdbms/anadb/anadb/trace/alert_anadb.log
3. Check open_mode, current_scn and switchover status on Primary and Secondary database.
SQL> select open_mode from v$database
SQL> select current_scn from v$database
SQL> select switchover_status from v$database;
4. Root cause for database replication mostly could be network communication issue betwee Primary and Secondary Prime Central database, corrupt database or similary database related errors.
Solution
Step 1. Verify the ORACLE_SID on both primary and secondary Gateway/databases.
On Primary Gateway/database ORACLE_SID = primedb
On Secondary Gateway/database ORACLE_SID = primdtdb
If, ORACLE_SID on either of Primary or Secondary Gateway is not as mentioned earlier, use this command to configure the correct SID:
setenv ORACLE_SID = <value>
Note: Here <value > = primedb or primstdb is based on primary or secondary.
Step 2. On Active and Standby database login as sysdba and find the directory path to the datafile and redo logs and archive log files.
Make use of these commands:
To find the datafiles:
SQL> select name from v$datafile;
To find the redo log files:
SQL> select member from v$logfile;
To find the archive log:
SQL> show parameter log_archive_dest_1;
Step 3. To rebuild the database, run this script in correct syntax after you identify the correct scenario described in step 4.
sh PCoracleADG.ksh [PRIMARY] [STANDBY] [DB_TO_BE_DROPPED] [SYSTEM_PASSWD] [ORACLE_BASE]
[ORACLE_USER] [ARCHIVED_LOG_LOCATION] [ORACLE_DATA_FILES_LOCATION] [REDO_LOG_LOCATION]
Note: Script is copied under $ORACLE_BASE/standby folder and needs to run as an oracle user.
Step 4. Identify any scenario with current status of your setup and proceed accordingly:
Note: As an example it is assumed that HA1 is Primary Gateway/database and HA2 is secondary Gateway/Database
Scenario 1: HA1 is active and needs to rebuild standby database on HA2. Make use of these parameters and run the script mentioned in Step 3. on HA2.
PRIMARY = primedb
STANDBY = primstdb
DB_TO_BE_DROPPED = primstdb
SYSTEM_PASSWD = use Step 5
ORACLE_BASE = /orahome/oracle
ORACLE_USER = oracle
ARCHIVED_LOG_LOCATION = output of ‘show parameter log_archive_dest_1;&rsquo
ORACLE_DATA_FILES_LOCATION = output of ‘select name from v$datafile;&rsquo
REDO_LOG_LOCATION = output of ‘select member from v$logfile;’
Scenario 2: HA2 is active and needs to rebuild standby database on HA1. Use these parameters and run the script mentioned in Step 3. on HA1.
PRIMARY = primedb
STANDBY = primstdb DB_TO_BE_DROPPED = primedb SYSTEM_PASSWD = use Step 5 ORACLE_BASE = /orahome/oracle ORACLE_USER = oracle ARCHIVED_LOG_LOCATION = output of ‘show parameter log_archive_dest_1;&rsquo ORACLE_DATA_FILES_LOCATION = output of ‘select name from v$datafile;&rsquo REDO_LOG_LOCATION = output of ‘select member from v$logfile;’
Step 5. Use this procedure to find out <SYSTEM_PASSWD>:
su - prime on HA1 or HA2
grep Embedded_SYSTEM_PASS= install/conf/.db.conf
For example if it has Embedded_SYSTEM_PASS=90f8006cd6bc0dde, then:
- java -cp install/utils/encryptionUtil.jar EncodeDecode decrypt 90f8006cd6bc0dde.
- Prime Central returns output string which is used as SYSTEM_PASSWD in Step 4.
Verify
Database verification on Primary and Secondary Prime Network gateway:
1. Verify that the number and the names of redo log files are same on Active and Standby database.
2. Verify that the number and the size of the datafiles on Active and Standby database are same.
3. Use this command on both Active and Standby database to show that the current SCN on Standby database can catch up with the scn on Primary database:
sqlplus / as sysdba
SQL>select current_scn from v$database;
4. Verify that the database_roleof Active database is PRIMARY and Standby database is LOGICAL_STANDBY.
sqlplus / as sysdba
SQL>select database_role from v$database;
5. Verify that the open_mode of Active database is READ WRITE and READ ONLY WITH APPLY on Standby database.
sqlplus / as sysdba
SQL>select open_mode from v$database;
6. Verify that the switchover_status of Active is TO STANDBY and NOT ALLOWED on Standby database:
sqlplus / as sysdba
SQL>select switchover_status from v$database;
7. Validate that the archive logs get transferred:
On Activedatabase:
SQL> alter system switch logfile;
On Standby database:
Check to ensure that a new file is created in ~/arch