This document helps you troubleshoot the Cisco PGW 2200 when you receive the 'MSO refused, Warm start-up Failed' message. This error message appears after you issue the MML command sw-over::confirm. Because warm-start is a low-priority, asynchronous activity, multiple components can be in the process of warm-starting their standby peers at the same time. The alarm lets an operator know when a standby unit is ready to take over. The alarm is raised when procM sends a Make Peer Standby request to IOCM, and it is cleared only after the warm-start is successful.
Cisco recommends that you have knowledge of these topics:
The information in this document is based on these software and hardware versions:
Cisco PGW 2200 Software Releases 9.3(2) and later
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
Refer to Cisco Technical Tips Conventions for more information on document conventions.
After you issue the MML command sw-over::Confirm on the Active Cisco PGW 2200, you receive this error.
PGW2200 mml> sw-over::Confirm
   MGC-01 - Media Gateway Controller 2004-05-26 11:37:37.061 MEST
M  DENY
   SROF
   "Proc Mgr"
   /* MSO refused, Warm start-up Failed. */
   ;
PGW2200 mml>
Note: A "Warm Restart" is an indication that the STANDBY is ready to receive check-pointing data. This usually happens on processes like the replicator and IOCC MTP3 through the IOCM. The SS7 IOCC can be the reason why IOCM rejects the sw-over command, but other issues can also be the cause. In that case, collect the log information as described in this section.
When the user attempts a manual switchover (MSO) and is denied, MML responds with one of these reasons:
MSO refused, standby system not ready—Switchover failed because the standby system was not ready.
MSO refused, warm start-up in progress—Switchover failed because start-up of the standby system was in progress.
MSO refused, Warm start-up Failed—Switchover failed because the warm start-up of the standby system failed.
MSO refused, System is not in active state—Switchover failed because the PGW 2200 host is not in an active state.
MSO refused, Detected standalone Flag—Switchover failed because no Standby PGW 2200 host is configured.
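The list of refusal reasons above can be sketched as a small lookup. This is a minimal illustration, not part of the PGW 2200 software: the function name `explain_mso_denial` and the dictionary `MSO_REASONS` are hypothetical, the explanations are paraphrased from the list above, and the sketch assumes the reason appears inside a `/* ... */` comment as in the sample MML response.

```python
import re
from typing import Optional

# Explanations paraphrased from the list of MSO refusal reasons in this document.
MSO_REASONS = {
    "standby system not ready": "The standby system was not ready.",
    "warm start-up in progress": "Start-up of the standby system was in progress.",
    "warm start-up failed": "Warm start-up of the standby system failed.",
    "system is not in active state": "The PGW 2200 host is not in an active state.",
    "detected standalone flag": "No Standby PGW 2200 host is configured.",
}

# Captures the text between "MSO refused," and the end of the MML comment.
_REASON = re.compile(r"MSO refused,\s*([^.*]+?)\s*\.?\s*\*/")


def explain_mso_denial(mml_response: str) -> Optional[str]:
    """Return an explanation for an 'MSO refused' reply, or None if none is found."""
    match = _REASON.search(mml_response)
    if match is None:
        return None
    reason = match.group(1).strip()
    # Compare case-insensitively because the replies mix upper and lower case.
    return MSO_REASONS.get(reason.lower(), "Unrecognized reason: " + reason)
```

For example, passing the reply text shown earlier (`/* MSO refused, Warm start-up Failed. */`) returns the explanation for the warm start-up failure.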
PGW2200 mml> rtrv-alms
   MGC-01 - Media Gateway Controller 2004-05-26 11:37:40.732 MEST
M  RTRV
   "lnk-1-cisco1: 2004-04-29 18:24:43.766 MEST,ALM=\"SC FAIL\",SEV=MJ"
   "lnk-1-cisco2: 2004-04-29 18:24:43.779 MEST,ALM=\"SC FAIL\",SEV=MJ"
   "lnk-2-cisco3: 2004-04-29 18:24:43.797 MEST,ALM=\"SC FAIL\",SEV=MJ"
Note: Always use the MML rtrv-alms command to check the alarms that occur during the sw-over::confirm command. Do this in combination with the UNIX command tail -f platform.log under the /opt/CiscoMGC/var/log directory. Also check the error message linked to the sw-over command.
The platform.log error messages linked to this situation are:
Wed May 1 16:13:47:752 2004 MEST | ProcessManager (PID 698) <Error>GEN_ERR_HA_MSO: Cannot comply with Manual Switch Over request. Reason Warm start up failed
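When platform.log is large, it can help to filter it for this specific error. The sketch below is an illustration only: the function names are hypothetical, and it assumes each GEN_ERR_HA_MSO entry sits on a single line in the same layout as the sample message above.

```python
import re
from typing import Iterable, List, Optional

# Matches the error tag and captures whatever follows the word "Reason".
_MSO_ERROR = re.compile(r"GEN_ERR_HA_MSO:.*?Reason\s+(?P<reason>.+)$")


def mso_failure_reason(log_line: str) -> Optional[str]:
    """Return the reason text of a GEN_ERR_HA_MSO line, or None for other lines."""
    match = _MSO_ERROR.search(log_line.strip())
    return match.group("reason").strip() if match else None


def find_mso_failures(lines: Iterable[str]) -> List[str]:
    """Collect the reasons of all GEN_ERR_HA_MSO entries, in file order."""
    reasons = []
    for line in lines:
        reason = mso_failure_reason(line)
        if reason is not None:
            reasons.append(reason)
    return reasons
```

A typical use would be to open /opt/CiscoMGC/var/log/platform.log and pass the file object to `find_mso_failures`.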
The Standby Warm Start alarm is set in the Active Box at the start of the Warm-Start process in IOCM.
The alarm is automatically cleared from the Active box only when the Warm-Start process successfully finishes.
In the event of a Warm-Start failure, this alarm is not cleared. If this happens, the alarm is cleared only when the Warm-Start is processed successfully at a later time.
The effect of the alarm is that a manual switch-over is denied.
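The alarm behavior described above can be summarized as a tiny state model. This is a conceptual sketch only, not PGW 2200 code; the class and method names are hypothetical, and it models only this one alarm, not the other conditions that can also deny a manual switch-over.

```python
class StandbyWarmStartAlarm:
    """Conceptual model of the Standby Warm Start alarm on the Active box."""

    def __init__(self) -> None:
        self.is_set = False

    def warm_start_begins(self) -> None:
        # The alarm is set at the start of the Warm-Start process in IOCM.
        self.is_set = True

    def warm_start_finished(self, success: bool) -> None:
        # Only a successful Warm-Start clears the alarm; a failure leaves it set
        # until a later Warm-Start succeeds.
        if success:
            self.is_set = False

    def blocks_manual_switchover(self) -> bool:
        # While the alarm is set, a manual switch-over is denied.
        return self.is_set
```

In this model, a failed warm-start followed by a successful one clears the alarm, which mirrors the corrective action of restarting the Standby software so that the warm-start runs again.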
This is the corrective action if the alarm does not clear:
Make sure that the pom.dataSync parameter is set to true in the Active and Standby PGW 2200.
Stop and start the Standby PGW 2200 software.
If the alarm still does not clear, open a Technical Support service request and attach this information to it: platform.log from both PGW 2200 hosts (under the /opt/CiscoMGC/var/log directory), mml.log and alarm.log, the current PGW 2200 configuration, and the previous two configuration directories (CFG_) from when the alarm was seen.
This is an example of a troubleshooting procedure:
Check the release notes for any items linked to this error message. Such items can be fixed in later Cisco PGW 2200 releases.
Make sure that you have not run into a corrupted patch. Check the platform.log files under the /opt/CiscoMGC/var/log directory for the time at which the problem was reported. Also check the messages file under the /var/adm directory for UNIX error messages.
Cisco recommends that you upgrade to the latest Cisco PGW 2200 patches.
If everything in this step is OK, proceed to step 2.
Issue the netstat -a command to see if the replication is in an Established mode (for example, Active <-> Standby).
Issue the MML prov-sync command to see if this works correctly. Also, issue the sw-over::confirm command again and check the status. The Cisco PGW 2200 uses TCP ports 2970 and 2974 for replication.
On an Active Cisco PGW 2200, run the UNIX command netstat -a | grep 29\[0-9\]\[0-9\].
On the Standby Cisco PGW 2200, run the UNIX command netstat -a | grep 29\[0-9\]\[0-9\].
For example, check the Active system to see if it is in an ESTABLISHED mode.
mgc-bru-20 mml> rtrv-ne
   MGC-01 - Media Gateway Controller 2004-05-28 11:03:46.236 GMT
M  RTRV
   "Type:MGC"
   "Hardware platform:sun4u sparc SUNW,UltraAX-i2"
   "Vendor:"Cisco Systems, Inc.""
   "Location:MGC-01 - Media Gateway Controller"
   "Version:"9.3(2)""
   "Platform State:ACTIVE"
   ;
mgc-bru-20 mml>

mgcusr@mgc-bru-20% netstat -a | grep 29\[0-9\]\[0-9\]
mgc-bru-20.2974      *.*                  0      0 24576      0 LISTEN
mgc-bru-20.2970      *.*                  0      0 24576      0 LISTEN
mgc-bru-20.37637     mgc-bru-22.2974  24820      0 24820      0 ESTABLISHED
mgc-bru-20.37638     mgc-bru-22.2970  24820      0 24820      0 ESTABLISHED
mgc-bru-20.telnet    dhcp-peg3-cl31144-254-5-149.cisco.com.2906 65256 3 25D
mgcusr@mgc-bru-20%
This example checks the Standby system for the ESTABLISHED mode.
mgc-bru-22 mml> rtrv-ne
   MGC-01 - Media Gateway Controller 2004-05-28 13:09:20.552 MSD
M  RTRV
   "Type:MGC"
   "Hardware platform:sun4u sparc SUNW,Ultra-5_10"
   "Vendor:"Cisco Systems, Inc.""
   "Location:MGC-01 - Media Gateway Controller"
   "Version:"9.3(2)""
   "Platform State:STANDBY"
   ;
mgc-bru-22 mml>

mgcusr@mgc-bru-22% netstat -a | grep 29\[0-9\]\[0-9\]
mgc-bru-22.2974      *.*                  0      0 24576      0 LISTEN
mgc-bru-22.2970      *.*                  0      0 24576      0 LISTEN
mgc-bru-22.2974      mgc-bru-20.37637 24820      0 24820      0 ESTABLISHED
mgc-bru-22.2970      mgc-bru-20.37638 24820      0 24820      0 ESTABLISHED
mgc-bru-22.telnet    dhcp-peg3-cl31144-254-5-149.cisco.com.2910 65256 1 25D
mgcusr@mgc-bru-22%
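The check on the two netstat outputs above can be automated with a short script. This is a sketch under stated assumptions: the function names are hypothetical, and the parser assumes the Solaris-style `host.port` address layout and one connection per line, as in the sample output.

```python
REPLICATION_PORTS = ("2970", "2974")  # replication TCP ports named in this document


def established_replication_ports(netstat_output: str) -> set:
    """Return the replication ports that appear in an ESTABLISHED connection."""
    found = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Keep only connection rows whose last column is the ESTABLISHED state.
        if len(fields) < 3 or fields[-1] != "ESTABLISHED":
            continue
        # The port is the text after the last dot of the local or remote address.
        for address in fields[:2]:
            port = address.rsplit(".", 1)[-1]
            if port in REPLICATION_PORTS:
                found.add(port)
    return found


def replication_established(netstat_output: str) -> bool:
    """True when both replication ports have an ESTABLISHED connection."""
    return established_replication_ports(netstat_output) == set(REPLICATION_PORTS)
```

Run the check on the output captured from both the Active and the Standby host. A port that shows only LISTEN means the peer connection for that port is missing.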
If this is OK, proceed to step 3.
Check to see if both configurations are the same on Active and Standby with the UNIX diff command.
Issue the UNIX command netstat -i to verify that the Ierrs, Oerrs, and Collis counters do not increase.
mgcusr@PGW2200% netstat -i
Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
lo0 8232 loopback localhost 28389215 0 28389215 0 0 0
eri0 1500 mgc-bru-20 mgc-bru-20 187731714 231 185007958 3 0
eri1 1500 mgc-bru-20b mgc-bru-20b 0 0 82 2 0 0
mgcusr@PGW2200%
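Because the point of this check is whether the counters increase, two netstat -i samples taken some time apart are more useful than one. The sketch below compares two such samples. The function names are hypothetical, and the parser assumes the column order shown in the sample header (Ierrs in the sixth column, Oerrs in the eighth). It reads Collis only from rows that have all ten columns, since the eri0 row in the sample is incomplete.

```python
def parse_error_counters(netstat_i_output: str) -> dict:
    """Map interface name to its Ierrs and Oerrs (and Collis for complete rows)."""
    counters = {}
    for line in netstat_i_output.splitlines():
        fields = line.split()
        # Expected columns: Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
        if len(fields) < 8 or fields[0] == "Name":
            continue
        if not (fields[5].isdigit() and fields[7].isdigit()):
            continue
        row = {"Ierrs": int(fields[5]), "Oerrs": int(fields[7])}
        if len(fields) >= 10 and fields[8].isdigit():
            row["Collis"] = int(fields[8])
        counters[fields[0]] = row
    return counters


def increased_counters(before: dict, after: dict) -> dict:
    """Return, per interface, the counters that grew between two samples."""
    increases = {}
    for name, new in after.items():
        old = before.get(name, {})
        deltas = {key: value - old[key]
                  for key, value in new.items()
                  if key in old and value > old[key]}
        if deltas:
            increases[name] = deltas
    return increases
```

An empty result from `increased_counters` means no error or collision counter grew between the two samples.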
Check the configuration on the Cisco PGW 2200. To do this, create a temporary directory on the Standby Cisco PGW 2200 under the /opt directory (for example, /opt/temp). You remove this directory after a final check.
Use FTP to copy all the information from the Cisco PGW 2200 Active under the /opt/CiscoMGC/etc directory. Move this information over to the Cisco PGW 2200 Standby under the /opt/temp directory and the subdirectories. Be sure you have a backup of Cisco PGW 2200 Active/Standby before you do this.
Note: Only XECfgParm.dat is expected to show as different in the output of the UNIX dircmp command. You can also run the UNIX command diff.
# dircmp -d /opt/temp /opt/CiscoMGC/etc/
May 31 13:52 2004 Comparison of /opt/temp /opt/CiscoMGC/etc/ Page 1
directory .
same ./accRespCat.dat
same ./alarmCats.dat
same ./alarmTable.dat
same ./auxSigPath.dat
same ./bearChan.dat
same ./bearChanSwitched.dat
same ./buckets.dat
same ./cable.dat
same ./charge.dat
same ./chargeholiday.dat
same ./codec.dat
same ./components.dat
same ./compTypes.dat
same ./condRoute.dat
same ./Copyright
same ./crossConnect.dat
same ./dependencies.dat
same ./dialplan.dat
same ./digitAnalysis.dat
same ./dmprSink.dat
same ./dns.dat
same ./dpc.dat
same ./extNodes.dat
same ./extNodeTypes.dat
same ./extProcess.dat
same ./files.dat
same ./gtdParam.dat
same ./linkSetProtocol.dat
same ./mclCallReject.dat
same ./mclThreshold.dat
same ./mdlProcess.dat
same ./measCats.dat
same ./measProfs.dat
same ./mmlCommands.dat
same ./percRoute.dat
same ./physLineIf.dat
same ./processes.dat
same ./procGroups.dat
same ./profileComps.dat
same ./profiles.dat
same ./profileTypes.dat
same ./properties.dat
same ./propSet.xml.dat
same ./propSet.xml.dat.old.newfile
same ./propSet.xml.dat.old.newfile.newfile
same ./propSet.xml.dat.old.newfile.newfile.newfile
same ./propVal.xsd.dat
same ./routeAnalysis.bin
same ./routeAnalysis.dat
same ./routes.dat
same ./services.dat
same ./sigChanDev.dat
same ./sigChanDevIp.dat
same ./sigPath.dat
same ./snmpmgr.dat
same ./stp.dat
same ./tables.dat
same ./tariff.dat
same ./testLine.dat
same ./thresholds.dat
same ./trigger.dat
same ./trigger.template
same ./trunkGroup.dat
same ./variant.dat
same ./variant.dat.old.newfile
same ./variant.dat.old.newfile.newfile
same ./variant.dat.old.newfile.newfile.newfile
same ./version.dat
different ./XECfgParm.dat
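With this many files, it is easy to overlook an unexpected "different" entry. The sketch below scans saved dircmp output and reports anything other than XECfgParm.dat that differs. The function names are hypothetical, and the parser assumes one `same` or `different` result per line followed by a single path, as in the listing above.

```python
def differing_files(dircmp_output: str) -> list:
    """Return the paths that dircmp reports as different, in listing order."""
    different = []
    for line in dircmp_output.splitlines():
        fields = line.split()
        # Result rows have exactly two fields: the verdict and the path.
        if len(fields) == 2 and fields[0] == "different":
            different.append(fields[1])
    return different


def unexpected_differences(dircmp_output: str,
                           expected=("./XECfgParm.dat",)) -> list:
    """Return differing paths other than those expected to differ."""
    return [path for path in differing_files(dircmp_output)
            if path not in expected]
```

An empty list from `unexpected_differences` corresponds to the situation in the note above, where only XECfgParm.dat differs between the Active and Standby configurations.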
To help you troubleshoot, also consider what changed in the network around the time these issues occurred, for instance gateway upgrades, configuration changes, or newly added circuits.
Proceed to step 4 if everything in this step is OK.
In most instances, this error message is linked to I/O channel controller (IOCC) processes that do not run, or to a failure on the Standby Cisco PGW 2200. If this is the case, stop the Cisco PGW 2200 application on the Standby with the UNIX command ./CiscoMGC stop, and then restart it with the ./CiscoMGC start command. Run both commands from the /etc/init.d directory.
Run the MML command rtrv-softw:all on the Cisco PGW 2200 Standby host to ensure that all processes run correctly.
PGW2200 mml> rtrv-softw:all
   MGC-01 - Media Gateway Controller 2004-05-31 13:04:21.410 MSD
M  RTRV
   "CFM-01:RUNNING STANDBY"
   "ALM-01:RUNNING STANDBY"
   "MM-01:RUNNING STANDBY"
   "AMDMPR-01:RUNNING STANDBY"
   "CDRDMPR-01:RUNNING STANDBY"
   "DSKM-01:RUNNING IN N/A STATE"
   "MMDB-01:RUNNING IN N/A STATE"
   "POM-01:RUNNING STANDBY"
   "MEASAGT:RUNNING STANDBY"
   "OPERSAGT:RUNNING STANDBY"
   "ss7-i-1:RUNNING IN N/A STATE"
   "mgcp-1:RUNNING IN N/A STATE"
   "Replic-01:RUNNING STANDBY"
   "ENG-01:RUNNING STANDBY"
   "IOCM-01:RUNNING STANDBY"
   "TCAP-01:RUNNING IN N/A STATE"
   "eisup-1:RUNNING IN N/A STATE"
   "FOD-01:RUNNING IN N/A STATE"
   "sip-1:RUNNING IN N/A STATE"
   ;
PGW2200 mml>
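The rtrv-softw:all output above can be checked mechanically for processes that are not running. This is an illustrative sketch: the function names are hypothetical, and it assumes every process entry has the quoted `"NAME:STATE"` form shown above and that a healthy state begins with the word RUNNING.

```python
import re

# One quoted entry of the form "NAME:STATE".
_ENTRY = re.compile(r'"([^":]+):([^"]+)"')


def process_states(rtrv_softw_output: str) -> dict:
    """Map each process name to the state text reported by rtrv-softw:all."""
    return dict(_ENTRY.findall(rtrv_softw_output))


def processes_not_running(rtrv_softw_output: str) -> list:
    """Return the sorted names of processes whose state does not start with RUNNING."""
    states = process_states(rtrv_softw_output)
    return sorted(name for name, state in states.items()
                  if not state.startswith("RUNNING"))
```

An empty list means every listed process reports a RUNNING state, which is the condition for moving on to the next step.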
If all processes run correctly but the error message still appears when you issue the MML sw-over command, proceed to step 5. Otherwise, check the reason for the failure.
For example, suppose you add some new SS7 trunks and then run into this sw-over failure message. In that case, change the ss7-i-1 process to debug mode. This provides more details of the error message in the /opt/CiscoMGC/var/log/platform.log file. The default log level is error (ERR).
PGW2200 mml>rtrv-log:all
   MGC-01 - Media Gateway Controller 2004-05-31 13:10:35.376 MSD
M  RTRV
   "CFM-01:ERR"
   "ALM-01:ERR"
   "MM-01:ERR"
   "AMDMPR-01:ERR"
   "CDRDMPR-01:ERR"
   "DSKM-01:ERR"
   "MMDB-01:ERR"
   "POM-01:ERR"
   "MEASAGT:ERR"
   "OPERSAGT:ERR"
   "ss7-i-1:ERR"
   "mgcp-1:ERR"
   "Replic-01:ERR"
   "ENG-01:ERR"
   "IOCM-01:ERR"
   "TCAP-01:ERR"
   "eisup-1:ERR"
   "FOD-01:ERR"
   "sip-1:ERR"
   ;
PGW2200 mml>
Change the ss7-i-1 process into debug mode with this MML command on the Cisco PGW 2200 Standby host.
Use the UNIX vi editor to edit the XECfgParm.dat file under the /opt/CiscoMGC/etc directory on the Standby, and remove the # character in front of these lines:
ioChanMgr.logPrio = Debug
foverd.logPrio = Debug
Under the /etc/init.d directory, run the commands ./CiscoMGC stop and ./CiscoMGC start on the Standby Cisco PGW 2200.
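The edit to XECfgParm.dat can be pictured as a simple text transformation. The sketch below is only an illustration of what the manual change does: the function name is hypothetical, it assumes the two parameters are present in the file as lines commented out with a leading # character, and it works on text in memory rather than on the file itself. Back up the file before you make any real change.

```python
def uncomment_parameters(config_text: str,
                         names=("ioChanMgr.logPrio", "foverd.logPrio")) -> str:
    """Remove the leading # from the lines that set the named parameters."""
    result = []
    for line in config_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            candidate = stripped.lstrip("#").lstrip()
            # The parameter name is the text before the first equals sign.
            if candidate.split("=")[0].strip() in names:
                line = candidate
        result.append(line)
    return "\n".join(result)
```

Lines that set other parameters, commented or not, pass through unchanged.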
Issue the MML command sw-over::confirm again. Then check the MML rtrv-alms command and the UNIX command tail -f platform.log for the error message information.
Check to see if the Replication process on the Active Cisco PGW 2200 is in the Active state.
PGW2200 mml> rtrv-softw:all
<snip>
   "Replic-01:RUNNING ACTIVE"
<snip>
Collect all information and add these details to the Service Request.
If you have tested and checked all of these steps, proceed with this step, because the problem can still exist on the Active Cisco PGW 2200.
During the maintenance window, shut down the active Cisco PGW 2200 with the /etc/init.d/CiscoMGC stop command.
The Standby then needs to take over. However, before you perform this step, ensure that the Standby has all the configuration information from the Active system (step 3), and that the rtrv-tc:all command shows that the status of the calls on the Standby is greater than or equal to that on the Active Cisco PGW 2200. Also use the rtrv-softw:all command to check that all processes are in STANDBY status.
If this step fails, open a Service Request that includes all details and information related to the error message.