This document helps you troubleshoot the Cisco PGW 2200 when you receive the 'MSO refused, Warm start-up Failed' message. This error message appears after you issue the MML command sw-over::confirm. Because warm-start is a low-priority, asynchronous activity, multiple components can be in the process of warm-starting their standby peers at the same time. The Standby Warm Start alarm lets an operator know when a standby unit is ready to take over. The alarm is raised when procM sends a Make Peer Standby request to IOCM, and it is cleared only after the warm-start succeeds.
Cisco recommends that you have knowledge of these topics:
The information in this document is based on these software and hardware versions:
Cisco PGW 2200 Software Releases 9.3(2) and later
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
After you issue the MML command sw-over::Confirm on the Active Cisco PGW 2200, you receive this error.
PGW2200 mml> sw-over::Confirm
MGC-01 - Media Gateway Controller 2004-05-26 11:37:37.061 MEST
/* MSO refused, Warm start-up Failed. */
Note: A "Warm Restart" indicates that the Standby is ready to receive checkpointing data. This usually happens for processes such as the replicator and, through the IOCM, the IOCC MTP3. The SS7 IOCC can be the reason that the IOCM rejects the sw-over command, but other issues can also cause it. In this case, collect the log information described in this section.
When the user attempts a manual switchover (MSO) and is denied, MML responds with one of these reasons:
MSO refused, standby system not ready: Switchover failed because the standby system was not ready.
MSO refused, warm start-up in progress: Switchover failed because start-up of the standby system was in progress.
MSO refused, Warm start-up Failed: Switchover failed because the warm start-up of the standby system failed.
MSO refused, System is not in active state: Switchover failed because the PGW 2200 host is not in an active state.
MSO refused, Detected standalone Flag: Switchover failed because no Standby PGW 2200 host is configured.
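When you script checks around sw-over::confirm, you can tell the refusal reasons listed above apart by their text. The following Python sketch maps the reason after "MSO refused," to the cause given in that list. The function name and the lookup table are hypothetical, and the sketch assumes that the reason text appears exactly as shown in this document.

```python
import re
from typing import Optional

# Hypothetical lookup table: lowercased refusal reason -> cause from the list above.
MSO_REFUSAL_CAUSES = {
    "standby system not ready": "The standby system was not ready.",
    "warm start-up in progress": "Start-up of the standby system was in progress.",
    "warm start-up failed": "The warm start-up of the standby system failed.",
    "system is not in active state": "The PGW 2200 host is not in an active state.",
    "detected standalone flag": "No standby PGW 2200 host is configured.",
}

# Captures the reason after "MSO refused," up to a period, '*', or end of line.
_MSO_REFUSED = re.compile(r"MSO refused,\s*([^.*\n]+)", re.IGNORECASE)


def classify_mso_response(mml_output: str) -> Optional[str]:
    """Return the cause of an 'MSO refused' response, or None if there is none."""
    match = _MSO_REFUSED.search(mml_output)
    if match is None:
        return None
    reason = match.group(1).strip().lower()
    return MSO_REFUSAL_CAUSES.get(reason, "Unrecognized reason: " + reason)
```

For the response shown earlier, "/* MSO refused, Warm start-up Failed. */", the function returns the warm start-up failure cause. Any reason text that is not in the table is reported as unrecognized rather than silently dropped.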
PGW2200 mml> rtrv-alms
MGC-01 - Media Gateway Controller 2004-05-26 11:37:40.732 MEST
"lnk-1-cisco1: 2004-04-29 18:24:43.766 MEST,ALM=\"SC FAIL\",SEV=MJ"
"lnk-1-cisco2: 2004-04-29 18:24:43.779 MEST,ALM=\"SC FAIL\",SEV=MJ"
"lnk-2-cisco3: 2004-04-29 18:24:43.797 MEST,ALM=\"SC FAIL\",SEV=MJ"
Note: Always use the MML command rtrv-alms to check the alarms raised during the sw-over::confirm command. Combine this with the UNIX command tail -f platform.log in the /opt/CiscoMGC/var/log directory. Also check the error message linked to the sw-over command.
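If you save the rtrv-alms output above to a file, a small parser makes it easier to list the alarms that were active at the time of the sw-over::confirm command. This Python sketch assumes that every alarm line uses the quoted component: timestamp,ALM=\"...\",SEV=... layout shown above. The Alarm type and the parse_rtrv_alms function are hypothetical helpers, not part of the PGW 2200 software.

```python
import re
from typing import List, NamedTuple


class Alarm(NamedTuple):
    component: str
    timestamp: str
    alarm: str
    severity: str


# Matches lines such as: "lnk-1-cisco1: 2004-04-29 18:24:43.766 MEST,ALM=\"SC FAIL\",SEV=MJ"
_ALARM_LINE = re.compile(
    r'^"(?P<component>[^:]+):\s*(?P<timestamp>[^,]+),'
    r'ALM=\\"(?P<alarm>[^"\\]*)\\",SEV=(?P<severity>\w+)"$'
)


def parse_rtrv_alms(output: str) -> List[Alarm]:
    """Extract the alarm lines from rtrv-alms output; header lines are ignored."""
    alarms = []
    for line in output.splitlines():
        match = _ALARM_LINE.match(line.strip())
        if match:
            alarms.append(Alarm(**match.groupdict()))
    return alarms
```

For example, if you keep only the entries whose severity is "MJ", the result is the three SC FAIL link alarms in the output above.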
Wed May 1 16:13:47:752 2004 MEST | ProcessManager
(PID 698) <Error>GEN_ERR_HA_MSO: Cannot comply with Manual
Switch Over request. Reason Warm start up failed
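Entries in platform.log can wrap across several lines, as in the GEN_ERR_HA_MSO excerpt above. A plain grep for the error code can therefore miss the reason text. This Python sketch rejoins wrapped entries and keeps the ones that contain a given token. Based only on this excerpt, it assumes that every entry starts with a timestamp such as "Wed May 1 16:13:47:752 2004". The function name is hypothetical.

```python
import re
from typing import Iterable, List

# Assumed entry header, e.g. "Wed May 1 16:13:47:752 2004 MEST | ProcessManager".
_ENTRY_START = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}:\d{3} \d{4}\b"
)


def find_log_entries(lines: Iterable[str], token: str = "GEN_ERR_HA_MSO") -> List[str]:
    """Rejoin wrapped platform.log entries and return those that contain token."""
    entries: List[str] = []
    current: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # A new timestamp closes the previous (possibly wrapped) entry.
        if _ENTRY_START.match(line) and current:
            entries.append(" ".join(current))
            current = []
        current.append(line)
    if current:
        entries.append(" ".join(current))
    return [entry for entry in entries if token in entry]
```

To scan a saved log, pass the open file object directly, for example by opening /opt/CiscoMGC/var/log/platform.log and calling find_log_entries on it. Each result is one complete entry on a single line.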
Troubleshoot Procedure Example
The Standby Warm Start alarm is raised on the Active host at the start of the warm-start process in IOCM.
The alarm is automatically cleared from the Active box only when the Warm-Start process successfully finishes.
In the event of a Warm-Start failure, this alarm is not cleared. If this happens, the alarm is cleared only when the Warm-Start is processed successfully at a later time.
The effect of the alarm is that a manual switchover is denied.
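You can model the alarm lifecycle described above as a small state machine. This Python class is an illustrative model only, not Cisco code, and the class and method names are hypothetical. The mapping is also an assumption inferred from the MSO refusal list earlier in this document: the in-progress state produces the "warm start-up in progress" refusal, and a failed warm-start produces "Warm start-up Failed".

```python
from typing import Optional


class StandbyWarmStartAlarm:
    """Illustrative model of the Standby Warm Start alarm on the active host."""

    def __init__(self) -> None:
        self.state = "clear"  # one of "clear", "in_progress", "failed"

    def on_make_peer_standby(self) -> None:
        # procM sends Make Peer Standby to IOCM: the alarm is raised.
        self.state = "in_progress"

    def on_warm_start_result(self, success: bool) -> None:
        # Only a successful warm-start clears the alarm; a failure leaves it set.
        self.state = "clear" if success else "failed"

    @property
    def raised(self) -> bool:
        return self.state != "clear"

    def mso_refusal(self) -> Optional[str]:
        """Assumed MML refusal text while the alarm is set, or None if MSO may proceed."""
        if self.state == "in_progress":
            return "MSO refused, warm start-up in progress"
        if self.state == "failed":
            return "MSO refused, Warm start-up Failed"
        return None
```

The key property is that a failed warm-start leaves the alarm set until a later warm-start succeeds. That is why sw-over::confirm keeps being refused until then.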
This is the corrective action if the alarm does not clear:
Make sure that the pom.dataSync parameter is set to true in the Active and Standby PGW 2200.
Stop and start the Standby PGW 2200 software.
If the alarm still does not clear, open a Technical Support service request. Attach the platform.log, mml.log, and alarm.log files from the /opt/CiscoMGC/var/log directory, the current PGW 2200 configuration, and the previous two configuration directories (CFG_) from when the alarm was seen. Include the platform.log from both PGW 2200 hosts.
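You can script the first corrective action, which checks pom.dataSync on both hosts. This Python sketch assumes that XECfgParm.dat holds one key = value pair per line and uses # for comments; verify that against your own file. The helper names are hypothetical.

```python
from typing import Dict


def parse_xecfgparm(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines; text after '#' is treated as a comment (assumption)."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def data_sync_enabled(text: str) -> bool:
    """True if pom.dataSync is set to true (case-insensitive)."""
    return parse_xecfgparm(text).get("pom.dataSync", "").lower() == "true"
```

Run data_sync_enabled on the contents of /opt/CiscoMGC/etc/XECfgParm.dat from both the Active and the Standby. Both should return True.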
This is an example of a troubleshooting procedure:
Check the release notes for any issues linked to this error message. Such issues may be fixed in later Cisco PGW 2200 releases.
Make sure that you are not running a corrupted patch. Check the platform.log files in the /opt/CiscoMGC/var/log directory for the time the problem was reported. Also check the messages file in the /var/adm directory for UNIX error messages.
Cisco recommends that you upgrade to the latest Cisco PGW 2200 patches.
If everything in this step is OK, proceed to step 2.
Issue the netstat -a command to check that the replication connection between the Active and Standby hosts is in the ESTABLISHED state.
Issue the MML command prov-sync to check that it works correctly. Also, issue the sw-over::confirm command again and check the status. The Cisco PGW 2200 uses TCP ports 2970 and 2974 for replication.
On an Active Cisco PGW 2200, run the UNIX command netstat -a | grep 29\[0-9\]\[0-9\].
On the Standby Cisco PGW 2200, run the UNIX command netstat -a | grep 29\[0-9\]\[0-9\].
For example, check that the replication connections on the Active system are in the ESTABLISHED state.
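Instead of reading the netstat -a output by eye, you can filter it for the replication ports. This Python sketch assumes Solaris-style netstat rows: the local address, the remote address, counters, and then the state, with addresses written as host.port. It uses ports 2970 and 2974, as named above. The function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

REPLICATION_PORTS = {"2970", "2974"}  # ports named in the text above
_PORT = re.compile(r"[.:](\d+)$")  # port at the end of "host.port" (or "host:port")


def _port(address: str) -> Optional[str]:
    match = _PORT.search(address)
    return match.group(1) if match else None


def replication_connections(netstat_output: str) -> List[Tuple[str, str, str]]:
    """Return (local, remote, state) for rows that use a replication port.

    Assumes Solaris-style rows: local address, remote address, ..., state last.
    """
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        local, remote, state = fields[0], fields[1], fields[-1]
        if REPLICATION_PORTS & {_port(local), _port(remote)}:
            rows.append((local, remote, state))
    return rows


def replication_established(netstat_output: str) -> bool:
    """True if at least one replication connection is ESTABLISHED."""
    return any(state == "ESTABLISHED"
               for _, _, state in replication_connections(netstat_output))
```

A LISTEN row by itself only means that the port is open. Look for at least one ESTABLISHED row on each host.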
Check the configuration on both Cisco PGW 2200 hosts. On the Standby, create a temporary directory, /opt/temp, which you remove after the final check.
Be sure that you have a backup of both the Active and the Standby Cisco PGW 2200 before you continue. Then use FTP to copy everything in the /opt/CiscoMGC/etc directory of the Active Cisco PGW 2200, including subdirectories, to the /opt/temp directory on the Standby.
Note: In the dircmp output, only XECfgParm.dat should differ. You can also use the UNIX diff command.
# dircmp -d /opt/temp /opt/CiscoMGC/etc/
May 31 13:52 2004 Comparison of /opt/temp /opt/CiscoMGC/etc/ Page 1
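If dircmp is not available, or if you want a list that you can check automatically, Python's standard filecmp and os.walk can do the same comparison of /opt/temp and /opt/CiscoMGC/etc. This is a sketch, not a Cisco tool, and the function names are hypothetical. It compares file contents byte for byte (shallow=False) rather than trusting file size and modification time.

```python
import filecmp
import os
from typing import List, Set, Tuple


def _relative_files(root: str) -> Set[str]:
    """All file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def diff_trees(left: str, right: str) -> Tuple[List[str], List[str], List[str]]:
    """Return (only_in_left, only_in_right, differing) relative paths, each sorted."""
    left_files, right_files = _relative_files(left), _relative_files(right)
    differing = sorted(
        path for path in left_files & right_files
        if not filecmp.cmp(os.path.join(left, path), os.path.join(right, path), shallow=False)
    )
    return sorted(left_files - right_files), sorted(right_files - left_files), differing
```

Call diff_trees("/opt/temp", "/opt/CiscoMGC/etc") on the Standby. Investigate any file in the differing list other than XECfgParm.dat, and any file that exists in only one of the two trees.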
To help you troubleshoot, you also need to think about what has changed in the network around the time these issues occurred. For instance, gateway upgrades, configuration changes, any new circuits added, and so forth.
Proceed to step 4 if everything in this step is OK.
In most instances, this error message is linked to I/O channel controller (IOCC) processes that are not running, or to a failure on the Standby Cisco PGW 2200. If this is the case, stop the Cisco PGW 2200 application on the Standby with the UNIX command ./CiscoMGC stop from the /etc/init.d directory. Then restart the application with the ./CiscoMGC start command from the same directory.
Run the MML command rtrv-softw:all on the Standby Cisco PGW 2200 host to ensure that all processes run correctly.
PGW2200 mml> rtrv-softw:all
MGC-01 - Media Gateway Controller 2004-05-31 13:04:21.410 MSD
"DSKM-01:RUNNING IN N/A STATE"
"MMDB-01:RUNNING IN N/A STATE"
"ss7-i-1:RUNNING IN N/A STATE"
"mgcp-1:RUNNING IN N/A STATE"
"TCAP-01:RUNNING IN N/A STATE"
"eisup-1:RUNNING IN N/A STATE"
"FOD-01:RUNNING IN N/A STATE"
"sip-1:RUNNING IN N/A STATE"
If all processes are running but the sw-over command still returns the error message, proceed to step 5. Otherwise, find the reason for the failure.
For example, suppose you add new SS7 trunks and then receive this sw-over failure message. In that case, change the ss7-i-1 process to debug mode. This writes more detail about the error to the /opt/CiscoMGC/var/log/platform.log file. The default logging level is error.
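You can also automate the rtrv-softw:all check in step 4 once the output is saved. This Python sketch assumes the quoted "process:status" line format shown in the output above and treats any status that does not start with RUNNING as a failure. The function names are hypothetical.

```python
import re
from typing import Dict, List

# Matches lines such as: "ss7-i-1:RUNNING IN N/A STATE"
_SOFTW_LINE = re.compile(r'^"(?P<process>[^:"]+):(?P<status>[^"]+)"$')


def parse_rtrv_softw(output: str) -> Dict[str, str]:
    """Map each process name in rtrv-softw:all output to its status text."""
    statuses: Dict[str, str] = {}
    for line in output.splitlines():
        match = _SOFTW_LINE.match(line.strip())
        if match:
            statuses[match.group("process")] = match.group("status").strip()
    return statuses


def not_running(statuses: Dict[str, str]) -> List[str]:
    """Processes whose status does not start with RUNNING, sorted by name."""
    return sorted(name for name, status in statuses.items()
                  if not status.startswith("RUNNING"))
```

An empty list from not_running means that every process reported RUNNING. Otherwise, start with the processes it returns, such as ss7-i-1 in the SS7 trunk example above.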
If you have completed all the previous steps and the problem persists, proceed with this step, because the problem can still exist on the Active Cisco PGW 2200.
During the maintenance window, shut down the Active Cisco PGW 2200 with the /etc/init.d/CiscoMGC stop command.
The Standby then needs to take over. Before you perform this step, however, ensure that the Standby has all the configuration information from the Active system (step 3). Also ensure that the rtrv-tc:all command on the Standby shows at least as many circuits available for calls as it does on the Active Cisco PGW 2200. Finally, use the rtrv-softw:all command to check that all processes are in STANDBY status.
If this step fails, open a Service Request that includes all details and information related to the error message.