This document describes how to troubleshoot the "Database Communication Error" message that appears when a CUCM page is accessed.
Cisco recommends that you have knowledge of this topic:
Cisco Unified Communications Manager (CUCM) version 11.5
The information in this document is based on CUCM version 11.5.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
This document helps you understand the scenario, and the techniques TAC uses, when you get a Database Communication Error while the CUCM GUI page is accessed. The message indicates a problem with the A Cisco DB service, or possibly with the ODBC driver. This document covers everything the user can check, and some of what TAC checks, when the A Cisco DB service does not work as expected.
One of the most common causes is an unexpected shutdown of the system. On a Linux-based OS, an ungraceful shutdown can corrupt files, because files that should be closed gracefully are instead closed abruptly when the system goes down. The system might later need these files to complete the boot process.
Other causes include a change of FQDN, or a change from IP address to FQDN (or vice versa), made without following the proper procedure.
When these issues arise, there are some action items to follow in a bid to save the system. "Save the system" is the right phrase because, more often than not, when a Linux service does not start properly (it stays in the Starting or Stopped state), the problem lies in the daemon or process responsible for starting that service, and that kind of problem can only be corrected by rebuilding the server.
Procedure to Troubleshoot
Step 1. Sanity Check of the System.
Use the output of the utils diagnose test and show status commands to check whether the system reports any other errors, so that further actions can be planned accordingly. For example, use show status to confirm that the active partition is not 100% full. If it is full, troubleshoot that first, before you try to resolve other problems.
admin:show status
Host Name : CUCM11
Date : Wed Jul 25, 2018 00:10:07
Time Zone : India Standard Time (Asia/Kolkata)
Locale : en_US.UTF-8
Product Ver : 184.108.40.20645-1
Unified OS Version : 220.127.116.11-2
00:10:09 up 48 days, 10:56, 1 user, load average: 0.17, 0.29, 0.27
CPU Idle: 97.74% System: 01.26% User: 01.00%
IOWAIT: 00.00% IRQ: 00.00% Soft: 00.00%
Memory Total: 3925432K
Total Free Used
Disk/active 14154228K 1154116K 12854984K (92%)
Disk/inactive 14154228K 1195212K 12813888K (92%)
Disk/logging 49573612K 3454524K 43594160K (93%)
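The partition check in Step 1 can be scripted if you save the show status text from the CLI session. This is a minimal Python sketch, not a Cisco tool: parse_disk_usage and full_partitions are hypothetical helper names, and the regular expression assumes the Disk/... line layout shown in the sample output above.

```python
import re
from typing import Dict, List

# Matches lines such as "Disk/active 14154228K 1154116K 12854984K (92%)".
# Assumption: the layout is the one in the sample show status output;
# other releases may format these lines differently.
DISK_LINE = re.compile(r"^(Disk/\S+)\s+(\d+)K\s+(\d+)K\s+(\d+)K\s+\((\d+)%\)\s*$")


def parse_disk_usage(show_status: str) -> Dict[str, int]:
    """Return {partition: percent_used} for every Disk/... line in the output."""
    usage: Dict[str, int] = {}
    for line in show_status.splitlines():
        match = DISK_LINE.match(line.strip())
        if match:
            usage[match.group(1)] = int(match.group(5))
    return usage


def full_partitions(usage: Dict[str, int], threshold: int = 100) -> List[str]:
    """List the partitions whose usage is at or above the threshold percentage."""
    return sorted(name for name, pct in usage.items() if pct >= threshold)
```

With the sample output above, no partition reaches 100%, so full_partitions returns an empty list; a lower threshold (for example 90) flags partitions that are getting close to full.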
Step 2. Restart the Service.
Restart the service from the CLI with the utils service restart A Cisco DB command:
admin:utils service restart A Cisco DB
Do not press Ctrl+C while the service is restarting. If the service has not restarted properly, execute the same command again.
Service Manager is running
A Cisco DB[STOPPING]
A Cisco DB[STARTING]
A Cisco DB[STARTED]
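The CLI message above recommends running the command again if the service did not restart properly. As a minimal sketch, assuming the Service[STATE] line format shown in the output above, this hypothetical Python helper reads the last state reported for A Cisco DB from saved CLI output, so you can tell at a glance whether a second attempt is needed.

```python
from typing import Optional


# Assumption: each status line looks like "A Cisco DB[STARTED]",
# as in the sample restart output.
def final_service_state(output: str, service: str = "A Cisco DB") -> Optional[str]:
    """Return the last state reported for the service, or None if none was found."""
    prefix = service + "["
    state: Optional[str] = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith(prefix) and line.endswith("]"):
            state = line[len(prefix):-1]  # text between the brackets
    return state


def restart_succeeded(output: str, service: str = "A Cisco DB") -> bool:
    """True only if the final reported state is STARTED."""
    return final_service_state(output, service) == "STARTED"
```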
Step 3. Check hosts, rhosts and sqlhosts Files.
Of these files, only the hosts file can be checked from the normal CLI of the server (remember that the GUI, and with it the reporting pages, is not accessible). Run the show tech network hosts command on every server in the cluster and confirm that the entries match. If any server shows a mismatch, restart the Cluster Manager service so that it tries to correct them.
admin:show tech network hosts
-------------------- show platform network --------------------
#This file was generated by the /etc/hosts cluster manager.
#It is automatically updated as nodes are added, changed, removed from the cluster.
10.106.112.122 cucmsub.emea.lab cucmsub
10.106.112.123 imnp10.emea.lab imnp10
10.106.112.126 CUCM-10.emea.lab CUCM-10
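Comparing hosts entries by eye becomes error-prone as the cluster grows. This minimal Python sketch (hypothetical helpers, not part of CUCM) takes the show tech network hosts text captured from each node, keeps only the lines that start with a valid IP address, and reports, for each node, the (IP, FQDN) pairs that appear on some other node but are missing from it. The node names and the way you collect the text are up to you.

```python
import ipaddress
from typing import Dict, Set, Tuple

Entry = Tuple[str, str]  # (IP address, FQDN)


def parse_hosts(text: str) -> Set[Entry]:
    """Collect (IP, FQDN) pairs, skipping the prompt, banner and '#' comment lines."""
    entries: Set[Entry] = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            ipaddress.ip_address(fields[0])  # only real host entries start with an IP
        except ValueError:
            continue
        entries.add((fields[0], fields[1]))
    return entries


def missing_entries(per_node: Dict[str, str]) -> Dict[str, Set[Entry]]:
    """For each node, list entries seen on other nodes but absent from its own output."""
    parsed = {node: parse_hosts(text) for node, text in per_node.items()}
    union: Set[Entry] = set().union(*parsed.values())
    return {node: union - entries for node, entries in parsed.items() if union - entries}
```

Comparing against the union of all nodes, rather than against one node only, is a design choice: an IP address that maps to different FQDNs on two nodes shows up as a missing entry on both, which points directly at the conflict. Short aliases (the third column) are ignored.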
Step 4. Check the Files from Root.
This step and the ones after it are performed only by TAC, after TAC obtains root access to your system. From the root shell, the controlcentre.sh script is used to restart the service.
The files at /home/informix/.rhosts and $INFORMIXDIR/etc/sqlhosts are compared manually across all servers to confirm that they match. After this, restart the Cluster Manager service to update the details in any file that might be needed during bootup.
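For the root-level comparison of /home/informix/.rhosts and $INFORMIXDIR/etc/sqlhosts, a minimal sketch is to normalize each server's copy and compare it with a reference server. This assumes that '#' starts a comment and that line order does not matter for the comparison, both of which are simplifications; the function names are hypothetical.

```python
from typing import Dict, List


def normalize(text: str) -> List[str]:
    """Strip '#' comments and blank lines, collapse whitespace, and sort the lines."""
    lines: List[str] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            lines.append(" ".join(line.split()))
    return sorted(lines)  # sorting makes the comparison ignore line order


def differing_servers(copies: Dict[str, str], reference: str) -> List[str]:
    """Return the servers whose normalized copy differs from the reference server's copy."""
    expected = normalize(copies[reference])
    return sorted(name for name, text in copies.items() if normalize(text) != expected)
```

Run it once for .rhosts and once for sqlhosts, with the file contents keyed by server name; an empty list means every server matches the reference.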
Step 5. Check Informix.
Informix is the database process behind the A Cisco DB service. When the root user switches to the informix user and checks the status, the database should show as On-Line.
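The Informix status check can be sketched the same way. This assumes that the status is read with the standard Informix utility onstat - (run as the informix user), whose banner line has the form "IBM Informix Dynamic Server Version ... -- On-Line -- Up ...". The parsing and the helper names are assumptions for illustration, so compare them against the banner on your own system before you rely on them.

```python
from typing import Optional


def informix_mode(onstat_output: str) -> Optional[str]:
    """Return the mode field from an assumed `onstat -` banner, or None if absent."""
    for line in onstat_output.splitlines():
        if "Informix" in line and " -- " in line:
            parts = [part.strip() for part in line.split(" -- ")]
            if len(parts) >= 2:
                return parts[1]  # assumed: mode is the second " -- " field
    return None


def informix_online(onstat_output: str) -> bool:
    """True if the mode starts with On-Line (allowing a suffix, in case one is shown)."""
    mode = informix_mode(onstat_output)
    return mode is not None and mode.startswith("On-Line")
```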
Note: Once checked, these steps can help bring the service back up if, and only if, the issue was caused by a mismatch in the hosts/rhosts files or by Informix being temporarily stuck. As mentioned earlier, many other reasons could have caused these mismatches. The steps above are meant to be checked one by one, to narrow down where the problem lies.
In most situations, the nodes need to be rebuilt if the service cannot be restarted from root or if the system files are corrupt.