When you try to log in to the Cisco Unified Communications Manager (CUCM) Publisher server, the Database Communication Error message appears. This document describes how to troubleshoot this issue.
Cisco recommends that you have knowledge of these topics:
The information in this document is based on these software and hardware versions:
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
Refer to Cisco Technical Tips Conventions for more information on document conventions.
In a Cisco Unified Communications Manager cluster that has a publisher and a few subscribers, when you enter the username and password to log in to the publisher, the Database Communication Error message appears.
This issue can occur when you try to log in to the server after changes are made on the Publisher server, such as a change of the hostname or IP address through either the CLI or the OS admin page. In this case, complete these steps to resolve the issue:
Revert the changes to the old settings so that you can log in.
Go to Cisco Unified CM Administration > System > Server and make the required changes for the hostname or IP address.
Make the same changes through the CLI or the OS admin page.
The CUCM restarts itself.
After the restart, check whether the services are loaded properly.
Use the utils service list CLI command in order to check if the services are up and running.
Restart the whole cluster manually once more so that all the services load properly.
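The service check in the steps above can be partly automated once the CLI output has been captured as text. The Python sketch below is a minimal illustration, not a Cisco tool. It assumes each service line has the form "Service Name[STATE]", optionally followed by a note, and that running services show the state STARTED. The function name find_services_not_started and the sample text are hypothetical, and the real output format can differ between CUCM versions.

```python
import re

# Assumed line shape: "<service name>[<STATE>] <optional note>".
SERVICE_LINE = re.compile(r"^(?P<name>.+?)\s*\[(?P<state>[A-Z]+)\]\s*(?P<note>.*)$")


def find_services_not_started(cli_output):
    """Return (name, state, note) for every service whose state is not STARTED."""
    problems = []
    for line in cli_output.splitlines():
        match = SERVICE_LINE.match(line.strip())
        if match is None:
            continue  # skip the prompt, banners, and blank lines
        if match.group("state") != "STARTED":
            problems.append(
                (match.group("name"), match.group("state"), match.group("note"))
            )
    return problems


# Hypothetical sample text, not captured from a real server.
SAMPLE_OUTPUT = """\
admin:utils service list
A Cisco DB[STARTED]
A Cisco DB Replicator[STARTED]
Cisco CallManager[STOPPED]
Cisco DirSync[STOPPED]  Service Not Activated
"""

if __name__ == "__main__":
    for name, state, note in find_services_not_started(SAMPLE_OUTPUT):
        print(f"{name}: {state} {note}".rstrip())
```

The note column is kept so that a service that is stopped because it was never activated can be told apart from one that failed to start.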
If no changes were made to the CUCM Publisher and you still receive the Database Communication Error message, complete this procedure to resolve the issue:
Check the database replication status on all the Cisco Unified Communications Manager nodes (publisher and each subscriber) in the cluster to ensure that all servers replicate database changes successfully. You can check the replication state with RTMT by accessing the Database Summary and inspecting the replication status.
The Number of Replicates Created and State of Replication object provides information about the replication state on the system. Status 2 indicates that the replication is good. If you receive a status number of 3 or 4, it indicates either a broken database or that replication is not set up correctly between the publisher and subscribers.
If the Database Summary indicates that the replication on the Publisher is 2 and on the Subscribers is 3, go to the next step.
Check database replication with the utils dbreplication status CLI command and review the output. If the output file reports mismatched rows, run the utils dbreplication repair all CLI command to synchronize tables on all nodes.
Run the utils dbreplication stop command on all the subscribers and on the publisher. Then run the utils dbreplication reset all command to reset replication on the publisher and all subscribers. This command might take some time to complete.
After you run the commands, check in RTMT. If the replication status shows 2 for both the Publisher and the Subscribers, then the issue is resolved.
You should be able to log in to the Publisher and Subscribers without any error.
Note: If the issue is not resolved after you complete the above steps, restart the DBL service.
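The way this procedure reads the replication state can be summarized in a short Python sketch. It is illustrative only. It assumes the state numbers were read from RTMT and entered by hand, and the names classify_replication, needs_repair, and the node labels are hypothetical. It encodes only what is stated above: 2 means replication is good, and 3 or 4 means a broken database or incorrectly set up replication.

```python
GOOD_STATE = 2
BAD_STATES = {3, 4}


def classify_replication(states):
    """Map each node name to 'good', 'broken', or 'unknown' from its state number."""
    result = {}
    for node, state in states.items():
        if state == GOOD_STATE:
            result[node] = "good"
        elif state in BAD_STATES:
            result[node] = "broken"
        else:
            # States other than 2, 3, and 4 are not covered by this procedure.
            result[node] = "unknown"
    return result


def needs_repair(states):
    """Return True when at least one node reports state 3 or 4."""
    return any(label == "broken" for label in classify_replication(states).values())


if __name__ == "__main__":
    # Hypothetical values, as they might be read from the RTMT Database Summary.
    observed = {"publisher": 2, "subscriber-1": 3, "subscriber-2": 2}
    print(classify_replication(observed))
    print("Continue with the dbreplication steps:", needs_repair(observed))
```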
Another problem that can occur is the inability to access the Cisco Unified CM Administration page of the publisher from the site because of a database communication error: Informix does not accept any more connections because all connections are consumed by dbcef.
The utils dbreplication runtimestate command does not work either:
admin:utils dbreplication runtimestate
Traceback (most recent call last):
  File "/usr/local/cm/bin/DbReplRTstate.py", line 578, in ?
    fin = open(tfile, 'r')
IOError: [Errno 2] No such file or directory: '/var/log/active/cm/trace/dbl/sdi/getNodes'
This issue is also documented in Cisco bug ID CSCtl74037 (registered customers only).
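When CLI sessions are logged to a file, the failure signature shown above can be searched for automatically. The Python sketch below is a hypothetical helper, not part of CUCM. It only checks the captured text for a Python traceback that reports the missing getNodes file, and the function name is_getnodes_traceback is an assumption made for illustration.

```python
def is_getnodes_traceback(cli_output):
    """Return True if the text holds a traceback about the missing getNodes file."""
    has_traceback = "Traceback (most recent call last)" in cli_output
    missing_file = "No such file or directory" in cli_output
    mentions_getnodes = "getNodes" in cli_output
    return has_traceback and missing_file and mentions_getnodes


# The output quoted in this document, used here as sample input.
SAMPLE_FAILURE = """\
admin:utils dbreplication runtimestate
Traceback (most recent call last):
  File "/usr/local/cm/bin/DbReplRTstate.py", line 578, in ?
    fin = open(tfile, 'r')
IOError: [Errno 2] No such file or directory: '/var/log/active/cm/trace/dbl/sdi/getNodes'
"""

if __name__ == "__main__":
    print(is_getnodes_traceback(SAMPLE_FAILURE))
```

A match indicates only that the output resembles the one in this document. It does not confirm that the CEF connection leak is the cause.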
Network disconnects or flaps appear to contribute to this issue, because connections between CEF and the database are not closed properly. The CEF connection leak occurs mainly in these two cases:
There is a pub-sub cluster, and the network goes down.
The CEF service is stopped on the pub. As a result, the subscribers run an algorithm to determine which host takes over those tasks.
In order to resolve this from the CLI, issue these commands:
utils service stop A Cisco DB
utils service start A Cisco DB
Stop the CEF service from the Serviceability page.