Server Failure Recovery
This chapter includes the following sections:
•Recovering from a Failed Primary Database Server
•Replacing a Database Server
•Replacing an Administration or Call Engine Server
Recovering from a Failed Primary Database Server
If the current primary database server and its integrated management module (IMM) lose power or otherwise fail, the current secondary server cannot automatically take over the primary role. Under these conditions, all calls to or from the system fail, and meetings cannot be scheduled or modified.
To avoid this situation, see the "Power Recommendations for High Availability of the Database Servers" section on page 4-3.
To recover from this situation, see the following tasks:
•Disabling High Availability to Enable the Current Secondary Database Server to Take Over the Primary Role
•Enabling HA After Recovering a Database Server
Disabling High Availability to Enable the Current Secondary Database Server to Take Over the Primary Role
The high availability (HA) implementation must access the IMM of the failed node to confirm that the node is no longer accessing the Distributed Replicated Block Device (DRBD) disk, which is a shared resource, before it allows a role transfer. Therefore, if the IMM interface of the current primary database server becomes unavailable, complete the following procedure to manually enable the current secondary database server to take over the primary role.
Procedure
Step 1 Log in to the CLI of the database server that is still working.
Step 2 Enter the utils service database status command to verify that the node has not already taken over the primary HA role.
admin: utils service database status
--------------------------------------------------------------------------------
The initial configured HA role of this node : primary
The current HA role of this node : secondary
The database vip address : 10.22.130.54
The database primary node name : ctx-db-1
The database primary node IP address : 10.22.130.49
The database secondary node name : ctx-db-2
The database secondary node IP address : 10.22.130.57
Mon status : Not running (only runs on primary)
MySQL status : Not running (only runs on primary)
Heartbeat status : Running pid 17337
--------------------------------------------------------------------------------
drbd driver loaded OK; device status:
version: 8.3.2 (api:88/proto:86-90)
m:res cs ro ds p mounted fstype
0:mysql WFConnection Secondary/Unknown UpToDate/DUnknown C
--------------------------------------------------------------------------------
Note If the current HA role is primary, do not complete the rest of this procedure. You already have a working current primary database server. If the failed server needs to be replaced, proceed to the "Replacing a Database Server" section.
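The decision in Step 2 and the Note above depends on a single field of the status output. The following Python sketch assumes that you have captured the output of utils service database status as text, for example by copying it from the CLI session. The function names parse_db_status and should_disable_ha are hypothetical helpers for illustration; they are not part of the Cisco TelePresence Exchange System CLI.

```python
def parse_db_status(text: str) -> dict[str, str]:
    """Parse the 'name : value' lines of 'utils service database status' output.

    Separator lines and the DRBD device table are skipped because they do not
    contain ' : '. Only the first ' : ' on each line is treated as the separator.
    """
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if " : " not in line:
            continue
        key, _, value = line.partition(" : ")
        fields[key.strip()] = value.strip()
    return fields


def should_disable_ha(status: dict[str, str]) -> bool:
    """Return True if the surviving node has not yet taken over the primary role.

    Mirrors the Note in Step 2: if the current HA role is already primary,
    do not continue with disable-ha.
    """
    return status.get("The current HA role of this node", "") != "primary"


# Excerpt of the Step 2 output shown above.
SAMPLE = """\
The initial configured HA role of this node : primary
The current HA role of this node : secondary
MySQL status : Not running (only runs on primary)
Heartbeat status : Running pid 17337
"""

status = parse_db_status(SAMPLE)
print(status["The current HA role of this node"])  # secondary
print(should_disable_ha(status))  # True
```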
Step 3 Enter utils service database drbd disable-ha.
admin: utils service database drbd disable-ha
Step 4 Enter the utils service database status command to verify that the node has taken over the primary HA role.
admin: utils service database status
--------------------------------------------------------------------------------
The initial configured HA role of this node : primary
The current HA role of this node : primary
The database vip address : 10.22.130.54
The database primary node name : ctx-db-1
The database primary node IP address : 10.22.130.49
The database secondary node name : ctx-db-2
The database secondary node IP address : 10.22.130.57
Mon status : Running pid 20494
MySQL status : Running pid 20445
Heartbeat status : Running pid 18030
--------------------------------------------------------------------------------
drbd driver loaded OK; device status:
version: 8.3.2 (api:88/proto:86-90)
m:res cs ro ds p mounted fstype
0:mysql WFConnection Primary/Unknown UpToDate/DUnknown C /mnt/mysql ext3
--------------------------------------------------------------------------------
You may need to wait a few minutes for the current HA role to change to "primary" and for the MySQL database to become available (MySQL status of "Running").
Step 5 If the MySQL status continues to show the value "Not running," enter the utils service database drbd keep-node command:
admin: utils service database drbd keep-node
This command will make this node as Primary
Trying to assume primary role......... [Done]
Reconnecting to MySQL......... [Done]
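Steps 4 and 5 amount to a wait-then-fallback check. This Python sketch assumes a hypothetical get_status callable that returns the parsed fields of utils service database status; the system provides no such programmatic API, so in practice you would capture the output yourself. The 10-minute timeout and 30-second polling interval are placeholder values standing in for "a few minutes." If the node is still not ready when the timeout expires, the sketch reports that Step 5 (keep-node) applies.

```python
import time
from typing import Callable


def is_primary_ready(status: dict[str, str]) -> bool:
    """True once this node reports the primary role and MySQL is running."""
    role = status.get("The current HA role of this node", "")
    mysql = status.get("MySQL status", "")
    return role == "primary" and mysql.startswith("Running")


def wait_for_takeover(
    get_status: Callable[[], dict[str, str]],
    timeout_s: float = 600.0,   # placeholder for "a few minutes"
    interval_s: float = 30.0,   # placeholder polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the node is ready or the timeout expires.

    Returns "ready" on success, or "keep-node" if the node is still not
    ready at the timeout, meaning Step 5 (utils service database drbd
    keep-node) applies.
    """
    waited = 0.0
    while True:
        if is_primary_ready(get_status()):
            return "ready"
        if waited >= timeout_s:
            return "keep-node"
        sleep(interval_s)
        waited += interval_s


# Simulated readings; each would normally come from running
# 'utils service database status' and parsing its output.
readings = iter([
    {"The current HA role of this node": "secondary",
     "MySQL status": "Not running (only runs on primary)"},
    {"The current HA role of this node": "primary",
     "MySQL status": "Running pid 20445"},
])
print(wait_for_takeover(lambda: next(readings), sleep=lambda s: None))  # ready
```

Passing sleep as a parameter keeps the polling logic testable without real delays.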
What to Do Next
Determine whether the failed database server can be recovered, for example, by reconnecting its power cable or fixing its power source.
•If the server can be recovered, proceed to the "Enabling HA After Recovering a Database Server" section.
•If the server cannot be recovered, proceed to the "Replacing a Database Server" section.
Related Topics
•Command Reference, page C-1
Enabling HA After Recovering a Database Server
Before You Begin
•Complete this task only if you had previously completed the procedure in the "Disabling High Availability to Enable the Current Secondary Database Server to Take Over the Primary Role" section.
•Do not complete this task for a replacement server. Instead, see the "Replacing a Database Server" section.
Caution
This procedure will temporarily interrupt MySQL service. Cisco recommends that you complete this task during a maintenance window. During the MySQL service interruption, new calls will not be able to connect to meetings, and users will not be able to schedule meetings.
Procedure
Step 1 Turn off the recovered server.
Step 2 Log in to the CLI of the current primary database server.
Step 3 Enter utils service database drbd enable-ha.
admin: utils service database drbd enable-ha
Stopping mon daemon: [ OK ]
Shutting down MySQL. SUCCESS!
Unmounting DRBD Volume...
Entering DRBD Secondary mode...
Step 4 Turn on the recovered server, but do not perform any other actions on that server.
After the IMM becomes available, the HA implementation automatically sets up peer communication and reboots the recovered node.
Step 5 After the reboot is complete, verify that the database servers are not in split brain mode. See the "Diagnosing Split Brain Mode" section on page 30-1.
Related Topics
•Command Reference, page C-1
Replacing a Database Server
See the following sections:
•Preparing to Replace a Database Server
•Setting Up the Replacement Database Server
•Installing the Software on and Synchronizing the Replacement for the Initial Secondary Database Server
•Installing the Software on and Synchronizing the Replacement for the Initial Primary Database Server
Preparing to Replace a Database Server
Procedure
Step 1 Obtain the Cisco TelePresence Exchange System installation DVD, or download the software from the following URL and burn the disk image onto a DVD: http://www.cisco.com/go/ctx-download.
Note Make sure that the software version on the installation DVD is the same as the version that is currently running on the peer server of the same role. If you want to upgrade the software, you may do so after you successfully replace the failed server.
Step 2 Find the worksheets in Appendix A, "Installation Worksheets," that you completed when you installed the Cisco TelePresence Exchange System.
If you cannot find your completed worksheet, or if the information has become obsolete, gather the following information for the database server:
•Hostname, IP address, and subnet mask of the individual database server.
•Hostname, virtual IP (VIP) address, and subnet mask that are shared by both database servers.
•Default gateway.
•Administrator username and password—These are used to access the CLI on the server. To simplify management, Cisco recommends that you use the same username and password on all Cisco TelePresence Exchange System servers.
•Security password—You must use the same security password that is defined on all of the other Cisco TelePresence Exchange System servers. The database server uses this password to authenticate data requests from the administration and call engine servers.
•Network and access information for the integrated management module (IMM) interface, which is required to implement active/standby redundancy for the database servers, and which enables remote control of the individual database server:
–IP address and subnet mask.
–Default gateway.
–Username and password.
•Information for generating the locally significant certificate (LSC):
–Organization—typically your company name.
–Unit—typically your business unit and department.
–Location—typically the building, floor, and rack in which the server is installed.
–State and Country—where the server is located.
Use the following guidelines to determine each entry for generating LSCs:
–Refer to your company guidelines for format and entry requirements.
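–Supported characters include alphanumeric characters, space, and the following special characters: period (.), comma (,), hyphen (-), underscore (_), colon (:), semicolon (;), braces ({ }), parentheses (( )), brackets ([ ]), and number sign (#).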
–Each field supports up to 255 characters.
•IP addresses, hostnames, or pool names for external Network Time Protocol (NTP) clocking sources—You must configure the same NTP entries that are defined on all of the other Cisco TelePresence Exchange System servers.
•(Optional) Domain Name System (DNS) information:
–IP address of a primary DNS server.
–(Optional) IP address of a secondary DNS server.
–Domain name.
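Before you start the installer, you can check your planned LSC entries against the guidelines listed earlier in this step. This Python sketch assumes that "alphanumeric" means ASCII letters and digits; the function name lsc_field_problems and the sample values are hypothetical.

```python
import string

# Characters permitted in LSC fields, per the guidelines in this step
# (assumption: "alphanumeric" means ASCII letters and digits).
ALLOWED = set(string.ascii_letters + string.digits + " .,-_:;{}()[]#")
MAX_LEN = 255


def lsc_field_problems(name: str, value: str) -> list[str]:
    """Return the problems found in one LSC field (an empty list means OK)."""
    problems = []
    if len(value) > MAX_LEN:
        problems.append(f"{name}: {len(value)} characters exceeds {MAX_LEN}")
    bad = sorted(set(value) - ALLOWED)
    if bad:
        problems.append(f"{name}: unsupported characters {''.join(bad)!r}")
    return problems


# Hypothetical worksheet values.
fields = {
    "Organization": "Example Corp",
    "Unit": "IT/Collaboration",  # '/' is not in the supported list
    "Location": "Bldg 2, Floor 3, Rack [A7]",
    "State": "CA",
    "Country": "US",
}
for name, value in fields.items():
    for problem in lsc_field_problems(name, value):
        print(problem)  # Unit: unsupported characters '/'
```

Also check each entry against your company guidelines; this sketch covers only the character set and length limit.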
Setting Up the Replacement Database Server
Before You Begin
Complete the procedure in the "Preparing to Replace a Database Server" section.
Procedure
Step 1 Follow the hardware installation instructions for the server to properly rack mount the server.
Also see the "Recommendations for Rack Mounting the Cisco TelePresence Exchange System and Other Solution Components" section on page 4-2.
Step 2 Connect the power, network, and console access cables to the server. See the following sections:
•Power Recommendations for High Availability of the Database Servers, page 4-3
•Cabling Requirements for the Database Servers, page 4-3
Step 3 Set up the IMM interface, which is required to implement active/standby redundancy for the database servers. See the "Setting Up the IMM" section on page 4-7.
Step 4 Proceed to one of the following sections, depending on the initial HA role of the database server that you are replacing:
•Installing the Software on and Synchronizing the Replacement for the Initial Secondary Database Server
•Installing the Software on and Synchronizing the Replacement for the Initial Primary Database Server
Installing the Software on and Synchronizing the Replacement for the Initial Secondary Database Server
Before You Begin
Complete the procedure in the "Setting Up the Replacement Database Server" section.
Caution
This procedure will temporarily interrupt MySQL service. Cisco recommends that you complete this task during a maintenance window. During the MySQL service interruption, new calls will not be able to connect to meetings, and users will not be able to schedule meetings.
Procedure
Step 1 Install the software on the replacement server. See the "Installing the Database Server Software" section on page 5-4.
Note Make sure that you enter No when the installer asks whether to configure this node as the primary database server.
Step 2 Verify that the initial configured HA role of this node is secondary.
See the "Checking the Initial High-Availability Role of the Database Servers" section on page 5-8.
Step 3 Turn off the replacement server.
Step 4 Log in to the CLI of the current primary database server.
Step 5 Enter utils service database drbd enable-ha.
admin: utils service database drbd enable-ha
Stopping mon daemon: [ OK ]
Shutting down MySQL. SUCCESS!
Unmounting DRBD Volume...
Entering DRBD Secondary mode...
Step 6 Turn on the replacement server.
Step 7 Log in to the CLI of the replacement server.
Step 8 Enter utils service database sync.
Step 9 Log in to the CLI of the current primary database server.
Step 10 Enter utils service database drbd keep-node.
admin: utils service database drbd keep-node
This command will make this node as Primary
Trying to assume primary role......... [Done]
Reconnecting to MySQL......... [Done]
Step 11 Proceed to the "Verifying Synchronization and Network Connectivity of the Database Servers" section on page 5-12.
Related Topics
•Command Reference, page C-1
Installing the Software on and Synchronizing the Replacement for the Initial Primary Database Server
Before You Begin
Complete the procedure in the "Setting Up the Replacement Database Server" section.
Caution
This procedure will temporarily interrupt MySQL service. Cisco recommends that you complete this task during a maintenance window. During the MySQL service interruption, new calls will not be able to connect to meetings, and users will not be able to schedule meetings.
Procedure
Step 1 Install the software on the replacement server. See the "Installing the Database Server Software" section on page 5-4.
Note Make sure that you enter Yes when the installer asks whether to configure this node as the primary database server.
Step 2 Verify that the initial configured HA role of this node is primary by entering utils service database status.
admin: utils service database status
Unable to run CLI as root due to unsuccessful service drbd status!
--------------------------------------------------------------------------------
The initial configured HA role of this node : primary
The current HA role of this node :
The database vip address : 10.22.130.54
The database primary node name : ctx-db-1
The database primary node IP address : 10.22.130.49
The database secondary node name : ctx-db-2
The database secondary node IP address : 10.22.130.57
Unable to run CLI as root due to unsuccessful service heartbeat status!
Mon status : Not running (only runs on primary)
MySQL status : Not running (only runs on primary)
Heartbeat status : Not running
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Executed command unsuccessfully
Step 3 Turn off the replacement server.
Step 4 Log in to the CLI of the current primary database server.
Step 5 Enter utils service database drbd enable-ha.
admin: utils service database drbd enable-ha
Stopping mon daemon: [ OK ]
Shutting down MySQL. SUCCESS!
Unmounting DRBD Volume...
Entering DRBD Secondary mode...
After a few minutes, the HA implementation on the current primary server should automatically turn on the replacement server.
Step 6 Turn on the replacement server if it does not automatically come up within several minutes.
Step 7 Log in to the CLI of the replacement server.
Step 8 Enter utils service database drbd replace-primary.
admin: utils service database drbd replace-primary
Setting up DRBD Disk
..........................................................................................
.........
initializing activity log
New drbd meta data block successfully created.
Starting DRBD resources: [ d(mysql) s(mysql) n(mysql) ].
Starting High-Availability services:
Step 9 Verify that the replacement server currently has the secondary HA role by entering utils service database status.
admin: utils service database status
--------------------------------------------------------------------------------
The initial configured HA role of this node : primary
The current HA role of this node : secondary
The database vip address : 10.22.130.54
The database primary node name : ctx-db-1
The database primary node IP address : 10.22.130.49
The database secondary node name : ctx-db-2
The database secondary node IP address : 10.22.130.57
Mon status : Not running (only runs on primary)
MySQL status : Not running (only runs on primary)
Heartbeat status : Running pid 19094
--------------------------------------------------------------------------------
drbd driver loaded OK; device status:
version: 8.3.2 (api:88/proto:86-90)
m:res cs ro ds p mounted fstype
... sync'ed: 1.2% (45556/46080)M
0:mysql SyncTarget Secondary/Primary Inconsistent/UpToDate C
--------------------------------------------------------------------------------
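To interpret the DRBD portion of the status output in Step 9, read each row against the m:res cs ro ds p mounted fstype header: cs is the connection state, ro holds the local/peer roles, and ds holds the local/peer disk states. In DRBD, a fully synchronized pair normally reports the Connected connection state with UpToDate/UpToDate disk states. This Python sketch parses such rows. It assumes the whitespace-separated layout shown above, and it assumes that in the "sync'ed" line the first figure in parentheses is the amount still to be synchronized and the second is the total. The helper names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class DrbdDevice(NamedTuple):
    minor: int
    resource: str
    cs: str                # connection state, e.g. WFConnection, SyncTarget
    local_role: str        # first half of ro, e.g. Secondary
    peer_role: str         # second half of ro, e.g. Primary
    local_disk: str        # first half of ds, e.g. Inconsistent
    peer_disk: str         # second half of ds, e.g. UpToDate
    mounted: Optional[str]  # mount point, present only when mounted


def parse_device_line(line: str) -> DrbdDevice:
    """Parse one row under the 'm:res cs ro ds p mounted fstype' header."""
    parts = line.split()
    minor, resource = parts[0].split(":", 1)
    local_role, peer_role = parts[2].split("/")
    local_disk, peer_disk = parts[3].split("/")
    mounted = parts[5] if len(parts) > 5 else None
    return DrbdDevice(int(minor), resource, parts[1], local_role, peer_role,
                      local_disk, peer_disk, mounted)


def is_in_sync(dev: DrbdDevice) -> bool:
    """True when the peers are connected and both disks are UpToDate."""
    return dev.cs == "Connected" and dev.local_disk == dev.peer_disk == "UpToDate"


def sync_progress(line: str) -> Optional[tuple[int, int]]:
    """Extract (remaining, total) in M from a "sync'ed: x% (left/total)M" line."""
    m = re.search(r"\((\d+)/(\d+)\)M", line)
    return (int(m.group(1)), int(m.group(2))) if m else None


dev = parse_device_line("0:mysql SyncTarget Secondary/Primary Inconsistent/UpToDate C")
print(dev.local_role, dev.cs, dev.local_disk)  # Secondary SyncTarget Inconsistent
print(is_in_sync(dev))  # False

left, total = sync_progress("... sync'ed: 1.2% (45556/46080)M")
print(f"{total - left} of {total} M synchronized")  # 524 of 46080 M synchronized
```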
Step 10 Log in to the CLI of the current primary database server.
Step 11 Enter utils service database drbd keep-node.
admin: utils service database drbd keep-node
This command will make this node as Primary
Trying to assume primary role......... [Done]
Reconnecting to MySQL......... [Done]
The database servers will automatically begin the synchronization process.
What to Do Next
Complete the following procedures:
•Verifying Synchronization and Network Connectivity of the Database Servers, page 5-12
•Verifying Data Connectivity Among the Servers, page 5-22
Related Topics
•Command Reference, page C-1
Replacing an Administration or Call Engine Server
Procedure
Step 1 Obtain the Cisco TelePresence Exchange System installation DVD, or download the software from the following URL and burn the disk image onto a DVD: http://www.cisco.com/go/ctx-download.
Note Make sure that the software version on the installation DVD is the same as the version that is currently running on the peer server of the same role. If you want to upgrade the software, you may do so after you successfully replace the failed server.
Step 2 Find the worksheets in Appendix A, "Installation Worksheets," that you completed when you installed the Cisco TelePresence Exchange System.
If you cannot find your completed worksheet, or if the information has become obsolete, gather the following information for the server that you need to replace:
•Hostname
•IP address and subnet mask
•Default gateway
•Administrator username and password—These are used to access the CLI on the server. To simplify management, Cisco recommends that you use the same username and password on all Cisco TelePresence Exchange System servers.
•Security password—You must use the same security password that is defined on all of the other Cisco TelePresence Exchange System servers. The database server uses this password to authenticate data requests from the administration and call engine servers.
•Information for generating the locally significant certificate (LSC):
–Organization—typically your company name.
–Unit—typically your business unit and department.
–Location—typically the building, floor, and rack in which the server is installed.
–State and Country—where the server is located.
Use the following guidelines to determine each entry for generating LSCs:
–Refer to your company guidelines for format and entry requirements.
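–Supported characters include alphanumeric characters, space, and the following special characters: period (.), comma (,), hyphen (-), underscore (_), colon (:), semicolon (;), braces ({ }), parentheses (( )), brackets ([ ]), and number sign (#).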
–Each field supports up to 255 characters.
Optionally, gather the following information for the integrated management module (IMM) interface, which enables remote control of the server:
•IP address and subnet mask
•Default gateway
•Username and password
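The security password must match the one defined on all other Cisco TelePresence Exchange System servers, and Cisco recommends matching administrator credentials as well. If you keep the gathered worksheet values in a file, a check like the following Python sketch can flag a mismatch before installation. It compares each server against a known-good reference server and reports only hostnames, so secret values are never printed. The function name, the keys, and the hostnames other than ctx-db-1 are hypothetical placeholders.

```python
def hosts_out_of_step(
    servers: dict[str, dict[str, str]], reference: str, keys: list[str]
) -> dict[str, list[str]]:
    """For each key, list the servers whose value differs from the reference server.

    Only hostnames are reported, so secret values are never printed.
    Keys on which every server agrees are omitted from the result.
    """
    ref = servers[reference]
    result: dict[str, list[str]] = {}
    for key in keys:
        differing = sorted(h for h, s in servers.items() if s.get(key) != ref.get(key))
        if differing:
            result[key] = differing
    return result


# Hypothetical worksheet entries; hostnames and values are placeholders.
servers = {
    "ctx-db-1": {"security_password": "s3cret", "admin_user": "admin"},
    "ctx-admin-1": {"security_password": "s3cret", "admin_user": "admin"},
    "ctx-ce-new": {"security_password": "s3cr3t", "admin_user": "ctxadmin"},
}
print(hosts_out_of_step(servers, "ctx-db-1", ["security_password", "admin_user"]))
# {'security_password': ['ctx-ce-new'], 'admin_user': ['ctx-ce-new']}
```

A mismatched security password must be corrected; a mismatched administrator username only departs from the recommendation.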
Step 3 Follow the hardware installation instructions for the server to properly rack mount the server.
Also see the "Recommendations for Rack Mounting the Cisco TelePresence Exchange System and Other Solution Components" section on page 4-2.
Step 4 Connect the power, network, and console access cables to the server.
See the "Cabling Requirements for the Administration and Call Engine Servers" section on page 4-4.
Step 5 (Optional) Set up the IMM interface for remote control of the server.
See the "Setting Up the IMM" section on page 4-7.
Step 6 Install the software by using one of the following sections:
•Installing the Cisco TelePresence Exchange System Call Engine Servers, page 5-13
•Installing the Cisco TelePresence Exchange System Administration Servers, page 5-18
Step 7 Proceed to the "Verifying Data Connectivity Among the Servers" section on page 5-22.