Installation and Administration Guide for the Cisco TelePresence Exchange System Release 1.1
Server Failure Recovery



This chapter includes the following sections:

Recovering from a Situation Where Three or More Servers Failed

Replacing a Database Server

Replacing an Administration or Call Engine Server

Recovering from a Situation Where Three or More Servers Failed

To recover from this situation, see the following tasks:

Disabling High Availability to Enable the Current Secondary Database Server to Take Over the Primary Role

Enabling HA After Recovering a Database Server

Disabling High Availability to Enable the Current Secondary Database Server to Take Over the Primary Role

For a database server to hold the primary high availability (HA) role, a minimum of four Cisco TelePresence Exchange System nodes must be online. If three nodes are offline (for example, one administration server, one call engine server, and one database server), the remaining database server cannot act as the primary database server. In that case, use the utils service database drbd disable-ha command, which allows the remaining database server to assume the primary HA role even though the minimum quorum of four votes is not met. After four or more Cisco TelePresence Exchange System nodes are available again, use the utils service database drbd enable-ha command to return the system to its original configuration.
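
The quorum rule described above can be illustrated with a minimal Python sketch. It is not part of the product: the function name and the ha_enabled flag are hypothetical and only model the rule stated in this section (four online nodes required unless HA has been disabled).

```python
# Illustrative model of the quorum rule in this section; not product code.
# QUORUM_VOTES comes from the text above. The function name and the
# ha_enabled flag are assumptions made for this example.
QUORUM_VOTES = 4


def can_hold_primary(online_nodes: int, ha_enabled: bool = True) -> bool:
    """Return True if a database server may hold the primary HA role."""
    if not ha_enabled:
        # After "utils service database drbd disable-ha", the quorum
        # requirement is no longer enforced.
        return True
    return online_nodes >= QUORUM_VOTES


if __name__ == "__main__":
    print(can_hold_primary(4))                    # quorum met
    print(can_hold_primary(3))                    # three nodes offline
    print(can_hold_primary(3, ha_enabled=False))  # after disable-ha
```

With three nodes online and HA enabled, the sketch reports that the primary role cannot be held. This is the situation that the procedure below addresses.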

Procedure


Step 1 Log in to the CLI of the database server that is still working.

Step 2 Enter the utils service database status command to verify that the node has not already taken over the primary HA role.

admin: utils service database status 
--------------------------------------------------------------------------------
The initial configured HA role of this node      : primary
The current HA role of this node                 : secondary 
The database vip address                         : 10.22.130.54
Node name                                        : ctx-db-1
Node IP address                                  : 10.22.130.49
Corosync status                                  : Running PID <17337>
Current Designated Controller (DC)               : ctx-db-2 - partition with quorum
MySQL status                                     : Not running (only runs on database 
server with current role primary.)
Connection Sync Status                           : WFConnection
Role (this-node/peer-node)                       : Secondary/Unknown
Disk Status (this-node/peer-node)                : UpToDate/Unknown
--------------------------------------------------------------------------------

Note If the current HA role is primary, do not complete the rest of this procedure. You already have a working current primary database server. If the failed server needs to be replaced, proceed to the "Replacing a Database Server" section.


Step 3 Enter utils service database drbd disable-ha.

admin: utils service database drbd disable-ha 
Disabling quorum requirement... [Done]
 
   

Step 4 Enter the utils service database status command to verify that the node takes over the primary HA role.

admin: utils service database status 
--------------------------------------------------------------------------------
The initial configured HA role of this node      : primary
The current HA role of this node                 : primary 
The database vip address                         : 10.22.130.54
Node name                                        : ctx-db-1
Node IP address                                  : 10.22.130.49
Corosync status                                  : Running PID <18030>
Current Designated Controller (DC)               : ctx-db-2 - partition with quorum
MySQL status                                     : Running pid 20445
Connection Sync Status                           : WFConnection
Role (this-node/peer-node)                       : Primary/Unknown
Disk Status (this-node/peer-node)                : UpToDate/Unknown
--------------------------------------------------------------------------------
 
   

You may need to wait a few minutes for the current HA role to change to "primary" and for the MySQL database to become available (MySQL status of "Running").

Step 5 If the MySQL status continues to show the value "Not running," enter the utils service database drbd keep-node command:

admin: utils service database drbd keep-node 
This command will make this node as Primary
Trying to assume primary role......... [Done]
Reconnecting to MySQL......... [Done]
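
If you capture the output of the utils service database status command (for example, from a terminal log), the checks in Step 2 and Step 4 can be scripted. The following Python sketch is one possible approach, not a Cisco-provided tool. The function names are hypothetical, and the parser assumes the "label : value" layout shown in the examples above, including a value that wraps onto a second line.

```python
# Sketch only: parses captured "utils service database status" output.
# parse_status and needs_disable_ha are hypothetical helper names.
SAMPLE = """\
admin: utils service database status
--------------------------------------------------------------------------------
The initial configured HA role of this node      : primary
The current HA role of this node                 : secondary
MySQL status                                     : Not running (only runs on database
server with current role primary.)
Role (this-node/peer-node)                       : Secondary/Unknown
--------------------------------------------------------------------------------
"""


def parse_status(output: str) -> dict:
    """Collect 'label : value' lines into a dictionary."""
    fields = {}
    last_key = None
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped or set(stripped) == {"-"}:
            continue  # skip blank lines and dashed separators
        if " : " in line:
            key, value = line.split(" : ", 1)
            last_key = key.strip()
            fields[last_key] = value.strip()
        elif last_key is not None:
            # No separator: treat the line as a wrapped continuation.
            fields[last_key] += " " + stripped
    return fields


def needs_disable_ha(fields: dict) -> bool:
    """True if this node has not yet taken over the primary HA role."""
    return fields.get("The current HA role of this node") != "primary"


if __name__ == "__main__":
    status = parse_status(SAMPLE)
    print(status["The current HA role of this node"])
    print(needs_disable_ha(status))
```

The command prompt line is ignored because it appears before any field and contains no " : " separator.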
 
   

What to Do Next

Determine whether or not the other database server can be recovered.

If the server can be recovered, proceed to the "Enabling HA After Recovering a Database Server" section.

If the server cannot be recovered, proceed to the "Replacing a Database Server" section.

Related Topics

Command Reference

Enabling HA After Recovering a Database Server

Before You Begin

Complete this task only if you had previously completed the procedure in the "Disabling High Availability to Enable the Current Secondary Database Server to Take Over the Primary Role" section.

Do not complete this task for a replacement server. Instead, see the "Replacing a Database Server" section.


Caution This procedure will temporarily interrupt MySQL service. Cisco recommends that you complete this task during a maintenance window. During the MySQL service interruption, new calls will not be able to connect to meetings, and users will not be able to schedule meetings.

Procedure


Step 1 Turn off the recovered server.

Step 2 Log in to the CLI of the current primary database server.

Step 3 Enter utils service database drbd enable-ha.

admin: utils service database drbd enable-ha 
Enabling quorum requirement... [Done]
 
   

Step 4 After the reboot is complete, verify that the database servers are not in split brain mode. See the "Diagnosing Split Brain Mode" section.


Related Topics

Command Reference

Replacing a Database Server

See the following sections:

Preparing to Replace a Database Server

Setting Up the Replacement Database Server

Installing the Software on the Replacement for the Initial Secondary Database Server

Installing the Software on the Replacement for the Initial Primary Database Server

Preparing to Replace a Database Server

Procedure


Step 1 Obtain the Cisco TelePresence Exchange System installation DVD, or download the software from the following URL and burn the disk image onto a DVD: http://www.cisco.com/go/ctx-download.


Note Verify that the software version on the installation DVD is the same as the version that is currently running on the peer server of the same role. If you want to upgrade the software, you may do so after you successfully replace the failed server.


Step 2 Locate the "Installation Worksheets" that you completed when you installed the Cisco TelePresence Exchange System.

If you cannot find your completed worksheet, or if the information has become obsolete, gather the following information for the database server:

Hostname, IP address, and subnet mask of the individual database server.

Hostname, virtual IP (VIP) address, and subnet mask that are shared by both database servers.

Default gateway.

Administrator username and password—These are used to access the CLI on the server. You must use the same administrator username and password on all Cisco TelePresence Exchange System servers, because the administration servers also use the administrator credentials over SSH to get the status of all nodes in the server cluster.

Security password—You must use the same security password that is defined on all of the other Cisco TelePresence Exchange System servers. The database server uses this password to authenticate data requests from the administration and call engine servers.

Information for generating the locally significant certificate (LSC):

Organization—typically your company name.

Unit—typically your business unit and department.

Location—typically the building, floor, and rack in which the server is installed.

State and Country—where the server is located.

Use the following guidelines to determine each entry for generating LSCs:

Refer to your company guidelines for format and entry requirements.

Supported characters include alphanumeric, space, and the following special characters: .,-_:;{}()[]#.

Each field supports up to 255 characters.

IP addresses, hostnames, or pool names for external Network Time Protocol (NTP) clocking sources—You must configure the same NTP entries that are defined on all of the other Cisco TelePresence Exchange System servers.

Optionally, gather the following information for the integrated management module (IMM) interface, which enables remote control of the server:

IP address and subnet mask

Default gateway

Username and password
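
The LSC entry guidelines listed above (supported characters and the 255-character limit) can be checked before you start the installer. The following Python sketch is a hypothetical helper, not part of the product. It assumes that "alphanumeric" means ASCII letters and digits, which the guidelines do not state explicitly.

```python
import string

# Values taken from the LSC guidelines above. Treating "alphanumeric" as
# ASCII letters and digits is an assumption made for this sketch.
LSC_SPECIAL_CHARS = ".,-_:;{}()[]#"
LSC_ALLOWED = set(string.ascii_letters + string.digits + " " + LSC_SPECIAL_CHARS)
LSC_MAX_LENGTH = 255


def lsc_field_problems(value: str) -> list:
    """Return a list of problems found in one LSC field (empty if none)."""
    problems = []
    if len(value) > LSC_MAX_LENGTH:
        problems.append(f"too long: {len(value)} > {LSC_MAX_LENGTH} characters")
    bad = sorted({ch for ch in value if ch not in LSC_ALLOWED})
    if bad:
        problems.append("unsupported characters: " + " ".join(repr(ch) for ch in bad))
    return problems


if __name__ == "__main__":
    for entry in ("Example Corp, Bldg 4 (Rack 12)", "R&D"):
        print(entry, "->", lsc_field_problems(entry) or "ok")
```

The helper checks only the two documented constraints. Any additional format rules come from your company guidelines.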


Setting Up the Replacement Database Server

Before You Begin

Complete the procedure in the "Preparing to Replace a Database Server" section.

Procedure


Step 1 Follow the hardware installation instructions for the server to properly rack mount the server.

Also see the "Recommendations for Rack Mounting the Cisco TelePresence Exchange System and Other Solution Components" section.

Step 2 Connect the power, network, and console access cables to the server.

See the "Cabling Requirements for the Database Servers" section.

Step 3 (Optional) Set up the IMM interface for remote control of the server.

See the "Setting Up the IMM" section.

Step 4 Proceed to one of the following sections, depending on the initial HA role of the database server that you are replacing:

Installing the Software on the Replacement for the Initial Secondary Database Server

Installing the Software on the Replacement for the Initial Primary Database Server

Step 5 Proceed to the "Verifying Data Connectivity Among the Servers" section.


Installing the Software on the Replacement for the Initial Secondary Database Server

Before You Begin

Complete the procedure in the "Setting Up the Replacement Database Server" section.


Caution This procedure will temporarily interrupt MySQL service. Cisco recommends that you complete this task during a maintenance window. During the MySQL service interruption, new calls will not be able to connect to meetings, and users will not be able to schedule meetings.

Procedure


Step 1 Install the software on the replacement server. See the "Installing the Database Server Software" section.


Note Make sure that you enter No when the installer asks whether to configure this node as the primary database server.


Step 2 Verify that the initial configured HA role of this node is secondary.

See the "Checking the Initial High-Availability Roles of the Database Servers" section.

Step 3 Log in to the CLI of the current primary database server.

Step 4 Enter utils service database drbd enable-ha.

admin: utils service database drbd enable-ha 
Enabling quorum requirement... [Done]
 
   

Note This step is required only if you disabled HA by using the utils service database drbd disable-ha command.


Step 5 Enter the utils service database status command.

If the database servers are not synchronized, see the "Split Brain Recovery" chapter.


Related Topics

Command Reference

Installing the Software on the Replacement for the Initial Primary Database Server

Before You Begin

Complete the procedure in the "Setting Up the Replacement Database Server" section.


Caution This procedure will temporarily interrupt MySQL service. Cisco recommends that you complete this task during a maintenance window. During the MySQL service interruption, new calls will not be able to connect to meetings, and users will not be able to schedule meetings.

Procedure


Step 1 Install the software on the replacement server. See the "Installing the Database Server Software" section.


Note Make sure that you enter Yes when the installer asks whether to configure this node as the primary database server.


Step 2 Verify that the initial configured HA role of this node is primary by entering utils service database status.

admin: utils service database status 
--------------------------------------------------------------------------------
The initial configured HA role of this node      : primary
The current HA role of this node                 : secondary
The database vip address                         : 10.22.163.218
Node name                                        : intersp-db1
Node IP address                                  : 10.22.163.216
Corosync status                                  : Running PID <23905>
Current Designated Controller (DC)               : intersp-eng2 - partition with quorum
MySQL status                                     : Not running (only runs on database 
server with current role primary.)
Connection Sync Status                           : SyncTarget 2.7%
Role (this-node/peer-node)                       : Secondary/Primary
Disk Status (this-node/peer-node)                : Inconsistent/UpToDate
--------------------------------------------------------------------------------
 
   

After the synchronization process is complete, the Disk Status value changes to UpToDate/UpToDate.

Step 3 Log in to the CLI of the current primary database server.

Step 4 Enter utils service database drbd enable-ha.


Note This step is required only if you disabled HA by using the utils service database drbd disable-ha command.


admin: utils service database drbd enable-ha 
Enabling quorum requirement... [Done]
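
When you monitor the synchronization described in Step 2, the two status values of interest are Connection Sync Status and Disk Status. The following Python sketch shows one way to interpret them. The function names are hypothetical, and the value formats are assumed to match the example output in Step 2 ("SyncTarget 2.7%" and "Inconsistent/UpToDate").

```python
# Sketch only: interprets two field values from the status output in Step 2.
# disks_in_sync and sync_percent are hypothetical helper names.
from typing import Optional


def disks_in_sync(disk_status: str) -> bool:
    """True when both this node and the peer node report UpToDate."""
    this_node, _, peer_node = disk_status.partition("/")
    return this_node.strip() == "UpToDate" and peer_node.strip() == "UpToDate"


def sync_percent(connection_status: str) -> Optional[float]:
    """Return the progress from a value such as 'SyncTarget 2.7%', else None."""
    parts = connection_status.split()
    if len(parts) == 2 and parts[1].endswith("%"):
        try:
            return float(parts[1][:-1])
        except ValueError:
            return None
    return None


if __name__ == "__main__":
    print(sync_percent("SyncTarget 2.7%"))
    print(disks_in_sync("Inconsistent/UpToDate"))
    print(disks_in_sync("UpToDate/UpToDate"))
```

A status value without a percentage, such as WFConnection, returns None from sync_percent.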
 
   

What to Do Next

Complete the following procedures:

Checking the Network Connectivity of the Database Servers

Verifying Data Connectivity Among the Servers

Related Topics

Command Reference

Replacing an Administration or Call Engine Server

Procedure


Step 1 Obtain the Cisco TelePresence Exchange System installation DVD, or download the software from the following URL and burn the disk image onto a DVD: http://www.cisco.com/go/ctx-download.


Note Verify that the software version on the installation DVD is the same as the version that is currently running on the peer server of the same role. If you want to upgrade the software, you may do so after you successfully replace the failed server.


Step 2 Locate the "Installation Worksheets" that you completed when you installed the Cisco TelePresence Exchange System.

If you cannot find your completed worksheet, or if the information has become obsolete, gather the following information for the server that you need to replace:

Hostname

IP address and subnet mask

Default gateway

Administrator username and password—These are used to access the CLI on the server. You must use the same administrator username and password on all Cisco TelePresence Exchange System servers, because the administration servers also use the administrator credentials over SSH to get the status of all nodes in the server cluster.

Security password—You must use the same security password that is defined on all of the other Cisco TelePresence Exchange System servers. The database server uses this password to authenticate data requests from the administration and call engine servers.

Information for generating the locally significant certificate (LSC):

Organization—typically your company name.

Unit—typically your business unit and department.

Location—typically the building, floor, and rack in which the server is installed.

State and Country—where the server is located.

Use the following guidelines to determine each entry for generating LSCs:

Refer to your company guidelines for format and entry requirements.

Supported characters include alphanumeric, space, and the following special characters: .,-_:;{}()[]#.

Each field supports up to 255 characters.

Optionally, gather the following information for the integrated management module (IMM) interface, which enables remote control of the server:

IP address and subnet mask

Default gateway

Username and password

Step 3 Follow the hardware installation instructions for the server to properly rack mount the server.

Also see the "Recommendations for Rack Mounting the Cisco TelePresence Exchange System and Other Solution Components" section.

Step 4 Connect the power, network, and console access cables to the server.

See the "Cabling Requirements for the Administration and Call Engine Servers" section.

Step 5 (Optional) Set up the IMM interface for remote control of the server.

See the "Setting Up the IMM" section.

Step 6 Install the software by using one of the following sections:

Installing the Cisco TelePresence Exchange System Call Engine Servers

Installing the Cisco TelePresence Exchange System Administration Servers

Step 7 Proceed to the "Verifying Data Connectivity Among the Servers" section.