Server Failure Recovery
This chapter includes the following sections:
•Recovering from a Situation Where Three or More Servers Failed
•Replacing a Database Server
•Replacing an Administration or Call Engine Server
Recovering from a Situation Where Three or More Servers Failed
To recover from this situation, see the following tasks:
•Disabling High Availability to Enable the Current Secondary Database Server to Take Over the Primary Role
•Enabling HA After Recovering a Database Server
Disabling High Availability to Enable the Current Secondary Database Server to Take Over the Primary Role
For a database server to retain the primary high availability (HA) role, a minimum of four Cisco TelePresence Exchange System nodes must be online. If three nodes are offline (for example, one administration server, one call engine server, and one database server), the remaining database server cannot act as the primary database server. In that case, you must use the utils service database drbd disable-ha command, which allows a database server to assume the primary HA role even when the minimum quorum of four votes is not met. After four or more Cisco TelePresence Exchange System nodes are available again, use the utils service database drbd enable-ha command to return the system to its original configuration.
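The quorum rule above can be illustrated with a short Python sketch. It is not Cisco code: the function name, the ha_enabled flag, and the six-node cluster size are assumptions made only for this example; the threshold of four comes from the paragraph above.

```python
# Hypothetical illustration of the quorum rule; not part of the product.
QUORUM_VOTES = 4  # minimum number of online nodes stated in this section


def can_hold_primary(online_nodes: int, ha_enabled: bool = True) -> bool:
    """Return True if a database server may hold the primary HA role."""
    if not ha_enabled:
        # Models the state after "utils service database drbd disable-ha":
        # the quorum requirement is no longer enforced.
        return True
    return online_nodes >= QUORUM_VOTES


total_nodes = 6  # assumption: two administration, two call engine, two database servers
offline_nodes = 3

print(can_hold_primary(total_nodes - offline_nodes))                    # False
print(can_hold_primary(total_nodes - offline_nodes, ha_enabled=False))  # True
```

With three of the assumed six nodes offline, only three votes remain, so the quorum check fails until the requirement is disabled.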
Procedure
Step 1 Log in to the CLI of the database server that is still working.
Step 2 Enter the utils service database status command to verify that the node has not already taken over the primary HA role.
admin: utils service database status
--------------------------------------------------------------------------------
The initial configured HA role of this node : primary
The current HA role of this node : secondary
The database vip address : 10.22.130.54
Node IP address : 10.22.130.49
Corosync status : Running PID <17337>
Current Designated Controller (DC) : ctx-db-2 - partition with quorum
MySQL status : Not running (only runs on database
server with current role primary.)
Connection Sync Status : WFConnection
Role (this-node/peer-node) : Secondary/Unknown
Disk Status (this-node/peer-node) : UpToDate/Unknown
--------------------------------------------------------------------------------
Note If the current HA role is primary, do not complete the rest of this procedure. You already have a working current primary database server. If the failed server needs to be replaced, proceed to the "Replacing a Database Server" section.
Step 3 Enter utils service database drbd disable-ha.
admin: utils service database drbd disable-ha
Disabling quorum requirement... [Done]
Step 4 Enter the utils service database status command to verify that the node has taken over the primary HA role.
admin: utils service database status
--------------------------------------------------------------------------------
The initial configured HA role of this node : primary
The current HA role of this node : primary
The database vip address : 10.22.130.54
Node IP address : 10.22.130.49
Corosync status : Running PID <18030>
Current Designated Controller (DC) : ctx-db-2 - partition with quorum
MySQL status : Running pid 20445
Connection Sync Status : WFConnection
Role (this-node/peer-node) : Primary/Unknown
Disk Status (this-node/peer-node) : UpToDate/Unknown
--------------------------------------------------------------------------------
You may need to wait a few minutes for the current HA role to change to "primary" and for the MySQL database to become available (MySQL status of "Running").
Step 5 If the MySQL status continues to show the value "Not running," enter the utils service database drbd keep-node command:
admin: utils service database drbd keep-node
This command will make this node as Primary
Trying to assume primary role......... [Done]
Reconnecting to MySQL......... [Done]
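If you save the status output to a text file, the checks in this procedure can be sketched in Python as follows. This is a minimal illustration, not a supported tool: the function names and the returned messages are hypothetical, and the sample text is abbreviated from the Step 2 output above.

```python
# Sketch only: helper names and returned messages are hypothetical.
SAMPLE = """\
The initial configured HA role of this node : primary
The current HA role of this node : secondary
MySQL status : Not running (only runs on database
server with current role primary.)
Role (this-node/peer-node) : Secondary/Unknown
Disk Status (this-node/peer-node) : UpToDate/Unknown
"""


def parse_status(text: str) -> dict:
    """Collect 'label : value' lines from pasted status output."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(" : ")
        if sep:  # wrapped continuation lines and rulers have no separator
            fields[label.strip()] = value.strip()
    return fields


def next_step(fields: dict) -> str:
    """Map the reported state to the step of the procedure that applies."""
    if fields.get("The current HA role of this node") != "primary":
        return "enter disable-ha (Step 3)"
    if fields.get("MySQL status", "").startswith("Running"):
        return "node is primary and MySQL is running"
    return "wait a few minutes, then consider keep-node (Step 5)"


print(next_step(parse_status(SAMPLE)))
```

The parser splits on the first " : " in each line, so values that themselves contain a hyphen or parentheses are kept intact.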
What To Do Next
Determine whether or not the other database server can be recovered.
•If the server can be recovered, proceed to the "Enabling HA After Recovering a Database Server" section.
•If the server cannot be recovered, proceed to the "Replacing a Database Server" section.
Related Topics
•Command Reference
Enabling HA After Recovering a Database Server
Before You Begin
•Complete this task only if you previously completed the procedure in the "Disabling High Availability to Enable the Current Secondary Database Server to Take Over the Primary Role" section.
•Do not complete this task for a replacement server. Instead, see the "Replacing a Database Server" section.
Caution
This procedure will temporarily interrupt MySQL service. Cisco recommends that you complete this task during a maintenance window. During the MySQL service interruption, new calls will not be able to connect to meetings, and users will not be able to schedule meetings.
Procedure
Step 1 Turn on the recovered server.
Step 2 Log in to the CLI of the current primary database server.
Step 3 Enter utils service database drbd enable-ha.
admin: utils service database drbd enable-ha
Enabling quorum requirement... [Done]
Step 4 After the reboot is complete, verify that the database servers are not in split brain mode. See the "Diagnosing Split Brain Mode" section.
Related Topics
•Command Reference
Replacing a Database Server
See the following sections:
•Preparing to Replace a Database Server
•Setting Up the Replacement Database Server
•Installing the Software on the Replacement for the Initial Secondary Database Server
•Installing the Software on the Replacement for the Initial Primary Database Server
Preparing to Replace a Database Server
Procedure
Step 1 Obtain the Cisco TelePresence Exchange System installation DVD, or download the software from the following URL and burn the disk image onto a DVD: http://www.cisco.com/go/ctx-download.
Note Verify that the software version on the installation DVD is the same as the version that is currently running on the peer server of the same role. If you want to upgrade the software, you may do so after you successfully replace the failed server.
Step 2 Find the "Installation Worksheets" that you completed when you installed the Cisco TelePresence Exchange System.
If you cannot find your completed worksheet, or if the information has become obsolete, gather the following information for the database server:
•Hostname, IP address, and subnet mask of the individual database server.
•Hostname, virtual IP (VIP) address, and subnet mask that are shared by both database servers.
•Default gateway.
•Administrator username and password—These are used to access the CLI on the server. You must use the same administrator username and password on all Cisco TelePresence Exchange System servers, because the administration servers also use the administrator credentials over SSH to get the status of all nodes in the server cluster.
•Security password—You must use the same security password that is defined on all of the other Cisco TelePresence Exchange System servers. The database server uses this password to authenticate data requests from the administration and call engine servers.
•Information for generating the locally significant certificate (LSC):
–Organization—typically your company name.
–Unit—typically your business unit and department.
–Location—typically the building, floor, and rack in which the server is installed.
–State and Country—where the server is located.
Use the following guidelines to determine each entry for generating LSCs:
–Refer to your company guidelines for format and entry requirements.
–Supported characters include alphanumeric, space, and the following special characters: .,-_:;{}()[]#.
–Each field supports up to 255 characters.
•IP addresses, hostnames, or pool names for external Network Time Protocol (NTP) clocking sources—You must configure the same NTP entries that are defined on all of the other Cisco TelePresence Exchange System servers.
Optionally, gather the following information for the integrated management module (IMM) interface, which enables remote control of the server:
•IP address and subnet mask
•Default gateway
•Username and password
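The LSC entry guidelines listed above can be checked before installation with a small Python sketch. It assumes that "alphanumeric" means ASCII letters and digits and that an empty entry is not acceptable; neither assumption is stated in this guide, and the function name is hypothetical.

```python
import re

# Allowed: ASCII letters and digits (assumption), space, and . , - _ : ; { } ( ) [ ] #
# Length: 1 to 255 characters (the lower bound of 1 is an assumption).
LSC_FIELD = re.compile(r"[A-Za-z0-9 .,\-_:;{}()\[\]#]{1,255}")


def is_valid_lsc_field(value: str) -> bool:
    """Return True if the whole value uses only supported characters."""
    return LSC_FIELD.fullmatch(value) is not None


print(is_valid_lsc_field("Bldg 4, Floor 2; Rack #12 (A)"))  # True
print(is_valid_lsc_field("R&D"))                            # False: '&' is not supported
```

Your company guidelines may impose further format requirements that this check does not cover.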
Setting Up the Replacement Database Server
Before You Begin
Complete the procedure in the "Preparing to Replace a Database Server" section.
Procedure
Step 1 Follow the hardware installation instructions for the server to properly rack mount the server.
Also see the "Recommendations for Rack Mounting the Cisco TelePresence Exchange System and Other Solution Components" section.
Step 2 Connect the power, network, and console access cables to the server.
See the "Cabling Requirements for the Database Servers" section.
Step 3 (Optional) Set up the IMM interface for remote control of the server.
See the "Setting Up the IMM" section.
Step 4 Proceed to one of the following sections, depending on the initial HA role of the database server that you are replacing:
•Installing the Software on the Replacement for the Initial Secondary Database Server
•Installing the Software on the Replacement for the Initial Primary Database Server
Step 5 Proceed to the "Verifying Data Connectivity Among the Servers" section.
Installing the Software on the Replacement for the Initial Secondary Database Server
Before You Begin
Complete the procedure in the "Setting Up the Replacement Database Server" section.
Caution
This procedure will temporarily interrupt MySQL service. Cisco recommends that you complete this task during a maintenance window. During the MySQL service interruption, new calls will not be able to connect to meetings, and users will not be able to schedule meetings.
Procedure
Step 1 Install the software on the replacement server. See the "Installing the Database Server Software" section.
Note Make sure that you enter No when the installer asks whether to configure this node as the primary database server.
Step 2 Verify that the initial configured HA role of this node is secondary.
See the "Checking the Initial High-Availability Roles of the Database Servers" section.
Step 3 Log in to the CLI of the current primary database server.
Step 4 Enter utils service database drbd enable-ha.
admin: utils service database drbd enable-ha
Enabling quorum requirement... [Done]
Note This step is required only if you disabled HA by using the utils service database drbd disable-ha command.
Step 5 Enter the utils service database status command.
If the database servers are not synchronized, see the "Split Brain Recovery" chapter.
Related Topics
•Command Reference
Installing the Software on the Replacement for the Initial Primary Database Server
Before You Begin
Complete the procedure in the "Setting Up the Replacement Database Server" section.
Caution
This procedure will temporarily interrupt MySQL service. Cisco recommends that you complete this task during a maintenance window. During the MySQL service interruption, new calls will not be able to connect to meetings, and users will not be able to schedule meetings.
Procedure
Step 1 Install the software on the replacement server. See the "Installing the Database Server Software" section.
Note Make sure that you enter Yes when the installer asks whether to configure this node as the primary database server.
Step 2 Verify that the initial configured HA role of this node is primary by entering utils service database status.
admin: utils service database status
--------------------------------------------------------------------------------
The initial configured HA role of this node : primary
The current HA role of this node : secondary
The database vip address : 10.22.163.218
Node IP address : 10.22.163.216
Corosync status : Running PID <23905>
Current Designated Controller (DC) : intersp-eng2 - partition with quorum
MySQL status : Not running (only runs on database
server with current role primary.)
Connection Sync Status : SyncTarget 2.7%
Role (this-node/peer-node) : Secondary/Primary
Disk Status (this-node/peer-node) : Inconsistent/UpToDate
--------------------------------------------------------------------------------
After the synchronization process is complete, the disk status value changes to UpToDate/UpToDate.
Step 3 Log in to the CLI of the current primary database server.
Step 4 Enter utils service database drbd enable-ha.
Note This step is required only if you disabled HA by using the utils service database drbd disable-ha command.
admin: utils service database drbd enable-ha
Enabling quorum requirement... [Done]
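The synchronization check described in Step 2 can be sketched in Python, assuming that you paste the status output into a string. The function name is hypothetical, and the sample lines are taken from the Step 2 output above.

```python
def sync_complete(status_text: str) -> bool:
    """Return True when the Disk Status line reads UpToDate/UpToDate."""
    for line in status_text.splitlines():
        label, sep, value = line.partition(" : ")
        if sep and label.strip().startswith("Disk Status"):
            return value.strip() == "UpToDate/UpToDate"
    return False  # no Disk Status line was found


in_progress = """\
Connection Sync Status : SyncTarget 2.7%
Role (this-node/peer-node) : Secondary/Primary
Disk Status (this-node/peer-node) : Inconsistent/UpToDate
"""

print(sync_complete(in_progress))  # False: this node is still synchronizing
```

The sketch treats missing output as "not complete", so an empty or truncated paste is never reported as synchronized.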
What To Do Next
Complete the following procedures:
•Checking the Network Connectivity of the Database Servers
•Verifying Data Connectivity Among the Servers
Related Topics
•Command Reference
Replacing an Administration or Call Engine Server
Procedure
Step 1 Obtain the Cisco TelePresence Exchange System installation DVD, or download the software from the following URL and burn the disk image onto a DVD: http://www.cisco.com/go/ctx-download.
Note Verify that the software version on the installation DVD is the same as the version that is currently running on the peer server of the same role. If you want to upgrade the software, you may do so after you successfully replace the failed server.
Step 2 Find the "Installation Worksheets" that you completed when you installed the Cisco TelePresence Exchange System.
If you cannot find your completed worksheet, or if the information has become obsolete, gather the following information for the server that you need to replace:
•Hostname
•IP address and subnet mask
•Default gateway
•Administrator username and password—These are used to access the CLI on the server. You must use the same administrator username and password on all Cisco TelePresence Exchange System servers, because the administration servers also use the administrator credentials over SSH to get the status of all nodes in the server cluster.
•Security password—You must use the same security password that is defined on all of the other Cisco TelePresence Exchange System servers. The database server uses this password to authenticate data requests from the administration and call engine servers.
•Information for generating the locally significant certificate (LSC):
–Organization—typically your company name.
–Unit—typically your business unit and department.
–Location—typically the building, floor, and rack in which the server is installed.
–State and Country—where the server is located.
Use the following guidelines to determine each entry for generating LSCs:
–Refer to your company guidelines for format and entry requirements.
–Supported characters include alphanumeric, space, and the following special characters: .,-_:;{}()[]#.
–Each field supports up to 255 characters.
Optionally, gather the following information for the integrated management module (IMM) interface, which enables remote control of the server:
•IP address and subnet mask
•Default gateway
•Username and password
Step 3 Follow the hardware installation instructions for the server to properly rack mount the server.
Also see the "Recommendations for Rack Mounting the Cisco TelePresence Exchange System and Other Solution Components" section.
Step 4 Connect the power, network, and console access cables to the server.
See the "Cabling Requirements for the Administration and Call Engine Servers" section.
Step 5 (Optional) Set up the IMM interface for remote control of the server.
See the "Setting Up the IMM" section.
Step 6 Install the software by using one of the following sections:
•Installing the Cisco TelePresence Exchange System Call Engine Servers
•Installing the Cisco TelePresence Exchange System Administration Servers
Step 7 Proceed to the "Verifying Data Connectivity Among the Servers" section.