WEM High Availability
Redundancy Installations
Systems that
rely on running Web Element Manager on a single server to manage
their networks face the possibility of service disruption should
the server fail. By using Oracle Clusterware or Symantec Veritas
software, it is now possible to create redundant Web Element Manager
servers with a primary host server running an active instance of
Web Element Manager, and a redundant server in standby mode. This
appendix provides information to help you successfully configure
redundant instances of Web Element Manager over multiple servers.
Use this appendix in conjunction with the Installing the WEM
Software and the WEM
Port and Hardware chapters in this guide.
IMPORTANT:
Oracle Clusterware
is supported on the Solaris operating system; however, Symantec
Veritas is supported on Solaris and RHEL. During the installation
process a radio button is provided to choose the required software.
Configuring High
Availability Redundancy Using Solaris Cluster Software
This section
describes the installation, configuration, and upgrade procedures
for High Availability on servers using the Solaris OS. You should
also refer to the Solaris documentation. In any situation where
this guide appears to conflict with the official Solaris documentation,
the Solaris documentation takes precedence.
System Requirements
Requirements for implementing
High Availability are as follows:
Web Element Manager
must be installed on a minimum of two Sun Netra™ T5220
servers equipped with the hardware described in the Server Hardware Requirements section
of this guide.
We recommend a cluster
installation restricted to two servers, configured similarly to the
arrangement shown in the diagram below. The sample configurations for Oracle
Cluster assume such an installation.
IMPORTANT:
Ensure you have installed
the latest version of Oracle Solaris software and all appropriate
software patches as described in the
Operating System Requirements
section.
IPMP is a feature supported
on Oracle Solaris. For more complete configuration information, refer
to Configuring IPMP
for WEM Server and also to the Oracle product documentation.
Oracle Solaris Cluster
is a feature provided and supported by Oracle. For more complete information
on configuring Resource Groups, refer to the Oracle Solaris Cluster
product documentation.
Installing Web
Element Manager for Failover Mode
This section
specifies the configuration changes required when installing WEM
in Failover Mode rather than Standalone Mode when following the
installation instructions in the Installing the WEM
Software chapter. For this release, please use the GUI to perform
the installation rather than the command line.
IMPORTANT:
Install and configure
Web Element Manager in Failover Mode on both servers before
configuring a cluster resource group.
The following
items are either different from, or prerequisites for, the installation
steps defined in the Installing
the WEM Software chapter:
- Create a file directory
path <ems_dir> or
use the default path: /users/ems.
- The logical hostname
and a floating IP address shared between the two nodes must be configured in /etc/hosts.
ems-service is used as the logical hostname in the examples in the
rest of this appendix.
- Create the global disk
path for a shared data directory, for example: /shareddir/ems-share.
IMPORTANT:
The
following options are not set
when installing in Failover Mode:
- WEM Service started
by default and monitored by Process Monitor. (See the WEM Process Monitor chapter
for more information on processes.)
- Start EMS on machine
start-up.
Creating and Configuring
a Cluster Resource Group
This section
explains how to create a Resource Group specifically for WEM servers
in this cluster and configure it appropriately.
IMPORTANT:
Perform this
configuration on only one server in the cluster;
it is reflected on both.
Creating a Resource
Group
The following
steps describe how to create a Resource Group.
We recommend adding the cluster
binary path to your shell environment so that you can execute
cluster commands from any directory.
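For example, assuming Oracle Solaris Cluster binaries are installed in the default /usr/cluster/bin location (verify the path on your system), you could add the following to the root user's shell profile:
export PATH=$PATH:/usr/cluster/bin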
Before clsetup can
create a network resource for any logical hostname, that hostname
and a common floating IP address associated with it must be specified
in the /etc/hosts file
on both servers. This example uses ems-service as
the logical hostname.
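As an illustration only, the floating IP address and logical hostname entry in /etc/hosts on both servers might look like the following; the address shown is a placeholder, not a value from your network:
# floating (shared) IP address and logical hostname
10.1.1.12    ems-service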
-
Login as root and run clsetup to
open the Main Menu.
-
From the Main Menu
select Option 2: Resource
Groups.
-
From the Resource
Groups Menu, select Option 1: Create a Resource Group.
A resource group
is a container into which you can place resources of various types,
such as network and data service resources, and then manage them.
Only failover resource groups can contain network resources. Network
resources include the logical hostname.
-
When prompted to create
a failover group, enter yes and select
Option 1: Create
a Failover Group. For this example, call the group ems-rg.
-
When you are prompted
to select a preferred server enter yes and enter
the name of the Preferred server; for this example use Node1. Enter yes to continue
the update.
The screen will display
the following message:
clresourcegroup create
-n <Node-1 Node Name> <Node-2 Node Name> ems-rg
Command completed successfully.
With the Resource
Group created successfully, you can move on to the next step and
add the logical hostname.
Adding a Logical
Hostname to a Failover Resource Group
Follow steps
1 - 5 to add a logical hostname.
-
After the confirmation
screen from the last task displays, press Enter to
continue. Enter yes when
prompted to add network resources.
-
From the Network Resources
Menu, select Option 1: Add
a Logical Hostname.
If a failover resource
group contains logical hostname resources, the most common configuration is
to have one logical hostname resource for each subnet. Enter 1 to create
a single resource.
-
When prompted for
a logical hostname, enter the logical hostname configured in /etc/hosts
for the floating IP address. For this example, use ems-service.
-
Press Enter to
continue. The screen displays:
clreslogicalhostname create -g ems-rg -p R_description="LogicalHostname resource for ems-service" ems-service
-
Enter no when prompted
to add any additional network resources.
Adding a Data Service
Resource
Follow steps
1 - 4 to add a data service.
-
After the logical
hostname confirmation screen, enter yes when
prompted to begin adding data services.
-
From the Data Services
Menu select Option 1: EMSSCFO
Server for Sun Cluster, and use the name ems-dsr for this
example.
The screen displays
the following message:
This data service uses
the "Port_list" property. The default "Port_list"
for this data service is as follows: <NULL>
-
Enter no when prompted
to override the default.
-
Enter no when prompted
to add more properties, then enter yes to continue.
The screen displays
the following message:
Commands completed
successfully
Bringing the Resource
Group Online
Follow steps
1 - 2 to bring the Resource Group online.
-
After the completion
confirmation screen, press Enter to
continue. Enter no when
prompted to add any additional data service resources. Enter yes when
prompted to manage and bring this resource group online.
The screen displays
the following message:
clresourcegroup online
-M ems-rg
Commands completed
successfully
-
Press Enter to
continue, then select Option q to Quit and
return to the Main Menu.
The process is now
complete. At this point you can enter the scstat command
to display the current online/offline status if required.
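For example, a sketch of checking only the resource group status (confirm the option against your Solaris Cluster version):
scstat -g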
Upgrading Web Element
Manager in a Clustered Environment
This section
describes the process for upgrading Web Element Manager in a two-server
cluster.
IMPORTANT:
Network administrators
should have any connected clients log out before beginning
the upgrade. If clients cannot reconnect after the upgrade,
refer to the Troubleshooting appendix
for information on any Java-related errors.
Prerequisite Steps
for the Upgrade Process
For the example
configuration that follows you should confirm the following:
- The same version of
Web Element Manager software (12.0 or newer) has been installed
on both servers and configured in Failover Mode.
- Config files and scripts
are identical.
- Devices can be failed
over with no loss in connectivity. This can be confirmed either
by a software switchover, or by running the scstat command
to confirm the current node status.
- Resources have been configured in the Oracle Cluster software.
The following example names are used here:
- Two Cluster Nodes: N-1 (initially this is the active
node) and N-2 (initially this is the redundant node).
- A Resource Group ems-rg managed by Web Element Manager
has been created.
- A logical hostname ems-service and floating (shared)
IP address has been configured on both servers.
Removing an Inactive
Node from the Resource Group
Complete the
following steps to remove N-2 from
the Resource Group. Since the cluster resource group configuration
is the same for both nodes, the cluster-related commands can be
run on either node.
-
Run the scstat command. scstat is
used to verify the current status of the cluster resource group
and to confirm that the servers will switch correctly on
switchover/failover. The following screen display reflects a properly
configured cluster:
Two cluster nodes: Online
Two cluster transport paths: Online
Quorum votes by node: Online
Quorum votes by device: Online
Resource Groups and Resources: ems-rg, ems-service, ems-dsr
ems-rg group: N-1: Online, N-2: Offline
IPMP groups: Online
IMPORTANT:
N-2 must not be allowed
to run any WEM processes. This prevents the secondary node from
taking ownership of resources. Removing it from the Resource Group
prevents a failover from happening, so N-1 continues to
behave like a standalone WEM server, ensuring a successful upgrade.
To do this:
-
Enter clsetup to
open the Main Menu and select Option 2: Resource Groups Menu.
-
From the Resource
Groups menu, select Option 8: Change the Properties
of a Resource Group.
-
Enter yes when
prompted to continue.
-
Select the group to
be changed by selecting Option 1: ems-rg.
-
Select Option 1: Change the Nodelist
Resource Group Properties
-
Enter yes when
prompted to continue. Both N-1 and N-2 should now
appear in the nodelist.
-
Select Option 2: Remove a Node from
the Nodelist, then select Option 1 to remove N-2.
The nodelist now
contains only N-1.
Enter yes when
prompted to update the nodelist property.
If your update was
successful you will receive on-screen confirmation.
Press Enter to
continue. You will receive confirmation that only N-1 remains in
the nodelist. Select Option q to Quit and
exit back to the Resource Group Menu.
-
From the Resource
Group Menu select Option s: Show Current Status to
confirm the current network resources (if confirmation is required).
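If you prefer the command line to clsetup, a sketch of the equivalent operation, assuming standard clresourcegroup syntax (verify against the Oracle Solaris Cluster documentation), is:
clresourcegroup remove-node -n N-2 ems-rg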
Upgrading WEM on
the Inactive Server
Complete the
following steps to upgrade WEM on the inactive server, N-2.
-
Start the WEM installation on N-2 as described in the
Installing the WEM Software chapter. Web Element Manager
begins installing along with Apache Server, PostgreSQL Server, and
EMS Server. Installation continues until the following warning message
appears:
Updating PostgreSQL
config file...This is an upgrade in Cluster mode; not updating postgres
config.
This message is normal
because the database is to be updated from the active server, N-1.
-
With the installation
complete, select Option 3 to finish.
Updating the Databases
Complete the
following steps on node N-1 to
update the databases.
-
Copy the sqlfiles.tar
file from the N-2 installation to a folder on N-1 and untar the
file. This process is described fully in the Installing the WEM Software chapter,
and a command sketch follows this procedure.
This will create a
folder called sqlfiles.
-
Go to the sqlfiles folder
and run dbClusterUpgrade.sh.
-
At the prompt, enter
the EMS directory name and press Enter.
-
Enter a complete directory
path for saving the new SQL files and press Enter.
-
Enter the postgres
administrator name assigned during the installation process. This
is postgres by default.
-
Press Enter.
-
Enter the database
port number and press Enter.
IMPORTANT:
Currently, this is port
5432, but this may change in a later release.
-
The databases update
and the screen displays the following message:
Database schema upgraded
successfully...
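As referenced in step 1, the following is a sketch of copying and extracting the SQL files; the paths shown are placeholders and should be replaced with your actual <ems_dir> locations:
# run on N-1; the sqlfiles.tar location on N-2 depends on your installation
mkdir -p /tmp/wem-upgrade
scp N-2:/users/ems/sqlfiles.tar /tmp/wem-upgrade/
cd /tmp/wem-upgrade
tar -xvf sqlfiles.tar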
Returning the Inactive
Node to the Resource Group
Complete the
following steps to return N-2 to
the Resource Group so that it can take over resource ownership,
allowing the software on N-1 to be upgraded.
-
Log in as root and run clsetup to
open the Main Menu, then select Option 2: Resource Groups.
-
From the Resource
Groups Menu select Option 8: Change the Properties
of a Resource Group.
-
Enter yes when
prompted to continue.
-
From the next screen
select the Resource Group name.
-
When prompted for
the property to change, select Option 1: Nodelist.
-
Enter yes to continue
and open the next screen.
-
Select Option 1: Add a Node/Zone
to the Top of the Nodelist.
-
Select Option 1: N-2.
-
Enter yes when
prompted to update the nodelist property.
The screen will display
the following message:
Command completed successfully.
-
Press Enter to
continue and select Option q to Quit and
return to the Resource Groups Menu.
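As with removing a node, a command-line sketch of this operation, assuming standard clresourcegroup syntax, is:
clresourcegroup add-node -n N-2 ems-rg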
Switching Active
Servers
Complete the
following steps to make N-2 the
active node so N-1 can
be updated.
-
From the Resource
Group Menu Select Option 5: Switch over a Resource.
-
Scroll through the
on-screen description and enter yes to continue.
-
Select the name of
the resource group. In this example it would be ems-rg.
-
Select Option 1: Switch Group Ownership.
-
Select the node to
take ownership of ems-rg, which
would be N-2. Enter yes to confirm.
The screen will display the following message:
Command completed successfully.
-
Press Enter to
continue and select Option q to Quit and
return to the Resource Group Menu.
-
From the Resource
Group Menu select Option s: Show Current Status.
This shows that N-2 is now
online and N-1 is
offline.
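A command-line sketch of the same switchover, assuming standard clresourcegroup syntax, is:
clresourcegroup switch -n N-2 ems-rg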
At this point return
to Removing an Inactive
Node from the Resource Group and begin the update process
for N-1.
IMPORTANT:
Since the database
schema was previously updated and both N-1 and N-2 share the same
database, it is not necessary
to run the SQL scripts again for N-1.
High Availability
Mode Using Symantec Veritas Cluster Software (VCS)
This section
provides instructions specific to a Symantec VCS installation that
provides redundancy across multiple WEM servers. This software is documented
by Symantec, and you will also need to refer to the Install, the Uninstall, and
the Upgrade chapters in this guide. Server hardware requirements
are in the WEM Port and
Hardware Information chapter.
IMPORTANT:
Veritas Cluster is supported
on both Sun servers using the Solaris Operating System and Cisco
UCS servers using the RHEL OS. The VCS installation in this section
is directed at installations on the RHEL platform. The VCS installation
has much in common with the Solaris installation in the previous
section; however, IPMP is proprietary software and is supported only
on the Solaris OS. A radio button on the installation screen allows
the choice between a Solaris or a RHEL installation; for this reason,
please use the GUI to perform the installation rather than the command
line.
IMPORTANT:
There are configuration
changes required when installing WEM in Failover Mode rather than
Standalone Mode. These are described in the Installing the WEM Software chapter.
For this release, please use the GUI to perform the installation
rather than the command line.
Installation
Refer to the relevant
documentation to install the appropriate operating system on the
servers.
Refer to the VCS documentation
for the following steps:
- Install the Storage
Foundation and Cluster Server software.
- Configure the Diskgroup,
Volume, and Mount Resource Groups for creating and mounting the shared disk.
There is an example of a valid main.cf configuration
below. These resources need to be online when installing the WEM
application on each node. WEM will be part of the 'Application resource'
and its status is monitored with the PID file of psmon (Monitor
server). If the WEM application resource fails, VCS will first
try to restart WEM on the same node before switching over to the standby
node.
- Mount the shared disk
on the first cluster node.
- Start installing the WEM application on the first cluster node
using the instructions in the Installing
the WEM Software section in this guide. Make certain that
the WEM application does not start after the installation is complete.
- Unmount the shared disk
from the first cluster node and install it on the second node.
- Start the WEM installation
on the second cluster node. Make sure that you provide the same parameters
during installation that were used for the first installation.
- Refer to the VCS documentation
and start configuring resource groups and resources. A sample of
the Main.cf file
follows:
Main.cf File Configuration
Example
The following
is an example of the main.cf file for resource-groups and resources.
group wemFailover (
    SystemList = { pnstextappsucs1 = 0, pnstextappsucs3 = 1 }
    AutoStartList = { pnstextappsucs3 }
    )

Application wemService (
    StartProgram = "/users/ems/postgres//bin/emsctl start"
    StopProgram = "/users/ems/postgres//bin/emsctl forcestop"
    PidFiles = { "/users/ems/server/psmon.pid" }
    RestartLimit = 1
    )

DiskGroup wemDG (
    DiskGroup = wemdg
    )

IP wemIP (
    Device = eth0
    Address = "10.4.83.151"
    NetMask = "255.255.255.0"
    )

Mount wemMount (
    MountPoint = "/apps/wem/"
    BlockDevice = "/dev/vx/dsk/wemdg/wemvol"
    FSType = vxfs
    FsckOpt = "-y"
    )

NIC wemNIC (
    Device = eth0
    )

Volume wemVolume (
    DiskGroup = wemdg
    Volume = wemvol
    )
- wemIP requires
wemNIC
- wemMount requires wemVolume
- wemService requires wemIP
- wemService requires wemMount
- wemVolume requires wemDG
These dependencies correspond to the following resource dependency tree in the main.cf file:
// resource dependency tree
//
// group wemFailover
// {
// Application wemService
//     {
//     IP wemIP
//         {
//         NIC wemNIC
//         }
//     Mount wemMount
//         {
//         Volume wemVolume
//             {
//             DiskGroup wemDG
//             }
//         }
//     }
// }
Upgrading WEM with
VCS
The process for upgrading
WEM installed in HA mode with VCS is similar to the Sun Cluster upgrade
process described earlier.
With VCS, to disable
the resource on the standby node, set the resource group's
Disable attribute for the standby node (system). This ensures that
the resource group does not fail over during the upgrade and
cause data corruption.
- Use the following command
to disable a cluster resource group:
$ hagrp -disable <resource-group name> -sys <node2>
- Use the following command
to switch the resource group from one node to another:
$ hagrp -switch <resource-group name> -to <system>
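Once the upgrade on the standby node is complete, re-enable the resource group for that node; a sketch, assuming the same hagrp syntax as above, is:
$ hagrp -enable <resource-group name> -sys <node2>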
Uninstalling WEM
with VCS
Use the following steps
to uninstall WEM in redundant mode.
- Disable the resource
on the standby node. This makes sure
that the resource group does not fail over during the uninstall process
and cause data corruption.
$ hagrp -disable <resource group name> -sys <node2>
- Set the 'Critical' attribute
of the wem-service resource to '0':
$ hares -modify <resource-name> Critical 0
- Offline the WEM Application
service resource on the active node:
$ hares -offline <wem application resource name> -sys <node1>
$ hagrp -disable <resource group name> -sys <node1>
- Uninstall the WEM application
from the current active node <node1>.
- Enable the resource
on the standby node and disable it on the current active node:
$ hagrp -enable <resource group name> -sys <node2>
$ hagrp -disable <resource group name> -sys <node1>
- Switch over the resource
group to the Standby Node <node2>.
- Uninstall WEM from
the Standby Node <node2>.
- Disable/Offline
the resource groups from the Standby Node <node2>.
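A sketch of this final step, reusing the commands shown earlier with <node2> substituted:
$ hares -offline <wem application resource name> -sys <node2>
$ hagrp -disable <resource group name> -sys <node2>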