WEM High Availability Redundancy Installations

Systems that rely on a single server running Web Element Manager to manage their networks face the possibility of service disruption should that server fail. By using Oracle Clusterware or Symantec Veritas software, you can create redundant Web Element Manager servers, with a primary host server running an active instance of Web Element Manager and a redundant server in standby mode. This appendix provides information to help you successfully configure redundant instances of Web Element Manager across multiple servers. Use this appendix together with the Installing the WEM Software and WEM Port and Hardware Information chapters in this guide.

IMPORTANT:

Oracle Clusterware is supported on the Solaris operating system; however, Symantec Veritas is supported on Solaris and RHEL. During the installation process a radio button is provided to choose the required software.

Configuring High Availability Redundancy Using Solaris Cluster Software

This section describes the installation, configuration, and upgrade procedures for High Availability on servers running the Solaris OS. You should also refer to the Solaris documentation. In any situation where this guide appears to conflict with the official Solaris documentation, the Solaris documentation takes precedence.

System Requirements

Requirements for implementing High Availability are as follows:

Web Element Manager must be installed on a minimum of two Sun Netra™ T5220 servers equipped with the hardware described in the Server Hardware Requirements section of this guide.

We recommend a cluster installation restricted to two servers, configured similarly to that shown in the diagram below. The sample configurations for Oracle Cluster assume such an installation.

IMPORTANT:

Ensure you have installed the latest version of Oracle Solaris software and all appropriate software patches as described in the Operating System Requirements section.

IPMP is a feature supported on Oracle Solaris. For more complete configuration information, refer to Configuring IPMP for WEM Server and also to the Oracle product documentation.

Oracle Solaris Cluster is a feature provided and supported by Oracle. For more complete information on configuring Resource Groups, refer to the Oracle Solaris Cluster product documentation.

Installing Web Element Manager for Failover Mode

This section specifies the configuration changes required when installing WEM in Failover Mode rather than Standalone Mode when following the installation instructions in the Installing the WEM Software chapter. For this release, please use the GUI to perform the installation rather than the command line.

IMPORTANT:

Install and configure Web Element Manager in Failover Mode on both servers before configuring a cluster resource group.

The following items are either different from, or prerequisites for, the installation steps defined in the Installing the WEM Software chapter:

  • Create a directory path <ems_dir>, or use the default path: /users/ems.
  • The logical hostname and a floating IP address shared between the two nodes must be configured in /etc/hosts. ems-service is used as the logical hostname in the examples in the rest of this appendix; example /etc/hosts entries appear after this list.
  • Create the global disk path for a shared data directory, for example: /shareddir/ems-share.
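
For example, assuming a floating IP address of 192.0.2.10 (the address shown is illustrative only), the /etc/hosts file on both nodes would contain an entry similar to the following:

    # Floating IP address shared by both cluster nodes (address is illustrative)
    192.0.2.10    ems-service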

IMPORTANT:

The following options are not set when installing in Failover Mode:

  • WEM Service started by default and monitored by Process Monitor. (See the WEM Process Monitor chapter for more information on processes.)
  • Start EMS on machine start-up.

Creating and Configuring a Cluster Resource Group

This section explains how to create a Resource Group specifically for WEM servers in this cluster and configure it appropriately.

IMPORTANT:

This process is configured on only one server in the cluster. It is reflected on both.

Creating a Resource Group

The following steps describe how to create a Resource Group.

We recommend setting the cluster binary path in the shell environment so that you can execute the cluster commands from any directory.
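
For example, assuming the default Oracle Solaris Cluster binary location of /usr/cluster/bin, you could add the following to the root user's shell profile:

    # Add the Solaris Cluster command directory to the search path
    PATH=$PATH:/usr/cluster/bin
    export PATH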

Before clsetup can create a network resource for any logical hostname, that hostname and a common floating IP address associated with it must be specified in the /etc/hosts file on both servers. This example uses ems-service as the logical hostname.

  1. Log in as root and run clsetup to open the Main Menu.
  2. From the Main Menu select Option 2: Resource Groups.
  3. From the Resource Groups Menu, select Option 1: Create a Resource Group. A resource group is a container into which you can place resources of various types, such as network and data service resources, and manage them as a unit. Only failover resource groups can contain network resources, such as a logical hostname.
  4. When prompted to create a failover group, enter yes and select Option 1: Create a Failover Group. For this example, call the group ems-rg.
  5. When you are prompted to select a preferred server, enter yes and enter the name of the preferred server; for this example, use Node1. Enter yes to continue the update. The screen will display the following message:
    clresourcegroup create -n <Node-1 Node Name> <Node-2 Node Name> ems-rg

    Command completed successfully.
    
    With the Resource Group created successfully, you can move on to the next step and add the logical hostname.

Adding a Logical Hostname to a Failover Resource Group

Follow steps 1 - 5 to add a logical hostname.

  1. After the confirmation screen from the last task displays, press Enter to continue. Enter yes when prompted to add network resources.
  2. From the Network Resources Menu, select Option 1: Add a Logical Hostname. If a failover resource group contains logical hostname resources, the most common configuration is to have one logical hostname resource for each subnet. Enter 1 to create a single resource.
  3. When prompted for a logical hostname, enter the logical hostname configured in /etc/hosts for the floating IP address. For this example, use ems-service.
  4. Press Enter to continue. The screen displays:
    clreslogicalhostname create -g ems-rg -p R_description="LogicalHostname resource for ems-service" ems-service
    
  5. Enter no when prompted to add any additional network resources.

Adding a Data Service Resource

Follow steps 1 - 4 to add a data service.

  1. After the logical hostname confirmation screen, enter yes when prompted to begin adding data services.
  2. From the Data Services Menu select Option 1: EMSSCFO Server for Sun Cluster, and use the name ems-dsr for this example. The screen displays the following message:
    This data service uses the "Port_list" property. The default "Port_list" for this data service is as follows: <NULL>
    
  3. Enter no when prompted to override the default.
  4. Enter no when prompted to add more properties, then enter yes to continue. The screen displays the following message:
    Commands completed successfully
    

Bringing the Resource Group Online

Follow steps 1 - 2 to bring the Resource Group online.

  1. After the completion confirmation screen, press Enter to continue. Enter no when prompted to add any additional data service resources. Enter yes when prompted to manage and bring this resource group online. The screen displays the following message:
    clresourcegroup online -M ems-rg

    Commands completed successfully
    
  2. Press Enter to continue, then select Option q to Quit and return to the Main Menu. The process is now complete. At this point you can enter the scstat command to display the current online/offline status if required, as shown in the sketch below.
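
To confirm the result from the command line, the following sketch shows status commands for the names used in this example; the exact output format depends on your cluster software version:

    # Verify that the resource group, its resources, and the nodes are online
    clresourcegroup status ems-rg
    clresource status -g ems-rg
    scstat -g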

Upgrading Web Element Manager in a Clustered Environment

This section describes the process for upgrading Web Element Manager in a two-server cluster.

IMPORTANT:

Network administrators should have any connected clients log out before beginning the upgrade. If clients cannot reconnect after the upgrade, refer to the Troubleshooting appendix for information on any Java-related errors.

Prerequisite Steps for the Upgrade Process

For the example configuration that follows, confirm the following:

  • The same version of Web Element Manager software (12.0 or newer) has been installed on both servers and configured in Failover Mode.
  • Config files and scripts are identical.
  • Devices can be failed over with no loss in connectivity. This can be confirmed either by a software switchover, as shown in the sketch after this list, or by running the scstat command to confirm the current node status.
  • Resources have been configured in the Oracle Cluster software. The following example names are used here:
  1. Two Cluster Nodes: N-1 (initially this is the active node) and N-2 (initially this is the redundant node).
  2. A Resource Group ems-rg managed by Web Element Manager has been created.
  3. A logical hostname ems-service and a floating (shared) IP address have been configured on both servers.
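
The following sketch shows one way to confirm the switchover behavior from the command line, using this example's node and group names:

    # Switch ems-rg to the redundant node, check status, then switch it back
    clresourcegroup switch -n N-2 ems-rg
    scstat -g
    clresourcegroup switch -n N-1 ems-rg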

Removing an Inactive Node from the Resource Group

Complete the following steps to remove N-2 from the Resource Group. Since the cluster resource group configuration is the same for both nodes, the cluster-related commands can be run on either node. An equivalent single-command alternative to the clsetup nodelist change appears after these steps.

  1. Run the scstat command. scstat verifies the current status of the cluster resource group and ensures that the servers will switch correctly on switchover/failover. The following screen display reflects a properly configured cluster:
    Two cluster nodes: Online

    Two cluster transport paths: Online

    Quorum votes by node: Online

    Quorum votes by device: Online

    Resource Groups and Resources: ems-rg, ems-service, ems-dsr

    ems-rg group: N-1: Online  N-2: Offline

    IPMP groups: Online
    

    IMPORTANT:

    N-2 must not be allowed to run any WEM processes. This prevents the secondary node from taking ownership of resources. Removing it from the Resource Group prevents a failover from occurring, and N-1 continues to behave like a standalone WEM server, ensuring a successful upgrade. To remove N-2:

  2. Enter clsetup to open the Main Menu and select Option 2: Resource Groups Menu.
  3. From the Resource Groups menu, select Option 8: Change the Properties of a Resource Group.
  4. Enter yes when prompted to continue.
  5. Select the group to be changed by selecting Option 1: ems-rg.
  6. Select Option 1: Change the Nodelist Resource Group Properties.
  7. Enter yes when prompted to continue. Both N-1 and N-2 should now appear in the nodelist.
  8. Select Option 2: Remove a Node from the Nodelist, then select Option 1 to remove N-2. The nodelist now contains only N-1. Enter yes when prompted to update the nodelist property. If your update was successful you will receive on-screen confirmation. Press Enter to continue. You will receive confirmation that only N-1 remains in the nodelist. Select Option q to Quit and exit back to the Resource Group Menu.
  9. From the Resource Group Menu select Option s: Show Current Status to confirm the current network resources (if confirmation is required).
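
If you prefer the command line, the nodelist change made in steps 2 through 8 corresponds to a single command; a sketch using this example's names:

    # Remove the standby node N-2 from the ems-rg nodelist
    clresourcegroup remove-node -n N-2 ems-rg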

Upgrading WEM on the Inactive Server

Complete the following steps to upgrade WEM on the inactive server, N-2.

  1. Start the WEM installation on N-2 as described in the Installing the WEM Software chapter. Web Element Manager begins installing along with the Apache Server, PostgreSQL Server, and EMS Server. Installation continues until the following warning message appears:
    Updating PostgreSQL config file...This is an upgrade in Cluster mode; not updating postgres config.
    
    This message is normal because the database is to be updated from the active server, N-1.
  2. With the installation complete, select Option 3 to finish.

Updating the Databases

Complete the following steps on node N-1 to update the databases. A shell sketch of the procedure follows these steps.

  1. Copy the sqlfiles.tar file from the N-2 installation to a folder on N-1 and untar the file. This process is described fully in the Installing the WEM Software chapter. This will create a folder called sqlfiles.
  2. Go to the sqlfiles folder and run dbClusterUpgrade.sh.
  3. At the prompt, enter the EMS directory name and press Enter.
  4. Enter a complete directory path for saving the new SQL files and press Enter.
  5. Enter the postgres administrator name assigned during the installation process. This is postgres by default.
  6. Press Enter.
  7. Enter the database port number and press Enter.

    IMPORTANT:

    Currently, this is port 5432, but this may change in a later release.

  8. The databases update and the screen displays the following message:
    Database schema upgraded successfully...
    
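
The following shell sketch summarizes steps 1 and 2; the source path on N-2 and the working directory on N-1 are placeholders and depend on your installation:

    # Copy the SQL archive from N-2, extract it, and run the upgrade script
    scp N-2:<path-to-sqlfiles.tar> /tmp/wem-upgrade/
    cd /tmp/wem-upgrade
    tar -xf sqlfiles.tar
    cd sqlfiles
    sh dbClusterUpgrade.sh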

Returning the Inactive Node to the Resource Group

Complete the following steps to return N-2 to the Resource Group so that it can take over resource ownership, allowing the software on N-1 to be upgraded. An equivalent single-command alternative to the clsetup nodelist change appears after these steps.

  1. Log in as root, run clsetup to open the Main Menu, and select Option 2: Resource Groups.
  2. From the Resource Groups Menu select Option 8: Change the Properties of a Resource Group.
  3. Enter yes when prompted to continue.
  4. From the next screen select the Resource Group name.
  5. When prompted for the property to change, select Option 1: Nodelist.
  6. Enter yes to continue and open the next screen.
  7. Select Option 1: Add a Node/Zone to the Top of the Nodelist.
  8. Select Option 1: N-2.
  9. Enter yes when prompted to update the nodelist property. The screen will display the following message:
    Command completed successfully.
    
  10. Press Enter to continue and select Option q to Quit and return to the Resource Groups Menu.
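
As with removing a node, the nodelist change made in steps 2 through 9 corresponds to a single command; a sketch using this example's names:

    # Add N-2 back to the ems-rg nodelist
    clresourcegroup add-node -n N-2 ems-rg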

Switching Active Servers

Complete the following steps to make N-2 the active node so N-1 can be updated.

  1. From the Resource Group Menu, select Option 5: Switch over a Resource.
  2. Scroll through the on-screen description and enter yes to continue.
  3. Select the name of the resource group. In this example it would be ems-rg.
  4. Select Option 1: Switch Group Ownership.
  5. Select the node to take ownership of ems-rg, which would be N-2. Enter yes to confirm. The screen will display the following message:
    Command completed successfully.
    
  6. Press Enter to continue and select Option q to Quit and return to the Resource Group Menu.
  7. From the Resource Group Menu select Option s: Show Current Status. This shows that N-2 is now online and N-1 is offline. At this point, return to Removing an Inactive Node from the Resource Group and begin the update process for N-1.

    IMPORTANT:

    Since the database schema was previously updated and both N-1 and N-2 share the same database, it is not necessary to run the SQL scripts again for N-1.

High Availability Mode Using Symantec Veritas Cluster Software (VCS)

This section provides instructions specific to a Symantec VCS installation to provide redundancy across multiple WEM servers. This software is documented by Symantec, and you will also need to refer to the Install, Uninstall, and Upgrade chapters in this guide. Server hardware requirements are in the WEM Port and Hardware Information chapter.

IMPORTANT:

Veritas Cluster is supported on both Sun servers using the Solaris operating system and Cisco UCS servers using the RHEL OS. The VCS installation in this section is directed at installations on the RHEL platform. The VCS installation itself has much in common with the Solaris installation in the previous section; however, IPMP is proprietary software and is supported only on the Solaris OS. A radio button on the installation screen allows the choice between a Solaris or a RHEL installation; for this reason, please use the GUI to perform the installation rather than the command line.

IMPORTANT:

There are configuration changes required when installing WEM in Failover Mode rather than Standalone Mode. These are described in the Installing the WEM Software chapter. For this release, please use the GUI to perform the installation rather than the command line.

Installation

Refer to the relevant documentation to install the appropriate operating system on the servers.

Refer to the VCS documentation for the following steps:

  1. Install the Storage Foundation and Cluster Server software.
  2. Configure the DiskGroup, Volume, and Mount resources for creating and mounting the shared disk. An example of a valid main.cf configuration appears below. These resources need to be online when installing the WEM application on each node. WEM is part of the Application resource, and its status is monitored through the PID file of psmon (the Monitor server). If the WEM application resource fails, VCS first tries to restart WEM on the same node before switching over to the standby node.
  3. Mount the shared disk on the first cluster node.
  4. Start installing the WEM application on the first cluster node using the instructions in the Installing the WEM Software section in this guide. Make certain that the WEM application does not start after the installation is complete.
  5. Unmount the shared disk from the first cluster node and mount it on the second node.
  6. Start the WEM installation on the second cluster node. Make sure that you provide the same parameters during installation that were used for the first installation.
  7. Refer to the VCS documentation and start configuring resource groups and resources. A sample of the Main.cf file follows:

Main.cf File Configuration Example

The following is an example of the main.cf file for resource-groups and resources.

group wemFailover (
        SystemList = { pnstextappsucs1 = 0, pnstextappsucs3 = 1 }
        AutoStartList = { pnstextappsucs3 }
        )

Application wemService (
        StartProgram = "/users/ems/postgres//bin/emsctl start"
        StopProgram = "/users/ems/postgres//bin/emsctl forcestop"
        PidFiles = { "/users/ems/server/psmon.pid" }
        RestartLimit = 1
        )

DiskGroup wemDG (
        DiskGroup = wemdg
        )

IP wemIP (
        Device = eth0
        Address = "10.4.83.151"
        NetMask = "255.255.255.0"
        )

Mount wemMount (
        MountPoint = "/apps/wem/"
        BlockDevice = "/dev/vx/dsk/wemdg/wemvol"
        FSType = vxfs
        FsckOpt = "-y"
        )

NIC wemNIC (
        Device = eth0
        )

Volume wemVolume (
        DiskGroup = wemdg
        Volume = wemvol
        )
  • wemIP requires wemNIC
  • wemMount requires wemVolume
  • wemService requires wemIP
  • wemService requires wemMount
  • wemVolume requires wemDG

These dependencies are represented in main.cf by the following resource dependency tree:

// resource dependency tree
//
// group wemFailover
// {
//     Application wemService
//     {
//         IP wemIP
//         {
//             NIC wemNIC
//         }
//         Mount wemMount
//         {
//             Volume wemVolume
//             {
//                 DiskGroup wemDG
//             }
//         }
//     }
// }
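
If you edit main.cf by hand, it is good practice to verify the syntax before restarting VCS; a sketch, assuming the default VCS configuration directory:

    # Verify the main.cf syntax (path shown is the VCS default)
    hacf -verify /etc/VRTSvcs/conf/config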

Upgrading WEM with VCS

The process for upgrading WEM installed in HA mode with VCS is similar to the Sun Cluster upgrade process described earlier.

With VCS, in order to disable the resource group on the standby node, you set the resource group's Disable attribute for the standby node (system). This ensures that the resource group does not fail over during the upgrade and cause data corruption. A sketch of a complete upgrade sequence follows the commands below.

  1. Use the following command to disable a cluster resource group: $ hagrp -disable <resource-group name> -sys <node2>
  2. Use the following command to switch the resource group from one node to another: $ hagrp -switch <resource-group name> -to <system>
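
Putting these commands together, a possible end-to-end sequence that mirrors the Sun Cluster upgrade flow is sketched below; the group and node names are placeholders for the values defined in your main.cf:

    # Protect the standby node, upgrade it, then repeat for the other node
    hagrp -disable <resource-group name> -sys <node2>
    # ... upgrade WEM on <node2> ...
    hagrp -enable <resource-group name> -sys <node2>
    hagrp -switch <resource-group name> -to <node2>
    hagrp -disable <resource-group name> -sys <node1>
    # ... upgrade WEM on <node1> ...
    hagrp -enable <resource-group name> -sys <node1>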

Uninstalling WEM with VCS

Use the following steps to uninstall WEM in redundant mode.

  1. Disable the resource group on the standby node. This ensures that the resource group does not fail over during the uninstall process and cause data corruption: $ hagrp -disable <resource group name> -sys <node2>
  2. Set the 'Critical' attribute of the wem-service resource to '0': $ hares -modify <resource-name> Critical 0
  3. Offline the WEM Application service resource on the active node: $ hares -offline <wem application resource name> -sys <node1> and $ hagrp -disable <resource group name> -sys <node1>
  4. Uninstall the WEM application from the current active node <node1>.
  5. Enable the resource group on the standby node and disable it on the current active node: $ hagrp -enable <resource group name> -sys <node2> and $ hagrp -disable <resource group name> -sys <node1>
  6. Switch over the resource group to Standby Node <node2>.
  7. Uninstall WEM from the Standby Node <node2>.
  8. Disable/Offline the resource group from the Standby Node <node2>, as shown in the sketch below.
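
Step 8 corresponds to the following commands; the group name is a placeholder for the value defined in your main.cf:

    # Take the resource group offline on the standby node and disable it there
    hagrp -offline <resource group name> -sys <node2>
    hagrp -disable <resource group name> -sys <node2>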