Cisco Application Velocity System User Guide (Software Version 5.0)
Availability Manager Clustering

Availability Manager Clustering


This chapter describes how to configure and use the Availability Manager (AM), which provides a built-in high availability and load balancing capability for a cluster of application appliances. It includes the following topics:

Terminology

Overview

Configuring and Activating AM

SSL Session Persistence

FAQ

Limitations


Note: The Availability Manager operates on the application appliance, not on the AVS 3180 Management Station.


Terminology

Cluster is a widely used term for independent computers combined into a unified system through software and networking. Cluster computing has three important branches:

High-availability (HA) clustering uses multiple machines to add an extra level of reliability for a service or group of services.

Load-balance clustering uses specialized routing techniques to dispatch traffic to a pool of servers.

Computation clustering (such as Beowulf) uses multiple machines to provide greater computing power for computationally intensive tasks. This type of clustering is not addressed by Availability Manager.

Availability Manager is a clustering solution that is based on the first two types of clustering technology.

Overview

Availability Manager (AM) provides a built-in high availability and load balancing capability for a cluster of application appliances. No additional load balancing hardware is required.

Figure 10-1 shows a typical use of the Availability Manager solution. The AM components are enabled on appliance-1 and appliance-2. At any given time, only one AM component is active, and the other is in standby mode. The virtual IP (VIP) is always hosted on the active AM. In this example, appliance-1 is the active AM, appliance-2 is the standby AM, and appliance-3 is a pure Performance Node. When the client issues an HTTP request, the DNS server resolves the hostname to the virtual IP that is hosted on appliance-1.

Figure 10-1 Typical AM Load Balancing Scenario

The AM manages a pool of Performance Nodes that are the actual nodes that handle the application requests. In the example in Figure 10-1, the Performance Node pool contains Performance node1, Performance node2, and Performance node3.

The AM regularly checks the availability of each Performance Node in the pool and does not direct new requests to a failed Performance Node. The active and standby AMs monitor each other with a heartbeat checking mechanism. If the active AM is found to be down, the standby AM takes over the VIP and becomes the active AM. It is crucial that both AMs share exactly the same configuration for successful failover.

Table 10-1 summarizes the common load-balancing methods. Currently, the AM supports only Weighted Least-Connections (WLC).

Table 10-1 Load-balancing Methods 

Method
Description

Round robin

Distributes jobs equally among the Performance Nodes.

Least-connections

Distributes more jobs to Performance Nodes with fewer active connections. (The AM connection table stores active connections.)

Weighted round robin

Distributes more jobs to servers with greater capacity. Capacity is indicated by the user-assigned weight, which is adjusted upward or downward by dynamic load information.

Weighted least-connections

Distributes more jobs to servers with fewer active connections relative to their capacity. Capacity is indicated by the user-assigned weight, which is adjusted upward or downward by dynamic load information.
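
The weighted least-connections idea can be illustrated with a short Python sketch. It is a simplified model, not the AM implementation: the names PerformanceNode and pick_wlc are hypothetical, the dynamic weight adjustment mentioned in Table 10-1 is left out, and the score (active connections divided by weight) is an assumption about how "relative to their capacity" might be computed.

```python
from dataclasses import dataclass


@dataclass
class PerformanceNode:
    # Hypothetical model of one real server in the pool.
    name: str
    weight: int                 # user-assigned capacity weight
    active_connections: int = 0


def pick_wlc(nodes):
    """Return the node with the fewest active connections per unit of weight."""
    candidates = [n for n in nodes if n.weight > 0]
    if not candidates:
        raise ValueError("no Performance Node with a positive weight")
    # min() keeps the first node on a tie, so list order breaks ties.
    return min(candidates, key=lambda n: n.active_connections / n.weight)


nodes = [
    PerformanceNode("fgn1", weight=1, active_connections=4),
    PerformanceNode("fgn2", weight=2, active_connections=6),
]
# fgn2 scores 6/2 = 3 and fgn1 scores 4/1 = 4, so fgn2 is chosen.
print(pick_wlc(nodes).name)
```

With equal weights, as in the examples in this chapter, the method reduces to plain least-connections.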


Configuring and Activating AM

The Availability Manager is preinstalled on the application appliance, but it is initially inactive. This section describes how to configure and activate it. The procedure is different depending on the device.

AVS 3120

Other Hardware

The configuration examples in this section correspond to the AM cluster shown in Figure 10-2.

Figure 10-2 AM Cluster Example

AVS 3120

To configure AM on the AVS 3120, use the CLI commands set am, set lb cluster, and set lb server, as discussed in the following steps:


Step 1 Configure the global AM parameters with the set am command:

velocity>set am enable backup-server active primary p_ip secondary s_ip frequency 1 dead-detection-interval 3

where:

p_ip is the primary active AM IP address, such as 10.0.8.11.

s_ip is the secondary standby AM IP address, such as 10.0.8.12.

frequency specifies the number of seconds between heartbeats (a check to see if the active AM is still operating). Typically you use a short interval, like 1.

dead-detection-interval specifies the number of seconds to wait before declaring a non-responding AM dead and initiating failover. Typically you use a short interval, like 3, that is a multiple of the frequency parameter.

Here is an example of the set am command used to configure the cluster shown in Figure 10-2:

velocity>set am enable backup-server active primary 10.0.8.11 secondary 10.0.8.12 frequency 1 dead-detection-interval 3
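
The relationship between frequency and dead-detection-interval can be sketched in Python as follows. This is only an illustration of the timing arithmetic under stated assumptions: the function names are hypothetical, and treating the peer as dead once the silence reaches the interval exactly (rather than exceeding it) is an assumption, not documented AM behavior.

```python
def missed_heartbeats_before_failover(frequency, dead_detection_interval):
    """Number of heartbeat periods that fit in the dead-detection interval."""
    if frequency <= 0 or dead_detection_interval <= 0:
        raise ValueError("both intervals must be positive")
    if dead_detection_interval % frequency != 0:
        raise ValueError("dead-detection-interval should be a multiple of frequency")
    return dead_detection_interval // frequency


def peer_is_dead(last_heartbeat_at, now, dead_detection_interval):
    """True once the peer has been silent for the whole dead-detection interval."""
    return (now - last_heartbeat_at) >= dead_detection_interval


# With the values from the example command (frequency 1, interval 3):
print(missed_heartbeats_before_failover(1, 3))
print(peer_is_dead(last_heartbeat_at=100, now=102, dead_detection_interval=3))
print(peer_is_dead(last_heartbeat_at=100, now=103, dead_detection_interval=3))
```

Shorter intervals give faster failover but leave less tolerance for a briefly delayed heartbeat.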

You can view the AM configuration settings by using the show am command:

velocity>show am

Step 2 Configure the load balancing cluster parameters with the set lb cluster command:

velocity>set lb cluster name name vip ip netmask mask active port port persistence p_sec re-entry r_sec timeout t_sec

where:

name is the virtual server name. The name must have the prefix fgncluster, for example: fgncluster_http

ip is the virtual server IP address, such as 10.0.8.1. This is a floating IP address that has been associated with a fully-qualified domain name.

mask is the virtual server network mask, such as 255.255.0.0

port is the virtual server listening port, such as 80.

p_sec, if greater than zero, enables persistent connection support and specifies a timeout value in seconds. In order to use delta optimization, you must specify a value greater than zero.

r_sec is the number of seconds that a restored Performance Node must remain alive before being re-added to the routing table.

t_sec is the number of seconds that must lapse before a Performance Node determined to be dead is removed from the routing table.

Here is an example of a set lb cluster command used to configure the cluster shown in Figure 10-2:

velocity>set lb cluster name fgnclusterhttp vip 10.0.8.1 netmask 255.255.0.0 active port 80 persistence 360 re-entry 15 timeout 60
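
If you generate these commands from a script, it can help to check the arguments before sending them to the appliance. The following Python sketch assembles the command text using the syntax shown above. The helper build_set_lb_cluster is hypothetical, and apart from the fgncluster prefix rule its checks are assumptions rather than documented CLI validation.

```python
import ipaddress


def build_set_lb_cluster(name, vip, netmask, port, persistence, reentry, timeout):
    """Validate the arguments and return the 'set lb cluster' command text."""
    if not name.startswith("fgncluster"):
        raise ValueError("virtual server name must start with 'fgncluster'")
    ipaddress.IPv4Address(vip)                    # raises ValueError if malformed
    ipaddress.IPv4Network(f"0.0.0.0/{netmask}")   # raises ValueError for a bad mask
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    for label, value in (("persistence", persistence),
                         ("re-entry", reentry),
                         ("timeout", timeout)):
        if value < 0:
            raise ValueError(f"{label} must not be negative")
    return (
        f"set lb cluster name {name} vip {vip} netmask {netmask} active "
        f"port {port} persistence {persistence} re-entry {reentry} timeout {timeout}"
    )


print(build_set_lb_cluster("fgnclusterhttp", "10.0.8.1", "255.255.0.0", 80, 360, 15, 60))
```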

You can view the AM cluster configuration settings by using the show lb cluster command:

velocity>show lb cluster all

Step 3 Configure the load balancing parameters for the first server in the cluster with the set lb server command:

velocity>set lb server cluster v_name server name ip ip weight 1 active

where:

v_name is the virtual server name under which this real server appears. This is the name specified for the virtual server in the set lb cluster command.

name is the real server name, such as fgn1. The name must be unique.

ip is the real server IP address, such as 10.0.8.11. It must be on the same subnet as the VIP.

weight is an integer that specifies this server's processing capacity relative to that of other Performance Nodes. For example, a server assigned 2000 has twice the capacity of a server assigned 1000.

Here is an example of a set lb server command used to configure the primary AM server shown in Figure 10-2:

velocity>set lb server cluster fgnclusterhttp server fgn1 ip 10.0.8.11 weight 1 active
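
The requirement that a real server be on the same subnet as the VIP can be checked ahead of time. The following Python sketch uses the standard ipaddress module. The function name on_vip_subnet is hypothetical, and the second address is a made-up counterexample.

```python
import ipaddress


def on_vip_subnet(server_ip, vip, netmask):
    """True if server_ip falls inside the network defined by the VIP and its netmask."""
    network = ipaddress.ip_network(f"{vip}/{netmask}", strict=False)
    return ipaddress.ip_address(server_ip) in network


# Values from the example cluster: VIP 10.0.8.1 with netmask 255.255.0.0.
print(on_vip_subnet("10.0.8.11", "10.0.8.1", "255.255.0.0"))
# A made-up address outside 10.0.0.0/16.
print(on_vip_subnet("10.1.8.11", "10.0.8.1", "255.255.0.0"))
```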

You can view the AM cluster configuration settings by using the show lb cluster command:

velocity>show lb cluster all

Step 4 Configure the load balancing parameters for the second server in the cluster with the set lb server command. This command is just like the first set lb server command, except that the server name and IP address are different. Here is an example used to configure the secondary AM server shown in Figure 10-2:

velocity>set lb server cluster fgnclusterhttp server fgn2 ip 10.0.8.12 weight 1 active


The CLI commands discussed above result in the generation and/or modification of the lvs.cf configuration file that resides on the device at $AVS_HOME/appliance/cluster/conf/lvs.cf. You can also edit this file directly to change the AM configuration, or copy it to other AVS devices. For more details on the lvs.cf file, see the next section, Other Hardware. Note that not all parameters in the lvs.cf file can be set by using CLI commands.

After the configuration file is created, copy it to the standby AM, because both AMs must share exactly the same configuration for successful failover. Alternatively, you can execute the same CLI commands on the standby AM. Then reboot both AM servers to apply the new configuration.

After configuration, to activate the AM service, use the following CLI commands.

To start the AM service, use this command:

velocity>set lb status am-active

To stop the AM service, use this command:

velocity>set lb status am-inactive

To make the application appliance operate as a pure Performance Node, like appliance-3 in Figure 10-1, use this command:

velocity>set lb status server-only

Other Hardware

If you are operating AVS on other hardware, there is no CLI available. Use the configuration file and shell commands documented in this section to configure and start AM.

The configuration file for load balancing and failover is $AVS_HOME/appliance/cluster/conf/lvs.cf. Use a text editor such as vi or emacs to create this file. The directory $AVS_HOME/appliance/cluster/conf contains a sample configuration file named lvs.cf.example that you can use as a reference for your own lvs.cf. The network diagram corresponding to this example configuration file is shown in Figure 10-2.

Below is the lvs.cf example file. Following the listing, the different parts of the file are explained. Table 10-2 describes the global parameters. Table 10-3 describes the virtual server parameters that appear in each virtual block in the file. Table 10-4 describes the real server parameters that appear in the server blocks that appear within the virtual block.

primary = 10.0.8.11
service = lvs
backup_active = 1
backup = 10.0.8.12
heartbeat = 1
heartbeat_port = 539
keepalive = 1
deadtime = 3
network = direct
virtual fgncluster_http {
     active = 1
     address = 10.0.8.1 eth1:0
     vip_nmask = 255.255.0.0
     port = 80
     persistent = 360
     pmask = 255.255.255.255
     send_program = "/usr/avs/appliance/cluster/bin/fgn_heartbeat.pl %h 80"
     expect = "Succeed"
     scheduler = wlc
     protocol = tcp
     timeout = 60
     reentry = 15
     server fgn1 {
         address = 10.0.8.11
         active = 1
         weight = 1
     }
     server fgn2 {
         address = 10.0.8.12
         active = 1
         weight = 1
     }
}
virtual fgncluster_https {
     active = 1
     address = 10.0.8.1 eth1:1
     vip_nmask = 255.255.0.0
     port = 443
     persistent = 360
     pmask = 255.255.255.255
     send_program = "/usr/avs/appliance/cluster/bin/fgn_heartbeat.pl %h 443"
     expect = "Succeed"
     scheduler = wlc
     protocol = tcp
     timeout = 60
     reentry = 15
     server fgn1 {
         address = 10.0.8.11
         active = 1
         weight = 1
     }
     server fgn2 {
         address = 10.0.8.12
         active = 1
         weight = 1
     }
}

Table 10-2 Global Parameters

Parameter
Description

primary =

Enter the IP address of the adapter connecting the active AM.

backup =

Enter the IP address of the adapter connecting the standby AM.

backup_active = [0 | 1]

Disables (0) or enables (1) AM failover.

heartbeat = [0 | 1]

Disables (0) or enables (1) heartbeat checking; the default is 1.

heartbeat_port =

Enter the port number used for the heartbeat on the active and standby AM; the default is 539.

keepalive =

Enter the number of seconds between heartbeats.

deadtime =

Enter the number of seconds to wait before declaring a non-responding AM dead and initiating failover.

rsh_command = [rsh|ssh]

Enter the command family to use for synchronizing the configuration files on the primary and backup routers. Important: You must enable the selected command on the primary (active) and backup (standby) AMs.

network = [nat|direct|tunnel]

Currently, only direct routing is supported.


Table 10-3 Virtual-Server Parameters 

Parameter
Description

virtual name

Enter a unique identifier for the virtual server. The name must have the prefix fgncluster, for example: fgncluster_http

address =

Enter the virtual server's IP address: a floating IP address that has been associated with a fully-qualified domain name.

vip_nmask =

Enter the virtual server's netmask.

active = [0|1]

Enables (1) or disables (0) this address.

load_monitor = [none]

Currently we do not support load monitoring.

timeout =

Enter the number of seconds (default 10) that must lapse before a Performance Node determined to be dead is removed from the routing table.

reentry =

Enter the number of seconds (default 180) that a restored Performance Node must remain alive before being re-added to the routing table.

port = [80]

Enter the listening port on this virtual server; the default is 80.

scheduler = [wlc]

Select the scheduling algorithm (default wlc) for distributing jobs from this virtual server to the Performance Nodes. Currently we do not support a scheduling method other than wlc (Weighted Least Connections).

send_program =

Use a program to check regularly if a Performance Node is up. The default is "fgn_heartbeat.pl %h port". Make sure the port is the same as the virtual server port.

expect =

The send_program produces a string to indicate the heartbeat check result. The string "Succeed" is for the expected result of fgn_heartbeat.pl.

persistent =

If greater than zero, enables persistent connection support and specifies a timeout value. In order to use delta optimization, you must specify a value greater than zero.

pmask =

If persistence is enabled (the persistent keyword), this is a netmask to apply to routing rules for enabling subnets for persistence.


Table 10-4 Real-Server Parameters

Parameter
Description

server name

Enter a unique name for the Performance Node.

address =

Enter the IP address of the Performance Node. It must be on the same subnet as the VIP.

active = [0|1]

Enables (1) or disables (0) the Performance Node.

weight =

Enter an integer (default is 1) specifying this server's processing capacity relative to that of other Performance Nodes. For example, a server assigned 2000 has twice the capacity of a server assigned 1000.
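
To inspect or sanity-check a file in this format from a script, a small parser is enough. The following Python sketch reads the layout shown in the lvs.cf example (global key = value lines, virtual blocks, and server blocks nested inside them). The function parse_lvs_cf and the dictionary layout it returns are hypothetical, and the sketch assumes the file contains only the constructs that appear in the listing.

```python
def parse_lvs_cf(text):
    """Parse lvs.cf-style text into
    {"global": {...}, "virtual": {name: {..., "servers": {name: {...}}}}}."""
    config = {"global": {}, "virtual": {}}
    stack = [config["global"]]            # stack[-1] is the innermost open block
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("{"):
            kind, name = line[:-1].split()
            if kind == "virtual" and len(stack) == 1:
                block = config["virtual"].setdefault(name, {"servers": {}})
            elif kind == "server" and len(stack) == 2:
                block = stack[-1]["servers"].setdefault(name, {})
            else:
                raise ValueError(f"unexpected block: {line}")
            stack.append(block)
        elif line == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced closing brace")
            stack.pop()
        else:
            key, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"expected 'key = value': {line}")
            # Values are kept as strings; surrounding double quotes are removed.
            stack[-1][key.strip()] = value.strip().strip('"')
    if len(stack) != 1:
        raise ValueError("unclosed block")
    return config


SAMPLE = """\
primary = 10.0.8.11
backup = 10.0.8.12
virtual fgncluster_http {
     address = 10.0.8.1 eth1:0
     port = 80
     scheduler = wlc
     server fgn1 {
         address = 10.0.8.11
         weight = 1
     }
}
"""

cfg = parse_lvs_cf(SAMPLE)
print(cfg["global"]["primary"])
print(cfg["virtual"]["fgncluster_http"]["servers"]["fgn1"]["address"])
```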


After the configuration file is created, you must copy the lvs.cf file to the standby AM, because both AMs must share exactly the same configuration for successful failover. Then reboot both AM servers to apply the new configuration.
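
One way to confirm that two copies of lvs.cf match is to compare a fingerprint of each. The following Python sketch is a minimal illustration: the function names are hypothetical, and ignoring blank lines and indentation in the comparison is an assumption. For a strict byte-for-byte check, hash the raw file contents instead.

```python
import hashlib


def config_fingerprint(text):
    """SHA-256 over the non-empty lines, ignoring indentation and trailing spaces."""
    normalized = "\n".join(line.strip() for line in text.splitlines() if line.strip())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def configs_match(active_text, standby_text):
    """True if both configuration texts have the same fingerprint."""
    return config_fingerprint(active_text) == config_fingerprint(standby_text)


active = "primary = 10.0.8.11\nbackup = 10.0.8.12\n"
standby_same = "primary = 10.0.8.11\n\n  backup = 10.0.8.12\n"
standby_stale = "primary = 10.0.8.11\nbackup = 10.0.8.13\n"
print(configs_match(active, standby_same))
print(configs_match(active, standby_stale))
```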

To start the AM service, use this command:

$AVS_HOME/appliance/cluster/bin/cluster_service on

To stop the AM service, use this command:

$AVS_HOME/appliance/cluster/bin/cluster_service off

To make the application appliance operate as a pure Performance Node, like appliance-3 in Figure 10-1, use this command:

$AVS_HOME/appliance/cluster/bin/cluster_service pure_perfnode

SSL Session Persistence

SSL Session Persistence is supported through source-IP persistence.
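
Source-IP persistence can be pictured as a table that maps a client address, masked by pmask, to the Performance Node that served it, with entries expiring after the persistent timeout. The following Python sketch models that idea only. The class PersistenceTable is hypothetical, and details such as whether the timer is refreshed by later connections are not described in this chapter and are not modeled.

```python
import ipaddress


class PersistenceTable:
    """Toy model of source-IP persistence with a netmask and a timeout."""

    def __init__(self, timeout, pmask="255.255.255.255"):
        self.timeout = timeout      # seconds, like the persistent parameter
        self.pmask = pmask          # like the pmask parameter
        self._entries = {}          # masked client network -> (node, expires_at)

    def _key(self, client_ip):
        # Clients whose addresses agree under pmask share one entry.
        return ipaddress.ip_network(f"{client_ip}/{self.pmask}", strict=False)

    def lookup(self, client_ip, now):
        """Return the remembered node, or None if there is no unexpired entry."""
        entry = self._entries.get(self._key(client_ip))
        if entry and entry[1] > now:
            return entry[0]
        return None

    def remember(self, client_ip, node, now):
        """Record that this client (or its masked subnet) is served by node."""
        self._entries[self._key(client_ip)] = (node, now + self.timeout)


table = PersistenceTable(timeout=360)
table.remember("192.0.2.10", "fgn1", now=0)
print(table.lookup("192.0.2.10", now=100))   # still within the 360-second window
print(table.lookup("192.0.2.10", now=360))   # entry has expired
```

Because an SSL session keeps the same client IP address, every connection in the session reaches the same Performance Node while the entry is valid.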

FAQ

Is your AM Solution Active-Active or Active-Passive?

It is both. In terms of load balancing decision making, it is active-passive, because at any given time only one AM node is allowed to host the VIP. In terms of Performance Nodes receiving requests, it is active-active: requests are distributed to all the Performance Nodes placed into the cluster.

How do I know which server is the active AM and how do I know an AM failover has happened?

Run this command on each application appliance:

$AVS_HOME/appliance/cluster/bin/cluster_whoami

There are three possible results:

"I am the ACTIVE AM."

"I am the STANDBY AM."

"I have no AM running."

How do I know if the AM is functioning according to my configuration file?

Run this command: ipvsadm -L

This will display the active load balancing rules and the connection statistics. For example, on the active AM you might see this:

ipvsadm -L
IP Virtual Server version 1.0.8 (size=65536)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.8.1:http wlc
  -> 10.0.8.11:http               Local   1      0          0         
  -> 10.0.8.12:http               Route   1      0          0         

On the standby AM, you should see an empty rule like this:

ipvsadm -L
IP Virtual Server version 1.0.8 (size=65536)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn

I have modified lvs.cf and saved the file on both AMs. How come the AM rule display is still not updated?

For the new configuration to take effect, you must restart the pulse daemon on the active AM server by typing this command:

service pulse restart

How do I know if the AM has detected a Performance Node failure?

Use the command ipvsadm -L to find out which Performance Nodes the AM considers healthy. For example, if the Performance Node on 10.0.8.12 is down, the active AM will not display a rule for 10.0.8.12:

ipvsadm -L
IP Virtual Server version 1.0.8 (size=65536)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.8.1:http wlc
  -> 10.0.8.11:http               Local   1      0          0         
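
This check can be scripted by comparing the real servers listed in the ipvsadm -L output with the ones you configured. The following Python sketch parses output in the layout shown above. The function names are hypothetical, and the parser assumes the column layout of these examples.

```python
def real_servers(ipvsadm_output):
    """Map each virtual service to the real-server addresses listed under it."""
    services = {}
    current = None
    for line in ipvsadm_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("TCP", "UDP"):
            current = fields[1]              # for example "10.0.8.1:http"
            services[current] = []
        elif fields[0] == "->" and current is not None:
            services[current].append(fields[1])
    return services


def missing_servers(ipvsadm_output, service, configured):
    """Return configured real servers that have no rule under the service."""
    listed = set(real_servers(ipvsadm_output).get(service, []))
    return sorted(set(configured) - listed)


OUTPUT = """\
IP Virtual Server version 1.0.8 (size=65536)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.8.1:http wlc
  -> 10.0.8.11:http               Local   1      0          0
"""

print(missing_servers(OUTPUT, "10.0.8.1:http", ["10.0.8.11:http", "10.0.8.12:http"]))
```

An empty result means every configured Performance Node currently has a rule.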

How do I make sure that my server is in the correct state of AM service?

There are three kinds of servers: active AM server (like appliance-1 in Figure 10-1), standby AM server (like appliance-2 in Figure 10-1), and pure Performance Node (like appliance-3 in Figure 10-1). Table 10-5 describes what components are expected to run on these three kinds of servers.

Table 10-5 Normal States of AM Cluster

Cluster Component | Active AM | Standby AM | Pure Performance Node
--- | --- | --- | ---
iptables redirect rule | no | yes | yes
ipvsadm rule | yes | no | no
process pulse | up | up | no
process nanny | up | up | no
cluster_whoami | I am the PRIMARY AM. | I am the STANDBY AM. | I have no AM running.
vipredirect log record (latest) | This is the Active AM. | This is the Standby AM. | No pulse running. This could be a pure Performance Node server, or the pulse is down.
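
The yes/no and up/no rows of Table 10-5 can be encoded as data and compared with what you observe on a server. The following Python sketch shows one way to do that. The dictionary keys, role names, and the function state_mismatches are hypothetical, and collecting the observed values (for example, from iptables, ipvsadm, and the process list) is left to the reader.

```python
# Expected component states per role, transcribed from Table 10-5.
EXPECTED = {
    "active": {
        "iptables_redirect": False, "ipvsadm_rule": True, "pulse": True, "nanny": True,
    },
    "standby": {
        "iptables_redirect": True, "ipvsadm_rule": False, "pulse": True, "nanny": True,
    },
    "performance_node": {
        "iptables_redirect": True, "ipvsadm_rule": False, "pulse": False, "nanny": False,
    },
}


def state_mismatches(role, observed):
    """Return the component names whose observed state differs from Table 10-5."""
    expected = EXPECTED[role]
    return sorted(name for name, want in expected.items() if observed.get(name) != want)


# A standby AM that unexpectedly holds ipvsadm rules:
observed = {"iptables_redirect": True, "ipvsadm_rule": True, "pulse": True, "nanny": True}
print(state_mismatches("standby", observed))
```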


What is the performance penalty for combining AM with Performance Node on the same physical application appliance?

The AM cluster is kernel-based and extremely lightweight. Our stress test shows no degradation in throughput when both AM and Performance Node operate together. The limiting factor for the maximum number of Performance Nodes that can be put into the AM cluster is the network link bandwidth on the active AM server.

What happens to an active connection when the AM failover takes place?

In a cluster with two application appliances, where one is an active AM and the other is a standby AM, if the active connection is to the standby AM server, the connection will stay inactive when the failover happens. The application will use send-retry to handle any packet loss during the period between the AM node failure and the AM failover. In the other scenarios, the connection state is unpredictable.

How can I quickly simulate an AM node failure?

You can stop the pulse process with the command service pulse stop, or you can stop the network service with the command service network stop. If this happens on the Active AM server, then a failover should take place in three seconds by default. For information on how to verify if a failover has happened, refer to the question "How do I know which server is the active AM and how do I know an AM failover has happened?"

Can I use your AM cluster for my origin servers?

Technically it is possible, but we do not currently support that configuration.

After the failover takes place, will the originally Active AM node take back the VIP if it is back online?

Not until the new Active AM node fails.

Can I use a proxy in front of the AM cluster?

Yes. However, a proxy typically forwards all requests to the AM cluster with the proxy's IP address as the source IP address. The AM uses the source IP address to determine persistence, which is required for delta optimization. This means that all requests sent through a proxy will be handled by only one Performance Node. If you have two Performance Nodes, the other Performance Node will act as a standby node.

If delta optimization is not used, then the persistence parameter can be removed from the AM configuration file, and both Performance Nodes will be used in an active-active mode.

Limitations

The AM solution currently has the following limitations:

The Performance Node must listen on the same port as the virtual server. Therefore, a Performance Node must listen on port 80 for HTTP virtual service and on port 443 for HTTPS virtual service.

Cookie persistence is not supported. A workaround is to use source-IP persistence, which is illustrated in the lvs.cf.example file.

Only one AM node is allowed to be the active AM, and only one AM node is allowed to be the standby AM. This means that deploying more than two application appliances does not add AM redundancy. It does, however, increase your Performance Node capacity.

The AM scheduling algorithm only supports Weighted Least Connection mode.

The Performance Node IP must be on the same subnet as the VIP. Separating Performance Nodes into a different network from the VIP is currently not supported.