Table Of Contents
Availability Manager Clustering
Terminology
Overview
Configuring and Activating AM
lvs.cf File
SSL Session Persistence
FAQ
Limitations
Availability Manager Clustering
This chapter describes how to configure and use the Availability Manager (AM), which provides a built-in high-availability and load-balancing capability for a cluster of application appliances. This chapter includes the following topics:
• Terminology
• Overview
• Configuring and Activating AM
• SSL Session Persistence
• FAQ
• Limitations
Note
The Availability Manager operates on the AVS 3120 (not on the AVS 3180 Management Station).
Terminology
Cluster is a widely used term for independent computers combined into a unified system through software and networking. Cluster computing has three important branches:
• High-availability (HA) clustering uses multiple machines to add an extra level of reliability for a service or group of services.
• Load-balance clustering uses specialized routing techniques to dispatch traffic to a pool of servers.
• Computation clustering (such as Beowulf) uses multiple machines to provide greater computing power for computationally intensive tasks. This type of clustering is not addressed by Availability Manager.
Availability Manager is a clustering solution that is based on the first two types of clustering technology.
Overview
Availability Manager (AM) provides a built-in high-availability and load-balancing capability for a cluster of application appliances. No additional load-balancing hardware is required.
Figure 11-1 shows a typical example of an Availability Manager solution. The AM components are enabled on appliance-1 and appliance-2. At any given time, only one AM component is active, and the other is in standby mode. The virtual IP (VIP) is always hosted on the active AM. In this example, appliance-1 plays the active AM role, appliance-2 plays the standby role, and appliance-3 is a performance node only. When the client issues an HTTP request, the DNS server resolves the hostname to a virtual IP that is hosted on appliance-1.
Figure 11-1 Typical AM Load-Balancing Scenario
The AM manages a pool of performance nodes that are the actual nodes that handle the application requests. In the example in Figure 11-1, the performance node pool contains performance node1, performance node2, and performance node3.
The AM regularly checks the availability of each performance node in the pool, and new requests will not be directed to a failed performance node. The active and standby AMs monitor each other with a heartbeat-checking mechanism. If the active AM is down, the standby AM will take over the VIP to become the active AM. For successful failover, both AMs share exactly the same configuration.
The AM load balancing can be based on a few different methods, which are described in Table 11-1. Currently only Weighted Least-Connections (WLC) is supported.
Table 11-1 Load-balancing Methods
Method | Description
Round robin | Distributes jobs equally among the performance nodes.
Least-connections | Distributes more jobs to performance nodes with fewer active connections. (The AM connection table stores active connections.)
Weighted round robin | Distributes more jobs to servers with greater capacity. Capacity is indicated by the user-assigned weight, which is adjusted upward or downward by dynamic load information.
Weighted least-connections | Distributes more jobs to servers with fewer active connections relative to their capacity. Capacity is indicated by the user-assigned weight, which is adjusted upward or downward by dynamic load information.
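The weighted least-connections idea can be illustrated with a short Python sketch. It assumes each performance node has a static weight and a known active-connection count, and it picks the node with the smallest connections-to-weight ratio. The `Node` class and `pick_wlc` function are illustrative names, not part of the AM, and the dynamic weight adjustment mentioned in the table is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    weight: int        # user-assigned capacity (assumed to be a positive integer)
    active_conns: int  # connections currently tracked for this node


def pick_wlc(nodes):
    """Return the node with the fewest active connections relative to its weight."""
    candidates = [n for n in nodes if n.weight > 0]
    if not candidates:
        raise ValueError("no node with a positive weight")
    # A lower connections-per-weight ratio means more spare capacity.
    return min(candidates, key=lambda n: n.active_conns / n.weight)


pool = [Node("fgn1", 1, 4), Node("fgn2", 2, 6), Node("fgn3", 1, 5)]
print(pick_wlc(pool).name)
```

In this sample pool, fgn2 holds the most connections in absolute terms, but its weight of 2 gives it the lowest ratio, so it receives the next job.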
Configuring and Activating AM
The Availability Manager is preinstalled on the application appliance, but it is initially inactive. This section describes how to configure and activate it.
The configuration examples in this section correspond to the AM cluster shown in Figure 11-2.
Figure 11-2 AM Cluster Example
To configure AM using the CLI commands set am, set lb cluster, and set lb server, follow these steps:
Step 1
Configure the global AM parameters with the set am command:
velocity>set am enable backup-server active primary p_ip secondary s_ip frequency 1 dead-detection-interval 3
where:
p_ip is the primary active AM IP address, such as 10.0.8.11.
s_ip is the secondary standby AM IP address, such as 10.0.8.12.
frequency specifies the number of seconds between heartbeats (a check to see if the active AM is still operating). Typically you use a short interval, like 1.
dead-detection-interval specifies the number of seconds to wait before declaring a non-responding AM dead and initiating failover. Typically you use a short interval, like 3, that is a multiple of the frequency parameter.
Here is an example of the set am command used to configure the cluster shown in Figure 11-2:
velocity>set am enable backup-server active primary 10.0.8.11 secondary 10.0.8.12 frequency 1 dead-detection-interval 3
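The relationship between the two timing parameters can be checked with a small Python sketch. It treats the "multiple of the frequency" guideline as a hard rule, which is an assumption made for illustration, and the function name is hypothetical.

```python
def missed_heartbeats_before_failover(frequency, dead_detection_interval):
    """Number of heartbeat periods that fit in the dead-detection interval."""
    if frequency <= 0 or dead_detection_interval <= 0:
        raise ValueError("both values must be positive numbers of seconds")
    if dead_detection_interval % frequency != 0:
        raise ValueError("dead-detection-interval should be a multiple of frequency")
    return dead_detection_interval // frequency


print(missed_heartbeats_before_failover(1, 3))
```

With the values used in the example command (frequency 1, dead-detection-interval 3), three heartbeat periods pass without a response before the standby AM initiates failover.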
To view the AM configuration settings, enter the show am command.
Step 2
Configure the load balancing cluster parameters with the set lb cluster command:
velocity>set lb cluster name name vip ip netmask mask active port port persistence p_sec re-entry r_sec timeout t_sec
where:
name is the virtual server name. The name must have the prefix fgncluster, for example, fgncluster_http.
ip is the virtual server IP address, such as 10.0.8.1. This is a floating IP address that has been associated with a fully qualified domain name.
mask is the virtual server network mask, such as 255.255.0.0.
port is the virtual server listening port, such as 80.
p_sec, if greater than zero, enables persistent connection support and specifies a timeout value in seconds. To use delta optimization, you must specify a value greater than zero.
r_sec is the number of seconds that a restored performance node must remain alive before being re-added to the routing table.
t_sec is the number of seconds that must elapse before a performance node determined to be inoperative is removed from the routing table.
Here is an example of a set lb cluster command used to configure the cluster shown in Figure 11-2:
velocity>set lb cluster name fgnclusterhttp vip 10.0.8.1 netmask 255.255.0.0 active port 80 persistence 360 re-entry 15 timeout 60
To view the AM cluster configuration settings, enter the show lb cluster command:
velocity>show lb cluster all
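When the set lb cluster command is generated by a script, the parameter rules described above can be checked before the command is sent. The following Python sketch is one way to do that. The helper name and the specific checks are assumptions for illustration; the appliance CLI performs its own validation, which may differ.

```python
import ipaddress


def build_set_lb_cluster(name, vip, netmask, port, persistence, reentry, timeout):
    """Assemble a `set lb cluster` command line after basic sanity checks."""
    if not name.startswith("fgncluster"):
        raise ValueError("cluster name must start with 'fgncluster'")
    ipaddress.IPv4Address(vip)                    # raises ValueError if malformed
    ipaddress.IPv4Network(f"0.0.0.0/{netmask}")   # raises ValueError if not a valid mask
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    if min(persistence, reentry, timeout) < 0:
        raise ValueError("time values must not be negative")
    return (
        f"set lb cluster name {name} vip {vip} netmask {netmask} active "
        f"port {port} persistence {persistence} re-entry {reentry} timeout {timeout}"
    )


print(build_set_lb_cluster("fgnclusterhttp", "10.0.8.1", "255.255.0.0", 80, 360, 15, 60))
```

Called with the values from Figure 11-2, the helper reproduces the example command shown in this step, without the velocity> prompt.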
Step 3
Configure the load-balancing parameters for the first server in the cluster with the set lb server command:
velocity>set lb server cluster v_name server name ip ip weight 1 active
where:
v_name is the virtual server name under which this real server appears. This is the name specified for the virtual server in the set lb cluster command.
name is the real server name, such as fgn1. The name must be unique.
ip is the real server IP address, such as 10.0.8.11. It must be on the same subnet as the VIP.
weight is an integer that specifies this server's processing capacity relative to that of other performance nodes. For example, a server assigned 2000 has twice the capacity of a server assigned 1000.
Here is an example of a set lb server command used to configure the primary AM server shown in Figure 11-2:
velocity>set lb server cluster fgnclusterhttp server fgn1 ip 10.0.8.11 weight 1 active
To view the AM cluster configuration settings, enter the show lb cluster command:
velocity>show lb cluster all
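The same-subnet requirement for real servers can be verified ahead of time. This Python sketch uses the standard ipaddress module; the function name `same_subnet` is illustrative, and the second address in the usage lines is a made-up counterexample.

```python
import ipaddress


def same_subnet(vip, netmask, server_ip):
    """True when server_ip falls inside the network defined by the VIP and its netmask."""
    # strict=False lets a host address (the VIP) stand in for its network.
    network = ipaddress.IPv4Network(f"{vip}/{netmask}", strict=False)
    return ipaddress.IPv4Address(server_ip) in network


print(same_subnet("10.0.8.1", "255.255.0.0", "10.0.8.11"))
print(same_subnet("10.0.8.1", "255.255.0.0", "10.1.8.11"))
```

The first call uses the VIP and real server address from Figure 11-2 and succeeds. The second address lies outside the 255.255.0.0 network of the VIP, so it would not be an acceptable performance node address.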
Step 4
Configure the load-balancing parameters for the second server in the cluster with the set lb server command. This command is just like the first set lb server command, except that the server name and IP address are different. Here is an example used to configure the secondary AM server shown in Figure 11-2:
velocity>set lb server cluster fgnclusterhttp server fgn2 ip 10.0.8.12 weight 1 active
The CLI commands described in the procedure generate and modify the lvs.cf configuration file that resides on the device at $AVS_HOME/appliance/cluster/conf/lvs.cf. You can also edit this file directly to change the AM configuration, or copy it to other AVS devices. For more details on the lvs.cf file, see the next section, lvs.cf File. Not all parameters in the lvs.cf file can be set by using CLI commands.
After the configuration file is created, you may want to copy it to the standby AM because it is crucial that both AMs share exactly the same configuration for a successful failover. Alternatively, you can execute the same CLI commands on the standby AM, and then reboot both AM servers to invoke the new configuration.
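Because failover depends on both AMs holding identical files, it can be useful to compare the two copies after a change. The Python sketch below compares SHA-256 digests. It assumes that a copy of the standby AM's lvs.cf has already been fetched to a local path by some other means; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def configs_match(active_copy, standby_copy):
    """True when both lvs.cf copies are byte-for-byte identical."""
    return file_digest(active_copy) == file_digest(standby_copy)
```

A digest comparison flags any difference, including whitespace, which matches the requirement that the two configurations be exactly the same.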
After you complete the configuration, use the following CLI commands to start or stop the AM service, or to set the appliance's role:
• To start the AM service, use this command:
velocity>set lb status am-active
• To stop the AM service, use this command:
velocity>set lb status am-inactive
• To make the application appliance operate as a pure performance node (such as appliance-3 shown in Figure 11-1), use this command:
velocity>set lb status server-only
lvs.cf File
The configuration file that is used for load balancing and failover is $AVS_HOME/appliance/cluster/conf/lvs.cf. In $AVS_HOME/appliance/cluster/conf, there is a sample configuration file named lvs.cf.example that can be used as a reference for your version of lvs.cf. The network diagram corresponding to this example configuration file is shown in Figure 11-2.
An example of the lvs.cf file is shown here. The different parts of the file are described in the tables that follow. Table 11-2 describes the global parameters; Table 11-3 describes the virtual server parameters that appear in each virtual block in the file; and Table 11-4 describes the real server parameters that appear in the server blocks that appear within the virtual block.
virtual fgncluster_http {
address = 10.0.8.1 eth1:0
send_program = "/usr/avs/appliance/cluster/bin/fgn_heartbeat.pl %h 80"
virtual fgncluster_https {
address = 10.0.8.1 eth1:1
send_program = "/usr/avs/appliance/cluster/bin/fgn_heartbeat.pl %h 443"
Table 11-2 Global Parameters
Parameter | Description
primary = | IP address of the adapter connecting the active AM.
backup = | IP address of the adapter connecting the standby AM.
backup_active = [0|1] | Disables or enables the failover of AM.
heartbeat = [0|1] | Disables or enables heartbeat checking; the default is 1.
heartbeat_port = | Port number used for the heartbeat on the active and standby AM; the default is 539.
keepalive = | Number of seconds between heartbeats.
deadtime = | Number of seconds to wait before declaring a nonresponding AM dead and initiating failover.
rsh_command = [rsh|ssh] | Command family to use for synchronizing the configuration files on the primary and backup routers. Note: You must enable the selected command on the primary (active) and backup (standby) AMs.
network = [nat|direct|tunnel] | Currently only direct routing is supported.
Table 11-3 Virtual Server Parameters
Parameter | Description
virtual name | Unique identifier for the virtual server. The name must have the prefix fgncluster, for example, fgncluster_http.
address = | Virtual server's IP address: a floating IP address that has been associated with a fully qualified domain name.
vip_nmask = | Virtual server's netmask.
active = [0|1] | Enables (1) or disables (0) this address.
load_monitor = [none] | Currently load monitoring is not supported.
timeout = | Number of seconds (default 10) that must elapse before a performance node that is determined to be inoperable is removed from the routing table.
reentry = | Number of seconds (default 180) that a restored performance node must remain alive before being re-added to the routing table.
port = [80] | Listening port on this virtual server; the default is 80.
scheduler = [wlc] | Scheduling algorithm (default wlc) for distributing jobs from this virtual server to the performance nodes. Currently no scheduling method other than Weighted Least-Connections (wlc) is supported.
send_program = | Program that regularly checks whether a performance node is up. The default is "fgn_heartbeat.pl %h port". Make sure the port is the same as the virtual server port.
expect = | String that send_program must produce for the heartbeat check to count as successful. For fgn_heartbeat.pl, the expected string is "Succeed".
persistent = | If greater than zero, enables persistent connection support and specifies a timeout value. To use delta optimization, you must specify a value greater than zero.
pmask = | If persistence is enabled (the persistent keyword), this is a netmask to apply to routing rules for enabling subnets for persistence.
Table 11-4 Real Server Parameters
Parameter | Description
server name | Unique name for the performance node.
address = | IP address of the performance node; it must be on the same subnet as the VIP.
active = [0|1] | Enables (1) or disables (0) the performance node.
weight = | Integer (default is 1) that specifies this server's processing capacity relative to that of other performance nodes. For example, a server assigned 2000 has twice the capacity of a server assigned 1000.
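To show how the three parameter groups fit together, the following Python sketch renders a configuration in lvs.cf style from plain dictionaries. The layout (global "key = value" lines, a virtual block with nested server blocks, and closing braces) is inferred from the fragment and tables in this section and is an assumption, as is the `render_lvs_cf` helper. The values are taken from the Figure 11-2 examples. Treat lvs.cf.example on the appliance as the authoritative reference.

```python
def render_lvs_cf(global_params, virtual_servers):
    """Render global settings and nested virtual/server blocks as lvs.cf-style text."""
    lines = [f"{key} = {value}" for key, value in global_params.items()]
    for vname, vserver in virtual_servers.items():
        lines.append(f"virtual {vname} {{")
        for key, value in vserver["params"].items():
            lines.append(f"    {key} = {value}")
        # Real server (performance node) blocks are nested inside the virtual block.
        for sname, sparams in vserver["servers"].items():
            lines.append(f"    server {sname} {{")
            for key, value in sparams.items():
                lines.append(f"        {key} = {value}")
            lines.append("    }")
        lines.append("}")
    return "\n".join(lines) + "\n"


config = render_lvs_cf(
    {"primary": "10.0.8.11", "backup": "10.0.8.12", "backup_active": 1,
     "keepalive": 1, "deadtime": 3, "network": "direct"},
    {"fgncluster_http": {
        "params": {"address": "10.0.8.1 eth1:0", "vip_nmask": "255.255.0.0",
                   "active": 1, "port": 80, "scheduler": "wlc", "persistent": 360},
        "servers": {"fgn1": {"address": "10.0.8.11", "active": 1, "weight": 1},
                    "fgn2": {"address": "10.0.8.12", "active": 1, "weight": 1}},
    }},
)
print(config)
```

Generating the file from one data structure and copying the result to both AMs is one way to keep the active and standby configurations identical.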
SSL Session Persistence
SSL Session Persistence is supported through source-IP persistence.
FAQ
This section contains frequently asked questions.
Is your AM solution active-active or active-passive?
The solution is both. In terms of load balancing decision making, it is active-passive because at any given time, there is only one AM node allowed to host the VIP. In terms of performance nodes receiving requests, it is active-active because requests will be distributed to all of the performance nodes placed into the cluster.
How do I know which server is the active AM and how do I know an AM failover has happened?
To determine which server is the active AM, enter this command on each application appliance:
$AVS_HOME/appliance/cluster/bin/cluster_whoami
One of these responses is displayed:
"I am the ACTIVE AM."
"I am the STANDBY AM."
"I have no AM running."
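A monitoring script could map these responses to a role. The Python sketch below is an illustration only. It assumes the response text has already been captured as a string, and it accepts both the "ACTIVE" wording shown above and the "PRIMARY" wording that appears in Table 11-5. The function name `am_role` is hypothetical.

```python
def am_role(whoami_output):
    """Map a cluster_whoami response line to 'active', 'standby', or 'none'."""
    text = whoami_output.strip().strip('"').upper()
    if "ACTIVE AM" in text or "PRIMARY AM" in text:
        return "active"
    if "STANDBY AM" in text:
        return "standby"
    if "NO AM RUNNING" in text:
        return "none"
    raise ValueError(f"unrecognized cluster_whoami output: {whoami_output!r}")


print(am_role("I am the STANDBY AM."))
```

Polling each appliance and recording a change from "standby" to "active" is one way to detect that a failover has taken place.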
How do I know if the AM is functioning according to my configuration file?
Enter the ipvsadm -L command. The active load-balancing rules and the connection statistics are displayed. For example, on the active AM you might see this result:
IP Virtual Server version 1.0.8 (size=65536)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
-> 10.0.8.11:http Local 1 0 0
-> 10.0.8.12:http Route 1 0 0
On the standby AM, you should see an empty rule similar to this result:
IP Virtual Server version 1.0.8 (size=65536)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
I have modified lvs.cf and saved the file on both AMs. Why is the AM rule display still not updated?
Saving the file is not enough. You must restart the pulse daemon on the active AM server before the new configuration takes effect.
How do I know if the AM has detected a performance node failure?
Use the ipvsadm -L command to find out what the AM determines to be good performance nodes. For example, if the performance node on 10.0.8.12 is down, the active AM will not display a rule for 10.0.8.12:
IP Virtual Server version 1.0.8 (size=65536)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
-> 10.0.8.11:http Local 1 0 0
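This check can be automated by parsing the listing and comparing it with the nodes you expect to see. The Python sketch below works on text in the format shown above. It assumes the output has already been captured as a string, and the function names are illustrative.

```python
def real_servers(ipvsadm_output):
    """Return the RemoteAddress:Port entries from `ipvsadm -L` style text."""
    servers = []
    for line in ipvsadm_output.splitlines():
        fields = line.split()
        # Real-server rows start with "->"; skip the "-> RemoteAddress:Port" header row.
        if len(fields) >= 2 and fields[0] == "->" and fields[1] != "RemoteAddress:Port":
            servers.append(fields[1])
    return servers


def missing_nodes(expected, ipvsadm_output):
    """Expected nodes that have no rule in the listing, i.e. nodes the AM treats as down."""
    present = set(real_servers(ipvsadm_output))
    return [node for node in expected if node not in present]


sample = """IP Virtual Server version 1.0.8 (size=65536)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
-> 10.0.8.11:http Local 1 0 0
"""
print(missing_nodes(["10.0.8.11:http", "10.0.8.12:http"], sample))
```

For the sample listing, the only missing entry is 10.0.8.12:http, which corresponds to the failed performance node in the example.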
How do I make sure that my server is in the correct state of AM service?
Three types of servers are available: active AM server (such as appliance-1 in Figure 11-1), standby AM server (such as appliance-2 in Figure 11-1), and performance node only (such as appliance-3 in Figure 11-1). Table 11-5 describes what components are expected to run on these servers.
Table 11-5 Normal States of AM Cluster
Cluster Components | Active AM | Standby AM | Performance Node Only
iptables redirect rule | No | Yes | Yes
ipvsadm rule | Yes | No | No
process pulse | Up | Up | No
process nanny | Up | Up | No
cluster_whoami | I am the PRIMARY AM. | I am the STANDBY AM. | I have no AM running.
vipredirect log record (latest) | This is the Active AM. | This is the Standby AM. | No pulse running. This could be a pure Performance Node server, or the pulse is down.
What is the performance penalty for combining AM with performance node on the same physical application appliance?
The AM cluster is kernel-based and extremely lightweight. A stress test shows no degradation in throughput when both AM and performance node operate together. The network link bandwidth on the active AM server determines the maximum number of performance nodes that can be put into the AM cluster.
What happens to an active connection when the AM failover takes place?
In a cluster with two application appliances, where one AM is active and the other is standby, if the active connection is to the standby AM server, the connection will stay inactive when the failover happens. The application will use send-retry to handle any packet loss during the period between the AM node failure and the AM failover. In the other scenarios, the connection state is unpredictable.
How can I quickly simulate an AM node failure?
You can stop the pulse process with the service pulse stop command, or you can stop the network service with the service network stop command. If you do this on the active AM server, a failover should take place within three seconds by default. For information on how to verify whether a failover has occurred, refer to the question "How do I know which server is the active AM and how do I know an AM failover has happened?"
Can I use your AM cluster for my origin servers?
Technically it is possible, but that configuration is not currently supported.
After the failover takes place, will the original active AM node take back the VIP if it is back online?
Not until the new active AM node fails.
Can I use a proxy in front of the AM cluster?
Yes. However, a proxy typically forwards all requests to the AM cluster with the proxy's IP address as the source IP address. Because the AM uses the source IP address to determine persistence, which is required for delta optimization, all requests sent through a proxy are handled by only one performance node. If you have two performance nodes, the other performance node acts as a standby node.
If delta optimization is not used, then the persistence parameter can be removed from the AM configuration file, and both performance nodes will be used in an active-active mode.
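The effect of source-IP persistence behind a proxy can be demonstrated with a small Python simulation. It is a sketch under simplifying assumptions: round robin stands in for the real scheduler, each request refreshes the persistence timer, and the class name and the 192.0.2.10 proxy address are invented for illustration.

```python
class PersistentBalancer:
    """Pins each source IP to one node until the persistence timeout expires."""

    def __init__(self, nodes, persistence_seconds):
        self.nodes = list(nodes)
        self.persistence_seconds = persistence_seconds
        self.assignments = {}   # source IP -> (node, expiry time)
        self.next_index = 0

    def route(self, source_ip, now):
        entry = self.assignments.get(source_ip)
        if self.persistence_seconds > 0 and entry and now < entry[1]:
            node = entry[0]
        else:
            # Round robin stands in for the real scheduler to keep the sketch short.
            node = self.nodes[self.next_index % len(self.nodes)]
            self.next_index += 1
        if self.persistence_seconds > 0:
            self.assignments[source_ip] = (node, now + self.persistence_seconds)
        return node


proxied = PersistentBalancer(["fgn1", "fgn2"], persistence_seconds=360)
print({proxied.route("192.0.2.10", now=t) for t in range(5)})

direct = PersistentBalancer(["fgn1", "fgn2"], persistence_seconds=0)
print({direct.route("192.0.2.10", now=t) for t in range(5)})
```

With persistence enabled, every request from the single proxy address lands on the same node. With persistence set to zero, the same requests are spread across both nodes.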
Limitations
The AM currently has the following limitations:
• The performance node must listen on the same port as the virtual server. A performance node must listen on port 80 for HTTP virtual service and on port 443 for HTTPS virtual service.
• Cookie persistence is not supported. A workaround is to use source-IP persistence, which is illustrated in the lvs.cf.example file.
• Only one AM node is allowed to be the active AM, and only one AM node is allowed to be the standby AM node. Deploying more than two application appliances does not add redundancy for the AM; it only increases your performance node capacity.
• The AM scheduling algorithm supports only Weighted Least-Connections (WLC) mode.
• The performance node IP must be on the same subnet as the VIP. Separating performance nodes into a different network from the VIP is currently not supported.