In the existing setup, the HA Manager supports monitoring of Proxy Mobile IPv6 (PMIP) sessions for up to 256 peers through the heartbeat mechanism. The requirement is to increase the number of monitored peers from 256 to 128000.
To allow more PMIP peers to be monitored for path failure with the heartbeat mechanism, a new CLI command, monitor-max-peers, is added under the LMA Service Configuration mode. This feature supports the following behavior:
-
When configured, the maximum number of peers that can be supported for heartbeat monitoring can be increased from 256 to 128000 peers.
-
The first 128000 peers are identified during the calls irrespective of whether the heartbeat mechanism is enabled or not.
-
Separate lists are maintained for retransmission heartbeats and periodic heartbeats for batch processing.
-
The decision to monitor a peer is made at the time of call setup, recovery, and ICSR. For example, suppose that monitor-max-peers is configured for a maximum of 128000 peers and more than 256 peers are being monitored for heartbeat. If the configuration is later changed to the default, which allows a maximum of only 256 peers, monitoring continues for all peers until an HA Manager recovery or ICSR occurs with the 256-peer configuration in effect.
-
The parameters for batch processing for heartbeat messages are changed as follows:
| | Batch Size (before) | Batch Size | Batch Interval (before) | Batch Interval |
|---|---|---|---|---|
| Periodic heartbeat batch | 100 | 550 | 200 ms | 200 ms |
| Retransmission heartbeat batch | 100 | 550 | 200 ms | 100 ms |
-
If more than 10% of peers (12800 peers) are not responding, then the detection of path failure of nodes is delayed. This delay is to avoid a huge impact on performance when such a condition occurs.
If the number of heartbeat messages awaiting retransmission exceeds the batch size expected based on the calculations, then retransmissions follow the periodic timer instead of the retransmission timer. For example, assume that the heartbeat interval is 60 seconds, the retransmission timeout is 3 seconds, and the maximum number of retries is 3.
In this case, retransmissions of heartbeat messages occur every 60 seconds instead of every 3 seconds. Therefore, a peer path failure that is detected within a maximum of 9 seconds (3*3) under normal conditions is instead detected after 180 seconds (60*3).
-
Minimum heartbeat interval must be 60 seconds.
If 128000 peers are configured for monitoring heartbeat, then heartbeat interval must not be configured for less than 60 seconds. If the heartbeat interval is configured for less than 60 seconds, then a configuration error is displayed.
-
Minimum heartbeat retransmission timeout must be three seconds.
If 128000 peers are configured for monitoring heartbeat, then heartbeat retransmission timeout must not be configured for less than three seconds. If the heartbeat retransmission timeout is configured for less than three seconds, then a configuration error is displayed.
-
The CLI is configured at the service level but the list is maintained at the instance level. Therefore, it is recommended that all services have the same configuration.
If services have different configurations, then the limit applied is the one configured for the service that handles the call. However, the number of peers counted against that limit is the number of peers already present in that instance.
For example, consider two services: lma1 and lma2. lma1 has monitor-max-peers configured as 128000 peers, and lma2 has monitor-max-peers configured as 256 peers. If a call comes through lma1, it is checked against the limit of 128000 peers. If a call comes through lma2, it is checked against the limit of 256 peers. However, for lma2 the 256 peers may all be peers that are already being monitored through lma1.
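The interaction between a service-level limit and an instance-level peer list can be sketched in Python as follows. This is only an illustrative model of the lma1 and lma2 example, not the HA Manager implementation: the class `InstancePeerList`, the method `try_monitor`, and the peer names are hypothetical, and the sketch assumes that a new peer is admitted only while the instance-wide count is below the limit of the service handling the call.

```python
class InstancePeerList:
    """Hypothetical model of the instance-level list of monitored peers."""

    def __init__(self):
        self._peers = set()

    def try_monitor(self, peer, service_limit):
        """Decide at call setup whether a peer gets heartbeat monitoring.

        service_limit is the monitor-max-peers value of the service that
        handles the call; the count it is compared with is instance-wide.
        """
        if peer in self._peers:
            return True               # already monitored, nothing to add
        if len(self._peers) >= service_limit:
            return False              # instance already holds too many peers
        self._peers.add(peer)
        return True


LMA1_LIMIT = 128000   # monitor-max-peers on lma1
LMA2_LIMIT = 256      # monitor-max-peers on lma2 (default)

peers = InstancePeerList()

# 256 distinct peers arrive through lma1 and are all admitted.
admitted = [peers.try_monitor(f"mag-{i}", LMA1_LIMIT) for i in range(256)]

# A new peer arriving through lma2 is refused, because the instance-level
# list already contains 256 peers (all of them learned through lma1).
new_via_lma2 = peers.try_monitor("mag-new", LMA2_LIMIT)

# The same peer arriving through lma1 is still admitted.
new_via_lma1 = peers.try_monitor("mag-new", LMA1_LIMIT)
```

In this model, lma2 can end up monitoring no peers of its own, which is one reason the text recommends configuring all services identically.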
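The timer rules in the list above (the minimum values, and the change in detection time when retransmissions exceed the expected batch size) can be checked with a short Python sketch. The names `HeartbeatConfig`, `detection_time_s`, and `sweep_time_ms` are hypothetical, the sample threshold of 550 for the expected batch size is an assumption taken from the batch size table, and the sweep calculation assumes one batch is sent per batch interval.

```python
import math
from dataclasses import dataclass

MIN_INTERVAL_S = 60          # minimum heartbeat interval stated in the text
MIN_RETRANS_TIMEOUT_S = 3    # minimum retransmission timeout stated in the text
EXTENDED_MAX_PEERS = 128000


@dataclass(frozen=True)
class HeartbeatConfig:
    """Hypothetical container for the heartbeat settings discussed above."""
    interval_s: int
    retransmission_timeout_s: int
    max_retries: int
    monitor_max_peers: int = 256

    def validate(self) -> None:
        """Reject timer values below the stated minimums for 128000 peers."""
        if self.monitor_max_peers < EXTENDED_MAX_PEERS:
            return
        if self.interval_s < MIN_INTERVAL_S:
            raise ValueError("heartbeat interval must be at least 60 seconds")
        if self.retransmission_timeout_s < MIN_RETRANS_TIMEOUT_S:
            raise ValueError("retransmission timeout must be at least 3 seconds")


def detection_time_s(cfg: HeartbeatConfig, pending_retransmissions: int,
                     expected_batch_size: int) -> int:
    """Worst-case seconds to declare a path failure, following the example."""
    overloaded = pending_retransmissions > expected_batch_size
    # When overloaded, retries are paced by the periodic heartbeat interval.
    step_s = cfg.interval_s if overloaded else cfg.retransmission_timeout_s
    return step_s * cfg.max_retries


def sweep_time_ms(peers: int, batch_size: int, batch_interval_ms: int) -> int:
    """Time to send one heartbeat to every peer, one batch per interval."""
    return math.ceil(peers / batch_size) * batch_interval_ms


cfg = HeartbeatConfig(interval_s=60, retransmission_timeout_s=3,
                      max_retries=3, monitor_max_peers=128000)
cfg.validate()

print(detection_time_s(cfg, pending_retransmissions=100, expected_batch_size=550))  # 9
print(detection_time_s(cfg, pending_retransmissions=600, expected_batch_size=550))  # 180
print(sweep_time_ms(128000, 550, 200))  # 46600
print(sweep_time_ms(128000, 100, 200))  # 256000
```

Under these assumptions, one periodic pass over 128000 peers takes about 46.6 seconds with a batch size of 550 and would take 256 seconds with the earlier batch size of 100. This is consistent with, though not stated as the reason for, the 60-second minimum heartbeat interval.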