Geo Redundancy Overview

Introduction

The chapters in this part explain the requirements and processes for installing or upgrading Geo Redundancy in the Crosswork Network Controller solution.


Attention


Geo Redundancy is an optional functionality offered by the Crosswork Network Controller solution. For assistance, contact the Cisco Customer Experience team.


The geo redundancy solution ensures business continuity for on-premises deployments in the event of a region or data center failure. It adds another layer of protection to the high availability stack for Crosswork through geographical or site redundancy. Geo redundancy protects against entire site failure, reduces disruption during system upgrades, and reduces overall data loss.

Geo redundancy involves placing physical servers in geographically diverse availability zones (AZ) or data centers (DC) to safeguard against catastrophic events and natural disasters.

Key factors

Some of the key factors that ensure geo redundancy are:

  • VM Node availability: Ensure that both the active and standby clusters are configured with the same number of virtual machines (cluster nodes, NSO, Crosswork Data Gateway, and so on) and maintain the same level of network accessibility between the clusters and the network nodes.

  • Geo availability of Nodes: Physical data centers must not share any common infrastructure, such as power and network connectivity. It is recommended to place them in different availability zones (AZ) or regions to avoid a single point of failure that could impact all the VM nodes.

  • Network Availability: To keep the clusters synchronized, the network link between the data centers must meet the availability and latency requirements detailed later in this chapter. A quick way to spot-check link latency is shown after this list.
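
To illustrate this spot-check, here is a minimal Python sketch that measures the average round-trip time to a peer site with ping and compares it against a placeholder threshold. The peer host name and the threshold value are illustrative assumptions, not values from this guide; use the actual requirements detailed later in this chapter.

    # Minimal sketch: spot-check round-trip latency to the peer data center.
    # PEER_HOST and MAX_RTT_MS are illustrative assumptions, not values from
    # this guide; substitute the requirements detailed later in this chapter.
    import re
    import subprocess

    PEER_HOST = "standby-dc.example.com"   # hypothetical peer-site address
    MAX_RTT_MS = 100.0                     # placeholder threshold

    def average_rtt_ms(host: str, count: int = 5) -> float:
        """Run ping and parse the average round-trip time in milliseconds."""
        out = subprocess.run(
            ["ping", "-c", str(count), host],
            capture_output=True, text=True, check=True,
        ).stdout
        # Linux ping prints a summary line: "rtt min/avg/max/mdev = a/b/c/d ms"
        match = re.search(r"= [\d.]+/([\d.]+)/", out)
        if not match:
            raise RuntimeError("could not parse ping output")
        return float(match.group(1))

    rtt = average_rtt_ms(PEER_HOST)
    print(f"average RTT to {PEER_HOST}: {rtt:.1f} ms")
    if rtt > MAX_RTT_MS:
        print("WARNING: link latency exceeds the assumed threshold")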

Deployment scenarios

  • Traditional geo redundancy: active and standby sites.

  • Arbiter-based geo redundancy: active, standby, and arbiter sites.


    Important


    You can deploy a geo-HA setup with AZ1 alone, without AZ2 or the arbiter node (AZ3). Although AZ1 starts successfully in this scenario, the setup defeats the purpose of HA, which requires at least AZ1 and AZ2. Keep in mind that this configuration significantly limits the geo-HA functionality offered by the Crosswork Network Controller and results in these issues:

    • Attempting to save the cross-cluster settings triggers an error pop-up that indicates the settings failed to save on the peer cluster(s).

    • The sync and switchover buttons remain active in the drop-down menu on the cross-cluster UI window, regardless of whether peer clusters are present. However, if you click either button to initiate a job, the operation fails with error messages indicating the absence of peer cluster(s).


Auto-arbitration in Crosswork Network Controller

Auto-arbitration is a functionality in Crosswork Network Controller that automates the switchover process between active and standby clusters during communication failures, eliminating the need for manual intervention. It introduces an arbiter VM (AZ3), which acts as a mediator to prevent split-brain scenarios and ensures the integrity of the system.

Figure 1. Cross cluster with active, standby, and arbiter AZs

Key features

  1. Split-brain scenario prevention:

    • In a geo redundant setup, communication failures can cause the standby cluster to mistakenly assume the active cluster is down, leading to both clusters becoming active (split-brain scenario). This can result in data loss if one cluster fails.

    • Auto-arbitration uses an arbiter VM and a leader election protocol to prevent such conflicts by ensuring consensus on the active cluster.

  2. Arbiter VM role:

    • The arbiter VM is deployed on a single VM with a small resource footprint.

    • It provides the essential quorum needed to form a majority vote for electing a leader among the three AZs (see the sketch after this list).

    • It includes only essential infrastructure components and does not allow application installations after deployment.


    Important


    By design, the arbiter VM in the geo HA cluster is a minimal node used primarily for quorum voting. It does not host workload services like other cluster nodes. By default, only the admin user is created on the arbiter node for administrative operations. The arbiter VM does not synchronize AAA users or settings from other nodes. However, you can create and manage additional users independently on the arbiter VM if needed.


  3. Switchover automation: When auto-arbitration mode is enabled, the elected cluster leader automates these three switchover steps, whether the switchover is triggered manually or by a failure in a network node or link (see the sketch at the end of this topic):

    • Updates the active cluster's role to standby.

    • Updates the standby cluster's role to active.

    • Updates the DNS FQDN records for data and management to point to the new active cluster.

    For more information, see Auto-arbitration workflow.

  4. Day 0 and day N deployments:

    • Day 0: All three clusters (active, standby, and arbiter) are deployed and configured sequentially on day 0. For more information, see Geo redundancy workflow (Day 0).

    • Day N: You can add an arbiter VM to an existing two-cluster geo-redundant model (active and standby) by reimporting the updated cross cluster inventory file, which includes new parameters for the arbiter VM. For more information, see Geo redundancy workflow (Day N).
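
The following minimal Python sketch illustrates the quorum idea behind the majority vote described in items 1 and 2: with three members, any two that agree form a majority, so an isolated cluster can never elect itself as leader. This is a conceptual illustration only, not Crosswork's actual leader election protocol.

    # Conceptual sketch of quorum-based leader election among three AZs.
    # This illustrates the majority-vote idea, not Crosswork's actual protocol.

    MEMBERS = ("AZ1", "AZ2", "AZ3")  # active, standby, and arbiter

    def elect_leader(votes: dict) -> str | None:
        """Return the candidate holding a strict majority of the membership."""
        quorum = len(MEMBERS) // 2 + 1  # 2 of 3
        tally = {}
        for voter, candidate in votes.items():
            tally[candidate] = tally.get(candidate, 0) + 1
        for candidate, count in tally.items():
            if count >= quorum:
                return candidate
        return None  # no majority: no cluster may assume the active role

    # AZ2 has lost contact with AZ1 and votes for itself, but AZ1 and the
    # arbiter (AZ3) still agree, so AZ1 remains the leader and no split-brain
    # scenario occurs.
    print(elect_leader({"AZ1": "AZ1", "AZ2": "AZ2", "AZ3": "AZ1"}))  # AZ1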

Auto-arbitration is a critical enhancement for ensuring reliable and fail-safe operations in distributed environments, enabling seamless cluster management with minimal manual effort.
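
To make the three switchover steps in item 3 concrete, here is a minimal Python sketch of the sequence the elected leader automates: demote the old active cluster, promote the standby, and repoint the data and management FQDN records. The function names and FQDNs are hypothetical placeholders and do not correspond to an actual Crosswork API.

    # Minimal sketch of the three automated switchover steps. All names below
    # are hypothetical placeholders, not an actual Crosswork API.

    def set_cluster_role(cluster: str, role: str) -> None:
        print(f"[{cluster}] role -> {role}")      # stands in for a real API call

    def update_dns_records(fqdns: list, target_cluster: str) -> None:
        for fqdn in fqdns:
            print(f"{fqdn} -> {target_cluster}")  # stands in for a DNS update

    def switchover(active: str, standby: str) -> None:
        """Run the three steps in order so only one cluster is ever active."""
        set_cluster_role(active, "standby")       # step 1: demote the old active
        set_cluster_role(standby, "active")       # step 2: promote the standby
        update_dns_records(                       # step 3: repoint FQDN records
            ["data.crosswork.example.com", "mgmt.crosswork.example.com"],
            standby,
        )

    switchover(active="AZ1", standby="AZ2")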