Geo Redundancy Requirements

This chapter contains the following topics:

Crosswork Cluster Requirements

The geo redundancy solution requires double the number of VMs needed for a regular Crosswork cluster installation. For more information, see Installation Prerequisites for VMware vCenter.

Important Notes

  • To ensure synchronization between the clusters, the network connection between the data centers should be set up with a minimum bandwidth of 5 Gbps and a latency of less than 100 milliseconds.

  • While preparing a geo inventory file, you must include details of the cluster constituents along with their connectivity information.

  • Configure the DNS server to resolve the unified multi-cluster FQDN (data and management) domain that you want to use (for example, *.cw.cisco). The DNS server must be reachable from both clusters, the Crosswork Data Gateway, NSO, and SR-PCE. For more information on the DNS setup procedure, see the Cisco Prime Network Registrar Caching and Authoritative DNS User Guide. A resolution check that you can run after the DNS setup is shown after this list.

  • The DNS server should forward any outside domains to the external DNS servers.

  • In dual stack deployments, the Crosswork cluster and Crosswork Data Gateway must be configured in dual stack mode, with the geo FQDN (data and management) pointing to the dual stack IP addresses.

  • Bring up the active cluster, standby cluster, and arbiter VM sequentially using the existing installer mechanism. Ensure that you use the previously identified DNS server during the installation of the Crosswork cluster, Crosswork Data Gateway, and NSO. It is recommended to have multiple DNS servers, with one in each data center.

  • Applications should be installed only after enabling geo redundancy and completing the initial on-demand synchronization. Begin by installing applications on the active cluster, followed by the standby cluster. Other configuration information (such as devices, providers, or destinations) must be onboarded only on the active cluster and will be synchronized between the clusters as part of the activation process.

  • You should ensure that the Day 0 inventory is onboarded before enabling geo redundancy.

  • The NSO in HA mode requires provider connectivity onboarding via an L3 VIP or FQDN in the Crosswork Network Controller.

  • Before enabling geo redundancy mode, it is recommended that you back up both the active and standby clusters.

  • An arbiter VM requires a resource footprint of 8 vCPUs, 48 GB of RAM, and 650 GB of storage. For installation instructions, see Deploy an arbiter VM.
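
The following is a minimal resolution check for the unified FQDN mentioned in the DNS note above. It assumes the example domain *.cw.cisco; the host labels (cw-mgmt, cw-data) and the DNS server address 192.0.2.53 are placeholders for illustration only. Run the check from each cluster, the Crosswork Data Gateway, and the NSO host to confirm that the FQDNs resolve to the active cluster's VIPs:

  # Query the unified management and data FQDNs against the configured DNS server
  nslookup cw-mgmt.cw.cisco 192.0.2.53
  nslookup cw-data.cw.cisco 192.0.2.53

  # For dual stack deployments, verify that both A (IPv4) and AAAA (IPv6) records resolve
  dig @192.0.2.53 cw-mgmt.cw.cisco A +short
  dig @192.0.2.53 cw-mgmt.cw.cisco AAAA +short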


Warning


Once geo redundancy mode is set up, it cannot be undone because the certificates are regenerated using a common root CA. To revert to non-geo redundancy mode, you must restore the backup made before enabling geo redundancy mode.


Port requirements

Port management best practices

As a general best practice, disable any unused ports. To identify all open listening ports after installing and activating the applications, log in as a Linux CLI admin user on any Crosswork cluster VM and run the netstat -aln command.
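
For example, to list only the sockets in the LISTEN state (the grep filter is an optional convenience; the plain netstat -aln output shows all sockets):

  # Show all listening sockets and their port numbers
  netstat -aln | grep LISTEN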


Note


All IP addresses (including virtual IP addresses) used by the Crosswork cluster, Crosswork applications, and Crosswork Data Gateway must be reachable from one another (that is, it must be possible to ping to and from each of them).


Additional ports for Crosswork Network Controller cluster

Ports 30180 and 30190 are used in addition to those of a normal Crosswork cluster deployment. Both ports are used for database replication (PQ binary protocol) and are secured by SSL.

Required TCP or UDP ports

The TCP or UDP port numbers must be allowed through any external firewall or access control list (ACL) deployed by the data center administrator. Depending on the NIC deployment, these ports may apply to only one NIC or to both.
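
As an illustration only, assuming the data center edge uses an iptables-based packet filter (your environment may use a dedicated firewall appliance or different ACL syntax), the intersite database replication ports could be permitted as follows; 198.51.100.0/24 is a placeholder for the peer-site subnet:

  # Allow the geo redundancy database replication ports from the peer site
  iptables -A INPUT -p tcp -s 198.51.100.0/24 --dport 30180 -j ACCEPT
  iptables -A INPUT -p tcp -s 198.51.100.0/24 --dport 30190 -j ACCEPT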

Traffic types and ports for Crosswork communication with Data Gateway

Figure 1. Communication flow between Crosswork Network Controller and different components

The key processes involved in replication and accessibility between the sites are:

  • Crosswork VM inter-cluster checks are performed, with replication occurring over the eth0 management links between the two sites.

  • NSO replication occurs over the management links between the sites.

  • Crosswork cluster replication is asynchronous, while NSO replication is synchronous.

  • Devices and the active or standby Crosswork cluster at each site must be able to reach the Crosswork Data Gateway.

  • The Crosswork Network Controller NBI is accessible through its DNS FQDN.

Table 1. Two-NIC Crosswork communication with Crosswork Data Gateway: comparison of traffic types across sites

Traffic Type: The eth0 interfaces on the active and standby Crosswork clusters exchange bidirectional traffic with each other.

Ports Used: 30651, 30652, 30653, 30180, 30190, 22, 30603, 30649, and 30650.

The ports are used for the following purposes:

  • 30651: external east/west gRPC server communication.

  • 30653: Raft node-to-peer-node communication.

  • 22 and 30603: TCP ports for intersite communication, where 22 is used for SSH and 30603 for the HTTPS web server. Ensure that these ports are accessible over the eth0 interface on all sites, allowing each site to communicate with every other site through eth0.

  • 30649 and 30650: communication between internal services across sites.

Site 1 (Active): Crosswork eth0

Site 2 (Standby): Crosswork eth0

Site 3 (Arbiter): Crosswork eth0 (the arbiter node uses only ports 30651 and 30653)

Traffic Type: The eth0 interface on Crosswork Data Gateway (active and standby) and eth0 on Crosswork (active) exchange bidirectional traffic with each other.

The ports are used for the following purposes:

  • 30607: REST API calls to download the collector images, system resource, and custom resource bundle.

  • 30607: sending the Crosswork Data Gateway status information to Crosswork.

  • 30608: gRPC communication to send the Crosswork Data Gateway heartbeat.

Site 1 (Active): Crosswork Data Gateway eth0 (Management)

Site 2 (Standby): Crosswork Data Gateway eth0 (Management)

Site 3 (Arbiter): Not applicable

Traffic Type: The eth1 interface on Crosswork Data Gateway (active) and eth1 on Crosswork (active) exchange bidirectional traffic with each other.

The ports are used for the following purposes:

  • 30607: gRPC communication with Crosswork's Magellan service.

  • 30993, 30994, and 30995: Kafka ports used to distribute data to Crosswork's Kafka or gRPC destinations.

Site 1 (Active): Crosswork Data Gateway eth1 (Northbound Data)

Site 2 (Standby): Crosswork Data Gateway eth1 (Northbound Data)

Site 3 (Arbiter): Not applicable

Traffic Type: The eth1 interface on Crosswork Data Gateway (active) and eth1 on Crosswork (active) exchange bidirectional traffic with each other.

Ports Used: 30993, 30994, and 30995, for distributing Icon's data to the standby Crosswork's Kafka.

Site 1 (Active): Crosswork Data Gateway eth1 (Northbound Data)

Site 2 (Standby): Crosswork Data Gateway eth1 (Northbound Data)

Site 3 (Arbiter): Not applicable

Traffic Type: The eth1 interface on Crosswork Data Gateway (active) and the devices exchange bidirectional traffic with each other.

The ports are used for the following purposes:

  • 1062: SNMP traps.

  • 9514, 9898, and 6514: Syslog over UDP, TCP, and TLS, respectively.

Site 1 (Active): Crosswork Data Gateway eth1 (Southbound Data)

Site 2 (Standby): Crosswork Data Gateway eth1 (Southbound Data)

Site 3 (Arbiter): Not applicable
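
To confirm that an intersite port listed above is reachable between the sites, a quick connectivity probe can be run from a Crosswork VM at one site toward the peer site. This is a sketch only; it assumes the nc (netcat) utility is available, and the address 203.0.113.10 and port 30653 are placeholders for the peer-site management IP and the port you want to test:

  # Test TCP reachability of an intersite port on the peer site (nc exits 0 if the port is open)
  nc -zv 203.0.113.10 30653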

Table 2. Two-NIC Crosswork communication with three-NIC Crosswork Data Gateway: comparison of traffic types across sites

Traffic Type: The eth0 interfaces on the active and standby Crosswork clusters exchange bidirectional traffic with each other.

Ports Used: 30651, 30652, 30180, 30653, 30190, 22, and 30603.

The TCP ports are used for the following purposes:

  • 22 and 30603: intersite communication, where 22 is used for SSH and 30603 for the HTTPS web server. Ensure that these ports are accessible over the eth0 interface on all sites, allowing each site to communicate with every other site through eth0.

  • 30649 and 30650: communication between internal services across sites.

Site 1 (Active): Crosswork eth0

Site 2 (Standby): Crosswork eth0

Site 3 (Arbiter): Crosswork eth0 (the arbiter node uses only ports 30651 and 30653)

Traffic Type: The eth0 interface on Crosswork Data Gateway (active and standby) and eth0 on Crosswork (active) exchange bidirectional traffic with each other.

The ports are used for the following purposes:

  • 30607: REST API calls to download the collector images, system resource, and custom resource bundle.

  • 30607: sending the Crosswork Data Gateway vitals to Crosswork.

  • 30608: gRPC communication to send the Crosswork Data Gateway heartbeat.

Site 1 (Active): Crosswork Data Gateway eth0 (Management)

Site 2 (Standby): Crosswork Data Gateway eth0 (Management)

Site 3 (Arbiter): Not applicable

Traffic Type: The eth1 interface on Crosswork Data Gateway (active) and eth1 on Crosswork (active) exchange bidirectional traffic with each other.

The ports are used for the following purposes:

  • 30607: gRPC communication with Crosswork's Magellan service.

  • 30993, 30994, and 30995: Kafka ports used to distribute data to Crosswork's Kafka or gRPC destinations.

Site 1 (Active): Crosswork Data Gateway eth1 (Northbound Data)

Site 2 (Standby): Crosswork Data Gateway eth1 (Northbound and Southbound Data)

Site 3 (Arbiter): Not applicable

Traffic Type: The eth1 interface on Crosswork Data Gateway (active) and Crosswork (active) exchange bidirectional traffic with each other.

Ports Used: 30993, 30994, and 30995, for distributing Icon's data to the standby Crosswork's Kafka.

Site 1 (Active): Crosswork Data Gateway eth1 (Northbound Data)

Site 2 (Standby): Crosswork Data Gateway eth1 (Northbound and Southbound Data)

Site 3 (Arbiter): Not applicable

Traffic Type: The eth2 interface on Crosswork Data Gateway (active) and the devices exchange bidirectional traffic with each other.

The ports are used for the following purposes:

  • 1062: SNMP trap port.

  • 9514, 9898, and 6514: Syslog UDP, TCP, and TLS ports.

Site 1 (Active): Crosswork Data Gateway eth1 (Northbound Data)

Site 2 (Standby): Crosswork Data Gateway eth1 (Northbound and Southbound Data)

Site 3 (Arbiter): Not applicable

Unified Endpoint Requirements

  • A unified endpoint hides the multiple high-availability instances of the various components behind a single address.

  • DNS allows endpoints to be referenced by FQDN; each FQDN should point to the IP addresses of the active instance (data and management).

  • Domain zone provisioning is required so that the data and management VIPs of the Crosswork components are mapped to FQDNs.

  • The authoritative DNS server must have an A or AAAA record, for the IP addresses that you provide, in the domain zone dedicated to the Crosswork components. A record sketch is shown after this list.
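
The following is a minimal BIND-style sketch of such records, using the example domain cw.cisco from earlier in this chapter. The host labels (cw-mgmt, cw-data) and the addresses 203.0.113.20, 203.0.113.21, 2001:db8::20, and 2001:db8::21 are placeholders for illustration only; substitute the VIPs of your active cluster:

  ; Example A and AAAA records for the Crosswork unified FQDNs (placeholder values)
  cw-mgmt.cw.cisco.   IN  A      203.0.113.20
  cw-mgmt.cw.cisco.   IN  AAAA   2001:db8::20
  cw-data.cw.cisco.   IN  A      203.0.113.21
  cw-data.cw.cisco.   IN  AAAA   2001:db8::21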

Data Store Roles

The following roles apply to both Geo HA and non-Geo HA deployments when the Postgres or Timeseries data stores are operational:

  • Leader: should be in running status.

  • Replica: indicates a regular standby.

  • Sync Standby: similar to standby replica, but applicable when synchronous mode is ON in postgresql.conf.

For robot-postgres, when synchronous mode is ON, the replica is assigned the Sync Standby role. For cw-timeseries-db, when synchronous mode is OFF, the replica retains the Replica role. The state will be streaming on the respective replica to indicate that local replication is occurring correctly within the PG cluster.

Additionally, in Geo HA deployments, when Cross Cluster replication is enabled and the standby cluster is streaming from a remote active cluster, its leader will assume the Standby Leader role, and the status will be streaming.
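
As a generic way to inspect these roles and states directly from Postgres, you can query the built-in replication views. This is a sketch only; it assumes shell access to the node running the leader data store instance, that the psql client is available, and that the postgres superuser name applies, and it is not a substitute for the Crosswork UI views:

  # On the leader: list connected replicas, their replication state, and whether they are sync or async
  psql -U postgres -c "SELECT application_name, state, sync_state FROM pg_stat_replication;"

  # On a replica: returns 't' (true) when the instance is running as a standby
  psql -U postgres -c "SELECT pg_is_in_recovery();"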

Sync operation

One of the initial steps in a synchronization operation cycle involves setting up asynchronous data replication for local Postgres and Timescale data stores. The system first verifies the health status of the data store. It then aligns the Cross Cluster state and data store state to ensure consistency (for example, if the service cw-timeseries-db-0 is already active for Timescale DB, the system will assign the ACTIVE_REPLICATION_ROLE to the Timescale data store). After setting the role for the active side, the system will replicate this process for the standby side. Once asynchronous replication is complete, the system initiates backup and restore operations. These operations can also be monitored in the Backup and restore jobs window.

During a sync, the data stores in active and standby clusters have some expected roles. After the sync, the data store roles are verified to confirm a successful sync. The active cluster should contain one Leader with running status. See the example below of the Postgres and Timescale data stores on an active cluster.

Figure 2. Roles in an active cluster during sync

Similarly, the standby cluster should contain one Standby Leader with streaming status. See the example below of the Postgres and Timescale data stores on a standby cluster.

Figure 3. Roles in a standby cluster during sync