Manage the Crosswork Cluster

This section contains the following topics:

Cluster Management Overview

The Cisco Crosswork platform uses a cluster architecture. The cluster distributes platform services across a unified group of virtual machine (VM) hosts, called nodes. The underlying software architecture distributes processing and traffic loads across the nodes automatically and dynamically. This architecture helps Cisco Crosswork respond to how you actually use the system, allowing it to perform in a scalable, highly available, and extensible manner.

A single Crosswork cluster consists of a minimum of three nodes, all operating in a hybrid configuration. These three hybrid nodes are mandatory for all Cisco Crosswork deployments. If you have more demanding scale requirements, you can add up to two worker nodes. For more information, see Deploy New Cluster Nodes.
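The composition rules above (three mandatory hybrid nodes, plus up to two optional worker nodes) can be sketched as a simple validation check. This is an illustrative sketch only; the function and node-type labels are not part of any Crosswork API.

```python
# Sketch: validate a planned Crosswork cluster composition against the
# documented rules: at least three hybrid nodes (mandatory), and at most
# two worker nodes. Names here are illustrative, not Crosswork code.

def validate_cluster(nodes):
    """nodes: list of 'hybrid'/'worker' strings. Returns (ok, reason)."""
    hybrids = nodes.count("hybrid")
    workers = nodes.count("worker")
    if hybrids < 3:
        return False, f"need at least 3 hybrid nodes, got {hybrids}"
    if workers > 2:
        return False, f"at most 2 worker nodes allowed, got {workers}"
    if hybrids + workers != len(nodes):
        return False, "unknown node type present"
    return True, "ok"
```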

Only users assigned the admin role, or a role with the proper permissions, can access the cluster configuration.

Table 1. Cluster Overview

Action

Description

Navigation

Use the Crosswork Manager window to check the health of the cluster. To display this window, from the main menu, choose Administration > Crosswork Manager.

Crosswork Summary window

The Crosswork Manager window gives you summary information about the status of the nodes, the Platform Infrastructure, and the applications you have installed.

Figure 1. Crosswork Manager (cluster deployment)
Crosswork Manager Window

Cluster Management window

Note

 

Applicable only when Crosswork is deployed as a cluster.

Click on the System Summary tile to see the details of the nodes in the cluster.

Figure 2. Cluster Management Window
Cluster Management window

The top left section of the window provides details about the cluster while the top right provides details about overall cluster resource consumption. The bottom section breaks down the resource utilization by node, with a separate detail tile for each node. The window shows other details, including the IP addresses in use, whether each node is a hybrid or worker, and so on.

Note

 

When the Crosswork Network Controller is deployed on AWS EC2, the VM status initially appears as "unknown" by default. If you update the inventory file, the status changes to "initializing." This behavior is normal for EC2 deployments.

On the top-right corner, click the View more visualizations link to Visually Monitor System Functions in Real Time.

To see details for a specific node, click on the tile of the node, and choose View Details. The VM Node window displays the node details, including the list of components, microservices, and alarms running on the node.

  • To request metrics or logs, click under the Action column, and select the relevant option.

  • To restart a microservice, click under the Action column, and choose Restart.

Figure 3. Node details
VM Node Details window

For information on how to use the Crosswork Health tab, see Monitor Platform Infrastructure and Application Health.

Single VM-Based Crosswork Network Controller

Starting with the 7.0 release, Crosswork can also be deployed on a single VM that delivers all of the platform functionality but supports a limited number of devices. When deployed as a single VM, all functions run on one machine with limited redundancy.

Table 2. Single VM overview

Action

Description

Navigation

Use the Crosswork Manager window to check the health of the cluster. To display this window, from the main menu, choose Administration > Crosswork Manager.

Crosswork Summary window

The Crosswork Manager window gives you summary information about the status of the nodes, the Platform Infrastructure, and the applications you have installed.

Figure 4. Crosswork Summary (single VM deployment)
Crosswork Summary

System Summary window

Note

 

Applicable only when Crosswork is deployed on a single VM.

Click on the System Summary tile to see the VM details.

Figure 5. VM details
VM details

Note


  • If one of the hybrid nodes is faulty, along with one or more worker nodes and applications, try the Clean System Reboot procedure described in Cluster System Recovery.

  • If more than one hybrid node is faulty, follow the Redeploy and Recover procedure described in Cluster System Recovery.

  • On the Cluster Management window, it is normal to see some deviation in the last_updated_time values across the nodes in the cluster, depending on when each node's data was last updated.


View and Edit Data Center Credentials

This section explains the procedure to view and edit the credentials for the data center (such as VMware vCenter) where Cisco Crosswork is deployed.

Before you begin

Ensure you have the current credentials for vCenter.


Note


If you have changed your password since Crosswork was originally deployed, you may need to update the stored credentials that Crosswork uses when deploying new VMs.


Procedure


Step 1

From the main menu, choose Administration > Crosswork Manager.

Step 2

On the Crosswork Summary tab, click the System Summary tile to display the Cluster Management window.

Step 3

Choose Actions > View/Edit Data Center to display the Edit Data Center window.

The Edit Data Center window displays details of the data center.

Step 4

Use the Edit Data Center window to enter values for the Access fields: Address, Username, and Password.

Step 5

Click Save to save the data center credential changes.


Import Cluster Inventory

If you have installed your cluster manually using the vCenter UI (without the help of cluster installer tool), you must import an inventory file (.tfvars file) to Cisco Crosswork to reflect the details of your cluster. The inventory file contains information about the VMs in your cluster along with the data center parameters.


Attention


Crosswork cannot deploy or remove VM nodes in your cluster until you complete this operation.



Note


Uncomment the "OP_Status" parameter before importing the cluster inventory file manually. If you do not, the status of the VM will incorrectly appear as "Initializing" even after the VM becomes functional.
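The OP_Status fix called out in the note above can be applied with a small script before import. The parameter name comes from the note; the .tfvars layout shown in the sketch is illustrative and may differ from your template.

```python
# Sketch: uncomment any commented-out "OP_Status" line in a .tfvars
# inventory file before importing it into Crosswork. The file layout
# here is an assumption; check your downloaded template for the real one.

def uncomment_op_status(text):
    """Return the inventory text with '# OP_Status ...' lines uncommented."""
    out = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#") and "OP_Status" in stripped:
            line = stripped.lstrip("#").lstrip()
        out.append(line)
    return "\n".join(out)
```

Run it over the file contents before uploading the inventory through Actions > Import Cluster Inventory.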


Procedure


Step 1

From the main menu, choose Administration > Crosswork Manager.

Step 2

On the Crosswork Summary tab, click the System Summary tile to display the Cluster Management window.

Step 3

Choose Actions > Import Cluster Inventory to display the Import Cluster Inventory dialog box.

Step 4

(Optional) Click Download sample template file to download and edit the template. For more details on the installation parameters, see the Installation Parameters section in the Crosswork Network Controller 7.0 Installation Guide.

Step 5

Click Browse and select the cluster inventory file.

Step 6

Click Import to complete the operation.


Deploy New Cluster Nodes

As your network expands and you install additional Crosswork applications, it may become necessary to add more resources to handle the increasing workload. This topic explains how to deploy a new VM node.

The steps to deploy a new node via the UI and the API are essentially the same. For details on using the API, see cluster APIs. This guide presents only the UI procedure.
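For readers using the API path instead of the UI, the request shape can be sketched as below. The endpoint path, port, and payload field names are hypothetical placeholders; consult the cluster API reference for the actual schema and authentication details.

```python
# Sketch: building a node-deployment request for the Crosswork cluster
# API. Every URL segment and field name below is a placeholder assumption,
# not the documented API contract.

def build_deploy_request(host, node_type, mgmt_ip, data_ip, datastore):
    url = f"https://{host}:30603/crosswork/v1/cluster/nodes"  # placeholder path
    payload = {
        "node_type": node_type,       # typically "worker" for expansion
        "management_ip": mgmt_ip,
        "data_ip": data_ip,
        "datastore": datastore,
    }
    return url, payload

# A real call would then look something like:
#   import requests
#   url, payload = build_deploy_request("cw.example.com", "worker",
#                                       "192.0.2.20", "198.51.100.20", "ds1")
#   requests.post(url, json=payload, headers=auth_headers)
```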


Important


If you installed your cluster manually, you must import the cluster inventory file to Cisco Crosswork before you can deploy a new node. For more information, see Import Cluster Inventory. The Deploy VM option will be disabled until you complete the import operation.


Before you begin

You must know the following:

  • Details about the Cisco Crosswork network configuration, such as the management IP address.

  • Details about the VMware host where you are deploying the new node, such as the data store and data VM interface IP address.

  • The type of node you want to add. Your cluster can have a minimum of three hybrid nodes and up to two worker nodes.

Procedure


Step 1

From the main menu, choose Administration > Crosswork Manager.

Step 2

On the Crosswork Summary tab, click the System Summary tile to display the Cluster Management window.

Note

 

The Crosswork Summary window and the Cluster Management window display information about your cluster. While both windows display the status of the same cluster, there may be slight mismatches in the representation. This occurs because the Crosswork Summary window displays the node status based on Kubernetes, while the Cluster Management window also considers the node status in the data center.

An example of this mismatch is when a worker node deployment fails in the Crosswork UI due to insufficient data center resources. In this case, the status of the failed worker node is displayed as "degraded" in the Cluster Management window, while the same status appears as "down" in the Crosswork Summary window.

Step 3

Choose Actions > Deploy VM to display the Deploy VM Node window.

Figure 6. Deploy VM Node Window
Deploy VM Node

Step 4

Fill the relevant values in the fields provided.

Step 5

Click Deploy. The system starts to provision the new node in VMware. Cisco Crosswork adds a tile for the new node in the Crosswork Manager window. The tile displays the progress of the deployment.

You can monitor the node deployment status by choosing Cluster Management > Actions > View Job History, or from the VMware user interface.

If you have added the VM node using Cisco Crosswork APIs: On the newly added VM node tile, click and choose Deploy to complete the operation.

Step 6

If this node was added to reduce the heavy load (running > 90%) on the existing nodes, you can rebalance the resources (see Rebalance Cluster Resources for details), or restart some processes to force the system to move them to the newly added node.


Rebalance Cluster Resources

As part of cluster management, Crosswork constantly monitors the resource utilization on each cluster node. If the CPU utilization on any node becomes high, Crosswork triggers a notification prompting you to take action. The alarm system has two levels: the first alarm triggers at around 70-80% usage, prompting you to plan for adding worker nodes (see Deploy New Cluster Nodes). Ideally, deploy new nodes before usage exceeds 90% to avoid performance issues. If you already have 5 or 6 nodes and still face resource shortages, contact the Cisco Customer Experience team.
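The two-level alarm behavior described above can be sketched as a simple threshold check. The 70% and 90% cutoffs mirror the text; the function itself is illustrative, not Crosswork code.

```python
# Sketch of the two-level CPU alarm described above: a warning band
# starting around 70% usage (plan for worker nodes) and a critical band
# above 90% (nodes should already have been added, or rebalancing is due).

def cpu_alarm_level(cpu_percent):
    if cpu_percent >= 90:
        return "critical: add worker nodes now or rebalance"
    if cpu_percent >= 70:
        return "warning: plan to add worker nodes"
    return "ok"
```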

You can then use the Rebalance feature to reallocate the resources between the existing VM nodes in your cluster. Rebalancing is necessary if some nodes are busier than others. When a new worker is added, active workloads are not automatically moved to it, so rebalancing is required.


Caution


Rebalancing can take 15 to 30 minutes, during which the Crosswork applications will be unavailable. Once initiated, a rebalance operation cannot be canceled.


Before you begin

  • Crosswork must be in maintenance mode before rebalancing to ensure data integrity.

  • Any users logged in during the rebalancing will lose their sessions. Notify other users beforehand that you intend to put the system in maintenance mode for rebalancing, and give them time to log out. You can use the Active Sessions window (Administration > Users and Roles > Active sessions tab) to see who is currently logged in (or sessions that were abandoned and have not been cleaned up yet).

Procedure


Step 1

From the main menu, choose Administration > Crosswork Manager.

Step 2

On the Crosswork Summary tab, click the System Summary tile to display the Cluster Management window.

This procedure uses a sample cluster (day0-control) with three hybrid nodes and one worker node, where CPU utilization is high on one of the hybrid nodes (100% on cw-tb2-cluster-01). See the image below for details.

The image below shows this four-node cluster with a banner recommending that you add more worker nodes.

Figure 7. Rebalance notification

On the tile for the node, you can click and choose View Details to see more details.

Step 3

Click Rebalance, and the Rebalance Requirements are displayed. Read through the requirements and select the two check boxes once you are ready to start the rebalancing.

Figure 8. Rebalancing Requirements

Step 4

Click Rebalance to initiate the process. Crosswork begins to reallocate the resources on the overutilized VM node to the other nodes in the cluster.

A dialog box indicating the status of rebalancing is displayed. Wait for the process to complete.

Figure 9. Rebalancing Status

Step 5

After the rebalancing process completes, you may see one of the following results:

  • Success scenario: A dialog box indicating a successful rebalancing operation is displayed. Follow the instructions in the dialog box to proceed.

    Figure 10. Rebalancing Result - Success
  • Failure scenario - scope available to add new worker nodes: A dialog box indicating rebalancing failure is displayed. In this case, the system prompts you to add a new worker node and try the rebalance process again.

    Figure 11. Rebalancing Result - Add new Worker node
  • Failure scenario - no scope to add new worker nodes: A dialog box indicating rebalancing failure is displayed. In this case, the system prompts you to contact the TAC as new worker nodes cannot be added.

    Figure 12. Rebalancing Result - Contact TAC

View Job History

Use the Job History window to track the status of jobs, such as deploying a VM or importing cluster inventory.

Procedure


Step 1

From the main menu, choose Administration > Crosswork Manager.

Step 2

On the Crosswork Summary tab, click the System Summary tile to display the Cluster Management window.

Step 3

Choose Actions > View Job History.

The Job History window displays a list of cluster jobs. You can filter or sort the Jobs list using the fields provided: Status, Job ID, VM ID, Action, and Users.

Step 4

Click any job to view it in the Job Details panel at the right.


Export Cluster Inventory

Use the cluster inventory file to monitor and manage your Cisco Crosswork cluster.

Procedure


Step 1

From the main menu, choose Administration > Crosswork Manager.

Step 2

On the Crosswork Summary tab, click the System Summary tile to display the Cluster Management window.

Step 3

Choose Actions > Export Cluster Inventory.

Cisco Crosswork downloads the cluster inventory gzip file to your local directory.
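Once downloaded, the exported archive can be inspected offline. The text above describes the export only as a gzip file; the sketch below assumes a plain gzip-compressed text inventory, which may differ from what your deployment produces.

```python
# Sketch: read an exported Crosswork cluster inventory archive. Assumes
# the export is gzip-compressed text; verify against your actual export
# before relying on this format.

import gzip

def read_inventory(path):
    """Return the decompressed text content of a .gz inventory file."""
    with gzip.open(path, "rt") as f:
        return f.read()
```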


Retry Failed Nodes

Node deployments with incorrect information can fail. After providing the correct details, you can retry the deployment.

Procedure


Step 1

From the main menu, choose Administration > Crosswork Manager.

Step 2

On the Crosswork Summary tab, click the System Summary tile to display the Cluster Management window.

Figure 13. Cluster Management Window: Failed VM Deployment
Retry Failed Nodes

Step 3

Click Retry on the failed node tile to display the Deploy New Node window.

Step 4

Provide corrected information in the fields provided.

Step 5

Click Deploy.


Erase Nodes

As an administrator, you can erase (that is, remove or delete) any failed or healthy node from the Cisco Crosswork cluster. Erasing a node removes the node reference from the Cisco Crosswork cluster and deletes it from the host VM.

The steps to erase a node are the same for both hybrid and worker nodes. However, the number of nodes you can erase, and when you can erase them, differ in each case:

  • The system must maintain three operational hybrid nodes at all times. If one of the hybrid nodes stops functioning, Crosswork will attempt to compensate; however, system performance and protection against further failures will be severely impacted. In such cases, erase the faulty node and deploy a new hybrid node to replace it.

  • You can have up to two worker nodes. While you can erase all of them without consequences, we recommend that you erase and replace them one at a time.

  • If you are still having trouble after taking these steps, contact the Cisco Customer Experience team for assistance.


Warning


  • Erasing a node is a disruptive action and can block some processes until the action is completed. To minimize disruption, conduct this activity during a maintenance window only.

  • Removing worker and hybrid nodes places extra workload on the remaining nodes and can impact system performance. We recommend contacting the Cisco Customer Experience team before removing nodes.

  • While removing a hybrid or worker node, the Cisco Crosswork UI may become unreachable for 1-2 minutes due to the relocation of the robot-ui pod to a new node.



Note


If you installed the cluster manually, you must erase the VM from the Crosswork UI and then delete the VM from the data center (for example, vCenter).


Procedure


Step 1

From the main menu, choose Administration > Crosswork Manager.

Step 2

On the Crosswork Summary tab, click the System Summary tile to display the Cluster Management window.

Step 3

On the tile for the node you want to remove, click and select Erase to display the Erase VM Node dialog box.

Step 4

Click Erase again to confirm the action.

Note

 

A removed node will continue to be visible in the Grafana dashboard as an entry with only historical data.


Manage Maintenance Mode Settings

Maintenance mode provides a means for shutting down the Crosswork system temporarily. The maintenance mode shutdown is graceful. Crosswork synchronizes all application data before the shutdown.

It can take several minutes for the system to enter maintenance mode and to restart when maintenance mode is turned off. During these periods, users should not attempt to log in or use the Crosswork applications.

Before you begin


Attention


  • Make a backup of your Crosswork cluster before enabling the maintenance mode.

  • Notify other users that you intend to put the system in maintenance mode and give them a deadline to log out. The maintenance mode operation cannot be canceled once you initiate it.


Procedure


Step 1

To put Crosswork in maintenance mode:

  1. From the main menu, choose Administration > Settings > System Settings > Maintenance Mode.

  2. Drag the Maintenance slider to the right, or On position.

  3. Crosswork warns you that it is about to initiate a shutdown. Click Continue to confirm your choice.

    It can take several minutes for the system to enter maintenance mode. During that period, other users should not attempt to log in or use the Crosswork applications.

    Note

     

    If you wish to reboot the cluster, wait for 5 minutes after the system has entered maintenance mode to allow the Cisco Crosswork database to sync before proceeding.

Step 2

To restart Crosswork from maintenance mode:

  1. From the main menu, choose Administration > Settings > System Settings > Maintenance Mode.

  2. Drag the Maintenance slider to the left, or Off position.

    It can take several minutes for the system to restart. During this period, users should not attempt to log in or use the Crosswork applications.

    Note

     

    If a reboot or restore was performed while the system was in maintenance mode, the system boots up in maintenance mode and prompts you with a popup window to toggle maintenance mode off. If you do not see the prompt (even though the system was rebooted while in maintenance mode), toggle maintenance mode on and then off to allow the applications to function normally.


Cluster System Recovery

Before you Begin

  • For cluster recovery, it is essential to have a recent backup.

  • The cluster you are restoring should have the same operational architecture, including the same number of hybrid and worker nodes.

When System Recovery Is Needed


Caution


The methods explained in this topic may fail if your cluster profile consists of only 3 hybrid VM nodes and no worker nodes. The failure is due to the reduced VM resiliency when no worker nodes are present.


At some time during normal operations of your Cisco Crosswork cluster, you may find that you need to recover the entire system. This can be the result of one or more malfunctioning nodes, one or more malfunctioning services or applications, or a disaster that destroys the hosts for the entire cluster.

A functional cluster requires a minimum of three hybrid nodes. These hybrid nodes share the processing and traffic loads imposed by the core Cisco Crosswork management, orchestration, and infrastructure services. The hybrid nodes are highly available and able to redistribute processing loads among themselves, and to worker nodes, automatically.

The cluster can tolerate one hybrid node reboot (whether graceful or ungraceful). During the hybrid node reboot, the system is still functional, but degraded from an availability point of view. The system can tolerate any number of failed worker nodes, but again, system availability is degraded until the worker nodes are restored.

Cisco Crosswork generates alarms when nodes, applications, or services are malfunctioning. If you are experiencing system faults, examine the alarm and check the health of the individual node, application, or service identified in the alarm. You can use the features described in Cluster Management Overview to drill down on the source of the problem and, if it turns out to be a service fault, restart the problem service.

If you see alarms indicating that one hybrid node has failed, or that one hybrid node and one or more worker nodes have failed, start by attempting to reboot or replace (erase and then re-add) the failed nodes. If you are still having trouble after that, consider performing a clean system reboot.

The loss of two or more hybrid nodes is a double fault. Even if you replace or reboot the failed hybrid nodes, there is no guarantee that the system will recover correctly. There may also be cases where the entire system has degraded to a bad state. For such states, you can deploy a new cluster, and then recover the entire system using a recent backup taken from the old cluster.


Important


  • Unintentional VM shutdown is not supported on a 3-VM cluster running the Crosswork Network Controller solution. If a VM fails, the remaining two VMs cannot support all the pods migrated from the failed VM. You must deploy additional worker nodes to allow the system to tolerate a VM shutdown.

  • Rebooting one of the VMs is supported in a 3-VM cluster. In case of a reboot, the VM restore can take from 5 minutes (if the orch pod is not running on the rebooted VM) up to 25 minutes (if the orch pod is running on the rebooted VM).


The following two sections describe the steps to follow in each case.

Clean System Reboot (VMware)

Follow these steps to perform a clean system reboot:

  1. Put Crosswork in Maintenance mode. See Manage Maintenance Mode Settings for more details.


    Note


    (Optional) Before switching to maintenance mode, shut down the Crosswork Data Gateways and any other non-essential components (such as NSO and SR-PCE) that communicate with Crosswork.


  2. Power down the VM hosting each node:

    1. Log in to the VMware vSphere Web Client.

    2. In the Navigator pane, right-click the VM that you want to shut down.

    3. Choose Power > Power Off.

    4. Wait for the VM status to change to Off.

  3. Repeat Step 2 for each of the remaining VMs, until all the VMs are shut down.

  4. Power up the VM hosting the first of your hybrid nodes:

    1. In the Navigator pane, right-click the VM that you want to power up.

    2. Choose Power > Power On.

    3. Wait for the VM status to change to On, then wait another 30 seconds before continuing.

  5. Repeat Step 4 for each of the remaining hybrid nodes, staggering the reboot by 30 seconds before continuing. Then continue with each of your worker nodes, again staggering the reboot by 30 seconds.

  6. The time taken for all the VMs to power on can vary based on the performance characteristics of your hardware. After all VMs are powered on, wait a few minutes and then log in to Crosswork.

  7. Move Crosswork out of Maintenance mode. See Manage Maintenance Mode Settings for more details.


    Note


    If your Crosswork cluster is not in a healthy state, attempts to force maintenance mode will likely fail. Even if the attempt succeeds, application sync issues may still occur; in that case, alarms are generated listing the failed services and the failure reason. If you encounter this scenario, you can still proceed with the "Redeploy and Restore" method described below.


  8. Restart the Crosswork Data Gateways and any other components in your ecosystem that communicate with Crosswork.
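The staggered power-up in steps 4 and 5 above (hybrid nodes first, then worker nodes, each 30 seconds apart) can be sketched as a schedule generator. The VM names are illustrative; the actual power-on is performed in the vSphere client, or via a CLI of your choice.

```python
# Sketch: compute the staggered power-on order from the clean-reboot
# procedure: all hybrid nodes first, then worker nodes, with each start
# delayed 30 seconds after the previous one. VM names are placeholders.

def power_on_schedule(hybrids, workers, stagger_s=30):
    """Return (vm_name, delay_seconds) pairs in power-on order."""
    ordered = list(hybrids) + list(workers)
    return [(vm, i * stagger_s) for i, vm in enumerate(ordered)]
```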

Redeploy and Restore (VMware)

Follow these steps to redeploy and recover your system from a backup. Note that this method assumes you have taken periodic backups of your system before it needed recovery. For information on how to take backups, see Manage Crosswork Network Controller Backup and Restore.

  1. Power down the VM hosting each node:

    1. Log in to the VMware vSphere Web Client.

    2. In the Navigator pane, right-click the VM that you want to shut down.

    3. Choose Power > Power Off.

    4. Wait for the VM status to change to Off.

    5. Repeat these steps as needed for the remaining nodes in the cluster.

  2. Once all the VMs are powered down, delete them:

    1. In the VMware vSphere Web Client Navigator pane, right-click the VM that you want to delete.

    2. Choose Delete from Disk.

    3. Wait for the VM status to change to Deleted.

    4. Repeat these steps as needed for the remaining VM nodes in the cluster.

  3. Deploy a new Cisco Crosswork cluster, as explained in the Cisco Crosswork Network Controller 7.0 Installation Guide.

  4. Recover the system state to the newly deployed cluster, as explained in Restore Crosswork Network Controller After a Disaster.

Collect Cluster Logs and Metrics

As an administrator, you can monitor or audit the components of your Cisco Crosswork cluster by collecting periodic logs and metrics for each cluster component. These components include the cluster as a whole, the individual nodes in the cluster, and the microservices running on each node.

Crosswork Network Controller provides logs and metrics using the following showtech options:

  • Request All to collect both logs and metrics.

  • Request Metrics to collect only metrics.

  • Collect Logs to collect only logs.

  • View Showtech Jobs to view all showtech jobs.


    Note


    Showtech logs must be collected separately for each application.


Procedure


Step 1

From the main menu, choose Administration > Crosswork Manager.

Step 2

On the Crosswork Summary tab, click the System Summary tile to display the Cluster Management window.

Step 3

To collect logs and metrics for the cluster, click Actions and select the showtech option that you want to perform.

Step 4

To collect logs and metrics for any node in the cluster:

  1. Click the node tile.

  2. Click Showtech Options and select the operation that you want to perform.

Step 5

To collect logs and metrics for the individual microservices running on the VM node, click under the Actions column. Then select the showtech option that you want to perform.

Step 6

Click View Showtech Jobs to view the status of your showtech jobs. The Showtech Requests window displays the details of the showtech jobs.

  1. Under the Actions column, click , and select Publish to publish the showtech logs. The Publish Details dialog box is displayed. Enter the relevant details and click Publish.

  2. Under the Actions column, click , and select Delete to delete the showtech log.

  3. In the Showtech Requests window, click Details to view details of the showtech log publishing.