Programmable Closed-Loop Remediation

This section explains the following topics:

Overview

Objective

Detect anomalies and generate alerts that can be used for notifying an operator or triggering automation workflows.

Challenge

Discovering and repairing problems in the network usually involves manual network operator intervention and is time-consuming and error prone.

Solution

Incorporating Cisco Crosswork Change Automation and Cisco Crosswork Health Insights into Cisco Crosswork Network Controller gives service providers the ability to automate the process of discovering and remediating problems in the network by allowing an operator to match an alarm to pre-defined remediation tasks. These tasks will be performed after a defined Key Performance Indicator (KPI) threshold has been breached. Remediation can be implemented with or without the network operator’s approval, depending on the setting and preferences of the operator.

Using such closed-loop remediation reduces the time taken to discover and repair a problem while minimizing the risk of making a mistake and creating an additional error through high-stakes manual network operator intervention.

How Does it Work?

Smart Monitoring

  • The Smart Monitoring feature helps operators collect, filter, and present the data in useable formats, such as graphs and tables. Operators can remain focused on their business goals while the configuration required for the data collection is done by the Cisco Crosswork Network Controller and Cisco Crosswork Change Automation and Cisco Crosswork Health Insights using the feature Zero-touch telemetry.

  • By using a common collector to collect network device data over SNMP, CLI, and model-driven telemetry, and making it available as modelled data described in YANG, duplicate data collection is avoided, optimizing the load on both the devices and the network.

  • Recommendation Engine analyzes network device hardware and software, configuration, and employs a pre-trained model built from data mining, producing KPI relevant recommendations facilitating per use-case monitoring.

  • KPIs cover a wide range of statistics from CPU, memory, disk, layer 1/2/3 network counters, to per protocol, LPTS and ASIC statistics.

Smart Filtering

  • Cisco Crosswork Health Insights builds dynamic detection and analytics modules that allow operators to monitor and see alerts on network events based on user-defined logic (KPI).

  • Key Performance Indicators (KPIs) Alerting Logic can be:

    • Simple static thresholds (TCA); e.g., CPU load above 90 percent.

    • Moving average, standard-deviation, and percentile based, etc., e.g., CPU load above mean and staying there for five minutes.

    • Streaming jobs which provide real-time alerts or batch jobs which run periodically.

    • Customized for threshold values and visualization dashboards.

    • Customized operator-created KPIs based on business logic.

    • TCAs can be exported or integrated with other systems via HTTP, Slack, and socket interfaces.

  • KPIs are associated with dashboards, which provide real-time and historical views of the data and corresponding TCAs.

  • KPIs also provide purpose-built dashboards that go beyond raw data and provide valuable information in various infographic style charts and graphs useful for triaging and root-causing complex issues.

Smart Remediation

  • Health Insights KPIs can be associated with Cisco Crosswork Change Automation playbooks, which can be either executed manually or via auto-remediation. Remediation workflow could be used to fix the issue or collect more data from the network devices. By proactively remediating the situation, instead of resorting to ad hoc debugging and unscheduled downtime, operators can save time and money, providing better QOE to their customers.

  • Health Insights does the correlation of alerts or anomalies on the topology of the network, allowing easy visualization of the impact of events.

Scenario: Achieve predictive traffic load balancing using segment routing affinity

Scenario context

To maintain smooth and optimal traffic flow, operators need to be able to monitor traffic on the interfaces, identify errors such as CRC, watchdog, and overrun, and then reroute the traffic so that the SLA is maintained. This process can be automated using the Cisco Crosswork Network Controller with the installed Cisco Crosswork Health Insights and Cisco Crosswork Change Automation applications.

Assumptions and prerequisites

Cisco Crosswork Health Insights and Cisco Crosswork Change Automation must be installed and running.

Workflow

The following is a high-level workflow for executing this scenario:

Procedure


Step 1

Deploy Day0 ODN templates on edge nodes with dynamic path calculation delegated to SR-PCE and the ODN template configured to exclude links that are tagged with a specific affinity, such as RED affinity. ODN allows a service head-end router to automatically instantiate an SR-TE policy to a BGP next-hop when required (on-demand). The ODN template defines the required SLA using a specific color.

For information on creating an ODN template, refer to Step 1 Create an ODN template to map color to an SLA objective and constraints in Scenario: Implement and maintain SLA for an L3VPN service for SR-MPLS (using ODN).

Step 2

Create an L3VPN route policy to specify the prefixes to which the SLA applies and mark them with the same color used in the ODN template. When traffic from the specified network with a matching color is received, paths are computed based on the SLA defined in the ODN template.

For information on creating a route policy, refer to Step 1 Create an ODN template to map color to an SLA objective and constraints.

Step 3

Provision an L3VPN across the required endpoints and create an association between the VPN and the route policy. This will connect the VPN to the ODN template that defines the SLA.

For information on provisioning an L3VPN, refer to Step 3 Create and provision the L3VPN service.

Step 4

Define and enable the KPIs on the devices. This will continuously monitor the uplink interfaces on the L3VPN endpoints.

For information on defining KPIs, see Crosswork Network Controller 7.0 Closed-Loop Network Automation.

Step 5

When there is an error on monitored interfaces, mark the dirty link with RED affinity so that it will be excluded based on the specifications of the ODN template. This is achieved by creating a custom playbook. The Cisco Crosswork Network Controller identifies the interface where an error has occurred and generates an alert. This information is then used in a custom playbook to push the affinity configuration to the relevant router, creating a closed-loop automation process. This helps ensure that the customer does not experience any outages.

For information on defining playbooks, see the Crosswork Network Controller 7.0 Closed-Loop Network Automation.

Step 6

Crosswork Network Controller will continue monitoring the link, and once there are no longer any alerts, the RED affinity tag can be removed. Another playbook should be defined for this purpose.