Monitor Network Health and KPIs


Health Insights Overview

Health Insights is a network health application that performs real-time key performance indicator (KPI) monitoring, alerting, and troubleshooting. The Health Insights application enables programmable monitoring and analytics. With Health Insights, network operators can create a platform to dynamically address changes to the network infrastructure. Health Insights builds dynamic detection and analytics modules that let operators monitor network events and raise alerts on them using user-defined logic. Health Insights uses the Device Management component to onboard devices. For more information, see Inventory.

Health Insights provides prebuilt KPIs that are based on model-driven and SNMP-based telemetry. The Health Insights Recommendation Engine uses data mining to analyze your network and then recommends which telemetry paths you should enable and monitor.


Note

For the recommendation engine to work in Health Insights, Cisco Crosswork Change Automation and Health Insights must be able to reach the device. When a device is onboarded, Cisco Crosswork Change Automation and Health Insights installs an explicit static route pointing to the data network gateway on the device subnet. Ensure that there is direct connectivity between Cisco Crosswork Change Automation and Health Insights and the device.


The following high-level example shows how Health Insights interacts with the other Cisco Crosswork Network Automation components:

  1. Health Insights detects an anomaly: The optical bit error rate that you are monitoring on each of the links in your network suddenly increases.

  2. Change Automation Playbooks automate remediation: Switch to the backup link immediately. Restore service. Open an RMA ticket (manually initiated by the user). Alert the network engineer.

    Any network remediation can be orchestrated via Change Automation Playbooks, which closes the loop on problem detection and resolution.

Health Insights Alert Dashboard

The Health Insights alert dashboard provides device health summary information that is based on real-time network state events. The dashboard displays a network view of KPI sensors that are paired to specific device groups. Health Insights raises customizable events and alerts that are based on user-defined logic.


Note

The alert dashboard displays individual KPI alerts, even though KPIs are enabled on devices through KPI profiles.


To display the Health Insights dashboard, choose Health Insights > Alert Dashboard from the main menu.

Health Insights Alert Dashboard
Item Description

1

Device/KPI Alert Selector: Click here to toggle between device alert and KPI alert information.

2

Alerts History: This dashlet shows the total number of device alerts or KPI alerts that have been raised during the chosen time period, with detailed time lines showing both individual sets of alerts and the overall alert trend.

3

Alerts Trend Line: This line shows the overall trend in alerts for the chosen time period. You can use the Alerts Trend Line to select and zoom in on a specific time period within the Alerts History Line, as follows:

  1. Click the time-period starting point in the Alerts Trend Line and hold down the mouse button.

  2. Drag the cursor to the endpoint and then release the mouse button.

The time range you selected is indicated by light gray shading on the Alerts Trend Line, with + and - zoom icons shown above the Alerts History Line. Click the + icon to zoom in on the time range you selected. Click the - icon to zoom out. To restore the full view of the Alerts History Line, click any point outside the light gray shading on the Alerts Trend Line.

4

Top 20 Impacted Devices/ Top 20 Impacted KPIs: When selected, this dashlet displays a map of tiles, each tile representing one of the 20 devices or KPIs with the most alerts during the selected time period. The amount of space that each tile occupies in the map corresponds to the number of alerts raised: the more alerts, the bigger the tile. To view more detailed information for a particular device or KPI, click the device or KPI name link in the center of the tile.

5

Filter By Tags: This field lets you filter the alert dashboard information by associated tag names. To select a tag, do one of the following:

  • If you know the tag that you want to use, enter it in the Type to filter by Tags field and then check its check box. Repeat this step to select more tags.

  • If you want to select a tag from the tags that are currently available:

    1. In the Type to filter by Tags field, type any character to open the results list.

    2. Click the View All Tags link at the bottom of the list.

    3. Check the check box for each tag you want to use and then click Apply Filters.

    4. Delete the character that you typed in Step 1 to clear the results list.

Tag filters you create are not saved. If you open another window and then return to the alert dashboard, you will need to re-create tag filters.

6

All Impacted Devices/All Impacted KPIs: When selected, this dashlet provides a complete list of all devices or KPIs affected by alerts. The information for each affected device or KPI includes:

  • Device Name or KPI Name

  • Device or KPI Type

  • IP address: The IP address of the impacted device. This column is only displayed for devices.

  • Alert count: The total number of alerts for that device or KPI during the selected period.

  • Impact score: This value is determined using the following formula: (4 x number of critical alerts) + (3 x number of major alerts) + (2 x number of minor alerts) + number of warning alerts. When monitoring the health of your network, focus on devices or KPIs with a higher impact score.

  • Severity distribution: Provides a visual breakdown of the severities associated with a device or KPI's alerts. To view a tooltip that indicates the number of raised alerts (by severity and in total), place your cursor over the appropriate bar segment.

7

Alerts History: The Alerts History line shows alerts as discrete bar indicators whose height represents the total number of alerts gathered at each point in time. To see the total for each type of alert, hover your mouse cursor over the bar indicator. You can also use the Alerts Trend line to zoom in on particular portions of the alert history.

8

Time Period: Specifies the time period for which the dashboard provides alert information, such as the past hour, past day, or past week. Note that the dashboard provides alert information only, not telemetry information.

9

Severity Legend: Maps the bar indicator colors used in the Alerts History dashlet to the corresponding alert severities. To display or hide the alerts for a particular severity, click the circle representing that severity. A filled circle indicates that alerts of that severity have been raised and are being displayed. An empty circle indicates that alerts of that severity are either not being displayed or have not been raised during the displayed time period.

10

Auto Refresh: Specifies how often the dashboard is automatically refreshed.

11

Refresh Icon: Refreshes the dashboard.


Note

Composite alerts are not displayed in the alert dashboard.
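The impact score formula and the Top 20 ranking described in the dashboard table can be illustrated with a short Python sketch. This is not Health Insights code: the flat (name, severity) alert record and the function names are hypothetical, and INFO alerts are assumed to contribute nothing to the score because the formula does not include them.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Weights from the impact score formula; INFO is absent, so it contributes 0.
SEVERITY_WEIGHTS = {"CRITICAL": 4, "MAJOR": 3, "MINOR": 2, "WARNING": 1}


def impact_score(severities: List[str]) -> int:
    """Impact score for one device or KPI, given the severities of its alerts."""
    counts = Counter(s.upper() for s in severities)
    return sum(weight * counts[sev] for sev, weight in SEVERITY_WEIGHTS.items())


def top_impacted(alerts: List[Tuple[str, str]], n: int = 20) -> List[Tuple[str, int]]:
    """The n devices or KPIs with the most alerts, as (name, alert_count) pairs.

    `alerts` is a hypothetical list of (device_or_kpi_name, severity) records.
    """
    return Counter(name for name, _ in alerts).most_common(n)


def summarize(alerts: List[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Per-name alert count and impact score, similar to the All Impacted columns."""
    by_name: Dict[str, List[str]] = {}
    for name, severity in alerts:
        by_name.setdefault(name, []).append(severity)
    return {
        name: {"alert_count": len(sevs), "impact_score": impact_score(sevs)}
        for name, sevs in by_name.items()
    }


sample = [
    ("router-1", "CRITICAL"),
    ("router-1", "MINOR"),
    ("router-2", "WARNING"),
    ("router-2", "WARNING"),
    ("router-2", "WARNING"),
]
print(summarize(sample))
print(top_impacted(sample, n=1))
```

In this sample, router-2 has more alerts (and so the larger tile in Top 20 Impacted Devices), while router-1 has the higher impact score (6 versus 3), which is why the text suggests focusing on the impact score.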


Create a New KPI Profile

A KPI Profile is a collection of KPIs and their corresponding parameters, such as alert frequency, alert type, and cadence. You can group relevant KPIs into a KPI Profile, give it a meaningful name based on its purpose (for example, environmental or health check), and configure parameters that are relevant to monitoring a specific type of device (for example, edge routers). Once the system has created and validated a KPI Profile, it is ready to be used: select the device(s) in Health Insights, select the appropriate KPI Profiles, and enable them. This action enables all the KPIs in the selected KPI Profiles. Similarly, you can select the device(s) and disable KPI Profiles, which removes all KPIs that were enabled as part of those profiles.

You can create a KPI Profile and enable it on the desired devices. The workflow is as follows:

  1. Supply basic information, such as the Profile name and a description.

  2. Add KPI(s) and save the profile.

  3. Edit KPI parameters and create alert groups.

  4. Enable the KPI Profile on the devices.

The following steps explain how to perform all of these tasks.

Procedure


Step 1

From the main menu, choose Health Insights > Manage KPI Profiles. The Manage KPI Profiles window opens.

Step 2

Click the Add icon. The Create New Profile window opens.

Step 3

In the text fields provided, enter a unique Profile Name and a short Description. The Profile Name can contain a maximum of 32 alphanumeric characters, plus underscores ("_"). No other special characters are allowed.

Step 4

Add KPIs to the profile, using the following filter options:

  1. All KPIs: By default, this option is selected and all KPIs are displayed in the list. You can select the required KPI by checking the relevant check box.

  2. Recommended KPIs: You can also select KPIs based on the KPI recommendation for a specific device. Click Recommended KPIs, and the device list is displayed. You can filter the device list by entering relevant values in the Name and State fields, or by using tags. Select a device from the list, and the recommended KPI list is displayed on the right side. Select the required KPI by checking the relevant check box.

    Note 

    Selecting KPIs from the recommended KPI list of a selected device does not automatically enable the KPI Profile on the selected device. The KPI Profile can be enabled after it is created. For more information, see Enable KPI Profile on Devices.

Step 5

Click Save to save the new KPI Profile and display the Manage KPI Profiles window.

Step 6

In the KPI Profiles area on the left side, choose the KPI Profile that you created, and the individual KPI details are displayed on the right side.

Step 7

You can leave the KPI parameters at the default or choose a different value. To edit the KPI parameters, click Edit Details, and the KPI Details window is displayed. Edit the parameter values as appropriate for the purpose of your KPI. The common parameters are:

  • Alert: This is an on/off toggle switch for alerting. Based on the Alert parameter value, the corresponding alerting logic is deployed. Alerting can be enabled even after the KPI Profile has been applied to the devices.

    Note 

    Any KPI used in composite alerting logic must have the Alert flag set to ON.

  • Cadence (sec): The interval, in seconds, at which the KPI gathers sensor data from the devices on which the KPI Profile is enabled.

  • Alerting Down Sample Rate: The alert evaluation rate. It determines how often KPI data is evaluated for alert conditions, and is relative to the Cadence. For example, if the Cadence is 60 seconds and you want an alert evaluation every 300 seconds, set the Alerting Down Sample Rate to 5.
Note 

When you set the Alert flag to ON for a KPI profile that is already enabled, the change is not shown on the corresponding Health Insights job details page, because the update is an internal system transaction. If the job completes successfully, any alerts that are triggered can be viewed on the alert dashboard.
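Because the Alerting Down Sample Rate is expressed as a multiple of the Cadence, converting between the two is a single division. The Python sketch below assumes that the desired evaluation interval is a whole multiple of the Cadence; the function names are hypothetical.

```python
def alerting_down_sample_rate(cadence_sec: int, evaluation_interval_sec: int) -> int:
    """Return the Alerting Down Sample Rate for a desired alert-evaluation interval."""
    if cadence_sec <= 0 or evaluation_interval_sec <= 0:
        raise ValueError("cadence and evaluation interval must be positive")
    # Assumption: the evaluation interval must be a whole multiple of the cadence.
    if evaluation_interval_sec % cadence_sec != 0:
        raise ValueError("evaluation interval must be a multiple of the cadence")
    return evaluation_interval_sec // cadence_sec


def evaluation_interval_sec(cadence_sec: int, down_sample_rate: int) -> int:
    """Inverse calculation: seconds between alert evaluations."""
    return cadence_sec * down_sample_rate


print(alerting_down_sample_rate(60, 300))  # 5
print(evaluation_interval_sec(60, 5))      # 300
```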

Step 8

You can also edit the alert logic parameters of the selected KPI. To learn more about a parameter, hover your mouse cursor over the Reachability Unknown icon shown next to the parameter name.

Note 

When different thresholds are desired for different types of devices in the network, it is advisable to create multiple profiles and split the KPIs across them to meet the needs of different device types.

Step 9

When you are finished making changes, click Save to save the new KPI Profile. Health Insights validates your input parameters and displays the Manage KPI Profiles window.

Note 

You can create up to 50 KPI profiles, and each KPI Profile can contain up to 50 KPIs. KPI profile creation can fail if either limit is exceeded, or if Health Insights cannot create the required tags in Inventory manager. This status is reflected in the profile state. Once the profile is ready, it can be applied to devices.

With the Manage KPI Profiles window displayed, you can enable the new KPI Profiles on one or more devices immediately, following the steps given in Enable KPI Profile on Devices.

See Disable KPI Profile on Devices or Device Groups for instructions to disable KPI Profiles.
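The naming rule in Step 3 and the limits in the note above can be checked before you submit a profile. This Python sketch is a client-side illustration only: it assumes "alphanumeric" means ASCII letters and digits and that uniqueness is case-sensitive, and the function and constant names are hypothetical. It cannot detect failures such as Health Insights being unable to create tags in Inventory manager.

```python
import re
from typing import List

# Step 3: 1-32 characters, ASCII letters, digits, and underscores only (assumed).
PROFILE_NAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")
MAX_PROFILES = 50
MAX_KPIS_PER_PROFILE = 50


def validate_new_profile(name: str, kpis: List[str], existing_names: List[str]) -> List[str]:
    """Return a list of problems; an empty list means these checks pass."""
    problems = []
    if not PROFILE_NAME_RE.fullmatch(name):
        problems.append("name must be 1-32 characters: letters, digits, or '_'")
    if name in existing_names:
        problems.append("name must be unique")
    if len(existing_names) >= MAX_PROFILES:
        problems.append(f"at most {MAX_PROFILES} KPI profiles are allowed")
    if len(kpis) > MAX_KPIS_PER_PROFILE:
        problems.append(f"a profile can contain at most {MAX_KPIS_PER_PROFILE} KPIs")
    return problems


print(validate_new_profile("edge_health", ["CPU Utilization"], ["core_env"]))  # []
print(validate_new_profile("edge-health", [], []))  # hyphen is not allowed
```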

Step 10

(Optional) You can also create alert groups for a KPI Profile. Alert groups use boolean logic (cascaded OR and AND) to combine alert outputs from primary KPIs in your KPI profile and create a composite logic query. To create an alert group, click + Alert Group. The Create Alert Group window is displayed.

Note 

Configuring an alert provider enables composite alert forwarding. For information on adding alert providers, see Add an Alert Provider

Step 11

Provide a relevant entry in the Name field. Summary and Details are optional fields.

Step 12

The Alert Group Conditions area on the right side lets you select a logic gate (AND/OR) and add a KPI on which the logic is applied. Your alert group can be based on the alert criteria of a single KPI, or it can be a combination of multiple KPI outputs. Click the desired logic (AND gate is selected by default), and click the + ADD dropdown list to add an Item or a Group.

Item allows you to add individual KPI items and set the corresponding alert level, and Group allows you to add a nested alert group.

Step 13

Choose the desired KPI from the Select KPI dropdown, and select the level(s) at which alerts should be raised for the chosen KPI. The alert levels are CRITICAL, MAJOR, MINOR, WARNING, and INFO. The KPI outputs are evaluated against the logic gates and alert criteria you select, and the alert is generated accordingly.

In the example shown above, the alert is set based on the output of two logic gates. The first logic gate is an OR operation between the Memory Utilization and Interface Bandwidth monitor KPIs: if the set alert levels are met for either KPI, the output of the first logic gate is true. This output is an input to the second logic gate, which is an AND operation with the CPU Utilization KPI. If both inputs are true (the first gate's output and the CPU Utilization alert level), the output of the second logic gate is true.
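The nested gates in this example form a small expression tree. The following Python sketch illustrates only the Boolean evaluation; it is not how Health Insights stores or evaluates alert groups, and the class names and the alert levels chosen for each KPI are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Union


@dataclass
class Item:
    """A single KPI condition: true when the KPI's current alert level is in `levels`."""
    kpi: str
    levels: Set[str]


@dataclass
class Group:
    """A logic gate ("AND" or "OR") applied to nested items and groups."""
    gate: str
    children: List[Union["Item", "Group"]] = field(default_factory=list)


def evaluate(node: Union[Item, Group], current: Dict[str, str]) -> bool:
    """Evaluate an alert-group tree; `current` maps KPI name -> current alert level."""
    if isinstance(node, Item):
        return current.get(node.kpi) in node.levels
    if not node.children:
        raise ValueError("a group needs at least one item or nested group")
    results = [evaluate(child, current) for child in node.children]
    if node.gate == "AND":
        return all(results)
    if node.gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {node.gate!r}")


# (Memory Utilization OR Interface Bandwidth monitor) AND CPU Utilization.
# The alert levels chosen for each item are hypothetical.
high = {"CRITICAL", "MAJOR"}
alert_group = Group("AND", [
    Group("OR", [Item("Memory Utilization", high), Item("Interface Bandwidth monitor", high)]),
    Item("CPU Utilization", high),
])

print(evaluate(alert_group, {"Memory Utilization": "MAJOR", "CPU Utilization": "CRITICAL"}))  # True
print(evaluate(alert_group, {"Memory Utilization": "MAJOR", "CPU Utilization": "MINOR"}))     # False
```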

Step 14

Click Save to save the new alert group and display the Manage KPI Profiles window. To edit or delete an existing alert group, click Edit Details or the Delete icon, respectively.


Enable KPI Profile on Devices

With Health Insights, you can enable and monitor the KPI Profiles in which you are interested. Instead of sifting through all the data that a given device can supply, you choose to monitor only the information relevant to the role the device plays in your network. Your equipment and management infrastructure operates as efficiently as possible, without requiring the collection and storage of data that is unrelated to device roles. This operational efficiency reduces the amount of time required to set up specific monitoring, leading to faster problem identification and resolution.

Note that some KPIs trigger alerts based on deviation from an established level of performance. For these types of KPIs, it is necessary to allow the system some annealing time in order to establish normal performance levels.
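To illustrate why deviation-based KPIs need annealing time, the following Python sketch flags a sample that strays more than k standard deviations from a rolling baseline, and makes no decision until enough samples have been collected. This is a generic technique shown under stated assumptions (the window size, multiplier k, and warm-up length are hypothetical), not the algorithm that Health Insights KPIs actually use.

```python
import statistics
from collections import deque
from typing import Deque, Optional


class DeviationDetector:
    """Flag samples that stray more than k standard deviations from a rolling baseline."""

    def __init__(self, window: int = 60, k: float = 3.0, warmup: int = 30) -> None:
        if not 2 <= warmup <= window:
            raise ValueError("warmup must be between 2 and window")
        self.samples: Deque[float] = deque(maxlen=window)
        self.k = k
        self.warmup = warmup

    def observe(self, value: float) -> Optional[bool]:
        """Return None while the baseline is still annealing, else whether `value` is anomalous."""
        verdict: Optional[bool] = None
        if len(self.samples) >= self.warmup:
            baseline = list(self.samples)
            mean = statistics.fmean(baseline)
            stdev = statistics.pstdev(baseline)
            verdict = abs(value - mean) > self.k * stdev
        # In this sketch every sample, including an anomalous one, joins the baseline.
        self.samples.append(value)
        return verdict


detector = DeviationDetector(window=10, k=3.0, warmup=5)
readings = [10.0, 10.5, 9.5, 10.0, 10.0, 10.2, 25.0]
print([detector.observe(r) for r in readings])
# [None, None, None, None, None, False, True]
```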


Important

You can only enable KPI Profiles with MDT-based KPIs on a device that has been mapped to a Cisco Network Services Orchestrator (Cisco NSO) provider.


To enable KPI Profile on devices:

Procedure


Step 1

From the main menu, choose Health Insights > Enable-Disable KPI Profiles. The Enable-Disable KPI Profiles window is displayed.

Step 2

Select the devices for which you want to enable KPI Profiles. You can click the Devices or Device Tags buttons above the table on the left to toggle between selecting the devices by name or by tagged device group membership. Depending on your selection, the device list or the device tag list is displayed on the left.

If you choose to select by Devices:

  • Click the Set Filter icon in the table on the right. Type a Name or Device Type in the filter fields. As you type, the table displays only the devices whose name or type matches the text you typed.

  • Click the check box next to the device(s) you want. You can select multiple devices at the same time.

If you choose to select by Device Tags:

  • Type a tag name in the Name field to find a Device Group in the table. As you type, the table displays only the tag names that match the text you typed.

  • Click the check box next to the group you want. The names of all the devices in that group appear in the devices table on the right.

Step 3

Click Enable KPI Profiles to continue. Health Insights detects the selected devices, their types and models, and retrieves and analyzes their running configurations. The KPI Profiles window presents the KPI Profiles available for your selected devices.

Step 4

Choose the KPI Profiles you want to enable by checking the check box next to each KPI Profile name.

Step 5

When you are finished, click Next. The Verify Details window appears, listing all the KPI Profiles you have chosen to be enabled on the selected devices, as shown in the following figure.

Step 6

(Optional) To get information about the KPIs included in a KPI Profile, click the KPI Profile in the Selected Profile(s) table. The content of the selected KPI Profile is displayed on the right side. Click View More Details to view the parameters of a specific KPI. A popup window provides the details of the KPI. Click the Close icon to close the popup window.

Step 7

To enable the selected KPI Profiles on the selected devices, click Enable. Health Insights schedules the KPI Profile(s) as a series of job sets.

Step 8

From the main menu, choose Health Insights > Job History to watch the progress of each job set, as shown below. You should see job sets completing with a status of "Success". If job sets complete with a "Partial" or "Failed" status, be sure to read the job completion messages, and check that the selected devices are still reachable.

When the job sets complete successfully, the KPIs are associated with the devices and the platform begins enabling the relevant collection procedures for those network elements. In making these changes, you are automating the configuration of both the platform and the devices themselves to collect only the information required.

Step 9

From the main menu, choose Health Insights > Alert Dashboard. The dashboard shows the alert status for the devices on which you have enabled KPI monitoring.



Note

  • SNMP/MDT jobs may take more time than expected to reach the completed state when there is an increase in the number of devices, interfaces and KPIs.

  • Enabling a KPI profile takes around 3 to 5 seconds per device. If a device is not reachable, Health Insights keeps retrying until the operation times out, which can delay the job from reaching the completed state.


View Alerts for Network Devices

After enabling KPIs on a device, you can view alerts for that device and get data for each performance indicator being monitored.


Note

The KPIs shown in the following steps are examples. There are many more KPIs available in Health Insights. For the complete list, see List of Health Insights KPIs.

Procedure


Step 1

From the main menu, choose Health Insights > Alert Dashboard. The Health Insights dashboard opens.

Step 2

Make sure that the Device Alerts view is displayed (select the Device Alerts toggle, if needed). Then scroll down below the Total Alerts panel and click the All Impacted Devices tab. The dashboard displays a list of devices with alerts.

Step 3

Click the Device Name for the device whose details you want to see. Health Insights displays the device's basic Overview information, Total Alerts, a Topology map, and the list of the device's currently Enabled KPIs.

The Topology map is a version of the map you see when you select Network Visualization > View Topology. For help using it, see Network Visualization Overview.

Step 4

To see detailed KPI data, scroll down to the Enabled KPIs panel at the bottom of the window and click one of the enabled KPIs in the list at left. A graphical representation of that KPI's data, along with a list of alert messages and other information, is displayed on the right.

In the example shown below, the Interface rate counters KPI was selected, so the accompanying information includes a list of the interfaces at the right of the graphic time-series data.

Item

Description

1

Click on the KPI name in the list to display its data on the right.

2

Graphical time-series data for the selected KPI appears here. In the example shown, the KPI graphic shows real-time data that has been collected on the interfaces and the points at which the sensor detected an alert event.

To have the graphic display data for a single interface instead of all interfaces, click on the interface name in the list at right.

To zoom in on a period within the time series graph, click and drag within the graph.

3

Hover the mouse cursor over any data point in the KPI graphic to see additional popup information for that data point.

A red line or tag represents a point at which the KPI was triggered. This can occur on any subscribed statistic the KPI is monitoring. Health Insights collects and identifies the time points and frequency, which help determine when these events become an operational concern.

4

Below the KPI timeline, the alerts panel shows each KPI alert that was generated on the device. It also shows which link or interface was affected and when the alert occurred. The blue color indicates that the alert has cleared and is informational only. Red signifies a critical alert.

Search for alert messages in the list below by severity, any part of the message text, the alert ID, or by time and date.

5

View additional information by clicking the Summary and Raw graph pages in the View widget.

6

Click the empty square icon to use the whole page to display the Enabled KPIs panel only.

If you choose to enlarge the Enabled KPIs panel, a blue < icon appears next to the list of KPIs. Click the icon to hide the list.

Click the inverted versions of these icons to restore the KPI list and return the Enabled KPIs panel to its normal size.


Telemetry Data Retention

Telemetry data is collected from devices and stored in the time-series database. This data is retained for one hour and is used by the Health Insights alert dashboard to identify alerts, a process known as stream-based alerting. The resulting alerts, if any, are stored in the same time-series database. Alerts are retained for 30 days, and messages showing the duration of alerts are displayed in the top right corner of the Device/KPI view in the alert dashboard. For more information, see View Alerts for Network Devices. The alerts can also be queried using REST APIs.


Note

The telemetry data displayed in the alert dashboard is limited to the last hour.
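The two retention windows (one hour for telemetry, 30 days for alerts) can be expressed as simple time-based filters. This Python sketch illustrates the retention rules only; it does not reflect how the time-series database actually expires data, and the (timestamp, payload) record format and function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

TELEMETRY_RETENTION = timedelta(hours=1)
ALERT_RETENTION = timedelta(days=30)

Record = Tuple[datetime, str]  # hypothetical (timestamp, payload) pair


def prune(records: List[Record], retention: timedelta, now: datetime) -> List[Record]:
    """Keep only records whose timestamp is at or after now - retention."""
    cutoff = now - retention
    return [r for r in records if r[0] >= cutoff]


now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
telemetry = [(now - timedelta(minutes=30), "t1"), (now - timedelta(hours=2), "t2")]
alerts = [(now - timedelta(days=10), "a1"), (now - timedelta(days=45), "a2")]

print([p for _, p in prune(telemetry, TELEMETRY_RETENTION, now)])  # ['t1']
print([p for _, p in prune(alerts, ALERT_RETENTION, now)])         # ['a1']
```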


Manage KPI Profiles

The Health Insights Manage KPI Profiles window allows you to create, edit, and delete KPI Profiles.

To display the Health Insights Manage KPI Profiles window, choose Health Insights > Manage KPI Profiles from the main menu.

Manage KPI Profiles
Item Description

1

Filter KPI Profile: To find a KPI Profile, enter all or part of the KPI Profile name in this field; the list is automatically filtered based on your input. Click the Set Filter icon to clear any filters you have set.

2

Create KPI Profile: Click the Add icon to create a new, user-created KPI Profile. For help with this task, see Create a New KPI Profile.

3

Edit KPI Profile: Select a user-created KPI Profile in the list and then click the Edit icon to edit it. For help with this task, see Create a New KPI Profile.

4

Delete KPI Profile: Select a user-created KPI Profile in the list and then click the Delete icon to delete it. You cannot delete a KPI Profile that has been enabled on any device(s).

5

KPI On Profile: The KPI(s) added to the selected KPI Profile and the associated parameters are displayed here. You can edit the KPI parameters, or remove a KPI from the selected KPI Profile, using the appropriate options here. For more information, see Create a New KPI Profile.

6

#KPIs on Profile: This is the number of KPIs added to the selected KPI Profile.

7

Enabled Devices: This is the number of devices on which the selected KPI Profile is enabled.

8

+Alert Group: Click this option to create an alert group for the selected KPI Profile. For help with this task, see Create a New KPI Profile.

Manage KPIs

The Health Insights Manage KPIs window gives you complete access to Cisco-supplied and user-created KPIs. You can add, edit, delete, import, and export your KPIs. You can also link your KPIs to the Change Automation application's Playbooks, which enable scripted responses to KPI changes.

To display the Health Insights Manage KPIs window, choose Health Insights > Manage KPIs from the main menu.

Item Description

1

Add KPIs: Click the Add icon to add a new, user-created KPI. For help with this task, see Create a New KPI.

2

Delete KPIs: Select one or more existing user-created KPIs in the list and then click the Delete icon. You will be prompted to confirm that you want to delete the KPIs. Click Delete to confirm.

Note that you can delete user-created KPIs only. You cannot delete Cisco-supplied KPIs.

3

Import KPIs: Click the Import icon to import new user-written or Cisco-supplied KPIs.

You will be prompted to browse to the gzipped tar archive containing the KPIs to be imported. When you have selected the archive, click OK to begin importing it. Once imported, the new KPIs will appear immediately in the list of KPIs, with each KPI name and category assigned based on the definition in the KPI itself.

In order for Cisco Crosswork Change Automation and Health Insights to import them, KPI files must:

  • Be packaged as a gzip tar archive. You can include more than one KPI in a single archive; each will be imported as a separate KPI.

  • Have unique names and descriptions. These must not match the name or description of any Cisco-supplied KPI. If the name or description of the KPI matches an existing user-created KPI, the import will overwrite the existing KPI.

  • Meet other minimum requirements for Health Insights KPIs, as explained in the Cisco Crosswork Network Automation Custom KPI Tutorial Documentation on Cisco DevNet.

4

Export KPIs: Select one or more existing KPIs in the list and then click the Export icon to export them. Health Insights packages the exported KPIs as a single TGZ archive with a unique name. Your browser then prompts you to save the archive under a name and location of your choice in your local file system.

5

Link Playbooks: Select a KPI and then click the Link Playbook icon to link it to a Playbook. From then on, that Playbook can be run whenever the KPI raises an alert. You can specify the values the Playbook will use when operators trigger it in response to the KPI alert. For help with this task, see Link KPIs to Playbooks.

6

Unlink Playbooks: Select a KPI with a linked Playbook and then click the Unlink Playbook icon to unlink the Playbook. You will be prompted to confirm that you want to unlink the Playbook. Click Unlink to confirm.

7

Clear Filters: Click Clear All Filters to clear any filters you have set.

8

Filter KPI Categories: To find a KPI category, enter all or part of the KPI Category name in this field. Then click the Set Filter icon to filter the list below.

9

Filter KPIs: To find a KPI, enter all or part of the KPI Name, Category, Description, or Linked Playbook in the fields provided. The list below is automatically filtered to match your typed entry.
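The import requirement that KPI files be packaged as a gzip tar archive can be met with Python's standard `tarfile` module. This sketch handles only the packaging step; the file names are hypothetical, and the required content of each KPI file is defined in the Cisco Crosswork Network Automation Custom KPI Tutorial Documentation on Cisco DevNet, not here.

```python
import tarfile
from pathlib import Path
from typing import Iterable, List


def package_kpis(kpi_files: Iterable[Path], archive_path: Path) -> Path:
    """Bundle one or more KPI files into a single gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in kpi_files:
            # Store each file under its base name, without the local directory path.
            tar.add(path, arcname=path.name)
    return archive_path


def list_archive(archive_path: Path) -> List[str]:
    """Return the member names in the archive, e.g. to double-check it before import."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

For example, `package_kpis([Path("kpi_a"), Path("kpi_b")], Path("my_kpis.tar.gz"))` produces an archive containing two members, each of which Health Insights imports as a separate KPI.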

Create a New KPI

You can create a custom KPI and enable it on the desired devices. The workflow is as follows:

  1. Supply basic information, such as the KPI name and a summary description.

  2. Set the KPI cadence.

  3. Select a YANG module and choose sensor paths.

  4. Select an alert template and set its parameters.

  5. Enable the KPI on the devices.

The following steps explain how to perform all of these tasks.

Procedure


Step 1

From the main menu, choose Health Insights > Manage KPIs. The Manage KPIs window opens.

Step 2

Click the Add icon. The Create KPI window opens.

Step 3

In the text fields provided, enter a unique KPI Name, a short KPI Summary description, and KPI details. The KPI Group is preset to User Created.

Step 4

The Cadence field sets the number of times per minute the KPI will gather sensor data from the devices on which the KPI is enabled. Leave it at the default or use the numerical selector to choose a different value.

Step 5

In the Select YANG Paths area, choose one module and one or more sensor leaf paths from which to stream data:

  1. Use Filter Modules to filter and choose the desired Cisco IOS XR YANG module.

  2. Use Filter Paths to filter and choose the desired sensor path. When you choose a path, the leaf node gets resolved to the base encoding path. If the YANG module is hierarchical, the field names are concatenated down from the base path. Note that only one gather path is supported for user-created KPIs.

  3. Click Next to display the Select Alert Templates window.

Step 6

Choose the alert template you want to use with your new KPI: No Alert, Standard Deviation, Two-Level Threshold or Rate Change. Then click Next to display the Alert Parameters window appropriate for the type of alert template you chose.

Step 7

Edit the alert template parameter values as appropriate for the template and the purpose of your KPI, as follows:

  • Use the Basic and Advanced Parameters dropdowns to view and edit the parameter sets you need.
  • Change alert parameter numerical values using the selectors or by editing the field contents.
  • Change alert parameters with discrete choices using the parameter field dropdowns, selecting each choice as needed.
  • Learn more about an alert parameter: Hover your mouse cursor over the Reachability Unknown icon shown next to the parameter name.
  • Click the View Tick Script link to view the tick script code you are generating with your changes. The tick script code updates as you make your edits. At any time, click Hide Tick Script to close the tick script code window.

Step 8

When you are finished making changes, click Finish to save the new KPI and display the Manage KPIs window.
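To make the Rate Change alert template more concrete, the following Python sketch flags samples whose percentage change from the previous sample exceeds a threshold. It is a generic illustration under assumed semantics: the threshold parameter and function names are hypothetical, and the actual template logic is whatever the generated tick script (visible through View Tick Script) specifies.

```python
from typing import List, Optional


def percent_change(previous: float, current: float) -> Optional[float]:
    """Percentage change between two consecutive samples; None if previous is zero."""
    if previous == 0:
        return None
    return (current - previous) / abs(previous) * 100.0


def rate_change_alerts(samples: List[float], threshold_pct: float) -> List[int]:
    """Indices of samples whose change from the previous sample exceeds threshold_pct."""
    hits = []
    for i in range(1, len(samples)):
        change = percent_change(samples[i - 1], samples[i])
        if change is not None and abs(change) > threshold_pct:
            hits.append(i)
    return hits


print(rate_change_alerts([100, 105, 160, 150], threshold_pct=20))  # [2]
```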


Link KPIs to Playbooks

You can link any Health Insights KPI to one Change Automation Playbook of your choice. A user can then run the linked Playbook whenever the KPI raises an alert in response to the event associated with the performance indicator the KPI is monitoring. The KPI alert can be raised in response to a threshold crossing, topology changes, flapping conditions, and other parameters. These parameters vary by KPI.

You can specify the Source of the parameter values the linked Playbook will use when you run it. You can select these sources:

  • Playbook: Use default values coded into the Playbook itself.

  • KPI Alert: Use values taken from the alert raised by the linked KPI.

  • Runtime Input: Use values you enter only at the moment you run the Playbook.

The ability to set the source of each Playbook parameter value gives you flexibility in how you use the linked Playbook. For example, suppose you link the Interface flap detection KPI, which detects interface flapping, to the Interface state change Playbook, which can set an interface up or down. Depending on circumstances, you might set the Playbook parameters as follows:

  • Playbook: You want the Playbook to run as it normally does, so you set the Source to Playbook for the provider, collection_type, and mop_timeout parameters. For collection_type, you can still choose between telemetry and snmp, depending on whether you want to use MDT or SNMP to gather device data.

  • KPI Alert: You want the Playbook to run only on the host device and interface affected by the flapping, which the flap-detection alert identifies. Set the Source of the Playbook's hosts and if_names parameters to KPI Alert, so that the Playbook uses the alert's data about the Producer device and the interface_name of the flapping interface on that device.

  • Runtime Input: You want the freedom to decide at runtime whether to bring the flapping interface up or down. So set the Source of the Playbook parameter admin_state to Runtime Input. The Playbook will prompt you for an up or down choice when you initiate the run.
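The following Python sketch models how the three sources could combine into one set of parameter values for the flap-detection example. It is illustrative only: the Source enum, the resolve_parameters function, the default values, and the alert layout are hypothetical and are not part of any Change Automation API. Only the parameter names (provider, collection_type, mop_timeout, hosts, if_names, admin_state) and the alert fields Producer and interface_name come from the example above.

```python
from enum import Enum
from typing import Any, Callable, Dict, Mapping, Optional, Tuple


class Source(Enum):
    """Where a linked Playbook parameter gets its value (labels mirror the UI)."""
    PLAYBOOK = "Playbook"
    KPI_ALERT = "KPI Alert"
    RUNTIME_INPUT = "Runtime Input"


def resolve_parameters(
    bindings: Mapping[str, Tuple[Source, Optional[str]]],
    playbook_defaults: Mapping[str, Any],
    alert: Mapping[str, Any],
    prompt: Callable[[str], Any],
) -> Dict[str, Any]:
    """Build the parameter values for one Playbook run (hypothetical helper).

    bindings maps each Playbook parameter name to (source, alert_field).
    alert_field is used only with Source.KPI_ALERT and names the alert
    field whose value is copied into the parameter.
    """
    resolved: Dict[str, Any] = {}
    for name, (source, alert_field) in bindings.items():
        if source is Source.PLAYBOOK:
            # Use the default coded into the Playbook itself.
            resolved[name] = playbook_defaults[name]
        elif source is Source.KPI_ALERT:
            # Copy the value from the alert raised by the linked KPI.
            resolved[name] = alert[alert_field or name]
        else:
            # Source.RUNTIME_INPUT: ask the operator when the run starts.
            resolved[name] = prompt(name)
    return resolved


# Hypothetical defaults and alert contents for the flap-detection example.
defaults = {"provider": "nso-provider", "collection_type": "telemetry", "mop_timeout": 300}
flap_alert = {"Producer": "router-1", "interface_name": "GigabitEthernet0/0/0/1"}
bindings = {
    "provider": (Source.PLAYBOOK, None),
    "collection_type": (Source.PLAYBOOK, None),
    "mop_timeout": (Source.PLAYBOOK, None),
    "hosts": (Source.KPI_ALERT, "Producer"),
    "if_names": (Source.KPI_ALERT, "interface_name"),
    "admin_state": (Source.RUNTIME_INPUT, None),
}

# A stand-in prompt that always answers "down"; a real run would ask the user.
params = resolve_parameters(bindings, defaults, flap_alert, prompt=lambda name: "down")
print(params)
```

Keeping the alert field name separate from the parameter name reflects the example above, where the hosts parameter is filled from the alert's Producer field rather than from a field with the same name.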

The following figure shows what this set of choices will look like:

Figure 1. Example: Specifying Parameter Value Sources for a Linked Playbook

Procedure


Step 1

From the main menu, choose Health Insights > Manage KPIs. The Manage KPIs window opens, displaying lists of the KPI categories and the KPIs available in each category.

Step 2

Select the KPI you want to link to a Playbook. You can use filters to find the KPI you want, as explained in Manage KPIs.

Step 3

Click the Link Playbook icon. The Link KPI to Playbook window opens.

Step 4

The left side of the window lists the name of the selected KPI and the Playbooks appropriate for linking to it. Scroll through the list, or use the Playbook Name field and the Set Filter icon to restrict the list to just the Playbooks you want.

Step 5

When you have found the Playbook you want to link, click its name. The right side of the window then lists the Playbook Details for the selected Playbook, including:

  • The hardware and software platforms with which it is compatible.

  • The minimum software version requirement.

  • The Source and default values that will be used when the Playbook runs. In many cases, you can select from a range of default values, or enter your own.

Step 6

Verify or modify the Source and parameter values as needed.

During the maintenance cycle, the Playbook performs a variety of actions. To see a list of these actions, click the View Maintenance link. A popup Maintenance panel opens, listing them. Click the Pin View icon if you want to keep referring to this action list while you adjust the Playbook runtime parameter sources and values. Click the Close icon at any time to close the Maintenance panel.

Step 7

When you are finished making changes, click Link to KPI. Change Automation displays the Manage KPIs window again, this time with the linked Playbook shown next to the name of the KPI in the Key Performance Indicators (KPIs) list.

Step 8

To change the Playbook linked to a given KPI, repeat steps 3 through 7 for that KPI, this time choosing the Playbook you want. To unlink a Playbook entirely, select the KPI and click the Unlink Playbook icon.


Verify the Deployment Status of Enabled KPIs

After you enable KPIs, you can verify their deployment status.

Procedure


Step 1

From the main menu, choose Health Insights > Job History. The Job History window lists the jobs that have been run most recently, indicating whether they succeeded or failed, when they ran, and on what devices.

Step 2

Click the transaction ID in the job listing to view detailed KPI job information, including the device on which the KPI was enabled and the KPI ID.


Disable KPI Profile on Devices or Device Groups

You can use the Enable-Disable KPI Profiles window to disable the KPI Profiles running on one or more devices or device groups.

Procedure


Step 1

From the main menu, choose Health Insights > Enable-Disable KPI Profiles. The Enable-Disable KPI Profiles window opens.

Step 2

To disable all KPI Profiles enabled on all the devices within a device group:

  1. Click the Device Tags button above the table on the left. The table displays the list of device tags.

  2. Click the checkbox next to the device tag(s) on which you want to disable KPI Profiles.

    When you select a device tag, the Devices table on the right shows all the devices that are associated with that tag. All of the devices are preselected.

  3. Click Disable KPI Profiles. You will be prompted to confirm that you want to disable all the KPIs running on all the devices in the group. Click Disable to confirm.

Step 3

To disable KPI Profiles enabled on one or more devices:

  1. Click the Devices button above the table on the left. The Devices table on the right shows all the devices, with the total number of KPIs enabled on each device.

  2. Click the checkbox next to the devices on which you want to disable KPI Profiles.

    If you select one device, you can disable all of its KPI Profiles or only some of them. If you select more than one device, you can only disable all KPI Profiles on those devices.

  3. Click Disable KPI Profiles. You will be prompted to confirm that you want to disable the KPIs running on all the selected devices. If you selected only one device, click the checkboxes next to the KPI Profiles you want to disable on that device, or click the checkbox at the top of the column to disable all the KPI Profiles running on that device. Click Disable to confirm.


List of Health Insights KPIs

The table below lists the prebuilt Health Insights KPIs supplied with Cisco Crosswork Change Automation and Health Insights.

The alerting types in the table that you can select when you create a new KPI (see Create a New KPI) are:

  • No Alert: The KPI gathers, tracks and reports performance data without triggering alerts.

  • Standard Deviation: The KPI detects spikes or drops in measured values and alerts when a value deviates from its normal value by more than a set number of standard deviations.

  • Two-Level Threshold: The KPI detects abnormal measured values using two custom thresholds, with optional dampening intervals on the thresholds.

  • Rate Change: The KPI detects abnormal rates of change in measured values, such as values that rise or fall unusually fast.
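To make the difference between these alerting types concrete, the following Python sketch shows one way each check could be evaluated. It is not the Health Insights implementation, which is generated as tick script code from the alert templates. The function names, the default of k = 3 standard deviations, the reading of the two thresholds as warning and critical levels, and dampening expressed as a count of consecutive samples are all assumptions made for illustration.

```python
import statistics
from typing import Sequence


def standard_deviation_alert(history: Sequence[float], value: float, k: float = 3.0) -> bool:
    """Standard Deviation: alert when value lies more than k std devs from the mean."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    return abs(value - mean) > k * stdev


def two_level_threshold(samples: Sequence[float], warning: float, critical: float,
                        dampen: int = 3) -> str:
    """Two-Level Threshold with dampening (assumed warning/critical levels).

    A level is reported only if each of the last `dampen` samples exceeds it,
    so a single spike does not raise an alert.
    """
    recent = samples[-dampen:]
    if len(recent) < dampen:
        return "OK"  # not enough samples yet to satisfy the dampening interval
    if all(v > critical for v in recent):
        return "CRITICAL"
    if all(v > warning for v in recent):
        return "WARNING"
    return "OK"


def rate_change_alert(previous: float, current: float, interval_s: float,
                      max_rate_per_s: float) -> bool:
    """Rate Change: alert when the value rises or falls faster than max_rate_per_s."""
    rate = (current - previous) / interval_s
    return abs(rate) > max_rate_per_s


# Hypothetical samples, for example CPU utilization in percent.
print(standard_deviation_alert([10.0, 11.0, 9.0, 10.0, 10.0], 15.0))  # True
print(two_level_threshold([50, 85, 92, 95, 97], warning=80, critical=90))  # CRITICAL
print(rate_change_alert(1000, 1600, interval_s=30, max_rate_per_s=10))  # True
```

Note how dampening changes the outcome: with samples [50, 95, 40, 97, 98], the latest value is above both levels, but the sketch reports OK because the 40 in the last three samples breaks the run.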

Additional alerting types that you can use when you export prebuilt KPIs and use them to create KPIs with custom parameters are:

  • Standard Deviation of Rate Change: The KPI alerts on standard deviations of the rate of change.

  • Low Single Threshold: The KPI alerts on a single threshold when the value falls below that threshold.

  • Direct Alarm Forwarding: The KPI uses the alarm from the device directly, as a Health Insights KPI alert.

  • Major/Minor/Low/High Thresholds: The KPI alerts on Major high, Minor high, Minor low, and Major low values.

  • Line State Changes: The KPI alerts on shutdowns and flapping in line states.
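The following Python sketch illustrates three of these additional alerting types. It is a hypothetical model, not the logic of any exported KPI: the function names, the severity strings, the "up"/"down" state values, the flap limit of 3, and the example thresholds are assumptions. For the actual parameters, refer to the exported KPI and the Custom KPI Tutorial.

```python
import statistics
from typing import Optional, Sequence


def four_threshold_severity(value: float, major_low: float, minor_low: float,
                            minor_high: float, major_high: float) -> Optional[str]:
    """Major/Minor/Low/High Thresholds: return a severity, or None if in range."""
    if value >= major_high:
        return "MAJOR_HIGH"
    if value >= minor_high:
        return "MINOR_HIGH"
    if value <= major_low:
        return "MAJOR_LOW"
    if value <= minor_low:
        return "MINOR_LOW"
    return None


def std_dev_of_rate_change_alert(counters: Sequence[float], interval_s: float,
                                 k: float = 3.0) -> bool:
    """Standard Deviation of Rate Change: compare the latest rate with past rates.

    Needs at least three counter samples (at least one past rate plus the latest).
    """
    rates = [(b - a) / interval_s for a, b in zip(counters, counters[1:])]
    history, latest = rates[:-1], rates[-1]
    return abs(latest - statistics.mean(history)) > k * statistics.pstdev(history)


def line_state_alert(states: Sequence[str], flap_limit: int = 3) -> Optional[str]:
    """Line State Changes: report flapping first, then a line that is down."""
    transitions = sum(1 for a, b in zip(states, states[1:]) if a != b)
    if transitions >= flap_limit:
        return "FLAPPING"
    if states and states[-1] == "down":
        return "DOWN"
    return None


# Hypothetical optical temperature thresholds in degrees Celsius.
print(four_threshold_severity(72.0, major_low=-5.0, minor_low=0.0,
                              minor_high=70.0, major_high=75.0))  # MINOR_HIGH
print(std_dev_of_rate_change_alert([0, 100, 200, 300, 400, 1400], interval_s=10))  # True
print(line_state_alert(["up", "down", "up", "down", "up"]))  # FLAPPING
```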

For more on creating KPIs with custom parameters from exported KPIs, see the Cisco Crosswork Network Automation Custom KPI Tutorial Documentation on Cisco DevNet.

Table 1. Health Insights KPIs

Category | KPI Name | Description | Alerting | MDT or SNMP
Dataplane-Counters | CEF drops | Monitors CEF drop counters and baseline. Generates an alert for an unusual number of drops. | Rate Change | MDT
CPU | CPU threshold | Monitors CPU usage across route processors and line cards on routers. Generates an alert when CPU utilization exceeds the configured threshold. | Two-Level Threshold | MDT
CPU | CPU utilization | Monitors CPU usage across route processors and line cards on routers. Generates an alert when CPU utilization is unusual. | Standard Deviation | MDT
Basics | Device uptime | Monitors device uptime. | Low Single Threshold | MDT
Layer 1-Traffic | Ethernet port error counters | Monitors port transmit and receive error counters. | Rate Change | MDT
Layer 1-Traffic | Ethernet port packet size distribution | Monitors port transmit and receive packet size distributions. | No Alert | MDT
Layer 1-Traffic | Ethernet port packet statistics | Monitors port transmit and receive packet statistics. | Standard Deviation of Rate Change | MDT
Layer 2-Traffic | Interface bandwidth monitor | Monitors bandwidth utilization across all interfaces on a router. Generates an alert when bandwidth exceeds the configured threshold. | Two-Level Threshold | MDT
Layer 3-Traffic | Interface counters by protocol | Monitors interface statistics (such as incoming and outgoing packet or byte counters), organized by protocol. | Standard Deviation | MDT
Layer 2-Interface | Interface flap detection | Monitors interface flaps. Generates an alert when the flap count reaches the configured threshold. | Two-Level Threshold | MDT
Layer 2-Traffic | Interface packet counters | Monitors interface transmit and receive counters. Generates an alert when unusual traffic rates occur. | No Alert | MDT
Layer 2-Traffic | Interface packet error counters | Monitors interface transmit and receive error counters. Generates an alert when unusual error rates occur. | Rate Change | MDT
QOS | Interface QoS (egress) | Monitors interface QoS in the egress direction for queue statistics, queue depth, and so on. | No Alert | MDT
QOS | Interface QoS (ingress) | Monitors interface QoS in the ingress direction for queue statistics, queue depth, and so on. | No Alert | MDT
Layer 2-Traffic | Interface rate counters | Monitors interface statistics as rate counters. Generates an alert when unusual traffic rates occur. | Standard Deviation | MDT
IPSLA | IP SLA UDP echo RTT | Monitors IP SLA UDP echo RTT. Generates an alert when unusual RTT values occur. | Standard Deviation | MDT
IPSLA | IP SLA UDP jitter monitoring | Monitors IP SLA UDP jitter. Generates an alert when abnormal UDP jitter occurs. | Standard Deviation | MDT
Layer 3-Routing | IPv6 RIB BGP route count | Monitors the IPv6 RIB for the route count and memory used by BGP. Generates an alert when an anomaly is detected (such as a significant increase or decrease in route counts). | Standard Deviation | MDT
Layer 3-Routing | IPv6 RIB IS-IS route count | Monitors the IPv6 RIB for the route count and memory used by IS-IS. Generates an alert when an anomaly is detected (such as a significant increase or decrease in route counts). | Standard Deviation | MDT
Layer 3-Routing | IPv6 RIB OSPF route count | Monitors the IPv6 RIB for the route count and memory used by OSPF. Generates an alert when an anomaly is detected (such as a significant increase or decrease in route counts). | Standard Deviation | MDT
Protocol-ISIS | ISIS neighbor summary | Monitors ISIS neighbor summaries for changes in neighbor status. Generates an alert when an anomaly is detected (such as neighbors going down or flapping). | Standard Deviation | MDT
Layer 1-Optics | Layer 1 optical alarms | Monitors per-port optical alarms (current and past). | Direct Alarm Forwarding | MDT
Layer 1-Optics | Layer 1 optical errors | Monitors per-port Layer 1 errors. Generates an alert when error rates exceed the configured threshold. | Rate Change | MDT
Layer 1-Optics | Layer 1 optical FEC errors | Monitors per-port optical FEC errors. Generates an alert when FEC errors exceed the configured threshold. | Rate Change | MDT
Layer 1-Optics | Layer 1 optical power | Monitors per-port optical power. Generates an alert when power levels exceed the configured threshold. | Major/Minor/Low/High Thresholds | MDT
Layer 1-Optics | Layer 1 optical temperature | Monitors per-port optical temperature. Generates an alert when temperature exceeds the configured threshold. | Major/Minor/Low/High Thresholds | MDT
Layer 1-Optics | Layer 1 optical voltage | Monitors per-port optical voltage. Generates an alert when voltages exceed the configured threshold. | Major/Minor/Low/High Thresholds | MDT
Layer 2-Interface | Line state | Monitors interface line states. Generates an alert when link states change. | Line State Changes | MDT
LLDP | LLDP neighbors | Monitors LLDP neighbors. Generates an alert when any sudden changes are detected. | Standard Deviation | MDT
Memory | Memory utilization | Monitors memory usage across route processors and line cards on routers. Generates an alert when memory utilization is unusual. | Standard Deviation | MDT
Memory | Memory utilization (cXR) | Monitors memory usage across route processors and line cards on classic XR devices. Generates an alert when memory utilization is unusual. | Standard Deviation | MDT
Layer 3-Routing | RIB BGP route count | Monitors the RIB for the route count and memory used by BGP. Generates an alert when an anomaly is detected (such as a significant increase or decrease in route counts). | Standard Deviation | MDT
Layer 3-Routing | RIB connected route count | Monitors the RIB for the route count and memory used by connected routes. Generates an alert when an anomaly is detected (such as a significant increase or decrease in route counts). | Standard Deviation | MDT
Layer 3-Routing | RIB IS-IS route count | Monitors the RIB for the route count and memory used by IS-IS. Generates an alert when an anomaly is detected (such as a significant increase or decrease in route counts). | Standard Deviation | MDT
Layer 3-Routing | RIB local route count | Monitors the RIB for the route count and memory used by local routes. Generates an alert when an anomaly is detected (such as a significant increase or decrease in route counts). | Standard Deviation | MDT
Layer 3-Routing | RIB OSPF route count | Monitors the RIB for the route count and memory used by OSPF. Generates an alert when an anomaly is detected (such as a significant increase or decrease in route counts). | Standard Deviation | MDT
Layer 3-Routing | RIB static route count | Monitors the RIB for the route count and memory used by static routes. Generates an alert when an anomaly is detected (such as a significant increase or decrease in route counts). | Standard Deviation | MDT
Layer 3-Routing | RIBv6 connected route count | Monitors the RIBv6 for the route count and memory used by connected routes. Generates an alert when an anomaly is detected (such as a significant increase or decrease in route counts). | Standard Deviation | MDT
Layer 3-Routing | RIBv6 local route count | Monitors the RIBv6 for the route count and memory used by local routes. Generates an alert when an anomaly is detected (such as a significant increase or decrease in route counts). | Standard Deviation | MDT
Layer 3-Routing | RIBv6 static route count | Monitors the RIBv6 for the route count and memory used by static routes. Generates an alert when an anomaly is detected (such as a significant increase or decrease in route counts). | Standard Deviation | MDT
Layer 3-Routing | RIBv6 subscriber route count | Monitors the RIBv6 for the route count and memory used by subscriber routes. Generates an alert when an anomaly is detected (such as a significant increase or decrease in route counts). | Standard Deviation | MDT
Layer 2-Traffic | SNMP interface packet error counters | Monitors interface transmit and receive error counters. Generates an alert when unusual error rates occur. | No Alert | SNMP
Layer 2-Traffic | SNMP interface packet counters | Monitors interface transmit and receive counters. Generates an alert when unusual traffic rates occur. | Rate Change | SNMP
Layer 2-Traffic | SNMP interface rate counters | Monitors interface statistics as rate counters. Generates an alert when unusual traffic rates occur. | Standard Deviation of Rate Change | SNMP
Layer 2-Traffic | SNMP traffic black hole | Monitors input and output data rates for black hole behavior. Checks that the ratio of output data rate to input data rate is within the acceptable range; a ratio outside that range indicates a black hole. | Two-Level Threshold | SNMP
Layer 2-Traffic | Traffic black hole | Monitors input and output data rates for black hole behavior. Checks that the ratio of output data rate to input data rate is within the acceptable range; a ratio outside that range indicates a black hole. | Two-Level Threshold | MDT
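The Traffic black hole and SNMP traffic black hole KPIs in the table above compare the ratio of output data rate to input data rate against an acceptable range. The following Python sketch shows that kind of check under stated assumptions: the rates are aggregate input and output rates in bits per second for the monitored scope, only a ratio that is too low is treated as a black hole, and the 0.9 and 0.5 ratio levels and the minimum-traffic guard are hypothetical values, not product defaults. The function name black_hole_state is also hypothetical.

```python
def black_hole_state(in_rate_bps: float, out_rate_bps: float,
                     warning_ratio: float = 0.9, critical_ratio: float = 0.5,
                     min_in_rate_bps: float = 1_000.0) -> str:
    """Classify the output/input data-rate ratio for black hole behavior.

    A low ratio means traffic is arriving but not leaving. The two ratio
    levels stand in for the KPI's Two-Level Threshold alerting.
    """
    if in_rate_bps < min_in_rate_bps:
        return "OK"  # too little input traffic to judge the ratio reliably
    ratio = out_rate_bps / in_rate_bps
    if ratio < critical_ratio:
        return "CRITICAL"
    if ratio < warning_ratio:
        return "WARNING"
    return "OK"


print(black_hole_state(1e9, 0.98e9))  # OK
print(black_hole_state(1e9, 0.7e9))   # WARNING
print(black_hole_state(1e9, 0.1e9))   # CRITICAL
```

The minimum input rate guard avoids dividing by zero and avoids alerting on a nearly idle device, where a small absolute difference between input and output would produce a misleading ratio.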

Troubleshoot Health Insights

The following table describes issues you may encounter when using the Health Insights application, and their solutions or workarounds.

Table 2. Health Insights Troubleshooting

Issue

Solution

Applying a KPI to a device fails with messages indicating that Cisco Network Services Orchestrator (Cisco NSO) and the target device are out of sync or cannot communicate. The message text varies, but may include "device out of sync", "NC client timeout", or other text indicating connectivity or sync issues between NSO and the device.

Apply the KPI again. Under normal circumstances, doing so initiates a sync operation between the device and NSO.

Health Insights not receiving data.

Check the following and ensure they are responsive: