Telemetry-Traffic Collector (TM-TC) Troubleshooting Procedures

This section describes how to troubleshoot common issues with the Telemetry-Traffic Collector (TM-TC) service.

Handling Zombies

The TM-TC service is implemented using nano-services and the Reactive FASTMAP (RFM) design pattern.

There are two nano-plans in TM-TC:

  • External, user-facing plan: This plan provides an interface for tracking the configuration status of each node.

  • Internal, hidden plan: This plan applies the TM-TC service configuration to a node. A stacked service creates one internal service for each device.

In NSO, zombies are an internal operational data model that stores data for deleted services. Zombies are useful for staged deletions and for RFM (the NSO approach to eventual consistency). When a service deletion is triggered, NSO keeps references to the deleted services (zombies) in operational data. The zombies are removed from the configuration database (CDB) only after all of the service's configuration has been removed from the devices. Zombies report the progress of a service deletion to the data interface, including the stage the deletion is waiting on, which helps pinpoint the problem area. For more information, see the NSO documentation on Cisco DevNet.

In Cisco Crosswork Change Automation and Health Insights, you trigger a deletion to clean up the configuration on a device by setting the device to ADMIN_DOWN or UNMANAGED in DLM, or by deleting it from DLM. Depending on the device's connectivity, deleting all of the configuration at once could lock the database until the last configuration is removed. Once the configuration is successfully removed from the device, the TM-TC service updates the nano-plan state to report the deletion progress to the data interface. After the deletion completes, the TM-TC service removes the nano-plan, the zombies, and all service-related operational data from the CDB.

In the following scenarios, the zombies may not be deleted even after the deletion is triggered, and manual intervention is required to remove the configuration references for the devices. In such cases, run the cleanup action on the device/service. In this context, the terms device and service are interchangeable, because Cisco Crosswork Change Automation and Health Insights creates one service per device.

  1. The device is not reachable during deletion.

  2. The device is reachable, but configuration removal fails on the device for other reasons.

If a device/service enters the zombie state, you must delete the existing plan before any new telemetry collection can be enabled on the device. If the data interface (Crosswork) or a CLI/NETCONF user tries to recreate the service instance before the zombie deletion is fully processed, the following error is displayed, indicating that the deletion is still in progress:

Aborted: Operation failed because: Service still in zombie state: 'YYY'

Note

The TM-TC function pack does not support the zombie resurrect and redeploy options.
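The following Python sketch is a simplified, hypothetical model of the zombie behavior described above; it is not the NSO or TM-TC implementation. A deleted service leaves a zombie that tracks the devices whose configuration has not yet been removed, and any attempt to recreate the service while the zombie exists fails with the error shown above. The class and method names (ZombieRegistry, delete_service, confirm_removed, cleanup) and the service name Crosswork_cahi-xr-1 are illustrative assumptions.

```python
# Hypothetical model for illustration only; not the NSO or TM-TC implementation.


class ZombieStateError(Exception):
    """Raised when a service is recreated while its zombie still exists."""


class ZombieRegistry:
    def __init__(self):
        self.services = set()  # names of live services
        self.zombies = {}      # service name -> devices that still hold its config

    def create_service(self, name):
        # Recreation is rejected until the zombie for this name is gone.
        if name in self.zombies:
            raise ZombieStateError(
                f"Aborted: Operation failed because: Service still in zombie state: '{name}'"
            )
        self.services.add(name)

    def delete_service(self, name, devices):
        # Deleting a service leaves a zombie that waits on each device.
        self.services.discard(name)
        self.zombies[name] = set(devices)

    def confirm_removed(self, name, device):
        # Called when config removal succeeds on a device; the zombie
        # disappears once no devices are pending.
        pending = self.zombies.get(name)
        if pending is None:
            return
        pending.discard(device)
        if not pending:
            del self.zombies[name]

    def cleanup(self, name):
        # Stands in for the manual cleanup action: drops the zombie even if
        # a device never confirmed removal (for example, it was unreachable).
        self.zombies.pop(name, None)


registry = ZombieRegistry()
registry.create_service("Crosswork_cahi-xr-1")
registry.delete_service("Crosswork_cahi-xr-1", ["xr-1"])
# The device is unreachable, so confirm_removed() is never called and the
# zombie remains; recreating the service now would raise ZombieStateError.
registry.cleanup("Crosswork_cahi-xr-1")
registry.create_service("Crosswork_cahi-xr-1")  # succeeds after cleanup
```

Because TM-TC creates one service per device, each zombie in this model usually waits on a single device; the set-based design simply shows why an unreachable device keeps the zombie alive until cleanup runs.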


The following figure shows how to check whether a service is in the zombie state on NSO.

Figure 1. Checking if a service is in zombie state on NSO

The following figure shows the message displayed when you try to create a new configuration on a service that is in the zombie state (viewed in the Performance Alerts > KPI Job History window).

Figure 2. Zombie state error message

The following figure shows the NSO cleanup command that removes a plan in the zombie state.

Figure 3. NSO cleanup command

Handling Device Cleanup Errors

Deletion of the telemetry configuration can occasionally fail. Such failures are reported in the Inventory Jobs window.

Figure 4. Telemetry configuration deletion error

The device cleanup error can occur in two scenarios:

  • Failure in deleting a specific telemetry configuration on the device

    In this scenario, you must manually clear the failed configuration from the telemetry service, which automatically removes the configuration from the device. If you remove the configuration directly from the device instead, the function pack restores it; you must therefore remove it from the function pack service model.

    Follow these steps to remove the subscription manually from the NSO CLI:

    1. Browse the Telemetry – Traffic Collector (TM-TC) configuration to find the ID of the subscription to be deleted.

    2. Delete the subscription node found in step 1 by using the delete command in the NSO CLI.

  • Failure in deleting the telemetry service on NSO for a device

    When a device is set to ADMIN_DOWN or UNMANAGED in DLM, or is removed from DLM, Cisco Crosswork Change Automation and Health Insights removes the telemetry service associated with that device on NSO. If this removal fails, it is reported as a device cleanup failure. In this case, you must run the cleanup command in the NSO CLI. The cleanup command has a "match" option that removes, in one operation, all services whose names match a given string.

    The following are some examples. The service for a specific device is named Crosswork_cahi-<node key in NSO>.

    To remove one service:

      request tm-tc-actions cleanup service <service-name> no-networking false

    To remove all services whose names match the string "Crosswork":

      request tm-tc-actions cleanup service Crosswork match true no-networking false
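    The following Python sketch formats the cleanup commands shown above from NSO node keys, following the Crosswork_cahi-<node key in NSO> naming convention. It only builds the command text for pasting into the NSO CLI; it does not connect to NSO. The helper names (service_name, cleanup_command) and the node key xr-1 are hypothetical.

```python
def service_name(node_key: str) -> str:
    """Return the TM-TC service name for a device (Crosswork_cahi-<node key>)."""
    return f"Crosswork_cahi-{node_key}"


def cleanup_command(target: str, match: bool = False) -> str:
    """Build the NSO CLI cleanup command for one service, or, with
    match=True, for all services whose names match the target string."""
    parts = ["request tm-tc-actions cleanup service", target]
    if match:
        parts.append("match true")
    parts.append("no-networking false")
    return " ".join(parts)


print(cleanup_command(service_name("xr-1")))
# request tm-tc-actions cleanup service Crosswork_cahi-xr-1 no-networking false
print(cleanup_command("Crosswork", match=True))
# request tm-tc-actions cleanup service Crosswork match true no-networking false
```

    Because the match form removes every service whose name matches, generating one command per affected node key may be the safer choice when only some devices failed cleanup.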