Polling network elements is of constant concern to most service providers for three reasons:
• Taxing valuable network element CPU cycles
• Occupying a telemetry network with extensive data transfers resulting from these polls
• Diverting network management system cycles for processing network data that may not have changed
Cisco® Active Network Abstraction (ANA) Reduced Polling VNE implements a low overhead method for keeping ANA's device information synchronized with network elements in a timely manner without the drawbacks of repeated polling.
Introduction
For a network management system to maintain a detailed and up-to-date representation of a network element's physical and logical configuration, the network management system needs to detect device configuration changes in a timely manner. Typically, changes to device configuration are detected by polling the network element at preconfigured intervals. For some network elements, each poll can result in extensive configuration retrieval requests that are executed on the device. In particular, network elements supporting large numbers of interfaces and subinterfaces may be subjected to extensive queries (command-line interface [CLI] commands or Simple Network Management Protocol [SNMP] requests), either through the sheer number of individual commands or through requests for bulk data retrieval.
Cisco ANA has largely relied on periodic polling to keep its internal device representation synchronized with the actual network element configuration. Observations of Cisco ANA in operation within various network deployments have shown that Cisco ANA's polling cycles can create recurring, high CPU load on the polled network element, frequently occupy much of the telemetry network bandwidth, and consume the majority of ANA server resources for processing large amounts of device configuration information that may not have changed since the last poll. Service providers could not sustain this overhead for long.
To relieve the burden on network elements and reduce the overhead on the telemetry network and on ANA servers, a new feature has been created within Cisco ANA's virtual network element (VNE) framework called Reduced Polling VNE.
Reduced Polling VNE Overview
The Reduced Polling VNE greatly improves the process of keeping ANA's representation of network element state and configuration (that is, ANA's VNE model) current with the actual network. Compared with repeated polling at regular intervals (ANA's original method for synchronizing with the network), reduced polling (the new method of synchronizing with the network) dramatically reduces:
• The number of repetitive queries that ANA sends to the network element
• The amount of data transferred from the network element to ANA
• The effort spent by ANA servers on processing device configuration data
• CPU spikes and average CPU load on the device, because fewer queries are sent to the device
In addition, because reduced polling is event based, inventory updates appear in the application in real time as configuration changes occur.
After ANA has discovered and created an internal representation of a network element's physical and logical configuration, ANA enters a state of listening for updates from that network element. Instead of polling the network element every 3 to 15 minutes to retrieve mostly unchanged configuration information, ANA's Reduced Polling VNE polls only when an update notification arrives from the network element. On average, these updates and resulting, event-based polls occur at much lower frequencies than the standard, 15-minute polling intervals. To reduce the overhead of data queries and transfers even further, ANA retrieves only the information that has changed since the last update.
Comparative benchmarks of the new, reduced polling method against the previously used, repeated polling method showed that, over a day, ANA's Reduced Polling VNE polls a typical network element 85% less than standard, repeated polling does. Fewer polls mean fewer queries and less data retrieved from the network element, which dramatically reduces the overall impact on the network element's CPU, telemetry bandwidth, and ANA server processing.
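To put the 85% figure in perspective, the arithmetic can be sketched as follows. This assumes a fixed 15-minute polling interval as the baseline and assumes that the 85% reduction applies directly to the daily poll count; the benchmark itself may have been measured differently, and the function names are illustrative only.

```python
MINUTES_PER_DAY = 24 * 60


def polls_per_day(interval_minutes: int) -> int:
    """Number of fixed-interval polls that fit into one day."""
    return MINUTES_PER_DAY // interval_minutes


def reduced_polls(baseline_polls: int, reduction_percent: int) -> float:
    """Daily polls remaining after the stated percentage reduction."""
    return baseline_polls * (100 - reduction_percent) / 100


# Assumption: 15-minute repeated polling is the baseline being compared.
baseline = polls_per_day(15)            # 96 polls per day
reduced = reduced_polls(baseline, 85)   # roughly 14 polls per day
```

Under these assumptions, a device polled 96 times a day with repeated polling would be polled about 14 times a day with reduced polling.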
Reduced Polling VNE Method
Figure 1. Configuration Change Syslog
The Reduced Polling VNE feature relies on the network element to indicate that a configuration change on the device has occurred. A Cisco device does this by generating a configuration change syslog message (Figure 1).
Cisco IOS XR configuration change syslog example:
%MGBL-CONFIG-6-DB_COMMIT: Configuration committed by user `anauser'. Use `show configuration commit changes 1000001755' to view the changes.
Cisco IOS configuration change syslog example:
%SYS-5-CONFIG_I: Configured from console by vty1 (172.23.95.46).
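As an illustration of how a receiver might recognize these two messages, here is a minimal Python sketch. The regular expressions are derived only from the two examples above, so real devices may emit variants they do not cover. The function name and the returned dictionary layout are hypothetical and do not describe ANA's internal parser.

```python
import re
from typing import Optional

# Patterns derived from the two example messages above (an assumption:
# other software releases may format these messages differently).
IOS_XR_COMMIT = re.compile(
    r"%MGBL-CONFIG-6-DB_COMMIT\s*:\s*Configuration committed by user "
    r"[`'](?P<user>[^']+)'"
    r".*?show configuration commit changes (?P<commit_id>\d+)"
)
IOS_CONFIG = re.compile(
    r"%SYS-5-CONFIG_I\s*:\s*Configured from (?P<source>\S+) by (?P<line>\S+)"
    r"(?: \((?P<address>[^)]+)\))?"
)


def parse_config_change(message: str) -> Optional[dict]:
    """Return details of a configuration change syslog, or None if no match."""
    match = IOS_XR_COMMIT.search(message)
    if match:
        return {
            "platform": "ios-xr",
            "user": match.group("user"),
            "commit_id": match.group("commit_id"),
        }
    match = IOS_CONFIG.search(message)
    if match:
        return {
            "platform": "ios",
            "source": match.group("source"),
            "line": match.group("line"),
            "address": match.group("address"),  # None when no address is given
        }
    return None
```

Any message that matches neither pattern returns None, so other syslog traffic would simply be ignored by this sketch.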
Cisco ANA receives the syslog message and queries the command history archive on the device to identify the exact commands that caused the change. ANA uses these commands to identify the component or components in ANA's VNE model that need to be updated.
For each model component, ANA contains a preconfigured query for retrieving related updates from network elements. ANA assembles the set of queries needed to update the previously identified model components and then polls the network element by issuing this set of queries. These queries, issued as CLI commands or SNMP requests, retrieve the needed information from the network element. ANA uses the retrieved information to update its VNE model.
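The two steps just described (commands to model components, then components to queries) can be sketched in Python as follows. Both lookup tables are hypothetical placeholders invented for illustration; they are not ANA's actual component names or preconfigured queries.

```python
from typing import List

# Hypothetical mapping from configuration command prefixes to model components.
COMPONENT_BY_PREFIX = {
    "interface": "interfaces",
    "router bgp": "bgp",
    "ip vrf": "vrfs",
}

# Hypothetical refresh query for each model component.
QUERY_BY_COMPONENT = {
    "interfaces": "show interfaces",
    "bgp": "show ip bgp summary",
    "vrfs": "show ip vrf",
}


def affected_components(commands: List[str]) -> List[str]:
    """Map executed commands to the model components they touch, without duplicates."""
    components: List[str] = []
    for command in commands:
        normalized = command.strip().lower()
        for prefix, component in COMPONENT_BY_PREFIX.items():
            if normalized.startswith(prefix) and component not in components:
                components.append(component)
    return components


def build_query_set(commands: List[str]) -> List[str]:
    """Assemble the queries needed to refresh only the affected components."""
    return [QUERY_BY_COMPONENT[c] for c in affected_components(commands)]
```

In this sketch, two interface changes and one BGP change produce only two queries, which reflects the idea of retrieving only what has changed. Commands that match no prefix (for example, sub-mode commands) are ignored here; a real implementation would need a richer mapping.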
(1) Configuration via SNMP Not Supported by VNE.
Handling Rapid Successions of Configuration Changes
Cisco ANA may receive multiple configuration change syslog messages from the same device in rapid succession. To help the Reduced Polling VNE handle rapid configuration changes, a throttling mechanism is introduced. When the initial configuration change syslog is received, the throttling mechanism starts a timer that causes the Reduced Polling VNE to wait for a preconfigured interval. During this wait interval, any additional configuration change syslog messages arriving at ANA from the same device are collected and collated. When the timer expires, the Reduced Polling VNE queries the command history archive and retrieves the corresponding configuration changes, and the throttling mechanism is reset so that the next configuration change syslog starts a new timer. See Figure 2.
The throttle is disabled by default. It can be enabled and administered using ANA registry techniques.
Figure 2. Throttling Configuration Changes
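The throttling behavior described above can be sketched in Python as follows. To keep the example deterministic, the caller passes the current time explicitly instead of relying on real timers. The class and method names are hypothetical and are not part of ANA.

```python
from typing import Dict, List


class ConfigChangeThrottle:
    """Collects configuration change syslogs per device during a wait interval."""

    def __init__(self, wait_seconds: float) -> None:
        self.wait_seconds = wait_seconds
        self._deadline: Dict[str, float] = {}      # device -> end of wait interval
        self._pending: Dict[str, List[str]] = {}   # device -> collected messages

    def on_syslog(self, device: str, message: str, now: float) -> None:
        """The first message for a device starts the timer; later ones are collected."""
        if device not in self._deadline:
            self._deadline[device] = now + self.wait_seconds
            self._pending[device] = []
        self._pending[device].append(message)

    def expire(self, now: float) -> Dict[str, List[str]]:
        """Return collected messages for devices whose timer has expired, and reset them."""
        ready: Dict[str, List[str]] = {}
        expired = [d for d, deadline in self._deadline.items() if deadline <= now]
        for device in expired:
            ready[device] = self._pending.pop(device)
            del self._deadline[device]  # the next syslog starts a new timer
        return ready
```

A caller would invoke expire periodically and, for each device returned, query the command history archive once rather than once per syslog message.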
Handling Loss of Configuration Change Syslog Messages
Since syslog messages are sent through User Datagram Protocol (UDP), it is possible that some configuration change syslog messages may not arrive at ANA. The following options are available to counter loss of configuration change syslog messages:
• Wait for next configuration change syslog message:
– When the next configuration change syslog is received, any previously missed configuration changes are still recorded in the command history archive on the device. The Reduced Polling VNE retrieves these configuration changes and updates the VNE representation. This method works best when configuration changes are frequent.
– The key drawback of this method is that, when configuration changes are infrequent, a missed configuration change may go unnoticed for an extended period of time.
• Archive log monitoring (Failsafe) mechanism:
– With the Failsafe mechanism enabled, the Reduced Polling VNE checks the command history archive on the device at 15-minute intervals to determine whether any new configuration commands were executed.
– Polling the command history archive has minimal impact on the overall device CPU.
– Configuration changes are caught within 15 minutes of the time the configuration changed on the device.
• Configure the network element to send syslog messages as an SNMP Inform message:
– Using the CISCO-SYSLOG-MIB, syslog messages can be transferred reliably as SNMP Inform messages.
– The Reduced Polling VNE can rely on change notifications to poll for device configuration changes only when changes occur.
– Since SNMP Inform messages are acknowledged messages, the overhead for message transfer and processing doubles.
• Poll Now at granular level:
– When suspecting that some configuration updates were missed for a particular feature, users can make use of the Poll Now option, which is available at the granular level. This is very lightweight and specific to the component to be updated.
• Poll Now option that can be used on demand:
– When suspecting that some configuration updates were missed across technologies, users can execute the Poll Now option from the ANA Network Vision user interface or through the ANA API for that VNE.
– The Poll Now option will retrieve the complete device configuration information and remodel the entire device representation within ANA.
– Poll Now can be time consuming and expose the network element to the burden of a complete polling cycle.
– This option is offered primarily as a last resort option and intended to augment the other options.
• Wait for the system interval cycle:
– ANA checks network elements for changes at the end of each system interval, which is typically set to every 24 hours. If no configuration change syslog message has been received by the time the system interval is up, then the Reduced Polling VNE will check the network element and retrieve any missed configuration changes as above.
– The key drawback of this method is that a missed configuration change may go unnoticed for as much as 24 hours.
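Several of the options above come down to the same check: look in the command history archive for entries that ANA has not yet processed. A minimal Python sketch of one such check follows. It assumes that each archive entry carries an increasing sequence number. The entry layout, the fetch_archive callback, and the function name are all hypothetical.

```python
from typing import Callable, List, Tuple

# Assumed entry layout: (sequence number, executed command).
ArchiveEntry = Tuple[int, str]


def failsafe_check(
    fetch_archive: Callable[[], List[ArchiveEntry]],
    last_seen: int,
) -> Tuple[List[str], int]:
    """Return commands newer than last_seen and the updated last-seen number."""
    entries = sorted(fetch_archive())
    new_entries = [(seq, cmd) for seq, cmd in entries if seq > last_seen]
    if not new_entries:
        return [], last_seen  # nothing changed, so no further polling is needed
    commands = [cmd for _, cmd in new_entries]
    return commands, new_entries[-1][0]
```

When the check returns an empty list, no further queries are sent to the device, which is consistent with the minimal CPU impact noted for archive polling.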
Complete Network Element Configuration Refresh
Whichever option is employed for handling loss of configuration change syslog messages, it is prudent to occasionally refresh the ANA representation of a network element. Cisco ANA automatically refreshes its device representation by remodeling the entire device representation in any of the following three situations:
• Poll Now option:
– As outlined above, the Poll Now option retrieves the complete network element configuration and remodels the corresponding device representation.
• Route Switch Processor (RSP) failover:
– If ANA detects an RSP failover condition on the network element, which may have caused unrecorded configuration changes, then ANA polls for the network element configuration and remodels the device representation.
• Command history archive buffer overflow:
– The device command history archive buffer contains the configuration commands that were executed on the device. For Cisco IOS devices, the buffer can overflow when a large number of commands are executed; in this case, some commands can be lost, a gap is identified, and the VNE is assumed to be out of sync with the device. VNEs using reduced polling are more sensitive to these changes due to their different polling frequency. If ANA detects a command history archive buffer overflow condition on the network element, ANA proceeds to refresh the entire device representation.
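One way to picture the gap detection mentioned in the last item is sketched below in Python. It assumes that archive entries carry consecutive sequence numbers, so a gap exists whenever the oldest entry still in the buffer is more than one step past the last entry ANA processed. This numbering scheme and the function name are assumptions for illustration, not a description of ANA's actual check.

```python
from typing import Iterable


def needs_full_refresh(last_seen: int, buffered_sequence_numbers: Iterable[int]) -> bool:
    """Return True when entries between last_seen and the buffer have been lost."""
    numbers = list(buffered_sequence_numbers)
    if not numbers:
        # Assumption: an empty buffer gives no evidence of a gap.
        return False
    oldest = min(numbers)
    # Entries last_seen+1 .. oldest-1 were overwritten before they were read.
    return oldest > last_seen + 1
```

If this returns False, an incremental update from the buffered commands is sufficient; if it returns True, some commands are unrecoverable and only a complete refresh can resynchronize the model.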
The VNE representation refresh is similar to the initial VNE discovery process, the main difference being what triggers the process.
Like any discovery process, the VNE refresh has the potential of raising the CPU usage on the device. However, several factors work together to keep CPU usage low: a queuing mechanism that controls command execution, the VNE logic that reuses command results, and adaptive polling's throttle mechanism that introduces a delay between commands.
The amount of time needed for the VNE refresh depends on many factors, such as device and network latency and gateway server activities. To help identify when a refresh process is in progress and when it has completed:
• The VNE moves to the Currently Unsynchronized investigation state and its icon changes to an hourglass (Figure 3).
Figure 3. Currently Unsynchronized
• When the refresh is completed the VNE moves to the Operational state and the hourglass is removed (Figure 4).
Figure 4. Operational VNE
• Cisco ANA can be configured to generate a System event when a VNE enters or exits the Currently Unsynchronized state (or any other investigation state).
Network Elements Supported Through Reduced Polling VNE
At this point, Reduced Polling VNEs are applied to Cisco devices. In its first release with ANA 3.7.2, Reduced Polling VNE support was available for the Cisco ASR 9000, 7600, and CRS-1 Series devices. Since then, support for more device types has been added through monthly Cisco ANA Device Package updates.
Table 1 lists Reduced Polling VNE support for Cisco device families as of Cisco ANA 3.7.x Device Package 6.0, which was published on August 5, 2011.
Prerequisite device configuration for reduced polling: on each device managed as a VNE in Reduced Polling mode, the archive log feature must be enabled and the syslog host must be set to the ANA gateway.
Reduced Polling can be enabled or disabled from the ANA Manage administrator user interface when creating a new VNE or editing existing VNE properties.
When creating a VNE, the default polling approach is set to ANA default, which means that ANA controls whether the device is modeled as a regular VNE or a Reduced Polling VNE, depending on the device type. An administrator can also select to model the device as a regular polling VNE or as a Reduced Polling VNE. This selection can be done per VNE instance or as the default for a device type. See Figure 5.
Figure 5. Defining Polling Method
Before selecting the Reduced Polling option it is recommended to verify that the device or device type is supported through Reduced Polling VNE. The ANA Manage administrator user interface provides a method for listing the device types supported through Reduced Polling VNE, as illustrated in Figure 6.
Figure 6. Supported Device List
Important: Cisco network elements do not generate configuration change syslog messages when configuration changes are made through SNMP. Enabling the reduced polling feature is NOT recommended for devices that may be configured through SNMP.
Which polling method has been selected and is currently in effect for a particular device can be checked through the Communication Details view for the VNE (Figure 7).
Figure 7. Checking Polling Method
Polling for Continually Changing Networking Objects
Even with Reduced Polling VNE enabled, ANA will periodically poll for updates about continually changing network objects. ANA obtains updates for continually changing network objects at regular intervals, defined by the administrable Configuration Interval value (typically 15 minutes). These continually changing network objects include routing and forwarding tables, Bidirectional Forwarding Detection (BFD) sessions, label switching (LSP) tables, and so on.
Benchmark of Reduced Polling VNE
The following benchmark illustrates the drastic reduction in commands sent by ANA to a network element when the Reduced Polling VNE is enabled versus using the regular polling VNE. This illustration uses command counts collected from a 7600 and an ASR 9000 device that are located within a Carrier Ethernet network. That network was configured with Multiprotocol Label Switching (MPLS), Virtual Private LAN Services (VPLS), and corresponding Pseudowires traversing the MPLS network. The data was collected for a one-hour period after the initial discovery and after the VNE reached the operational state. Total commands include the SNMP and Telnet commands. The observed command counts for a 7600 device and an ASR 9000 device follow:
7600 device:
ASR 9000 device:
As expected, a reduction in CPU load on network elements has also been observed. The example in Figure 8 illustrates a drastic reduction in CPU load observed on the ASR 9000.
Figure 8. CPU Load Example
Summary
Cisco ANA's near-real-time model relies on accurate information about the managed network. This information can be obtained through asynchronous notifications from network elements or through repeated polling of the network element.
Repeated polling imposes a burden on network elements and overhead on the telemetry network and on ANA servers. To relieve that burden and overhead, a new feature has been created within Cisco ANA's VNE framework called Reduced Polling VNE.
This Reduced Polling VNE feature implements a low overhead method for keeping ANA's device information synchronized with network elements in a timely manner without the drawbacks of repeated polling.