Network Element Polling with Cisco Active Network Abstraction
Updated: May 8, 2007
Cisco® Active Network Abstraction (ANA) is an extensible and scalable platform and product suite that resides between the network elements and OSS management applications and provides unified end-to-end network and service management for service provider and large enterprise networks. Cisco ANA is part of the next-generation management system for the Cisco IP NGN architecture.
In common with other service management solutions, the effectiveness of Cisco ANA in managing a network is directly proportional to the timeliness and accuracy of data from the managed network. For example, root causes cannot be identified, and events cannot be correlated to them, unless the system has accurate and timely inventory and topology information, which must be collected and discovered from the network.
Collecting such data comes at a cost. The devices must assign CPU cycles for responding to management protocol requests and bandwidth for carrying this management data.
This white paper describes how Cisco ANA optimizes the way in which management data is obtained from network devices. Extensive flexibility and configurability, in addition to smart and adaptive polling mechanisms, help ensure that only required data is polled, at the frequency that accuracy requires, without adversely affecting devices and while minimizing management protocol overhead.
Cisco Active Network Abstraction
Cisco Active Network Abstraction is a flexible and powerful multivendor device management platform that facilitates advanced network and service management applications in a multitechnology, multiservice network environment.
The core technology of Cisco ANA includes a unique virtual network element (VNE) abstraction model that dynamically discovers and identifies device and network components reflecting the near real-time state of network elements.
This abstraction model facilitates a broad set of embedded device management and network abstraction features in Cisco ANA, while providing a rich set of platform and network mediation services for advanced network and service management applications from Cisco and its partners.
The Cisco ANA approach scales directly alongside the physical network, allowing operators to view and manage the complexities of multiple services in a multitechnology, multivendor network.
Figure 1 illustrates the main tiers of the Cisco ANA system. They are, from the bottom up:
• The managed network, composed of network elements from Cisco and other vendors
• The Cisco ANA unit servers, which provide the run-time environment for the VNEs (which are illustrated in Figure 2)
• The Cisco ANA gateway server, which manages client and northbound OSS interface requests
• OSS and GUI clients, which interact with the gateway
Figure 1. Cisco ANA Physical Architecture
Figure 2 illustrates the logical VNE layer. The VNEs are autonomous "device emulators" running on the unit servers illustrated in Figure 1. Each VNE interacts with its respective network element, discovers its physical/logical inventory and connectivity, and maintains a virtual, in-memory model of the device.
Figure 2. VNE Layer over the Physical Network
The VNEs help enable cross-network processing and analysis applications, such as service path tracing, root cause analysis, service impact analysis, and others. These applications are informed by the VNEs when an event, or configuration change, is identified from a network element, as well as by the gateway upon user or external OSS requests.
The focus of this paper is on the interaction between the VNEs and the network elements and on the mechanisms offered by Cisco ANA to minimize the impact of gathering information from network elements.
The Mechanics of Polling
Cisco ANA provides extensive and highly configurable capabilities for polling behavior. These include:
• Schemes that can be used to define polling behavior for groups of VNEs, as well as polling behavior on a per-VNE or per-attribute basis
• Differentiated polling groups and cycles for different types of data (for example, status, configuration, performance, and so on) based on the importance of their timeliness
• Smart polling using event-based triggers to initiate polling
• Management protocol optimizations, including minimum delays between requests, optimizing the Simple Network Management Protocol (SNMP) protocol data unit (PDU) size, and reusing data from one command to satisfy multiple requests with Telnet/Secure Shell (SSH) Protocol
• Persistence of polled data to support fast and smooth recovery of VNEs
• Adaptive polling to "back off" from devices whose CPU goes above a configurable threshold
• Identification and special treatment of devices that become unreachable
• Instant creation of dynamic registration commands for use-case-specific polling requests
Advanced Polling Configuration
Cisco ANA provides a flexible framework for advanced polling configuration called a scheme. Schemes are part of the Cisco ANA configuration registry. The scheme determines what data should be retrieved for each device and through which commands and protocols, supporting fine-tuning and local overrides of default polling parameters.
The scheme settings are arranged in an inheritance tree and incorporated into the configuration registry to support default values at any level and the option to inherit or override default settings on the basis of device vendor, type, model, version, and so on. The scheme settings can be changed at the level of granularity of specific device instances and specific aspects of inventory within devices. For example, different polling frequencies can be set for different port types within a device.
The data thus collected in the VNE layer is required for the applications running in and above the VNE layer. In general this is the physical and logical inventory and, crucially, the forwarding tables.
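The inherit-or-override behavior of scheme settings described above can be sketched as a simple tree lookup. This is a minimal illustration only, not the Cisco ANA registry format: the class `SchemeNode`, the level names, and the interval values are all hypothetical.

```python
from typing import Optional


class SchemeNode:
    """One level of a hypothetical scheme inheritance tree."""

    def __init__(self, name: str, parent: Optional["SchemeNode"] = None, **overrides: int):
        self.name = name
        self.parent = parent
        self.overrides = dict(overrides)  # settings defined at this level only

    def resolve(self, key: str) -> int:
        """Return the nearest value for key, walking up toward the root."""
        node: Optional[SchemeNode] = self
        while node is not None:
            if key in node.overrides:
                return node.overrides[key]
            node = node.parent
        raise KeyError(key)


# Illustrative tree: defaults at the root, overrides at deeper levels.
root = SchemeNode("default", status=60, configuration=360, system=900)
vendor = SchemeNode("vendor-x", parent=root)
model = SchemeNode("vendor-x/model-y", parent=vendor, configuration=600)
port_type = SchemeNode("vendor-x/model-y/port-type-z", parent=model, status=30)

# The port type overrides status, inherits configuration from the model,
# and inherits system from the root defaults.
resolved = {key: port_type.resolve(key) for key in ("status", "configuration", "system")}
```

A lookup that starts at the most specific level and falls back toward the root is what allows a single override, for example on one port type, to leave every other setting untouched.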
Polling Groups and Cycles
Since there are different types of data in network elements, with varying degrees of volatility, not all data needs to be collected all of the time. Cisco ANA thus has the notion of a polling group, which defines polling cycles. A polling cycle is the time between consecutive interactions with a network element to gather information.
By default, there are five kinds of information for a network element and, hence, five polling cycles in a group:
• Status: Sets the polling rate for status-related information, such as device status (up/down), port status, administrator status, and so on. The information is related to the operational and administrative status of the network element. Default value of 60 seconds.
• Configuration: Sets the polling rate for configuration-related information, such as Virtual Circuit tables, scrambling, and so on. Default value of 360 seconds.
• System: Sets the polling rate for system-related information, such as device name, device location, and so on. Default value of 900 seconds.
• Topology Layer 1 counters: Sets the polling rate of the topology process as an interval for the Layer 1 counter. This is an ongoing process. Default value of 60 seconds.
• Topology Layer 2 counters: Sets the polling rate of the topology process as an interval for the Layer 2 counter. This process is available on demand. Default value of 60 seconds.
Each VNE uses so-called registrations to collect the different kinds of data from the associated network element. Each registration specifies the commands to obtain a specific item of data and can be configured with its own polling interval or logically associated with one of the polling group's intervals on a per-device/VNE basis.
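The relationship between a polling group and its registrations can be sketched as follows. The five default values are the ones listed above; the class names, field names, and CLI command strings are hypothetical and do not reflect the actual Cisco ANA registry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PollingGroup:
    """The five default polling cycles, in seconds."""
    status: int = 60
    configuration: int = 360
    system: int = 900
    topology_l1: int = 60
    topology_l2: int = 60


@dataclass(frozen=True)
class Registration:
    """Hypothetical registration: a command plus how often to run it."""
    command: str
    cycle: Optional[str] = None      # name of a polling-group cycle, or
    interval: Optional[int] = None   # an explicit interval in seconds

    def effective_interval(self, group: PollingGroup) -> int:
        # An explicit interval wins over the group association.
        if self.interval is not None:
            return self.interval
        if self.cycle is None:
            raise ValueError("registration needs a cycle or an interval")
        return getattr(group, self.cycle)


group = PollingGroup()
port_status = Registration("show interfaces", cycle="status")
vc_table = Registration("show vc table", interval=120)  # illustrative command name

port_status_interval = port_status.effective_interval(group)
vc_table_interval = vc_table.effective_interval(group)
```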
Further, as multiple VNEs may use the same polling group, the intervals within the polling cycle are randomized for each VNE. This means that, while the period between data collection commands for a given polling interval will be the same for each VNE configured with the same group, each VNE will actually be polling at different times. This "smooths out" the load of the management protocols on the network and reduces their impact.
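One simple way to obtain this effect is to give each VNE a random start offset within the interval and keep the period fixed afterward. The sketch below assumes that approach; the paper does not specify how Cisco ANA randomizes its intervals, and the function names are hypothetical.

```python
import random


def first_poll_offsets(vne_ids, interval_s, seed=None):
    """Give each VNE a random start offset within one polling interval."""
    rng = random.Random(seed)
    return {vne: rng.uniform(0, interval_s) for vne in vne_ids}


def poll_times(offset_s, interval_s, count):
    """Return the first `count` poll times for one VNE."""
    return [offset_s + n * interval_s for n in range(count)]


# Three VNEs share a 60-second status cycle but start at different moments.
offsets = first_poll_offsets(["vne-a", "vne-b", "vne-c"], 60, seed=1)
schedule = {vne: poll_times(offset, 60, 3) for vne, offset in offsets.items()}
```

Every VNE still polls once per 60 seconds, but the requests are spread across the interval rather than arriving together.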
Cisco ANA monitors and takes advantage of specific device change indicators, such as configuration change notifications from traps, syslogs, or "Entity last change" MIB variables, which trigger Cisco ANA to automatically invoke a new polling cycle. This capability supports a low-frequency polling rate by default, while still identifying and reconciling changes quickly and so keeping the VNE data current.
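The idea of event-triggered polling can be sketched as a scheduler whose next poll is brought forward when a change indicator arrives. This is an illustrative model only; the class and method names are hypothetical, and time is passed in explicitly so the example needs no clock.

```python
class ChangeTriggeredPoller:
    """Low-frequency poller that polls early when a change is signaled."""

    def __init__(self, interval_s):
        self.interval_s = interval_s
        self.next_poll_at = 0.0
        self.last_change_marker = None

    def notify_change(self, now):
        """Called for a trap or syslog message: bring the next poll forward."""
        self.next_poll_at = min(self.next_poll_at, now)

    def observe_marker(self, marker, now):
        """Compare a 'last change' style value; a new value triggers a poll."""
        if self.last_change_marker is not None and marker != self.last_change_marker:
            self.notify_change(now)
        self.last_change_marker = marker

    def due(self, now):
        return now >= self.next_poll_at

    def polled(self, now):
        """Record a completed poll and schedule the next regular one."""
        self.next_poll_at = now + self.interval_s


poller = ChangeTriggeredPoller(interval_s=900)
poller.polled(now=0)             # regular poll; next one is due at t=900
due_before = poller.due(now=10)  # nothing has changed yet
poller.notify_change(now=10)     # a configuration-change trap arrives
due_after = poller.due(now=10)   # the poll is now due immediately
```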
Management Protocol Optimizations
In network management, there are well-known trade-offs between the use of SNMP and Telnet/SSH for monitoring and configuring network devices. SNMP has the advantage of being designed for the purpose, with minimal overhead, simplicity, and wide adoption. SNMP has the disadvantage that it is deployed over User Datagram Protocol (UDP), that SNMP agents are typically low priority processes, and that implementations vary in quality and reliability with MIBs having incomplete coverage and, on occasion, undesirable side effects.
Telnet/SSH has the advantages of being able to obtain any information, or effect any change, available through the command-line interface (CLI) of a device, being session based, and thus reliable, and having a higher degree of security, access control, and auditing capabilities through TACACS. Some of these advantages also count against Telnet/SSH, though, so there is no easy answer.
Maybe, in an ideal world, all management information would be available through SNMP. In the real world, there are various reasons why SNMP alone is not sufficient. Thus, users have to make use of other sources of information, including syslog, Telnet/SSH, and other management protocols if they are available.
Cisco ANA VNEs use the most appropriate means to collect data from a device, in a way that is decoupled from the processing of that data in a VNE. This supports the efficient development of VNEs for different devices and device families and is one of the strengths of the VNE approach.
To further reduce the impact of the management protocols, protocol optimization techniques can be employed. Some of those are discussed here.
• Where different Telnet/SSH data collection commands with the same polling interval are requesting the same information, that information is retrieved once and shared among the different commands.
• For SNMP, the PDU size can be optimized, and the same PDU can be used to send multiple commands to the network element, thus saving both the space taken by PDU overhead and the CPU cycles needed for PDU processing.
• For Telnet, the same session can be used by different data collection commands, reducing the overhead of establishing Telnet sessions and of any associated TACACS authentication, authorization, and auditing interactions.
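The first optimization above, retrieving shared information once, can be sketched as a per-round cache in front of the device session. The class `SharedCommandRunner` and the `fake_send` function are hypothetical stand-ins; a real implementation would send the command over a Telnet or SSH session.

```python
from typing import Callable, Dict


class SharedCommandRunner:
    """Runs each distinct CLI command once per polling round and shares the output."""

    def __init__(self, send: Callable[[str], str]):
        self._send = send              # hypothetical function that talks to the device
        self._cache: Dict[str, str] = {}

    def run(self, command: str) -> str:
        if command not in self._cache:
            self._cache[command] = self._send(command)
        return self._cache[command]

    def end_round(self) -> None:
        """Forget cached output so the next round reads fresh data."""
        self._cache.clear()


calls = []


def fake_send(command: str) -> str:
    """Stand-in for a device session; records what was actually sent."""
    calls.append(command)
    return f"output of {command}"


runner = SharedCommandRunner(fake_send)
first = runner.run("show interfaces")    # sent to the device
second = runner.run("show interfaces")   # served from the shared result
```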
VNE Persistency and Recovery
The data that a VNE collects from a network element is built up over time and changes as the network element itself changes. In this way, the VNE maintains a near real-time picture of the network element.
The data that is gathered from a network element by polling commands is persisted on the file system as it is gathered from the network element. When a VNE is restarted and previously persisted data exists, the initial polling command from the VNE, upon start-up, will read the persisted data from file so that the VNE's internal model is first populated with persisted data.
The subsequent execution of the same polling command, which will be within the polling cycle, will read data from the device and update the internal VNE model. As a consequence, when persistence is enabled, a VNE restart will not result in a high level of device interaction but will enable the VNE to be immediately useful, relying on the regular polling cycle to refresh the VNE state.
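The restart behavior described above can be sketched as a polling command that serves its first execution from persisted data, when such data exists, and reads from the device afterward. The JSON file layout, the class name, and the `read_device` callback are assumptions made for illustration.

```python
import json
from pathlib import Path


class PersistedCommand:
    """Polling command whose results are persisted as they are gathered."""

    def __init__(self, name, store_dir, read_device):
        self.path = Path(store_dir) / f"{name}.json"
        self.read_device = read_device  # hypothetical callable that polls the device
        self.first_run = True

    def execute(self):
        # On the first execution after a (re)start, prefer persisted data
        # so the model is populated without touching the device.
        if self.first_run and self.path.exists():
            self.first_run = False
            return json.loads(self.path.read_text())
        self.first_run = False
        data = self.read_device()
        self.path.write_text(json.dumps(data))  # persist as gathered
        return data
```

A usage example, with a temporary directory standing in for the unit server's file system:

```python
import tempfile

with tempfile.TemporaryDirectory() as store:
    before = PersistedCommand("inventory", store, lambda: {"ports": 4})
    before.execute()                      # reads the device and persists
    after = PersistedCommand("inventory", store, lambda: {"ports": 8})
    restored = after.execute()            # restart: served from persisted data
    refreshed = after.execute()           # next cycle: read from the device
```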
Cisco ANA implements a unique safety mechanism for protecting the device from overload based on thresholds defined for the CPU level of a device, where available.
When the CPU utilization for a network element goes above a configurable threshold over five polling periods, polling for that network element will be changed by introducing a delay between SNMP and Telnet/SSH commands, and an alarm will be sent to the Cisco ANA gateway. If the CPU level falls below the threshold for two polling periods, then the ticket from the first alarm is cleared, and the delay is set to zero.
If the CPU level stays above the CPU threshold for ten polling periods, the VNE will be placed in maintenance mode, where it will automatically suspend polling altogether, and another alarm will be sent to the Cisco ANA gateway, which will update the ticket created from the first alarm.
Polling will be restarted, and the VNE will be returned to normal, upon manual request from an administrator.
See "Adaptive Polling" in the Cisco ANA Administrator's Guide for further information.
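The CPU-based back-off rules above amount to a small state machine, sketched below. Two points are assumptions, since the text does not settle them: the period counts are treated as consecutive, and the ten-period count starts from the first reading above the threshold. The class and attribute names are hypothetical.

```python
class AdaptivePolling:
    """Sketch of CPU-threshold back-off: normal, throttled, maintenance."""

    NORMAL, THROTTLED, MAINTENANCE = "normal", "throttled", "maintenance"

    def __init__(self, cpu_threshold, delay_s,
                 throttle_after=5, clear_after=2, suspend_after=10):
        self.cpu_threshold = cpu_threshold
        self.delay_s = delay_s
        self.throttle_after = throttle_after
        self.clear_after = clear_after
        self.suspend_after = suspend_after
        self.state = self.NORMAL
        self.above = 0   # consecutive periods above the threshold (assumption)
        self.below = 0   # consecutive periods at or below the threshold

    @property
    def inter_command_delay(self):
        """Delay inserted between SNMP and Telnet/SSH commands."""
        return self.delay_s if self.state == self.THROTTLED else 0.0

    @property
    def polling_enabled(self):
        return self.state != self.MAINTENANCE

    def record_period(self, cpu_percent):
        """Feed one polling period's CPU reading; return the new state."""
        if self.state == self.MAINTENANCE:
            return self.state  # only a manual request leaves maintenance
        if cpu_percent > self.cpu_threshold:
            self.above += 1
            self.below = 0
        else:
            self.below += 1
            self.above = 0
        if self.state == self.NORMAL and self.above >= self.throttle_after:
            self.state = self.THROTTLED        # first alarm would be raised here
        elif self.state == self.THROTTLED:
            if self.above >= self.suspend_after:
                self.state = self.MAINTENANCE  # second alarm, polling suspended
            elif self.below >= self.clear_after:
                self.state = self.NORMAL       # ticket cleared, delay back to zero
        return self.state

    def manual_resume(self):
        """Administrator request: return the VNE to normal polling."""
        self.state = self.NORMAL
        self.above = self.below = 0
```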
Unreachable Network Elements
The "reachability" algorithm used by the VNEs depends on the configuration of the VNE and involves multiple connectivity tests, using SNMP, Telnet/SSH, or Internet Control Message Protocol (ICMP), as appropriate.
When a device fails to respond to any of the protocols, general polling is suspended, and a "VNE Unreachable" ticket is sent to the Cisco ANA gateway. Only the reachability tests are executed thereafter to detect when the device is reachable again. When the device is reachable again, general polling is restarted, and the ticket is cleared.
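The suspend-and-resume behavior for unreachable devices can be sketched as follows. The sketch assumes that a device counts as unreachable only when none of the configured protocol tests gets a response; the probe callables and the event strings are hypothetical placeholders for real SNMP, Telnet/SSH, or ICMP tests and for gateway tickets.

```python
class ReachabilityMonitor:
    """Tracks reachability from several protocol probes."""

    def __init__(self, probes):
        self.probes = probes      # name -> callable returning True on a response
        self.reachable = True
        self.events = []          # stand-in for tickets sent to the gateway

    def check(self):
        """Run the reachability tests and update state; return reachability."""
        responded = any(probe() for probe in self.probes.values())
        if self.reachable and not responded:
            self.reachable = False
            self.events.append("ticket opened: VNE Unreachable")
        elif not self.reachable and responded:
            self.reachable = True
            self.events.append("ticket cleared: VNE Unreachable")
        return self.reachable

    def general_polling_enabled(self):
        # While unreachable, only check() keeps running; general polling waits.
        return self.reachable
```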
The Cisco ANA polling infrastructure supports run-time control not only of polling rates (as in adaptive polling) but also of the scope and depth of collected information. The dynamic registration mechanism supports on-demand, drill-down monitors that focus on specific network scopes and dynamically increase the depth and richness of collected data, for the monitored scope only.
For example, the Cisco ANA PathTracer monitor traces a specified service path across the network and extends the retrieval to include service-specific parameters. Once completed, the extended polling is canceled.
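The lifecycle of a dynamic registration (extend polling for a specific scope, then cancel the extension when the monitor completes) can be sketched with a context manager. The class `RegistrationTable` and the registration names are hypothetical and are not taken from the Cisco ANA API.

```python
from contextlib import contextmanager


class RegistrationTable:
    """Set of active registrations for one VNE."""

    def __init__(self, base):
        self.active = set(base)

    @contextmanager
    def dynamic(self, extra):
        """Temporarily add registrations; remove them again on exit."""
        added = set(extra) - self.active   # never remove what was already active
        self.active |= added
        try:
            yield self
        finally:
            self.active -= added


table = RegistrationTable({"port-status"})
with table.dynamic({"port-status", "vc-counters"}) as extended:
    during = set(extended.active)   # extended polling is in effect here
after = set(table.active)           # the extension has been canceled
```

Tracking only the registrations that were newly added means that canceling the extended polling cannot disturb the regular polling that was already in place.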
This white paper describes the various mechanisms and configuration options within Cisco ANA for optimizing polling and protecting the network. More broadly, the real concern for network operators should be that multiple separate systems probe the network simultaneously, accessing the same devices without any coordination and causing unpredictable behavior as a consequence. Further, in many cases these different systems poll the same MIBs, causing redundant, unnecessary load.
Cisco ANA was designed to be a scalable, end-to-end mediation layer, positioned between the physical network and Network Management System/Operational Support System (NMS/OSS) applications. Cisco ANA ensures safe, optimal, and predictable network access, while providing open, standards-based APIs to northbound systems, which access the Cisco ANA near real-time virtual model of the network inventory and topology, resulting in a dramatic reduction of the physical network access load.
An NMS has to collect data from network elements to be able to manage the network. The more data it collects, and the more often it collects it, the more complete the NMS's picture of the network will be and the better it will be able to manage the network. Collecting data imposes an overhead, though, so the way in which an NMS collects data needs to be configurable and adaptive to the needs of the network and the management use cases.
Cisco ANA offers a wealth of configuration and adaptation capabilities to help strike the appropriate balance between the data required for the proper functioning of an NMS and the load imposed by that NMS on the network that it is managing. These include:
• A highly customizable polling engine that can be configured down to the level of individual registrations
• Smooth polling to even out the load over time
• Persistent device information to reduce the impact of restarting VNEs