Cisco Mobile Wireless Fault Mediator 2.0.1 - Topology and Platform Modeling Reference Guide
The Discovery Process

Table Of Contents

The Discovery Process

An overview of the default discovery process

Configurable discovery data flow

Altering the discovery data flow

Altering the data flow: on-demand Stitchers

The Containment Model

Physical containment

Logical containment

Altering the Containment Model

Advanced discovery features

A stages approach to discovery

The phases of the data collection stage

The impact of the stages approach on DISCO processes

Determining the phase completion criteria

Exceptions to the phase completion criteria

Why use different data collection phases?

Criteria for multi-phasing

Managing the phases

Effect of discovery multi-phasing on network traffic

After the first discovery: rediscovery

Conclusion


The Discovery Process


This chapter introduces the default discovery process and Containment Model, the configurable discovery data flow, and advanced features of discovery such as the phased approach to discovery data collection. Finally, it introduces the post-discovery process, i.e., full or partial rediscovery.

An overview of the default discovery process

The previous chapter introduced the operation and components of the auto-discovery process controller, DISCO. This chapter details discovery: the process of finding the devices that exist on the network and knitting them together into a topology. This section walks step by step through the default discovery process, showing how DISCO arrives at a network topology Containment Model.


Note DISCO offers the flexibility of fully configuring the entire data flow to suit your particular requirements.


Examples of why and how you would configure the discovery process are given later in this chapter. The default data flow is shown in a series of diagrams, beginning with Figure 7-1.

Figure 7-1 Discovery process default data flow

1. Finders receive their instructions from the configuration in their individual schema files and from the inserts made into the Finders despatch table, and then probe the network for devices.

2. Finders return information to the Finders returns table (which only shows device existence and does not contain connectivity information).

3. As soon as device existence information is placed into the Finders returns table and the processing criteria are fulfilled, a Stitcher moves the information from the Finders returns table to the Finders processing table, which signifies that the network entity is being processed by DISCO.

Note that a Stitcher automatically moves information from the Finders returns table to the Finders processing table only whilst the discovery process is in the device discovery phase, i.e., phase one of the data collection stage.

Entries that are reported to the Finders returns table after the end of Phase 1 are moved by the Stitcher to the Finders pending table. Devices returned to the pending table will be processed during the next discovery cycle (the advanced concepts of stages and phases within a discovery are discussed later on in this chapter).

Upon completion of the present discovery cycle, devices that currently reside in the Finders pending table are sent to the Finders processing table and the whole process is allowed to start all over again.

4. The default set of Stitchers operates on the information contained within the Finders processing table. A Stitcher moves device existence information from the Finders processing table to the Details Agent despatch table.

5. The despatch tables for all of the Agents are active; an insertion to the despatch table of a Details Agent will trigger an attempt by that Agent to discover basic information pertaining to a device (provided that CTRL is running).

6. The Details Agent interrogates the network through the Helper Server for basic device information about the discovered devices sent to it by DISCO.

7. The Details Agent returns information (retrieved from the network) about the discovered devices to the Details Agent returns table. The Details Agent determines whether or not there is SNMP access to a device. If a device does not have SNMP access, it is not passed to any of the Discovery Agents. Instead, it is passed only to the Working Entities database, where a record of all discovered devices is stored in the entityByName table.

8. A Stitcher moves information about devices with SNMP access from the Details Agent returns table to the different Discovery Agents, as specified in the DiscoSchema.cfg configuration file. First, however, these devices are filtered according to the criteria in the Agent Definition file, which restrict the devices inserted into the Discovery Agents' despatch tables.
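The table-to-table movement in steps 3 to 8 can be illustrated with a minimal Python sketch. It is not DISCO code: plain lists and dictionaries stand in for the DISCO tables, and the `has_snmp` check and the per-Agent filter predicates (standing in for the Details Agent's SNMP test and the Agent Definition filters) are hypothetical, as are the Agent names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DiscoTables:
    # In-memory stand-ins for the DISCO tables named in steps 1 to 8.
    finders_returns: List[str] = field(default_factory=list)
    finders_processing: List[str] = field(default_factory=list)
    details_despatch: List[str] = field(default_factory=list)
    details_returns: List[dict] = field(default_factory=list)
    entity_by_name: Dict[str, dict] = field(default_factory=dict)
    agent_despatch: Dict[str, List[str]] = field(default_factory=dict)


def run_default_flow(t: DiscoTables,
                     has_snmp: Callable[[str], bool],
                     agent_filters: Dict[str, Callable[[str], bool]]) -> None:
    # Step 3: move Finder returns into the processing table.
    t.finders_processing.extend(t.finders_returns)
    t.finders_returns.clear()
    # Step 4: send every device being processed to the Details Agent.
    t.details_despatch.extend(t.finders_processing)
    # Steps 5 to 7: the Details Agent records basic information,
    # including whether the device has SNMP access.
    for device in t.details_despatch:
        t.details_returns.append({"name": device, "snmp": has_snmp(device)})
    t.details_despatch.clear()
    # Step 8: route each Details Agent return.
    for record in t.details_returns:
        if not record["snmp"]:
            # No SNMP access: record the device in entityByName only.
            t.entity_by_name[record["name"]] = record
            continue
        # SNMP access: insert into each Discovery Agent whose filter accepts it.
        for agent, accepts in agent_filters.items():
            if accepts(record["name"]):
                t.agent_despatch.setdefault(agent, []).append(record["name"])


tables = DiscoTables(finders_returns=["rtr1", "sw1", "printer1"])
run_default_flow(
    tables,
    has_snmp=lambda device: device != "printer1",
    agent_filters={"RouterAgent": lambda d: d.startswith("rtr"),
                   "SwitchAgent": lambda d: d.startswith("sw")},
)
print(tables.agent_despatch)  # {'RouterAgent': ['rtr1'], 'SwitchAgent': ['sw1']}
```

The device without SNMP access ends up only in entityByName, while each device with SNMP access reaches only the Discovery Agents whose filters accept it.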

Figure 7-2 Discovery process data flow

1. As stated previously, the despatch tables for all of the Agents are active; an insertion to the despatch table of a Discovery Agent will trigger an attempt by that Agent to discover connectivity information pertaining to a device.


Note There are various categories of Discovery Agents designed to handle different types of devices.


2. The Discovery Agents set up a TCP socket-based communication link with the Helper Server and request the connectivity information pertaining to the devices that have been sent to them by DISCO.

3. Information is retrieved from the Helper Server in one of two scenarios:

Scenario one: The information requested by Discovery Agent X is not present in the Helper Server or has expired. To service Agent X's request, the Helper Server deploys the Helpers, which retrieve the information from the network and place it in the Helper Server, where it can service requests from any Agent (including Agent X) until it expires. Agent X receives the information from the Helper Server, processes it, and then returns the information to its returns table.

Scenario two: The information requested by Discovery Agent Y is present in the Helper Server and has not yet expired. The Helper Server services Agent Y's request directly, and Agent Y receives the information, processes it, and then returns the retrieved information to its returns table.

See "The Helper System," for a detailed explanation of how the Helper Server retrieves device and connectivity information from the network.

4. A Stitcher processes the information in each Discovery Agent's returns table and updates the Working Entities database. It is also possible to configure the discovery so that newly discovered connected devices can be fed back to the Finders returns table for processing. (The feedback Stitcher takes data from the agent returns table and sends it to the pingFinder ping rules table.)

5. A Stitcher processes the information in each Discovery Agent's returns table and populates the EBN and EBNR tables in each Discovery Agent's database. This process is configurable and can be performed whilst the Discovery Agent is operating or after it has finished its operation.

6. Special data processing Stitchers interact with the different categories of Discovery Agents, using data from their returns tables and the Working Entities database (which contains named entities) to produce various temporary network topologies, e.g., Layer 2, Layer 3, and FDDI topologies.

7. Upon the completion of all the Discovery Agents' tasks (bearing in mind that some finish before others) and the generation of all the necessary temporary network topologies, another set of Stitchers works to amalgamate all the temporary network topologies into a single topology, which is formed in the Full Topology database. Note that the merged topology is still in name-to-name format.

8. When the name-to-name topology stored in the Full Topology database is complete (signified by the completion of the Stitchers' tasks), a set of specialized topology Stitchers work on the topology to deduce and create the Containment Model. The default Containment Model is based on physical containment and VLAN membership and is processed and stored in the Scratch Topology database.

9. When the topology Stitchers have completed their operation and generated the network topology Containment Model, the discovery data flow by default activates a Stitcher that converts the Scratch Topology into the final Impact Topology. Alternatively, you can set up the discovery data flow so that you verify the topology yourself and then activate the Stitcher.

10. As soon as the Impact Topology is formed, another Stitcher is automatically triggered to move the Containment Model from DISCO to MODEL, the topology store, which is interrogated by other components of MWFM NMOS, such as MONITOR, for topology information.

Figure 7-3 Discovery process data flow
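The two Helper Server scenarios in step 3 above amount to a cache with an expiry time. The Python sketch below illustrates that idea under stated assumptions and is not the Helper Server's implementation: the `HelperServerCache` class, the fixed time-to-live and the injectable clock are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class HelperServerCache:
    """Serve Agent requests from stored data until it expires (hypothetical TTL)."""

    def __init__(self, helper: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._helper = helper      # stands in for a Helper querying the network
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}  # key -> (data, expiry)
        self.network_queries = 0

    def request(self, key: str) -> str:
        now = self._clock()
        cached = self._store.get(key)
        if cached is not None and now < cached[1]:
            # Scenario two: the data is present and has not expired.
            return cached[0]
        # Scenario one: the data is missing or expired, so deploy a Helper.
        data = self._helper(key)
        self.network_queries += 1
        self._store[key] = (data, now + self._ttl)
        return data


fake_now = [0.0]
server = HelperServerCache(lambda key: f"ifTable of {key}", ttl_seconds=60,
                           clock=lambda: fake_now[0])
server.request("sw1")      # Agent X: scenario one, goes to the network
server.request("sw1")      # Agent Y: scenario two, answered from the store
fake_now[0] = 61.0
server.request("sw1")      # expired: scenario one again
print(server.network_queries)  # 2
```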

Configurable discovery data flow

As mentioned earlier, DISCO is flexible because it lets you configure the entire data flow of the discovery process. Figure 7-1, Figure 7-2, and Figure 7-3 show the flow of information in DISCO during the discovery process. The Stitchers play an important role in transferring data from one database to another; thus, if you change the way in which the Stitchers are triggered and the way they operate, you can effectively customize the discovery process to suit your own needs.

A discovery cycle

Throughout this guide you will come across the term discovery cycle. A discovery cycle occurs when the data flow has completed from start to finish. For example, if the default data flow completes once (i.e., from deployment of the Finders to report device existence to topology deduction by the Stitchers), one discovery cycle is said to have occurred.

A full discovery process may require more than one cycle.

Altering the discovery data flow

Altering the discovery data flow may be regarded as altering the process dependencies, i.e., reconfiguring the event criteria that trigger the deployment of the Stitchers and the Discovery Agents. The event criteria may be one or a combination of the following events:

The insertion of data into a specific database table.

The completion of another Stitcher.

The completion of a specific Discovery phase.

The completion of a certain Discovery Agent.

This list of trigger conditions is not exhaustive.

Stitchers are the processes that move and process data between databases and tables of DISCO; hence, in order to change the discovery data flow, you simply change the actions that instigate the deployment of the Stitchers and the Stitching rules (if necessary).

In order to change the process dependencies, you must modify the Stitcher triggers in the Stitchers and, if necessary, the Agent Definitions section of the Discovery Agents. When these alterations have been made to the necessary processes, DISCO will automatically pick up these changes during a periodic scan of the Agent and Stitcher files (the scan frequency is determined by the entry in the DISCO database created at DISCO startup) and make the necessary changes to its Agent and Stitcher Definitions databases.

After the changes have been made, DISCO applies the newly defined discovery data flow to the next discovery cycle. See also "The Stitchers" and "The Discovery Process," for more details on the Stitchers and keywords used to specify Stitcher dependency.

Altering the data flow: on-demand Stitchers

In addition to letting you change the complete discovery data flow, DISCO also lets you start a discovery cycle from a point after data collection has completed. You do this by placing the name of a Stitcher that has an onDemand trigger in its Definitions section into the actions table of the Stitchers Definitions database (see "DISCO Databases" for more information about the actions table).

When the Stitcher name is inserted into the table, DISCO runs the Stitcher. Because other actions may be configured to run on completion of that Stitcher, the discovery cycle continues from that point onwards.
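The trigger-driven data flow described in this section and the previous one can be modeled as event matching. The Python sketch below is a hypothetical illustration, not the Stitcher language: the event tuples such as `("agent_done", ...)` and `("on_demand", ...)` and the Stitcher names are invented for the example. Changing a Stitcher's trigger list changes the data flow, and an on-demand event starts the chain part-way through.

```python
from collections import deque
from typing import Dict, List, Tuple

# An event is (kind, subject), for example ("agent_done", "SwitchAgent").
Event = Tuple[str, str]


def run_stitchers(triggers: Dict[str, List[Event]], events: List[Event]) -> List[str]:
    """Run every Stitcher whose trigger list contains an event, in event order.

    Simplification: each Stitcher runs at most once per call.
    """
    ran: List[str] = []
    queue = deque(events)
    while queue:
        event = queue.popleft()
        for name, trigger_events in triggers.items():
            if name not in ran and event in trigger_events:
                ran.append(name)
                # Completing a Stitcher is itself an event others can wait on.
                queue.append(("stitcher_done", name))
    return ran


triggers = {
    "Layer2Stitcher": [("agent_done", "SwitchAgent")],
    # Also runs on demand, like an insert into the actions table.
    "MergeStitcher": [("stitcher_done", "Layer2Stitcher"),
                      ("on_demand", "MergeStitcher")],
    "ContainmentStitcher": [("stitcher_done", "MergeStitcher")],
}
print(run_stitchers(triggers, [("agent_done", "SwitchAgent")]))
# ['Layer2Stitcher', 'MergeStitcher', 'ContainmentStitcher']
print(run_stitchers(triggers, [("on_demand", "MergeStitcher")]))
# ['MergeStitcher', 'ContainmentStitcher']
```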

Recursive Agents

You may want to change the discovery data flow if you have written a recursive Discovery Agent, that is, an Agent whose reported connectivity information triggers the discovery of other devices. To achieve this, you can specify a Stitcher, such as Feedback.stch, that processes devices in the Agent returns table and moves them to the Finders despatch table, from which they are sent to the Details Agent.

Figure 7-4 Discovery data flow: Recursive Agents
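A feedback Stitcher of this kind can be sketched in Python as follows. This is an illustration built on assumptions, not the contents of Feedback.stch: it assumes each Agent return carries a hypothetical `neighbours` list of IP addresses. It uses the standard `ipaddress` module to keep only addresses that are in scope and have not been seen before.

```python
import ipaddress
from typing import Iterable, List, Set


def feedback(agent_returns: Iterable[dict], scope: List[str], known: Set[str]) -> List[str]:
    """Return new, in-scope neighbour addresses to feed back to the Finders."""
    networks = [ipaddress.ip_network(net) for net in scope]
    fed_back: List[str] = []
    for record in agent_returns:
        for addr in record.get("neighbours", []):
            in_scope = any(ipaddress.ip_address(addr) in net for net in networks)
            if in_scope and addr not in known:
                known.add(addr)        # never feed the same device back twice
                fed_back.append(addr)  # destined for the Finders despatch table
    return fed_back


returns = [
    {"device": "10.0.0.1", "neighbours": ["10.0.0.2", "192.168.5.9"]},
    {"device": "10.0.0.2", "neighbours": ["10.0.0.1", "10.0.0.3"]},
]
print(feedback(returns, scope=["10.0.0.0/24"], known={"10.0.0.1", "10.0.0.2"}))
# ['10.0.0.3']
```

The `known` set prevents a device that two Agents both report as a neighbor from being fed back twice.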

Discovery Exceptions

The flexibility of the discovery data flow also lets you specify SNMP exceptions. You filter discovered devices by making entries in the tables of the Scope database. These tables are created and defined in DiscoSchema.cfg, and you edit them by appending OQL inserts:

The zones table: devices that match this filter are not passed to the Details Agent, i.e., the devices are not discovered and SNMP access to them is not attempted.

The detectionFilter table: devices that match this filter are passed to the Details Agent, which makes SNMP queries for device information; however, no interfaces on the device are discovered and the information is not passed to the Discovery Agents. Use this filter to restrict SNMP access to a device.

The instantiationFilter table: use this filter to restrict the display of discovered devices.

Figure 7-5 Discovery process data flow: discovery exceptions
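The effect of the three Scope tables on a single device can be summarized in a short Python sketch. The filter predicates are hypothetical stand-ins for the OQL-defined table entries. The sketch also assumes that a device matching the instantiationFilter is the one whose display is restricted; the text above does not state the direction of that match.

```python
from typing import Callable, Dict

Filter = Callable[[dict], bool]


def apply_scope(device: dict, zones: Filter, detection_filter: Filter,
                instantiation_filter: Filter) -> Dict[str, bool]:
    """Return which discovery steps a device reaches under the Scope tables."""
    if zones(device):
        # Matched by zones: not sent to the Details Agent, no SNMP access.
        return {"details_agent": False, "discovery_agents": False, "displayed": False}
    # Assumption: matching the instantiationFilter restricts display.
    displayed = not instantiation_filter(device)
    if detection_filter(device):
        # SNMP device queries only: no interfaces, no Discovery Agents.
        return {"details_agent": True, "discovery_agents": False, "displayed": displayed}
    return {"details_agent": True, "discovery_agents": True, "displayed": displayed}


printer = {"address": "10.1.1.20", "type": "printer"}
print(apply_scope(printer,
                  zones=lambda d: False,
                  detection_filter=lambda d: d["type"] == "printer",
                  instantiation_filter=lambda d: False))
# {'details_agent': True, 'discovery_agents': False, 'displayed': True}
```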

Primary verification of device existence

The flexibility of a configurable discovery data flow can be used to perform a primary verification of devices reported during the discovery by the fileFinder (a Finder that parses system files and infers device existence from the information they contain). The following cases illustrate this principle:

CASE 1: Well maintained file system

In this instance you are sure of the integrity of the file system. The data flow conforms to the default configuration, i.e., any device reported by the fileFinder is accepted without question and sent to the Details Agent for retrieval of basic information, and discovery continues as normal.

Figure 7-6 The well maintained file system

CASE 2: Badly maintained file system

In this instance there is reason to doubt the validity of the devices reported by the fileFinder, possibly because of a rapidly changing network. You can configure the discovery data flow such that any entry into the Finders database that was created by the fileFinder gets sent to the pingFinder, which proceeds to verify the existence of the device on the network by pinging the device.

If the device exists, it is returned to the Finders database with a new creator (the pingFinder). The device is then sent to the Details Agent for retrieval of basic information, and discovery continues as normal. However, if the pingFinder cannot verify the existence of the device, the device is discarded. This data flow eliminates unnecessary processing of nonexistent devices.

Figure 7-7 The badly maintained file system
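Both cases reduce to a single routing decision, sketched below in Python. The `trust_files` switch selects CASE 1 or CASE 2, the `ping` callable stands in for the pingFinder, and the entry dictionaries with `address` and `creator` keys are hypothetical simplifications of Finders database records.

```python
from typing import Callable, Dict, List

Entry = Dict[str, str]


def verify_file_finds(entries: List[Entry], ping: Callable[[str], bool],
                      trust_files: bool) -> List[Entry]:
    """Return the Finders entries that may proceed to the Details Agent."""
    accepted: List[Entry] = []
    for entry in entries:
        if trust_files or entry["creator"] != "fileFinder":
            # CASE 1 (or an entry not created by the fileFinder): accept as reported.
            accepted.append(entry)
        elif ping(entry["address"]):
            # CASE 2: the device answered, so it returns with a new creator.
            accepted.append({"address": entry["address"], "creator": "pingFinder"})
        # Otherwise the unverified device is discarded.
    return accepted


entries = [
    {"address": "10.0.0.5", "creator": "fileFinder"},
    {"address": "10.0.0.6", "creator": "fileFinder"},  # stale file entry
    {"address": "10.0.0.7", "creator": "pingFinder"},
]
alive = {"10.0.0.5", "10.0.0.7"}
print(verify_file_finds(entries, ping=lambda a: a in alive, trust_files=False))
# [{'address': '10.0.0.5', 'creator': 'pingFinder'}, {'address': '10.0.0.7', 'creator': 'pingFinder'}]
```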

The Containment Model

"Active Objects," introduced the concept of containment and described its advantages regarding network management. One of the key features of DISCO is its ability to generate the Containment Model according to your specifications. The Containment Model is generated from the resolved name-to-name connections and it may be regarded as an expression of the network topology containing a range of devices, the devices in turn containing cards, and the cards containing ports to which other devices are connected.

DISCO supports the generation of both physical and logical containment models.

Both containment models are discussed below.

Physical containment

The physical Containment Model is an expression of containment pertaining to physical devices, i.e., devices or chassis containing cards, which contain ports. The physical Containment Model enables you to perform device management down to the port level (as depicted in "Active Objects").

Logical containment

DISCO also supports the generation of a logical Containment Model, which shows objects contained within logical containers—containers that do not necessarily exist in the physical sense. A perfect example is a VLAN container, which is a logical grouping of devices, cards, and ports that are not necessarily physically connected together or in the same location. The default logical Containment Model is based on VLAN containment.
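The two kinds of containment can be pictured as nested data structures. The Python classes below are a hypothetical illustration of the Containment Model's shape, not DISCO's internal representation. Physical containment nests ports in cards and cards in devices, while a VLAN is a logical container that simply references its members wherever they are.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Port:
    name: str
    connected_to: List[str] = field(default_factory=list)  # remote device names


@dataclass
class Card:
    name: str
    ports: List[Port] = field(default_factory=list)


@dataclass
class Device:
    name: str
    cards: List[Card] = field(default_factory=list)


@dataclass
class Vlan:
    # Logical container: members need not be connected or co-located.
    vlan_id: int
    members: List[object] = field(default_factory=list)  # devices, cards, or ports


def ports_of(device: Device) -> List[str]:
    # Walk physical containment down to port level: device, then card, then port.
    return [f"{device.name}/{card.name}/{port.name}"
            for card in device.cards for port in card.ports]


sw1 = Device("sw1", [Card("card1", [Port("p1", ["rtr1"]), Port("p2")])])
sw2 = Device("sw2", [Card("card3", [Port("p7")])])
vlan10 = Vlan(10, members=[sw1.cards[0].ports[1], sw2.cards[0].ports[0]])
print(ports_of(sw1))  # ['sw1/card1/p1', 'sw1/card1/p2']
```

The same port object can appear in both models: physically inside its card, and logically as a VLAN member.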

Altering the Containment Model

As one of the advanced features of discovery, DISCO offers you the ability to configure the Containment Model. In order to generate a custom Containment Model, you have to write a Stitcher (using the Stitcher language) that will produce the required Containment Model and configure the discovery data flow to include your Stitcher. Refer to "The Stitchers," for a full syntax definition of the Stitcher language.

Advanced discovery features

A stages approach to discovery

Generally speaking, the discovery process can be divided into two somewhat overlapping stages: the data collection stage and the data processing stage. You control these stages and can configure them to complete discovery in any way you require. Both stages are discussed below.

The data collection stage

The data collection stage is characterized by interrogation of devices and retrieval of the information necessary to produce a network topology. This stage involves the Finders, Agents and Helpers.

The data collection stage can be divided into many phases, although typically only three are needed: phase one, phase two, and phase three. Phase 0 indicates that data collection has completed and the topology is being determined. Within this stage the data collection entities (Finders and Discovery Agents) can behave differently in each phase. The data collection phases are discussed in depth later in this chapter.

The data processing stage

The data processing stage performs topology deduction. During this stage all of the information collected during the data collection stage is analyzed, interpreted, and processed by the Stitchers to produce a network topology, which culminates in the Containment Model. As mentioned previously, the data collection and data processing stages can overlap because Stitchers are configured to process connectivity information from different Discovery Agents before the main stitching operation begins. In addition, you may configure more Stitchers to handle new Discovery Agents.

The phases of the data collection stage

The following phases demonstrate how you might configure the data collection stage phases and the processes that occur within the phases.


Note These phases are for example purposes only; your configuration does not have to follow this schema.


The following example involves the Switch Discovery Agents and the ARP Cache Discovery Agent working together.

Discovery data collection phase one

The first phase of the discovery is the identification of all the devices that exist on the network, which is handled by the Finders. In this example, the Switch Discovery Agents also download all VLAN and interface information during phase one before proceeding to phase two.

In order to progress to phase two, the processes necessary for phase one must have finished; in addition to this standard transition, some extra functionality is attached to phase one. Though it may be desirable, it is neither practical nor efficient to hold back the discovery process until all devices have been discovered; therefore, phase one completes when the rate at which devices are discovered falls below a certain level, as explained later in this chapter.

Furthermore, some Agents will return data that can be used to find other devices, e.g., the IP address of remote neighbors or the subnet within which a local neighbor exists. The Feedback Stitcher sends such new information to the pingFinder at which point it is included in the discovery—these additional devices will be discovered provided they are in scope. Any Agent involved in this process will have to be run in phase one; otherwise, the new devices will not be discovered until a rediscovery is initiated. IP Agents typically run in phase one for this reason.

Discovery data collection phase two

When phase one has completed, phase two begins. In order to map layers two and three of the OSI model, the ARP Cache Discovery Agent populates the Helper Server with ARP data, i.e., a complete list of device IP address to MAC address resolution. Remember that it is not necessary to configure your discovery this way.

For a transition to occur from phase two to phase three, all processes scheduled for phase two must have finished.

Discovery data collection phase three

When phase two has completed, phase three begins. If an action that should have occurred in phase two is detected late, phase three continues while the affected entity is run through phase two to catch up. With full knowledge of all devices that exist within the network (acquired in phase one) and access to full IP address to MAC address mappings for all devices in the Helper Server (acquired in phase two), the Switch Agents can download the forward database tables of all switches whilst pinging all devices to confirm the accuracy of the tables' contents.

When phase three has finished, signified by the completion of all processes scheduled to run in this phase, the discovery is ready to proceed from the data collection stage to the data processing stage, where all connectivity information will be knitted together to form a network topology.

Figure 7-8 The data collection and data processing stages of Discovery

The impact of the stages approach on DISCO processes

The division of the data collection stage into sub-phases affects all the processes that make the discovery and deduction of network topology possible. This is because the phases of the data collection stage are processed in order, i.e., from phase one to phase two, to phase three, etc. A phase cannot begin until the criteria for completion of the previous phase have been met.

This implies that every DISCO process must have one or more associated phases in which it is allowed to operate. Thus, whilst the Finders will typically be configured to run through all phases, you might want to configure certain Discovery Agents to operate only within specific phases. DISCO is flexible enough to let processes behave differently in different phases, and to pass control to other processes or stop operating until the start of their next operational phase. Examples later in this chapter clarify this concept.

Determining the phase completion criteria

The next obvious question is: what determines the end of a phase? As described above, the next phase cannot begin until the present phase completes. The processes scheduled to run within the phase must have been initialized and subsequently completed. A continuous check monitors the number of processes configured to operate in a phase, the number launched in that phase, and the number that have completed their operation in it. Only when all launched processes have completed is a signal broadcast signifying the end of the phase, after which the next phase can commence.

A phase is said to have been completed when the configured phase completion criteria are fulfilled; for the Agents, this is determined by the DiscoAgentCompletionPhase part of the Agent Definition file. An Agent is deemed to have finished once all entities in its despatch table have appeared in its returns table.
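The completion check described above can be sketched in Python. The `AgentState` class and the set-based bookkeeping are hypothetical. The sketch encodes only the two rules stated in the text: a phase ends when every process configured for it has been launched and has completed, and an Agent has completed once every entity in its despatch table has appeared in its returns table.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class AgentState:
    despatched: Set[str] = field(default_factory=set)  # entities in the despatch table
    returned: Set[str] = field(default_factory=set)    # entities in the returns table

    def finished(self) -> bool:
        # Finished once every despatched entity has appeared in the returns table.
        return self.despatched <= self.returned


def phase_complete(configured: Set[str], launched: Dict[str, AgentState]) -> bool:
    # Every configured process must have been launched...
    if not configured <= set(launched):
        return False
    # ...and every launched process must have completed.
    return all(state.finished() for state in launched.values())


switch = AgentState(despatched={"sw1", "sw2"}, returned={"sw1"})
arp = AgentState(despatched={"rtr1"}, returned={"rtr1"})
configured = {"SwitchAgent", "ArpCacheAgent"}
launched = {"SwitchAgent": switch, "ArpCacheAgent": arp}
print(phase_complete(configured, launched))  # False: sw2 has not been returned
switch.returned.add("sw2")
print(phase_complete(configured, launched))  # True
```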

In order for phase one to complete, the "find rate" of devices must drop below a certain threshold. This fixed but configurable criterion is satisfied when either the average rate of discovered devices drops to a certain threshold (configured by m_AvgeFindPeriod in DiscoSchema.cfg) or no devices have been discovered for a specified amount of time (configured by m_NothingFndPeriod in DiscoSchema.cfg).
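The two ways in which phase one can end might be sketched as follows. The text does not define the exact semantics of m_AvgeFindPeriod and m_NothingFndPeriod, so this Python function rests on assumptions. The `avg_window` and `min_rate` parameters are a hypothetical reading of the average-find-rate setting, and `nothing_found_period` plays the role of m_NothingFndPeriod.

```python
from typing import List


def phase_one_done(find_times: List[float], started_at: float, now: float,
                   avg_window: float, min_rate: float,
                   nothing_found_period: float) -> bool:
    """Decide whether phase one can end, given Finder discovery timestamps (seconds)."""
    last_find = max(find_times, default=started_at)
    if now - last_find >= nothing_found_period:
        return True  # no devices discovered for the configured period
    if now - started_at < avg_window:
        return False  # too little history to average over yet
    recent = sum(1 for t in find_times if now - avg_window < t <= now)
    return recent / avg_window < min_rate  # average find rate has dropped


finds = [1.0, 2.0, 3.0, 4.0, 5.0]
print(phase_one_done(finds, 0.0, now=12.0, avg_window=10.0, min_rate=0.2,
                     nothing_found_period=30.0))  # False: 3 finds in 10 s
print(phase_one_done(finds, 0.0, now=14.0, avg_window=10.0, min_rate=0.2,
                     nothing_found_period=30.0))  # True: 1 find in 10 s
```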


Note Agents are multithreaded—records of discovered devices passed to the Agents are tagged with a certain phase. As a result an Agent can, at a particular time, process devices in two separate phases.


Exceptions to the phase completion criteria

Consider a discovery cycle that has advanced to phase two (or any subsequent data collection phase) when the Finders suddenly report a new device to DISCO, which could cause phase two to be aborted and phase one to restart. To prevent this, DISCO locks the Finders processing table, and any devices found in subsequent phases of the discovery cycle are moved from the Finders returns table to the Finders pending table of the Finders database.

Devices placed in the pending table are retained, and only when the present discovery cycle has completed are they transferred to the processing table of the Finders database to begin another discovery cycle. The interval during which information from the Finders returns table is transferred to the pending table is called the Blackout state.

The Blackout state is a Boolean flag set to either 0 or 1. While the data collection stage is still in phase one, the Blackout state is 0 and data from the Finders returns table goes to the Finders processing table. When the data collection stage proceeds beyond phase one, the Blackout state is set to 1 and data from the Finders returns table is sent to the Finders pending table, where it awaits the completion of the current discovery cycle.
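The Blackout state behavior can be sketched as a small Python class. The `FindersTables` class and its method names are hypothetical. Only the routing rule comes from the text: returns go to the processing table while the flag is 0 and to the pending table while it is 1, and pending devices carry over into the next cycle.

```python
from typing import List


class FindersTables:
    """Route Finder returns according to the Blackout state (hypothetical model)."""

    def __init__(self) -> None:
        self.processing: List[str] = []
        self.pending: List[str] = []
        self.blackout = 0  # 0 during phase one, 1 after it

    def enter_phase(self, phase: int) -> None:
        # Any phase other than phase one (including phase 0) sets blackout to 1.
        self.blackout = 0 if phase == 1 else 1

    def on_finder_return(self, device: str) -> None:
        if self.blackout == 0:
            self.processing.append(device)
        else:
            self.pending.append(device)  # held until the cycle completes

    def end_cycle(self) -> None:
        # Start the next cycle with the devices that were held back.
        self.processing, self.pending = self.pending, []
        self.blackout = 0


finders = FindersTables()
finders.enter_phase(1)
finders.on_finder_return("10.0.0.1")
finders.enter_phase(2)
finders.on_finder_return("10.0.0.9")   # arrives during the Blackout state
print(finders.processing, finders.pending)  # ['10.0.0.1'] ['10.0.0.9']
finders.end_cycle()
print(finders.processing)  # ['10.0.0.9']
```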

Why use different data collection phases?

At this point it is worth giving examples of why it is advantageous to apply a stages approach to discovering and knitting together a network topology.

Switch connectivity

In determining the true connectivity of some devices, it is sometimes necessary for the Discovery Agent to know all the devices that exist before requesting particular MIB variable(s), especially if the requested information is of a transient nature.

A good example is when the Layer 2 Agents discover connectivity between Ethernet switches. Ethernet switches have forward database tables whose entries expire over time. To ensure that a switch's forward database table is fully populated at the time of interrogation, one available technique is to ping all devices associated with the switch, which refreshes the table so that the most up-to-date information is downloaded.

Hence, in this instance, you would configure the Switch Discovery Agents to perform some other processing in data collection phase one. Once they receive the signal signifying the completion of phase one (i.e., all devices have been found), they will commence phase two operations, which could be to ping all devices within the discovery domain whilst downloading the forward database tables for all switches at once.

Mapping subnet boundaries; Multiple IP:MAC address resolution

One apparent limitation of configuring individual Discovery Agents to make individual ARP requests directly from the Helper Server is that the ARP Helper cannot run simultaneously on multiple subnets unless it is specifically configured to do so. One way of resolving this problem is to use a special ARP Cache Discovery Agent. It imitates a generic Discovery Agent in the sense that entities can be sent to it; however, it is also able to map boundaries or different layers of the OSI model.

Since the ARP Cache Discovery Agent is able to run on different subnets "simultaneously", it has the ability to inquire about ARP caches that exist on routers. It uses this information to populate the ARP Helper database within the Helper Server so as to build up full device IP address to MAC address mapping without having to rely on the ARP Helper.

This approach could be applied again when using Switch Discovery Agents that need to perform IP address to MAC address resolution before they can commence operation. This implies that, following the example, you could configure your discovery data collection stage to have three phases:

1. Phase one—Find all devices that exist on the network

2. Phase two—Utilize the ARP Cache Discovery Agent to populate the Helper Server with full IP address to MAC address mappings.

3. Phase three—Ping all devices and invoke the Switch Discovery Agents to commence their operation by downloading the forward database tables for all switches in the network, utilizing the IP address to MAC address mappings determined in phase two.

Multi-phase Discovery Agents

Another implication of dividing the discovery data collection stage into phases is that it enables you to create Discovery Agents that perform different operations within different phases of the discovery.

This means that even though a Discovery Agent is programmed to commence operation in phase two, nothing prevents you from specifying that some other operation be carried out in phase one. This is because the end-of-phase-one marker only signifies that all devices have nominally been discovered. The Agent could be configured to do other things during phase one, such as downloading interface information, making TELNET requests, or downloading other MIB variables. Only when phase two has commenced does the Agent begin to perform processing instructions specific to phase two.

Generally speaking, it is a good idea for you to configure your discovery to occur over multiple phases, so as to ensure maximum accuracy of the deduced topology.

Criteria for multi-phasing

Generally speaking, the main criterion for configuring your discovery to have multiple phases is an assessment of the requirements of the different operations that will be performed during the discovery process.

From the examples given above, you can infer that Ethernet-based Discovery Agents will require at least two phases. Figure 7-8 shows that it is possible to have Discovery Agents that operate in any phase, or even a Discovery Agent that operates in phase three.

Managing the phases

The different phases of the discovery data collection stage are managed by an internal phase manager that is responsible for reading the maximum overall phase number and calculating the total number of phases as soon as all the Discovery Agent and Stitcher definition files are loaded. The phase manager is also responsible for calculating the phase and process dependencies, i.e., which Discovery Agents are scheduled to run in which phases.

The phase manager also monitors the processes running during each phase and their completion, so that an end-of-phase signal can be sent to all processes that are on standby, ready to be launched in the next phase.
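The planning part of the phase manager's job can be sketched in Python. This is a hypothetical illustration: the dictionary mapping Agent names to phase sets stands in for the phase information loaded from the Agent and Stitcher definition files, and the Agent names are invented.

```python
from typing import Dict, List, Set


def plan_phases(agent_phases: Dict[str, Set[int]]) -> Dict[int, List[str]]:
    """List the Agents scheduled in each phase, from phase one to the maximum phase."""
    max_phase = max((p for phases in agent_phases.values() for p in phases), default=0)
    return {
        phase: sorted(agent for agent, phases in agent_phases.items() if phase in phases)
        for phase in range(1, max_phase + 1)
    }


# Phases follow the three-phase switch example earlier in this chapter.
plan = plan_phases({
    "IPAgent": {1},
    "SwitchAgent": {1, 3},   # VLANs and interfaces in phase one, FDB tables in phase three
    "ArpCacheAgent": {2},
})
print(len(plan))  # 3 phases in total
print(plan)       # {1: ['IPAgent', 'SwitchAgent'], 2: ['ArpCacheAgent'], 3: ['SwitchAgent']}
```

Walking the resulting dictionary in key order, and waiting for each phase's processes to complete before moving on, mirrors the end-of-phase signaling described above.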

Effect of discovery multi-phasing on network traffic

The effect of multi-phasing on network traffic is managed by the Helper Server, which serves as the intermediary between the Discovery Agents and the network. One of the most beneficial features of the Helper Server is that it amalgamates pings of the same device requested by different Discovery Agents, so that multiple pings of the same device are resolved into one ping.

The Helper Server also has a request pool, which ensures that only a certain number of requests are handled simultaneously; the request pool ensures the Helper Server refrains from overloading the network.
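Both Helper Server behaviors, ping amalgamation and the request pool, can be sketched with Python's standard `concurrent.futures` and `threading` modules. The `PingCoalescer` class is hypothetical. Concurrent requests for the same device share one in-flight ping, and a thread pool with a fixed number of workers caps how many pings run at once.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class PingCoalescer:
    """Coalesce concurrent pings of one device and cap concurrent requests."""

    def __init__(self, ping: Callable[[str], bool], max_concurrent: int) -> None:
        self._ping = ping
        # The worker limit plays the role of the request pool.
        self._pool = ThreadPoolExecutor(max_workers=max_concurrent)
        self._lock = threading.Lock()
        self._in_flight: Dict[str, Future] = {}

    def request(self, address: str) -> Future:
        with self._lock:
            future = self._in_flight.get(address)
            if future is None:
                # First request for this device: schedule one real ping.
                future = self._pool.submit(self._ping_once, address)
                self._in_flight[address] = future
            # Later requests while the ping is in flight share the same Future.
            return future

    def _ping_once(self, address: str) -> bool:
        try:
            return self._ping(address)
        finally:
            # Once the ping finishes, a new request triggers a fresh ping.
            with self._lock:
                self._in_flight.pop(address, None)

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

A second request for an address that is still being pinged receives the same `Future`, so only one ping reaches the network. Requests beyond `max_concurrent` simply wait in the executor's queue rather than adding load to the network.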

After the first discovery: rediscovery

DISCO changes its operation to a reactive state called the rediscovery stage when the data collection and data processing stages of the first discovery have completed and a network topology has been sent from DISCO to MODEL. DISCO can be configured to perform either a full or a partial rediscovery. A full rediscovery, as it suggests, will instigate the discovery of the whole network, while a partial rediscovery may be limited to a particular subnet.

The overall operational stages of DISCO as described in the previous sections of this chapter remain the same, i.e., DISCO adheres to the configured operational data flow. However, whilst in this stage, DISCO operates in a reactive mode, listening for events (such as traps), which prompt it to initiate a rediscovery of a particular subnet or device.

The receipt of a trap from a device within the network can instigate a rediscovery. In this instance you would configure the data flow so that the receipt of a trap by the Trap Finder prompts DISCO to assess whether a full or partial rediscovery should be conducted on that part of the network topology.

Conclusion

This chapter provided an explanation of how the MWFM NMOS component DISCO discovers the connectivity of network devices and produces the Containment Model by using processes such as Finders, Agents, Helpers and Stitchers. It also provided a detailed explanation of the discovery process, the concept of a configurable discovery data flow, an explanation of discovery cycles, the concepts of phasing the discovery process, and the different modes of discovery, i.e., discovery and full or partial rediscovery.

The next chapter introduces the Finders, which are the executable processes responsible for discovering network entities.