Table Of Contents
Troubleshooting VNE Modeling
Troubleshooting Management Issues (Device Reachability)
Troubleshooting VNE Communication State Issues
Check the Communication State in Cisco ANA NetworkVision
Check the Communication Details in Cisco ANA NetworkVision
Check the Ticket in Cisco ANA EventVision
Troubleshooting VNE Investigation State (Discovery) Issues
Registry Settings for VNE Discovery Timeout and Investigation State Reporting
Opening a Bug Report
Troubleshooting VNE Modeling
These topics provide procedures to help you troubleshoot VNE modeling problems.
• Troubleshooting Management Issues (Device Reachability)
• Troubleshooting VNE Investigation State (Discovery) Issues
• Troubleshooting VNE Communication State Issues
• Opening a Bug Report
Additional VNE administration tasks are described in:
• Basic AVM and VNE Administration Tasks
• VNE Administration: VNE Lifecycle and Creating VNEs
• VNE Updates
Troubleshooting Management Issues (Device Reachability)
Figure 21-1 illustrates the two aspects that determine a VNE's communication state: agent communication, which describes reachability between the Cisco ANA gateway server and the VNEs, and management communication, which describes reachability between a Cisco ANA VNE and the network device it is modeling. Both must function for Cisco ANA to properly model and manage a device.
Figure 21-1 VNE Communication States—Management and Agent
Management communication is the more challenging domain because it is far more common for devices to become unreachable than for a VNE to go down. There are many possible scenarios: only the Telnet protocol is down but everything else is fine; all protocols are down but the device is still "alive" (sending syslogs and traps); or all protocols are down and the device is not even generating traps or syslogs. To provide the most accurate reachability status, Cisco ANA does the following:
• Tracks protocol health by performing reachability tests that are tailored to the different types of protocols.
• Provides different management communication policies that you can choose, depending on how strictly you want to track protocol health.
• Allows you to fine-tune both of the above to fit the needs of your network.
• Provides detailed information for troubleshooting purposes.
For details about how Cisco ANA does all of the above, see Device Reachability.
The most common management problem is when Cisco ANA reports that a VNE communication state is Device Partially Reachable because at least one protocol is not operational (this is the default behavior for protocol reporting and can be changed; see VNE Management Communication Policies and How To Change Them). An example of a Device Partially Reachable indicator is shown in Figure 21-2.
Figure 21-2 Device Partially Reachable Status Indicator in Cisco ANA NetworkVision
The following procedure describes how to troubleshoot device reachability problems.
Step 1
From the Cisco ANA NetworkVision map view, double-click the device that is reporting a reachability issue. This opens the device properties window.
Step 2
Check the information at the top of the device properties window, as shown in Figure 21-3.
Note
If the VNE was stopped, you will see a message and a refresh button at the top of the properties window. If the VNE was restarted, refreshing the window will repopulate the information. However, if the VNE is still down, refreshing the window will result in an error message. To start the VNE, see Changing VNE Status and Lifecycle (Start, Stop, Maintenance).
Figure 21-3 VNE States and Communication Details (in Cisco ANA NetworkVision)
Step 3
Click Communication Details at the bottom of the window.
Note
Information for a protocol is only populated if the protocol's trackreachability registry key is enabled (see Customizing Protocol Reachability Testing).
Figure 21-4 Management (Protocol) Communication Details
See Table 21-1 for information on the data in the Communication Details page.
Step 4
Check for a System event in Cisco ANA NetworkVision. See Check the Ticket in Cisco ANA EventVision.
Step 5
Check the AVM to see if a specific VNE is causing the problem. See the procedure under Agent Unreachable (AVM or Unit Issues)—Troubleshooting, referring to Step 5.
Table 21-1 provides information about the fields in the Communication Details window, and suggestions for troubleshooting steps based on the information you see.
Table 21-1 Information in the Communication Details Window
Field | Description
Management Connectivity State
Policy | Policy being used by Cisco ANA to determine device reachability and when to change the communication state to Device Unreachable. See VNE Management Communication Policies and How To Change Them.
Policy: notstrict | Change state to Device Unreachable when:
• All of the protocols (SNMP, Telnet, ICMP) are down, and
• No traps or syslogs were sent by the device for the past 6 minutes.
Change state to Device Partially Reachable when:
• All of the protocols (SNMP, Telnet, ICMP) are down, and
• Traps or syslogs are being sent by the device.
Policy: ensure-management | Change state to Device Unreachable when:
• All of the protocols (SNMP, Telnet, ICMP) are down.
The status of traps/syslogs is not considered. This is the default policy.
Policy: strict | Change state to Device Unreachable when:
• At least one of the protocols (SNMP, Telnet, ICMP) is down.
The status of traps/syslogs is not considered. (Because the state goes directly to Device Unreachable, you will never see the Device Partially Reachable communication state when using this policy.)
Reduced Polling | Reports whether the VNE is using the reduced polling mechanism to control polling (true if enabled, false if disabled). Reduced polling means polling is performed only when a poll-worthy event is received from the device, thus reducing the overall polling. For information on the reduced polling mechanism, see Reduced Polling.
SNMP/Telnet/ICMP Connectivity
State | Functional state of the protocol.
State: Operational | The device is reachable using this protocol, and all of the protocol's properties are functioning (for example, all contexts are reachable).
State: Protocol Partially Functional | The device is reachable using this protocol, but not all of the protocol's properties are functioning (for example, not all contexts are reachable).
State: Down | The device is not reachable using this protocol.
State: Unknown | The protocol state is not known.
State Description | Detail about the protocol state (for example, the host address cannot be resolved).
• First prompt incorrect: Log into the device, copy the prompt, and configure it in Cisco ANA Manage (see VNE Telnet/SSH Settings).
• Communication I/O-Networking Problems: Contact your system administrator.
• Context name is incorrect: Contact your system administrator.
• Username/password is incorrect: Contact your system administrator.
See Troubleshooting VNE Connectivity Problems for common error scenarios.
State Since | Timestamp of when the protocol information was last updated.
Syslog/Trap Connectivity
Syslog/Trap received in last 6 minutes | Tells you whether the device is sending traps or syslogs (an indication of whether the device is still "alive"). The format is value (timestamp), where:
• value: Indicates whether a syslog or trap was (true) or was not (false) received in the last 6 minutes. This field is updated whenever a syslog or trap is received.
• timestamp: Indicates when the last change occurred. This field is refreshed whenever you open the Communication Details window.
For example:
false (Mon Jul 19 23:03:33 PDT 2010) means the VNE has not received any syslogs or traps since the time and date listed.
true (Tue Jul 20 05:09:25 PDT 2010) means the VNE has been receiving syslogs or traps at least every 6 minutes since the time and date listed.
If this field is blank, either no syslogs or traps were sent since the VNE was started, or Cisco ANA is using a management policy that does not track syslogs and traps.
If syslogs or traps are not arriving, do the following:
1. Check the status of AVM 100. See Creating AVMs, Viewing, and Editing AVM Properties.
2. Check whether the device is configured to forward traps and syslogs to the unit or gateway that has the running AVM 100. See Managing the ANA Event Collector (AVM 100).
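If you copy the "Syslog/Trap received in last 6 minutes" value out of the Communication Details window for a report or script, the value (timestamp) format shown in Table 21-1 can be split as in the following minimal Python sketch. The function name and the regular expression are illustrative assumptions based only on the two sample values in the table; this is not a Cisco ANA API.

```python
import re
from datetime import datetime

# Pattern derived from the two sample values in Table 21-1, for example
# "false (Mon Jul 19 23:03:33 PDT 2010)". Other layouts are not handled.
_FIELD_RE = re.compile(
    r"^(true|false) \((\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (\w+) (\d{4})\)$"
)


def parse_syslog_trap_field(text):
    """Split the field into (received, timestamp, timezone label).

    Returns None for a blank field. The timestamp is a naive datetime; the
    time zone label (such as "PDT") is returned as text and not interpreted.
    Day and month names are parsed with the default (English) locale.
    """
    text = text.strip()
    if not text:
        return None
    match = _FIELD_RE.match(text)
    if match is None:
        raise ValueError("unrecognized field format: %r" % text)
    received, stamp, tz_label, year = match.groups()
    when = datetime.strptime(stamp + " " + year, "%a %b %d %H:%M:%S %Y")
    return received == "true", when, tz_label
```

A blank field returns None, which matches the table's note that the field can be empty when no syslogs or traps have arrived or when the policy does not track them.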
See Device Reachability, for more information on management communication policies, including the following:
• How to change management communication policies
• How Cisco ANA determines protocol reachability
• How to customize protocol reachability testing
• How to troubleshoot SSH and Telnet connectivity issues
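The three management communication policies described in Table 21-1 can be summarized as a small decision function. The following Python sketch is an illustration only, not Cisco ANA code. It rests on two assumptions that the table does not state: that a device with every protocol operational is reported as Device Reachable, and that under notstrict a device with only some protocols down is reported as Device Partially Reachable.

```python
def communication_state(policy, protocols_up, event_in_last_6_min):
    """Return the management communication state for one device.

    policy: "notstrict", "ensure-management" (the default), or "strict".
    protocols_up: dict such as {"SNMP": True, "Telnet": False, "ICMP": True}.
    event_in_last_6_min: True if a trap or syslog arrived in the last 6 minutes.
    """
    all_up = all(protocols_up.values())
    all_down = not any(protocols_up.values())

    if all_up:
        # Assumption: nothing is wrong, so the device is simply reachable.
        return "Device Reachable"
    if policy == "strict":
        # At least one protocol is down; traps and syslogs are not considered.
        return "Device Unreachable"
    if policy == "ensure-management":
        # Traps and syslogs are not considered.
        return "Device Unreachable" if all_down else "Device Partially Reachable"
    if policy == "notstrict":
        if all_down and not event_in_last_6_min:
            return "Device Unreachable"
        # All protocols down but events still arriving (from Table 21-1), or
        # only some protocols down (assumption).
        return "Device Partially Reachable"
    raise ValueError("unknown policy: %r" % policy)
```

For example, with Telnet down and SNMP and ICMP up, the default ensure-management policy yields Device Partially Reachable, while strict yields Device Unreachable.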
Troubleshooting VNE Communication State Issues
The following procedure explains how to respond when you observe a VNE communication state problem. These same steps can be used when a Device Partially Reachable or any unexpected communication state occurs.
Check the Communication State in Cisco ANA NetworkVision
Cisco ANA uses the ICMP, SNMP, and Telnet protocols to determine device reachability as described in How Cisco ANA Determines Protocol Reachability. Probably the most common communication problem is when the VNE communication state changes to Device Partially Reachable, which normally indicates that at least one protocol is experiencing a problem. On the other hand, it could mean the VNE was stopped or moved to maintenance mode.
You can check a VNE's current communication state in the device properties page of Cisco ANA NetworkVision. State changes are signalled in Cisco ANA NetworkVision by the network element icons and their decorators. Figure 21-5 provides examples of some of the icons and decorators you may see. The figure is followed by a procedure that explains how to interpret and troubleshoot this information.
Figure 21-5 VNE State Icons and Decorators in Cisco ANA NetworkVision Map View
Step 1
From the Cisco ANA NetworkVision map view, double-click the icon in which you are interested. This opens the device properties window.
Step 2
Check the information at the top of the device properties window, as shown in Figure 21-6.
Figure 21-6 VNE Communication State (in Cisco ANA NetworkVision)
Step 3
Check the examples in the following topics for more information.
Check the Communication Details in Cisco ANA NetworkVision
You can check the state of all device protocols, including whether the device is sending traps and syslogs, in Cisco ANA NetworkVision.
The complete procedure is described in Troubleshooting Management Issues (Device Reachability). Figure 21-7 shows an example.
Figure 21-7 Management (Protocol) Communication Details
In this example, the Communication Details page provides the following information (the contents of this page are described in Table 21-1):
• The VNE is using the ensure-management management communication policy (Policy: ensure-management), which is the default. This means the device moves to Device Unreachable only when all protocols are down.
• The VNE is not using reduced polling (Reduced Polling: False).
• The Telnet protocol is down (Telnet Status: Down).
• The Telnet protocol is down because the first prompt was incorrect (Telnet State Description: Protocol failed to get first prompt).
• The VNE has received traps and syslogs from the device (Syslog Connectivity and Trap Connectivity are true).
The Telnet "failed to get first prompt" problem can be fixed by logging into the device, copying the prompt, and configuring it in Cisco ANA Manage (see VNE Telnet/SSH Settings).
To continue troubleshooting this problem, examine the System event. See Check the Ticket in Cisco ANA EventVision.
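The State Description guidance in Table 21-1 amounts to a lookup from the reported problem to a suggested action. The Python sketch below shows one way to encode it. The substring matching and the function name are assumptions made for illustration; the action text is taken from the table.

```python
# (keyword, suggested action) pairs taken from the State Description row of
# Table 21-1. Matching on lowercase substrings is an assumption.
REMEDIATION = [
    ("first prompt",
     "Log into the device, copy the prompt, and configure it in "
     "Cisco ANA Manage (see VNE Telnet/SSH Settings)."),
    ("communication i/o", "Contact your system administrator."),
    ("context name", "Contact your system administrator."),
    ("username/password", "Contact your system administrator."),
]

FALLBACK = "See Troubleshooting VNE Connectivity Problems for common error scenarios."


def suggest_action(state_description):
    """Return the suggested action for a protocol State Description string."""
    lowered = state_description.lower()
    for keyword, action in REMEDIATION:
        if keyword in lowered:
            return action
    return FALLBACK
```

Called with the description from this example, "Protocol failed to get first prompt", the function returns the prompt-configuration action.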
VNE Agent Not Loaded (VNE Defined Not Started)—Troubleshooting
The following illustration shows a device that has the Agent Not Loaded communication state. (This is the equivalent of the Defined Not Started investigation state.)
The Agent Not Loaded communication state means the VNE is not responding to the gateway because the VNE was stopped, or it was just created and has not yet been started. As mentioned earlier, a System event is generated whenever the communication state changes; but when a VNE is started, an event is generated only after:
• All protocols have been tested and a new problem is found (one that was not previously reported).
• A problem that was found has been resolved.
Note
If the VNE was stopped, you will see a message and a refresh button at the top of the properties window. If the VNE was restarted, refreshing the window will repopulate the information. However, if the VNE is still down, refreshing the window will result in an error message. To start the VNE, see Changing VNE Status and Lifecycle (Start, Stop, Maintenance).
To troubleshoot a VNE in this state:
1. Check the VNE, AVM, and unit status using Cisco ANA Manage.
2. Verify the management communication information. See Check the Communication Details in Cisco ANA NetworkVision.
3. Examine the System event. See Check the Ticket in Cisco ANA EventVision.
Agent Unreachable (AVM or Unit Issues)—Troubleshooting
In the following illustration, the VNE named VNE 1 has the Agent Unreachable communication state.
This state means the VNE is not responding to the gateway. This can happen if the unit or AVM is overutilized, the connection between the gateway and unit or AVM was lost, or the VNE is not responding in a timely fashion. (Remember that a VNE in this state does not mean the device is down; it may still be processing network traffic.)
To troubleshoot a VNE in this state:
1. Check the VNE, AVM, and unit status using Cisco ANA Manage and check the amount of available memory.
2. Use the diagnostics tool to check memory usage, GC, and CPU usage; see Obtaining Diagnostic Information Using Graphs.
3. Verify the management communication information. See Check the Communication Details in Cisco ANA NetworkVision.
4. Examine the System event. See Check the Ticket in Cisco ANA EventVision.
5. Examine the AVM to see if a specific VNE is causing the problem. VNE or AVM reachability issues are often due to CPU-related resource problems.
Device Reachable (But Device Unsupported)—Troubleshooting
In this illustration, the VNE named 3750E-24TD-AGG2 is reachable but is not supported by Cisco ANA, so only basic modeling information is provided.
To troubleshoot a VNE in this state:
1. Examine the System event. See Check the Ticket in Cisco ANA EventVision.
– If you have chosen the wrong scheme, delete and re-add the VNE using the right scheme. See Creating VNEs and Viewing and Editing VNE Properties.
– If the polling method is not supported, edit the VNE to use the proper method. See Finding Out Whether a VNE is Using Reduced Polling.
2. If the device type is not supported:
– You can add the VNE as a Generic VNE or ICMP VNE. These VNE types are specified in the VNE General properties; see Table 20-6.
– You can add the support using the Cisco ANA VNE Customization Builder. See the Cisco Active Network Abstraction 3.7.2 Customization User Guide.
Device Unreachable—Troubleshooting
In this illustration, a VNE named VNE-7 has the Device Unreachable communication state.
By default, a VNE with this communication state indicates that all of the device protocols are down (though the device may be sending traps or syslogs). This could be due to a problem such as a wrong IP address, or incorrect device credentials.
To troubleshoot a VNE in this state, see Troubleshooting Management Issues (Device Reachability).
VNE Deleted or Moved—Troubleshooting
In the following illustration, VNE-2 has been moved to a different AVM, or has been deleted from Cisco ANA.
Check the Ticket in Cisco ANA EventVision
When a VNE communication state changes, Cisco ANA correlates the generated System event and generates a ticket. If needed, you can check the ticket to see if there is any additional information about the event.
Note
Keep in mind that if an AVM or unit crashes, Cisco ANA will not generate a Service event for the communication state change, because the event-generating entity (the AVM or unit) is itself down. However, the GUI will display the VNE/Agent Unreachable icon. Any tickets related to the problem (that were sent before the crash) will remain open until the VNE restarts and generates a clearing event. If no related tickets were sent before the crash, check Cisco ANA EventVision for other related information.
Step 1
Log into Cisco ANA EventVision and click the Tickets tab.
Step 2
Open the ticket for the device (see Figure 21-8).
Step 3
Check the information in the Details area.
Figure 21-8 Ticket Details for a Device Partially Reachable Ticket
Step 4
If you want more information, you can adjust the registry setting so that Cisco ANA EventVision generates an elaborated report about state changes. See Table 21-2.
If you need more information about protocols and how Cisco ANA determines whether a protocol is reachable, see How Cisco ANA Determines Protocol Reachability.
Troubleshooting VNE Investigation State (Discovery) Issues
VNE investigation states describe how successfully a VNE has modeled the device it represents. You can check a VNE's current investigation state in the device properties page of Cisco ANA NetworkVision. State changes are signalled in Cisco ANA NetworkVision by the network element icons and their decorators. Figure 21-9 provides examples of some of the icons and decorators you may see. The figure is followed by a procedure that explains how to interpret and troubleshoot this information.
For information on how to control discovery behavior and get more information about discovery problems, see Registry Settings for VNE Discovery Timeout and Investigation State Reporting.
Note
At any time you can restart the VNE discovery process by restarting the VNE (see Changing VNE Status and Lifecycle (Start, Stop, Maintenance)).
Figure 21-9 VNE State Icons and Decorators in Cisco ANA NetworkVision Map View
The following procedure explains how to respond when you observe a VNE investigation state problem. These same steps can be used when a VNE remains in the Discovering or Partially Unsynchronized state, or the VNE is in an unexpected investigation state.
Step 1
From the Cisco ANA NetworkVision map view, double-click the icon in which you are interested. This opens the device properties window.
Step 2
Check the information at the top of the device properties window, as shown in Figure 21-10.
Figure 21-10 VNE Investigation State (in Cisco ANA NetworkVision)
Step 3
Check the following topics for more information:
• Maintenance State—Troubleshooting
• Discovering Investigation State—Troubleshooting
• Currently Unsynchronized Investigation State—Troubleshooting
Maintenance State—Troubleshooting
In this illustration, the VNE named 10.56.23.119 is in the maintenance investigation state.
If a user did not manually move the VNE to this state, the state change was most likely made automatically by Cisco ANA as part of the adaptive polling mechanism. If adaptive polling is enabled and a device's CPU usage exceeds its configured upper threshold, the VNE is moved to slower polling and, if the problem continues, it is moved to maintenance mode. This prevents the VNE from using too much of the network element's CPU resources.
For information on how to solve high CPU usage and adjust the adaptive polling mechanism, see Adaptive Polling.
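The escalation described above (normal polling, then slower polling, then maintenance) can be pictured as a simple state progression. The Python sketch below is a conceptual illustration only. The mode names, the one-step-per-sample escalation, and the 90 percent threshold used in the usage note are assumptions; the actual thresholds, timing, and recovery behavior are covered in Adaptive Polling.

```python
# Each CPU sample above the upper threshold moves the VNE one step along this
# path. Escalating by exactly one step per sample is an assumption.
ESCALATION = {
    "normal": "slow",
    "slow": "maintenance",
    "maintenance": "maintenance",
}


def next_polling_mode(mode, cpu_percent, upper_threshold):
    """Return the polling mode after one device CPU usage sample.

    Recovery (moving back toward normal polling) is deliberately not
    modeled here, because this section does not describe it.
    """
    if cpu_percent > upper_threshold:
        return ESCALATION[mode]
    return mode
```

With a hypothetical upper threshold of 90, two consecutive samples of 95 take a VNE from normal to slow and then to maintenance, while a sample of 50 leaves the mode unchanged.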
Discovering Investigation State—Troubleshooting
In this illustration, the VNE named 10.56.23.124 is in the Discovering investigation state. The VNE is building the model of the device (the device type was found and is supported by Cisco ANA).
To troubleshoot a VNE that does not move out of this state, perform the following steps:
1. Verify that all required device configuration tasks have been performed. If they were not, Cisco ANA cannot properly model the device. See Device Configuration Tasks for VNE Creation.
2. Verify that there are no communication state issues. See Troubleshooting VNE Communication State Issues. Also see Troubleshooting Management Issues (Device Reachability).
3. Verify that the VNE is using the proper scheme. See Choosing a VNE Scheme.
4. Verify that the device is using the proper polling method. See Finding Out Whether a VNE is Using Reduced Polling.
The default discovery timeout is 30 minutes but is customizable. To change the timeout, see Registry Settings for VNE Discovery Timeout and Investigation State Reporting.
Currently Unsynchronized Investigation State—Troubleshooting
In this illustration, the VNE named 10.56.123.21 is in the Currently Unsynchronized investigation state.
If a required, recoverable device command failed, the device command will be retried 3 times per polling cycle. The VNE may recover from the failure on subsequent polling cycles.
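The retry behavior can be pictured with the following Python sketch. It is a conceptual illustration, not Cisco ANA code. It reads "retried 3 times" as one initial attempt followed by up to three retries within a single polling cycle, which is an assumption, and DeviceCommandError and run_device_command are hypothetical names.

```python
class DeviceCommandError(Exception):
    """Hypothetical error raised when a recoverable device command fails."""


def run_device_command(command, max_retries=3):
    """Run command() once, then retry up to max_retries times on failure.

    Returns (result, attempts_used). If every attempt fails, the last error
    is re-raised; the VNE may still recover on a later polling cycle.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return command(), attempts
        except DeviceCommandError:
            if attempts > max_retries:
                raise
```

A command that fails twice and then succeeds returns on the third attempt, while a command that never succeeds is tried four times in the cycle before the error is reported.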
To troubleshoot a VNE that does not move out of this state, perform the following steps:
1. Verify that all required device configuration tasks have been performed. If they were not, Cisco ANA cannot properly model the device. See Device Configuration Tasks for VNE Creation.
2. Verify that there are no communication state issues; specifically, check for a System event in Cisco ANA NetworkVision (see Check the Ticket in Cisco ANA EventVision). The problem may be due to the fact that the device did not respond in a timely manner. Also perform the troubleshooting steps in these sections:
– Troubleshooting Management Issues (Device Reachability)
– Troubleshooting VNE Communication State Issues
3. Verify that the VNE is using the proper scheme. See Choosing a VNE Scheme.
4. Verify that the device is using the proper polling method. See Finding Out Whether a VNE is Using Reduced Polling.
5. Enable the investigation state update event and summary event to gather discovery details (see Registry Settings for VNE Discovery Timeout and Investigation State Reporting).
6. Open the device properties window in Cisco ANA NetworkVision. Place your cursor in the inventory window, and press F2. Click Managed State Aspect and review the information. This information is especially useful when working with the Cisco Technical Assistance Center.
Registry Settings for VNE Discovery Timeout and Investigation State Reporting
Table 21-2 lists registry settings you can change to control the following discovery and state reporting behaviors:
• Whether Cisco ANA should generate a Service event and long event description when an investigation state changes. This is not done by default because it can affect performance and cause unnecessary concern to operators. (Service events are generated for communication state changes by default.)
• The number of retries for device commands issued during the discovery process, and whether the device command is required.
• Whether Cisco ANA should use the timeout mechanism or the convergence mechanism to determine when the discovery process is complete. (You can also adjust the length of the discovery timeout.) For information on these mechanisms, see Cisco Active Network Abstraction 3.7.2 Theory of Operations.
Note
All changes to the registry should only be carried out with the support of Cisco. For details, contact your Cisco account representative.
Table 21-2 Registry Settings for Discovery and Investigation States
Registry Entry | Description | Default Value
Investigation and Communication State Reporting
site/agentdefaults/da/investigation-progress/investigation-state-update-event | Generate a Service event (in Cisco ANA EventVision) when investigation state changes | false
site/agentdefaults/da/investigation-progress/investigation-state-result-summary-event | Include an elaborated report about the investigation state change in the Long Description field of the Service event | false
Device Commands Used for Discovery
site/interfacebasedscheme/default registration/error update tolerance | Allowable number of device command failures, after which an error is generated. | 3
site/interfacebasedscheme/default registration/required | Designate the device command as required for evaluating an investigation state (insert this after the device command key name) | false
VNE Discovery Period Controls
site/agentdefaults/da/investigation-progress/max-delay-before-managed-state-in-milliseconds | Timeout for VNE discovery process (in milliseconds) (ignored if convergence is being used) | 1800000 (30 minutes)
site/agentdefaults/da/investigation-progress/convergence | Use the VNE convergence mechanism to control discovery | false
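Because the discovery timeout is expressed in milliseconds, it is easy to mistype. The Python sketch below records the defaults from Table 21-2 and shows the conversion from minutes. It is a planning aid only: it does not read or write the Cisco ANA registry, and the helper names are illustrative assumptions.

```python
PREFIX = "site/agentdefaults/da/investigation-progress/"
SCHEME = "site/interfacebasedscheme/default registration/"

# Keys and default values exactly as listed in Table 21-2.
REGISTRY_DEFAULTS = {
    PREFIX + "investigation-state-update-event": False,
    PREFIX + "investigation-state-result-summary-event": False,
    SCHEME + "error update tolerance": 3,
    SCHEME + "required": False,
    PREFIX + "max-delay-before-managed-state-in-milliseconds": 1800000,
    PREFIX + "convergence": False,
}


def minutes_to_ms(minutes):
    """Convert a timeout in minutes to the millisecond value the key expects."""
    return int(minutes * 60 * 1000)


def effective_discovery_timeout_ms(settings):
    """Return the timeout in effect, or None when convergence is enabled
    (the table states that the timeout is ignored in that case)."""
    if settings[PREFIX + "convergence"]:
        return None
    return settings[PREFIX + "max-delay-before-managed-state-in-milliseconds"]
```

For example, minutes_to_ms(30) gives 1800000, which matches the default in the table, and a 45-minute timeout would be entered as 2700000.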
Opening a Bug Report
After performing the troubleshooting steps in the previous sections, if you still have a problem, you may consider opening a bug (or enhancement request).
Before You Open a Bug
1. Verify that the network element, event, script, and so on is supported by checking the Cisco Active Network Abstraction 3.7.2 Reference Guide and the Cisco Active Network Abstraction 3.7.3 Release Notes. Also check the addendum to the reference guide, when it becomes available.
Note
If the device is not supported, you can add the support using the Cisco ANA VNE Customization Builder. See Cisco Active Network Abstraction 3.7.2 Customization User Guide. This guide also contains an extended procedure for finding out which traps and syslogs are not supported and how to troubleshoot them.
2. Make sure you have tried all of the troubleshooting steps provided in these topics:
– Troubleshooting Management Issues (Device Reachability)
– Troubleshooting VNE Communication State Issues
– Troubleshooting VNE Investigation State (Discovery) Issues
3. Provide all of the necessary details for the bug report (reproduce the problem if necessary).
Information You Must Provide
1. Describe the actual behavior versus the expected behavior. For example, "Module serial numbers are missing from NetworkVision."
2. Describe how to recreate the error scenario.
3. Provide the following device details:
– Device type.
– Device operating system (including service and patches applied on the NE).
– Device configuration information. If possible, attach a running config.
– For device physical modeling issues, details on the physical module.
– For device logical modeling issues, details on the service.
4. Collect the following Cisco ANA information:
– Pertinent AVM log files from $ANAHOME/Main/logs.
– List of VNE drivers that are installed.
– Cisco ANA version. From the gateway, run anactl status and note the version and build number that are displayed at the top of the status message.
– Patch level details. You can use this command:
checkPatchInstallation.pl -v -p
5. For physical model issues, provide screen captures (of the Cisco ANA GUI clients and the EMS) that show the discrepancies.
6. For NBI-related issues, provide the IMO or BQL citation.
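The Cisco ANA items in step 4 can be gathered into a short checklist before you start collecting data. The Python sketch below only builds that checklist from the log location and the two commands named above; it does not run anything. The function name is an assumption, and the installation directory in the usage note is a hypothetical value standing in for your $ANAHOME.

```python
import posixpath


def bug_report_collection_plan(ana_home):
    """Return the log directory and commands named in step 4.

    ana_home is the value of $ANAHOME on the gateway. POSIX path joining is
    used because the log location is given as a Unix-style path.
    """
    return {
        "log_dir": posixpath.join(ana_home, "Main", "logs"),
        "commands": [
            ["anactl", "status"],                       # version and build number
            ["checkPatchInstallation.pl", "-v", "-p"],  # patch level details
        ],
    }
```

For a hypothetical $ANAHOME of /export/home/ana, the plan points at /export/home/ana/Main/logs and lists the two commands in the order they appear in step 4.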