The Service Diagnostics feature provides a bundled set of Tool Command Language (Tcl) scripts and Embedded Event Manager (EEM) policies written and tested by subject matter experts to facilitate diagnosing common networking issues in the areas of Border Gateway Protocol (BGP), Open Shortest Path First (OSPF), and Quality of Service (QoS). A new feature called Embedded Menu Manager (EMM), available in Cisco IOS Software Release 12.4(20)T and later Cisco IOS images, may be used to guide the user in installing and deploying these scripts and policies.
API: Application Programming Interface
CLI: Command Line Interface
EEM: Embedded Event Manager
EMM: Embedded Menu Manager
ERM: Embedded Resource Manager
ESM: Embedded Syslog Manager
IFS: Cisco IOS File System
MDF: Menu Definition File
IOS: Internetwork Operating System
TBD: To Be Determined
TBS: To Be Supplied (at a later time)
Tcl: Tool Command Language
Introduction to Service Diagnostics
The concept behind Service Diagnostics is to automate some of the extensive troubleshooting experience of Cisco engineers by using the existing scripting capabilities and embedded management tools in Cisco IOS. Cisco has been adding and enhancing such tools as EEM, Embedded Syslog Manager (ESM), and Embedded Resource Manager (ERM) in Cisco IOS over the past few years. This feature is meant to be the "glue" that combines one or more of these tools to automate common diagnostic scenarios.
The goal is to isolate the end user from the rigors of Tcl scripting and EEM policy writing, and to provide a simple interface for deploying scenario-specific troubleshooting scripts and receiving feedback from them. Service Diagnostics provides CLI Tcl shell user interfaces as well as EMM Menu Definition Files (MDFs) for deploying troubleshooting scenarios. The scripts are posted on the Cisco Beyond website under the Diagnostic category.
Deploying Service Diagnostic Scenarios
If your image has the EMM feature, deploying with the MDF is much easier than working from the ZIP file. If it does not, you must use the tclsh helper scripts.
4. Deploy BGP Neighbor Formation Problem Diagnostic Script
5. Deploy BGP Route Problem Diagnostic Script
6. Deploy All BGP Scripts
7. Remove Diagnostic Policies
8. Display Diagnostic Policy Configuration
Enter selection :
4. Press the number "1" (no Enter key is needed). You will be prompted for directories for the EEM user library and user policies as follows:
Enter ? for help
Enter a directory to store the BGP diagnostic policies in the form of a URL
(excluding filename, e.g. disk0:/svc_diag
Enter value [disk0:/svc_diag]:
Enter ? for help
Enter a directory for the user library files in the form of a URL
(excluding filename, e.g. disk0:/user_lib
Enter value [disk0:/user_lib]:
Note: The MDF queries the router's available file systems and presents a default directory that has sufficient free space to contain the diagnostic scripts and policies. Press the "Enter" key to accept the default.
Deploy the script via tclsh with the parameters "notification", "configuration history option", "event history option", "user policy directory", and "user library directory", where:
The value for notification can be "email", "syslog", or "all".
The values for the configuration history option and the event history option can be "TRUE" or "FALSE".
The values for the user policy and user library directories are the respective full paths where the scripts and library files are stored.
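The parameter rules above can be checked before a deployment is attempted. The following Python sketch is illustrative only and is not part of the Service Diagnostics bundle: the function name validate_deploy_args is hypothetical, and the assumptions that values are case-sensitive and that a full path has the form <filesystem>:/<directory> (as in disk0:/svc_diag) are ours.

```python
# Hypothetical pre-deployment check; not part of the Service Diagnostics scripts.
# Assumption: values are case-sensitive and appear exactly as quoted in the text.
ALLOWED_NOTIFICATIONS = {"email", "syslog", "all"}
ALLOWED_HISTORY_FLAGS = {"TRUE", "FALSE"}


def validate_deploy_args(notification, config_history, event_history,
                         policy_dir, library_dir):
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    if notification not in ALLOWED_NOTIFICATIONS:
        problems.append("notification must be email, syslog, or all")
    for label, value in (("configuration history option", config_history),
                         ("event history option", event_history)):
        if value not in ALLOWED_HISTORY_FLAGS:
            problems.append(f"{label} must be TRUE or FALSE")
    for label, path in (("user policy directory", policy_dir),
                        ("user library directory", library_dir)):
        # Assumption: a full path looks like "<filesystem>:/<directory>".
        filesystem, separator, directory = path.partition(":/")
        if not (filesystem and separator and directory):
            problems.append(f"{label} must be a full path such as disk0:/svc_diag")
    return problems
```

For example, validate_deploy_args("email", "TRUE", "FALSE", "disk0:/svc_diag", "disk0:/user_lib") returns an empty list, while a bare directory name such as "svc_diag" is reported as a problem.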
For each scenario, the following sections document an example command line using tclsh.
THE SERVICE DIAGNOSTIC MESSAGE FOR BGP ROUTE PROBLEM IS:
Best path route exists, Check for other reasons why Routes are not advertised.
The QoS script is an application that reports QoS drop counters. Currently, QoS drop counters for policy maps, class maps, traffic shaping, policing, match statements, and so on are displayed by a graphical MIB browser, so a packet drop is not easily noticed when it happens. The QoS policy informs the user when a drop happens, without the user having to check the MIB browser manually.
The script runs with a user-supplied input file and reports drop counters based on that file, which follows a template. The user can customize the template file to request a diff_only counter from a policy map, or a raw counter from a policy map, class map, or match statement. The user must specify the interface and the direction for which the drop counter is wanted.
The input file has four fields: "diff_only counter, class name, interface name, input/output direction". The first field states whether the user wants diff_only counters; a diff_only counter is the number of new drops between the last report time and the current report time. The second field names the policy map, class map, or match statement for which a drop counter is reported (the class name can be any of the three). The third field names the interface for which the drop counter is reported, and the fourth field gives the service policy direction for which the drop counter is reported.
When diff_only is specified, the diff counters of all class maps under the specified service policy on the interface are reported, so the second field (the class name) does not matter.
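The diff_only calculation described above is a subtraction between two reports. The Python sketch below illustrates the idea only; the function name and the dictionary layout (class map name mapped to cumulative drop count) are assumptions, not the data structures used by the QoS script.

```python
# Illustrative sketch of a diff_only counter; names and data layout are hypothetical.
def diff_only_counters(previous, current):
    """Return the new drops per class map since the previous report.

    previous and current map a class map name to its cumulative drop counter.
    A class map missing from the previous report is treated as starting at 0.
    """
    return {name: count - previous.get(name, 0)
            for name, count in current.items()}


last_report = {"qosc1": 10, "PingPackets": 4}
this_report = {"qosc1": 25, "PingPackets": 4, "class-default": 3}
new_drops = diff_only_counters(last_report, this_report)
```

Here new_drops holds 15 for qosc1, 0 for PingPackets, and 3 for class-default.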
An example of the user input file:
yes qosc1 gi0/0 input
This entry reports all diff_only drop counters for the input service policy on interface gi0/0 during the current 15-minute interval.
Another example:
no PingTest e0/0 input
PingTest is the name of the policy map on interface e0/0. The QoS script reports the drop counters for PingTest on interface e0/0 in the input direction.
Another example:
no PingPackets gi0/0 output
PingPackets is the name of the class map on interface gi0/0 here.
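The four-field entries shown in these examples can be parsed mechanically. The following Python sketch is an illustration, not the parser used by the QoS script: the names CounterRequest and parse_input_line are hypothetical, and it assumes fields are separated by whitespace and that the first field is exactly "yes" or "no".

```python
# Hypothetical parser for one input-file entry; not the QoS script's own code.
from typing import NamedTuple


class CounterRequest(NamedTuple):
    diff_only: bool   # True when the first field is "yes"
    class_name: str   # policy map, class map, or match statement name
    interface: str    # for example gi0/0 or e0/0
    direction: str    # "input" or "output"


def parse_input_line(line):
    """Parse an entry such as 'yes qosc1 gi0/0 input'."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    diff_flag, class_name, interface, direction = fields
    if diff_flag not in ("yes", "no"):
        raise ValueError(f"first field must be yes or no: {diff_flag!r}")
    if direction not in ("input", "output"):
        raise ValueError(f"direction must be input or output: {direction!r}")
    return CounterRequest(diff_flag == "yes", class_name, interface, direction)


request = parse_input_line("no PingTest e0/0 input")
```

The resulting request has diff_only set to False, class_name "PingTest", interface "e0/0", and direction "input".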
When the user gives two class map entries in the input file, for example:
"no qosc1 giga0/0 input
no qosc1 giga0/0 output"
The report differs depending on the input file. For the first example input file, the report is as follows:
The OSPF neighbor formation scenario includes two scripts:
The script detects the syslog message:
%OSPF-4-DUP_RTRID_NBR: OSPF detected duplicate router-id .* on interface
and gives additional information about the cases that can cause a duplicate router ID.
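The pattern above is a regular expression, so its behavior can be tried outside the router. The Python sketch below uses the pattern exactly as quoted; the sample message text and its addresses (taken from the documentation address range) are illustrative assumptions, and this is not how the EEM policy itself is written.

```python
# Illustration of the syslog pattern match; the sample message is hypothetical.
import re

DUP_RTRID_PATTERN = re.compile(
    r"%OSPF-4-DUP_RTRID_NBR: OSPF detected duplicate router-id .* on interface"
)


def is_duplicate_router_id(message):
    """Return True when the syslog message matches the duplicate router ID pattern."""
    return DUP_RTRID_PATTERN.search(message) is not None


sample = ("%OSPF-4-DUP_RTRID_NBR: OSPF detected duplicate router-id "
          "192.0.2.1 from 192.0.2.2 on interface Ethernet0/0")
matched = is_duplicate_router_id(sample)
```

Because the pattern uses search semantics, any timestamp or sequence number that precedes the message does not prevent a match.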
The script is timer-based and checks whether the router is stuck in one of the following states:
1. Stuck in the ATTEMPT state, which can be caused by a wrong neighbor configuration.
For example, the following configuration can cause the router to be stuck in the ATTEMPT state:
router ospf 107
network 126.96.36.199 0.0.255.255 area 0
neighbor 188.8.131.52 ----- This neighbor does not exist; the neighbor should be 184.108.40.206
2. Stuck in the INIT state, which can be caused by an access list at the remote end. The script can provide a suggestion for this case.
3. Stuck in the EXSTART/EXCHANGE state, which can be caused by an MTU problem.
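The three cases above amount to a lookup from a stuck state to its likely cause. The Python sketch below illustrates that lookup only. The rule used to decide that a neighbor is "stuck" (the same problem state seen on every timer run) is an assumption made for this example; the source does not say how the script decides, and the names are hypothetical.

```python
# Illustrative lookup of likely causes; the stuck-detection rule is an assumption.
LIKELY_CAUSES = {
    "ATTEMPT": "wrong neighbor configuration",
    "INIT": "access list at the remote end",
    "EXSTART": "MTU problem",
    "EXCHANGE": "MTU problem",
}


def stuck_state(samples):
    """Return the state if every timer sample shows the same problem state, else None.

    samples is a list of neighbor states, one per timer run (assumed input).
    """
    states = {state.upper() for state in samples}
    if len(states) == 1:
        state = states.pop()
        if state in LIKELY_CAUSES:
            return state
    return None


state = stuck_state(["INIT", "INIT", "INIT"])
cause = LIKELY_CAUSES[state] if state else None
```

A neighbor that moves on to another state between samples, or that reaches FULL, yields None and no suggestion.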
For an OSPF neighbor formation problem, the screen output can be as follows.
For a duplicate router ID:
THE SERVICE DIAG MESSAGE FOR ADJ DUPLICATE RID is:
The message happens when two routers are configured with the same router id. Check the router ids on them to make sure they have individual router ids and restart the ospf protocol by doing "no router ospf <ospf_id>" and "router ospf <ospf_id>" in configuration mode or "clear ip ospf <process_id> process" in super user mode
OSPF neighbor 220.127.116.11 is stuck at INIT might be due to access list on remote end blocking OSPF hellos or authentication config is present on one side, Please check the access-list or enable authentication on both sides.
OSPF neighbor 18.104.22.168 is stuck at EXCHANGE might be due to unmatched mtu, the stuck interface GigabitEthernet0/0 has mtu value 3456, Please check mtu value in remote side to make sure they are synchronized
Email result will not be sent, please look at console or buffer on service diagnostic messages
CPU, memory, and buffer resource diagnostics are all triggered when usage crosses user-settable thresholds. Service Diagnostic scripts in this area use two embedded Cisco IOS features, ERM and EEM. The scripts dynamically generate an EEM policy that uses an ERM event detector in the area to be monitored.
The actions/outputs of these policies are limited to reporting the threshold crossings. The value-add (over using ERM directly) is the email notification capability, as well as providing a wrapper for quickly configuring ERM (which is fairly complex). A typical message follows:
THE SERVICE DIAGNOSTIC MESSAGE FOR RESOURCE CPU MONITORING IS :
Process Exec with PID 227 exceeded the configured CPU utilization threshold 20
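The threshold-crossing report can be pictured as a simple comparison. The Python sketch below only illustrates that logic and reproduces the message wording shown above; it does not use ERM or EEM, and the function name and the (process name, PID, utilization) input layout are assumptions.

```python
# Illustration of threshold-crossing reporting; not ERM or EEM code.
def cpu_threshold_messages(processes, threshold):
    """Return one message per process whose CPU utilization exceeds the threshold.

    processes is a list of (name, pid, utilization_percent) tuples (assumed layout).
    """
    return [
        f"Process {name} with PID {pid} exceeded the configured "
        f"CPU utilization threshold {threshold}"
        for name, pid, utilization in processes
        if utilization > threshold
    ]


messages = cpu_threshold_messages([("Exec", 227, 35), ("IP Input", 90, 5)], 20)
```

With a threshold of 20, only the Exec process is reported in this example.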
Note the following caveats:
1. An EEM Tcl error sometimes occurs because email_template has not been restaged. Because there are no definite steps to reproduce this issue, the workaround is to remove and reconfigure the user library directory on the router:
Router(config)# no event manager directory user library disk#:/<user library name>
Router(config)# event manager directory user library disk#:/<user library name>
2. All the BGP scripts work only with IPv4.
3. Three severity 3 bugs (CSCsx65581, CSCsx53550, CSCsx47799) and two severity 4 bugs (CSCsx60168, CSCsx65614) remain to be resolved.
4. When the BGP configuration on the router is very large, the BGP policies may exceed the maxrun time value for the policy. In that case, the user must increase the maxrun timer value in the policy to suit the BGP configuration.