System Troubleshooting Methodology

The Implementation phase of your network deployment is an excellent time to develop a methodology for troubleshooting the network as a whole. Troubleshooting networking equipment at a system level requires solid detective skills. When a problem occurs, the list of potential suspects is long. You must collect detailed information and systematically narrow the list of potential causes to determine the root problem. This topic does not provide step-by-step instructions for resolving problems that occur during network installation. Instead, this topic describes sound methods for troubleshooting your network using the following general steps:

1. Gather Information on the Problem.

2. Isolate Point(s) of Failure.

3. Apply Tools to Determine the Problem's Root Cause.

Gather Information on the Problem

In a contact center network, problems are typically discovered and reported by one of the following types of users:

External customers dialing into a call center to order products, obtain customer service, and so forth.

Internal agents receiving incoming calls from a call queue or initiating outbound collection calls to customers.

Internal users using administrative phones to call employees in other company locations or PSTN destinations, and perform basic actions such as call transfers and dialing into conferences.

As the network administrator, you must collect sufficient information from these users to allow you to isolate the problem. Detailed, accurate information will make this task easier. Table 2 lists recommended questions to ask users when they report a problem. As you turn up your network, you may consider putting these questions in an on-line form. A form will encourage users to provide more details about the problem and also put them into the habit of looking for particular error messages and indicators. Capturing the information electronically will also permit you to retrieve and re-examine this information in the future, should the problem repeat itself.

Table 2 Questions to Ask Users When They Report Problems 

Ask this Question...
To Determine...
Did something fail or did it simply perform poorly?

Whether the issue relates to system degradation or a connectivity failure. An example of a failure is when a user dials a phone number and hears a fast busy tone. An example of a performance problem is when a user dials into a conference call and hears "choppy" audio when other parties speak. Quality of service or performance issues require a different approach than connectivity or operational problems. You must still isolate the potential sources of the problem, but you will typically use performance management tools instead of log files.

What device were you trying to use?

The device type, model, and version of software installed. It is also critical to capture the IP address assigned to the device, as well as its MAC address. In the case of IP phones, determining the phone's active Cisco Unified Communications Manager server is also important. On Cisco Unified IP phones, these important network values can be displayed by pressing the Settings button and choosing the Network Configuration option from the menu.

Did it ever work?

Whether the device was recently installed and the problem occurred while making it work for the first time, or whether the device was operating normally before the problem occurred. If the device was newly installed, the problem is most likely due to improper configuration or wiring of that particular device. Problems with devices that are already up and running can typically be traced back to one of two causes: (a) the user modifying their device, such as changing its configuration or upgrading software, or (b) a change or failure elsewhere in the network.

Exactly what action(s) did you perform?

The steps that led up to the problem, including which buttons were pressed and in which order. Capturing this information in detail is important so that you can consistently reproduce the problem.

What error message(s) appeared or announcements did you hear?

The visual and audio indicators of the problem. Ask users to provide the exact text that appears and any error codes in either an E-mail or on-line form. If the error indication was audible, ask the user to write down the announcement they heard, the last menu option they were able to successfully choose or the tone they heard when the call failed.

What time did the problem occur?

The date and time to compare against entries in log files. If the problem occurred on a Cisco Unified IP phone, make certain the user provides the timestamp that appears on their phone's display. Several Cisco components in a network may capture the same problem event in separate log files, with different ID values. In order to correlate log entries written by different components, you must compare the timestamps to find messages for the same event. Cisco Unified IP phones synchronize their date and time with their active Cisco Unified Communications Manager server. If all Cisco components in the network use Network Time Protocol (NTP) to synchronize with the same source, then the timestamps for the same problem messages will match in every log file.

What is the number of the phone you used and what was the phone number you called?

If the problem relates to a WAN or PSTN link, or a Cisco Unified Communications Manager dial plan issue. Ask the user the phone number he or she dialed (called number) and determine if the destination was within his or her site, another site within the corporate network, or a PSTN destination. Because the calling number (the number of the phone used) also affects call routing in some cases, capture this number as well.

Did you try to perform any special actions, such as a transfer, forward, call park, call pickup, or meet-me conference? Is the phone set up to automatically perform any of these actions?

If the problem is related not to the calling number or called number but to the supplementary service setup on Unified Communications Manager, or if the problem lies at the destination phone the user tried to reach by transferring or forwarding the call.

Did you attempt the same action on another device?

If the problem is isolated to that user's device or represents a more widespread network problem. If the user cannot make a call from his or her phone, ask the user to place a call to the same destination using a phone in a nearby office.
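The questions in Table 2 map naturally onto a structured record, which is one way to back the on-line form suggested earlier. The following is a minimal Python sketch, not part of any Cisco tool: the class name `ProblemReport`, its field names, and the sample values (documentation-range IP and MAC addresses, made-up phone numbers) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ProblemReport:
    """One user report; each field maps to a question in Table 2."""
    failed_outright: bool          # True: something failed; False: it performed poorly
    device_type: str               # device type and model
    software_version: str
    ip_address: str
    mac_address: str
    worked_before: bool            # "Did it ever work?"
    actions_performed: str         # buttons pressed, in order
    error_text: str                # exact message, code, announcement, or tone
    occurred_at: datetime          # timestamp shown on the phone's display
    calling_number: str
    called_number: str
    special_actions: List[str] = field(default_factory=list)  # transfer, forward, ...
    fails_on_other_device: Optional[bool] = None               # None: not tried

    def missing_fields(self) -> List[str]:
        """Return the names of text fields the user left blank."""
        text_fields = ["device_type", "software_version", "ip_address",
                       "mac_address", "actions_performed", "error_text",
                       "calling_number", "called_number"]
        return [name for name in text_fields if not getattr(self, name).strip()]

    def likely_scope(self) -> str:
        """First guess at how widespread the problem is."""
        if self.fails_on_other_device is None:
            return "unknown"
        return "network" if self.fails_on_other_device else "single device"


# Hypothetical sample report; every value here is invented.
report = ProblemReport(
    failed_outright=True, device_type="IP phone", software_version="",
    ip_address="192.0.2.10", mac_address="00:00:5E:00:53:01",
    worked_before=True, actions_performed="Dialed 9, then the number",
    error_text="fast busy tone", occurred_at=datetime(2024, 1, 15, 9, 30),
    calling_number="1001", called_number="5550100",
    fails_on_other_device=False,
)
print(report.missing_fields())  # fields to follow up on with the user
print(report.likely_scope())
```

Storing reports in a form like this makes it straightforward to spot incomplete submissions and to search earlier reports if the problem repeats.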


Isolate Point(s) of Failure

After collecting information on the symptoms and behavior of the problem, narrow the focus of your efforts as follows:

Identify the specific devices involved in the problem.

Check the version of software running on each device.

Determine if something has changed in the network.

Verify the integrity of the IP network.

Identify Devices Involved in the Problem

In large- to medium-sized networks, it is crucial to identify the specific phones, routers, switches, servers and other devices that were involved in a reported problem. Isolating these devices allows you to rule out the vast majority of equipment within the network and focus your time and energy on suspect devices. To help you isolate which devices were involved in a problem, two types of information can prove invaluable:

Network topology diagrams: It is strongly recommended that you have one or more diagrams that show the arrangement of all Cisco Unified Communications products in your network. These diagrams illustrate how these devices are connected and also capture each device's IP address and name (you may want to also have a spreadsheet or database of the latter information). This information can help you visualize the situation and focus on the devices that may be contributing to the reported problem. See Network Topology Diagrams for recommendations on how to prepare these diagrams.

Call flow diagrams: Cisco equipment, including Unified Communications Manager servers, typically provides detailed debug and call trace log files. To interpret these log files, however, it is useful to understand the signaling that occurs between devices as calls are set up and disconnected. Using the network topology and call flow diagrams in conjunction with the log files, you can trace how far a call progressed before it failed and identify which device reported the problem. Examples of using call flow diagrams for problem isolation are shown in Troubleshooting Daily Operations.
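Tracing a call across several components' log files depends on matching entries by timestamp, since each component may record the same event under a different ID. The following Python sketch illustrates the idea under stated assumptions: log entries have already been parsed into (timestamp, message) pairs, the component names and messages are invented, and the function name `entries_near` and the 5-second tolerance are arbitrary choices. It does not parse any real Cisco log format.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

LogEntry = Tuple[datetime, str]  # (timestamp, message text)


def entries_near(
    logs: Dict[str, List[LogEntry]],
    reported_at: datetime,
    tolerance: timedelta = timedelta(seconds=5),
) -> List[Tuple[datetime, str, str]]:
    """Collect entries from every component within `tolerance` of the
    reported time, sorted chronologically as (timestamp, component, message)."""
    matches = []
    for component, entries in logs.items():
        for stamp, message in entries:
            if abs(stamp - reported_at) <= tolerance:
                matches.append((stamp, component, message))
    return sorted(matches)


# Hypothetical, already-parsed log data for two components.
logs = {
    "call-control": [
        (datetime(2024, 1, 15, 9, 30, 2), "call setup started"),
        (datetime(2024, 1, 15, 9, 45, 0), "unrelated event"),
    ],
    "gateway": [
        (datetime(2024, 1, 15, 9, 30, 4), "no circuit available"),
    ],
}
hits = entries_near(logs, reported_at=datetime(2024, 1, 15, 9, 30, 0))
for stamp, component, message in hits:
    print(stamp.isoformat(), component, message)
```

A tight tolerance only works when the components share a time source; if clocks are not synchronized, the window has to be widened and the matches become less reliable.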

Check Software Release Versions for Compatibility

After you have identified which devices may be involved in the problem, verify that the version of software running on each device is compatible with the software running on every other device. As part of Cisco Unified Communications verification, Cisco Systems has performed interoperability and load testing on simulated network environments running specific software versions. The Release Matrix lists the combination of software releases that were tested.

However, if the combination of releases installed in your network does not match the values in the Release Matrix, it does not necessarily mean the combination is invalid. To check interoperability for a specific device and software release, locate and review its Release Notes. Release Notes contain up-to-date information on compatibility between the product and various releases of other products. They also describe open caveats, which are known issues that may cause unexpected behavior. Before beginning extensive troubleshooting work, examine the Release Notes to determine if you are experiencing a known problem that has an available workaround.
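Comparing installed releases against the tested combination can be sketched in a few lines of Python. The component names and release numbers below are invented placeholders, not values from the Release Matrix, and `untested_components` is a hypothetical helper name. A component flagged by this check is not necessarily incompatible; the result is simply a list of devices whose Release Notes are worth reviewing first.

```python
from typing import Dict, List


def untested_components(
    installed: Dict[str, str], tested: Dict[str, str]
) -> List[str]:
    """Return components whose installed release differs from the tested
    release, or that do not appear in the tested combination at all."""
    return sorted(
        name for name, release in installed.items()
        if tested.get(name) != release
    )


# Placeholder data: real values would come from your inventory and
# from the Release Matrix.
tested = {"call-control": "1.0", "voice-mail": "2.0"}
installed = {"call-control": "1.0", "voice-mail": "2.1", "access-switch": "3.0"}

to_review = untested_components(installed, tested)
print(to_review)  # components whose Release Notes to check
```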


Tip: The Bug Toolkit requires that you be a Cisco partner or a registered Cisco.com user with a Cisco service contract. Using the Bug Toolkit, you can find caveats for any release. To access the Bug Toolkit, go to http://tools.cisco.com/Support/BugToolKit/.


Determine if Network Changes Have Occurred

Before focusing on the particular device or site where the problem occurred, it may be useful to determine if a change was made to surrounding devices. If something has been added, reconfigured or removed from elsewhere in the network, that change may be the source of the problem. It is recommended that you track changes to the contact center network such as:

New agent phones added

Modifications to Cisco Unified Communications Manager call routing settings, such as new directory numbers, route patterns and dial rules to support new sites or devices

Changes to port configurations on switches, routers or gateways (new equipment, wiring changes or new port activation)

Changes to IP addressing schemes (such as adding new subnets) that may have affected route tables
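If changes like these are recorded with a date and time, the ones made shortly before a reported problem can be pulled out quickly. The following is a minimal Python sketch under assumptions: the `ChangeRecord` structure, the `recent_changes` function, the 7-day default window, and all sample entries are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ChangeRecord:
    made_at: datetime
    device: str
    description: str


def recent_changes(
    changes: List[ChangeRecord],
    problem_at: datetime,
    window: timedelta = timedelta(days=7),
) -> List[ChangeRecord]:
    """Return changes made within `window` before the problem,
    most recent first. Changes made after the problem are ignored."""
    earliest = problem_at - window
    hits = [c for c in changes if earliest <= c.made_at <= problem_at]
    return sorted(hits, key=lambda c: c.made_at, reverse=True)


# Hypothetical change log entries.
change_log = [
    ChangeRecord(datetime(2024, 1, 2, 14, 0), "router", "New subnet added"),
    ChangeRecord(datetime(2024, 1, 14, 17, 0), "call-control", "New route pattern"),
    ChangeRecord(datetime(2024, 1, 16, 8, 0), "access-switch", "Port activated"),
]
suspects = recent_changes(change_log, problem_at=datetime(2024, 1, 15, 9, 30))
for change in suspects:
    print(change.made_at.isoformat(), change.device, change.description)
```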

Verify the IP Network Integrity

Always remember that Cisco Unified Communications equipment relies on a backbone IP network. Many connectivity problems are not caused by configuration errors or operational failures on Cisco devices, but rather by the IP network that interconnects them. Problems such as poor voice quality are typically due to IP network congestion, while call failures between locations may be the result of network outages due to disconnected cables or improperly configured IP route tables.

Before assuming that call processing problems result from Cisco Unified Communications devices themselves, check the integrity of the backbone IP network. Keep the OSI model in mind as you perform these checks. Start from the bottom, at the physical layer, by checking the end-to-end cabling. Then verify the status of Layer 2 switches, looking for any port errors. Move from there to confirm that the Layer 3 routers are running and contain correct routing tables. Continue up the OSI stack to Layer 7, the application layer. To resolve problems occurring at the top levels of the stack, a protocol analyzer (or "sniffer") may be useful. You can use a sniffer to examine the IP traffic passing between devices and also decode the packets. Sniffers are particularly useful for troubleshooting errors between devices that communicate using Media Gateway Control Protocol (MGCP) or Session Initiation Protocol (SIP).
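The bottom-up order described above can be expressed as an ordered list of checks that stops at the first failure, so that higher layers are not investigated until the layers beneath them are known to be good. The Python sketch below shows only that control flow. The function name `first_failing_layer` is hypothetical, and the lambda checks are stand-ins that return fixed results; in practice each would wrap a real test such as inspecting port counters or routing tables.

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]  # (layer label, test returning True if healthy)


def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Run checks in order, lowest layer first, and return the label of
    the first one that fails. Return None if every check passes."""
    for layer, is_healthy in checks:
        if not is_healthy():
            return layer
    return None


# Stand-in checks with fixed results, listed from the physical layer up.
checks = [
    ("Layer 1: end-to-end cabling", lambda: True),
    ("Layer 2: switch port errors", lambda: True),
    ("Layer 3: routers and routing tables", lambda: False),
    ("Layer 7: application", lambda: True),
]
print(first_failing_layer(checks))
```

Stopping at the first failure is a deliberate choice here: a fault at a lower layer usually makes the results of higher-layer checks meaningless.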

Apply Tools to Determine the Problem's Root Cause

After you have eliminated the IP network as the source of the problem and you have isolated the specific Cisco Unified Communications components involved, you can start applying the many diagnostic tools provided by Cisco components.

Table 3 lists the diagnostic tools and supporting troubleshooting documentation available for most components in a contact center network. Note that this summary table is provided for reference only. The procedures in Troubleshooting Daily Operations specify when to use each tool and provide links to the troubleshooting instructions where appropriate.

Table 3 Contact Center Component Troubleshooting Tools and Documentation 

Category
Component
Diagnostic Tools Available
Information Available In...

Call Control

Cisco Unified Communications Manager

Serviceability System tools:

Alarms

Real-Time Monitoring Tool window

Trace log files

Communications Manager trace log

SDL trace log (under TAC direction)

Troubleshooting Guide for Cisco Unified Communications Manager

Cisco Unified Communications Manager Real-Time Monitoring Tool Administration Guide

Cisco Unified Serviceability Administration Guide for Cisco Unified Communications Manager

Cisco Unified Communications Manager CDR Analysis and Reporting Administration Guide

Disaster Recovery System Administration Guide

Troubleshooting TechNotes

Contact Center

Cisco Unified Intelligent Contact Management Enterprise

Distributed Diagnostics and Services Network (DDSN)

Support Tools Dashboard (requires additional software)

ICM Administration Guide for Cisco ICM/IPCC Enterprise & Hosted Editions

Cisco Support Tools User Guide for Cisco Unified Software

Troubleshooting TechNotes

Cisco Unified Contact Center Enterprise

Log files:

Error/event logs

Agent desktop activity logs

Debugging logs

Test programs:

Chat Service test program

Enterprise Service test program

IP Phone Agent Service test program

Packet Capture Driver test program

Recording and Statistics Service test program

Sniffing Adapter Update Utility

Voice Over IP Monitor service test program

Mobile Agent Guide for Cisco Unified CC Enterprise & Hosted, "Configuration and Troubleshooting Appendix for Remote Agent" appendix

Troubleshooting Guide for Cisco Unified Contact Center Management Portal

Troubleshooting TechNotes

Cisco Customer Response Solutions (Unified IP IVR)

Log files

Alarms

Cisco Customer Response Solutions Servicing and Troubleshooting Guide, "Part II Troubleshooting"

Troubleshooting TechNotes

Cisco Unified Customer Voice Portal

Error messages

Alarms

Support Tools Dashboard (requires additional software)

Troubleshooting Guide for Cisco Unified Customer Voice Portal

Installation and Upgrade Guide for Cisco Unified Customer Voice Portal, "Troubleshooting Unified CVP Software Installation" chapter

Cisco Support Tools User Guide for Cisco Unified Software

Troubleshooting TechNotes

CTI Object Server (CTIOS)

Log files:

CTI OS Server logs

CTI Toolkit logs

Error messages in the CTI OS Server console window

Support Tools Dashboard (requires additional software)

CTI OS Troubleshooting Guide for Cisco ICM/IPCC Enterprise and Hosted Editions

Cisco Support Tools User Guide for Cisco Unified Software

Troubleshooting TechNotes

Cisco Agent Desktop (CAD)

Log files:

Error/event logs

Agent desktop activity logs

Debugging logs

Test programs:

IP Phone Agent Service test program

Voice Over IP Monitor service test program

Support Tools Dashboard (requires additional software)

Cisco Support Tools User Guide for Cisco Unified Software

Troubleshooting TechNotes

Applications

Cisco Unified Presence

Configuration Troubleshooter

Trace log files

Alarms

Cisco Unified Presence Administration Guide, "Configuration Troubleshooter" section

Cisco Unified Serviceability Administration Guide for Cisco Unified Presence, "Troubleshooting Trace Setting Configuration" section

System Error Messages for Cisco Unified Presence

Cisco IP Phone Messenger User Guide for Cisco Unified Presence, "Troubleshooting" section

Disaster Recovery System Administration Guide for Cisco Unified Presence

Voice Mail and Unified Messaging

Cisco Unity Connection

Serviceability System tools:

Alarms

Real-Time Monitoring Tool window

Cisco Unity Diagnostic Tool (UDT):

Macro trace logs

Micro trace logs

CuVrt service verbose logging

Real-Time Monitoring Tool Administration Guide for Cisco Unity Connection

Administration Guide for Cisco Unity Connection Serviceability

Cisco Unified Serviceability Administration Guide for Cisco Unity Connection

Disaster Recovery System Administration Guide for Cisco Unity Connection

Troubleshooting TechNotes

Endpoints and Clients

Cisco Unified IP phones

Network configuration, status and phone model information on Settings menu

End-User Guides

Cisco Unified IP Phone Administration Guides for Cisco Unified Communications Manager, "Troubleshooting and Maintenance" chapters

Error Message Decoder

Output Interpreter

Troubleshooting TechNotes

Cisco IP Communicator

Quality Report Tool (QRT)

Error Reporting Tool

Cisco IP Communicator Administration Guide, "Troubleshooting Cisco IP Communicator" chapter

User Guide for Cisco IP Communicator, "Troubleshooting Cisco IP Communicator" chapter

Troubleshooting TechNotes

Network Management

Cisco Unified Operations Manager

Alarms and events appearing in Dashboard displays

Phone status tests

Synthetic test

Node-to-node tests

User Guide for Cisco Unified Operations Manager, "Administering Operations Manager" chapter

Communications Infrastructure

Cisco Catalyst 3550 Access Switch

IOS command line tools (such as Show commands and Debug trace utilities)

Catalyst 3550 Multilayer Switch Software Configuration Guide, "Troubleshooting" chapter

Error Message Decoder

Output Interpreter

Troubleshooting TechNotes

Cisco Catalyst 6506, 6509 including Firewall Services Module (FWSM) and Communications Media Module (CMM)

IOS command line tools (such as Show commands and Debug trace utilities)

Catalyst 6500 Series Switch Installation Guide, "Troubleshooting" chapter

Catalyst 6500 Series Error and System Message Guides

Catalyst 6500 Series Switch and Cisco 7600 Series Router Firewall Services Module Command Reference, for FWSM logging configuration and system log messages

Error Message Decoder

Output Interpreter

Troubleshooting TechNotes