Maintenance and Troubleshooting Overview
Revised: March 7, 2011, OL-0800-14
This chapter contains an overview of maintenance and troubleshooting concepts that you can apply to maintaining the elements of the Cisco PGW 2200 Softswitch platform. It includes overall maintenance and system troubleshooting strategies, and reviews available troubleshooting tools.
Although this chapter describes maintenance and troubleshooting separately, these activities are associated. Therefore, several of the maintenance and troubleshooting chapters in this guide frequently refer to each other.
This chapter includes the following sections:
•Maintenance Strategy Overview
•Troubleshooting Strategy Overview
Maintenance Strategy Overview
Maintenance usually consists of the following tasks for each element of the Cisco PGW 2200 Softswitch platform, performed in the order listed:
•Checking equipment status. Determining the status involves three basic activities:
–Reading LEDs—Most Cisco products include LED indicators on the front or rear panels and, in some cases, on both panels. These LEDs indicate the status of the equipment. The specific meaning of each LED on each product is described in the maintenance sections for the individual elements of the Cisco PGW 2200 Softswitch platform.
–Issuing Status Queries—Query the status of the system by entering various commands. The maintenance sections for the individual elements of the Cisco PGW 2200 Softswitch platform provide the commands that you can use to determine the status of the devices in your system.
–Using a GUI NMS—Maintenance sections for the individual elements of the Cisco PGW 2200 Softswitch platform describe how to use a network management system (NMS) with a GUI, such as CiscoWorks2000, Cisco WAN Manager, and Cisco MGC Node Manager (CMNM), to determine the operational status of system devices.
•Removing a device from the system—Maintenance sections for the individual elements of the Cisco PGW 2200 Softswitch platform include procedures for removing defective devices from the system.
•Replacing the complete device—Maintenance sections for the individual elements of the Cisco PGW 2200 Softswitch platform describe how to reinstate a device into the system by using a new or repaired model.
•Replacing hardware components—The maintenance chapter for each element of the Cisco PGW 2200 Softswitch platform includes sections that describe how to replace the field-replaceable components of that device. You swap out components of a device to replace defective components or to upgrade hardware.
Troubleshooting Strategy Overview
The Cisco PGW 2200 Softswitch platform supports connections to external switches and to internal components, such as Media Gateway Controllers, signal processors, and trunking gateways. The Cisco PGW 2200 Softswitch platform functions in a complex environment, which involves numerous connections, links, and signaling protocols. When connectivity and performance problems occur, they can be difficult to resolve.
Troubleshooting usually consists of determining the nature of a problem and then isolating the problem to a particular device or component. When a problem is isolated and identified, troubleshooting also requires fixing the problem, usually by replacing the device or some component of the device. This chapter provides general troubleshooting strategies, as well as information about the tools available for isolating and resolving connectivity and performance problems.
Symptoms, Problems, and Solutions
System problems show certain symptoms. These symptoms can be general (such as a Cisco SS7 interface being unable to access the SS7 network) or specific (such as routes not appearing in a routing table).
Determine the cause of a symptom by using specific troubleshooting tools and techniques. After identifying the cause, correct the problem by implementing a solution, which usually consists of a series of actions.
General Problem-Solving Model
A systematic approach works best for troubleshooting. Define the specific symptoms. Identify all potential problems that could be causing the symptoms. Then systematically eliminate each potential problem (from the most likely to the least likely) until the symptoms are no longer present.
Figure 4-1 illustrates the process flow for this general approach to problem-solving. This process is not a rigid outline for troubleshooting. It is a guide that you can use to troubleshoot a problem successfully.
The following steps describe in more detail the problem-solving process that is outlined in Figure 4-1:
Note You need to determine and understand the message flow for certain actions. You might need to use different tools for situations in which messages are exchanged within the Cisco PGW 2200 Softswitch software or the operating system (UNIX), and situations in which messages flow between the Cisco PGW 2200 Softswitch and the external nodes over IP.
Step 1 When analyzing a problem, draft a clear problem statement. Define the problem in terms of a set of symptoms and the potential causes of those symptoms.
For example, the symptom might be that the EQPT FAIL alarm has become active. Possible causes might be physical problems, a bad interface card, or the failure of some supporting entity (for example, Layer 1 framing).
Figure 4-1 General Problem-Solving Model
Step 2 Gather the necessary facts to help isolate the symptoms and their possible causes.
Ask questions of affected users, network administrators, managers, and other key people. Collect information from sources such as network management systems, protocol analyzer traces, output from router diagnostic commands, or software release notes.
Step 3 Consider possible causes that are based on the facts you have gathered. You can use these facts to eliminate potential causes from your list.
For example, depending on the data, you might be able to eliminate hardware as a cause, which would allow you to focus on software. Try to reduce the number of potential causes so that you can create an efficient plan of action.
Step 4 Create an action plan that is based on the remaining potential causes. Begin with the most likely cause and devise a plan by which only one variable at a time is manipulated.
This approach allows you to reproduce the solution to a specific problem. If you alter more than one variable simultaneously, identifying the change that eliminates the symptom becomes more difficult.
Step 5 Perform each step of the action plan carefully, and test to see if the symptom disappears.
Step 6 Whenever you change a variable, gather the results. You should use the same method of gathering facts that you used in Step 2.
Analyze the results to determine if the problem is resolved. If it is, then the process is complete.
Step 7 If the problem is not resolved, you must create an action plan that is based on the next most likely problem in your list. Return to Step 2 and continue the process until the problem is solved.
Before trying out a new cure, be sure to undo any changes that you made when implementing your previous action plan. Remember to change only one variable at a time.
Note If you exhaust all of the common causes and actions (those causes that are outlined in this chapter and those causes that you have identified for your environment), your last recourse is to contact the Cisco Technical Assistance Center (TAC). See the "Obtaining Documentation and Submitting a Service Request" section on page xviii for more information about contacting the Cisco TAC.
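The loop described in Steps 4 through 7 can be sketched in code. The following Python example is only an illustration of the model, not part of the Cisco PGW 2200 Softswitch software. The Candidate class, the likelihood ranking, and the toy state dictionary are all hypothetical names introduced here. The sketch tries one change at a time, tests for the symptom, and undoes an unsuccessful change before moving on.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """One potential cause, with a single-variable change that addresses it."""
    name: str
    likelihood: float            # higher means more likely (assumed ranking)
    apply: Callable[[], None]    # make exactly one change
    undo: Callable[[], None]     # revert that change


def troubleshoot(candidates: List[Candidate],
                 symptom_present: Callable[[], bool]) -> Optional[str]:
    """Try causes from most to least likely, changing one variable at a time."""
    for cand in sorted(candidates, key=lambda c: c.likelihood, reverse=True):
        cand.apply()                 # Step 5: carry out the action plan
        if not symptom_present():    # Step 6: gather and analyze the results
            return cand.name         # the problem is resolved
        cand.undo()                  # Step 7: undo the change before the next try
    return None                      # list exhausted: escalate (for example, to the TAC)


# Toy system state used only for illustration.
state = {"cable": "bad", "card": "ok"}
causes = [
    Candidate("interface card", 0.6,
              lambda: state.update(card="replaced"),
              lambda: state.update(card="ok")),
    Candidate("cable", 0.3,
              lambda: state.update(cable="good"),
              lambda: state.update(cable="bad")),
]
fixed_by = troubleshoot(causes, lambda: state["cable"] == "bad")
print(fixed_by)  # prints: cable
```

In this run the interface card is tried first because it is ranked as more likely. The symptom persists, so the change is undone and the card setting returns to its original value before the cable is tried.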
System Troubleshooting Tools
This section presents information about the tools you can use to troubleshoot the system.
The Cisco PGW 2200 Softswitch software generates alarms that indicate problems with processes, routes, linksets, signaling links, and bearer channels. For more information on troubleshooting using alarms, see the "Alarm Troubleshooting Procedures" section on page 6-4. See Cisco PGW 2200 Softswitch Release 9 Messages Reference for detailed information on the system alarms.
The Cisco PGW 2200 Softswitch generates call traces that capture call-processing activity. With the call trace, you can follow the call from a specified destination through the Cisco PGW 2200 Softswitch software engine to see where it failed. Determine the location of a call failure by using the following information that is provided in the call trace:
•The protocol data units (PDUs) that the Cisco PGW 2200 Softswitch receives
•How the Cisco PGW 2200 Softswitch decodes the PDU
•The PDUs that the Cisco PGW 2200 Softswitch sends out
The results of call traces are signal flow diagrams that you can use for troubleshooting. Typically, call traces are used to capture system activity as part of a procedure to clear an alarm. For more information on using call traces, see the "Tracing" section on page 6-155.
The Cisco PGW 2200 Softswitch software continuously generates log files of various system information, including operational measurements (OMs) and alarm records. You can use these logs to obtain statistical information about the calls that the system processes and network events such as delays or service-affecting conditions. The Cisco PGW 2200 Softswitch generates the following types of logs:
•Platform logs contain information that is useful for tracking configuration errors and signaling link and call instantiation problems.
•Command and response logs contain Man-Machine Language (MML) command history.
•Alarm logs contain alarm information.
•Measurement logs contain system measurements data.
•Call record logs contain call-processing data.
You can read system logs by using the viewers within the Cisco MGC viewer toolkit. For more information on the viewers that comprise the Cisco MGC toolkit, see the "Using the Cisco MGC Viewer Toolkit" section on page 3-119.
See Appendix A, "Configuring Cisco PGW 2200 Softswitch Log Files," for more information on system log files.
MML is the command line interface method for configuring and managing the Cisco PGW 2200 Softswitch. You can enter MML commands to retrieve information about system components, and to perform logging and tracing. See Cisco PGW 2200 Softswitch Release 9 MML Command Reference for more information.
Cisco Internetwork Management Tools
The following Cisco internetwork management products provide design, monitoring, and troubleshooting tools to help you manage the Cisco PGW 2200 Softswitch platform:
•CiscoWorks2000
•Cisco WAN Manager
•Cisco MGC Node Manager (CMNM)
CiscoWorks2000
CiscoWorks2000 is a series of SNMP-based internetwork management software applications. CiscoWorks applications are integrated on several popular network management platforms. The applications build on industry-standard platforms to provide tools for monitoring device status, maintaining configurations, and troubleshooting problems.
The following list names applications that are included in CiscoWorks2000. These applications are useful for troubleshooting:
•Device Monitor—Monitors specific devices for environmental and interface information.
•Health Monitor—Displays information about the status of a device, including buffers, CPU load, memory available, and protocols and interfaces being used.
•Show Commands—Enables you to view data similar to the output of router show EXEC commands.
•Path Tool—Collects path utilization and error data by displaying and analyzing the path between devices.
•Device Polling—Extracts data about the condition of network devices.
•CiscoView—Provides dynamic monitoring and troubleshooting functions, including a graphical display of Cisco devices, statistics, and comprehensive configuration information.
•Offline Network Analysis—Collects historical network data for offline analysis of performance trends and traffic patterns.
•CiscoConnect—Enables you to provide Cisco with debugging information, configurations, and topology information to speed resolution of network problems.
Use CiscoWorks2000 to manage several Cisco products. For example, you can use CiscoWorks2000 on the Cisco PGW 2200 Softswitch platform to manage Cisco SS7 interfaces and other Cisco switches. See the CiscoWorks2000 documentation for more information.
Cisco WAN Manager
Cisco WAN Manager is part of the Cisco Service Management System. Cisco WAN Manager includes tools that operators of service provider networks and large enterprise networks can use to provision and manage their networks. The Cisco WAN Manager provides fault-management features and can be used along with other applications such as CiscoView, the Event Browser, and Configuration Save and Restore.
Use Cisco WAN Manager to perform search, sort, and filter operations, and to tie events to extensible actions. For instance, Cisco WAN Manager can page someone upon receiving a certain type of SNMP trap. It supports alarm hierarchies that report the root cause of problems to operators and higher-level systems.
Configuration Save and Restore saves a snapshot of the entire network configuration. For disaster recovery, operators can selectively restore configurations of any element, from a single node up to the entire network. This restoration ability significantly reduces recovery time when a catastrophic failure occurs.
The Cisco WAN Manager Trivial File Transfer Protocol (TFTP) statistics collection facility offers the ability to obtain extensive usage and error data across machines and platforms.
A wide range of statistics is available at the port and virtual channel level including the following:
•Circuit line statistics
•Packet line statistics
•Frame Relay port statistics
•Physical layer statistics
•Protocol layer statistics
Use the Cisco WAN Manager application to manage several Cisco products. Use the Cisco WAN Manager on the Cisco PGW 2200 Softswitch platform to manage Cisco SS7 interfaces and other switches. See the Cisco WAN Manager documentation for more information.
Cisco MGC Node Manager
The Cisco MGC Node Manager (CMNM) is an element management system that is based on the Cisco Element Management Framework (CEMF). It is responsible for managing the Cisco PGW 2200 Softswitch platform, including the Cisco PGW 2200 Softswitch, other switches, and Cisco SS7 interfaces.
NMS design divides network management into five discrete areas: Fault, Configuration, Accounting, Performance, and Security. CMNM provides fault and performance management of the Cisco PGW 2200 Softswitch, as well as flow-through provisioning of the Cisco PGW 2200 Softswitch and its subcomponents. In addition, CMNM provides fault and performance management of the Cisco SS7 interfaces and switches. CMNM uses the Cisco Voice Services Provisioning Tool (VSPT) to enable configuration of the Cisco PGW 2200 Softswitch and uses CiscoView to configure the Cisco SS7 interfaces and switches.
The CEMF platform provides security and some accounting features. CMNM does not provide any security or accounting features beyond the features that the CEMF provides. CMNM operates separately with a customer-operations support system or a Cisco NMS such as the Voice Network Manager (VNM).
For more information on CMNM, see Cisco Media Gateway Controller Node Manager User Guide.
Cisco SS7 Interface Diagnostic Commands
Cisco SS7 interfaces provide the following integrated Cisco IOS command types to assist you in monitoring and troubleshooting systems: show, debug, ping, and trace.
The show commands are powerful monitoring and troubleshooting tools. You can use the show commands to perform several functions:
•Monitor router behavior during initial installation
•Monitor normal network operation
•Isolate problem interfaces, nodes, media, or applications
•Determine when a network is congested
•Determine the status of servers, clients, or other neighbors
Some of the most commonly used status commands include the following:
•show interfaces—Displays statistics for network interfaces using the following commands:
–show interfaces ethernet
–show interfaces fddi
–show interfaces atm
–show interfaces serial
•show controller t1—Displays statistics for T1 interface card controllers
•show running-config—Displays the router configuration currently running
•show startup-config—Displays the router configuration that is stored in nonvolatile RAM (NVRAM)
•show flash—Displays the layout and contents of flash memory
•show buffers—Displays statistics for the buffer pools on the router
•show memory—Shows statistics about the router memory, including free pool statistics
•show processes—Displays information about the active processes on the router
•show stacks—Displays information about the stack utilization of processes and interrupt routines, as well as the reason for the last system reboot
•show version—Displays the configuration of the system hardware, the software version, the names and sources of configuration files, and the boot images
For details on using and interpreting the output of specific show commands, see the Cisco IOS command reference for the release currently used.
Using Debug Commands
The debug commands in privileged EXEC mode can provide information about the traffic being seen (or not seen) on an interface. This information includes error messages that network nodes generate, protocol-specific diagnostic packets, and other useful troubleshooting data.
Be careful when using debug commands. These commands are processor-intensive and can cause serious network problems (degraded performance or loss of connectivity) if they are enabled on an already heavily loaded router. When you finish using a debug command, remember to disable it with its specific no debug command, or use the no debug all command to turn off all debugging.
Note Output formats vary among debug commands. Some commands generate a single line of output per packet. Other commands generate multiple lines of output per packet. Some commands generate large amounts of output; other commands generate only occasional output. Some commands generate lines of text; other commands generate information in field format.
To minimize the negative impact of using debug commands, follow this procedure:
Step 1 Enter the no logging console global configuration command on your router. This command disables all logging to the console terminal.
Step 2 Establish a Telnet session to a router port and enter the enable EXEC command.
Step 3 Enter the terminal monitor command on your router to copy debug command output and system error messages to your current terminal display.
This procedure permits you to view debug command output remotely, without being connected through the console port. Following this procedure minimizes the load that is created by using debug commands because the console port no longer has to generate character-by-character processor interrupts.
If you intend to keep the output of the debug command, spool the output to a file. Cisco IOS Debug Command Reference provides the procedure for setting up such a debug output file, and complete details about the function and output of debug commands.
Note In many situations, third-party diagnostic tools can be more useful and less intrusive than the debug commands. For more information, see the "Third-Party Troubleshooting Tools" section.
Using the Ping Command
To check host accessibility and network connectivity, use the ping command in EXEC (user) or privileged EXEC mode.
For IP, the ping command sends ICMP Echo messages. If a station receives an ICMP Echo message, it sends an ICMP Echo Reply message back to the source. The extended command mode of the ping command permits you to specify the supported IP header options. This command enables the router to perform a more extensive range of test options.
We suggest using the ping command when the network is functioning properly under normal conditions. You can then compare the information that the command returns when the network is performing as expected with the information that it returns when you are troubleshooting a problem.
For detailed information on using the ping command and extended ping commands, see the Cisco IOS Configuration Fundamentals Command Reference.
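The baseline comparison suggested above can be sketched as follows. This Python example is an illustration only: it does not run ping or parse Cisco IOS output, and the function names, the doubling threshold for round-trip time, and the five-point loss margin are assumptions chosen for the example. Round-trip times are supplied as a list in milliseconds, with None marking a probe that received no reply.

```python
from statistics import mean
from typing import Dict, List, Optional


def summarize(rtts_ms: List[Optional[float]]) -> Dict[str, float]:
    """Summarize one ping run; None marks a probe with no reply."""
    if not rtts_ms:
        raise ValueError("at least one probe result is required")
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    avg_ms = mean(replies) if replies else float("inf")
    return {"loss_pct": loss_pct, "avg_ms": avg_ms}


def compare(baseline: Dict[str, float], current: Dict[str, float],
            rtt_factor: float = 2.0, loss_margin: float = 5.0) -> List[str]:
    """Report how the current run differs from the known-good baseline."""
    findings = []
    if current["loss_pct"] > baseline["loss_pct"] + loss_margin:
        findings.append("packet loss above baseline")
    if current["avg_ms"] > baseline["avg_ms"] * rtt_factor:
        findings.append("round-trip time above baseline")
    return findings


# Baseline recorded while the network was healthy (illustrative values).
baseline = summarize([4.0, 5.0, 4.0, 5.0, 4.0])
# Run taken while troubleshooting: two probes were lost and replies are slower.
current = summarize([12.0, None, 15.0, None, 13.0])
print(compare(baseline, current))
```

An empty result from compare means that the current run is within the chosen margins of the baseline, which suggests looking elsewhere for the cause.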
Using the Trace Command
The trace command in user EXEC mode discovers the routes that packets follow when traveling to their destinations. The trace command in privileged EXEC mode enables you to specify the supported IP header options, which enable the router to perform a more extensive range of test options. The trace command uses the error message that a router generates when a datagram exceeds its time-to-live (TTL) value. The first probe datagrams are sent with a TTL value of 1, which causes the first router to discard the probe datagrams and send back "time exceeded" error messages. The trace command then sends several probes and displays the round-trip time for each. After every third probe, the TTL increases by 1.
Each outgoing packet can result in one of two error messages. A "time exceeded" error message indicates that an intermediate router has seen and discarded the probe. A "port unreachable" error message indicates that the destination node has received the probe and discarded it, because it could not deliver the packet to an application. If the timer goes off before a response comes in, the trace command prints an asterisk (*).
The trace command terminates when the destination responds, when the maximum TTL is exceeded, or when the user interrupts the trace with the escape sequence.
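The TTL mechanism described above can be illustrated with a small simulation. The following Python sketch sends no packets and does not reproduce Cisco IOS output. The simulate_trace function, the path format (a list of address and responds pairs, with the destination last), and the addresses are hypothetical. It shows three probes per TTL value, "time exceeded" replies from intermediate routers, a "port unreachable" reply from the destination, an asterisk for each probe that times out, and termination at the destination or at the maximum TTL.

```python
from typing import List, Tuple


def simulate_trace(path: List[Tuple[str, bool]],
                   max_ttl: int = 30,
                   probes_per_ttl: int = 3) -> List[str]:
    """Simulate trace output over a path whose last entry is the destination."""
    if not path:
        raise ValueError("path must contain at least the destination")
    lines = []
    last = len(path)  # TTL value at which a probe first reaches the destination
    for ttl in range(1, max_ttl + 1):
        address, responds = path[min(ttl, last) - 1]
        reached_destination = ttl >= last
        if not responds:
            # The timer expired for every probe, so only asterisks are shown.
            lines.append(f"{ttl} " + " ".join(["*"] * probes_per_ttl))
            continue
        message = "port unreachable" if reached_destination else "time exceeded"
        lines.append(f"{ttl} {address} " + " ".join([message] * probes_per_ttl))
        if reached_destination:
            break  # the destination responded, so the trace terminates
    return lines


# Two routers (the second one stays silent) followed by the destination.
example_path = [("10.0.0.1", True), ("10.0.1.1", False), ("192.0.2.10", True)]
for line in simulate_trace(example_path):
    print(line)
```

If the destination never responds, the simulation keeps printing asterisks until max_ttl is reached, which mirrors the second termination condition listed above.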
It is a good idea to use the trace command when the network is functioning properly under normal conditions. Compare the information that the command returns when the network is performing as expected with the information returned by the command when you are troubleshooting a problem.
For detailed information on using the trace and extended trace commands, see Cisco IOS Configuration Fundamentals Command Reference.
Third-Party Troubleshooting Tools
In many situations, third-party diagnostic tools can be more useful than system commands that are integrated into the router. For example, issuing a processor-intensive debug command can contribute to the overloading of an environment that is already experiencing excessively high traffic levels. Attaching a network analyzer to the suspect network is less intrusive and is more likely to yield useful information without interrupting the operation of the router.
Some useful third-party tools for troubleshooting internetworks include the following:
•Volt-ohm meters, digital multimeters, and cable testers
•Breakout boxes, fox boxes, bit error rate testers (BERTs), and block error rate testers (BLERTs)
•Network analyzers and network monitors
•Time domain reflectometers (TDRs) and optical time domain reflectometers (OTDRs)
Volt-Ohm Meters, Digital Multimeters, and Cable Testers
Volt-ohm meters and digital multimeters are at the lower end of the spectrum of cable testing tools. These devices can measure basic parameters such as alternating current (AC) and direct current (DC) voltage, current, resistance, capacitance, and cable continuity. They are used primarily to check physical connectivity.
Cable testers (scanners) can also be used to check physical connectivity. Cable testers are available for shielded twisted-pair, unshielded twisted-pair, 10BASE-T, and coaxial and twinax cables.
A cable tester might also be able to perform any of the following functions:
•Test and report on cable conditions, including near-end crosstalk, attenuation, and noise
•Perform TDR, traffic monitoring, and wire map functions
•Display Media Access Control (MAC) layer information about network traffic, provide statistics such as network utilization and packet error rates, and perform limited protocol testing (for example, TCP/IP tests such as ping)
Similar testing equipment is available for fiber-optic cable. Because of the relatively high cost of fiber-optic cable and its installation, the cable should be tested both before installation (on-the-reel testing) and after installation. Continuity testing of fiber-optic cable requires either a visible light source or a reflectometer. You can use a power meter with a light source that is capable of providing light at the three predominant wavelengths, 850 nanometers (nm), 1300 nm, and 1550 nm. A power meter can measure the same wavelengths and test attenuation and return loss in the fiber-optic cable.
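As a rough illustration of the attenuation test, the loss between a light source and a power meter is commonly expressed in decibels as 10 times the base-10 logarithm of the ratio of launched power to received power. The following Python sketch applies that standard formula. The function names and the sample power readings are assumptions made for the example and are not taken from any instrument.

```python
import math


def loss_db(launched_mw: float, received_mw: float) -> float:
    """Return the optical loss in dB between launched and received power."""
    if launched_mw <= 0 or received_mw <= 0:
        raise ValueError("power readings must be positive")
    return 10.0 * math.log10(launched_mw / received_mw)


def loss_db_per_km(launched_mw: float, received_mw: float, length_km: float) -> float:
    """Return the average attenuation per kilometer of fiber."""
    if length_km <= 0:
        raise ValueError("length must be positive")
    return loss_db(launched_mw, received_mw) / length_km


# Illustrative readings: half of the launched power arrives at the far end.
print(round(loss_db(1.0, 0.5), 2))  # prints: 3.01
```

Comparing the on-the-reel reading with the reading taken after installation shows whether the installation itself introduced additional loss.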
Breakout Boxes, Fox Boxes, and BERTs/BLERTs
Breakout boxes, fox boxes, and BERTs/BLERTs are digital interface testing tools that are used to measure the digital signals present at the interfaces of PCs, CSU/DSUs, and other devices. These testing tools can monitor data line conditions, analyze and trap data, and diagnose problems common to communications systems. Examine traffic from data terminal equipment (DTE) through data communications equipment (DCE) to isolate problems, identify bit patterns, and ensure that the proper cabling has been installed. These devices cannot test media signals such as those for Ethernet, Token Ring, or FDDI.
Network Monitors and Analyzers
Use network monitors to track packets that are continuously crossing a network. A network monitor enables you to obtain an accurate picture of network activity at any moment, or a historical record of network activity during a period. Network monitors do not decode the contents of frames. Monitors are useful for baselining, in which the activity on a network is sampled during a period to establish a normal performance profile or baseline.
Monitors collect information such as packet sizes, numbers of packets, error packets, overall usage of a connection, and the number of hosts and their MAC addresses. A monitor also provides details about communications between hosts and other devices. You can use the data to create profiles of network traffic, locate traffic overloads, plan for network expansion, detect intruders, establish baseline performance, and distribute traffic more efficiently.
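The kind of summary that a monitor builds can be sketched in a few lines. The following Python example is an illustration under stated assumptions: the Frame record, its fields, and the profile function are hypothetical and do not correspond to any particular monitoring product. Like a monitor, the sketch counts and sizes frames without decoding their contents.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Frame:
    """Minimal per-frame record that a monitor might keep (assumed fields)."""
    src_mac: str
    dst_mac: str
    size: int            # frame size in bytes
    error: bool = False  # True if the frame was received with an error


def profile(frames: List[Frame]) -> Dict[str, object]:
    """Build a simple traffic profile from a list of observed frames."""
    bytes_by_sender = Counter()
    for frame in frames:
        bytes_by_sender[frame.src_mac] += frame.size
    hosts = {f.src_mac for f in frames} | {f.dst_mac for f in frames}
    return {
        "packets": len(frames),
        "bytes": sum(f.size for f in frames),
        "error_packets": sum(1 for f in frames if f.error),
        "hosts": sorted(hosts),
        "top_talker": bytes_by_sender.most_common(1)[0][0] if frames else None,
    }


observed = [
    Frame("00:00:5e:00:53:01", "00:00:5e:00:53:02", 64),
    Frame("00:00:5e:00:53:01", "00:00:5e:00:53:02", 1500),
    Frame("00:00:5e:00:53:02", "00:00:5e:00:53:01", 64, error=True),
    Frame("00:00:5e:00:53:03", "00:00:5e:00:53:01", 512),
]
print(profile(observed))
```

Collecting such a profile at regular intervals during normal operation gives the baseline against which later samples can be compared.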
A network analyzer (also called a protocol analyzer) decodes the various protocol layers in a recorded frame and presents them as readable abbreviations or summaries. The analyzer provides details about which network layer is involved (physical, data link, and so on) and what function each byte or byte content serves.
Most network analyzers can perform many of the following functions:
•Filtering traffic that meets certain criteria so that, for example, all traffic to and from a particular device can be captured
•Time-stamping captured data
•Presenting protocol layers in an easily readable form
•Generating frames and transmitting them onto the network
•Incorporating an "expert" system in which the analyzer uses a set of rules, combined with information about the network configuration and operation, to diagnose and solve network problems
TDRs and OTDRs
TDRs are at the top end of the cable testing spectrum. These devices can quickly locate open and short circuits, crimps, kinks, sharp bends, impedance mismatches, and other defects in metallic cables.
A TDR works by "bouncing" a signal off the end of the cable. Opens, shorts, and other problems reflect the signal back at different amplitudes, depending on the problem. A TDR measures how much time it takes for the signal to reflect and calculates the distance to a fault in the cable. TDRs can also measure the length of a cable or, given a configured cable length, calculate the propagation rate.
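The distance calculation follows from the round-trip time: the signal travels to the fault and back, so the one-way distance is half the product of the propagation speed and the measured time. The following Python sketch shows both calculations mentioned above. The function names are hypothetical, and the velocity factor of 0.66 used in the example is only an illustrative value; use the figure published for the cable under test.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def fault_distance_m(round_trip_s: float, velocity_factor: float) -> float:
    """Return the one-way distance to a reflection, in meters."""
    if not 0.0 < velocity_factor <= 1.0:
        raise ValueError("velocity factor must be in the range (0, 1]")
    if round_trip_s < 0.0:
        raise ValueError("round-trip time cannot be negative")
    # Divide by 2 because the signal travels to the fault and back.
    return velocity_factor * SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def velocity_factor_from_length(round_trip_s: float, cable_length_m: float) -> float:
    """Return the propagation rate (as a fraction of c) for a known cable length."""
    if round_trip_s <= 0.0:
        raise ValueError("round-trip time must be positive")
    return 2.0 * cable_length_m / (SPEED_OF_LIGHT_M_PER_S * round_trip_s)


# A reflection that returns after 1 microsecond on a cable with an assumed
# velocity factor of 0.66 lies roughly 99 meters away.
print(round(fault_distance_m(1e-6, 0.66), 2))  # prints: 98.93
```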
An OTDR performs fiber-optic measurement. OTDRs can accurately measure the length of the fiber, locate cable breaks, measure the fiber attenuation, and measure splice or connector losses. An OTDR can ascertain the "signature" of a particular installation, noting attenuation and splice losses. When you suspect a problem in the system, you can compare a new signature with this baseline measurement.