The convergence of voice, video, and data networks has increased network management complexity and demands a new set of tools, techniques, and best practices. Real-time applications such as voice place more stringent requirements on the network than the typical applications in a business enterprise. For example, voice over IP (VoIP) is very sensitive to the latency and jitter properties of the network; users quickly perceive such problems and raise trouble tickets when they experience them. By contrast, web server transactions and email traffic have a much higher tolerance for latency and jitter. Effectively managing the additional complexity of VoIP deployments offers significant benefits to IT personnel by reducing trouble tickets and maintaining a high level of user satisfaction.
A comprehensive management strategy requires alignment with organizational processes as well as consistency in managing the entire application delivery lifecycle. Examples of the latter include detailed plans for deployment, troubleshooting, network maintenance, and upgrades. An important part of implementing the process in each part of the lifecycle is the use of the management tools that aid in planning, managing, and troubleshooting enterprise networks. In this paper, you will be introduced to the capabilities within the Cisco® Network Analysis Module (NAM) that will proactively detect VoIP quality issues and help troubleshoot them. VoIP analysis capabilities in NAM 4.1 will be described along with use cases that detail the use of these tools to solve day-to-day operational problems. With the NAM 4.1 release, Cisco Unified Service Monitor, a component of Cisco Unified Communications Management Suite, is able to aggregate voice performance metrics from NAMs deployed across the network for enterprisewide voice quality reporting. This suite, which includes Cisco Unified Service Monitor, Cisco Unified Operations Manager, and Cisco Unified Service Statistics Manager, provides enterprisewide reporting and management for VoIP deployments.
This paper should be helpful to anyone whose job role involves monitoring, managing, planning, and responding to network performance and quality issues in VoIP networks. Examples of job roles include network architects, network engineers, IT architects/engineers, performance management and optimization personnel, and IT managers.
The goal of a voice quality monitoring system is to support a set of capabilities that provide accurate and timely indications of voice quality and notify the user when needed. Mean Opinion Scores (MOSs) offer a standard method to monitor the quality of a VoIP call. This method takes into account latency, jitter, and other network properties that affect voice quality. In addition to computing call quality, the system must be cognizant of various other pieces of information, such as the identity of phones making the calls, call history, call volume, and relevant historical trends such as peak and low periods. Finally, the ability to notify the IT administrator of a problem that has occurred is very important. Cisco NAM provides these capabilities: NAM detects and computes MOSs for VoIP calls transported through Real-time Transport Protocol (RTP) streams. MOS is computed periodically and reported through NAM's GUI, which provides real-time and historical reporting on voice quality. Administrators can access MOSs in real time, even while calls are still active. It is important to note that NAM is an open device, which means that its data is available for consumption by any application that can poll the NAM through its interfaces. This openness allows for easy integration into other applications being used.
Deploying NAM for VoIP Monitoring
Consider Figure 1, which depicts an enterprise network, comprising a campus, a data center, and two branch offices.
Figure 1. Cisco NAM Deployment
NAM can be deployed in various locations depending on the use case being addressed. For example, when deployed in a branch, NAM can monitor the quality of all calls entering and exiting that branch office and monitor any dips in quality levels in that location. Alternatively, a NAM in the data center can monitor signaling messages sent from a phone to the Cisco Unified Communications Manager cluster and collect detailed information about the calling and called parties. A NAM located at the edge of the main campus can raise alarms about poor call quality from the main campus to a particular remote office location.
How Voice Monitoring Works
VoIP phone calls are set up using a signaling protocol, such as Session Initiation Protocol (SIP), through which the phone endpoints exchange information. Once setup is complete, voice traffic is sent through a streaming protocol such as RTP. NAM detects both signaling messages and voice traffic over RTP streams. It can also link a set of call setup messages to the RTP streams associated with them.
NAM monitors signaling messages for Skinny Call Control Protocol (SCCP), SIP, H.323, and Media Gateway Control Protocol (MGCP). The information collected from signaling messages includes currently active calls, call statistics, call history, detailed information about the calling and called parties, codecs used, port numbers used, and other relevant information. These statistics can be collected both on a real-time and historical basis.
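The linking of call setup messages to RTP streams can be illustrated with a minimal sketch. It assumes the SIP case, where the media address and port are advertised in an SDP body (the `c=` and `m=audio` lines), and it matches them against the destination address and port of observed RTP streams. The function names and data shapes are hypothetical and are not taken from the NAM; other signaling protocols such as SCCP carry this information differently.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class MediaEndpoint(NamedTuple):
    ip: str
    port: int


def media_endpoint_from_sdp(sdp: str) -> Optional[MediaEndpoint]:
    """Extract the audio IP address and port advertised in an SDP body."""
    ip = None
    port = None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c=IN IP4 "):
            # Connection line: c=IN IP4 <address>
            ip = line.split()[2]
        elif line.startswith("m=audio "):
            # Media line: m=audio <port> <transport> <payload types...>
            port = int(line.split()[1])
    if ip is None or port is None:
        return None
    return MediaEndpoint(ip, port)


def link_streams(
    calls: Dict[str, MediaEndpoint],
    streams: List[Tuple[str, int]],
) -> Dict[Tuple[str, int], str]:
    """Map each observed RTP destination (ip, port) to the call that advertised it."""
    by_endpoint = {(ep.ip, ep.port): call_id for call_id, ep in calls.items()}
    return {s: by_endpoint[s] for s in streams if s in by_endpoint}
```

Streams whose destination was never advertised in an observed setup message stay unlinked, which is the situation that arises when the monitoring point sees the media path but not the signaling path.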
To measure the quality of a call, NAM detects and monitors RTP streams. First, NAM examines the packet header and identifies whether it is an RTP packet. If so, it checks whether the packet belongs to a new or existing RTP stream. Once the RTP packet is detected and associated with a stream, it is sent to the MOS process for quality analysis. The MOS process performs real-time computations to measure voice quality metrics such as jitter, actual packet loss, adjusted packet loss, seconds of concealment, and severe seconds of concealment. Using these metrics, the NAM computes the R-Factor MOS based on ITU-T Recommendation G.107. The best, worst, and average values for these metrics are reported every minute through the GUI.
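The header check and the new-versus-existing stream decision can be sketched as follows. This is a simplified illustration and not the NAM's actual detection logic: it only verifies the RTP version field in the 12-byte fixed header defined by RFC 3550 and keys streams on the addresses, ports, and SSRC. A production detector would need further heuristics, because other UDP payloads can carry the same version bits. All names are hypothetical.

```python
import struct
from typing import Dict, NamedTuple, Optional, Tuple

RTP_VERSION = 2
RTP_FIXED_HEADER_LEN = 12


class RtpHeader(NamedTuple):
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(udp_payload: bytes) -> Optional[RtpHeader]:
    """Return the fixed RTP header fields, or None if the version check fails."""
    if len(udp_payload) < RTP_FIXED_HEADER_LEN:
        return None
    first, second, sequence, timestamp, ssrc = struct.unpack(
        "!BBHII", udp_payload[:RTP_FIXED_HEADER_LEN]
    )
    if first >> 6 != RTP_VERSION:  # top two bits carry the version
        return None
    return RtpHeader(second & 0x7F, sequence, timestamp, ssrc)


Flow = Tuple[str, int, str, int]            # src ip, src port, dst ip, dst port
StreamKey = Tuple[str, int, str, int, int]  # flow plus SSRC


def observe_packet(
    streams: Dict[StreamKey, int], flow: Flow, udp_payload: bytes
) -> bool:
    """Count the packet against its stream; return True if the stream is new."""
    header = parse_rtp_header(udp_payload)
    if header is None:
        return False
    key = flow + (header.ssrc,)
    is_new = key not in streams
    streams[key] = streams.get(key, 0) + 1
    return is_new
```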
An important aspect of this real-time reporting is that voice quality metrics are available to users even when the call is active. There is no need to wait for the end of the call before such statistics are collected. This real-time visibility is a critical part of the solution and facilitates rapid responses to problems.
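As a rough illustration of how an R-factor relates to a MOS and how interim figures can be read while a call is still in progress, the sketch below uses the standard R-to-MOS conversion published with the ITU-T G.107 E-model and keeps a running best, worst, and average. Computing the R-factor itself from loss, delay, and codec impairments is outside this sketch, and the class and function names are hypothetical.

```python
from typing import Optional, Tuple


def r_factor_to_mos(r: float) -> float:
    """Convert an E-model R-factor to an estimated MOS (G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


class RunningSummary:
    """Accumulates periodic MOS samples so a snapshot can be read mid-call."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0
        self.best: Optional[float] = None
        self.worst: Optional[float] = None

    def add(self, mos: float) -> None:
        self.count += 1
        self.total += mos
        self.best = mos if self.best is None else max(self.best, mos)
        self.worst = mos if self.worst is None else min(self.worst, mos)

    def snapshot(self) -> Optional[Tuple[float, float, float]]:
        """Return (best, worst, average) so far, or None before any sample."""
        if self.count == 0:
            return None
        return self.best, self.worst, self.total / self.count
```

Because the summary is updated with every sample, nothing in this design requires the call to end before a value is available.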
Note: To monitor signaling messages, the NAM has to be in the path of call signaling messages from the VoIP endpoint (IP Phone) to the call management server. In a Cisco Unified Communications System, the server would be a Unified Communications Manager.
Using the NAM GUI for Voice Monitoring
The GUI onboard NAM allows for easy access to real-time and historical VoIP data collected from the network. It also provides the ability to set and monitor thresholds proactively, so that there is minimal impact on end users of the system. This section will explore various aspects of the voice-related screens in the GUI.
The voice monitoring GUI is divided into different categories. The Active Call Monitoring section provides visibility into various metrics for currently active calls. The MOS Quality and Alarm Threshold charts provide a 1-hour window of the quality of active calls. Note that since quality metrics are computed for RTP streams, the NAM must be placed in the path of the call. For example, a NAM placed at the edge of a branch office can provide quality metrics for all calls entering or leaving that branch.
The Active Calls Table provides information gathered by associating call signaling messages with the RTP streams used for the call. To correlate signaling and RTP streams, NAM should ideally be placed at locations where it has visibility into both a call's signaling messages and its RTP streams.
Figure 2. Active Calls - MOS Quality
The Terminated Calls Table provides analysis of aspects such as which calls suffered from quality issues, where those calls originated, what codecs were used, and other relevant details. For example, the Worst N Calls menu provides detailed visibility into the worst N calls. This will show the caller and called party for the calls and the start and end times along with the quality. This helps isolate the problem to specific network locations or to transient network conditions during specific times in the day. The known phones and RTP stream sections provide details from the perspective of an individual phone and from the perspective of raw RTP streams, respectively. During troubleshooting, it is useful to navigate from the Active Calls menu to the RTP stream screen, for example to get additional details on the codecs used in the calls.
The use cases that follow will utilize the GUI screens described in this section to solve real-world problems. Workflows will be provided along with the relevant screenshots, to show how problems commonly experienced in VoIP networks can be tackled.
Use Case: Troubleshooting Voice Quality Degradation
Consider a situation in which the network administrator learns about problems with VoIP quality by monitoring the NAM GUI. What steps could the administrator follow to isolate the problem's root cause?
As illustrated in Figure 2, NAM classifies the voice calls by quality into poor, fair, good, and excellent categories. This rating is based on MOSs and can be configured by the user to suit the network's sensitivity levels. NAM uses preset default values for the MOS ranges. The chart indicates that there were a few calls with poor quality a few minutes ago.
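A minimal sketch of this kind of banding follows. The MOS boundaries shown are illustrative placeholders, not the NAM's preset defaults, which this paper does not list; the function names are likewise hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple

# Illustrative lower bounds only, highest first; NOT the NAM default ranges.
ILLUSTRATIVE_BANDS: Sequence[Tuple[float, str]] = (
    (4.0, "excellent"),
    (3.5, "good"),
    (3.0, "fair"),
)


def classify(mos: float, bands: Sequence[Tuple[float, str]] = ILLUSTRATIVE_BANDS) -> str:
    """Return the first band whose lower bound the score meets, else 'poor'."""
    for lower_bound, label in bands:
        if mos >= lower_bound:
            return label
    return "poor"


def tally(scores: Iterable[float],
          bands: Sequence[Tuple[float, str]] = ILLUSTRATIVE_BANDS) -> Counter:
    """Count calls per quality category, as a quality chart would."""
    return Counter(classify(m, bands) for m in scores)
```

Passing a different `bands` sequence models the user-configurable ranges: a network with stricter expectations would simply raise the lower bounds.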
Figure 3. Individual RTP Streams and Their Associated MOS
In response to this problem, the next step in troubleshooting is to drill down for more detail about the poor calls. Figure 3 shows the individual RTP streams and the MOSs associated with them. As the highlighted portion of the table in Figure 3 indicates, the MOS of the first several calls is very low (1.76) according to the ranges defined in the foregoing chart.
There are other interesting clues that can be gleaned from Figure 3. Note that the Packet Loss column indicates that VoIP streams are experiencing packet loss.
The next step is to get clues as to where packets are being dropped. The source address of the RTP stream should be examined. All calls have the source IP address 10.14.1.2 but with different port numbers. This is typical of a conferencing system that uses different port numbers for different streams. By looking up the network topology diagram, we learn that 10.14.1.2 is located in Building 3 of the main campus of the company. The topology also indicates that there is a NAM at the edge router for Building 3. We log in to that NAM looking for clues on where packets might be getting dropped. See Figure 4.
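The reasoning in this step, spotting that the poor streams share one source address, can be sketched as a simple grouping. The row layout, the MOS cutoff, and the sample port numbers in the usage note are hypothetical; only the address 10.14.1.2 and the MOS of 1.76 come from the example above.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class StreamRow(NamedTuple):
    src_ip: str
    src_port: int
    mos: float
    loss_pct: float


def common_sources(rows: Iterable[StreamRow],
                   mos_limit: float = 3.0) -> List[Tuple[str, int]]:
    """Return (source IP, poor-stream count) pairs, most frequent first."""
    counts = Counter(r.src_ip for r in rows if r.mos < mos_limit)
    return counts.most_common()
```

For instance, three rows from 10.14.1.2 on ports 20000, 20002, and 20004 with a MOS of 1.76, plus one healthy row from another address, would yield a single entry for 10.14.1.2 with a count of 3, pointing the investigation at that host's location.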
Figure 4. Interface Statistics from the NAM in Building 3
By navigating to the Interface statistics screen that provides details about packet-related statistics, we find that Gi1/22, the interface that connects to the core of the campus network, is experiencing serious packet loss.
Figure 5. MOS Quality Chart
Because this interface carries all traffic, including voice traffic, between Building 3 and the rest of the campus, its packet loss is most likely the root cause of the drops on the RTP streams. The problem in this case was found to be a hardware defect on the line card that affected the interface. Replacing the card fixed the issue.
This troubleshooting workflow highlights some of the VoIP quality monitoring capabilities and also shows how VoIP features can be used in combination with other traffic monitoring features on the NAM. In this particular case, we used interface statistics monitoring in the NAM in Building 3 to isolate the root cause of the problem.
Use Case: Linking Voice Quality to Quality of Service Issues
The NAM GUI offers multiple reports, and depending on the use case, users have the option of using one report or the other to start their workflow. This use case approaches a problem that is a slight variation of the problem highlighted in the previous use case. While the previous use case describes a proactive response, this one shows how to respond rapidly to a problem.
The situation is that the administrator has received a series of complaints in the last few minutes. He realizes that there is a quality problem but is unsure where in the network it is. In this case, it is useful to narrow down the problem to a set of calls. Consider the MOS quality pie chart in Figure 5. This chart allows you to vary the time period being monitored so that you can narrow your analysis into a 5-minute or a 15-minute period. Figure 5 shows that the overall call quality is less than desirable. There are very few calls that were excellent. This points to a widespread problem related to voice traffic.
Figure 6. Worst N Calls
Further evidence of the issue is provided by the Worst N Calls display (Figure 6). A MOS of 2.36 is in the "Poor" range.
Figure 7. Additional Call Details
We navigate to get additional details on the call to track down the phone location, codec used, the related RTP stream, and any other useful information (Figure 7).
Figure 8. Traffic Distribution After QoS Is Implemented
Since the problem appears to be widespread and we are unable to isolate it to a specific location or time period, we use the quality of service (QoS) monitoring feature of the NAM to check whether voice traffic is getting the appropriate service level in the network. First, Differentiated Services (DiffServ) profiles are created to identify which applications are associated with which differentiated services code point (DSCP) or type of service (ToS) values. The NAM allows the administrator to observe application traffic flow, the DSCP values associated with each application, and total bandwidth utilization per DSCP value.
This analysis showed that voice traffic was being treated as best effort traffic; that is, priority was not being given to voice streams. Even though the network was provisioned with large bandwidth connections, lack of QoS led to issues with VoIP traffic during peak hours. Applying an appropriate QoS scheme eliminated this problem. See Figure 8.
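The per-application DSCP view described above can be approximated with a short sketch. It relies on two standard facts: the DSCP is the upper six bits of the IP ToS byte, and DSCP 0 is the default (best effort) class. Treating Expedited Forwarding (DSCP 46) as the intended marking for voice is an assumption about the site's QoS policy, and the record layout and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

DSCP_EF = 46          # Expedited Forwarding; assumed here as the voice marking
DSCP_BEST_EFFORT = 0  # default per-hop behavior


def dscp_from_tos(tos_byte: int) -> int:
    """Extract the 6-bit DSCP from the 8-bit ToS / Traffic Class byte."""
    return (tos_byte >> 2) & 0x3F


def bytes_by_dscp(
    records: Iterable[Tuple[str, int, int]]
) -> Dict[str, Dict[int, int]]:
    """Sum bytes per application and DSCP from (application, tos, bytes) records."""
    table: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for application, tos_byte, byte_count in records:
        table[application][dscp_from_tos(tos_byte)] += byte_count
    return {app: dict(per_dscp) for app, per_dscp in table.items()}


def best_effort_share(table: Dict[str, Dict[int, int]], application: str) -> float:
    """Fraction of an application's bytes carried with the default DSCP."""
    per_dscp = table.get(application, {})
    total = sum(per_dscp.values())
    return per_dscp.get(DSCP_BEST_EFFORT, 0) / total if total else 0.0
```

A high best-effort share for the voice application is the symptom described in this use case: the streams are present but are not receiving priority treatment.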
Figure 9. Traffic Distribution After QoS Is Implemented
Use Case: Voice Quality Thresholds for Proactive Troubleshooting
To manage voice quality proactively, the system should automatically alert you to potential problems. This is achieved on the NAM through the use of thresholds and alerts. As shown in Figure 9, a number of metrics can be configured for alerts. MOS thresholds can be configured with different values for different codecs because quality tolerances can differ from codec to codec. In addition, jitter, seconds of concealment, and packet loss can be configured with thresholds based on the requirements of the network. Preset default values are provided and can be modified as needed.
Alerts are sent as syslog messages; therefore, the NAM should be configured to export alerts to a specific syslog receiver. In a typical enterprise, a syslog receiver collects syslog messages from various devices and feeds them into a network or performance management product that processes the message. Figure 10 shows the various thresholds that can be configured. This capability would be useful when different MOS tolerances are required for each codec.
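The threshold-and-alert idea can be sketched as follows. Every threshold value, the metric field names, and the message wording are hypothetical and do not represent NAM defaults or the NAM syslog format; the sketch only shows per-codec MOS floors alongside shared limits for jitter, packet loss, and seconds of concealment.

```python
import logging
from typing import Any, Dict, List

# Hypothetical thresholds for illustration; not NAM preset values.
MOS_FLOOR_BY_CODEC = {"G.711": 3.8, "G.729": 3.4}
DEFAULT_MOS_FLOOR = 3.5
UPPER_LIMITS = {"jitter_ms": 30.0, "loss_pct": 1.0, "soc": 5}


def threshold_alerts(stream: Dict[str, Any]) -> List[str]:
    """Return one message per threshold that the stream's metrics violate."""
    alerts = []
    floor = MOS_FLOOR_BY_CODEC.get(stream["codec"], DEFAULT_MOS_FLOOR)
    if stream["mos"] < floor:
        alerts.append(
            f"MOS {stream['mos']:.2f} below {floor:.2f} for codec {stream['codec']}"
        )
    for name, limit in UPPER_LIMITS.items():
        if stream[name] > limit:
            alerts.append(f"{name} {stream[name]} above {limit}")
    return alerts


def send_alerts(logger: logging.Logger, stream_id: str, alerts: List[str]) -> None:
    """Emit each alert through the given logger at WARNING level."""
    for text in alerts:
        logger.warning("voice-quality stream=%s %s", stream_id, text)
```

To forward these messages to a syslog receiver from Python, a `logging.handlers.SysLogHandler` pointed at the receiver's address can be attached to the logger; the receiver address is site-specific and is left out here.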
Figure 10. MOS Threshold Chart
NAM provides a snapshot of current call quality vis-à-vis the thresholds that are configured (Figure 10). This report allows administrators to quickly determine whether any calls have crossed the set thresholds. In Figure 10, all calls are of excellent quality, indicating that VoIP is operating without any problems.
Cisco NAM Integration with Cisco Unified Communications Management Suite
The Cisco Unified Communications Management Suite of products provides end-to-end visibility into all aspects of VoIP networks. To analyze voice quality, the Cisco Unified Communications Management Suite rolls up voice metrics from the NAM for networkwide quality analysis. Such an integrated system provides a comprehensive view of voice quality across the network. Also, it allows troubleshooting of performance issues experienced in one part of the network but whose root cause may be located elsewhere.
The products within this suite, which will collect and use information rolled up from the NAM, directly or indirectly, are defined briefly:
• Cisco Unified Operations Manager monitors and diagnoses problems as well as tests and tracks changes and inventory, providing network visibility and real-time operations management capabilities.
• Cisco Unified Service Monitor tracks and reports on the user experience, providing automated diagnostics for high service quality assurance.
• Cisco Unified Service Statistics Manager provides robust executive and operational reports as well as capacity planning reports.
Figure 11. Cisco NAM Metrics Are Rolled Up into the Cisco Unified Communications Management Suite
Figure 11 shows how NAM data is rolled up by Cisco Unified Service Monitor. These metrics are processed and any resulting network alarms and alerts are exported to Cisco Unified Operations Manager. The alert details can be viewed in Service Quality Alerts from the Cisco Unified Operations Manager as shown in Figure 12. Cisco Unified Operations Manager notifies the administrator if an alarm is raised and provides a direct link to the NAM responsible for triggering the alarm as shown in Figure 13. The administrator can log in directly to the NAM to continue troubleshooting. This workflow between NAM, Cisco Unified Service Monitor, and Cisco Unified Operations Manager allows administrators to manage quality across the network through early notification and useful information about the problem's root cause. Finally, Cisco Unified Service Statistics Manager collects NAM data indirectly through Cisco Unified Service Monitor and uses it to provide historical trending and capacity planning reports that help IT personnel plan for future deployments.
Figure 13. Voice Quality Alert Details in Operations Manager
Cisco Unified Service Monitor correlates call metrics and call detail records from NAM to report MOSs every minute as a call progresses and to produce call reports for enhanced analysis. The sensor report in Figures 14 and 15 displays all the pertinent information, such as speaker, listener, MOS, jitter, and packet loss. Clicking the MOS value displays the stream and call record information. The Stream Details table in Figure 16 provides jitter, packet loss, and other information for each one-minute sampling interval.
Figure 14. Service Monitor Sensor Report (Part 1)
Figure 15. Service Monitor Sensor Report (Part 2)
Figure 16. Service Monitor Streams and Call Record
In today's complex multimedia networks, the ability to measure the quality experienced by end users is critical. By providing visibility into the quality of VoIP traffic, NAM 4.1 significantly enhances the ability of IT personnel to detect, isolate, and troubleshoot VoIP problems. Its real-time monitoring and alerting capabilities facilitate a proactive approach to monitoring user experience, which helps increase end-user satisfaction.
MOS. The computed Mean Opinion Score for the one-minute interval. MOS is computed according to the ITU-T G.107 E-Model every three seconds. The reported MOS is the average of all three-second scores for the minute. The minimum stream duration to compute MOS is one second of media flow.
Packets lost. The count of aggregate packets lost due to network transmission during the reporting period. This is computed based on observed RTP sequence number analysis.
Jitter. The RFC 3550 jitter value in milliseconds. This value is a smoothed metric and may not be adequately indicative of problems given short and sudden spikes in jitter. This value should be a good description of the jitter given a uniform and constant distribution of jitter events.
Percent network loss. The percentage of packets dropped by the network on the way to the destination address.
Adjusted packet loss. The percentage of packets lost due to high jitter. This value is computed based on a reference jitter buffer with a fixed-length play-out delay. It is not affected by network loss.
Seconds of concealment. The number of seconds during which any impairment was experienced. Impairment can be due to network loss or high jitter. If just one packet is lost during the entire reporting interval, this value should be 1. If each second of the reporting period experiences at least one lost packet, this value should be 60.
Severe seconds of concealment. The number of seconds during which severe impairment was experienced. Severe impairment is defined by packet loss greater than or equal to 5 percent, including both network loss and loss due to jitter buffer discards.
Codec. The codec used by the media stream. This value is derived from the RTP payload type and may also include information from media stream payload lengths and packetization properties.
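Two of the definitions above lend themselves to a short worked sketch: the smoothed interarrival jitter estimator from RFC 3550 and the counting of seconds of concealment and severe seconds of concealment. The input shapes are assumptions made for illustration (arrival and send times in milliseconds, and per-second pairs of expected and lost packets where "lost" already includes jitter buffer discards), and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def rfc3550_jitter(arrival_ms: Sequence[float], sent_ms: Sequence[float]) -> float:
    """Smoothed interarrival jitter: J += (|D| - J) / 16 for each packet pair."""
    jitter = 0.0
    for i in range(1, len(arrival_ms)):
        # D compares the spacing at the receiver with the spacing at the sender.
        d = (arrival_ms[i] - arrival_ms[i - 1]) - (sent_ms[i] - sent_ms[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def concealment_seconds(
    per_second: Sequence[Tuple[int, int]], severe_ratio: float = 0.05
) -> Tuple[int, int]:
    """Return (seconds of concealment, severe seconds of concealment).

    Each item is (expected packets, lost packets) for one second.
    """
    soc = 0
    ssoc = 0
    for expected, lost in per_second:
        if expected <= 0 or lost <= 0:
            continue
        soc += 1
        if lost / expected >= severe_ratio:
            ssoc += 1
    return soc, ssoc
```

The 1/16 gain is why the jitter value is described as smoothed: a single packet arriving 5 ms late moves the estimate by only 0.3125 ms, so a brief spike can pass without a visible change in the reported figure.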