Cisco Resource Policy Management System 2.0.2 Troubleshooting Guide
Using the Cisco RPMS Diagnostic Tools

Table Of Contents

Using the Cisco RPMS Diagnostic Tools

Overview: Using the Diagnostic Tools

Overview: Diagnostics Report Fields

Configuration Parameters of Diagnostics Subsystem

Customer-Related Diagnostics Reports

Customer Counts

Calculating the Customer Limit Deficit

Customer Rejects

Customer Call Durations

Gateway-Related Diagnostics Reports

UGCallDuration

UGAccountingDelay

UGRejects

UGTerminations

UGRetries

System-Related Diagnostics Information

PolicyUsage

Message Rate

System Activity

System Latency

System Stats

SystemUGRetries

Call Durations

MissingAcctStop

CallFailureRates

Threshold Report

Provisioning Changes

Overview: Diagnosing End User Issues

Oversubscribed Policies

Using the Cisco RPMS Diagnostics Tool to Monitor Customer Activity

Low Call Durations

Using the Cisco RPMS Diagnostics Tool to Troubleshoot Low Call Durations

Overview: Identifying Policy Enforcement Issues

Incorrect Active Call Count

Using the Cisco RPMS Diagnostics Tool to Determine the Active Call Count

Overview: Identifying Universal Gateway Issues

Using the Cisco RPMS Diagnostics Tool to Determine Universal Gateway Issues

Overview: Tracking Provisioning Changes

Overview: Viewing the Cisco RPMS History

Using the Cisco RPMS Diagnostics Tool to View the System History

Overview: Analyzing Traffic Patterns with Cisco RPMS

Collecting System Statistics

Analyzing System-Wide Traffic

Overview: Analyzing Customer-Specific Traffic

Call Count Over Time

Call Duration Distribution Over Time


Using the Cisco RPMS Diagnostic Tools


Topics in this chapter include:

Overview: Using the Diagnostic Tools

Overview: Diagnostics Report Fields

Overview: Diagnosing End User Issues

Oversubscribed Policies

Low Call Durations

Overview: Identifying Policy Enforcement Issues

Incorrect Active Call Count

Overview: Identifying Universal Gateway Issues

Overview: Tracking Provisioning Changes

Overview: Viewing the Cisco RPMS History

Overview: Analyzing Traffic Patterns with Cisco RPMS

Collecting System Statistics

Analyzing System-Wide Traffic

Overview: Analyzing Customer-Specific Traffic

Call Count Over Time

Call Duration Distribution Over Time

Overview: Using the Diagnostic Tools

Cisco RPMS Release 2.0.2 introduces new diagnostic features to help monitor the system's performance. The diagnostic tools have two functions: to gather information so that you can monitor the Cisco RPMS' performance, and to help diagnose system issues caused by network elements that interact with the Cisco RPMS. Both functions help track the Cisco RPMS' performance to ensure it runs smoothly, and should problems occur, help you to diagnose and repair the issues.

The features have two components:

A set of reports generated as log files

A CLI interface using the crpms_diag command to access the reports

By using the diagnostic tools, you can accomplish the following:

Troubleshoot issues that adversely affect end users, such as receiving busy signals or low call duration

Monitor other issues, such as policy enforcement or UG hardware and software problems

Track provisioning changes so that you can clearly identify changes to the Cisco RPMS configuration

Identify issues over a specific period of time

Analyze traffic patterns

For more information on using the diagnostic tools, refer to the Cisco Resource Policy Management System Release 2.0.2 Solutions Guide.

Overview: Diagnostics Report Fields

This section defines the fields for all the diagnostics reports generated by Cisco RPMS. The diagnostics reports are classified into the following categories:

Customer-Related Diagnostics Reports

Gateway-Related Diagnostics Reports

System-Related Diagnostics Information

Provisioning Changes

The diagnostics feature is enabled by default.

Configuration Parameters of Diagnostics Subsystem

The following example shows the configuration parameters for the Diagnostics Subsystem of Cisco RPMS 2.0.2. The configuration parameters are located in the <$RPMSBASE>/config/rpms.conf file.

The sampling intervals are in milliseconds.

Example 7-1 Configuration parameters for the Diagnostics Subsystem

BaseDir= Directory used to store diagnostics file.
CustCountsPolicyUsageDiagSampleTime=120000
CustRejectsDiagSampleTime=300000
CustCallDurationDiagSampleTime=1200000
UGCallDurationDiagSampleTime=1200000
UGAcctDelayDiagSampleTime=300000
UGRejectsDiagSampleTime=300000
UGTerminationsDiagSampleTime=1200000
UGRetriesDiagSampleTime=1200000
UGCallDurationSysDiagSampleTime=1200000
UGCallFailureSysDiagSampleTime=300000
UGMissingAcctStopsDiagSampleTime=3600000
UGRetriesSysDiagSampleTime=1200000
SystemStatsDiagSampleTime=1800000
SystemActivityDiagSampleTime=300000
SystemLatencyDiagSampleTime=300000
SystemMessageRateDiagSampleTime=300000
VirtualCallDuration=20
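
As a rough illustration, the sampling intervals can be read and converted from milliseconds to minutes with a few lines of Python. This is only a sketch: it assumes the parameters appear as plain Key=Value lines, and the helper name sample_times_minutes is hypothetical, not part of Cisco RPMS.

```python
def sample_times_minutes(conf_text):
    """Return {parameter: minutes} for every *DiagSampleTime line."""
    result = {}
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.endswith("DiagSampleTime"):
            # Sampling intervals are stored in milliseconds; 60000 ms = 1 minute.
            result[key] = int(value.strip()) / 60000
    return result


# A few lines copied from Example 7-1.
EXAMPLE = """\
CustCountsPolicyUsageDiagSampleTime=120000
CustRejectsDiagSampleTime=300000
UGMissingAcctStopsDiagSampleTime=3600000
VirtualCallDuration=20
"""

intervals = sample_times_minutes(EXAMPLE)
for name, minutes in intervals.items():
    print(f"{name}: {minutes:g} min")
```

VirtualCallDuration is skipped on purpose because it is not a sampling interval.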

Customer-Related Diagnostics Reports

Cisco RPMS logs customer-related diagnostics information in three different report files:

Customer Counts

Customer Rejects

Customer Call Durations

The reports are generated for each customer configured in Cisco RPMS under a specific directory for that customer. The reports reside in the following directory structure:

$RPMSBASE/diagnostics/<Date>/customer/<CustomerName>/<ReportType>_<TimeStamp>.csv

Report Type is the name of the report.

Timestamp is the timestamp for the particular day.

The data in the files are logged at periodic intervals that are configurable (in the Diagnostics section of rpms.conf). All data is incremental (that is, change in value since last sample time).
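
The directory layout above can be expressed as a small path helper. The following Python sketch is illustrative only: the function names are hypothetical, and the base directory and date format are assumptions borrowed from the sample paths shown later in this chapter.

```python
from pathlib import PurePosixPath


def customer_report_dir(rpms_base, date, customer_name):
    """Directory that holds one customer's reports for one day."""
    return PurePosixPath(rpms_base) / "diagnostics" / date / "customer" / customer_name


def report_pattern(report_type):
    """Glob pattern matching every time-stamped file of one report type."""
    return f"{report_type}_*.csv"


# Hypothetical values, for illustration only.
directory = customer_report_dir("/export/home/crpms", "2002-Jul-28", "customer1")
print(directory / report_pattern("<ReportType>"))
```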

The following sections describe the report fields.

Customer Counts

The report logs the active call count information of each customer during the sampling interval.

Table 7-1 Customer Count Fields 

Field
Definition

Timestamp

Time at which the data was sampled.

Session Limit/ Base Limit

Number of guaranteed concurrent sessions allowed for a customer.

Oversubscription Limit

Maximum number of guaranteed sessions allowed after the Session/Base limit is reached for a customer.

Shared Overflow Limit

Maximum number of shared overflow sessions allowed after the base and Oversubscription limits have been reached for a customer.

Current Count

Number of calls currently active for this customer.

Current Limit Deficit

The value that the limit should currently be raised by in order to accept all calls.


Calculating the Customer Limit Deficit

The active call count for an SLA is the total number of active calls that have been admitted by meeting SLA criteria. When the active call count reaches an SLA limit, subsequent call attempts are rejected until one of the active calls disconnects, which allows a new call to enter the system and brings the count back up to the limit.

A "virtual" call count is identical to the active call count before its limits are reached. However, when a call request is rejected by an SLA active call limit, the active call count does not change, whereas the virtual call count is incremented by 1. Essentially, the virtual call count tracks the count as if there were no limits. The difference between the virtual and active call counts is called the "limit deficit."

Periods of limit restriction can last for hours, yet average dial hold times last about 20 minutes. Because of this, the virtual call count also uses a virtual call duration, so that a fictitious additional call count expires after a statistical amount of time. This prevents rejected call counts from inflating forecast virtual calls.

A simple queue-based approach is used to calculate the limit deficit. The queue tracks the call duration of the "virtual" calls. Figure 7-1 illustrates the queue-based algorithm.

Figure 7-1 The queue-based algorithm

Figure 7-1 shows the following:

Ts = Sampling Interval

To = Start Time of Sampling Timer

Ti = Time at ith Sample, i=1,2,3...

n = Queue Size = Virtual Call Duration / Ts

Ri = Number of rejected/virtual calls in the ith sample, i=1,2,3...

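The limit deficit at any time is the sum of all the elements in the queue. Writing D_i for the limit deficit at the ith sample (a symbol introduced here for convenience), this gives:

$$D_i = \sum_{j=\max(1,\, i-n+1)}^{i} R_j$$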

To better understand this algorithm, use the following example.

Example 7-2 The Customer Limit Deficit Algorithm

Virtual Call Duration = 20 minutes 
Sampling Interval, Ts = 2 minutes
Start Time of Sampling Timer, To = 04:00:00

From the values in Example 7-2, n = 20 / 2 = 10. Since n is the queue size, the values of the virtual call duration and Ts must be chosen so that n is a whole number greater than or equal to 1.

Figures 7-2 through 7-4 show the queue contents at various points in time.

Figure 7-2 Queue contents After 6 Samples

Figure 7-3 Queue contents After 10 Samples

Figure 7-4 Queue contents After 11 Samples

Table 7-2 shows the limit deficit for various samples.

Table 7-2 Limit Deficit Table

Sample   Time       Limit Deficit
1        04:02:00   2
2        04:04:00   7 (2+5)
3        04:06:00   7 (2+5+0)
4        04:08:00   8 (2+5+0+1)
5        04:10:00   15 (2+5+0+1+7)
6        04:12:00   26 (2+5+0+1+7+11)
7        04:14:00   50 (2+5+0+1+7+11+24)
8        04:16:00   62 (2+5+0+1+7+11+24+12)
9        04:18:00   66 (2+5+0+1+7+11+24+12+4)
10       04:20:00   71 (2+5+0+1+7+11+24+12+4+5)
11       04:22:00   75 (5+0+1+7+11+24+12+4+5+6)
...      ...        ...


Until the 10th sample, the limit deficit is merely a summation of the queue elements. At the 11th sample, however, the queue has already reached its size limit of n = 10 elements. This indicates that the "virtual" calls at the head of the queue have reached the virtual call duration of 20 minutes.

This is why you must calculate the queue size by dividing the virtual call duration by Ts. So, the first element is popped before the new sample value of 6 (which is the number of rejections between 04:20:00 and 04:22:00) is pushed into the queue.
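
The queue-based calculation can be sketched in Python as follows. This is an illustrative reimplementation, not the Cisco RPMS code: the function name limit_deficits is hypothetical, and the per-sample rejection counts are the values shown in parentheses in Table 7-2.

```python
from collections import deque


def limit_deficits(rejects_per_sample, virtual_call_duration_min, sampling_interval_min):
    """Return the limit deficit after each sample, using a fixed-size queue."""
    n, remainder = divmod(virtual_call_duration_min, sampling_interval_min)
    if remainder or n < 1:
        raise ValueError("virtual call duration must be a whole multiple of Ts")
    # A deque with maxlen=n pops the oldest element automatically once it is full.
    queue = deque(maxlen=n)
    deficits = []
    for rejected in rejects_per_sample:
        queue.append(rejected)
        deficits.append(sum(queue))
    return deficits


# Rejections per 2-minute sample, taken from Table 7-2.
rejects = [2, 5, 0, 1, 7, 11, 24, 12, 4, 5, 6]
deficits = limit_deficits(rejects, virtual_call_duration_min=20, sampling_interval_min=2)
print(deficits)  # [2, 7, 7, 8, 15, 26, 50, 62, 66, 71, 75]
```

The last value drops the first sample (2) before adding the new one (6), which matches the 11th row of the table.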

Customer Rejects

The report logs rejection details of each customer during the specified sampling interval.

The report has only Limit Rejections and VPDN Rejections.

Table 7-3 Customer Rejects Fields 

Field
Definition

Timestamp

Time at which the data was sampled.

Attempted Call-Accepts/ Attempted Calls

Number of attempted calls since last sample.

Limit Rejections

Number of limit rejections since the last sample.

Limit Rejection %

(Limit Rejections/Attempted Calls) x 100.

VPDN Attempted Calls

Number of attempted VPDN sessions since the last sample.

VPDN Rejections

Number of VPDN rejections since the last sample.

VPDN Rejection %

(VPDN Rejections/VPDN Attempted Calls) x 100.


Customer Call Durations

This report logs the duration of calls for each customer during the sampling interval.

Table 7-4 Customer Call Duration Fields

Field
Definition

Timestamp

Time at which the data was sampled.

Closed Calls

Number of calls closed since the last sample.

%(a-b) min

Number of calls closed between "a" (exclusive) and "b" (inclusive) minutes expressed as a percentage of total Closed Calls.
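
The "a exclusive, b inclusive" bucketing can be illustrated with a short Python sketch. The bucket boundaries and call durations below are hypothetical; the actual boundaries used by Cisco RPMS are not stated here.

```python
import bisect


def duration_percentages(durations_min, upper_bounds):
    """Percentage of closed calls in each (a, b] bucket, plus one overflow bucket."""
    counts = [0] * (len(upper_bounds) + 1)  # last slot: longer than the final bound
    for duration in durations_min:
        # bisect_left places duration == b in the bucket whose upper bound is b,
        # so each bucket excludes its lower edge and includes its upper edge.
        counts[bisect.bisect_left(upper_bounds, duration)] += 1
    total = len(durations_min)
    return [100 * count / total if total else 0.0 for count in counts]


bounds = [1, 5, 20]                        # hypothetical bucket edges, in minutes
calls = [0.5, 1, 3, 5, 12, 45, 90, 4]      # hypothetical closed-call durations
percentages = duration_percentages(calls, bounds)
print(percentages)  # [25.0, 37.5, 12.5, 25.0]
```

In this sketch a zero-length call falls into the first bucket.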


Gateway-Related Diagnostics Reports

Cisco RPMS logs gateway-related diagnostics information in the following report files:

UGCallDuration

UGAccountingDelay

UGRejects

UGTerminations

UGRetries

These five reports are generated for each gateway configured in Cisco RPMS. They are located in the following directory structure:

$RPMSBASE/diagnostics/<Date>/gateway/<UGIP>/<ReportType>_<TimeStamp>.csv

Report Type is the name of the report.

Timestamp is the timestamp for the particular day.

The data in the files are logged at periodic intervals that are configurable in the [Diagnostics] section of rpms.conf. All data is incremental (i.e., change in value since last sample time).

UGCallDuration

This report logs the duration of the calls for each UG during the sampling interval.

Table 7-5 UG Call Duration Fields

Field
Definition

Timestamp

Time at which the data was sampled.

Closed Calls

Number of calls closed since the last sample.

%[a-b] min

Percentage of closed calls with call duration between a (exclusive) and b (inclusive) minutes.


UGAccountingDelay

The report logs information regarding delayed accounting messages that Cisco RPMS received during the sampling interval.

Table 7-6 UG Accounting Delay Fields

Field
Definition

Timestamp

Time at which the data was sampled.

Accounting Starts

Number of resource allocated messages since the last sample.

%[a-b] sec

Percentage of resource allocated messages with a delay between a (exclusive) and b (inclusive) seconds. The delay for a resource allocated message is the time expired since the corresponding pre-auth message.


UGRejects

The report logs the rejections per UG during the sampling interval.

Table 7-7 UG Reject Fields

Field
Definition

Timestamp

Time at which the data was sampled.

Attempted Calls

Number of attempted calls since the last sample.

Limit Rejections

Number of limit rejections since the last sample.

Limit Rejection %

(Limit Rejections/Attempted Calls) x 100.

VPDN Attempted Calls

Number of attempted VPDN sessions since the last sample.

VPDN Rejections

Number of VPDN rejections since the last sample.

VPDN Rejection %

(VPDN Rejections/VPDN Attempted Calls) x 100.


UGTerminations

The report logs information regarding the UG Terminations during the sampling interval.

Table 7-8 UG Termination Fields

Field
Definition

Timestamp

Time at which the data was sampled.

Attempted Calls

Number of attempted calls since the last sample.

Terminations

Number of calls terminated since the last sample.

Termination %

(Terminations/Attempted Calls) x 100.

<Terminate Cause> Terminations

Number of terminations due to a terminate cause listed below. There is one such column for every terminate cause listed below.

<Terminate Cause> Termination %

(Terminate Cause Terminations/Attempted Calls) x 100.


The following list is for all UG (1-18) and Cisco RPMS (19-23) generated terminate causes:

1. User Request

2. Lost Carrier

3. Lost Service

4. Idle Timeout

5. Session Timeout

6. Admin Reset

7. Admin Reboot

8. Port Error

9. NAS Error

10. NAS Request

11. NAS Reboot

12. Port Unneeded

13. Port Preempted

14. Port Suspended

15. Service Unavailable

16. Callback

17. User Error

18. Host Request

19. Another call on the same UG and port

20. Dangling Session

21. UG unreachable

22. No terminate reason from UG

23. Modem Terminate=<reason>

24. Other.

UGRetries

The report contains information about UG retries during the sampling interval.

Table 7-9 UGRetries Fields

Field
Definition

Timestamp

Time at which the data was sampled.

Total Requests

Total number of messages, including retries, of the following types: Pre-Auth, Accounting Start, Accounting Update, Accounting Stop, VPDN Session, VPDN Connect, and VPDN Disconnect.

Retries

Number of message retries for the above message types.

Retries %

(Retries/Total Requests) x 100.


System-Related Diagnostics Information

Cisco RPMS logs system-related diagnostics information in the following report files:

PolicyUsage

Message Rate

System Activity

System Latency

System Stats

SystemUGRetries

Call Durations

MissingAcctStop

CallFailureRates

Threshold Report

Cisco RPMS generates system-related diagnostics reports to help you understand how the system is being utilized. The reports also provide diagnostics information at the highest level. This information helps troubleshoot system issues.

Cisco RPMS logs the system-related diagnostic information in these reports, located in the following directory structure:

$RPMSBASE/diagnostics/<Date>/system/<ReportType>_<TimeStamp>.csv

"Report Type" is the name of the report.

"Timestamp" is the timestamp for the particular day.

The data in the files are logged at periodic intervals configurable in the [Diagnostics] section of rpms.conf. All data is incremental (i.e., changes in value since last sample time).

PolicyUsage

The report logs the usage of the SLAs of each customer during the sampling interval.

Table 7-10 PolicyUsage Fields

Field
Definition

Customer

Name of the Customer.

Guaranteed Limit Used %

Percentage of guaranteed concurrent sessions used by the customer.

Guaranteed Limit

Number of guaranteed concurrent sessions allowed for a customer.

Shared Overflow Limit Used %

Percentage of shared overflow sessions used by the customer.

Shared Overflow Limit

Maximum number of shared overflow sessions allowed after the base and Oversubscription limits have been reached for a customer.

Current Count

Number of calls currently active for this customer.

Current Limit Deficit %

Percentage of the Current Limit Deficit to the actual Limit.

Current Limit Deficit

The value that the limit should currently be raised by, in order to accept all calls.

Peak Limit Deficit %

Percentage of the Peak Limit Deficit to the actual Peak Limit.

Peak Limit Deficit

The value that the limit would have needed to be raised by in order to accept all calls at any time of the day.



Note Cisco RPMS allows the SLA count to be configured as No Limit.

2147483647 corresponds to "No Limit" configured for Session Count Limit.

4294967294 corresponds to "No Limit" configured for both Session Count Limit and Oversubscription limit.
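
When post-processing report data, these sentinel values can be translated back to "No Limit". The Python sketch below is illustrative; the function name is hypothetical. Note that 2147483647 is 2^31 - 1 and 4294967294 is exactly twice that value, which suggests (but does not prove) that the second sentinel is simply the sum of two "No Limit" fields.

```python
NO_LIMIT = 2**31 - 1  # 2147483647


def describe_limit(value):
    """Translate the documented sentinel values into readable text."""
    if value == NO_LIMIT:
        return "No Limit (Session Count Limit)"
    if value == 2 * NO_LIMIT:  # 4294967294
        return "No Limit (Session Count Limit and Oversubscription limit)"
    return str(value)


for limit in (1010, 2147483647, 4294967294):
    print(limit, "->", describe_limit(limit))
```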


Message Rate

The report shows the different types of messages Cisco RPMS has received during the sampling interval.

Table 7-11 Message Rate Fields

Field
Definition

Timestamp

Time at which the sample was collected.

Table title

Possible values for this field are Total Input, call-reservation-request, vpdn-tunnel-reservation, call-start, call-stop, call-stop-accept, call-stop-reject, and administrative-cdr-log-request.

Total entries

Total number of messages received within the sampling interval.

%[a-b] milliseconds

Percentage of messages of the same type with a delay between a (inclusive) and b (inclusive) milliseconds. The delay for a message is the time elapsed since the previous message of the same type.



Note For the Total Input row, the interpretation of %[a-b] milliseconds is the percentage of the distribution of the subsequent message delay during the sample interval.

The Call Stop Accept and Call Stop Reject messages are reserved for future use.


System Activity

The report logs the active call count of Cisco RPMS during the sampling interval.

Table 7-12 System Activity Fields

Field
Definition

Timestamp

Time at which the sample was collected.

Call Count

Current active call count at that sampling interval.


System Latency

This report logs the latency of each message that Cisco RPMS has received during the sampling interval.

Table 7-13 System Latency Fields

Field
Definition

Timestamp

Time at which the sample was collected.

Table title

Possible values for this field are Totals, call-reservation-request+accept-call-reserved, call-reservation-request+accept-call-overflow, call-reservation-request+reject-call, vpdn-tunnel-reservation+accept-tunnel, vpdn-tunnel-reservation+reject-tunnel, vpdn-tunnel-reservation+no-vpdn-info-found, call-start+ack, call-stop+ack, call-stop-accept+ack, and call-stop-reject+ack.

Total entries

Total number of messages processed within the sampling interval.

%[a-b] milliseconds

Percentage of distribution of messages of similar type with a delay between a (inclusive) and b (inclusive) milliseconds. The delay is the time taken to process that particular type of message.



Note For the Totals row, the interpretation of %[a-b] milliseconds is the percentage of the distribution of the subsequent message delay during the sample interval. The delay is the time taken to process the message.


System Stats

The report logs system-related statistics during the sampling interval.

Table 7-14 SystemStat Fields 

Field
Definition

Timestamp

Time at which the data was sampled.

Statistics Name

Name of the statistic.

Today

Number of packets received by Cisco RPMS so far today.

Since Restart

Number of packets received by RPMS since last restart.


SystemUGRetries

The report logs the UG Retries information during the sampling interval.

Table 7-15 SystemUGRetries Fields

Field
Definition

Timestamp

Time at which the data was sampled.

UG Name

Name of the UG.

IP

IP address of the UG.

Retries %

(Total Retries/Total Requests) x 100.

Total Retries

Number of message retries for the following message types:

Pre-Auth

Accounting Start

Accounting Update

Accounting Stop

VPDN Session

VPDN Connect

VPDN Disconnect

Total Requests

Total number of messages, including retries, of the following message types:

Pre-Auth

Accounting Start

Accounting Update

Accounting Stop

VPDN Session

VPDN Connect

VPDN Disconnect


Call Durations

Cisco RPMS logs call duration information in this report. The report is generated whenever calls are closed by the NAS after a short time. Low call durations can be caused by NAS issues, such as faulty hardware.

Table 7-16 CallDurations Fields

Field
Definition

UG NAME

Name of the UG.

UG IP

IP address of the UG.

Total

Total calls received from the UG.

%[a-b] minutes

Percent of closed calls with call duration between a (inclusive) and b (inclusive) minutes.


MissingAcctStop

Cisco RPMS generates this report whenever NASs fail to send accounting stop messages to Cisco RPMS or when an accounting stop message is lost. This leads to a disparity between the active call count reported by Cisco RPMS and the number of real calls: Cisco RPMS continues to count the calls as active even if they have been dropped on the NAS. These calls are cleared from Cisco RPMS only when another call replaces them on the same port or after the Active Call Timeout expires.

Table 7-17 MissingAcctStop Fields

Field
Definition

UGName

Name of the UG.

IP

IP Address of the UG.

Missing Acct-Stop%

(Calls w/o Stop)/(TotalAttemptedCalls) * 100.

Total Attempted Calls

Total number of calls received from the UG.

Calls w/o Stop

Total number of calls received without an accounting stop message.


CallFailureRates

The report is generated whenever NASs experience hardware or software problems while sending calls to Cisco RPMS. This report shows all NASs that had calls terminated for reasons deemed erroneous. Cisco RPMS considers the following call termination codes erroneous:

Table 7-18 Erroneous Termination Codes 

Termination Code
Definition

2

Lost Carrier

3

Lost Service

9

NAS Error

13

Port Preempted

14

Port Suspended

15

Service Unavailable


The following are the fields and definitions for CallFailureRates.

Table 7-19 CallFailureRates Fields 

Field
Definition

UG Name

Name of the UG.

UG IP

UG IP Address.

Call Failure Rate

(Failed Calls)/(Total Attempted Calls) * 100.

Total Attempted Calls

Total calls received from the UG.

Failed Calls

Number of calls terminated with one of the erroneous termination codes listed in the table above.


Threshold Report

The report is generated when the resources used by Cisco RPMS reach their threshold limits, that is, in any of the following cases:

When the number of RPMS file descriptors is greater than 1000.

When the CPU idle time percentage of the host is less than 30%.

When the number of lost TCP connections is greater than 0.

When the system free memory percentage is less than 1%.

The report is generated whenever any of these conditions becomes true. It displays the warning message that was raised in the system and a complete listing of the processes running in the system at that instant. It also contains statistical information on Cisco RPMS-related processes and their resource usage.
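
The four threshold conditions can be summarized in a short Python sketch. The function and parameter names are hypothetical, and how Cisco RPMS actually collects these values is not described here; only the threshold numbers come from the list above.

```python
def threshold_warnings(file_descriptors, cpu_idle_pct, tcp_connections_lost, free_memory_pct):
    """Return the threshold conditions that are currently true."""
    checks = [
        (file_descriptors > 1000, "file descriptors > 1000"),
        (cpu_idle_pct < 30, "CPU idle < 30%"),
        (tcp_connections_lost > 0, "TCP connections lost > 0"),
        (free_memory_pct < 1, "free memory < 1%"),
    ]
    return [message for triggered, message in checks if triggered]


# Hypothetical measurements for a healthy and a stressed host.
healthy = threshold_warnings(200, 85.0, 0, 40.0)
stressed = threshold_warnings(1200, 12.5, 3, 40.0)
print(healthy)
print(stressed)
```

In this sketch, any non-empty result corresponds to a situation in which the Threshold Report would be generated.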

Table 7-20 Threshold Report Fields

Field
Definition

TIME OF DAY

Time stamp at which the report is generated.

Active Calls

Number of calls currently active for this UG.

usr

Percentage of total CPU used in user mode.

sys

Percentage of total CPU used in system mode.

wio

Percentage of total CPU used for I/O.

idle

Percentage of total CPU unused.

free

Free physical memory in KB.

free%

Percentage of free physical memory.

in

Page in requests rate.

out

Page out requests rate.

scan

Paging scan rate.

used

KB of used swap.

free

KB of free swap.

dropped

Number of dropped TCP connections.

npid

Number of Oracle processes.

cpu

Percentage of CPU used.

priv

KB of total memory used in private mode by the processes.

cpu

Percentage of CPU used.

siz

KB of memory used.

res

KB of memory used.

cpu

Percentage of CPU used.

siz

KB of memory used.

res

KB of memory used.

fdd

Number of file descriptors currently open.

cpu

Percentage of CPU used.

siz

KB of memory used.

res

KB of memory used.



Note Single space is the delimiter for this report.


Provisioning Changes

Cisco RPMS tracks all provisioning changes made to the system database by logging all changes to a "Change Log" file. The file records the time, type and details of every change.

The change log file is generated in the $RPMSBASE/diagnostics/<Date>/ directory. The file is updated whenever there is a change to the system database.

Table 7-21 Provisioning Change Fields

Field
Definition

Time

Time at which the change was made to the database.

Action

Indicates whether the configuration change created, deleted, or modified the specified entity (for example, CREATE, DELETE, UPDATE).

Entity

Type of configuration object modified (for example, Customer, DNIS Group, Trunk Group, or NAS List).

Description

Details on the configuration change made to the system.


Overview: Diagnosing End User Issues

You can use the Cisco RPMS diagnostic tools to diagnose common end user problems such as receiving busy signals, no connection, or if connected, having their calls end quickly. Reasons why these issues may occur include:

Oversubscribed Policies

Low Call Durations

Oversubscribed Policies

End users may encounter busy signals due to oversubscribed policies. Oversubscription is a configurable Cisco RPMS feature, and when limits are exceeded, the system sends end users busy signals, as intended. To keep busy signals to a minimum, the diagnostic tools allow you, the Cisco RPMS administrator, to monitor customer activity. You can see exactly which customers have exceeded their limits and are receiving busy signals, and which other customers are quickly approaching their limits or already generating rejections. This way, you can track and reconfigure extra or overflow ports so that end users do not receive busy signals.

Using the Cisco RPMS Diagnostics Tool to Monitor Customer Activity

To monitor the current status of customer activity, execute the crpms_diag CLI command:

%crpms_diag mon cust

The command's output provides a snapshot of call activity for all customer profiles in the system, similar to this:

Data Updated at: Jul 31, 2002 3:36 PM, PDT
"Customer","Limit Used(%)","Limit","Current Count","Current Limit Deficit(%)","Current Limit Deficit","Peak Limit Deficit(%)","Peak Limit Deficit"
customer1,100,1010,1010,20,202,30,303
customer2,0,110,0,0,0,0,0

The customers at or near their limits appear at the top of the list.

Other output provided by the command includes:

Current Limit Deficit: This field displays the amount by which you should have raised the limit to accept all calls during the past 20 minutes.

Peak Limit Deficit: This field shows the amount by which you should have raised the limit to accept all calls at any time of the day.

You can use these numbers to change Customer Profile configurations.
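
One way to act on this output is to parse it and compute candidate limits. The Python sketch below is illustrative only: it assumes the quoted CSV header arrives on a single line after the "Data Updated at:" banner, the function name is hypothetical, and "limit plus peak limit deficit" is just one possible reading of how to use the numbers.

```python
import csv
import io

# Sample output copied from the crpms_diag mon cust example.
SAMPLE = '''Data Updated at: Jul 31, 2002 3:36 PM, PDT
"Customer","Limit Used(%)","Limit","Current Count","Current Limit Deficit(%)","Current Limit Deficit","Peak Limit Deficit(%)","Peak Limit Deficit"
customer1,100,1010,1010,20,202,30,303
customer2,0,110,0,0,0,0,0
'''


def parse_mon_cust(text):
    """Return one dict per customer row, skipping the banner line."""
    body = "\n".join(
        line for line in text.splitlines()
        if line and not line.startswith("Data Updated at:")
    )
    return list(csv.DictReader(io.StringIO(body)))


rows = parse_mon_cust(SAMPLE)
# Customers that are currently rejecting calls.
over_limit = [row["Customer"] for row in rows if int(row["Current Limit Deficit"]) > 0]
# Limit that would have accepted every call at the day's peak.
suggested = {row["Customer"]: int(row["Limit"]) + int(row["Peak Limit Deficit"]) for row in rows}
print(over_limit)
print(suggested)
```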

Low Call Durations

Another problem end users may encounter is low call durations; the users might not receive busy signals, but their calls may end quickly or may not connect at all.

This is a UG-related issue, caused by the UG hardware. In a large network, it might be difficult to identify problematic UGs causing low call durations.

Using the Cisco RPMS Diagnostics Tool to Troubleshoot Low Call Durations

To display errors, execute the crpms_diag err CLI command:

%crpms_diag err 

The command's output provides a snapshot of errors in the system, similar to this:

RPMS Errors:
No errors today

DB Errors:
No errors today

RPMS Client Errors:
2002-Jul-28:Call Durations. Details in 
/export/home/crpms/diagnostics/2002-Jul-28/system/CallDurations_2002-Jul-28_00-00-07.csv 

In this example, the "RPMS Client Errors" message displays the UGs sending traffic to Cisco RPMS that are experiencing low call durations. It also displays the file which contains further details about the problematic UGs.

Overview: Identifying Policy Enforcement Issues

This section helps identify policy enforcement issues that may occur with Cisco RPMS. The main issue that may occur is an incorrect call count.

Incorrect Active Call Count

Cisco RPMS may report more active calls in the system than actual calls in the network. The disparity between the active call count reported by Cisco RPMS and the number of real calls is caused by missing or delayed accounting messages. Depending on the Cisco IOS release running, accounting messages may occasionally be delayed or dropped.


Note For information on the compatible Cisco IOS releases, refer to the Cisco Resource Policy Management System 2.0.2 Release Notes.


If this occurs, Cisco RPMS marks the calls as active even if they are dropped on the UG. Cisco RPMS only clears the calls when another call replaces the dropped call on the same port.

Using the Cisco RPMS Diagnostics Tool to Determine the Active Call Count

To display errors, execute the crpms_diag err CLI command:


%crpms_diag err 

The command's output provides a snapshot of errors in the system, similar to this:

RPMS Errors:
No errors today

DB Errors:
No errors today

RPMS Client Errors:
2002-Jul-26:Missing Accounting Stops. Details in 
/export/home/crpms/diagnostics/2002-Jul-26/system/MissingAcctStop_2002-Jul-26_00-01-44.csv

In this example, the "RPMS Client Errors" message details that UGs sending traffic to the Cisco RPMS have not been sending accounting stop messages. It prints the name of the file containing additional information about the problematic UGs.

The file shows data in the following format:

Data Updated at: Jul 27, 2002 12:00 AM, PDT
"NasName","IP","Missing Acct-Stop(%)","Total Attempted Calls","Calls w/o Stop",
NAS-1,10.10.1.1,100,107,107

The data details the total calls received from the UG, and the percent of calls from that total which did not receive an accounting stop message. You can use these numbers to compare a low traffic UG with a higher traffic UG.

In the example above, the Cisco RPMS received 107 calls from NAS-1, and 100% of those calls missed the accounting-stop message, which means the UG is not sending the required messages to the Cisco RPMS. With this information, you can look into why the UG is not sending them and ensure it is running a Cisco RPMS-compatible release of the Cisco IOS software.
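
The percentage in the file can be recomputed from the raw counts, which is useful when comparing UGs with very different traffic volumes. The Python sketch below is illustrative and uses the sample file contents shown above; the function name is hypothetical. The trailing comma in the header line is kept as it appears in the sample.

```python
import csv
import io

# Sample MissingAcctStop file contents copied from the example.
SAMPLE = '''Data Updated at: Jul 27, 2002 12:00 AM, PDT
"NasName","IP","Missing Acct-Stop(%)","Total Attempted Calls","Calls w/o Stop",
NAS-1,10.10.1.1,100,107,107
'''


def missing_stop_rates(text):
    """Return {UG name: percentage of calls without an accounting stop}."""
    body = "\n".join(
        line for line in text.splitlines()
        if line and not line.startswith("Data Updated at:")
    )
    rates = {}
    for row in csv.DictReader(io.StringIO(body)):
        attempted = int(row["Total Attempted Calls"])
        without_stop = int(row["Calls w/o Stop"])
        # Guard against a UG that reported no calls in the interval.
        rates[row["NasName"]] = 100 * without_stop / attempted if attempted else 0.0
    return rates


rates = missing_stop_rates(SAMPLE)
print(rates)  # {'NAS-1': 100.0}
```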

Overview: Identifying Universal Gateway Issues

Hardware or software problems on the UG can cause service interruptions for the Cisco RPMS. The problems may only affect a few of the UGs, and as such, may be hard to track.

However, to address this issue, Cisco RPMS tracks the call termination codes for the UGs. You can use the crpms_diag CLI command to view the information and to display errors.

Using the Cisco RPMS Diagnostics Tool to Determine Universal Gateway Issues

To display errors, execute the crpms_diag err CLI command:

%crpms_diag err 

The command's output provides a snapshot of errors in the system, similar to this:

RPMS Errors:
No errors today

DB Errors:
No errors today

RPMS Client Errors:
2002-Jul-28:Call Failure Rates. Details in 
/export/home/crpms/diagnostics/2002-Jul-28/system/CallFailureRates_2002-Jul-28_00-00-07.csv

In this example, the "RPMS Client Errors" message displays information about the UGs sending traffic to Cisco RPMS that might be experiencing hardware or software problems. It also displays the file that contains further details about the problematic UGs.

The file shows data in the following format:

Data Updated at: 16:13:40
"NAS NAME","NAS IP","Call Failure Rate","Total Attempted Calls","Failed Calls"
NAS-1,10.10.1.1,10,100,10

The data in this example displays all UGs with calls terminated because of errors. In Cisco RPMS, the following call termination codes are considered erroneous.

Table 7-22 Erroneous call termination codes 

Termination Code    Definition
2                   Lost carrier.
3                   Lost service.
9                   UG error.
13                  Port preempted.
14                  Port suspended.
15                  Service unavailable.

The sample output shows that the Cisco RPMS received 100 calls from NAS-1 and that 10% of those calls (10 calls) terminated because of errors. A failure rate like this can point to a problem, so you should investigate the indicated UG.

Once the UG is identified, look for a breakdown of all terminated calls by termination code for this UG in the <crpms-home>/diagnostics/<date>/nas/<nas-ip>/NasTerminations_<timestamp>.csv file.
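The following Python sketch shows one way the failure rate could be derived from a list of termination codes, using the erroneous codes listed in Table 7-22. The input list is hypothetical (it is not read from a NasTerminations file, whose layout is not shown here), and code 1 simply stands in for any code that is not in the table.

```python
from collections import Counter

# Termination codes treated as erroneous, per Table 7-22.
ERROR_CODES = {
    2: "Lost carrier",
    3: "Lost service",
    9: "UG error",
    13: "Port preempted",
    14: "Port suspended",
    15: "Service unavailable",
}

def failure_rate(termination_codes):
    """Return (percent_failed, breakdown) for a list of termination codes."""
    total = len(termination_codes)
    failed = Counter(c for c in termination_codes if c in ERROR_CODES)
    n_failed = sum(failed.values())
    percent = 100.0 * n_failed / total if total else 0.0
    # Report the breakdown by definition, ordered by numeric code.
    breakdown = {ERROR_CODES[c]: n for c, n in sorted(failed.items())}
    return percent, breakdown

# Hypothetical input: 100 calls, 10 of which ended with erroneous codes.
codes = [1] * 90 + [2] * 6 + [9] * 3 + [15]
print(failure_rate(codes))
```

With this input the computed rate is 10 percent, which matches the NAS-1 row in the sample file.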

Overview: Tracking Provisioning Changes

Cisco RPMS tracks all of the provisioning changes you make to the system database. The time, type, and details of each change are logged to a change log file.

The change log file is generated in the <crpms-home>/diagnostics/<date>/ directory. It displays data in the following format:


Time                 Action  Entity                     Description
2002-Jul-23 4:41 PM  CREATE  Customer                   Name=cust1, Description=, Call Treatment=busy, Non Overflow Limit=100, Overflow Limit=10, Non Overflow Threshold=85, Call Reject Threshold=85, Overflow Call Rejection Threshold=10
2002-Jul-23 4:42 PM  CREATE  Customer-Mapping-Criteria  Customer=cust1, Dnis Group=default, Call Type=digital, SS7 Resource=None, Trunk Group=default
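In the sample entries, the Description field is a comma-separated list of name=value pairs. A minimal Python sketch for splitting such a field into individual settings might look like the following. The function name is hypothetical, and the sketch assumes that values never contain commas or equal signs, which the sample does not confirm.

```python
def parse_description(description):
    """Split 'A=1, B=, C=x' into a dict mapping field name to value."""
    fields = {}
    for part in description.split(","):
        name, sep, value = part.partition("=")
        if sep:  # skip fragments that contain no '='
            fields[name.strip()] = value.strip()
    return fields

# Description text taken from the first sample entry above.
desc = ("Name=cust1, Description=, Call Treatment=busy, Non Overflow Limit=100, "
        "Overflow Limit=10, Non Overflow Threshold=85, Call Reject Threshold=85, "
        "Overflow Call Rejection Threshold=10")
fields = parse_description(desc)
print(fields["Non Overflow Limit"], fields["Call Treatment"])
```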

Overview: Viewing the Cisco RPMS History

The CLI command crpms_diag, as described in previous sections, supports a history option. You can use the history option to display information collected over a certain period of time, and to obtain a historical perspective of the Cisco RPMS system.

Using the Cisco RPMS Diagnostics Tool to View the System History

To use the history option, execute the crpms_diag err hist CLI command followed by the number of days to display. For example, to see any UG errors that occurred in the past seven days, enter:

%crpms_diag err hist 7

Overview: Analyzing Traffic Patterns with Cisco RPMS

You can use the Cisco RPMS diagnostic tools to analyze traffic patterns and collect information useful in planning for capacity or other issues. The information you can collect includes:

Collecting System Statistics

Analyzing System-Wide Traffic

Collecting System Statistics

You can use the CLI command crpms_diag to collect high-level statistics about your Cisco RPMS system. To do so, execute the command as follows:

%crpms_diag stat

The command output should look similar to:

Data Updated at: Jul 24, 2002 2:01 PM, PDT
Statistics Name,Today,Since Restart
CallAccept,5873,265873
ResourceAllocated,65422,265832
ResourceFreed,65356,265356
ResourceUpdate,805,265805
VpdnSession,0,0
VpdnConnect,0,0
VpdnDisconnect,0,0
NasStart,0,0
NasUpdate,65,468
NasStop,0,0
BadPackets,0,0
UnknownPackets,0,0
NoCustomerFound,0,0
CallDiscriminations,0,0

The previous example displays information such as the various types of packets received by Cisco RPMS so far that day, and since the last restart occurred. Cisco RPMS also records the information in a file so that you can retrieve statistics for a particular day from the file generated for that day. The file is generated in the following location:

<crpms-home>/diagnostics/<date>/system/SystemStats_<time>.csv
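For scripted checks, the statistics output can be loaded into a dictionary. The Python sketch below parses the sample output shown above and reports any non-zero counters from a watch list. The choice of counters to watch, and the reading of ResourceAllocated minus ResourceFreed as a rough indication of resources still in use, are assumptions made for this example rather than documented Cisco RPMS behavior.

```python
import csv
import io

# Sample output copied from the example above.
SAMPLE = """Data Updated at: Jul 24, 2002 2:01 PM, PDT
Statistics Name,Today,Since Restart
CallAccept,5873,265873
ResourceAllocated,65422,265832
ResourceFreed,65356,265356
ResourceUpdate,805,265805
VpdnSession,0,0
VpdnConnect,0,0
VpdnDisconnect,0,0
NasStart,0,0
NasUpdate,65,468
NasStop,0,0
BadPackets,0,0
UnknownPackets,0,0
NoCustomerFound,0,0
CallDiscriminations,0,0
"""

def parse_stats(text):
    """Return {name: (today, since_restart)} from crpms_diag stat style output."""
    body = "\n".join(text.splitlines()[1:])  # drop the "Data Updated at" banner
    reader = csv.DictReader(io.StringIO(body))
    return {r["Statistics Name"]: (int(r["Today"]), int(r["Since Restart"]))
            for r in reader}

stats = parse_stats(SAMPLE)

# Hypothetical watch list of counters that may deserve attention when non-zero.
WATCH = ("BadPackets", "UnknownPackets", "NoCustomerFound")
alerts = [name for name in WATCH if stats[name][0] > 0]

# Assumed interpretation: allocations not yet matched by a free today.
outstanding_today = stats["ResourceAllocated"][0] - stats["ResourceFreed"][0]
print(alerts, outstanding_today)
```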

Analyzing System-Wide Traffic

Cisco RPMS periodically records the total number of active calls in the system. The information is stored in the <crpms-home>/diagnostics/<date>/system/SystemActivity_<time>.csv file, which is generated daily.

The entries logged in this file are comma separated, so it is easy to import the file into a charting application such as Microsoft Excel, and to plot the active calls over time to get an idea about overall traffic patterns during the day. You can also analyze patterns over a period longer than a day by concatenating multiple files before charting them.

The data recorded in the SystemActivity files looks similar to this:

TimeStamp,Call Count
16:14:27,724
16:14:37,903
16:14:47,910
16:14:57,910
16:15:08,910
16:15:18,910
16:15:28,910
16:15:38,285
16:15:48,308
16:15:58,482
16:16:09,565
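Because the timestamps in these files carry only the time of day, concatenating several days of data works best if each row is tagged with the date of the file it came from. The Python sketch below illustrates the idea with two short in-memory samples and then finds the peak call count. The date labels and the split into two days are hypothetical; the call counts are reused from the sample above.

```python
import csv
import io

def load_activity(date, text):
    """Yield (date, time, call_count) tuples from one SystemActivity file's text."""
    for row in csv.DictReader(io.StringIO(text)):
        yield date, row["TimeStamp"], int(row["Call Count"])

# Hypothetical two-day split of the sample data.
day1 = "TimeStamp,Call Count\n16:14:27,724\n16:14:37,903\n16:14:47,910\n"
day2 = "TimeStamp,Call Count\n16:15:38,285\n16:15:48,308\n"

samples = (list(load_activity("2002-Jul-23", day1))
           + list(load_activity("2002-Jul-24", day2)))

# The first sample with the highest call count across both days.
peak = max(samples, key=lambda s: s[2])
print(peak)
```

In practice the text for each day would be read from the corresponding SystemActivity file, with the date taken from the directory name.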

Overview: Analyzing Customer-Specific Traffic

With the Cisco RPMS diagnostic tools you can also analyze customer-specific traffic. The information you can collect includes:

Call Count Over Time

Call Duration Distribution Over Time

Call Count Over Time

By using the Cisco RPMS diagnostic tools, you can profile a customer's traffic pattern over time. Cisco RPMS periodically records the number of active calls for each customer profile in the system. The information is generated daily and stored in the <crpms-home>/diagnostics/<date>/customer/<customer-name>/CustomerCounts_<time>.csv file.

The entries logged in this file are comma separated, so it is easy to import the file into a charting application such as Microsoft Excel, and to plot the active calls over time to get an idea about overall traffic patterns for a customer during the day. You can also analyze patterns over a period longer than a day by concatenating multiple files before charting them.

The data recorded in the CustomerCounts files looks similar to this:

Customer, Cust-1, Counts for Jul-23, 2002
"Timestamp","Session Limit","Oversubscription Limit","Current Count","Current Limit Deficit"
16:15:36,900,10,413,238
16:17:36,900,10,692,238
16:19:36,900,10,690,238
16:21:36,900,10,693,238
16:23:37,900,10,706,238
16:25:37,900,10,699,238
16:27:37,900,10,692,238
16:29:37,900,10,683,238
16:31:38,900,10,690,238
16:33:38,900,10,688,238
16:35:38,900,10,696,0
16:37:38,900,10,688,0
16:39:38,900,10,687,0
16:41:38,900,10,674,0
16:43:38,1000,10,696,0

Plotting the Session Limit and Current Count columns shows the time of day at which a customer profile went into oversubscription mode. The plot also reveals any provisioning changes made to the Session Limit for this profile; in the sample above, the limit changes from 900 to 1000 at 16:43:38.
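The same two observations can be extracted without a chart. The following Python sketch scans CustomerCounts data for changes to the Session Limit and for samples where the Current Count exceeds the Session Limit. Treating "Current Count greater than Session Limit" as the test for oversubscription mode is an assumption made for this example, and the function name is hypothetical. The rows are the last three from the sample above.

```python
import csv
import io

SAMPLE = '''Customer, Cust-1, Counts for Jul-23, 2002
"Timestamp","Session Limit","Oversubscription Limit","Current Count","Current Limit Deficit"
16:39:38,900,10,687,0
16:41:38,900,10,674,0
16:43:38,1000,10,696,0
'''

def analyze_counts(text):
    """Return (limit_changes, oversubscribed_times) from CustomerCounts text."""
    body = "\n".join(text.splitlines()[1:])  # drop the customer title line
    limit_changes, oversubscribed = [], []
    previous_limit = None
    for row in csv.DictReader(io.StringIO(body)):
        limit = int(row["Session Limit"])
        count = int(row["Current Count"])
        if previous_limit is not None and limit != previous_limit:
            limit_changes.append((row["Timestamp"], previous_limit, limit))
        if count > limit:  # assumed definition of oversubscription
            oversubscribed.append(row["Timestamp"])
        previous_limit = limit
    return limit_changes, oversubscribed

print(analyze_counts(SAMPLE))
```

None of the sample rows exceeds the Session Limit, so the second list is empty for this data.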

Call Duration Distribution Over Time

Cisco RPMS also periodically records the duration of terminated calls for each customer. The information is sorted into a number of time bins, generated daily, and stored in the <crpms-home>/diagnostics/<date>/customer/<customer-name>/CustomerCallDuration_<time>.csv file.

The entries logged in this file are comma separated, so it is easy to import the file into a charting application such as Microsoft Excel, and to plot the call duration distribution over time to get an idea of how long a customer's calls last during the day. You can also analyze patterns over a period longer than a day by concatenating multiple files before charting them.

The data recorded in the CustomerCallDuration files looks similar to:

Customer, Cust-1, Call Duration Distribution for Jul-23, 2002
"Timestamp","Closed Calls",%(0-1]min,%(1-2]min,%(2-4]min,%(4-8]min,%(8-16]min,%(16-32]min,%(32-64]min,%(64-128]min,%128+min
16:13:46,100,0,0,0,80,0,20,0,0,0
16:13:56,10,0,0,0,90,0,0,10,0,0
16:14:06,0,0,0,0,0,0,0,0,0,0

You can plot any or all of the bins. For the sample customer profile (Cust-1), a graph of all bins over a full day would show that end users stay online longer in the morning and in the late evening.
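Because each row reports percentages rather than counts, rows with few closed calls can distort a simple average of the bins. The Python sketch below converts each percentage back into an approximate number of calls and sums the result per bin. It assumes that each percentage applies to the Closed Calls value on the same row, which the sample suggests but does not state; the function name is hypothetical.

```python
import csv
import io

SAMPLE = '''Customer, Cust-1, Call Duration Distribution for Jul-23, 2002
"Timestamp","Closed Calls",%(0-1]min,%(1-2]min,%(2-4]min,%(4-8]min,%(8-16]min,%(16-32]min,%(32-64]min,%(64-128]min,%128+min
16:13:46,100,0,0,0,80,0,20,0,0,0
16:13:56,10,0,0,0,90,0,0,10,0,0
16:14:06,0,0,0,0,0,0,0,0,0,0
'''

def aggregate_bins(text):
    """Return {bin_name: approximate_call_count} summed over all rows."""
    body = "\n".join(text.splitlines()[1:])  # drop the customer title line
    rows = list(csv.reader(io.StringIO(body)))
    header, data = rows[0], rows[1:]
    bins = header[2:]  # columns after Timestamp and Closed Calls
    totals = dict.fromkeys(bins, 0.0)
    for row in data:
        closed = int(row[1])
        for name, pct in zip(bins, row[2:]):
            # Weight each percentage by the number of closed calls in the row.
            totals[name] += closed * float(pct) / 100.0
    return totals

totals = aggregate_bins(SAMPLE)
for name, calls in totals.items():
    if calls:
        print(name, calls)
```

For the sample rows, about 89 of the 110 closed calls fall in the 4 to 8 minute bin, 20 in the 16 to 32 minute bin, and 1 in the 32 to 64 minute bin.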