Bulk Statistics are the statistics that are gathered over a given time period and written to a set of files. These statistics can be used by external analytic processes and/or network management systems. The architecture of CPS bulk statistic collection is shown below.
The collection utility collectd is used for collecting and storing statistics from each VM. Detailed collectd documentation can be found on http://collectd.org/.
Collectd within CPS is deployed with nodes relaying data using the collectd network plug-in (https://collectd.org/wiki/index.php/Plugin:Network) to the centralized collection nodes on the pcrfclient01 and pcrfclient02 virtual machines. The centralized collector writes the collected data to output CSV files.
Note: Pcrfclient01 and pcrfclient02 collect bulk statistics independently. As a result, it is normal to have slight differences between the two sets of files. For example, pcrfclient01 will generate a file at time t and pcrfclient02 will generate a file at time t +/- the clock drift between the two machines.
As a best practice, always use the bulk statistics collected from pcrfclient01. Pcrfclient02 can be used as a backup in the event of failure of pcrfclient01.
In the event that pcrfclient01 becomes unavailable, statistics will still be gathered on pcrfclient02. Statistics data is not synchronized between pcrfclient01 and pcrfclient02, so a gap would exist in the collected statistics while pcrfclient01 is down.
For more information about using Grafana, refer to the Cisco Policy Suite Operations Guide.
The list of statistics available in CPS is consolidated in an Excel spreadsheet. After CPS is installed, this spreadsheet can be found in the following location on the Cluster Manager VM:
/var/qps/install/current/scripts/documents/QPS_statistics.xlsx
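If you prefer to inspect the spreadsheet programmatically rather than in Excel, a minimal sketch such as the following could be used (it assumes the openpyxl library is available; the sheet layout is not documented here, so the code only lists the sheet names and the first few rows of each sheet):

```python
# Sketch: inspect the QPS_statistics.xlsx spreadsheet programmatically.
# Assumes the openpyxl library is installed; the sheet layout is not
# documented in this section, so this only lists sheets and sample rows.
from openpyxl import load_workbook

SPREADSHEET = "/var/qps/install/current/scripts/documents/QPS_statistics.xlsx"

workbook = load_workbook(SPREADSHEET, read_only=True)
for sheet_name in workbook.sheetnames:
    sheet = workbook[sheet_name]
    print(f"=== {sheet_name} ===")
    for i, row in enumerate(sheet.iter_rows(values_only=True)):
        if i >= 5:          # print only the first five rows per sheet
            break
        print(row)
```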
The following diagram represents the various statistic gathering points for incoming and outgoing messages.
Measurements Legend
A – Inbound queue counts and times*
B – policy action counts and times
C – interface-specific counts and times
D – policy message counts and times
E – outbound queue counts and times*
F – round trip counts and times*
* – statistics apply only to Diameter messages
A brief description of each statistic gathering point is given below:
Upon receipt of a message on the Policy Director (lb) node, the message is registered as received and forwarded to a middle tier processing node.
This middle tier processing node tracks the inbound message counts and time spent within the inbound processing queue. If a message is discarded due to SLA violation, then counters are incremented at this point. This occurs at point A within the diagram.
Upon arrival within the policy engine all messages are counted and timers are started to measure the duration of processing.
Any internal or external actions are tracked at this point, and the round trip time is measured from the policy engine's invocation of the action to the success or failure of that action. This occurs at point B within the diagram.
For external actions (e.g. LDAP), interface-specific statistics may be captured. This occurs at point C in the diagram and is gathered from the Policy Director nodes.
Upon completion of message processing in the policy engine, the total elapsed time is measured, along with whether processing succeeded or failed.
Note: A message is considered a success even if the policy returns an error (such as 5002). These application errors are tracked at point D within the diagram.
Outbound messages are tracked from the policy engine to the Policy Directors at point E within the diagram.
Upon receipt of outbound messages, the Policy Director nodes either track the end-to-end completion time for inbound requests or start a timer and count outbound requests. This occurs at point F within the diagram.
This section describes various forms of statistics generated by CPS.
In Diameter statistics, monitoring areas are defined on the basis of the queues maintained within them. Diameter statistics can also be classified by whether the statistic is a counter or a gauge.
Counter: The counter type represents a non-negative integer that monotonically increases until it reaches a maximum value of 2^32-1 (4294967295 decimal), at which point it wraps around and starts increasing again from zero.
Counters have no defined "initial" value, so a single counter reading carries (in general) no information; you must take the delta of multiple readings to derive anything meaningful.
Gauge: The gauge type represents a non-negative integer that can increase or decrease, but can never exceed a maximum value nor fall below a minimum value. The maximum value cannot be greater than 2^32-1 (4294967295 decimal), and the minimum value cannot be smaller than 0.
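As an illustration only (not part of CPS), the following minimal Python sketch shows how a delta between two consecutive counter readings can be computed while accounting for the 32-bit wrap described above:

```python
# Sketch: delta between two consecutive 32-bit counter readings,
# accounting for the wrap at 2^32 - 1 described above.
COUNTER_MAX = 2**32  # counter wraps after 4294967295

def counter_delta(previous: int, current: int) -> int:
    """Return the increase between two readings of a monotonically
    increasing 32-bit counter, handling a single wrap-around."""
    if current >= previous:
        return current - previous
    # The counter wrapped between the two readings.
    return (COUNTER_MAX - previous) + current

# Example: the counter wrapped between readings 4294967290 and 15.
print(counter_delta(4294967290, 15))  # -> 21
```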
CPS tracks LDAP statistics for general LDAP actions, LDAP query counters, LDAP connection counters, as well as message counters.
Categories:
RADIUS server statistics are defined based on two categories:
System statistics are defined based on six categories:
Engine statistics are defined based on three categories:
API statistics are defined based on five categories: Bearer Count, Tenant Onboarding Count, Subscriber Onboarding Count, Authentication Count and Callback Response Statistics.
Counter for the number of default and dedicated bearers related to API requests.
Provides the statistics for default and dedicated bearers related to API requests.
Counter for the number of tenant onboarding related API requests.
Provides the statistics for tenant onboarding related API requests.
Counter for the number of subscriber onboarding related API requests.
Provides the statistics for subscriber onboarding related API requests.
| Error Statistics | Description |
| --- | --- |
| node1.messages.*.error | Failure processing a message |
| e2e*_qns_stat.error | Count of occurrences for a given Diameter result code |
| pe-submit-error | Error submitting to the policy engine |
| _bypass | Message not sent to the policy engine due to a successful response (2001) |
| _drop | Message dropped due to SLA violation |
| rate-limit | Message dropped due to rate limiting violation |
Note: The Diameter E2E statistics with the suffix "error" always have a value of 0 (zero) unless they have "_late" in the statistic name.
By default, CPS outputs a bulk statistics CSV file to the /var/broadhop/stats/ directory on the pcrfclient01 and pcrfclient02 VMs in five minute intervals.
The default naming standard is bulk-hostname-YYYY-MM-DD-HH-MI.csv
These CSV files include all statistics collected from all VMs during the 5 minute interval.
Note: If a statistic is generated by the system multiple times within the 5 minute interval, only the last measured value is written to the CSV file.
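As an illustration only (not a CPS utility), the following Python sketch enumerates the bulk statistics files and parses the timestamp embedded in each file name; the directory and the compact YYYYMMDDHHMI timestamp pattern are taken from the sample listing below:

```python
# Sketch: enumerate bulk statistics CSV files and parse the timestamp
# embedded in each file name (bulk-<hostname>-YYYYMMDDHHMI.csv, as seen
# in the sample directory listing below).
import glob
import os
import re
from datetime import datetime

STATS_DIR = "/var/broadhop/stats"          # default output directory
NAME_RE = re.compile(r"bulk-(?P<host>.+)-(?P<ts>\d{12})\.csv$")

def list_bulk_files(stats_dir: str = STATS_DIR):
    """Yield (hostname, timestamp, path) for each bulk statistics file."""
    for path in sorted(glob.glob(os.path.join(stats_dir, "bulk-*.csv"))):
        match = NAME_RE.search(os.path.basename(path))
        if not match:
            continue  # skip files that do not follow the naming standard
        ts = datetime.strptime(match.group("ts"), "%Y%m%d%H%M")
        yield match.group("host"), ts, path

for host, ts, path in list_bulk_files():
    print(host, ts.isoformat(), path)
```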
The following list is a sample of the file names created in the /var/broadhop/stats/ directory on the pcrfclient01 VM.
[root@pcrfclient01 stats]# pwd
/var/broadhop/stats
[root@pcrfclient01 stats]# ls
bulk-pcrfclient01-201510131350.csv  bulk-pcrfclient01-201510131355.csv  bulk-pcrfclient01-201510131400.csv
bulk-pcrfclient01-201510131405.csv  bulk-pcrfclient01-201510131410.csv  bulk-pcrfclient01-201510131415.csv
bulk-pcrfclient01-201510131420.csv  bulk-pcrfclient01-201510131425.csv  bulk-pcrfclient01-201510131430.csv
bulk-pcrfclient01-201510131435.csv  bulk-pcrfclient01-201510131440.csv  bulk-pcrfclient01-201510131445.csv
bulk-pcrfclient01-201510131450.csv  bulk-pcrfclient01-201510131455.csv  bulk-pcrfclient01-201510131500.csv
bulk-pcrfclient01-201510131505.csv  bulk-pcrfclient01-201510131510.csv  bulk-pcrfclient01-201510131515.csv
bulk-pcrfclient01-201510131520.csv  bulk-pcrfclient01-201510131525.csv  bulk-pcrfclient01-201510131530.csv
bulk-pcrfclient01-201510131535.csv  bulk-pcrfclient01-201510131540.csv  bulk-pcrfclient01-201510131545.csv
bulk-pcrfclient01-201510131550.csv  bulk-pcrfclient01-201510131555.csv  bulk-pcrfclient01-201510131600.csv
bulk-pcrfclient01-201510131605.csv  bulk-pcrfclient01-201510131610.csv  bulk-pcrfclient01-201510131615.csv
bulk-pcrfclient01-201510131620.csv  bulk-pcrfclient01-201510131625.csv  bulk-pcrfclient01-201510131630.csv
By default, CSV files are generated every 5 minutes. To change this interval:
Changing the interval to a lower value allows for easier identification of peaks and valleys in response time. However, only the last statistic measured during a 5 minute period is reported in the CSV file and this fact should be taken into account when interpreting the bulk statistics.
CPS retains each bulk statistic CSV file on the pcrfclient01/02 VM for 2 days, after which the file is automatically removed. If you need to preserve these CSV files, you must back up or move them to an alternate system.
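Because the files are removed after two days, a simple archiving job can be useful. The sketch below is one hypothetical approach (the archive directory is an assumption, not a CPS path): it copies any bulk statistics CSV files that are not yet present in the archive location.

```python
# Sketch: copy bulk statistics CSV files to an archive location before the
# 2-day retention window removes them. ARCHIVE_DIR is a hypothetical path.
import glob
import os
import shutil

STATS_DIR = "/var/broadhop/stats"
ARCHIVE_DIR = "/var/backup/bulkstats"      # assumption: any writable location

def archive_bulk_stats():
    os.makedirs(ARCHIVE_DIR, exist_ok=True)
    for path in glob.glob(os.path.join(STATS_DIR, "bulk-*.csv")):
        target = os.path.join(ARCHIVE_DIR, os.path.basename(path))
        if not os.path.exists(target):     # copy only files not yet archived
            shutil.copy2(path, target)

if __name__ == "__main__":
    archive_bulk_stats()
```

A job like this would need to run more frequently than the 2-day retention period, for example from cron.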
Configuration of the CPS application statistics is controlled in the /etc/collectd.d/logback.xml file.
Refer to http://logback.qos.ch/manual/appenders.html for more information about the configuration of the logback.xml file.
Collectd is configured in the following files:
After making any configuration changes to logback.xml, restart the collectd service:
service collectd restart
By default, the Diameter statistics that are generated do not include the realm names. To include realms in the statistics collected, add the following line to the qns.conf file (a comma-separated list of Auth-Application-Ids).
-Ddiameter.appid.realm.stats=Auth-Appl-Id-1,Auth-Appl-Id-2,… Auth-Appl-Id-n
where each Auth-Appl-Id refers to the specific protocol's Auth-Application-Id for which realms are needed in the statistics.
For example, to add Gx, Gy, Rx and Sy realms to the statistic names, use the following Auth-Appl-Ids:
-Ddiameter.appid.realm.stats=16777238,16777235,16777236,9
where
Gx Auth-Application-ID = 16777238
Rx Auth-Application-ID = 16777236
Gy Auth-Application-ID = 4
Sy Auth-Application-ID = 7
Note: Adding a realm increases the number of statistics generated and collected. Add realms only when necessary.
As an example, statistic names with and without realms are shown below for the following statistic format:
e2e_<domain>_[realm_][alias_]<message id>
Counter name with Realm (with qns.conf file modification):
C,lb02,node2.messages.e2e_PHONE_sy-ac.cisco.com_AC_Syp_AAR_2001.qns_stat.success,528
C,lb02,node2.messages.e2e_PHONE_sy-bm.cisco.com_BM_Syp_AAR_2001.qns_stat.success,1221
Counter name without Realm (without qns.conf file modification):
C,lb01,node2.messages.e2e_PHONE_AC_Syp_AAR_2001.qns_stat.success,1495
C,lb01,node2.messages.e2e_PHONE_BM_Syp_AAR_2001.qns_stat.success,4
Each statistic field has a fixed maximum length of 63 characters. Based on the current syntax, the realm should not exceed 16 characters; otherwise, the counter name is truncated.
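The following minimal sketch simply illustrates the length constraint described above; the 63-character field limit and the 16-character realm guideline are taken from the text, and the check itself is illustrative rather than a CPS tool:

```python
# Sketch: flag realm names that may cause counter-name truncation, based on
# the 63-character field limit and 16-character realm guideline noted above.
MAX_FIELD_LEN = 63     # fixed maximum length of each statistic field
MAX_REALM_LEN = 16     # recommended maximum realm length

def check_realm(realm: str) -> None:
    if len(realm) > MAX_REALM_LEN:
        print(f"WARNING: realm '{realm}' is {len(realm)} characters; "
              f"counter names may be truncated at {MAX_FIELD_LEN} characters")

check_realm("sy-ac.cisco.com")               # 15 characters: within the guideline
check_realm("very-long-realm.example.com")   # exceeds the guideline
```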
A sample bulk statistics .csv file is shown below:
C,node3.messages.e2e_Mobile_cisco.com_Gx_CCR-I_2001.qns_stat.success,99254
C,node3.messages.e2e_Mobile_cisco.com_Gx_CCR-I_2001.qns_stat.error,0
D,node3.messages.e2e_Mobile_cisco.com_Gx_CCR-I_2001.qns_stat.total_time_in_ms,1407
G,node3.messages.e2e_Mobile_cisco.com_Gx_CCR-I_2001.qns_stat.avg,0.0
C,node3.messages.e2e_cisco.com_Gx_RAR_late_5xxx.qns_stat.success,0
C,node3.messages.e2e_cisco.com_Gx_RAR_late_5xxx.qns_stat.error,99294
D,node3.messages.e2e_cisco.com_Gx_RAR_late_5xxx.qns_stat.total_time_in_ms,0
G,node3.messages.e2e_cisco.com_Gx_RAR_late_5xxx.qns_stat.avg,0.0
C,node3.messages.e2e_cisco.com_Gx_CCR-I_late_2001.qns_stat.success,0
C,node3.messages.e2e_cisco.com_Gx_CCR-I_late_2001.qns_stat.error,40
D,node3.messages.e2e_cisco.com_Gx_CCR-I_late_2001.qns_stat.total_time_in_ms,0
G,node3.messages.e2e_cisco.com_Gx_CCR-I_late_2001.qns_stat.avg,0.0
C,node3.messages.e2e_cisco.com_Gx_RAR_late_5002.qns_stat.success,0
C,node3.messages.e2e_cisco.com_Gx_RAR_late_5002.qns_stat.error,99294
D,node3.messages.e2e_cisco.com_Gx_RAR_late_5002.qns_stat.total_time_in_ms,0
G,node3.messages.e2e_cisco.com_Gx_RAR_late_5002.qns_stat.avg,0.0
C,node4.messages.e2e_Mobile_Sy_STR_2001.qns_stat.success,99290
C,node4.messages.e2e_Mobile_Sy_STR_2001.qns_stat.error,0
D,node4.messages.e2e_Mobile_Sy_STR_2001.qns_stat.total_time_in_ms,235
G,node4.messages.e2e_Mobile_Sy_STR_2001.qns_stat.avg,0.0
C,node4.messages.e2e_Mobile_Sy_SLR_2001.qns_stat.success,99290
C,node4.messages.e2e_Mobile_Sy_SLR_2001.qns_stat.error,0
D,node4.messages.e2e_Mobile_Sy_SLR_2001.qns_stat.total_time_in_ms,182
G,node4.messages.e2e_Mobile_Sy_SLR_2001.qns_stat.avg,0.0
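The sketch below shows one way such a file could be parsed for downstream analysis. The column interpretation (a type prefix such as C, D, or G, followed by the metric name and the value) is inferred from the sample above and should be verified against your actual files:

```python
# Sketch: parse bulk statistics rows of the form <type>,<metric...>,<value>.
# The type prefixes C, D, and G are assumed (from the sample above) to denote
# counter, derive, and gauge values; verify this against your own files.
import csv

def read_bulk_stats(path: str):
    """Yield (stat_type, metric_name, value) tuples from a bulk stats CSV."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 3:
                continue                  # skip malformed rows
            stat_type = row[0]
            metric = ",".join(row[1:-1])  # some rows include a host field
            value = float(row[-1])
            yield stat_type, metric, value

# Example: print only counter (C) rows with a non-zero value.
for stat_type, metric, value in read_bulk_stats(
        "/var/broadhop/stats/bulk-pcrfclient01-201510131350.csv"):
    if stat_type == "C" and value > 0:
        print(metric, value)
```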