CPS system and application statistics and Key Performance Indicators (KPI) are collected by the system and can be displayed using a browser-based graphical metrics tool. This chapter provides a high-level overview of the tools CPS uses to collect and display these statistics.
The list of statistics available in CPS is consolidated in an Excel spreadsheet. After CPS is installed, this spreadsheet can be found in the following location on the Cluster Manager VM:
/var/qps/install/current/scripts/documents/QPS_statistics.xlsx
Collectd clients running on all CPS Virtual Machines (such as Policy Server (QNS), Policy Director (LB), and sessionmgr) push data to the Collectd master on pcrfclient01. The Collectd master node in turn forwards the collected data to the Graphite database on pcrfclient01.
The Graphite database stores system-related statistics such as CPU usage, memory usage, and Ethernet interface statistics, as well as application message counters such as Gx, Gy, and Sp.
Pcrfclient01 and pcrfclient02 collect and store these bulk statistics independently.
As a best practice, always use the bulk statistics collected from pcrfclient01. Pcrfclient02 can be used as a backup if pcrfclient01 fails.
In the event that pcrfclient01 becomes unavailable, statistics will still be gathered on pcrfclient02. Statistics data is not synchronized between pcrfclient01 and pcrfclient02, so a gap would exist in the collected statistics while pcrfclient01 is down.
Note | It is normal to have slight differences between the data on pcrfclient01 and pcrfclient02. For example, pcrfclient01 will generate a file at time t and pcrfclient02 will generate a file at time t +/- clock drift between the two machines. |
To learn more about Graphite, refer to: http://graphite.readthedocs.org/en/latest/
For a list of all functions that can be used to transform, combine and perform computations on data stored in Graphite, refer to: http://graphite.readthedocs.org/en/latest/functions.html
Grafana is a third-party metrics dashboard and graph editor provided with CPS 7.0 and higher.
Grafana provides a graphical or text-based representation of statistics and counters collected in the Graphite database.
Note | Grafana supports a maximum of five concurrent users. |
This chapter provides information about the CPS implementation of Grafana. For more information about Grafana, or to access the general Grafana documentation, refer to: http://docs.grafana.org.
In CPS 7.0.5 and higher releases, users must be authenticated to access Grafana. No default users are provided. In order to access Grafana, you must add at least one user as described in the following sections.
The steps in the following sections describe how to add and delete users who are allowed view-only access to Grafana. To create or modify dashboards, refer to Grafana Administrative User.
After adding or deleting a Grafana user, manually copy the /var/broadhop/.htpasswd file from the pcrfclient01 VM to the pcrfclient02 VM.
Also, run /var/qps/bin/support/grafana_sync.sh to synchronize the information between the two OAM (pcrfclient) VMs.
There is no method to change the password for a Grafana user; you can only add and delete users. The change_passwd.sh script cannot be used to change the password for Grafana users.
Log on to the pcrfclient01 VM to perform any of the following operations.
For example, to delete the Grafana user user2:
/usr/bin/htpasswd -D /var/broadhop/.htpasswd user2
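The .htpasswd file referenced above is a plain-text file with one user:hash entry per line. As a quick way to confirm which Grafana users exist before adding or deleting one, a minimal sketch could be used (the list_htpasswd_users helper is illustrative, not part of CPS):

```python
def list_htpasswd_users(lines):
    """Return the usernames from htpasswd-format lines ("user:hash")."""
    users = []
    for line in lines:
        line = line.strip()
        if line and ":" in line:
            # Username is everything before the first colon.
            users.append(line.split(":", 1)[0])
    return users

# Example with file contents in htpasswd format (hashes are placeholders):
sample = [
    "user1:$apr1$abcdefgh$0123456789abcdefghijk",
    "user2:$apr1$ijklmnop$abcdef0123456789abcde",
]
print(list_htpasswd_users(sample))  # → ['user1', 'user2']
```

On pcrfclient01, the lines would come from reading /var/broadhop/.htpasswd.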
Use the following URL to access Grafana.
When prompted, enter the username and password of a user you created in Configure Grafana Users using CLI.
To create or modify dashboards in Grafana, you must log in as the Grafana administrative user.
Note | The steps mentioned here can be performed only by an administrative user. |
You can also change the rights of the user from the main page.
Note | The steps mentioned here can be performed only by an administrative user. |
Grafana supports multiple organizations in order to support a wide variety of deployment models, including using a single Grafana instance to provide service to multiple potentially untrusted Organizations.
In many cases, Grafana will be deployed with a single Organization. Each Organization can have one or more Data Sources. All Dashboards are owned by a particular Organization.
Note | The steps mentioned here can be performed only by an administrative user. |
Note | The steps mentioned here can be performed only by an administrative user. |
After an initial installation or after upgrading an existing CPS deployment which used Grafana, you must perform the steps in the following sections to validate the existing data sources.
By default, Grafana is configured to have two Data Sources, as shown below. Unless instructed by a Cisco representative, you do not need to modify these Data Sources.
After CPS is installed or upgraded, perform the following steps to verify the integrity of the data sources.
Step 1 | Log in as the Grafana Administrative User. |
Step 2 | Click Data Sources. If the data sources screen appears as shown below, proceed to Migrate Existing Grafana Dashboards. If there are errors or the screen does not appear as shown, refer to Repair Data Sources. |
Step 3 | To finalize these data source connections, click Edit, then click Save. Perform these actions separately for each data source. |
If a data source connection is missing or becomes corrupted, use the following steps to recreate the data sources.
Step 1 | Navigate to the Data Sources screen as described in Validate and Finalize Grafana Data Sources. |
Step 2 | Delete any existing corrupted data sources by clicking the red X. |
Step 3 | Click Add new at the top of the screen, then enter the following information:
Name: Graphite
Via UI Default: Select this checkbox.
Type: Graphite (default)
URL: http://127.0.0.1/graphite
Access: proxy (default)
Basic Auth: leave unchecked |
Step 4 | Click Add. |
Step 5 | Click Add new to create a second data source, then enter the following information:
Name: Elasticsearch
Via UI Default: Leave unchecked.
Type: Elasticsearch (via pulldown)
URL: http://127.0.0.1/elasticsearch
Access: proxy (default)
Basic Auth: leave unchecked
Index name: grafana-dash
Pattern: No pattern (default)
Time field name: @timestamp (default) |
Step 6 | Click Add. |
Step 7 | Click Save. The repair steps are complete. |
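The two data source settings entered above can also be captured in a small script for checking a deployment. A sketch under the assumption that the settings have been collected into plain dicts (the check_data_source helper is illustrative, not a CPS tool):

```python
# The expected settings, taken from the repair steps in this section.
EXPECTED_DATA_SOURCES = {
    "Graphite": {
        "type": "Graphite",
        "url": "http://127.0.0.1/graphite",
        "access": "proxy",
        "basic_auth": False,
    },
    "Elasticsearch": {
        "type": "Elasticsearch",
        "url": "http://127.0.0.1/elasticsearch",
        "access": "proxy",
        "basic_auth": False,
    },
}

def check_data_source(name, settings):
    """Return True if the given settings match the expected configuration."""
    expected = EXPECTED_DATA_SOURCES.get(name)
    return expected is not None and all(
        settings.get(key) == value for key, value in expected.items()
    )

print(check_data_source("Graphite", {
    "type": "Graphite", "url": "http://127.0.0.1/graphite",
    "access": "proxy", "basic_auth": False,
}))  # → True
```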
During an upgrade of CPS (and Grafana), saved dashboard templates remain intact.
After upgrading an existing CPS deployment, you must manually migrate any existing Grafana dashboards.
Step 1 | Sign in as the Grafana Administrative User. For more information, refer to Grafana Administrative User. |
Step 2 | Click Home at the top of the Grafana window and then click Import as shown below: |
Step 3 | In the Migrate dashboards section, verify that Elasticsearch Def (Elasticsearch Default via API) is listed, then click Import. |
Step 4 | All existing dashboards are imported and should now be available. |
Grafana enables you to create custom dashboards which provide graphical representations of data by fetching information from the Graphite database. Each dashboard is made up of panels spread across the screen in rows.
Note | CPS includes a series of preconfigured dashboard templates. To use these dashboards, refer to Updating Imported Templates. |
Step 1 | Sign in as a Grafana Administrative user. For more information, see Grafana Administrative User. |
Step 2 | Click Home at the top of the Grafana window and select New as shown below. A blank dashboard is created. |
Step 3 | At the top of the screen, click the gear icon, then click Settings. |
Step 4 | Provide a name for the dashboard and configure any other Dashboard settings. When you have finished, click the X icon in the upper right corner to close the settings screen. |
Step 5 | To add a graph to this dashboard, hover over the green box on the left side of the dashboard, then point to Add Panel, then click Graph. |
The following section describes the configuration of several useful dashboard panels that can be used while processing Application Messages. Configure the dashboard panel as shown in the screens below.
This dashboard panel lists the errors found during the processing of Application Messages. To configure the Total Error dashboard panel, create a panel named Total Error and configure its query as shown:
This dashboard panel displays the total delay in processing various Application Messages. To configure the Total Delay dashboard panel, create a panel named Total Delay and configure its query as shown:
This panel displays the total TPS of the CPS system. The total TPS count includes all protocols (Gx, Gy, Rx, Sy, LDAP, and so on). The panel can be configured as shown below:
Some of the preconfigured templates (such as the Diameter statistics panels) have metrics configured that are specific to a particular set of Diameter realms. These panels must be reconfigured to match customer-specific Diameter realms.
As a best practice, the internal Grafana database should be kept in sync between pcrfclient01 and pcrfclient02. This sync operation should be performed after any dashboard or Grafana user is migrated, updated, added or removed.
Under normal operating conditions, all Grafana operations occur from pcrfclient01. In the event of a pcrfclient01 failure, pcrfclient02 is used as backup, so keeping the database in sync provides a seamless user experience during a failover.
The following steps copy all configured Grafana dashboards, Grafana data sources, and Grafana users configured on pcrfclient01 to pcrfclient02.
Log in to the pcrfclient01 VM and run the following command:
/var/qps/bin/support/grafana_sync.sh
As a precaution, the existing database on pcrfclient02 is saved as a backup in the /var/lib/grafana directory.
Check if the following changes are already present in the jmxplugin.conf file. If already configured, then skip this section and move to configuring the Grafana dashboard.
Step 1 | Edit /etc/puppet/modules/qps/templates/collectd_worker/collectd.d/jmxplugin.conf on the Cluster Manager VM as described in the following steps. |
Step 2 | Verify that the JMX plugin is enabled. The following lines must be present in the jmxplugin.conf file:
# JVMArg has the classpath for the JMX jars
JVMArg "-Djava.class.path=/usr/share/collectd/java/collectd-api.jar:/usr/share/collectd/java/generic-jmx.jar"
# and the GenericJMX plugin is loaded
LoadPlugin "org.collectd.java.GenericJMX" |
Step 3 | Add an MBean entry for the garbage collector MBean in the GenericJMX plugin so that statistics from this MBean are collected:
# Garbage collector information
<MBean "garbage_collector">
  ObjectName "java.lang:type=GarbageCollector,*"
  InstancePrefix "gc-"
  InstanceFrom "name"
  <Value>
    Type "invocations"
    #InstancePrefix ""
    #InstanceFrom ""
    Table false
    Attribute "CollectionCount"
  </Value>
  <Value>
    Type "total_time_in_ms"
    InstancePrefix "collection_time"
    #InstanceFrom ""
    Table false
    Attribute "CollectionTime"
  </Value>
</MBean> |
Step 4 | For every <Connection> block in the jmxplugin.conf file, add an entry for the garbage collector MBean. For example:
<Connection>
  InstancePrefix "node1."
  ServiceURL "service:jmx:rmi:///jndi/rmi://localhost:9053/jmxrmi"
  Collect "garbage_collector"
  Collect "java-memory"
  Collect "thread"
  Collect "classes"
  Collect "qns-counters"
  Collect "qns-actions"
  Collect "qns-messages"
</Connection> |
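Because the garbage collector entry must be added to every Connection block, a quick sanity check can catch blocks that were missed. A minimal sketch using only the standard library (the connections_missing_gc helper is illustrative, not a CPS tool):

```python
import re

def connections_missing_gc(conf_text):
    """Return the InstancePrefix of each <Connection> block that does not
    collect the garbage_collector MBean."""
    missing = []
    for block in re.findall(r"<Connection>(.*?)</Connection>", conf_text, re.S):
        if 'Collect "garbage_collector"' not in block:
            prefix = re.search(r'InstancePrefix\s+"([^"]+)"', block)
            missing.append(prefix.group(1) if prefix else "?")
    return missing

# Example: node2 is missing the garbage collector entry.
sample = '''
<Connection>
  InstancePrefix "node1."
  Collect "garbage_collector"
</Connection>
<Connection>
  InstancePrefix "node2."
  Collect "java-memory"
</Connection>
'''
print(connections_missing_gc(sample))  # → ['node2.']
```

On the Cluster Manager, conf_text would be the contents of /etc/puppet/modules/qps/templates/collectd_worker/collectd.d/jmxplugin.conf.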
Step 5 | Save the changes to the jmxplugin.conf file, then synchronize the changes to all CPS VMs as follows: |
The frontend changes must be done in the Grafana GUI.
Step 1 | Create a new Grafana dashboard. For more information, see Manual Dashboard Configuration using Grafana. |
Step 2 | In the Metrics tab of the new dashboard, configure queries for GC-related KPIs. The query must be configured in the following format:
cisco.quantum.qps.<hostname>.node*.gc*.total_time_in_ms-collection_time
cisco.quantum.qps.<hostname>.node*.gc*.invocations
where <hostname> is a regular expression for the names of the hosts from which the KPI is to be reported. In a CPS All-in-One (AIO) deployment, the hostname is "lab". In a High Availability (HA) CPS deployment, KPIs need to be reported from all Policy Server (QNS) VMs. Assuming the Policy Server (QNS) VMs have "qns" in their hostname, the regular expression *qns* would report data for all VMs whose hostname contains "qns" (qns01, qns02, and so on). |
Step 3 | Save the dashboard by clicking the Save icon. |
Existing dashboard templates can be exported and imported between environments. This is useful for sharing Grafana dashboards with others.
This topic describes how to export a dashboard configuration to a file.
Step 1 | Sign in as a Grafana Administrative User. |
Step 2 | Open the dashboard to be exported. |
Step 3 | Click the gear icon at the top of the page, and then select Export to save the dashboard configuration on your local system. |
Step 4 | If prompted, select the location on your local system to save the dashboard template, and click OK. |
This topic describes how to import a dashboard from a file.
Step 1 | Sign in as a Grafana Administrative User. |
Step 2 | Click Home at the top of the Grafana window, and then click Import as shown below. |
Step 3 | Click Choose File. |
Step 4 | Select the dashboard template file on your local system and click Open. |
Step 5 | After the dashboard is loaded, click the disk icon (Save dashboard) at the top of the screen to save the dashboard. |
This topic describes how to export the data in a graph panel to a CSV file.
This feature generates a session consumption report and stores the data in a separate log. The total number of sessions allowed by the license, the total number of active sessions, and the total transactions per second are recorded at regular intervals in the log. The license count is derived from the license file, which contains the total number of sessions allowed by the license. The active session count and the transaction count are taken from Grafana using a Graphite query. Each report entry records the current time stamp along with the statistics values.
The session and TPS counts are collected from the Graphite API as a JSON response. The JSON response is then parsed to get the counters, which are logged into the consolidated log. Sample URLs and JSON responses are given below:
> curl "http://localhost:8008/render?target=cisco.quantum.qps.pcrfclient01.set_session_count_total.records&from=-20second&until=-0hour&format=json"
[{"target": "cisco.quantum.qps.localhost.set_session_count_total.records", "datapoints": [[3735.42, 1455148210], [3748.0, 1455148220]]}]
> curl "http://localhost:8008/render?target=sumSeries(cisco.quantum.*.*.node*.messages.e2e*.success)&from=-20second&until=-0hour&format=json"
[{"target": "sumSeries(cisco.quantum.*.*.node*.messages.e2e*.success)", "datapoints": [[2345.34324, 1455148210], [2453.23445453, 1455148220]]}]
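Responses like the ones above can be reduced to a single counter value by taking the latest non-null datapoint (Graphite datapoints are [value, timestamp] pairs). A minimal sketch using only the Python standard library (latest_value is an illustrative name, not part of CPS):

```python
import json

def latest_value(render_json):
    """Return the most recent non-null datapoint value from a Graphite
    /render JSON response, or None if no data is available."""
    series = json.loads(render_json)
    if not series:
        return None
    points = [p for p in series[0]["datapoints"] if p[0] is not None]
    return points[-1][0] if points else None

# Sample response taken from this section:
sample = ('[{"target": "cisco.quantum.qps.localhost.set_session_count_total'
          '.records", "datapoints": [[3735.42, 1455148210], '
          '[3748.0, 1455148220]]}]')
print(latest_value(sample))  # → 3748.0
```

In the feature itself, the JSON string would come from the curl calls shown above.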
Data logging is done using the logback mechanism. The consolidated data that is generated is stored in a separate log file named consolidated-sessions.log inside the /var/log/broadhop directory along with other logs. The data entries are appended to the log every 90 seconds. The logs generated are detailed and have the counter name and the current value with the time stamp.
The code pulls the JSON response from the Graphite API. This adds an average overhead of approximately 350 ms.
A log rotation policy is applied to the logs generated for the Session Consumption Report. The size limit for each log file is 100 MB, and the number of log files is limited to 5. The logs are rotated after reaching these limits. One file holds a little more than two years of data, so five such files can hold over ten years of data before the first file is replaced.
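The retention claim above can be sanity-checked with rough arithmetic. Assuming an average entry size of about 120 bytes (the text does not state the entry size) and one entry every 90 seconds:

```python
def years_per_file(file_size_bytes=100 * 1024 * 1024,
                   entry_bytes=120, interval_s=90):
    """Estimate how many years of data fit in one log file, given an
    assumed average entry size and the 90-second logging interval."""
    entries = file_size_bytes / entry_bytes       # entries per file
    seconds = entries * interval_s                # time span covered
    return seconds / (365 * 24 * 3600)            # convert to years

print(round(years_per_file(), 2))  # → 2.49
```

About 2.5 years per 100 MB file is consistent with the "a little more than two years" figure in the text; the exact value depends on the real average entry size.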
2016-02-15 20:30:01 - TPS_COUNT: 6440.497603 SESSION_COUNT: 200033.0 LICENSE_COUNT: 10000000
2016-02-15 20:31:31 - TPS_COUNT: 6428.235699999999 SESSION_COUNT: 201814.0 LICENSE_COUNT: 10000000
2016-02-15 20:33:01 - TPS_COUNT: 5838.386624000001 SESSION_COUNT: 204818.0 LICENSE_COUNT: 10000000
2016-02-15 20:34:31 - TPS_COUNT: 6266.777699999999 SESSION_COUNT: 208719.0 LICENSE_COUNT: 10000000
2016-02-15 20:36:01 - TPS_COUNT: 6001.863687 SESSION_COUNT: 211663.0 LICENSE_COUNT: 10000000
2016-02-15 20:37:31 - TPS_COUNT: 6528.9450540000025 SESSION_COUNT: 213976.0 LICENSE_COUNT: 10000000
2016-02-15 20:39:01 - TPS_COUNT: 6384.073428 SESSION_COUNT: 218851.0 LICENSE_COUNT: 10000000
2016-02-15 20:40:31 - TPS_COUNT: 6376.373494000002 SESSION_COUNT: 220515.0 LICENSE_COUNT: 10000000
2016-02-15 20:42:01 - TPS_COUNT: 6376.063389999998 SESSION_COUNT: 222308.0 LICENSE_COUNT: 10000000
2016-02-15 20:43:31 - TPS_COUNT: 6419.310694000001 SESSION_COUNT: 223146.0 LICENSE_COUNT: 10000000
2016-02-15 20:45:01 - TPS_COUNT: 6455.804928 SESSION_COUNT: 222546.0 LICENSE_COUNT: 10000000
2016-02-15 20:46:31 - TPS_COUNT: 6200.357029999999 SESSION_COUNT: 223786.0 LICENSE_COUNT: 10000000
2016-02-15 20:48:02 - TPS_COUNT: 6299.090987 SESSION_COUNT: 223973.0 LICENSE_COUNT: 10000000
2016-02-15 20:49:31 - TPS_COUNT: 6294.876452 SESSION_COUNT: 226629.0 LICENSE_COUNT: 10000000
2016-02-15 20:51:01 - TPS_COUNT: 6090.202965999999 SESSION_COUNT: 227581.0 LICENSE_COUNT: 10000000
2016-02-15 20:52:31 - TPS_COUNT: 6523.586347999997 SESSION_COUNT: 228450.0 LICENSE_COUNT: 10000000
2016-02-15 20:54:01 - TPS_COUNT: 5842.613997000001 SESSION_COUNT: 229334.0 LICENSE_COUNT: 10000000
2016-02-15 20:55:31 - TPS_COUNT: 6638.526543 SESSION_COUNT: 232683.0 LICENSE_COUNT: 10000000
2016-02-15 20:57:01 - TPS_COUNT: 6073.7797439999995 SESSION_COUNT: 230466.0 LICENSE_COUNT: 10000000
2016-02-15 20:58:31 - TPS_COUNT: 6354.272679999999 SESSION_COUNT: 234070.0 LICENSE_COUNT: 10000000
2016-02-15 21:00:03 - TPS_COUNT: 6217.872034999999 SESSION_COUNT: 236139.0 LICENSE_COUNT: 10000000