Clearing Procedures
Component Notifications
The following table provides the information related to clearing procedures for component notifications:
Application Notifications
The following section provides the information related to clearing procedures for application notifications:
License
- LMGRD related:
- License Usage Threshold Exceeded: This alarm is generated when the current session usage exceeds the License Usage Threshold Percentage value configured in the Policy Builder under . The CPS Alarm/Trap message contains the following keywords:
"InterfaceID=": this keyword indicates the threshold value.
"severity=": this keyword indicates the severity associated with the threshold. The severity value includes:
Alarm Code: 1111 - LICENSE_THRESHOLD
Table 2 License Usage Threshold Exceeded
Possible Cause: The current session usage exceeds the License Usage Threshold Percentage value.
Corrective Action:
Option 1: Purchase a license file with a larger licensed session count.
Option 2: Adjust the License Usage Threshold Percentage value configured in Policy Builder.
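The threshold condition behind this alarm can be sketched as a simple percentage comparison. This is only an illustration of the rule as described above: the function name, its parameters, and the way usage is computed against the licensed session count are assumptions, not CPS internals.

```python
def license_threshold_exceeded(current_sessions: int,
                               licensed_sessions: int,
                               threshold_percent: float) -> bool:
    """Return True when session usage exceeds the configured threshold percentage.

    All three inputs are hypothetical names for values an operator would
    read from the license and from Policy Builder.
    """
    if licensed_sessions <= 0:
        raise ValueError("licensed_sessions must be positive")
    usage_percent = 100.0 * current_sessions / licensed_sessions
    return usage_percent > threshold_percent
```

Under these assumptions, both corrective options make the check return False: Option 1 raises `licensed_sessions`, and Option 2 raises `threshold_percent`.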
- LicenseSessionCreation: This alarm is generated when CPS does not allow a new CPS session to be created.
Alarm Code: 1104 - ERROR_SESSION_CREATION
Table 3 LicenseSessionCreation
Possible Cause: CPS is running in Developer mode and the current session usage is > 100.
Corrective Action: Clear the 'DeveloperMode' flag by removing -Dcom.broadhop.developer.mode from the /etc/broadhop/qns.conf file.
CPS "CORE" license related error:
- InvalidLicense: This alarm is generated when the CPS license has an error. The error could be any of the following:
CPS Alarm/Trap message format:
"InterfaceID=": this keyword indicates the license name.
"license_state=": this keyword indicates the license state.
CPS-defined license states include:
Alarm Code: 1110 - ERROR_LICENSE
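When reading trap messages by script, the keywords above can be pulled out with a small key=value parser. The sketch below assumes the message carries whitespace- or comma-separated `key=value` tokens, which the text does not confirm; the function name and the sample values are hypothetical.

```python
import re

# Matches key=value where the value is either double-quoted or a bare token.
_PAIR = re.compile(r'(\w+)=("[^"]*"|[^\s,;]+)')

def parse_alarm_keywords(message: str) -> dict:
    """Collect key=value tokens from an alarm/trap message into a dict."""
    fields = {}
    for key, value in _PAIR.findall(message):
        fields[key] = value.strip('"')
    return fields

# Hypothetical message; the values are placeholders, not real CPS states.
sample = "InterfaceID=EXAMPLE_LICENSE license_state=EXAMPLE_STATE"
fields = parse_alarm_keywords(sample)
```

With this input, `fields["InterfaceID"]` holds the license name and `fields["license_state"]` holds the license state, mirroring the two keywords described above.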
- DeveloperMode: This alarm is generated when CPS is running in Developer Mode. CPS keeps reminding the user that the system is running in Developer Mode and explains how to clear it. While CPS is running in Developer Mode, the number of concurrent sessions is limited to 100.
Alarm/Trap message: Using Developer mode (100 session limit). To use a license file, remove -Dcom.broadhop.developer.mode from /etc/broadhop/qns.conf file.
Alarm Code: 1105 - ERROR_DEVELOPER_MODE
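Before restarting anything, it can help to confirm whether the flag named in the alarm message is still present in the configuration. The following is a minimal sketch: the function name is hypothetical, and treating lines that start with `#` as comments is an assumption about the file's syntax.

```python
DEV_MODE_FLAG = "-Dcom.broadhop.developer.mode"

def developer_mode_lines(conf_text: str) -> list:
    """Return (line_number, line) pairs for active lines that carry the flag."""
    hits = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # assumed comment syntax; such lines are ignored
        if DEV_MODE_FLAG in stripped:
            hits.append((number, stripped))
    return hits
```

Passing the contents of /etc/broadhop/qns.conf to `developer_mode_lines` yields an empty list once the flag has been removed.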
- Smart Licensing related:
- License Usage Threshold Exceeded: This alarm is generated when the current session usage exceeds the License Usage Threshold Percentage value configured in the Policy Builder under . The CPS Alarm/Trap message contains the following keywords:
"InterfaceID=": this keyword indicates the threshold value.
"severity=": this keyword indicates the severity associated with the threshold. The severity value includes:
Alarm Code: 1111 - LICENSE_THRESHOLD
Table 6 License Usage Threshold Exceeded
Possible Cause: The current session usage exceeds the License Usage Threshold Percentage value.
Corrective Action:
Option 1: Purchase a higher license session count.
Option 2: Adjust the License Usage Threshold Percentage value configured in Policy Builder.
- LicenseSessionCreation: This alarm is generated when CPS does not allow a new CPS session to be created.
Alarm Code: 1104 - ERROR_SESSION_CREATION
Table 7 LicenseSessionCreation Possible Cause
Corrective Action
- InvalidLicense: This alarm is generated when the CPS license status is not VALID. The error could be any of the following:
CPS Alarm/Trap message format:
"InterfaceID=": this keyword indicates the license name.
"license_state=": this keyword indicates the license state.
CPS-defined license states include:
Alarm Code: 1110 - ERROR_LICENSE
- DeveloperMode: This alarm is generated when CPS is running in Developer Mode. CPS keeps reminding the user that the system is running in Developer Mode and explains how to clear it. While CPS is running in Developer Mode, the number of concurrent sessions is limited to 100.
Alarm/Trap message: Using Developer mode (100 session limit). To use a license file, remove -Dcom.broadhop.developer.mode from /etc/broadhop/qns.conf file.
Alarm Code: 1105 - ERROR_DEVELOPER_MODE
Table 9 DeveloperMode
Possible Cause: CPS allows new sessions to be created. CPS is running in Developer Mode and the current CPS session usage is <= 100.
Message: Using Developer mode (100 session limit). To use a license file, remove -Dcom.broadhop.developer.mode from /etc/broadhop/qns.conf file.
Corrective Action: Clear the 'DeveloperMode' flag by removing -Dcom.broadhop.developer.mode from the /etc/broadhop/qns.conf file.
Other Alarms
- PoliciesNotConfigured: This alarm is generated when the policy engine cannot find any policies to apply while starting up. This may occur on a new system, but it requires immediate resolution for any system services to operate.
Alarm Code: 1001
This alarm is generated when the server is started or when a Publish operation is performed. As indicated by the down status, the policy configuration contains an error: conversion of the Policy Builder (PB) configuration to CPS rules failed. The message contains the error detail.
Table 10 PoliciesNotConfigured - 1001
Possible Cause: This event is raised when an exception occurs while converting policies to policy rules.
Message: 1001 Policies not configured.
The log file records the error message: Exception stack trace is logged.
Corrective Action: Take corrective action as per the log message, and correct the corresponding configuration error as mentioned in the logs.
Alarm Code: 1002
This alarm is generated when diagnostics.sh runs and reports the last policy configuration success/failure message.
The corresponding notification appears when the CPS rules converted from the Policy Builder configuration fail validation against "validation-rules".
Take corrective action as per the log message and the diagnostic result, and correct the corresponding configuration error as mentioned in the logs and the diagnostic result.
Table 11 PoliciesNotConfigured - 1002
Possible Cause: This event is raised when the policy engine is not initialized.
Message: Last policy configuration failed with the message: Policy engine is not initialized
The log file records the warning message: Policy engine is not initialized
Corrective Action: Make sure that the policy engine is initialized.
Possible Cause: This event occurs when a non-policy root object exists.
Message: Last policy configuration failed with the message: Policy XMI file contains non policy root object
The log file records the error message: Policy XML file contains non policy root object.
Corrective Action: Add a policy root object in Policies.
Possible Cause: This event occurs when the policy does not contain a root blueprint.
Message: Last policy configuration failed with the message: Policy Builder configurations does not have any Policies configured under Policies Tab.
The log file records the error message: Policy does not contain a root blueprint. Please add one under the policies tab.
Corrective Action: Add configurations in the Policies tab.
Possible Cause: This event occurs when a configured blueprint is missing.
Message: Last policy configuration failed with the message: There is a configured blueprint <configuredBlueprintId> for which the original blueprint is not found <originalBluePrintId>. You are missing software on your server that is installed in Policy Builder.
The log file records the error message: There is a configured blueprint <configuredBlueprintId> for which the original blueprint is not found <originalBluePrintId>. You are missing software on your server that is installed in Policy Builder.
Corrective Action: Make sure that the blueprints are installed.
Possible Cause: This event occurs when an error is detected while converting the Policy Builder configuration to CPS rules, either when the server restarts or when a Publish happens.
Message: Last policy configuration failed with the message: exception stack trace.
The log file records the error message: Exception stack trace is logged.
Corrective Action: Correct the policy configuration based on the exception.
- DiameterPeerDown: Diameter peer is down.
Alarm Code: 3001 - DIAMETER_PEER_DOWN
Table 12 DiameterPeerDown
Possible Cause: A down alarm is generated with no subsequent clear alarm; the peer may actually be down.
Corrective Action: Check the status of the Diameter peer. If it is down, troubleshoot the peer to return it to service.
Possible Cause: A down alarm is generated with no subsequent clear alarm; there may be a network connectivity issue.
Corrective Action: Check the status of the Diameter peer. If it is up, check the network connectivity between CPS and the Diameter peer. Each should be reachable from the other side.
Possible Cause: A down alarm is generated intermittently, followed by a clear alarm; there may be an intermittent network connectivity issue.
Corrective Action: Check the network connectivity between CPS and the Diameter peer for intermittent issues and troubleshoot the network connection.
Possible Cause: The alarm is raised after a recent PB configuration change; the PB configuration related to the Diameter peer may have been accidentally set incorrectly.
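The connectivity checks in the table above can be partly automated with a plain TCP connection attempt. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it only tests the direction from the machine where it runs to the peer, so an equivalent check from the peer side is still needed. The port to test is whatever your peer is configured with (3868 is the registered Diameter port).

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Running the check repeatedly over time, and recording when it flips between True and False, is one way to look for the intermittent connectivity case.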
- DiameterAllPeersDown: All Diameter peer connections configured in a given realm are DOWN (connection lost). The alarm identifies which realm is down. The alarm is cleared when at least one of the peers in that realm is available.
Alarm Code: 3002 - DIAMETER_ALL_PEERS_DOWN
Table 13 DiameterAllPeersDown
Possible Cause: A down alarm is generated with no subsequent clear alarm; all the peers may actually be down.
Corrective Action: Check the status of each Diameter peer. If a peer is down, troubleshoot it to return it to service.
Possible Cause: A down alarm is generated with no subsequent clear alarm; there may be a network connectivity issue.
Corrective Action: Check the status of each Diameter peer. If the peers are up, check the network connectivity between CPS and each Diameter peer. Each should be reachable from the other side.
Possible Cause: A down alarm is generated intermittently, followed by a clear alarm; there may be an intermittent network connectivity issue.
Corrective Action: Check the network connectivity between CPS and the Diameter peers for intermittent issues and troubleshoot the network connection.
Possible Cause: The alarm is raised after a recent PB configuration change; the PB configuration related to the Diameter peers may be incorrect.
- DiameterStackNotStarted: This alarm is generated when the Diameter stack cannot start on a particular policy director (load balancer) due to a configuration issue.
Alarm Code: 3004 - DIAMETER_STACK_NOT_STARTED
- SMSC server connection down: SMSC Server is not reachable. This alarm is generated when any one of the configured active SMSC server endpoints is not reachable, so CPS cannot deliver an SMS via that SMSC server.
Alarm Code: 5001 - SMSC_SERVER_CONNECTION_STATUS
Table 15 SMSC server connection down
Possible Cause: A down alarm is generated with no subsequent clear alarm; the SMSC Server may actually be down.
Corrective Action: Check the status of the SMSC Server. If it is down, troubleshoot the server to return it to service.
Possible Cause: A down alarm is generated with no subsequent clear alarm; there may be a network connectivity issue.
Corrective Action: Check the status of the SMSC Server. If it is up, check the network connectivity between CPS and the server. Each should be reachable from the other side.
Possible Cause: A down alarm is generated intermittently, followed by a clear alarm; there may be an intermittent network connectivity issue.
Corrective Action: Check the network connectivity between CPS and the SMSC Server for intermittent issues and troubleshoot the network connection.
Possible Cause: The alarm is raised after a recent PB configuration change; the PB configuration related to the SMSC Server may be incorrect.
- All SMSC server connections are down: None of the configured SMSC servers is reachable. This Critical alarm is generated when no SMSC Server endpoint is available to submit SMS messages, which blocks SMS from being sent from CPS.
Alarm Code: 5002 - ALL_SMSC_SERVER_CONNECTION_STATUS
Table 16 All SMSC server connections are down
Possible Cause: A down alarm is generated with no subsequent clear alarm; all the SMSC Servers may actually be down.
Corrective Action: Check the status of each SMSC Server. If a server is down, troubleshoot it to return it to service.
Possible Cause: A down alarm is generated with no subsequent clear alarm; there may be a network connectivity issue.
Corrective Action: Check the status of each SMSC Server. If the servers are up, check the network connectivity between CPS and each SMSC Server. Each should be reachable from the other side.
Possible Cause: A down alarm is generated intermittently, followed by a clear alarm; there may be an intermittent network connectivity issue.
Corrective Action: Check the network connectivity between CPS and the SMSC Servers for intermittent issues and troubleshoot the network connection.
Possible Cause: The alarm is raised after a recent PB configuration change; the PB configuration related to the SMSC Servers may be incorrect.
- Email Server not reachable: Email server is not reachable. This alarm (Major) is generated when any of the configured Email Server endpoints is not reachable. CPS cannot use that server to send emails.
Alarm Code: 5003 - EMAIL_SERVER_STATUS
Table 17 Email server is not reachable
Possible Cause: A down alarm is generated with no subsequent clear alarm; the Email Server may actually be down.
Corrective Action: Check the status of the Email Server. If it is down, troubleshoot the server to return it to service.
Possible Cause: A down alarm is generated with no subsequent clear alarm; there may be a network connectivity issue.
Corrective Action: Check the status of the Email Server. If it is up, check the network connectivity between CPS and the Email Server. Each should be reachable from the other side.
Possible Cause: A down alarm is generated intermittently, followed by a clear alarm; there may be an intermittent network connectivity issue.
Corrective Action: Check the network connectivity between CPS and the Email Server for intermittent issues and troubleshoot the network connection.
Possible Cause: The alarm is raised after a recent PB configuration change; the PB configuration related to the Email Server may be incorrect.
- All Email servers not reachable: No email server is reachable. This alarm (Critical) is generated when none of the configured Email Server endpoints is reachable, which blocks emails from being sent from CPS.
Alarm Code: 5004 - ALL_EMAIL_SERVER_STATUS
- MemcachedConnectError: This alarm is generated if attempting to connect to or write to the memcached server causes an exception.
Alarm Code: 1102 - MEMCACHED_CONNECT_ERROR
- ZeroMQConnectionError: Internal services cannot connect to a required Java ZeroMQ queue. Although retry logic and recovery are available, and core system functions should continue, investigate and remedy the root cause.
Alarm Code: 3501 - ZEROMQ_CONNECTION_ERROR
- LdapAllPeersDown: All LDAP peers are down.
Alarm Code: 1201 - LDAP_ALL_PEERS_DOWN
Table 21 LdapAllPeersDown
Possible Cause: All LDAP servers are down.
Corrective Action: Check whether the external LDAP servers are up and whether the LDAP server processes are up. If not, bring up the servers and the respective server processes.
Possible Cause: Connectivity issues from the Policy Director (LB) to the LDAP servers.
Corrective Action: Check the connectivity from the Policy Director (LB) to the LDAP server. Check (using ping/telnet) whether the LDAP server is reachable from the Policy Director (LB) VM. If not, fix the connectivity issues.
- LdapPeerDown: LDAP peer identified by the IP address is down.
Alarm Code: 1202 - LDAP_PEER_DOWN
Table 22 LdapPeerDown
Possible Cause: The LDAP server mentioned in the alarm message is down.
Corrective Action: Check whether the mentioned external LDAP server is up and whether the LDAP server process is up on that server. If not, bring up the server and the server processes.
Possible Cause: Connectivity issues from the Policy Director (LB) to the LDAP server address mentioned in the alarm.
Corrective Action: Check the connectivity from the Policy Director (LB) to the mentioned LDAP server. Check (using ping/telnet) whether the LDAP server is reachable from the Policy Director (LB) VM. If not, fix the connectivity issues.
- ApplicationStartError: This alarm is generated if an installed feature cannot start.
Alarm Code: 1103
- VirtualInterface Down: This alarm is generated when the internal Policy Director (LB) VIP virtual interface does not respond to a ping.
Alarm Code: 7405
Table 24 VirtualInterface Down
Possible Cause: This alarm is generated when the internal Policy Director (LB) VIP virtual interface does not respond to a ping. Corosync detects this and moves the VIP interface to another Policy Director (LB). The alarm then clears when the other node takes over and a VirtualInterface Up trap is sent.
Corrective Action: No action is required, since the alarm is cleared automatically as long as a working Policy Director (LB) node gets the VIP address.
Possible Cause: This alarm is generated when the internal Policy Director (LB) VIP virtual interface does not respond to a ping and selection of a new VIP host fails.
Corrective Action:
- Run diagnostics.sh on Cluster Manager as the root user to check for any failures on the Policy Director (LB) nodes.
- Make sure that both policy director nodes are running. If problems are noted, refer to the CPS Troubleshooting Guide for the further steps required to restore policy director node function.
- After all the policy directors are up, if the trap still does not clear, restart corosync on all policy directors using the monit restart corosync command.
- VM Down: This alarm is generated when the administrator is not able to ping the VM.
Alarm Code: 7401
Table 25 VM Down
Possible Cause: This alarm is generated when a VM listed in /etc/hosts does not respond to a ping.
- No Primary DB Member Found: This alarm is generated when the system is unable to find a primary member for the replica-set.
Alarm Code: 7101
Table 26 No Primary DB Member Found
Possible Cause: This alarm is generated during mongo failover or when the majority of replica-set members are not available.
Corrective Action:
- Log in to the pcrfclient01/02 VM and verify the replica-set status:
diagnostics.sh --get_replica_status
- If a member is not running, start the mongo process on each sessionmgr/arbiter VM.
For example, /etc/init.d/sessionmgr-port start
Note: Change the port number (port) according to your deployment.
- Verify the mongo process. If the process does not come up, check the mongo logs for further debugging.
For example, /var/log/mongodb-port.log
Note: Change the port number (port) according to your deployment.
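The port substitution described in the notes above can be sketched as a small helper that fills the port into the two path patterns shown. The function name and the dictionary keys are hypothetical; only the path patterns come from the steps above.

```python
def mongo_paths(port: int) -> dict:
    """Build the start command and log file path for one mongo port."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return {
        # Pattern from the text: /etc/init.d/sessionmgr-port start
        "start_command": f"/etc/init.d/sessionmgr-{port} start",
        # Pattern from the text: /var/log/mongodb-port.log
        "log_file": f"/var/log/mongodb-{port}.log",
    }
```

For a replica-set member on port 27720, `mongo_paths(27720)["start_command"]` gives `/etc/init.d/sessionmgr-27720 start`.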
- Arbiter Down: This alarm is generated when the arbiter member of the replica-set is not reachable.
Alarm Code: 7103
Table 27 Arbiter Down
Possible Cause: This alarm is generated when the arbiter VM fails abruptly and does not come up for some unspecified reason (in HA, the arbiter VM is pcrfclient01/02; for GR, it is at the third site or depends on the deployment model).
Corrective Action:
- Log in to the pcrfclient01/02 VM and verify the replica-set status:
diagnostics.sh --get_replica_status
- Log in to the arbiter VM for which the alarm was generated.
- Check the status of the mongo port for which the alarm was generated.
For example, ps -ef | grep 27720
- If the member is not running, start the mongo process.
For example, /etc/init.d/sessionmgr-27720 start
- Verify the mongo process. If the process does not come up, check the mongo logs for further debugging.
For example, /var/log/mongodb-port.log
Note: Change the port number (port) according to your deployment.
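The process check in the steps above (`ps -ef | grep 27720`) can also be applied to captured `ps` output, which is convenient when collecting status from several VMs. This is a rough sketch with a hypothetical function name; like the `grep` it mirrors, it matches the port number anywhere in the line, so a coincidental match (for example in a PID) is possible.

```python
def processes_for_port(ps_output: str, port: int) -> list:
    """Return ps lines that mention the port, skipping grep's own entry."""
    needle = str(port)
    return [
        line
        for line in ps_output.splitlines()
        if needle in line and "grep" not in line
    ]
```

The `ps_output` text could come from `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`; an empty result suggests that no process is running for that port.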
- Config Server Down: This alarm is generated when the configuration server for the replica-set is unreachable. This alarm is not valid for non-sharded replica-sets.
Alarm Code: 7104
Table 28 Config Server Down
Possible Cause: This alarm is generated when the configServer VM fails abruptly (when mongo sharding is enabled) and does not come up for some unspecified reason.
Corrective Action:
- Log in to the pcrfclient01/02 VM and verify the shard health status:
diagnostics.sh --get_shard_health <dbname>
- Check the status of the mongo port for which the alarm was generated.
For example, ps -ef | grep 27720
- If the member is not running, start the mongo process.
For example, /etc/init.d/sessionmgr-27720 start
- Verify the mongo process. If the process does not come up, check the mongo logs for further debugging.
For example, /var/log/mongodb-port.log
Note: Change the port number (port) according to your deployment.
- All DB Member of replica set Down: This alarm is generated when the system is not able to connect to any member of the replica-set.
Alarm Code: 7105
Table 29 All DB Member of replica set Down
Possible Cause: This alarm is generated when all sessionmgr VMs fail abruptly and do not come up for some unspecified reason, or when all members are down.
Corrective Action:
- Log in to the pcrfclient01/02 VM and verify the replica-set status:
diagnostics.sh --get_replica_status
- If a member is not running, start the mongo process on each sessionmgr/arbiter VM.
For example, /etc/init.d/sessionmgr-port start
Note: Change the port number (port) according to your deployment.
- Verify the mongo process. If the process does not come up, check the mongo logs for further debugging.
For example, /var/log/mongodb-port.log
Note: Change the port number (port) according to your deployment.
- DB resync is needed: This alarm is generated whenever a manual resynchronization of a database is required to recover from a failure.
Alarm Code: 7106
Table 30 DB resync is needed
Possible Cause: This alarm is generated whenever a secondary member of a mongo database replica-set does not recover automatically after a failure. For example, the sessionmgr VM is down for a long time and the secondary member does not recover after the VM comes back.
- QNS Process Down: This alarm is generated when the Policy Server (QNS) java process is down.
Alarm Code: 7301
Table 31 QNS Process Down
Possible Cause: This alarm is generated if the Policy Server (QNS) process on one of the CPS VMs is down.
- Gx Message processing Dropped: This alarm is generated for Gx messages CCR-I, CCR-U and CCR-T when processing of messages drops below 95% on qnsXX VM.
Alarm Code: 7302
Table 32 Gx Message processing Dropped
Corrective Action:
- Log in to the Grafana dashboard and check for any Gx message processing trend.
- Check CPU utilization on all the Policy Server (QNS) VMs via the Grafana dashboard.
- Log in to the pcrfclient01/02 VM and check the mongo database health:
diagnostics.sh --get_replica_status
- Check for any unusual exceptions in the consolidated policy server (qns) and mongo logs.
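When reviewing the Gx processing trend, the 95% condition can be reproduced from raw counters. The sketch below assumes the percentage is processed messages divided by received messages, which the text does not state; the function and parameter names are hypothetical.

```python
def gx_processing_dropped(received: int, processed: int,
                          minimum_percent: float = 95.0) -> bool:
    """Return True when the processed share of Gx messages is below the minimum."""
    if received == 0:
        return False  # nothing received, so nothing can have been dropped
    processed_percent = 100.0 * processed / received
    return processed_percent < minimum_percent
```

Evaluating this per message type (CCR-I, CCR-U, CCR-T) and per qnsXX VM narrows down where the drop occurs.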
- Gx Average Message processing Dropped: This alarm is generated for Gx messages CCR-I, CCR-U and CCR-T when the average message processing time is above 20ms on qnsXX VM.
Alarm Code: 7303
Table 33 Average Gx Message processing Dropped
Corrective Action:
- Log in to the Grafana dashboard and check for any Gx message processing trend.
- Check CPU utilization on all the Policy Server (QNS) VMs via the Grafana dashboard.
- Log in to the pcrfclient01/02 VM and check the mongo database health:
diagnostics.sh --get_replica_status
- Check for any unusual exceptions in the consolidated policy server (qns) and mongo logs.
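To see which Gx message type is behind the 20ms condition, per-type latency samples can be averaged and compared against the limit. This is a sketch under assumptions: the function name and the shape of the input (a mapping from message type to latency samples in milliseconds) are hypothetical and not how CPS stores its statistics.

```python
from statistics import mean

def slow_gx_message_types(samples_ms: dict, limit_ms: float = 20.0) -> list:
    """Return the message types whose average latency exceeds the limit."""
    return sorted(
        name
        for name, values in samples_ms.items()
        if values and mean(values) > limit_ms  # skip types with no samples
    )
```

A call such as `slow_gx_message_types({"CCR-I": [10, 40], "CCR-U": [5, 15], "CCR-T": []})` flags only the types whose mean is above the limit.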
- Percentage of LDAP retry threshold Exceeded: This alarm is generated for LDAP search queries when LDAP retries exceed 10% of total LDAP queries on qnsXX VM.
Alarm Code: 7304
Table 34 Percentage of LDAP retry threshold Exceeded
Possible Cause: Multiple LDAP servers are configured and LDAP servers are down.
- LDAP Requests as percentage of CCR-I Dropped: This alarm is generated for LDAP operations when LDAP requests as a percentage of CCR-I (Gx messages) drop below 25% on qnsXX VM.
Alarm Code: 7305
Table 35 LDAP Requests as percentage of CCR-I Dropped Possible Cause
Corrective Action
- LDAP Query Result Dropped: This alarm is generated when the LDAP Query Result goes to 0 on qnsXX VM.
Alarm Code: 7306
Table 36 LDAP Query Result Dropped
Possible Cause: Multiple LDAP servers are configured and LDAP servers are down.
- LDAP Request Dropped: This alarm is generated for LDAP operations when LDAP requests drop below 0 on lbXX VM.
Alarm Code: 7307
Table 37 LDAP Request Dropped
Possible Cause: Gx traffic to the CPS system has increased beyond system capacity.
- Binding Not Available at Policy DRA: This alarm is generated when the IPv6 binding for sessions is not found at Policy DRA. Only one notification is sent out whenever this condition is detected.
Alarm Code: 6001
Table 38 Binding Not Available at Policy DRA Possible Cause
Corrective Action
Binding Not Available at Policy DRA
This alarm is generated whenever binding database at Policy DRA is down.
This alarm gets cleared automatically after the time configured in Policy Builder (
is reached.