| DRA_MESSAGE_PROCESSING_FAILURE_TPS_EXCEEDED  | Critical  | Message Processing Failure TPS exceeded, current value is {{ $value }}.  | TPS of messages rejected by the DRA Director (any message with Result-Code != 2001)  | 
                                 
                                    | Clear  | Message Processing Failure TPS in control.  | 
                                 
                                     | DRA_DIRECTOR_TPS_EXCEEDED  | Critical  | {{ $labels.instance }} Director TPS exceeded, current value is {{ $value }}.  | Total TPS of successful messages at the DRA Director (Result-Code = 2001)  | 
                                 
                                     | Clear  | {{ $labels.instance }} Director TPS in control.  | 
                                 
                                     | DRA_WORKER_TPS_EXCEEDED  | Critical  | {{ $labels.instance }} Worker TPS exceeded, current value is {{ $value }}.  | Total Worker TPS  | 
                                 
                                    | Clear  | {{ $labels.instance }} Worker TPS in control.  | 
                                 
                                     | DRA_DB_TPS_EXCEEDED  | Critical  | {{ $labels.instance }} Persistence DB TPS exceeded, current value is {{ $value }}.  | TPS of DB operations (Query and Update)  | 
                                 
                                    | Clear  | {{ $labels.instance }} Persistence DB TPS in control.  | 
                                 
                                     | DIAMETER_UNABLE_TO_DELIVER_TPS_EXCEEDED  | Critical  | UNABLE_TO_DELIVER TPS exceeded, current value is {{ $value }}.  | TPS of Diameter 3002  | 
                                 
                                    | Clear  | UNABLE_TO_DELIVER in control.  | 
                                 
                                     | DIAMETER_TRANSIENT_FAILURE_TPS_EXCEEDED  | Critical  | TRANSIENT_FAILURE TPS exceeded, current value is {{ $value }}.  | TPS of Diameter 4xxx  | 
                                 
                                    | Clear  | TRANSIENT_FAILURE in control.  | 
                                 
                                     | DIAMETER_UNKNOWN_SESSIONS_TPS_EXCEEDED  | Critical  | UNKNOWN_SESSIONS TPS exceeded, current value is {{ $value }}.  | TPS of Diameter 5002  | 
                                 
                                    | Clear  | UNKNOWN_SESSIONS in control.  | 
                                 
                                     | MISMATCH_REQUEST_RESPONSE  | Critical  | {{ $labels.remote_peer }} MISMATCH_REQUEST_RESPONSE exceeded, current value is {{ $value }}.  | Mismatch in the rate of requests and responses (discrepancy between ingress and egress)  | 
                                 
                                     | Clear  | {{ $labels.remote_peer }} MISMATCH_REQUEST_RESPONSE in control.  | 
                                 
                                     | KEEP_ALIVE_RAR_ROUTING_FAILURE_TPS_EXCEEDED  | Critical  | Keep Alive RAR TPS exceeded, current value is {{ $value }}.  | TPS of Keep Alive RAR Routing (Stale RAR)  | 
                                 
                                    | Clear  | Keep Alive RAR TPS in control.  | 
                                 
                                     | EGRESS_RATE_LIMITED_SESSION_ERR_RESP_TPS_EXCEEDED  | Critical  | {{ $labels.local_peer }} {{ $labels.remote_peer }} Egress rate limited messages with error response TPS exceeded, current value is {{ $value }}.  | TPS of Rate Limited Response for Error  | 
                                 
                                    | Clear  | {{ $labels.local_peer }} {{ $labels.remote_peer }} Egress rate limited messages with error response TPS in control.  | 
                                 
                                     | EGRESS_RATE_LIMITED_SESSION_REJECT_TPS_EXCEEDED  | Critical  | {{ $labels.local_peer }} {{ $labels.remote_peer }} Egress rate limited messages dropped without error TPS exceeded, current value is {{ $value }}.  | TPS of Rate Limited Response Rejected  | 
                                 
                                     | Clear  | {{ $labels.local_peer }} {{ $labels.remote_peer }} Egress rate limited messages dropped without error TPS in control.  | 
                                 
                                     | INGRESS_RATE_LIMITED_SESSION_ERR_RESP_TPS_EXCEEDED  | Critical  | {{ $labels.local_peer }} {{ $labels.remote_peer }} Ingress rate limited messages with error response TPS exceeded, current value is {{ $value }}.  | TPS of Rate Limited Response Error - Ingress  | 
                                 
                                     | Clear  | {{ $labels.local_peer }} {{ $labels.remote_peer }} Ingress rate limited messages with error response TPS in control.  | 
                                 
                                     | INGRESS_RATE_LIMITED_SESSION_REJECT_TPS_EXCEEDED  | Critical  | {{ $labels.local_peer }} {{ $labels.remote_peer }} Ingress rate limited messages dropped without error response TPS exceeded, current value is {{ $value }}.  | TPS of Rate Limited Response Rejected - Ingress  | 
                                 
                                     | Clear  | {{ $labels.local_peer }} {{ $labels.remote_peer }} Ingress rate limited messages dropped without error response TPS in control.  | 
                                 
                                     | BINDING_STORAGE_ERRORS_TPS_EXCEEDED  | Critical  | Binding Store Error TPS exceeded, current value is {{ $value }}.  | TPS of Binding Storage Errors (binding storage failed because of high load or any other database error)  | 
                                 
                                    | Clear  | Binding Store Error TPS in control.  | 
                                 
                                     | BINDING_LOOKUP_ERROR_TPS_EXCEEDED  | Critical  | Binding Lookup Error TPS exceeded, current value is {{ $value }}.  | TPS of Binding Lookup Errors (binding retrieval failed because of an internal error)  | 
                                 
                                    | Clear  | Binding Lookup Error TPS in control.  | 
                                 
                                     | DB_ERR_TPS_EXCEEDED  | Critical  | All DB Errors TPS exceeded, current value is {{ $value }}.  | TPS of all database errors  | 
                                 
                                    | Clear  | All DB Errors TPS in control.  | 
                                 
                                     | DB_RESPONSE_TIME_EXCEEDED  | Critical  | {{ $labels.instance }} DB Response Time exceeded, current value is {{ $value }}.  | DB response time exceeded (the database Query/Update operation time exceeds the threshold)  | 
                                 
                                    | Clear  | {{ $labels.instance }} DB Response Time in control, current value is {{ $value }}.  | 
                                 
                                     | BINDING_KEY_NOT_FOUND_IN_AAR_TPS_EXCEEDED  | Critical  | {{ $labels.origin_host }} Binding Key not found in AAR TPS exceeded, current value is {{ $value }}.  | TPS of Binding Key Not Found in AAR (an AAR is received with no "imsi+apn/msisdn/ipv6")  | 
                                 
                                     | Clear  | {{ $labels.origin_host }} Binding Key not found in AAR TPS in control.  | 
                                 
                                     | BINDING_KEY_NOT_FOUND_IN_CCR_I_TPS_EXCEEDED  | Critical  | {{ $labels.origin_host }} Binding Key not found in CCR(I) TPS exceeded, current value is {{ $value }}.  | TPS of Binding Key Not Found in CCR-I (a CCR-I is received with no "imsi+apn/msisdn/ipv6")  | 
                                 
                                     | Clear  | {{ $labels.origin_host }} Binding Key not found in CCR(I) TPS in control.  | 
                                 
                                     | BINDING_NOT_FOUND_TPS_EXCEEDED  | Critical  | {{ $labels.origin_host }} Binding not found TPS exceeded, current value is {{ $value }}.  | TPS of Binding Not Found  | 
                                 
                                     | Clear  | {{ $labels.origin_host }} Binding not found TPS in control.  | 
                                 
                                     | BINDING_DB_INCONSISTENT_TPS_EXCEEDED  | Critical  | TPS AAR with Result Code 5065 exceeded, current value is {{ $value }}.  | TPS of AAR with Result Code 5065  | 
                                 
                                    | Clear  | TPS AAR with Result Code 5065 in control.  | 
                                 
                                     | BINDING_SESSION_DB_SIZE_EXCEEDED  | Critical  | {{ $labels.db }} size exceeded, current value is {{ $value }}.  | Total Size of Session DB Exceeded  | 
                                 
                                    | Clear  | {{ $labels.db }} size in control.  | 
                                 
                                     | BINDING_IMSI_APN_DB_SIZE_EXCEEDED  | Critical  | {{ $labels.db }} size exceeded, current value is {{ $value }}.  | Total Size of IMSI / APN DB Exceeded  | 
                                 
                                    | Clear  | {{ $labels.db }} size in control.  | 
                                 
                                     | BINDING_MSISDN_APN_DB_SIZE_EXCEEDED  | Critical  | {{ $labels.db }} size exceeded, current value is {{ $value }}.  | Total Size of MSISDN / APN DB Exceeded  | 
                                 
                                     | Clear  | {{ $labels.db }} size in control.  | 
                                 
                                     | BINDING_IPV6_DB_SIZE_EXCEEDED  | Critical  | {{ $labels.db }} size exceeded, current value is {{ $value }}.  | Total Size of IPv6 DB Exceeded  | 
                                 
                                     | Clear  | {{ $labels.db }} size in control.  | 
                                 
                                     | PEER_TPS_EXCEEDED  | Critical  | {{ $labels.instance }} Peer Connection {{ $labels.local_peer }} {{ $labels.remote_peer }} TPS exceeded, current value is {{ $value }}.  | Peer TPS Exceeded (per-peer TPS thresholds)  | 
                                 
                                    | Clear  | {{ $labels.instance }} Peer Connection {{ $labels.local_peer}} {{ $labels.remote_peer }} TPS in control.  | 
                                 
                                     | NO_RESPONSE_PEER_FOR_ANSWER_TPS_EXCEEDED  | Critical  | {{ $labels.instance }} No Response From Peer Connection TPS exceeded for {{ $labels.message_type }}, current value is {{ $value }}.  | TPS of No Response From Peer (timeouts from the PCRF or any other peer)  | 
                                 
                                     | Clear  | {{ $labels.instance }} No Response From Peer Connection TPS in control for {{ $labels.message_type }}.  | 
                                 
                                     | PEER_RESPONSE_TIME_EXCEEDED  | Critical  | message_duration_seconds {type=~"peer_.*"} [labels: type]  | Peer Response Time Exceeded (the response time of the peer exceeds the threshold)  | 
                                 
                                    | Clear  | Response time in control.  | 
                                 
                                     | NO_PEER_GROUP_MEMBER_AVAILABLE  | Critical  | {{ $labels.peer_group }} not available.  | Peer Group is not Available (all peers in the peer group are down)  | 
                                 
                                    | Clear  | {{ $labels.peer_group }} available.  | 
                                 
                                     | PCRF_NOT_CREATING_SESSIONS_TPS_EXCEEDED  | Critical  | Failed CCR-I TPS exceeded, current value is {{ $value }}.  | TPS of failed CCR-I (Result-Code != 2001)  | 
                                 
                                    | Clear  | Failed CCR-I TPS in control.  | 
                                 
                                     | FORWARDING_LOOP_FOUND_TPS_EXCEEDED  | Critical  | {{ $labels.remote_peer }} Loop Detected TPS exceeded, current value is {{ $value }}.  | TPS of Diameter Message Loop  | 
                                 
                                    | Clear  | {{ $labels.remote_peer }} Loop Detected TPS in control.  | 
                                 
                                     | RELAY_LINK_TPS_GT_0  | Critical  | {{ $labels.remote_peer }} Relay Started, current value is {{ $value }}.  | TPS of Relay Peer > 0 (relay peers have started exchanging control plane messages)  | 
                                 
                                    | Clear  | {{ $labels.remote_peer}} Relay Stated.  | 
                                 
                                     | RELAY_LINK_TPS_EXCEEDED  | Critical  | {{ $labels.remote_peer }} Relay Link TPS exceeded, current value is {{ $value }}.  | TPS of Relay Peer (TPS of relay messages)  | 
                                 
                                    | Clear  | {{ $labels.remote_peer}} Relay Link TPS in control.  | 
                                 
                                     | RELAY_LINK_STATUS  | Critical  | {{ $labels.remote_peer }} Relay Link is Down.  | Relay Link is Down (relay link status is monitored)  | 
                                 
                                    | Clear  | {{ $labels.remote_peer}} Relay Link is UP.  | 
                                 
                                     | NO_RELAY_PEER_TPS_EXCEEDED  | Critical  | {{ $labels.remote_peer }} Relay Peer TPS exceeded, current value is {{ $value }}.  | TPS of Relay Peer Failure  | 
                                 
                                    | Clear  | {{ $labels.remote_peer}} Relay Peer TPS in control.  | 
                                 
                                     | SESSION_DB_LIMIT_EXCEEDED | Alert | Session max DB limit reached | This alarm is generated when session database count crosses maximum limit configured using CLI for db-max-record-limit. | 
                                 
                                    | Clear | Session max DB limit reached alarm cleared | This alarm is cleared when session database count drops below maximum limit configured using CLI for db-max-record-limit. | 
                                 
                                     | IPV6_DB_LIMIT_EXCEEDED | Alert | IPv6 max DB limit reached | This alarm is generated when IPv6 database count crosses maximum limit configured using CLI for db-max-record-limit. | 
                                 
                                    | Clear | IPv6 max DB limit reached alarm cleared | This alarm is cleared when IPv6 database count drops below maximum limit configured using CLI for db-max-record-limit. | 
                                 
                                     | IPV4_DB_LIMIT_EXCEEDED | Alert | IPv4 max DB limit reached | This alarm is generated when IPv4 database count crosses maximum limit configured using CLI for db-max-record-limit. | 
                                 
                                    | Clear | IPv4 max DB limit reached alarm cleared | This alarm is cleared when IPv4 database count drops below maximum limit configured using CLI for db-max-record-limit. | 
                                 
                                     | IMSIAPN_DB_LIMIT_EXCEEDED | Alert | ImsiApn max DB limit reached | This alarm is generated when ImsiApn database count crosses maximum limit configured using CLI for db-max-record-limit. | 
                                 
                                    | Clear | ImsiApn max DB limit reached alarm cleared | This alarm is cleared when ImsiApn database count drops below maximum limit configured using CLI for db-max-record-limit. | 
                                 
                                     | MSISDNAPN_DB_LIMIT_EXCEEDED | Alert | MsisdnApn max DB limit reached | This alarm is generated when MsisdnApn database count crosses maximum limit configured using CLI for db-max-record-limit. | 
                                 
                                    | Clear | MsisdnApn max DB limit reached alarm cleared | This alarm is cleared when MsisdnApn database count drops below maximum limit configured using CLI for db-max-record-limit. | 
                                 					
                                 
                                    						
                                     | CRD_CACHE_LOAD_ERROR | Critical | Error when loading CRD cache | This alarm is generated when the CRD is not loaded properly, that is, when the CRD load error value is “1”. | 
                                 					
                                 
                                    						
                                    | Clear | CRD cache loaded successfully | This alarm is cleared when CRD cache is updated properly with value as “0”. | 
                                 					
                                 
                                    						
                                     | APP_SERVICE_HEALTH_STATUS_CRD* | Critical | {{$labels.service}} service is Unhealthy! | This alarm is generated when the CRD service is unhealthy (value is “1”). | 
                                 					
                                 
                                    						
                                     | Clear | {{$labels.service}} service is Healthy. | This alarm is cleared when the CRD service is healthy (value is “0”). | 
                                 					
                                 
                                    						
                                     | APP_SERVICE_HEALTH_STATUS_METADATA_DB* | Critical | {{$labels.service}} service is Unhealthy! | This alarm is generated when the Metadata DB service is unhealthy (value is “1”). | 
                                 					
                                 
                                    						
                                     | Clear | {{$labels.service}} service is Healthy. | This alarm is cleared when the Metadata DB service is healthy (value is “0”). | 
                                 					
                                 
                                    						
                                     | VIP_NOT_ACTIVE_ON_PREFERRED* | Critical | VIP {{ $labels.vip }} active on {{ $labels.currentHost }} and not active on preferred {{ $labels.preferredHost }} | This alarm is generated when the VIP is not present on the preferred director or distributor. | 
                                 					
                                 
                                    						
                                     | Clear | VIP {{ $labels.vip }} active on preferred {{ $labels.preferredHost }} | This alarm is cleared when the VIP is present on the preferred director or distributor. | 
                                 					
                                 
                                    						
                                     | PEER_DYNAMIC_RATE_LIMIT_THROTTLING* | Critical | Dynamic Rate limit is active | This alarm is generated when any peer connected to a Director is in throttling mode: sum(peer_dynamic_rate_limit_throttling) != 0 | 
                                 					
                                 
                                    						
                                     | Clear | Dynamic Rate limit is not active | This alarm is cleared when no peer connected to a Director is in throttling mode: sum(peer_dynamic_rate_limit_throttling) == 0 | 
                                 					
                                 
                                    						
                                     | NO_DB_CPU_THRESHOLD_STATUS* | Critical | {{$labels.instance}} is not receiving any threshold message | The Director is not receiving any threshold status messages from the Worker: sum(rate(processed_db_cpu_control_message_total[30s])) == 0 | 
                                 					
                                 
                                    						
                                     | Clear | {{$labels.instance}} is receiving throttling messages | The Director is receiving threshold status messages from the Worker: sum(rate(processed_db_cpu_control_message_total[30s])) != 0 | 
                                 					
                                 
                                    						
                                     | QNS_LOGGING_STOPPED* | Critical | Application logging has stopped on {{$labels.hostname}} at {{$labels.last_updated_time}} with connections closed {{$labels.tcp_closed}} | This alarm is generated when the application unexpectedly stops writing consolidated-qns logs. Note: If there is no activity on the system, this alert is expected. It resolves automatically when application activity resumes. | 
                                 					
                                 
                                    						
                                     | Clear | Application logging is successful on {{$labels.hostname}} at {{$labels.last_updated_time}} | This alarm is cleared when the application is successfully writing consolidated-qns logs. | 
                                 					
                                 
                                    						
                                     | DRA_PCRF_QUERY_NODE_INACTIVE* | Critical | {{$labels.url_endpoint}} is Inactive! | This alarm is generated when the heartbeat message to the PCRF REST endpoint URL fails (value is “1”). | 
                                 					
                                 
                                    						
                                     | Clear | {{$labels.url_endpoint}} is Active | This alarm is cleared when the heartbeat message to the PCRF REST endpoint URL succeeds (value is “0”). | 
                                 					
                                 
                                    						
                                     | DRA_PCRF_QUERY_TPS_EXCEEDED* | Critical | {{$labels.instance}} Pcrf Session Query TPS exceeded, current value is {{ $value }} | This alarm is generated when the PCRF REST API TPS is greater than “5”. | 
                                 					
                                 
                                    						
                                     | Clear | {{ $labels.instance }} Pcrf Session Query TPS in control | This alarm is cleared when the PCRF REST API TPS is less than “5”. | 
                                 					
                                 
                                    						
                                     | RELAY_TRAFFIC_THRESHOLD_EXCEEDED* | Critical | Relay traffic exceeded the threshold of 20%. Current value is {{ $value }}% | This alarm is generated if relay traffic exceeds the threshold percentage of total traffic. | 
                                 					
                                 
                                    						
                                     | Clear | Relay traffic % is under control | This alarm is cleared if relay traffic is below the threshold percentage of total traffic. | 
                                 					
                                 
                                    						
                                     | LOCAL_PUBLISH_STOPPED* | Critical | Local publish stopped for {{ $labels.instance }} | This alarm is generated if the topology is incomplete and the global endpoint is missing. | 
                                 					
                                 
                                    						
                                     | Clear | Local publish started for {{ $labels.instance }} | This alarm is cleared if the topology is complete and the global endpoint exists. | 
                                 					
                                 
                                    						
                                     | GLOBAL_PUBLISH_STOPPED* | Critical | Global publish stopped for {{ $labels.instance }} | This alarm is generated if the topology is incomplete and the local endpoint is missing. | 
                                 					
                                 
                                    						
                                     | Clear | Global publish started for {{ $labels.instance }} | This alarm is cleared if the topology is complete and the local endpoint exists. | 
                                 					
                                 
                                    						
                                     | DIAMETER_ENDPOINTS_MISSING_LOST_REDIS* | Critical | Diameter Endpoints missing due to Redis connection lost | This alarm is generated if the Diameter endpoint is missing the Redis configuration. | 
                                 					
                                 
                                    						
                                     | Clear | Redis connection restored. Diameter Endpoints are restored | This alarm is cleared if the Redis configuration exists in the Diameter endpoint. | 
                                 					
                                 
                                    						
                                     | DIAMETER_PEER_EXPIRATIONS_EXCEEDED* | Critical | {{$labels.origin_host}} got EXPIRED in {{$labels.system}} | This alarm is generated if any peer has expired. | 
                                 					
                                 
                                    						
                                     | Clear | Peer expiration got reset for {{$labels.origin_host}} | This alarm is cleared if the peer expiration is reset. | 
                                 					
                                 
                                    						
                                     | ELASTICSEARCH_NOT_REACHABLE | Critical | Elasticsearch server is unreachable with status {{$labels.reachable_status}} with tcp connection status {{$labels.tcp_connected}} | This alarm is generated when Elasticsearch is not reachable from DRA or the TCP connections are not healthy. | 
                                 					
                                 
                                    						
                                     | Clear | Elasticsearch server is reachable now !!! | This alarm is cleared when Elasticsearch is reachable from DRA and the TCP connections are healthy. | 
                                 					
                                 
                                    						
                                     | TLS_CERT_EXPIRY | Critical, Major, and Minor | certificate will expire in {{$value}} days! | This alarm monitors the expiry date of the TLS certificate. |