Ultra Cloud Core 5G Network Function Repository Function Configuration and Administration Guide, Release 2026.01

Table 1. Summary Data
Applicable Product(s) or Functional Area	5G-NRF
Applicable Platform(s)	SMI
Feature Default Setting	Disabled - Configuration Required
Related Changes in this Release	Not Applicable
Related Documentation	Not Applicable

Table 2. Revision History
Revision Details	Release
First introduced.	2026.01

Configuring Alert Rules

To configure the alert rules, use the following sample configuration:

config 
   alerts rules group alert_group_name 
   interval-seconds seconds 
   rule rule_name 
      expression promql_expression 
      duration duration 
      severity severity_level 
      type alert_type
      annotation annotation_name 
      value annotation_value 
      end

NOTES:

alerts rules group alert_group_name : Specify the Prometheus alerting rules group. One alert group can have multiple lists of rules. alert_group_name is the name of the alert group, a string of 0–64 characters.
interval-seconds seconds : Specify the evaluation interval of the rule group in seconds.
rule rule_name : Specify the alerting rule definition. rule_name is the name of the rule.
expression promql_expression : Specify the PromQL alerting rule expression. promql_expression is the alert rule query expressed in PromQL syntax.
duration duration : Specify how long the condition must remain true before the alert is triggered. duration is that time interval.
severity severity_level : Specify the severity of the alert. severity_level can be configured as critical, major, minor, or warning.
type alert_type : Specify the type of the alert. alert_type is the user-defined alert type. For example, Communications Alarm, Environmental Alarm, Equipment Alarm, Indeterminate Integrity Violation Alarm, Operational Violation Alarm, Physical Violation Alarm, Processing Error Alarm, Quality of Service Alarm, Security Service Alarm, Mechanism Violation Alarm, or Time Domain Violation Alarm.
annotation annotation_name : Specify the annotation to attach to the alerts. annotation_name is the name of the annotation.
value annotation_value : Specify the annotation value. annotation_value is the value of the annotation.
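
The constraints in the notes above can be checked before a rule is entered. The following Python sketch is illustrative only: the AlertRule class and its validate method are hypothetical helpers and not part of the product, and the duration pattern (digits followed by s, m, h, or d) is an assumption based on Prometheus-style durations such as 10m.

```python
import re
from dataclasses import dataclass

# Severity values listed in the notes above.
ALLOWED_SEVERITIES = ("critical", "major", "minor", "warning")

# Assumption: durations use Prometheus-style units, for example 10m or 30s.
DURATION_PATTERN = re.compile(r"^\d+[smhd]$")


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical container for the fields of one alert rule."""

    group: str
    rule: str
    expression: str
    duration: str
    severity: str
    alert_type: str

    def validate(self) -> list:
        """Return a list of problems; an empty list means none were found."""
        problems = []
        if len(self.group) > 64:
            problems.append("group name is longer than 64 characters")
        if self.severity not in ALLOWED_SEVERITIES:
            problems.append("unknown severity: " + self.severity)
        if not DURATION_PATTERN.match(self.duration):
            problems.append("unexpected duration format: " + self.duration)
        if not self.expression.strip():
            problems.append("expression is empty")
        return problems
```

Calling validate() on a rule before pasting it into the CLI catches simple mistakes such as an unsupported severity value; it does not check whether the PromQL expression itself is valid.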

Viewing Alert Logger

The Alert Logger stores all the generated alerts by default. You can view the stored alerts using the following show command.

show alerts history [ filtering ]

You can narrow down the result using the following filtering options:

annotations: Specifies the annotations of the alert.
endsAt: Specifies the end time of the alert.
labels: Specifies the additional labels of the alert.
severity: Specifies the severity of the alert.
source: Specifies the source of the alert.
startsAt: Specifies the start time of the alert.
type: Specifies the type of the alert.

You can view the active and silenced alerts with the show alerts active and show alerts silenced commands.

Alarms

Alert rules for the NRF alarms are added in the CEE. These rules require the metrics provided by NRF and App-infra.

The following sections provide details of alarms that are supported by NRF.

Incoming TPS is greater than 50% of Max TPS


Severity	Description
Warning	If the average incoming TPS for the last 10 minutes is greater than 50% of Max TPS

Alert Rules
alerts rules group INCMSGTPS rule INCTPS50Perc duration 10m label name value INCTPS50PERC ;exit;severity warning expression "sum(irate(incoming_request_total{service_name=\"nrf-rest-ep\",protocol=\"http\"}[30s])) by (service_name, protocol) >= (0.5 * MAX_TPS)" type Quality\ Of\ Service\ Alarm annotation summary value "Incoming Messages TPS {{ printf \"%f\" $value }} for last 10min"
Note: MAX_TPS depends on the environment and is derived after performance evaluation.

Incoming TPS is greater than 75% of Max TPS


Severity	Description
Minor	If the average incoming TPS for the last 5 minutes is greater than 75% of Max TPS

Alert Rules
alerts rules group INCMSGTPS rule INCTPS75Perc duration 5m label name value INCTPS75PERC ;exit;severity minor expression "sum(irate(incoming_request_total{service_name=\"nrf-rest-ep\",protocol=\"http\"}[30s])) by (service_name, protocol) >= (0.75 * MAX_TPS)" type Quality\ Of\ Service\ Alarm annotation summary value "Incoming Messages TPS {{ printf \"%f\" $value }} for last 5min"
Note: MAX_TPS depends on the environment and is derived after performance evaluation.

Incoming TPS is greater than 90% of Max TPS


Severity	Description
Major	If the average incoming TPS for the last 1 minute is greater than 90% of Max TPS

Alert Rules
alerts rules group INCMSGTPS rule INCTPS90Perc duration 1m label name value INCTPS90PERC ;exit;severity major expression "sum(irate(incoming_request_total{service_name=\"nrf-rest-ep\",protocol=\"http\"}[30s])) by (service_name, protocol) >= (0.9 * MAX_TPS)" type Quality\ Of\ Service\ Alarm annotation summary value "Incoming Messages TPS {{ printf \"%f\" $value }} for last 1min"
Note: MAX_TPS depends on the environment and is derived after performance evaluation.

Incoming TPS is greater than 95% of Max TPS


Severity	Description
Critical	If the average incoming TPS for the last 1 minute is greater than 95% of Max TPS

Alert Rules
alerts rules group INCMSGTPS rule INCTPS95Perc duration 1m label name value INCTPS95PERC ;exit;severity critical expression "sum(irate(incoming_request_total{service_name=\"nrf-rest-ep\",protocol=\"http\"}[30s])) by (service_name, protocol) >= (0.95 * MAX_TPS)" type Quality\ Of\ Service\ Alarm annotation summary value "Incoming Messages TPS {{ printf \"%f\" $value }} for last 1min"
Note: MAX_TPS depends on the environment and is derived after performance evaluation.
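
The four incoming TPS rules differ only in threshold, duration, and severity, and MAX_TPS is a placeholder that must be replaced with a measured number before the rule is entered. The following Python sketch shows one possible way to generate the four command lines from a single table. The render_rule and tps_expression helpers and the max_tps value of 2000 are hypothetical and used only for illustration.

```python
# Threshold percent, duration in minutes, and severity for each tier,
# taken from the four incoming TPS sections above.
TIERS = [
    (50, 10, "warning"),
    (75, 5, "minor"),
    (90, 1, "major"),
    (95, 1, "critical"),
]


def tps_expression(percent: int, max_tps: int) -> str:
    """Build the PromQL expression with MAX_TPS replaced by a number."""
    return (
        'sum(irate(incoming_request_total{service_name="nrf-rest-ep",'
        'protocol="http"}[30s])) by (service_name, protocol) '
        f">= ({percent / 100} * {max_tps})"
    )


def render_rule(percent: int, minutes: int, severity: str, max_tps: int) -> str:
    """Render one rule in the single-line form used in this section."""
    # Double quotes inside the quoted CLI strings are backslash-escaped.
    expression = tps_expression(percent, max_tps).replace('"', '\\"')
    summary = (
        'Incoming Messages TPS {{ printf \\"%f\\" $value }} for last '
        + str(minutes) + "min"
    )
    return (
        f"alerts rules group INCMSGTPS rule INCTPS{percent}Perc "
        f"duration {minutes}m label name value INCTPS{percent}PERC "
        f";exit;severity {severity} "
        + 'expression "' + expression + '" '
        + "type Quality\\ Of\\ Service\\ Alarm annotation summary "
        + 'value "' + summary + '"'
    )


if __name__ == "__main__":
    # 2000 is a made-up MAX_TPS; use the value from your own performance tests.
    for percent, minutes, severity in TIERS:
        print(render_rule(percent, minutes, severity, max_tps=2000))
```

Keeping the tiers in one table makes it harder for a copied rule to end up with a severity or duration that does not match its threshold.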

Error rate (per Incoming message type) is 1%


Severity	Description
Warning	If the error rate (per incoming message type) is greater than 1%

Alert Rules

Total Errors:
alerts rules group INCMSGERR rule INCMSGERR1Perc duration 10m label name value INCMSGERR1PERC ;exit;severity warning expression "( (sum(outgoing_response_total{service_name=\"nrf-rest-ep\",status=\"error\",protocol=\"http\"})) / (sum(incoming_request_total{service_name=\"nrf-rest-ep\",protocol=\"http\"})) ) * 100 > 1.0" type Quality\ Of\ Service\ Alarm annotation summary value "Error Response (Incoming Messages) Rate {{ printf \"%f\" $value }} for last 10min"

Per Message Type:
alerts rules group INCMSGERR rule RegReqERR1Perc duration 10m label name value REGREQERR1PERC ;exit;severity warning expression "( (sum(outgoing_response_msg_total{service_name=\"nrf-rest-ep\",status=\"error\",msg_type=\"NFRegistrationRequest\"})) / (sum(incoming_request_msg_total{service_name=\"nrf-rest-ep\",msg_type=\"NFRegistrationRequest\"})) ) * 100 > 1.0" type Quality\ Of\ Service\ Alarm annotation summary value "Error Response (NFRegistrationRequest) Rate {{ printf \"%f\" $value }} for last 10min"

Description

Use the following message types for the per-message-type error rate. Make sure that the rule name and label are different for each message type rule.
NFDiscoveryRequest
NFGetRequest
NFRegistrationRequest
NFUpdateRequest
NFDeregistrationRequest
NFCreateSubscriptionRequest
NFRemoveSubscriptionRequest
NFUpdateSubscriptionRequest

Note: The alert rule is based on the total error and total incoming message counters accumulated up to the point of evaluation. To define a rule for the error rate calculated over a period, use the following expressions in the alert rules.

Total Incoming Messages Error rate (calculated for last 30s):
"( (sum(rate(outgoing_response_total{service_name=\"nrf-rest-ep\",status=\"error\",protocol=\"http\"}[30s]))) / (sum(rate(incoming_request_total{service_name=\"nrf-rest-ep\",protocol=\"http\"}[30s]))) ) * 100 > 1.0"

Incoming Messages (per Type) Error rate (calculated for last 30s):
"( (sum(rate(outgoing_response_msg_total{service_name=\"nrf-rest-ep\",status=\"error\",msg_type=\"NFRegistrationRequest\"}[30s]))) / (sum(rate(incoming_request_msg_total{service_name=\"nrf-rest-ep\",msg_type=\"NFRegistrationRequest\"}[30s]))) ) * 100 > 1.0"

Error rate (per Incoming message type) is 10%


Severity	Description
Minor	If the error rate (per incoming message type) is greater than 10%

Alert Rules

Total Errors:
alerts rules group INCMSGERR rule INCMSGERR10Perc duration 5m label name value INCMSGERR10PERC ;exit;severity minor expression "( (sum(outgoing_response_total{service_name=\"nrf-rest-ep\",status=\"error\",protocol=\"http\"})) / (sum(incoming_request_total{service_name=\"nrf-rest-ep\",protocol=\"http\"})) ) * 100 > 10.0" type Quality\ Of\ Service\ Alarm annotation summary value "Error Response (Incoming Messages) Rate {{ printf \"%f\" $value }} for last 5min"

Per Message Type:
alerts rules group INCMSGERR rule RegReqERR10Perc duration 5m label name value REGREQERR10PERC ;exit;severity minor expression "( (sum(outgoing_response_msg_total{service_name=\"nrf-rest-ep\",status=\"error\",msg_type=\"NFRegistrationRequest\"})) / (sum(incoming_request_msg_total{service_name=\"nrf-rest-ep\",msg_type=\"NFRegistrationRequest\"})) ) * 100 > 10.0" type Quality\ Of\ Service\ Alarm annotation summary value "Error Response (NFRegistrationRequest) Rate {{ printf \"%f\" $value }} for last 5min"

Description

Use the following message types for the per-message-type error rate. Make sure that the rule name and label are different for each message type rule.
NFDiscoveryRequest
NFGetRequest
NFRegistrationRequest
NFUpdateRequest
NFDeregistrationRequest
NFCreateSubscriptionRequest
NFRemoveSubscriptionRequest
NFUpdateSubscriptionRequest

Note: The alert rule is based on the total error and total incoming message counters accumulated up to the point of evaluation. To define a rule for the error rate calculated over a period, use the following expressions in the alert rules.

Total Incoming Messages Error rate (calculated for last 30s):
"( (sum(rate(outgoing_response_total{service_name=\"nrf-rest-ep\",status=\"error\",protocol=\"http\"}[30s]))) / (sum(rate(incoming_request_total{service_name=\"nrf-rest-ep\",protocol=\"http\"}[30s]))) ) * 100 > 10.0"

Incoming Messages (per Type) Error rate (calculated for last 30s):
"( (sum(rate(outgoing_response_msg_total{service_name=\"nrf-rest-ep\",status=\"error\",msg_type=\"NFRegistrationRequest\"}[30s]))) / (sum(rate(incoming_request_msg_total{service_name=\"nrf-rest-ep\",msg_type=\"NFRegistrationRequest\"}[30s]))) ) * 100 > 10.0"

Error rate (per Incoming message type) is 25%


Severity	Description
Major	If the error rate (per incoming message type) is greater than 25%

Alert Rules

Total Errors:
alerts rules group INCMSGERR rule INCMSGERR25Perc duration 5m label name value INCMSGERR25PERC ;exit;severity major expression "( (sum(outgoing_response_total{service_name=\"nrf-rest-ep\",status=\"error\",protocol=\"http\"})) / (sum(incoming_request_total{service_name=\"nrf-rest-ep\",protocol=\"http\"})) ) * 100 > 25.0" type Quality\ Of\ Service\ Alarm annotation summary value "Error Response (Incoming Messages) Rate {{ printf \"%f\" $value }} for last 5min"

Per Message Type:
alerts rules group INCMSGERR rule RegReqERR25Perc duration 5m label name value REGREQERR25PERC ;exit;severity major expression "( (sum(outgoing_response_msg_total{service_name=\"nrf-rest-ep\",status=\"error\",msg_type=\"NFRegistrationRequest\"})) / (sum(incoming_request_msg_total{service_name=\"nrf-rest-ep\",msg_type=\"NFRegistrationRequest\"})) ) * 100 > 25.0" type Quality\ Of\ Service\ Alarm annotation summary value "Error Response (NFRegistrationRequest) Rate {{ printf \"%f\" $value }} for last 5min"

Description

Use the following message types for the per-message-type error rate. Make sure that the rule name and label are different for each message type rule.
NFDiscoveryRequest
NFGetRequest
NFRegistrationRequest
NFUpdateRequest
NFDeregistrationRequest
NFCreateSubscriptionRequest
NFRemoveSubscriptionRequest
NFUpdateSubscriptionRequest

Note: The alert rule is based on the total error and total incoming message counters accumulated up to the point of evaluation. To define a rule for the error rate calculated over a period, use the following expressions in the alert rules.

Total Incoming Messages Error rate (calculated for last 30s):
"( (sum(rate(outgoing_response_total{service_name=\"nrf-rest-ep\",status=\"error\",protocol=\"http\"}[30s]))) / (sum(rate(incoming_request_total{service_name=\"nrf-rest-ep\",protocol=\"http\"}[30s]))) ) * 100 > 25.0"

Incoming Messages (per Type) Error rate (calculated for last 30s):
"( (sum(rate(outgoing_response_msg_total{service_name=\"nrf-rest-ep\",status=\"error\",msg_type=\"NFRegistrationRequest\"}[30s]))) / (sum(rate(incoming_request_msg_total{service_name=\"nrf-rest-ep\",msg_type=\"NFRegistrationRequest\"}[30s]))) ) * 100 > 25.0"

Error rate (per Incoming message type) is 50%


Severity	Description
Critical	If the error rate (per incoming message type) is greater than 50%

Alert Rules

Total Errors:
alerts rules group INCMSGERR rule INCMSGERR50Perc duration 5m label name value INCMSGERR50PERC ;exit;severity critical expression "( (sum(outgoing_response_total{service_name=\"nrf-rest-ep\",status=\"error\",protocol=\"http\"})) / (sum(incoming_request_total{service_name=\"nrf-rest-ep\",protocol=\"http\"})) ) * 100 > 50.0" type Quality\ Of\ Service\ Alarm annotation summary value "Error Response (Incoming Messages) Rate {{ printf \"%f\" $value }} for last 5min"

Per Message Type:
alerts rules group INCMSGERR rule RegReqERR50Perc duration 5m label name value REGREQERR50PERC ;exit;severity critical expression "( (sum(outgoing_response_msg_total{service_name=\"nrf-rest-ep\",status=\"error\",msg_type=\"NFRegistrationRequest\"})) / (sum(incoming_request_msg_total{service_name=\"nrf-rest-ep\",msg_type=\"NFRegistrationRequest\"})) ) * 100 > 50.0" type Quality\ Of\ Service\ Alarm annotation summary value "Error Response (NFRegistrationRequest) Rate {{ printf \"%f\" $value }} for last 5min"

Description

Use the following message types for the per-message-type error rate. Make sure that the rule name and label are different for each message type rule.
NFDiscoveryRequest
NFGetRequest
NFRegistrationRequest
NFUpdateRequest
NFDeregistrationRequest
NFCreateSubscriptionRequest
NFRemoveSubscriptionRequest
NFUpdateSubscriptionRequest

Note: The alert rule is based on the total error and total incoming message counters accumulated up to the point of evaluation. To define a rule for the error rate calculated over a period, use the following expressions in the alert rules.

Total Incoming Messages Error rate (calculated for last 30s):
"( (sum(rate(outgoing_response_total{service_name=\"nrf-rest-ep\",status=\"error\",protocol=\"http\"}[30s]))) / (sum(rate(incoming_request_total{service_name=\"nrf-rest-ep\",protocol=\"http\"}[30s]))) ) * 100 > 50.0"

Incoming Messages (per Type) Error rate (calculated for last 30s):
"( (sum(rate(outgoing_response_msg_total{service_name=\"nrf-rest-ep\",status=\"error\",msg_type=\"NFRegistrationRequest\"}[30s]))) / (sum(rate(incoming_request_msg_total{service_name=\"nrf-rest-ep\",msg_type=\"NFRegistrationRequest\"}[30s]))) ) * 100 > 50.0"
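
The notes in the incoming error rate sections distinguish a rate computed from lifetime counter totals from a rate computed over a recent period. The following Python sketch illustrates the difference with plain arithmetic. The counter values are made up, and the windowed calculation is a simplification: PromQL rate() also converts to a per-second value and handles counter resets, which this sketch does not model.

```python
def cumulative_error_pct(errors_total: float, requests_total: float) -> float:
    """Error percentage over the whole lifetime of the counters."""
    return 100.0 * errors_total / requests_total


def windowed_error_pct(errors_start: float, errors_end: float,
                       requests_start: float, requests_end: float) -> float:
    """Error percentage using only the counter increase inside one window."""
    request_delta = requests_end - requests_start
    if request_delta == 0:
        return 0.0  # no traffic in the window, so there is no rate to report
    return 100.0 * (errors_end - errors_start) / request_delta


# Hypothetical counter samples taken 30 seconds apart.
requests_before, errors_before = 1_000_000, 1_000
requests_after, errors_after = 1_001_000, 1_400

lifetime = cumulative_error_pct(errors_after, requests_after)
recent = windowed_error_pct(errors_before, errors_after,
                            requests_before, requests_after)
print(f"cumulative: {lifetime:.2f}%  last window: {recent:.2f}%")
```

With these sample values the cumulative figure stays near 0.14% while the last window is at 40%, which is why a rule based on lifetime totals can stay silent during a recent burst of errors.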

Error rate (per outgoing message type) is 1%


Severity	Description
Warning	If the error rate (per outgoing message type) is greater than 1%

Alert Rules

Total Errors:
alerts rules group OUTMSGERR rule OUTMSGERR1Perc duration 10m label name value OUTMSGERR1PERC ;exit;severity warning expression "(sum(rpc_response_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",status_code!~\"2[0-9]{2}\"}) / sum(rpc_request_total{service_name=\"nrf-rest-ep\",interface=\"Rest\"})) * 100 > 1.0" type Quality\ Of\ Service\ Alarm annotation summary value "Error Response (Outgoing Messages) Rate {{ printf \"%f\" $value }} for last 10min"

Per Message Type:
alerts rules group OUTMSGERR rule StatNotifERR1Perc duration 10m label name value StatNotifERR1Perc ;exit;severity warning expression "(sum(rpc_response_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",msg_type=\"NFStatusNotifyRequest\",status_code!~\"2[0-9]{2}\"}) / sum(rpc_request_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",msg_type=\"NFStatusNotifyRequest\"})) * 100 > 1.0" type Quality\ Of\ Service\ Alarm annotation summary value "Error Response (NFStatusNotifyRequest) Rate {{ printf \"%f\" $value }} for last 10min"

Description

Currently, the only supported outgoing message is NFStatusNotifyRequest. If more messages are supported later, those message types can be used for the per-message-type error rate. Make sure that the rule name and label are different for each message type rule.

Note: The alert rule is based on the total error and total outgoing message counters accumulated up to the point of evaluation. To define a rule for the error rate calculated over a period, use the following expressions in the alert rules.

Total Outgoing Messages Error rate (calculated for last 30s):
"( sum(rate(rpc_response_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",status_code!~\"2[0-9]{2}\"}[30s])) / sum(rate(rpc_request_total{service_name=\"nrf-rest-ep\",interface=\"Rest\"}[30s]))) * 100 > 1.0"

Outgoing Messages (per Type) Error rate (calculated for last 30s):
"( sum(rate(rpc_response_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",msg_type=\"NFStatusNotifyRequest\",status_code!~\"2[0-9]{2}\"}[30s])) / sum(rate(rpc_request_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",msg_type=\"NFStatusNotifyRequest\"}[30s]))) * 100 > 1.0"

Error rate (per outgoing message type) is 10%


Severity	Description
Minor	If the error rate (per outgoing message type) is greater than 10%

Alert Rules

Total Errors:
alerts rules group OUTMSGERR rule OUTMSGERR10Perc duration 5m label name value OUTMSGERR10PERC ;exit;severity minor expression "(sum(rpc_response_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",status_code!~\"2[0-9]{2}\"}) / sum(rpc_request_total{service_name=\"nrf-rest-ep\",interface=\"Rest\"})) * 100 > 10.0" type Quality\ Of\ Service\ Alarm annotation summary value "Error Response (Outgoing Messages) Rate {{ printf \"%f\" $value }} for last 5min"

Per Message Type:
alerts rules group OUTMSGERR rule StatNotifERR10Perc duration 5m label name value StatNotifERR10Perc ;exit;severity minor expression "(sum(rpc_response_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",msg_type=\"NFStatusNotifyRequest\",status_code!~\"2[0-9]{2}\"}) / sum(rpc_request_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",msg_type=\"NFStatusNotifyRequest\"})) * 100 > 10.0" type Quality\ Of\ Service\ Alarm annotation summary value "Error Response (NFStatusNotifyRequest) Rate {{ printf \"%f\" $value }} for last 5min"

Description

Currently, the only supported outgoing message is NFStatusNotifyRequest. If more messages are supported later, those message types can be used for the per-message-type error rate. Make sure that the rule name and label are different for each message type rule.

Note: The alert rule is based on the total error and total outgoing message counters accumulated up to the point of evaluation. To define a rule for the error rate calculated over a period, use the following expressions in the alert rules.

Total Outgoing Messages Error rate (calculated for last 30s):
"( sum(rate(rpc_response_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",status_code!~\"2[0-9]{2}\"}[30s])) / sum(rate(rpc_request_total{service_name=\"nrf-rest-ep\",interface=\"Rest\"}[30s]))) * 100 > 10.0"

Outgoing Messages (per Type) Error rate (calculated for last 30s):
"( sum(rate(rpc_response_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",msg_type=\"NFStatusNotifyRequest\",status_code!~\"2[0-9]{2}\"}[30s])) / sum(rate(rpc_request_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",msg_type=\"NFStatusNotifyRequest\"}[30s]))) * 100 > 10.0"

Error rate (per outgoing message type) is 25%


Severity	Description
Major	If the error rate (per outgoing message type) is greater than 25%

Alert Rules

Total Errors:
alerts rules group OUTMSGERR rule OUTMSGERR25Perc duration 5m label name value OUTMSGERR25PERC ;exit;severity major expression "(sum(rpc_response_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",status_code!~\"2[0-9]{2}\"}) / sum(rpc_request_total{service_name=\"nrf-rest-ep\",interface=\"Rest\"})) * 100 > 25.0" type Quality\ Of\ Service\ Alarm annotation summary value "Error Response (Outgoing Messages) Rate {{ printf \"%f\" $value }} for last 5min"

Per Message Type:
alerts rules group OUTMSGERR rule StatNotifERR25Perc duration 5m label name value StatNotifERR25Perc ;exit;severity major expression "(sum(rpc_response_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",msg_type=\"NFStatusNotifyRequest\",status_code!~\"2[0-9]{2}\"}) / sum(rpc_request_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",msg_type=\"NFStatusNotifyRequest\"})) * 100 > 25.0" type Quality\ Of\ Service\ Alarm annotation summary value "Error Response (NFStatusNotifyRequest) Rate {{ printf \"%f\" $value }} for last 5min"

Description

Currently, the only supported outgoing message is NFStatusNotifyRequest. If more messages are supported later, those message types can be used for the per-message-type error rate. Make sure that the rule name and label are different for each message type rule.

Note: The alert rule is based on the total error and total outgoing message counters accumulated up to the point of evaluation. To define a rule for the error rate calculated over a period, use the following expressions in the alert rules.

Total Outgoing Messages Error rate (calculated for last 30s):
"( sum(rate(rpc_response_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",status_code!~\"2[0-9]{2}\"}[30s])) / sum(rate(rpc_request_total{service_name=\"nrf-rest-ep\",interface=\"Rest\"}[30s]))) * 100 > 25.0"

Outgoing Messages (per Type) Error rate (calculated for last 30s):
"( sum(rate(rpc_response_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",msg_type=\"NFStatusNotifyRequest\",status_code!~\"2[0-9]{2}\"}[30s])) / sum(rate(rpc_request_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",msg_type=\"NFStatusNotifyRequest\"}[30s]))) * 100 > 25.0"

Error rate (per outgoing message type) is 50%


Severity	Description
Critical	If the error rate (per outgoing message type) is greater than 50%

Alert Rules

Total Errors:
alerts rules group OUTMSGERR rule OUTMSGERR50Perc duration 5m label name value OUTMSGERR50PERC ;exit;severity critical expression "(sum(rpc_response_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",status_code!~\"2[0-9]{2}\"}) / sum(rpc_request_total{service_name=\"nrf-rest-ep\",interface=\"Rest\"})) * 100 > 50.0" type Quality\ Of\ Service\ Alarm annotation summary value "Error Response (Outgoing Messages) Rate {{ printf \"%f\" $value }} for last 5min"

Per Message Type:
alerts rules group OUTMSGERR rule StatNotifERR50Perc duration 5m label name value StatNotifERR50Perc ;exit;severity critical expression "(sum(rpc_response_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",msg_type=\"NFStatusNotifyRequest\",status_code!~\"2[0-9]{2}\"}) / sum(rpc_request_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",msg_type=\"NFStatusNotifyRequest\"})) * 100 > 50.0" type Quality\ Of\ Service\ Alarm annotation summary value "Error Response (NFStatusNotifyRequest) Rate {{ printf \"%f\" $value }} for last 5min"

Description

Currently, the only supported outgoing message is NFStatusNotifyRequest. If more messages are supported later, those message types can be used for the per-message-type error rate. Make sure that the rule name and label are different for each message type rule.

Note: The alert rule is based on the total error and total outgoing message counters accumulated up to the point of evaluation. To define a rule for the error rate calculated over a period, use the following expressions in the alert rules.

Total Outgoing Messages Error rate (calculated for last 30s):
"( sum(rate(rpc_response_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",status_code!~\"2[0-9]{2}\"}[30s])) / sum(rate(rpc_request_total{service_name=\"nrf-rest-ep\",interface=\"Rest\"}[30s]))) * 100 > 50.0"

Outgoing Messages (per Type) Error rate (calculated for last 30s):
"( sum(rate(rpc_response_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",msg_type=\"NFStatusNotifyRequest\",status_code!~\"2[0-9]{2}\"}[30s])) / sum(rate(rpc_request_total{service_name=\"nrf-rest-ep\",interface=\"Rest\",msg_type=\"NFStatusNotifyRequest\"}[30s]))) * 100 > 50.0"

CPU usage is greater than 50%


Severity	Description
Warning	If the average CPU usage for the last 10 minutes is greater than 50%

Alert Rules
alerts rules group CPUUSG rule NrfRestEp0CPUUSG50Perc duration 10m label name value NRFRESTEP0_CPUUSG50PERC ;exit;severity warning expression "cpu_percent{service_name=\"nrf-rest-ep\",instance_id=\"0\"} > 50.0" type Quality\ Of\ Service\ Alarm annotation summary value "CPU Usage {{ printf \"%f\" $value }} for last 10min"
Note: The CPU usage alert is per pod, that is, resources are measured at the pod level. For the NRF service pod, the service name is nrf-service. Make sure that the rule name and labels are different for each pod.

Description
Pod name and instance ID mapping: NRF pods follow the naming convention <service-name>-n<node-id>-<replica-id>, for example, nrf-service-n0-0, nrf-rest-ep-n0-1, and nrf-rest-ep-n2-1. InstanceId is the sum of <node-id> and <replica-id>. For example:
nrf-service-n0-0 => InstanceId = 0 + 0 = 0
nrf-service-n0-1 => InstanceId = 0 + 1 = 1
nrf-service-n2-0 => InstanceId = 2 + 0 = 2
Note: Node IDs increment in steps of the number of replicas. For example, if a deployment has 2 replicas and 2 nodes, the node IDs are 0 and 2.

CPU usage is greater than 75%


Severity	Description
Minor	If the average CPU usage for the last 5 minutes is greater than 75%

Alert Rules
alerts rules group CPUUSG rule NrfRestEp0CPUUSG75Perc duration 5m label name value NRFRESTEP0_CPUUSG75PERC ;exit;severity minor expression "cpu_percent{service_name=\"nrf-rest-ep\",instance_id=\"0\"} > 75.0" type Quality\ Of\ Service\ Alarm annotation summary value "CPU Usage {{ printf \"%f\" $value }} for last 5min"
Note: The CPU usage alert is per pod, that is, resources are measured at the pod level. For the NRF service pod, the service name is nrf-service. Make sure that the rule name and labels are different for each pod.

Description
Pod name and instance ID mapping: NRF pods follow the naming convention <service-name>-n<node-id>-<replica-id>, for example, nrf-service-n0-0, nrf-rest-ep-n0-1, and nrf-rest-ep-n2-1. InstanceId is the sum of <node-id> and <replica-id>. For example:
nrf-service-n0-0 => InstanceId = 0 + 0 = 0
nrf-service-n0-1 => InstanceId = 0 + 1 = 1
nrf-service-n2-0 => InstanceId = 2 + 0 = 2
Note: Node IDs increment in steps of the number of replicas. For example, if a deployment has 2 replicas and 2 nodes, the node IDs are 0 and 2.

CPU usage is greater than 90%


Severity	Description
Major	If the average CPU usage for the last 1 minute is greater than 90%

Alert Rules
alerts rules group CPUUSG rule NrfRestEp0CPUUSG90Perc duration 1m label name value NRFRESTEP0_CPUUSG90PERC ;exit;severity major expression "cpu_percent{service_name=\"nrf-rest-ep\",instance_id=\"0\"} > 90.0" type Quality\ Of\ Service\ Alarm annotation summary value "CPU Usage {{ printf \"%f\" $value }} for last 1min"
Note: The CPU usage alert is per pod, that is, resources are measured at the pod level. For the NRF service pod, the service name is nrf-service. Make sure that the rule name and labels are different for each pod.

Description
Pod name and instance ID mapping: NRF pods follow the naming convention <service-name>-n<node-id>-<replica-id>, for example, nrf-service-n0-0, nrf-rest-ep-n0-1, and nrf-rest-ep-n2-1. InstanceId is the sum of <node-id> and <replica-id>. For example:
nrf-service-n0-0 => InstanceId = 0 + 0 = 0
nrf-service-n0-1 => InstanceId = 0 + 1 = 1
nrf-service-n2-0 => InstanceId = 2 + 0 = 2
Note: Node IDs increment in steps of the number of replicas. For example, if a deployment has 2 replicas and 2 nodes, the node IDs are 0 and 2.

CPU usage is greater than 95%


Severity	Description
Critical	If the average CPU usage for the last 1 minute is greater than 95%

Alert Rules
alerts rules group CPUUSG rule NrfRestEp0CPUUSG95Perc duration 1m label name value NRFRESTEP0_CPUUSG95PERC ;exit;severity critical expression "cpu_percent{service_name=\"nrf-rest-ep\",instance_id=\"0\"} > 95.0" type Quality\ Of\ Service\ Alarm annotation summary value "CPU Usage {{ printf \"%f\" $value }} for last 1min"
Note: The CPU usage alert is per pod, that is, resources are measured at the pod level. For the NRF service pod, the service name is nrf-service. Make sure that the rule name and labels are different for each pod.

Description
Pod name and instance ID mapping: NRF pods follow the naming convention <service-name>-n<node-id>-<replica-id>, for example, nrf-service-n0-0, nrf-rest-ep-n0-1, and nrf-rest-ep-n2-1. InstanceId is the sum of <node-id> and <replica-id>. For example:
nrf-service-n0-0 => InstanceId = 0 + 0 = 0
nrf-service-n0-1 => InstanceId = 0 + 1 = 1
nrf-service-n2-0 => InstanceId = 2 + 0 = 2
Note: Node IDs increment in steps of the number of replicas. For example, if a deployment has 2 replicas and 2 nodes, the node IDs are 0 and 2.
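
The pod name to instance ID mapping described in the CPU usage sections can be expressed in a few lines. The following Python sketch follows the stated naming convention and the sum rule. The function names are hypothetical, and the node_ids helper simply reproduces the stepping described in the note (node IDs increase by the number of replicas).

```python
import re

# Pattern for <service-name>-n<node-id>-<replica-id>, as described above.
POD_NAME = re.compile(r"^(?P<service>.+)-n(?P<node>\d+)-(?P<replica>\d+)$")


def instance_id(pod_name: str) -> int:
    """Return node ID plus replica ID for a pod name such as nrf-service-n2-0."""
    match = POD_NAME.match(pod_name)
    if match is None:
        raise ValueError("unexpected pod name: " + pod_name)
    return int(match.group("node")) + int(match.group("replica"))


def node_ids(node_count: int, replicas: int) -> list:
    """Node IDs step by the replica count, for example 0 and 2 for 2 nodes."""
    return [index * replicas for index in range(node_count)]


if __name__ == "__main__":
    for name in ("nrf-service-n0-0", "nrf-service-n0-1", "nrf-service-n2-0"):
        print(name, "=>", instance_id(name))
    print("node IDs:", node_ids(node_count=2, replicas=2))
```

The resulting number is the value to place in the instance_id label of the cpu_percent or mem_usage_kb selector when writing a rule for a specific pod.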

Memory usage is greater than 50% of Memory Limit


Severity	Description
Warning	If the memory usage for the last 10 minutes is greater than 50% of Memory Limit

Alert Rules
alerts rules group MEMUSG rule NrfRestEp0MEMUSG50Perc duration 10m label name value NRFRESTEP0_MEMUSG50PERC ;exit;severity warning expression "((mem_usage_kb{service_name=\"nrf-rest-ep\",instance_id=\"0\"}/1024)/MEMORY_LIMIT_KB) * 100 > 50.0" type Quality\ Of\ Service\ Alarm annotation summary value "Memory Usage {{ printf \"%f\" $value }} for last 10min"
Note: The memory usage alert is per pod, that is, resources are measured at the pod level. For the NRF service pod, the service name is nrf-service. Make sure that the rule name and labels are different for each pod.

Description
At present, no memory limit is set for NRF pods. Memory use depends on the memory available on the worker node at run time. For example, if a worker node has 1 GB of memory and uses 20% for its own functions, the remaining 80% is available to the pods deployed on that worker node. With no memory limit, a single deployed pod can use all of that memory. For alerting, set MEMORY_LIMIT_KB according to the environment, that is, to the memory available to the NRF pod on that host.

Memory usage is greater than 75% of Memory Limit


Severity	Description
Minor	If the memory usage for the last 5 minutes is greater than 75% of Memory Limit

Alert Rules
alerts rules group MEMUSG rule NrfRestEp0MEMUSG75Perc duration 5m label name value NRFRESTEP0_MEMUSG75PERC ;exit;severity minor expression "((mem_usage_kb{service_name=\"nrf-rest-ep\",instance_id=\"0\"}/1024)/MEMORY_LIMIT_KB) * 100 > 75.0" type Quality\ Of\ Service\ Alarm annotation summary value "Memory Usage {{ printf \"%f\" $value }} for last 5min"
Note: The memory usage alert is per pod, that is, resources are measured at the pod level. For the NRF service pod, the service name is nrf-service. Make sure that the rule name and labels are different for each pod.

Memory usage is greater than 90% of provided Memory


Severity	Description
Major	If the memory usage for the last 1 minute is greater than 90% of provided Memory

Alert Rules
alerts rules group MEMUSG rule NrfRestEp0MEMUSG90Perc duration 1m label name value NRFRESTEP0_MEMUSG90PERC ;exit;severity major expression "((mem_usage_kb{service_name=\"nrf-rest-ep\",instance_id=\"0\"}/1024)/MEMORY_LIMIT_KB) * 100 > 90.0" type Quality\ Of\ Service\ Alarm annotation summary value "Memory Usage {{ printf \"%f\" $value }} for last 1min"
Note: The memory usage alert is per pod, that is, resources are measured at the pod level. For the NRF service pod, the service name is nrf-service. Make sure that the rule name and labels are different for each pod.

Memory usage is greater than 95% of provided Memory


Severity	Description
Critical	If the memory usage for the last 1 minute is greater than 95% of provided Memory

Alert Rules
alerts rules group MEMUSG rule NrfRestEp0MEMUSG95Perc duration 1m label name value NRFRESTEP0_MEMUSG95PERC ;exit;severity critical expression "((mem_usage_kb{service_name=\"nrf-rest-ep\",instance_id=\"0\"}/1024)/MEMORY_LIMIT_KB) * 100 > 95.0" type Quality\ Of\ Service\ Alarm annotation summary value "Memory Usage {{ printf \"%f\" $value }} for last 1min"
Note: The memory usage alert is per pod, that is, resources are measured at the pod level. For the NRF service pod, the service name is nrf-service. Make sure that the rule name and labels are different for each pod.

NF Profiles count reaches 50% of CDL capacity

Severity	Description
Warning	If the NF profile count is greater than 50% of CDL capacity, it indicates a growing number of profiles

Alert Rules
alerts rules group NFPROFCNT rule NFPROFCNT50Perc duration 10m label name value NFPROFCNT50PERC ;exit;severity warning expression "sum(avg(nrf_profiles_total{service_name=\"nrf-service\"}) by (nf_type)) >= (0.5 * MAX_NF_PROF_CNT)" type Quality\ Of\ Service\ Alarm annotation summary value "Number of NF Profiles {{ printf \"%f\" $value }} for last 10min"
Note: MAX_NF_PROF_CNT depends on the environment, that is, on the maximum CDL capacity.

NF Profiles count reaches 85% of CDL capacity

Severity	Description
Minor	If the NF profile count is greater than 85% of CDL capacity, it is a minor fault that indicates a growing number of profiles

Alert Rules
alerts rules group NFPROFCNT rule NFPROFCNT85Perc duration 5m label name value NFPROFCNT85PERC ;exit;severity minor expression "sum(avg(nrf_profiles_total{service_name=\"nrf-service\"}) by (nf_type)) >= (0.85 * MAX_NF_PROF_CNT)" type Quality\ Of\ Service\ Alarm annotation summary value "Number of NF Profiles {{ printf \"%f\" $value }} for last 5min"
Note: MAX_NF_PROF_CNT depends on the environment, that is, on the maximum CDL capacity.

NF Profiles count reaches 90% of CDL capacity

Severity	Description
Major	If the NF profile count is greater than 90% of CDL capacity, it is a major fault. Review the deployment for further action, for example, scaling

Alert Rules
alerts rules group NFPROFCNT rule NFPROFCNT90Perc duration 3m label name value NFPROFCNT90PERC ;exit;severity major expression "sum(avg(nrf_profiles_total{service_name=\"nrf-service\"}) by (nf_type)) >= (0.9 * MAX_NF_PROF_CNT)" type Quality\ Of\ Service\ Alarm annotation summary value "Number of NF Profiles {{ printf \"%f\" $value }} for last 3min"
Note: MAX_NF_PROF_CNT depends on the environment, that is, on the maximum CDL capacity.

NF Profiles count reaches 95% of CDL capacity

Severity	Description
Critical	If the NF profile count is greater than 95% of CDL capacity, it is a critical fault. Review the deployment for further action, for example, scaling

Alert Rules
alerts rules group NFPROFCNT rule NFPROFCNT95Perc duration 1m label name value NFPROFCNT95PERC ;exit;severity critical expression "sum(avg(nrf_profiles_total{service_name=\"nrf-service\"}) by (nf_type)) >= (0.95 * MAX_NF_PROF_CNT)" type Quality\ Of\ Service\ Alarm annotation summary value "Number of NF Profiles {{ printf \"%f\" $value }} for last 1min"
Note: MAX_NF_PROF_CNT depends on the environment, that is, on the maximum CDL capacity.

POD connectivity or Status failure


Severity	Description
Critical	If any inter-pod connectivity fails, for example, rest-ep to service or service to CDL

Alert Rules
alerts rules group SERVICE_DOWN rule XXXSERVICE_DOWN1 duration 1s label name value "SERVICE_DOWN" ;exit;severity critical expression "sum(endpoint_status{ep_name=~\"internal-ipc.ep\"}) by (service_name) == 0" type Quality\ Of\ Service\ Alarm annotation summary value "{{ $labels.service_name }} is DOWN"
Note: The alert is raised if any service is down and unable to serve requests from other pods. It is based on the pod internal RPC status. The alert is raised for each service required for NRF, that is, nrf-service, nrf-rest-ep, cache-pod, datastore-ep, datastore-index, datastore-slot, oam-pod*
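
The SERVICE_DOWN expression sums endpoint_status per service_name and fires for any service whose sum is 0. The following Python sketch mimics that aggregation on in-memory samples. It assumes that endpoint_status reports 1 for a reachable endpoint and 0 for an unreachable one, which this guide does not state explicitly, and the services_down function and sample data are hypothetical.

```python
from collections import defaultdict


def services_down(samples):
    """Return service names whose summed endpoint status is zero.

    samples is an iterable of (service_name, status) pairs, one per endpoint.
    Assumption: status is 1 when the endpoint is up and 0 when it is down.
    """
    totals = defaultdict(float)
    for service_name, status in samples:
        totals[service_name] += status
    return sorted(name for name, total in totals.items() if total == 0)


if __name__ == "__main__":
    # Made-up samples: one nrf-service endpoint is still up, while
    # every nrf-rest-ep endpoint is down.
    samples = [
        ("nrf-service", 1),
        ("nrf-service", 0),
        ("nrf-rest-ep", 0),
        ("nrf-rest-ep", 0),
    ]
    print(services_down(samples))
```

Under that assumption, a service is reported only when all of its matching endpoints are down, so a partial outage of one replica does not trigger this rule.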
