Key Performance Indicators (KPIs)

The following section describes KPIs.

ETCD/Cachepod Replication KPIs

The following table lists ETCD/Cachepod Replication KPIs.

geo_replication_total KPIs

| KPI Name | Description | Labels | Possible Values |
|---|---|---|---|
| geo_replication_total | This KPI displays the total number of replication requests and responses for various Sync types and Replication types. | ReplicationRequestType | Request / Response |
| | | ReplicationSyncType | Immediate / Deferred / Pull |
| | | ReplicationNode | ETCD / CACHE_POD / PEER |
| | | ReplicationReceiver | Local / Remote |
| | | status | True / False |
| | | status_code | Error code/description |
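As an illustration of how the labels above can be used, the following minimal sketch totals geo_replication_total samples by one label. It assumes the KPI is scraped in a Prometheus-style text format, which this section does not state. The sample lines, their counts, and the status_code values are invented for the example, and the helper names parse_samples and sum_by are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical scrape output. Label names and most label values come from
# the table above; the counts and status_code values are invented.
SAMPLE = """\
geo_replication_total{ReplicationRequestType="Request",ReplicationSyncType="Immediate",ReplicationNode="ETCD",ReplicationReceiver="Remote",status="True",status_code=""} 120
geo_replication_total{ReplicationRequestType="Response",ReplicationSyncType="Immediate",ReplicationNode="ETCD",ReplicationReceiver="Remote",status="True",status_code=""} 118
geo_replication_total{ReplicationRequestType="Request",ReplicationSyncType="Pull",ReplicationNode="CACHE_POD",ReplicationReceiver="Local",status="False",status_code="example-error"} 3
"""

SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
# Simplified label matcher: does not handle escaped quotes inside values.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric, labels, value) for each labelled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        yield name, dict(LABEL_RE.findall(raw_labels)), float(value)


def sum_by(samples, metric, label):
    """Sum sample values of one metric, grouped by the given label."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric and label in labels:
            totals[labels[label]] += value
    return dict(totals)


by_status = sum_by(parse_samples(SAMPLE), "geo_replication_total", "status")
print(by_status)
```

Grouping by ReplicationSyncType or ReplicationNode instead of status works the same way by changing the last argument to sum_by.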

Geo Rejected Role Change KPIs

The following table lists Geo Rejected Role Change KPIs.

Geo Rejected Role Change KPIs

| KPI Name | Description | Labels | Possible Values |
|---|---|---|---|
| geo_RejectedRoleChanged_total | This KPI displays the total number of rejected requests/calls received for a STANDBY instance. After the count, the same instance is moved to PRIMARY. | RejectedCount | Number value indicating rejected calls/requests received for the standby instance. |
| | | GRInstanceNumber | 1 / 2 |

Monitoring KPIs

The following table lists monitoring KPIs.

geo_monitoring_total KPIs

| KPI Name | Description | Labels | Possible Values |
|---|---|---|---|
| geo_monitoring_total | This KPI displays the total number of success and failure messages of different kinds, such as heartbeat, remoteNotify, TriggerGR, and so on. | ControlActionType | AdminMonitoringActionType / AdminRemoteMessageActionType / AdminRoleChangeActionType |
| | | ControlActionNameType | MonitorPod / MonitorBfd / RemoteMsgHeartbeat / RemoteMsgNotifyFailover / RemoteMsgNotifyPrepareFailover / RemoteMsgGetSiteStatus / RemoteClusterPodFailure / RemoteSiteRoleMonitoring / TriggerGRApi / ResetRoleApi |
| | | Admin Node | Any string value. For example, GR Instance ID or instance key / pod name |
| | | Status Code | 0 / 1001 / 1002 / 1003 / 1004 / 1005 / 1006 / 1007 / 1008 / received error code (1206, 1219, 2404, …) |
| | | Status Message | Success (0) / STANDBY_ERROR => STANDBY/STANDBY => PRIMARY (0) / Pod Failure (0) / CLI (0) / BFD Failure (0) / Decode Failure (1001) / remote status unavailable (1002) / target role does not support (1002) / Pod Failure (1002) / CLI (1002) / BFD Failure (1002) / site is down (1003) / Pod Failure (1003) / CLI (1003) / BFD Failure (1003) / Traffic Hit (1004) / Pod Failure (1004) / CLI (1004) / BFD Failure (1004) / current role is not STANDBY_ERROR/STANDBY to reset role (1005) / resetRole: Key not found in etcd (1006) / monitoring threshold per pod is breached (1007) / Retry on heartbeat failure (1008) / received error message (No remote host available for this request / Selected remote host <remotehostname> has no client connection / Sla is expired for transaction / …) |
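The pairing of status codes and status messages listed above can be expressed as a lookup table. The sketch below is only an illustration of that pairing: the names STATUS_MESSAGES, candidate_messages, and is_success are hypothetical, and treating code 0 as the only success code is an assumption drawn from the "Success (0)" entry.

```python
# Messages listed in this section, keyed by the status code shown next to them.
STATUS_MESSAGES = {
    0: [
        "Success",
        "STANDBY_ERROR => STANDBY/STANDBY => PRIMARY",
        "Pod Failure",
        "CLI",
        "BFD Failure",
    ],
    1001: ["Decode Failure"],
    1002: [
        "remote status unavailable",
        "target role does not support",
        "Pod Failure",
        "CLI",
        "BFD Failure",
    ],
    1003: ["site is down", "Pod Failure", "CLI", "BFD Failure"],
    1004: ["Traffic Hit", "Pod Failure", "CLI", "BFD Failure"],
    1005: ["current role is not STANDBY_ERROR/STANDBY to reset role"],
    1006: ["resetRole: Key not found in etcd"],
    1007: ["monitoring threshold per pod is breached"],
    1008: ["Retry on heartbeat failure"],
}


def candidate_messages(code):
    """Return the messages this section lists for a status code.

    Codes outside 0 and 1001-1008 are passed-through error codes, so the
    message is whatever error text was received with them.
    """
    return STATUS_MESSAGES.get(code, ["received error message"])


def is_success(code):
    """Assumption: only status code 0 denotes success."""
    return code == 0


print(candidate_messages(1003))
print(candidate_messages(1206))
```

Because several codes share messages such as Pod Failure, CLI, and BFD Failure, the status code and the status message need to be read together to identify a specific case.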

BFD KPIs

The following table lists BFD KPIs.

BFD KPIs - 1

| KPI Name | Description | Labels | Possible Values |
|---|---|---|---|
| bgp_speaker_bfd_status | This KPI displays BFD link status on BGP Speaker. | status | STATE_UP / STATE_DOWN |
| geo_bfd_status | This KPI displays BFD link status on Geo POD. | status | STATE_UP / STATE_DOWN |

BFD KPIs - 2

| KPI Name | Description | Gauge |
|---|---|---|
| bgp_speaker_bfd_status | This KPI displays BFD link status on BGP Speaker. | 1 (UP) or 0 (DOWN) |
| geo_bfd_status | This KPI displays BFD link status on Geo POD. | 1 (UP) or 0 (DOWN) |
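A short sketch of how the gauge values might be interpreted follows. It assumes that gauge value 1 corresponds to the STATE_UP label value and 0 to STATE_DOWN, which pairs the two BFD tables in a way this section implies but does not state. The readings dictionary and the helper down_links are hypothetical.

```python
# Assumed pairing of gauge values with the status label values.
BFD_STATE = {1: "STATE_UP", 0: "STATE_DOWN"}


def down_links(readings):
    """Return the KPI names whose gauge reads 0 (DOWN), sorted by name."""
    return sorted(name for name, value in readings.items() if int(value) == 0)


# Hypothetical gauge readings, one per BFD KPI.
readings = {"bgp_speaker_bfd_status": 1, "geo_bfd_status": 0}

for name, value in readings.items():
    print(name, BFD_STATE[int(value)])
print("down:", down_links(readings))
```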

Cross-rack-routing BFD Interface Monitoring

Cross-rack-routing BFD Interface Monitoring KPIs

| KPI Name | Description | Labels | Possible Values |
|---|---|---|---|
| geo_monitoring_total | This KPI displays the total number of Gateway Down or LocalBFDInterface down messages raised when the peer rack is down, with details of the gateway IP or interface name. | ControlActionType | AdminMonitoringActionType |
| | | ControlActionNameType | MonitorGateway / MonitorLocalBfdInterface |
| | | AdminNode | gateway_ip / interface_name |
| | | status | gateway ip is down from all proto node / local bfd interface is down from all proto node |
| | | status_code | 1012 / 1013 |
| bgp_bfd_Monitor_Interface_status (Type - Gauge) | This KPI indicates the status of each peer connection. Each connection is between a configured BFD interface and a peer on the remote rack. | interface | <Local Rack Interface Name> |
| | | peer_address | <Remote Rack neighbor Ip address> |
| | | type | Bfd-Peer |
| bgp_bfd_Monitor_Remote_Rack_status (Type - Gauge) | This KPI indicates the status of the remote rack. The current rack interface and the remote rack peers are configured as part of BFD peering. The rack status is up if any connection from either proto node is up. If the connections are down at both proto nodes, this KPI indicates that the remote rack status is down. | status | BFD_Remote_Rack_STATUS |
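The remote rack status rule described for bgp_bfd_Monitor_Remote_Rack_status (up if any connection from either proto node is up, down only when all connections are down at both proto nodes) can be sketched as follows. The function name remote_rack_up, the node names, and the input shape are hypothetical and are not part of the product.

```python
from typing import Mapping, Sequence


def remote_rack_up(connections: Mapping[str, Sequence[bool]]) -> bool:
    """Return True if at least one peer connection on any proto node is up.

    `connections` maps a proto node name to the up/down state of each of its
    BFD peer connections to the remote rack (True means up).
    """
    return any(any(states) for states in connections.values())


# Hypothetical states: one connection on proto-2 is still up.
partially_up = {"proto-1": [False, False], "proto-2": [False, True]}
# Hypothetical states: every connection is down on both proto nodes.
all_down = {"proto-1": [False, False], "proto-2": [False, False]}

print(remote_rack_up(partially_up))
print(remote_rack_up(all_down))
```

With this rule, a single surviving connection is enough to keep the remote rack reported as up, so the per-connection KPI bgp_bfd_Monitor_Interface_status is the one to consult for partial failures.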

Local Interface Monitoring

Local Interface Monitoring KPI

| KPI Name | Description | Labels | Possible Values |
|---|---|---|---|
| geo_monitoring_total | This KPI displays the total number of local interface down cases, with details of the interface name. | ControlActionType | AdminMonitoringActionType |
| | | ControlActionNameType | MonitorInterface |
| | | AdminNode | interface_name |
| | | status | Local interface is down from all proto node |
| | | status_code | 1014 |

GR Instance Information

GR Instance Information KPI

| KPI Name | Description | Labels | Possible Values |
|---|---|---|---|
| gr_instance_information (Type – Gauge) | This KPI displays the current role of the GR instance in the application. | gr_instance_id | Configured GR instances value (numerical value) |

Geo Maintenance Mode

Geo Maintenance Mode KPI

| KPI Name | Description | Labels | Possible Values |
|---|---|---|---|
| geo_MaintenanceMode_info (Type – Gauge) | This KPI displays the current state of maintenance mode for the rack. | MaintenanceMode | 0: false / 1: true |