The Alert Manager component in CVIM-MON routes, groups, and inhibits the alerts that the Prometheus alert rule engine
sends, and delivers them to the appropriate receivers.
By default, if SNMP is enabled in the configuration file, CVIM-MON forwards every alert to the SNMP agent, which sends it
to the SNMP managers as an SNMP trap.
After deployment, you can add custom alert routes, alert grouping, alert inhibitions, and receivers by following these
steps:
1. Create a custom Alert Manager configuration file:
   - Create a custom Alert Manager configuration file named alertmanager_custom_config.yml.
   - Edit the content using your favorite editor (see the format below).
   - Verify that the custom configuration file is valid using the provided tool (see Validation Script).
2. Once the file is validated, run the following command:
# ./bootstrap/k8s-infra/k8s_runner.py --alerting_rules_config <alertmanager_config_file>
Supported Receivers
The Alert Manager supports the following receivers:
- webhook
- pagerduty
- email
- pushover
- wechat
- opsgenie
- victorops
Alert Manager Custom Configuration File Format
General Format
The following listing shows the general format of the Alert Manager configuration file. Most custom configuration files
need only a small subset of the available options.
global:
  # ResolveTimeout is the time after which an alert is declared resolved
  # if it has not been updated.
  [ resolve_timeout: <duration> | default = 5m ]

  # The default SMTP From header field.
  [ smtp_from: <tmpl_string> ]
  # The default SMTP smarthost used for sending emails, including port number.
  # Port number usually is 25, or 587 for SMTP over TLS (sometimes referred to as STARTTLS).
  # Example: smtp.example.org:587
  [ smtp_smarthost: <string> ]
  # The default hostname to identify to the SMTP server.
  [ smtp_hello: <string> | default = "localhost" ]
  [ smtp_auth_username: <string> ]
  # SMTP Auth using LOGIN and PLAIN.
  [ smtp_auth_password: <secret> ]
  # SMTP Auth using PLAIN.
  [ smtp_auth_identity: <string> ]
  # SMTP Auth using CRAM-MD5.
  [ smtp_auth_secret: <secret> ]
  # The default SMTP TLS requirement.
  [ smtp_require_tls: <bool> | default = true ]

  # The API URL to use for Slack notifications.
  [ slack_api_url: <secret> ]
  [ victorops_api_key: <secret> ]
  [ victorops_api_url: <string> | default = "https://alert.victorops.com/integrations/generic/20131114/alert/" ]
  [ pagerduty_url: <string> | default = "https://events.pagerduty.com/v2/enqueue" ]
  [ opsgenie_api_key: <secret> ]
  [ opsgenie_api_url: <string> | default = "https://api.opsgenie.com/" ]
  [ hipchat_api_url: <string> | default = "https://api.hipchat.com/" ]
  [ hipchat_auth_token: <secret> ]
  [ wechat_api_url: <string> | default = "https://qyapi.weixin.qq.com/cgi-bin/" ]
  [ wechat_api_secret: <secret> ]
  [ wechat_api_corp_id: <string> ]

  # The default HTTP client configuration
  [ http_config: <http_config> ]

# Files from which custom notification template definitions are read.
# The last component may use a wildcard matcher, e.g. 'templates/*.tmpl'.
templates:
  [ - <filepath> ... ]

# The root node of the routing tree.
route: <route>

# A list of notification receivers.
receivers:
  - <receiver> ...

# A list of inhibition rules.
inhibit_rules:
  [ - <inhibit_rule> ... ]
The custom configuration must be a complete, working configuration file that follows the template below. It must contain
three main keys: global, route, and receivers.
The global section must have at least one attribute, for example, resolve_timeout: 5m. Ensure that every new receiver is
referenced in the route, so that alerts are routed to the proper receivers. The receiver name cannot be snmp.
You can find the configuration details for creating routes and receivers in the Prometheus Alert Manager documentation
(publicly available online).
global:
  resolve_timeout: 5m
route: <route>
receivers:
  - <receiver> ...
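The constraints above can be checked mechanically before running the validation tool. The following is a minimal Python sketch, not part of CVIM-MON: the function name check_custom_config is hypothetical, and it assumes the YAML file has already been parsed into a dictionary (for example with a YAML library) and that nested routes appear under the routes key, as in the upstream Alert Manager format.

```python
def check_custom_config(config):
    """Return a list of problems found in a parsed Alert Manager config dict.

    Hypothetical helper: it only checks the constraints listed in this section.
    """
    problems = []

    # The three main keys must be present.
    for key in ("global", "route", "receivers"):
        if key not in config:
            problems.append(f"missing top-level key: {key}")

    # The global section needs at least one attribute.
    if "global" in config and not config["global"]:
        problems.append("global must define at least one attribute")

    names = [r.get("name") for r in config.get("receivers") or []]
    if "snmp" in names:
        problems.append("receiver name cannot be snmp")

    # Walk the routing tree and collect every receiver it references.
    routed = set()
    stack = [config.get("route") or {}]
    while stack:
        node = stack.pop()
        if "receiver" in node:
            routed.add(node["receiver"])
        stack.extend(node.get("routes") or [])

    for name in names:
        if name not in routed:
            problems.append(f"receiver {name!r} is not referenced by any route")
    return problems


example = {
    "global": {"resolve_timeout": "5m"},
    "route": {"receiver": "receiver-webhook"},
    "receivers": [{"name": "receiver-webhook"}],
}
print(check_custom_config(example))
```

An empty list means none of the listed constraints is violated; it does not replace the validation tool, which checks the full syntax.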
The following custom configuration adds a webhook receiver:
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 8737h
  receiver: receiver-webhook
receivers:
  - name: 'receiver-webhook'
    webhook_configs:
      - send_resolved: true
        url: 'http://webhook-example:####/xxxx/xxx'
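On the receiving side, the webhook endpoint gets an HTTP POST with a JSON body for each notification. The following Python sketch shows one way such a body could be summarized. It assumes the upstream Alert Manager webhook payload layout (a top-level alerts list whose entries carry status and labels); the function name summarize_webhook, the sample payload, and the disk_full alert name are hypothetical.

```python
import json


def summarize_webhook(body):
    """Turn a webhook POST body (JSON text) into one line per alert."""
    payload = json.loads(body)
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        lines.append(
            f"{alert.get('status', 'unknown')}: "
            f"{labels.get('alertname', '?')} "
            f"severity={labels.get('severity', '?')}"
        )
    return lines


# Hypothetical payload, reduced to the fields used above.
sample = json.dumps({
    "status": "firing",
    "receiver": "receiver-webhook",
    "alerts": [
        {"status": "firing",
         "labels": {"alertname": "docker_container_down", "severity": "critical"}},
        {"status": "resolved",
         "labels": {"alertname": "disk_full", "severity": "warning"}},
    ],
})

for line in summarize_webhook(sample):
    print(line)
```

Because send_resolved is true in the configuration above, the endpoint should expect entries with status resolved as well as firing.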
Default Built-in Configuration File
Two default configuration files are available. They define, respectively:
- A generic route that sends all alerts to the SNMP agent running on the management node.
- A route to a generic receiver that can be customized to add a notification channel (webhook, slack, and others).
Default configuration file with SNMP enabled:
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 8737h
  # A default receiver
  receiver: snmp
receivers:
  - name: 'snmp'
    webhook_configs:
      - send_resolved: true
        url: 'http://localhost:1161/alarms'
Default configuration file with SNMP disabled:
route:
  receiver: recv
  group_by:
    - alertname
    - cluster
    - service
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 8737h
receivers:
  - name: recv
SNMP Trap Receivers
You can send the SNMP traps to SNMP managers enabled in the Cisco VIM configuration file setup_data.yaml.
Example: inhibit (mute) alerts matching a set of labels
Inhibition prevents certain alerts from being sent while other alerts are firing. An inhibit rule mutes an alert that
matches the target matchers when another alert that matches the source matchers is active.
The general format of an inhibit rule is shown below. You can use a regex to match both the source and target alerts, and
you can filter the alerts by label name.
# Matchers that have to be fulfilled in the alerts to be muted.
target_match:
  [ <labelname>: <labelvalue>, ... ]
target_match_re:
  [ <labelname>: <regex>, ... ]
# Matchers for which one or more alerts have to exist for the
# inhibition to take effect.
source_match:
  [ <labelname>: <labelvalue>, ... ]
source_match_re:
  [ <labelname>: <regex>, ... ]
# Labels that must have an equal value in the source and target
# alert for the inhibition to take effect.
[ equal: '[' <labelname>, ... ']' ]
Example: Inhibit alerts if other alerts are active
The following inhibit rule mutes a warning alert when a critical alert with the same alertname, cluster, and service is already firing.
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    # Apply inhibition only if alertname, cluster, and service are the same.
    equal: ['alertname', 'cluster', 'service']
The following example inhibits all docker_container alerts for containers that are down, that is, containers for which
the docker_container_down alert is firing.
inhibit_rules:
  - target_match_re:
      alertname: 'docker_container.+'
    source_match:
      alertname: 'docker_container_down'
    equal: ['job', 'instance']
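To make the matching logic of the two rules above concrete, the following Python sketch evaluates an inhibit rule against a set of firing alerts. It is an illustration of the semantics, not the Alert Manager implementation. It assumes that regexes must match the whole label value, that a missing label is treated as an empty value, and that an alert matching both sides of a rule is not inhibited by alerts that also match both sides (so an alert cannot inhibit itself). The function names and the docker_container_cpu alert name are hypothetical.

```python
import re


def matches(alert, exact=None, regex=None):
    """True if the alert's labels satisfy all exact and regex matchers."""
    exact = exact or {}
    regex = regex or {}
    return (
        all(alert.get(k, "") == v for k, v in exact.items())
        and all(re.fullmatch(p, alert.get(k, "")) is not None
                for k, p in regex.items())
    )


def is_inhibited(target, firing, rule):
    """True if some firing alert inhibits `target` under `rule`."""
    def is_target(a):
        return matches(a, rule.get("target_match"), rule.get("target_match_re"))

    def is_source(a):
        return matches(a, rule.get("source_match"), rule.get("source_match_re"))

    if not is_target(target):
        return False
    target_is_source = is_source(target)
    for source in firing:
        if not is_source(source):
            continue
        # Assumed self-inhibition guard: skip sources that match both sides
        # when the target matches both sides as well.
        if target_is_source and is_target(source):
            continue
        if all(source.get(l, "") == target.get(l, "")
               for l in rule.get("equal", [])):
            return True
    return False


rule = {
    "target_match_re": {"alertname": "docker_container.+"},
    "source_match": {"alertname": "docker_container_down"},
    "equal": ["job", "instance"],
}
down = {"alertname": "docker_container_down", "job": "j", "instance": "i1"}
cpu = {"alertname": "docker_container_cpu", "job": "j", "instance": "i1"}
print(is_inhibited(cpu, [down, cpu], rule))
```

In this sketch the docker_container_cpu alert is muted only when docker_container_down is firing for the same job and instance; the same alert on another instance would still be sent.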
Validation Script
After you create or change a configuration file, run the amtool script and ensure that the output reports SUCCESS for the
configuration.
> /opt/cisco/amtool check-config <alertmanager_config_file>
Checking '<alertmanager_config_file>' SUCCESS
Found:
- global config
- route
- 0 inhibit rules
- 1 receivers
- 0 templates
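If the check is run from a script, the summary can be parsed to confirm that the expected number of receivers and inhibit rules was picked up. The following Python sketch assumes the output has exactly the shape shown above; the function name parse_check_output and the file name in the sample are hypothetical.

```python
import re


def parse_check_output(output):
    """Return (success, counts) parsed from the check-config output text."""
    success = "SUCCESS" in output
    counts = {
        kind: int(number)
        for number, kind in re.findall(
            r"-\s*(\d+)\s+(inhibit rules|receivers|templates)", output
        )
    }
    return success, counts


# Sample text copied from the example output above, with a hypothetical file name.
sample_output = """Checking 'alertmanager_custom_config.yml' SUCCESS
Found:
- global config
- route
- 0 inhibit rules
- 1 receivers
- 0 templates
"""

print(parse_check_output(sample_output))
```

A count of 0 inhibit rules after adding an inhibit_rules section would suggest that the section is misplaced or misspelled in the file.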