
Unexpectedly: eventalertmod (EventAlertMod plugin 1.12)

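Overview

Because my work currently runs almost entirely on DingTalk, today I mainly cover how to configure DingTalk alerts for Prometheus. The prerequisite is that Alertmanager is already deployed.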


I. Set up Go

Prometheus is written in Go, so first install a Go environment. Go is cross-platform, supports Windows, Linux, Mac OS X, and other systems, and is also available as source code that can be compiled and installed.

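Download address: https://studygolang.com/dl

1. Extract the archive

    # tar -xvf go1.13.linux-amd64.tar.gz -C /usr/local/

2. Configure the environment variable

    # echo 'export PATH=$PATH:/usr/local/go/bin' >> /etc/profile
    # source /etc/profile

The single quotes keep $PATH from being expanded at the moment the line is written, so /etc/profile stores the variable reference rather than a frozen copy of the current PATH.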

3. Verify that the installation succeeded with go version

    # go version
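Running the PATH step more than once appends the same line to /etc/profile each time. A minimal sketch of an idempotent variant follows; the helper name add_go_path and its file argument are hypothetical and only serve as illustration.

```shell
#!/bin/sh
# Hypothetical helper: append the Go PATH export to a profile file only once.
# $1 is the profile file to edit (the article uses /etc/profile).
add_go_path() {
    profile_file="$1"
    # Single quotes: $PATH must be written literally, not expanded now.
    line='export PATH=$PATH:/usr/local/go/bin'
    # -F fixed string, -x whole-line match, -q quiet
    if ! grep -Fxq "$line" "$profile_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile_file"
    fi
}

# Usage (as root):
# add_go_path /etc/profile && . /etc/profile
```

Calling the helper twice leaves a single export line in the file.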

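II. Configure the DingTalk robot

1. Open robot management

2. Choose Webhook

3. Choose the group

4. View the robot settings and note the Webhook URL, which contains the access_token used in the following steps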
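Before wiring the robot into Alertmanager, it can help to confirm that the Webhook URL from the robot settings works on its own. The sketch below posts a plain text message with curl. The function names are hypothetical, the token in the usage line is a placeholder, and if the robot was created with a keyword restriction the message text presumably has to contain that keyword.

```shell
#!/bin/sh
# Build the JSON body for a DingTalk "text" message.
# Note: this simple sketch does not escape quotes or backslashes in $1.
dingtalk_text_payload() {
    printf '{"msgtype":"text","text":{"content":"%s"}}' "$1"
}

# Send a test message to the robot.
# $1: full Webhook URL copied from the robot settings
# $2: message text
send_dingtalk_test() {
    curl -sS -X POST "$1" \
        -H 'Content-Type: application/json' \
        -d "$(dingtalk_text_payload "$2")"
}

# Usage (placeholder token, replace with your own):
# send_dingtalk_test 'https://oapi.dingtalk.com/robot/send?access_token=<your-token>' 'test alert'
```

If the message shows up in the group, the robot side is working and any later problem lies in the webhook plugin or in Alertmanager.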

II. Connect DingTalk to Prometheus Alertmanager through a webhook

Plugin download address: https://github.com/timonwong/prometheus-webhook-dingtalk

1. Install the webhook

Build from source (create the directory under Go's src directory):

    # mkdir -p /usr/local/go/src/github.com/timonwong/
    # cd /usr/local/go/src/github.com/timonwong/
    # git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git
    # cd prometheus-webhook-dingtalk
    # make

Or install the binary package:

    # wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz

2. Extract the archive

    # tar -xvf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz

After installation, the template file that prometheus-webhook-dingtalk uses for DingTalk alerts is at: /usr/local/dingtalk/prometheus-webhook-dingtalk-0.3.0.linux-amd64/default.tmpl

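3. Start prometheus-webhook-dingtalk

    # nohup ./prometheus-webhook-dingtalk --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=de544211xxxx96f" >dingding.log 2>&1 &

The name before the = sign (ops_dingding here) is the profile name. It becomes part of the local URL that Alertmanager calls.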
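Once the process is running, the plugin can be exercised without waiting for a real alert by posting an Alertmanager-style notification to it by hand. The sketch below assumes the listen address localhost:8060 that appears later in the Alertmanager configuration; the function names and all alert values in the payload are made up for the test.

```shell
#!/bin/sh
# Build the local webhook URL for a given --ding.profile name.
dingtalk_webhook_url() {
    printf 'http://localhost:8060/dingtalk/%s/send' "$1"
}

# Post a minimal Alertmanager webhook notification (payload version 4).
# $1: profile name, for example ops_dingding
send_sample_alert() {
    curl -sS -X POST "$(dingtalk_webhook_url "$1")" \
        -H 'Content-Type: application/json' \
        -d '{
  "version": "4",
  "status": "firing",
  "receiver": "default-receiver",
  "groupLabels": {"alertname": "TestAlert"},
  "commonLabels": {"alertname": "TestAlert"},
  "commonAnnotations": {"explain": "manual test"},
  "externalURL": "http://localhost:9093",
  "alerts": [{
    "status": "firing",
    "labels": {"alertname": "TestAlert", "instance": "localhost"},
    "annotations": {"explain": "manual test"},
    "startsAt": "2020-01-01T00:00:00Z",
    "endsAt": "0001-01-01T00:00:00Z",
    "generatorURL": "http://localhost:9090"
  }]
}'
}

# Usage:
# send_sample_alert ops_dingding
```

Any error returned here is also worth comparing against dingding.log, where the plugin writes its output.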

4. Configure a system service

    # vim /etc/systemd/system/prometheus-webhook-dingtalk.service

    [Unit]
    Description=prometheus-webhook-dingtalk
    After=network-online.target

    [Service]
    Restart=on-failure
    ExecStart=/usr/local/dingtalk/prometheus-webhook-dingtalk-0.3.0.linux-amd64/prometheus-webhook-dingtalk --ding.profile=sre=https://oapi.dingtalk.com/robot/send?access_token=de544xxx8ebc04e8da096f

    [Install]
    WantedBy=multi-user.target

    # chmod u+x /etc/systemd/system/prometheus-webhook-dingtalk.service
    # systemctl daemon-reload
    # systemctl start prometheus-webhook-dingtalk
    # systemctl status prometheus-webhook-dingtalk

III. Configure the Alertmanager email sender and connect the DingTalk webhook

Edit /usr/local/alertmanager/alertmanager.yml:

    global:
      resolve_timeout: 5m
      # Email sender settings
      smtp_smarthost: 'smtp.qq.com:465'
      smtp_from: 1275758000@qq.com
      smtp_auth_username: 1275758000@qq.com
      smtp_auth_password: nxxxegb
      smtp_require_tls: false
    route:
      group_by: [alertname, cluster, service]
      receiver: default-receiver
      group_wait: 30s
      group_interval: 2m
      repeat_interval: 30m
    receivers:
    - name: default-receiver
      email_configs:
      - to: 1430985018@qq.com,644642050@qq.com
      # Connect to the service started by prometheus-webhook-dingtalk
      webhook_configs:
      # The path segment after /dingtalk/ (sre here) must match the profile
      # name passed to --ding.profile when the webhook was started
      - url: http://localhost:8060/dingtalk/sre/send
        send_resolved: true
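The article starts the plugin by hand with the profile ops_dingding but registers sre in the systemd unit and in the Alertmanager URL, so a mismatch between the two is an easy mistake to make. The sketch below compares the profile name on both sides. The function names are hypothetical, and the NAME=URL and /dingtalk/NAME/send forms are taken from the commands and configuration shown in this article.

```shell
#!/bin/sh
# Extract the profile name from a --ding.profile value of the form NAME=URL.
profile_from_flag() {
    # Remove the longest suffix that starts at the first "=".
    printf '%s' "${1%%=*}"
}

# Extract the profile name from a local webhook URL of the form
# http://HOST:PORT/dingtalk/NAME/send.
profile_from_url() {
    rest="${1#*/dingtalk/}"
    printf '%s' "${rest%/send}"
}

# Succeed only when both sides name the same profile.
# $1: --ding.profile value, $2: url from webhook_configs
profiles_match() {
    [ "$(profile_from_flag "$1")" = "$(profile_from_url "$2")" ]
}

# Usage:
# profiles_match 'sre=https://oapi.dingtalk.com/robot/send?access_token=<your-token>' \
#     'http://localhost:8060/dingtalk/sre/send' && echo "profiles match"
```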

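repeat_interval: this field sets how often the notification is re-sent for an alert that is still firing. Set it to suit your needs; while debugging, a somewhat shorter value is convenient. Check the status: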

IV. Prometheus configuration (for reference)

Configuration file rules.yml:

    groups:
    - name: host_monitoring
      rules:
      - alert: Memory alert
        expr: netdata_system_ram_MiB_average{chart="system.ram",dimension="free",family="ram"} < 800
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: Memory alert
          # Template values must be quoted, otherwise YAML reads {{ as a mapping
          Server: "{{ $labels.instance }}"
          #summary: "{{$labels.instance}}: High Memory usage detected"
          explain: "Memory usage is above 90%, currently remaining: {{ $value }}M"
          #description: "{{$labels.instance}}: Memory usage is above 80% (current value is: {{ $value }})"
      - alert: CPU alert
        expr: netdata_system_cpu_percentage_average{chart="system.cpu",dimension="idle",family="cpu"} < 20
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: CPU alert
          Server: "{{ $labels.instance }}"
          explain: "CPU usage is above 80%, currently remaining: {{ $value }}"
          #summary: "{{$labels.instance}}: High CPU usage detected"
          #description: "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }})"
      - alert: Disk alert
        expr: netdata_disk_space_GiB_average{chart="disk_space._",dimension="avail",family="/"} < 4
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: Disk alert
          Server: "{{ $labels.instance }}"
          explain: "Disk usage is above 90%, currently remaining: {{ $value }}G"
      - alert: Service alert
        expr: up == 0
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: Service alert
          Server: "{{ $labels.instance }}"
          explain: "The netdata service is down"

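This configuration file has been modified. YAML is stricter about formatting than most other file types, indentation in particular, so read up on the rules if you are unsure. After editing, check that the format is correct. The following online formatting tool can check whether your file is valid: http://www.bejson.com/validators/yaml_editor/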
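Besides an online YAML checker, the files can also be validated locally with the command-line tools that ship with Prometheus and Alertmanager: promtool check rules and amtool check-config. These go further than a syntax check because they also parse the rule expressions and the Alertmanager schema. The sketch below assumes both tools may or may not be on the PATH; the wrapper name run_if_installed is hypothetical.

```shell
#!/bin/sh
# Run a checker only if it is installed; report and skip otherwise.
# $1: tool name, remaining arguments: passed to the tool unchanged.
run_if_installed() {
    tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        "$@"
    else
        echo "$tool not found in PATH, skipping" >&2
    fi
}

# Usage (the rules.yml location is an assumption, adjust to your setup):
# run_if_installed promtool check rules rules.yml
# run_if_installed amtool check-config /usr/local/alertmanager/alertmanager.yml
```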

V. View the alerts

Stop cadvisor:

    docker stop cadvisor
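Because the service rule uses for: 2m, the alert should stay pending for about two minutes after the target goes down before it fires. One way to watch this from the command line is to query the Prometheus HTTP API. The sketch below assumes Prometheus listens on its default port 9090; the function name count_firing is hypothetical, and the text match is a crude stand-in for a proper JSON parser.

```shell
#!/bin/sh
# Count alerts in the "firing" state in a Prometheus /api/v1/alerts
# response read from standard input.
# Crude text match: assumes the compact JSON that the API returns.
count_firing() {
    grep -o '"state":"firing"' | wc -l
}

# Usage:
# curl -s http://localhost:9090/api/v1/alerts | count_firing
```

A result of 0 shortly after stopping the container is expected; the count should rise once the for: period has elapsed.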

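Logs:

After restarting the service:

The alert template is admittedly a bit ugly. I will improve it later and stop the test here for now. More Prometheus content will follow, so follow along if you are interested!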
