In Prometheus, an alerting rule for specific ports on a specific server can be written with an expression like this:
expr: (probe_success{job="02tcp_port_check",instance="50.40.50.20:80"} or probe_success{job="02tcp_port_check",instance="50.40.50.20:443"}) == 0
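The drawback is that once a server has dozens or even hundreds of ports to monitor, this expression becomes unmaintainable. How should an alerting rule that covers many ports on one server be written? One rule file per port does not count. Thanks, everyone!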
Hard to answer without a screenshot?
Label matchers support regular expressions: change = to =~ and replace the port with .*, for example instance=~"50.40.50.20:.*"
expr: probe_success{job="02tcp_port_check",instance=~"50.40.50.20:.*"} == 0
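A minimal sketch of the regex approach as a complete rule file, assuming the probe_success series carry the probed address in the instance label as in the question. The group name and alert name are hypothetical. The dots are escaped because an unescaped . matches any character, and Prometheus anchors regex matchers at both ends, so the pattern has to cover the whole instance value:

```yaml
groups:
  - name: tcp_port_check          # hypothetical group name
    rules:
      - alert: TcpPortDown        # hypothetical alert name
        # One alert fires per failing series, so every port probed on
        # 50.40.50.20 is covered by this single rule.
        expr: probe_success{job="02tcp_port_check",instance=~"50\\.40\\.50\\.20:.*"} == 0
        for: 1m
        annotations:
          summary: "TCP check failed for {{ $labels.instance }}"
```

Because the comparison filters each series on its own, the resulting alert keeps the instance label of the port that failed. Running promtool check rules against the file validates it before Prometheus is reloaded.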
You can follow the approach described in this blog post: https://blog.csdn.net/u011220233/article/details/123043587
I went with the simple approach below:
## TCP port check
- alert: 76_TEST_server_port_10.40.30.217:80 failed
  for: 1m
  expr: probe_success{job="02tcp_port_check",instance="10.40.30.217:80"} == 0
  labels:
    groupjob: No_alarm_will_be_sent_from_20_to_8_BJ_time
    alertype: 74_Send_alarm_from_8_to_20_BJ_time_vSDv8.com
  annotations:
    description: "TCP check failed for {{ $labels.app }} in {{ $labels.group }}; current probe_success value is {{ $value }}"
    summary: "Port check failed for app {{ $labels.app }} in group {{ $labels.group }}"
    resolved: "TCP check succeeded for {{ $labels.app }} in {{ $labels.group }}; the fault has recovered"
## TCP port check
- alert: 76_TEST_server_port_10.40.30.217:443 failed
  for: 1m
  expr: probe_success{job="02tcp_port_check",instance="10.40.30.217:443"} == 0
  labels:
    groupjob: No_alarm_will_be_sent_from_20_to_8_BJ_time
    alertype: 74_Send_alarm_from_8_to_20_BJ_time_vSDv8.com
  annotations:
    description: "TCP check failed for {{ $labels.app }} in {{ $labels.group }}; current probe_success value is {{ $value }}"
    summary: "Port check failed for app {{ $labels.app }} in group {{ $labels.group }}"
    resolved: "TCP check succeeded for {{ $labels.app }} in {{ $labels.group }}; the fault has recovered"
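The two rules above differ only in the port, so they could be collapsed into one. A sketch, assuming the same labels and annotations apply to every port; the alert name is hypothetical, and {{ $labels.instance }} is added to the annotations so the notification says which port failed:

```yaml
## TCP port check, all listed ports in one rule
- alert: 76_TEST_server_port_10.40.30.217 failed   # hypothetical name
  for: 1m
  # The alternation limits the rule to the listed ports; extend it with |.
  expr: probe_success{job="02tcp_port_check",instance=~"10\\.40\\.30\\.217:(80|443)"} == 0
  labels:
    groupjob: No_alarm_will_be_sent_from_20_to_8_BJ_time
    alertype: 74_Send_alarm_from_8_to_20_BJ_time_vSDv8.com
  annotations:
    description: "TCP check failed for {{ $labels.instance }} ({{ $labels.app }} in {{ $labels.group }}); current probe_success value is {{ $value }}"
    summary: "Port check failed for {{ $labels.instance }}"
```

An explicit port list keeps the rule from picking up ports added to the job later for other purposes; replacing (80|443) with .* covers every probed port on the host instead.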