Prometheus: alerting on specific ports of a specific server

To write a Prometheus alerting rule for specific ports on a specific server, you can use an expression like this:


expr: (probe_success{job="02tcp_port_check",instance="50.40.50.20:80"} or probe_success{job="02tcp_port_check",instance="50.40.50.20:443"}) == 0

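The drawback of this approach is that when a server has dozens or even hundreds of ports to monitor, the expression becomes unmaintainable. How should an alerting rule that covers many ports on one server be written (other than creating one alerting-rule file per port)? Thanks, everyone!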

Hard to tell without a screenshot?

You can refer to:
https://blog.csdn.net/xujiamin0022016/article/details/106583707?utm_medium=distribute.pc_aggpage_search_result.none-task-blog-2~aggregatepage~first_rank_ecpm_v1~rank_v31_ecpm-3-106583707-null-null.pc_agg_new_rank&utm_term=prometheus%E7%9B%91%E6%8E%A7%E7%AB%AF%E5%8F%A3%E7%8A%B6%E6%80%81&spm=1000.2123.3001.4430
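For background: probe_success is exported by blackbox_exporter, so a job like 02tcp_port_check is most likely a blackbox TCP probe. Below is a minimal sketch of such a scrape job. The module name tcp_connect, the exporter address 127.0.0.1:9115 and the group/app label values are assumptions. Listing each host:port as a separate target is what gives every port its own instance="host:port" series, which a regex on instance can then match.

```yaml
scrape_configs:
  - job_name: 02tcp_port_check
    metrics_path: /probe
    params:
      module: [tcp_connect]           # assumed module name from blackbox.yml
    static_configs:
      - targets:                      # one entry per host:port to probe
          - 50.40.50.20:80
          - 50.40.50.20:443
        labels:                       # hypothetical values, read by rule annotations
          group: TEST
          app: web
    relabel_configs:
      # hand the host:port to blackbox_exporter as the ?target= parameter
      - source_labels: [__address__]
        target_label: __param_target
      # keep host:port as the instance label so instance=~ regexes can match it
      - source_labels: [__param_target]
        target_label: instance
      # scrape blackbox_exporter itself instead of the probed host
      - target_label: __address__
        replacement: 127.0.0.1:9115   # assumed blackbox_exporter address
```

With this layout the port list lives in one place (the targets). For hundreds of ports, file_sd_configs can load the same targets from a file instead.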


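Label matchers support regular expressions: change = to =~ and use .* for the port part, e.g. instance=~"50.40.50.20:.*". A single selector then matches every port of that server, so there is no need to chain selectors with "or". Because the alert fires once per matching series, each failed port still produces its own alert, labelled with its instance:
expr: probe_success{job="02tcp_port_check",instance=~"50.40.50.20:.*"} == 0
Note that . in a regex matches any character. To match the dots in the IP literally, write instance=~"50\\.40\\.50\\.20:.*" (the double backslash is needed inside a PromQL double-quoted string).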
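Put into a complete rule file, the regex version might look like the sketch below. The group name tcp_port_check and the alert name TcpPortDown are hypothetical. The dots in the IP are escaped so they match literally, and the annotations use the instance label to say which port failed.

```yaml
groups:
  - name: tcp_port_check              # hypothetical group name
    rules:
      - alert: TcpPortDown            # hypothetical alert name
        # one selector for all ports of the host; each failing
        # host:port series becomes a separate alert instance
        expr: probe_success{job="02tcp_port_check",instance=~"50\\.40\\.50\\.20:.*"} == 0
        for: 1m
        annotations:
          summary: "TCP port {{ $labels.instance }} is unreachable"
          description: "probe_success for {{ $labels.instance }} is {{ $value }}"
```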

You can also follow the approach described in this blog post: https://blog.csdn.net/u011220233/article/details/123043587

I use the following simple approach, one rule per port:


## TCP port check
  - alert: 76_TEST_server_port_10.40.30.217:80    failed
    for: 1m
    expr: probe_success{job="02tcp_port_check",instance="10.40.30.217:80"} == 0
    labels:
      groupjob: No_alarm_will_be_sent_from_20_to_8_BJ_time
      alertype: 74_Send_alarm_from_8_to_20_BJ_time_vSDv8.com
    annotations:
      description: "{{ $labels.group }} {{ $labels.app }}: TCP check failed, current probe_success value is {{ $value }}"
      summary: "Port check failed for app {{ $labels.app }} in group {{ $labels.group }}"
      resolved: "{{ $labels.group }} {{ $labels.app }}: TCP service check succeeded, the fault has been resolved"


## TCP port check
  - alert: 76_TEST_server_port_10.40.30.217:443    failed
    for: 1m
    expr: probe_success{job="02tcp_port_check",instance="10.40.30.217:443"} == 0
    labels:
      groupjob: No_alarm_will_be_sent_from_20_to_8_BJ_time
      alertype: 74_Send_alarm_from_8_to_20_BJ_time_vSDv8.com
    annotations:
      description: "{{ $labels.group }} {{ $labels.app }}: TCP check failed, current probe_success value is {{ $value }}"
      summary: "Port check failed for app {{ $labels.app }} in group {{ $labels.group }}"
      resolved: "{{ $labels.group }} {{ $labels.app }}: TCP service check succeeded, the fault has been resolved"
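The two rules above differ only in the port, so they can be collapsed into one rule whose regex lists the ports. The sketch below keeps the same labels and adds {{ $labels.instance }} to the annotations to name the failing port. The alert name 76_TEST_server_port_failed is hypothetical, and group/app must exist as target labels for those parts of the annotations to render. Like the rules above, it is a fragment meant to sit under the same rules: list.

```yaml
## TCP port check: one rule covering several ports of 10.40.30.217
  - alert: 76_TEST_server_port_failed
    for: 1m
    # the alternation lists the watched ports; each failing port fires a
    # separate alert carrying its own instance="host:port" label
    expr: probe_success{job="02tcp_port_check",instance=~"10\\.40\\.30\\.217:(80|443)"} == 0
    labels:
      groupjob: No_alarm_will_be_sent_from_20_to_8_BJ_time
      alertype: 74_Send_alarm_from_8_to_20_BJ_time_vSDv8.com
    annotations:
      description: "{{ $labels.group }} {{ $labels.app }}: TCP check of {{ $labels.instance }} failed, current probe_success value is {{ $value }}"
      summary: "Port {{ $labels.instance }} of app {{ $labels.app }} in group {{ $labels.group }} is unreachable"
      resolved: "{{ $labels.group }} {{ $labels.app }}: TCP check of {{ $labels.instance }} succeeded, the fault has been resolved"
```

Adding a port then means extending the alternation, e.g. (80|443|8080), rather than copying a whole rule.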