Addons Catapult

Addon rules

The Prometheus agent is configured to report numerous addon metrics. Below is the YAML-based rule set that Catapult uses, including alert names, rules with alert type and expression, timeframe, labels with type and severity, and annotations which contain the summary and description of the notification.

bash-5.0# cat sunpike.yml
groups:
  - name: hosts
    rules:
      - alert: HostNotConverging
        expr: sum without (status_start_attempts) (sunpike_hosts_health{clusterrole!="none",hoststate!="Ok"}) == 1
        for: 20m
        labels:
          type: pf9
          severity: high
        annotations:
          summary: Host {{ $labels.hostname }} not converging
          description: Host {{ $labels.hostname }} is not converging, status {{ $labels.hoststate }}, role {{ $labels.clusterrole }}, retries {{ $labels.status_start_attempts }}

  - name: clusteraddons
    rules:
      - alert: AddonNotHealthy
        expr: sunpike_clusteraddons_health{healthy!="true"} == 1
        for: 10m
        labels:
          type: pf9
          severity: high
        annotations:
          summary: Addon {{ $labels.type }} not healthy!!
          description: Addon {{ $labels.type }} of cluster {{ $labels.cluster }} not healthy

      - alert: AddonNotConverging
        expr: sunpike_clusteraddons_health{phase=""} == 1
        for: 20m
        labels:
          type: pf9
          severity: high
        annotations:
          summary: Addon {{ $labels.type }} not converging!!
          description: Addon {{ $labels.type }} of cluster {{ $labels.cluster }} not converging

      - alert: AddonInstallError
        expr: sunpike_clusteraddons_health{phase="InstallAddonError"} == 1
        for: 5m
        labels:
          type: pf9
          severity: high
        annotations:
          summary: Error installing {{ $labels.type }} addon!!
          description: Error installing Addon {{ $labels.type }} on cluster {{ $labels.cluster }} failed to install

      - alert: AddonUninstallError
        expr: sunpike_clusteraddons_health{phase="UnInstallAddonError"} == 1
        for: 5m
        labels:
          type: pf9
          severity: high
        annotations:
          summary: Error uninstalling {{ $labels.type }} addon!!
          description: Error uninstalling Addon {{ $labels.type }} on cluster {{ $labels.cluster }} failed to uninstall

bash-5.0#

Last updated

Was this helpful?