# Nodes Connectivity Monitoring

## Rules for the Hostagent

The Prometheus agent is configured to report numerous node-based metrics. Below is the YAML rule set that Catapult uses for host connectivity. Each alert defines a name, an expression, a `for` duration, and annotations that carry the summary and description of the notification, along with the SaaS management plane (`du`) and the hostname (`host_name`).

{% tabs %}
{% tab title="YAML" %}

```yaml
# resmgr.yml
groups:
  - name: Host Availability Alerts
    rules:
    # du_sidekick_last_checkin_seconds will be the time in seconds since a host checked in via sidekick
    # or -1 if the host is not reported by sidekick at all. The latter case may occur if
    # sidekick has restarted and a host hasn't connected since.
    - expr: >-2
        (
          sum by (host_id,host_name) (
            time() - sidekick_host_last_heartbeat_time{job="sidekickserver"} / 1000
          )
        )
        OR
        ignoring(host_name) (
          sum by (host_id,host_name) (
            (resmgr_host_up{job="resmgr"} == 0) - 1
          )
        )
      record: du_sidekick_last_checkin_seconds


  - name: Hosts disconnected
    rules:
    # resmgr says it's been down for at least 10m, sidekick says still reporting
    # NOTE: the `for` delay _must_ exceed the cutoff (currently 600 seconds) + the scrape
    # period (1m) or else both this and host-down will fire off simultaneously
    - alert: host-disconnected
      expr: >-2
        sum by (du, host_id, host_name) (du_sidekick_last_checkin_seconds < 600) AND
        ON(host_id) resmgr_host_up{job="resmgr"} == 0
      for: 15m
      annotations:
        summary: host-disconnected
        description: "{{ $labels.host_name }} disconnected from control plane {{ $labels.du }} for more than 10 minutes"
        du: "{{ $labels.du }}"
        host_name: "{{ $labels.host_name }}"


    # resmgr says it's been down for at least 10m, sidekick "agrees"
    - alert: host-down
      expr: >-2
        sum by (du, host_id, host_name) (du_sidekick_last_checkin_seconds >= 600 OR du_sidekick_last_checkin_seconds == -1) AND
        ON(host_id) resmgr_host_up{job="resmgr"} == 0
      for: 10m
      annotations:
        summary: host-down
        description: "{{ $labels.host_name }} down {{ $labels.du }} for more than 10 minutes"
        du: "{{ $labels.du }}"
        host_name: "{{ $labels.host_name }}"
```

{% endtab %}
{% endtabs %}
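To make the interplay between the recording rule and the two alerts easier to follow, here is a minimal Python sketch of the same comparisons for a single host. It is an illustration only, not part of Catapult: the function names `seconds_since_checkin` and `matching_alerts` are hypothetical, and the sketch ignores label matching and the `for` hold time (15m for `host-disconnected`, 10m for `host-down`). Read literally, a value of `-1` satisfies both `< 600` and `== -1`, so the sketch reports both expressions as matching in that case.

```python
from typing import List, Optional

CUTOFF_SECONDS = 600  # cutoff used by both alert expressions


def seconds_since_checkin(now_s: float,
                          heartbeat_ms: Optional[float],
                          resmgr_host_up: int) -> Optional[float]:
    """Mirror the du_sidekick_last_checkin_seconds recording rule for one host.

    heartbeat_ms stands in for sidekick_host_last_heartbeat_time (milliseconds);
    None means sidekick does not report the host at all.
    """
    if heartbeat_ms is not None:
        # First branch: time() - heartbeat / 1000
        return now_s - heartbeat_ms / 1000
    if resmgr_host_up == 0:
        # Second branch of the OR: (resmgr_host_up == 0) - 1 evaluates to -1
        return -1
    # Neither branch yields a sample for this host
    return None


def matching_alerts(last_checkin_seconds: float, resmgr_host_up: int) -> List[str]:
    """Return the alerts whose expression matches, before the `for` delay applies."""
    if resmgr_host_up != 0:
        # Both alerts require resmgr_host_up == 0
        return []
    matches = []
    if last_checkin_seconds < CUTOFF_SECONDS:
        matches.append("host-disconnected")
    if last_checkin_seconds >= CUTOFF_SECONDS or last_checkin_seconds == -1:
        matches.append("host-down")
    return matches
```

For example, a host that resmgr reports as down but that checked in with sidekick two minutes ago matches only `host-disconnected`, while the same host with a last check-in 15 minutes ago matches only `host-down`.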

