# Calico Monitoring

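The YAML rule set below defines the Calico alerts that Catapult uses. Each rule specifies an alert name, a PromQL expression, a `for` duration that the condition must hold before the alert fires, labels, and annotations that carry the alert's summary and description.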
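
As a reading aid, the following sketch repeats one rule from the set (`CalicoIpsetErrorsHigh`) with explanatory comments added. The comments are editorial annotations and are not part of the file Catapult ships. The remark about routing on `severity` and `type` is an assumption about typical Alertmanager usage, not something this page states.

```yaml
groups:
  - name: calico.rules
    rules:
      # alert: the name shown when the alert fires.
      - alert: CalicoIpsetErrorsHigh
        # expr: PromQL condition. increase(...[1h]) is the growth of the
        # counter over a one-hour window; "offset 5m" shifts that window back
        # so it ends five minutes before evaluation time. The condition is
        # true when the growth exceeds 5.
        expr: increase(felix_ipset_errors[1h] offset 5m) > 5
        # for: the condition must stay true for this long before the alert
        # moves from pending to firing.
        for: 1h
        # labels: extra labels attached to the alert. Assumption: these are
        # typically used for routing or filtering in Alertmanager.
        labels:
          severity: warning
          type: calico-node
        # annotations: human-readable text. {{ $labels.<name> }} expands to a
        # label of the alerting series, {{ $value }} to the expression value.
        annotations:
          description: 'Felix cluster {{ $labels.cluster }} has seen {{ $value }} ipset errors within the last hour'
          summary: 'A high number of ipset errors within Felix are happening'
```

Because the `for` duration and the one-hour window in the expression are independent, this rule fires only after the windowed increase has stayed above 5 for a further hour of evaluations.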

{% tabs %}
{% tab title="YAML" %}

```yaml
# calico.yml
groups:
  - name: calico.rules
    rules:
#-----------------------------------   Calico Node -------------------------------------
# Felix Prometheus statistics: https://projectcalico.docs.tigera.io/reference/felix/prometheus

      - alert: PromHTTPRequestErrors
        expr: (sum(rate(promhttp_metric_handler_requests_total{job="calico-node",code=~"(4|5).."}[5m] offset 5m )) by (instance, job, cluster, host) / sum(rate(promhttp_metric_handler_requests_total{job="calico-node"}[5m] offset 5m )) by (instance, job, cluster, host)) * 100 > 1
        for: 10m
        labels:
          severity: warning
          type: calico-node
        annotations:
          description: "Cluster {{ $labels.cluster }}: HTTP requests errors on host {{ $labels.host }}."
          summary: Calico HTTP requests errors on cluster {{ $labels.cluster }}

      - alert: CalicoDatapaneFailuresHigh
        expr: increase(felix_int_dataplane_failures[1h] offset 5m) > 5
        for: 1h
        labels:
          severity: warning
          type: calico-node
        annotations:
          description: 'Felix cluster {{ $labels.cluster }} has seen {{ $value }} dataplane failures within the last hour'
          summary: 'A high number of dataplane failures within Felix are happening'

      - alert: CalicoIpsetErrorsHigh
        expr: increase(felix_ipset_errors[1h] offset 5m) > 5
        for: 1h
        labels:
          severity: warning
          type: calico-node
        annotations:
          description: 'Felix cluster {{ $labels.cluster }} has seen {{ $value }} ipset errors within the last hour'
          summary: 'A high number of ipset errors within Felix are happening'

      - alert: CalicoIptableSaveErrorsHigh
        expr: increase(felix_iptables_save_errors[1h] offset 5m) > 5
        for: 1h
        labels:
          severity: warning
          type: calico-node
        annotations:
          description: 'Felix cluster {{ $labels.cluster }} has seen {{ $value }} iptable save errors within the last hour'
          summary: 'A high number of iptable save errors within Felix are happening'

      - alert: CalicoIptableRestoreErrorsHigh
        expr: increase(felix_iptables_restore_errors[1h] offset 5m) > 5
        for: 1h
        labels:
          severity: warning
          type: calico-node
        annotations:
          description: 'Felix cluster {{ $labels.cluster }} has seen {{ $value }} iptable restore errors within the last hour'
          summary: 'A high number of iptable restore errors within Felix are happening'

#-----------------------------------   Calico Kube Controllers   -----------------------
# kube-controllers Prometheus statistics: https://projectcalico.docs.tigera.io/reference/kube-controllers/prometheus
# curl http://<pod_ip>:9094/metrics

#-----------------------------------   Calico Typha   ----------------------------------
# Typha Prometheus statistics:  https://projectcalico.docs.tigera.io/reference/typha/prometheus

      - alert: TyphaPingLatency
        expr: rate(typha_ping_latency_sum[1m] offset 5m ) / rate(typha_ping_latency_count[1m] offset 5m ) > 0.1 and rate(typha_ping_latency_count[1m] offset 5m ) > 0
        for: 2m
        labels:
          severity: warning
          type: calico-typha
        annotations:
          summary: Typha Round-trip ping latency to client (cluster {{ $labels.cluster }})
          description: "Typha latency is growing (ping operations > 100ms)<br>  VALUE = {{ $value }}<br>  LABELS = {{ $labels }}"

      - alert: TyphaClientWriteLatency
        expr: rate(typha_client_write_latency_secs_sum[1m] offset 5m) / rate(typha_client_write_latency_secs_count[1m] offset 5m) > 0.1 and rate(typha_client_write_latency_secs_count[1m] offset 5m ) > 0
        for: 2m
        labels:
          severity: warning
          type: calico-typha
        annotations:
          summary: Typha unusual write latency (instance {{ $labels.cluster }})
          description: "Typha client latency is growing (write operations > 100ms)<br>  VALUE = {{ $value }}<br>  LABELS = {{ $labels }}"
```

{% endtab %}
{% endtabs %}

