# Monitoring

This document describes the built-in monitoring and observability component of <code class="expression">space.vars.product\_name</code>. <code class="expression">space.vars.product\_name</code> uses the open source tools [Prometheus, Alert Manager, and Grafana](https://prometheus.io/) as the key components of its monitoring stack. The <code class="expression">space.vars.product\_acronym</code> monitoring system collects infrastructure and application metrics using Prometheus exporters for various <code class="expression">space.vars.product\_acronym</code> components and services. <code class="expression">space.vars.product\_acronym</code> administrators can then use these metrics in several ways:

* Metrics visibility via built-in charts in the <code class="expression">space.vars.product\_acronym</code> UI
* Default dashboards provided as part of the built-in Grafana instance
* Metrics that administrators can feed into their own internal monitoring platforms
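For the third option, Prometheus exposes an HTTP API that other tools can poll. The sketch below builds an instant query against Prometheus' standard `/api/v1/query` endpoint and flattens the JSON result into a dictionary. It assumes the regional Prometheus endpoint is reachable from your network; `PROMETHEUS_URL` is a hypothetical placeholder, and whether and how <code class="expression">space.vars.product\_name</code> exposes Prometheus outside the region is not covered on this page.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: substitute the Prometheus endpoint for your region.
PROMETHEUS_URL = "https://prometheus.example.internal"


def build_query_url(base_url: str, promql: str) -> str:
    """Return a Prometheus instant-query URL (GET /api/v1/query)."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_vector(payload: dict) -> dict:
    """Map each series' labels (as a sorted tuple) to its sample value."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each vector sample is [unix_timestamp, "value-as-string"].
    return {
        tuple(sorted(series["metric"].items())): float(series["value"][1])
        for series in data["result"]
    }


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> dict:
    """Run an instant query and return the parsed vector (performs network I/O)."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

For example, `instant_query(PROMETHEUS_URL, "up")` would return one entry per scrape target, with `1.0` for targets that answered their last scrape. `up` is a generic Prometheus metric; the names of the <code class="expression">space.vars.product\_acronym</code>-specific metrics are not listed on this page.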

## Monitoring Architecture

The <code class="expression">space.vars.product\_name</code> monitoring system leverages three primary components:

1. **Prometheus**: An open-source monitoring and alerting toolkit that collects and stores metrics as time-series data.
2. **Alert Manager**: Handles alerts sent by the Prometheus server, including deduplicating, grouping, and routing alerts to the correct receiver.
3. **Grafana**: A multi-platform open-source analytics and interactive visualization web application that provides charts, graphs, and alerts when connected to supported data sources.
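To make the Alert Manager's deduplicating and grouping steps concrete, here is a toy Python model: alerts with identical label sets collapse into one, and the remaining alerts are bucketed by a chosen set of labels so that each bucket produces a single notification. This only illustrates the concept and is not how Alert Manager is implemented; the alert names, labels, and `group_by` choice are hypothetical.

```python
from collections import defaultdict


def dedupe_and_group(alerts, group_by):
    """Deduplicate alerts by full label set, then group them by `group_by` labels.

    `alerts` is a list of label dicts; returns {group_key: [label_dict, ...]}.
    """
    seen = set()
    groups = defaultdict(list)
    for labels in alerts:
        identity = tuple(sorted(labels.items()))
        if identity in seen:
            continue  # repeated firing of the same alert: drop the duplicate
        seen.add(identity)
        # Alerts sharing the same values for the group_by labels share a bucket.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)
```

With `group_by=["alertname", "region"]`, two `HostDown` alerts for different hosts in the same region end up in one group, so a receiver would get one notification for both rather than one per host.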

### Region Mapping

An instance of the monitoring system, comprising the three components above, is provisioned **per region** of your <code class="expression">space.vars.product\_name</code> setup. The metrics displayed in the Grafana dashboards for a given region are aggregated from all virtualized clusters within that region.
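Conceptually, a regional figure is the per-cluster series summed with the cluster dimension dropped, similar to the PromQL expression `sum by (region) (...)`. The sketch below assumes each series carries hypothetical `region` and `cluster` labels; the label names that <code class="expression">space.vars.product\_name</code> actually uses are not documented here.

```python
from collections import defaultdict


def aggregate_by_region(series):
    """Sum values per region, discarding the cluster dimension.

    `series` is an iterable of (labels, value) pairs, where labels is a dict
    with hypothetical "region" and "cluster" keys.
    """
    totals = defaultdict(float)
    for labels, value in series:
        totals[labels["region"]] += value
    return dict(totals)
```

Two clusters reporting 10 and 5 units of a metric in the same region would appear on that region's dashboard as a single total of 15.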

## Accessing Grafana Dashboards

Navigate to **Home**, then:

* **October 2025+ releases:** Enable the **Show All Tenants Info** toggle, then select **Grafana Dashboard**.
* **Earlier releases:** Select **Grafana Dashboard** directly.

## Log into Grafana

The default login credentials depend on the version of your current <code class="expression">space.vars.product\_name</code> setup:

* **June 2025+ releases:** Use your <code class="expression">space.vars.product\_name</code> administrator credentials to log into Grafana. These are the same credentials you use to access your <code class="expression">space.vars.product\_name</code> instance.
* **Earlier releases:** Username `admin`, password `admin`.

If you log in with the default `admin` credentials, you are prompted to change the password on first login.

{% hint style="info" %}
**Note**

If you use your <code class="expression">space.vars.product\_name</code> administrator credentials for Grafana access, **do not rename the administrator account or change its password** without updating both systems. Creating additional domain administrators in <code class="expression">space.vars.product\_name</code> does not automatically grant them Grafana access.
{% endhint %}
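Once you can log in, the same account can typically be used with Grafana's HTTP API, for example to list dashboards through `GET /api/search`. This sketch assumes basic authentication is enabled on your Grafana instance (an SSO-backed login may require a service account token instead); `GRAFANA_URL` is a hypothetical placeholder.

```python
import base64
import json
import urllib.parse
import urllib.request

GRAFANA_URL = "https://grafana.example.internal"  # hypothetical placeholder


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials for an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def search_dashboards_request(base_url, username, password, query=""):
    """Prepare a GET /api/search request limited to dashboards (type=dash-db)."""
    params = urllib.parse.urlencode({"type": "dash-db", "query": query})
    request = urllib.request.Request(f"{base_url.rstrip('/')}/api/search?{params}")
    request.add_header("Authorization", basic_auth_header(username, password))
    request.add_header("Accept", "application/json")
    return request


def list_dashboard_titles(base_url, username, password, query=""):
    """Return the titles of matching dashboards (performs network I/O)."""
    request = search_dashboards_request(base_url, username, password, query)
    with urllib.request.urlopen(request, timeout=10) as response:
        return [item["title"] for item in json.load(response)]
```

Calling `list_dashboard_titles(GRAFANA_URL, "<username>", "<password>")` would return the titles of the dashboards visible to that account.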

## Monitored Metrics

The monitoring system tracks metrics across various categories, with a primary focus on hypervisor health and virtual machine performance.

### Hypervisor Metrics

Hypervisor metrics provide insight into the health and performance of the <code class="expression">space.vars.product\_name</code> hosts in a given region.

Note that the identifier shown in the Grafana hypervisor charts corresponds to the [Host ID](https://docs.platform9.com/private-cloud-director/virtualized-clusters/add-hosts-virtualized-cluster#host-id).

The following metrics are currently tracked:

#### Compute Metrics

* **Hypervisor CPU Total**: The total CPU resources allocated to the hypervisor.
* **Hypervisor Memory Total**: The total memory allocated to the hypervisor.
* **Hypervisor CPU Usage**: The actual CPU utilization of the hypervisor, showing host resource consumption.
* **Hypervisor Memory Usage**: The actual memory utilization of the hypervisor, showing host memory consumption.
* **Number of Hosts:** The total number of hypervisor hosts running in the region.
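The usage metrics are most informative when read against their totals. A minimal sketch of that arithmetic, computing utilization and remaining headroom for a single host; the input values are illustrative and not taken from a real dashboard.

```python
def utilization(used: float, total: float) -> tuple[float, float]:
    """Return (percent_used, headroom) for a resource with the given total capacity."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return used / total * 100.0, total - used


# Illustrative host: 256 GiB of memory, 192 GiB in use.
pct, free_gib = utilization(used=192.0, total=256.0)
print(f"{pct:.1f}% used, {free_gib:.0f} GiB free")  # 75.0% used, 64 GiB free
```

The same calculation applies to CPU, using Hypervisor CPU Usage against Hypervisor CPU Total.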

#### Storage Metrics

* **Disk space ext4:** Total configured disk space on the root partition across all hosts in the region.
* **Disk usage ext4:** Total used disk space on the root partition across all hosts in the region.
* **Disk read throughput**: Reports read throughput for the root partition across all hosts in the region.
* **Disk write throughput**: Reports write throughput for the root partition across all hosts in the region.
* **Disk read throughput over time (chart)**: Shows a chart of read throughput for the root partition per host over time.
* **Disk write throughput over time (chart)**: Shows a chart of write throughput for the root partition per host over time.
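Throughput figures like these are typically derived from cumulative byte counters: the counter's increase over a window divided by the window length. The simplified sketch below shows that calculation, including the counter-reset handling that Prometheus' `rate()` function also applies; unlike `rate()`, it does not extrapolate to the window boundaries. The sample values are hypothetical, and the exporters and metric names behind these panels are not specified on this page. The same approach would apply to the per-host network RX and TX charts.

```python
def per_second_rate(samples):
    """Average per-second increase of a cumulative counter.

    `samples` is a time-ordered list of (unix_seconds, counter_value) pairs.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero (e.g. after a reboot),
        # so everything counted since the reset is new traffic.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time window")
    return increase / elapsed


# Bytes read from the root disk, scraped every 15 s (hypothetical values).
read_bytes = [(0, 1_000_000), (15, 4_000_000), (30, 7_000_000)]
print(per_second_rate(read_bytes))  # 200000.0 bytes/s
```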

#### Networking Metrics

* **Network RX throughput (chart):** Shows a chart of the rate of inbound network traffic per host.
* **Network TX throughput (chart):** Shows a chart of the rate of outbound network traffic per host.

#### Retention time

* Retention time for all Prometheus metrics is 15 days.
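Because samples older than the retention window are discarded, a query spanning a longer range returns data for roughly the most recent 15 days only. An illustrative helper (hypothetical, not part of the product) that clamps a requested time range to the retention window before querying:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=15)


def clamp_to_retention(start, end, now=None):
    """Clamp [start, end] to the window Prometheus still retains.

    Returns None when the whole range is older than the retention window.
    Prometheus removes expired data in whole blocks, so the boundary is approximate.
    """
    now = now or datetime.now(timezone.utc)
    earliest = now - RETENTION
    if end < earliest:
        return None
    return max(start, earliest), min(end, now)
```

For trend analysis beyond 15 days, export the metrics to an external system, as described at the start of this page.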

### Virtual Machine Metrics

Virtual machine metrics offer visibility into the aggregate resource utilization of all virtual machines within a given region.

{% hint style="info" %}
**Note**

The identifier shown in the Grafana virtual machine charts corresponds to the [VM UUID](https://docs.platform9.com/private-cloud-director/virtualized-clusters/virtualmachine#vm-uuid).
{% endhint %}

The following metrics are currently tracked:

#### VM Compute Metrics

* **VM CPU Total**: Total configured CPU allocated to all virtual machines in the region.
* **VM Memory Total**: Total configured memory across all virtual machines in the region. The configured memory of a VM refers to the amount of memory defined in its configuration settings.
* **VM CPU Usage**: CPU utilization across all virtual machines in the region.
* **VM Memory Usage (in percentage and actual value)**: Memory utilized across all virtual machines in the region.
* **VM Memory allocated:** Total memory allocated across all virtual machines in the region.
* **Number of VMs:** Total number of virtual machines running in the region
* **CPU Throttling (percent)**
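Comparing the VM totals with the hypervisor totals gives the region's overcommit ratio, a figure that is not one of the listed panels but can be derived from them. A minimal sketch; the numbers are made up for illustration.

```python
def overcommit_ratio(allocated_to_vms: float, host_capacity: float) -> float:
    """Ratio of resources configured for VMs to host capacity (>1.0 means overcommitted)."""
    if host_capacity <= 0:
        raise ValueError("host capacity must be positive")
    return allocated_to_vms / host_capacity


# Hypothetical region: 640 vCPUs configured across VMs on hosts with 160 CPUs in total.
cpu_ratio = overcommit_ratio(allocated_to_vms=640, host_capacity=160)
print(f"CPU overcommit {cpu_ratio:.1f}:1")  # CPU overcommit 4.0:1
```

Here VM CPU Total would supply `allocated_to_vms` and Hypervisor CPU Total would supply `host_capacity`; the same pairing works for memory.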

#### VM Storage Metrics

* **Total Storage:** Total storage across all virtual machines
* **Allocated Storage:** Total allocated storage across all virtual machines
* **Used Storage:** Total used storage across all virtual machines
* **Read Throughput:** Read throughput across all virtual machines
* **Write Throughput:** Write throughput across all virtual machines
* **Read IOPS:** Read IOPS across all VMs
* **Write IOPS:** Write IOPS across all VMs
* **Read Latency:** Read latency across all VMs
* **Write Latency:** Write latency across all VMs

#### VM Networking Metrics

* **RX throughput:** Inbound traffic throughput across all VMs
* **TX throughput:** Outbound traffic throughput across all VMs
* **RX Packet drop:** Inbound packets dropped across all VMs
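Average I/O latency is commonly derived from two cumulative counters, total time spent on operations and number of operations completed, by dividing their increases over the same window. The sketch below uses hypothetical counter samples; the source counters behind the Read Latency and Write Latency panels are not documented here.

```python
def average_latency_ms(time_start_s, time_end_s, ops_start, ops_end):
    """Mean latency per operation, in milliseconds, over one sampling window.

    time_*: cumulative seconds spent on I/O; ops_*: cumulative operations completed.
    """
    ops = ops_end - ops_start
    if ops <= 0:
        # No operations completed in the window (or a counter reset): report 0.
        return 0.0
    return (time_end_s - time_start_s) * 1000.0 / ops


# Hypothetical window: 2,000 reads completed while cumulative read time grew by 5 s.
print(average_latency_ms(120.0, 125.0, 50_000, 52_000))  # 2.5
```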

## Dashboards

The built-in Grafana instance in each region includes pre-configured dashboards that display the metrics listed above.

### Custom Dashboard Creation

You can create custom Grafana dashboards tailored to your specific monitoring needs.

To create a custom dashboard:

1. Log in to the Grafana interface through the <code class="expression">space.vars.product\_name</code> UI.
2. Navigate to the **Dashboards** section.
3. Click **New Dashboard**.
4. Add panels by selecting **Add Panel**.
5. For each panel, choose a visualization type and configure its data source.
6. Save the dashboard with a descriptive name.
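The same result can be scripted through Grafana's HTTP API, which accepts a dashboard JSON model at `POST /api/dashboards/db`. The sketch builds a minimal payload with one time-series panel and prepares the request without sending it. The PromQL expression, the data source UID, and `GRAFANA_URL` are hypothetical, and the panel schema Grafana expects can vary between Grafana versions.

```python
import json
import urllib.request

GRAFANA_URL = "https://grafana.example.internal"  # hypothetical placeholder


def dashboard_payload(title, promql, datasource_uid):
    """Build a minimal dashboard model with one time-series panel."""
    panel = {
        "type": "timeseries",
        "title": title,
        "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
        "datasource": {"type": "prometheus", "uid": datasource_uid},
        "targets": [{"refId": "A", "expr": promql}],
    }
    return {
        # id and uid set to None (JSON null) ask Grafana to create a new dashboard.
        "dashboard": {"id": None, "uid": None, "title": title, "panels": [panel]},
        "overwrite": False,
    }


def create_dashboard_request(base_url, auth_header, payload):
    """Prepare (but do not send) the POST request that creates the dashboard."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": auth_header},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(create_dashboard_request(GRAFANA_URL, auth, payload))`, where `auth` is a Basic or Bearer header value, would create the dashboard. Building the model in code makes it easy to keep custom dashboards in version control and recreate them in each region.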


