# Monitoring & Observability

This document describes the built-in monitoring and observability component of <code class="expression">space.vars.product\_name</code>. Behind the scenes, <code class="expression">space.vars.product\_name</code> uses the open-source tools [Prometheus, Alert Manager, and Grafana](https://prometheus.io/) as the key components of its monitoring stack. The <code class="expression">space.vars.product\_acronym</code> monitoring system collects infrastructure and application metrics using Prometheus exporters for various <code class="expression">space.vars.product\_acronym</code> components and services. <code class="expression">space.vars.product\_acronym</code> administrators can then use these metrics in several ways:

* Metrics visibility via built-in charts in the <code class="expression">space.vars.product\_acronym</code> UI
* Default dashboards provided as part of the built-in Grafana instance
* Metrics that the administrators can consume to feed into their own internal monitoring platforms
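
As an illustration of the last option, the sketch below builds a request URL for the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`) and flattens a response into a dictionary. The base URL `http://prometheus.example:9090`, the use of the `instance` label as the key, and whether your deployment exposes the Prometheus API directly are all assumptions; confirm how Prometheus is reachable in your environment before relying on this.

```python
import json
from urllib.parse import urlencode


def build_instant_query_url(base_url: str, promql: str) -> str:
    """Return the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str) -> dict[str, float]:
    """Map each sample's `instance` label to its value as a float."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    values = {}
    for sample in payload["data"]["result"]:
        # Each vector sample carries [unix_timestamp, "value-as-string"].
        _timestamp, raw_value = sample["value"]
        values[sample["metric"].get("instance", "")] = float(raw_value)
    return values


# Hypothetical endpoint; `up` is the built-in Prometheus target-health metric.
url = build_instant_query_url("http://prometheus.example:9090", "up")

# A canned response in the Prometheus response format, used instead of a live call.
sample_body = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "up", "instance": "host-a:9100"},
         "value": [1700000000, "1"]},
    ]},
})
print(url)
print(parse_instant_vector(sample_body))
```

In a real integration the canned body would be replaced by the response of an authenticated HTTP request to your Prometheus endpoint.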

## Monitoring Architecture

The <code class="expression">space.vars.product\_name</code> monitoring system leverages three primary components:

1. **Prometheus**: An open-source monitoring and alerting toolkit that collects and stores metrics as time-series data.
2. **Alert Manager**: Handles alerts sent by the Prometheus server, including deduplicating, grouping, and routing alerts to the correct receiver.
3. **Grafana**: A multi-platform open-source analytics and interactive visualization web application that provides charts, graphs, and alerts when connected to supported data sources.
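
To make the deduplicating and grouping steps more concrete, here is a minimal Python sketch of the idea. It is not Alert Manager's implementation: the alert dictionaries, the label names, and the helper `group_alerts` are hypothetical, and real behavior is driven by the Alert Manager routing configuration.

```python
from collections import defaultdict


def group_alerts(alerts: list[dict], group_by: tuple[str, ...]) -> dict[tuple, list[dict]]:
    """Drop alerts with identical label sets, then bucket the rest by `group_by` labels."""
    seen = set()
    groups = defaultdict(list)
    for alert in alerts:
        # Two alerts with exactly the same labels are treated as duplicates.
        identity = frozenset(alert["labels"].items())
        if identity in seen:
            continue
        seen.add(identity)
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical alerts; the second entry duplicates the first.
alerts = [
    {"labels": {"alertname": "HostDown", "region": "region-1", "host": "host-a"}},
    {"labels": {"alertname": "HostDown", "region": "region-1", "host": "host-a"}},
    {"labels": {"alertname": "HostDown", "region": "region-1", "host": "host-b"}},
    {"labels": {"alertname": "DiskFull", "region": "region-1", "host": "host-a"}},
]
grouped = group_alerts(alerts, group_by=("alertname", "region"))
for key, members in grouped.items():
    print(key, len(members))
```

Each resulting group would then be routed to a receiver as a single notification rather than one message per alert.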

### Region Mapping

An instance of the monitoring system, including all three components above, is provisioned **per region** of your <code class="expression">space.vars.product\_name</code> setup. The metrics displayed in the Grafana dashboard for a given region represent aggregated data from all virtualized clusters within that region.

## Accessing Grafana Dashboards

Navigate to Home, then:

* **October 2025+ releases:** Enable the "Show All Tenants Info" toggle, then select Grafana Dashboard.
* **Earlier releases:** Select Grafana Dashboard directly.

## Log into Grafana

The default login credentials depend on the version of your current <code class="expression">space.vars.product\_name</code> setup:

* **June 2025+ releases:** Use your <code class="expression">space.vars.product\_name</code> administrator credentials to log into Grafana. These are the same credentials you use to access your <code class="expression">space.vars.product\_name</code> instance.
* **Earlier releases:** Username: admin, Password: admin

You will be prompted to change the default password when you first log in.

{% hint style="info" %}
**Note**

If you use your <code class="expression">space.vars.product\_name</code> administrator credentials for Grafana access, **do not rename the administrator account or change its password** without updating both systems. Creating additional domain administrators in <code class="expression">space.vars.product\_name</code> does not automatically grant them Grafana access.
{% endhint %}

## Monitored Metrics

The monitoring system tracks metrics across various categories, with a primary focus on hypervisor health and virtual machine performance.

### Hypervisor Metrics

Hypervisor metrics provide insight into the health and performance of the <code class="expression">space.vars.product\_name</code> hosts in a given region.

Note that the identifier shown in the Grafana hypervisor charts corresponds to the [Host ID](https://docs.platform9.com/private-cloud-director/2025.10/virtualized-clusters/host#host-id).

The following metrics are tracked today:

#### Compute Metrics

* **Hypervisor CPU Total**: The total CPU resources allocated to the hypervisor.
* **Hypervisor Memory Total**: The total memory allocated to the hypervisor.
* **Hypervisor CPU Usage**: The actual CPU utilization of the hypervisor, showing host CPU consumption.
* **Hypervisor Memory Usage**: The actual memory utilization of the hypervisor, showing host memory consumption.
* **Number of Hosts**: The total number of hypervisor hosts running in the region.
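
The relationship between the total and usage metrics can be sketched as simple arithmetic. The example below assumes usage is reported in the same units as the total and that a region-level figure is the sum over hosts; the host names and numbers are invented for illustration, and the built-in dashboards may compute these values differently.

```python
def utilization_percent(used: float, total: float) -> float:
    """Return used/total as a percentage, or 0.0 when no capacity is reported."""
    return 0.0 if total <= 0 else 100.0 * used / total


# Hypothetical per-host memory figures in GiB.
hosts = {
    "host-a": {"memory_total": 256.0, "memory_used": 64.0},
    "host-b": {"memory_total": 256.0, "memory_used": 192.0},
}

region_total = sum(h["memory_total"] for h in hosts.values())
region_used = sum(h["memory_used"] for h in hosts.values())
region_percent = utilization_percent(region_used, region_total)

print(f"hosts={len(hosts)} total={region_total} GiB "
      f"used={region_used} GiB ({region_percent:.1f}%)")
```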

#### Storage Metrics

* **Disk space ext4:** Total configured disk space on the root partition across all hosts in the region.
* **Disk usage ext4:** Total used disk space on the root partition across all hosts in the region.
* **Disk read throughput**: Reports read throughput for the root partition across all hosts in the region.
* **Disk write throughput**: Reports write throughput for the root partition across all hosts in the region.
* **Disk read throughput over time (chart)**: Shows a chart of read throughput for the root partition per host over time.
* **Disk write throughput over time (chart)**: Shows a chart of write throughput for the root partition per host over time.

#### Networking Metrics

* **Network RX throughput (chart):** Shows a chart of the rate of inbound network traffic per host.
* **Network TX throughput (chart):** Shows a chart of the rate of outbound network traffic per host.
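
Throughput figures like the disk and network ones above are commonly derived from cumulative byte counters by dividing the increase in the counter by the elapsed time. The sketch below shows that calculation under the assumption that the underlying data is such a counter; the sample values are invented, and the reset handling is a simplification of what the Prometheus `rate()` function does.

```python
def throughput_bytes_per_second(samples: list[tuple[float, float]]) -> float:
    """Average rate over (timestamp_seconds, counter_bytes) samples, tolerating resets."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        # A drop in the counter means it was reset (for example after a reboot),
        # so the new value itself is counted as the increase.
        increase += (current - previous) if current >= previous else current
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return increase / elapsed


# Hypothetical received-bytes counter sampled every 15 seconds.
rx_samples = [(0.0, 1_000_000.0), (15.0, 2_500_000.0), (30.0, 4_000_000.0)]
rx_rate = throughput_bytes_per_second(rx_samples)
print(f"{rx_rate / 1e6:.2f} MB/s")
```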

#### Retention Time

* Retention time for all Prometheus metrics is 15 days.
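
Because of this retention window, data older than 15 days should not be expected to be available for queries or dashboards. A small sketch of clamping a requested time range to the retained period follows; the 15-day constant comes from this page, while the function name and the idea of clamping on the client side are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=15)  # retention stated in this document


def clamp_to_retention(start: datetime, end: datetime, now: datetime) -> tuple[datetime, datetime]:
    """Trim [start, end] so it does not reach further back than the retention window."""
    oldest = now - RETENTION
    clamped_start = max(start, oldest)
    if clamped_start >= end:
        raise ValueError("requested range lies entirely outside the retention window")
    return clamped_start, end


# A fixed "now" keeps the example deterministic.
now = datetime(2025, 10, 31, tzinfo=timezone.utc)
start, end = clamp_to_retention(now - timedelta(days=30), now, now)
print(start.isoformat(), end.isoformat())
```

If you need history beyond this window, consider exporting metrics to your own long-term monitoring platform.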

### Virtual Machine Metrics

Virtual machine metrics offer visibility into the aggregate resource utilization of all virtual machines within a given region.

{% hint style="info" %}
**Note**

The identifier shown in the Grafana virtual machine charts corresponds to the [VM UUID](https://docs.platform9.com/private-cloud-director/2025.10/virtualized-clusters/virtualmachine#vm-uuid).
{% endhint %}

The following metrics are tracked today:

#### VM Compute Metrics

* **VM CPU Total**: Total configured CPU allocated to all virtual machines in the region.
* **VM Memory Total**: Total configured memory across all virtual machines in the region. The configured memory of a VM refers to the amount of memory defined in its configuration settings.
* **VM CPU Usage**: CPU utilization across all virtual machines in this region.
* **VM Memory Usage (in percentage and actual value)**: Memory utilized across all virtual machines in the region.
* **VM Memory Allocated**: Total memory allocated across all virtual machines in the region.
* **Number of VMs**: Total number of virtual machines running in the region.
* **CPU Throttling (percent)**

#### VM Storage Metrics

* **Total Storage:** Total storage across all virtual machines.
* **Allocated Storage:** Total allocated storage across all virtual machines.
* **Used Storage:** Total used storage across all virtual machines.
* **Read Throughput:** Read throughput across all virtual machines.
* **Write Throughput:** Write throughput across all virtual machines.
* **Read IOPS:** Read IOPS across all virtual machines.
* **Write IOPS:** Write IOPS across all virtual machines.

#### VM Networking Metrics

* **Read Latency:** Read latency across all virtual machines.
* **Write Latency:** Write latency across all virtual machines.
* **RX throughput:** Inbound traffic throughput across all virtual machines.
* **TX throughput:** Outbound traffic throughput across all virtual machines.
* **RX Packet drop:**

## Dashboards

> The built-in Grafana instance in each region includes pre-configured dashboards that display the metrics listed above.

### Custom Dashboard Creation

You can create custom Grafana dashboards tailored to your specific monitoring needs.

To create a custom dashboard:

1. Log in to the Grafana interface through the <code class="expression">space.vars.product\_name</code> UI.
2. Navigate to the Dashboard section.
3. Click **New Dashboard**.
4. Add panels by selecting **Add Panel**.
5. Choose visualization types and configure data sources.
6. Save the dashboard with a descriptive name.
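
A dashboard can also be described as JSON, which is the format Grafana uses to store and export dashboards. The sketch below builds a minimal dashboard definition with one time-series panel, wrapped in the request body shape accepted by Grafana's `POST /api/dashboards/db` HTTP API. The dashboard title, the PromQL expression `up`, and the assumption that the built-in Grafana instance permits API access with your credentials are illustrative; the payload is only printed here, not sent.

```python
import json


def build_dashboard_payload(title: str, panel_title: str, promql: str) -> dict:
    """Return a request body for Grafana's create/update dashboard endpoint."""
    panel = {
        "type": "timeseries",
        "title": panel_title,
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
        # `expr` is the query sent to the panel's Prometheus data source.
        "targets": [{"refId": "A", "expr": promql}],
    }
    dashboard = {
        "id": None,   # None lets Grafana assign an id to a new dashboard
        "uid": None,  # None lets Grafana generate a unique identifier
        "title": title,
        "panels": [panel],
    }
    return {"dashboard": dashboard, "overwrite": False}


# Hypothetical names; replace the expression with a metric from your environment.
payload = build_dashboard_payload("Region overview (custom)", "Targets up", "up")
print(json.dumps(payload, indent=2))
```

Keeping dashboard definitions in this form makes it easier to version them and to recreate them in another region.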
