> For the complete documentation index, see [llms.txt](https://docs.platform9.com/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.platform9.com/private-cloud-director/upgrade/post-upgrade-verification.md).

# Post-Upgrade Verification

## Overview

After upgrading all hosts, run through this verification checklist to confirm that every service is healthy and every role is operating correctly. Do not re-enable VM HA or DRR until you have completed the service and role verification steps below.

{% hint style="info" %}
**Applies to both deployment models.** The host, role, storage, networking, and GPU verification steps on this page apply to both **SaaS** and **Self-Hosted** deployments. Management-plane verification using `airctl` applies to **Self-Hosted deployments only** and is marked. In SaaS deployments, Platform9 operates and upgrades the management plane; verify region health from the UI.
{% endhint %}

In this guide, you will confirm region health, verify all host roles, validate GPU passthrough configuration, test storage volume attach, test Networking Service connectivity, and re-enable VM HA and DRR.

## Verify Region Health <a href="#verify-management-plane" id="verify-management-plane"></a>

From the <code class="expression">space.vars.product\_name</code> UI, navigate to **Infrastructure > Regions** and confirm all regions show a healthy status before proceeding.

{% hint style="info" %}
**Self-Hosted deployments only.** Run `airctl status` from the management cluster node and confirm every region is ready:

```bash
airctl status
```

Expected output for each region:

```
deployment status:   ready
region health:       ✅ Ready
desired services:    <N>
ready services:      <N>
```

The `desired services` and `ready services` counts must match. If any services are not ready, check the pod logs in the region namespace before proceeding:

```bash
kubectl get pods -n <region-fqdn> | grep -v Running | grep -v Completed
```

Investigate any pod in `CrashLoopBackOff`, `Error`, or `Pending` state.
{% endhint %}

## Verify Host Role Status <a href="#verify-host-roles" id="verify-host-roles"></a>

Every host in every region must show `Status: ok` and `Agent Status: running` after the upgrade.

From the UI, navigate to **Infrastructure > Cluster Hosts** and confirm:

* No host shows a warning or error badge.
* All hosts that had the Hypervisor role before the upgrade still show the Hypervisor role as `applied`.
* Hosts with the Persistent Storage Service role show that role as `applied`.
* Hosts with the Image Library Service role show that role as `applied`.

If a host's role is missing or shows as `unauthorized` after the upgrade, re-assign the role: select the host, click **Edit Roles**, assign the appropriate roles, and click **Update Role Assignment**. If a host is stuck in `converging`, select the host and choose **Other > Re-sync Host**.

For any host that is not healthy, check the host agent log on that host:

```bash
tail -n 100 /var/log/pf9/hostagent.log
```

{% hint style="info" %}
**Self-Hosted deployments only.** You can also verify host role status from the management cluster node:

```bash
airctl host-status --config /opt/pf9/airctl/conf/airctl-config.yaml
```

{% endhint %}

## Validate GPU Passthrough <a href="#validate-gpu-passthrough" id="validate-gpu-passthrough"></a>

{% hint style="info" %}
Skip this section if your deployment does not use GPU passthrough.
{% endhint %}

After upgrading hosts that carry the Hypervisor role with GPU passthrough configured, re-validate that the GPU devices are still bound correctly.

1. On each GPU host, confirm the GPU device is bound to the `vfio-pci` driver:

```bash
lspci -k | grep -A 3 -i nvidia
```

The `Kernel driver in use` field should show `vfio-pci`. If it shows a different driver, the GPU binding was not preserved through the OS upgrade. Follow the [Set up GPU Passthrough](/private-cloud-director/gpu/gpu-support-pcd/set-up-gpu-passthrough.md) guide to rebind the device.

2. From the <code class="expression">space.vars.product\_name</code> UI, navigate to **Infrastructure > Cluster Hosts**, select a GPU host, and confirm that the GPU devices are listed under the host's hardware details.
3. Attempt to launch a test VM using a GPU-enabled flavor. Confirm the VM starts successfully and the GPU is accessible from within the VM.

For vGPU deployments, confirm that the vGPU profiles are still present and the vGPU driver service is running on the host:

```bash
systemctl status nvidia-vgpu-mgr
```

For full GPU troubleshooting steps, see [Troubleshooting GPU Support](/private-cloud-director/gpu/gpu-support-pcd/troubleshooting-gpu-support.md).

## Test Persistent Storage Volume Attach <a href="#test-volume-attach" id="test-volume-attach"></a>

Confirm that the Persistent Storage Service is operational by attaching a volume to a running VM.

1. List available volumes:

```bash
pcdctl volume list
```

2. Identify a test VM that is running:

```bash
pcdctl server list --status ACTIVE
```

3. Attach an existing available volume to the test VM:

```bash
pcdctl server add volume <server-id> <volume-id>
```

4. Confirm the attachment succeeded:

```bash
pcdctl server volume list <server-id>
```

The volume should appear with a `in-use` status. Detach the test volume after confirming:

```bash
pcdctl server remove volume <server-id> <volume-id>
```

If the attach fails, check the Persistent Storage Service endpoint status:

```bash
pcdctl volume service list
```

All endpoints should be `enabled` and `up`. If an endpoint is `down`, check the volume service logs on the block storage host.

## Test Networking Service Connectivity <a href="#test-networking" id="test-networking"></a>

Verify that the Networking Service is healthy and that VM network connectivity is intact after the upgrade.

### Verify Networking Service Endpoints

```bash
pcdctl network agent list
```

All agents should show `alive: True`. If any agent is not alive, check the networking agent status on the affected host:

```bash
systemctl status pf9-neutron-ovn-controller
```

### Test Network and Router Connectivity

1. List the networks in the region and confirm the expected networks are present:

```bash
pcdctl network list
```

2. List routers and confirm they are in `ACTIVE` state:

```bash
pcdctl router list
```

3. From a test VM, confirm outbound connectivity (if the VM has network egress):

```bash
# From within the test VM
ping -c 4 8.8.8.8
```

If any router is not `ACTIVE`, navigate to **Infrastructure > Networking > Routers** in the UI and inspect the router details for error messages.

### Test VM-to-VM Connectivity

Confirm that VMs on different hosts can communicate across the virtual network:

1. Identify two running VMs on different hypervisor hosts.
2. From one VM, ping the private IP address of the other:

```bash
ping -c 4 <other-vm-private-ip>
```

If ping fails between VMs on different hosts, check the OVS bridge status on the hypervisor hosts:

```bash
ovs-vsctl show
ovs-ofctl dump-flows br-int | head -20
```

## Re-Enable VM HA and DRR <a href="#re-enable-vm-ha-and-drr" id="re-enable-vm-ha-and-drr"></a>

Re-enable VM HA and DRR only after:

* All hosts in the cluster are running the same operating system version.
* All hosts show `Status: ok` and all roles are `applied`.
* The region health check passed.
* Storage and networking tests passed.

{% hint style="warning" %}
Do not re-enable VM HA if any hosts in the cluster are still running a different OS version. VM evacuation between hosts with different KVM versions will fail.
{% endhint %}

**Re-enable VM HA:**

1. Navigate to **Infrastructure > Clusters**.
2. Select the cluster.
3. Toggle **VM High Availability** to on.
4. Confirm the cluster shows `Protected` VM HA status after enabling. If it shows `Degraded` or `Not Protected`, hover over the VM HA status to see which prerequisite is not met, and resolve it before relying on VM HA for workload protection.

**Re-enable DRR:**

1. Navigate to **Infrastructure > Clusters**.
2. Select the cluster.
3. Toggle **Dynamic Resource Rebalancing** to on and confirm the frequency setting is correct for your environment.

For pre-conditions and full behavior details, see [Virtual Machine High Availability](/private-cloud-director/virtualized-clusters/virtualized-cluster/virtual-machine-high-availability-vm-ha.md) and [Dynamic Resource Rebalancing (DRR)](/private-cloud-director/virtualized-clusters/virtualized-cluster/dynamic-resource-rebalancing-drr.md).

## Upgrade Complete

Your <code class="expression">space.vars.product\_name</code> upgrade is complete once all verification steps above have passed and VM HA and DRR are re-enabled with healthy statuses. If you encounter issues during verification that you cannot resolve, contact Platform9 Support with the relevant pod or service logs. For Self-Hosted deployments, also include the output of `airctl status` and `airctl host-status`.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.platform9.com/private-cloud-director/upgrade/post-upgrade-verification.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
