Post-Upgrade Verification
Overview
After upgrading all hosts, run through this verification checklist to confirm that every service is healthy and every role is operating correctly. Do not re-enable VM HA or DRR until you have completed the service and role verification steps below.
Applies to both deployment models. The host, role, storage, networking, and GPU verification steps on this page apply to both SaaS and Self-Hosted deployments. Management-plane verification using airctl applies to Self-Hosted deployments only and is marked as such. In SaaS deployments, Platform9 operates and upgrades the management plane; verify region health from the UI.
In this guide, you will confirm region health, verify all host roles, validate GPU passthrough configuration, test storage volume attach, test Networking Service connectivity, and re-enable VM HA and DRR.
Verify Region Health
From the Private Cloud Director UI, navigate to Infrastructure > Regions and confirm all regions show a healthy status before proceeding.
Self-Hosted deployments only. Run airctl status from the management cluster node and confirm every region is ready:
airctl status
Expected output for each region:
deployment status: ready
region health: ✅ Ready
desired services: <N>
ready services: <N>
The desired services and ready services counts must match. If any services are not ready, check the pod logs in the region namespace before proceeding:
kubectl get pods -n <region-fqdn> | grep -v Running | grep -v Completed
Investigate any pod in CrashLoopBackOff, Error, or Pending state.
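The count comparison can also be scripted when there are many regions. The following is a minimal sketch, not part of airctl: check_service_counts is a hypothetical helper name, and it assumes the output uses the "desired services:" and "ready services:" line format shown in the expected output above.

```shell
# Hypothetical helper (not part of airctl): reads `airctl status` output on
# stdin and reports every region block whose desired and ready service
# counts differ. Assumes the "desired services:" / "ready services:" line
# format shown in the expected output.
check_service_counts() {
  awk -F': *' '
    /^[[:space:]]*desired services:/ { desired = $2 + 0 }
    /^[[:space:]]*ready services:/ {
      ready = $2 + 0
      if (ready != desired) {
        printf "MISMATCH desired=%d ready=%d\n", desired, ready
        bad = 1
      }
    }
    END { exit bad + 0 }   # non-zero exit if any mismatch was seen
  '
}
```

Usage, from the management cluster node: airctl status | check_service_counts. No output and a zero exit code mean every region's counts match.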
Verify Host Role Status
Every host in every region must show Status: ok and Agent Status: running after the upgrade.
From the UI, navigate to Infrastructure > Cluster Hosts and confirm:
No host shows a warning or error badge.
All hosts that had the Hypervisor role before the upgrade still show the Hypervisor role as applied.
Hosts with the Persistent Storage Service role show that role as applied.
Hosts with the Image Library Service role show that role as applied.
If a host's role is missing or shows as unauthorized after the upgrade, re-assign the role: select the host, click Edit Roles, assign the appropriate roles, and click Update Role Assignment. If a host is stuck in converging, select the host and choose Other > Re-sync Host.
For any host that is not healthy, check the host agent log on that host:
Self-Hosted deployments only. You can also verify host role status from the management cluster node by running airctl host-status.
Validate GPU Passthrough
Skip this section if your deployment does not use GPU passthrough.
After upgrading hosts that carry the Hypervisor role with GPU passthrough configured, re-validate that the GPU devices are still bound correctly.
On each GPU host, confirm the GPU device is bound to the vfio-pci driver:
The Kernel driver in use field should show vfio-pci. If it shows a different driver, the GPU binding was not preserved through the OS upgrade. Follow the Set up GPU Passthrough guide to rebind the device.
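The Kernel driver in use field is part of the output of lspci -k (from pciutils). As a sketch of how the check could be automated, the hypothetical helper below reads lspci -nnk output and lists devices matching a vendor string that are not bound to vfio-pci. The helper name and the vendor string used in the usage example are assumptions; adjust them to your hardware.

```shell
# Hypothetical helper: reads `lspci -nnk` output on stdin and prints
# "<pci-address> <driver>" for every device whose description contains the
# given string (case-insensitive) and whose "Kernel driver in use" is not
# vfio-pci. Devices with no driver bound are printed as "(no driver)".
gpus_not_on_vfio() {
  awk -v pat="$1" '
    function report() {
      if (want && driver != "vfio-pci")
        print addr, (driver == "" ? "(no driver)" : driver)
    }
    # A non-indented line starts a new device entry.
    /^[^[:space:]]/ {
      report()
      addr = $1
      want = (index(tolower($0), tolower(pat)) > 0)
      driver = ""
    }
    /Kernel driver in use:/ { driver = $NF }
    END { report() }
  '
}
```

Usage, assuming NVIDIA GPUs: lspci -nnk | gpus_not_on_vfio nvidia. Empty output means every matching device is bound to vfio-pci.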
From the Private Cloud Director UI, navigate to Infrastructure > Cluster Hosts, select a GPU host, and confirm that the GPU devices are listed under the host's hardware details.
Attempt to launch a test VM using a GPU-enabled flavor. Confirm the VM starts successfully and the GPU is accessible from within the VM.
For vGPU deployments, confirm that the vGPU profiles are still present and the vGPU driver service is running on the host:
For full GPU troubleshooting steps, see Troubleshooting GPU Support.
Test Persistent Storage Volume Attach
Confirm that the Persistent Storage Service is operational by attaching a volume to a running VM.
List available volumes:
Identify a test VM that is running:
Attach an existing available volume to the test VM:
Confirm the attachment succeeded:
The volume should appear with an in-use status. Detach the test volume after confirming:
If the attach fails, check the Persistent Storage Service endpoint status:
All endpoints should be enabled and up. If an endpoint is down, check the volume service logs on the block storage host.
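The attach, confirm, and detach steps above can be sketched as a single function. This assumes the standard OpenStack command-line client (openstack) is installed and that credentials for the region are already loaded; your deployment may provide a different CLI wrapper, in which case the subcommands may differ. The function name and its arguments are illustrative.

```shell
# Sketch only: assumes the standard openstack client is installed and that
# credentials for the region are already loaded in the environment.
# $1 = name or ID of a running test VM
# $2 = name or ID of an available volume
volume_attach_smoke_test() {
  server="$1"
  volume="$2"
  openstack server add volume "$server" "$volume" || return 1
  # Attachment is asynchronous, so poll the volume status a few times.
  status=""
  for attempt in 1 2 3 4 5 6; do
    status=$(openstack volume show "$volume" -f value -c status)
    [ "$status" = "in-use" ] && break
    sleep 5
  done
  echo "volume status: $status"
  # Detach the test volume, then succeed only if it reached in-use.
  openstack server remove volume "$server" "$volume"
  [ "$status" = "in-use" ]
}
```

With the same client, candidate volumes and VMs can be listed using openstack volume list --status available and openstack server list --status ACTIVE, and openstack volume service list shows the Status and State of each volume service.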
Test Networking Service Connectivity
Verify that the Networking Service is healthy and that VM network connectivity is intact after the upgrade.
Verify Networking Service Endpoints
All agents should show alive: True. If any agent is not alive, check the networking agent status on the affected host:
Test Network and Router Connectivity
List the networks in the region and confirm the expected networks are present:
List routers and confirm they are in ACTIVE state:
From a test VM, confirm outbound connectivity (if the VM has network egress):
If any router is not ACTIVE, navigate to Infrastructure > Networking > Routers in the UI and inspect the router details for error messages.
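In regions with many routers, filtering for the unhealthy ones is quicker than reading the full list. A minimal sketch, assuming the standard openstack client with region credentials loaded (the function name is illustrative):

```shell
# Sketch: print the ID and status of every router that is not ACTIVE.
# Assumes the standard openstack client with region credentials loaded.
routers_not_active() {
  # -f value prints one router per line as "<ID> <Status>".
  openstack router list -f value -c ID -c Status | awk '$2 != "ACTIVE"'
}
```

Empty output means every router is ACTIVE. The unfiltered lists are available from openstack network list and openstack router list.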
Test VM-to-VM Connectivity
Confirm that VMs on different hosts can communicate across the virtual network:
Identify two running VMs on different hypervisor hosts.
From one VM, ping the private IP address of the other:
If ping fails between VMs on different hosts, check the OVS bridge status on the hypervisor hosts:
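As one possible way to check bridge status, the sketch below uses ovs-vsctl br-exists, which exits with status 0 when the named bridge exists. The helper name is hypothetical, and the bridge name br-int in the usage example is an assumption (the conventional integration bridge name); confirm the bridge names for your deployment with ovs-vsctl show.

```shell
# Sketch: report whether each named Open vSwitch bridge exists on this host.
# Bridge names are supplied by the caller. OVS_VSCTL can be set to override
# the ovs-vsctl binary; running ovs-vsctl typically requires root.
check_ovs_bridges() {
  rc=0
  for br in "$@"; do
    if "${OVS_VSCTL:-ovs-vsctl}" br-exists "$br"; then
      echo "$br: present"
    else
      echo "$br: MISSING"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage on a hypervisor host: check_ovs_bridges br-int. A non-zero exit code means at least one bridge is missing.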
Re-Enable VM HA and DRR
Re-enable VM HA and DRR only after:
All hosts in the cluster are running the same operating system version.
All hosts show Status: ok and all roles are applied.
The region health check passed.
Storage and networking tests passed.
Do not re-enable VM HA if any hosts in the cluster are still running a different OS version. VM evacuation between hosts with different KVM versions will fail.
Re-enable VM HA:
Navigate to Infrastructure > Clusters.
Select the cluster.
Toggle VM High Availability to on.
Confirm the cluster shows Protected VM HA status after enabling. If it shows Degraded or Not Protected, hover over the VM HA status to see which prerequisite is not met, and resolve it before relying on VM HA for workload protection.
Re-enable DRR:
Navigate to Infrastructure > Clusters.
Select the cluster.
Toggle Dynamic Resource Rebalancing to on and confirm the frequency setting is correct for your environment.
For pre-conditions and full behavior details, see Virtual Machine High Availability and Dynamic Resource Rebalancing (DRR).
Upgrade Complete
Your Private Cloud Director upgrade is complete once all verification steps above have passed and VM HA and DRR are re-enabled with healthy statuses. If you encounter issues during verification that you cannot resolve, contact Platform9 Support with the relevant pod or service logs. For Self-Hosted deployments, also include the output of airctl status and airctl host-status.
