Troubleshoot Maintenance Mode Migration Failures

Overview

VM migration failures during maintenance mode are most commonly caused by resource constraints on the destination hosts, affinity rule conflicts, or CPU model mismatches between the source and destination hosts. This guide explains how to identify the cause, safely abort or retry a failed migration, and recover any VMs left in an error state.

For details on how maintenance mode works and how to enable or disable it, see Maintenance Mode.

In this guide, you will diagnose why migrations failed, resolve the blocking condition, and restore stranded VMs to a running state.

Identify Which VMs Failed to Migrate

Open the migration progress panel for the host in maintenance mode:

  1. Navigate to Infrastructure > Cluster Hosts in the Private Cloud Director UI.

  2. Select the host that is in maintenance mode.

  3. Click See Details on the maintenance mode banner to open the View Migration Progress panel.

The panel lists each VM and its migration status. VMs with a Failed status are the ones to investigate.

For each failed VM, note the VM name and check the Compute Service log on the source host for the migration error:

grep -A 5 "<vm-name-or-uuid>" /var/log/pf9/ostackhost.log | tail -40
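The grep output above can be long on a busy host. As a rough sketch, the helper below narrows it to ERROR-level entries that mention the VM. The function name `vm_migration_errors` is hypothetical, and the filter assumes the Compute Service log marks errors with the literal keyword `ERROR`.

```shell
# Hypothetical helper: print the most recent ERROR lines that mention a VM.
# Usage: vm_migration_errors <vm-name-or-uuid> [log-file]
vm_migration_errors() {
  local vm="$1"
  local log="${2:-/var/log/pf9/ostackhost.log}"
  # -F treats the identifier as a literal string rather than a regex.
  grep -F -- "$vm" "$log" | grep -F "ERROR" | tail -n 20
}
```

Compute logs usually reference a VM by UUID rather than by name, so searching on the UUID tends to return more matches.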

Common Causes of Migration Failure

Insufficient Resources on Destination Hosts

If no destination host has enough free CPU or memory to accept the VM, the migration fails with a "No valid host was found" error.

What to check:

Review the vCPUs used and RAM used columns for each host in the cluster. If every destination host is near capacity, you must either free up resources (shut down idle VMs) or add another host to the cluster before retrying maintenance mode.
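The same figures can usually be read from the command line. The sketch below assumes the standard OpenStack CLI is installed and that admin credentials for the region are already sourced; the function name `list_host_capacity` is hypothetical.

```shell
# Hypothetical helper: show per-host vCPU and memory usage.
# Usage: list_host_capacity [hostname-substring]
# --long adds the "vCPUs Used", "vCPUs", "Memory MB Used" and "Memory MB"
# columns. The compute API no longer reports these fields from
# microversion 2.88 onward, so an older microversion may be required.
list_host_capacity() {
  if [ -n "${1:-}" ]; then
    openstack hypervisor list --long --matching "$1"
  else
    openstack hypervisor list --long
  fi
}
```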

CPU Model Mismatch

Live migration requires that the source and destination hosts expose compatible CPU models to the VM. If a host in the cluster was recently upgraded and its effective CPU model now differs from that of the source host, live migration fails.

What to check:

List the CPU models that each host reports as usable, then compare the source and destination lists. The CPU model selected for the cluster (set in nova_override.conf under [libvirt] cpu_models) must appear as usable on both hosts. See Resolve CPU Baseline Mismatch After Host Upgrade for steps to correct a mismatch.
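One way to produce the per-host lists is from libvirt's domain capabilities. The sketch below assumes `virsh` is available on each hypervisor and that its output marks usable models with lines such as `<model usable='yes' vendor='Intel'>Broadwell</model>`. The function name and the file paths in the usage comments are hypothetical.

```shell
# Hypothetical helper: print the CPU model names libvirt marks as usable
# on this host, one per line, sorted and de-duplicated.
usable_cpu_models() {
  virsh domcapabilities \
    | grep "<model usable='yes'" \
    | sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | sort -u
}

# Run on each host and save the output, then print the models common to both:
#   usable_cpu_models > /tmp/source-models.txt       # on the source host
#   usable_cpu_models > /tmp/destination-models.txt  # on the destination host
#   comm -12 /tmp/source-models.txt /tmp/destination-models.txt
```

If the cluster's configured model is missing from the `comm -12` output, at least one of the two hosts cannot expose it.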

Affinity or Anti-Affinity Rule Conflicts

Maintenance mode honors hard affinity and anti-affinity rules. If a VM has a hard affinity rule requiring it to be co-located with another VM that is also being migrated, and no destination host can satisfy both, the migration fails.

Review the VM's affinity group in the UI: navigate to Compute > VM Affinity Anti-Affinity Rules and confirm which group the VM belongs to. If the conflict cannot be resolved automatically, you may need to temporarily remove the hard rule, migrate the VM manually, and re-apply the rule.
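Group membership can also be checked from the command line. This sketch assumes the affinity rules are backed by OpenStack server groups, which the guide does not confirm, and that admin credentials are sourced. The function name `vm_server_groups` is hypothetical.

```shell
# Hypothetical helper: print the server group rows whose member list
# contains the given VM UUID.
# Usage: vm_server_groups <vm-uuid>
vm_server_groups() {
  local vm_uuid="$1"
  # --long adds the Members column; --all-projects requires admin rights.
  openstack server group list --all-projects --long | grep -F -- "$vm_uuid"
}
```

Each matching row shows the group's policy (for example `affinity` or `anti-affinity`) next to its members, which indicates whether a hard rule is involved.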

VMs in Error or Unmigratable States

As described in VM States in the Maintenance Mode guide, VMs in Error, Suspended, Shutdown, Rescued, or Pending resize confirmation states are skipped by maintenance mode. VMs in Error state must be recovered first.

To recover a VM in Error state, attempt a hard reboot:
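The guide does not show the exact command here. A minimal sketch, assuming the standard OpenStack CLI is installed and credentials for the region are sourced, could look like the following; the function name `hard_reboot_vm` is hypothetical.

```shell
# Hypothetical helper: hard-reboot a VM and print its resulting status.
# Usage: hard_reboot_vm <vm-uuid>
hard_reboot_vm() {
  local vm="$1"
  # --hard power-cycles the instance; --wait blocks until the reboot finishes.
  openstack server reboot --hard --wait "$vm"
  # A status of ACTIVE indicates that the VM recovered.
  openstack server show "$vm" -f value -c status
}
```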

If the hard reboot does not resolve the error, see Recover VMs in ERROR State After Host Reboot or Patching.

Abort Maintenance Mode and Retry

If maintenance mode is partially complete and you want to stop it, disable maintenance mode from the UI:

  1. Navigate to Infrastructure > Cluster Hosts.

  2. Select the host in maintenance mode.

  3. Click Other > Disable Maintenance Mode or use the Disable Maintenance Mode button on the Host Details page.

Disabling maintenance mode marks the host as schedulable again. VMs that were already successfully migrated remain on their destination hosts; they are not migrated back automatically.

After resolving the blocking condition (freeing resources, fixing CPU model, recovering error-state VMs), re-enable maintenance mode to migrate the remaining VMs.

Manually Migrate a Stranded VM

If a specific VM cannot be migrated by maintenance mode (for example, it has a Virtual TPM or an unresolvable affinity constraint), migrate it manually before enabling maintenance mode:
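The guide does not show which migration type it intends. The sketch below uses a live migration and lets the scheduler choose the destination, assuming the standard OpenStack CLI with admin credentials sourced; the function name `live_migrate_vm` is hypothetical.

```shell
# Hypothetical helper: live-migrate a VM to a scheduler-chosen host.
# Usage: live_migrate_vm <vm-uuid>
live_migrate_vm() {
  local vm="$1"
  # --wait blocks until the migration completes.
  # Omitting --live-migration would request a cold migration instead, which
  # stops the VM during the move and must be confirmed afterwards.
  openstack server migrate --live-migration --wait "$vm"
}
```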

To target a specific destination host:
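As a sketch under the same assumptions (standard OpenStack CLI, admin credentials sourced, live migration), a destination can be named with `--host`. The function name `live_migrate_vm_to_host` is hypothetical.

```shell
# Hypothetical helper: live-migrate a VM to a named destination host.
# Usage: live_migrate_vm_to_host <vm-uuid> <destination-hostname>
live_migrate_vm_to_host() {
  local vm="$1" host="$2"
  # --host combined with --live-migration needs compute API
  # microversion 2.30 or later.
  openstack --os-compute-api-version 2.30 server migrate \
    --live-migration --host "$host" --wait "$vm"
}
```

The scheduler still validates the named host, so the request fails if that host lacks capacity or a compatible CPU model.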

After the manual migration completes, the VM is no longer on the source host and maintenance mode can proceed without encountering it.

Recover a VM Left in Error State After Migration

If a VM ended up in Error state during maintenance mode migration, recover it with a hard reboot:
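A hedged sketch of this recovery step follows, assuming the standard OpenStack CLI with admin credentials sourced. It reboots the VM and then reports which host the management plane records for it. The function name `recover_and_check_host` is hypothetical, and the `OS-EXT-SRV-ATTR:host` column name may vary between client versions.

```shell
# Hypothetical helper: hard-reboot a VM, then report whether it is still
# recorded on the host that is in maintenance mode.
# Usage: recover_and_check_host <vm-uuid> <maintenance-hostname>
recover_and_check_host() {
  local vm="$1" maintenance_host="$2" current_host
  openstack server reboot --hard --wait "$vm"
  # OS-EXT-SRV-ATTR:host is only visible to admin users.
  current_host=$(openstack server show "$vm" -f value -c OS-EXT-SRV-ATTR:host)
  if [ "$current_host" = "$maintenance_host" ]; then
    echo "still on $maintenance_host"
  else
    echo "now on $current_host"
  fi
}
```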

If the VM is reporting its hypervisor host as the host in maintenance mode (the source), the VM's record in the management plane may need to be reset. Contact Platform9 Support with the VM UUID and the Compute Service log from the source host.

For a full procedure covering VMs that remain in Error after host maintenance events, see Recover VMs in ERROR State After Host Reboot or Patching.
