Volume Migration and Retype Troubleshooting
Overview
Volume migration and retype operations move a volume's data from one storage backend to another, or change the volume type while keeping the data in place when the backend supports it. These operations can fail or stall for several reasons, including driver incompatibility, insufficient capacity, NFS share imbalance, and transient network issues.
This guide explains how to detect a stuck or failed migration, understand backend-specific limitations, and recover a volume that is stuck in retyping, maintenance, or error state.
In this guide, you will diagnose and remediate failed or stuck volume migration and retype operations.
Prerequisites
pcdctl configured and authenticated against your region.
Access to the block storage host logs at /var/log/pf9/cindervolume-base.log.
For Self-Hosted deployments: kubectl access to the management-plane namespace.
Understand Migration and Retype Modes
Before troubleshooting, confirm which mode was used:
Retype (same backend, driver-assisted): The driver changes metadata in place; no data copy occurs. Fast.
Retype (cross-backend): Full data copy from the source to the destination backend. Slow; duration is proportional to volume size.
Volume migrate: Explicit move to a different host or pool. Always copies data.
A cross-backend retype or explicit migration places the volume in retyping or maintenance status during the copy and updates migration_status. A driver-assisted same-backend retype completes almost immediately with no intermediate status.
Detect a Stuck or Failed Migration
Check Volume Status and Migration Status
Key fields to inspect:
status: should be available on success; error on failure; retyping or maintenance while in progress.
migration_status: values include migrating, completing, error, success, or empty.
A migration that shows migration_status=error has failed. A migration that has been in migrating status for more than an hour (for a small volume) or proportionally longer for large volumes is likely stuck.
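The rule above can be sketched as a small classifier. This is a minimal illustration, not part of the product: the function name, the dictionary layout, and the 100 GB baseline for a "small" volume are assumptions. The guide gives only the one-hour figure and says larger volumes need proportionally longer.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds: the guide says "more than an hour (for a small
# volume)"; treating 100 GB as "small" is an assumption for illustration.
SMALL_VOLUME_GB = 100
BASE_TIMEOUT = timedelta(hours=1)

IN_PROGRESS_STATUSES = {"retyping", "maintenance"}
IN_PROGRESS_MIGRATION = {"migrating", "completing"}


def classify_migration(volume, started_at, now=None):
    """Return 'failed', 'likely_stuck', 'in_progress', or 'idle'.

    `volume` is a dict with 'status', 'migration_status', and 'size' (GB).
    `started_at` is the timezone-aware time the operation was requested.
    """
    now = now or datetime.now(timezone.utc)
    status = volume.get("status")
    migration_status = volume.get("migration_status") or ""

    if migration_status == "error" or status == "error":
        return "failed"

    if migration_status in IN_PROGRESS_MIGRATION or status in IN_PROGRESS_STATUSES:
        # Scale the allowance linearly with size, never below the base timeout.
        scale = max(1.0, volume.get("size", 0) / SMALL_VOLUME_GB)
        if now - started_at > BASE_TIMEOUT * scale:
            return "likely_stuck"
        return "in_progress"

    return "idle"
```

A result of likely_stuck is only a hint to start reading the logs; it does not prove that the copy has stopped.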
Review the Storage Service Logs
On the block storage host, search for log entries related to the volume UUID:
Common error patterns and their meanings:
No valid host was found: The destination backend rejected the placement (capacity or capability mismatch).
driver does not support migration: The source or destination driver does not implement the migration path.
Timeout waiting for volume migration: The data copy stalled; often a network or NFS issue.
Volume copy failed: Backend-level copy failure; check the storage array.
NFS share ... has insufficient space: The NFS destination share lacks capacity for the full volume.
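If you have copied the relevant log lines to a workstation, a short script can tag each line with the meaning listed above. The sketch below is an illustration only: the function name is hypothetical, and it assumes that each error line also contains the volume UUID, which may not hold for every message in the real log.

```python
import re

# Patterns and meanings come from the list above; the "..." in the NFS
# message is treated as a wildcard.
ERROR_PATTERNS = [
    (re.compile(r"No valid host was found"),
     "Destination backend rejected the placement (capacity or capability mismatch)"),
    (re.compile(r"driver does not support migration"),
     "Source or destination driver does not implement the migration path"),
    (re.compile(r"Timeout waiting for volume migration"),
     "Data copy stalled; often a network or NFS issue"),
    (re.compile(r"Volume copy failed"),
     "Backend-level copy failure; check the storage array"),
    (re.compile(r"NFS share .* has insufficient space"),
     "NFS destination share lacks capacity for the full volume"),
]


def diagnose_log_lines(lines, volume_id):
    """Yield (line, meaning) for lines that mention volume_id and match a pattern."""
    for line in lines:
        if volume_id not in line:
            continue
        for pattern, meaning in ERROR_PATTERNS:
            if pattern.search(line):
                yield line.rstrip("\n"), meaning
                break  # report the first matching pattern per line
```

The function accepts any iterable of strings, so an open file handle for /var/log/pf9/cindervolume-base.log can be passed directly.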
Self-Hosted deployments only
If the cinder-volume service runs as a pod, retrieve logs with:
Common Failure Modes
Incompatible Drivers
Not every driver pair supports cross-backend migration. The Persistent Storage Service relies on each driver advertising its capabilities. When a driver does not support the requested migration path, the operation fails immediately with No valid host was found or a driver capability error.
Remediation: Use the generic host-assisted migration path. This copies the volume data through the Persistent Storage Service host rather than delegating to the drivers:
--force-host-copy bypasses driver-to-driver negotiation and copies the raw volume data block-by-block. It is slower but works across any pair of backends.
Insufficient Capacity on the Destination Backend
The migration pre-checks may pass but the data copy fails if the destination backend has less available capacity than the volume's allocated size.
Check destination backend capacity:
Check free_capacity_gb for the destination pool. After subtracting the capacity held back by the reserved_percentage configured for that backend, the remaining free space must be greater than the volume's size value.
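The capacity check can be sketched as follows. This is a simplified illustration under stated assumptions: the reserve is computed as a percentage of a total_capacity_gb field (not mentioned in this guide), and thin provisioning and over-subscription are ignored. The function name is hypothetical.

```python
import math


def has_capacity(pool, volume_size_gb):
    """Return True if the pool can hold the volume after honoring the reserve.

    Assumption: the reserve is reserved_percentage of total_capacity_gb.
    Thin provisioning and over-subscription ratios are not modeled here.
    """
    total = float(pool["total_capacity_gb"])
    free = float(pool["free_capacity_gb"])
    reserved_gb = math.floor(total * pool.get("reserved_percentage", 0) / 100)
    # Strictly greater, matching the wording of the guide.
    return free - reserved_gb > volume_size_gb
```

For a pool with 1000 GB total, 150 GB free, and a 10 percent reserve, only 50 GB is usable, so a 50 GB volume would not pass this check even though 150 GB appears free.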
Remediation: Either free capacity on the destination backend, reduce reserved_percentage, or choose a different destination with sufficient capacity.
NFS Capacity Imbalance Across Shares
When an NFS-backed backend is configured with multiple NFS shares (for example, multiple NetApp ONTAP NFS exports), the Persistent Storage Service distributes volumes across shares based on available capacity. A migration that targets a backend with uneven share utilization may route the volume to an already-full share.
Detect share imbalance:
Each NFS share appears as a separate pool. Compare free_capacity_gb across pools. A pool that reports free_capacity_gb=0 or a very low value will reject new volumes even if total backend capacity is available.
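Comparing pools by eye gets tedious with many shares. The sketch below summarizes a list of pool records; the name key, the function name, and the 10 GB cutoff for "very low" are assumptions for illustration, not values defined by the service.

```python
def summarize_share_balance(pools, volume_size_gb, low_water_gb=10):
    """Group pools by whether they can take the volume, and report the spread.

    `pools` is a list of dicts with 'name' and 'free_capacity_gb'.
    `low_water_gb` is a hypothetical cutoff for "very low" free space.
    """
    usable, constrained = [], []
    free_values = []
    for pool in pools:
        free = float(pool["free_capacity_gb"])
        free_values.append(free)
        if free <= low_water_gb or free <= volume_size_gb:
            constrained.append(pool["name"])
        else:
            usable.append(pool["name"])
    # Spread between the emptiest and fullest share; 0 if there are no pools.
    spread = max(free_values) - min(free_values) if free_values else 0.0
    return {"usable": usable, "constrained": constrained, "spread_gb": spread}
```

A large spread_gb together with a non-empty constrained list suggests that the backend has capacity overall but that individual shares need rebalancing.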
Remediation options:
Delete or migrate volumes off the overloaded share to rebalance capacity.
Add a new NFS share and update the nfs_shares_config file on the block storage host, then restart pf9-cindervolume-base to make the new share available for placement.
Backend-Specific Limitations
NetApp ONTAP NFS
FlexClone-based clone migrations require the destination to be on the same SVM. Cross-SVM migration falls back to file copy.
Tintri NFS
The Tintri driver does not support live migration of in-use volumes. The volume must be detached before retyping.
Pure Storage
Volume copy is performed natively on the array; requires both source and destination volumes to be visible to the same Pure array. Cross-array migration uses generic copy.
iSCSI / FC SAN backends
Cross-backend migration to an NFS backend (or vice versa) always uses generic host-assisted copy.
Recover a Volume Stuck in retyping or maintenance
If a migration fails midway, the volume may be left in retyping or maintenance status with migration_status=error. In this state the volume is locked and cannot be used or deleted.
Step 1 — Confirm the Migration Has Truly Failed
Check the log for a definitive error (not just a timeout). Wait at least 30 minutes for large volumes before concluding the migration is stuck rather than slow.
Step 2 — Clean Up the Temporary Migration Volume
During a cross-backend migration, the Persistent Storage Service creates a temporary volume on the destination backend. If the migration fails, this temporary volume may be left behind. Find and delete it:
Delete any volume whose name contains the pattern migration-<VOLUME_UUID>:
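Before deleting anything, it helps to list the candidates and review them. The sketch below filters an already-retrieved volume list by the naming pattern above; the function name and the id and name keys are assumptions about how the list is represented.

```python
def find_leftover_migration_volumes(volumes, source_volume_id):
    """Return volumes whose name contains 'migration-<source_volume_id>'.

    `volumes` is a list of dicts with 'id' and 'name'. The source volume
    itself is never returned, even if its name happens to match.
    """
    marker = f"migration-{source_volume_id}"
    return [
        v for v in volumes
        if marker in (v.get("name") or "") and v.get("id") != source_volume_id
    ]
```

Review the returned entries manually and confirm that each one sits on the destination backend before deleting it.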
Step 3 — Reset the Volume State
After the temporary volume is removed, reset the source volume's state to available:
Also reset the migration_status field:
State reset is a soft operation
Resetting state does not undo any partial data copy. If data was partially written to the destination, ensure the temporary volume is deleted before resetting. If you are unsure whether the volume data is consistent, take a snapshot before retrying the migration.
Step 4 — Retry the Migration
After confirming the volume is in available state, address the root cause identified in the logs (capacity, driver compatibility, NFS share space), then retry:
Or for an explicit host-level migration:
Next Steps
Review Volume Retype and Migration for conceptual background and supported scenarios.
Review Storage Live Migration for live migration of in-use volumes.
For backend scheduler placement decisions that affect which pool a retype targets, see Persistent Storage Service Backend Selection and Tuning.
