# Volume Migration and Retype Troubleshooting

## Overview

Volume migration and retype operations move a volume's data from one storage backend to another, or change the volume type while keeping the data in place when the backend supports it. These operations can fail or stall for several reasons, including driver incompatibility, insufficient capacity, NFS share imbalance, or a transient network issue.

This guide explains how to detect a stuck or failed migration or retype, understand backend-specific limitations, and recover a volume that is stuck in `retyping`, `maintenance`, or `error` state.

## Prerequisites

* `pcdctl` configured and authenticated against your region.
* Access to the block storage host logs at `/var/log/pf9/cindervolume-base.log`.
* For Self-Hosted deployments: `kubectl` access to the management-plane namespace.

## Understand Migration and Retype Modes

Before troubleshooting, confirm which mode was used:

| Operation                                  | What Happens                                                                                   |
| ------------------------------------------ | ---------------------------------------------------------------------------------------------- |
| **Retype — same backend, driver-assisted** | The driver changes metadata in place; no data copy occurs. Fast.                               |
| **Retype — cross-backend**                 | Full data copy from source to destination backend. Slow; duration proportional to volume size. |
| **Volume migrate**                         | Explicit move to a different host/pool. Always copies data.                                    |

A cross-backend retype or explicit migration places the volume in `retyping` or `maintenance` status during the copy and updates `migration_status`. A driver-assisted same-backend retype completes almost immediately with no intermediate status.

## Detect a Stuck or Failed Migration

### Check Volume Status and Migration Status

```bash
pcdctl volume show <VOLUME_UUID>
```

Key fields to inspect:

* `status` — should be `available` on success; `error` on failure; `retyping` or `maintenance` while in progress.
* `migration_status` — values include `migrating`, `completing`, `error`, `success`, or empty.

A migration that shows `migration_status=error` has failed. A small volume that stays in `migrating` for more than an hour is likely stuck; allow proportionally longer for larger volumes before drawing the same conclusion.
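
The status combinations above can be turned into a quick triage helper. The following is a minimal sketch, not part of `pcdctl`: `classify_migration` is a hypothetical function name, and it assumes you pass in the `status` and `migration_status` values that you read from `pcdctl volume show`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of pcdctl): classify a volume from the
# status and migration_status values shown by `pcdctl volume show`.
classify_migration() {
  local status="$1" migration_status="${2:-}"

  if [ "$status" = "error" ] || [ "$migration_status" = "error" ]; then
    # Either field reporting an error means the operation failed.
    echo "failed"
  elif [ "$status" = "retyping" ] || [ "$status" = "maintenance" ] ||
       [ "$migration_status" = "migrating" ] || [ "$migration_status" = "completing" ]; then
    # The copy has not finished yet; it may be slow or stuck.
    echo "in-progress"
  elif [ "$status" = "available" ]; then
    echo "ok"
  else
    # Any other combination is outside what this sketch covers.
    echo "unknown"
  fi
}
```

For example, `classify_migration retyping error` prints `failed`. An `in-progress` result only says that the operation has not finished; compare the elapsed time against the volume size to decide whether it is stuck.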

### Review the Storage Service Logs

On the block storage host, search for log entries related to the volume UUID:

```bash
sudo grep <VOLUME_UUID> /var/log/pf9/cindervolume-base.log | grep -E 'ERROR|migrate|retype' | tail -100
```

Common error patterns and their meanings:

| Log Pattern                            | Meaning                                                                          |
| -------------------------------------- | -------------------------------------------------------------------------------- |
| `No valid host was found`              | The destination backend rejected the placement (capacity or capability mismatch) |
| `driver does not support migration`    | The source or destination driver does not implement the migration path           |
| `Timeout waiting for volume migration` | The data copy stalled; often network or NFS issue                                |
| `Volume copy failed`                   | Backend-level copy failure; check the storage array                              |
| `NFS share ... has insufficient space` | NFS destination share lacks capacity for the full volume                         |
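
When a volume produces many log lines, it can help to map them onto the table above automatically. The sketch below is illustrative only: `explain_migration_errors` and the short labels it prints are hypothetical names, and the matching assumes the log text contains the patterns exactly as listed in the table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read log lines on stdin and print a short hint for
# each line that matches one of the patterns in the table above.
# Lines that match none of the patterns are skipped silently.
explain_migration_errors() {
  local line
  while IFS= read -r line; do
    case "$line" in
      *"No valid host was found"*)
        echo "placement-rejected: check destination capacity and capabilities" ;;
      *"driver does not support migration"*)
        echo "unsupported-driver-path: consider host-assisted copy" ;;
      *"Timeout waiting for volume migration"*)
        echo "copy-stalled: check network and NFS connectivity" ;;
      *"Volume copy failed"*)
        echo "backend-copy-failure: check the storage array" ;;
      *"NFS share"*"has insufficient space"*)
        echo "nfs-share-full: destination share lacks capacity" ;;
    esac
  done
}
```

A possible invocation, reusing the search command from this section, is `sudo grep <VOLUME_UUID> /var/log/pf9/cindervolume-base.log | explain_migration_errors`.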

{% hint style="info" %}
**Self-Hosted deployments only**

If the `cinder-volume` service runs as a pod, retrieve logs with:

```bash
kubectl logs -n <WORKLOAD_REGION> <CINDER_VOLUME_POD> | grep <VOLUME_UUID>
```

{% endhint %}

## Common Failure Modes

### Incompatible Drivers

Not every driver pair supports cross-backend migration. The Persistent Storage Service relies on each driver advertising its capabilities. When a driver does not support the requested migration path, the operation fails immediately with `No valid host was found` or a driver capability error.

**Remediation:** Use the generic host-assisted migration path. This copies the volume data through the Persistent Storage Service host rather than delegating to the drivers:

```bash
pcdctl volume migrate --force-host-copy <VOLUME_UUID> <DESTINATION_HOST>
```

`--force-host-copy` bypasses driver-to-driver negotiation and copies the raw volume data block by block. It is slower but works across any pair of backends.

### Insufficient Capacity on the Destination Backend

The migration pre-checks may pass, yet the data copy still fails when the destination backend has less available capacity than the volume's allocated size.

**Check destination backend capacity:**

```bash
pcdctl volume service list
pcdctl --os-volume-api-version 3.12 volume backend pool list --detail
```

Look at the `free_capacity_gb` value for the destination pool. After the capacity held back by the backend's `reserved_percentage` is subtracted, the remaining free space must still cover the volume's `size`.
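
The capacity rule can be worked through with a small calculation. This is a simplified sketch under stated assumptions, not the scheduler's actual logic: it assumes `reserved_percentage` is applied to the pool's total capacity and rounded down to whole GB, it ignores thin provisioning and over-subscription, and `pool_can_fit` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate whether a pool can hold a volume.
# Assumption: reserved_percentage is taken from the pool's total capacity,
# rounded down to whole GB, and only thick provisioning is considered.
# Arguments: volume size, free_capacity_gb, total capacity, reserved_percentage.
pool_can_fit() {
  local size_gb="$1" free_gb="$2" total_gb="$3" reserved_pct="$4"
  awk -v size="$size_gb" -v free="$free_gb" \
      -v total="$total_gb" -v pct="$reserved_pct" 'BEGIN {
    reserved = int(total * pct / 100)   # capacity held back by the backend
    usable = free - reserved            # what placement can actually use
    printf "usable_gb=%.2f required_gb=%.2f\n", usable, size
    if (usable >= size) exit 0          # exit status 0 means the volume fits
    exit 1
  }'
}
```

With a 1000 GB pool and a 10 percent reserve, `pool_can_fit 100 150 1000 10` reports `usable_gb=50.00 required_gb=100.00` and returns a non-zero status, even though the raw `free_capacity_gb` of 150 looks larger than the 100 GB volume.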

**Remediation:** Either free capacity on the destination backend, reduce `reserved_percentage`, or choose a different destination with sufficient capacity.

### NFS Capacity Imbalance Across Shares

When an NFS-backed backend is configured with multiple NFS shares (for example, multiple NetApp ONTAP NFS exports), the Persistent Storage Service distributes volumes across shares based on available capacity. A migration that targets a backend with uneven share utilization may route the volume to an already-full share.

**Detect share imbalance:**

```bash
pcdctl --os-volume-api-version 3.12 volume backend pool list --detail
```

Each NFS share appears as a separate pool. Compare `free_capacity_gb` across pools. A pool that reports `free_capacity_gb=0` or a very low value will reject new volumes even if total backend capacity is available.
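
Comparing pools by eye becomes error-prone when there are many shares. The sketch below flags the low ones; it assumes you have already reduced the pool listing to two whitespace-separated columns (pool name, then `free_capacity_gb`), and both the `flag_low_pools` name and the 10 GB default threshold are illustrative choices rather than product defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "<pool_name> <free_capacity_gb>" pairs on stdin
# and print every pool whose free capacity is below a threshold.
# The threshold is the first argument and defaults to 10 (GB), an arbitrary
# value chosen for this sketch.
flag_low_pools() {
  local min_free_gb="${1:-10}"
  awk -v min="$min_free_gb" '
    NF >= 2 && ($2 + 0) < (min + 0) { printf "%s free_capacity_gb=%s\n", $1, $2 }
  '
}
```

Any pool printed by this helper is a candidate for the rebalancing options that follow.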

**Remediation options:**

1. Delete or migrate volumes off the overloaded share to rebalance capacity.
2. Add a new NFS share and update the `nfs_shares_config` file on the block storage host, then restart `pf9-cindervolume-base` to make the new share available for placement.

### Backend-Specific Limitations

| Backend                     | Known Limitation                                                                                                                                                            |
| --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **NetApp ONTAP NFS**        | FlexClone-based clone migrations require the destination to be on the same SVM. Cross-SVM migration falls back to file copy.                                                |
| **Tintri NFS**              | The Tintri driver does not support live migration of `in-use` volumes. The volume must be detached before retyping.                                                         |
| **Pure Storage**            | Volume copy is performed natively on the array; requires both source and destination volumes to be visible to the same Pure array. Cross-array migration uses generic copy. |
| **iSCSI / FC SAN backends** | Cross-backend migration to an NFS backend (or vice versa) always uses generic host-assisted copy.                                                                           |

## Recover a Volume Stuck in `retyping` or `maintenance`

If a migration fails midway, the volume may be left in `retyping` or `maintenance` status with `migration_status=error`. In this state the volume is locked and cannot be used or deleted.

### Step 1 — Confirm the Migration Has Truly Failed

Check the log for a definitive error, not just a timeout. For large volumes, wait at least 30 minutes before concluding that the migration is stuck rather than slow.

```bash
sudo grep <VOLUME_UUID> /var/log/pf9/cindervolume-base.log | tail -20
```

### Step 2 — Clean Up the Temporary Migration Volume

During a cross-backend migration, the Persistent Storage Service creates a temporary volume on the destination backend. If the migration fails, this temporary volume may be left behind. Find and delete it:

```bash
# Temporary volumes are named with the source volume UUID embedded
pcdctl volume list --all-projects | grep <VOLUME_UUID>
```

Delete any volume whose name contains the pattern `migration-<VOLUME_UUID>`:

```bash
pcdctl volume delete <TEMP_VOLUME_UUID>
```
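
Filtering the listing on the bare UUID also matches the source volume itself, which must not be deleted. A narrower filter is sketched below; `find_temp_migration_volumes` is a hypothetical helper that only keeps rows containing the `migration-<VOLUME_UUID>` pattern described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only the rows of a volume listing (read on stdin)
# whose text contains "migration-<source volume UUID>".
# -F treats the pattern as a fixed string rather than a regular expression.
find_temp_migration_volumes() {
  local volume_uuid="$1"
  grep -F -- "migration-${volume_uuid}"
}
```

For example: `pcdctl volume list --all-projects | find_temp_migration_volumes <VOLUME_UUID>`. An empty result suggests that no temporary volume was left behind.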

### Step 3 — Reset the Volume State

After the temporary volume is removed, reset the source volume's state to `available`:

```bash
pcdctl volume set --state available <VOLUME_UUID>
```

To clear the `migration_status` field as well, add `--reset-migration-status`:

```bash
pcdctl volume set --state available --reset-migration-status <VOLUME_UUID>
```

{% hint style="warning" %}
**State reset is a soft operation**

Resetting state does not undo any partial data copy. If data was partially written to the destination, ensure the temporary volume is deleted before resetting. If you are unsure whether the volume data is consistent, take a snapshot before retrying the migration.
{% endhint %}

### Step 4 — Retry the Migration

After confirming the volume is in `available` state, address the root cause identified in the logs (capacity, driver compatibility, NFS share space), then retry:

```bash
pcdctl volume retype --migration-policy on-demand <VOLUME_UUID> <DESTINATION_VOLUME_TYPE>
```

Or for an explicit host-level migration:

```bash
pcdctl volume migrate <VOLUME_UUID> <DESTINATION_HOST>
```
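
Before running either retry command, the preconditions from the earlier steps can be checked in one place. This is a minimal sketch rather than a supported tool: `ready_to_retry` is a hypothetical helper, and it assumes you supply the volume's `status`, its `migration_status`, and the number of leftover temporary volumes found in Step 2.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only when the volume looks safe to retry.
# Arguments: status, migration_status, number of leftover temporary volumes.
ready_to_retry() {
  local status="$1" migration_status="${2:-}" leftover_temp_volumes="${3:-0}"

  # Step 3 must have returned the volume to the available state.
  if [ "$status" != "available" ]; then
    echo "not ready: status is '$status', expected 'available'"
    return 1
  fi
  # A lingering migration_status means the reset did not take effect.
  case "$migration_status" in
    error|migrating|completing)
      echo "not ready: migration_status is still '$migration_status'"
      return 1 ;;
  esac
  # Step 2 must have removed every temporary migration volume.
  if [ "$leftover_temp_volumes" -gt 0 ]; then
    echo "not ready: $leftover_temp_volumes temporary migration volume(s) remain"
    return 1
  fi
  echo "ready"
}
```

For example, `ready_to_retry available "" 0` prints `ready`, while `ready_to_retry available error 0` returns a non-zero status because the `migration_status` field was not cleared. The helper does not confirm that the root cause has been fixed; that still requires the log review described above.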

## Next Steps

* Review [Volume Retype and Migration](/private-cloud-director/storage/volume.md#volume-retype---changing-volume-type--migration-across-backends) for conceptual background and supported scenarios.
* Review [Storage Live Migration](/private-cloud-director/storage/storage-live-migration.md) for live migration of `in-use` volumes.
* For backend scheduler placement decisions that affect which pool a retype targets, see [Persistent Storage Service Backend Selection and Tuning](/private-cloud-director/storage/troubleshooting-and-log-files/backend-selection-and-tuning.md).

