# Volume Attach / Detach Troubleshooting

## Overview

Volume attach and detach operations involve coordinated work between the Compute Service and the Persistent Storage Service. When either operation stalls or fails, a volume can become stuck in a state such as `detaching`, `attaching`, or `reserved`, blocking further operations on the volume and sometimes on the virtual machine.

This guide covers the most common root causes, how to detect each one, and how to safely recover a volume that is stuck in an attach or detach state without data loss.

## Prerequisites

* `pcdctl` configured and authenticated against your region.
* Access to the host running the affected VM (for `systemctl` and `virsh` commands).
* For Self-Hosted deployments: `kubectl` access to the management-plane namespace.

## Common Causes

| Symptom                     | Typical Root Cause                                                              |
| --------------------------- | ------------------------------------------------------------------------------- |
| Volume stuck in `detaching` | Compute Service failed to complete the disconnect; volume record not cleaned up |
| Volume stuck in `attaching` | Transport-level failure (iSCSI/NFS) during the LUN-mapping step                 |
| Volume stuck in `reserved`  | Compute Service crashed between reserving and completing the attachment         |
| Stale attachment record     | VM was deleted or migrated without detaching volumes first                      |
| Nova BDM inconsistency      | Block device mapping record left over after a failed live migration             |

## Detect a Stuck Volume

### Check Volume Status

```bash
pcdctl volume show <VOLUME_UUID>
```

Look at the `status` field. In a healthy workflow, a volume passes through the intermediate states (`attaching`, `detaching`, `reserved`) in under two minutes. Treat a volume that stays in one of those states for more than five minutes as stuck.

To list all volumes in a potentially stuck state at once:

```bash
pcdctl volume list --all-projects | grep -E 'detaching|attaching|reserved|maintenance'
```
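The five-minute threshold above can also be checked by a script instead of by eye. The following is a minimal sketch, not an official tool: `is_transitional` and `wait_until_settled` are hypothetical helper names, and the sketch assumes that `pcdctl volume show <VOLUME_UUID> -f value -c status` prints only the status string, as in the commands used elsewhere in this guide.

```bash
#!/usr/bin/env bash
# Succeeds when the status is one of the intermediate states named in this guide.
is_transitional() {
  case "$1" in
    attaching|detaching|reserved) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll a volume until it leaves a transitional state or the timeout expires.
#   $1 = volume UUID
#   $2 = timeout in seconds (default 300, the five-minute threshold)
#   $3 = polling interval in seconds (default 10)
# Prints the last status seen. Returns 0 if settled, 1 if still transitional
# at the timeout, 2 if the status lookup itself failed.
# ASSUMPTION: `-f value -c status` prints the bare status string.
wait_until_settled() {
  local volume_id="$1" timeout="${2:-300}" interval="${3:-10}" elapsed=0 status
  while :; do
    status="$(pcdctl volume show "$volume_id" -f value -c status)" || return 2
    if ! is_transitional "$status"; then
      echo "$status"
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "$status"
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

For example, `wait_until_settled <VOLUME_UUID> 300 10` polls every 10 seconds for up to five minutes. An exit status of 1 means the volume is still in a transitional state and should be treated as stuck.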

### Identify the Attached VM and Host

```bash
pcdctl volume show <VOLUME_UUID> -f value -c attachments
```

The output includes the `server_id` (VM UUID) and the `host_name` of the compute node. Record both; you need them for the subsequent steps.

### Check the Attachment Record on the Compute Host

SSH to the compute host identified above and inspect the block device mapping:

```bash
sudo virsh domblklist <VM_UUID>
```

If the volume device appears in the `virsh` output, the hypervisor still considers it attached. If `virsh` does not list the device but the Persistent Storage Service reports the volume as `in-use`, the two records are inconsistent.
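The comparison described above can be sketched as a small helper. This is an illustration under assumptions, not a Platform9 tool: `compare_views` is a hypothetical name, and the check assumes that the volume UUID appears somewhere in the `virsh domblklist` output (typically in the Source path). On backends where the device path does not contain the UUID, such as multipath device-mapper paths, match on the device path by hand instead.

```bash
#!/usr/bin/env bash
# Compare the hypervisor's view of a volume with the storage service's view.
#   $1    = volume UUID
#   $2    = status from `pcdctl volume show <VOLUME_UUID> -f value -c status`
#   stdin = output of `sudo virsh domblklist <VM_UUID>` from the compute host
# ASSUMPTION: the volume UUID appears in the domblklist output.
compare_views() {
  local volume_id="$1" status="$2"
  if grep -qF -- "$volume_id"; then
    echo "attached-on-hypervisor"   # virsh still lists the device
  elif [ "$status" = "in-use" ]; then
    echo "inconsistent"             # storage says in-use, hypervisor has no device
  else
    echo "not-attached"             # neither side claims an active attachment
  fi
}
```

A possible invocation on the compute host is `sudo virsh domblklist <VM_UUID> | compare_views <VOLUME_UUID> in-use`, with the status value copied from the `pcdctl volume show` output.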

## Recover a Volume Stuck in `detaching`

A volume stuck in `detaching` most often means the Compute Service sent a detach request but did not receive confirmation from the storage host. Follow these steps in order.

### Step 1 — Verify the Compute Service Is Healthy

On the affected compute host:

```bash
sudo systemctl status pf9-ostackhost
```

If the service is not running, restart it and wait up to two minutes for in-flight operations to complete:

```bash
sudo systemctl restart pf9-ostackhost
```

Check whether the volume transitions to `available` on its own within two minutes:

```bash
watch -n 10 pcdctl volume show <VOLUME_UUID> -f value -c status
```

### Step 2 — Verify the Persistent Storage Service Is Healthy

On the block storage host:

```bash
sudo systemctl status pf9-cindervolume-base
```

Review logs for errors related to the volume UUID:

```bash
sudo grep <VOLUME_UUID> /var/log/pf9/cindervolume-base.log | tail -50
```
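When the volume appears in many log lines, it can help to keep only the higher-severity entries. The sketch below is a hypothetical helper (`volume_log_errors` is not part of any Platform9 tooling) and assumes that the log uses the usual OpenStack-style level keywords (`WARNING`, `ERROR`, `CRITICAL`) as separate words. Adjust the pattern if your log format differs.

```bash
#!/usr/bin/env bash
# Keep only WARNING/ERROR/CRITICAL lines that mention the given volume.
#   $1    = volume UUID
#   $2    = maximum number of lines to print (default 50)
#   stdin = log content, for example /var/log/pf9/cindervolume-base.log
# ASSUMPTION: the severity appears as a standalone upper-case word on each line.
volume_log_errors() {
  local volume_id="$1" limit="${2:-50}"
  grep -F -- "$volume_id" \
    | grep -E '(^|[[:space:]])(WARNING|ERROR|CRITICAL)([[:space:]]|$)' \
    | tail -n "$limit"
}
```

One way to use it is `sudo cat /var/log/pf9/cindervolume-base.log | volume_log_errors <VOLUME_UUID>`.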

{% hint style="info" %}
**Self-Hosted deployments only**

If the `cinder-volume` service runs as a pod rather than a systemd unit, check its status with:

```bash
kubectl get pods -n <WORKLOAD_REGION> | grep cinder-volume
kubectl logs -n <WORKLOAD_REGION> <CINDER_VOLUME_POD> | grep <VOLUME_UUID>
```

{% endhint %}

### Step 3 — Force-Reset the Volume State

If the Persistent Storage Service is healthy but the volume remains stuck, force-reset its state to `available`. This is safe only after you have confirmed that the volume is not genuinely connected to a running VM (see the `virsh domblklist` check above).

{% hint style="warning" %}
**Before resetting state**

Confirm that `virsh domblklist <VM_UUID>` no longer lists the volume device. Resetting state on a volume that is still presented to a running VM can cause data corruption.
{% endhint %}

```bash
pcdctl volume set --state available <VOLUME_UUID>
```

### Step 4 — Clean Up Stale Attachment Records

If a stale attachment record remains after the state reset, delete it:

```bash
# List attachment IDs for the volume
pcdctl volume attachment list --volume-id <VOLUME_UUID>

# Delete the stale attachment
pcdctl volume attachment delete <ATTACHMENT_UUID>
```
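If a volume has several stale attachment records, the two commands above can be wrapped in a loop. This is a sketch under assumptions: `delete_stale_attachments` is a hypothetical helper, and it assumes that `pcdctl volume attachment list` accepts the same `-f value -c <column>` output options used elsewhere in this guide and exposes an `ID` column. It removes every attachment record of the volume, so run it only after the checks in Step 3. It defaults to a dry run so that nothing is deleted until you have reviewed the list.

```bash
#!/usr/bin/env bash
# Delete every attachment record of a volume, with a dry-run default.
#   $1 = volume UUID
#   $2 = "yes" (default) to only print what would be deleted, "no" to delete
# ASSUMPTION: `-f value -c ID` prints one attachment UUID per line.
delete_stale_attachments() {
  local volume_id="$1" dry_run="${2:-yes}" attachment_id
  pcdctl volume attachment list --volume-id "$volume_id" -f value -c ID |
    while read -r attachment_id; do
      [ -n "$attachment_id" ] || continue
      if [ "$dry_run" = "yes" ]; then
        echo "would delete $attachment_id"
      else
        # </dev/null keeps pcdctl from consuming the list the loop is reading
        pcdctl volume attachment delete "$attachment_id" </dev/null &&
          echo "deleted $attachment_id"
      fi
    done
}
```

Run `delete_stale_attachments <VOLUME_UUID>` first to review the list, then `delete_stale_attachments <VOLUME_UUID> no` to perform the deletion.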

### Step 5 — Clean Up LUN Mappings on the Storage Backend

For SAN-backed volumes (iSCSI or Fibre Channel), the storage array may retain a LUN mapping even after the software-level detach completes. Check this from the storage management interface of your backend:

* **NetApp ONTAP**: verify that no igroup mapping exists for the host WWN / IQN that was using the volume.
* **Pure Storage**: verify that the host connection is removed in the Pure array management interface.
* **Hitachi VSP**: verify that the host group mapping no longer includes the LUN.

If a stale LUN mapping exists, remove it using your storage array's management tools. A stale mapping does not prevent the volume from being reused by a different VM, but it consumes resources on the storage array and can cause confusion during future attach operations.

## Recover a Volume with Nova BDM Inconsistency

A Nova Block Device Mapping (BDM) inconsistency can occur when a live migration fails partway through and the migration rollback does not fully clean up. The volume appears `in-use` but is not attached to any running VM.

### Detect the Inconsistency

```bash
# Show the VM the volume claims to be attached to
pcdctl volume show <VOLUME_UUID> -f value -c attachments

# Verify that VM exists and is in a stable state
pcdctl server show <VM_UUID>
```

If the VM does not exist, or is in `ERROR` state, the BDM record is stale.
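The decision rule above can be sketched as follows. `attachment_target_state` is a hypothetical helper; it assumes that `pcdctl server show` exits non-zero when the VM does not exist and that `-f value -c status` prints the bare VM status. A failed lookup can also be caused by an authentication or network problem, so confirm a `missing` result manually before cleaning up.

```bash
#!/usr/bin/env bash
# Report whether the VM referenced by an attachment makes the BDM record stale.
#   $1 = VM UUID taken from the volume's attachments
# Prints "missing", "error", or "ok:<STATUS>".
# Returns 0 when the record looks stale, 1 otherwise.
# ASSUMPTION: a non-zero exit from `pcdctl server show` means the VM is gone.
attachment_target_state() {
  local vm_id="$1" vm_status
  if ! vm_status="$(pcdctl server show "$vm_id" -f value -c status 2>/dev/null)"; then
    echo "missing"          # lookup failed: the VM probably no longer exists
    return 0
  fi
  if [ "$vm_status" = "ERROR" ]; then
    echo "error"            # the VM exists but is in ERROR state
    return 0
  fi
  echo "ok:$vm_status"      # the VM exists in another state: record may be valid
  return 1
}
```

A result of `ok:<STATUS>` means the attachment may be legitimate, and the cleanup steps that follow should not be applied without further investigation.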

### Clean Up the Inconsistency

1. Reset the volume state to `available`:

   ```bash
   pcdctl volume set --state available <VOLUME_UUID>
   ```
2. Delete stale attachment records:

   ```bash
   pcdctl volume attachment list --volume-id <VOLUME_UUID>
   pcdctl volume attachment delete <ATTACHMENT_UUID>
   ```
3. If the VM is in `ERROR` state and you need to recover it, rebuild or delete it through the Compute Service. The volume will remain `available` and can be reattached to a new or recovered VM.

## Orphaned Attachments After Tenant or VM Deletion

When a tenant is deleted while volumes are still attached to VMs, or when VMs are force-deleted without detaching volumes first, orphaned attachment records can accumulate. These records cause subsequent volume operations to fail with "volume already in use" errors even though no active VM is using the volume.

To find orphaned attachments:

```bash
# List all attachments; cross-reference against running VMs
pcdctl volume attachment list --volume-id <VOLUME_UUID>
pcdctl server list --all-projects | grep <VM_UUID>
```

If the VM UUID from the attachment does not appear in the server list, the attachment is orphaned. Delete it:

```bash
pcdctl volume attachment delete <ATTACHMENT_UUID>
pcdctl volume set --state available <VOLUME_UUID>
```
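When many attachments need checking, the cross-reference described in this section can be done as a set difference rather than one `grep` per VM. The sketch below is illustrative: `find_orphans` is a hypothetical helper, and the input formats (one VM UUID per line in a file, and `<ATTACHMENT_UUID> <VM_UUID>` pairs on standard input) are assumptions about how you export the data from `pcdctl`.

```bash
#!/usr/bin/env bash
# Print attachment IDs whose VM is absent from the list of existing VMs.
#   $1    = file containing one existing VM UUID per line
#   stdin = lines of the form "<ATTACHMENT_UUID> <VM_UUID>"
# Returns 2 if the VM list cannot be read, so that a missing file is never
# mistaken for "no VMs exist".
find_orphans() {
  local servers_file="$1"
  if [ ! -r "$servers_file" ]; then
    echo "cannot read VM list: $servers_file" >&2
    return 2
  fi
  awk -v servers="$servers_file" '
    BEGIN {
      # Load the known VM UUIDs into a lookup table.
      while ((getline line < servers) > 0) alive[line] = 1
      close(servers)
    }
    # Emit the attachment ID when its VM is not in the table.
    NF >= 2 && !($2 in alive) { print $1 }
  '
}
```

Review the printed attachment IDs before deleting anything. An empty or truncated VM list makes every attachment look orphaned.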

## Next Steps

* Review [Volume State](/private-cloud-director/storage/volume/volume-state.md) for a full list of volume status values and their meanings.
* If the volume was stuck because of a failed live migration, see [Storage Live Migration](/private-cloud-director/storage/storage-live-migration.md) for live-migration prerequisites and known limitations.
* For persistent or recurring attach/detach failures, see [Troubleshooting Cinder Issues](/private-cloud-director/storage/troubleshooting-and-log-files/troubleshooting-cinder-issues.md) for service-level diagnostics.

