Volume Attach / Detach Troubleshooting

Overview

Volume attach and detach operations involve coordinated work between the Compute Service and the Persistent Storage Service. When either operation stalls or fails, a volume can become stuck in a state such as detaching, attaching, or reserved, blocking further operations on the volume and sometimes on the virtual machine.

This guide covers the most common root causes, how to detect each one, and the safe steps to recover a stuck volume without data loss.

In this guide, you will diagnose and recover volumes that are stuck in attach or detach states.

Prerequisites

  • pcdctl configured and authenticated against your region.

  • Access to the host running the affected VM (for systemctl and virsh commands).

  • For Self-Hosted deployments: kubectl access to the management-plane namespace.

Common Causes

Symptom: Typical Root Cause

  • Volume stuck in detaching: Compute Service failed to complete the disconnect; volume record not cleaned up

  • Volume stuck in attaching: Transport-level failure (iSCSI/NFS) during the LUN-mapping step

  • Volume stuck in reserved: Compute Service crashed between reserving and completing the attachment

  • Stale attachment record: VM was deleted or migrated without detaching volumes first

  • Nova BDM inconsistency: Block device mapping record left over after a failed live migration

Detect a Stuck Volume

Check Volume Status

Show the volume's details and look at the status field. In a healthy workflow, a volume passes through the intermediate states (attaching, detaching, reserved) in under two minutes. A volume that stays in one of those states for more than five minutes is stuck.

To list all volumes in a potentially stuck state at once:
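A minimal sketch of such a listing follows. It assumes an OpenStack-compatible client, because the exact pcdctl syntax is not shown in this guide: the CLI variable defaults to openstack, and setting CLI=pcdctl presumes that pcdctl accepts the same subcommands. The function name is illustrative, and --all-projects normally requires admin rights.

```shell
# Assumption: the client accepts OpenStack-style "volume list" arguments.
CLI="${CLI:-openstack}"

# Print ID, name, and status of every volume in a transitional state.
list_transitional_volumes() {
    for state in attaching detaching reserved; do
        "$CLI" volume list --all-projects --status "$state" \
            -f value -c ID -c Name -c Status
    done
}

# Usage: list_transitional_volumes
```

The listing shows only the current state. Compare it with a second run a few minutes later to tell a volume that is stuck from one that is merely in transit.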

Identify the Attached VM and Host

Inspect the attachments field of the volume. Each entry includes the server_id (VM UUID) and the host_name of the compute node. Record both, because the following steps need them.
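One way to read those attachment details is sketched below, again assuming an OpenStack-compatible client (CLI defaults to openstack; pcdctl is assumed, not confirmed, to take the same arguments). The function name is illustrative.

```shell
# Assumption: the client accepts OpenStack-style "volume show" arguments.
CLI="${CLI:-openstack}"

# Print the attachment records of one volume as JSON.
# Each record carries server_id, host_name, and the attachment ID.
volume_attachments() {
    "$CLI" volume show -f json -c attachments "$1"
}

# Usage: volume_attachments <volume-uuid>
```

The host_name value may be empty for non-admin users, so run this with an admin-scoped credential if the field is missing.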

Check the Attachment Record on the Compute Host

SSH to the compute host identified above and inspect the block device mapping:
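The sketch below wraps virsh domblklist, the check referenced later in this guide. It assumes that the libvirt domain UUID matches the VM UUID recorded earlier, so the UUID can be passed as the domain argument, and that the command runs with enough privilege to talk to libvirt. The function names are illustrative.

```shell
# List the block devices libvirt has attached to a domain.
# virsh accepts a domain name, ID, or UUID.
domain_block_devices() {
    virsh domblklist "$1"
}

# Succeed if the given target device (for example vdb) is in that list.
domain_has_target() {
    domain_block_devices "$1" |
        awk -v target="$2" '$1 == target { found = 1 } END { exit !found }'
}

# Usage:
#   domain_block_devices <vm-uuid>
#   domain_has_target <vm-uuid> vdb && echo "still attached in libvirt"
```

The target name usually corresponds to the device value in the volume's attachment record with the /dev/ prefix removed. Treat that mapping as a hint and confirm it against the Source column.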

If the volume device appears in virsh output, the hypervisor still considers it attached. If virsh does not list it but the Persistent Storage Service shows in-use, the records are inconsistent.

Recover a Volume Stuck in detaching

A volume stuck in detaching most often means the Compute Service sent a detach request but did not receive confirmation from the storage host. Follow these steps in order.

Step 1 — Verify the Compute Service Is Healthy

On the affected compute host, use systemctl to check that the Compute Service unit is active.

If the service is not running, restart it and wait up to two minutes for in-flight operations to complete:
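A minimal sketch of that check-then-restart step follows. The systemd unit name of the Compute Service is not stated in this guide, so it is passed as an argument. The function name is illustrative, and the commands are assumed to run as root or through sudo.

```shell
# Restart a systemd unit only if it is not currently active.
restart_if_inactive() {
    unit=$1
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is running"
    else
        echo "$unit is not running; restarting"
        systemctl restart "$unit"
    fi
}

# Usage: restart_if_inactive <compute-service-unit>
```

Restarting only when the unit is inactive avoids interrupting in-flight operations on a healthy service.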

Check whether the volume transitions to available on its own within two minutes:
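The two-minute wait can be sketched as a polling loop. As elsewhere, this assumes an OpenStack-compatible client (CLI defaults to openstack; pcdctl is assumed, not confirmed, to take the same arguments). The function name, the 120-second default, and the 10-second interval are illustrative choices.

```shell
# Assumption: the client accepts OpenStack-style "volume show" arguments.
CLI="${CLI:-openstack}"

# Poll a volume until it becomes available or the timeout expires.
# Arguments: <volume-uuid> [timeout-seconds] [interval-seconds]
# Returns 0 when available, 1 on timeout, 2 if the status cannot be read.
wait_for_available() {
    volume_id=$1
    timeout=${2:-120}
    interval=${3:-10}
    waited=0
    while :; do
        status=$("$CLI" volume show -f value -c status "$volume_id") || return 2
        if [ "$status" = "available" ]; then
            echo "volume $volume_id is available"
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "volume $volume_id still in state: $status" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}

# Usage: wait_for_available <volume-uuid> || echo "continue with Step 2"
```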

Step 2 — Verify the Persistent Storage Service Is Healthy

On the block storage host, use systemctl to check that the Persistent Storage Service unit is active.

Review logs for errors related to the volume UUID:
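Because the log location and unit name of the Persistent Storage Service are not stated in this guide, the sketch below filters whatever log text is piped into it. The function name and the choice of severity keywords are illustrative assumptions.

```shell
# Read log lines on stdin and keep those that mention the given volume UUID
# and carry a warning-or-worse severity keyword.
volume_log_errors() {
    grep -F -- "$1" | grep -E 'WARNING|ERROR|CRITICAL'
}

# Usage (unit name and log path are placeholders):
#   journalctl -u <storage-service-unit> --since "1 hour ago" | volume_log_errors <volume-uuid>
#   volume_log_errors <volume-uuid> < /path/to/storage-service.log
```

Stack traces that follow an error line often omit the UUID, so open the full log around the matching timestamps for context.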

Self-Hosted deployments only

If the cinder-volume service runs as a pod rather than a systemd unit, check its status with:
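A minimal sketch of the pod check follows. The management-plane namespace is deployment-specific and is passed as an argument. The default name filter cinder-volume is taken from the service name above and is an assumption about how the pods are named. The function name is illustrative.

```shell
# List pods in a namespace whose name contains the given filter.
# Arguments: <namespace> [name-filter]
cinder_volume_pods() {
    kubectl get pods -n "$1" --no-headers | grep -F -- "${2:-cinder-volume}"
}

# Usage:
#   cinder_volume_pods <management-plane-namespace>
#   kubectl logs -n <management-plane-namespace> <pod-name> | grep <volume-uuid>
```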

Step 3 — Force-Reset the Volume State

If the Persistent Storage Service is healthy but the volume remains stuck, force-reset its state to available. This is safe only after you have confirmed that the volume is not genuinely connected to a running VM (see the virsh domblklist check above).
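The reset can be sketched as follows, assuming an OpenStack-compatible client (CLI defaults to openstack; pcdctl is assumed, not confirmed, to take the same arguments). Setting a volume state is an admin-only operation. The function name is illustrative.

```shell
# Assumption: the client accepts OpenStack-style "volume set" arguments.
CLI="${CLI:-openstack}"

# Force the recorded state of a volume to available.
# This changes only the database record; it does not disconnect anything,
# so run it only after the virsh domblklist check shows no live attachment.
force_reset_available() {
    "$CLI" volume set --state available "$1"
}

# Usage: force_reset_available <volume-uuid>
```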

Step 4 — Clean Up Stale Attachment Records

If a stale attachment record remains after the state reset, delete it:
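A sketch of the cleanup follows. It assumes an OpenStack-compatible client whose volume attachment commands are available at Block Storage API microversion 3.27, and that the attachment list exposes an ID column. CLI defaults to openstack, and pcdctl is assumed, not confirmed, to take the same arguments. The function name is illustrative.

```shell
# Assumption: the client accepts OpenStack-style "volume attachment" arguments.
CLI="${CLI:-openstack}"

# Delete every attachment record that references the given volume.
delete_volume_attachments() {
    "$CLI" --os-volume-api-version 3.27 volume attachment list \
        --volume-id "$1" -f value -c ID |
    while read -r attachment_id; do
        [ -n "$attachment_id" ] || continue
        echo "deleting attachment $attachment_id"
        "$CLI" --os-volume-api-version 3.27 volume attachment delete \
            "$attachment_id" </dev/null
    done
}

# Usage: delete_volume_attachments <volume-uuid>
```

Run the list command on its own first and review the records before deleting them.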

Step 5 — Clean Up LUN Mappings on the Storage Backend

For SAN-backed volumes (iSCSI or Fibre Channel), the storage array may retain a LUN mapping even after the software-level detach completes. Check this from the storage management interface of your backend:

  • NetApp ONTAP: verify that no igroup mapping exists for the host WWN / IQN that was using the volume.

  • Pure Storage: verify that the host connection is removed in the Pure array management interface.

  • Hitachi VSP: verify that the host group mapping no longer includes the LUN.

If a stale LUN mapping exists, remove it using your storage array's management tools. A stale mapping does not prevent the volume from being reused by a different VM, but it consumes resources on the storage array and can cause confusion during future attach operations.

Recover a Volume with Nova BDM Inconsistency

A Nova Block Device Mapping (BDM) inconsistency can occur when a live migration fails partway through and the migration rollback does not fully clean up. The volume appears in-use but is not attached to any running VM.

Detect the Inconsistency

If the VM does not exist, or is in ERROR state, the BDM record is stale.
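The rule above can be sketched as a small check on the server_id taken from the volume's attachment record. It assumes an OpenStack-compatible client (CLI defaults to openstack; pcdctl is assumed, not confirmed, to take the same arguments). The function names and the MISSING marker are illustrative.

```shell
# Assumption: the client accepts OpenStack-style "server show" arguments.
CLI="${CLI:-openstack}"

# Print the VM status, or MISSING if the lookup fails.
# Note: an authentication or network failure also reports MISSING here,
# so confirm the client works before trusting that result.
server_state() {
    if status=$("$CLI" server show -f value -c status "$1" 2>/dev/null); then
        echo "$status"
    else
        echo "MISSING"
    fi
}

# Succeed if the VM is absent or in ERROR state, i.e. the BDM record is stale.
bdm_is_stale() {
    case "$(server_state "$1")" in
        MISSING|ERROR) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: bdm_is_stale <server-uuid> && echo "stale BDM record"
```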

Clean Up the Inconsistency

  1. Reset the volume state to available, as in Step 3 of the detaching recovery above.

  2. Delete stale attachment records, as in Step 4 of the detaching recovery above.

  3. If the VM is in ERROR state and you need to recover it, rebuild or delete it through the Compute Service. The volume will remain available and can be reattached to a new or recovered VM.

Orphaned Attachments After Tenant or VM Deletion

When a tenant is deleted while volumes are still attached to VMs, or when VMs are force-deleted without detaching volumes first, orphaned attachment records can accumulate. These records cause subsequent volume operations to fail with "volume already in use" errors even though no active VM is using the volume.

To find orphaned attachments:
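The comparison can be sketched as a set difference between two exported lists. The export commands in the comments assume an OpenStack-compatible client and assume that the attachment list has ID and "Server ID" columns. Neither is confirmed by this guide, and pcdctl may differ. The function name and file names are illustrative.

```shell
# Print the IDs of attachments whose server is not in the server list.
# $1: file with one "<attachment-id> <server-id>" pair per line
# $2: file with one existing server ID per line
orphaned_attachments() {
    awk -v servers="$2" '
        BEGIN { while ((getline line < servers) > 0) live[line] = 1 }
        NF >= 2 && !($2 in live) { print $1 }
    ' "$1"
}

# Usage (assumed export commands, admin scope):
#   openstack --os-volume-api-version 3.27 volume attachment list --all-projects \
#       -f value -c ID -c "Server ID" > attachments.txt
#   openstack server list --all-projects -f value -c ID > servers.txt
#   orphaned_attachments attachments.txt servers.txt
```

Export both lists with the same admin scope. A server list limited to one tenant would make healthy attachments in other tenants look orphaned.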

If the VM UUID from the attachment does not appear in the server list, the attachment is orphaned. Delete it the same way as a stale attachment record (Step 4 of the detaching recovery above).

Next Steps

  • Review Volume State for a full list of volume status values and their meanings.

  • If the volume was stuck because of a failed live migration, see Storage Live Migration for live-migration prerequisites and known limitations.

  • For persistent or recurring attach/detach failures, see Troubleshooting Cinder Issues for service-level diagnostics.
