Pods are displaying FailedAttachVolume or FailedMount errors

You may run into a problem where the PCD management plane pods fail to start or run because their PVCs cannot be mounted, with pod events showing "FailedAttachVolume" or "FailedMount" errors.

Warning  FailedMount         5h32m (x31004 over 6h27m)  kubelet                  MountVolume.WaitForAttach failed for volume "pvc-[pvc-id]" : volume attachment is being deleted

Warning  FailedAttachVolume  5h30m (x36 over 6h27m)     attachdetach-controller  AttachVolume.Attach failed for volume "pvc-[pvc-id]" : volume attachment is being deleted
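
These events appear in the affected pod's event stream. To surface them, describe the pod or filter cluster events by reason; the pod name and namespace below are placeholders for the affected pod:

$ kubectl describe pod <pod-name> -n <namespace>
$ kubectl get events -A --field-selector reason=FailedMount
$ kubectl get events -A --field-selector reason=FailedAttachVolume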

Most common causes

  • The storage backend is unreachable.

  • The underlying host does not have sufficient resources to run these pods (a quick check for this is shown after this list).

  • The CSI driver is misconfigured or is reporting errors.

  • The Calico networking pods are not working as expected.
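
Before working through the steps below, a quick way to rule out the host-resource cause above is to check node conditions and resource pressure. Note that kubectl top relies on the metrics-server add-on and may not be available in every environment, and the node name is a placeholder:

$ kubectl get nodes
$ kubectl describe node <node-name> | grep -A 10 Conditions
$ kubectl top nodes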

Steps to Troubleshoot

1. Check pods in init state

Check how many pods are in the Init state to identify any pod stuck in initialization. Failed PVC attach or mount operations can cause pods to remain in an Init state.

$ kubectl get pods -A | grep -i "init"
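
Because the events report that a volume attachment is being deleted, it can also help to inspect the VolumeAttachment objects directly and confirm whether the attachment for the affected PVC is stuck; the attachment name below is a placeholder taken from the first command's output:

$ kubectl get volumeattachments
$ kubectl describe volumeattachment <volumeattachment-name>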

2. Get CSI drivers

Run the following command to list the CSI drivers registered on the cluster (the storage backend integration).

$ kubectl get csidrivers
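
It can also help to confirm that the provisioner named in the affected PVC's StorageClass matches one of the listed drivers; the PVC name and namespace below are placeholders:

$ kubectl get storageclasses
$ kubectl get pvc <pvc-name> -n <namespace> -o jsonpath='{.spec.storageClassName}'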

3. Verify CSI driver pods are running

Verify that the CSI driver pods are running. These pods can live either in a dedicated namespace or in the kube-system namespace. In this example, the NetApp backend hosts its storage pods in the trident namespace.

$ kubectl get pods -n <CSI-driver-namespace>

For example:
$ kubectl get pods -n trident
NAME                     READY   STATUS              RESTARTS         AGE
trident-controller-pod   0/6     ContainerCreating   0                6h44m
trident-node-linux-pod   0/2     CrashLoopBackOff    20 (5h33m ago)   23m
trident-node-linux-pod   0/2     CrashLoopBackOff    15 (5d4h ago)    23d
trident-node-linux-pod   0/2     CrashLoopBackOff    34 (5d3h ago)    23d
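
When the node plugin pods are failing as in this example, the driver may not be registered with the kubelet on every node. One way to confirm registration, regardless of which CSI driver you use, is to list the CSINode objects and check which drivers each node reports (the node name is a placeholder):

$ kubectl get csinodes
$ kubectl get csinode <node-name> -o yaml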

4. Inspect Calico pods

Because Calico provides pod networking, review all Calico pods and determine why any of them are in a CrashLoopBackOff, ContainerCreating, OOMKilled, Pending, or Error state; check the events in the describe output.

$ kubectl describe pod <pod-name> -n <calico-namespace>
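
If you are not sure which namespace hosts Calico, a manifest-based install typically runs the pods in kube-system while an operator-based install uses calico-system. Assuming a standard install, the pods carry the k8s-app label and can be listed across all namespaces:

$ kubectl get pods -A -l k8s-app=calico-node
$ kubectl get pods -A -l k8s-app=calico-kube-controllers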

5. Check logs for failing pods

Get more information on the failure from the pod logs, using the namespace where the failing pod runs (the CSI driver's namespace or kube-system).

$ kubectl logs <pod-name> -n <namespace>
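
For pods in CrashLoopBackOff the current container may have only just restarted, so the useful output is often in the previous container's log; if the pod runs multiple containers, name one explicitly with -c (the container name is a placeholder):

$ kubectl logs <pod-name> -n <namespace> --previous
$ kubectl logs <pod-name> -n <namespace> -c <container-name>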

6. When to escalate

If these steps do not resolve the issue, contact your backend storage provider or reach out to the Platform9 Support Team for additional assistance.
