Troubleshooting - Useful Kubernetes Commands

While kubectl and its subcommands are too broad a topic for this guide, several commands are particularly useful for debugging and are recommended here.

Distressed Pods

"Distressed Pods" is the Platform9 term for the pods to look for first when debugging issues in a large cluster. In our experience, the primary command is one that lists the Pods and shows whether any of them are not in a "Running" state. It is the simplest and most effective command for getting an overview of issues affecting the pods. Its output can take a while to appear, depending on the number of pods running.

kubectl --kubeconfig ./k get pods -A | grep -v -i running | grep -v -i completed | grep -v -i terminating
NAMESPACE         NAME                                       READY   STATUS             RESTARTS   AGE
argus-n2-310      kplane-usermgr-5c774f85cc-gbgcd            1/2     ImagePullBackOff   0          7h47m
cert-manager      cert-manager-cainjector-6886449cf8-szgts   0/1     CrashLoopBackOff   69         5h32m
du-argus-ab-368   clarity-6f5fb449b8-7x29c                   4/5     ImagePullBackOff   0          7m

In the output above, the pods in argus-n2-310 and du-argus-ab-368 are failing to pull an image (ImagePullBackOff), and the cert-manager-cainjector pod is crash-looping (CrashLoopBackOff). Further debugging is required, using the approaches described in the following sections.
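A note on the filter used in this section: the chained grep -v calls match anywhere in a line, so a pod whose name happens to contain "running" would also be hidden. As a minimal sketch (the function name distressed_pods is hypothetical, and it assumes the default column layout of kubectl get pods -A, where STATUS is the fourth column), the same filter can be applied to the STATUS column only:

```shell
# Read `kubectl get pods -A` output on stdin and keep the header plus any
# pod whose STATUS (column 4) is not Running, Completed or Terminating.
# Assumption: default -A column layout (NAMESPACE NAME READY STATUS ...).
distressed_pods() {
  awk 'NR == 1 || ($4 != "Running" && $4 != "Completed" && $4 != "Terminating")'
}

# Usage against a live cluster (kubeconfig path ./k as in this guide):
# kubectl --kubeconfig ./k get pods -A | distressed_pods
```

Like the grep version, this only looks at STATUS, so a pod that reports Running while some of its containers are not ready is not listed.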

Events

This is one of the most underappreciated tools available to Kubernetes operators. The kubectl get events command gives users a unique glimpse into what is going on within a Kubernetes cluster, or even inside a given pod. The following example illustrates this well.

kubectl --kubeconfig ./k get events -n cert-manager

LAST SEEN   TYPE      REASON      OBJECT                                         MESSAGE
3m33s       Warning   BackOff     pod/cert-manager-cainjector-6886449cf8-szgts   Back-off restarting failed container
104s        Warning   Unhealthy   pod/cert-manager-webhook-c677f4f7-c8brx        Readiness probe failed: HTTP probe failed with statuscode: 500

In the example above, it is clear that pod/cert-manager-webhook-xxx is failing its readiness probe (the HTTP probe returns status code 500), which should be debugged further. You can also run get events on the whole cluster using the -A flag, with the output piped to the more command.
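On a busy cluster the cluster-wide event list can be long. The sketch below shows one possible way to narrow it to Warning events and sort them by time. The function name warning_events and the KUBECONFIG_PATH variable are hypothetical; the kubeconfig path defaults to ./k as used elsewhere in this guide.

```shell
# List Warning events across all namespaces, sorted by last timestamp.
# KUBECONFIG_PATH is a hypothetical variable; it defaults to ./k.
warning_events() {
  kubectl --kubeconfig "${KUBECONFIG_PATH:-./k}" get events -A \
    --field-selector type=Warning \
    --sort-by=.lastTimestamp
}

# Usage: page through the result.
# warning_events | more
```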

Pod Logs

We also recommend using a log aggregation system for your pods. If that is unworkable, you can review the Pod logs directly, which will provide valuable insights.
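As a minimal sketch of reviewing logs directly, the hypothetical helper below wraps kubectl logs and limits the output to the last 100 lines (an arbitrary choice). For a pod in CrashLoopBackOff, the --previous flag reads the logs of the last terminated container instance, which is usually where the failure is recorded.

```shell
# Show the tail of a pod's logs. Extra arguments are passed through to
# `kubectl logs`, for example --previous or -c <container>.
# pod_logs is a hypothetical helper; ./k is the kubeconfig used in this guide.
pod_logs() {
  namespace="$1"
  pod="$2"
  shift 2
  kubectl --kubeconfig ./k logs -n "$namespace" "$pod" --tail=100 "$@"
}

# Usage with the crash-looping pod from the earlier output:
# pod_logs cert-manager cert-manager-cainjector-6886449cf8-szgts --previous
```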

Miscellaneous

Another approach we find very useful is to check all nodes and their associated pods.
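One possible way to do this, sketched with hypothetical helper names: list the nodes with their status and addresses, then list the pods scheduled on a particular node using a field selector on spec.nodeName.

```shell
# List all nodes with status, roles, version and IP addresses.
nodes_overview() {
  kubectl --kubeconfig ./k get nodes -o wide
}

# List the pods scheduled on one node, across all namespaces.
# $1 is the node name as shown by nodes_overview.
pods_on_node() {
  kubectl --kubeconfig ./k get pods -A -o wide \
    --field-selector spec.nodeName="$1"
}

# Usage (worker-1 is a placeholder node name):
# nodes_overview
# pods_on_node worker-1
```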

Or, find which pod belongs to which node.
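A sketch of one way to get that mapping in a single listing, using custom columns and sorting by node name. The helper name pod_node_map and the column headings are illustrative choices.

```shell
# Print one line per pod with the node it runs on, grouped by node.
pod_node_map() {
  kubectl --kubeconfig ./k get pods -A \
    -o custom-columns=NODE:.spec.nodeName,NAMESPACE:.metadata.namespace,POD:.metadata.name \
    --sort-by=.spec.nodeName
}

# Usage:
# pod_node_map | more
```

The plain kubectl get pods -A -o wide output also includes a NODE column, if a wider table is acceptable.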
