# Troubleshooting: Useful Kubernetes Commands

The full range of kubectl subcommands is too broad for this guide, but several commands are especially useful for debugging and are recommended here.

## Distressed Pods

"Distressed pods" is the Platform9 term for the pods to look for first when debugging issues in a large cluster. In our experience, the primary command lists the pods across all namespaces and filters out those that are in a "Running", "Completed", or "Terminating" state, leaving only the pods that need attention. It is the simplest and most effective way to get an overview of the issues affecting the pods. The output may take a while to appear, depending on the number of pods running.

{% tabs %}
{% tab title="Bash" %}

```bash
kubectl --kubeconfig ./k get pods -A | grep -v -i running | grep -v -i completed | grep -v -i terminating
NAMESPACE         NAME                                       READY   STATUS             RESTARTS   AGE
argus-n2-310      kplane-usermgr-5c774f85cc-gbgcd            1/2     ImagePullBackOff   0          7h47m
cert-manager      cert-manager-cainjector-6886449cf8-szgts   0/1     CrashLoopBackOff   69         5h32m
du-argus-ab-368   clarity-6f5fb449b8-7x29c                   4/5     ImagePullBackOff   0          7m
```

{% endtab %}
{% endtabs %}
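The `grep` chain above matches the words anywhere on the line, so it also hides a pod whose name happens to contain "running", and it does not show pods that report `Running` while some of their containers are not ready. A stricter variant, sketched here as a hypothetical `distressed_pods` helper (the name is ours, not part of kubectl), compares only the STATUS and READY columns. It assumes the six-column layout printed by `kubectl get pods -A`.

```bash
# Hypothetical helper: reads `kubectl get pods -A` output on stdin and keeps
# the header plus every pod that looks unhealthy.
distressed_pods() {
  awk '
    NR == 1 { print; next }              # always keep the header row
    {
      status = tolower($4)               # STATUS is column 4 when -A is used
      if (status == "completed" || status == "terminating") next
      split($3, ready, "/")              # READY column, e.g. "1/2"
      if (status == "running" && ready[1] == ready[2]) next
      print
    }
  '
}

# Against a live cluster (kubeconfig path as used elsewhere in this guide):
#   kubectl --kubeconfig ./k get pods -A | distressed_pods

# Offline demo using rows taken from the outputs in this guide.
distressed_pods <<'EOF'
NAMESPACE      NAME                                       READY   STATUS             RESTARTS   AGE
argus-n2-310   kplane-usermgr-5c774f85cc-gbgcd            1/2     ImagePullBackOff   0          7h47m
cert-manager   cert-manager-84b96f99c7-v6pst              1/1     Running            0          6h
cert-manager   cert-manager-cainjector-6886449cf8-szgts   0/1     CrashLoopBackOff   73         5h50m
cert-manager   cert-manager-webhook-c677f4f7-c8brx        0/1     Running            0          7h25m
EOF
```

In the demo, only the fully ready `cert-manager-84b96f99c7-v6pst` row is dropped; the `cert-manager-webhook` pod is kept because it is `Running` with 0/1 containers ready.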

In the example above, the pod in the **argus-n2-310** namespace is failing to pull its image (ImagePullBackOff), and a cert-manager pod is crash-looping (CrashLoopBackOff). Both need further debugging, which can be done in the following ways:

## Events

This is one of the most underappreciated tools available to Kubernetes operators. The *kubectl get events* command gives users a unique view of what is happening within a Kubernetes cluster, or even inside a given pod, as the following examples illustrate.

{% tabs %}
{% tab title="Bash" %}

```bash
kubectl --kubeconfig ./k get events -n cert-manager

LAST SEEN   TYPE      REASON      OBJECT                                         MESSAGE
3m33s       Warning   BackOff     pod/cert-manager-cainjector-6886449cf8-szgts   Back-off restarting failed container

104s        Warning   Unhealthy   pod/cert-manager-webhook-c677f4f7-c8brx        Readiness probe failed: HTTP probe failed with statuscode: 500
```

{% endtab %}
{% endtabs %}

In the example above, *pod/cert-manager-webhook-xxx* is clearly failing its readiness probe, which should be debugged further. You can also run *get events* across the whole cluster using the *-A* flag, piping the output to the *more* command.

{% tabs %}
{% tab title="Bash" %}

```bash
kubectl --kubeconfig ./k get events -A | more
NAMESPACE                       LAST SEEN   TYPE      REASON                                                                                                   OBJECT                                                    MESSAGE
argus-mithil-310                19m         Normal    BackOff                                                                                                  pod/kplane-usermgr-5c774f85cc-gbgcd                       Back-off pulling image "514845858982.dkr.ecr.us-west-1.amazonaws.com/kplane-usermgr:5.4.0-1335"
argus-mithil-310                4m36s       Warning   Failed                                                                                                   pod/kplane-usermgr-5c774f85cc-gbgcd                       Error: ImagePullBackOff
cert-manager                    6m3s        Warning   BackOff                                                                                                  pod/cert-manager-cainjector-6886449cf8-szgts              Back-off restarting failed container
```

{% endtab %}
{% endtabs %}
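On a busy cluster the event list gets long, so it can help to narrow it down. The sketch below wraps two standard kubectl options, `--field-selector` and `--sort-by`, in small shell functions. The function names and the `KUBECONFIG_FILE` variable are hypothetical conveniences introduced for this example; the kubeconfig path `./k` is the one used throughout this guide.

```bash
# Hypothetical variable for the kubeconfig path; ./k matches this guide.
KUBECONFIG_FILE="${KUBECONFIG_FILE:-./k}"

# warning_events <namespace>: only Warning events, sorted by creation time.
warning_events() {
  kubectl --kubeconfig "$KUBECONFIG_FILE" get events -n "$1" \
    --field-selector type=Warning \
    --sort-by=.metadata.creationTimestamp
}

# pod_events <namespace> <pod-name>: events that reference a single object.
pod_events() {
  kubectl --kubeconfig "$KUBECONFIG_FILE" get events -n "$1" \
    --field-selector "involvedObject.name=$2"
}

# Example calls (need access to a cluster):
#   warning_events cert-manager
#   pod_events cert-manager cert-manager-webhook-c677f4f7-c8brx
```

Keep in mind that events are only retained for a limited time, so an older failure may no longer appear in this list.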

## Pod Logs

We also recommend using a log aggregation system for your pods. If that is not an option, you can still review the Pod logs directly, which often provide valuable insights.

{% tabs %}
{% tab title="Bash" %}

```bash
kubectl --kubeconfig ./k -n cert-manager logs cert-manager-webhook-c677f4f7-c8brx | more
E1019 05:10:12.492522       1 dynamic_source.go:88] cert-manager/webhook "msg"="Failed to generate initial serving certificate, retrying..." "error"="failed verifying CA keypair: tls: failed to find any PEM data in certificate input"  "interval"=1000000000
E1019 05:10:13.493607       1 dynamic_source.go:88] cert-manager/webhook "msg"="Failed to generate initial serving certificate, retrying..." "error"="failed verifying CA keypair: tls: failed to find any PEM data in certificate input"  "interval"=1000000000
```

{% endtab %}
{% endtabs %}
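For a pod that keeps restarting, the logs of the current container are often nearly empty, and the useful messages are in the container instance that just crashed. The following sketch wraps the standard `--tail` and `--previous` options of `kubectl logs`. The function names and `KUBECONFIG_FILE` are hypothetical, and `klog_errors` assumes the log prefix seen in the sample above (a severity letter followed by a four-digit date), which not every application uses.

```bash
# Hypothetical variable for the kubeconfig path; ./k matches this guide.
KUBECONFIG_FILE="${KUBECONFIG_FILE:-./k}"

# recent_logs <namespace> <pod> [lines]: last lines of the current container.
recent_logs() {
  kubectl --kubeconfig "$KUBECONFIG_FILE" -n "$1" logs "$2" --tail="${3:-50}"
}

# previous_logs <namespace> <pod> [lines]: logs of the container instance
# that ran before the last restart.
previous_logs() {
  kubectl --kubeconfig "$KUBECONFIG_FILE" -n "$1" logs "$2" \
    --previous --tail="${3:-50}"
}

# klog_errors: stdin filter that keeps lines starting with E or F plus a
# four-digit date, the prefix seen in the sample output above.
klog_errors() {
  grep -E '^[EF][0-9]{4} '
}

# Example calls (need access to a cluster):
#   previous_logs cert-manager cert-manager-cainjector-6886449cf8-szgts
#   recent_logs cert-manager cert-manager-webhook-c677f4f7-c8brx 200 | klog_errors
```

For pods with more than one container, such as the 1/2 and 4/5 pods shown earlier, `kubectl logs` also accepts `-c <container>` or `--all-containers` to choose which container's logs to read.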

## Miscellaneous

Another check we find very useful is to list all nodes and their associated pods.

{% tabs %}
{% tab title="Bash" %}

```bash
kubectl --kubeconfig ./k get nodes -o wide
NAME                                       STATUS                        ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP      OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
ip-10-0-1-116.us-west-2.compute.internal   Ready                         worker   2d12h   v1.20.5   10.0.1.116    34.222.12.206    Ubuntu 18.04.3 LTS   4.15.0-1054-aws   docker://19.3.11
ip-10-0-1-120.us-west-2.compute.internal   Ready                         worker   17h     v1.20.5   10.0.1.120    35.86.121.102    Ubuntu 18.04.3 LTS   4.15.0-1054-aws   docker://19.3.11
```

{% endtab %}
{% endtabs %}
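When a cluster has many nodes, it can be quicker to show only the ones that need attention. The sketch below is a hypothetical `problem_nodes` filter (our name, not a kubectl feature) that keeps the header and any node whose STATUS column is not exactly `Ready`, so it also catches values such as `NotReady` or `Ready,SchedulingDisabled`.

```bash
# Hypothetical helper: reads `kubectl get nodes` output on stdin and keeps the
# header plus every node whose STATUS column is not exactly "Ready".
problem_nodes() {
  awk 'NR == 1 || $2 != "Ready"'
}

# Against a live cluster:
#   kubectl --kubeconfig ./k get nodes -o wide | problem_nodes

# Offline demo; the second node row is invented for illustration.
problem_nodes <<'EOF'
NAME                                       STATUS     ROLES    AGE     VERSION
ip-10-0-1-116.us-west-2.compute.internal   Ready      worker   2d12h   v1.20.5
ip-10-0-9-99.us-west-2.compute.internal    NotReady   worker   1h      v1.20.5
EOF
```

Any node that shows up here can then be inspected with `kubectl describe node <node-name>`.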

Or, find which node each pod is running on.

{% tabs %}
{% tab title="Bash" %}

```bash
kubectl --kubeconfig ./k get pods -o wide -n cert-manager
NAME                                       READY   STATUS             RESTARTS   AGE     IP             NODE                                       NOMINATED NODE   READINESS GATES
cert-manager-84b96f99c7-v6pst              1/1     Running            0          6h      10.20.40.91    ip-10-0-2-94.us-west-2.compute.internal    <none>           <none>
cert-manager-cainjector-6886449cf8-szgts   0/1     CrashLoopBackOff   73         5h50m   10.20.90.34    ip-10-0-1-120.us-west-2.compute.internal   <none>           <none>
cert-manager-webhook-c677f4f7-c8brx        0/1     Running            0          7h25m   10.20.90.239   ip-10-0-1-120.us-west-2.compute.internal   <none>           <none>
```

{% endtab %}
{% endtabs %}
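The same question can be asked from the node's side: which pods are on a given node, and how are pods spread across nodes? A minimal sketch follows, using the standard `--field-selector spec.nodeName=...` and `-o custom-columns` options. The function names and `KUBECONFIG_FILE` are hypothetical and introduced only for this example.

```bash
# Hypothetical variable for the kubeconfig path; ./k matches this guide.
KUBECONFIG_FILE="${KUBECONFIG_FILE:-./k}"

# pods_on_node <node-name>: pods from every namespace scheduled on one node.
pods_on_node() {
  kubectl --kubeconfig "$KUBECONFIG_FILE" get pods -A -o wide \
    --field-selector "spec.nodeName=$1"
}

# pods_per_node: number of pods on each node, busiest node first.
pods_per_node() {
  kubectl --kubeconfig "$KUBECONFIG_FILE" get pods -A --no-headers \
    -o custom-columns=NODE:.spec.nodeName | sort | uniq -c | sort -rn
}

# Example calls (need access to a cluster):
#   pods_on_node ip-10-0-1-120.us-west-2.compute.internal
#   pods_per_node
```

In the output above, both unhealthy cert-manager pods run on `ip-10-0-1-120.us-west-2.compute.internal`, which is the kind of pattern `pods_on_node` helps confirm or rule out.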
