# Nodelet

### What is a Nodelet?

A nodelet is a software agent that is installed and runs on each node as a component of the Platform9 Managed Kubernetes (PMK) stack within a cluster. The nodelet agent performs several functions on both the primary (master) and worker nodes, including installing and configuring Kubernetes services and supporting components such as etcd, containerd, Docker, networking, and webhooks.

### Nodelet Phases

{% stepper %}
{% step %}

#### Generate Certificates

**Role**: <mark style="color:$warning;">`Master`</mark> <mark style="color:blue;">`Worker`</mark>

Generates the prerequisite checks needed to install the various certificates.
{% endstep %}

{% step %}

#### Prepare Kubeconfigs

**Role**: <mark style="color:$warning;">`Master`</mark> <mark style="color:blue;">`Worker`</mark>

Customizes the kubeconfigs needed to start the Kubernetes cluster.
{% endstep %}

{% step %}

#### Docker Configure

**Role**: <mark style="color:$warning;">`Master`</mark> <mark style="color:blue;">`Worker`</mark>

Installs and configures Docker and containerd.
{% endstep %}

{% step %}

#### Docker Start

**Role**: <mark style="color:$warning;">`Master`</mark> <mark style="color:blue;">`Worker`</mark>

Installs and verifies running Docker containers.
{% endstep %}

{% step %}

#### Etcd Configure

**Role**: <mark style="color:$warning;">`Master`</mark>

Verifies, configures, and runs etcd on the primary host server's file system.
{% endstep %}

{% step %}

#### Etcd Run

**Role**: <mark style="color:$warning;">`Master`</mark>

Starts the etcd service and confirms that it is running in the container.
{% endstep %}

{% step %}

#### Network Configure

**Role**: <mark style="color:$warning;">`Master`</mark>

Ensures that the Classless Inter-Domain Routing (CIDR) configuration for flannel is up to date. It does not target other network plugins such as Calico, Canal, or Weave.
{% endstep %}

{% step %}

#### CNI Configure

**Role**: <mark style="color:$warning;">`Master`</mark> <mark style="color:blue;">`Worker`</mark>

Configures the Container Network Interface (CNI).
{% endstep %}

{% step %}

#### Auth Webhook

**Role**: <mark style="color:$warning;">`Master`</mark>

Uses bouncer as a simple webhook endpoint server to validate/authenticate images created within the Kubernetes cluster (specifically, the admission controllers *GenericAdmissionWebhook* and the *ValidatingAdmissionWebhook*).
{% endstep %}

{% step %}

#### Misc Scripts

**Role**: <mark style="color:$warning;">`Master`</mark> <mark style="color:blue;">`Worker`</mark>

Responsible for composing the cloud provider config on the file systems of all nodes.
{% endstep %}

{% step %}

#### Kubelet Configure/Start

**Role**: <mark style="color:$warning;">`Master`</mark> <mark style="color:blue;">`Worker`</mark>

Starts and manages the proper configurations on Kubelets.
{% endstep %}

{% step %}

#### Kube Proxy Start

**Role**: <mark style="color:$warning;">`Master`</mark> <mark style="color:blue;">`Worker`</mark>

Starts and configures the kube-proxy service.
{% endstep %}

{% step %}

#### Wait for K8s Services

**Role**: <mark style="color:$warning;">`Master`</mark> <mark style="color:blue;">`Worker`</mark>

Starts and pauses various K8s services to ensure availability.
{% endstep %}

{% step %}

#### Label and Taint Node

**Role**: <mark style="color:$warning;">`Master`</mark> <mark style="color:blue;">`Worker`</mark>

Labels the node as “master” or “worker”. Additionally, taints master nodes so that workloads not allowed on the master are not scheduled there.
{% endstep %}

{% step %}

#### Uncordon Node

**Role**: <mark style="color:$warning;">`Master`</mark> <mark style="color:blue;">`Worker`</mark>

Marks nodes as schedulable using the `kubectl uncordon` command.
{% endstep %}

{% step %}

#### Deploy App Catalog

**Role**: <mark style="color:$warning;">`Master`</mark>

Configures and deploys the *Monocular* and *Tiller* services.
{% endstep %}

{% step %}

#### Configure/Start Keepalived

**Role**: <mark style="color:$warning;">`Master`</mark>

Configures and starts the *keepalived* service.
{% endstep %}

{% step %}

#### Deploy Luigi Operator

**Role**: <mark style="color:$warning;">`Master`</mark>

Activates the [Luigi operator](https://platform9.com/kubernetes-docs/5.14/kubernetes/luigi-network-operator-quickstart).
{% endstep %}

{% step %}

#### Enable PF9 Sentry

**Role**: <mark style="color:$warning;">`Master`</mark>

Initiates and configures the *pf9-sentry* service within the platform9-system namespace.
{% endstep %}

{% step %}

#### Enable PF9 Add-on Operator

**Role**: <mark style="color:$warning;">`Master`</mark>

Starts and configures the `pf9-addon-operator` service within the pf9-addons namespace.
{% endstep %}

{% step %}

#### Drain All Pods (Stop Only)

**Role**: <mark style="color:$warning;">`Master`</mark> <mark style="color:blue;">`Worker`</mark>

If invoked, this task drains the node before the stop functions of other tasks run. When the `pf9-kube` service begins draining the node, it executes a priority stop function, which ensures that this task takes precedence over the stop functions of other tasks.
{% endstep %}
{% endstepper %}

### CLI

#### Advantages of Using the CLI

Because a CLI does not use a graphical user interface (GUI), it is often overshadowed by the more user-friendly visual interfaces that a mouse and keyboard afford. What is not apparent is that behind the GUI are many of the same commands that drive the functionality of a program. The strengths of the CLI are speed, efficiency, and customization, with lower memory consumption. In addition, it allows experienced users to create scripts that automate repetitive tasks and to chain commands together, achieving a greater level of customization and capability than a single mouse click allows.

Many new users cite the steeper learning curve as the primary downside of using the CLI. Additionally, there is less room for error, and the large number of available command options can be daunting. New users can be stymied when trying to remember a command, its syntax, and the flags and options it accepts. Widely available quick reference guides offer some relief, and users often find that ongoing use of the CLI increases productivity over time.

{% hint style="warning" %}
Caution should be exercised when running commands as the root user. Running an errant or malformed command can cause severe issues and damage the system, up to and including needing a full system restore. The only time clients should run commands as the root user is when configuring the underlying file system.

Best practice dictates creating secondary user(s) with limited permission sets. Additionally, backup copies should be made before editing any important system configuration files or folders.
{% endhint %}

The following section describes the `nodeletd phases` commands used to interact with the Kubernetes stack via the CLI.

### Nodelet CLI Syntax

```bash
/opt/pf9/nodelet/nodeletd phases [command]
```

#### Phases Help

The *help* flag lists the options available when running the `nodeletd phases` command.

```bash
/opt/pf9/nodelet/nodeletd phases --help
Commands related to phases related to bring up of k8s stack

Usage:
  nodeletd phases [command]

Available Commands:

  list    Lists the phases and their index numbers to use with rest of commands

  restart Restarts pf9 kube stack. Takes optional --phase param to allow restarting from the specific phase

  start   Starts pf9 kube stack. Takes optional --phase param to allow starting from the specific phase

  status  Checks the status of Platform9 Kube on this host. Takes optional --phase param to check the status of a specific phase

  stop    Stops pf9 kube stack. Takes optional --phase param to allow stopping till the specific phase

Flags:
  -h, --help   help for phases

Use "nodeletd phases [command] --help" for more information about a command.
```

#### Phases List

This command lists the available phases and their index numbers.

```bash
/opt/pf9/nodelet/nodeletd phases list
INDEX NUMBER  FILE                             NAME                                         STATUS CHECK
1             020-gen_certs.sh                 Generate certs / Send signing request to CA  true
2             030-prepare_kube_configs.sh      Prepare configuration                        false
3             040-docker_configure.sh          Configure Docker                             false
4             045-docker_start.sh              Start Docker                                 true
```
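When scripting against the list output, it can help to look up a phase's index number from its script file name. The following is a minimal sketch, assuming the column layout shown in the sample above (index first, file name second); the helper name `phase_index` is hypothetical and not part of nodelet.

```bash
# Hypothetical helper (not part of nodelet): look up a phase's index
# number by its script file name.
# Reads `nodeletd phases list` output on stdin. Prints the index and
# returns 0 on a match; prints nothing and returns 1 otherwise.
# Assumes the sample layout: index in column 1, file name in column 2.
phase_index() {
  awk -v file="$1" '
    $1 ~ /^[0-9]+$/ && $2 == file { print $1; found = 1; exit }
    END { exit found ? 0 : 1 }
  '
}
```

For example, piping the list command into `phase_index 045-docker_start.sh` would print the index of the Docker start phase for the sample output above.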

#### Phases Stop

This command stops the `pf9-kube` stack.

```bash
/opt/pf9/nodelet/nodeletd phases stop
```

#### Phases Start

This command starts the `pf9-kube` stack.

```bash
/opt/pf9/nodelet/nodeletd phases start
```

#### Phases Restart

This command restarts the `pf9-kube` stack.

```bash
/opt/pf9/nodelet/nodeletd phases restart
```
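According to the help output, the start, stop, and restart commands accept an optional `--phase` parameter, and the list command supplies the index numbers to use with it. The sketch below only prints the command line so it can be reviewed before being run. It assumes the parameter takes the index number in the form `--phase N`; that form is an assumption and should be confirmed with `nodeletd phases restart --help`. The helper name `phases_cmd` and the `NODELETD` variable are hypothetical.

```bash
# Path to the nodeletd binary; override NODELETD if it lives elsewhere.
NODELETD="${NODELETD:-/opt/pf9/nodelet/nodeletd}"

# Hypothetical helper: print (not run) the nodeletd command for an
# action and an optional phase index.
# ASSUMPTION: --phase takes the index number as "--phase N".
phases_cmd() {
  local action="$1"
  local phase="${2:-}"
  case "$action" in
    start|stop|restart|status) ;;
    *) echo "unknown action: $action" >&2; return 2 ;;
  esac
  if [ -n "$phase" ]; then
    case "$phase" in
      *[!0-9]*) echo "phase must be an index number: $phase" >&2; return 2 ;;
    esac
    printf '%s phases %s --phase %s\n' "$NODELETD" "$action" "$phase"
  else
    printf '%s phases %s\n' "$NODELETD" "$action"
  fi
}
```

Calling `phases_cmd restart 4` prints the restart command for phase index 4, which can then be copied and run once it has been checked.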

#### Phases Status

The *verbose* flag provides information on the condition and state of the `pf9-kube` stack.

```bash
/opt/pf9/nodelet/nodeletd phases status --verbose
INDEX NUMBER  FILE                             NAME                                         PHASE STATUS
  1           020-gen_certs.sh                 Generate certs / Send signing request to CA  running
  2           030-prepare_kube_configs.sh      Prepare configuration                        N/A
  3           040-docker_configure.sh          Configure Docker                             N/A
  4           045-docker_start.sh              Start Docker                                 N/A
```

{% hint style="info" %}
Note: The CLI output contains information about the various phases that run before the table shown above is displayed. This information is also written to the */var/log/pf9/kube/kube.log* file.
{% endhint %}
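Because the status output mixes phase log lines with the table, a small filter can reduce it to the table rows alone. This is a minimal sketch, assuming table rows start with an index number followed by a `.sh` file name and end with a single-word status, as in the sample above; the helper name `phase_statuses` is hypothetical.

```bash
# Hypothetical helper: reduce `nodeletd phases status --verbose` output
# (read from stdin) to "index file status" lines.
# Log lines and the header row are skipped because they do not start
# with a number followed by a .sh file name.
# Assumes the status is the last whitespace-separated field of a row.
phase_statuses() {
  awk '$1 ~ /^[0-9]+$/ && $2 ~ /\.sh$/ { print $1, $2, $NF }'
}
```

Piping the status command into `phase_statuses` yields one line per phase, which is easier to feed into `grep` or further scripting than the full output.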

### Node Health

The curl command below provides an overview of the health of a specific node. `$TOKEN` is a temporary authentication token used to verify the service user, which removes the need for an interactive authentication method. `$DU` refers to the deployment unit that runs Platform9's server-side components. `$UUID` is the universally unique identifier of the cluster. A sample output of the command is shown below.

```bash
curl -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" https://$DU/resmgr/v1/hosts | jq '.[] | select(.extensions.pf9_kube_status.data.pf9_cluster_id |contains("'$UUID'")) | .extensions.pf9_kube_status.data.pf9_kube_node_state'
```

{% code title="Sample Output" %}

```json
{
  "pf9_kube_start_attempt": 0, // Number of start attempts till now
  "last_failed_status_check": "", // *
  "pf9_cluster_role": "master",
  "last_failed_task": "", // The task/phase script that failed on (pf9_kube_start_attempt-1) attempt
  "all_tasks": [
    "Generate certs / Send signing request to CA",
    "Prepare configuration",
    "Configure Docker",
    "Start Docker",
    "Configure etcd",
    "Start etcd",
    "Network configuration",
    "Configure CNI plugin",
    "Configure and start auth web hook / pf9-bouncer",
    "Miscellaneous scripts and checks",
    "Configure and start kubelet",
    "Configure and start kube-proxy",
    "Wait for k8s services and network to be up",
    "Apply and validate node taints",
    "Uncordon node",
    "Validate k8s DNS",
    "Deploy dashboard",
    "Deploy app catalog",
    "Deploy metrics server",
    "Configure and start Keepalived",
    "Configure and start MetalLB",
    "Configure and start Autoscaler",
    "Configure and start pf9-sentry",
    "Drain all pods (stop only operation)"
  ],
  "pf9_kube_node_state": "ok", // **
  "current_status_check": "",
  "completed_tasks": [
    "Generate certs / Send signing request to CA",
    "Prepare configuration",
    "Configure Docker",
    "Start Docker",
    "Configure etcd",
    "Start etcd",
    "Network configuration",
    "Configure CNI plugin",
    "Configure and start auth web hook / pf9-bouncer",
    "Miscellaneous scripts and checks",
    "Configure and start kubelet",
    "Configure and start kube-proxy",
    "Wait for k8s services and network to be up",
    "Apply and validate node taints",
    "Uncordon node",
    "Validate k8s DNS",
    "Deploy dashboard",
    "Deploy app catalog",
    "Deploy metrics server",
    "Configure and start Keepalived",
    "Configure and start MetalLB",
    "Configure and start Autoscaler",
    "Configure and start pf9-sentry",
    "Drain all pods (stop only operation)"
  ],
  "pf9_kube_service_state": "true",
  "all_status_checks": [
    "Generate certs / Send signing request to CA",
    "Start Docker",
    "Start etcd",
    "Network configuration",
    "Configure and start auth web hook / pf9-bouncer",
    "Miscellaneous scripts and checks",
    "Configure and start kubelet",
    "Configure and start kube-proxy",
    "Wait for k8s services and network to be up",
    "Configure and start Keepalived"
  ],
  "last_failed_status_time": 0, // *
  "pf9_cluster_id": "37ba60bb-1a36-4f78-8b83-528adea459bf",
  "current_task": "",
  "status_check_timestamp": 1594197441 // *
}
```

{% endcode %}

{% hint style="info" %}
The `last_failed_status_check` field is cleared 10 minutes after the status check is successful.

The `pf9_kube_node_state` field tries to simulate the node state as reported by the hostAgent. The values this field can report are described in the table below.
{% endhint %}

| Status     | Description                                                                          |
| ---------- | ------------------------------------------------------------------------------------ |
| OK         | Everything is fine.                                                                  |
| Converging | Starting pf9-kube failed and this is the initial attempt to restart it.              |
| Retrying   | Starting pf9-kube failed and Nodelet has tried fewer than 10 times to start pf9-kube. |
| Failed     | Starting pf9-kube failed and Nodelet has tried more than 10 times to start pf9-kube. |
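
For monitoring scripts, the states in the table above can be mapped to shell exit statuses. The sketch below is illustrative only: the helper name `node_state_exit_code` and the exit-status convention (0 healthy, 1 still recovering, 2 failed, 3 unrecognized) are assumptions, not something nodelet defines. The comparison is case-insensitive because the sample output reports `"ok"` in lowercase.

```bash
# Hypothetical helper: map a node state value to an exit status.
# The exit-status convention is an assumption for illustration:
#   0 = OK, 1 = Converging or Retrying, 2 = Failed, 3 = unrecognized.
node_state_exit_code() {
  local state
  # Normalize case so "ok", "OK", and "Ok" are treated alike.
  state="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$state" in
    ok) return 0 ;;
    converging|retrying) return 1 ;;
    failed) return 2 ;;
    *) return 3 ;;
  esac
}
```

A check script could pass the state string extracted from the API response to `node_state_exit_code` and alert only when the exit status is 2.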
