Cluster Scaling & Other Operations
This document describes the steps to scale the Kubernetes management cluster that is part of your self-hosted Private Cloud Director deployment.
Scale Up Management Cluster
Assume that your management cluster has three master nodes, with IP addresses 1.1.1.1, 2.2.2.2, and 3.3.3.3.
```shell
$ cat /opt/pf9/airctl/conf/nodelet-bootstrap-config.yaml
...
masterNodes:
- nodeName: 1.1.1.1
- nodeName: 2.2.2.2
- nodeName: 3.3.3.3
```

Let's say that you want to scale up the management cluster to 5 nodes, and that 4.4.4.4 and 5.5.5.5 are the IP addresses of the two new nodes to be added.
To scale up the number of cluster nodes to 5:

1. Configure the prerequisites on the two new nodes.
2. Edit the cluster bootstrap configuration file `/opt/pf9/airctl/conf/nodelet-bootstrap-config.yaml` and add the two new IP addresses to the `masterNodes` section of the file.
3. Run the `airctl` command shown below to scale up the cluster.
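Editing the configuration file by hand works fine; the edit can also be scripted. Below is a minimal sketch: the `add_masters` helper is hypothetical (not part of airctl) and assumes the `masterNodes` list is the last section of the file. If other sections follow it, edit the file by hand instead.

```shell
# Hypothetical helper: append master node entries to a bootstrap config file.
# Assumes the masterNodes list is the last section of the file.
add_masters() {
  conf="$1"; shift
  for ip in "$@"; do
    printf -- '- nodeName: %s\n' "$ip" >> "$conf"
  done
}

# Example:
# add_masters /opt/pf9/airctl/conf/nodelet-bootstrap-config.yaml 4.4.4.4 5.5.5.5
```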
```shell
$ cat /opt/pf9/airctl/conf/nodelet-bootstrap-config.yaml
...
masterNodes:
- nodeName: 1.1.1.1
- nodeName: 2.2.2.2
- nodeName: 3.3.3.3
- nodeName: 4.4.4.4
- nodeName: 5.5.5.5
```

```shell
airctl scale-cluster --config /opt/pf9/airctl/conf/airctl-config.yaml --verbose
```

Now verify that the management cluster has scaled up by querying the cluster nodes.
```shell
$ kubectl get nodes
NAME      STATUS   ROLES    AGE      VERSION
1.1.1.1   Ready    master   44m29s   v1.29.2
2.2.2.2   Ready    master   45m41s   v1.29.2
3.3.3.3   Ready    master   46m42s   v1.29.2
4.4.4.4   Ready    master   5m42s    v1.29.2
5.5.5.5   Ready    master   5m40s    v1.29.2
```

Management Cluster Status
To check the status of your management cluster, run the following command:
```shell
airctl status --config /opt/pf9/airctl/conf/airctl-config.yaml --region <REGION_NAME>
```

Sample output:
```shell
airctl status --config /opt/pf9/airctl/conf/airctl-config.yaml --region foo-region1
# Sample output:
------------- deployment details ---------------
fqdn: foo-region1.bar.io
cluster: foo-kplane.bar.io
region: foo-region1
task state: ready
version: v-5.12.0-3479469
-------- region service status ----------
desired services: 45
ready services: 45
```

Scale Down Management Cluster
Now let's assume that we want to remove nodes 2.2.2.2 and 3.3.3.3 from the management cluster. To scale down the cluster:

1. Edit the cluster bootstrap configuration file and remove the IP addresses of the two nodes.
2. Run the `airctl` command shown below to scale down the cluster.
```shell
$ cat /opt/pf9/airctl/conf/nodelet-bootstrap-config.yaml
...
masterNodes:
- nodeName: 1.1.1.1
- nodeName: 4.4.4.4
- nodeName: 5.5.5.5
```

```shell
airctl scale-cluster --config /opt/pf9/airctl/conf/airctl-config.yaml --verbose
```

Warning
Due to a known issue, when you run the command above to scale down the cluster, removal of the first node (in this case 2.2.2.2) fails because of a containerd mount cleanup issue. See the workaround below.
```shell
$ airctl scale-cluster --config /opt/pf9/airctl/conf/airctl-config.yaml --verbose
2024-12-05T01:06:21.279Z info Removing node 2.2.2.2 from cluster airctl-mgmt
2024-12-05T01:06:21.279Z info Deleting nodelet
2024-12-05T01:06:21.279Z info Removing nodelet with cmd: apt remove -y nodelet
...
cannot remove '/run/containerd/io.containerd.grpc.v1.cri/sandboxes/f5cc808d52184fa092b1c9de2cef7a4ef9d606cdd1877be9efe5d4c91ecc4604/shm': Device or resource busy"}
Failed to update nodelet cluster: ScaleCluster failed to remove old masters: failed to delete node 2.2.2.2: failed: sudo rm -rf /run/containerd: command sudo sudo rm -rf /run/containerd failed: Process exited with status 1
Error: ScaleCluster failed to remove old masters: failed to delete node 2.2.2.2: failed: sudo rm -rf /run/containerd: command sudo sudo rm -rf /run/containerd failed: Process exited with status 1
```

Workaround
First, manually unmount the containerd partitions on the node to be removed.
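The exact mounts vary by deployment. One way to find the containerd mounts still held on the node is sketched below; run it on the node being removed, and treat `list_containerd_mounts` as a hypothetical helper rather than a supported tool.

```shell
# Hypothetical helper: list containerd-related mount points on this node.
# Parses `mount` output ("src on /path type fstype (opts)") and keeps
# paths that contain "containerd".
list_containerd_mounts() {
  mount | awk '$3 ~ /containerd/ { print $3 }'
}

# Example (run on the node being removed):
# list_containerd_mounts | xargs -r -n1 sudo umount
```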
Then run `kubectl delete node <IPAddress>` on the cluster; the terminating pods for that node are then rescheduled onto the other nodes in the cluster. It can take around 10-15 minutes for all of the pods from this node to move. Wait and make sure this has happened before proceeding to the next step.
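To confirm that the node has drained before continuing, you can count the pods still scheduled on it. The `pods_on_node` helper below is hypothetical (not part of airctl); it only assumes standard `kubectl` field selectors.

```shell
# Hypothetical helper: count pods still scheduled on a given node,
# across all namespaces.
pods_on_node() {
  kubectl get pods --all-namespaces --no-headers \
    --field-selector "spec.nodeName=$1" 2>/dev/null | wc -l
}

# Example: wait until the count reaches zero before continuing.
# while [ "$(pods_on_node 2.2.2.2)" -gt 0 ]; do sleep 30; done
```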
Because the scale command errored out on the first node to be removed, run the scale command again to remove the second node. Make sure to perform the manual workaround steps again for node 3.3.3.3.

Cluster state after the scale-down operation completes:
```shell
$ kubectl get nodes
NAME      STATUS   ROLES    AGE      VERSION
1.1.1.1   Ready    master   44m29s   v1.29.2
4.4.4.4   Ready    master   5m42s    v1.29.2
5.5.5.5   Ready    master   5m40s    v1.29.2
```

Stop Management Plane/Regions
To stop a specific region of your self-hosted deployment, run the following command. To stop all regions, omit the `--region` flag.
```shell
airctl stop --config /opt/pf9/airctl/conf/airctl-config.yaml --region <REGION_NAME>
```

Sample output:

```shell
airctl stop --config /opt/pf9/airctl/conf/airctl-config.yaml --region <REGION_NAME>
SUCCESS scaling down management plane foo-region1
```

Start Management Plane/Regions
To start a specific region of your self-hosted deployment, run the following command. To start all regions, omit the `--region` flag.
```shell
airctl start --config /opt/pf9/airctl/conf/airctl-config.yaml --region <REGION_NAME>
```

Sample output:

```shell
airctl start --config /opt/pf9/airctl/conf/airctl-config.yaml --region <REGION_NAME>
SUCCESS scaling up management plane foo-region1
```

Uninstall Self-Hosted Deployment
To uninstall a specific region of your self-hosted deployment, run the following command. To uninstall all regions, omit the `--region` flag.

```shell
airctl unconfigure-du --config /opt/pf9/airctl/conf/airctl-config.yaml --region <REGION_NAME> --force
```

This command uninstalls and removes the configured regions along with all infrastructure software such as consul, vault, percona, k8sniff, etc.
If you plan to reuse the same nodes to deploy a new self-hosted Private Cloud Director environment, make sure to also run the following command on all nodes first.
```shell
rm -rf airctl* install-pcd.sh nodelet* options.json pcd-chart.tgz /opt/pf9/airctl/ .airctl/
```
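If several nodes need this cleanup, it can be scripted over SSH. The `cleanup_node` helper below is hypothetical and assumes passwordless SSH to each node as a user with the needed permissions.

```shell
# Hypothetical helper: run the per-node cleanup over SSH.
cleanup_node() {
  ssh "$1" 'rm -rf airctl* install-pcd.sh nodelet* options.json pcd-chart.tgz /opt/pf9/airctl/ .airctl/'
}

# Example:
# for node in 1.1.1.1 4.4.4.4 5.5.5.5; do cleanup_node "$node"; done
```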