Create Horizontally Auto-Scaling Cluster on AWS

While creating a Kubernetes cluster with Amazon Web Services (AWS), you can configure the cluster for horizontal auto-scaling.

Horizontal auto-scaling of a Kubernetes cluster is possible with spot instances or by enabling auto-scaling on the cluster. If you enable spot instances, the auto-scaling decision is made by Amazon EC2, whereas if you enable auto-scaling on the cluster, the decision is made internally by the cluster-autoscaler deployment running on the Kubernetes cluster. Use either option, depending on your budget constraints and app requirements.

While creating an auto-scaling cluster, you must specify the minimum and maximum number of worker nodes. Both values must be positive integers. Platform9 does not restrict the maximum number of nodes for auto-scaling, as long as it is greater than the minimum number of nodes. However, AWS may impose its own limits on the number of worker nodes that can be created. Check the AWS documentation for details.
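The constraints on the two values can be sketched as a small check. This is only an illustration of the rules stated above; the function name and error messages are hypothetical and are not part of any Platform9 API.

```python
def validate_node_range(min_workers: int, max_workers: int) -> None:
    """Raise ValueError if the worker node range breaks the stated rules."""
    for name, value in (("minimum", min_workers), ("maximum", max_workers)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} number of worker nodes must be a positive integer")
    # The maximum must be strictly greater than the minimum.
    if max_workers <= min_workers:
        raise ValueError("maximum number of worker nodes must be greater than the minimum")


validate_node_range(2, 10)  # accepted: no exception is raised
```

Any AWS-side limit is not modeled here, because it depends on the account and region.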

Enable Horizontal Auto-Scaling

You can enable the cluster auto-scaling feature on a Kubernetes cluster deployed on Amazon Web Services (AWS).

When the auto-scale feature is enabled, a Kubernetes cluster is able to auto-scale horizontally.

Horizontal auto-scaling happens dynamically, according to the requirements of the apps running in the pods on the cluster. If one or more pods are in the pending state and the current worker nodes cannot host additional pods, the cluster scales up automatically and provisions new nodes.

Similarly, when some of the existing nodes are no longer necessary, the Kubernetes cluster scales down automatically to remove idle nodes.

When you enable the auto-scale feature, you must specify the maximum number of nodes that can be added to the cluster automatically when the cluster scales up.

When the cluster scales down, it does not go below the Number of Nodes specified during cluster creation.
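In effect, the worker count always stays between the Number of Nodes and the Maximum Number of Nodes. A minimal sketch of that bound, using hypothetical parameter names that mirror the two fields:

```python
def clamp_worker_count(desired: int, number_of_nodes: int, maximum_number_of_nodes: int) -> int:
    """Keep the desired worker count inside the configured auto-scaling range."""
    # Never fall below the Number of Nodes, never exceed the Maximum Number of Nodes.
    return max(number_of_nodes, min(desired, maximum_number_of_nodes))
```

For example, with a range of 3 to 8 nodes, a demand for 12 workers is capped at 8, and a demand for 1 worker is raised to 3.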

When is a Cluster Scaled Up?

The Cluster Autoscaler deployment checks for unschedulable pods every 10 seconds by default. This interval is configurable with the --scan-interval flag.

A pod is unschedulable when the Kubernetes scheduler is unable to find a node that can accommodate it. For example, a pod can request more CPU than is available on any of the cluster nodes. Whenever the Kubernetes scheduler fails to find a place to run a pod, Cluster Autoscaler makes room for it by scaling up and adding one or more worker nodes, as required.

It may take some time before the newly created nodes appear in Kubernetes. The delay depends almost entirely on the cloud provider (AWS) and the speed of node provisioning. Cluster Autoscaler expects requested nodes to appear within 15 minutes (configurable with the --max-node-provision-time flag).
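The CPU example above can be sketched in a few lines. This is a simplified illustration, not the Cluster Autoscaler's actual algorithm: it looks at CPU requests only, checks each pending pod independently, and uses hypothetical names (Node, find_unschedulable, needs_scale_up).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int  # allocatable CPU, in millicores
    requested_cpu_m: int    # CPU already requested by pods on this node

    @property
    def free_cpu_m(self) -> int:
        return self.allocatable_cpu_m - self.requested_cpu_m


def find_unschedulable(pending_cpu_requests_m, nodes):
    """Return the pending CPU requests (millicores) that fit on no existing node."""
    return [req for req in pending_cpu_requests_m
            if all(req > node.free_cpu_m for node in nodes)]


def needs_scale_up(pending_cpu_requests_m, nodes, current_workers, max_workers):
    """Scale up only if some pod fits nowhere and the maximum is not yet reached."""
    return bool(find_unschedulable(pending_cpu_requests_m, nodes)) and current_workers < max_workers
```

With two nodes that have 500m and 200m of free CPU, a pending pod requesting 600m fits on neither node, so it is unschedulable and would trigger a scale-up as long as the cluster is below its maximum number of worker nodes.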

When is a Cluster Scaled Down?

Every 10 seconds (configurable with the --scan-interval flag), Cluster Autoscaler checks for unnecessary nodes.

If a worker node is unneeded for more than 10 minutes (configurable with the --scale-down-unneeded-time flag), and the workload running on it can be moved to other nodes, the cluster is scaled down and the scheduler places the pods elsewhere. Cluster Autoscaler does this by tainting the node, so that no new pods are scheduled there, and evicting the pods that are running on it.

The cluster is not scaled down if the current number of worker nodes is equal to the minimum number of worker nodes for the cluster.
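The timing rule and the minimum-size rule can be combined in a short sketch. The function and its parameters are hypothetical; only the 10-minute default comes from the text above.

```python
from datetime import datetime, timedelta

# Default taken from the text; corresponds to --scale-down-unneeded-time.
SCALE_DOWN_UNNEEDED_TIME = timedelta(minutes=10)


def may_scale_down(unneeded_since, now, current_workers, min_workers,
                   unneeded_time=SCALE_DOWN_UNNEEDED_TIME):
    """Return True if a node has been unneeded long enough to be removed."""
    # Never go below the minimum number of worker nodes.
    if current_workers <= min_workers:
        return False
    # unneeded_since is None while the node is still needed.
    if unneeded_since is None:
        return False
    return now - unneeded_since > unneeded_time
```

Here unneeded_since is assumed to be the time at which the node was first observed as unneeded, and it would be reset to None whenever the node becomes needed again.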

A node is considered for removal when all of the following conditions are true:

  • The sum of CPU and memory requests of all pods running on this node is smaller than 50% of the node Allocatable. The utilization threshold can be customized using the --scale-down-utilization-threshold flag.
  • The pods running on the node are movable to other nodes.
  • The node doesn’t have scale-down disabled annotation ("": "true")
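The three conditions can be expressed as a single predicate. This sketch makes two assumptions that are not stated in the text: the 50% threshold is applied to CPU and memory separately (the higher of the two ratios must be below the threshold), and the scale-down disabled annotation is represented by a plain boolean. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    cpu_requested_m: int      # sum of pod CPU requests, in millicores
    cpu_allocatable_m: int    # node Allocatable CPU, in millicores
    mem_requested_mib: int    # sum of pod memory requests, in MiB
    mem_allocatable_mib: int  # node Allocatable memory, in MiB
    pods_movable: bool        # True if every pod can run on another node
    scale_down_disabled: bool # True if the node carries the disabling annotation


def is_removal_candidate(node: NodeState, utilization_threshold: float = 0.5) -> bool:
    """Check the three removal conditions for one node."""
    cpu_util = node.cpu_requested_m / node.cpu_allocatable_m
    mem_util = node.mem_requested_mib / node.mem_allocatable_mib
    return (max(cpu_util, mem_util) < utilization_threshold
            and node.pods_movable
            and not node.scale_down_disabled)
```

A node with 400m of 2000m CPU and 1024 MiB of 8192 MiB memory requested is at 20% and 12.5% utilization, so it qualifies if its pods are movable and scale-down is not disabled on it.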

Create Auto-scaling Cluster

To create a cluster on AWS that auto-scales horizontally, select the Enable Auto Scaling check box under Cluster Configuration when you are creating the cluster.

Follow the steps given below to create a Kubernetes cluster on AWS.

  1. Navigate to Kubernetes > Infrastructure > Clusters.
  2. Click Add Cluster.
  3. Select an option corresponding to Amazon Web Services for Cloud Provider.
  4. Enter the cluster name in Name.
  5. Select Region and Availability Zones.
  6. Select the Operating System and click Next.
  7. Select the Master Node Instance Type and the Worker Node Instance Type.
  8. Enter the Number of Master Nodes and the Number of Worker Nodes.
  9. Select Disable Workloads on Master Nodes. This is a recommended step for the stability of the cluster.
  10. Select the Enable Auto Scaling check box.
  11. Click Next.
  12. Select Domain and Network.
  13. Select the Deploy Nodes using Public Subnet check box, if you want to deploy nodes in a public subnet.
  14. Enter the API FQDN, Services FQDN, Containers CIDR, and Services CIDR.
  15. Select the HTTP proxy check box, if an HTTP proxy is to be used for the cluster. If you select the HTTP proxy check box, specify the proxy address and credentials in the following format: <scheme>://<username>:<password>@<host>:<port>
  16. Select the Configure Network Backend check box and select Calico or Flannel as the Network Backend. Once this is selected, Calico or Flannel is enabled when the cluster is created.
  17. Click Next.
  18. Review the cluster configuration and click Create Cluster.

The horizontally auto-scaling cluster is created on AWS.

You can deploy your applications on the newly created cluster.
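Before entering the HTTP proxy value in step 15, it can help to confirm that it matches the <scheme>://<username>:<password>@<host>:<port> format. The following sketch uses Python's standard urllib.parse module. The function name and the example address are hypothetical, and treating all five parts as mandatory is an assumption based on the format shown in that step.

```python
from urllib.parse import urlsplit


def parse_proxy_url(value: str) -> dict:
    """Split a proxy URL into its parts and reject it if any part is missing."""
    parts = urlsplit(value)
    fields = {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # urlsplit raises ValueError for an invalid port
    }
    missing = [name for name, field in fields.items() if not field]
    if missing:
        raise ValueError("proxy URL is missing: " + ", ".join(missing))
    return fields


# Hypothetical proxy address, for illustration only.
proxy = parse_proxy_url("http://user:secret@proxy.example.com:3128")
```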

Change Auto-Scale Parameters for Existing Cluster

You can change the minimum number of nodes (Number of Nodes) and the maximum number of nodes (Maximum Number of Nodes) of a cluster created on AWS only if auto-scaling is enabled on that cluster.

To change the auto-scale parameters for an existing cluster, follow the steps given in Scale Cluster.