Overview and Architecture

This document describes the architecture of multi-master, highly available BareOS clusters in PMK. See What is BareOS for an introduction to BareOS.

The following diagram describes the overall architecture that PMK supports for a multi-master BareOS cluster:

Architecture diagram

Virtual IP Addressing with VRRP

Multi-master Cluster

PMK uses the Virtual Router Redundancy Protocol (VRRP), via Keepalived, to provide a virtual IP (VIP) that fronts the active master node in a multi-master Kubernetes cluster. At any point in time, VRRP associates one of the master nodes with the virtual IP to which clients (kubelets, users) connect. Let's call this the active master node.

During cluster creation, PMK binds the virtual IP to a specific physical interface on each master node, specified by the admin. The virtual IP must be reachable from the network that this interface connects to. The interface label, for example eth0, must be provided while creating the cluster, and every master must have an interface with the same label so the virtual IP can be bound to it.
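
As a quick illustration (a minimal sketch, not PMK tooling), the snippet below checks that a given interface label exists on a node; you could run something like it on each prospective master before creating the cluster. The label eth0 is only an example.

```python
# Minimal pre-flight sketch (not PMK tooling): verify that the interface label
# chosen for the virtual IP exists on this node. Run it on every prospective
# master, since all masters must expose the same interface label.
import os

def interface_exists(label: str) -> bool:
    # On Linux, every network interface appears as a directory under /sys/class/net.
    return label in os.listdir("/sys/class/net")

if __name__ == "__main__":
    print(interface_exists("eth0"))  # "eth0" is an example label
```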

When the cluster is running, all client requests to the Kubernetes API server are sent only to the active master node (the master currently mapped to the virtual IP). If that master goes down, VRRP elects a new active master and remaps the virtual IP to it, making it the target of all new client requests.
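
To see which master currently holds the virtual IP, you can check whether the VIP appears on the configured interface. The sketch below shells out to the standard Linux ip command; the VIP and interface values are placeholders, and this is an illustration rather than part of PMK.

```python
# Minimal sketch (not PMK tooling): report whether this node currently holds the
# cluster's virtual IP, i.e. whether it is the active master. The VIP and the
# interface label below are placeholders for the values chosen at cluster creation.
import subprocess

def holds_vip(vip: str, interface: str) -> bool:
    # 'ip -o addr show dev <interface>' prints one line per assigned address;
    # the fourth field is the address in CIDR form, e.g. "192.168.10.250/32".
    out = subprocess.run(
        ["ip", "-o", "addr", "show", "dev", interface],
        capture_output=True, text=True, check=True,
    ).stdout
    return any(
        line.split()[3].split("/")[0] == vip
        for line in out.splitlines()
    )

if __name__ == "__main__":
    print(holds_vip("192.168.10.250", "eth0"))  # placeholder VIP and interface
```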

Hence, for high availability, it is recommended to design your clusters with 3 or 5 master nodes.

Etcd cluster configuration

In a multi-master cluster, Platform9 runs an instance of etcd on each master node. For the etcd cluster to be healthy, a quorum (majority) of etcd nodes must be up and running at all times (for example, 2 out of 3 masters). Losing quorum results in a non-functional etcd cluster, which in turn causes the Kubernetes cluster to stop functioning. It is therefore recommended to create your production clusters with 3 or 5 master nodes.
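
As an operational illustration (a minimal sketch, not Platform9 tooling), the snippet below uses the standard etcdctl v3 client to check how many etcd members respond as healthy and compares that count against the quorum requirement. The master IPs, client port, and certificate paths are placeholders; substitute the values from your deployment.

```python
# Minimal sketch (not PMK tooling): count healthy etcd members and compare
# against the quorum requirement. Master IPs, the client port, and the
# certificate paths are placeholders for your deployment's actual values.
import os
import subprocess

MASTERS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # placeholder master IPs

def etcd_member_healthy(host: str) -> bool:
    # 'etcdctl endpoint health' exits non-zero if the endpoint is unhealthy.
    result = subprocess.run(
        [
            "etcdctl", "endpoint", "health",
            "--endpoints", f"https://{host}:2379",   # placeholder client port
            "--cacert", "/path/to/etcd/ca.crt",      # placeholder cert paths
            "--cert", "/path/to/etcd/client.crt",
            "--key", "/path/to/etcd/client.key",
        ],
        env={**os.environ, "ETCDCTL_API": "3"},
        capture_output=True, text=True,
    )
    return result.returncode == 0

healthy = sum(etcd_member_healthy(h) for h in MASTERS)
quorum = len(MASTERS) // 2 + 1
print(f"{healthy}/{len(MASTERS)} etcd members healthy; quorum requires {quorum}")
```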

For more information, see the etcd documentation on failure modes: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/failures.md

Cluster master configuration vs tolerance for loss of masters

As discussed in the Etcd cluster configuration section above, Platform9 runs an instance of etcd on each master node.

For the etcd cluster to function properly, it needs a majority of nodes, a quorum, to agree on updates to the cluster state.

Hence, the number of masters you configure your cluster with has a direct impact on how many master failures the cluster can tolerate.

A cluster can lose one or more master nodes in the following scenarios:

  1. One or more master nodes go down
  2. A network partition prevents masters from communicating with each other

For a cluster with n members, quorum is (n/2)+1, using integer division. For any odd-sized cluster, adding one node always increases the number of nodes necessary for quorum. Although adding a node to an odd-sized cluster appears better since there are more machines, the fault tolerance is worse: exactly the same number of nodes may fail without losing quorum, but there are more nodes that can fail. If the cluster is in a state where it can't tolerate any more failures, adding a node before removing nodes is dangerous, because if the new node fails to register with the cluster (for example, the address is misconfigured), quorum will be permanently lost.
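
The arithmetic can be sketched in a few lines of Python; the values it prints match the table below.

```python
# Quorum arithmetic for an etcd cluster of n members (integer division, as used
# in the discussion above): quorum = n // 2 + 1, and the cluster tolerates
# n - quorum member failures.
def quorum(n: int) -> int:
    return n // 2 + 1

def fault_tolerance(n: int) -> int:
    return n - quorum(n)

for n in range(1, 6):
    print(f"{n} master(s): quorum = {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")

# Output:
# 1 master(s): quorum = 1, tolerates 0 failure(s)
# 2 master(s): quorum = 2, tolerates 0 failure(s)
# 3 master(s): quorum = 2, tolerates 1 failure(s)
# 4 master(s): quorum = 3, tolerates 1 failure(s)
# 5 master(s): quorum = 3, tolerates 2 failure(s)
```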

The number of masters you configure a cluster with maps to the loss of masters the cluster can tolerate as follows:

1 master: A single-master cluster cannot tolerate the loss of any master, so it is never recommended for production environments. It is only recommended for test environments where you want to deploy a cluster quickly and can tolerate the possibility of cluster downtime caused by the master going down. Today, you cannot add more master nodes to a cluster created with a single master node; you need to start with a cluster that has at least 2 master nodes before you can add any more masters to it.

2 masters: A 2-master cluster cannot tolerate the loss of any master. Losing 1 master causes quorum to be lost, so the etcd cluster, and hence the Kubernetes cluster, will not function.

3 masters: 3 masters is the minimum we recommend for a highly available cluster. A 3-master cluster can tolerate the loss of at most 1 master at a given time. In that case, the remaining 2 masters retain quorum and will elect a new active master if necessary.

4 masters: A 4-master cluster can tolerate the loss of at most 1 master at a given time. In that case, the remaining 3 masters retain quorum and will elect a new active master if necessary.

5 masters: A 5-master cluster can tolerate the loss of at most 2 masters at a given time. In that case, the remaining 4 or 3 masters retain quorum and will elect a new active master if necessary.

For more information, see the etcd FAQ: https://github.com/etcd-io/etcd/blob/master/Documentation/faq.md