Automated Rolling Cluster Upgrades

Managed Kubernetes supports fully automated rolling upgrades of Kubernetes clusters. Upgrades are performed across major versions of Kubernetes, so you can keep leveraging the latest Kubernetes features without the hassle of having to upgrade the cluster yourself.

We upgrade the nodes in a cluster one at a time, ensuring that each upgraded node is healthy before upgrading the next. This is called a rolling upgrade, and it has the following benefits:

  • Your applications will not experience downtime during the cluster upgrade, as long as they tolerate the failure of a single node.
  • Your cluster users and your "Kubernetes-native" applications (i.e., ones that talk to the Kubernetes API server) will be able to use the API while worker nodes are being upgraded. In addition, if you have created a multi-master highly available Kubernetes cluster, your API server will also remain available across upgrades to master nodes. For a single-master cluster, the API server will experience momentary downtime while the master node is being upgraded.
  • All nodes in your cluster will remain compatible during the cluster upgrade, despite running different versions of Kubernetes as the upgrade proceeds.
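
If you want to observe a rolling upgrade in progress, one simple check (assuming you have kubectl access to the cluster) is to list the nodes: the VERSION column will temporarily show a mix of the old and new Kubernetes versions, and the node currently being upgraded typically shows as SchedulingDisabled.

kubectl get nodes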

Important Notes & Warnings

To upgrade a node, we first tell Kubernetes not to schedule any further Pods on the node. We then evacuate (drain, in Kubernetes parlance) the existing Pods from the node. Evacuating a node removes unmanaged Pods and permanently erases data in emptyDir Volumes; both are explained below.
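
For reference, these per-node steps correspond roughly to the standard kubectl cordon/drain workflow sketched below. This is only an illustration of the mechanism, not the exact commands we run; the node name worker-1 is hypothetical, and the drain flags vary between kubectl versions.

# Mark the node unschedulable so no new Pods land on it.
kubectl cordon worker-1
# Evict the node's Pods; DaemonSet Pods are left in place and emptyDir data is deleted.
kubectl drain worker-1 --ignore-daemonsets --delete-local-data
# After the node has been upgraded, allow scheduling on it again.
kubectl uncordon worker-1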

Managed vs Unmanaged Pods

Pods are the atomic unit of work in Kubernetes. Cluster users deploy Pods that fall into two categories:

Managed Pods

These are Pods that are managed by a ReplicationController, ReplicaSet, Job, or DaemonSet. Containers in Pods managed by a DaemonSet are stopped during the upgrade, but the Pods remain on the node. Pods managed by other controllers are rescheduled by Kubernetes to other nodes as long as resources are available.

Unmanaged Pods

These are Pods that are not managed by any Kubernetes controller. These Pods will be removed from the node during the upgrade and will not be rescheduled by Kubernetes. Note that, if the node fails, these Pods will not be rescheduled by Kubernetes. For that reason, unmanaged Pods should not be used in production, though they are useful for experimenting and debugging.
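
To spot unmanaged Pods before an upgrade, you can check each Pod for a controller reference. The rough sketch below assumes kubectl access and a Kubernetes version that records ownerReferences on Pods; on older clusters you can look for the kubernetes.io/created-by annotation instead.

# Print Pods that do not appear to be managed by any controller.
for pod in $(kubectl get pods -o name); do
  if ! kubectl get "$pod" -o yaml | grep -q "ownerReferences:"; then
    echo "$pod appears to be unmanaged"
  fi
done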

emptyDir Volumes

Pods have access to storage through Kubernetes Volumes. If your Pod uses an emptyDir Volume, be warned that all data stored in this Volume will be erased when the Pod is removed from the node. This warning applies to any unmanaged Pod as well as all Pods managed by a ReplicationController, ReplicaSet, or Job. Note that, if the node fails, the data on this Volume may be unrecoverable. For this reason, emptyDir Volumes should not be used in production.
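
To find Pods that would lose data during an upgrade, you can search each Pod's definition for emptyDir Volumes. This is a rough sketch, assuming kubectl access to the namespace in question:

# Print Pods that mount at least one emptyDir Volume.
for pod in $(kubectl get pods -o name); do
  if kubectl get "$pod" -o yaml | grep -q "emptyDir:"; then
    echo "$pod mounts an emptyDir Volume"
  fi
done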

Refreshing Your Application's Service Account Tokens

If your application makes calls to the Kubernetes API server and uses a Kubernetes Service Account, please read on.

As part of the Managed Kubernetes upgrade, we are enforcing TLS authentication on the Kubernetes API Server, and we are decoupling the cluster CA from the cluster nodes.

As part of these changes, we are generating new client and server certificates for the entire cluster. In particular, we replace the key used to generate Service Account tokens, and then re-generate the Secrets that contain the Service Account tokens.
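
You can see which Secrets in a namespace hold Service Account tokens with a quick (admittedly crude) filter on the secret type:

kubectl get secrets | grep kubernetes.io/service-account-token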

If your application uses a Service Account, Kubernetes makes the token available to your application's containers via the file /var/run/secrets/kubernetes.io/serviceaccount/token. However, this file is not updated if the token is re-generated via the API Server.
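
To check which token a running container currently sees, you can print the mounted file directly (the Pod name mypod is just an example); after the Pod has been re-created you should see a different token:

kubectl exec mypod -- cat /var/run/secrets/kubernetes.io/serviceaccount/token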

To ensure that your application's token is up to date, you must delete and re-create your application's Pods. Please choose the scenario below that matches your deployment:


You deployed your application using a Replication Controller and can tolerate momentary downtime

You can temporarily scale your application to zero Pods using the following bash script:


temporary_scale_to_zero.sh

#!/usr/bin/env bash
# Usage: temporary_scale_to_zero.sh myrc
set -e
replicationcontroller=$1
# Capture the desired replica count so it can be restored after the scale-down.
current_replicas=$(kubectl get rc "$replicationcontroller" -o jsonpath="{.spec.replicas}")
kubectl scale rc "$replicationcontroller" --replicas=0
kubectl scale rc "$replicationcontroller" --replicas="$current_replicas"

Example usage:

temporary_scale_to_zero.sh myrc
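
Afterwards, you can confirm that the Replication Controller is back at its original size and that its Pods have been re-created (the name is taken from the example above; the fresh AGE values indicate new Pods):

kubectl get rc myrc
kubectl get pods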

You deployed your application using a Replication Controller and cannot tolerate momentary downtime

If you cannot tolerate any downtime, you can simulate an in-place rolling update. We implement this by copying your existing Replication Controller definition, replacing its name, then performing a rolling update from the existing to the new Replication Controller.


inplace_rolling_update.sh

#!/usr/bin/env bash
# Usage: inplace_rolling_update.sh myrc myrc_newname
set -e
replicationcontroller=$1
replicationcontroller_newname=$2
# Label the existing Replication Controller and its Pod template so the rolling update has a difference to roll over.
kubectl patch rc/"$replicationcontroller" -p '{"spec":{"selector":{"secrets":"old"},"template":{"metadata":{"labels":{"secrets":"old"}}}}}'
# Copy the definition, flip the label, rename the Replication Controller, drop resourceVersion, and roll to it.
kubectl get rc/"$replicationcontroller" -o yaml \
  | sed -e 's/secrets: old/secrets: new/g' \
        -e "0,/name: ${replicationcontroller}/{s/name: ${replicationcontroller}/name: ${replicationcontroller_newname}/}" \
        -e 's/resourceVersion.*//' \
  | kubectl rolling-update "$replicationcontroller" --update-period=10s -f -

Example usage:

inplace_rolling_update.sh myrc myrc_newname
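
Once the rolling update completes, the Pods are owned by the renamed Replication Controller. Using the names from the example above, you can verify this with:

kubectl get rc myrc_newname
kubectl get pods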

You deployed your application as one or more stand-alone Pods

If you deployed your application as one or more Pods that are not managed by a Replication Controller, you will need to delete and re-create each of the Pods:


delete_recreate_pod.sh

#!/usr/bin/env bash
# Usage: delete_recreate_pod.sh mypod
set -e
pod=$1
# --force makes kubectl delete the existing Pod and then create it again from the captured definition.
kubectl get pod "$pod" -o yaml | kubectl replace --force -f -

Example usage:

delete_recreate_pod.sh mypod

Conclusion

Our cluster upgrades enable you to take advantage of the latest Kubernetes improvements and new features while ensuring that applications that tolerate the failure of a single node experience no downtime, and that cluster users experience minimal downtime of the Kubernetes API. Upgrading a Kubernetes node means evacuating Pods from that node; as a consequence, unmanaged Pods are removed and data stored in emptyDir Volumes is erased.

Following the recommendations above to avoid unmanaged Pods and emptyDir Volumes will make your applications more robust to node failures and also avoid downtime and data loss during cluster upgrades.