Cluster Upgrades

PMK supports fully automated rolling upgrades of Kubernetes clusters. Upgrades are performed across major versions of Kubernetes, so you can continue leveraging the latest Kubernetes features without the hassle of upgrading the cluster yourself.

We upgrade nodes in a cluster one at a time, ensuring that the last upgraded node is healthy before upgrading the next. This is called a rolling upgrade, and it has the following benefits:

  • Your applications will not experience downtime during the cluster upgrade, as long as they tolerate the failure of a single node.
  • Your cluster users and your Kubernetes-native applications - those that talk to the Kubernetes API server - will be able to use the API while worker nodes are being upgraded. If your cluster has multiple masters, the API server will remain available across upgrades to master nodes as well. For a single-master cluster, the API server will experience momentary downtime while the master node is being upgraded.
  • All nodes in your cluster will remain compatible during the cluster upgrade, despite running different versions of Kubernetes as the upgrade proceeds.
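
During an upgrade you can observe this mixed-version state directly. The sketch below (a hypothetical helper script; it assumes kubectl is configured against the cluster being upgraded) lists each node with its kubelet version, so you can watch the rolling upgrade progress node by node.

#!/usr/bin/env bash
# Usage: watch_node_versions.sh
# Lists every node with its kubelet version. During a rolling upgrade the
# output shows a mix of old and new versions until all nodes are upgraded.
kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion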

Important Notes and Warnings

To upgrade a node, PMK first requests that Kubernetes not schedule any further pods on the node. It then evacuates (or drains, in Kubernetes parlance) existing pods from the node. Evacuating a node removes unmanaged pods and permanently erases data in emptyDir Volumes (both are explained below).
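
If you want to see what these steps look like in kubectl terms, the sketch below is a rough manual equivalent (PMK performs these steps for you during the upgrade; the script name and node argument are illustrative only).

#!/usr/bin/env bash
# Usage: manual_drain.sh mynode
node=$1
# Ask Kubernetes not to schedule any further pods on the node.
kubectl cordon "$node"
# Evacuate (drain) existing pods. --force also removes unmanaged pods;
# --ignore-daemonsets leaves DaemonSet-managed pods in place. Depending on
# your kubectl version, evicting pods that use emptyDir volumes may also
# require --delete-local-data (or --delete-emptydir-data).
kubectl drain "$node" --force --ignore-daemonsets
# Once maintenance is complete, make the node schedulable again.
kubectl uncordon "$node"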

Managed vs Unmanaged Pods

A Kubernetes pod falls into one of two categories:

Managed Pods

These are pods that are managed by one of the following controllers - a ReplicationController, a ReplicaSet, a Job, or a DaemonSet. Containers in pods managed by a DaemonSet are stopped during the upgrade, but the pods remain on the node. Pods managed by the other controllers are rescheduled by Kubernetes to other nodes, as long as resources are available.
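
To check which controller, if any, manages a given pod, you can inspect the pod's ownerReferences (available in recent Kubernetes versions). A minimal sketch; the script and pod names are placeholders:

#!/usr/bin/env bash
# Usage: pod_owner.sh mypod
# Prints the kind and name of each controller that owns the pod, if any.
pod=$1
kubectl get pod "$pod" -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}{"/"}{.name}{"\n"}{end}'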

Unmanaged Pods

These are pods that are not managed by any Kubernetes controller. These pods will be removed from the node during the upgrade and will not be rescheduled by Kubernetes. For that reason, unmanaged pods should not be used in production, though they are useful for experimenting and debugging.
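
Before an upgrade, it can be useful to list any unmanaged pods so you know what will be removed. A rough sketch, assuming jq is installed and that managed pods carry ownerReferences:

#!/usr/bin/env bash
# Usage: list_unmanaged_pods.sh
# Lists pods in all namespaces that have no owning controller.
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[] | select((.metadata.ownerReferences // []) | length == 0) | .metadata.namespace + "/" + .metadata.name'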

emptyDir Volumes

Pods have access to persistent storage through Kubernetes persistent volumes. If your pod uses an emptyDir Volume, be warned that all data stored in this volume will be erased when the pod is removed from the node. This warning applies to any unmanaged pod, as well as to pods managed by a ReplicationController, a ReplicaSet, or a Job. Note that if the node fails, the data on this volume may be unrecoverable. For this reason, emptyDir Volumes should not be used in production.
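
To see which pods would be affected before an upgrade, you can list the pods that declare an emptyDir volume. Another rough sketch, again assuming jq is available:

#!/usr/bin/env bash
# Usage: list_emptydir_pods.sh
# Lists pods in all namespaces that use at least one emptyDir volume.
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[] | select(any(.spec.volumes[]?; has("emptyDir"))) | .metadata.namespace + "/" + .metadata.name'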

Refreshing Your Application’s Service Account Tokens

If your application makes calls to the Kubernetes API server and uses a Kubernetes Service Account, please read on. As part of the upgrade process, we are enforcing TLS authentication on the Kubernetes API server, and we are decoupling the cluster CA from the cluster nodes. As part of these changes, we generate new client and server certificates for the entire cluster. In particular, we replace the key used to generate Service Account tokens, and then re-generate the Secrets that contain the Service Account tokens.

If your application uses a Service Account, Kubernetes makes the token available to your application's containers via the file /var/run/secrets/kubernetes.io/serviceaccount/token. However, this file is not updated when the token is re-generated via the API server. To ensure that your application's token is up to date, you must delete and recreate your application's pods.
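
One way to confirm whether a pod is still carrying a stale token is to compare the token mounted inside its container with the token currently stored in the Service Account's Secret. The sketch below is illustrative only; the pod and Secret names are placeholders, and base64 --decode may be spelled differently on some platforms.

#!/usr/bin/env bash
# Usage: compare_sa_token.sh mypod mysecret
pod=$1
secret=$2
# Token as seen by the application inside the container.
mounted=$(kubectl exec "$pod" -- cat /var/run/secrets/kubernetes.io/serviceaccount/token)
# Token currently stored in the Service Account's Secret.
current=$(kubectl get secret "$secret" -o jsonpath='{.data.token}' | base64 --decode)
if [ "$mounted" = "$current" ]; then
  echo "Token is up to date."
else
  echo "Token is stale; delete and recreate the pod."
fi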

Please choose from the following:


You deployed your application using a Replication Controller and can tolerate momentary downtime

You can temporarily scale your application to zero pods using the following bash script: temporary_scale_to_zero.sh

#!/usr/bin/env bash
# Usage: temporary_scale_to_zero.sh myrc
# Records the Replication Controller's current replica count, scales it down
# to zero, then scales it back up, forcing all of its pods to be recreated.
replicationcontroller=$1
current_replicas=$(kubectl get rc "$replicationcontroller" -o jsonpath="{.status.replicas}")
kubectl scale rc "$replicationcontroller" --replicas=0
kubectl scale rc "$replicationcontroller" --replicas="$current_replicas"

Example usage:

temporary_scale_to_zero.sh myrc

You deployed your application using a Replication Controller and cannot tolerate momentary downtime

If you cannot tolerate any downtime, you can simulate an in-place rolling update. We implement this by copying your existing Replication Controller definition, replacing its name, then performing a rolling update from the existing to the new Replication Controller.

inplace_rolling_update.sh

#!/usr/bin/env bash
# Usage: inplace_rolling_update.sh myrc myrc_newname
replicationcontroller=$1
replicationcontroller_newname=$2
# Label the existing pods and pod template so the rolling update can tell old pods from new ones.
kubectl patch rc/"$replicationcontroller" -p '{"spec":{"selector":{"secrets":"old"},"template":{"metadata":{"labels":{"secrets":"old"}}}}}'
# Copy the definition, flip the label to "new", rename the controller, strip resourceVersion, and roll to it.
kubectl get rc/"$replicationcontroller" -o yaml \
  | sed -e 's/secrets: old/secrets: new/g' \
        -e "0,/name: ${replicationcontroller}/{s/name: ${replicationcontroller}/name: ${replicationcontroller_newname}/}" \
        -e 's/resourceVersion.*//' \
  | kubectl rolling-update "$replicationcontroller" --update-period=10s -f -

Example usage:

inplace_rolling_update.sh myrc myrc_newname

You deployed your application as one or more stand-alone Pods

If you deployed your application as one or more Pods that are not managed by a Replication Controller, you will need to delete and re-create each of the Pods: delete_recreate_pod.sh

#!/usr/bin/env bash
# Usage: delete_recreate_pod.sh mypod
# Saves the pod's current definition, then deletes and re-creates it so that
# the new pod picks up the regenerated Service Account token.
pod=$1
kubectl get pod "$pod" -o yaml | kubectl replace --force -f -

Example usage:

delete_recreate_pod.sh mypod