Virtual Machine High Availability (VM HA)
Virtual Machine High Availability (VM HA) is a core Private Cloud Director capability that automatically detects physical host failures within a virtualized cluster and restarts the affected VMs on other healthy hosts in the same cluster.
Private Cloud Director VM HA delivers a safety net for production workloads being deployed on a virtualized clusters. By turning on VM HA for your cluster, you can ensure that the VMs running on this cluster will remain operating across host failures.
Pre-requisites
Following are VM HA pre-requisites:
VM HA always operates in the context of a virtualized cluster. You need to turn it on as a cluster level property. Once turned on, VM HA applies to all virtual machines within the cluster.
Shared storage is a required for VM HA. For VM HA to work at the cluster level:
All VMs should be using block storage volume as root disk (non ephemeral root disk), or
If any VMs are using ephemeral storage for root disk, then Ephemeral Shared Storage should be used for all hosts in the virtualized cluster.
When Ephemeral Shared Storage is not used, any VMs using ephemeral root disk will get rebuilt as part of recovery on another host.
VM HA requires a minimum number of healthy hosts in a cluster to function correctly. A minimum of two hosts is required for HA activation.
VM HA uses VM Evacuation operation behind the scenes. VM Evacuation Prerequisites must be met for the operation to succeed.
How VM HA Works
The process is designed to be automatic and requires minimal manual intervention during a failure event:
Continuous Host Monitoring: VM HA service constantly monitors the health and responsiveness of all hypervisor hosts participating in an HA-enabled virtualized cluster.
Failure Detection: If a host stops responding (due to hardware failure, operating system crash, or certain network isolation scenarios), the system detects the failure
Automatic VM Recovery: Once a host failure is confirmed, which involves both the management plane and cluster hosts to confirm failure, VM HA automatically initiates the process of restarting any VMs that were running on the failed host. These VMs are powered on using available resources on the remaining healthy hosts within the cluster.
Last updated
Was this helpful?
