Create Virtualized Cluster with GPU support
This section guides you through the complete process of setting up, configuring, and managing GPU-enabled Kubernetes clusters. You'll learn how to create clusters with GPU support, select the appropriate partitioning strategy for your workloads, monitor resource utilization, and modify configurations as your requirements change.
Step 1: Set up VM infrastructure for GPU hypervisors
To create a GPU-enabled Kubernetes cluster, you will need to set up your GPU passthrough VM infrastructure by configuring GPU hypervisor hosts and infrastructure clusters with GPU capabilities.
For details, see Set up GPU Passthrough.
Step 2: Set up GPU Flavor and Image
Now that your GPU hypervisor host and GPU infrastructure cluster are ready, you will need to set up a GPU flavor and GPU image with specific image properties:
To create a GPU passthrough flavor, see Create GPU Enabled Flavors.
Upload a Cluster API compliant Operating System Image into Image Library and Images specific to the Kubernetes version you want to deploy. For details, see Operating System Image Management.
To use an image for GPU virtualized cluster creation, set the gpu=true image property along with k8s_version.
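The image property requirement above can be checked before you start cluster creation. The following is a minimal sketch, not part of the product: it assumes you have already read the image's properties into a Python dictionary by whatever means your environment provides, and the function name and the string comparison against "true" are illustrative assumptions. Only the property names gpu and k8s_version come from this guide.

```python
def missing_gpu_image_properties(properties):
    """Return a list of problems that would keep an image from being
    usable for GPU virtualized cluster creation (empty list means OK).

    `properties` is a plain dict of image properties. How you obtain it
    is outside the scope of this sketch.
    """
    problems = []

    # The guide requires gpu=true. Comparing case-insensitively against
    # the string "true" is an assumption about how the value is stored.
    if str(properties.get("gpu", "")).strip().lower() != "true":
        problems.append("gpu must be set to true")

    # The guide requires k8s_version to be set. Its exact format is not
    # specified here, so this sketch only checks that it is non-empty.
    if not str(properties.get("k8s_version", "")).strip():
        problems.append("k8s_version must be set")

    return problems


if __name__ == "__main__":
    # Placeholder value: substitute the Kubernetes version you deploy.
    example = {"gpu": "true", "k8s_version": "<your-kubernetes-version>"}
    print(missing_gpu_image_properties(example))
```

An empty list means both properties are present. Any returned message names the property to fix before selecting the image in Step 3.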
Step 3: Create a GPU enabled Kubernetes cluster
To create a GPU enabled Kubernetes cluster:
Navigate to Kubernetes > Infrastructure > Clusters.
Select Deploy New Cluster with Virtualized Nodes.
On the Cluster Configuration page, enter a unique name for your cluster.
Select a GPU-enabled infrastructure cluster with GPU mode (passthrough) from the virtualized clusters.
Select an SSH key for your GPU virtualized Kubernetes nodes.
Select Next to proceed to Node Pool configurations.
Configure Node Pool settings:
VM Flavor: Select a GPU-enabled flavor and provide the number of VMs required for your virtualized Kubernetes cluster. Enable Show GPU flavors only to view GPU enabled flavors only.
Network: Configure network settings as needed.
Subnet: Configure subnet settings as needed.
Select Next to proceed to configure your Kubernetes Cluster.
Select the required Kubernetes version and GPU enabled image for your cluster.
Enable the Nvidia GPU Operator add-on to configure your GPU virtualized Kubernetes nodes.
Select Submit to deploy your GPU-enabled Kubernetes virtualized node cluster.
Step 4: Configure GPU partitioning
Update GPU partitioning strategies as your business needs change.
Navigate to Kubernetes > Infrastructure > Clusters
Navigate to Capacity and Health
Select the GPU Nodes Group to configure the GPU partition.
Select Edit GPU Configuration to update GPU partitioning from the default Passthrough mode.
Select a new partitioning strategy:
Switch from Passthrough to MIG or Time Slicing
Change MIG profiles
Adjust Time Slicing replica counts
Choose Save Configuration changes.
The GPU operator pods on your GPU Kubernetes cluster restart and take a couple of minutes to update the GPU partitioning status. Once complete, the new configuration displays with updated GPU instances and memory allocation.
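When choosing between strategies, it can help to estimate how many schedulable GPU instances a node group will expose after the change. The sketch below is a rough planning aid under stated assumptions, not the product's own calculation. It assumes time slicing advertises each physical GPU as many times as the replica count, which is how the NVIDIA GPU Operator usually behaves, and that the number of MIG instances per GPU is something you look up for your GPU model and MIG profile and pass in yourself. The strategy strings and the function name are hypothetical.

```python
def schedulable_gpu_instances(strategy, physical_gpus,
                              replicas=1, mig_instances_per_gpu=1):
    """Estimate how many GPU instances workloads can request.

    strategy: "passthrough", "time-slicing", or "mig" (labels chosen
        for this sketch only).
    physical_gpus: number of physical GPUs in the node group.
    replicas: time slicing replica count per physical GPU.
    mig_instances_per_gpu: instances the chosen MIG profile yields on
        one GPU; this depends on the GPU model and profile, so it is
        an input rather than something this sketch derives.
    """
    if physical_gpus < 0 or replicas < 1 or mig_instances_per_gpu < 1:
        raise ValueError("counts must be positive")

    if strategy == "passthrough":
        # Each physical GPU is exposed whole.
        return physical_gpus
    if strategy == "time-slicing":
        # Each physical GPU is shared by `replicas` workloads in turn.
        # GPU memory is shared between them, not divided.
        return physical_gpus * replicas
    if strategy == "mig":
        # Each physical GPU is split into isolated hardware instances.
        return physical_gpus * mig_instances_per_gpu
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    print(schedulable_gpu_instances("passthrough", 2))
    print(schedulable_gpu_instances("time-slicing", 2, replicas=4))
```

Compare the estimate with the GPU instance count shown on the Capacity and Health page once the configuration update completes. The displayed value is authoritative.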
View GPU metrics
To view GPU metrics for GPU enabled virtualized worker nodes:
Navigate to Kubernetes > Infrastructure > Clusters
Navigate to Capacity and Health
Select Manage Columns for Worker Nodes to enable the display of required GPU headers, such as GPU Model, GPU Strategy, GPU details, GPU count, GPU memory, and so on.
Select Show GPU Nodes only to view only GPU enabled nodes in your cluster.
Now you can view the required GPU information for each of your GPU-enabled virtualized nodes.
Known issues
GPU node onboarding doesn't show explicit success messages. Wait 2-3 minutes after onboarding to see the updated status.
Cluster creation might occasionally fail, requiring a containerd restart on Kubernetes nodes. These issues are tracked and resolved in subsequent releases (BYOH).
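Because GPU node onboarding gives no explicit success message, scripted workflows may want to wait for the status to change instead of checking once. The following is a generic polling sketch under stated assumptions: get_status is a hypothetical callable that you supply to return the current status by whatever means you have, and the status strings are placeholders. The 180-second default mirrors the 2-3 minute wait mentioned above.

```python
import time


def wait_for_status(get_status, expected, timeout_s=180.0,
                    interval_s=10.0, sleep=time.sleep):
    """Poll get_status() until it returns `expected` or time runs out.

    get_status: zero-argument callable returning the current status
        (hypothetical; supplied by the caller).
    Returns True if the expected status was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        # Pause between checks to avoid querying too often.
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated status sequence standing in for a real status lookup.
    statuses = iter(["pending", "pending", "ready"])
    print(wait_for_status(lambda: next(statuses), "ready", interval_s=0.0))
```

The sleep parameter is injectable so the helper can be exercised in tests without real delays.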
