# GPU Partitioning Strategies

You can enable GPU support in your Kubernetes clusters to run AI/ML, data science, and media processing workloads. GPU support allows you to partition physical GPUs efficiently, maximizing resource utilization and reducing costs.

Learn how to [Create Virtualized Cluster with GPU support](https://docs.platform9.com/private-cloud-director/kubernetes-clusters/pcd-kubernetes-clusters/create-virtualized-cluster-with-gpu-support), including how to configure GPU partitioning strategies and monitor GPU resources.

## GPU Partitioning Strategies

Before creating your GPU cluster, understand the three available partitioning strategies. Each strategy serves different workload requirements and resource efficiency needs.

### Passthrough

Passthrough assigns an entire physical GPU directly to a single workload, bypassing any virtualization layer. This strategy delivers near-native performance because the workload has exclusive access to all GPU cores, memory, and processing power.

**Use Passthrough when:**

* You need maximum GPU performance for intensive workloads
* Your applications require exclusive GPU access
* You run large-scale training jobs or high-performance computing tasks
* Resource sharing isn't a priority

**Limitations:**

* One workload per GPU, which can lead to underutilization
* Higher cost per workload due to dedicated resource allocation
* No ability to run multiple smaller workloads simultaneously
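As a sketch, a pod requests a full passthrough GPU through the standard `nvidia.com/gpu` resource advertised by the NVIDIA device plugin (the pod name and container image below are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-job                # illustrative name
spec:
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.01-py3   # example image
      resources:
        limits:
          nvidia.com/gpu: 1         # one entire physical GPU, exclusive to this pod
```

The scheduler places the pod only on a node with an unallocated GPU, and no other pod can use that GPU until this one terminates.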

### MIG (Multi-Instance GPU)

MIG provides hardware-level partitioning that divides a single GPU into multiple isolated GPU instances. Each instance has dedicated streaming multiprocessors (SMs), memory, cache, and copy engines, ensuring complete isolation between workloads.

**Use MIG when:**

* You need guaranteed resource isolation between workloads
* Multiple teams or applications share the same physical GPU
* You want to maximize GPU utilization while maintaining performance boundaries
* Security and tenant isolation are critical requirements

**Key features:**

* Each MIG instance appears as a separate GPU to applications
* Memory and compute resources are physically partitioned
* Profiles determine the size of each instance (1g, 2g, 3g, 4g, 7g configurations)
* Available only on modern GPUs (Ampere architecture or later)

**Example:** An H100 GPU can be partitioned into one 4g.47gb instance and two 1g.24gb instances, utilizing 6 out of 7 GPU compute units and about 94GB of available memory.

{% hint style="info" %}
**Info**

MIG is supported on GPUs starting with the NVIDIA Ampere generation only. Learn more about [MIG supported GPUs](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#supported-gpus).
{% endhint %}
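Workloads consume a MIG instance like any other extended resource. With the device plugin's mixed MIG strategy, each profile is exposed under a `nvidia.com/mig-<profile>` resource name; the profile shown below is one common option, and the pod name and image are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-job               # illustrative name
spec:
  containers:
    - name: server
      image: nvcr.io/nvidia/tritonserver:24.01-py3  # example image
      resources:
        limits:
          nvidia.com/mig-1g.10gb: 1 # one 1g.10gb MIG instance (mixed strategy)
```

To the container, the MIG instance looks like a standalone GPU with its own dedicated memory and compute slices.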

### Time Slicing

Time Slicing multiplexes multiple workloads on a single GPU by granting each exclusive access for short time periods. The GPU's scheduler switches between workloads in round-robin fashion, allowing multiple pods to share the same physical GPU resources.

**Use Time Slicing when:**

* You have bursty or intermittent GPU workloads
* Applications don't require continuous GPU access
* You want to increase GPU utilization for development and testing
* Cost optimization is more important than guaranteed performance

**Important considerations:**

* No memory isolation between workloads
* Performance depends on workload scheduling and resource contention
* Best suited for inference workloads rather than training
* You can configure 2-16 replicas per GPU

**Example:** A GPU configured with 4 time slices is advertised as 4 schedulable GPU resources, allowing 4 different pods to run GPU workloads on it, with each pod getting exclusive access during its allocated time window.
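The 4-slice example above can be sketched as an NVIDIA device plugin sharing configuration, typically supplied as a ConfigMap that the GPU Operator's ClusterPolicy references (the replica count here matches the example; adjust it within the supported 2-16 range):

```yaml
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 4   # advertise each physical GPU as 4 schedulable replicas
```

With this config applied, a node with one physical GPU reports `nvidia.com/gpu: 4`, and up to 4 pods can each request `nvidia.com/gpu: 1` on it. Remember that these replicas share GPU memory with no isolation between them.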
