# PMK Scale Guide

## Recommended Management Plane practices

Each PMK customer is provided a Management Plane (also known as a Deployment Unit/DU/KDU) at onboarding. This section outlines recommendations and best practices for sizing it.

The following values are listed per Management Plane Instance:

| Criteria                                                                                                                                                                                  | Value |
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- |
| Maximum number of nodes                                                                                                                                                                   | 2500  |
| Maximum number of clusters (Single node clusters)                                                                                                                                         | 300   |
| Maximum number of clusters (Small clusters - up to 8 nodes)                                                                                                               | 30    |
| Maximum number of clusters (Medium clusters - up to 200 nodes)                                                                                                            | 8     |
| Maximum number of clusters (Large clusters - up to 400 nodes)                                                                                                             | 5     |
| <p>Maximum number of clusters (Combination of medium and large clusters)<br><br>Test configuration:<br><br>- 400 Node clusters: 2<br>- 250 Node clusters: 2<br>- 200 Node clusters: 4</p> | 8     |
| Maximum number of nodes onboarded in parallel                                                                                                                                             | 30    |
| Maximum number of clusters created in parallel (Single node clusters)                                                                     | 10    |

**Note:** The values above are based on the latest Platform9 standard tests and are provided as guidance. If your requirements differ from these standard results, Platform9 support can help you scale to different numbers. To go beyond the node and cluster limits listed above, higher scale can be achieved with multiple Management Plane Instances.

{% hint style="info" %}
**This guide is applicable for PMK BareOS clusters only.**
{% endhint %}

## Recommended Cluster configuration practices

The following values are listed per PMK cluster running on a Management Plane Instance:

| Criteria                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | Value                        |
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------- |
| <p>Maximum number of nodes<br><br>Test configuration:<br><br>- <strong>Master & worker count</strong>: 5 masters, 395 workers<br>- Kubernetes version: <strong>1.26 - 1.29 (PMK 5.9 and 5.10)</strong><br>- Master node size: 18 vcpus, 30 GB memory<br>- Worker node size: 2 vcpus, 6GB memory<br>- Pod density: 23<br>- Cluster cpu usage max: 63%<br>- CNI: Calico<br>- Calico BGP: True; with Route-reflectors (3 nodes)<br>- Metallb BGP: True</p>                                                                                                  | 400                          |
| <p>Maximum number of nodes<br><br>Test configuration:<br><br>- <strong>Master & worker count</strong>: 5 masters, 395 workers<br>- Kubernetes version: <strong>1.22 - 1.25 (PMK 5.6.8, 5.7.3 and 5.9.2)</strong><br>- Master node size: 18 vcpus, 30 GB memory<br>- Worker node size: 2 vcpus, 6GB memory<br>- Pod density: 23<br>- Cluster cpu usage max: 63%<br>- CNI: Calico<br>- Calico BGP: False<br>- Metallb BGP: False</p>                                                                                                    | 300                          |
| <p>Maximum number of node upgrades in parallel in a cluster<br><br>Test configuration:<br><br>- <strong>Master & worker count</strong>: 5 masters, 395 workers<br>- Kubernetes version: 1.26 - 1.29<br>- Master node size: 18 vcpus, 30 GB memory<br>- Worker node size: 2 vcpus, 6GB memory<br>- Pod density: 23<br>- Cluster cpu usage max: 65%<br>- CNI: Calico<br>- Calico BGP: True; with Route-reflectors (3 nodes)<br>- Metallb BGP: True<br>- Upgrade versions tested: <strong>1.26->1.27</strong>, <strong>1.27->1.28, 1.28->1.29</strong></p> | 40 (10% of total 400 nodes) |
| Maximum number of nodes to be attached to a cluster in parallel                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | 15                           |
| Maximum number of nodes to be detached from a cluster in parallel                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | 30                           |
| Maximum number of pods per node                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | 110 (Kubernetes default)     |
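
The per-node pod limit is enforced by the kubelet. A minimal sketch of the relevant kubelet configuration field, shown at the Kubernetes default (on PMK-managed nodes this setting may be managed for you):

```yaml
# KubeletConfiguration fragment; maxPods caps how many pods the kubelet will run.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 110  # Kubernetes default; higher densities increase per-node load
```

Note that the test configurations above ran well below this limit, at a pod density of 23.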

**Some Test Observations:**

Test configuration:

* **Master & worker count**: 5 masters, 395 workers
* Kubernetes version: 1.26 - 1.29
* Master node size: 18 vcpus, 30 GB memory
* Worker node size: 2 vcpus, 6GB memory
* Pod density: 23
* Cluster cpu usage max: 63%
* CNI: Calico
* Calico BGP: True; with Route-reflectors (3 nodes)
* Metallb BGP: True

Observations:

* Number of pods: 9230
* Number of pods per node: 23
* Number of namespaces: 3000
* Number of secrets: 15
* Number of config maps: 1046
* Number of services: 144
* Number of pods per namespace: up to 7600 in a single namespace
* Number of services per namespace: 100
* Number of deployments per namespace: 100

Component resource recommendations:

| Number of nodes  | Component  | Limits                               | Requests                                 | Additional data                                                                                                                                                                                                                                  |
| ---------------- | ---------- | ------------------------------------ | ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| 350 to 400 nodes |            | <p>cpu: 200m<br><br>memory: 400Mi</p> | <p>cpu: 25m<br><br>memory: 100Mi</p>     | Test configuration: pod density of 23 and cpu usage around 60%                                                                                                                                                                                   |
| 300 nodes        | Prometheus |                                      | <p>cpu: 2510m<br><br>memory: 12266Mi</p> | Requests and limits can be set based on this observation. They depend on multiple factors such as the number of nodes, the number of Prometheus exporters being queried, the volume of time-series data stored, the number of calls to Prometheus, etc. |

## Management Plane Instance resource recommendations

#### Default (up to 750 nodes)

| Component  | Container  | Limits                                  | Requests                              |
| ---------- | ---------- | --------------------------------------- | ------------------------------------- |
| Qbert      | qbert      | <p>cpu: 1500m<br><br>memory: 4000Mi</p> | <p>cpu: 40m<br><br>memory: 550Mi</p>  |
| Resmgr     | resmgr     | <p>cpu: 1000m<br><br>memory: 1500Mi</p> | <p>cpu: 25m<br><br>memory: 190Mi</p>  |
| Keystone   | keystone   | <p>cpu: 1000m<br><br>memory: 1000Mi</p> | <p>cpu: 250m<br><br>memory: 800Mi</p> |
| Prometheus | prometheus | <p>cpu: 1000m<br><br>memory: 4000Mi</p> | <p>cpu: 250m<br><br>memory: 200Mi</p> |
| Vault      | pf9-vault  | <p>cpu: 500m<br><br>memory: 500Mi</p>   | <p>cpu: 25m<br><br>memory: 100Mi</p>  |

#### Scaled configurations (750 to 2500 nodes)

| Component                           | Container                           | Limits (750-1500 nodes)                 | Requests (750-1500 nodes)             | Limits (**1500-2500 nodes**)            | Requests (**1500-2500 nodes**)        | Additional changes                     |
| ----------------------------------- | ----------------------------------- | --------------------------------------- | ------------------------------------- | --------------------------------------- | ------------------------------------- | -------------------------------------- |
| Prometheus                          | socat19090                          | <p>cpu: 1000m<br><br>memory: 1500Mi</p> | <p>cpu: 250m<br><br>memory: 400Mi</p> | No Change                               | No Change                             | maxchild: 2500                         |
|                                     | prometheus                          | <p>cpu: 1000m<br><br>memory: 4000Mi</p> | <p>cpu: 250m<br><br>memory: 200Mi</p> | No Change                               | No Change                             | <p>WEB\_MAX\_<br>CONNECTIONS: 4000</p> |
| Rabbitmq                            | socat5673                           | <p>cpu: 400m<br><br>memory: 1000Mi</p>  | <p>cpu: 50m<br><br>memory: 50Mi</p>   | <p>cpu: 800m<br><br>memory: 1800 Mi</p> | <p>cpu: 200m<br><br>memory: 200Mi</p> |                                        |
|                                     | rabbitmq                            | <p>cpu: 1000m<br><br>memory: 1500Mi</p> | <p>cpu: 130m<br><br>memory: 750Mi</p> | No Change                               | No Change                             |                                        |
| Resmgr                              | socat18083                          | <p>cpu: 1000m<br><br>memory: 1500Mi</p> | <p>cpu: 250m<br><br>memory: 400Mi</p> |                                         |                                       |                                        |
| Ingress-nginx-controller            | socat444                            | <p>cpu: 400m<br><br>memory: 1000Mi</p>  | <p>cpu: 50m<br><br>memory: 50Mi</p>   |                                         |                                       |                                        |
| Sidekickserver                      | socat13010                          | <p>cpu: 400m<br><br>memory: 1000Mi</p>  | <p>cpu: 50m<br><br>memory: 50Mi</p>   |                                         |                                       |                                        |
|                                     | sidekickserver                      | <p>cpu: 500m<br><br>memory: 1000Mi</p>  | <p>cpu: 50m<br><br>memory: 100Mi</p>  |                                         |                                       |                                        |
| Sunpike conductor                   | socat19111                          | <p>cpu: 400m<br><br>memory: 1000Mi</p>  | <p>cpu: 50m<br><br>memory: 50Mi</p>   |                                         |                                       |                                        |
| Pf9-vault                           | vault                               | <p>cpu: 1250m<br><br>memory: 800Mi</p>  | <p>cpu: 250m<br><br>memory: 400Mi</p> |                                         |                                       |                                        |
| Sunpike-apiserver                   | sunpike-apiserver                   | <p>cpu: 1000m<br><br>memory: 1000Mi</p> | <p>cpu: 500m<br><br>memory: 256Mi</p> |                                         |                                       |                                        |
| Sunpike-conductor                   | <p>sunpike-<br>conductor</p>        | <p>cpu: 1000m<br><br>memory: 1000Mi</p> | <p>cpu: 200m<br><br>memory: 500Mi</p> |                                         |                                       |                                        |
| Sunpike-kine                        | sunpike-kine                        | <p>cpu: 1000m<br><br>memory: 256Mi</p>  | <p>cpu: 25m<br><br>memory: 256Mi</p>  |                                         |                                       |                                        |
| <p>Sunpike-kube-<br>controllers</p> | <p>sunpike-kube-<br>controllers</p> | <p>cpu: 500m<br><br>memory: 1000Mi</p>  | <p>cpu: 25m<br><br>memory: 800Mi</p>  |                                         |                                       |                                        |
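
The limits and requests above can be applied with `kubectl set resources`. The namespace below is a placeholder and will differ per Management Plane installation; for example, to move the RabbitMQ socat side-car container to the 1500-2500 node tier:

```shell
# <management-plane-namespace> is a placeholder; adjust to your installation.
kubectl -n <management-plane-namespace> set resources deployment/rabbitmq \
  --containers=socat5673 \
  --limits=cpu=800m,memory=1800Mi \
  --requests=cpu=200m,memory=200Mi
```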

**MySQL/RDS config changes:**

| Configuration       | Value (750-1500 nodes) | Value (1500-2500 nodes) |
| ------------------- | ---------------------- | ---------------------- |
| max\_connections    | 2048                   | No change              |
| max\_connect\_errors | 1000                   | No change              |
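
Assuming a standard MySQL configuration file (on RDS, the same settings go in a parameter group), the values above map to a fragment like:

```ini
# my.cnf fragment for 750-1500 nodes (values apply unchanged up to 2500 nodes)
[mysqld]
max_connections    = 2048
max_connect_errors = 1000
```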
