# Per-Backend Performance Tuning and Troubleshooting

## Overview

Storage performance problems often surface as slow volume creation, intermittent attach failures, or volume operations that time out. This guide covers the most impactful tuning knobs and common failure patterns for NFS-backed backends — specifically NetApp ONTAP NFS and Tintri NFS — because NFS introduces unique considerations around mount options, share capacity balance, and driver RPC timeouts.

For SAN-backed backends (iSCSI, Fibre Channel), the performance guidance in the vendor-specific configuration pages applies; this guide focuses on NFS.

In this guide, you will identify the cause of NFS storage performance problems and apply configuration changes to resolve them.

## Prerequisites

* Block storage host access for log review and configuration changes.
* NFS exports accessible and mounted on block storage hosts.
* `pcdctl` configured and authenticated.

## Diagnose a Slow or Timing-Out NFS Backend

### Check `get_volume_stats` RPC Latency

The Persistent Storage Service relies on periodic `get_volume_stats` calls to each backend to refresh capacity and capability information. If these calls are slow or timing out, volume creation can fail with `No valid host was found` even when capacity is available, because the scheduler is working from stale or missing pool data.

Check for RPC timeout errors in the storage service log:

```bash
sudo grep -E 'get_volume_stats|Timeout|RPC|MessagingTimeout' /var/log/pf9/cindervolume-base.log | tail -50
```

A `MessagingTimeout` or `Timeout waiting for get_volume_stats` message means the backend took longer than the configured RPC timeout to respond.
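
To tell a one-off timeout from a persistent problem, it can help to count the matching messages per hour. This is a minimal sketch that assumes each log line starts with the standard oslo.log timestamp (`YYYY-MM-DD HH:MM:SS.mmm`). The function name `timeouts_per_hour` is hypothetical.

```bash
# timeouts_per_hour: reads storage service log lines on stdin and prints one
# "<date> <hour>:00 <count>" line per hour that contains RPC timeout messages.
# Assumes oslo.log-style lines: "2024-05-01 10:01:02.123 PID LEVEL ...".
timeouts_per_hour() {
  awk '
    /MessagingTimeout|Timeout waiting for get_volume_stats/ {
      hour = $1 " " substr($2, 1, 2) ":00"   # bucket by date and hour
      count[hour]++
    }
    END { for (h in count) print h, count[h] }
  ' | sort
}

# Example:
#   sudo cat /var/log/pf9/cindervolume-base.log | timeouts_per_hour
```

Timeouts that cluster in the same hours every day may point to a scheduled job competing for the NFS server. Timeouts spread evenly across the day suggest that the backend is consistently slower than the RPC timeout allows.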

**Remediation:**

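The default RPC timeout (`rpc_response_timeout`) is 60 seconds. If your backend consistently takes longer than that to respond (common for large NFS shares with many volumes), adjust two settings in `cinder.conf`. First, in your backend section, set the size of the native thread pool that the backend uses for blocking driver calls: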

```ini
# Add to your backend section in cinder.conf
backend_native_threads_pool_size = 20
```

Then increase the global RPC timeout in the `[DEFAULT]` section:

```ini
[DEFAULT]
rpc_response_timeout = 180
```
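
Before restarting, you may want to confirm that each setting landed in the intended section. The helper below is a hypothetical sketch, not a `pcdctl` or Persistent Storage Service command. It prints the value of a key within one section of an INI file, skips comment lines, and does not handle line continuations. The backend section name `netapp-nfs` in the example comment is a placeholder.

```bash
# ini_get FILE SECTION KEY: print the value of KEY inside [SECTION] of FILE.
# Hypothetical helper; ignores lines starting with '#' or ';'.
ini_get() {
  awk -v section="$2" -v key="$3" '
    /^[ \t]*[#;]/ { next }                      # skip comment lines
    /^[ \t]*\[/ {                               # section header
      name = $0
      gsub(/^[ \t]*\[|\][ \t]*$/, "", name)
      in_section = (name == section)
      next
    }
    in_section && match($0, /^[ \t]*[^=]+=/) {  # "key = value" line
      k = substr($0, 1, RLENGTH - 1)            # text before the first "="
      v = substr($0, RLENGTH + 1)               # text after the first "="
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) print v
    }
  ' "$1"
}

# Example (run as a user that can read the file; "netapp-nfs" is a placeholder):
#   CONF=/opt/pf9/etc/pf9-cindervolume-base/conf.d/cinder.conf
#   ini_get "$CONF" DEFAULT rpc_response_timeout
#   ini_get "$CONF" netapp-nfs backend_native_threads_pool_size
```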

Restart `pf9-cindervolume-base` after making these changes:

```bash
sudo systemctl restart pf9-cindervolume-base
```

### Verify NFS Mount Options

Incorrect or suboptimal NFS mount options are among the most common causes of NFS backend performance problems. The mount options used by the driver are set in the backend configuration's `nfs_mount_options` parameter and applied when the Persistent Storage Service mounts NFS shares.

**Check current mounts:**

```bash
mount | grep nfs
```

**Recommended NFS mount options for block storage backends:**

| Option                      | Purpose                                                                               |
| --------------------------- | ------------------------------------------------------------------------------------- |
| `vers=3`                    | Use NFSv3 (required for some drivers; check driver documentation)                     |
| `rsize=262144,wsize=262144` | Set the read/write transfer size to 256 KiB for throughput (the server can negotiate a smaller value) |
| `nconnect=16`               | Open multiple TCP connections per mount for parallelism (Linux 5.3+)                  |
| `noatime`                   | Disable access-time updates to reduce metadata write load                             |
| `lookupcache=pos`           | Cache only positive lookup results; avoids stale "file not found" results when other hosts create files |
| `hard,intr`                 | Hard mounts retry I/O until the server responds instead of returning errors, preventing silent data loss on network interruption; Linux 2.6.25 and later ignore `intr` |
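
To compare live mounts against the options in the table above, you can read `/proc/mounts`, which lists the options the kernel actually applied. The function below is a hypothetical sketch. The kernel can report negotiated values (for example, a smaller `rsize`) or omit options it ignores (such as `intr`), so treat a reported gap as a prompt for a closer look rather than as proof of misconfiguration.

```bash
# check_nfs_opts OPTION...: reads /proc/mounts-format lines on stdin and prints
# "<mount point> missing: <option> ..." for each NFS mount that lacks any of
# the given options (exact string match against the comma-separated list).
check_nfs_opts() {
  awk -v required="$*" '
    $3 == "nfs" || $3 == "nfs4" {
      split("", present)                  # reset the lookup table
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) present[opts[i]] = 1
      m = split(required, want, " ")
      missing = ""
      for (j = 1; j <= m; j++)
        if (!(want[j] in present)) missing = missing " " want[j]
      if (missing != "") print $2 " missing:" missing
    }
  '
}

# Example:
#   check_nfs_opts vers=3 rsize=262144 wsize=262144 noatime hard < /proc/mounts
```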

{% hint style="warning" %}
**Test mount option changes in a non-production environment first**

Changing NFS mount options requires unmounting and remounting active shares. This will interrupt any in-progress volume operations. Schedule this change during a maintenance window. After changing `nfs_mount_options` in the configuration, restart `pf9-cindervolume-base` so the shares are re-mounted with the new options. A share that is still mounted when the service starts may keep its old options until it is unmounted.
{% endhint %}

## NetApp ONTAP NFS — Tuning and Troubleshooting

### Slow Volume Creation: FlexClone Not Available

NetApp NFS volume creation uses FlexClone when creating volumes from images or snapshots. If FlexClone is not licensed or not enabled on the SVM, the driver falls back to a full file copy, which is significantly slower.

Verify FlexClone availability:

```bash
sudo grep -i flexclone /var/log/pf9/cindervolume-base.log | tail -20
```

A log line like `FlexClone feature is not available` confirms the fallback. Contact your NetApp administrator to enable the FlexClone license on the SVM.

### NFS Share Capacity Imbalance

NetApp NFS backends are typically configured with multiple NFS shares (multiple ONTAP FlexVol exports). The driver distributes volumes across shares by free capacity. If shares become unevenly loaded, the driver keeps placing new volumes on the same share until free capacity across the shares evens out.

**Detect imbalance:**

```bash
pcdctl --os-volume-api-version 3.12 volume backend pool list --detail | grep -E 'pool_name|free_capacity|total_capacity'
```

Each share appears as a separate pool. A significant difference in `free_capacity_gb` across pools indicates an imbalance.
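
To put a number on the difference, you can compute the gap between the pools with the least and most free space. This sketch assumes you have already reduced the pool list output to one `<pool_name> <free_capacity_gb>` pair per line, either by hand or with your own parsing. The function name `free_spread` is hypothetical, and what counts as a significant gap depends on the volume sizes you typically create.

```bash
# free_spread: reads "<pool> <free_gb>" lines on stdin and prints the pools
# with the least and most free capacity, plus the gap between them in GB.
free_spread() {
  awk '
    NF >= 2 {
      free = $2 + 0
      if (!seen || free < min) { min = free; min_pool = $1 }
      if (!seen || free > max) { max = free; max_pool = $1 }
      seen = 1
    }
    END {
      if (seen)
        printf "least free: %s (%g GB)\nmost free: %s (%g GB)\ngap: %g GB\n",
               min_pool, min, max_pool, max, max - min
    }
  '
}
```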

**Remediation options:**

1. **Expand the underloaded shares.** Increase the FlexVol quota on lightly loaded ONTAP volumes so the driver places more new volumes there.
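2. **Migrate volumes off overloaded shares.** Use volume migration (or a retype with an on-demand migration policy, when the new volume type targets a different pool) to move volumes from a full share's pool to a less-loaded one.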
3. **Add a new share.** Add a new NFS export to `nfs_shares_config` and restart `pf9-cindervolume-base`. The driver will prefer it for new placements until it reaches parity.
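
For option 3, the shares file referenced by `nfs_shares_config` lists one share per line in `<server>:/<export>` form. A small guard like the hypothetical `add_nfs_share` below avoids appending a duplicate entry. It compares only the first field of each line, so an existing entry still matches if it carries extra text after the share. The address `10.0.0.5` and the export names are placeholders.

```bash
# add_nfs_share FILE SHARE: append SHARE (e.g. "10.0.0.5:/vol_cinder3") to the
# shares file FILE unless a line whose first field equals SHARE already exists.
add_nfs_share() {
  local file=$1 share=$2
  if [ -f "$file" ] && awk -v s="$share" '$1 == s { found = 1 } END { exit !found }' "$file"; then
    echo "already present: $share"
  else
    printf '%s\n' "$share" >> "$file"
    echo "added: $share"
  fi
}

# Example (use the path set by nfs_shares_config; run as a user that can write it):
#   add_nfs_share /path/from/nfs_shares_config 10.0.0.5:/vol_cinder3
```

After adding the share, restart `pf9-cindervolume-base` so that the driver mounts it.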

### Recommended `nfs_mount_options` for NetApp ONTAP NFS

```ini
nfs_mount_options = vers=3,rsize=262144,wsize=262144,nconnect=16,noatime,write=eager,lookupcache=pos
```

Notes on selected options:

* `write=eager` — Enables eager write flushing, which reduces write latency on ONTAP NFS exports.
* `nconnect=16` — Opens 16 parallel TCP connections to the NFS server, improving throughput for concurrent I/O. Requires Linux kernel 5.3 or later.
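
Because `nconnect` depends on the client kernel, you may want to check the block storage host's kernel release before adding it. This hypothetical sketch compares the major and minor version from `uname -r` against 5.3. Distribution kernels sometimes backport features, so a release below 5.3 is not always proof that `nconnect` is unsupported.

```bash
# supports_nconnect [RELEASE]: succeeds (exit 0) if the kernel release, by
# default the running kernel's `uname -r`, is 5.3 or later.
supports_nconnect() {
  local release major rest minor
  release=${1:-$(uname -r)}
  major=${release%%.*}            # "5.15.0-91-generic" -> "5"
  rest=${release#*.}              # -> "15.0-91-generic"
  minor=${rest%%[!0-9]*}          # -> "15"
  major=${major%%[!0-9]*}
  [ "${major:-0}" -gt 5 ] || { [ "${major:-0}" -eq 5 ] && [ "${minor:-0}" -ge 3 ]; }
}

# Example:
#   supports_nconnect && echo "kernel >= 5.3" || echo "check with your distribution"
```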

### Check Pool Name Filtering

If volumes are not being placed on the expected ONTAP FlexVols, verify the `netapp_pool_name_search_pattern` setting. An overly restrictive regex can exclude pools that should be eligible:

```bash
sudo grep netapp_pool_name_search_pattern /opt/pf9/etc/pf9-cindervolume-base/conf.d/cinder.conf
```

Test the regex against your pool names using the pool list:

```bash
pcdctl --os-volume-api-version 3.12 volume backend pool list --detail | grep pool_name
```
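
To see which names your pattern admits, you can run each candidate pool name through the same regular expression. This sketch assumes that the driver requires the whole pool name to match the pattern. It uses `grep -E` (POSIX extended regular expressions), which agrees with Python's `re` module for simple patterns but not for every construct (for example, `\d` or lookaheads). Treat the result as a quick approximation. `match_pools` is a hypothetical name.

```bash
# match_pools PATTERN: reads candidate pool names on stdin and labels each one
# "match:" or "skip:", requiring the whole name to match PATTERN (grep -E -x).
match_pools() {
  while IFS= read -r pool; do
    if printf '%s\n' "$pool" | grep -Eqx -- "$1"; then
      echo "match: $pool"
    else
      echo "skip: $pool"
    fi
  done
}

# Example (pool names copied from the pool list output):
#   printf '%s\n' cinder_vol1 backup_vol | match_pools 'cinder_vol[0-9]+'
```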

## Tintri NFS — Tuning and Troubleshooting

### Volume Operations Timing Out

The Tintri driver communicates with the Tintri REST API for operations such as snapshot management and QoS reporting. If the REST API endpoint (`vmstore_rest_address`) is unreachable or slow, volume operations can time out.

**Check REST API connectivity:**

```bash
curl -sk https://<TINTRI_MGMT_IP>/api/v310/info | python3 -m json.tool
```

A successful response returns a JSON object with server version information. An error or timeout indicates a network or Tintri appliance issue.

**Check for REST API errors in storage service logs:**

```bash
sudo grep -E 'vmstore|TintriError|ConnectionError|Timeout' /var/log/pf9/cindervolume-base.log | tail -50
```
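
To see which of these error types dominates, a single pass that tallies each search term can be easier to read than the raw lines. The hypothetical `count_patterns` below does fixed-string, case-sensitive matching, so terms cannot contain spaces. A line that contains several terms is counted once for each term.

```bash
# count_patterns TERM...: reads log lines on stdin and prints "<term> <count>"
# for each TERM, counting lines that contain it as a fixed string.
count_patterns() {
  awk -v terms="$*" '
    BEGIN { n = split(terms, t, " ") }
    { for (i = 1; i <= n; i++) if (index($0, t[i])) c[i]++ }
    END { for (i = 1; i <= n; i++) print t[i], c[i] + 0 }
  '
}

# Example:
#   sudo tail -n 5000 /var/log/pf9/cindervolume-base.log |
#     count_patterns vmstore TintriError ConnectionError Timeout
```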

### Recommended `nfs_mount_options` for Tintri NFS

The Tintri driver requires NFSv3. The following options are recommended:

```ini
nfs_mount_options = vers=3,proto=tcp,lookupcache=pos,nolock,noacl
```

Do not add `nconnect` for Tintri unless your Tintri firmware version supports it. Check with your Tintri administrator before enabling parallel connections.

### NFS Share Capacity: Tintri Single-Share Deployments

Unlike NetApp NFS, a Tintri backend is typically configured with a single NFS share (`nas_share_path`). Capacity exhaustion on that share means the entire backend is unavailable for new volumes.

**Monitor share utilization:**

```bash
pcdctl --os-volume-api-version 3.12 volume backend pool list --detail | grep -E 'pool_name|free_capacity|total_capacity'
```

When `free_capacity_gb` drops below the volume sizes you regularly create, add capacity on the Tintri appliance or migrate some volumes to another backend.
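
To turn this rule of thumb into a check you can run on a schedule, divide the reported free capacity by a typical volume size. This is a hypothetical sketch. The default threshold of 10 volumes is an arbitrary placeholder, and the arithmetic ignores thin provisioning and over-subscription, which can change how much capacity the scheduler actually considers available.

```bash
# headroom FREE_GB SIZE_GB [MIN_COUNT]: report how many volumes of SIZE_GB fit
# in FREE_GB; exit 1 with a warning when fewer than MIN_COUNT (default 10) fit.
headroom() {
  awk -v free="$1" -v size="$2" -v min="${3:-10}" 'BEGIN {
    if (size <= 0) { print "SIZE_GB must be positive"; exit 2 }
    fit = int(free / size)
    printf "%d volumes of %s GB fit in %s GB free\n", fit, size, free
    if (fit < min + 0) { print "WARNING: fewer than " min " volumes fit"; exit 1 }
  }'
}

# Example: headroom 450 100 10   # free_capacity_gb from the pool list, 100 GB volumes
```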

### QCOW2 Volume Format Performance

Tintri stores volumes in QCOW2 format when `vmstore_qcow2_volumes = true`. QCOW2 enables thin provisioning and space-efficient snapshots. However, QCOW2 has slightly higher I/O overhead for random-write workloads compared to raw volumes.

If you observe higher-than-expected write latency on Tintri-backed volumes, benchmark with `vmstore_qcow2_volumes` set to `true` and then to `false` in a non-production environment to determine whether the overhead is significant for your workload. Changing this setting affects only newly created volumes; existing volumes retain their format.
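
One way to run that comparison is to execute the same random-write `fio` job against a scratch volume of each format and compare the reported average write latency. The job parameters in the comment are illustrative assumptions, not tuned recommendations. Writing directly to a device destroys its contents, so target only a throwaway test volume (`/dev/vdb` is a placeholder). The hypothetical `overhead_pct` helper then expresses the QCOW2 result relative to the raw baseline.

```bash
# Run inside a test instance against each scratch volume in turn, e.g.:
#   fio --name=randwrite --filename=/dev/vdb --rw=randwrite --bs=4k \
#       --ioengine=libaio --iodepth=32 --direct=1 --runtime=60 --time_based

# overhead_pct BASELINE CANDIDATE: print how much larger CANDIDATE is than
# BASELINE, as a percentage with one decimal place (negative if smaller).
overhead_pct() {
  awk -v base="$1" -v cand="$2" 'BEGIN { printf "%.1f%%\n", (cand - base) / base * 100 }'
}

# Example: average write latency of 800 us (raw) vs 1000 us (QCOW2)
#   overhead_pct 800 1000
```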

## General NFS Troubleshooting Steps

1. **Verify NFS connectivity** from the block storage host to the NFS server:

   ```bash
   showmount -e <NFS_SERVER_IP>
   ```
2. **Check whether shares are mounted:**

   ```bash
   mount | grep <NFS_SERVER_IP>
   ```

   If shares are not mounted, check that the NFS client package is installed (`nfs-common` on Debian and Ubuntu, `nfs-utils` on RHEL-based distributions) and that the NFS server is reachable.
3. **Check for stale file handles:**

   ```bash
   sudo grep 'Stale file handle' /var/log/pf9/cindervolume-base.log | tail -20
   ```

   A stale file handle means an NFS mount became invalid (server restarted or export removed and recreated). Restart `pf9-cindervolume-base` to force a re-mount.
4. **Verify available disk space on NFS exports:**

   ```bash
   df -h | grep <NFS_SERVER_IP>
   ```
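
To scan every mounted export at once rather than reading the `df` output by eye, you can filter on the use percentage. This hypothetical sketch reads `df -P` output (the POSIX format, one filesystem per line), treats any source that contains `:/` as an NFS export, and assumes that mount paths contain no spaces.

```bash
# full_exports [THRESHOLD]: reads `df -P` output on stdin and prints
# "<mount point> <use%>" for NFS exports at or above THRESHOLD percent (default 90).
full_exports() {
  awk -v limit="${1:-90}" '
    NR > 1 && $1 ~ /:\// {
      pct = $5
      sub(/%$/, "", pct)
      if (pct + 0 >= limit + 0) print $6, $5
    }
  '
}

# Example:
#   df -P | full_exports 85
```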

## Next Steps

* For placement issues that affect which backend receives a volume, see [Persistent Storage Service Backend Selection and Tuning](/private-cloud-director/storage/troubleshooting-and-log-files/backend-selection-and-tuning.md).
* For NetApp-specific configuration parameters, see [NetApp Storage Configurations](/private-cloud-director/storage/block-storage/volume-backend-configuration-examples/netapp-storage-configurations.md).
* For Tintri-specific configuration parameters, see [Tintri Storage Configurations](/private-cloud-director/storage/block-storage/volume-backend-configuration-examples/tintri-storage-configurations.md).

