
# Image Service Troubleshooting Guide

## Problem

This guide addresses frequent image management issues in the image service, such as image upload failures, slow performance, and incorrect metadata. It provides actionable steps for diagnosing and resolving common errors so that the image service stays reliable and available.

## Environment

* Private Cloud Director Virtualization - v2025.4 and Higher
* Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher
* Component - PCD Image service

## Deep Dive

### Image Creation Flow

The image creation process in Private Cloud Director is managed by the **Glance** service. The flow begins when a user uploads a new image file, which is then processed and stored.

{% stepper %}
{% step %}

#### User Request

A user initiates an image upload via the OpenStack CLI, the Private Cloud Director dashboard, or a direct API call. The request includes the image file and metadata (e.g., name, format, disk format).
{% endstep %}

{% step %}

#### API Service

The **Glance API** service receives the request, validates the user's authentication token with **Keystone**, and checks for permissions and quotas. The Glance API logs below show the token being used.

{% tabs %}
{% tab title="Sample Logs" %}

```text
INFO glance.api.v2.image_data [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] Unable to create trust: no such option collect_timing in group [keystone_authtoken] Use the existing user token.
```

{% endtab %}
{% endtabs %}
{% endstep %}

{% step %}

#### Image Validation and Upload Request

The Glance service validates the [image format](https://platform9.com/docs/private-cloud-director/private-cloud-director/image-library---images#image-formats) and confirms that the image's virtual size meets the requirements. To check the virtual size yourself, run `qemu-img info <image_name>.qcow2` on the Glance host. A `PUT /v2/images/<IMAGE_UUID>/file` request then uploads the image data to a temporary staging area. The default staging location is `/var/lib/glance/os_glance_staging_store/`; if the default directory was changed, check the custom location instead.

{% tabs %}
{% tab title="Sample Logs" %}

```text
INFO glance.location [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] Image format matched and virtual size computed: 41126400
INFO eventlet.wsgi.server [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] 127.0.0.1 - - [..] "PUT /v2/images/[IMAGE_UUID]/file HTTP/1.0" 204 468 2.400140
```

{% endtab %}
{% endtabs %}
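
To compare a local file against the virtual size reported in the log, the byte count can be pulled out of the `qemu-img info` text output. The following is a minimal sketch, assuming the human-readable output format with a top-level `virtual size: ... (N bytes)` line; the function name `virtual_size_bytes` is hypothetical.

```bash
# Hypothetical helper: read `qemu-img info` text output on stdin and
# print the top-level virtual size in bytes.
# The pattern is anchored at the start of the line so that indented
# lines belonging to child nodes are ignored.
virtual_size_bytes() {
  sed -n 's/^virtual size: .*(\([0-9][0-9]*\) bytes)$/\1/p' | head -n 1
}

# Usage on the Glance host (file name is a placeholder):
#   qemu-img info <image_name>.qcow2 | virtual_size_bytes
```

If the function prints nothing, the output did not contain a recognizable `virtual size` line, and the file is worth inspecting manually.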
{% endstep %}

{% step %}

#### Glance API to Registry

The API service then communicates with the **Glance Registry**, which creates a new entry for the image in the Glance database. The status is set to `queued`.
{% endstep %}

{% step %}

#### Glance Service

The Glance API hands off the request to the **pf9-glance-api** service, which moves the image data from the staging area to the backend storage (e.g., Swift, Ceph, or a local file system). The default image file storage location is `/var/opt/imagelibrary/data/glance/`.
{% endstep %}

{% step %}

#### Status Update

Once the image is successfully stored, the Glance Store service updates the image's status in the database from `queued` to `active`. The image is now ready for use. The Glance audit log on the host (`/var/log/pf9/glance-audit.log`) records details of the request, such as the username, image UUID, and outcome.

{% tabs %}
{% tab title="Sample Logs" %}

```text
INFO oslo.messaging.notification.audit.http.response [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] {"message_id": "[Audit_Message_ID]", "publisher_id": "glance-api", "event_type": "audit.http.response", "priority": "INFO", "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event", "eventType": "activity", "id": "[Activity_ID]", "eventTime": "[..]", "action": "update", "outcome": "success", "observer": {"id": "target"}, "initiator": {"id": "[..]", "typeURI": "service/security/account/user", "name": "[USER_NAME]", "credential": {"token": "***", "identity_status": "Confirmed"}, "host": {"address": "127.0.0.1", "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36"}, "project_id": "[PROJECT_UUID]"}, "target": {"id": "unknown", "typeURI": "unknown", "name": "unknown"}, "requestPath": "/v2/images/[IMAGE_UUID]/file", "tags": ["correlation_id?value=[..]"], "reason": {"reasonType": "HTTP", "reasonCode": "204"}, "reporterchain": [{"role": "modifier", "reporterTime": "[..]", "reporter": {"id": "target"}}]}, "timestamp": "[..]"}
```

{% endtab %}
{% endtabs %}
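
Audit entries are long, so it can help to reduce them to the fields that matter for one image. The following is a minimal sketch, assuming the audit lines follow the layout of the sample above (with `"action"` immediately followed by `"outcome"`, and a later `"reasonCode"`); the function name `audit_outcomes` is hypothetical.

```bash
# Hypothetical helper: read audit log lines on stdin and, for entries whose
# requestPath refers to the given image UUID, print "action outcome reasonCode".
audit_outcomes() {
  grep -F "\"requestPath\": \"/v2/images/$1" |
    sed -n 's/.*"action": "\([^"]*\)", "outcome": "\([^"]*\)".*"reasonCode": "\([^"]*\)".*/\1 \2 \3/p'
}

# Usage on the host:
#   sudo cat /var/log/pf9/glance-audit.log | audit_outcomes <IMAGE_UUID>
```

Lines that do not match the assumed layout are silently skipped, so an empty result does not by itself prove that no request was made.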
{% endstep %}
{% endstepper %}

***

### Image Deletion Flow

The deletion process also uses the Glance services to remove the image's data and its database entry.

{% stepper %}
{% step %}

#### User Request

A user sends a deletion request via the OpenStack CLI, the Private Cloud Director dashboard, or a direct API call. The request identifies the image (e.g., by image ID or name).
{% endstep %}

{% step %}

#### API Service

The **Glance API** service receives the request, validates the user's authentication token with **Keystone**, performs a permission check, and changes the image's status in the database to `deleting`. The Glance API logs below show the token being used.

{% tabs %}
{% tab title="Sample Logs" %}

```text
INFO glance.api.v2.image_data [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] Unable to create trust: no such option collect_timing in group [keystone_authtoken] Use the existing user token.
```

{% endtab %}
{% endtabs %}
{% endstep %}

{% step %}

#### Glance Service

The Glance API hands off the request to the **pf9-glance-api** service, which removes the image data from the backend storage (e.g., Swift, Ceph, or a local file system). The default image file storage location is `/var/opt/imagelibrary/data/glance/`. This step frees up disk space.

{% tabs %}
{% tab title="Sample logs" %}

```text
INFO eventlet.wsgi.server [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] 127.0.0.1 - - [..] "DELETE /v2/images/[IMAGE_UUID] HTTP/1.0" 204 468 1.924124
```

{% endtab %}
{% endtabs %}
{% endstep %}

{% step %}

#### Final Status Update

Once the data is confirmed to be deleted from the backend store, the Glance API removes the image's database entry, completing the deletion process.
{% endstep %}
{% endstepper %}

## Procedure

The following steps outline how to troubleshoot image issues.

{% stepper %}
{% step %}

#### Review image details

Review image details like status and any errors.

{% tabs %}
{% tab title="Command" %}

```bash
$ openstack image show <IMAGE_UUID>
```

{% endtab %}
{% endtabs %}
{% endstep %}

{% step %}

#### Validate Glance endpoints

Verify that the Glance image endpoints are available and that the public endpoint responds to a `curl` request. The request should return the Glance version information.

{% tabs %}
{% tab title="Command" %}

```bash
$ openstack endpoint list --service glance
$ openstack endpoint list --service glance-cluster
$ curl -s https://<FQDN>/glance/
```

{% endtab %}
{% endtabs %}
{% endstep %}

{% step %}

#### Check if the image service is enabled

{% tabs %}
{% tab title="Command" %}

```bash
$ openstack service list | grep -i image
$ openstack service show <GLANCE/GLANCE_CLUSTER_UUID>
```

{% endtab %}
{% endtabs %}
{% endstep %}

{% step %}

#### Check glance-api pod (Self-Hosted only)

The management plane has a **glance-api** pod to provide the image service. Check if the glance-api pod is running in the workload region namespace. Review this pod:

{% hint style="info" %}
**Info**

Step 4 is applicable only for Self-Hosted Private Cloud Director
{% endhint %}

* Check whether the pod is in a `CrashLoopBackOff`, `OOMKilled`, `Pending`, `Error`, or `Init` state.
* Verify that all containers in the pod are Running.
* Review the Events section in the `kubectl describe` output.
* Search the pod logs by `REQ_ID` or image UUID for relevant details.

{% tabs %}
{% tab title="Command" %}

```bash
$ kubectl get pods -o wide -n <WORKLOAD_REGION> | grep -i "glance"

$ kubectl describe pod -n <WORKLOAD_REGION> <GLANCE_API_POD>

$ kubectl logs -n <WORKLOAD_REGION> <GLANCE_API_POD>
```

{% endtab %}
{% endtabs %}
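
The pod checks above can be partly automated by filtering the `kubectl get pods` output. The following is a minimal sketch, assuming the standard column order `NAME READY STATUS ...`; the function name `unhealthy_pods` is hypothetical.

```bash
# Hypothetical helper: read `kubectl get pods` output on stdin and print
# "NAME READY STATUS" for every pod that is not Running or whose ready
# container count is lower than its total (for example 1/2).
# Pods of finished jobs (STATUS Completed) are also listed.
unhealthy_pods() {
  awk '$1 != "NAME" {
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Usage:
#   kubectl get pods -o wide -n <WORKLOAD_REGION> | grep -i "glance" | unhealthy_pods
```

An empty result means every listed pod is Running with all containers ready; it does not replace reviewing the events and logs when an issue is still suspected.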
{% endstep %}

{% step %}

#### Validate pf9-glance-api service on host

Verify that the **pf9-glance-api** service is running on the host where the Glance role is applied.

{% tabs %}
{% tab title="Command" %}

```bash
$ sudo systemctl status pf9-glance-api
```

{% endtab %}
{% endtabs %}
{% endstep %}

{% step %}

#### Review host logs

On the host, review the `/var/log/pf9/glance-api.log` to track the relevant events against a specific image ID.
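
To narrow the log down to problem entries for one image, the file can be filtered by image ID and log level. The following is a minimal sketch, assuming the log level appears as a separate word on each line (as `INFO` does in the samples above); the function name `image_log_problems` is hypothetical.

```bash
# Hypothetical helper: read log lines on stdin and print only the
# WARNING, ERROR, or CRITICAL entries that mention the given image ID.
image_log_problems() {
  grep -F -- "$1" | grep -E '(^| )(WARNING|ERROR|CRITICAL) '
}

# Usage on the host:
#   sudo cat /var/log/pf9/glance-api.log | image_log_problems <IMAGE_UUID>
```

Not every failure is logged with the image ID on the same line, so searching by request ID as a second pass may surface additional entries.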
{% endstep %}

{% step %}

#### Escalation

If these steps do not resolve the issue, contact the [Platform9 Support Team](https://support.platform9.com/hc/en-us) for additional assistance.
{% endstep %}
{% endstepper %}

## Most Common Causes

* The Image Library Service [Pre-Requisites](https://platform9.com/docs/private-cloud-director/private-cloud-director/image-library---images#prerequisites) are not met.
* While uploading an image, the `admin.rc` file does not have the `OS_INTERFACE` variable set to `admin`.
* Incorrect image format. See [Supported Image Formats](https://platform9.com/docs/private-cloud-director/private-cloud-director/image-library---images#image-formats).
* The `pf9-glance-api` service is down on the Image Library host.
* In Self-Hosted deployments, the `--insecure` flag was not used with the `pcdctl` command and the Image Library host uses a self-signed certificate.

***

## Troubleshoot Image Upload and Queued Status <a href="#troubleshoot-image-upload-and-queued-status" id="troubleshoot-image-upload-and-queued-status"></a>

### Overview

When you upload an image, it passes through the following status sequence: `queued` → `saving` → `active`. An image that stays in `queued` indefinitely means the Image Library Service accepted the metadata but was unable to receive or store the file data. The sections below walk through the most common causes and how to resolve them.
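
To tell a slow upload from a stuck one, the status can be polled a bounded number of times. The following is a minimal sketch, assuming the status is fetched with `openstack image show -f value -c status`; the function name `wait_for_active` and the retry values in the usage line are hypothetical.

```bash
# Hypothetical helper: wait_for_active TRIES DELAY_SECONDS CMD [ARGS...]
# Runs CMD (which must print the image status) up to TRIES times.
# Returns 0 once the status is "active", 1 if the image is still in
# queued/saving after all checks or reports an unexpected status,
# and 2 if CMD itself fails.
wait_for_active() {
  tries=$1
  delay=$2
  shift 2
  status=unknown
  i=0
  while [ "$i" -lt "$tries" ]; do
    status=$("$@") || return 2
    case $status in
      active) echo "active"; return 0 ;;
      queued|saving) ;;  # still in progress, keep polling
      *) echo "unexpected status: $status"; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still $status after $tries checks"
  return 1
}

# Usage (30 checks, 10 seconds apart; both values are arbitrary):
#   wait_for_active 30 10 openstack image show -f value -c status <IMAGE_UUID>
```

An image that is still `queued` after the last check is a candidate for the checks in the following sections.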

### Check Disk Space on the Image Library Host

The Image Library Service writes image data to `/var/opt/imagelibrary/data/glance/` by default. If the filesystem holding that path is full, the upload stalls in `queued` and the service logs an I/O or disk-full error.

1. On the Image Library host, check available disk space:

```bash
df -h /var/opt/imagelibrary/data/glance/
```

2. If the filesystem is at or near 100%, free space by removing unused images or expanding the filesystem before retrying the upload.
3. Check the Image Library Service log for disk-related errors:

```bash
sudo grep -i "no space\|disk full\|errno\|IOError" /var/log/pf9/glance-api.log | tail -50
```
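
The disk space check can also be scripted so that it warns before the filesystem fills up. The following is a minimal sketch, assuming POSIX `df -P` output (one line per filesystem, with the usage percentage in the fifth column); the function names and the 90% threshold are hypothetical.

```bash
# Hypothetical helper: read `df -P <path>` output on stdin and print the
# usage percentage of the filesystem as a bare number (for example 97).
disk_use_pct() {
  awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Hypothetical helper: check_disk PCT THRESHOLD
# Prints a warning and returns 1 when PCT is at or above THRESHOLD.
check_disk() {
  if [ "$1" -ge "$2" ]; then
    echo "WARN: ${1}% used (threshold ${2}%)"
    return 1
  fi
  echo "OK: ${1}% used"
}

# Usage on the Image Library host (90 is an arbitrary threshold):
#   check_disk "$(df -P /var/opt/imagelibrary/data/glance/ | disk_use_pct)" 90
```

Keep in mind that an upload needs free space for the full image, so a large image can still fail on a filesystem that is below the chosen threshold.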

### Verify the Image Library Service Is Running

A stopped or crashed `pf9-glance-api` service will cause all uploads to stall in `queued`.

```bash
sudo systemctl status pf9-glance-api
```

If the service is not active, start it and then check for errors that might have caused it to stop:

```bash
sudo systemctl start pf9-glance-api
sudo journalctl -u pf9-glance-api -n 100
```

### Verify Image Format

An unsupported or mismatched image format can prevent the service from processing the upload. Confirm the actual format of the image file before uploading:

```bash
qemu-img info <image-file>
```

Use the format reported by `qemu-img info` when specifying `--disk-format` in the `pcdctl` command or selecting the format in the UI. Supported formats are `raw`, `qcow2`, and `iso`.
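
The format check can be scripted as a pre-upload guard. The following is a minimal sketch, assuming the `qemu-img info` text output contains a top-level `file format:` line; the function names are hypothetical. Note that `qemu-img` reports ISO files as `raw`, so only `raw` and `qcow2` are matched here.

```bash
# Hypothetical helper: read `qemu-img info` text output on stdin and
# print the detected file format (for example qcow2).
image_format() {
  sed -n 's/^file format: //p' | head -n 1
}

# Hypothetical helper: succeed only for formats that qemu-img can report
# for a supported image (ISO files are reported as raw).
is_supported_format() {
  case $1 in
    raw|qcow2) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   fmt=$(qemu-img info <image-file> | image_format)
#   is_supported_format "$fmt" || echo "unsupported format: $fmt"
```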

### Check for Network or Timeout Issues

Large image uploads over slow or intermittent network connections can time out before the data fully transfers.

* Upload from a machine that has a direct, low-latency network path to the Image Library host.
* Check for network errors in the Image Library Service log:

```bash
sudo grep -i "timeout\|connection reset\|broken pipe" /var/log/pf9/glance-api.log | tail -50
```

### Recover a Stuck Image

If an image is stuck in `queued` and you have verified that none of the above conditions apply, delete the stuck image record and re-upload:

```bash
pcdctl image delete <IMAGE_UUID>
```

Then retry the upload. If the upload consistently stalls on re-upload, contact [Platform9 Support](https://support.platform9.com/) with the image UUID and the relevant entries from `/var/log/pf9/glance-api.log`.

### Differences Between UI and CLI Upload Failures

| Upload Path                                                                 | What to Check                                                                                                                                                                                                          |
| --------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| UI upload button is inactive                                                | Image Library Service certificate is not trusted in your browser — see [Image Library Service Certificate Configuration](/private-cloud-director/images-and-image-library/image-library-certificate-configuration.md). |
| UI upload starts but stalls                                                 | Check disk space and service health as described above.                                                                                                                                                                |
| CLI upload returns `SSL: CERTIFICATE_VERIFY_FAILED`                         | Add `--insecure` to the `pcdctl image create` command.                                                                                                                                                                 |
| CLI upload returns `Connection refused` or `Unable to establish connection` | The Image Library endpoint is not reachable — see [Image Library Service Endpoint Health](#image-library-service-endpoint-health).                                                                                     |

***

## Image Library Service Endpoint Health <a href="#image-library-service-endpoint-health" id="image-library-service-endpoint-health"></a>

### Overview

When the Image Library Service endpoint is unreachable, image uploads fail and VM provisioning that requires fetching an image from the Image Library host will also fail. The most common symptoms are:

* `pcdctl image create` returns a connection error or HTTP 502 / 503.
* A `curl` request to the Image Library endpoint returns an NGINX error page (502 Bad Gateway, 503 Service Unavailable).
* The **Service Health** dashboard in the UI shows the Image Library Service as unhealthy.

### Check the Image Library Service Endpoints

List the registered Image Library Service endpoints and verify that they are enabled:

```bash
pcdctl endpoint list --service glance
pcdctl endpoint list --service glance-cluster
```

An endpoint with `Enabled: False` will not serve requests. If any endpoint shows as disabled, re-enable it or reassign the Image Library role to the host.

### Test the Endpoint Directly

Use `curl` to check whether the Image Library endpoint responds:

```bash
curl -sk https://<IMAGE_LIBRARY_HOST_IP_OR_FQDN>:9292/
```

* A JSON response that includes version information indicates the endpoint is healthy.
* An NGINX error page (HTML with "502 Bad Gateway" or "503 Service Unavailable") means the NGINX reverse proxy on the Image Library host is running but the upstream `pf9-glance-api` process is not accepting connections.
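
The two outcomes above can be told apart from the HTTP status code alone. The following is a minimal sketch, assuming the code is captured with `curl -w '%{http_code}'`, which prints `000` when no HTTP response was received; the function name `classify_endpoint_code` is hypothetical. A Glance version listing is typically served with a 2xx or 3xx status, so both ranges are treated as healthy here.

```bash
# Hypothetical helper: map an HTTP status code from the Image Library
# endpoint to a short diagnosis.
classify_endpoint_code() {
  case $1 in
    2??|3??) echo "healthy" ;;
    502|503) echo "proxy up, pf9-glance-api not accepting connections" ;;
    000) echo "no HTTP response (connection failed)" ;;
    *) echo "unexpected HTTP status: $1" ;;
  esac
}

# Usage:
#   code=$(curl -sk -o /dev/null -w '%{http_code}' "https://<IMAGE_LIBRARY_HOST_IP_OR_FQDN>:9292/")
#   classify_endpoint_code "$code"
```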

### Check the Service on the Image Library Host

When the endpoint returns an NGINX error, the `pf9-glance-api` service on the Image Library host has likely stopped or is not listening:

```bash
sudo systemctl status pf9-glance-api
sudo systemctl status pf9-imagelibrary
```

Restart both services if either is not active:

```bash
sudo systemctl restart pf9-glance-api
sudo systemctl restart pf9-imagelibrary
```

After restarting, check the log for startup errors:

```bash
sudo tail -100 /var/log/pf9/glance-api.log
```
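
Before restarting, it can be useful to record which of the two services was actually down, since that detail helps with later escalation. The following is a minimal sketch using `systemctl is-active --quiet`, which exits with status 0 only when the unit is active; the function name `inactive_services` is hypothetical.

```bash
# Hypothetical helper: print the name of every given systemd unit
# that is not currently active.
inactive_services() {
  for svc in "$@"; do
    if ! systemctl is-active --quiet "$svc"; then
      echo "$svc"
    fi
  done
}

# Usage on the Image Library host:
#   inactive_services pf9-glance-api pf9-imagelibrary
```

An empty result means both services report as active, in which case the log and the endpoint test are the next places to look.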

### Review Logs

The primary log for Image Library Service issues is on the Image Library host:

```
/var/log/pf9/glance-api.log
```

For audit events (upload outcomes, user identity, image UUID):

```
/var/log/pf9/glance-audit.log
```

{% hint style="info" %}
**Self-Hosted deployments only**

In Self-Hosted deployments, the management-plane `glance-api` pod in the region namespace provides the endpoint that the Image Library host proxies through. If the host-level service appears healthy but the endpoint still returns errors, check the management-plane pod:

```bash
kubectl get pods -o wide -n <REGION_NAMESPACE> | grep glance
kubectl logs -n <REGION_NAMESPACE> <GLANCE_API_POD>
```

In SaaS deployments, the management-plane components are operated by Platform9. If endpoint errors persist after verifying host-level service health, contact [Platform9 Support](https://support.platform9.com/).
{% endhint %}

### Escalation

If service restarts do not resolve the issue and the endpoint continues to return errors, contact [Platform9 Support](https://support.platform9.com/) with:

* Output of `pcdctl endpoint list --service glance`
* Output of `sudo systemctl status pf9-glance-api`
* The last 200 lines of `/var/log/pf9/glance-api.log`

***

## Persistent Storage Backend and Image Caching at Scale <a href="#persistent-storage-backend-and-image-caching-at-scale" id="persistent-storage-backend-and-image-caching-at-scale"></a>

### Overview

The Image Library Service supports two storage backends: a local file store and a Persistent Storage Service (block storage) volume. When the block storage backend is configured, image data is stored on a block storage volume rather than on the Image Library host's local filesystem. This is the recommended configuration for production environments and for Image Library High Availability.

For background on configuring the block storage backend and enabling Image Library High Availability, see [Image Library High Availability](/private-cloud-director/images-and-image-library/image-library-high-availability.md) and [Block Storage High Availability](/private-cloud-director/storage/block-storage/block-storage-high-availability.md).

### How Hypervisor Image Caching Works

When a virtual machine is provisioned for the first time using a given image, the hypervisor (compute host) must fetch the image data from the Image Library host over the network. After the first successful boot from that image on a given hypervisor, the hypervisor retains a local copy of the image in its image cache. Subsequent VM deployments using the same image on the same hypervisor use the cached copy and avoid the full image transfer.

This caching behavior has important implications for bulk and parallel VM deployments:

* **First deployment of an image on a hypervisor is the most I/O-intensive.** If many VMs are deployed simultaneously using an image that has never been cached on those hypervisors, all of them will attempt to fetch the full image from the Image Library host at the same time.
* **Image Library host bandwidth is the bottleneck.** The Image Library host (or hosts, in an HA setup) must serve the full image to every hypervisor that does not yet have a cached copy. With many hypervisors requesting simultaneously, this can saturate the Image Library host's network interface and cause VM provisioning to time out or fail.
* **Block storage backend improves throughput.** When the Image Library Service uses a block storage volume as its backend, the image data is fetched from the storage array rather than from the Image Library host's local disk, which can improve parallelism for large deployments.

### Recommendations for Bulk Deployments

When you need to deploy many VMs simultaneously — for example, during a large-scale workload rollout — consider the following:

* **Pre-warm the cache.** Before the bulk deployment, deploy a single VM on each hypervisor using the target image. This populates the cache on all hypervisors. Subsequent bulk deployments will use the cached copy and avoid the simultaneous image-fetch bottleneck.
* **Use Image Library HA with shared storage.** An Image Library deployment with multiple hosts sharing a block storage backend distributes read requests across hosts, reducing the load on any single host during the initial image fetch. See [Image Library High Availability](/private-cloud-director/images-and-image-library/image-library-high-availability.md).
* **Stage bulk deploys in waves.** If pre-warming is not practical, deploy VMs in smaller batches rather than all at once, allowing each batch to complete its image fetch before the next batch starts.
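
Staging in waves amounts to splitting the list of planned VMs into fixed-size batches. The following is a minimal sketch of that grouping only; the function name `in_waves`, the VM names, and the batch size are hypothetical, and the actual deployment command for each wave is intentionally left out.

```bash
# Hypothetical helper: in_waves SIZE
# Reads one VM name per line on stdin and prints them grouped into
# waves of at most SIZE names, one wave per output line.
in_waves() {
  if ! [ "$1" -gt 0 ] 2>/dev/null; then
    echo "usage: in_waves SIZE (positive integer)" >&2
    return 2
  fi
  awk -v size="$1" '
    NF {
      # Start a new output line at the beginning of each wave.
      if (count % size == 0)
        printf "%swave %d:", (count ? "\n" : ""), count / size + 1
      printf " %s", $1
      count++
    }
    END { if (count) print "" }'
}

# Usage (names and batch size are placeholders):
#   printf 'vm1\nvm2\nvm3\nvm4\nvm5\n' | in_waves 2
```

Each wave should be allowed to finish its image fetch before the next one starts, so the batch size is best chosen with the Image Library host's network capacity in mind.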

### Troubleshoot Slow or Failing Bulk Deploys

If VM provisioning fails or times out during a large bulk deployment and image fetching is suspected:

1. Check the Image Library host's network utilization during the deployment window. Sustained high network I/O concurrent with provisioning failures points to an image-fetch bottleneck.
2. Check `/var/log/pf9/glance-api.log` on the Image Library host for errors or timeouts corresponding to the failed provisioning requests.
3. Verify that the Image Library Service endpoint is healthy before the deployment: see [Image Library Service Endpoint Health](#image-library-service-endpoint-health).
4. If the Image Library uses a block storage backend, verify that the storage volume is healthy and that the Persistent Storage Service is operating normally.

If issues persist, contact [Platform9 Support](https://support.platform9.com/) with the Image Library host logs and a description of the deployment scale (number of VMs, number of hypervisors, image size).


