Image Service Troubleshooting Guide

Problem

A troubleshooting guide for image services is needed to address frequent issues with image management in cloud environments, such as image upload failures, slow performance, and incorrect metadata. The guide must provide clear, actionable steps for diagnosing and resolving common errors to ensure the reliability and availability of the image service.

Environment

Private Cloud Director Virtualization - v2025.4 and Higher
Self-Hosted Private Cloud Director Virtualization - v2025.4 and Higher
Component - PCD Image service

Deep Dive

Image Creation Flow

The image creation process in Private Cloud Director is managed by the Glance service. The flow begins when a user uploads a new image file, which is then processed and stored.

User Request

A user initiates an image upload via the OpenStack CLI, the Private Cloud Director dashboard, or a direct API call. The request includes the image file and metadata (e.g., name, container format, disk format).

API Service

The Glance API service receives the request, validates the user's authentication token with Keystone, and checks permissions and quotas. The Glance API log entry below shows the existing user token being used.

INFO glance.api.v2.image_data [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] Unable to create trust: no such option collect_timing in group [keystone_authtoken] Use the existing user token.

Image Validation and Upload Request

The Glance service validates the image format and confirms that its virtual size meets the requirements. The virtual size can be checked by running qemu-img info <image_name>.qcow2 on the Glance host. A PUT /v2/images/<IMAGE_UUID>/file request then uploads the image data to a temporary staging area. The default staging location is /var/lib/glance/os_glance_staging_store/; if the default directory has been changed, check the custom location instead.

INFO glance.location [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] Image format matched and virtual size computed: 41126400
INFO eventlet.wsgi.server [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] 127.0.0.1 - - [..] "PUT /v2/images/[IMAGE_UUID]/file HTTP/1.0" 204 468 2.400140
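
To compare the size Glance computed with what qemu-img info reports, the value can be pulled out of the API log. The following is a minimal sketch, assuming the log message text matches the sample line above; the function name glance_virtual_sizes is hypothetical and not part of any Platform9 tooling.

```shell
# Print the virtual size, in bytes, from every matching Glance API log line on stdin.
# The message text matched here is taken from the sample log line in this guide.
glance_virtual_sizes() {
  sed -n 's/.*Image format matched and virtual size computed: \([0-9][0-9]*\).*/\1/p'
}

# Example (log path as used elsewhere in this guide):
#   glance_virtual_sizes < /var/log/pf9/glance-api.log
```

Lines that do not contain the message are ignored, so the function can be run against the whole log file.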

Glance API to Registry

The API service then communicates with the Glance Registry, which creates a new entry for the image in the Glance database. The status is set to queued.

Glance Service

The Glance API hands off the request to the pf9-glance-api service, which moves the image data from the staging area to the backend storage (e.g., Swift, Ceph, or a local file system). The default image file storage location is /var/opt/imagelibrary/data/glance/.

Status Update

Once the image is successfully stored, the Glance Store service updates the image's status in the database from queued to active. The image is now ready for use. The Glance audit log on the host (/var/log/pf9/glance-audit.log) records details of the request, such as the username, image UUID, and outcome.

INFO oslo.messaging.notification.audit.http.response [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] {"message_id": "[Audit_Message_ID]", "publisher_id": "glance-api", "event_type": "audit.http.response", "priority": "INFO", "payload": {"typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event", "eventType": "activity", "id": "[Activity_ID]", "eventTime": "[..]", "action": "update", "outcome": "success", "observer": {"id": "target"}, "initiator": {"id": "[..]", "typeURI": "service/security/account/user", "name": "[USER_NAME]", "credential": {"token": "***", "identity_status": "Confirmed"}, "host": {"address": "127.0.0.1", "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36"}, "project_id": "[PROJECT_UUID]"}, "target": {"id": "unknown", "typeURI": "unknown", "name": "unknown"}, "requestPath": "/v2/images/[IMAGE_UUID]/file", "tags": ["correlation_id?value=[..]"], "reason": {"reasonType": "HTTP", "reasonCode": "204"}, "reporterchain": [{"role": "modifier", "reporterTime": "[..]", "reporter": {"id": "target"}}]}, "timestamp": "[..]"}
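
Audit entries are long, so a short summary per entry can make them easier to scan. The sketch below assumes each entry is a single line with the field names shown in the sample above (outcome, reasonCode, requestPath). The function name audit_summary is hypothetical, and the sed-based extraction is a convenience only; a JSON-aware tool would be more robust where one is available.

```shell
# Read glance-audit.log lines on stdin and print "<outcome> <reasonCode> <requestPath>"
# for each line that carries an audit payload.
audit_summary() {
  while IFS= read -r line || [ -n "$line" ]; do
    outcome=$(printf '%s\n' "$line" | sed -n 's/.*"outcome": *"\([^"]*\)".*/\1/p')
    code=$(printf '%s\n' "$line" | sed -n 's/.*"reasonCode": *"\([^"]*\)".*/\1/p')
    path=$(printf '%s\n' "$line" | sed -n 's/.*"requestPath": *"\([^"]*\)".*/\1/p')
    if [ -n "$outcome" ]; then
      printf '%s %s %s\n' "$outcome" "$code" "$path"
    fi
  done
}

# Example: show audit entries for one image
#   grep "<IMAGE_UUID>" /var/log/pf9/glance-audit.log | audit_summary
```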

Image Deletion Flow

The deletion process also uses the Glance services to remove the image's data and its database entry.

User Request

A user sends a deletion request via the OpenStack CLI, the Private Cloud Director dashboard, or a direct API call. The request identifies the image (e.g., by image ID or name).

API Service

The Glance API service receives the request, validates the user's authentication token with Keystone, performs a permission check, and changes the image's status in the database to deleting. The Glance API log entry below shows the existing user token being used.

INFO glance.api.v2.image_data [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] Unable to create trust: no such option collect_timing in group [keystone_authtoken] Use the existing user token.

Glance Service

The Glance API hands off the request to the pf9-glance-api service, which removes the image data from the backend storage (e.g., Swift, Ceph, or a local file system). The default image file storage location is /var/opt/imagelibrary/data/glance/. This step frees up disk space.

INFO eventlet.wsgi.server [None [REQ-ID] [USER_ID] [TENANT_ID] - - default default] 127.0.0.1 - - [..] "DELETE /v2/images/[IMAGE_UUID] HTTP/1.0" 204 468 1.924124

Final Status Update

Once the data is confirmed to be deleted from the backend store, the Glance API removes the image's database entry, completing the deletion process.

Procedure

The following steps outline how to troubleshoot image service issues.

Review image details

Review the image details, such as its status and any reported errors.

$ openstack image show <IMAGE_UUID>
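
The status field is usually the first thing to read in this output. The sketch below maps the standard Glance image statuses to a short hint; the function name describe_image_status and the hint wording are illustrative assumptions, not output produced by the OpenStack CLI.

```shell
# Print a short troubleshooting hint for a Glance image status string.
describe_image_status() {
  case "$1" in
    active)
      echo "ok: image is stored and ready for use" ;;
    queued|saving|uploading|importing)
      echo "in-progress: image data has not been fully stored yet" ;;
    killed)
      echo "failed: an error occurred during upload; check the glance-api logs" ;;
    deactivated)
      echo "blocked: image data is not available to non-admin users" ;;
    deleted|pending_delete)
      echo "removed: image has been deleted or is awaiting deletion" ;;
    *)
      echo "unknown: unexpected status '$1'" ;;
  esac
}

# Example (requires sourced OpenStack credentials):
#   describe_image_status "$(openstack image show -f value -c status <IMAGE_UUID>)"
```

An image that stays in queued or saving for a long time typically points to a stalled upload rather than a metadata problem.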

Validate Glance endpoints

Verify that the Glance image endpoints are registered and that the public endpoint responds to a curl request. The curl request should return the Glance API version information.

$ openstack endpoint list --service glance
$ openstack endpoint list --service glance-cluster
$ curl -s https://<FQDN>/glance/
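
When the curl output is empty or unclear, the HTTP status code alone is often enough to tell whether the endpoint is reachable. The following sketch classifies that code; the function name glance_endpoint_state and the category labels are assumptions for illustration. It treats 300 as healthy because the Glance API root normally answers with a version list using 300 Multiple Choices; whether the same holds behind the <FQDN>/glance/ path in a given deployment is an assumption to verify.

```shell
# Classify the HTTP status code returned by the Glance public endpoint.
glance_endpoint_state() {
  case "$1" in
    200|300) echo "reachable" ;;
    401|403) echo "reachable-auth-required" ;;
    000)     echo "unreachable" ;;      # curl reports 000 when no HTTP response was received
    5??)     echo "server-error" ;;
    *)       echo "unexpected" ;;
  esac
}

# Example:
#   code=$(curl -s -o /dev/null -w '%{http_code}' "https://<FQDN>/glance/")
#   glance_endpoint_state "$code"
```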

Check if the image service is enabled

$ openstack service list | grep -i image
$ openstack service show <GLANCE/GLANCE_CLUSTER_UUID>

Check glance-api pod (Self-Hosted only)

The management plane runs a glance-api pod to provide the image service. Verify that the glance-api pod is running in the workload region namespace.

Info

This step applies only to Self-Hosted Private Cloud Director.

Review the pod as follows:

Check whether the pod is in a CrashLoopBackOff, OOMKilled, Pending, Error, or Init state.
Verify that all containers in the pod are Running.
Review the Events section in the pod describe output.
Review the pod logs, searching by REQ_ID or IMAGE_UUID for relevant details.

$ kubectl get pods -o wide -n <WORKLOAD_REGION> | grep -i "glance"

$ kubectl describe pod -n <WORKLOAD_REGION> <GLANCE_API_POD>

$ kubectl logs -n <WORKLOAD_REGION> <GLANCE_API_POD>
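
The pod checks above can be partly automated by filtering the kubectl get pods table. This is a minimal sketch, assuming the default column order (NAME, READY, STATUS); the function name unhealthy_pods is hypothetical.

```shell
# Read `kubectl get pods` output on stdin and print pods that look unhealthy:
# STATUS is not "Running", or the READY column shows fewer ready containers than expected.
unhealthy_pods() {
  awk '
    $1 == "NAME" { next }              # skip the header row if present
    NF >= 3 {
      split($2, ready, "/")            # READY column, for example "1/2"
      if ($3 != "Running" || ready[1] != ready[2]) {
        print $1, $2, $3
      }
    }'
}

# Example:
#   kubectl get pods -n <WORKLOAD_REGION> | grep -i "glance" | unhealthy_pods
```

No output means every listed pod is Running with all containers ready. Pods from finished jobs (status Completed) are also printed, so they may need to be ignored manually.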

Validate pf9-glance-api service on host

Verify that the pf9-glance-api service is running on the host where the Glance role is applied.

$ sudo systemctl status pf9-glance-api

Review host logs

On the host, review /var/log/pf9/glance-api.log to trace the events related to a specific image ID.
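
To follow a single image through the log, the HTTP request lines can be reduced to method, path, status code, and duration. The sketch below assumes the request line format shown in the samples earlier in this guide, where the quoted request is followed by the status code, response size, and elapsed seconds. The function name image_requests is hypothetical.

```shell
# Read glance-api.log lines on stdin and print "<method> <path> <status> <seconds>"
# for each HTTP request that targets the given image ID.
image_requests() {
  awk -F'"' -v id="$1" '
    NF >= 3 && index($2, "/v2/images/" id) {
      split($2, req, " ")              # method, path, protocol
      split($3, res, " ")              # status code, response bytes, elapsed seconds
      print req[1], req[2], res[1], res[3]
    }'
}

# Example:
#   image_requests "<IMAGE_UUID>" < /var/log/pf9/glance-api.log
```

A 4xx or 5xx status on the PUT .../file request, or an unusually long duration, narrows down where the upload failed.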

Escalation

If these steps do not resolve the issue, contact the Platform9 Support Team for additional assistance.

Most common causes

The Glance prerequisites are not met.
The admin.rc file used for the image upload does not have the OS_INTERFACE variable set to admin.
The image format is incorrect. See Supported Image Format.
The pf9-glance-api service is down on the underlying host.
On Self-Hosted PCD, the --insecure flag was not passed to the OpenStack command, even though the Glance node uses self-signed certificates.
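
The OS_INTERFACE cause can be ruled out before an upload with a quick environment check. This is a minimal sketch to run in the shell where the admin.rc file has been sourced; the function name check_upload_env is hypothetical.

```shell
# Return 0 if OS_INTERFACE is set to "admin" in the current shell, 1 otherwise.
check_upload_env() {
  if [ "${OS_INTERFACE:-}" = "admin" ]; then
    echo "OS_INTERFACE=admin"
    return 0
  fi
  echo "OS_INTERFACE is '${OS_INTERFACE:-unset}'; this guide expects 'admin'" >&2
  return 1
}

# Example: check the variable, then list images.
# On Self-Hosted PCD, add --insecure because of the self-signed certificates:
#   check_upload_env && openstack --insecure image list
```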
