Failed to Deploy Virtual Machine

This guide provides step-by-step instructions for troubleshooting and resolving issues when creating a virtual machine (VM) fails in Private Cloud Director.

VM Deployment Methods

  • Launch an instance from an Image to quickly deploy a pre-configured environment.

  • Launch an instance from a New Volume to create a fresh setup with dedicated storage.

  • Launch an instance from an Existing Volume to utilize previously used storage for seamless continuation.

  • Launch an instance from a VM Snapshot to capture the current state of a VM and restore it exactly as it was.

  • Launch an instance from a Volume Snapshot to ensure data integrity by reverting to a specific point in time.
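
For reference, these deployment methods map to different source options of the OpenStack server create command. The sketch below is illustrative only; the image, flavor, network, volume, and key pair names are placeholders, and option availability can vary with the client version:

  # Boot from an image (ephemeral root disk)
  $ openstack server create --image <IMAGE> --flavor <FLAVOR> --network <NETWORK> --key-name <KEYPAIR> my-vm

  # Boot from a new volume created from the image (20 GB root volume)
  $ openstack server create --image <IMAGE> --boot-from-volume 20 --flavor <FLAVOR> --network <NETWORK> my-vm

  # Boot from an existing bootable volume
  $ openstack server create --volume <VOLUME> --flavor <FLAVOR> --network <NETWORK> my-vm

  # A VM snapshot is stored as a Glance image, so it can be passed with --image.
  # For a volume snapshot, create a volume from it first, then boot with --volume:
  $ openstack volume create --snapshot <VOLUME_SNAPSHOT> new-boot-volume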

Most Common Causes

  • Insufficient resources (CPU, RAM, storage).

  • Incorrect network configurations or security groups.

  • Unavailable or corrupted images.

  • Issues with the scheduler or compute nodes.

  • Permission or quota restrictions.

  • Virtualised cluster mismatches.

Deep Dive

The Private Cloud Director VM creation process is similar for all VM deployment methods mentioned earlier, and the workflow is orchestrated primarily by the Compute service (Nova). This flow involves a series of steps with critical validations at each stage to ensure the request is valid, resources are available, and the VM is provisioned correctly.

Note: The logs referenced below can only be reviewed in Self-Hosted Private Cloud Director. For the SaaS model, contact the Platform9 Support Team.

Step 1: User Request & API Validation

This is the initial stage where the user's request is received and authenticated.

  • User Request: A user submits a request to create a VM (also called an instance) via the OpenStack CLI, Private Cloud Director dashboard, or direct API call. Key parameters are specified, including the image, flavor, network, security group, and key pair.

  • Keystone Authentication: The request is sent to the nova-api-osapi Pod, which immediately validates the user's authentication token with the Identity service (Keystone). This ensures the user is who they claim to be. In the nova-api-osapi logs, a successfully received VM creation request is accepted with a 202 status code.

Note: At this point a unique request ID (REQ_ID) is generated, which is used to track the request in the logs of the other components.

  • Authorization & Quota Checks: The Nova API performs two key validations:

    • Authorization: It verifies that the user has the necessary permissions to create a VM within the specified project.

    • Quota Check: It confirms the project has enough available resources (vCPUs, RAM, instances, etc.) to fulfil the request based on the chosen flavor (a quick way to check the remaining quota is shown after this list).

  • Initial Database Entry: If all checks pass, nova-conductor creates a record for the new VM in the nova database and sets its status to BUILDING (None). The nova-conductor service is the only service that writes to the database; the other Compute services access the database through it.
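
As a quick way to verify the quota headroom checked above, the project's limits and quotas can be inspected from the CLI (the project name is a placeholder):

  # Show absolute limits (used vs. maximum) for the current project
  $ openstack limits show --absolute

  # Show the configured quotas for a specific project
  $ openstack quota show <PROJECT>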

Step 2: Scheduling & Resource Selection

After the initial validation, the request is sent to the Nova Scheduler, which decides where to place the VM.

  • Message Queue: The Nova API sends a message to the Nova Scheduler via a message queue (RabbitMQ), containing all the VM's requirements.

  • The Nova scheduler queries the Placement API to find a suitable resource provider (compute node) that has enough resources based on host filters and host weighing.

  • Host Filtering: The nova-scheduler begins by filtering out unsuitable hosts. This process checks for:

    • Resource availability: It ensures the host has sufficient free RAM, disk space, and vCPUs.

    • Compatibility: It verifies the host is compatible with the image properties and any specific requirements.

    • Availability Zones: It confirms the host is in the requested availability zone.

    • Image Metadata: It checks the image metadata when a specific metadata filter applies to the image, for example images carrying SR-IOV or vTPM metadata.

    • Additional filters (see the note below).

Note: Details on the other available Nova scheduler filters are documented under Scheduler filters.

  • Host Weighing: The remaining hosts are then ranked based on a weighting system. This can be configured to prioritise hosts with the least load or those that have been least recently used to ensure balanced resource distribution.

  • Placement Reservation: The nova-scheduler service queries the Placement API to fetch eligible compute nodes. Once a host is selected, the scheduler makes a provisional allocation by creating a "claim" via PUT /allocations/[VM_UUID]. The corresponding Placement API PUT requests are logged together with the VM's allocation ID.

The nova-scheduler pod logs can be reviewed against the request ID captured from the nova-api-osapi pod; they show the scheduler verifying a suitable host for the VM deployment.

  • The nova-scheduler sends the database update request with the selected host information to nova-conductor, which updates the database and sets the VM status to BUILDING (Scheduling). The request is then passed to the nova-compute service.
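
The provisional allocation can also be inspected against the Placement API from the CLI. This is a sketch assuming the osc-placement CLI plugin is installed; the VM UUID is a placeholder:

  # List resource providers (one per compute node)
  $ openstack resource provider list

  # Show the resources (VCPU, MEMORY_MB, DISK_GB) claimed for the VM
  $ openstack resource provider allocation show <VM_UUID>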

Step 3: Compute & Final Service-Level Validation

The Nova-compute service on the selected host performs the final provisioning steps.

  • Resource Allocation: The Nova Compute service receives the scheduling decision and begins allocating resources. It interacts with:

    • Glance: It requests the VM image. Validation occurs here, as the glance-api pod can perform a signature check to ensure the image's integrity. If the image is not available, the request errors out. The glance-api logs record the corresponding GET request for the image.

  • Neutron: It requests network resources, and the neutron-server pod validates that the specified network and security groups exist and are accessible to the user. It then allocates a virtual network interface and an IP address; the Neutron logs show the allocated IP and network interface port information. nova-conductor further updates the database and sets the VM status to BUILDING (Networking).

  • Cinder (if applicable): If a persistent boot volume is requested, Cinder validates that the volume is available and attaches it to the VM. Nova-conductor further updates the database and sets VM status to BUILDING(Block_Device_Mapping).

  • Hypervisor Instruction: Once all resources are confirmed, nova-compute instructs the pf9-ostackhost service on the hypervisor (libvirt/KVM) to create the VM using the image, flavor, and other parameters, and the VM boots. The pf9-ostackhost logs outline details such as a successful resource claim, the device path, network information, and the time elapsed to spawn the instance.
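
On the selected hypervisor, the resulting libvirt domain can be checked directly with virsh. This is a minimal sketch assuming shell access to the host; the instance domain name is a placeholder:

  # List all libvirt domains on the host, including stopped ones
  $ virsh list --all

  # Show the state, vCPU, and memory details of the instance's domain
  $ virsh dominfo <INSTANCE_DOMAIN_NAME>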

Step 4: VM Configuration & Finalization

The final step involves configuring the guest OS and updating the status.

  • Cloud-init: As the VM boots, cloud-init retrieves metadata from Nova via the metadata service at 169.254.169.254. The cloud-init logs are available within the VM (see the example after this list). It performs validations on this metadata before:

    • Injecting the SSH key.

    • Configuring networking and the hostname.

    • Executing any custom user data scripts.

  • Status Update: The nova-compute service updates the VM's status in the database to ACTIVE, indicating a successful creation. The VM is now ready for the user to access.
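
From inside the guest, the metadata that cloud-init consumed and its logs can be reviewed as follows (the standard cloud-init log locations are assumed):

  # Query the metadata service that cloud-init uses
  $ curl http://169.254.169.254/openstack/latest/meta_data.json

  # Review the cloud-init logs inside the VM
  $ less /var/log/cloud-init.log
  $ less /var/log/cloud-init-output.log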

Procedure

Note: The OpenStack CLI refers to virtual machines as 'servers'.

Step 1: Get the VM Status

Use the PCD UI or CLI to check the error message. Look at the status and fault fields to understand the issue, as shown below.
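
A minimal CLI check; the server name is a placeholder:

  # Show the server details, including the status and fault fields
  $ openstack server show <VM_NAME_OR_UUID>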

Step 2: Validate Compute Service Status

Get the compute service state and ensure the state is up and the status is enabled, as shown below.
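
A minimal CLI check using the standard service filter:

  # Confirm the State column is "up" and the Status column is "enabled"
  $ openstack compute service list --service nova-compute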

Step 3: Trace the VM Events

Retrieve the request ID (REQ_ID) from the server event list, which uniquely identifies the request. The REQ_ID is displayed in the first column of the server event list output and helps track request failures, as shown below.
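
A sketch of tracing the events from the CLI; the server and request identifiers are placeholders:

  # List lifecycle events for the server; the Request ID is in the first column
  $ openstack server event list <VM_NAME_OR_UUID>

  # Show the details of a specific event
  $ openstack server event show <VM_NAME_OR_UUID> <REQ_ID>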

Step 4: Review the Pods and Their Logs on the Management Plane

Note: Step 4 applies only to Self-Hosted Private Cloud Director.

The management plane runs Pods such as nova-api-osapi, nova-scheduler, and nova-conductor. Review all of these pods:

  • Check whether any pod is in a CrashLoopBackOff, OOMKilled, Pending, Error, or Init state.

  • Verify if all containers in the pods are Running.

  • See the events section in pod describe output.

  • Review the pod logs using the REQ_ID or VM_UUID for relevant details, as shown below.
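
A minimal sketch of these checks with kubectl, assuming access to the management-plane cluster; the namespace and pod names are placeholders:

  # Check pod health for the Nova services
  $ kubectl get pods -n <NAMESPACE> | grep -E 'nova-api-osapi|nova-scheduler|nova-conductor'

  # Inspect the Events section of a problematic pod
  $ kubectl describe pod <POD_NAME> -n <NAMESPACE>

  # Search a pod's logs for the request or VM identifiers
  $ kubectl logs <POD_NAME> -n <NAMESPACE> | grep -iE '<REQ_ID>|<VM_UUID>'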

Step 5: Validate the Image and Flavor

Check that the image is available and not corrupted, and ensure the resources requested in the flavor are available on the underlying hosts (see the example below).
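
A minimal CLI check; the image and flavor names are placeholders:

  # Confirm the image exists and its status is "active"
  $ openstack image show <IMAGE>

  # Review the vCPU, RAM, and disk requested by the flavor
  $ openstack flavor show <FLAVOR>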

Step 6: Validate the Service Status on the Affected VM's Underlying Hypervisor

Validate that the Platform9 host services (for example, pf9-ostackhost and pf9-neutron-ovn-metadata-agent) are running on the underlying hypervisor, as shown below.
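
A minimal sketch, assuming the Platform9 host services run as systemd units with a pf9- prefix (unit names can differ between releases):

  # List the Platform9 host-side service units and their states
  $ systemctl list-units 'pf9-*' --type=service

  # Check an individual service, for example the compute host agent
  $ systemctl status pf9-ostackhost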

Step 7: Check the Logs on the Affected VM's Hypervisor

  • Compute Node: The pf9-ostackhost logs record the provisioning of the compute resources required by the VM. Review the latest logs and search for the REQ_ID or VM_UUID.

  • Cinder Storage Node: The cindervolume-base logs record the provisioning of the storage resources required by the VM. Review the latest logs and search for the REQ_ID or VM_UUID.

  • Network Node: The pf9-neutron-ovn-metadata-agent logs record the provisioning of the connectivity and networking resources required by the VM. Review the latest logs and search for the REQ_ID or VM_UUID.
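
A sketch of searching these logs, assuming the default Platform9 log location under /var/log/pf9/ (exact file paths can vary by release and host role):

  # Search the host-side service logs for the request or VM identifiers
  $ grep -riE '<REQ_ID>|<VM_UUID>' /var/log/pf9/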

If these steps prove insufficient to resolve the issue, reach out to the Platform9 Support Team for additional assistance.
