Troubleshooting Host Onboarding Issues

This document describes how to identify and resolve issues that occur when onboarding a host using the pcdctl prep-node command.

Most Common Causes

  • Residual configuration from a prior failed node preparation can cause role assignment or installation to fail.

  • The user running the prep-node command may not have proper sudo privileges, causing command failures.

  • Packages are corrupted or not installed properly, or dpkg or apt lock files prevent package installation or updates during prep-node execution. Review /var/log/dpkg.log to verify whether packages are partially installed or misconfigured.

  • Incorrect proxy configurations provided to prep-node can block the download of required packages or scripts.

  • Firewalld or other firewall rules may block required ports, preventing communication with the management plane.

  • NTP is not synchronized, which may cause authentication or communication failures.

  • Connectivity to the Private Cloud Director management plane controller is broken or unreachable.

Steps To Troubleshoot

When run without a prior configuration, pcdctl prep-node prompts interactively for account information. You can retrieve these values from the GUI under Infrastructure -> Cluster Hosts -> Add a New Host, which is pre-filled with the Account URL, Username, Region, and Tenant for the logged-in user.

All the configuration details like the Platform9 Account URL, Username, Password, Region, and Tenant are persisted in a local config.json file under /pf9/db/.

Logs for pcdctl command execution are stored in the /pf9/logs/pcdctl-<DATE>.log file.

1. Review pcdctl logs

Start by reviewing the pcdctl log file to trace the exact error.

"msg":"Received a call to fetch keystone authentication for fqdn: https://[FQDN] and user: [USER] and tenant: [TENANT], mfa_token: "}
"msg":"Error calling keystone API:Post \"https://[FQDN]/keystone/v3/auth/tokens?nocatalog\": dial tcp: lookup example1.pcd.platform9.co on 127.0.0.53:53: no such host"}
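
To pull error entries out of the most recent log quickly, something like the following may help. It is a minimal sketch: the helper names latest_pcdctl_log and pcdctl_errors are hypothetical (they are not part of pcdctl), and it assumes only what this document shows, namely that logs are named pcdctl-<DATE>.log under /pf9/logs/ and that error entries contain the word "Error".

```shell
# Hypothetical helpers; not part of pcdctl.
# Print the most recently modified pcdctl log in a directory (default: /pf9/logs).
latest_pcdctl_log() {
  ls -t "${1:-/pf9/logs}"/pcdctl-*.log 2>/dev/null | head -n 1
}

# Print lines that mention an error (case-insensitive), prefixed with line numbers.
pcdctl_errors() {
  grep -n -i 'error' "$1"
}

# Example:
# pcdctl_errors "$(latest_pcdctl_log)"
```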

2. Verify input values

Make sure all details are entered correctly, without typos or extra spaces. The installer authenticates with the Keystone API using these values. Any incorrect entry can cause authentication failure or DNS resolution errors.

3. Verify network connectivity

Verify that the host has outbound network connectivity to the internet and the Private Cloud Director management plane controller:

  • $ curl -s https://<FQDN>

  • $ ping www.google.com

  • $ telnet www.google.com 443
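
The "no such host" error in the sample log is a DNS failure, so it can be useful to test name resolution and TCP reachability separately. The following is a sketch under assumptions: host_from_url and check_endpoint are hypothetical helper names, the check uses port 443 because the Account URL is HTTPS, and the direct TCP test will not reflect reality on hosts that must go through a proxy.

```shell
# Hypothetical helpers; pass your own Account URL.
# Extract the hostname from a URL such as https://host.example.com/path.
host_from_url() {
  printf '%s\n' "$1" | sed -E 's#^[a-zA-Z]+://##; s#[/:].*$##'
}

# Check DNS resolution, then a direct TCP connection to port 443.
check_endpoint() {
  host=$(host_from_url "$1")
  if ! getent hosts "$host" >/dev/null; then
    echo "DNS: cannot resolve $host"; return 1
  fi
  if ! timeout 5 bash -c "exec 3<>/dev/tcp/$host/443" 2>/dev/null; then
    echo "TCP: cannot connect to $host:443"; return 1
  fi
  echo "OK: $host resolves and accepts connections on 443"
}

# Example:
# check_endpoint "https://<FQDN>"
```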

4. Gather verbose logs

Execute pcdctl prep-node with the --verbose flag to gather detailed logs of the host preparation process, including each command executed, checks performed, and any warnings or errors encountered.
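
To keep a copy of the verbose output for later review or for a support case, the run can be wrapped as sketched below. run_logged is a hypothetical helper, and the output path in the example is arbitrary; only the --verbose flag comes from this document.

```shell
# Hypothetical wrapper: run a command, show its output, and save a copy to a file.
# Usage: run_logged <output-file> <command> [args...]
run_logged() {
  out=$1; shift
  "$@" 2>&1 | tee "$out"
}

# Example:
# run_logged "/tmp/prep-node-$(date +%Y%m%d-%H%M%S).log" pcdctl prep-node --verbose
```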

5. Validate prerequisites and host checks

Ensure the primary prerequisites are met. Review the checks below for additional validation.

  • Verify the host is running a supported Ubuntu version. Currently, Platform9 supports Ubuntu 22.04 and 24.04 for Private Cloud Director host onboarding.

$ grep -E '^NAME=|^VERSION_ID=' /etc/os-release
  • Confirm the host has sufficient CPU cores and memory. Minimum 8 CPU cores and 16 GB RAM are recommended for host onboarding.

# CPU cores (no sudo needed to read /proc)
$ grep -c ^processor /proc/cpuinfo

# Total memory
$ free -h | grep Mem:
  • Verify the root partition (/) has adequate free space. Minimum 250 GB of free disk space is required.

$ sudo df -h /
  • Ensure no package manager process (apt or dpkg) is running in the background. If either lsof command returns a running process, wait for it to finish or terminate it before continuing.

# Review if any dpkg, apt process is held
$ sudo lsof /var/lib/dpkg/lock
$ sudo lsof /var/lib/apt/lists/lock

# Review package manager logs
$ sudo cat /var/log/dpkg.log 
$ sudo cat /var/log/apt/history.log
  • Confirm the root or current user has passwordless sudo privileges; the user must have unrestricted sudo privileges for all operations.

  • Check the status of firewalld to ensure it does not block Platform9 service communication. Platform9 recommends stopping and disabling firewalld on these hosts using sudo systemctl stop firewalld and sudo systemctl disable firewalld.

  • Ensure NTP is enabled for accurate time synchronization across hosts. Verify if systemd-timesyncd is already running on the host, as it provides basic time synchronization.
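
The resource checks above can be collected into one pass, as in the sketch below. The function names are hypothetical, the thresholds are the ones listed above, and two simplifications are assumed: memory is rounded up to whole GiB because MemTotal reports slightly less than installed RAM, and free disk space is taken from GNU df in whole GiB.

```shell
# Hypothetical preflight helpers; thresholds come from the checks listed above.
# Print PASS or FAIL for a numeric minimum and return non-zero on FAIL.
check_min() {  # usage: check_min <label> <actual> <minimum>
  if [ "$2" -ge "$3" ]; then
    echo "PASS $1: $2 (minimum $3)"
  else
    echo "FAIL $1: $2 (minimum $3)"; return 1
  fi
}

preflight() {
  cpus=$(grep -c ^processor /proc/cpuinfo)
  # MemTotal is in kB; round up to whole GiB (assumption, see lead-in).
  mem_gib=$(awk '/^MemTotal:/ {print int(($2 + 1048575) / 1048576)}' /proc/meminfo)
  # Available space on / in GiB as reported by GNU df.
  disk_gib=$(df --output=avail -BG / | tail -n 1 | tr -dc '0-9')
  check_min "CPU cores" "$cpus" 8
  check_min "Memory GiB" "$mem_gib" 16
  check_min "Free disk on / GiB" "$disk_gib" 250
  # Time sync and firewalld state, printed for information only.
  echo "NTP synchronized: $(timedatectl show -p NTPSynchronized --value 2>/dev/null)"
  echo "firewalld: $(systemctl is-active firewalld 2>/dev/null)"
}

# Example:
# preflight
```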

6. Proxy configuration (if applicable)

For hosts using a proxy server for outbound connectivity, ensure that the /etc/environment file has required variables configured. Also configure the package manager (apt) to properly fetch required packages through the proxy server; refer to the linked documentation.
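
As an illustration of what these settings typically look like on Ubuntu, the sketch below prints proxy lines for /etc/environment and for apt. Everything specific here is an assumption: the helper names, the example proxy URL, and the apt file name 95proxy are hypothetical, and the exact set of variables Platform9 expects should be taken from the linked documentation.

```shell
# Hypothetical helpers; the proxy URL and apt file name in the examples are placeholders.
# Print /etc/environment-style proxy variables for a proxy URL.
render_proxy_env() {
  printf 'http_proxy=%s\nhttps_proxy=%s\n' "$1" "$1"
}

# Print apt proxy directives for a proxy URL.
render_apt_proxy() {
  printf 'Acquire::http::Proxy "%s";\nAcquire::https::Proxy "%s";\n' "$1" "$1"
}

# Example:
# render_proxy_env http://proxy.example.internal:3128 | sudo tee -a /etc/environment
# render_apt_proxy http://proxy.example.internal:3128 | sudo tee /etc/apt/apt.conf.d/95proxy
```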

7. Verify hostagent installation and logs

As the last step, the hostagent package is downloaded and installed on the host. Verify the pf9-hostagent.service status and /var/log/pf9/hostagent.log file to track the progress.

$ sudo service pf9-hostagent status
$ sudo cat /var/log/pf9/hostagent.log

8. Post-onboarding: authorization and role assignment

After the host is onboarded to Private Cloud Director, it can be authorized and assigned roles. This involves downloading and installing service-specific packages and initializing services, which can be monitored through /var/log/pf9/hostagent.log. The .deb packages are downloaded to the /var/cache/pf9apps directory.

9. Contact support

If these steps prove insufficient to resolve the issue, reach out to the Platform9 Support team for additional assistance.

Re-Onboarding Recovery

Overview

When a host fails to onboard or re-onboard, the cause is often stale agent state left over from a previous attempt. This can include a broken libvirtd configuration, a mismatched host identity file, leftover host configuration, or partially applied role state. Cleaning up that state before retrying gives the onboarding process a clean starting point.

In this section, you will identify and remove stale host state, resolve broken libvirtd conditions, and re-onboard the host cleanly.

Decommission the Host Before Re-Onboarding

If the host was previously onboarded and is reachable, use pcdctl to decommission it cleanly before removing its state. This avoids leaving stale compute service records in the management plane database.

Do NOT use the -r / --skip-installed-role-check flag. If roles are still applied, deauthorize them from the Private Cloud Director UI first (Infrastructure > Cluster Hosts, select the host, click Edit Roles, remove all roles), then run pcdctl decommission-node.

See Hypervisor Role Deauthorization and Reauthorization for the full decommission guidelines.

Remove Stale Host Identity and Configuration

After decommissioning, or if the host cannot be decommissioned cleanly, remove the identity and configuration files that persist host state across re-onboarding attempts:
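
A cautious way to do this is to move the files aside rather than delete them, as sketched below. set_aside is a hypothetical helper and the backup location is arbitrary. Only /etc/pf9/host_id.conf is named in this document as the host identity file; including /pf9/db/config.json (the saved pcdctl configuration) is an assumption and is optional.

```shell
# Hypothetical helper: move a file into a backup directory instead of deleting it.
# Usage: set_aside <file> <backup-dir>; a missing file is skipped.
set_aside() {
  [ -e "$1" ] || return 0
  mkdir -p "$2" && mv "$1" "$2/"
}

# Example (run in a root shell, since sudo cannot call shell functions):
# backup=/root/pf9-stale-$(date +%Y%m%d-%H%M%S)
# set_aside /etc/pf9/host_id.conf "$backup"
# set_aside /pf9/db/config.json "$backup"   # assumption: also reset the saved pcdctl config
```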

Removing /etc/pf9/host_id.conf causes the management plane to register the host as a new host on the next pcdctl prep-node run. The old host entry in the UI can be deleted from Infrastructure > Cluster Hosts after re-onboarding succeeds.

Resolve Broken libvirtd State

A failed or interrupted role application can leave libvirtd in a broken state that prevents the Hypervisor role from applying cleanly on re-onboarding. Symptoms include:

  • pf9-ostackhost service failing to start after role assignment.

  • Errors in /var/log/pf9/hostagent.log mentioning libvirtd socket or connection refused.

  • virsh list returning connection errors.

To check and reset libvirtd:
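
A minimal sketch using standard systemd and libvirt commands follows. The run wrapper and the reset_libvirtd function are hypothetical names, and the sequence is an assumption about a reasonable order rather than a documented Platform9 procedure. Setting DRY_RUN=1 prints the commands without running them.

```shell
# Hypothetical wrapper: with DRY_RUN=1, print each command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Show libvirtd status, clear any failed state, restart it, and confirm virsh can connect.
reset_libvirtd() {
  run sudo systemctl status libvirtd --no-pager
  run sudo systemctl reset-failed libvirtd
  run sudo systemctl restart libvirtd
  run sudo virsh list --all
}

# Example:
# DRY_RUN=1 reset_libvirtd   # preview only
# reset_libvirtd             # actually run
```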

If libvirtd still fails to start, check its log for the specific error:
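
On a systemd host the service messages are normally available through journalctl. The sketch below assumes that is where libvirtd logs on this host; libvirt_errors is a hypothetical filter, and the keywords it matches are a guess at what is worth reading first.

```shell
# Hypothetical filter: keep only lines on stdin that look like errors.
libvirt_errors() {
  grep -iE 'error|fail|denied'
}

# Example: inspect libvirtd messages from the current boot.
# sudo journalctl -u libvirtd -b --no-pager | libvirt_errors | tail -n 20
```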

Common causes of a broken libvirtd state after a failed role application:

  • Stale QEMU hook scripts left in /etc/libvirt/hooks/ from a prior Platform9 installation.

  • Corrupted /etc/libvirt/libvirtd.conf settings written by a failed role application.

If the libvirtd configuration is corrupted, restore the defaults and restart:
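
One generic Debian/Ubuntu way to get a packaged configuration file back is to set the current file aside and reinstall the owning package with dpkg's --force-confmiss option, which recreates missing conffiles. The sketch below takes that approach; it is an assumption, not a documented Platform9 procedure, and the function names are hypothetical. The owning package is looked up with dpkg -S rather than hard-coded.

```shell
# Hypothetical helper: read `dpkg -S` output on stdin and print the package name.
pkg_from_dpkg_s() {
  cut -d: -f1 | head -n 1
}

# Sketch: keep the broken file for reference, let dpkg recreate the default, restart.
restore_libvirtd_conf() {
  conf=/etc/libvirt/libvirtd.conf
  pkg=$(dpkg -S "$conf" | pkg_from_dpkg_s)
  [ -n "$pkg" ] || { echo "no package owns $conf"; return 1; }
  sudo mv "$conf" "$conf.broken.$(date +%s)"
  sudo apt-get install --reinstall -y -o Dpkg::Options::=--force-confmiss "$pkg"
  sudo systemctl restart libvirtd
}

# Example:
# restore_libvirtd_conf
```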

After libvirtd is running cleanly, re-attempt role assignment from the Private Cloud Director UI.

Re-Onboard the Host

Once stale state is cleared and libvirtd is healthy, re-run the onboarding command:
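
The command is pcdctl prep-node, as used throughout this document. As an optional guard before running it, the sketch below checks that the stale identity file is really gone; ready_to_reonboard is a hypothetical helper, and the default path is the one named earlier in this section.

```shell
# Hypothetical guard: succeed only if the stale identity file is absent.
ready_to_reonboard() {  # usage: ready_to_reonboard [host-id-file]
  f=${1:-/etc/pf9/host_id.conf}
  if [ -e "$f" ]; then
    echo "stale identity file still present: $f"; return 1
  fi
  echo "no stale identity file; ready to run pcdctl prep-node"
}

# Example:
# ready_to_reonboard && pcdctl prep-node
```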

Provide the account URL, username, region, and tenant (project) when prompted, or pass them as flags. After prep-node completes, authorize the host and assign roles from Infrastructure > Cluster Hosts in the UI.

Monitor onboarding progress in the hostagent log:
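
A minimal sketch for following the log until a completion message appears is shown here. first_completion_line is a hypothetical helper, and the exact wording and casing of the messages in hostagent.log is assumed, so the match is case-insensitive.

```shell
# Hypothetical helper: print the first line on stdin that reports completion.
first_completion_line() {
  grep -m 1 -iE 'converge successful|role application complete'
}

# Example: follow new log lines and stop reading at the first match.
# (tail itself exits only when it next writes after grep has finished.)
# sudo tail -n 0 -F /var/log/pf9/hostagent.log | first_completion_line
```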

Look for converge successful or role application complete messages. If the host gets stuck in converging state, see Diagnose a Host Agent Stuck in Converging State.

If re-onboarding fails again after clearing stale state, contact the Platform9 Support team with the output of /var/log/pf9/hostagent.log and the pcdctl log from /pf9/logs/.
