Analyze OpenShift bare metal installation failures in Prow CI jobs using dev-scripts artifacts. Use for jobs with "metal" in the name, and for debugging Metal3/Ironic provisioning, installation, or dev-scripts setup failures. You may also use the prow-job-analyze-install-failure skill together with this one.
This skill inherits all available tools. When active, it can use any tool Claude has access to.
This skill helps debug OpenShift bare metal installation failures in CI jobs by analyzing dev-scripts logs, libvirt console logs, sosreports, and other metal-specific artifacts.
Use this skill when:
- The job name contains "metal"
- Debugging Metal3/Ironic provisioning failures
- Debugging installation or dev-scripts setup failures
This skill is invoked by the main prow-job-analyze-install-failure skill when it detects a metal job.
Metal IPI jobs use dev-scripts (https://github.com/openshift-metal3/dev-scripts) with Metal3 and Ironic to install OpenShift.
The installation process has multiple layers:
- Hypervisor: the CI host running libvirt VMs for the cluster nodes
- Dev-scripts setup: host configuration, Ironic/Metal3 setup, installer build
- Provisioning: bootstrap Ironic provisions the masters; in-cluster Ironic provisions the workers
- Cluster installation: Ignition, node boot, and OpenShift bring-up
Failures can occur at any layer, so analysis must check all of them.
IMPORTANT: The term "disconnected" refers to the cluster nodes, NOT the hypervisor.
When analyzing failures in "metal-ipi-ovn-ipv6" jobs, remember that the cluster nodes are disconnected while the hypervisor is not, and that CI reaches the cluster through a squid proxy (see the squid-logs step below).
gcloud CLI Installation
Verify the gcloud CLI is available:
which gcloud
gcloud Authentication (Optional)
The test-platform-results bucket is publicly accessible, so authentication is usually not required.
The user will provide:
- The Prow job URL (from which the job name and build ID are derived)
Metal jobs produce several diagnostic archives:
- {target}/ofcir-acquire/build-log.txt: OFCIR log showing pool, provider, and host details
- {target}/ofcir-acquire/artifacts/junit_metal_setup.xml: JUnit with the test [sig-metal] should get working host from infra provider
- {target}/baremetalds-devscripts-setup/artifacts/root/dev-scripts/logs/: dev-scripts logs; installer logs (.openshift_install*.log) are also present in the devscripts folders
- {target}/baremetalds-devscripts-gather/artifacts/: log bundle containing bootstrap/journals/ironic.log and bootstrap/journals/metal3-baremetal-operator.log, plus control-plane/{node-ip}/containers/metal3-ironic-*.log and control-plane/{node-ip}/containers/metal3-baremetal-operator-*.log
Download OFCIR logs
gcloud storage cp gs://test-platform-results/{bucket-path}/artifacts/{target}/ofcir-acquire/build-log.txt .work/prow-job-analyze-install-failure/{build_id}/logs/ofcir-build-log.txt --no-user-output-enabled 2>&1 || echo "OFCIR build log not found"
gcloud storage cp gs://test-platform-results/{bucket-path}/artifacts/{target}/ofcir-acquire/artifacts/junit_metal_setup.xml .work/prow-job-analyze-install-failure/{build_id}/logs/junit_metal_setup.xml --no-user-output-enabled 2>&1 || echo "OFCIR JUnit not found"
Check junit_metal_setup.xml for acquisition failure
Look for a failure of the test [sig-metal] should get working host from infra provider.
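A minimal pass/fail check, assuming the standard JUnit XML format where failures appear as <failure> elements:
# Report FAILED if the JUnit file records any test failure
grep -q "<failure" .work/prow-job-analyze-install-failure/{build_id}/logs/junit_metal_setup.xml && echo "OFCIR acquisition FAILED" || echo "OFCIR acquisition passed"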
Extract OFCIR details from build-log.txt (see the sketch below):
- pool: The OFCIR pool name
- provider: The infrastructure provider
- name: The host name allocated
If OFCIR acquisition failed, note that installation never started.
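A quick extraction sketch; the field names come from the build log description above, but the exact log formatting is an assumption, so adjust the pattern as needed:
# Pull the pool, provider, and host name lines from the OFCIR build log
grep -iE "pool|provider|name" .work/prow-job-analyze-install-failure/{build_id}/logs/ofcir-build-log.txt | head -10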
Download dev-scripts logs directory
gcloud storage cp -r gs://test-platform-results/{bucket-path}/artifacts/{target}/baremetalds-devscripts-setup/artifacts/root/dev-scripts/logs/ .work/prow-job-analyze-install-failure/{build_id}/logs/devscripts/ --no-user-output-enabled
Handle missing dev-scripts logs gracefully
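For example, the same download as above with a fallback so a missing directory does not abort the analysis (the setup step may have failed before producing logs):
gcloud storage cp -r gs://test-platform-results/{bucket-path}/artifacts/{target}/baremetalds-devscripts-setup/artifacts/root/dev-scripts/logs/ .work/prow-job-analyze-install-failure/{build_id}/logs/devscripts/ --no-user-output-enabled 2>&1 || echo "dev-scripts logs not found"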
Find and download libvirt-logs.tar
gcloud storage ls -r gs://test-platform-results/{bucket-path}/artifacts/ 2>&1 | grep "libvirt-logs\.tar$"
gcloud storage cp {full-gcs-path-to-libvirt-logs.tar} .work/prow-job-analyze-install-failure/{build_id}/logs/ --no-user-output-enabled
Extract libvirt logs
tar -xf .work/prow-job-analyze-install-failure/{build_id}/logs/libvirt-logs.tar -C .work/prow-job-analyze-install-failure/{build_id}/logs/
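Optionally, inspect the archive contents to confirm what was captured (console logs are the main items of interest):
tar -tf .work/prow-job-analyze-install-failure/{build_id}/logs/libvirt-logs.tar | head -20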
Download sosreport (optional)
gcloud storage ls -r gs://test-platform-results/{bucket-path}/artifacts/ 2>&1 | grep "sosreport.*\.tar\.xz$"
gcloud storage cp {full-gcs-path-to-sosreport} .work/prow-job-analyze-install-failure/{build_id}/logs/ --no-user-output-enabled
tar -xf .work/prow-job-analyze-install-failure/{build_id}/logs/sosreport-{name}.tar.xz -C .work/prow-job-analyze-install-failure/{build_id}/logs/
Download squid-logs (optional, for IPv6/disconnected jobs)
gcloud storage ls -r gs://test-platform-results/{bucket-path}/artifacts/ 2>&1 | grep "squid-logs.*\.tar$"
gcloud storage cp {full-gcs-path-to-squid-logs} .work/prow-job-analyze-install-failure/{build_id}/logs/ --no-user-output-enabled
tar -xf .work/prow-job-analyze-install-failure/{build_id}/logs/squid-logs-{name}.tar -C .work/prow-job-analyze-install-failure/{build_id}/logs/
Check dev-scripts logs FIRST - they show what happened during setup and installation.
Read dev-scripts logs in order
Also check the .openshift_install*.log files in the devscripts directories.
Key errors to look for:
- Host configuration failures (networking, DNS, storage)
- Ironic/Metal3 setup errors (BMC communication, provisioning)
- Installer build failures
Important distinction:
- If dev-scripts setup failed, the problem is in the setup process (host config, Ironic, installer build)
- If dev-scripts succeeded, the problem is in cluster installation (see the main analysis)
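A simple first pass over the downloaded logs; the pattern is a heuristic starting point, not an exhaustive error list:
# Surface obvious errors across dev-scripts and installer logs
grep -riE "level=error|level=fatal|failed" .work/prow-job-analyze-install-failure/{build_id}/logs/devscripts/ | head -50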
Save dev-scripts analysis:
.work/prow-job-analyze-install-failure/{build_id}/analysis/devscripts-summary.txt
CRITICAL: Check the RIGHT Ironic logs based on what failed
The log bundle contains TWO sets of Ironic logs in different locations:
Which logs to check:
- Master provisioning failures: bootstrap/journals/ironic.log (bootstrap VM Ironic)
- Worker provisioning failures: control-plane/{ip}/containers/metal3-ironic-*.log (in-cluster Ironic)
Download and extract log bundle
gcloud storage ls -r gs://test-platform-results/{bucket-path}/artifacts/ 2>&1 | grep "log-bundle.*\.tar$"
gcloud storage cp {full-gcs-path-to-log-bundle.tar} .work/prow-job-analyze-install-failure/{build_id}/logs/ --no-user-output-enabled
tar -xf .work/prow-job-analyze-install-failure/{build_id}/logs/log-bundle-*.tar -C .work/prow-job-analyze-install-failure/{build_id}/logs/
Find ALL Ironic logs
# Bootstrap Ironic (master provisioning)
find .work/prow-job-analyze-install-failure/{build_id}/logs/ -path "*/bootstrap/journals/ironic.log"
find .work/prow-job-analyze-install-failure/{build_id}/logs/ -path "*/bootstrap/journals/metal3-baremetal-operator.log"
# Control-plane Ironic (worker provisioning) - CRITICAL for worker failures
find .work/prow-job-analyze-install-failure/{build_id}/logs/ -path "*/control-plane/*/containers/metal3-ironic-*.log"
find .work/prow-job-analyze-install-failure/{build_id}/logs/ -path "*/control-plane/*/containers/metal3-baremetal-operator-*.log"
Analyze the Ironic logs:
For Master Provisioning Issues (check bootstrap logs):
Check bootstrap/journals/ironic.log and bootstrap/journals/metal3-baremetal-operator.log.
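A first scan of the bootstrap Ironic journal, reusing the path located by the find above (heuristic pattern; refine based on what you see):
grep -iE "error|failed|timeout" "$(find .work/prow-job-analyze-install-failure/{build_id}/logs/ -path '*/bootstrap/journals/ironic.log' | head -1)" | head -40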
For Worker Provisioning Issues (check control-plane logs):
Check control-plane/{node-ip}/containers/metal3-ironic-*.log and control-plane/{node-ip}/containers/metal3-baremetal-operator-*.log.
Map node UUIDs to BareMetalHost names:
Ironic log entries reference nodes by UUID (e.g., b7fa5b83-91d0-46ee-acd2-e4b33e9ac983); map each UUID back to its BareMetalHost name when reporting.
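To trace a single node's history, grep for its UUID; the UUID below is the illustrative example from above, so substitute the one from your logs:
# Follow one node's provisioning states and errors through the bootstrap Ironic log
grep "b7fa5b83-91d0-46ee-acd2-e4b33e9ac983" "$(find .work/prow-job-analyze-install-failure/{build_id}/logs/ -path '*/bootstrap/journals/ironic.log' | head -1)" | grep -iE "deploy|power|error" | head -30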
Save Ironic analysis:
.work/prow-job-analyze-install-failure/{build_id}/analysis/ironic-summary.txt
Console logs are CRITICAL for metal failures during cluster creation.
Find console logs
find .work/prow-job-analyze-install-failure/{build_id}/logs/ -name "*console*.log"
Expected files: {cluster-name}-bootstrap_console.log, {cluster-name}-master-{N}_console.log
Analyze console logs for boot/provisioning issues:
Console logs show the complete boot sequence:
- Kernel boot messages
- Ignition fetch and apply
- Network configuration (DHCP, DNS)
- Service and provisioning errors
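A first pass over all console logs; the pattern assumes typical RHCOS/systemd boot output:
grep -iE "ignition|emergency target|Failed to start" $(find .work/prow-job-analyze-install-failure/{build_id}/logs/ -name "*console*.log") | head -40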
Save console log analysis:
.work/prow-job-analyze-install-failure/{build_id}/analysis/console-summary.txt
The sosreport is only needed for hypervisor-level issues.
Check sosreport for hypervisor diagnostics:
- var/log/messages - Hypervisor system logs
- sos_commands/ - Output of diagnostic commands
- etc/libvirt/ - Libvirt configuration
Look for hypervisor-level issues:
- Resource constraints (memory, disk, CPU)
- Libvirt errors
- Host networking problems
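A quick scan of the hypervisor system log, assuming the standard sosreport directory layout extracted above:
grep -iE "libvirt|out of memory|no space left" .work/prow-job-analyze-install-failure/{build_id}/logs/sosreport-*/var/log/messages | head -40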
Squid proxy logs are important for debugging CI access to the cluster.
Check squid proxy logs:
Common issues:
- Failed CI connections to the cluster
- Network routing problems between CI and the cluster
- Proxy misconfiguration
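A quick scan of the proxy access log; TCP_DENIED and 4xx/5xx result codes are standard squid indicators of blocked or failed requests, and the extracted path is an assumption based on the archive name:
grep -E "TCP_DENIED|/[45][0-9][0-9] " .work/prow-job-analyze-install-failure/{build_id}/logs/squid-logs-*/access.log | head -40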
Create comprehensive metal analysis report:
Metal Installation Failure Analysis
====================================
Job: {job-name}
Build ID: {build_id}
Prow URL: {original-url}
Installation Method: dev-scripts + Metal3 + Ironic
OFCIR Host Acquisition
----------------------
Pool: {pool name from OFCIR build log}
Provider: {provider from OFCIR build log}
Host: {host name from OFCIR build log}
Status: {Success or Failure}
{If OFCIR acquisition failed, note that installation never started}
Dev-Scripts Analysis
--------------------
{Summary of dev-scripts logs}
Key Findings:
- {First error in dev-scripts setup}
- {Related errors}
If dev-scripts failed: The problem is in the setup process (host config, Ironic, installer build)
If dev-scripts succeeded: The problem is in cluster installation (see main analysis)
Console Logs Analysis
---------------------
{Summary of VM/node console logs}
Bootstrap Node:
- {Boot sequence status}
- {Ignition status}
- {Network configuration}
- {Key errors}
Master Nodes:
- {Status for each master}
- {Key errors}
Hypervisor Diagnostics (sosreport)
-----------------------------------
{Summary of sosreport findings, if applicable}
Proxy Logs (squid)
------------------
{Summary of proxy logs, if applicable}
Note: Squid logs show CI access to the cluster, not the cluster's registry access
Metal-Specific Recommended Steps
---------------------------------
Based on the failure:
For dev-scripts setup failures:
- Review host configuration (networking, DNS, storage)
- Check Ironic/Metal3 setup logs for BMC/provisioning issues
- Verify installer build completed successfully
- Check installer logs in devscripts folders
For console boot failures:
- Check Ignition configuration and network connectivity
- Review kernel boot messages for hardware issues
- Verify network configuration (DHCP, DNS, routing)
For CI access issues:
- Check squid proxy logs for failed CI connections to cluster
- Verify network routing between CI and cluster
- Check proxy configuration
Artifacts Location
------------------
Dev-scripts logs: .work/prow-job-analyze-install-failure/{build_id}/logs/devscripts/
Console logs: .work/prow-job-analyze-install-failure/{build_id}/logs/
sosreport: .work/prow-job-analyze-install-failure/{build_id}/logs/sosreport-*/
squid logs: .work/prow-job-analyze-install-failure/{build_id}/logs/squid-logs-*/
Save report:
.work/prow-job-analyze-install-failure/{build_id}/analysis/metal-analysis.txt
| Issue | Symptoms | Where to Look |
|---|---|---|
| Dev-scripts host config | Early failure before cluster creation | Dev-scripts logs (host configuration step) |
| Ironic/Metal3 setup | Provisioning failures, BMC errors | Dev-scripts logs (Ironic setup) |
| BMC communication | BareMetalHost stuck registering, power state failures | Ironic logs (in log-bundle), BareMetalHost status |
| Node boot failure | VMs/nodes won't boot | Console logs (kernel, boot sequence) |
| Ignition failure | Nodes boot but don't provision | Console logs (Ignition messages) |
| Network config | DHCP failures, DNS issues | Console logs (network messages), dev-scripts host config |
| CI access issues | Tests can't connect to cluster | squid logs (proxy logs for CI → cluster access) |
| Hypervisor issues | Resource constraints, libvirt errors | sosreport (system logs, libvirt config) |