Analyzes LVMS must-gather data to diagnose storage issues
Inherits all available tools
Additional assets for this skill
This skill inherits all available tools. When active, it can use any tool Claude has access to.
scripts/analyze_lvms.py

This skill provides detailed guidance for analyzing LVMS (Logical Volume Manager Storage) must-gather data to identify and troubleshoot storage issues.
Use this skill when:
- Analyzing LVMS must-gather data from an OpenShift cluster
- Diagnosing storage issues such as Pending PVCs, degraded volume groups, or failing LVMS pods
This skill is automatically invoked by the /lvms:analyze command when working with must-gather data.
Required:
- namespaces/openshift-lvm-storage/ (newer versions) or namespaces/openshift-storage/ (older versions)
- PyYAML: pip install pyyaml

Namespace Compatibility:
The LVMS namespace changed from openshift-storage to openshift-lvm-storage in recent versions; both layouts are handled.

Must-Gather Structure:
must-gather/
└── registry-{image-registry}-lvms-must-gather-{version}-sha256-{hash}/
    ├── cluster-scoped-resources/
    │   ├── core/
    │   │   └── persistentvolumes/
    │   │       └── pvc-*.yaml                # Individual PV files
    │   ├── storage.k8s.io/
    │   │   └── storageclasses/
    │   │       ├── lvms-vg1.yaml
    │   │       └── lvms-vg1-immediate.yaml
    │   └── security.openshift.io/
    │       └── securitycontextconstraints/
    │           └── lvms-vgmanager.yaml
    ├── namespaces/
    │   └── openshift-lvm-storage/            # or openshift-storage for older versions
    │       ├── oc_output/                    # IMPORTANT: Primary location for LVMS resources
    │       │   ├── lvmcluster.yaml           # Full LVMCluster resource with status
    │       │   ├── lvmcluster                # Text output (oc describe)
    │       │   ├── lvmvolumegroup            # Text output
    │       │   ├── lvmvolumegroupnodestatus  # Text output
    │       │   ├── logicalvolume             # Text output
    │       │   ├── pods                      # Text output (oc get pods)
    │       │   └── events                    # Text output
    │       ├── pods/
    │       │   ├── lvms-operator-{hash}/
    │       │   │   └── lvms-operator-{hash}.yaml
    │       │   └── vg-manager-{hash}/
    │       │       └── vg-manager-{hash}.yaml
    │       └── apps/                         # May contain deployments/daemonsets
    └── ...
Key Note: LVMS resources are primarily in the oc_output/ directory, with lvmcluster.yaml being the most important file containing full cluster and node status.
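Because lvmcluster.yaml carries the full cluster and node status, peeking at its status stanza often answers the basic health question before any scripting. A minimal sketch (adjust the namespace directory for older versions):

# Print the first part of the LVMCluster status stanza
sed -n '/^ *status:/,$p' \
  {must-gather-path}/namespaces/openshift-lvm-storage/oc_output/lvmcluster.yaml | head -n 30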
Before running analysis, verify the must-gather directory structure:
# Check if LVMS namespace directory exists (try both namespaces)
ls {must-gather-path}/namespaces/openshift-lvm-storage 2>/dev/null || \
ls {must-gather-path}/namespaces/openshift-storage
# Verify required resource directories
ls {must-gather-path}/cluster-scoped-resources/core/persistentvolumes
Namespace Detection: The analysis script automatically detects which namespace is present:
- openshift-lvm-storage (newer versions)
- openshift-storage (older versions)

Common Issue: User provides the parent directory instead of the subdirectory
The data actually lives in a subdirectory such as must-gather.local.12345/registry-ci-openshift-org-origin-4-18.../

Handling:
# If user provides parent directory, try to find the correct subdirectory
if [ ! -d "{path}/namespaces/openshift-lvm-storage" ] && \
   [ ! -d "{path}/namespaces/openshift-storage" ]; then
  # Try to find either namespace directory anywhere under the given path
  find "{path}" -type d \( -name "openshift-lvm-storage" -o -name "openshift-storage" \) -path "*/namespaces/*"
  # Suggest the directory two levels above the match as the correct must-gather path
fi
Use the Python analysis script for structured analysis:
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
{must-gather-path}
Script Location: plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py

Component-Specific Analysis:
For focused analysis on specific components:
# Analyze only storage/PVC issues
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
{must-gather-path} --component storage
# Analyze only operator health
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
{must-gather-path} --component operator
# Analyze only volume groups
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
{must-gather-path} --component volumes
# Analyze only pod logs
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
{must-gather-path} --component logs
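To sweep all four components in sequence, for example before drilling into one of them, something like:

# Run each component analysis back to back
for component in storage operator volumes logs; do
  python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
    {must-gather-path} --component "$component"
done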
The script provides structured output across several sections:
1. LVMCluster Status
Key fields to check:
- state: should be "Ready"
- ready: should be true
- conditions: all should have status "True"
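For a quick yes/no in automation, a rough one-liner against the raw YAML (assumes the captured status contains a state: Ready field, as in the healthy example below):

# Print a one-line verdict based on the captured state field
grep -q 'state: Ready' \
  {must-gather-path}/namespaces/openshift-lvm-storage/oc_output/lvmcluster.yaml \
  && echo "LVMCluster Ready" || echo "LVMCluster NOT Ready - run the full analysis"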
Example healthy output:
LVMCluster: lvmcluster-sample
✓ State: Ready
✓ Ready: true
Conditions:
✓ ResourcesAvailable: True
✓ VolumeGroupsReady: True
Example unhealthy output (real case from must-gather):
LVMCluster: my-lvmcluster
❌ State: Degraded
❌ Ready: false
Conditions:
✓ ResourcesAvailable: True
Reason: ResourcesAvailable
Message: Reconciliation is complete and all the resources are available
❌ VolumeGroupsReady: False
Reason: VGsDegraded
Message: One or more VGs are degraded
2. Volume Group Status
Checks volume group creation per node and device availability:
Example output (real case from must-gather):
Volume Group/Device Class: vg1
Nodes: 3
Node: ocpnode1.ocpiopex.growipx.com
⚠ Status: Progressing
Devices: /dev/mapper/3600a098038315048302b586c38397562, /dev/mapper/mpatha
Excluded devices: 24 device(s)
- /dev/sdb: /dev/sdb has children block devices and could not be considered
- /dev/sdb4: /dev/sdb4 has an invalid filesystem signature (xfs) and cannot be used
- /dev/mapper/3600a098038315047433f586c53477272: has an invalid filesystem signature (xfs)
... and 21 more excluded devices
Node: ocpnode2.ocpiopex.growipx.com
❌ Status: Degraded
Reason:
failed to create/extend volume group vg1: failed to extend volume group vg1:
WARNING: VG name vg0 is used by VGs VVnkhP-khYQ-blyc-2TNo-d3cv-b6di-4RbSyY and EUV3xv-ft6q-39xK-J3ki-rglf-9H44-rVIHIq.
Fix duplicate VG names with vgrename uuid, a device filter, or system IDs.
Physical volume '/dev/mapper/3600a098038315048302b586c38397578p3' is already in volume group 'vg0'
Unable to add physical volume '/dev/mapper/3600a098038315048302b586c38397578p3' to volume group 'vg0'
... (truncated, see LVMCluster status for full details)
Devices: /dev/mapper/mpatha
This real example shows a common LVMS issue: duplicate volume group names preventing VG extension.
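If the cluster is reachable, the LVM error message itself names the remedy (vgrename by UUID). A sketch, with the UUID taken from the example above and the new name illustrative:

# List volume group names and UUIDs on the affected node
oc debug node/{node-name} -- chroot /host vgs -o vg_name,vg_uuid
# Rename one of the duplicates by UUID so the names are unique again
oc debug node/{node-name} -- chroot /host vgrename EUV3xv-ft6q-39xK-J3ki-rglf-9H44-rVIHIq vg0_old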
3. Storage (PVC/PV) Status
Lists pending or failed PVCs:
Example output:
Pending PVCs:
database/postgres-data
❌ Status: Pending (10m)
Storage Class: lvms-vg1
Requested: 100Gi
Recent Events:
⚠ ProvisioningFailed: no node has enough free space
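Pending claims can also be located directly in the must-gather; a sketch assuming the standard layout for namespaced core resources:

# List PVC manifests whose captured phase is Pending
grep -rl 'phase: Pending' \
  {must-gather-path}/namespaces/*/core/persistentvolumeclaims/ 2>/dev/null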
4. Operator Health
Checks LVMS operator pods, deployments, and daemonsets:
Example issues:
❌ vg-manager-abc123 (worker-0)
Status: CrashLoopBackOff
Restarts: 15
Error: volume group "vg1" not found
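The oc get pods text capture under oc_output/ is a quick cross-check for these states:

# Flag unhealthy LVMS pods from the captured listing
grep -E 'CrashLoopBackOff|Error|Pending' \
  {must-gather-path}/namespaces/openshift-lvm-storage/oc_output/pods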
5. Pod Logs
Extracts and analyzes error/warning messages from pod logs:
Example output (from real must-gather):
═══════════════════════════════════════════════════════════
POD LOGS ANALYSIS
═══════════════════════════════════════════════════════════
Pod: vg-manager-nz4pc
Unique errors/warnings: 1
❌ 2025-10-28T10:47:28Z: Reconciler error
Controller: lvmvolumegroup
Error Details:
failed to create/extend volume group vg1: failed to extend volume group vg1:
WARNING: VG name vg0 is used by VGs WsNJwk-DK3q-tSHg-zvQJ-imF1-SdRv-8oh4e0 ...
Cannot use /dev/dm-10: device is too small (pv_min_size)
Command requires all devices to be found.
Pod: lvms-operator-65df9f4dbb-92jwl
Unique errors/warnings: 1
❌ 2025-10-28T10:52:48Z: failed to validate device class setup
Controller: lvmcluster
Error: VG vg1 on node Degraded is not in ready state (ocpnode1.ocpiopex.growipx.com)
Key Points: entries are deduplicated per pod (the "Unique errors/warnings" count), and each one carries its timestamp and owning controller, which helps order failures across pods.
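To scan the raw log files yourself (paths per the Review logs list later in this doc), a rough first pass:

# Surface error lines across all captured LVMS pod logs
grep -riE '\berror\b' \
  {must-gather-path}/namespaces/openshift-lvm-storage/pods/ | head -n 20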
Connect related issues to identify root causes:
Common Pattern 1: Device Filesystem Conflict
Chain of failures:
1. Device /dev/sdb has existing ext4 filesystem
2. vg-manager cannot create volume group
3. Volume group missing on node
4. PVCs stuck in Pending
Root cause: Device not properly wiped before LVMS use
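To confirm this pattern on a live node before wiping anything, note that wipefs with no options only reports the signatures it finds:

# Inspect existing filesystem signatures on the device (read-only)
oc debug node/{node-name} -- chroot /host wipefs /dev/{device}
# blkid gives the same answer from a different angle
oc debug node/{node-name} -- chroot /host blkid /dev/{device}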
Common Pattern 2: Insufficient Capacity
Chain of failures:
1. Thin pool at 95% capacity
2. No free space for new volumes
3. PVCs stuck in Pending
Root cause: Insufficient storage capacity or old volumes not cleaned up
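To check thin-pool fullness directly on a node (data_percent and metadata_percent are standard lvs fields):

# Show per-LV usage, including thin-pool data and metadata fill
oc debug node/{node-name} -- chroot /host lvs -o lv_name,lv_size,data_percent,metadata_percent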
Common Pattern 3: Node-Specific Failures
Chain of failures:
1. Volume group missing on specific node
2. TopoLVM CSI driver not functional on that node
3. PVCs with node affinity to that node stuck Pending
Root cause: Node-specific device configuration issue
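To confirm this pattern, compare pod placement with the per-node volume group status:

# See which nodes actually run the LVMS daemonset pods
oc get pods -n openshift-lvm-storage -o wide
# Per-node volume group status, one resource per node
oc get lvmvolumegroupnodestatus -n openshift-lvm-storage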
Based on analysis results, provide prioritized recommendations:
CRITICAL Issues (Fix Immediately):
Device Conflicts:
# Clean device on affected node
oc debug node/{node-name}
chroot /host wipefs -a /dev/{device}
# Restart vg-manager to recreate VG
oc delete pod -n openshift-lvm-storage -l app.kubernetes.io/component=vg-manager
Pod Crashes:
# After fixing underlying issue, restart failed pods
oc delete pod -n openshift-lvm-storage {pod-name}
LVMCluster Not Ready:
# Review and fix device configuration
oc edit lvmcluster -n openshift-lvm-storage
# Ensure devices match actual available devices
WARNING Issues (Address Soon):
Capacity Issues:
# Check logical volume usage
oc debug node/{node} -- chroot /host lvs --units g
# Remove unused volumes or expand thin pool
Partial Node Coverage:
# Investigate why daemonsets not on all nodes
oc get nodes --show-labels
oc describe daemonset -n openshift-lvm-storage
Always provide clear next steps:
Review logs (if available in must-gather):
- namespaces/openshift-lvm-storage/pods/lvms-operator-*/logs/
- namespaces/openshift-lvm-storage/pods/vg-manager-*/logs/
- namespaces/openshift-lvm-storage/pods/topolvm-*/logs/

Verify fixes (if cluster is accessible):
# After implementing fixes, verify:
oc get lvmcluster -n openshift-lvm-storage
oc get lvmvolumegroup -A
oc get pvc -A | grep Pending
Re-collect must-gather (if making changes):
oc adm must-gather --image=quay.io/lvms_dev/lvms-must-gather:latest
Script not found:
# Verify script exists
ls plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py
# Ensure it's executable
chmod +x plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py
Python dependencies missing:
# Install PyYAML
pip install pyyaml
# Or use pip3
pip3 install pyyaml
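To confirm the module resolves for the same interpreter that will run the script:

# Should print the installed PyYAML version rather than an ImportError
python3 -c "import yaml; print(yaml.__version__)"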
Invalid YAML in must-gather:
Usually a sign of a truncated or corrupted collection; inspect the file the script reports and re-collect if needed.

Missing directories:
The path may point at the parent directory instead of the registry-* subdirectory; re-run the path validation shown earlier.

Incomplete must-gather:
Re-collect with the LVMS must-gather image using the oc adm must-gather command above.
# Run comprehensive analysis
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
./must-gather/registry-ci-openshift-org-origin-4-18.../
Output:
═══════════════════════════════════════════════════════════
LVMCLUSTER STATUS
═══════════════════════════════════════════════════════════
LVMCluster: lvmcluster-sample
❌ State: Failed
❌ Ready: false
...
═══════════════════════════════════════════════════════════
LVMS ANALYSIS SUMMARY
═══════════════════════════════════════════════════════════
❌ CRITICAL ISSUES: 3
- LVMCluster not Ready (state: Failed)
- Volume group vg1 not created on worker-0
- 3 PVCs stuck in Pending state
# Focus on PVC issues
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
./must-gather/... --component storage
Analyzes only: PVC/PV status, storage classes, and related provisioning events.
# Check operator components
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
./must-gather/... --component operator
Analyzes only: LVMS operator pods, deployments, and daemonsets.
Always validate the path first: confirm it contains the namespaces/openshift-lvm-storage/ (or openshift-storage) directory.

Run the full analysis first: start without --component, then drill into specific areas.

Correlate issues: connect LVMCluster, volume group, pod, and PVC findings to identify the root cause.

Check timestamps: order errors chronologically to find the first failure in the chain.

Provide actionable output: pair every finding with prioritized, concrete commands.

Reference documentation: point users at the relevant LVMS and OpenShift docs for deeper fixes.