Validates materials simulations in three stages — pre-flight config checks, runtime log monitoring for NaN/Inf/collapse, and post-flight result validation (physical bounds, conservation, convergence). Use when launching, monitoring, or debugging simulations.
Slash command: `/full:simulation-validator`
Provide a three-stage validation protocol: pre-flight checks, runtime monitoring, and post-flight validation for materials simulations.
Bundled files:

- `CHANGELOG.md`
- `evals/evals.json`
- `evals/files/config.json`
- `evals/files/crash.log`
- `evals/files/results.json`
- `evals/files/run.log`
- `evals/files/simulation.json`
- `evals/files/simulation.log`
- `references/log_patterns.md`
- `references/validation_protocol.md`
- `scripts/failure_diagnoser.py`
- `scripts/preflight_checker.py`
- `scripts/result_validator.py`
- `scripts/runtime_monitor.py`
Before running validation scripts, collect from the user:
| Input | Description | Example |
|---|---|---|
| Config file | Simulation configuration (JSON/YAML) | simulation.json |
| Log file | Runtime output log | simulation.log |
| Metrics file | Post-run metrics (JSON) | results.json |
| Required params | Parameters that must exist | dt,dx,kappa |
| Valid ranges | Parameter bounds | dt:1e-6:1e-2 |
Decision tree:

```
Is simulation about to start?
├── YES → Run Stage 1: preflight_checker.py
│   └── BLOCK status? → Fix issues, do NOT run simulation
│   └── WARN status?  → Review warnings, document if accepted
│   └── PASS status?  → Proceed to run simulation
│
Is simulation running?
├── YES → Run Stage 2: runtime_monitor.py (periodically)
│   └── Alerts? → Consider stopping, check parameters
│
Has simulation finished?
├── YES → Run Stage 3: result_validator.py
│   └── Failed checks? → Do NOT use results
│                      → Run failure_diagnoser.py
│   └── All passed?    → Results are valid
```
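The Stage 1 branch of the decision tree can be automated once the pre-flight report is parsed. The sketch below is minimal and assumes that `preflight_checker.py --json` nests the documented `report.status` field as `{"report": {"status": ...}}`. That JSON layout is an assumption, and the helper `next_action` is hypothetical.

```python
import json

# Hypothetical mapping from a pre-flight status to the decision-tree action.
ACTIONS = {
    "PASS": "run",     # proceed to run the simulation
    "WARN": "review",  # review warnings, document them if accepted
    "BLOCK": "fix",    # fix issues; do NOT start the simulation
}


def next_action(report_json: str) -> str:
    """Return the next step for a preflight_checker.py --json report string."""
    data = json.loads(report_json)
    # Assumption: the status sits under a top-level "report" object.
    status = data["report"]["status"]
    if status not in ACTIONS:
        raise ValueError(f"unexpected preflight status: {status!r}")
    return ACTIONS[status]


print(next_action('{"report": {"status": "WARN", "blockers": [], "warnings": ["dt near upper bound"]}}'))
```

Raising on an unknown status is deliberate: an unrecognized verdict should stop the launch rather than fall through to "run".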
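Threshold presets (the Standard column matches the script defaults):

| Metric (flag) | Conservative | Standard | Relaxed |
|---|---|---|---|
| Mass tolerance (`--mass-tol`) | 1e-6 | 1e-3 | 1e-2 |
| Residual growth (`--residual-growth`) | 2x | 10x | 100x |
| dt reduction (`--dt-drop`) | 10x | 100x | 1000x |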
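To keep the chosen preset consistent across stages, the table can be encoded once and turned into argument lists for the documented flags. The following is only a sketch: the `PRESETS` dictionary and both helper functions are hypothetical, while the flag names and values come from the table.

```python
# Presets from the threshold table, keyed to the flags that consume them:
# --mass-tol (result_validator.py), --residual-growth / --dt-drop (runtime_monitor.py).
PRESETS = {
    "conservative": {"mass_tol": 1e-6, "residual_growth": 2.0, "dt_drop": 10.0},
    "standard": {"mass_tol": 1e-3, "residual_growth": 10.0, "dt_drop": 100.0},
    "relaxed": {"mass_tol": 1e-2, "residual_growth": 100.0, "dt_drop": 1000.0},
}


def monitor_args(preset: str, log_path: str) -> list[str]:
    """Build an explicit argument list for runtime_monitor.py."""
    p = PRESETS[preset]
    return ["python3", "scripts/runtime_monitor.py", "--log", log_path,
            "--residual-growth", str(p["residual_growth"]),
            "--dt-drop", str(p["dt_drop"]), "--json"]


def validator_args(preset: str, metrics_path: str) -> list[str]:
    """Build an explicit argument list for result_validator.py."""
    p = PRESETS[preset]
    return ["python3", "scripts/result_validator.py", "--metrics", metrics_path,
            "--mass-tol", repr(p["mass_tol"]), "--json"]


print(validator_args("conservative", "results.json"))
```

If you pass such a list to `subprocess.run(args, check=True)`, no shell string is needed (no `shell=True`). Recording the preset name alongside the output also documents which thresholds were applied.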
Script outputs:

| Script | Output fields |
|---|---|
| `scripts/preflight_checker.py` | `report.status`, `report.blockers`, `report.warnings` |
| `scripts/runtime_monitor.py` | `alerts`, `residual_stats`, `dt_stats` (alerts cover NaN/Inf/overflow detection, residual growth, and dt collapse) |
| `scripts/result_validator.py` | `checks`, `confidence_score`, `failed_checks`, `status` (`PASS` / `FAIL` / `INSUFFICIENT_DATA`); `confidence_score` is null when no check ran |
| `scripts/failure_diagnoser.py` | `probable_causes`, `recommended_fixes` |
Stage 1 (pre-flight): run `scripts/preflight_checker.py --config simulation.json` before launching.

Note: `preflight_checker.py` validates required keys, numeric ranges, output-directory access, and disk space. It does not evaluate numerical stability (CFL / diffusion-Fourier). For explicit stability gating, use `skills/core-numerical/numerical-stability/scripts/cfl_checker.py`.

```bash
python3 scripts/preflight_checker.py \
  --config simulation.json \
  --required dt,dx,kappa \
  --ranges "dt:1e-6:1e-2,dx:1e-4:1e-1" \
  --min-free-gb 1.0 \
  --json
```
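Because the pre-flight check stops short of stability, a separate gate has to enforce a limit such as the mesh Fourier number. For explicit (forward-Euler, central-difference) diffusion on a uniform grid in d dimensions, Fo = D·dt/dx² must satisfy Fo ≤ 1/(2d). The sketch below is hypothetical and illustrates that bound only. It is not the interface of `cfl_checker.py`, and the config key `D` (a diffusivity) is an assumption. The bound does not apply to fourth-order operators such as Cahn-Hilliard, whose explicit limit scales like dx⁴.

```python
import json


def fourier_number(D: float, dt: float, dx: float) -> float:
    """Mesh Fourier number Fo = D * dt / dx**2."""
    return D * dt / dx**2


def explicit_diffusion_stable(D: float, dt: float, dx: float, dim: int = 1):
    """Forward-Euler / central-difference diffusion limit: Fo <= 1 / (2 * dim)."""
    fo = fourier_number(D, dt, dx)
    limit = 1.0 / (2 * dim)
    return fo <= limit, fo, limit


# Hypothetical config: "D" is an assumed diffusivity key, not one the skill requires.
config = json.loads('{"dt": 1e-3, "dx": 1e-2, "D": 0.1}')
ok, fo, limit = explicit_diffusion_stable(config["D"], config["dt"], config["dx"], dim=2)
print(f"Fo={fo:.3g}, limit={limit}, stable={ok}")
```

In this example, a pre-flight PASS with `dt` and `dx` inside their `--ranges` can still give Fo ≈ 1, which is four times the 2-D limit of 0.25.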
Stage 2 (runtime): run `scripts/runtime_monitor.py --log simulation.log` periodically while the simulation is running.

```bash
python3 scripts/runtime_monitor.py \
  --log simulation.log \
  --residual-growth 10.0 \
  --dt-drop 100.0 \
  --json
```
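The three alert kinds reported by the monitor can be illustrated with a short log scan. In this minimal sketch, the `step N dt=<x> residual=<y>` line format is hypothetical (the real script accepts `--residual-pattern` / `--dt-pattern` overrides). Measuring residual growth against the running minimum is an assumption. The dt check follows the documented direction-aware comparison of the running maximum against the current value.

```python
import math
import re

# Hypothetical log format: "step N dt=<float> residual=<float>".
NUM = r"([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?|nan|inf)"
DT_RE = re.compile(r"\bdt\s*=\s*" + NUM, re.IGNORECASE)
RES_RE = re.compile(r"\bresidual\s*=\s*" + NUM, re.IGNORECASE)
BAD_RE = re.compile(r"\b(?:nan|inf|overflow)\b", re.IGNORECASE)


def scan_log(lines, residual_growth=10.0, dt_drop=100.0):
    """Return (line_number, alert_kind) tuples for a sequence of log lines."""
    alerts = []
    dt_max = 0.0        # running maximum of dt seen so far
    res_min = math.inf  # running minimum residual (assumed growth baseline)
    for n, line in enumerate(lines, start=1):
        if BAD_RE.search(line):
            alerts.append((n, "nan_inf_overflow"))
        m = DT_RE.search(line)
        if m:
            dt = float(m.group(1))
            if math.isfinite(dt) and dt > 0:
                dt_max = max(dt_max, dt)
                # Direction-aware: only a drop below the running max counts.
                if dt_max / dt > dt_drop:
                    alerts.append((n, "dt_collapse"))
        m = RES_RE.search(line)
        if m:
            r = float(m.group(1))
            if math.isfinite(r) and r > 0:
                res_min = min(res_min, r)
                if r / res_min > residual_growth:
                    alerts.append((n, "residual_growth"))
    return alerts


log = [
    "step 1 dt=1.0e-3 residual=1.0e-6",
    "step 2 dt=2.0e-3 residual=5.0e-7",
    "step 3 dt=1.0e-5 residual=8.0e-6",
    "step 4 dt=1.0e-5 residual=nan",
]
print(scan_log(log))
```

The ramp from 1e-3 to 2e-3 at step 2 raises no alert. At step 3, dt sits 200x below its running maximum and the residual is 16x above its minimum, so both limits are exceeded.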
Stage 3 (post-flight): run `scripts/result_validator.py --metrics results.json` after the simulation finishes.

```bash
python3 scripts/result_validator.py \
  --metrics results.json \
  --bound-min 0.0 \
  --bound-max 1.0 \
  --mass-tol 1e-3 \
  --json
```
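The sketch below shows how per-check results might roll up into `status`, `confidence_score`, and `failed_checks`, including the `INSUFFICIENT_DATA` case. It is an illustration, not the validator's code. The metric keys `mass_initial`, `mass_final`, and `has_nan` are hypothetical, and scoring confidence as the fraction of checks passed is an assumption.

```python
def validate_metrics(metrics: dict, mass_tol: float = 1e-3) -> dict:
    """Aggregate simple checks into a result_validator-style summary."""
    checks = {}
    # Hypothetical metric keys; the real metrics schema is not shown here.
    if "mass_initial" in metrics and "mass_final" in metrics:
        m0, m1 = metrics["mass_initial"], metrics["mass_final"]
        rel_err = abs(m1 - m0) / max(abs(m0), 1e-300)  # relative mass drift
        checks["mass_conserved"] = rel_err <= mass_tol
    if "has_nan" in metrics:
        checks["no_nan"] = not metrics["has_nan"]
    if not checks:
        # Nothing recognized: report "no check ran", never a vacuous pass.
        return {"status": "INSUFFICIENT_DATA", "confidence_score": None,
                "checks": {}, "failed_checks": []}
    failed = [name for name, ok in checks.items() if not ok]
    return {
        "status": "FAIL" if failed else "PASS",
        "confidence_score": (len(checks) - len(failed)) / len(checks),
        "checks": checks,
        "failed_checks": failed,
    }


run = {"mass_initial": 1.0, "mass_final": 1.0005, "has_nan": False}
print(validate_metrics(run)["status"], validate_metrics(run, mass_tol=1e-6)["status"])
```

The same run passes at the Standard tolerance (drift 5e-4 ≤ 1e-3) but fails at the Conservative one (1e-6), so the tolerance should be recorded with the verdict.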
For variational / gradient-flow models (Allen-Cahn, Cahn-Hilliard), add `--variational` to enforce a strict monotone non-increasing energy check.
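The difference between the strict and weak energy checks is easiest to see on a history with a mid-run spike. The following minimal sketch uses hypothetical function names that mirror the check names `energy_monotone` and `energy_net_decrease`. The optional `tol` is an assumption and defaults to a strict comparison.

```python
def energy_net_decrease(energies):
    """Weak check: compares only the first and last energy values."""
    return energies[-1] <= energies[0]


def energy_monotone(energies, tol=0.0):
    """Strict check: every step must be non-increasing (up to tol)."""
    return all(later <= earlier + tol for earlier, later in zip(energies, energies[1:]))


history = [1.00, 0.80, 1.20, 0.50]  # mid-run spike at the third sample
print(energy_net_decrease(history), energy_monotone(history))  # True False
```

For a gradient flow, the free energy must not increase at any step, so only the strict check catches the spike.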
When validation fails, run the diagnoser:

```bash
python3 scripts/failure_diagnoser.py --log simulation.log --json
```
Example:

User: My phase field simulation crashed after 1000 steps. Can you help me figure out why?

Agent workflow:

```bash
python3 scripts/failure_diagnoser.py --log simulation.log --json
python3 scripts/runtime_monitor.py --log simulation.log --json
```
Common errors:

| Error | Cause | Resolution |
|---|---|---|
| `Config not found` | File path invalid | Verify that the config path exists |
| `Non-numeric value` | Parameter is not a number | Fix the config file format |
| `out of range` | Parameter outside bounds | Adjust the parameter or the bounds |
| `Output directory not writable` | Permission issue | Check directory permissions |
| `Insufficient disk space at <path>` | Disk nearly full on the output volume | Free up space or reduce output |
| `Invalid parameter name` | A `--required` name has disallowed characters | Use only letters, digits, `_`, `.`, `-` |
| `range max ... must be greater than min` | Inverted or degenerate `--ranges` entry or bounds | Ensure max > min |
| `must be a finite positive number` | nan, inf, zero, or negative threshold supplied | Pass a finite positive value |
| `Log file too large` | Log exceeds the 500 MB parse cap | Truncate or pre-filter the log |
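The `Log file too large` error comes from checking the file size before any parsing starts. Here is a minimal sketch of that guard. Reading "500 MB" as 500 MiB is an assumption, since the scripts' exact byte cap may differ, and `check_log_size` is a hypothetical helper.

```python
import os
import tempfile

# Assumption: "500 MB" is read here as 500 MiB.
MAX_LOG_BYTES = 500 * 1024 * 1024


def check_log_size(path: str, max_bytes: int = MAX_LOG_BYTES) -> int:
    """Raise before any parsing if the log exceeds the cap; return its size in bytes."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"Log file too large: {size} bytes exceeds {max_bytes}")
    return size


with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as fh:
    fh.write(b"step 1 dt=1e-3\n")
    demo_path = fh.name
try:
    print(check_log_size(demo_path))            # small log: accepted
    try:
        check_log_size(demo_path, max_bytes=4)  # tiny cap: rejected
    except ValueError as err:
        print(err)
finally:
    os.remove(demo_path)
```

Checking `os.path.getsize` first avoids reading a multi-gigabyte log into memory just to reject it.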
Pre-flight status (`report.status`):

| Status | Meaning | Action |
|---|---|---|
| PASS | All checks passed | Proceed with confidence |
| WARN | Non-critical issues found | Review and document |
| BLOCK | Critical issues found | Must fix before proceeding |
Confidence score (`confidence_score` from `result_validator.py`):

| Score | Meaning |
|---|---|
| 1.0 | All validation checks passed → proceed with confidence |
| 0.75+ | Most checks passed, minor issues |
| 0.5-0.75 | Significant issues, review carefully |
| < 0.5 | Major problems, do not trust results |
| null (status `INSUFFICIENT_DATA`) | No recognized metrics fields, so no check ran. This is NOT a pass: inspect the metrics file. |
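The score table can be applied mechanically, as in the sketch below. The function name and return strings are hypothetical. The table's "0.75+" and "0.5-0.75" bands share the value 0.75, and this sketch assigns 0.75 to the higher band as an assumption.

```python
def interpret_confidence(score):
    """Map result_validator.py's confidence_score to the table's guidance."""
    if score is None:
        # INSUFFICIENT_DATA: no check ran, so this is NOT a pass.
        return "unverified"
    if score == 1.0:
        return "proceed"
    if score >= 0.75:  # the "0.75+" band
        return "minor issues"
    if score >= 0.5:   # the "0.5-0.75" band (0.75 itself goes to the band above)
        return "review carefully"
    return "do not trust"


for s in (1.0, 0.8, 0.6, 0.2, None):
    print(s, interpret_confidence(s))
```

Checking `None` first ensures that a missing score is never compared numerically or mistaken for a low-but-valid result.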
A requested bound (--bound-min/--bound-max) with no matching field_min/field_max
in the metrics is reported as a failed bounds_unverifiable check, never a vacuous pass.
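The bounds rule above can be sketched in a few lines: a requested bound with no matching extremum produces a failed `bounds_unverifiable` check instead of silently passing. This is a minimal illustration, not the validator's implementation. Only the field and check names come from the text, and `bounds_check` is hypothetical.

```python
def bounds_check(metrics, bound_min=None, bound_max=None):
    """Return {check_name: passed} for the requested field bounds."""
    if bound_min is None and bound_max is None:
        return {}  # no bound requested, no check emitted
    lo, hi = metrics.get("field_min"), metrics.get("field_max")
    if (bound_min is not None and lo is None) or (bound_max is not None and hi is None):
        return {"bounds_unverifiable": False}  # a FAILED check, not a pass
    ok = (bound_min is None or lo >= bound_min) and (bound_max is None or hi <= bound_max)
    return {"bounds_satisfied": ok}


print(bounds_check({"field_max": 0.99}, bound_min=0.0, bound_max=1.0))
print(bounds_check({"field_min": -0.01, "field_max": 0.99}, bound_min=0.0, bound_max=1.0))
```

The first call has no `field_min`, so the lower bound cannot be verified. The second call has both extrema and reports a genuine violation.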
For variational/gradient-flow runs, pass --variational (or set "energy_variational": true
in the metrics) to enforce a strict monotone non-increasing energy check (energy_monotone);
otherwise a weaker energy_net_decrease check is used, which does not detect mid-run spikes.
Common failure signatures in logs:

| Pattern in Log | Likely Cause | Recommended Fix |
|---|---|---|
| NaN, Inf, overflow | Numerical instability | Reduce dt, increase damping |
| max iterations, did not converge | Solver failure | Tune preconditioner, tolerances |
| out of memory | Memory exhaustion | Reduce mesh, enable out-of-core |
| dt reduced | Adaptive stepping triggered | May be okay if controlled |
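Pattern-based diagnosis of this kind amounts to a fixed list of pre-compiled regexes, each paired with a cause and a fix. The sketch below mirrors the table rows. The specific regexes are assumptions and are not the patterns shipped in `failure_diagnoser.py`. The output keys follow the documented `probable_causes` / `recommended_fixes` fields.

```python
import re

# Pre-compiled (pattern, cause, fix) rows mirroring the table; regexes are assumptions.
DIAGNOSTICS = [
    (re.compile(r"\b(?:nan|inf|overflow)\b", re.IGNORECASE),
     "Numerical instability", "Reduce dt, increase damping"),
    (re.compile(r"max(?:imum)? iterations|did not converge", re.IGNORECASE),
     "Solver failure", "Tune preconditioner, tolerances"),
    (re.compile(r"out of memory", re.IGNORECASE),
     "Memory exhaustion", "Reduce mesh, enable out-of-core"),
    (re.compile(r"dt reduced", re.IGNORECASE),
     "Adaptive stepping triggered", "May be okay if controlled"),
]


def diagnose(log_text: str) -> dict:
    """Collect every cause whose pattern appears anywhere in the log text."""
    causes, fixes = [], []
    for pattern, cause, fix in DIAGNOSTICS:
        if pattern.search(log_text):
            causes.append(cause)
            fixes.append(fix)
    return {"probable_causes": causes, "recommended_fixes": fixes}


crash_log = (
    "step 998 dt reduced to 1e-7\n"
    "step 1000 linear solver did not converge after max iterations\n"
    "residual = nan\n"
)
print(diagnose(crash_log)["probable_causes"])
```

Reporting every matching cause, not just the first, matters here: the dt reduction and solver stall preceding the NaN point to a root cause that is upstream of the final blow-up.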
Do not trust a validation verdict until each applicable item below is satisfied with the concrete artifact named. Record these in your summary to the user.

- Ran `result_validator.py --json` and confirmed `results.status` is `PASS` (not `INSUFFICIENT_DATA`) AND `results.confidence_score == 1.0`. A null score or `INSUFFICIENT_DATA` means no check ran: treat it as unverified, not as a pass.
- Inspected `results.checks` and confirmed every requested check actually appears (e.g. `mass_conserved`, `bounds_satisfied`, `no_nan`, and `energy_monotone`/`energy_net_decrease`); confirmed `results.failed_checks` is empty and contains no `bounds_unverifiable` entry (which means a requested bound had no `field_min`/`field_max` to compare against).
- For gradient-flow models, passed `--variational` (or set `"energy_variational": true`) so `energy_monotone` is enforced; recorded that the weaker `energy_net_decrease` was NOT relied on, since it cannot detect mid-run energy spikes.
- Recorded the mass tolerance used (`--mass-tol`, default 1e-3) and confirmed it matches the Conservative/Standard/Relaxed column appropriate to the run; did not silently accept the default for a tight-conservation problem.
- Ran `runtime_monitor.py --json` and recorded `residual_stats` (min/max/last) and `dt_stats`; confirmed there are no alerts for NaN/Inf/overflow, residual growth above `--residual-growth`, or a dt collapse exceeding the `--dt-drop` factor.
- Checked stability separately with `core-numerical/numerical-stability/scripts/cfl_checker.py` (CFL/Fourier limit). `preflight_checker.py` does NOT evaluate CFL/Fourier, and a PASS pre-flight says nothing about temporal/spatial stability.
- On any `FAIL` or alert, ran `failure_diagnoser.py --json` and recorded the `probable_causes`/`recommended_fixes` instead of reusing the results.

| Tempting shortcut | Why it's wrong / what to do |
|---|---|
| "Preflight passed, so the run is numerically stable." | preflight_checker.py checks required keys, ranges, output-dir writability, and disk space only. It does NOT compute CFL/Fourier. Gate stability with cfl_checker.py separately. |
| "result_validator printed a confidence score, so results are good." | An empty or unrecognized metrics file returns confidence_score: null and status INSUFFICIENT_DATA, which means "no check ran", not a pass. Verify that recognized fields are present and status == PASS. |
| "Energy ends lower than it started, so the dissipative run is fine." | The default energy_net_decrease only compares first vs last and misses mid-run spikes. For gradient-flow models use --variational to enforce the strict monotone energy_monotone check. |
| "I asked for bounds and didn't get a bounds_satisfied: false, so bounds hold." | If field_min/field_max are absent, the validator emits bounds_unverifiable (a FAILED check), never a vacuous pass. Ensure the metrics file actually carries the field extrema. |
| "The simulation finished without crashing, so the results are trustworthy." | Run completion is not correctness. Verify mass conservation, energy behavior, physical bounds, and a clean runtime_monitor alert list before using results. |
| "dt got smaller during the run, so the solver is failing." | runtime_monitor dt-collapse is direction-aware (running-max vs current) and only alerts past --dt-drop; a controlled adaptive ramp is expected. Check the actual dt_stats and whether an alert fired. |
| "I'll just use the default thresholds." | Defaults (--mass-tol 1e-3, --residual-growth 10, --dt-drop 100) are the Standard column; a conservation-critical problem needs the Conservative tolerances. Pick thresholds for the physics, then record them. |
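Several shortcuts in the table come down to accepting a verdict without reading the JSON fields. A minimal gate over the `result_validator.py --json` output might look like the sketch below. It accepts the fields either under a top-level `"results"` object or at the top level, and handles `checks` given as a mapping or a list of names. Both layouts are assumptions, and `verdict_trustworthy` is hypothetical.

```python
import json


def verdict_trustworthy(result_json: str, requested_checks: set) -> tuple:
    """Return (trustworthy, reason) for a result_validator.py --json string."""
    data = json.loads(result_json)
    # Assumption: fields may sit under "results"; otherwise read the top level.
    res = data.get("results", data)
    if res.get("status") != "PASS":
        return False, f"status is {res.get('status')!r}, not PASS"
    if res.get("confidence_score") != 1.0:
        return False, f"confidence_score is {res.get('confidence_score')!r}, not 1.0"
    # set() of a dict yields its keys, so "checks" may be a mapping or a list of names.
    missing = set(requested_checks) - set(res.get("checks", {}))
    if missing:
        return False, f"requested checks never ran: {sorted(missing)}"
    if res.get("failed_checks"):
        return False, f"failed checks: {res['failed_checks']}"
    return True, "all requested checks ran and passed"


report = json.dumps({"results": {"status": "PASS", "confidence_score": 1.0,
                                 "checks": {"mass_conserved": True, "no_nan": True},
                                 "failed_checks": []}})
print(verdict_trustworthy(report, {"mass_conserved", "no_nan", "energy_monotone"}))
```

In this example, the report itself is a clean PASS, yet the gate still rejects it because `energy_monotone` was requested but never ran.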
Security notes:

- `--required` parameter names are validated against a safe-character allowlist (`^[A-Za-z0-9_.-]+$`); names with shell metacharacters are rejected.
- `--ranges` entries are parsed as `name:min:max`, with finite numeric bounds enforced and max > min required.
- `--min-free-gb` is validated as a finite positive number (negatives, zero, nan, and inf are rejected).
- `--residual-growth` and `--dt-drop` thresholds are validated as finite positive numbers.
- `--bound-min` and `--bound-max` are validated as finite numbers (nan/inf rejected), and `--bound-max` > `--bound-min` is enforced; `--mass-tol` is validated as a finite positive number.
- `preflight_checker.py` reads a single user-specified config file (JSON/YAML) and checks disk space on the volume hosting the resolved output directory.
- `runtime_monitor.py` reads a single log file specified by `--log`; log files are size-limited (500 MB max) and rejected before parsing if larger.
- `result_validator.py` reads a single metrics file (JSON) specified by `--metrics`.
- `failure_diagnoser.py` reads a single log file specified by `--log`; log files are size-limited (500 MB max) before parsing.
- Only the bundled scripts (`preflight_checker.py`, `runtime_monitor.py`, `result_validator.py`, `failure_diagnoser.py`) are run, with explicit argument lists.
- No `eval()`, `exec()`, or dynamic code generation.
- No shell-string execution (no `shell=True`).
- `failure_diagnoser.py` uses hardcoded, pre-compiled diagnostic regex patterns; `runtime_monitor.py` accepts optional `--residual-pattern` / `--dt-pattern` overrides that are compiled with `re.compile` (no eval) and applied only to the user's own log.

References:

- `references/validation_protocol.md`: detailed checklist and criteria
- `references/log_patterns.md`: common failure signatures and regex patterns