From tao-skill-bank
Runs NVIDIA's three-phase training pipeline: AutoML HPO, DEFT iterative data improvement loop (RCA→SDG→mining→retrain), and post-DEFT AutoML refinement. Bridges tao-run-automl and tao-run-deft-aoi skills.
Slash command: `/tao-skill-bank:tao-run-automl-deft-pipeline`
A workflow-bridge skill that runs **three phases** in sequence by delegating to two existing skills — `tao-run-automl` for HPO and a DEFT application skill (default `tao-run-deft-aoi` for AOI; other `skills/applications/deft-*` skills for non-AOI cases) for the iterative data-improvement loop.
This skill does not re-implement AutoML or DEFT. It owns only the connective tissue: HPO spec inputs, the spec-handoff between AutoML and DEFT, and the post-DEFT AutoML re-run on the augmented dataset.
- … `tao-run-deft-aoi` directly. The bare DEFT loop is the inner stage of this pipeline.
- … `tao-run-deft-aoi` directly
- … `tao-run-automl` directly

The three phases and what flows between them:

- **Phase 1 (AutoML baseline).** Inputs: `specs/baseline_spec.yaml` and `train/base/training_set.csv`. An AutoML HPO sweep produces N recommendations and picks the best by val_loss / FAR. Output: the best-HPs spec + checkpoint, handed to Phase 2.
- **Phase 2 (DEFT loop, plain train).** The Phase 1 winner pre-seeds the baseline, so DEFT skips its baseline train. DEFT runs baseline inference → RCA → iter 1..N (plain retrain), with RCA / route / SDG / mining in the loop. Output: the DEFT-augmented CSV plus the iter winner checkpoint (Phase 3 warm-starts from it).
- **Phase 3 (AutoML refinement).** Input: `${RESULTS_DIR}/iter${N}/dataset/train_combined_iter${N}.csv`. An AutoML HPO sweep re-tunes HPs against the DEFT-augmented dataset. Output: the final best checkpoint (the deliverable; no further retrain).
The two handoffs are:

- Phase 1 → Phase 2: … `specs/baseline_spec.yaml`, copies the checkpoint into `${RESULTS_DIR}/baseline/train/`, and pre-populates `deft_state.json` / `loop_log.jsonl` so DEFT skips its baseline train and resumes at baseline inference → evaluate → RCA → iter 1. DEFT stays plain-train (`automl_policy: off` preserved).
- Phase 2 → Phase 3: Phase 3 receives the augmented training CSV (`train_combined_iter${N_final}.csv`) AND the iter winner's checkpoint. The checkpoint is wired into each rec's `train.pretrained_model_path` so Phase 3 fine-tunes from Phase 2's winner. Phase 3's winning checkpoint is the deliverable; there is no separate retrain after Phase 3.

See `references/phase-handoffs.md` for the exact steps, code, and DEFT-honors-this-handoff details of both handoffs.
- … `specs/baseline_spec.yaml` was hand-authored with (usually not optimal).

Running all three: AutoML cheap-tunes once on the original data, DEFT does the heavy data work with reasonable HPs, then AutoML tunes again on the now-richer dataset. Phase 3 is the most important of the three for the final deployed FAR/recall.
The pipeline is sequential. Total wall-clock ≈ Phase 1 (N_automl × per-rec train) + Phase 2 (M iterations × per-iter cost) + Phase 3 (N_automl × per-rec train).
Note that Phase 2 has no separate baseline train — Phase 1's winning checkpoint is reused as DEFT's baseline, so the baseline cost lands inside Phase 1's N_automl trainings rather than as an extra retrain. Surface this to the user before kickoff. Typically Phase 2's iterations still dominate (each includes SDG + retrain), but Phase 1 and Phase 3 each add several hours on a single-GPU box. Use the per-job estimate from the user's setup (if they have one) rather than guessing minutes. See references/pitfalls-and-quality-checks.md (Compute budget) for the per-phase term breakdown.
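The wall-clock sum above can be sketched as a small helper. This is only an illustration of the arithmetic: the function name, argument names, and the hour values in the usage note are hypothetical placeholders, not measurements, and the real per-job times should come from the user.

```python
def estimate_wall_clock_hours(n_automl, per_rec_hours, m_iters, per_iter_hours,
                              per_rec_hours_phase3=None):
    """Sum the three sequential phase costs, in hours.

    Phase 2 has no separate baseline-train term: the baseline cost is
    already counted inside Phase 1's n_automl trainings.
    """
    if per_rec_hours_phase3 is None:
        # Assumption: Phase 3 recs cost the same as Phase 1 recs unless told otherwise.
        per_rec_hours_phase3 = per_rec_hours
    phase1 = n_automl * per_rec_hours
    phase2 = m_iters * per_iter_hours
    phase3 = n_automl * per_rec_hours_phase3
    return {
        "phase1": phase1,
        "phase2": phase2,
        "phase3": phase3,
        "total": phase1 + phase2 + phase3,
    }


# Placeholder inputs, for illustration only.
estimate = estimate_wall_clock_hours(
    n_automl=5, per_rec_hours=1.0, m_iters=3, per_iter_hours=4.0,
    per_rec_hours_phase3=1.5,
)
```

With these placeholder inputs the terms are 5.0, 12.0, and 7.5 hours, for a total of 24.5. The separate Phase 3 per-rec figure is there because Phase 3 trains on the larger augmented dataset.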
The pipeline has exactly one user gate. Before any side-effecting action (docker pull, docker login, any job-launch call delegated to a downstream skill, file mutations under ${RESULTS_DIR}/), the agent must produce a single consolidated Pre-Flight Summary that subsumes every downstream skill's preflight. Once the user approves, the run is autonomous through all three phases — no further interactive pauses.
The user explicitly does not want to be paged between phases. The DEFT loop's own inline ## Pre-Flight Summary gate becomes a zero-question display step (every value pre-supplied from this consolidated gate), as does tao-run-automl's shared launch preflight in Phase 1 and Phase 3.
Before printing the summary, the agent must open and read every downstream skill's preflight section in full, run every read-only check those sections prescribe, and surface the outcome of each check. The summary has nine mandatory sections (workspace/host/platform/network; credentials status; container images; dataset table; Phase 1 config; Phase 2 config; Phase 3 config; compute estimate; confirmation line). After the gate, every downstream interactive gate is suppressed by passing through the collected values. The only allowed post-gate pauses are mid-run hard-stop safety gates the downstream skill cannot bypass.
See references/consolidated-preflight.md for: the full list of preflight sections to read, the required DEFT ## Pre-Flight run, the exact nine-section summary contents, the value pass-through for gate suppression, and the procedure when the skill bank version doesn't yet support gate suppression.
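A rough way to enforce the nine-section rule before asking for confirmation is a completeness check over the assembled summary. The sketch below assumes the summary is held as a plain dict keyed by section name; that representation and the helper name `missing_sections` are illustrative assumptions, not part of the skill bank.

```python
# The nine mandatory sections, in the order listed above.
MANDATORY_SECTIONS = (
    "workspace/host/platform/network",
    "credentials status",
    "container images",
    "dataset table",
    "Phase 1 config",
    "Phase 2 config",
    "Phase 3 config",
    "compute estimate",
    "confirmation line",
)


def missing_sections(summary):
    """Return the mandatory sections that are absent or empty in `summary`."""
    return [name for name in MANDATORY_SECTIONS if not summary.get(name)]


# Hypothetical partial summary: only two sections have been filled in so far.
draft = {"credentials status": "checked", "dataset table": "3 rows"}
todo = missing_sections(draft)
```

An agent following this pattern would print the summary and ask for approval only once `missing_sections` returns an empty list.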
Invoke tao-skill-bank:tao-run-automl with:
| Input | AOI default | Notes |
|---|---|---|
| `network_arch` | `visual-changenet` | Same model the DEFT loop expects |
| `train_dataset_uri` | `<workspace>/train/base/training_set.csv` | Same training set DEFT will start from |
| `eval_dataset_uri` | `<workspace>/train/base/validation_set.csv` | Held-out; must NOT be the KPI test set (`<workspace>/kpi/testing_set.csv`), since that set is reserved for DEFT's final reporting |
| `metric` | FAR @ 100% recall (preferred) or `val_loss` | See Metric pitfalls in `references/pitfalls-and-quality-checks.md`: ChangeNet AOI is class-imbalanced, so `val_loss` alone can mode-collapse |
| `algorithm` | `bayesian` | LLM-brain or autoresearch if compute is tight |
| `automl_max_recommendations` | 5–10 for AOI | More recs = better HPs but linear in compute |
| `spec_overrides` | Pin `epochs` / `batch_size`; sweep optimizer-related HPs only | Otherwise AutoML wanders into long-train regimes that blow Phase 2's budget |
After the sweep finishes, AutoML's result["best"]["specs"] is the winning hyperparameter dict.
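Because the metric row warns that `val_loss` alone can mode-collapse, one option is to guard the pick with per-class prediction counts. The sketch below is a minimal illustration, not the selection logic `tao-run-automl` actually uses: the record layout (`id`, `val_loss`, `pred_counts`), the class names, and the function name are all hypothetical.

```python
def pick_best_rec(recs, min_per_class=1):
    """Pick the lowest-val_loss rec whose predictions cover every class.

    A rec that predicts (almost) nothing for some class is treated as
    mode-collapsed and excluded, however low its val_loss is.
    Returns None when every rec fails the guard.
    """
    def covers_all_classes(rec):
        return all(count >= min_per_class for count in rec["pred_counts"].values())

    healthy = [rec for rec in recs if covers_all_classes(rec)]
    if not healthy:
        return None
    return min(healthy, key=lambda rec: rec["val_loss"])


# Made-up records: rec_a has the lower val_loss but never predicts "defect".
recs = [
    {"id": "rec_a", "val_loss": 0.10, "pred_counts": {"pass": 500, "defect": 0}},
    {"id": "rec_b", "val_loss": 0.14, "pred_counts": {"pass": 460, "defect": 40}},
]
winner = pick_best_rec(recs)
```

Here `winner["id"]` is `"rec_b"`: the collapsed `rec_a` is filtered out before the `val_loss` comparison. Returning `None` when nothing passes leaves the decision to the caller rather than silently picking a collapsed model.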
Phase 1 hands over two artifacts: the winning spec and the winning checkpoint. Retraining the same HPs in DEFT's baseline step is wasted compute — instead, pre-seed DEFT's baseline state from Phase 1's outputs so DEFT starts at baseline inference → evaluate → RCA → iter 1. This is a four-step bridge (write merged spec → pre-seed baseline/train/ → initialise deft_state.json with baseline already done → invoke DEFT), followed by a quality check of the winning checkpoint (per-class prediction counts; compare to zero-shot ChangeNet).
See references/phase-handoffs.md for the verbatim Steps 1–4 (including the cp command, the deft_state.json patch code, and the loop_log.jsonl append) and the quality-check checklist.
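For orientation only, the "write merged spec" step can be pictured as a recursive dictionary merge of the winning hyperparameters over the baseline spec. The authoritative steps are in `references/phase-handoffs.md`; the helper below and the spec keys in its example (`num_epochs`, `optim`, `lr`, `weight_decay`) are assumptions made for illustration.

```python
import copy


def merge_spec(base, overrides):
    """Return a new spec: `base` with `overrides` applied recursively.

    Nested dicts are merged key by key; any other value in `overrides`
    replaces the value in `base`. Neither input is mutated.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_spec(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical baseline spec and winning HPs (key names are illustrative).
baseline_spec = {"train": {"num_epochs": 20, "optim": {"lr": 1e-4, "weight_decay": 0.01}}}
winning_hps = {"train": {"optim": {"lr": 3e-4}}}
merged_spec = merge_spec(baseline_spec, winning_hps)
```

After the merge, `merged_spec["train"]["optim"]` carries the new `lr` and the untouched `weight_decay`, and the pinned `num_epochs` survives. A shallow `dict.update` would instead drop the sibling keys under `train`.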
Invoke tao-skill-bank:tao-run-deft-aoi (read its SKILL.md for the full interface). For non-AOI applications, invoke the matching DEFT skill; the handoff shape is the same.
The DEFT loop's baseline-train sub-step is skipped. Phase 1 already produced a checkpoint trained at the winning HPs, and Phase 1's handoff (see references/phase-handoffs.md) pre-populated ${RESULTS_DIR}/baseline/train/ and ${RESULTS_DIR}/deft_state.json so DEFT resumes at baseline inference → evaluate → RCA → iter 1. The rest of the DEFT loop runs unchanged. Do not modify its automl_policy: off invariant.
The DEFT loop owns: its Pre-Flight Summary display step (not a fresh user gate — the Consolidated Pre-Flight above is the single gate; the DEFT summary still prints as an audit-trail display of the pre-seeded baseline/train/ source and must not re-prompt); baseline inference → evaluate → RCA on the pre-seeded checkpoint; the full per-iteration RCA → routing → SDG → mining → assemble → train cycle; KPI gating and stop conditions; and the ${RESULTS_DIR}/ layout (deft_state.json, loop_log.jsonl, DEFT_Loop_Report.html).
After the loop exits (KPI met or max_iterations reached), capture two values from deft_state.json: iterations.<best>.best_ckpt_path (the loop's best plain-train checkpoint) and the final iteration label N_final (used to locate the augmented training CSV).
If the DEFT loop hard-stops on an unrecoverable gate, skip Phase 3. There is no validated augmented CSV to feed AutoML.
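Capturing the two Phase 2 outputs might look like the sketch below. It assumes `deft_state.json` is a JSON object with the `iterations.<best>.best_ckpt_path` nesting named above; how the best iteration label and `N_final` are recorded is not specified here, so both are passed in by the caller, and the `iter2` label and checkpoint path in the example are made up.

```python
import json
from pathlib import Path


def load_state(results_dir):
    """Read ${RESULTS_DIR}/deft_state.json into a dict."""
    with open(Path(results_dir) / "deft_state.json", encoding="utf-8") as fh:
        return json.load(fh)


def best_ckpt_path(state, best_label):
    """Follow iterations.<best>.best_ckpt_path in the loaded state."""
    return state["iterations"][best_label]["best_ckpt_path"]


def augmented_csv_path(results_dir, n_final):
    """Build ${RESULTS_DIR}/iter${N_final}/dataset/train_combined_iter${N_final}.csv."""
    return Path(results_dir) / f"iter{n_final}" / "dataset" / f"train_combined_iter{n_final}.csv"


# Example with an in-memory state; the label and checkpoint path are hypothetical.
example_state = {"iterations": {"iter2": {"best_ckpt_path": "/results/iter2/train/best.pth"}}}
ckpt = best_ckpt_path(example_state, "iter2")
csv_path = augmented_csv_path("/results", 3)
```

Checking that `csv_path` exists before launching Phase 3 is a cheap way to honor the rule that a hard-stopped loop leaves no validated augmented CSV.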
Re-invoke tao-skill-bank:tao-run-automl with the augmented training CSV as the train dataset, the same held-out validation CSV as before, and Phase 2's iter winner checkpoint as the warm-start:
| Input | AOI value |
|---|---|
| `network_arch` | `visual-changenet` |
| `train_dataset_uri` | `${RESULTS_DIR}/iter${N_final}/dataset/train_combined_iter${N_final}.csv` |
| `eval_dataset_uri` | Same as Phase 1 (`<workspace>/train/base/validation_set.csv`), to keep the comparison apples-to-apples |
| `metric` | Same metric as Phase 1 |
| `algorithm` | Same as Phase 1 |
| `automl_max_recommendations` | 5–10 |
| Initial spec | Start from `<workspace>/specs/baseline_spec_automl.yaml` (Phase 1's winner); this gives the sweep a strong centroid to refine around |
| Warm-start checkpoint | `iterations.<best>.best_ckpt_path` from `${RESULTS_DIR}/deft_state.json`; set `spec_overrides["train"]["pretrained_model_path"]` to this path. Each Phase 3 rec then fine-tunes from Phase 2's winner instead of training from scratch. |
The warm-start is mandatory: without it every rec starts from random init with only 10-20 epochs to reconverge, val_loss regresses by 0.03-0.05 vs iter1, and the _pick_best safety net silently rolls back to the iter winner. Output goes to ${RESULTS_DIR}/final_automl/; the winning checkpoint of this sweep is the pipeline's deliverable. After the sweep, register Phase 3's checkpoint under iterations.final_automl in deft_state.json and re-run prepare_inference_spec.py so the handoff sees it (falling back to the loop's best if Phase 3 regressed).
See references/phase-handoffs.md for: the full "why the warm-start is mandatory" rationale and tradeoff, the concrete spec_overrides selection code, the exact two-step wiring of Phase 3's output back into the DEFT report, and the safety note on regression.
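As a rough picture of the warm-start wiring and the regression fallback, consider the sketch below. It is not the `_pick_best` implementation: `choose_deliverable` is a hypothetical stand-in that assumes a lower-is-better metric, and every path and number in the example is invented.

```python
def with_warm_start(spec_overrides, ckpt_path):
    """Return a copy of `spec_overrides` with train.pretrained_model_path set."""
    updated = dict(spec_overrides)
    train = dict(updated.get("train", {}))
    train["pretrained_model_path"] = ckpt_path
    updated["train"] = train
    return updated


def choose_deliverable(phase3_metric, loop_best_metric, phase3_ckpt, loop_best_ckpt):
    """Keep Phase 3's checkpoint unless it regressed against the loop's best.

    Assumes a lower-is-better metric such as val_loss or FAR.
    """
    if phase3_metric <= loop_best_metric:
        return phase3_ckpt
    return loop_best_ckpt


# Hypothetical values for illustration.
overrides = with_warm_start({"train": {"num_epochs": 15}}, "/results/iter2/train/best.pth")
deliverable = choose_deliverable(0.021, 0.025, "/results/final_automl/best.pth",
                                 "/results/iter2/train/best.pth")
```

In this example Phase 3 improved on the loop's best, so `deliverable` is the `final_automl` checkpoint. Had Phase 3 regressed, the loop's best checkpoint would be returned instead.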
These apply to both AutoML phases. Bake them into agent behavior — don't just paste once. The full detail lives in references/pitfalls-and-quality-checks.md; in brief:
- Metric pitfalls: the `val_loss` winner can be a mode-collapsed model. Prefer FAR @ 100%-recall directly, or guard `val_loss` with a `pred_counts` sanity check, or eval top-K by FAR @ 100%-recall before picking. For balanced / regression tasks, `val_loss` is fine.
- Leakage: the AutoML eval set must not be the KPI test set (`<workspace>/kpi/testing_set.csv`), which is reserved for DEFT's final reporting. Phase 3 trains on the augmented CSV but keeps the same validation set so Phase 1 and Phase 3 numbers stay comparable.
- Compute budget: Phase 1 `N_automl` × per-rec train; Phase 2 `M_iter` × (RCA + SDG + mining + retrain) (usually largest); Phase 3 `N_automl` × per-rec train on the larger augmented dataset. Ask the user for their per-job time before quoting wall-clock.

When starting fresh from "run the AOI workflow", the agent presents a three-phase plan to the user (Phase 1 AutoML baseline → Phase 2 DEFT loop → Phase 3 AutoML refinement), states the total cost structure (no extra baseline retrain at the front, no extra retrain at the end), asks for the user's per-run time for a wall-clock estimate, and waits for approval. After confirmation it invokes Phase 1, writes the merged spec, pre-seeds `deft_state.json`, invokes the DEFT loop with every input pre-supplied, then invokes Phase 3, with no further pauses unless a downstream skill hits an unrecoverable hard-stop. It summarizes the trajectory at the end (baseline AutoML best → DEFT iter 1 → ... → DEFT iter N_final → Phase 3 best).
See references/quick-start-example.md for the verbatim customer-facing message block and the exact post-confirmation invocation sequence.
Same three-phase pattern applies to other DEFT skills. Swap:
- `network_arch` to the relevant model

The handoff shape is identical: Phase 1 emits a spec + checkpoint (the checkpoint pre-seeds the DEFT baseline), Phase 2 consumes both and emits an augmented dataset, and Phase 3 emits the final checkpoint. The Phase 1 → Phase 2 baseline-skip mechanism is generic: any DEFT-style loop that exposes a resumable baseline state can be seeded the same way.
- `tao-skill-bank:tao-run-automl`: AutoML interface, algorithms, HP ranges
- `tao-skill-bank:tao-run-deft-aoi`: full DEFT AOI loop (Phase 2 default)
- `tao-skill-bank:tao-train-visual-changenet`: underlying ChangeNet train/eval/infer skill (used by both AutoML and DEFT)
- `skills/applications/deft-*` skills: non-AOI Phase 2 targets
- `references/consolidated-preflight.md`: the single-gate preflight in full
- `references/phase-handoffs.md`: both handoffs, baseline pre-seed, and Phase 3 warm-start, verbatim
- `references/pitfalls-and-quality-checks.md`: metric pitfalls, run-to-run noise, leakage, compute budget
- `references/quick-start-example.md`: the customer-facing worked-example message

`npx claudepluginhub nvidia-tao/tao-skills-bank --plugin tao-daft-process`