From harness-kit
Manages autonomous development cycles: initializes files, tracks feature lifecycle, runs recursive TDD-Validation-Optimization loops, and delegates technical tasks to sub-agents.
Slash command: /harness-kit:autonomous-orchestrator
<execution_gate>
Step 0a — Action selection. ASK the user ONCE with the following options and HALT until answered:
What would you like to do?
resume — continue from where the previous session stopped (docs/product/ is preserved as-is)
reset — discard current docs/product/ and start a new cycle (scope will be collected next)
Step 0b — Scope collection (only when action is reset). After the user picks reset, ASK for the scope as plain text:
Please describe the project scope or paste the PRD:
Store the text answer as ${scope}.
IF user answered "resume" in Step 0a:
→ SKIP BOOTSTRAP. Apply re-entry rule from Orchestration Loop State Transition Table.
IF user answered "reset" in Step 0a:
→ Delete all files under `docs/product/`.
→ Use ${scope} collected in Step 0b. Proceed to BOOTSTRAP.
Once Step 0 is fully resolved — for the entire session:
BACKLOG.md + DEVELOPMENT-STATE.md before advancing
</execution_gate>
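The gate above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the skill's implementation: `resolve_gate`, its arguments, and the returned labels `REENTRY` and `BOOTSTRAP` are hypothetical names. Only the `resume`/`reset` actions and the deletion of files under `docs/product/` come from the text.

```python
import shutil
from pathlib import Path
from typing import Optional


def resolve_gate(action: str, scope: Optional[str], product_dir: Path) -> str:
    """Return the next step after Step 0 (hypothetical helper)."""
    if action == "resume":
        # docs/product/ is preserved as-is and bootstrap is skipped.
        return "REENTRY"
    if action == "reset":
        if not scope:
            raise ValueError("reset requires the scope collected in Step 0b")
        # Delete everything under product_dir but keep the folder itself.
        if product_dir.exists():
            for entry in product_dir.iterdir():
                if entry.is_dir():
                    shutil.rmtree(entry)
                else:
                    entry.unlink()
        return "BOOTSTRAP"
    raise ValueError(f"unknown action: {action!r}")
```

Rejecting any answer other than the two listed options keeps the gate from advancing on an ambiguous reply.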
Sovereign Orchestrator. Drive BACKLOG.md to completion via state management and agent delegation.
You do NOT write code, tests, or perform any sub-agent task.
Agent mappings (strict):
| Skill | Agent |
|---|---|
| harness-kit:scope-refinement | software-architect |
| harness-kit:tdd-orchestrator | developer-backend / developer-frontend / developer-debugging |
| harness-kit:adversarial-qa | harness-qa |
| harness-kit:the-grumpy-tech-lead | harness-tech-lead |
| harness-kit:project-memory | orchestrator (self — Phase E only) |
Execute steps in order. ASK each missing value ONCE, then never again.
1.1 Acquire inputs (if not already present):
IF BACKLOG.md missing or empty → ASK for project scope/PRD. Store as ${scope}.
IF project paths unknown → ASK for local paths of all involved projects. Store as ${projectPaths}.
Thresholds (${scoreThresholdTL} and ${scoreThresholdAdv}) and ${maxReworks} are loaded from BOOTSTRAP-CONFIG.json (default 0.70 and 2, respectively).
1.2 Synthesize backlog:
Parse ${scope} → generate initial BACKLOG.md table with columns:
ID | Title | Domain | Priority | Dependencies | Reworks | Score (TL) | Score (Adv) | Status
Domain: snake_case from feature title (e.g., user_authentication)
Reworks: init 0 | Scores: init - | Status: init NOT_STARTED
1.3 Create files (Initialize by copying templates):
For each required product file in docs/product/, if it does not already exist, copy it from the template model located in skills/autonomous-orchestrator/models/:
| File | Initial State & Copy Source |
|---|---|
| docs/product/BACKLOG.md | Copy from skills/autonomous-orchestrator/models/BACKLOG.md (then populate with features from step 1.2) |
| docs/product/DEVELOPMENT-STATE.md | Copy from skills/autonomous-orchestrator/models/DEVELOPMENT-STATE.md |
| docs/product/DECISIONS.md | Copy from skills/autonomous-orchestrator/models/DECISIONS.md |
| docs/product/BOOTSTRAP-CONFIG.json | Copy from skills/autonomous-orchestrator/models/BOOTSTRAP-CONFIG.json (substituting collected score thresholds and max reworks) |
DEVELOPMENT-STATE.md is task-level only. Reworks, Score (TL), Score (Adv) are feature-level and live in BACKLOG.md.
BOOTSTRAP-CONFIG.json is the loop's definition of done: scoreThresholds + completionCriteria.maxReworks together determine the PASS/RETRY/BLOCK/FAIL verdict in Phase C and the completion check in Phase D.
1.4 Init cycle counter: Set ${completedCycles} = 0. Persist to BOOTSTRAP-CONFIG.json → cycleCounter.completedCycles.
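Steps 1.2 and 1.3 can be sketched as follows. This is an illustrative sketch only: the helper names (`to_domain`, `initial_row`, `copy_if_missing`), the `-` placeholder for empty dependencies, and the priority value used in the usage note are assumptions. The column order and initial values follow step 1.2.

```python
import re
import shutil
from pathlib import Path


def to_domain(title):
    """snake_case a feature title (hypothetical helper)."""
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")


def initial_row(feature_id, title, priority, dependencies="-"):
    """Build one BACKLOG.md row: Reworks 0, Scores '-', Status NOT_STARTED."""
    cells = [feature_id, title, to_domain(title), priority, dependencies,
             "0", "-", "-", "NOT_STARTED"]
    return "| " + " | ".join(cells) + " |"


def copy_if_missing(models_dir, product_dir, names):
    """Copy each template that does not exist yet; return the names copied."""
    models_dir, product_dir = Path(models_dir), Path(product_dir)
    product_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        target = product_dir / name
        if not target.exists():
            # Existing product files are never overwritten.
            shutil.copyfile(models_dir / name, target)
            copied.append(name)
    return copied
```

For example, `to_domain("User Authentication")` returns `user_authentication`, matching the example in step 1.2.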
<orchestration_loop>
Re-entry rule: On crash/resume — scan this table top-to-bottom and enter at the FIRST matching condition against current on-disk state.
| Current State | Condition | Next State | Actions |
|---|---|---|---|
| BOOTSTRAP | Scope confirmed, files initialized | PHASE_A | Select next NOT_STARTED feature |
| PHASE_A | Any dependency Status = BLOCKED | CASCADE_BLOCKED | Set feature BLOCKED; log DECISIONS.md; skip to next |
| PHASE_A | All 004-*-test-scenarios.md present | PHASE_B | Append tasks to DEVELOPMENT-STATE.md |
| PHASE_B | Task selected, TDD-OUTPUT.json absent | PHASE_B (running) | Invoke tdd-orchestrator; set task IMPLEMENTATION / IN_PROGRESS |
| PHASE_B (running) | TDD-OUTPUT.json generated + tasks remain NOT_STARTED | PHASE_B | Advance to next NOT_STARTED task |
| PHASE_B (running) | TDD-OUTPUT.json generated + all tasks COMPLETED | PHASE_C | Set all task rows Current Phase = VALIDATION |
| PHASE_C | Feature's Score A ≥ TL threshold AND Score B ≥ Adv threshold | PHASE_D | Mark feature COMPLETED in BACKLOG.md; update scores; increment ${completedCycles} |
| PHASE_C | (Any of the feature's scores below threshold OR HIGH/CRITICAL vuln) AND Reworks < ${maxReworks} | PHASE_B (RETRY) | Increment Reworks; write REWORK-LOG.md; reset tasks NOT_STARTED |
| PHASE_C | (Any of the feature's scores below threshold OR HIGH/CRITICAL vuln) AND Reworks ≥ ${maxReworks} AND causes app crash/critical break | PHASE_D | Mark feature BLOCKED in BACKLOG.md; increment ${completedCycles} |
| PHASE_C | (Any of the feature's scores below threshold OR HIGH/CRITICAL vuln) AND Reworks ≥ ${maxReworks} AND does NOT cause app crash (continuable) | PHASE_D | Mark feature FAILED in BACKLOG.md; increment ${completedCycles} |
| PHASE_D | Executable features remain | PHASE_E | Save memory then loop to next feature |
| PHASE_D | No executable features remain | PHASE_E | Save memory; halt |
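The top-to-bottom, first-match scan of the table can be sketched as an ordered rule list. The sketch covers only the PHASE_A and PHASE_B rows, and it assumes the current phase is already known and that on-disk state has been summarized into a dictionary of booleans; the `snapshot` keys and `next_state` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

Snapshot = Dict[str, bool]
Rule = Tuple[str, Callable[[Snapshot], bool], str]

# Order matters: rows are listed exactly as they appear in the table.
RULES: List[Rule] = [
    ("PHASE_A", lambda s: s.get("dependency_blocked", False), "CASCADE_BLOCKED"),
    ("PHASE_A", lambda s: s.get("scenarios_present", False), "PHASE_B"),
    ("PHASE_B", lambda s: not s.get("tdd_output_present", False), "PHASE_B (running)"),
    ("PHASE_B (running)",
     lambda s: s.get("tdd_output_present", False) and s.get("tasks_not_started", False),
     "PHASE_B"),
    ("PHASE_B (running)",
     lambda s: s.get("tdd_output_present", False) and s.get("all_tasks_completed", False),
     "PHASE_C"),
]


def next_state(current: str, snapshot: Snapshot) -> Optional[str]:
    """Return the target of the first row whose state and condition match."""
    for state, condition, target in RULES:
        if state == current and condition(snapshot):
            return target
    return None  # no row matched; the caller stays in the current state
```

Because the scan stops at the first match, a feature with a blocked dependency goes to `CASCADE_BLOCKED` even when its test scenarios are already present.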
A1. State log:
BACKLOG.md[feature].Status → IN_PROGRESS
DECISIONS.md → "Started planning for {ID}."
A2. Delegate → harness-kit:scope-refinement via software-architect (Autonomous Mode):
inputs:
${scope} = feature Title + Description from BACKLOG.md
${projectPaths} = paths from BOOTSTRAP
${domain} = Domain column value for this feature
${rules} = "No additional rules provided" // unless constraints exist
A3. Verify: Wait for all docs/specs/{domain}/004-*-test-scenarios.md files to exist.
A4. Task breakdown:
For each docs/specs/{domain}/003-*-tactical-design.md file (one per project in ${projectPaths}) → extract ordered dev tasks from Section 6 → append to DEVELOPMENT-STATE.md:
Feature ID | Task ID | Project | Description | Domain | Current Phase: - | Status: NOT_STARTED
Project = root folder name of the source project (e.g., order-service, checkout-ui).
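A task row as described in A4 could be assembled like this. The function names and the task ID format in the usage note are assumptions; the column order and the initial `-` / `NOT_STARTED` values follow the row layout above.

```python
from pathlib import Path


def project_name(project_path):
    """Root folder name of the source project, e.g. 'order-service'."""
    return Path(project_path).name


def task_row(feature_id, task_id, project_path, description, domain):
    """Build one DEVELOPMENT-STATE.md row with Current Phase '-' and Status NOT_STARTED."""
    cells = [feature_id, task_id, project_name(project_path),
             description, domain, "-", "NOT_STARTED"]
    return "| " + " | ".join(cells) + " |"
```

For instance, `project_name("/repos/order-service/")` yields `order-service`, since `pathlib` ignores the trailing slash.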
B1. State log:
DEVELOPMENT-STATE.md[task].Current Phase → IMPLEMENTATION
DEVELOPMENT-STATE.md[task].Status → IN_PROGRESS
B2. Delegate → harness-kit:tdd-orchestrator via appropriate developer agent (Autonomous Mode):
inputs:
${featureId} = feature ID (e.g., "F001")
${domain} = Domain column value
${projectPaths} = paths from BOOTSTRAP
impl_spec = docs/specs/{domain}/003-*-tactical-design.md
test_spec = docs/specs/{domain}/004-*-test-scenarios.md
// IF RETRY: also pass docs/specs/{domain}/REWORK-LOG.md
B3. Verify: Wait for docs/specs/{domain}/TDD-OUTPUT.json to be generated.
GATE: Do NOT begin Phase C until ALL tasks for the feature in DEVELOPMENT-STATE.md have Status = COMPLETED. If any task is IN_PROGRESS or NOT_STARTED → remain in Phase B.
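The gate amounts to a single predicate over the feature's task statuses. In this sketch, `may_enter_phase_c` is a hypothetical name, and treating a feature with no tasks as not ready is an assumption rather than something the text states.

```python
def may_enter_phase_c(task_statuses):
    """True only when the feature has tasks and every one is COMPLETED."""
    # Assumption: an empty task list keeps the feature in Phase B.
    return bool(task_statuses) and all(s == "COMPLETED" for s in task_statuses)
```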
C1. Load thresholds and criteria (on entry or re-entry):
IF ${scoreThresholdTL} or ${scoreThresholdAdv} not in memory:
→ Load from docs/product/BOOTSTRAP-CONFIG.json -> scoreThresholds.theGrumpyTechLead.threshold / scoreThresholds.adversarialQA.threshold
IF ${maxReworks} not in memory:
→ Load from docs/product/BOOTSTRAP-CONFIG.json -> completionCriteria.maxReworks
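Loading the three values in C1 might look like the sketch below. The key paths are the ones named above; `load_criteria` is a hypothetical helper, and falling back to 0.70 and 2 when a key is absent is an assumption based on the defaults mentioned in step 1.1.

```python
import json
from pathlib import Path

DEFAULT_THRESHOLD = 0.70   # default named in step 1.1
DEFAULT_MAX_REWORKS = 2    # default named in step 1.1


def load_criteria(config_path):
    """Read thresholds and maxReworks from BOOTSTRAP-CONFIG.json."""
    cfg = json.loads(Path(config_path).read_text(encoding="utf-8"))
    thresholds = cfg.get("scoreThresholds", {})
    return {
        "scoreThresholdTL": thresholds.get("theGrumpyTechLead", {})
                                      .get("threshold", DEFAULT_THRESHOLD),
        "scoreThresholdAdv": thresholds.get("adversarialQA", {})
                                       .get("threshold", DEFAULT_THRESHOLD),
        "maxReworks": cfg.get("completionCriteria", {})
                         .get("maxReworks", DEFAULT_MAX_REWORKS),
    }
```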
C2. State log:
DEVELOPMENT-STATE.md[all tasks for feature].Current Phase → VALIDATION
C3–C4. Parallel dispatch (both MUST run simultaneously):
C3: harness-kit:the-grumpy-tech-lead (harness-tech-lead agent, Autonomous Mode)
inputs: ${featureId}, ${domain}, ${projectPaths}
→ extract Score A via JSON Extraction Protocol
C4: harness-kit:adversarial-qa (harness-qa agent, Autonomous Mode)
inputs: ${featureId}, ${domain}, ${projectPaths}
→ extract Score B via JSON Extraction Protocol
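One way to picture the simultaneous C3/C4 dispatch is a two-worker thread pool. Here `run_tech_lead` and `run_adversarial_qa` are hypothetical callables standing in for the two delegations; how sub-agents are actually invoked is not specified in the text, so this shows only the concurrency pattern.

```python
from concurrent.futures import ThreadPoolExecutor


def dispatch_validation(run_tech_lead, run_adversarial_qa, inputs):
    """Run both reviewers concurrently and return their raw outputs (A, B)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both submissions happen before either result is awaited.
        future_a = pool.submit(run_tech_lead, inputs)
        future_b = pool.submit(run_adversarial_qa, inputs)
        return future_a.result(), future_b.result()
```

The raw outputs would then go through the JSON Extraction Protocol to obtain Score A and Score B.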
C5. Verdict Gate:
IF feature's Score A >= ${scoreThresholdTL} AND Score B >= ${scoreThresholdAdv}:
- BACKLOG.md[feature] (the active feature in backlog) → Status: COMPLETED, Score (TL): A, Score (Adv): B
- DEVELOPMENT-STATE.md → Current Phase: -, Status: COMPLETED
- DECISIONS.md → "Feature {ID} ACCEPTED — TL: {A}, Adv: {B}."
- ${completedCycles}++
IF (feature's Score A < ${scoreThresholdTL} OR Score B < ${scoreThresholdAdv} OR HIGH/CRITICAL vuln)
AND Reworks < ${maxReworks}:
- BACKLOG.md[feature].Reworks++ (for the active feature in backlog)
- docs/specs/{domain}/REWORK-LOG.md:
  - openPoints from the-grumpy-tech-lead
  - edgeCasesMissed from adversarial-qa
- DECISIONS.md → "Feature {ID} RETRY #{n} — TL: {A}, Adv: {B}. Reason: {top finding}."
- DEVELOPMENT-STATE.md → Current Phase: IMPLEMENTATION, Status: NOT_STARTED
IF (feature's Score A < ${scoreThresholdTL} OR Score B < ${scoreThresholdAdv} OR HIGH/CRITICAL vuln)
AND Reworks >= ${maxReworks}
AND (failure causes application crash or breaks core functionality):
- BACKLOG.md[feature] (the active feature in backlog) → Status: BLOCKED
- DEVELOPMENT-STATE.md → Current Phase: -, Status: BLOCKED
- DECISIONS.md → "Feature {ID} BLOCKED after {maxReworks} attempts. Rationale: crash/critical break."
- ${completedCycles}++
IF (feature's Score A < ${scoreThresholdTL} OR Score B < ${scoreThresholdAdv} OR HIGH/CRITICAL vuln)
AND Reworks >= ${maxReworks}
AND (failure does NOT cause a crash and development can continue, e.g., security vulnerability or minor bugs):
- BACKLOG.md[feature] (the active feature in backlog) → Status: FAILED
- DEVELOPMENT-STATE.md → Current Phase: -, Status: FAILED
- DECISIONS.md → "Feature {ID} FAILED after {maxReworks} attempts. Rationale: non-blocking issue, continuing development."
- ${completedCycles}++
D1. Completion check — verify ALL of the following against BACKLOG.md:
- All features in BACKLOG.md are COMPLETED, BLOCKED, or FAILED
- COMPLETED feature: Score (TL) >= ${scoreThresholdTL} AND Score (Adv) >= ${scoreThresholdAdv}
- BLOCKED or FAILED feature: Reworks >= ${maxReworks}
- adversarial-qa verdicts (unless the feature was marked as FAILED)
IF any criterion fails → log reason in DECISIONS.md
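The four branches of the C5 Verdict Gate can be condensed into one function returning PASS, RETRY, BLOCK, or FAIL. This is a sketch under assumptions: the function and parameter names are hypothetical, and it reads a HIGH/CRITICAL vulnerability as overriding passing scores, which is one interpretation of how the first two branches interact.

```python
def verdict(score_a, score_b, threshold_tl, threshold_adv,
            reworks, max_reworks,
            high_or_critical_vuln=False, breaks_app=False):
    """Map scores and rework count to PASS / RETRY / BLOCK / FAIL."""
    # Assumption: a HIGH/CRITICAL vulnerability blocks PASS even with good scores.
    passed = (score_a >= threshold_tl
              and score_b >= threshold_adv
              and not high_or_critical_vuln)
    if passed:
        return "PASS"
    if reworks < max_reworks:
        return "RETRY"
    # Rework budget exhausted: crash or critical break blocks, otherwise fail.
    return "BLOCK" if breaks_app else "FAIL"
```

With thresholds of 0.70 and `max_reworks` of 2, scores of 0.60 and 0.90 give `RETRY` on the first two attempts and `FAIL` or `BLOCK` once `reworks` reaches 2.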
D2. Loop:
IF executable features remain → Phase E (save memory, then Phase A next feature)
IF feature is IN_PROGRESS → read DEVELOPMENT-STATE.md, resume from last completed phase
DO NOT restart from Phase A
→ Always pass through Phase E before transitioning
Trigger: After every Phase D (both mid-loop and final HALT). Ensures project memory reflects current state before any loop or termination.
E1. State log:
DECISIONS.md → "Phase E: persisting project memory in `docs/feature/{domain}.md`."
E2. MANDATORY — Delegate project-memory skill to software-architect agent (Autonomous Mode, no exceptions, no skipping):
inputs:
context = summary of changes made in completed cycle:
- Feature IDs processed (COMPLETED, BLOCKED, FAILED, or RETRY'd)
- For each COMPLETED feature: ${domain}, scope description (Title + Description from BACKLOG.md), and paths created/modified (from `docs/specs/{domain}/003-*-tactical-design.md` or `TDD-OUTPUT.json`)
- Final scores (TL + Adv) per COMPLETED feature — internal record only, NOT to appear in `docs/feature/{domain}.md`
- Key decisions logged in DECISIONS.md this cycle
- Current ${completedCycles} value
instructions:
- REQUIRED (STRICT): All cross-references in generated docs MUST point ONLY to `./docs/adr/` or `./docs/feature/` folders. NO other folder paths are permitted. Validate every reference before finalizing.
- REQUIRED: Document the developed feature under `docs/feature/{domain}.md` (create if missing, update if it already exists), following `./references/DOCUMENT-TEMPLATE.md` strictly
- REQUIRED: `FOLDER STRUCTURE` section → reflect the module's current relevant structure, incorporating paths added or modified this cycle (do not drop paths documented in prior cycles)
- REQUIRED: Keep all content direct and minimal — enough for a future LLM to orient itself in the codebase without re-reading source files
- PROHIBITED: Narrative explanations, justifications, or process history (no TDD/validation/score details — those belong in `DECISIONS.md`, not in feature docs)
- IF the cycle introduced architectural changes (new layers, patterns, integrations, test strategy changes) → update the corresponding `docs/adr/*.md` file, following `./references/ARCHITECTURE-RULES.md` (for `ARCHITECTURE.md`), `./references/TESTS-RULES.md` (for `TESTS.md`), or `./references/DOCUMENT-TEMPLATE.md` (for any other existing ADR)
- PROHIBITED: Creating a new ADR file during this step unless explicitly requested/decided by a human
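The cross-reference restriction in the instructions above lends itself to a mechanical check. The sketch below assumes references are written as inline Markdown links whose targets start with `./`; `invalid_references` and the regular expression are illustrative, not part of the skill.

```python
import re

ALLOWED_PREFIXES = ("./docs/adr/", "./docs/feature/")
# Captures the target of an inline Markdown link: [label](target)
LINK_TARGET = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def invalid_references(markdown):
    """Return link targets that point outside the two permitted folders."""
    targets = LINK_TARGET.findall(markdown)
    return [t for t in targets if not t.startswith(ALLOWED_PREFIXES)]
```

An empty result means every link target found in the generated document stays inside `./docs/adr/` or `./docs/feature/`.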
E3. Transition:
IF executable features remain → Phase A (next feature)
IF no executable features remain → HALT
</orchestration_loop>
<json_extraction_protocol>
When parsing sub-agent output for metrics:
1. Search raw string for Markdown fences containing JSON.
2. IF none found: extract substring from first '{' to last '}'.
3. Parse as JSON.
4. IF parse fails: log error in DECISIONS.md → force RETRY cycle.
</json_extraction_protocol>
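The extraction protocol can be sketched in a few lines. `extract_json` is a hypothetical name, the fence pattern is a simplification that accepts any fenced block, and returning `None` stands in for step 4, where the caller would log the error in DECISIONS.md and force a RETRY cycle.

```python
import json
import re

FENCE_MARK = "`" * 3  # built programmatically to avoid a literal fence here
FENCED_BLOCK = re.compile(FENCE_MARK + r"(?:json)?\s*(.*?)" + FENCE_MARK, re.DOTALL)


def extract_json(raw):
    """Return the parsed JSON object from sub-agent output, or None on failure."""
    match = FENCED_BLOCK.search(raw)          # step 1: look for a fenced block
    if match:
        candidate = match.group(1)
    else:                                     # step 2: first '{' to last '}'
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end < start:
            return None
        candidate = raw[start:end + 1]
    try:
        return json.loads(candidate)          # step 3: parse
    except json.JSONDecodeError:
        return None                           # step 4: caller logs and retries
```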
| Rule | Constraint |
|---|---|
| No developer emulation | Never touch Python, JS, or write tests |
| No questions | Execute A → B → C without user confirmation |
| Persistence first | Write every status change to disk BEFORE executing sub-agent command |
See EXAMPLES.md for complete templates: BACKLOG.md, DEVELOPMENT-STATE.md and DECISIONS.md
npx claudepluginhub romabeckman/harness-kit --plugin harness-kit