From empire-dev
Drives a coding task autonomously from implementation through PR, CI, and labelling. Authorizes push and PR creation. Flags judgment calls, not routine steps.
How this skill is triggered — by the user, by Claude, or both
Slash command
/empire-dev:handoff [task description | spec path]
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Drive one task from intent to a labelled PR with green CI, autonomously. The human hands off; you take it the whole way and surface what they should look at later.
User input: $ARGUMENTS
Before any work, build the spine so progress survives a long run and the user can see where you are.
- Run /empire-git:worktree-open with a branch derived from the task, and do all work inside it. Rationale: handoff mutates files and pushes; keeping it off the user's current branch is the whole reason worktrees exist. The worktree lives until CI is green and labels are set; never close it mid-run (the CI fix loop needs it).
- Stop if the work would land on a base branch (main, master), or the working tree already carries unrelated uncommitted changes. Pushing handoff's commits onto the user's main branch or entangling them with unrelated work is exactly the kind of irreversible surprise autonomy-boundaries exists to prevent.
- If the superpowers:* skills referenced below are not installed, don't error mid-run: do the equivalent inline (plan/TDD by hand) or skip-with-a-flag. Never stall an unattended run on a missing optional dependency.
- Read CONTEXT.md at repo root and relevant docs/adr/ entries if present. Carry their vocabulary and decisions through every phase: plans, code, PR body, and flags all use project terms verbatim.

A spec means: the user pointed at a spec/plan file, docs/superpowers/specs/ has a matching one, or the task is large/multi-step enough that implementing blind would thrash.
Use superpowers:writing-plans to produce a written plan, then implement against it.

If the task is ambiguous enough that you can't tell what "done" means, that's a hard stop (see autonomy-boundaries). Don't invent requirements.
Where the project has a test setup, use superpowers:test-driven-development (test first, then code). Where it doesn't, don't bolt on a framework unasked; flag the absence instead.

Run /empire-dev:team-review on the diff. It picks the specialist roster, dispatches in parallel, and returns a tiered consensus report.

Here handoff deliberately overrides team-review's wait-for-the-user gate. The user already authorized action by invoking handoff, so apply the safe fixes and flag the rest rather than blocking.
| Finding tier / kind | Action |
|---|---|
| Consensus Must-fix / Should-fix | Apply. High agreement, clear defect. |
| Corroborated Must-fix | Apply. |
| Corroborated Should-fix | Apply if the fix is mechanical and low-risk; otherwise flag. |
| Single-source (low-confidence) | Do NOT auto-apply. Flag with the specialist's rationale so the human can decide. |
| Conflicts between specialists | Do NOT pick a side silently. Flag both positions. |
| Nits | Apply if trivial; otherwise drop. Don't flag nits — noise. |
| Any fix that changes intended behaviour, public API, or security posture | Flag, don't apply. That's a product decision. |
After applying, re-run the project's checks. A re-review pass is optional — use judgment on whether the change surface warrants it. If you re-dispatch team-review, it hits its own wait-for-the-user gate; override it exactly as this phase does — apply the safe tier, flag the rest, keep moving. Never let an autonomous run stall waiting for input it promised not to need.
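The triage table above can be read as a small decision function. The sketch below is one possible encoding, not part of the skill itself: the `Finding` fields and the tier strings are hypothetical names chosen for illustration, and it assumes the last table row (product decisions) takes precedence over every other row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One review finding. All field names are illustrative assumptions."""
    tier: str                       # "consensus", "corroborated", "single-source", "conflict", "nit"
    severity: str = ""              # "must-fix" or "should-fix"; unused for other tiers
    mechanical: bool = False        # the fix is mechanical and low-risk
    trivial: bool = False           # only meaningful for nits
    product_decision: bool = False  # changes intended behaviour, public API, or security posture


def triage(f: Finding) -> str:
    """Return "apply", "flag", or "drop" for one finding."""
    # Assumption: the product-decision row overrides every other row.
    if f.product_decision:
        return "flag"
    if f.tier == "consensus":
        return "apply"
    if f.tier == "corroborated":
        if f.severity == "must-fix":
            return "apply"
        return "apply" if f.mechanical else "flag"
    if f.tier == "nit":
        # Nits are applied when trivial and otherwise dropped, never flagged.
        return "apply" if f.trivial else "drop"
    # Single-source findings and specialist conflicts go to the human.
    return "flag"
```

Putting the product-decision check first keeps the rule that matters most for an unattended run from being shadowed by a high-agreement tier.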
Generate the body with /empire-git:pr-description and use its output verbatim — never hand-write a gh pr create --body. This is a repo rule, not a preference.
Append a flags section to the body so the human sees the judgment calls on the PR itself, not buried in chat:
## Decisions & flags for review
- <decision or open question> — <why it needs a human> [<file:line> if applicable]
Place it after the generated body. If the flag log is empty, omit the section entirely — don't write "None".
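As a rough illustration of the append rule, the sketch below builds the final PR body from a generated body and a flag log. `Flag` and `with_flags` are hypothetical helpers, not part of /empire-git:pr-description; the separator is written as an escape so the output matches the template's dash.

```python
from typing import NamedTuple, Optional

SEPARATOR = " \u2014 "  # the dash used between "what" and "why" in the template


class Flag(NamedTuple):
    what: str                     # the decision or open question
    why: str                      # why it needs a human
    anchor: Optional[str] = None  # "file:line" when there is one


def with_flags(body: str, flags: list[Flag]) -> str:
    """Append the flags section after the generated body, or return the body untouched."""
    if not flags:
        # Empty flag log: omit the section entirely rather than writing "None".
        return body
    lines = ["## Decisions & flags for review"]
    for flag in flags:
        line = f"- {flag.what}{SEPARATOR}{flag.why}"
        if flag.anchor:
            line += f" [{flag.anchor}]"
        lines.append(line)
    return body.rstrip("\n") + "\n\n" + "\n".join(lines) + "\n"
```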
Title: Conventional Commits, lowercase, no period, ≤ 72 chars.
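A title check along these lines could catch violations before gh pr create runs. This is a minimal sketch under two assumptions: "lowercase" is read as applying to the whole title, and the pattern accepts any lowercase type word with an optional scope rather than a fixed type list. `title_problems` is a hypothetical helper.

```python
import re

# type, optional (scope), optional "!", then ": " and a non-empty description
TITLE_PATTERN = re.compile(r"^[a-z]+(\([a-z0-9._/-]+\))?!?: \S.*$")


def title_problems(title: str) -> list[str]:
    """Return the list of rule violations; an empty list means the title passes."""
    problems = []
    if not TITLE_PATTERN.match(title):
        problems.append("not in Conventional Commits form")
    if title != title.lower():
        problems.append("contains uppercase characters")
    if title.endswith("."):
        problems.append("ends with a period")
    if len(title) > 72:
        problems.append("longer than 72 characters")
    return problems
```

Returning every violation at once, instead of failing on the first, lets a single retry fix the title.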
Push with git push -u origin <branch> from inside the worktree, then gh pr create. Do NOT run /empire-git:worktree-close here — it removes the worktree, and the CI fix loop in Phase 5 still needs it. Worktree teardown is the user's call after the PR lands; mention it in the final report rather than doing it mid-run.
Push only handoff's own feature branch. Never push to the base/default branch, and never force-push.
- Poll gh pr checks on an interval. Prefer polling over gh pr checks --watch: --watch blocks until a terminal state and can't honor the wall-clock bound below, so a stuck pipeline would hang the run.
- If checks sit queued/pending without progress past a reasonable wall-clock window (~15–20 min), stop watching and flag: a stuck pipeline is the human's to chase, not yours to spin on.
- Never get CI green by weakening the checks (--no-verify, skip, lowering coverage gates). That's defeating the signal, not fixing the code. Flag instead.

- Read the repo's labels with gh label list. Never invent labels that don't exist.
- Choose labels for change type (feat/fix/docs/chore), affected area/scope, size if the repo uses size labels.
- Apply them with gh pr edit --add-label. Map to the closest existing label; if no label clearly fits, add none and note it in the final report rather than forcing a wrong one.

The skill's value is knowing the difference between "decide and move on" and "a human needs to weigh in." Flag, don't silently decide, when:
Each flag is one line: what you decided or what's open, why it needs a human, and a file:line anchor when there is one. Flags land in the PR body and the final chat report. Routine, reversible choices do not get flagged — over-flagging buries the signal as badly as under-flagging.
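The bounded CI polling described earlier (poll on an interval, give up once checks sit without progress past a wall-clock window) might look like the sketch below. `fetch_state` is a hypothetical callable that would wrap gh pr checks and reduce its output to one status string; the sketch also assumes that any change in that string counts as progress. The clock and sleep functions are injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Optional


def watch_ci(
    fetch_state: Callable[[], str],
    interval_s: float = 30.0,
    stall_limit_s: float = 15 * 60,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until checks reach "pass" or "fail", or return "stalled" after no progress."""
    last_state: Optional[str] = None
    last_change = clock()
    while True:
        state = fetch_state()
        if state in ("pass", "fail"):
            return state
        now = clock()
        if state != last_state:
            # The reported state moved (e.g. queued -> pending): restart the stall window.
            last_state, last_change = state, now
        elif now - last_change >= stall_limit_s:
            # Stuck pipeline: stop watching so the caller can flag it.
            return "stalled"
        sleep(interval_s)
```

A "stalled" result would become a flag for the human rather than a reason to keep spinning.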
Authorization to ship does not extend to these. Stop and ask the user:
When stopped, report what you've done so far, the specific decision you need, and your recommendation. Resume when answered.
End every run with a compact status the user can scan in seconds:
## Handoff complete (or: paused / blocked)
- PR: <url> · CI: <green / failing job> · Labels: <applied>
- Phases: Plan <state> · Implement <state> · Review <state> · Address <state> · PR <state> · CI <state> · Label <state>
## Flags for review
- <flag> — <why> [<file:line>]
## What I decided on my own
- <notable routine decision the user might want to know, kept short>
Omit empty sections. If paused at a hard stop, lead with the decision you need.
npx claudepluginhub marcoskichel/empire --plugin empire-dev