Skill

handoff

Drives a coding task autonomously from implementation through PR, CI, and labelling. Authorizes push and PR creation. Flags judgment calls, not routine steps.

automation

Popularity

Parent stars

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/empire-dev:handoff [task description | spec path]

User invocable

Model invocable

Inline context

Default effort

Argument hint[task description | spec path]

Tool Access

This skill is limited to the following tools:

BashReadEditWriteGlobGrepSkillAgentTodoWrite

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Drive one task from intent to a labelled PR with green CI, autonomously. The human hands off; you take it the whole way and surface what they should look at later.

SKILL.md

201 lines · ~3.3k tokens

Stats

LanguageShell

Parent stars9

MaintenanceExcellent

Last CommitJun 19, 2026

Actions

View Source View Plugin View on GitHub View README

Handoff

Drive one task from intent to a labelled PR with green CI, autonomously. The human hands off; you take it the whole way and surface what they should look at later.

User input: $ARGUMENTS

Invoking this skill IS the user's ship-intent signal. It overrides the default "work stays local until the user says ship" rule: you are authorized to push, open a PR, and drive CI without stopping to ask permission for those steps.

Authorization is NOT unconditional. It covers the normal forward path. It does NOT cover the hard-stop cases in autonomy-boundaries — there you still stop and ask, because guessing wrong on those is more expensive than a pause.

The point of handoff is to remove babysitting, not judgment. Run the routine; flag the consequential.

Before any work, build the spine so progress survives a long run and the user can see where you are.

Create a TodoWrite list with one item per phase below (0 Plan, 1 Implement, 2 Review, 3 Address, 4 PR, 5 CI, 6 Label). Mark each in-progress/done as you go — this is the user's progress bar for an unattended run.
Open an isolated worktree via /empire-git:worktree-open with a branch derived from the task, and do all work inside it. Rationale: handoff mutates files and pushes; keeping it off the user's current branch is the whole reason worktrees exist. The worktree lives until CI is green and labels are set — never close it mid-run (the CI fix loop needs it).
Work in place ONLY if the user explicitly says so. If working in place, hard-stop and ask before mutating when EITHER holds: the current branch is the default/protected branch (main, master), or the working tree already carries unrelated uncommitted changes. Pushing handoff's commits onto the user's main branch or entangling them with unrelated work is exactly the kind of irreversible surprise autonomy-boundaries exists to prevent.
If superpowers:* skills referenced below are not installed, don't error mid-run: do the equivalent inline (plan/TDD by hand) or skip-with-a-flag. Never stall an unattended run on a missing optional dependency.
Read CONTEXT.md at repo root and relevant docs/adr/ entries if present. Carry their vocabulary and decisions through every phase — plans, code, PR body, and flags all use project terms verbatim.
Start a running flag log (in-memory list). Append to it whenever you hit a decision worth human eyes (see flagging). It feeds the PR body and the final report.

Phase 0 — Plan (only if a spec exists)

A spec means: the user pointed at a spec/plan file, docs/superpowers/specs/ has a matching one, or the task is large/multi-step enough that implementing blind would thrash.

Spec or plan-worthy scope → invoke superpowers:writing-plans to produce a written plan, then implement against it.
No spec and the task is a small, well-bounded change → skip planning. A plan for a one-file fix is overhead, not safety. State that you're skipping and why.

If the task is ambiguous enough that you can't tell what "done" means, that's a hard stop — see autonomy-boundaries. Don't invent requirements.

Phase 1 — Implement

Follow the plan if one exists; otherwise work directly from the task.
Match the surrounding code — its conventions, naming, test style, comment density. Read neighbours before writing.
Where the project has tests, prefer superpowers:test-driven-development (test first, then code). Where it doesn't, don't bolt on a framework unasked — flag the absence instead.
Run the project's own checks (build, lint, tests, formatters) before declaring the phase done. Never claim it works without running it — evidence before assertion.
Commit in atomic, logically-scoped commits following repo git rules (Conventional Commits with plugin scope per the repo conventions).

Phase 2 — Team review

Invoke /empire-dev:team-review on the diff. It picks the specialist roster, dispatches in parallel, and returns a tiered consensus report.
Let it choose the roster from the diff signals. Don't hand-pick unless the diff is so narrow one specialist obviously covers it.

Phase 3 — Address findings

Here handoff deliberately overrides team-review's wait-for-the-user gate. The user already authorized action by invoking handoff, so apply the safe fixes and flag the rest rather than blocking.

Finding tier / kind	Action
Consensus Must-fix / Should-fix	Apply. High agreement, clear defect.
Corroborated Must-fix	Apply.
Corroborated Should-fix	Apply if the fix is mechanical and low-risk; otherwise flag.
Single-source (low-confidence)	Do NOT auto-apply. Flag with the specialist's rationale so the human can decide.
Conflicts between specialists	Do NOT pick a side silently. Flag both positions.
Nits	Apply if trivial; otherwise drop. Don't flag nits — noise.
Any fix that changes intended behaviour, public API, or security posture	Flag, don't apply. That's a product decision.

After applying, re-run the project's checks. A re-review pass is optional — use judgment on whether the change surface warrants it. If you re-dispatch team-review, it hits its own wait-for-the-user gate; override it exactly as this phase does — apply the safe tier, flag the rest, keep moving. Never let an autonomous run stall waiting for input it promised not to need.

Phase 4 — Open the PR

Generate the body with /empire-git:pr-description and use its output verbatim — never hand-write a gh pr create --body. This is a repo rule, not a preference.
Append a flags section to the body so the human sees the judgment calls on the PR itself, not buried in chat:
```
## Decisions & flags for review
- <decision or open question> — <why it needs a human> [<file:line> if applicable]
```
Place it after the generated body. If the flag log is empty, omit the section entirely — don't write "None".
Title: Conventional Commits, lowercase, no period, ≤ 72 chars.
Push with git push -u origin <branch> from inside the worktree, then gh pr create. Do NOT run /empire-git:worktree-close here — it removes the worktree, and the CI fix loop in Phase 5 still needs it. Worktree teardown is the user's call after the PR lands; mention it in the final report rather than doing it mid-run.
Push only handoff's own feature branch. Never push to the base/default branch, and never force-push.

Phase 5 — Watch CI

Watch checks by polling gh pr checks on an interval. Prefer polling over gh pr checks --watch: --watch blocks until a terminal state and can't honor the wall-clock bound below, so a stuck pipeline would hang the run.
If the repo has no CI configured (no checks ever register), skip this phase cleanly — don't watch nothing. Note it in the final report.
On failure: read the failing job's logs, diagnose, fix, commit, push to the same feature branch (never force-push), re-watch. This is a bounded loop, not an infinite one.
Bound the wait, not just the retries: if checks stay queued/pending without progress past a reasonable wall-clock window (~15–20 min), stop watching and flag — a stuck pipeline is the human's to chase, not yours to spin on.
Stop the loop and flag when ANY of these holds:
- 3 fix attempts on the same check have not turned it green — you're likely missing context the human has.
- The failure is in infra/flaky territory (runner died, network, unrelated job), not your diff.
- The fix would require a decision that belongs to autonomy-boundaries (e.g. changing a security check, disabling a test).
Never make a check pass by weakening it (deleting the assertion, adding --no-verify, skip, lowering coverage gates). That's defeating the signal, not fixing the code. Flag instead.

Phase 6 — Assign labels

List the repo's actual labels first: gh label list. Never invent labels that don't exist.
Infer labels from the diff content and PR intent — type (feat/fix/docs/chore), affected area/scope, size if the repo uses size labels.
Apply with gh pr edit --add-label. Map to the closest existing label; if no label clearly fits, add none and note it in the final report rather than forcing a wrong one.

What to flag for human review

The skill's value is knowing the difference between "decide and move on" and "a human needs to weigh in." Flag — don't silently decide — when:

You picked between materially different implementation approaches and the choice has lasting consequences (data model, public API shape, dependency added).
You inferred a requirement the task didn't state.
A review finding was low-confidence, conflicting, or behaviour-changing (per phase-address).
You worked around a CI failure rather than truly fixing it, or skipped a check.
You noticed something out of scope that looks wrong but you chose not to touch.

Each flag is one line: what you decided or what's open, why it needs a human, and a file:line anchor when there is one. Flags land in the PR body and the final chat report. Routine, reversible choices do not get flagged — over-flagging buries the signal as badly as under-flagging.

Hard stops — ask, don't guess

Authorization to ship does not extend to these. Stop and ask the user:

Ambiguous done-criteria — you can't tell what success looks like from the task + spec. Building the wrong thing fast is worse than asking.
Destructive / irreversible — data migrations, deleting files you didn't create, force-push, history rewrite, dropping resources, pushing to the base/default branch. Handoff pushes only its own feature branch.
Security & secrets — anything touching auth, crypto, credentials, permissions in a way the task didn't clearly call for; committing anything that looks like a secret.
Scope explosion — the task turns out to need far more than implied (new service, broad refactor, breaking change). Surface the revised scope before sinking hours into it.
External side effects beyond the PR — deploys, publishing packages, posting to external systems, anything outward-facing the task didn't authorize.

When stopped, report what you've done so far, the specific decision you need, and your recommendation. Resume when answered.

Final handoff report

End every run with a compact status the user can scan in seconds:

## Handoff complete  (or: paused / blocked)
- PR: <url>  ·  CI: <green / failing job> ·  Labels: <applied>
- Phases: Plan <state> · Implement <state> · Review <state> · Address <state> · PR <state> · CI <state> · Label <state>

## Flags for review
- <flag> — <why> [<file:line>]

## What I decided on my own
- <notable routine decision the user might want to know, kept short>

Omit empty sections. If paused at a hard stop, lead with the decision you need.

handoff

Popularity

Invocation

Tool Access

Context Preview

SKILL.md

handoff

Popularity

Invocation

Tool Access

Context Preview

SKILL.md

Handoff

Phase 0 — Plan (only if a spec exists)

Phase 1 — Implement

Phase 2 — Team review

Phase 3 — Address findings

Phase 4 — Open the PR

Phase 5 — Watch CI

Phase 6 — Assign labels

What to flag for human review

Hard stops — ask, don't guess

Final handoff report

Similar Skills

Handoff

Phase 0 — Plan (only if a spec exists)

Phase 1 — Implement

Phase 2 — Team review

Phase 3 — Address findings

Phase 4 — Open the PR

Phase 5 — Watch CI

Phase 6 — Assign labels

What to flag for human review

Hard stops — ask, don't guess

Final handoff report

Similar Skills