By vasuag09
Lean full-SDLC Claude Code harness — Discover, Plan, Implement, Verify, Maintain — with a vague-intent discovery entry above /spec, subagent orchestration, memory persistence, a product/UX + system design altitude, benchmark-gated cost/token optimization, guarded long-running agent runs, multi-agent parallel fan-out, a bug-fix fast lane, and a release & feedback loop (deploy + observe) that closes the pipeline.
Expert code-review specialist. MUST BE USED immediately after writing or modifying code, before merge. Reviews quality, correctness, and maintainability. Read-only — reports findings with severity, does not edit.
Implementation-planning specialist. Use PROACTIVELY for non-trivial features, refactors, or anything spanning multiple files. Turns a spec into a phased, risk-assessed task plan. Read-only — produces a plan, does not edit code.
System-design specialist. Use PROACTIVELY for architectural decisions, new subsystems, interface/data-model design, or significant refactors. Produces a design note / ADR with trade-offs. Read-only — designs, does not implement.
Security vulnerability specialist. MUST BE USED for any change touching auth, user input, DB queries, file I/O, external calls, crypto, secrets, or payments — and before merging such code. Scans for OWASP Top-10 and secret leakage. Read-only — reports, does not edit.
Test-driven development specialist. Use PROACTIVELY when implementing a feature or fixing a bug. Enforces tests-first (RED → GREEN → REFACTOR) and ≥80% coverage. May write tests and implementation.
Execute a plan via test-driven development — write tests first, then minimal code, phase by phase. Use after /plan (and /architect if needed) to write the actual feature. Delegates to the harness-claude:tdd-guide agent.
Make and record an architecture/design decision for non-trivial work — new subsystems, interfaces, data models, or significant refactors. Use when /plan flags a load-bearing decision. Delegates to the harness-claude:architect agent.
Measure whether a harness component (a skill, agent, or rule) actually earns its keep — run the same task k times with and without it, in isolated git worktrees, and report pass@k (works once) and pass^k (works consistently). Opt-in; not wired into the default pipeline. Use to decide keep-vs-cut on evidence, or to fill the R5 benchmark for a /extract proposal.
Resolve build failures and type/compile errors fast, with minimal diffs. Use when a build or typecheck fails during implementation. Delegates to the harness-claude:build-error-resolver agent.
Release step that closes the loop after /ship — drive a change out to a running environment using the PROJECT'S OWN deploy mechanism (detect, never prescribe a stack), then smoke-test the deployed artifact and keep a guarded rollback path. Deploy is more outward-facing than git push, so it is arm-to-fire — plan and pre-check, then HALT and require an explicit `arm deploy` before any real outward action. Opt-in; invoked explicitly. Use after /ship to push a verified change to staging/prod under the harness's boundary discipline.
Matches all tools
Hooks run on every tool call, not just specific ones
Executes bash commands
Hook triggers when Bash tool is used
Own this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimOwn this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimBased on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.
Modifies files
Hook triggers on file write and edit operations
Modifies files
Hook triggers on file write and edit operations
Uses power tools
Uses Bash, Write, or Edit tools
Uses power tools
Uses Bash, Write, or Edit tools
A lean, full-SDLC harness for Claude Code that benchmark-gates its own features — and kills the ones that don't earn their keep.
This is the first time I'm sharing this publicly. I built it privately, iterating toward v0.8.0 (an internal version count, not a prior release history), to turn Claude Code from a powerful-but-undisciplined chat loop into a real Plan → Implement → Verify → Maintain pipeline — scoped subagents, test-first gates, cross-session memory, reversible hooks.
The part I'm most proud of isn't a feature — it's the discipline behind one of them. Every cost optimization I tried had to beat a bare baseline on cost-per-successful-task, or I killed it and wrote down why. I ran 4 experiments. I shipped 2. See what I killed and why ↓
Stack focus: TypeScript/JS (React, Next, Vercel) and Python. MIT-licensed, plugin-installable, ~0-dependency hooks (plain Node).
PLAN ─────────────► IMPLEMENT ─────► VERIFY ─────────────► MAINTAIN
/spec /research /implement /review /security-review /refactor-clean
/plan /architect /build-fix /test /verify /ship /onboard
claude plugin marketplace add vasuag09/harness-claude
/plugin # find "harness-claude", enable it
/harness # run the full pipeline on a real task
Built to evolve toward eval loops, retrieval, long-running agents, multi-agent orchestration, and computer-use (see ROADMAP.md).
Every optimization here had to clear one gate: does it beat a bare baseline on cost-per-successful-task, at held consistency (pass^k)? Not "does it work" — "is it worth it." Two of four didn't clear it.
| # | Optimization | Result | Why |
|---|---|---|---|
| 1 | Input-compression proxy (wire-level, compresses what's fed to the model) | 🔴 Killed | Broke Claude Code's cache economics — compressing the cached prefix invalidated the cache that made it cheap, so it cost more |
| 2 | A code-graph MCP (a popular ~51k★ structural-index server) | 🔴 Killed | Its fixed per-session context tax was bigger than the entire task cost on a normal-sized repo (might win on very large repos — out of scope here) |
| 3 | Generation reduction — the /lazy "build the minimum that actually works" reflex | 🟢 Shipped, always-on | −35% generated output tokens, −23% LOC, at held accuracy (k=3, Opus/Sonnet, measured against the harness's own existing YAGNI rules) |
| 4 | Structural orientation hook (lightweight local code-index) | 🟢 Shipped, always-on, with a caveat | A large-repo bet, not yet benchmarked at scale — does no harm on small repos, but isn't a proven win there either |
One honest caveat on #3: the dollar savings on a small task were only ~−3% (within noise) — cost here is dominated by cached input, and the output cut only moves real dollars on output-heavy work. The win is real, just not a headline multiplier.
| Guide | Read it for |
|---|---|
| docs/USAGE.md | Install → full pipeline → reuse-an-existing-codebase → common scenarios |
| docs/SKILLS.md | Every command + agent, what each does, what it delegates to |
| docs/HOOKS.md | Exactly what runs on your machine, and how to disable any hook |
| docs/TROUBLESHOOTING.md | Hooks not firing, slow tsc, MCP, git boundary — fixes |
| CONTRIBUTING.md | Adding skills/hooks, portability rules, PR checklist |
| ROADMAP.md | Where this is going (eval loops → retrieval → multi-agent → computer-use) |
New here? Start with docs/USAGE.md.
It lives in its own repo and installs as a Claude Code plugin/marketplace, so you can:
/plugin, run a real task, judge the result.
Toggle it off to fall back. It never overwrites your existing ~/.claude.npx claudepluginhub vasuag09/harness-claude --plugin harness-claudeHarness-native ECC plugin for engineering teams - 67 agents, 271 skills, 92 legacy command shims, reusable hooks, rules, MCP conventions, and operator workflows for Claude Code plus adjacent agent harnesses
Comprehensive skill pack with 66 specialized skills for full-stack developers: 12 language experts (Python, TypeScript, Go, Rust, C++, Swift, Kotlin, C#, PHP, Java, SQL, JavaScript), 10 backend frameworks, 6 frontend/mobile, plus infrastructure, DevOps, security, and testing. Features progressive disclosure architecture for 50% faster loading.
v9.44.1 — Patch release for Gemini environment/version detection and qwen auth gating. Run /octo:setup.
Permanent coding companion for Claude Code — survives any update. MCP-based terminal pet with ASCII art, stats, reactions, and personality.
Upstash Context7 MCP server for up-to-date documentation lookup. Pull version-specific documentation and code examples directly from source repositories into your LLM context.
A growing collection of Claude-compatible academic workflow bundles. Covers scientific figures, manuscript writing and polishing, reviewer assessment, citation retrieval, data availability, paper reading, literature search, response letters, paper-to-PPTX conversion, and evidence-grounded Chinese invention patent drafting. Rules are organized as reusable skill folders with explicit workflows and quality checks.