Skill

audit

Runs parallel code quality, security, and test audits with semantic dedup and per-dimension PASS/WARN/FAIL verdicts. Supports strict mode for trust-boundary isolation.

code-quality

security

testing

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/epic:audit

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

**CRITICAL**: Run `HARNESS_DIR=$(epic path)` first. NEVER use `.harness/` in the project directory.

SKILL.md

245 lines · ~2.2k tokens

Stats

Language: Rust

Stars: 8

Forks: 2

Maintenance: Excellent

Last Commit: Jun 22, 2026

Audit — Verify Everything

CRITICAL: Run HARNESS_DIR=$(epic path) first. NEVER use .harness/ in the project directory.

Execution Modes

This skill has 3 internal modes that run in parallel:

audit:code — Code quality, logic, style, test coverage, spec coverage
audit:security — OWASP Top 10 + performance (N+1, leaks)
audit:test — Full test suite, AC verification, coverage delta

`--strict` Mode (Trust Boundary Isolation)

When invoked with --strict (or when $HARNESS_DIR/engagement.md has mode: strict), the audit enforces independence between verification agents to prevent reward hacking:

Artifact-only delivery: Each mode receives only the code diff and spec — no builder context, no session history, no prior agent conclusions.
Cross-check independence: audit:code and audit:security run without visibility into each other's findings. Results are combined only during synthesis (Step 4).
Blind scoring: No mode can see another mode's verdict until synthesis. This prevents anchoring bias where a clean code review inflates the security score.
No self-review: If the same agent built the code (via /go), a different agent instance must run audit. The builder's session ID is checked and excluded.

Use --strict for security-sensitive projects, compliance requirements, or when the build phase had ambiguous outcomes.
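The sketch below shows artifact-only delivery and the no-self-review rule by building one sealed payload per mode. It assumes modes are launched from plain data objects. `ModeInput`, `build_isolated_inputs`, and the session-ID strings are hypothetical names, not part of the epic CLI or any agent API.

```python
from dataclasses import dataclass

MODES = ("audit:code", "audit:security", "audit:test")


@dataclass(frozen=True)
class ModeInput:
    """Everything a strict-mode agent may see (hypothetical type)."""
    mode: str
    spec: str
    diff: str
    checklist: str


def build_isolated_inputs(spec, diff, checklists, builder_session, auditor_session):
    # No self-review: the session that built the code cannot audit it.
    if auditor_session == builder_session:
        raise PermissionError("builder session may not audit its own work")
    # Artifact-only delivery: spec + diff + the mode's own checklist.
    # No session history and no other mode's findings are passed in.
    return [ModeInput(m, spec, diff, checklists[m]) for m in MODES]
```

Each payload is immutable and built only from artifacts. As a result, no mode can pick up another mode's conclusions before synthesis.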

Process

Step 0: Prerequisites

Confirm that /go has run, i.e. you are on its feature branch:

git symbolic-ref --short HEAD  # must NOT be main/master

Load the spec to know what was supposed to be built:

ls -t "$HARNESS_DIR"/specs/SPEC-*.md | head -1

Read the Requirements and Acceptance Criteria sections.
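Here is a Python sketch of the same two prerequisite checks. It assumes `git` is on `PATH` and that `HARNESS_DIR` has already been resolved with `epic path`. The helper names are hypothetical. `latest_spec` mirrors `ls -t "$HARNESS_DIR"/specs/SPEC-*.md | head -1` by picking the newest file by modification time.

```python
import subprocess
from pathlib import Path


def current_branch() -> str:
    # Same as `git symbolic-ref --short HEAD`; fails on a detached HEAD.
    result = subprocess.run(
        ["git", "symbolic-ref", "--short", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def latest_spec(harness_dir: str) -> Path:
    # Newest SPEC-*.md by modification time, like `ls -t ... | head -1`.
    specs = list(Path(harness_dir, "specs").glob("SPEC-*.md"))
    if not specs:
        raise FileNotFoundError(f"no SPEC-*.md under {harness_dir}/specs")
    return max(specs, key=lambda p: p.stat().st_mtime)


def check_prerequisites(harness_dir: str) -> Path:
    branch = current_branch()
    if branch in ("main", "master"):
        raise RuntimeError(f"on {branch}: /go has not run on a feature branch")
    return latest_spec(harness_dir)
```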

Step 1: Gather Scope

git diff --stat $(git merge-base HEAD main)
git diff --name-only $(git merge-base HEAD main)

Step 2: Scope Detection

Pattern	Scope	Extra checks
`.api.`, `route`, `controller`, `handler`	API	+ Contract testing, request validation
`.tsx`, `.jsx`, `.vue`, `.svelte`, `*.css`	Frontend	+ Accessibility, semantic HTML
`.sql`, `migration`, `schema*`	Database	+ Migration safety, rollback plan
`.rs`, `Cargo.toml`, `.go`, `go.mod`	Backend	+ Build verification, type safety
`.test.`, `.spec.`, `__tests__/`	Tests	+ Coverage delta, flaky test detection
`Dockerfile`, `.yml`, `*.yaml`, `Makefile`	Infra	+ Config validation, secret detection
`.md`, `.txt`	Docs	+ Link checking, freshness
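One way to apply the table to the `git diff --name-only` output from Step 1 is to turn each pattern into a glob. Each glob is then tested against both the file name and the full path. This sketch rests on that interpretation (for example, `.api.` is read as `*.api.*`). `SCOPE_RULES` and `detect_scopes` are hypothetical names.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Glob reading of the Step 2 table (an interpretation of its patterns).
SCOPE_RULES = {
    "API": ["*.api.*", "*route*", "*controller*", "*handler*"],
    "Frontend": ["*.tsx", "*.jsx", "*.vue", "*.svelte", "*.css"],
    "Database": ["*.sql", "*migration*", "schema*"],
    "Backend": ["*.rs", "Cargo.toml", "*.go", "go.mod"],
    "Tests": ["*.test.*", "*.spec.*", "*__tests__/*"],
    "Infra": ["Dockerfile", "*.yml", "*.yaml", "Makefile"],
    "Docs": ["*.md", "*.txt"],
}


def detect_scopes(paths):
    """Map `git diff --name-only` output to the set of scopes it touches."""
    scopes = set()
    for path in paths:
        name = PurePosixPath(path).name
        for scope, patterns in SCOPE_RULES.items():
            # Match the basename (e.g. "Cargo.toml") or the full path (e.g. "__tests__/").
            if any(fnmatchcase(name, p) or fnmatchcase(path, p) for p in patterns):
                scopes.add(scope)
    return scopes
```

A file can land in several scopes by design. For example, `src/api/routes/users.rs` triggers both the API and the Backend checks.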

Step 3: Run Checks in Parallel

Launch all 3 modes with run_in_background: true.

--strict isolation protocol: When strict mode is active, each mode agent must be launched with:

Only the spec and the diff output from Step 1 as input (no session context)
No access to other modes' intermediate or final results
A fresh context window containing only: spec, diff, and the mode-specific checklist

This ensures each mode forms independent conclusions. Results are combined only in Step 4 synthesis.

Mode: audit:code (Review)

Constraints

Be specific — cite file and line number for every finding
Suggest fixes, don't just flag problems — every finding needs a one-line fix hint

Review Dimensions

Correctness: Does the code do what it claims? Edge cases handled?
Logic: Race conditions, off-by-one, null pointer risks?
Style: Consistent with project conventions? Readable?
Tests: Changes covered by tests? Tests meaningful?
Naming: Do names clearly convey intent?
Spec coverage: Each Requirement addressed in the diff?

Output Format

## Code Review: <file or area>
- [BLOCKER] <description> (line X)
- [WARN] <description> (line Y)
- [NIT] <description> (line Z)

## Summary
- Blockers: N
- Warnings: N
- Verdict: APPROVE / REQUEST_CHANGES
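The summary block can be derived mechanically from the findings list. This sketch assumes findings follow the `- [SEVERITY] ...` line format above. It also assumes that any BLOCKER yields REQUEST_CHANGES. `summarize_review` is a hypothetical helper.

```python
import re

# Matches "- [BLOCKER] ...", "- [WARN] ...", "- [NIT] ..." lines.
FINDING = re.compile(r"^- \[(BLOCKER|WARN|NIT)\]")


def summarize_review(report: str) -> dict:
    counts = {"BLOCKER": 0, "WARN": 0, "NIT": 0}
    for line in report.splitlines():
        match = FINDING.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    # Assumption: a single blocker is enough to request changes.
    verdict = "REQUEST_CHANGES" if counts["BLOCKER"] else "APPROVE"
    return {"blockers": counts["BLOCKER"], "warnings": counts["WARN"], "verdict": verdict}
```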

Mode: audit:security (Security)

Constraints

False positives are better than false negatives for security
Always check that .env files are listed in .gitignore

Security Checklist (OWASP Top 10)

Injection (SQL, XSS, command)
Broken authentication
Sensitive data exposure
Access control failures
Security misconfiguration

Performance Checklist

N+1 queries
Unbounded data loading
Missing indexes
Memory leaks (event listeners, growing caches)
Blocking the main thread

Output Format

## Security Audit
- [CRITICAL] SQL injection risk in <file>:<line>
- [HIGH] Hardcoded secret in <file>:<line>
- [MEDIUM] Missing rate limit on <endpoint>

## Performance Audit
- [HIGH] N+1 query in <file>:<line>
- [MEDIUM] Unbounded array growth in <file>:<line>

## Summary
- Security: PASS / FAIL (N critical, N high)
- Performance: PASS / WARN (N issues)
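The security and performance verdicts can be computed from tagged findings in a similar way. The thresholds here are assumptions read off the summary template. Any CRITICAL or HIGH security finding fails security, and any performance finding downgrades performance to WARN. `security_summary` is a hypothetical helper.

```python
from collections import Counter


def security_summary(findings):
    """findings: (section, severity) pairs, e.g. ("security", "CRITICAL")."""
    sec = Counter(sev for section, sev in findings if section == "security")
    perf = [sev for section, sev in findings if section == "performance"]
    # Assumed thresholds: CRITICAL or HIGH fails security; any perf issue warns.
    security = "FAIL" if sec["CRITICAL"] or sec["HIGH"] else "PASS"
    performance = "WARN" if perf else "PASS"
    return {
        "security": f"{security} ({sec['CRITICAL']} critical, {sec['HIGH']} high)",
        "performance": f"{performance} ({len(perf)} issues)",
    }
```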

Mode: audit:test (Test Runner)

Run the full test suite
Verify each Acceptance Criterion is demonstrably met
Report coverage delta
Flag any flaky tests

Step 3.5: Semantic Deduplication

After all 3 modes complete, merge their findings and deduplicate:

Collection: Gather all findings from code, security, and test modes into a single pool.

Root-Cause Grouping: For each finding, identify the root cause. Findings sharing the same root cause (same file, same function, same underlying pattern) form a group.

Classification (per group):

Classification	Meaning	Action
`NEW`	First finding for this root cause	Include in report
`DUP_BETTER`	Duplicate with better evidence or higher severity	Replace original with this
`DUP_SKIP`	Duplicate with weaker or equal evidence	Drop; reference the `NEW` finding

Severity Reassessment: The surviving finding in each group takes the highest severity across all modes. For example, if code review says [WARN] but security says [CRITICAL] for the same root cause, the deduped finding is [CRITICAL].

Output: Only deduplicated findings proceed to Step 4 synthesis. The report should note: "N findings deduplicated from M total (K groups collapsed)."
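Here is a minimal sketch of the grouping and classification rules. It assumes each finding has already been reduced to a root-cause key of (file, function, pattern) and carries a numeric evidence score. Several pieces are assumptions: the `Finding` type, the cross-mode `RANK` scale, the evidence score, and reading N as the number of findings removed.

```python
from collections import Counter
from dataclasses import dataclass, replace

# Shared severity scale across modes (assumption: BLOCKER ~ CRITICAL, WARN ~ MEDIUM).
RANK = {"NIT": 0, "WARN": 1, "MEDIUM": 1, "HIGH": 2, "BLOCKER": 3, "CRITICAL": 3}


@dataclass(frozen=True)
class Finding:
    mode: str       # "code", "security", or "test"
    file: str
    function: str
    pattern: str    # short label for the underlying defect
    severity: str
    evidence: int   # hypothetical evidence-strength score


def root_cause(f):
    # Same file, same function, same underlying pattern => same root cause.
    return (f.file, f.function, f.pattern)


def dedup(findings):
    survivors, labels = {}, []
    for f in findings:
        best = survivors.get(root_cause(f))
        if best is None:
            survivors[root_cause(f)] = f
            labels.append("NEW")
        elif f.evidence > best.evidence or RANK[f.severity] > RANK[best.severity]:
            # Replace the original, but never lower the group's severity.
            top = max(f.severity, best.severity, key=RANK.__getitem__)
            survivors[root_cause(f)] = replace(f, severity=top)
            labels.append("DUP_BETTER")
        else:
            labels.append("DUP_SKIP")
    sizes = Counter(root_cause(f) for f in findings)
    collapsed = sum(1 for n in sizes.values() if n > 1)
    removed = len(findings) - len(survivors)
    summary = f"{removed} findings deduplicated from {len(findings)} total ({collapsed} groups collapsed)"
    return list(survivors.values()), labels, summary
```

The survivor always carries the maximum severity in its group. A DUP_SKIP entry can never outrank it, and a DUP_BETTER replacement inherits the higher of the two severities.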

Step 4: Synthesize

Combine deduplicated findings into a single report:

## Audit Report
- Spec: SPEC-{timestamp} ({goal_slug})
- Branch: {current branch}

### Change Scope
- Scopes detected: [API, Frontend, Backend, Database, Infra, Docs, Tests]
- Scope-specific checks: [list what ran]

### Code Quality: [PASS/WARN/FAIL]
### Security: [PASS/WARN/FAIL]
### Performance: [PASS/WARN/FAIL]
### Tests: [X/Y passing, Z% coverage]

### Deduplication
- Total findings: M
- Deduplicated: N (K groups collapsed)

### Spec Coverage
- R1: ✅/❌ addressed in diff
- R2: ✅/❌ addressed in diff
- AC1: ✅/❌ verified by test
- AC2: ✅/❌ verified by test

### Action Items
1. [blocker or warning]

Step 5: Act

All PASS + all AC verified: "Audit passed. Run /ship to create a PR."
WARN: Show the warnings and ask whether the user wants to fix them before shipping
FAIL or AC missing: List each blocker with a one-line fix hint. "Fix with /go, then re-run /audit."
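The decision rules above can be written as a small function. It assumes per-dimension verdicts and AC results are already available as dictionaries. It also gives FAIL or a missing AC precedence over WARN. `next_action` and its return codes are hypothetical.

```python
def next_action(verdicts, ac_verified):
    """verdicts: {"code": "PASS", ...}; ac_verified: {"AC1": True, ...}."""
    unverified = [ac for ac, ok in ac_verified.items() if not ok]
    if "FAIL" in verdicts.values() or unverified:
        return "fix"   # list blockers with fix hints; /go, then re-run /audit
    if "WARN" in verdicts.values():
        return "ask"   # show warnings; ask before shipping
    return "ship"      # "Audit passed. Run /ship to create a PR."
```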

Anti-Rationalization

Excuse	Rebuttal	What to do instead
"It's a small change, skip security"	Small changes introduce big vulnerabilities	Always run the security checklist
"Tests are passing, that's enough"	Tests don't catch security or performance issues	Run all 3 modes
"I'll fix the warnings later"	Later never comes	Fix blockers now, warnings before merge
"Dedup is overkill for small audits"	Small audits can still have cross-mode overlap	Always dedup — the cost is trivial
"Strict mode is overkill"	Without isolation, the builder can influence reviewers via shared context	Use `--strict` for security-sensitive or compliance-driven projects
"The agents are independent enough"	Shared context creates anchoring bias — a clean code review inflates security scores	Strict mode ensures blind scoring until synthesis

Evidence Required

All 3 modes (code, security, test) completed
Each Requirement has a coverage verdict
Each AC has a test/verification verdict
No BLOCKER items remain when the verdict is PASS
Deduplication applied: total vs. deduplicated count reported

Red Flags

Skipping security review for "small changes"
Approving code with failing tests
Ignoring performance warnings in hot paths
Marking audit PASS when any AC is unverified
Reporting raw findings without deduplication
