Evaluate AI agent setups for best practices, redundancy, security, and cross-component issues by running shell scripts that initialize the plugin environment.
- eval-skill: Deep-evaluate a single skill with static analysis and qualitative issue detection, both individually and in the context of the full setup. Use when the user wants to check whether a specific skill is worth keeping, well-built, or redundant.
- setup-eval-lint: Run deterministic static analysis on the full agent setup (CLAUDE.md, skills, commands, hooks, agents, MCP configs). 43 rules + system-level analysis (token budget, trigger overlaps, dependencies, context utilization). No LLM. Use when the user wants a fast lint check, CI gate, or structural health report.
- setup-eval-review: Full qualitative review of the agent setup. Reads every file, applies per-component rubrics, runs 21 cross-type optimization checks, and produces KEEP/REVIEW/REMOVE verdicts. Use when the user wants a deep review, redundancy check, or quality assessment of their setup.
- setup-eval-security: Deep security audit of the agent setup. Runs all deterministic security rules (prompt injection, credential access, data exfiltration, obfuscation, reverse shells, AST behavioral analysis, taint tracking, MCP permission analysis, tool poisoning, YARA signatures, CVE lookups) plus LLM-based semantic security review. Use when the user asks about security or safety, wants to audit their setup, or needs a pre-deployment security check.
Evaluate AI code agent setups for best practices, redundancy, security, and cross-component issues.
Available as a CLI tool, a Claude Code plugin, and Cursor commands.
Supports Claude Code and Cursor projects. Auto-detects which tool(s) a project uses.
Most tools test whether a skill produces correct output. This tool checks the setup itself: CLAUDE.md, skills, commands, hooks, MCP configs, agents, .cursor/rules/*.mdc, .cursorrules.
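As a rough illustration of the auto-detection mentioned above, the sketch below guesses which tool(s) a project uses from marker files named in this README (CLAUDE.md, .cursorrules, .cursor/rules/*.mdc). The function name `detect_tools` and the returned labels are hypothetical; the detection logic inside setup-eval may well differ.

```python
from pathlib import Path


def detect_tools(project_root):
    """Guess which agent tool(s) a project is configured for.

    Hypothetical helper: the marker files come from this README, but the
    real detection rules in setup-eval are not documented here.
    """
    root = Path(project_root)
    tools = set()

    # CLAUDE.md is the Claude Code marker named in the README.
    if (root / "CLAUDE.md").is_file():
        tools.add("claude-code")

    # Cursor markers named in the README: .cursorrules and .cursor/rules/*.mdc.
    rules_dir = root / ".cursor" / "rules"
    has_mdc = rules_dir.is_dir() and any(rules_dir.glob("*.mdc"))
    if (root / ".cursorrules").is_file() or has_mdc:
        tools.add("cursor")

    return tools
```

A project containing both CLAUDE.md and a .cursor/rules/ directory with at least one .mdc file would be reported as using both tools under these assumptions.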
Four commands, same engine:
| Command | What it does | LLM in CLI | LLM in Claude Code / Cursor |
|---|---|---|---|
| setup-eval-lint | 43 deterministic rules + system analysis (token budget, trigger overlaps, dependencies). Fast, CI-suitable. | No | No |
| setup-eval-review | Per-component rubric review with 0-3 scoring per dimension, 21 cross-type checks. KEEP/REVIEW/REMOVE verdicts. | Yes (API key) | Yes (in-session) |
| setup-eval-security | All security rules + YARA + CVE lookups + semantic review. SAFE/CAUTION/UNSAFE. | Scan: no. Semantic review: --review flag | Yes (in-session) |
| eval-skill | Deep-evaluate one skill individually and in context of the full setup. | Lint: no. Rubric: --rubric flag | Yes (in-session) |
Install from PyPI and run from the terminal:
pip install setup-eval
setup-eval setup-eval-lint .
setup-eval setup-eval-lint . --watch # re-run lint automatically on file changes
setup-eval setup-eval-review . --provider gemini
setup-eval setup-eval-security . --review
setup-eval eval-skill ./skills/my-skill --context . --rubric
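The LLM-backed steps (setup-eval-review, setup-eval-security with --review, and eval-skill with --rubric) require GEMINI_API_KEY or ANTHROPIC_API_KEY. Lint and the deterministic security scan run without a key.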
setup-eval-security supports optional YARA malware signature scanning. To enable it: pip install setup-eval[yara]
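For a CI job, one option is to always run the deterministic steps and add the LLM-backed security review only when an API key is present. The sketch below only builds the command lines and does not run them. The helper names (`build_command`, `has_llm_key`, `plan_ci_steps`) are hypothetical, and it uses only the subcommands and the `--review` flag shown above. The tool's exit codes are not documented here, so how failures should gate a build is left open.

```python
import os

# API key variables named in this README.
LLM_KEYS = ("GEMINI_API_KEY", "ANTHROPIC_API_KEY")


def build_command(subcommand, target=".", *flags):
    """Return the argv list for one setup-eval invocation."""
    return ["setup-eval", subcommand, target, *flags]


def has_llm_key(env=None):
    """True if either API key is set and non-empty."""
    env = os.environ if env is None else env
    return any(env.get(key) for key in LLM_KEYS)


def plan_ci_steps(env=None):
    """Lint always; security scan gains --review only when a key exists."""
    steps = [build_command("setup-eval-lint")]
    if has_llm_key(env):
        steps.append(build_command("setup-eval-security", ".", "--review"))
    else:
        steps.append(build_command("setup-eval-security"))
    return steps
```

Each returned list can be passed to `subprocess.run` as-is. Keeping the planning separate from execution makes the key-dependent branching easy to check without invoking the tool.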
No pip install needed. Install directly from within Claude Code:
/plugin marketplace add redhat-community-ai-tools/harness-eval-lab
/plugin install setup-eval@harness-eval-lab
/reload-plugins
The 4 commands appear in the / menu:
- /setup-eval:setup-eval-lint
- /setup-eval:setup-eval-review
- /setup-eval:setup-eval-security
- /setup-eval:eval-skill

No API key needed. Claude evaluates in-session.
Updating: Re-run the install command to get the latest rules.
The Cursor commands require the CLI tool to be installed first (they call it for the deterministic scan):
pip install setup-eval
Then copy .cursor/commands/ from this repo into your project. The 4 commands appear in Cursor's command palette:
- /setup-eval-lint
- /setup-eval-review
- /setup-eval-security
- /eval-skill

No API key needed for review/security/skill. Cursor evaluates in-session.
| Category | Rules | What they check |
|---|---|---|
| Structural | 1 | SKILL.md exists |
| Frontmatter | 3 | Description required/quality, format valid |
| Content | 4 | Duplicate detection (TF-IDF), broken references, circular references, token budget |
| Security | 9 | Credential access, prompt injection (17 patterns), data exfiltration, obfuscation, reverse shells, AST analysis, taint tracking, MCP least-privilege, tool poisoning |
| Security (opt-in) | 2 | YARA signatures, CVE lookups via OSV.dev |
| Commands | 8 | Description, script exists, duplicates, credentials, injection, skill overlap, shadows built-in, references nonexistent skill |
| CLAUDE.md | 3 | Exists, skill duplication, generic advice detection |
| Hooks | 1 | Structure validation, dangerous patterns, network access |
| Agents | 9 | Description, skills exist, tool format, constraint matching, credentials, injection, exfiltration, obfuscation, reverse shells |
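The Content row above mentions TF-IDF duplicate detection. As a minimal sketch of the general technique (not the tool's actual implementation), the code below weights each document's terms by TF-IDF and flags pairs whose cosine similarity meets a threshold. The function names, the smoothed idf formula, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Map each document name to a {term: tf-idf weight} dict."""
    tokens = {name: tokenize(body) for name, body in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        total = len(toks) or 1
        vectors[name] = {
            # Smoothed idf (an assumption): common terms keep a small weight.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_duplicates(docs, threshold=0.8):
    """Return (name_a, name_b, score) for pairs at or above the threshold."""
    vectors = tfidf_vectors(docs)
    names = sorted(vectors)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = cosine(vectors[a], vectors[b])
            if score >= threshold:
                pairs.append((a, b, round(score, 3)))
    return pairs
```

Passing a dict of skill names to their SKILL.md text would return the pairs that read as near-copies of each other. Where the threshold sits is a trade-off between missed duplicates and false alarms.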
Four presets: recommended (default), strict, security, pre-workflow.
See CONTRIBUTING.md for adding rules and submitting PRs.
See CHANGELOG.md for release history.
See future-plans/ for planned improvements (SARIF output, security benchmarks, runner abstraction, dynamic workflows, impact measurement).
npx claudepluginhub redhat-community-ai-tools/harness-eval-lab --plugin setup-eval