Skill

evaluate-plugin

Evaluate a local Codex plugin in engineer-friendly language. Use when the user says "evaluate this plugin", "audit this plugin", "why did this score that way", "what should I fix first", "help me benchmark this plugin", or asks for a plugin-wide report before comparing versions.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/plugin-eval:evaluate-plugin

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Use this skill when the target is a plugin root with `.codex-plugin/plugin.json`.

SKILL.md

44 lines · ~470 tokens

Stats

LanguagePython

Parent stars0

MaintenanceGood

Last CommitMay 17, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Evaluate Plugin

Use this skill when the target is a plugin root with .codex-plugin/plugin.json.

Workflow

Treat "Evaluate this plugin." as the default entrypoint.
If the request comes in as natural chat language, use plugin-eval start <plugin-root> --request "<user request>" --format markdown first so the user sees the routed local path.
Run plugin-eval analyze <plugin-root> --format markdown.
Read Fix First before drilling into manifest findings, nested skill findings, and code or coverage details.
If the plugin contains multiple skills, summarize the strongest and weakest ones explicitly.
If the user wants measured usage, switch to "Help me benchmark this plugin." and use the starter benchmark flow.
If the user wants trend data, compare two JSON outputs with plugin-eval compare.

Chat Requests To Recognize

Evaluate this plugin.
Audit this plugin.
Why did this score that way?
What should I fix first?
Help me benchmark this plugin.
What should I run next?

Commands

plugin-eval start <plugin-root> --request "Evaluate this plugin." --format markdown
plugin-eval analyze <plugin-root> --format markdown
plugin-eval start <plugin-root> --request "What should I run next?" --format markdown
plugin-eval compare before.json after.json
plugin-eval report result.json --format html --output ./plugin-eval-report.html
plugin-eval init-benchmark <plugin-root>
plugin-eval benchmark <plugin-root> --dry-run

Reference

../../references/chat-first-workflows.md

evaluate-plugin

Invocation

Context Preview

SKILL.md

evaluate-plugin

Invocation

Context Preview

SKILL.md

Evaluate Plugin

Workflow

Chat Requests To Recognize

Commands

Reference

Similar Skills

Evaluate Plugin

Workflow

Chat Requests To Recognize

Commands

Reference

Similar Skills