Skill

evaluate-skill

Evaluate a local Codex skill in engineer-friendly terms. Use when the user says "evaluate this skill", "give me an analysis of the game dev skill", "audit this skill", "why did this score that way", "what should I fix first", or asks for a skill-specific report before benchmarking it.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/plugin-eval:evaluate-skill

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Use this skill when the target is a local skill directory or `SKILL.md` file.

SKILL.md

55 lines · ~660 tokens

Stats

Language: Python

Parent stars: 0

Maintenance: Good

Last commit: May 17, 2026

Evaluate Skill

Use this skill when the target is a local skill directory or SKILL.md file.

Workflow

Treat "Evaluate this skill." as the default entrypoint.
If the user names a skill instead of giving a path, resolve it locally first, preferring `~/.codex/skills/<skill-name>` and then repo-local `skills/<skill-name>`.
If the user phrases the request in natural language first, run `plugin-eval start <skill-path> --request "<user request>" --format markdown` to show the routed path clearly.
Run `plugin-eval analyze <skill-path> --format markdown`.
Review the At a Glance, Why It Matters, Fix First, and Recommended Next Step sections of the report before drilling into details.
Explain which findings are structural, which are budget-related, and which are code-related.
If the user asks for an "analysis" of the skill, do not stop at the report. Also run `plugin-eval init-benchmark <skill-path>` and show the setup questions for refining the starter scenarios in `.plugin-eval/benchmark.json`.
If the user wants real usage numbers, switch to "Measure the real token usage of this skill." and run the benchmark flow.
Once observed usage is available, run `plugin-eval measurement-plan <skill-path> --observed-usage <usage.jsonl> --format markdown` to recommend what to instrument or improve next.
If the user wants a rewrite plan, route to `../improve-skill/SKILL.md`.
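
The name-resolution step in the workflow above could be sketched roughly as follows. This is only an illustration of the stated lookup order (user-level `~/.codex/skills/<skill-name>` first, then repo-local `skills/<skill-name>`); the function name `resolve_skill_path` and its `home` and `repo_root` parameters are hypothetical and are not part of `plugin-eval`.

```python
from pathlib import Path
from typing import Optional


def resolve_skill_path(skill_name: str, home: Path, repo_root: Path) -> Optional[Path]:
    """Return the first existing skill directory, or None if neither exists.

    Hypothetical helper: it only mirrors the lookup order described in the workflow.
    """
    candidates = [
        home / ".codex" / "skills" / skill_name,  # preferred: user-level skills
        repo_root / "skills" / skill_name,        # fallback: repo-local skills
    ]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None
```

A caller would typically pass `Path.home()` and the current repository root, and ask the user for an explicit path when the result is `None`.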

Skill-Specific Priorities

frontmatter validity
name and description quality
progressive disclosure and reference usage
broken relative links
oversized SKILL.md or descriptions
helper script quality for TypeScript and Python files
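
To make the "broken relative links" priority concrete, here is a minimal sketch of what such a check might look like. It is an assumption about the general idea, not a description of how `plugin-eval` implements it; the function name `find_broken_relative_links` and the link regex are hypothetical, and only inline Markdown links of the form `[text](target)` are considered.

```python
import re
from pathlib import Path
from typing import List

# Matches the target of an inline Markdown link: [text](target)
LINK_TARGET = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that start with a URL scheme such as https: or mailto:
URL_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def find_broken_relative_links(skill_md: Path) -> List[str]:
    """Return relative link targets in skill_md that do not exist on disk."""
    text = skill_md.read_text(encoding="utf-8")
    broken = []
    for target in LINK_TARGET.findall(text):
        # Skip absolute URLs and in-page anchors; only relative paths are checked.
        if URL_SCHEME.match(target) or target.startswith("#"):
            continue
        path_part = target.split("#", 1)[0]  # drop any trailing #fragment
        if not (skill_md.parent / path_part).exists():
            broken.append(target)
    return broken
```

Targets are resolved relative to the directory containing `SKILL.md`, which is how a reference such as `../../references/chat-first-workflows.md` would be interpreted.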

Chat Requests To Recognize

Evaluate this skill.
Give me an analysis of the game dev skill.
Audit this skill.
Why did this skill score that way?
What should I fix first?
Measure the real token usage of this skill.

Commands

plugin-eval start <skill-path> --request "Evaluate this skill." --format markdown
plugin-eval analyze <skill-path> --format markdown
plugin-eval explain-budget <skill-path> --format markdown
plugin-eval measurement-plan <skill-path> --format markdown
plugin-eval init-benchmark <skill-path>
plugin-eval benchmark <skill-path> --dry-run
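
As a rough illustration of how the commands above fit the workflow, the sketch below builds argument lists for the default evaluation path and appends the `init-benchmark` step when the request asks for an "analysis". The function `plan_commands` is hypothetical, and the substring check is an assumed stand-in for real intent detection; only subcommands and flags that appear in this document are used.

```python
from typing import List


def plan_commands(request: str, skill_path: str) -> List[List[str]]:
    """Return plugin-eval invocations (as argv lists) for a chat request."""
    commands = [
        # Show the routed path for the natural-language request.
        ["plugin-eval", "start", skill_path, "--request", request, "--format", "markdown"],
        # Produce the main report.
        ["plugin-eval", "analyze", skill_path, "--format", "markdown"],
    ]
    # Assumption: a plain substring check decides whether this is an "analysis" request.
    if "analysis" in request.lower():
        commands.append(["plugin-eval", "init-benchmark", skill_path])
    return commands
```

Each list can be passed to `subprocess.run(cmd, check=True)`, assuming `plugin-eval` is installed and on `PATH`.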

Reference

../../references/chat-first-workflows.md
