Skill

scoring

Applies a 100-point deterministic scoring rubric to NL artifacts with penalty tables per artifact type. Includes calibration examples for anchoring borderline quality judgments.

code-quality

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/nlpm:scoring

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

100-point quality scale for all NL programming artifacts. Apply penalties deterministically. Use calibration examples to anchor judgment on borderline cases.

Supporting Files

references/calibration-examples.md

SKILL.md

432 lines · ~5.4k tokens(exceeds 5k compaction limit)

Stats

LanguageHTML

Stars79

Forks23

MaintenanceExcellent

Last CommitJun 25, 2026

Actions

View Source View Plugin View on GitHub View README

NLPM Quality Scoring Rubric

100-point quality scale for all NL programming artifacts. Apply penalties deterministically. Use calibration examples to anchor judgment on borderline cases.

Scoring Formula

base_score = 100
adjustments = sum of all applicable penalties (all penalties are negative)
final_score = max(0, min(100, base_score + adjustments))

Penalties stack. The floor is 0; the ceiling is 100. No bonuses — the default assumption is that an artifact is well-formed, and quality is measured by what is missing or wrong.

Penalty Tables

Skills

Rule	Check	Condition	Penalty
--	`name` present	Missing	-25
--	`name` matches parent directory	Frontmatter `name:` value does not equal parent directory name (per `nlpm:conventions` §5 — open spec MUST)	-15
R04	`description` present	Missing	-25
R04	Trigger quality	Description is generic (≤1 specific phrase)	-15
R04	Description length	Description 500–800 chars	-5
R04	Description length	Description >800 chars	-10
R05	Body length	400–500 lines	-5
R05	Body length	>500 lines	-10
R06	Code examples	Complex concepts with no examples	-5
R06	Code examples	No examples at all in a technical skill	-10
R06	`<example>` blocks	Zero `<example>` blocks on a `user_invocable: true` skill	-10
R07	Scope note	No scope note / cross-references	-3

Scope-note discipline: R07 means "scope note when related skills exist." Do NOT apply R07 to missing example blocks — that is the new R06 row above (penalty -10, not -15). The 2026-05-13 lijigang/ljg-skills audit applied R07 + −15 fourteen times for missing example blocks; both labels were wrong (R07 is not example-related, and -15 is the agents penalty, not the skills penalty). The validator at auditor/scripts/validate-rule-ids.py catches this kind of drift in CI.

name matches parent directory (added 2026-05-25, audit: google/skills): the open Agent Skills spec at agentskills.io makes this a MUST. Mismatch is deterministic, high-confidence, and reproducible by single-line diff (frontmatter name: vs basename($(dirname FILE))). Mark such findings confidence: high per the manifest-vs-disk-diff principle in agents/scorer.md step 6. Note: this penalty did not exist before 2026-05-25 — re-scoring past audits will yield slightly lower scores for any corpus containing this defect, but no contribute outcomes are retroactively affected since no PRs were ever opened against a name-mismatch finding under the prior rubric.

Agents

Rule	Check	Condition	Penalty
R09	`description` present	Missing	-25
R09	`<example>` blocks	Exactly 1 example	-5
R09	`<example>` blocks	Zero examples	-15
R10	`model` declared	Not declared	-5
R10	`model` appropriate	Wrong tier for task (e.g. opus for parsing)	-5
R11	`tools` declared	Not declared	-5
R11	Unused tools	Each tool declared but not used in body	-3 each
R12	Output format	No output format spec in body	-10
R11	Write on read-only	Audit/review/scan agent declares Write or Edit	-10

Commands

Rule	Check	Condition	Penalty
--	`description` present	Missing	-25
R18	`argument-hint` present	Command takes input but no hint	-5
R14	Steps numbered	Multi-step body with no numbered steps	-10
R15	Empty input handling	No handling for empty/missing input	-10
R16	Output format	No output format defined	-10
R17	Error paths	No error handling for missing files or bad data	-5

Shared Partials

Rule	Check	Condition	Penalty
R19	`user-invocable: false`	Missing or set to true	-25
R20	Purpose clear	Description doesn't state it's a partial	-10

Rules

Rule	Check	Condition	Penalty
R21	`description` present	Missing frontmatter description	-10
R21	Format: bold imperative	No bold imperative opening	-5
R21	Format: rationale	No rationale following the imperative	-10
R22	Enforceability	Rule is not specific/testable	-10
R23	Budget	Rule file over 500 lines	-15
R26	Conflicts with other rules	Direct contradiction with another rule in same set	-20
R24	Duplicates tooling	Re-states what eslint/ruff/clippy already catches	-10

Hooks — universal checks (apply to all tools)

Rule	Check	Condition	Penalty
--	Valid syntax	Hook config file fails to parse (JSON or TOML per tool)	-25
R29	Scripts exist	Referenced script file does not exist	-20
--	Command safety	Hook command contains dangerous patterns (`rm -rf`, `git push --force`, `DROP TABLE`)	-15
--	Matcher regex valid	Matcher pattern doesn't compile as valid regex	-10
--	Timeout reasonable	Hook specifies timeout > 30s (likely hangs)	-5

Hooks (Claude Code — Tier 2-Claude only)

Authoritative event list: nlpm:conventions-claude §7. Per the multi-tool design (analysis/multi-tool-design-2026-05.md decision #4), Claude / Codex / Antigravity hook event vocabularies are NOT 1:1 mappable — three separate tables, no translation.

Rule	Check	Condition	Penalty
R27	Event names valid (Claude)	Uses unrecognized event name — confirmed Claude events: `SessionStart`, `SessionEnd`, `UserPromptSubmit`, `PreToolUse`, `PostToolUse`, `PermissionRequest`, `Stop`, `StopFailure`, `FileChanged`	-15
R27	Case correct (Claude)	Event name has wrong case (e.g. `pretooluse`)	-10
--	Hook type valid (Claude)	Uses unrecognized `type` value — confirmed Claude types: `command`, `http`, `mcp_tool`, `prompt`, `agent`	-10
--	MCP matcher format (Claude)	Matcher targets MCP tool but doesn't use `mcp__<server>__<tool>` pattern	-5

Hooks (Codex CLI — Tier 2-Codex only)

Authoritative event list: nlpm:conventions-codex §6.

Rule	Check	Condition	Penalty
R27	Event names valid (Codex)	Uses unrecognized event name — confirmed Codex events: `SessionStart`, `UserPromptSubmit`, `PreToolUse`, `PostToolUse`, `PermissionRequest`, `PreCompact`, `PostCompact`, `SubagentStart`, `SubagentStop`, `Stop`	-15
R27	Case correct (Codex)	Event name has wrong case	-10
--	Hooks config key (Codex)	`config.toml` uses deprecated `[features].codex_hooks` instead of `[features].hooks` (renamed ~CLI 0.129+)	-5 (advisory)

Hooks (Antigravity / Gemini lineage — Tier 2-Antigravity only) — ADVISORY

Authoritative event list: nlpm:conventions-antigravity §5. All Antigravity-specific hook scoring is advisory-only (confidence:low) until the Antigravity 2.0 spec stabilizes — see analysis/multi-tool-design-2026-05.md decision #3.

Rule	Check	Condition	Penalty
R27	Event names valid (Gemini lineage)	Uses unrecognized event name — confirmed events: `SessionStart`, `BeforeAgent`, `BeforeModel`, `BeforeToolSelection`, `BeforeTool`, `AfterTool`, `AfterModel`, `AfterAgent`, `SessionEnd`, `Notification`, `PreCompress`	-10 (advisory)
R27	Case correct (Gemini lineage)	Event name has wrong case	-5 (advisory)

plugin.json (Claude Code — `.claude-plugin/plugin.json`)

Check	Condition	Penalty
`name` present	Missing	-25
`version` is semver	Present but not valid semver	-10
`description` present	Missing	-5

.codex-plugin/plugin.json (Codex CLI — Tier 2-Codex only)

Schema reference: nlpm:conventions-codex §3.

Check	Condition	Penalty
Valid JSON	File fails JSON parse	-25
`name` present	Missing	-25
`name` kebab-case	Mixed case or underscores	-10
`version` is semver	Present but not valid semver	-10
`description` present	Missing	-5
Component paths relative	`skills`/`mcpServers`/`apps`/`hooks` paths absolute or missing `./` prefix	-5 each

.agents/plugins/marketplace.json (Codex marketplace — Tier 2-Codex)

Schema reference: nlpm:conventions-codex §4. Schema is largely compatible with Claude's .claude-plugin/marketplace.json.

Check	Condition	Penalty
Valid JSON	File fails JSON parse	-25
`name` present	Missing	-25
`plugins` array present	Missing or empty	-10
Per-plugin `source` valid	`source.source` not in `github`/`git`/`local`, or required `repo`/`path` missing	-10 each
Per-plugin `category` present	Missing (informational, helps marketplace navigation)	-3 each

agents/openai.yaml (Codex skill sidecar — Tier 2-Codex)

Schema reference: nlpm:conventions-codex §2.

Check	Condition	Penalty
Valid YAML	File fails YAML parse	-25
Sidecar is colocated	`agents/openai.yaml` not in same directory as a `SKILL.md`	-10
`interface.display_name` present	Missing	-5 (informational)

gemini-extension.json (Gemini/Antigravity — Tier 2-Antigravity) — ADVISORY

Schema reference: nlpm:conventions-antigravity §3. All Antigravity-specific manifest scoring is advisory-only until the post-2026-06-18 Antigravity spec stabilizes.

Check	Condition	Penalty
Valid JSON	File fails JSON parse	-25
`name` present	Missing	-25
`version` present	Missing	-10
`contextFileName` includes `AGENTS.md`	Single-tool projects use only `GEMINI.md`; multi-tool should include `AGENTS.md`	-3 (advisory; multi-tool nudge)

.gemini/commands/*.toml (Gemini slash commands — legacy/transitional, Tier 2-Antigravity)

Schema reference: nlpm:conventions-antigravity §4.

Check	Condition	Penalty
Valid TOML	File fails TOML parse	-25
`prompt` field present	Missing required field	-25
`description` field present	Missing (auto-generated from filename, but explicit is better)	-3

.mcp.json (Claude Code — `.mcp.json` at repo root)

Check	Condition	Penalty
Valid JSON	File fails JSON parse	-25
Server `command` present	MCP server entry missing `command` field	-15

.codex/config.toml (Codex configuration — Tier 2-Codex)

Schema reference: nlpm:conventions-codex §5.

Check	Condition	Penalty
Valid TOML	File fails TOML parse	-25
Deprecated `[features].codex_hooks`	Should be `[features].hooks` (renamed ~CLI 0.129)	-5 (advisory)
Per-MCP `command` present	`[mcp_servers.<id>]` table missing `command` field	-15 each

.lsp.json (Claude Code LSP — Tier 2-Claude)

Schema details: nlpm:conventions-claude §12. Stable in 2026.

Check	Condition	Penalty
Valid JSON	File fails JSON parse	-25

monitors/monitors.json (Claude Code monitors — Tier 2-Claude)

Schema details: nlpm:conventions-claude §13. Stable in 2026.

Check	Condition	Penalty
Valid JSON	File fails JSON parse	-25

Settings Files (.claude/settings.json, .claude/settings.local.json)

Check	Condition	Penalty
Valid JSON	File fails JSON parse	-25
No hardcoded secrets	Contains API keys, tokens, or passwords	-25
Permission mode sanity	`bypassPermissions` enabled in a shared project settings file (not .local)	-15
Recognized keys	Contains unknown top-level keys not in Claude Code schema	-5 each, cap -15
Hook definitions valid	`hooks` key present — check event names valid and case-correct	-10 per invalid

CLAUDE.md

Rule	Check	Condition	Penalty
R49	File exists	No CLAUDE.md in plugin root	-10
--	Under 200 lines	CLAUDE.md exceeds 200 lines	-5
R38	Actionable content	CLAUDE.md has no actionable guidance (just filler)	-10
R33	Build/run command	No instructions for how to build or run the project	-10
R34	Test command	No instructions for how to run tests	-5
R35	Architecture overview	No structure/component description (what lives where)	-5
R36	Valid `@` imports	Contains `@` import syntax referencing a file that doesn't exist	-10
R37	No stale file references	Mentions files or functions that no longer exist in the repo	-10
R38	Actionability ratio	>60% of content is description rather than instructions	-5
--	Prerequisites section	No section covering required tools, versions, or setup steps	-5
R39	No rule conflicts	CLAUDE.md says X while a `.claude/rules/` file says not-X	-15

Memory Files

Applies to .md files located in ~/.claude/projects/*/memory/ directories.

Rule	Check	Pass (+0)	Penalty
--	Has YAML frontmatter	Present	-15
--	`name` in frontmatter	Present	-10
--	`description` in frontmatter	Present	-10
--	`type` in frontmatter	Present (`user`/`feedback`/`project`/`reference`)	-5
--	Content matches declared type	Yes	-10
--	Referenced in MEMORY.md index	Yes	-5 (orphaned memory)
R37	Stale content	No references to removed files or functions	-10

Agent Workflow Programs (project-root `program.md`-style files)

A new artifact type recognized 2026-05-28 (see nlpm:conventions §2 and auditor/exemplars/karpathy-autoresearch.md): a project-root Markdown file driving an autonomous agent loop, hybrid between a memory file (AGENTS.md-shaped context) and a slash command (numbered workflow with output format + error paths).

No type-specific penalty rows. This artifact type is scored as the UNION of:

Command rules R14–R17: numbered steps for multi-phase work, empty-input handling, output format, error paths.
Memory file rules R33–R39: build/run commands, architecture overview, no stale refs, instructive-not-descriptive.
Universal R01 (vague quantifiers) and R03 (positive framing).

Type-specific penalty rows are deferred until N ≥ 3 examples surface — the existing rules cover the artifact adequately as a hybrid, and inventing rows from N = 1 risks over-fitting (calibrated per the same discipline applied to multi-tool discovery deferrals).

Patterns this artifact type rewards (loaded on demand from nlpm:patterns):

P10 (numeric anchoring of subjective principles)
P11 (paired CAN/CANNOT contract)
P12 (autonomy instruction + rationale + fallback ladder)
P13 (vivid closing use-case)

All Artifact Types: Vague Quantifiers

Rule	Check	Condition	Penalty
R01	Vague quantifier	Each occurrence of: "appropriate", "relevant", "as needed", "sufficient", "adequate", "reasonable", "properly", "correctly", "some", "several", "various" without measurable criteria	-2 each
R01	Vague quantifier cap	Total vague quantifier penalty	max -20

All Artifact Types: Vocabulary Drift (R51 — opt-in, disabled by default)

Applied only when R51: { enabled: true, vocabulary_skill: <path> } appears in .claude/nlpm.local.md. Without the opt-in, R51 contributes zero penalty regardless of artifact content. The configured vocabulary_skill must contain a registry.yaml listing canonical and deprecated terms; without it, R51 emits an advisory and contributes zero penalty.

Rule	Check	Condition	Penalty
R51	Deprecated synonym	Each occurrence of a term marked `deprecated:` in the project's `registry.yaml`, in the scope the artifact belongs to	-2 each
R51	Drift cap	Total R51 penalty	max -10 per file
R51	Missing registry	`enabled: true` but `vocabulary_skill:` not set or points to a directory with no `registry.yaml`	0 (advisory only)

Why opt-in: vocabulary discipline is high-leverage for projects with accumulated drift but premature for projects still discovering their domain. Each project decides when it has enough literary warrant (P6) to lock terms in. See analysis/vocabulary-design-principles.md for the six principles R51 operationalizes.

Cross-Component (--plugin flag)

Applied when linting an entire plugin rather than individual files.

Check	Condition	Penalty
Broken partial refs	Command references `commands/shared/X.md` that doesn't exist	-20
Broken skill refs	Agent references `plugin:skill` that isn't installed	-20
Missing scripts	Hook references script that doesn't exist	-20
Orphaned files	Agent/command/skill file not referenced by anything	-5 per file
Contradictions	Two rules/instructions in same plugin directly contradict each other	-15 per pair

Score Bands

Range	Label	Meaning
90–100	Excellent	Production-ready; minor or no issues
80–89	Good	Solid; one or two non-critical gaps
70–79	Adequate	Meets threshold; noticeable gaps to address
60–69	Weak	Below threshold; significant issues
<60	Rewrite	Fundamental problems; recommend rewriting from scratch

Default pass threshold: 70. Configurable in .claude/nlpm.local.md.

Calibration Examples

Four worked examples — Excellent Agent (95), Rewrite Agent (41), Excellent Rule (92), Weak Rule (40) — live in references/calibration-examples.md. Load that file on demand when scoring a borderline case (around band boundaries: 88-92, 68-72, 58-62) and you need an anchored reference.

The examples are not needed for routine scoring — the penalty tables above are self-contained. They were extracted from this file 2026-05-28 to keep the rubric under R05's 500-line body budget while preserving the calibration material verbatim.

Scope Note

This skill covers the NLPM scoring formula, penalty tables, score bands, and calibration examples. It does NOT cover:

Artifact schemas and valid field values → see nlpm:conventions (universal) and the tool-specific overlays nlpm:conventions-claude, nlpm:conventions-codex, nlpm:conventions-antigravity (the latter three created in PR-B; see analysis/multi-tool-design-2026-05.md)
Patterns and anti-patterns catalog → see nlpm:patterns
How to run the score command → see commands/score.md

Multi-tool scoring (PR-B landed 2026-05-25)

nlpm now scores artifacts across three tool ecosystems — Claude Code, Codex CLI, and Antigravity (which absorbs Gemini CLI on 2026-06-18). The tier classification in agents/scorer.md separates open-spec (Tier 1), Tier 1.5 open-spec corpora, and per-tool Tier 2 overlays (2-Claude / 2-Codex / 2-Antigravity).

Hooks are scored per tool (Claude / Codex / Antigravity tables above). The three tools' event vocabularies are not 1:1 mappable; no universal translation layer. See analysis/multi-tool-design-2026-05.md decision #4.
Codex-specific artifacts scored: .codex-plugin/plugin.json, .agents/plugins/marketplace.json, .codex/config.toml, agents/openai.yaml sidecars.
Antigravity-specific artifacts scored (advisory-only until spec stabilizes): gemini-extension.json, .gemini/commands/*.toml, Antigravity hook events.
Claude-specific additions in 2026: .lsp.json, monitors/monitors.json (validate JSON-parse only until detailed schemas land). New SKILL.md fields documented in nlpm:conventions-claude.

Known False Positive Patterns

The following findings have historically been reported by the scorer despite having no backing in this rubric. They MUST NOT be penalized:

Invalid finding	Why it is invalid
Missing `namespace:` on skill	Not in the skill schema; `conventions` §5 does not list it
Missing inline `hooks:`/`skills:` registration blocks in plugin.json	`conventions` §1 defines these as optional path strings
`AskUserQuestion` / `Task` / `WebFetch` flagged as undocumented tool	Built-in per `conventions` §14
Agent missing `skills:` when omission is documented in CLAUDE.md	Intentional architectural choice
plugin.json missing `engines:` / `minClaudeVersion:` / `main:`	All optional per `conventions` §1
plugin.json description shorter than sibling marketplace.json description	Desynchronization ≠ defect; only penalize if required field is absent

When in doubt: if a finding cannot be cited to a specific row in the penalty tables above, drop it.

scoring

Popularity

Invocation

Context Preview

Supporting Files

SKILL.md

scoring

Popularity

Invocation

Context Preview

Supporting Files

SKILL.md

NLPM Quality Scoring Rubric

Scoring Formula

Penalty Tables

Skills

Agents

Commands

Shared Partials

Rules

Hooks — universal checks (apply to all tools)

Hooks (Claude Code — Tier 2-Claude only)

Hooks (Codex CLI — Tier 2-Codex only)

Hooks (Antigravity / Gemini lineage — Tier 2-Antigravity only) — ADVISORY

plugin.json (Claude Code — .claude-plugin/plugin.json)

.codex-plugin/plugin.json (Codex CLI — Tier 2-Codex only)

.agents/plugins/marketplace.json (Codex marketplace — Tier 2-Codex)

agents/openai.yaml (Codex skill sidecar — Tier 2-Codex)

gemini-extension.json (Gemini/Antigravity — Tier 2-Antigravity) — ADVISORY

.gemini/commands/*.toml (Gemini slash commands — legacy/transitional, Tier 2-Antigravity)

.mcp.json (Claude Code — .mcp.json at repo root)

.codex/config.toml (Codex configuration — Tier 2-Codex)

.lsp.json (Claude Code LSP — Tier 2-Claude)

monitors/monitors.json (Claude Code monitors — Tier 2-Claude)

Settings Files (.claude/settings.json, .claude/settings.local.json)

CLAUDE.md

Memory Files

Agent Workflow Programs (project-root program.md-style files)

All Artifact Types: Vague Quantifiers

All Artifact Types: Vocabulary Drift (R51 — opt-in, disabled by default)

Cross-Component (--plugin flag)

Score Bands

Calibration Examples

Scope Note

Multi-tool scoring (PR-B landed 2026-05-25)

Known False Positive Patterns

Similar Skills

NLPM Quality Scoring Rubric

Scoring Formula

Penalty Tables

Skills

Agents

Commands

Shared Partials

Rules

Hooks — universal checks (apply to all tools)

Hooks (Claude Code — Tier 2-Claude only)

Hooks (Codex CLI — Tier 2-Codex only)

Hooks (Antigravity / Gemini lineage — Tier 2-Antigravity only) — ADVISORY

plugin.json (Claude Code — .claude-plugin/plugin.json)

.codex-plugin/plugin.json (Codex CLI — Tier 2-Codex only)

.agents/plugins/marketplace.json (Codex marketplace — Tier 2-Codex)

agents/openai.yaml (Codex skill sidecar — Tier 2-Codex)

gemini-extension.json (Gemini/Antigravity — Tier 2-Antigravity) — ADVISORY

.gemini/commands/*.toml (Gemini slash commands — legacy/transitional, Tier 2-Antigravity)

.mcp.json (Claude Code — .mcp.json at repo root)

.codex/config.toml (Codex configuration — Tier 2-Codex)

.lsp.json (Claude Code LSP — Tier 2-Claude)

monitors/monitors.json (Claude Code monitors — Tier 2-Claude)

Settings Files (.claude/settings.json, .claude/settings.local.json)

CLAUDE.md

Memory Files

Agent Workflow Programs (project-root program.md-style files)

All Artifact Types: Vague Quantifiers

All Artifact Types: Vocabulary Drift (R51 — opt-in, disabled by default)

Cross-Component (--plugin flag)

Score Bands

Calibration Examples

Scope Note

Multi-tool scoring (PR-B landed 2026-05-25)

plugin.json (Claude Code — `.claude-plugin/plugin.json`)

.mcp.json (Claude Code — `.mcp.json` at repo root)

Agent Workflow Programs (project-root `program.md`-style files)

plugin.json (Claude Code — `.claude-plugin/plugin.json`)

.mcp.json (Claude Code — `.mcp.json` at repo root)

Agent Workflow Programs (project-root `program.md`-style files)