Use when you need to create validators for LLM prompts in a specific domain - provides a systematic process for analyzing domain expertise, classifying prompt types, and generating domain-specific validation frameworks that check for expert-level effectiveness, not just coverage
This skill inherits all available tools. When active, it can use any tool Claude has access to.
Generate domain-specific LLM prompt validators through systematic analysis, not ad-hoc checklisting.
Core principle: Validators must check whether prompts transfer expert behavior, not just cover domain topics.
Execution: Use subagent architecture (see "Execution Method" section below)
6-Phase Process (Complete ALL phases IN ORDER):
Don't skip phases. Don't generate validator until Phase 5. Document each phase.
Use when:
Don't use when:
This skill REQUIRES subagent-based execution to prevent circular validation.
Three-Agent Architecture:
Step 1: Main Agent identifies domain
Step 2: Launch Subagent 1 for Validator Creation (Phases 1-5)
Use the Task tool to launch a subagent:
Task tool parameters:
- subagent_type: "general-purpose"
- prompt: "Create a validator for [domain name] prompts. Complete Phases 1-5 of the prompt-validator-generator skill."
- description: "Create validator for [domain]"
CRITICAL: Do NOT pass the target prompt to Subagent 1. Pass ONLY the domain name.
Subagent 1 will:
Step 3: Launch Subagent 2 for Meta-Validation (Phase 6)
After receiving validator from Subagent 1, launch another subagent:
Task tool parameters:
- subagent_type: "general-purpose"
- prompt: "Here's a [domain] validator: [paste validator]. Test it on these prompts: [target prompt + test prompts]. Complete Phase 6 of the prompt-validator-generator skill."
- description: "Meta-validate [domain] validator"
NOW you can pass the target prompt to Subagent 2 for testing.
Subagent 2 will:
Why this architecture is mandatory:
Single-agent execution NOT RECOMMENDED:
CRITICAL: Do NOT look at the prompt you're validating yet. Analyze the DOMAIN independently first.
Common failure: Looking at a prompt and basing validator on "what does this prompt contain?" This creates circular validation where the prompt defines its own success criteria.
Correct approach: Analyze expert behavior in the domain FIRST, independently of any specific prompt. The validator checks if prompts enable expert behavior, not if prompts match an example.
Note: If you're executing this skill via subagent (as recommended above), you won't have access to the target prompt during this phase - which is exactly the point.
Before creating validator, analyze the domain:
Question: What do experts do naturally in this domain that novices don't?
IMPORTANT: Answer based on domain knowledge, NOT by looking at prompts you're validating.
Wrong approach:
Right approach:
Analyze:
Sources for this analysis:
Examples across domains:
Software debugging:
Experts: Investigate root cause, form hypotheses, add instrumentation
Novices: Jump to solutions, try random fixes, no stopping criteria
Teaching mathematics:
Experts: Connect concepts to prior knowledge, use multiple representations, diagnose misconceptions
Novices: Show procedures without conceptual links, single approach, assume understanding
Financial advising:
Experts: Assess risk tolerance, consider tax implications, integrate estate planning, adapt to life changes
Novices: Focus on returns only, ignore taxes/estate, one-size-fits-all approach
Medical diagnosis:
Experts: Differential diagnosis, probabilistic reasoning, consider comorbidities, update based on tests
Novices: Pattern match to common conditions, binary thinking, ignore context
Question: What goes wrong when non-experts work in this domain?
Categories:
Examples across domains:
Software (TDD):
- Writing code before tests ("too simple to test")
- Testing after implementation ("achieves same goals")
Teaching (mathematics):
- Teaching procedures without concepts ("they just need the formula")
- Skipping prerequisite checks ("they should know this")
Financial advising:
- Recommending products without risk assessment ("high returns are good")
- Ignoring client's emotional relationship with money
Medical diagnosis:
- Anchoring on first impression ("it's probably just X")
- Ordering tests before clinical reasoning ("let's see what shows up")
Domain-specific (unique to this domain):
Universal (apply to all domains):
The validator must check BOTH.
Determine what type of prompt you're validating:
Note: Prompts often combine multiple types. Identify the PRIMARY type to guide validation focus.
| Type | Purpose | Validation Focus | Examples |
|---|---|---|---|
| Enforcement | Make violations costly, prevent shortcuts | Will they follow discipline under pressure? | TDD, safety protocols, compliance |
| Guidance/Advisory | Help with judgment and complex decisions | Will they make expert-level choices? | Investment allocation, teaching strategy, legal strategy |
| Diagnostic | Identify problems and root causes | Will they diagnose accurately? | Medical diagnosis, debugging, troubleshooting |
| Analytical | Break down and examine information | Will they analyze deeply/systematically? | Argument analysis, data interpretation |
| Evaluative | Judge quality against standards | Will they assess accurately? | Code review, essay grading, evaluation |
| Generative/Creative | Create original content | Will they produce quality output? | Story writing, design, composition |
| Synthesis | Combine multiple sources coherently | Will they integrate effectively? | Research synthesis, lit reviews |
| Planning/Strategic | Develop actionable plans | Will they create realistic plans? | Project planning, strategic planning |
| Transformation | Convert between formats/forms | Will they transform accurately? | Translation, summarization, conversion |
| Explanation/Teaching | Build understanding and mental models | Will they grasp and explain concepts? | Math concepts, theory, principles |
| Procedural | Provide step-by-step instructions | Will they follow steps correctly? | Recipes, tutorials, protocols |
| Interactive/Conversational | Guide ongoing dialogue | Will they conduct effective conversations? | Tutoring, coaching, customer service |
If your prompt doesn't fit: Define custom type by answering:
Core validation approaches by type:
Enforcement: Check for explicit requirements (MUST/MANDATORY), rationalization prevention, consequences for violations, red flags
Guidance/Advisory: Check for trade-off frameworks, context-adaptation guidance, decision-making support, tacit knowledge capture
Diagnostic: Check for systematic investigation process, differential consideration, evidence requirements, stopping criteria
Analytical: Check for framework/methodology, depth vs surface distinction, logical rigor, assumption identification
Evaluative: Check for clear criteria, severity/priority guidance, bias prevention, actionable feedback structure
Generative/Creative: Check for quality standards, creativity constraints, style/voice guidance, iteration/refinement process
Synthesis: Check for integration methodology, source evaluation, coherence standards, citation/attribution guidance
Planning/Strategic: Check for goal-to-task decomposition, risk consideration, resource allocation, timeline realism, contingency planning
Transformation: Check for accuracy verification, semantic preservation, format requirements, edge case handling
Explanation/Teaching: Check for mental models, concrete examples, conceptual accuracy, progressive complexity, misconception prevention
Procedural: Check for step completeness, prerequisite clarity, error recovery, verification checkpoints
Interactive/Conversational: Check for context tracking, turn-taking guidance, empathy/tone calibration, goal orientation
Before generating the validator, capture the expertise that will fill it.
DO NOT generate the validator yet. Phase 5 will do that. This phase PREPARES the expertise.
For each evaluation dimension from Phase 1, document:
Software examples:
Teaching examples:
Financial examples:
Medical examples:
Examples across domains:
Examples across domains:
Question: How do domain experts validate effectiveness? How do they establish ground truth?
Critical for meta-validation design: This determines HOW you'll test your validator.
Ask yourself:
Examples across domains:
Software debugging:
Experts compare: Systematic investigation vs random fixes
Methodology: Track hypothesis count, fix attempts, root cause identification
Ground truth: Did they find actual root cause vs symptom fix?
Mathematics teaching:
Experts compare: Conceptual understanding vs procedural fluency
Methodology: Student explanation quality, transfer to new problems
Ground truth: Can students explain WHY, not just HOW?
Financial advising:
Experts compare: Personalized advice vs generic recommendations
Methodology: Risk-return alignment, tax efficiency, behavioral coaching quality
Ground truth: Would expert advisor give same recommendation?
Skill quality (meta):
Experts compare: Skill-guided output vs expert output (no skill)
Methodology: Agent A (expert), Agent B (skill-guided), Agent C (analyzer)
Ground truth: Does skill-guided output match expert-level output?
Why this matters:
Document:
Before generating the validator, plan how you'll test it.
Critical questions to answer:
Based on Phase 3.4 (domain validation methodology), determine:
What comparison will prove your validator works?
Common approaches:
A. With/Without Comparison (most common):
B. Expert/Novice Comparison:
C. Scenario-Based Testing:
D. Agent-Based Comparison (for prompts guiding agent behavior):
Choose methodology based on:
Example for skill quality domain:
Methodology: Agent-based comparison
- Agent A: Expert skill writer (no framework)
- Agent B: Using prompt-validator-generator skill
- Agent C: Compare outputs for quality, systematic process
Proves: Skill transfers systematic process for validator creation
Document your chosen methodology:
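One lightweight way to record the choice is a small plan record that travels with the validator. A minimal sketch, where the field names and example values are illustrative assumptions only, not prescribed by this skill:

```python
from dataclasses import dataclass, field

# Illustrative structure for documenting a Phase 4 testing plan.
# Field names and example values are assumptions, not part of the skill.
@dataclass
class MetaValidationPlan:
    methodology: str                      # e.g. "agent-based comparison"
    comparison_arms: list[str]            # what is being compared against what
    success_criterion: str                # what result proves the validator works
    test_prompts: list[str] = field(default_factory=list)

plan = MetaValidationPlan(
    methodology="agent-based comparison",
    comparison_arms=["expert-written output (no skill)", "skill-guided output"],
    success_criterion="validator distinguishes expert-level prompts from coverage-only prompts",
)
```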
NOW generate your validator, incorporating ALL previous phases.
CRITICAL: This is where validator creation happens. Not before.
Your validator must incorporate:
IMPORTANT: Phase 4 methodology should SHAPE your validator design:
If Phase 4 chose With/Without Comparison:
If Phase 4 chose Expert/Novice Comparison:
If Phase 4 chose Scenario-Based Testing:
If Phase 4 chose Agent-Based Comparison:
Your validator isn't just documented with the methodology - it's DESIGNED to be tested by it.
Create validator with this structure:
# [Domain] Prompt Validator
## Domain Analysis Summary
[Phase 1 output: Expert patterns, failure modes, domain-specific vs universal concerns]
## Prompt Type Classification
[Phase 2 output: Type from taxonomy, validation focus]
## Validation Approach
[Why this approach based on prompt type]
## Evaluation Dimensions
### Domain-Specific Dimensions (3-7 dimensions)
**Dimension 1: [Name] ([Weight]%)**
**What expert behavior:** [From Phase 3.1]
**Evaluation criteria:**
- **Strong (4.5-5.0):** [Detailed criteria incorporating tacit knowledge from Phase 3.2]
- **Adequate (3.5-4.4):** [Criteria]
- **Weak (2.5-3.4):** [Criteria]
- **Poor (1.0-2.4):** [Criteria]
**Red flags:** [Anti-patterns from Phase 1.2]
**Context factors:** [From Phase 3.3]
[Repeat for each domain dimension]
### Universal Quality Dimensions (2-3 dimensions)
**Dimension X: Actionability**
[Standard universal dimension]
**Dimension Y: Context Adaptation**
[Standard universal dimension]
## Scoring Methodology
**Weights:**
- Domain dimensions: [weights from Phase 3]
- Universal dimensions: [weights]
**Thresholds:**
- Expert-level: ≥ [threshold from Phase 2 approach]
- Critical dimensions: [any must-exceed thresholds]
## Anti-Patterns
[From Phase 1.2 - common failure modes as checklist]
## Meta-Validation Plan
[From Phase 4 - how you'll test this validator]
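To make the scoring methodology concrete, here is a minimal sketch of weighted scoring with per-dimension minimums. The dimension names, weights, and thresholds are illustrative assumptions only; substitute the values your Phase 3 analysis produced.

```python
# Minimal scoring sketch. Dimension names, weights, and per-dimension minimums
# are hypothetical examples, not prescribed by this skill.
DIMENSIONS = {
    # name: (weight, minimum score this dimension must reach, or None)
    "Systematic Process Enforcement": (0.25, None),
    "Rationalization Prevention":     (0.30, 4.0),
    "Evidence Gathering":             (0.20, None),
    "Actionability":                  (0.15, None),
    "Context Adaptation":             (0.10, None),
}
OVERALL_THRESHOLD = 3.5  # example "expert-level" cutoff


def evaluate(scores: dict[str, float]) -> tuple[float, bool, list[str]]:
    """Return (weighted score, passed, failure reasons) for one scored prompt."""
    weighted = sum(scores[name] * weight for name, (weight, _) in DIMENSIONS.items())
    failures = []
    if weighted < OVERALL_THRESHOLD:
        failures.append(f"overall {weighted:.2f} below threshold {OVERALL_THRESHOLD}")
    for name, (_, minimum) in DIMENSIONS.items():
        if minimum is not None and scores[name] < minimum:
            failures.append(f"{name} {scores[name]:.1f} below minimum {minimum}")
    return weighted, not failures, failures
```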
Your validator MUST:
Check behavior, not coverage: Every dimension validates expert behavior or judgment, not topic mentions
Include tacit knowledge: Make implicit expert knowledge explicit in evaluation criteria
Be domain-specific: 3-7 dimensions unique to this domain based on Phase 1 analysis
Be type-appropriate: Validation approach matches prompt type from Phase 2
Have clear criteria: Each score level (5.0, 4.0, 3.0, 2.0, 1.0) has specific, observable criteria
Include anti-patterns: Common failure modes from Phase 1.2 as red flags
Be testable: Can be applied consistently using methodology from Phase 4, with dimensions and criteria specifically designed to support that testing approach
Don't create validator that:
GATE: Cannot proceed to Phase 5 until Phase 4 is complete with documented testing methodology.
Before moving to Phase 6, verify:
If any checkbox is unchecked, fix before Phase 6.
Test your validator using the methodology designed in Phase 4.
Apply the methodology you designed in Phase 4:
If you chose:
Document results:
Strong prompt test:
Weak prompt test:
Document both tests:
Can a bad prompt pass your validation?
Test by creating prompt that:
If it passes → validator checks coverage, not effectiveness
Example test prompt: "[Create a coverage-only prompt for your domain that lists all relevant topics but provides no expert guidance on when/how/why to apply them]"
Can a good prompt fail your validation?
Test with prompt that:
If it fails → validator too rigid or structural
Example test prompt: "[Take an unconventional but effective prompt that achieves expert-level results through different structure or approach]"
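A minimal sketch of recording both discrimination tests, reusing the hypothetical `evaluate()` and dimension names from the scoring sketch in Phase 5 (all scores are made up): a sound validator fails the coverage-only prompt and passes the unconventional-but-effective one.

```python
# Discrimination-check sketch, reusing the hypothetical evaluate() above.
# All scores below are invented for illustration.
cases = {
    "coverage-only prompt (should FAIL)": {
        "Systematic Process Enforcement": 2.0,
        "Rationalization Prevention":     2.5,
        "Evidence Gathering":             3.0,
        "Actionability":                  4.0,
        "Context Adaptation":             3.5,
    },
    "unconventional but effective prompt (should PASS)": {
        "Systematic Process Enforcement": 4.5,
        "Rationalization Prevention":     4.5,
        "Evidence Gathering":             4.0,
        "Actionability":                  4.0,
        "Context Adaptation":             4.5,
    },
}

for label, scores in cases.items():
    weighted, passed, reasons = evaluate(scores)
    print(f"{label}: {weighted:.2f} -> {'PASS' if passed else 'FAIL'} {reasons}")
```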
If possible, compare validator results against domain expert judgment:
Process:
Correlation check:
If expert judgment unavailable:
Document:
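When expert ratings are available, the correlation check can be as simple as comparing validator scores with expert ratings of the same prompts. A minimal sketch with hypothetical numbers:

```python
import statistics

# Correlation-check sketch: validator scores vs independent expert ratings
# for the same prompts (1-5 scale). All numbers are hypothetical.
validator_scores = [4.3, 2.8, 3.9, 4.6, 2.1]
expert_ratings   = [4.5, 2.5, 3.5, 4.8, 2.0]

r = statistics.correlation(validator_scores, expert_ratings)  # Pearson r (Python 3.10+)
print(f"Pearson r = {r:.2f}")  # low correlation suggests the validator diverges from expert judgment
```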
Use TodoWrite to track these steps:
Phase 1: Complete domain analysis
Phase 2: Classify prompt type
Phase 3: Capture domain expertise
Phase 4: Design testing methodology
Phase 5: Generate complete validator
Phase 6: Meta-validate and refine
Your validator is ready when:
Problem: Basing validator on what the prompt you're validating contains
How it happens:
Why it's wrong:
Example of circular validation:
❌ Wrong sequence:
1. Look at TDD prompt
2. See it has "Write test first" section
3. Create validator dimension "Has 'write test first' section"
4. TDD prompt passes (because we based validator on it)
5. Other prompts fail even if they enforce test-first differently
This validates structure, not behavior.
Correct sequence:
✓ Right sequence:
1. Analyze TDD domain: Experts write tests before code
2. Identify failure mode: Developers rationalize skipping tests
3. Create dimension: "Does prompt ENFORCE test-first with consequences?"
4. Test TDD prompt: Does it prevent writing code before tests?
5. Test any prompt structure: Can detect enforcement regardless of format
This validates behavior, not structure.
Red flags you're doing circular validation:
Fix: Complete Phase 1 domain analysis WITHOUT looking at any prompts. Base dimensions on domain expertise, not prompt contents.
Problem: Checklist of topics to cover, not behaviors to enforce
Examples:
Software:
Bad: "Does prompt cover security, performance, scalability?"
Good: "Does prompt guide threat modeling at trust boundaries?"
Teaching:
Bad: "Does prompt cover multiplication methods?"
Good: "Does prompt guide method selection based on student understanding?"
Financial:
Bad: "Does prompt mention diversification?"
Good: "Does prompt guide diversification based on risk capacity and timeline?"
Fix: Every dimension must check for expert behavior or judgment
Problem: Creating validator without understanding domain expertise
Symptom: Generic quality criteria that could apply to any prompt
Fix: Complete Phase 1 (Domain Analysis) before writing validator
Problem: Same scoring approach for all prompt types
Example: Using behavioral enforcement criteria for explanation prompts
Fix: Adapt scoring to prompt type (Phase 2 classification)
Problem: Validating explicit knowledge only
Example: Checks for "mentions X" instead of "guides when to apply X vs Y"
Fix: Capture expert heuristics and instincts in evaluation criteria
Problem: Deploying validator without testing it
Symptom: Validators that pass everything or fail everything
Fix: Phase 6 (Meta-Validation) with known strong/weak examples
ALWAYS include:
ADAPT based on:
All of these mean: Go back to Phase 1 and follow the systematic process.
| Rationalization | Reality |
|---|---|
| "I'll base validator dimensions on what this prompt contains" | CIRCULAR VALIDATION. Dimensions come from domain expertise, not prompt structure. Identify domain, then analyze independently. |
| "This prompt is good, I'll validate if others match it" | CIRCULAR VALIDATION. Validator checks behavior, not structural similarity. One prompt doesn't define success criteria. |
| "I'll extract what this prompt does and check for that" | CIRCULAR VALIDATION. Analyze domain independently first. Prompt contents don't define expert behavior. |
| "I already know this domain well" | Your implicit knowledge won't transfer to the validator. Phase 1 makes it explicit. |
| "This prompt type is obvious" | Classification determines validation approach. Phase 2 ensures you choose correctly. |
| "I can do domain analysis mentally" | Undocumented analysis = other validators can't learn from it. Document Phase 1. |
| "Meta-validation takes too long" | 15 minutes of testing prevents deploying broken validators. Phase 6 is mandatory. |
| "The user needs this quickly" | Quick broken validator wastes more time than systematic correct validator. |
| "I'll follow the spirit not letter" | Skipping phases = missing critical elements. Follow the process. |
| "This domain doesn't fit the taxonomy" | Define custom type (instructions in Phase 2.1). Still follow the 6 phases. |
| "I'll use an example as template" | Examples are illustrations, not templates. Each domain needs analysis. |
| "Domain analysis is obvious from prompt type" | Type suggests focus, analysis reveals specifics. Both required. |
| "I can combine phases to save time" | Phases build on each other. Skipping = incomplete validators. |
All of these mean: Complete all 6 phases in order. The process exists because shortcuts fail.
Top 3 are CIRCULAR VALIDATION - the most critical failure mode. Correct sequence:
Phase 1: Domain Analysis
Expert patterns: Root-cause investigation, hypothesis testing, instrumentation
Failure modes: Random fixes, premature solutions, infinite retries
Domain-specific: Systematic process, rationalization prevention
Universal: Actionability, examples, context adaptation
Phase 2: Prompt Type
Type: Enforcement (prevents shortcuts under pressure)
Focus: Will they investigate before fixing?
Phase 3: Domain Expertise
Expert behavior: Investigate systematically, form hypotheses, gather evidence
Tacit knowledge: "3+ failed fixes = question architecture"
Context factors: Emergency vs routine, critical vs experimental
Validation methodology: Track hypothesis count, root cause identification
Phase 4: Testing Design
Methodology: With/Without comparison
Test prompts WITH systematic investigation enforcement
Test prompts WITHOUT enforcement (random fix approach)
Validator should distinguish between them
Phase 5: Validator Generated
Domain dimensions:
1. Systematic Process Enforcement (25%)
2. Rationalization Prevention (30%)
3. Evidence Gathering (20%)
4. Hypothesis Testing (15%)
5. Handling Uncertainty (10%)
Universal: Actionability, Context Adaptation
Scoring: Weighted, threshold ≥3.5, enforcement ≥4.0
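As a quick illustration of this scoring spec, a hypothetical prompt scoring 4.5 / 4.2 / 3.8 / 3.5 / 4.0 on the five domain dimensions (universal dimensions omitted for brevity, and assuming dimension 1 is the enforcement dimension) lands at roughly 4.07 and clears both thresholds:

```python
# Hypothetical worked example for the debugging validator's scoring spec.
# Universal dimensions omitted for brevity; weights are the domain weights above.
weights = [0.25, 0.30, 0.20, 0.15, 0.10]
scores  = [4.5,  4.2,  3.8,  3.5,  4.0]   # made-up scores for one prompt

weighted = sum(w * s for w, s in zip(weights, scores))
print(round(weighted, 2))   # 4.07 -> above the 3.5 overall threshold
print(scores[0] >= 4.0)     # True -> assumed enforcement dimension meets its 4.0 minimum
```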
Phase 1: Domain Analysis
Expert patterns: Multiple representations, misconception diagnosis, conceptual connections
Failure modes: Procedural only, assumed understanding, single method
Domain-specific: Representation selection, error analysis, conceptual depth
Universal: Clarity, examples, progressive complexity
Phase 2: Prompt Type
Type: Guidance (helps with pedagogical decisions)
Focus: Will they select appropriate teaching strategies?
Phase 3: Domain Expertise
Expert behavior: Select representations based on student understanding
Tacit knowledge: "Student errors reveal misconceptions, not stupidity"
Context factors: Grade level, prior knowledge, learning disabilities
Validation methodology: Student explanation quality, transfer to new problems
Phase 4: Testing Design
Methodology: Expert/Novice comparison
Test expert-level teaching prompts (conceptual + procedural)
Test novice-level prompts (procedural only)
Validator should discriminate correctly
Phase 5: Validator Generated
Domain dimensions:
1. Representation Guidance (25%)
2. Misconception Diagnosis (25%)
3. Conceptual Connection (20%)
4. Differentiation Strategy (15%)
5. Assessment Integration (15%)
Universal: Actionability, Context Adaptation
Scoring: Balanced, threshold ≥4.0
Phase 1: Domain Analysis
Expert patterns: Holistic planning, risk-return alignment, behavior coaching, tax integration
Failure modes: Product-focused, returns-only, ignores emotions, forgets taxes
Domain-specific: Risk assessment, tax efficiency, behavior management
Universal: Clarity, examples, context sensitivity
Phase 2: Prompt Type
Type: Guidance (complex judgment decisions)
Focus: Will they make client-appropriate recommendations?
Phase 3: Domain Expertise
Expert behavior: Assess risk through emotional AND financial capacity
Tacit knowledge: "Behavior gaps cost more than fee differences"
Context factors: Age, risk tolerance, life stage, tax situation
Validation methodology: Would expert advisor give same recommendation?
Phase 4: Testing Design
Methodology: Scenario-based testing
Create scenarios: young investor, near-retirement, high net worth
Test prompts across scenarios
Validator should detect context-inappropriate advice
Phase 5: Validator Generated
Domain dimensions:
1. Risk Assessment Depth (25%)
2. Tax Integration (20%)
3. Behavior Coaching (20%)
4. Life Stage Adaptation (20%)
5. Goal Prioritization (15%)
Universal: Actionability, Context Adaptation
Scoring: Balanced, threshold ≥4.0, risk assessment ≥4.5
After creating validator, consider:
Creating prompt validators IS domain expertise analysis made systematic.
Don't jump to checklist creation. Follow the process:
The result: Validators that check whether prompts transfer expert behavior, not just cover topics.