From prodsec-skills
Mitigate prompt injection risks in LLM-based systems. Use when designing, building, or reviewing AI systems that accept user prompts, or when evaluating model safety for deployment.
Slash command: /prodsec-skills:prompt-injection-mitigation
Prompt injection cannot be fully prevented. It can only be minimized. The approach combines multiple layers of defense rather than relying on a single control.
Use architectural components that reduce prompt injection probability, such as guardrails that filter both prompts and responses (guardrails/bidirectional-filtering).
Model safety is primarily determined during pre-training and fine-tuning. If the solution does not pre-train or fine-tune its own models, select models that have been trained with safety as a priority.
| Evaluation Criteria | What to Look For |
|---|---|
| Safety benchmarks | Published safety evaluation scores and red-team results |
| Alignment training | RLHF, constitutional AI, or other alignment techniques applied |
| Known vulnerabilities | Check for disclosed prompt injection vulnerabilities |
| Provider reputation | Track record of the model provider on security and safety |
The best mitigation for prompt injection in agentic systems is keeping a human in the loop. Require explicit user confirmation before executing any sensitive or destructive action triggered by the LLM. This is especially critical for MCP-based agents where tool execution can have real-world impact.
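The confirmation requirement above can be sketched as a gate in front of tool execution. This is a minimal illustration, not part of the skill itself: `SENSITIVE_TOOLS`, `ToolCall`, `execute_tool_call`, and `console_confirm` are hypothetical names, and the set of sensitive tools is an assumption that each deployment would define for itself.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical list of tools considered sensitive or destructive.
SENSITIVE_TOOLS = {"delete_file", "send_email", "run_shell"}


@dataclass
class ToolCall:
    """A tool invocation requested by the model (hypothetical shape)."""
    name: str
    arguments: Dict[str, Any]


def execute_tool_call(
    call: ToolCall,
    tools: Dict[str, Callable[..., Any]],
    confirm: Callable[[ToolCall], bool],
) -> Any:
    """Run a model-requested tool call, asking the user first if it is sensitive."""
    if call.name not in tools:
        raise KeyError(f"unknown tool: {call.name}")
    # Sensitive tools run only after explicit user confirmation.
    if call.name in SENSITIVE_TOOLS and not confirm(call):
        return "denied: user did not confirm"
    return tools[call.name](**call.arguments)


def console_confirm(call: ToolCall) -> bool:
    """Ask on the console; anything other than 'y' is treated as a refusal."""
    answer = input(f"Allow {call.name} with {call.arguments}? [y/N] ")
    return answer.strip().lower() == "y"
```

Passing the confirmation step in as a callable keeps the gate independent of the user interface: a console prompt, a chat approval button, or a ticketing workflow can all supply the same yes/no decision. Defaulting to refusal means an ambiguous answer never triggers the action.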
Reduce the impact of successful prompt injection by constraining what the model can do, for example by validating model-generated actions in a sandbox (eval_sandbox/output-validation-sandbox).
Layer these controls so that each request passes through them in order:
Rate Limiting (API Gateway)
  → Input Guardrails (prompt filtering)
  → Safer Model (alignment training)
  → Output Guardrails (response filtering)
  → Output Validation Sandbox (if model generates actions)
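One way to picture the layered flow above is a request handler that runs each stage in turn and stops at the first failure. The sketch below rests on several assumptions: the regex patterns, the `sk-` secret format, the `ACTION:` response convention, and the action allowlist are all invented for illustration, and the final stage uses a simple allowlist check as a stand-in for a real sandbox. Production guardrails typically rely on trained classifiers rather than regular expressions.

```python
import re
import time
from collections import defaultdict, deque
from typing import Callable, Optional

# Hypothetical patterns; real guardrails usually use trained classifiers.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (the |your )?system prompt", re.I),
]
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{8,}")  # assumed secret format
ALLOWED_ACTIONS = {"search", "summarize"}  # assumed action allowlist


class RateLimiter:
    """Sliding-window limiter: at most max_requests per window per user."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = defaultdict(deque)

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        timestamps = self.history[user_id]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True


def input_guardrail(prompt: str) -> bool:
    """Return True when the prompt matches no known injection pattern."""
    return not any(p.search(prompt) for p in INJECTION_PATTERNS)


def output_guardrail(response: str) -> bool:
    """Return True when the response does not appear to leak a secret."""
    return SECRET_PATTERN.search(response) is None


def handle_request(
    user_id: str,
    prompt: str,
    model: Callable[[str], str],
    limiter: RateLimiter,
) -> str:
    """Pass one request through every layer, stopping at the first failure."""
    if not limiter.allow(user_id):
        return "rejected: rate limit"
    if not input_guardrail(prompt):
        return "rejected: input guardrail"
    response = model(prompt)
    if not output_guardrail(response):
        return "rejected: output guardrail"
    # Stand-in for sandbox validation: only allowlisted actions pass.
    if response.startswith("ACTION:"):
        action = response[len("ACTION:"):].strip()
        if action not in ALLOWED_ACTIONS:
            return "rejected: action not allowed"
    return response
```

Because the layers are independent, a bypass of one (for example, an injection phrased to evade the input patterns) can still be caught by a later one, which is the point of combining controls rather than relying on a single filter.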
Beyond prompt injection, address these related LLM risks:
Install: npx claudepluginhub redhatproductsecurity/prodsec-skills --plugin prodsec-skills