Guide for designing effective tools for AI agents. Use when creating tools for custom agent systems or any AI tool interfaces. Provides principles for tool naming, input/output design, error handling, and evaluation methodologies that maximize agent effectiveness.
This skill inherits all available tools. When active, it can use any tool Claude has access to.
references/evaluation_guide.md
references/tool_design_patterns.md

This skill provides comprehensive guidance for designing tools that AI agents can use effectively. Whether building custom agent tools or any AI-accessible interfaces, these principles maximize agent success in accomplishing real-world tasks.
Note: Use the more specific mcp-builder skill if you want to create an MCP server.
The quality of a tool system is measured not by how comprehensively it implements features, but by how well it enables AI agents to accomplish realistic, complex tasks using only the tools provided.
Before implementing any tool system, understand these foundational principles for designing tools that AI agents can use effectively:
Principle: Design thoughtful, high-impact workflow tools rather than simply wrapping existing API endpoints.
Why it matters: Agents need to accomplish complete tasks, not just make individual API calls. Tools that consolidate related operations reduce the number of steps agents must take and improve success rates.
How to apply:
- Consolidate multi-step operations into single tools (e.g., a schedule_event that both checks availability and creates the event)

Examples:
- Instead of: check_calendar_availability, create_calendar_event, send_event_notification
- Better: a single schedule_event with parameters for checking conflicts and sending notifications

Principle: Agents have constrained context windows - make every token count.
Why it matters: When agents run out of context, they fail to complete tasks. Verbose tool outputs force agents to make difficult decisions about what information to keep or discard.
How to apply:
Examples:
- Return concise summaries by default, with a detailed=true parameter for full data

Principle: Error messages should guide agents toward correct usage patterns, not just report failures.
Why it matters: Agents learn tool usage through feedback. Clear, educational errors help agents self-correct and succeed on retry.
How to apply:
Examples:
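As an illustration, a validation error can be returned as structured guidance rather than a bare failure. The tool name and fields below are hypothetical:

```ruby
# Hypothetical search_users tool: the error names the constraint that was
# violated and suggests a concrete retry, so the agent can self-correct.
def search_users(query:, limit: 20)
  if limit > 100
    return {
      error: "limit must be between 1 and 100 (got #{limit})",
      suggestion: "Retry with limit=100 and use the 'offset' parameter " \
                  "to page through additional results."
    }
  end
  { users: [], query: query, limit: limit } # real search elided
end
```

On retry, the agent has everything it needs in the response itself, with no need to consult external documentation.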
Principle: Tool names and organization should reflect how humans think about tasks, not just API structure.
Why it matters: Agents use tool names and descriptions to decide which tool to call. Natural naming improves tool discovery and reduces wrong tool selections.
How to apply:
- Use verb_noun names that match user intent: search_users, create_project, send_message
- Namespace tools by service: slack_send_message, not just send_message

Examples:
- Instead of: api_endpoint_users_post, api_endpoint_users_get, api_endpoint_users_delete
- Better: create_user, search_users, delete_user

Principle: Create realistic evaluation scenarios early and let agent feedback drive tool improvements.
Why it matters: Only by testing tools with actual agents can you discover usability issues. Prototype quickly and iterate based on real agent performance.
How to apply:
Process:
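To make the process concrete, here is a minimal sketch of an evaluation loop. The run_agent entry point is a placeholder for your agent runtime, and the case shown is illustrative:

```ruby
# Each case pairs a realistic task prompt with a programmatic check of
# the agent's transcript.
EvalCase = Struct.new(:prompt, :check)

CASES = [
  EvalCase.new(
    "Schedule a 30-minute meeting with Jane next week",
    ->(transcript) { transcript.include?("schedule_event") }
  )
]

def run_agent(prompt)
  # Placeholder: invoke the agent with the tool set under test and
  # return its transcript for scoring.
  "agent called schedule_event(attendee: 'Jane', duration_min: 30)"
end

results = CASES.map { |c| c.check.call(run_agent(c.prompt)) }
puts "passed #{results.count(true)}/#{results.size}"
```

Failures point directly at the tool that confused the agent, which is exactly the feedback the iteration loop needs.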
Follow this systematic framework when designing any tool for AI agents:
1. Identify Core Workflows
2. Design Input Schemas
3. Design Output Formats
- Include pagination metadata in list responses (has_more, next_offset, total_count)

4. Plan Error Handling
Tool Naming Conventions:
- Use verb_noun format: search_users, create_project
- Prefix with the service name: github_create_issue, slack_send_message

Tool Descriptions: Write comprehensive descriptions that include:
Tool Annotations (if supported by your system):
- readOnlyHint: true for read-only operations
- destructiveHint: false for non-destructive operations
- idempotentHint: true if repeated calls have the same effect
- openWorldHint: true if interacting with external systems

Code Quality Checklist:
Testing:
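As a sketch, tool handlers can be exercised directly before any agent-level testing. The create_project handler below is hypothetical:

```ruby
# A hypothetical tool handler with input validation.
def create_project(name:)
  raise ArgumentError, "name is required and must be non-empty" if name.to_s.strip.empty?
  { id: "P1", name: name }
end

# Exercise the success path...
project = create_project(name: "Roadmap")
raise "unexpected result" unless project[:name] == "Roadmap"

# ...and confirm the validation error is raised with a helpful message.
begin
  create_project(name: "  ")
  raise "expected ArgumentError"
rescue ArgumentError => e
  raise "unhelpful message" unless e.message.include?("name")
end
```

Checking error messages in tests, not just error classes, keeps the educational quality of failures from regressing.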
All tools that return data should support multiple formats for flexibility:
JSON Format (response_format="json")

Purpose: Machine-readable structured data for programmatic processing
Best practices:
Example:
```json
{
  "users": [
    {
      "id": "U123456",
      "name": "John Doe",
      "email": "john@example.com",
      "role": "developer",
      "active": true
    }
  ],
  "total": 150,
  "count": 20,
  "has_more": true,
  "next_offset": 20
}
```
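A response like the one above can be assembled with a small helper. The names below are illustrative, and total here is the size of the already-filtered result set:

```ruby
# Build the paginated envelope from an already-filtered result set.
def users_response(users, offset:, limit:)
  page = users[offset, limit] || []
  {
    users: page,
    total: users.size,
    count: page.size,
    has_more: offset + page.size < users.size,
    next_offset: offset + page.size
  }
end
```

The has_more flag lets the agent decide whether another call is worthwhile without counting items itself.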
Markdown Format (response_format="markdown", typically the default)

Purpose: Human-readable formatted text for user presentation
Best practices:
Example:
```markdown
## Users (20 of 150)

- **John Doe** (@john.doe)
  - Email: john@example.com
  - Role: Developer
  - Status: Active
- **Jane Smith** (@jane.smith)
  - Email: jane@example.com
  - Role: Designer
  - Status: Active

*Showing 20 results. Use offset=20 to see more.*
```
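Both formats can sit behind a single response_format parameter, as in this sketch (field names hypothetical):

```ruby
require 'json'

# Dispatch on response_format; default to markdown for readability, and
# fail loudly on unknown values so the agent can correct itself.
def format_users(users, response_format: "markdown")
  case response_format
  when "json"
    JSON.generate("users" => users, "count" => users.size)
  when "markdown"
    users.map { |u| "- **#{u["name"]}** (#{u["role"]})" }.join("\n")
  else
    raise ArgumentError,
          "response_format must be 'json' or 'markdown', got '#{response_format}'"
  end
end
```

Defaulting to markdown keeps the common path readable while still offering structured output on request.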
For tools that list resources:
Implementation requirements:
- Respect the limit parameter (never load all results when a limit is specified)
- Include pagination metadata: has_more, next_offset/next_cursor, total_count

Response structure:
```json
{
  "items": [...],
  "total": 150,
  "count": 20,
  "offset": 0,
  "has_more": true,
  "next_offset": 20
}
```
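For the next_cursor variant, the cursor can opaquely encode the continuation point. A minimal sketch, assuming an offset-backed store (a real implementation might encode a sort key or snapshot token instead):

```ruby
require 'base64'

# The cursor is just an encoded offset here.
def encode_cursor(offset)
  Base64.urlsafe_encode64(offset.to_s)
end

def decode_cursor(cursor)
  cursor ? Base64.urlsafe_decode64(cursor).to_i : 0
end

def list_page(items, limit: 20, cursor: nil)
  offset = decode_cursor(cursor)
  page = items[offset, limit] || []
  next_offset = offset + page.size
  more = next_offset < items.size
  {
    items: page,
    total: items.size,
    count: page.size,
    has_more: more,
    next_cursor: more ? encode_cursor(next_offset) : nil
  }
end
```

A nil cursor on the last page gives the agent an unambiguous stopping signal.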
Clear guidance in responses: include instructions for getting more data, e.g., "Use offset=20 to see more."
To prevent overwhelming context windows:
Implementation:
Example handling:
```ruby
require 'json'

CHARACTER_LIMIT = 25_000

def truncate_response(response, data)
  return response unless response.to_json.length > CHARACTER_LIMIT

  # Keep at least one item; drop the second half of the list
  truncated_data = data[0...[1, data.length / 2].max]
  response[:items] = truncated_data
  response[:truncated] = true
  response[:truncation_message] =
    "Response truncated from #{data.length} to #{truncated_data.length} items. " \
    "Use 'offset' parameter or add filters like status='active' to see more."
  response
end
```
Security and usability:
Schema design:
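For instance, a search tool's input schema can constrain values up front so invalid calls fail before execution. The tool and fields below are hypothetical:

```ruby
# JSON Schema for a hypothetical search_users tool. Enums, numeric
# bounds, and explicit defaults steer the agent toward valid calls.
SEARCH_USERS_SCHEMA = {
  "type" => "object",
  "properties" => {
    "query" => {
      "type" => "string",
      "description" => "Name or email substring to match"
    },
    "role" => { "type" => "string", "enum" => %w[developer designer manager] },
    "limit" => {
      "type" => "integer", "minimum" => 1, "maximum" => 100, "default" => 20
    },
    "response_format" => {
      "type" => "string", "enum" => %w[json markdown], "default" => "markdown"
    }
  },
  "required" => ["query"]
}.freeze
```

Descriptions on each property double as inline documentation the agent reads when choosing arguments.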
This skill includes reference documentation for deeper exploration:
references/tool_design_patterns.md: Comprehensive patterns and anti-patterns for common tool design scenarios with detailed examples.
references/evaluation_guide.md: Complete methodology for creating evaluation questions that test tool effectiveness with AI agents, including how to run evaluations and interpret results.
For detailed examples and advanced patterns: