From lightdash-agentops
Orchestrate evaluation runs and test case management for Lightdash agents.
Trigger: slash command
/lightdash-agentops:run-lightdash-evals
Skill for managing and executing evaluations for Lightdash AI agents.
Enables the "Eval-Driven Development" workflow by providing tools to create evaluation suites, append test cases (prompts), execute evaluation runs, and analyze the results.
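As a rough illustration of the "analyze the results" step, the sketch below summarizes a finished run. The record shape is an assumption made for the example: each result is taken to be a mapping with hypothetical `prompt` and `status` fields, where `"passed"` marks success. The actual payload returned by the server may differ.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def summarize_results(results: Iterable[Mapping[str, str]]) -> Dict[str, object]:
    """Count outcomes and collect the prompts that did not pass.

    The "prompt" and "status" keys are hypothetical, not a documented schema.
    """
    rows = list(results)
    counts = Counter(row["status"] for row in rows)
    # Anything that is not explicitly "passed" is treated as needing attention.
    failed = [row["prompt"] for row in rows if row["status"] != "passed"]
    pass_rate = counts["passed"] / len(rows) if rows else 0.0
    return {"total": len(rows), "pass_rate": pass_rate, "failed_prompts": failed}
```

Listing the failed prompts alongside the pass rate keeps the summary actionable: those prompts are the natural candidates to inspect before the next run.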
Wraps the following MCP tools from the lightdash-tools server:
ldt__list_agent_evaluations
ldt__get_agent_evaluation
ldt__create_agent_evaluation
ldt__update_agent_evaluation
ldt__append_agent_evaluation_prompts
ldt__run_agent_evaluation
ldt__list_agent_evaluation_runs
ldt__get_agent_evaluation_run_results
ldt__delete_agent_evaluation

The tools fall into three groups by the kind of operation they perform:
Read: list_agent_evaluations, get_agent_evaluation, list_agent_evaluation_runs, get_agent_evaluation_run_results.
Create, update, and run: create_agent_evaluation, update_agent_evaluation, append_agent_evaluation_prompts, run_agent_evaluation.
Delete: delete_agent_evaluation.

Workflow:
1. Use ldt__append_agent_evaluation_prompts to add 20-50 diverse test cases representing real-world user queries.
2. Start a run with ldt__run_agent_evaluation.
3. List the runs with ldt__list_agent_evaluation_runs.
4. Retrieve the results with ldt__get_agent_evaluation_run_results.
5. Use the agent-tuner sub-agent to automatically process evaluation results for improvement.

To install the plugin:
npx claudepluginhub yu-iskw/dbt-heroes --plugin lightdash-agentops
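The tool sequence above (append prompts, run, list runs, fetch results) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the skill's implementation: `call_tool` is a hypothetical callable standing in for whatever MCP client is used, and the argument names (`evaluation_id`, `prompts`, `run_id`) and the shape of the run response are guesses rather than the server's documented schema.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: call_tool(tool_name, arguments) -> result.
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def run_eval_cycle(call_tool: ToolCaller, evaluation_id: str, prompts: List[str]) -> Any:
    """Append prompts, start a run, and return that run's results."""
    # Drop blank entries and exact duplicates while preserving order.
    unique = list(dict.fromkeys(p.strip() for p in prompts if p.strip()))
    if not 20 <= len(unique) <= 50:
        raise ValueError(f"expected 20-50 distinct prompts, got {len(unique)}")

    # Argument names below are assumptions, not a documented schema.
    call_tool("ldt__append_agent_evaluation_prompts",
              {"evaluation_id": evaluation_id, "prompts": unique})
    run = call_tool("ldt__run_agent_evaluation", {"evaluation_id": evaluation_id})
    call_tool("ldt__list_agent_evaluation_runs", {"evaluation_id": evaluation_id})
    # Assumes the run response carries a "run_id" field (hypothetical).
    return call_tool("ldt__get_agent_evaluation_run_results",
                     {"evaluation_id": evaluation_id, "run_id": run["run_id"]})
```

Passing `call_tool` in as a parameter keeps the sketch testable with a stub. A real run may take time to complete, so a production version would likely poll the run list until the run finishes before fetching results; that is omitted here.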