
autocode-workflow

Enforces a gated workflow for competitive programming problem creation using AutoCode MCP tools: problem statements, solutions, validators, generators, stress tests, and Polygon packaging.

developer-tools


Slash command

/autocode:autocode-workflow


SKILL.md

272 lines · ~2.7k tokens

Stats

Language: Python
Stars: 17
Forks: 2
Maintenance: Excellent
Last commit: May 25, 2026


AutoCode Problem Creation Workflow

AutoCode is a Claude Code plugin for competitive programming problem setting. It exists because AI-generated problems often fail in subtle ways:

the statement is ambiguous, or the samples do not match the intended solution;
the standard solution has hidden bugs;
the claimed complexity is wrong;
the brute solution is not a reliable oracle;
the generator misses edge cases and TLE patterns;
the final tests do not kill wrong solutions;
the package is built before the statement, tests, and manifest are consistent.

The workflow turns AI output into a gated pipeline. Do not skip gates.

Status Output Contract

Every workflow checkpoint should report these three fields:

decision: go / no_go
blocking_issues: unmet gates or risks
next_actions: exact MCP calls needed to proceed

If a gate fails, stop and fix it before proceeding.
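A minimal sketch of a checkpoint report in this format, assuming a plain Python dataclass. The class and method names are hypothetical and not part of AutoCode's API.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointStatus:
    """Hypothetical container for the go / no_go checkpoint contract."""
    blocking_issues: list[str] = field(default_factory=list)
    next_actions: list[str] = field(default_factory=list)

    @property
    def decision(self) -> str:
        # Any unmet gate or open risk blocks progression.
        return "no_go" if self.blocking_issues else "go"

    def render(self) -> str:
        return "\n".join([
            f"decision: {self.decision}",
            "blocking_issues: " + ("; ".join(self.blocking_issues) or "none"),
            "next_actions: " + ("; ".join(self.next_actions) or "none"),
        ])


status = CheckpointStatus(
    blocking_issues=["validator accuracy 0.85 < 0.9"],
    next_actions=["validator_build (with more near-valid invalid inputs)"],
)
print(status.render())
```

Deriving decision from blocking_issues, instead of storing it as a separate field, keeps the two from contradicting each other.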

Core Sequence

Non-interactive problems:

problem_create
  -> solution_build(sol)
  -> solution_build(brute)
  -> solution_analyze / solution_audit_std / solution_audit_brute
  -> validator_build(accuracy >= 0.9)
  -> generator_build
  -> stress_test_run(completed_rounds == total_rounds)
  -> checker_build(if non-exact output)
  -> problem_validate
  -> problem_generate_tests
  -> problem_verify_tests(passed)
  -> problem_pack_polygon

Checker placement: without SPJ, checker_build runs after stress by default. If `special_judge` is set with `stress_comparison: "checker"`, run `checker_build` before `stress_test_run`; stress then calls `checker(in, sol, brute)`, and the optional `stress_checker_bidirectional` flag also calls `checker(in, brute, sol)`. See the checker prompt for details.

Interactive problems:

problem_create
  -> solution_build(sol)
  -> solution_build(brute)
  -> solution_analyze / solution_audit_std / solution_audit_brute
  -> interactor_build
  -> generator_build
  -> stress_test_run
  -> problem_validate
  -> problem_generate_tests
  -> problem_verify_tests(passed)
  -> problem_pack_polygon

The authoritative implementation of this ordering is scripts/workflow_guard.py.
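The ordering can be expressed as a prerequisite map. The sketch below is a hypothetical illustration of that idea for the non-interactive sequence only; it is not the contents of scripts/workflow_guard.py, and checker_build is left out because its position depends on the SPJ settings.

```python
# Hypothetical prerequisite map for the non-interactive sequence.
# "solution_audit" stands in for solution_analyze / solution_audit_std / solution_audit_brute.
PREREQS: dict[str, list[str]] = {
    "problem_create": [],
    "solution_build(sol)": ["problem_create"],
    "solution_build(brute)": ["solution_build(sol)"],
    "solution_audit": ["solution_build(sol)", "solution_build(brute)"],
    "validator_build": ["solution_audit"],
    "generator_build": ["validator_build"],
    "stress_test_run": ["solution_build(sol)", "solution_build(brute)", "generator_build"],
    "problem_validate": ["stress_test_run"],
    "problem_generate_tests": ["problem_validate"],
    "problem_verify_tests": ["problem_generate_tests"],
    "problem_pack_polygon": ["problem_verify_tests"],
}


def missing_prereqs(step: str, passed: set[str]) -> list[str]:
    """Return the prerequisites of `step` that have not passed yet."""
    return [p for p in PREREQS[step] if p not in passed]


def check(step: str, passed: set[str]) -> str:
    missing = missing_prereqs(step, passed)
    return "go" if not missing else "no_go: missing " + ", ".join(missing)


done = {"problem_create", "solution_build(sol)"}
print(check("solution_build(brute)", done))  # go
print(check("generator_build", done))        # no_go: missing validator_build
```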

Interactive tasks are not just "no validator". They require an explicit protocol contract in the statement and a testlib interactor that can reject protocol violations. Treat missing protocol semantics as a blocker, the same severity as a missing validator for a non-interactive task.

Mandatory Gates

Gate	Requirement
Problem setup	`problem_create` must create directory structure and `autocode.json`
Standard solution	`solution_build(solution_type="sol")` succeeds
Brute solution	`solution_build(solution_type="brute")` succeeds after sol
Complexity audit	`solution_analyze`, `solution_audit_std`, and `solution_audit_brute` reviewed
Validator	Non-interactive only: `validator_build` succeeds and reports `accuracy >= 0.9`
Interactor	Interactive only: `interactor_build` is ready
Generator	`generator_build` succeeds after validator/interactor gate
Stress	`stress_test_run` completes all rounds
Statement validation	`problem_validate` passes on the statement samples and sample files
Final tests	`problem_generate_tests` creates final tests
Test verification	`problem_verify_tests` passes before packaging
Packaging	`problem_pack_polygon` only after verified tests

Audit Agents

Use these agents when the risk is material:

autocode-idea-auditor: before implementation, especially if the idea has unclear constraints, multiple valid outputs, or interaction.
autocode-solution-auditor: after std/brute exist, before relying on stress results or final generation.
autocode-package-auditor: before problem_pack_polygon, especially when wrong solutions, checker, interactor, or custom answer extension are involved.

Tool Guidance

`problem_create`

Creates:

autocode.json
solutions/
files/
statements/README.md
statements/tutorial.md
tests/

Statement format requirements for statements/README.md:

title;
time/memory limits;
optional background;
problem description;
input format (must include all variable ranges and aggregate constraints);
output format;
samples (numbered in ascending order when multiple);
explanation (sample explanations must be placed here, not mixed into sample blocks; only representative samples need explanation).
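A quick structural check along these lines can catch missing or misplaced sections early. This is a hypothetical sketch: it assumes each section starts with a markdown heading whose text contains the keyword shown, which the workflow does not specify, and it checks only five of the required parts.

```python
import re

# Hypothetical heading keywords; a real README may word its headings differently.
REQUIRED_IN_ORDER = ["description", "input", "output", "sample", "explanation"]


def section_order_problems(readme: str) -> list[str]:
    """Report required sections that are missing or out of order."""
    headings = [h.strip().lower() for h in re.findall(r"^#+\s*(.+)$", readme, re.MULTILINE)]
    problems: list[str] = []
    last_index = -1
    for key in REQUIRED_IN_ORDER:
        idx = next((i for i, h in enumerate(headings) if key in h), None)
        if idx is None:
            problems.append(f"missing section: {key}")
        elif idx < last_index:
            problems.append(f"out of order: {key}")
        else:
            last_index = idx
    return problems


readme = ("# Sum\n## Problem Description\n...\n## Input Format\n...\n"
          "## Output Format\n...\n## Sample 1\n...\n## Explanation\n...\n")
print(section_order_problems(readme))  # []
```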

Do not infer that a problem is ready from file presence alone. Prefer structured tool results and workflow state.

`solution_build`

Build sol before brute.

For non-trivial problems, brute must be independent enough to serve as an oracle. If it is the same algorithm as sol, mark this as a risk and run solution_audit_brute.

Before writing brute, perform a quick counterexample check on paper:

verify that brute directly simulates the problem constraints;
reject simplifications that change mandatory selection semantics.
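As an illustration of an independent oracle, take a hypothetical task: the maximum sum of a non-empty contiguous subarray. The sol below uses Kadane's algorithm; the brute enumerates every subarray directly and shares no logic with sol. The "non-empty" requirement is exactly the kind of mandatory selection semantics that a careless simplification (allowing the empty subarray, with sum 0) would silently change.

```python
def sol(a: list[int]) -> int:
    """Kadane's algorithm, O(n): best sum of a non-empty contiguous subarray."""
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def brute(a: list[int]) -> int:
    """O(n^3) oracle: literally tries every non-empty subarray a[l..r]."""
    n = len(a)
    return max(sum(a[l:r + 1]) for l in range(n) for r in range(l, n))


def wrong_brute(a: list[int]) -> int:
    """Wrong simplification: permits the empty subarray, so it never returns < 0."""
    return max(0, brute(a))


print(sol([-3, -1, -2]), brute([-3, -1, -2]), wrong_brute([-3, -1, -2]))  # -1 -1 0
```

The cubic brute is only meant for tiny stress inputs; on an all-negative array, the wrong simplification disagrees with the statement.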

`solution_analyze` and audit tools

Use:

solution_analyze to estimate time/space complexity, risk notes, and recommended stress profiles.
solution_audit_std to check std complexity and constraint mismatch.
solution_audit_brute to check whether brute can support stress testing.

Do not accept a claimed complexity without evidence.
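One lightweight form of evidence is a doubling experiment: time std at growing sizes and estimate k in t(n) ~ n^k from consecutive ratios. The helper below is a hypothetical sketch that only interprets timings you have already measured; the demo feeds it synthetic timings from a quadratic cost model instead of real measurements. Real timings are noisy, so treat the exponent as a sanity check, not proof.

```python
import math


def growth_exponents(timings: dict[int, float]) -> list[float]:
    """Estimate k in t(n) ~ n^k from consecutive (n, seconds) measurements."""
    sizes = sorted(timings)
    return [
        math.log(timings[large] / timings[small]) / math.log(large / small)
        for small, large in zip(sizes, sizes[1:])
    ]


# Synthetic timings from a quadratic cost model, standing in for real measurements.
synthetic = {n: 1e-9 * n * n for n in (1000, 2000, 4000, 8000)}
print([round(k, 2) for k in growth_exponents(synthetic)])  # [2.0, 2.0, 2.0]
```

If std is claimed to be O(n log n) but the exponent stays near 2 at the largest sizes, the claim needs review before stress results are trusted.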

`validator_build`

Non-interactive problems need a validator with evidence. A successful validator build is not enough on its own; it must also report an accuracy that meets the target below.

Target:

accuracy >= 0.9
valid inputs include normal, boundary, and maximum cases;
invalid inputs include near-valid but illegal formats/ranges.
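Accuracy here is the fraction of labeled inputs that the validator classifies correctly. A minimal sketch, assuming a hypothetical problem whose input is a single integer n with 1 <= n <= 10^5 on its own line; a real validator would be a testlib program, and AutoCode's exact accuracy formula may differ.

```python
import re


def validate(text: str) -> bool:
    """Toy validator: exactly one line holding an integer 1 <= n <= 10**5."""
    if not text.endswith("\n") or text.count("\n") != 1:
        return False
    token = text[:-1]
    # Digits only, first digit 1-9: rejects signs, leading zeros, and spaces.
    if re.fullmatch(r"[1-9][0-9]*", token) is None:
        return False
    return int(token) <= 10**5


def accuracy(cases: list[tuple[str, bool]]) -> float:
    """Fraction of (input, expected_valid) pairs classified correctly."""
    hits = sum(validate(text) == expected for text, expected in cases)
    return hits / len(cases)


cases = [
    ("1\n", True), ("100000\n", True), ("4242\n", True),  # boundary, max, normal
    ("0\n", False), ("100001\n", False),                   # just out of range
    ("01\n", False), (" 5\n", False), ("5", False),        # near-valid formats
    ("5\n6\n", False),                                     # extra line
]
print(accuracy(cases))  # 1.0
```

Most of the invalid cases are deliberately one character away from a valid input, which is what gives the accuracy figure its value.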

`interactor_build`

Interactive problems use interactor_build instead of validator_build and checker_build.

Require an explicit interaction protocol in the statement before final packaging. The protocol must define:

who outputs first;
hidden input/range/randomness/adaptiveness;
every query command and final-answer command;
judge response format and meaning;
numeric query/round limits;
flush requirement after every contestant output;
immediate termination behavior after final answer;
verdict for malformed tokens, out-of-range arguments, too many queries, premature EOF, blocked protocol, and extra output.

Interactor implementation requirements:

use registerInteraction(argc, argv);
read testcase data from inf and optional jury data from ans;
write to the contestant via tout, not std::cout;
call tout.flush() after every message to the contestant;
read contestant output via ouf;
end every branch with quitf(_ok/_wa/_pe/_fail, ...);
include scripted interaction_scenarios in interactor_build that cover AC, wrong final answer, malformed command, out-of-range query, query-limit boundary, exceeding query limit, and premature EOF.
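The scripted scenarios can be prototyped before the testlib interactor exists. The sketch below assumes a hypothetical guess-the-number protocol (hidden number in 1..100; the contestant sends "? x" and gets "<", ">" or "=" meaning the hidden number is smaller, larger, or equal; at most 10 queries; final answer "! x") and replays scripted contestant transcripts against a Python model of the judge. It mirrors verdict logic only, and the OK/WA/PE strings are illustrative stand-ins for _ok/_wa/_pe; the real interactor must still be a C++ testlib program using registerInteraction, tout, and ouf as described above.

```python
import re

HIDDEN_MAX, QUERY_LIMIT = 100, 10  # hypothetical protocol limits


def judge(hidden: int, script: list[str]) -> tuple[str, list[str]]:
    """Replay a scripted contestant against a model judge; return (verdict, replies)."""
    replies: list[str] = []
    queries = 0
    for i, line in enumerate(script):
        parts = line.split()
        if len(parts) != 2 or parts[0] not in ("?", "!") or not re.fullmatch(r"-?[0-9]+", parts[1]):
            return "PE", replies                      # malformed command
        value = int(parts[1])
        if not 1 <= value <= HIDDEN_MAX:
            return "WA", replies                      # out-of-range argument
        if parts[0] == "!":
            if i != len(script) - 1:
                return "WA", replies                  # extra output after the final answer
            return ("OK" if value == hidden else "WA"), replies
        queries += 1
        if queries > QUERY_LIMIT:
            return "WA", replies                      # too many queries
        replies.append("<" if hidden < value else ">" if hidden > value else "=")
    return "WA", replies                              # premature EOF: no final answer


SCENARIOS = {
    "AC": (42, ["? 50", "? 25", "! 42"], "OK"),
    "wrong final answer": (42, ["! 41"], "WA"),
    "malformed command": (42, ["guess 42"], "PE"),
    "out-of-range query": (42, ["? 101"], "WA"),
    "query-limit boundary": (42, ["? 1"] * 10 + ["! 42"], "OK"),
    "exceeding query limit": (42, ["? 1"] * 11 + ["! 42"], "WA"),
    "premature EOF": (42, ["? 50"], "WA"),
}
for name, (hidden, script, expected) in SCENARIOS.items():
    verdict, _ = judge(hidden, script)
    assert verdict == expected, (name, verdict)
print("all scenarios behave as expected")
```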

`generator_build`

The generator should implement semantically distinct strategies:

type=1: tiny / exhaustive / sanity;
type=2: random coverage;
type=3: boundary and extreme constraints;
type=4: targeted worst-case or TLE-inducing patterns.

type=4 must do more than repeat type=3 with maximum parameters; it should target a specific weakness.
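A minimal sketch of a type-dispatched generator for a hypothetical problem whose input is n followed by n integers in [1, 10^9], with n <= 2*10^5. The type=4 pattern here (all values equal at maximum n) is only one example of a targeted case, chosen because it drives a naive Lomuto-partition quicksort to quadratic time; what actually stresses a given std depends on that std. A real generator would normally be a testlib C++ program reading these parameters from argv.

```python
import random

N_MAX, V_MAX = 2 * 10**5, 10**9  # hypothetical constraints


def generate(test_type: int, seed: int) -> str:
    """Return one test as text: n on the first line, n values on the second."""
    rng = random.Random(seed)  # seeded, so every test is reproducible
    if test_type == 1:    # tiny / sanity
        n = rng.randint(1, 5)
        a = [rng.randint(1, 3) for _ in range(n)]
    elif test_type == 2:  # random coverage
        n = rng.randint(1, 1000)
        a = [rng.randint(1, V_MAX) for _ in range(n)]
    elif test_type == 3:  # boundary and extreme values
        n = rng.choice([1, N_MAX])
        a = [rng.choice([1, V_MAX]) for _ in range(n)]
    elif test_type == 4:  # targeted: all-equal values, worst case for Lomuto-style quicksort
        n = N_MAX
        a = [rng.randint(1, V_MAX)] * n
    else:
        raise ValueError(f"unknown type {test_type}")
    return f"{n}\n{' '.join(map(str, a))}\n"


print(generate(1, seed=7))
```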

`stress_test_run`

Use multiple profiles when possible:

tiny_exhaustive
random_small
edge_small

Use stress_test_run advisory fields (complexity_context, n_max_advisory) to choose n_max based on audit evidence instead of fixed thresholds.

Proceed only when completed_rounds == total_rounds.
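Conceptually, each stress round generates an input, runs sol and brute, and compares their outputs; the run passes only if every round completes. The in-process sketch below uses a hypothetical problem (count pairs i < j with a_i = a_j) and a deliberately buggy variant to show a mismatch stopping the run early. stress_test_run itself works on compiled binaries and profiles; only the completed_rounds and total_rounds names are taken from this document.

```python
import random
from collections import Counter
from typing import Callable


def sol(a: list[int]) -> int:
    """O(n): equal pairs i < j, counted per value as c*(c-1)/2."""
    return sum(c * (c - 1) // 2 for c in Counter(a).values())


def brute(a: list[int]) -> int:
    """O(n^2) oracle: checks every pair directly."""
    return sum(a[i] == a[j] for i in range(len(a)) for j in range(i + 1, len(a)))


def stress(solve: Callable[[list[int]], int], oracle: Callable[[list[int]], int],
           total_rounds: int, n_max: int, seed: int = 0) -> dict:
    """Compare solve against oracle on random small arrays; stop at the first mismatch."""
    rng = random.Random(seed)
    for done in range(total_rounds):
        a = [rng.randint(1, 3) for _ in range(rng.randint(1, n_max))]
        if solve(a) != oracle(a):
            return {"completed_rounds": done, "total_rounds": total_rounds, "counterexample": a}
    return {"completed_rounds": total_rounds, "total_rounds": total_rounds, "counterexample": None}


ok = stress(sol, brute, total_rounds=300, n_max=8)
bad = stress(lambda a: len(a) - len(set(a)), brute, total_rounds=300, n_max=8)
print(ok["completed_rounds"] == ok["total_rounds"])   # True
print(bad["completed_rounds"] < bad["total_rounds"])  # True: buggy sol caught
```

Small values (1..3) make repeated elements common, which is what exposes the buggy variant; n_max is the knob that audit evidence should inform.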

`problem_validate`

Validation failure is a release blocker. Do not generate final tests or package until statement samples and sample files pass.

`problem_generate_tests`

Final tests should include at least half limit-oriented cases (type=3 + type=4) when candidates are available.
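One way to honor the "at least half limit-oriented" rule when choosing final tests from generated candidates is sketched below. The candidate representation (dicts with a type key) and the selection policy are assumptions for illustration, not problem_generate_tests internals.

```python
def select_final(candidates: list[dict], total: int) -> list[dict]:
    """Pick `total` tests; at least half are type 3/4 whenever enough exist."""
    limit = [c for c in candidates if c["type"] in (3, 4)]
    other = [c for c in candidates if c["type"] not in (3, 4)]
    want_limit = min(len(limit), (total + 1) // 2)  # ceil(total / 2), capped by supply
    chosen = limit[:want_limit]
    chosen += other[: total - len(chosen)]
    # If other candidates run short, top up with the remaining limit-oriented ones.
    chosen += limit[want_limit: want_limit + total - len(chosen)]
    return chosen


cands = [{"type": t, "id": i} for i, t in enumerate([1, 1, 2, 2, 2, 3, 4, 4])]
picked = select_final(cands, 6)
print(sum(c["type"] in (3, 4) for c in picked), "of", len(picked))  # 3 of 6
```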

For type=4 profiles with extra_args (for example mode=tle_dense), the runner may fall back to a retry without the extra args when the generator rejects them, to preserve compatibility with older generators.

For long-running generation:

warn the user that new chat messages can interrupt MCP calls;
if interrupted, use resume=true;
use problem_cleanup_processes only when cleanup is needed.

`problem_verify_tests`

Must pass before packaging. Default checks include:

file_count
answer_consistency
validator
no_empty
limit_ratio
limit_semantics
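Three of these checks are simple enough to sketch directly. The example assumes a hypothetical layout of paired NN.in / NN.ans files and an in-process stand-in for the compiled std, and it compares answers token by token as a simplifying assumption; it illustrates the idea behind file_count, no_empty, and exact-mode answer_consistency, not the tool's actual implementation.

```python
import tempfile
from pathlib import Path
from typing import Callable


def verify_tests(test_dir: Path, expected_count: int, std: Callable[[str], str]) -> dict[str, bool]:
    """Hypothetical versions of file_count, no_empty, and answer_consistency."""
    inputs = sorted(test_dir.glob("*.in"))
    answers = [p.with_suffix(".ans") for p in inputs]
    return {
        # every input has a paired answer and the count matches the plan
        "file_count": len(inputs) == expected_count and all(a.exists() for a in answers),
        # no input or answer file is empty or whitespace-only
        "no_empty": all(p.exists() and p.read_text().strip() != "" for p in inputs + answers),
        # std reproduces each stored answer (token-by-token comparison)
        "answer_consistency": all(
            a.exists() and std(i.read_text()).split() == a.read_text().split()
            for i, a in zip(inputs, answers)
        ),
    }


def sum_std(text: str) -> str:
    """Stand-in std for a toy problem: print the sum of the n listed integers."""
    return f"{sum(map(int, text.split()[1:]))}\n"


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "01.in").write_text("3\n1 2 3\n")
    (root / "01.ans").write_text("6\n")
    result = verify_tests(root, expected_count=1, std=sum_std)
print(result)
```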

With special_judge: true and stress_comparison: "checker" (and a built files/checker), answer_consistency and wrong_solution_kill use the testlib checker against jury answers; with stress_comparison: "exact" they still compare strings to the answer files.

Use wrong_solution_kill when wrong solutions are available. Wrong-solution entries honor the manifest's expected field: the default, fail, means at least one test must reject the binary; pass means every test must accept it (checker AC or an exact match to the .ans file).
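The expected rule reduces to a small predicate over per-test results, where True means the test accepted the binary (checker AC or an exact match to the .ans file). A hypothetical sketch; the function name is illustrative.

```python
def wrong_solution_ok(expected: str, accepted: list[bool]) -> bool:
    """Apply the manifest `expected` rule to per-test acceptance results."""
    if expected == "fail":  # default: at least one test must reject the binary
        return not all(accepted)
    if expected == "pass":  # every test must accept it
        return all(accepted)
    raise ValueError(f"unknown expected value: {expected!r}")


print(wrong_solution_ok("fail", [True, False, True]))  # True: one test kills it
print(wrong_solution_ok("fail", [True, True]))         # False: it survives every test
```

With an empty result list, "fail" evaluates to False, since no test rejected the binary.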

Run autocode-verify <problem_dir> for quick manifest/path checks; under the checker workflow it also surfaces spj_warnings if checker.cpp or the compiled checker is missing.

`problem_pack_polygon`

Only package after problem_verify_tests(passed=true).

Manifest

Each problem should maintain autocode.json as a readable contract. It should describe:

problem name;
interactive or non-interactive mode;
statement/tutorial paths;
time and memory limits;
solution roles (including wrong solutions and optional per-entry expected: fail | pass for wrong_solution_kill);
case plan;
optional SPJ fields: special_judge, stress_comparison (exact | checker), and optional stress_checker_bidirectional (only meaningful with special_judge + stress_comparison=checker).
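The SPJ fields interact, so a small manifest lint is a natural quick check. The sketch below assumes autocode.json holds special_judge, stress_comparison, and stress_checker_bidirectional as top-level keys and that stress_comparison defaults to exact when absent; both are assumptions about the file's layout, and autocode-verify may perform different or additional checks.

```python
import json


def spj_warnings(manifest: dict) -> list[str]:
    """Flag SPJ field combinations that are invalid or have no effect."""
    warnings: list[str] = []
    spj = manifest.get("special_judge", False)
    comparison = manifest.get("stress_comparison", "exact")  # assumed default
    if comparison not in ("exact", "checker"):
        warnings.append(f"stress_comparison must be exact or checker, got {comparison!r}")
    if manifest.get("stress_checker_bidirectional") and not (spj and comparison == "checker"):
        warnings.append("stress_checker_bidirectional has no effect without "
                        "special_judge + stress_comparison=checker")
    return warnings


manifest = json.loads('{"special_judge": false, "stress_checker_bidirectional": true}')
print(spj_warnings(manifest))
```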

Use autocode-verify <problem_dir> for quick structural checks.

Forbidden Actions

Do not build brute before sol.
Do not build generator before validator/interactor gate.
Do not run stress before sol, brute, and generator are ready.
Do not validate/package based on file presence alone.
Do not generate final tests before statement validation passes.
Do not package before problem_verify_tests passes.
Do not ignore hook denial; fix the missing prerequisite instead.

Failure Recovery

When a step fails:

Report decision=no_go.
List blocking_issues.
Identify whether the fault is in statement, std, brute, validator, generator, checker/interactor, or tests.
Fix the failed artifact.
Re-run the failed gate and any downstream gate whose assumptions changed.

Never patch around a failed gate by skipping it.
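Deciding which gates to re-run after a fix is a reachability question over the dependencies between artifacts and gates. The hypothetical sketch below uses a simplified dependency map for a non-interactive problem (checker omitted) and returns every gate that transitively depends on the fixed artifact, in workflow order. It errs on the side of re-running; whether a downstream gate's assumptions actually changed is still a judgment call.

```python
# Hypothetical "artifact or gate -> gates that directly depend on it" map.
DEPENDENTS: dict[str, list[str]] = {
    "statement": ["problem_validate"],
    "std": ["stress_test_run", "problem_generate_tests"],
    "brute": ["stress_test_run"],
    "validator": ["generator_build", "problem_verify_tests"],
    "generator": ["generator_build"],
    "generator_build": ["stress_test_run", "problem_generate_tests"],
    "stress_test_run": ["problem_validate"],
    "problem_validate": ["problem_generate_tests"],
    "problem_generate_tests": ["problem_verify_tests"],
    "problem_verify_tests": ["problem_pack_polygon"],
    "problem_pack_polygon": [],
}
ORDER = list(DEPENDENTS)  # insertion order doubles as workflow order for the gates


def gates_to_rerun(fixed: str) -> list[str]:
    """All gates reachable from `fixed`, i.e. everything downstream of the fix."""
    seen: set[str] = set()
    stack = [fixed]
    while stack:
        for nxt in DEPENDENTS[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return [g for g in ORDER if g in seen]


print(gates_to_rerun("brute"))
# ['stress_test_run', 'problem_validate', 'problem_generate_tests',
#  'problem_verify_tests', 'problem_pack_polygon']
```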
