From autocode
Enforces a gated workflow for competitive programming problem creation using AutoCode MCP tools: problem statements, solutions, validators, generators, stress tests, and Polygon packaging.
Slash command
/autocode:autocode-workflow
AutoCode is a Claude Code plugin for competitive programming problem setting. It exists because AI-generated problems often fail in subtle ways:
The workflow turns AI output into a gated pipeline. Do not skip gates.
Every workflow checkpoint should use:
- decision: go / no_go
- blocking_issues: unmet gates or risks
- next_actions: exact MCP calls needed to proceed

If a gate fails, stop progression and fix first.
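A checkpoint with these three fields can be modelled as a small record. The sketch below is illustrative only: the `Checkpoint` class and the `make_checkpoint` helper are hypothetical names, not part of AutoCode. Only the field names and the go / no_go values come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical record mirroring the three checkpoint fields."""
    decision: str                                   # "go" or "no_go"
    blocking_issues: list = field(default_factory=list)
    next_actions: list = field(default_factory=list)


def make_checkpoint(blocking_issues, next_actions):
    """Derive the decision: any blocking issue forces no_go."""
    decision = "no_go" if blocking_issues else "go"
    return Checkpoint(decision, list(blocking_issues), list(next_actions))


blocked = make_checkpoint(
    blocking_issues=["validator accuracy below 0.9"],
    next_actions=["validator_build"],
)
clear = make_checkpoint(blocking_issues=[], next_actions=["generator_build"])
print(blocked.decision, clear.decision)
```

Deriving `decision` from `blocking_issues`, rather than setting it by hand, keeps a checkpoint from reporting go while issues are still listed.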
Non-interactive problems:
problem_create
-> solution_build(sol)
-> solution_build(brute)
-> solution_analyze / solution_audit_std / solution_audit_brute
-> validator_build(accuracy >= 0.9)
-> generator_build
-> stress_test_run(completed_rounds == total_rounds) (if `special_judge` + `stress_comparison: "checker"`, run `checker_build` first; stress calls `checker(in, sol, brute)`; optional `stress_checker_bidirectional` adds `checker(in, brute, sol)` — see checker prompt)
-> checker_build(if non-exact output; non-SPJ default is after stress)
-> problem_validate
-> problem_generate_tests
-> problem_verify_tests(passed)
-> problem_pack_polygon
Interactive problems:
problem_create
-> solution_build(sol)
-> solution_build(brute)
-> solution_analyze / solution_audit_std / solution_audit_brute
-> interactor_build
-> generator_build
-> stress_test_run
-> problem_validate
-> problem_generate_tests
-> problem_verify_tests(passed)
-> problem_pack_polygon
The authoritative implementation is scripts/workflow_guard.py.
Interactive tasks are not just "no validator". They require an explicit protocol contract in the statement and a testlib interactor that can reject protocol violations. Treat missing protocol semantics as a blocker, the same severity as a missing validator for a non-interactive task.
| Gate | Requirement |
|---|---|
| Problem setup | problem_create must create directory structure and autocode.json |
| Standard solution | solution_build(solution_type="sol") succeeds |
| Brute solution | solution_build(solution_type="brute") succeeds after sol |
| Complexity audit | solution_analyze, solution_audit_std, and solution_audit_brute reviewed |
| Validator | Non-interactive only: validator_build returns valid accuracy >= 0.9 |
| Interactor | Interactive only: interactor_build is ready |
| Generator | generator_build succeeds after validator/interactor gate |
| Stress | stress_test_run completes all rounds |
| Statement validation | problem_validate passes samples and sample files |
| Final tests | problem_generate_tests creates final tests |
| Test verification | problem_verify_tests passes before packaging |
| Packaging | problem_pack_polygon only after verified tests |
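The ordering these gates impose can be pictured as a prerequisite check. The following is a minimal sketch, not the logic of scripts/workflow_guard.py: the `NON_INTERACTIVE_ORDER` list and the `missing_gates` helper are hypothetical, and the pipeline is simplified to a linear order without the audit tools or checker_build.

```python
# Simplified linear order of the non-interactive pipeline (audit tools and
# checker_build are left out to keep the sketch short).
NON_INTERACTIVE_ORDER = [
    "problem_create",
    "solution_build(sol)",
    "solution_build(brute)",
    "validator_build",
    "generator_build",
    "stress_test_run",
    "problem_validate",
    "problem_generate_tests",
    "problem_verify_tests",
    "problem_pack_polygon",
]


def missing_gates(step, passed, order=NON_INTERACTIVE_ORDER):
    """Return the earlier gates that have not passed yet, in pipeline order."""
    index = order.index(step)
    return [gate for gate in order[:index] if gate not in passed]


passed = {"problem_create", "solution_build(sol)", "solution_build(brute)"}
print(missing_gates("generator_build", passed))   # the validator gate is unmet
print(missing_gates("validator_build", passed))   # nothing blocks this step
```

A non-empty result maps naturally onto blocking_issues for a no_go checkpoint.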
Use these agents when the risk is material:
- autocode-idea-auditor: before implementation, especially if the idea has unclear constraints, multiple valid outputs, or interaction.
- autocode-solution-auditor: after std/brute exist, before relying on stress results or final generation.
- autocode-package-auditor: before problem_pack_polygon, especially when wrong solutions, checker, interactor, or custom answer extension are involved.

problem_create

Creates:
- autocode.json
- solutions/
- files/
- statements/README.md
- statements/tutorial.md
- tests/

Statement format requirement for statements/README.md:
Do not infer that a problem is ready from file presence alone. Prefer structured tool results and workflow state.
solution_build

Build sol before brute.
For non-trivial problems, brute must be independent enough to serve as an oracle. If it is the same algorithm as sol, mark this as a risk and run solution_audit_brute.
Before writing brute, perform a quick counterexample check on paper:
solution_analyze and audit tools

Use:
- solution_analyze to estimate time/space complexity, risk notes, and recommended stress profiles.
- solution_audit_std to check std complexity and constraint mismatch.
- solution_audit_brute to check whether brute can support stress testing.

Do not accept a claimed complexity without evidence.
validator_build

Non-interactive problems need a validator with evidence. A validator build without effective accuracy is not enough.

Target:
- accuracy >= 0.9

interactor_build

Interactive problems use interactor_build instead of validator_build and checker_build.
Require an explicit interaction protocol in the statement before final packaging. The protocol must define:
Interactor implementation requirements:
- registerInteraction(argc, argv);
- inf and optional jury data from ans;
- tout, not std::cout;
- tout.flush() after every message to the contestant;
- ouf;
- quitf(_ok/_wa/_pe/_fail, ...);
- interaction_scenarios in interactor_build that cover AC, wrong final answer, malformed command, out-of-range query, query-limit boundary, exceeding query limit, and premature EOF.

generator_build

Generator should implement semantically distinct strategies:
- type=1: tiny / exhaustive / sanity;
- type=2: random coverage;
- type=3: boundary and extreme constraints;
- type=4: targeted worst-case or TLE-inducing patterns.

type=4 must not be only "same as type=3 but with max parameters".
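To make the four strategies concrete, here is a minimal sketch for a made-up array problem. Everything problem-specific is an assumption: the constraints `N_MAX` and `V_MAX`, the `generate` function, and the choice of a strictly decreasing array as the worst case. Real generators in this workflow may be written quite differently, so the sketch only illustrates how the four types differ in intent.

```python
import random

N_MAX = 200_000      # hypothetical constraint on the array length
V_MAX = 10**9        # hypothetical constraint on each value


def generate(gen_type, seed):
    """Return one test (a list of ints) for a hypothetical array problem."""
    rng = random.Random(seed)          # seeded, so every test is reproducible
    if gen_type == 1:                  # tiny / sanity: small n, small values
        n = rng.randint(1, 5)
        return [rng.randint(1, 5) for _ in range(n)]
    if gen_type == 2:                  # random coverage over a mid-size range
        n = rng.randint(1, 1000)
        return [rng.randint(1, V_MAX) for _ in range(n)]
    if gen_type == 3:                  # boundaries: max n, extreme values only
        return [rng.choice((1, V_MAX)) for _ in range(N_MAX)]
    if gen_type == 4:                  # targeted worst case, not just "max n":
        # a strictly decreasing run, assumed here to be the pattern that
        # hurts a naive solution to this made-up problem
        return list(range(N_MAX, 0, -1))
    raise ValueError(f"unknown generator type: {gen_type}")


for t in (1, 2, 3, 4):
    test = generate(t, seed=1)
    print(t, len(test), min(test), max(test))
```

The point of the type=4 branch is that it encodes a structural pattern chosen for the problem, whereas type=3 only pushes sizes and values to their limits.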
stress_test_run

Use multiple profiles when possible:
- tiny_exhaustive
- random_small
- edge_small

Use stress_test_run advisory fields (complexity_context, n_max_advisory) to choose n_max based on audit evidence instead of fixed thresholds.
Proceed only when completed_rounds == total_rounds.
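The completed_rounds == total_rounds condition can be illustrated with a small in-process stress loop. This is a sketch under assumptions, not how stress_test_run is implemented: the sample problem (maximum subarray sum), the `sol`, `brute`, and `stress` functions, and the `counterexample` key are all hypothetical. Only completed_rounds and total_rounds are names taken from the text.

```python
import random


def sol(a):
    """Fast reference: maximum subarray sum (Kadane's algorithm)."""
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def brute(a):
    """Independent O(n^2) oracle: try every subarray explicitly."""
    n = len(a)
    best = a[0]
    for i in range(n):
        total = 0
        for j in range(i, n):
            total += a[j]
            best = max(best, total)
    return best


def stress(total_rounds, n_max, seed=0):
    """Run sol against brute and stop at the first mismatch."""
    rng = random.Random(seed)
    completed_rounds = 0
    for _ in range(total_rounds):
        a = [rng.randint(-10, 10) for _ in range(rng.randint(1, n_max))]
        if sol(a) != brute(a):
            return {"completed_rounds": completed_rounds,
                    "total_rounds": total_rounds, "counterexample": a}
        completed_rounds += 1
    return {"completed_rounds": completed_rounds,
            "total_rounds": total_rounds, "counterexample": None}


result = stress(total_rounds=200, n_max=8)
print(result["completed_rounds"] == result["total_rounds"])
```

Because the loop stops at the first disagreement, a run with completed_rounds below total_rounds always carries a counterexample to investigate before moving on.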
problem_validate

Validation failure is a release blocker. Do not generate final tests or package until statement samples and sample files pass.

problem_generate_tests

At least half of the final tests should be limit-oriented cases (type=3 + type=4) when candidates are available.
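One way to honour the "at least half limit-oriented" rule when selecting from a candidate pool is sketched below. The `pick_final_tests` function and the (name, gen_type) candidate format are hypothetical; the actual selection logic of problem_generate_tests is not described here.

```python
def pick_final_tests(candidates, total):
    """Pick `total` tests, at least half limit-oriented when available.

    `candidates` is a list of (name, gen_type) pairs; gen_type 3 and 4
    count as limit-oriented.
    """
    limit = [c for c in candidates if c[1] in (3, 4)]
    other = [c for c in candidates if c[1] not in (3, 4)]
    need_limit = (total + 1) // 2              # ceil(total / 2)
    chosen = limit[:need_limit]
    chosen += other[:total - len(chosen)]
    # Fill any remaining slots with leftover limit-oriented tests.
    chosen += limit[need_limit:need_limit + total - len(chosen)]
    return chosen


pool = [(f"t{i}", 1 + i % 4) for i in range(12)]   # three tests of each type
final = pick_final_tests(pool, total=6)
print(sum(1 for _, t in final if t in (3, 4)), "of", len(final))
```

If the pool holds too few type=3 and type=4 candidates, the sketch falls back to the other types, matching the "when candidates are available" qualifier.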
For type=4 profiles with extra_args (for example mode=tle_dense), the runner may fall back to a retry without extra_args when the generator rejects them, which preserves compatibility with older generators.
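The fallback can be sketched as a single retry. This is an in-process analogy only: the `run_generator` wrapper and both stand-in generators are hypothetical, and treating a `TypeError` or `ValueError` as "the generator rejected the arguments" is an assumption made for the sketch.

```python
def run_generator(generator, gen_type, seed, extra_args=None):
    """Call the generator; on rejected extra_args, retry once without them.

    Returns (output, used_extra_args).
    """
    if extra_args:
        try:
            return generator(gen_type, seed, **extra_args), True
        except (TypeError, ValueError):
            pass  # older generator: does not understand these arguments
    return generator(gen_type, seed), False


def old_generator(gen_type, seed):
    """Stand-in for an older generator that accepts no extra arguments."""
    return f"type={gen_type} seed={seed}"


def new_generator(gen_type, seed, mode="default"):
    """Stand-in for a newer generator that understands mode=..."""
    return f"type={gen_type} seed={seed} mode={mode}"


print(run_generator(old_generator, 4, 1, {"mode": "tle_dense"}))
print(run_generator(new_generator, 4, 1, {"mode": "tle_dense"}))
```

Returning the `used_extra_args` flag makes it visible when a test was produced by the plain retry, so a weakened type=4 case is not mistaken for the targeted one.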
For long-running generation:
- resume=true;
- problem_cleanup_processes only when cleanup is needed.

problem_verify_tests

Must pass before packaging. Default checks include:
- file_count
- answer_consistency
- validator
- no_empty
- limit_ratio
- limit_semantics

With special_judge: true and stress_comparison: "checker" (and a built files/checker), answer_consistency and wrong_solution_kill use the testlib checker against jury answers; with stress_comparison: "exact" they still compare strings to the answer files.
Use wrong_solution_kill when wrong solutions are available. Wrong entries honor manifest expected: default fail means at least one test must reject the binary; pass means all tests must accept it (checker AC or exact match to .ans).
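The two expected modes reduce to an any/all distinction over per-test verdicts. The sketch below assumes a hypothetical `wrong_solution_ok` helper and a plain list of booleans as input; how AutoCode actually collects verdicts is not shown.

```python
def wrong_solution_ok(verdicts, expected="fail"):
    """Check one wrong-solution entry against its per-test verdicts.

    `verdicts` holds one boolean per test: True if the test accepted the
    binary's output (checker AC or exact match to .ans), False otherwise.
    """
    if expected == "fail":
        return not all(verdicts)       # at least one test must reject
    if expected == "pass":
        return all(verdicts)           # every test must accept
    raise ValueError(f"unknown expected value: {expected}")


print(wrong_solution_ok([True, True, False]))            # killed by one test
print(wrong_solution_ok([True, True, True]))             # survived every test
print(wrong_solution_ok([True, True, True], "pass"))     # accepted as required
```

With the default expected: fail, a wrong solution that survives every test signals a gap in the test set, not a success.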
Run autocode-verify <problem_dir> for quick manifest/path checks; under the checker workflow it also surfaces spj_warnings if checker.cpp or the compiled checker is missing.
problem_pack_polygon

Only package after problem_verify_tests(passed=true).
Each problem should maintain autocode.json as a readable contract. It should describe:
- wrong solutions and optional per-entry expected: fail | pass for wrong_solution_kill);
- special_judge, stress_comparison (exact | checker), and optional stress_checker_bidirectional (only meaningful with special_judge + stress_comparison=checker).

Use autocode-verify <problem_dir> for quick structural checks.
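The dependency between the judge-related fields can be checked mechanically. The following sketch assumes the three fields are top-level keys in autocode.json and that missing keys default to `false` and `"exact"`. Both points are assumptions, as is the `check_judge_config` helper itself; this is not what autocode-verify does.

```python
import json


def check_judge_config(config):
    """Return warnings for inconsistent judge-related fields."""
    warnings = []
    special_judge = config.get("special_judge", False)      # assumed default
    comparison = config.get("stress_comparison", "exact")   # assumed default
    if comparison not in ("exact", "checker"):
        warnings.append(f"unknown stress_comparison: {comparison!r}")
    uses_checker = special_judge and comparison == "checker"
    if config.get("stress_checker_bidirectional", False) and not uses_checker:
        warnings.append(
            "stress_checker_bidirectional has no effect without "
            "special_judge + stress_comparison=checker"
        )
    return warnings


config = json.loads("""
{
  "special_judge": true,
  "stress_comparison": "exact",
  "stress_checker_bidirectional": true
}
""")
print(check_judge_config(config))
```

In this example the bidirectional flag is set while comparison is still exact, so the sketch reports one warning.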
- problem_verify_tests passes.

When a step fails:
- decision=no_go.
- blocking_issues.

Never patch around a failed gate by skipping it.
npx claudepluginhub sztu-acm/autocode