From arbor
Coordinates the Arbor research loop: persistent ReAct cycle with Idea Tree state, INIT/OBSERVE/IDEATE/SELECT/DISPATCH/DECIDE protocol, tool mapping, and cycle caps. Use after setup and before phase-specific skills.
Slash command: /arbor:arbor-agent-coordinator
Use this to run the strategic loop. The coordinator is a research commander, not the code author.
INIT: Run once at the start unless resuming.
Call TreeSetMeta to persist:
baseline_score, trunk_score, eval_cmd, eval_cmd_test,
dataset_info, metric_direction, trunk_branch, and any
timeout/retry settings. If an eval_contract is available, prefill the matching
metadata from it. If resuming, skip INIT and call TreeView to re-orient.
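A minimal sketch of the prefill step, assuming the eval_contract is a flat dictionary whose keys already use the TreeSetMeta field names. The helper name `prefill_meta` and the `META_KEYS` tuple are hypothetical; the actual contract format is not specified here.

```python
from typing import Optional

# Hypothetical: key names mirror the TreeSetMeta fields listed above; the
# flat-dict shape of eval_contract is an assumption, not a documented format.
META_KEYS = (
    "baseline_score", "trunk_score", "eval_cmd", "eval_cmd_test",
    "dataset_info", "metric_direction", "trunk_branch",
    "eval_timeout", "eval_retries",
)


def prefill_meta(eval_contract: dict, overrides: Optional[dict] = None) -> dict:
    """Copy the metadata keys the contract already defines; explicit overrides win."""
    meta = {k: eval_contract[k] for k in META_KEYS if k in eval_contract}
    meta.update(overrides or {})
    return meta


contract = {"eval_cmd": "python eval.py --split dev", "metric_direction": "maximize", "notes": "not metadata"}
meta = prefill_meta(contract, {"baseline_score": 0.71})
```

Keys the contract does not mention stay unset, so a measured baseline can still be filled in separately rather than being overwritten by a placeholder.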
If the run is smoke-only, do not run expensive baselines or inherited real
eval commands. Persist a cheap cached-score parser or explicitly mocked score
as the eval command, set short timeout metadata, and mark dataset_info and
node reports as smoke-only.
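One way to build the cheap smoke-only eval command is a tiny script that prints a cached score when one exists and an explicitly mocked value otherwise. This is a sketch under assumptions: the cache file name `smoke_score.json`, its `{"score": ...}` format, and the mock value are all hypothetical.

```python
import json
import sys
from pathlib import Path

# Explicitly fake value for smoke runs; it must never be reported as a real result.
MOCK_SCORE = 0.5


def cached_score(cache_path: str) -> float:
    """Return a previously recorded score if the cache file exists, else the mock."""
    path = Path(cache_path)
    if path.is_file():
        # Assumed cache format: {"score": <number>}
        return float(json.loads(path.read_text())["score"])
    return MOCK_SCORE


if __name__ == "__main__":
    cache = sys.argv[1] if len(sys.argv) > 1 else "smoke_score.json"
    # A single "score: <value>" line keeps metric parsing trivial.
    print(f"score: {cached_score(cache)}")
    print("# smoke-only run: score is cached or mocked")
```

Because the script runs in milliseconds, a short eval_timeout is safe, and the second output line keeps the mocked nature visible in any captured log.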
OBSERVE: Read code, logs, prior experiment reports, tree insights, failure cases, and
score patterns. Focus on failure classes and bottlenecks, not just symptoms.
For large logs, use arbor_state.py parse-log or normalize carriage returns
before matching metric lines. Do not flood context with full training logs
during smoke or forward tests.
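The carriage-return issue can be illustrated with a small parser. Progress bars redraw a line with a bare `\r`, so a metric line can end up glued to bar fragments and slip past a line-anchored pattern. This sketch assumes metric lines look like `name: value` or `name=value` on their own line; it is not the behavior of `arbor_state.py parse-log`, whose output format is not described here.

```python
import re
from typing import Optional

# Assumed metric line shape: "<name>: <number>" or "<name>=<number>" on its own line.
METRIC_LINE = re.compile(
    r"^\s*(?P<name>[A-Za-z_]\w*)\s*[:=]\s*(?P<value>[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)


def last_metric(raw_log: str, name: str) -> Optional[float]:
    """Return the last value logged for `name`, or None if it never appears."""
    # Turn every '\r' into '\n' so redrawn progress-bar fragments and the
    # metric line that followed them become separate lines.
    text = raw_log.replace("\r\n", "\n").replace("\r", "\n")
    value = None
    for line in text.split("\n"):
        m = METRIC_LINE.match(line)
        if m and m.group("name") == name:
            value = float(m.group("value"))
    return value
```

Returning only the last matching value also keeps context small: the coordinator sees one number instead of the full training log.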
IDEATE: Call TreeView(format="constraints") first, then load arbor-agent-ideate.
Depth semantics:
SELECT: Choose pending leaves using evidence, expected impact, feasibility, diversity,
and recoverable failure modes. Use TreeView(format="pending") or the compact view.
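The selection criteria can be sketched as a greedy ranking. Everything here is an assumption for illustration: the 0..1 per-leaf ratings, multiplying them into one score, and using a same-parent penalty as a stand-in for diversity. Recoverable failure modes are not modeled in this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Leaf:
    node_id: str
    parent_id: str
    evidence: float         # hypothetical 0..1: how well observations support the idea
    expected_impact: float  # hypothetical 0..1: estimated gain if it works
    feasibility: float      # hypothetical 0..1: chance it can be built and evaluated


def pick_leaves(leaves: List[Leaf], k: int, same_parent_penalty: float = 0.5) -> List[str]:
    """Greedily pick k pending leaves, discounting siblings of earlier picks."""
    chosen: List[Leaf] = []
    remaining = list(leaves)
    while remaining and len(chosen) < k:
        picked_parents = {c.parent_id for c in chosen}

        def score(leaf: Leaf) -> float:
            base = leaf.evidence * leaf.expected_impact * leaf.feasibility
            # Diversity proxy: ideas sharing a parent with a pick are discounted.
            return base * (same_parent_penalty if leaf.parent_id in picked_parents else 1.0)

        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return [c.node_id for c in chosen]
```

A multiplicative score means a leaf that is near-infeasible cannot win on impact alone, which matches the intent of weighing feasibility alongside expected gain.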
DISPATCH: Load arbor-agent-executor and dispatch:
- a single node: RunExecutor(node_id, additional_context=...)
- several nodes at once: RunExecutorParallel(tasks=[...]), usually 2-4 tasks
Executors auto-update node status, score, insight, result, branch, artifacts,
and propagated ancestor insights. If an executor extracts any of these
incorrectly, correct it with TreeUpdateNode.
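A sketch of how selected nodes might be grouped into dispatch calls. The result is plain data describing which tool to call; the task dictionary shape (`node_id`, `additional_context`) is an assumption modeled on the RunExecutor signature, not a confirmed RunExecutorParallel schema.

```python
from typing import Dict, List, Tuple

MAX_PARALLEL = 4  # upper end of the "usually 2-4 tasks" guidance


def plan_dispatch(node_ids: List[str], context: Dict[str, str]) -> List[Tuple[str, dict]]:
    """Group nodes into tool calls: one node -> RunExecutor, more -> RunExecutorParallel."""
    calls: List[Tuple[str, dict]] = []
    for start in range(0, len(node_ids), MAX_PARALLEL):
        batch = node_ids[start:start + MAX_PARALLEL]
        # Assumed task shape, mirroring RunExecutor(node_id, additional_context=...).
        tasks = [{"node_id": n, "additional_context": context.get(n, "")} for n in batch]
        if len(tasks) == 1:
            calls.append(("RunExecutor", tasks[0]))
        else:
            calls.append(("RunExecutorParallel", {"tasks": tasks}))
    return calls
```

Capping batch size keeps a single dispatch from consuming most of a cycle budget at once.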
DECIDE: Scores in the tree are absolute B_dev metric values, not deltas.
Use arbor-agent-merge-eval for merge decisions.
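Because stored scores are absolute, any "is this better?" check has to compute the gain itself and respect metric_direction. A minimal sketch, assuming metric_direction takes the values `"maximize"` or `"minimize"` (the actual allowed values are not stated here) and a hypothetical `min_gain` threshold:

```python
def improvement(candidate: float, trunk: float, metric_direction: str) -> float:
    """Signed gain of an absolute candidate score over the trunk (positive = better)."""
    if metric_direction == "maximize":
        return candidate - trunk
    if metric_direction == "minimize":
        return trunk - candidate
    raise ValueError(f"unknown metric_direction: {metric_direction!r}")


def beats_trunk(candidate: float, trunk: float, metric_direction: str, min_gain: float = 0.0) -> bool:
    """True if the candidate improves on the trunk by more than min_gain."""
    return improvement(candidate, trunk, metric_direction) > min_gain
```

Keeping the delta out of the tree avoids stale comparisons: when the trunk score changes after a merge, every node's gain can be recomputed from its stored absolute score.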
Before stopping, run final B_test only if it is available, the contract permits
it, and the run is not smoke-only. Record test_trunk_score when the final
test run is valid.
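The three conditions for the final B_test run, and the rule that only a valid run is recorded, fit in one small function. This sketch assumes a caller-supplied `run_test` callable (hypothetical) that returns a score, or `None` when the run is invalid.

```python
from typing import Callable, Optional


def final_test_update(
    meta: dict,
    contract_allows_test: bool,
    smoke_only: bool,
    run_test: Callable[[str], Optional[float]],
) -> dict:
    """Return the metadata to persist after the optional final B_test run."""
    if not meta.get("eval_cmd_test") or not contract_allows_test or smoke_only:
        return {}  # skip: no test command, contract forbids it, or smoke-only run
    score = run_test(meta["eval_cmd_test"])
    if score is None:
        return {}  # invalid run (e.g. timeout or unparsable output): record nothing
    return {"test_trunk_score": score}
```

Returning an empty update on every skip path means a smoke-only or forbidden run can never leave a misleading test_trunk_score behind.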
Node statuses are:
- pending
- running
- done
- merged
- pruned
Each node stores:
- id, parent_id, children_ids, depth
- hypothesis
- status
- insight
- result
- score
- code_ref
- related_work
Tree metadata stores:
- baseline_score, trunk_score
- test_baseline_score, test_trunk_score
- eval_cmd, eval_cmd_test
- eval_timeout, eval_retries, retry backoff
- dataset_info
- metric_direction
- trunk_branch
- submission_path, sample_submission_path
Native Arbor tools:
- TreeView: compact/full/node/pending/constraints views.
- TreeAddNode: add a child with a generated id.
- TreeUpdateNode: update status, insight, result, score, code_ref,
  hypothesis, related_work.
- TreeSetMeta: persist evaluation metadata.
- TreePrune: mark a subtree pruned.
- TreePropagate: synthesize child insights upward.
- RunExecutor, RunExecutorParallel: run implementation agents.
- GitMergeBranch: verify on B_test and merge.
- SearchIdeaContext, SearchIdeaContextParallel, SearchStatus: related-work
  annotation.
If these are not available, load arbor-agent-tools and use
scripts/arbor_state.py as the state backend.
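The node fields and the add/prune operations above can be modeled in a few lines. This is an illustrative in-memory sketch, not Arbor's storage format: the `parent.N` id scheme is hypothetical, and pruning every descendant regardless of its current status is an assumption about TreePrune's semantics.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    id: str
    parent_id: Optional[str]
    depth: int
    hypothesis: str
    status: str = "pending"
    children_ids: List[str] = field(default_factory=list)
    insight: str = ""
    result: str = ""
    score: Optional[float] = None  # absolute B_dev value, not a delta
    code_ref: str = ""
    related_work: str = ""


def add_child(tree: Dict[str, Node], parent_id: str, hypothesis: str) -> Node:
    """Add a pending child; the "<parent>.<index>" id scheme is hypothetical."""
    parent = tree[parent_id]
    child = Node(
        id=f"{parent_id}.{len(parent.children_ids) + 1}",
        parent_id=parent_id,
        depth=parent.depth + 1,
        hypothesis=hypothesis,
    )
    parent.children_ids.append(child.id)
    tree[child.id] = child
    return child


def prune_subtree(tree: Dict[str, Node], node_id: str) -> None:
    """Mark a node and all of its descendants pruned (iterative depth-first walk)."""
    stack = [node_id]
    while stack:
        node = tree[stack.pop()]
        node.status = "pruned"
        stack.extend(node.children_ids)
```

An explicit stack instead of recursion keeps deep idea chains from hitting Python's recursion limit.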
When using the fallback helper, serialize tree-mutating commands for the same
run: do not launch the meta, add, update, prune, propagate, eval, record,
worktree, or merge subcommands in parallel.
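If several processes might still issue fallback commands, an exclusive file lock is one way to force the mutating ones into single file. A POSIX-only sketch (it uses `fcntl`); the lock file name is hypothetical, and the caller supplies the full `arbor_state.py` argv, since its flags are not described here.

```python
import fcntl
import subprocess
from typing import List

# Subcommand names taken from the list above.
MUTATING = {"meta", "add", "update", "prune", "propagate", "eval", "record", "worktree", "merge"}


def run_state_cmd(argv: List[str], subcommand: str,
                  lock_path: str = ".arbor_state.lock") -> subprocess.CompletedProcess:
    """Run a fallback state command; mutating subcommands hold an exclusive lock."""
    if subcommand not in MUTATING:
        return subprocess.run(argv, capture_output=True, text=True)
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until no other writer holds it
        try:
            return subprocess.run(argv, capture_output=True, text=True)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Use one lock path per run so unrelated runs do not block each other; read-only commands skip the lock entirely.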
Cycle caps: a cycle counts once its node is done, merged, pruned, or failed. When the hard cap is reached, launch no more executors. Instead, finalize: merge the best verified branch if it passes, otherwise stop and report.
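The cap rule reduces to counting terminal statuses. This sketch also reserves currently running nodes against the cap so a parallel dispatch cannot overshoot it; that reservation is a cautious design choice, not something the text above requires.

```python
from typing import List

TERMINAL = {"done", "merged", "pruned", "failed"}


def cycles_used(statuses: List[str]) -> int:
    """Count nodes that reached a terminal status; each one is a finished cycle."""
    return sum(1 for s in statuses if s in TERMINAL)


def launch_budget(statuses: List[str], hard_cap: int) -> int:
    """How many more executors may start without exceeding the hard cap.

    Running nodes are reserved against the cap as well (a cautious assumption),
    so a batch of parallel executors cannot push the total past it.
    """
    running = sum(1 for s in statuses if s == "running")
    return max(0, hard_cap - cycles_used(statuses) - running)
```

When `launch_budget` reaches zero, the coordinator moves straight to finalization instead of selecting more leaves.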
Use human questions only when genuinely blocked on information that cannot be
discovered locally. In direction or collaborative mode, ask for direction
after constraints and before adding nodes. In review modes, respect skipped
or edited ideas and executor gates.
npx claudepluginhub ruc-nlpir/arbor --plugin arbor