Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

Arbor is an autonomous research agent that turns a long-horizon objective into a cumulative search. Give it a benchmark and a goal; it proposes hypotheses, edits code, runs real experiments, learns from the results, and keeps the improvements that hold up on held-out data. Instead of one-shot attempts that forget what failed, Arbor grows a hypothesis tree: every idea becomes a branch — pruned if it fails, harvested if it works — and insights propagate back so later ideas start smarter.
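The branch, prune, and harvest loop can be pictured as a small tree structure. The following is a minimal Python sketch, not Arbor's implementation. The class `IdeaNode`, its `branch`/`record`/`context` methods, and the status labels are hypothetical names chosen for illustration. Insight propagation is modeled simply as copying each insight to every ancestor, so a new branch starts from everything its ancestors have learned.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IdeaNode:
    """One hypothesis in the tree (hypothetical structure, not Arbor's code)."""
    hypothesis: str
    # Back-link to the parent; excluded from repr/eq to avoid cyclic recursion.
    parent: Optional[IdeaNode] = field(default=None, repr=False, compare=False)
    children: list[IdeaNode] = field(default_factory=list)
    status: str = "open"  # "open", "pruned", or "harvested"
    insights: list[str] = field(default_factory=list)

    def branch(self, hypothesis: str) -> IdeaNode:
        """Grow a child hypothesis under this node."""
        child = IdeaNode(hypothesis, parent=self)
        self.children.append(child)
        return child

    def record(self, improved: bool, insight: str) -> None:
        """Harvest or prune this node, then push its insight to every ancestor."""
        self.status = "harvested" if improved else "pruned"
        self.insights.append(insight)
        node = self.parent
        while node is not None:
            node.insights.append(insight)
            node = node.parent

    def context(self) -> list[str]:
        """Insights visible here: those inherited from ancestors plus this node's own."""
        inherited = self.parent.context() if self.parent else []
        # dict.fromkeys removes duplicates while keeping first-seen order.
        return list(dict.fromkeys(inherited + self.insights))


root = IdeaNode("Improve dev-split accuracy")
a = root.branch("Raise the learning rate")
a.record(improved=False, insight="lr above 1e-3 diverges")
b = root.branch("Add learning-rate warmup")
print(a.status)     # pruned
print(b.context())  # ['lr above 1e-3 diverges']
```

Here the sibling branch `b` sees the failure recorded under `a` because it reached their shared parent. That is the property the paragraph describes: later ideas start smarter instead of repeating a known dead end.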

For more details, visit our project page and read the paper; for the full usage manual, see our documentation. 🧭 Arbor is available both as a native CLI and as an Agent Skill Suite, so you can choose whichever fits your environment and workflow.

🎬 Demo

https://github.com/user-attachments/assets/49c1a306-d2e9-49d6-9c83-65e38a62df30

📣 News

2026-06 — Built-in literature search & idea novelty checks. Arbor can now ground its research in prior work via the public alphaXiv API — zero config, no search endpoint or key. Novelty-check any idea before you build it with arbor idea-check "<your idea>", or let the Coordinator vet every new branch automatically. See Literature Search & Novelty Checks. 🔎

2026-06 — Arbor was featured by VentureBeat, one of the leading tech media outlets in the US: "New AI optimization framework beats Claude Code and Codex by 2.5x on the same compute budget". 📰

2026-06 — Arbor's native CLI runtime and Agent Skill Suite (Codex / Claude Code) are released. 🚀

2026-06 — The Arbor paper is released on arXiv. 🎉

💡 Why Arbor

General-purpose optimization — optimizes any task with a target to improve and a metric to measure, from model training to harness engineering to data synthesis.

Long-horizon structured exploration — the hypothesis-tree framework keeps results, failure modes, and distilled insights in the Idea Tree and propagates them upward, so later ideas start smarter instead of scrolling off.

Real experiment discipline — Executors iterate on a dev split, validate on a held-out test split, and only merge gains that clear a configurable margin — each in its own git worktree, so main is never touched until you merge.

Literature-grounded ideas — keyless search backends (alphaXiv + web) check an idea's novelty and prior art before spending compute, via node verdicts or arbor idea-check.

Model and workflow flexibility — Anthropic, OpenAI / Responses API, and OpenAI-compatible backends via LiteLLM (DeepSeek, Gemini, Qwen, vLLM, Ollama, …), usable as a native CLI or an Agent Skill Suite inside Codex / Claude Code.

Steerable — a live dashboard, read-only WebUI, optional human-in-the-loop review, and one-line domain plugins let you steer runs without touching the core.
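The experiment discipline described above amounts to a gate on held-out data. The sketch below is an illustrative assumption, not Arbor's actual merge logic or configuration. `Result`, `should_merge`, and the `margin` and `higher_is_better` parameters are hypothetical, "clears the margin" is read as a strict inequality, and the git worktree isolation is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Scores for one candidate branch (hypothetical structure)."""
    dev_score: float   # metric on the dev split the Executor iterated on
    test_score: float  # metric on the held-out test split


def should_merge(baseline: Result, candidate: Result, margin: float,
                 higher_is_better: bool = True) -> bool:
    """Accept a candidate only if its held-out test gain strictly exceeds `margin`.

    The dev score is deliberately ignored: it was used for iteration, so it
    is not trusted as evidence of a real improvement.
    """
    gain = candidate.test_score - baseline.test_score
    if not higher_is_better:
        gain = -gain  # for loss or latency, a decrease counts as a gain
    return gain > margin


baseline = Result(dev_score=0.80, test_score=0.78)
overfit = Result(dev_score=0.90, test_score=0.781)  # big dev gain, tiny test gain
solid = Result(dev_score=0.84, test_score=0.80)
print(should_merge(baseline, overfit, margin=0.005))  # False
print(should_merge(baseline, solid, margin=0.005))    # True
```

The first candidate looks best on the dev split but is rejected because its improvement does not transfer to the held-out split. Basing the decision only on test-split gains is what keeps overfitted branches out of main.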

🧩 Framework

Arbor runs two cooperating agents:

Similar Plugins

autoresearch

claude-adaptive-research

omp

archora-research

clab

gyoshu
