From factor-researcher
Runs FactorMiner benchmark workflows: Table 1 Top-K freeze benchmark, memory/strategy ablations, transaction-cost pressure tests, and the full suite. Use to compare against baselines or reproduce paper results.
How this skill is triggered — by the user, by Claude, or both
Slash command
/factor-researcher:factor-benchmarkThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
This skill runs FactorMiner's canonical benchmark surface — the rigorous comparison layer that turns a single run into evidence.
This skill runs FactorMiner's canonical benchmark surface — the rigorous comparison layer that turns a single run into evidence.
| Mode | What it answers |
|---|---|
table1 | Top-K freeze benchmark across configured universes vs. baselines — the headline reproduction. |
ablation-memory | How much does experience memory contribute? |
ablation-strategy | Effect of memory policy × dependence metric × backend. |
cost-pressure | How does the library hold up under rising transaction costs? |
efficiency | Operator- and factor-level runtime/compute cost. |
suite | The full benchmark suite in one run. |
factorminer -o output/bench benchmark table1 --data path/to/market_data.csv
factorminer -o output/bench benchmark suite --data path/to/market_data.csv
Pass a pre-mined library to benchmark a specific run rather than mining fresh:
factorminer -o output/bench benchmark table1 \
--data market_data.csv \
--factor-miner-library output/run1/factor_library.json
efficiency takes no data — it profiles the engine itself.
The CLI prints a per-universe summary (library IC, ICIR, avg |ρ|) and writes JSON payloads into the output directory. Fold those JSON files into the research note with factor-report --benchmark.
cost-pressure is the honesty check: a library that only wins at zero cost is not a result.npx claudepluginhub minihellboy/factorminer --plugin factor-researcherRuns the FactorMiner research engine to discover alpha factors from validated datasets via Ralph or Helix loops with causal validation, regime conditioning, and debate generation.
Designs structured benchmarks comparing algorithms, models, or implementations with metrics, test cases, hardware context, and tradeoff analysis to produce reproducible performance reports.
Guides building production-grade backtesting systems with realistic cost models, point-in-time data pipelines, and walk-forward analysis to avoid common biases.