curry-train

Name: curry-train
Author: curryfromuestc

Methodology-first deep learning training framework. Idea is cheap; infrastructure that lets you validate ideas fast is valuable.

npx claudepluginhub curryfromuestc/curry-train --plugin curry-train

What's Inside

Agents (4)

failure-diagnoser agent

/failure-diagnoser

Use this agent to diagnose a failed or stalled training run by reading recent logs, metrics, config, and traces. Trigger when the user asks "why did this crash", reports NaN/OOM/divergence, or shows an unhealthy loss curve. Produces a structured diagnosis with ranked candidate causes and fixes.

hpo-proposer agent

/hpo-proposer

Use this agent to propose an Optuna search space and kick off a hyperparameter study. Trigger when the user asks to "tune hyperparameters", "set up an Optuna sweep", "search over LR and weight decay", or "find the best hyperparameters for this experiment". Operates only on configs that already passed Stage 3 pre-validation.

runs-diff agent

/runs-diff

Use this agent to compare two completed training runs and produce a concise variance-aware markdown diff (config, metrics, stability, verdict). Trigger when the user asks "did this change help", "is run A better than run B", or "compare two experiments". Reads run journals only; does not re-run training.

scaffolder agent

/scaffolder

Use this agent to scaffold a new model package (config.py, model.py, checkpoint.py, protocol.py + Hydra config) inside a curryTrain project. Trigger when the user asks to "add a new model called X", "scaffold an experiment", or "generate a curryTrain model from this HF model".
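The runs-diff agent is described as variance-aware. One common way to make that concrete is to compare the mean gap between two runs against the seed-to-seed noise, for example with Welch's t statistic. The sketch below assumes each run was repeated over a few seeds and that you have one final metric per seed. `welch_t`, `verdict`, and the threshold of 2.0 standard errors are hypothetical illustrations, not the agent's actual method. With only 2 to 5 seeds, 2.0 is a looser cutoff than a proper t-test critical value would be.

```python
import math
import statistics


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for mean(b) - mean(a); needs at least 2 seeds per run."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    diff = statistics.mean(b) - statistics.mean(a)
    if se == 0.0:
        # No seed noise at all: any nonzero gap is "infinitely" many standard errors.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def verdict(a: list[float], b: list[float], threshold: float = 2.0,
            higher_is_better: bool = True) -> str:
    """Call a winner only when the gap clears `threshold` standard errors."""
    t = welch_t(a, b) if higher_is_better else -welch_t(a, b)
    if t > threshold:
        return "B better"
    if t < -threshold:
        return "A better"
    return "inconclusive"
```

For example, accuracies of `[0.801, 0.805, 0.798]` for run A and `[0.812, 0.815, 0.809]` for run B give t of roughly 4, so B wins. Two runs whose seed spreads overlap heavily come back "inconclusive" instead of being ranked on a single lucky seed.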

Skills (47)

bench

/bench

Run a short, reproducible benchmark of one optimizer step (forward + backward + optimizer step over N microbatches) using the project's registered runtime. Activate when the user asks to "benchmark a training step", "measure throughput", "time one optimizer step", or "smoke test the runtime". Wraps run_accumulated_step from curry_train.benchmark.

diagnose

/diagnose

Diagnose a training failure or stall by inspecting recent logs, loss curves, OOM traces, NaN events, and config. Activate when the user asks "why did my training crash", "loss went to NaN", "OOM during step X", "training is not improving", or "help me debug this run". Delegates to the failure-diagnoser agent.

infra-fabric-launch

/infra-fabric-launch

Lightning Fabric integration recipe: a minimal 5-line setup that provides DDP / FSDP / mixed precision while keeping a raw PyTorch training loop. Activate when the user asks about "Lightning Fabric", "torchrun", "DDP setup", "FSDP setup", or "mixed precision", or wires up the launch script.

infra-hydra-config

/infra-hydra-config

Hydra + OmegaConf configuration layout for curryTrain projects — composable defaults, structured configs, CLI override syntax, sweep integration. Activate when the user asks "Hydra setup", "config management", "compose configs", "override CLI", "Hydra defaults list", or builds the experiment configuration.

infra-optuna-sweep

/infra-optuna-sweep

Concrete recipe for running an Optuna-driven hyperparameter sweep through Hydra, with TPE/CMA-ES/Hyperband, distributed multi-rank trials, study persistence, and per-trial run journals. Activate when the user asks "set up an Optuna sweep", "run hyperparameter search", "Hydra Optuna sweeper", or "parallel HPO".
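The bench skill times one accumulated optimizer step: forward + backward over N microbatches, then a single parameter update. Below is a framework-agnostic sketch of that measurement pattern. `bench_step`, its arguments, and the default warm-up and repeat counts are hypothetical and are not the signature of `curry_train.benchmark.run_accumulated_step`. On a GPU you would pass `torch.cuda.synchronize` as `sync` so that queued kernels are included in the measured time.

```python
import statistics
import time
from typing import Callable


def bench_step(
    forward_backward: Callable[[int], None],
    optimizer_step: Callable[[], None],
    n_microbatches: int,
    warmup: int = 2,
    repeats: int = 5,
    sync: Callable[[], None] = lambda: None,
) -> dict[str, float]:
    """Time n_microbatches forward/backward calls followed by one optimizer step."""

    def one_step() -> float:
        sync()  # drain pending async work so it is not billed to this step
        t0 = time.perf_counter()
        for i in range(n_microbatches):
            forward_backward(i)  # gradients accumulate across microbatches
        optimizer_step()  # a single parameter update per accumulated step
        sync()  # wait for this step's async work before stopping the clock
        return time.perf_counter() - t0

    for _ in range(warmup):
        one_step()  # discarded: allocator, cache, and compilation warm-up
    times = [one_step() for _ in range(repeats)]
    return {"median_s": statistics.median(times), "min_s": min(times), "max_s": max(times)}
```

Reporting the median over several repeats, after discarding warm-up steps, keeps one-off stalls from dominating the number. This is what makes a "smoke test the runtime" result comparable across commits.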

Stats

Version: 0.2.0
Language: Python
Stars: 0
Maintenance: Excellent
License: MIT
Last Commit: May 4, 2026
Added: May 4, 2026

Available In

curry-train

Safety Signals

Caution

Uses power tools

Uses Bash, Write, or Edit tools

README

curryTrain

A methodology-first deep learning training framework, packaged as a Claude Code plugin.

Idea is cheap. Infrastructure that lets you validate ideas fast is valuable.

Chinese README → README.zh.md

What it is

curryTrain organizes deep learning training around the actual end-to-end workflow, not around an algorithm catalog. The plugin provides Skills, Agents, and a minimal Python template that scaffolds a new training project and guides you through six well-defined stages.

Six-stage methodology

Stage	Question it answers	Representative skills
1. Skeleton	Does the architecture exist and does data flow through it?	scaffolder, preflight-asserts, data-pipeline
2. Sanity	Is the implementation actually correct?	overfit-single-batch, init-loss-check, grad-flow-viz
3. Pre-validate	Will this idea pay off, before I burn the compute?	lr-range-test, small-scale-ablation, multi-seed-variance, mup-coord-check, scaling-fit, surrogate-task, compute-budget, kill-criterion
4. Scale-up	Will it scale stably to the target size?	capacity-sweep, optuna-integration, parallel-primitive-intro
5. Stabilize	Will it survive a long run?	warmup-cosine, loss-spike-rollback, checkpoint-cadence, run-journal
6. Iterate	Which experiment was actually better?	variance-aware-decision, error-cluster, ablation-matrix, runs-diff
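The Stage 2 init-loss-check rests on a simple calculation. For C-way softmax cross-entropy with near-equal logits at initialization, every class gets probability 1/C, so the expected step-0 loss is -ln(1/C) = ln(C). The sketch below encodes that check. The function names and the 10% tolerance are hypothetical choices, not the skill's actual code.

```python
import math


def cross_entropy(logits: list[float], target: int) -> float:
    """Softmax cross-entropy for one example, via a numerically stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def expected_init_loss(num_classes: int) -> float:
    # Near-equal logits give p = 1/C for every class, so CE = -ln(1/C) = ln(C).
    return math.log(num_classes)


def init_loss_ok(observed: float, num_classes: int, rel_tol: float = 0.1) -> bool:
    """True if the step-0 loss is within rel_tol of ln(C); rel_tol is an arbitrary choice."""
    expected = expected_init_loss(num_classes)
    return abs(observed - expected) <= rel_tol * expected
```

On a 10-class task, all-zero logits give exactly ln(10), about 2.303. A step-0 loss near 4.6 would fail the check, which usually points to an overconfident initialization (for example, an oversized final layer) rather than a data problem.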

Stage 3 is where most projects waste compute and where curryTrain provides the most differentiated value.
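The lr-range-test in Stage 3 answers "what learning rate is even viable?" before any long run. It does this by sweeping the LR geometrically upward, one update per value, and stopping once the loss blows up. In the sketch below, a toy 1-D quadratic f(w) = ½·a·w² stands in for the real model so the behavior is checkable: plain SGD on it is stable only for lr < 2/a. `lr_range_test`, `suggest_lr`, and the "one order of magnitude below the lowest-loss LR" rule are hypothetical illustrations of a common heuristic, not the skill's actual implementation. A real test would step through minibatches of the actual model instead.

```python
import math


def lr_range_test(grad_fn, loss_fn, w0, lr_min=1e-4, lr_max=10.0, steps=100, stop_factor=4.0):
    """Sweep the learning rate geometrically from lr_min to lr_max, one SGD step per LR."""
    ratio = (lr_max / lr_min) ** (1 / (steps - 1))
    w, best = w0, math.inf
    lrs, losses = [], []
    for i in range(steps):
        lr = lr_min * ratio**i
        w = w - lr * grad_fn(w)  # plain SGD update at the current LR
        loss = loss_fn(w)
        lrs.append(lr)
        losses.append(loss)
        best = min(best, loss)
        if not math.isfinite(loss) or loss > stop_factor * best:
            break  # loss is blowing up: the sweep has left the stable LR range
    return lrs, losses


def suggest_lr(lrs, losses):
    """Heuristic: one order of magnitude below the LR that reached the lowest loss."""
    finite = [(loss, lr) for lr, loss in zip(lrs, losses) if math.isfinite(loss)]
    return min(finite)[1] / 10
```

With a = 1, the sweep stops shortly after the LR crosses the stability limit of 2. The suggestion lands well inside the stable range, which is the point: the range test costs a few hundred steps and rules out LRs that would otherwise burn a full run.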

Components

47 skills organized as: 1 user-invoked (slash) + 4 workflow + 24 methodology + 14 primitive + 4 infra. Only /curry-train:init is exposed as a slash command; the other 46 skills auto-activate from natural-language phrasing in your messages.
4 specialized agents: scaffolder, hpo-proposer, failure-diagnoser, runs-diff
0 hooks, 0 MCP servers — the plugin stays light and explicit
Python template at template/curry_train/ — a minimal layered scaffold (Runtime / Primitive / Model) you copy into your project via /curry-train:init

Installation

Option A — Claude Code marketplace (recommended)

In Claude Code, run:

/plugin marketplace add curryfromuestc/curry-train
/plugin install curry-train@curry-train

This adds the GitHub repo as a marketplace and installs the curry-train plugin from it. After installation, the /curry-train:init slash command and all description-activated skills (workflow, methodology, primitive, infra) become available in your sessions.

Option B — Local development install

If you cloned this repo locally and want to edit the plugin while using it:

git clone https://github.com/curryfromuestc/curry-train.git
mkdir -p ~/.claude/plugins
ln -s "$(pwd)/curry-train" ~/.claude/plugins/curry-train

Reload Claude Code (or run /reload-plugins) and the plugin will be picked up.

Option C — Per-session plugin dir

claude --plugin-dir /path/to/curry-train

Quick start

/curry-train:init is the only explicit slash command; everything else activates from natural-language phrasing.

# Bootstrap a new training project (copies the Python template into ./curry_train)
/curry-train:init my-experiment

Then drive the rest of the workflow by describing what you want:

"scaffold a new model called my-model" → new-experiment skill (Stage 1)
"smoke-test the runtime / time one optimizer step" → bench skill
"this run crashed / loss went to NaN, help me debug" → diagnose skill
"compare run A and run B / did this change actually help?" → runs-diff skill
"how do I check my init loss is reasonable", "find a learning rate", "is my improvement real or noise" → the matching methodology skill auto-activates

This is by design: the methodology lives in skills that activate based on what you describe, so you don't have to memorize a command surface.
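For the "loss went to NaN" case, the first thing worth automating is locating where the curve went wrong. The sketch below assumes the per-step training losses are available as a plain Python list of floats. `scan_loss`, the spike factor of 3, and the 20-step window are hypothetical choices, not the diagnose skill's actual logic.

```python
import math
import statistics


def scan_loss(losses, spike_factor=3.0, window=20, min_history=5):
    """Flag the first non-finite loss and any loss spikes, as (step, kind) pairs."""
    events = []
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            events.append((step, "non-finite"))  # NaN or inf reached the loss
            break  # later values carry no information once the run has blown up
        history = losses[max(0, step - window):step]
        if len(history) >= min_history and loss > spike_factor * statistics.median(history):
            events.append((step, "spike"))  # sudden jump relative to the recent median
    return events
```

Using the median of recent steps as the baseline keeps one earlier spike from masking the next one. A spike shortly before the first NaN is a typical pattern worth reporting alongside the config when ranking candidate causes.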

Stack opinions (V1)

Hydra + OmegaConf for config
Lightning Fabric (not the Trainer) for distributed launch
Optuna for hyperparameter search
Logger protocol with TensorBoard as the default backend (no lock-in to W&B / MLflow)
torchrun for launch (no custom launcher)
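The Logger protocol is what keeps the training loop from being locked in to a tracking backend. Below is a minimal sketch of the idea using `typing.Protocol`. The method names `log_scalar` and `close` are assumptions, not curryTrain's actual protocol. A TensorBoard backend could forward `log_scalar` to `torch.utils.tensorboard.SummaryWriter.add_scalar(tag, scalar_value, global_step)`.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Logger(Protocol):
    """Hypothetical minimal logging interface that the training loop depends on."""

    def log_scalar(self, name: str, value: float, step: int) -> None: ...

    def close(self) -> None: ...


class InMemoryLogger:
    """Test backend: satisfies Logger structurally, without inheriting from it."""

    def __init__(self) -> None:
        self.records: list[tuple[str, float, int]] = []

    def log_scalar(self, name: str, value: float, step: int) -> None:
        self.records.append((name, float(value), step))

    def close(self) -> None:
        pass  # nothing to flush for an in-memory store


def train_loop(logger: Logger, losses: list[float]) -> None:
    """Stand-in loop: it only talks to the protocol, never to a concrete backend."""
    for step, loss in enumerate(losses):
        logger.log_scalar("train/loss", loss, step)
    logger.close()
```

Because the check is structural, switching to W&B or MLflow means writing one more small class with the same two methods; the loop itself does not change.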

Philosophy

Architecture inspired by NVIDIA Bumblebee's three-layer split (Runtime ↔ Primitive ↔ Model). Workflow inspired by Karpathy's "A Recipe for Training Neural Networks". Built for engineers who train models — including unconventional ones (SNN, CV, multimodal) — and need fast, trustworthy iteration.

The Python core is intentionally small: the framework's value lies in its methodology (skills), not in re-implementing what Lightning Fabric / Accelerate / DeepSpeed already do well.

Layout

Similar Plugins

caveman

ui-design

llm-council-plugin

self-improving-agent

More by curryfromuestc

academic-paper

repo-digest

chipdev-method
