submit-slurm-job
Generates and submits sbatch scripts for GPU compute jobs on Slurm clusters. Handles partition, GPU types (A100_40G, V100, A800), node selection, Python paths, and cluster rules.
Slash command: `/submit-slurm-job:submit-slurm-job`
Skill summary (used to decide when to auto-load this skill):
Generate and submit sbatch scripts for GPU compute jobs on the cluster. Handles all cluster-specific details: partition, GPU types, node selection, python path.
Before using this skill, set the following in your project's CLAUDE.md or environment:
| Variable | Example | Description |
|---|---|---|
| `PYTHON_PATH` | `/path/to/miniconda3/envs/myenv/bin/python3` | Full path to the Python interpreter |
| `PROJECT_DIR` | `/home/user_xxx/private/homefile` | Must be under `~/private/homefile`; Slurm submission is only allowed from this path |
| `PARTITION` | `home` | Slurm partition name (the only compute partition) |
Cluster rule: scripts must live under `~/private/homefile`, and jobs must be submitted from there. Data should live under `~/private/datafile`. Run `sbatch`/`srun` only after `cd ~/private/homefile/...`.
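A minimal pre-flight check for the settings above, written as a bash sketch. It assumes the three values are exported as shell variables named exactly `PYTHON_PATH`, `PROJECT_DIR`, and `PARTITION`, and that `PROJECT_DIR` is an absolute, already-expanded path. The function name `check_cluster_config` is hypothetical, not part of the skill.

```bash
# Hypothetical pre-flight check (not part of the skill): validates the
# PYTHON_PATH / PROJECT_DIR / PARTITION settings against the cluster rules.
check_cluster_config() {
    local home_root="${HOME}/private/homefile"

    # PYTHON_PATH must be an executable interpreter (it is called directly).
    if [[ ! -x "${PYTHON_PATH:-}" ]]; then
        echo "PYTHON_PATH is unset or not executable: ${PYTHON_PATH:-<unset>}" >&2
        return 1
    fi

    # PROJECT_DIR must be ~/private/homefile itself or a path below it.
    case "${PROJECT_DIR:-}" in
        "$home_root" | "$home_root"/*) ;;
        *)
            echo "PROJECT_DIR must be under $home_root (got: ${PROJECT_DIR:-<unset>})" >&2
            return 1
            ;;
    esac

    # "home" is the only compute partition on this cluster.
    if [[ "${PARTITION:-}" != "home" ]]; then
        echo "PARTITION should be 'home' (got: ${PARTITION:-<unset>})" >&2
        return 1
    fi
}
```

Running it once before generating any script catches a wrong project directory early, instead of waiting for the cluster to reject the submission.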
Ask the user (with AskUserQuestion) what they want to run. Key parameters:
| Parameter | Default | Description |
|---|---|---|
| `job_name` | (required) | Short job name for `--job-name` |
| `gpu_type` | (required) | GPU model; must be explicit (e.g., `A100_40G`, `V100`, `A800`) |
| `n_gpu` | `1` | Number of GPUs |
| `time` | `24:00:00` | Wall-time limit (`HH:MM:SS`) |
| `mem` | `32G` | Memory |
| `cpus` | `4` | CPUs per task |
| `script` | (required) | Python script path (must live under `~/private/homefile`) |
| `args` | (required) | Script arguments |
| `output_dir` | `{PROJECT_DIR}` | Directory for log files (keep under `~/private/homefile`) |
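The defaults in the table can be applied with plain parameter expansion. This is a sketch under the assumption that the answers are held in shell variables named after the table's parameters; `resolve_job_params` is a hypothetical helper that sets those variables globally and fails if a required one is empty.

```bash
# Hypothetical helper: enforces the required parameters and fills in the
# documented defaults. It sets the variables globally for later steps.
resolve_job_params() {
    local name
    # Required parameters have no default; stop if any of them is empty.
    # ${!name:-} is bash indirect expansion: the value of the variable named $name.
    for name in job_name gpu_type script args; do
        if [[ -z "${!name:-}" ]]; then
            echo "missing required parameter: $name" >&2
            return 1
        fi
    done

    # Documented defaults, applied only when the value is unset or empty.
    n_gpu="${n_gpu:-1}"
    time="${time:-24:00:00}"
    mem="${mem:-32G}"
    cpus="${cpus:-4}"
    output_dir="${output_dir:-$PROJECT_DIR}"
}
```

An explicit check is used instead of `${var:?message}` because the latter exits a non-interactive shell outright rather than returning an error the caller can handle.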
- `--gres`: use `gpu:MODEL:N` (e.g., `gpu:A100_40G:1`, `gpu:A800:2`). A bare `gpu:1` will be rejected.
- To see which GPU models are available, run `slurm_gpustat` (cluster-provided wheel) or `scontrol show nodes -o`.
- To pin a specific node, use `--nodelist=node`, but only when the user asks for it.
- Important: write the script under `~/private/homefile/...` and run `sbatch` from that directory (enforced by the cluster).
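The `--gres` rule can be enforced while building the directive. In this sketch, `make_gres` and `list_gpu_gres` are hypothetical helper names; the latter only assumes that `scontrol show nodes -o` prints one node per line with a `Gres=` field, which is how Slurm reports generic resources.

```bash
# Hypothetical helper: builds the --gres value in the gpu:MODEL:N form the
# cluster requires. A bare "gpu:N" is rejected, so MODEL must be non-empty.
make_gres() {
    local model="$1" count="${2:-1}"
    if [[ -z "$model" ]]; then
        echo "GPU model is required (e.g. A100_40G, V100, A800)" >&2
        return 1
    fi
    if [[ ! "$count" =~ ^[1-9][0-9]*$ ]]; then
        echo "GPU count must be a positive integer: $count" >&2
        return 1
    fi
    printf 'gpu:%s:%s\n' "$model" "$count"
}

# Lists the distinct Gres= strings advertised by the nodes, which include
# the GPU model names accepted by --gres.
list_gpu_gres() {
    scontrol show nodes -o | grep -oE 'Gres=[^ ]+' | sort -u
}
```

For example, `make_gres A800 2` prints `gpu:A800:2`, while `make_gres "" 1` fails instead of silently producing a model-less request.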
Generate the sbatch script following this template:
```bash
#!/bin/bash
# {PARTITION} is "home" on this cluster (the only compute partition)
#SBATCH --partition={PARTITION}
#SBATCH --cpus-per-task={cpus}
#SBATCH --mem={mem}
#SBATCH --gres=gpu:{gpu_type}:{n_gpu}
#SBATCH --nodes=1
#SBATCH --time={time}
#SBATCH --job-name={job_name}
#SBATCH -o {output_dir}/{job_name}_%j.out

# Job diagnostics for the log file
echo "The current job ID is $SLURM_JOB_ID"
echo "Running on $SLURM_JOB_NODELIST"
echo "CUDA devices: $CUDA_VISIBLE_DEVICES"
echo "==== Job started at $(date) ===="
nvidia-smi
echo

{PYTHON_PATH} \
    {script} \
    {args}

echo
echo "==== Job finished at $(date) ===="
```
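One way to fill in the template is an unquoted here-document: `${...}` placeholders expand when the file is written, while `\$` keeps the job-time variables literal. This is a sketch that assumes the parameters and `PARTITION`/`PYTHON_PATH` are already set as shell variables and that paths contain no spaces; `render_sbatch` is a hypothetical name.

```bash
# Hypothetical renderer for the template: writes {output_dir}/{job_name}.sh
# and prints its path. Unquoted EOF expands ${...} now; \$ stays literal.
render_sbatch() {
    local out="${output_dir}/${job_name}.sh"
    cat > "$out" <<EOF
#!/bin/bash
#SBATCH --partition=${PARTITION}
#SBATCH --cpus-per-task=${cpus}
#SBATCH --mem=${mem}
#SBATCH --gres=gpu:${gpu_type}:${n_gpu}
#SBATCH --nodes=1
#SBATCH --time=${time}
#SBATCH --job-name=${job_name}
#SBATCH -o ${output_dir}/${job_name}_%j.out

echo "The current job ID is \$SLURM_JOB_ID"
echo "Running on \$SLURM_JOB_NODELIST"
echo "CUDA devices: \$CUDA_VISIBLE_DEVICES"
echo "==== Job started at \$(date) ===="
nvidia-smi
echo

${PYTHON_PATH} \\
    ${script} \\
    ${args}

echo
echo "==== Job finished at \$(date) ===="
EOF
    echo "$out"
}
```

Running `bash -n` on the generated file is a cheap syntax check before handing it to `sbatch`.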
Key rules:
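- Partition: always `home` (the only compute partition).
- Location: scripts and submissions live under `~/private/homefile/...`.
- Python: call `{PYTHON_PATH}` directly (do NOT `conda activate`).
- Logs: use `-o` alone for combined stdout+stderr (Slurm writes both streams to that file unless `-e` is given).
- GPU: always name the model in `--gres`.
- Do not add `--nodelist` unless the user explicitly requests a specific node.

Write the script to `{output_dir}/{job_name}.sh` (under `~/private/homefile`), then submit it with `sbatch` from the same directory.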
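A minimal bash sketch of the submit step. It assumes the script was written as `{output_dir}/{job_name}.sh`, so the log lands in the same directory and the job name equals the file name without `.sh`. `submit_job` is a hypothetical helper; `sbatch --parsable` is a real option that prints only `jobid[;cluster]`, which makes the job ID easy to capture. The helper also prints the items to report after submission.

```bash
# Hypothetical submit helper: checks the directory rule, submits from the
# script's directory with `sbatch --parsable`, then prints the report.
submit_job() {
    local script_path="$1"
    local job_dir job_name job_id
    job_dir="$(cd "$(dirname "$script_path")" && pwd)" || return 1

    # The cluster only accepts submissions from ~/private/homefile/...
    case "$job_dir" in
        "$HOME/private/homefile" | "$HOME/private/homefile"/*) ;;
        *) echo "refusing to submit outside ~/private/homefile: $job_dir" >&2; return 1 ;;
    esac

    job_name="$(basename "$script_path" .sh)"
    job_id="$(cd "$job_dir" && sbatch --parsable "$(basename "$script_path")")" || return 1
    job_id="${job_id%%;*}"   # drop the optional ";cluster" suffix

    echo "Job ID:  $job_id"
    echo "Log:     ${job_dir}/${job_name}_${job_id}.out"
    echo "Monitor: squeue -u $USER ; tail -f ${job_dir}/${job_name}_${job_id}.out"
    echo "Cancel:  scancel $job_id"
}
```

Doing the `cd` inside the command substitution keeps the caller's working directory unchanged.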
After submission, report:
- Log file: `{output_dir}/{job_name}_{JOBID}.out`
- Monitor: `squeue -u $USER`, `tail -f {log_file}`
- Cancel: `scancel {JOBID}`
- If a job is cancelled without you running `scancel`, verify that you submitted from the correct project (`~/private/homefile`, matching the web UI project) and that the requested resources are within quota.

If the user wants to submit multiple jobs (e.g., different datasets on different GPUs):
- Generate separate `.sh` scripts for each job.
- Submit them all: `for f in script1.sh script2.sh ...; do sbatch "$f"; done`

Sequential (one job, one GPU): put multiple python commands in a single script, separated by `echo` markers.
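The sequential option can be sketched as a loop in the body of one job script. The dataset names and the `--dataset` flag are placeholders (assumptions, not part of the skill); substitute the arguments your script actually takes. `run_sequential` is a hypothetical name and assumes `PYTHON_PATH` and `script` are set.

```bash
# Hypothetical job-body helper: runs one python command per dataset, one
# after another on the same GPU, with echo markers around each run.
# The --dataset flag is a placeholder for your script's real arguments.
run_sequential() {
    local ds
    for ds in "$@"; do
        echo "==== [$ds] started at $(date) ===="
        "$PYTHON_PATH" "$script" --dataset "$ds"
        echo "==== [$ds] finished at $(date) ===="
    done
}
```

Each command runs even if the previous one fails; append `|| exit 1` to the python line if later runs depend on earlier ones.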
Parallel (separate jobs, one GPU each): create separate scripts, each requesting one GPU, and submit all scripts independently.
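The separate-scripts option can be sketched as a fan-out loop. It assumes `PARTITION`, `PYTHON_PATH`, `gpu_type`, `script`, and `output_dir` are set. `submit_parallel`, the optional `job_prefix` variable, and the `--dataset` flag are hypothetical placeholders.

```bash
# Hypothetical fan-out: one script per dataset, each requesting exactly one
# GPU of gpu_type, each submitted on its own. --dataset is a placeholder.
submit_parallel() {
    local ds name
    for ds in "$@"; do
        name="${job_prefix:-job}_${ds}"
        cat > "${output_dir}/${name}.sh" <<EOF
#!/bin/bash
#SBATCH --partition=${PARTITION}
#SBATCH --cpus-per-task=${cpus:-4}
#SBATCH --mem=${mem:-32G}
#SBATCH --gres=gpu:${gpu_type}:1
#SBATCH --nodes=1
#SBATCH --time=${time:-24:00:00}
#SBATCH --job-name=${name}
#SBATCH -o ${output_dir}/${name}_%j.out

${PYTHON_PATH} ${script} --dataset ${ds}
EOF
        # Submit from the script's own directory, as the cluster requires.
        (cd "$output_dir" && sbatch "${name}.sh") || return 1
    done
}
```

Because each job asks for a single GPU, Slurm can start them on whichever nodes have a free GPU of that model, instead of waiting for one node with several free GPUs.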
`--nodelist` for specific nodes: only add `#SBATCH --nodelist=n004` when the user explicitly wants a specific node. Otherwise, let Slurm schedule based on GPU-type availability.
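A small sketch of the "only on request" rule: the directive is emitted only when a node name was given. `nodelist_directive` is a hypothetical helper; its output belongs with the other `#SBATCH` lines, because `sbatch` stops reading directives at the first executable command.

```bash
# Hypothetical helper: prints the --nodelist directive only when a node was
# explicitly requested; otherwise prints nothing and Slurm picks the node.
nodelist_directive() {
    local node_request="${1:-}"
    [[ -n "$node_request" ]] && printf '#SBATCH --nodelist=%s\n' "$node_request"
    return 0
}
```

For example, `nodelist_directive n004` prints `#SBATCH --nodelist=n004`, and calling it with no argument prints nothing.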
Install: `npx claudepluginhub quantumbfs/claude-code-skills --plugin submit-slurm-job`