Training manager for RunPod GPU instances - configure pods, launch training, monitor progress, retrieve checkpoints
This skill inherits all available tools. When active, it can use any tool Claude has access to.
Additional assets for this skill:
- references/PLATFORM_COMPARISON.md
- references/TROUBLESHOOTING.md
- scripts/estimate_cost.py
- scripts/train_sft.py

Run Unsloth training on RunPod GPU instances.
- echo $RUNPOD_API_KEY (get at runpod.io/console/user/settings)
- pip install runpod
- funsloth-train

| GPU | VRAM | Cost | Best For |
|---|---|---|---|
| RTX 3090 | 24GB | ~$0.35/hr | Budget 7-14B |
| RTX 4090 | 24GB | ~$0.55/hr | Fast 7-14B |
| A100 40GB | 40GB | ~$1.50/hr | 14-34B |
| A100 80GB | 80GB | ~$2.00/hr | 70B |
| H100 | 80GB | ~$3.50/hr | Fastest |
RunPod's hourly rates are typically lower than HF Jobs for comparable GPUs.
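To turn the table into a rough budget, the hourly rates can be multiplied by the expected run length. The sketch below is a minimal illustration that copies the approximate rates from the table; the function name `estimate_cost` and its parameters are hypothetical and are not taken from scripts/estimate_cost.py, whose contents are not shown here.

```python
# Hourly rates copied from the table above (approximate, USD per GPU-hour).
HOURLY_RATE_USD = {
    "RTX 3090": 0.35,
    "RTX 4090": 0.55,
    "A100 40GB": 1.50,
    "A100 80GB": 2.00,
    "H100": 3.50,
}


def estimate_cost(gpu: str, hours: float, gpu_count: int = 1) -> float:
    """Return the approximate cost in USD for a run of the given length."""
    if hours < 0 or gpu_count < 1:
        raise ValueError("hours must be >= 0 and gpu_count must be >= 1")
    return round(HOURLY_RATE_USD[gpu] * hours * gpu_count, 2)


if __name__ == "__main__":
    # Print what a 4-hour run would cost on each GPU type.
    for gpu in HOURLY_RATE_USD:
        print(f"{gpu}: ${estimate_cost(gpu, hours=4):.2f} for 4 hours")
```

Actual billing also includes storage for volumes, so treat the result as a lower bound.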
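import os
import runpod
runpod.api_key = os.environ["RUNPOD_API_KEY"]  # set the key explicitly from the environment
volume = runpod.create_network_volume(name="funsloth-training", size_gb=50, region="US")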
Allows: resume training, download checkpoints, share between pods.
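Resuming from the volume usually means finding the most recent checkpoint directory. The sketch below assumes checkpoints are written as `checkpoint-<step>` subdirectories, which is the Hugging Face Trainer naming convention; the helper name `latest_checkpoint` is hypothetical, and the naming assumption should be checked against the training script actually in use.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme: checkpoint-<global step>, e.g. checkpoint-500.
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path
```

With a Trainer-style training loop, the returned path could be passed as `resume_from_checkpoint`; a `None` result means training starts from scratch.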
import runpod
pod = runpod.create_pod(
    name="funsloth-training",
    image_name="runpod/pytorch:2.1.0-py3.10-cuda12.1.0-devel",
    gpu_type_id="{gpu_type}",
    volume_in_gb=50,
    network_volume_id="{volume_id}",
    env={"HF_TOKEN": "{token}", "WANDB_API_KEY": "{key}"},
)
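Before connecting, the pod's SSH address has to be read from its status. The sketch below assumes the pod dictionary returned by `runpod.get_pod` carries a `runtime.ports` list whose entries have `ip`, `isIpPublic`, `privatePort`, and `publicPort` keys; that response shape is an assumption to verify against the installed SDK version, and the helper name `ssh_endpoint` is hypothetical.

```python
from typing import Optional, Tuple


def ssh_endpoint(pod: dict) -> Optional[Tuple[str, int]]:
    """Return (ip, public_port) for the pod's SSH port, or None if not ready."""
    # "runtime" is assumed to be missing or None while the pod is still starting.
    runtime = pod.get("runtime") or {}
    for port in runtime.get("ports") or []:
        # Container port 22 is SSH; only a public mapping is reachable from outside.
        if port.get("privatePort") == 22 and port.get("isIpPublic"):
            return port["ip"], int(port["publicPort"])
    return None
```

The public port is often not 22, so the connection command may need it explicitly, for example `ssh -p <public_port> root@<ip>`.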
# SSH into pod
ssh root@{pod_ip}
# Upload script
scp train.py root@{pod_ip}:/workspace/
# Run training (use tmux for persistence)
tmux new -s training
cd /workspace && python train.py 2>&1 | tee training.log
# Ctrl+B, D to detach
# SSH monitoring
tail -f /workspace/training.log
nvidia-smi -l 1
# Dashboard
https://runpod.io/console/pods/{pod_id}
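For scripted monitoring, GPU utilization can be read in machine-readable form instead of watching the `nvidia-smi -l 1` display. The sketch below uses the `--query-gpu` and `--format=csv,noheader,nounits` options of `nvidia-smi`; the function names and the dictionary keys are hypothetical choices for illustration.

```python
import subprocess
from typing import Dict, List

QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text: str) -> List[Dict[str, int]]:
    """Parse 'util, used, total' CSV rows (one per GPU) into dictionaries."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (int(field.strip()) for field in line.split(","))
        stats.append({"util_pct": util, "mem_used_mib": used, "mem_total_mib": total})
    return stats


def read_gpu_stats() -> List[Dict[str, int]]:
    """Run nvidia-smi on the pod and return one stats dictionary per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_stats(result.stdout)
```

Utilization that stays near zero while the training process is still alive often points to a data-loading stall rather than a GPU problem.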
# Save to network volume
cp -r /workspace/outputs /runpod-volume/
# Download via SCP
scp -r root@{pod_ip}:/workspace/outputs ./
# Or push to HF Hub from pod
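Pushing from the pod can be sketched with the `huggingface_hub` client, assuming that package is installed on the pod and that `HF_TOKEN` is set in the pod environment as in the pod creation step. The function name `push_outputs` and the choice to create the repository as private are illustrative assumptions, not part of this skill.

```python
import os


def push_outputs(repo_id: str, folder: str = "/workspace/outputs") -> None:
    """Upload the training output folder to a model repo on the Hugging Face Hub."""
    # Imported lazily so this module still loads where huggingface_hub is absent.
    from huggingface_hub import HfApi

    api = HfApi(token=os.environ.get("HF_TOKEN"))
    # private=True is an assumption for this sketch; exist_ok avoids failing on reruns.
    api.create_repo(repo_id=repo_id, repo_type="model", private=True, exist_ok=True)
    api.upload_folder(repo_id=repo_id, repo_type="model", folder_path=folder)
```

A call would look like `push_outputs("your-username/your-model")`, where the repository name is a placeholder.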
runpod.stop_pod(pod_id) # Stops the pod; it can be resumed later
runpod.terminate_pod(pod_id) # Deletes the pod; the network volume is kept
Offer funsloth-upload for Hub upload with model card.
save_steps

| Error | Resolution |
|---|---|
| Pod creation failed | Try different GPU type or region |
| SSH refused | Wait 1-2 min, check IP |
| Out of disk | Increase volume or clean up |
| Volume not mounting | Check same region as pod |
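The "SSH refused" row amounts to retrying until the pod's SSH port accepts connections. A minimal sketch of that wait, using only the standard library; the function name `wait_for_port` and the default timeout and interval are hypothetical choices loosely based on the 1-2 minute guidance in the table.

```python
import socket
import time


def wait_for_port(host: str, port: int = 22, timeout_s: float = 180.0,
                  interval_s: float = 5.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means the SSH daemon is listening.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

If this returns False after a few minutes, the IP or port is more likely wrong than the pod being slow, which matches the "check IP" advice in the table.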