Training manager for local GPU training - validate CUDA, manage GPU selection, monitor progress, handle checkpoints
This skill inherits all available tools. When active, it can use any tool Claude has access to.

Additional assets for this skill:

- `notebooks/sft_template.ipynb`
- `references/HARDWARE_GUIDE.md`
- `references/TROUBLESHOOTING.md`
- `scripts/train_sft.py`

Run Unsloth training on your local GPU. Start by validating that CUDA is available:
```python
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
```
If CUDA is not available, check the driver and toolkit, then reinstall a CUDA-enabled PyTorch:

```bash
nvidia-smi
nvcc --version
pip install torch --index-url https://download.pytorch.org/whl/cu121
```

See `references/HARDWARE_GUIDE.md` for requirements:
| VRAM | Recommended Setup |
|---|---|
| 8GB | 7B, 4-bit, batch=1, LoRA r=8 |
| 12GB | 7B, 4-bit, batch=2, LoRA r=16 |
| 16GB | 7-13B, 4-bit, batch=2, LoRA r=16-32 |
| 24GB | 7-14B, 4-bit, batch=4, LoRA r=32 |
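As an example, the 12GB row maps onto an Unsloth setup roughly like this (a sketch; the model name and exact values are illustrative):

```python
from unsloth import FastLanguageModel

# 7B model in 4-bit with LoRA r=16, per the 12GB row above
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # illustrative 4-bit base model
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```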
Install the dependencies:

```bash
pip install unsloth torch transformers trl peft datasets accelerate bitsandbytes
```
To train interactively, open the notebook:

```bash
jupyter notebook notebooks/sft_template.ipynb
```

Or edit the configuration in the script and run it directly:

```bash
python scripts/train_sft.py
```
To pin training to a specific GPU, set `CUDA_VISIBLE_DEVICES` before `torch` is imported:

```python
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # Use first GPU; must be set before importing torch
```
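On a multi-GPU machine, you can list the visible GPUs and their free memory before choosing one (standard PyTorch calls):

```python
import torch

# Enumerate visible GPUs with free/total VRAM
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}, "
          f"{free / 1e9:.1f} / {total / 1e9:.1f} GB free")
```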
Monitor GPU usage during training:

```bash
# Watch GPU usage
watch -n 1 nvidia-smi

# Or use nvitop (more detailed)
pip install nvitop && nvitop
```
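You can also check memory from inside the training process (standard PyTorch calls):

```python
import torch

# Current and peak VRAM allocated by this process
print(f"Allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
print(f"Peak: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```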
To log metrics to Weights & Biases:

```bash
export WANDB_API_KEY="your-key"
```

Then add `report_to="wandb"` in `TrainingArguments`.
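A minimal sketch of the wiring (the run name and step values are illustrative):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",
    report_to="wandb",     # Send metrics to Weights & Biases
    run_name="sft-local",  # Illustrative run name
    logging_steps=10,
)
```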
If training runs out of GPU memory, try in order:

- Call `torch.cuda.empty_cache()`
- Set `packing=True` for short sequences

See `references/TROUBLESHOOTING.md` for more solutions.
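One way to apply these settings, together with a smaller per-device batch and gradient accumulation (an addition not in the list above), is via trl's `SFTConfig`; values are illustrative:

```python
from trl import SFTConfig

# OOM-friendly settings: tiny per-device batch, accumulation to keep the
# effective batch size, and packing to cut padding waste on short sequences
config = SFTConfig(
    output_dir="outputs",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size of 8
    packing=True,
)
```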
To resume from a checkpoint, pass `resume_from_checkpoint` to `trainer.train()`:

```python
trainer.train(resume_from_checkpoint=True)  # Auto-find latest checkpoint in output_dir
# Or resume from a specific checkpoint:
# trainer.train(resume_from_checkpoint="outputs/checkpoint-500")
```
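Checkpoint cadence is controlled by `save_steps` in `TrainingArguments`; for example (value illustrative):

```python
from transformers import TrainingArguments

# Write a checkpoint to output_dir every 500 optimizer steps
args = TrainingArguments(output_dir="outputs", save_steps=500)
```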
The training script automatically saves:

- `outputs/lora_adapter/` - LoRA weights
- `outputs/merged_16bit/` - Merged model (optional)

Test the trained adapter with a quick generation:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained("outputs/lora_adapter")
FastLanguageModel.for_inference(model)  # Switch to inference mode

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
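If you need to produce the merged 16-bit export yourself, Unsloth's merge helper can write it (a sketch; assumes the `model` and `tokenizer` from the snippet above):

```python
# Merge the LoRA adapter into the base weights and save in 16-bit
model.save_pretrained_merged(
    "outputs/merged_16bit",
    tokenizer,
    save_method="merged_16bit",
)
```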
Offer `unsloth-upload` for Hub upload with a model card.