From sagemaker-ai
Generates code for fine-tuning base models on SageMaker using SFT, DPO, RLVR, and RLAIF trainers. Activates on phrases like 'start training' or 'fine-tune my model'.
Slash command: /sagemaker-ai:finetuning
Files in this skill:
- code_templates/dpo.py
- code_templates/rlaif_builtin.py
- code_templates/rlaif_custom_prompt.py
- code_templates/rlvr.py
- code_templates/sft.py
- references/code_output_guide.md
- references/continuous_customization.md
- references/eula_links.md
- references/rlaif_guide.md
- references/rlvr_reward_function.md
- scripts/mlflow_reference.py
- templates/nova_rlvr_reward_function_source_template.py
- templates/rlvr_reward_function_source_template.py
Before starting this workflow, verify:
- A use_case_spec.md file exists.
  - If not: activate the use-case-specification skill first, then resume.
- A fine-tuning technique (SFT, DPO, RLVR, RLAIF, or CPT/RFT for Nova) and a base model have already been selected.
  - If not: use the model-selection and/or finetuning-technique skills to collect what's missing, then resume.
- A base model name available on SageMakerHub has been identified.
  - If not: use the model-selection skill to get it.
  - Use the exact name that model-selection retrieves, as it may differ from other commonly used names for the same model.
- The SDK environment has been verified (SDK version, region, execution role).
  - If not: activate the sdk-getting-started skill first, then resume.
- A training dataset has been uploaded to a bucket in the environment's default region.
- Notebook mode: if run_cell is available, offer to run it. Otherwise, tell the user to run cells one by one (mention the ipykernel requirement).
- Script mode: python3 <script>.py
⏸ Wait for user.
Read references/code_output_guide.md for output format rules, then read the code template matching the finetuning strategy:
- code_templates/sft.py
- code_templates/dpo.py
- code_templates/rlvr.py
- code_templates/rlaif_builtin.py
- code_templates/rlaif_custom_prompt.py
The template is a Python file where each # Cell N: Label comment marks the start of a new section. Split on these markers: everything between one marker and the next becomes one unit of output.
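The splitting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own implementation: the function name `split_cells`, the returned dictionary keys, and the sample template text are all hypothetical. It also assumes the marker line itself is dropped from each cell body; references/code_output_guide.md may specify otherwise.

```python
import re

# Matches marker lines such as "# Cell 3: Configure Trainer".
CELL_MARKER = re.compile(r"^# Cell (\d+): (.+?)\s*$")


def split_cells(template_source: str) -> list[dict]:
    """Split template text into one unit per '# Cell N: Label' marker."""
    cells = []
    current = None
    for line in template_source.splitlines():
        match = CELL_MARKER.match(line)
        if match:
            # A new marker closes the previous cell and opens the next one.
            current = {"number": int(match.group(1)), "label": match.group(2), "lines": []}
            cells.append(current)
        elif current is not None:
            current["lines"].append(line)
        # Lines before the first marker are ignored in this sketch.
    return [
        {"number": c["number"], "label": c["label"], "source": "\n".join(c["lines"]).strip()}
        for c in cells
    ]


# Hypothetical two-cell template, for illustration only.
example = """# Cell 1: Setup & Credentials
BASE_MODEL = ""
MODEL_PACKAGE_GROUP_NAME = ""

# Cell 2: Configure Trainer
print("configure trainer here")
"""

cells = split_cells(example)
for cell in cells:
    print(cell["number"], cell["label"])
```

Keeping the cell number and label alongside the source makes it straightforward to emit either notebook cells or a single script from the same parsed result.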
code_output_guide.md
meta-):
- ACCEPT_EULA = False line from the config cell
- accept_eula=ACCEPT_EULA, line from the trainer call
- max_epochs or lr_warmup_steps_ratio from the Configure Trainer section and the Hyperparameter Overrides section
In the 'Setup & Credentials' cell, populate:
- BASE_MODEL
- MODEL_PACKAGE_GROUP_NAME
  - use_case_spec.md if needed)
  - Pattern: [a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}
  - Example: customer-support-chatbot-v1
Save notebook
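A candidate MODEL_PACKAGE_GROUP_NAME can be checked against the pattern quoted above before it is written into the notebook. The sketch below is illustrative only; the helper name `is_valid_group_name` is hypothetical, and it assumes the pattern must match the whole name.

```python
import re

# Pattern quoted in the text above; fullmatch applies it to the entire string.
GROUP_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")


def is_valid_group_name(name: str) -> bool:
    """Return True if the name consists of alphanumerics separated by hyphens."""
    return GROUP_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["customer-support-chatbot-v1", "customer_support", "-leading-hyphen"]:
    print(candidate, is_valid_group_name(candidate))
```

Underscores, spaces, and leading or trailing hyphens are rejected, so a name derived from free-form text usually needs to be normalized first.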
- See references/rlvr_reward_function.md, section "Helping Users Create Custom Reward Functions".
- Set CUSTOM_REWARD_FUNCTION in the notebook to the ARN of the reward function (either given directly by the user, or taken from the function generation code as evaluator.arn).
Read references/rlaif_guide.md and follow its instructions.
meta-)
- If the user accepts, set ACCEPT_EULA = True and uncomment accept_eula=ACCEPT_EULA in the generated notebook. If the user declines, leave ACCEPT_EULA = False and warn that training will fail without acceptance.
- The ACCEPT_EULA variable and accept_eula parameter should already be omitted from the notebook (see Step 1.3).
After generating the code, offer to run it. Training can take hours depending on your dataset and model.
Notebook mode: If run_cell is available, offer to run the cells. Otherwise tell the user to run cells themselves.
Script mode: Present the user with options:
"Would you like me to:
- Leave it to you: run with python scripts/[script_name]
- Run it and wait until it's done
- Start it but don't wait: we can check status later"
- Run and wait: trainer.train(wait=True) blocks until complete. Report final status.
- Start without waiting: change wait=True to wait=False in the script, execute, report the training job name.
Checking status:
- describe-training-job --training-job-name NAME → TrainingJobStatus, FailureReason, SecondaryStatusTransitions
- list-model-packages --model-package-group-name GROUP_NAME --sort-by CreationTime --sort-order Descending --max-results 1
Showing results after completion:
- Use scripts/mlflow_reference.py as the pattern to query MLflow metrics
CRITICAL:
If the user wants to fine-tune a model they have already customized, follow the instructions in references/continuous_customization.md.
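The two status checks listed earlier (describe-training-job and list-model-packages) have direct equivalents in the SageMaker client of boto3. The following is a rough sketch under stated assumptions: the helper names are hypothetical, the client is passed in rather than created here, and the skill itself may check status differently.

```python
def summarize_training_job(sagemaker_client, job_name: str) -> dict:
    """Collect the status fields named above for one training job."""
    response = sagemaker_client.describe_training_job(TrainingJobName=job_name)
    transitions = response.get("SecondaryStatusTransitions", [])
    return {
        "status": response["TrainingJobStatus"],
        # FailureReason is normally absent unless the job failed.
        "failure_reason": response.get("FailureReason"),
        "latest_transition": transitions[-1]["Status"] if transitions else None,
    }


def latest_model_package_arn(sagemaker_client, group_name: str):
    """Return the ARN of the newest model package in the group, or None."""
    response = sagemaker_client.list_model_packages(
        ModelPackageGroupName=group_name,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response.get("ModelPackageSummaryList", [])
    return summaries[0]["ModelPackageArn"] if summaries else None


# Usage (requires boto3 and AWS credentials; NAME and GROUP_NAME are placeholders):
#   import boto3
#   client = boto3.client("sagemaker")
#   print(summarize_training_job(client, "NAME"))
#   print(latest_model_package_arn(client, "GROUP_NAME"))
```

Passing the client as an argument keeps the helpers easy to exercise with a stub object when no AWS access is available.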
- references/rlvr_reward_function.md - Lambda reward function creation guide (RLVR only)
- templates/rlvr_reward_function_source_template.py - Lambda reward function source template for open-weights models (RLVR only)
- templates/nova_rlvr_reward_function_source_template.py - Lambda reward function source template for Nova 2.0 Lite (RLVR only)
- code_templates/sft.py - Complete notebook template for Supervised Fine-Tuning (OSS path)
- code_templates/dpo.py - Complete notebook template for Direct Preference Optimization (OSS path)
- code_templates/rlvr.py - Complete notebook template for Reinforcement Learning from Verifiable Rewards (OSS path)
- references/continuous_customization.md - Instructions on fine-tuning an already fine-tuned model
- references/rlaif_guide.md - Instructions on RLAIF fine-tuning options
- code_templates/rlaif_builtin.py - Code template for RLAIF with built-in judge prompt
- code_templates/rlaif_custom_prompt.py - Code template for RLAIF with custom judge prompt
npx claudepluginhub awslabs/agent-plugins --plugin sagemaker-ai