Templates and patterns for common ML training scenarios, including text classification, text generation, fine-tuning, and PEFT/LoRA. Provides ready-to-use training configurations, dataset preparation scripts, and complete training pipelines. Use when building ML training pipelines, fine-tuning models, implementing classification or generation tasks, setting up PEFT/LoRA training, or when the user mentions model training, fine-tuning, classification, generation, or parameter-efficient tuning.
Limited to specific tools

This skill is limited to using the following tools:

Additional assets for this skill:

- README.md
- examples/sentiment-classifier.md
- examples/text-generator.md
- scripts/setup-classification.sh
- scripts/setup-fine-tuning.sh
- scripts/setup-generation.sh
- scripts/setup-peft.sh
- templates/classification-config.yaml
- templates/generation-config.yaml
- templates/peft-config.json

Purpose: Provide production-ready training templates, configuration files, and automation scripts for common ML training scenarios, including classification, generation, fine-tuning, and PEFT/LoRA approaches.
Activation Triggers:
Key Resources:
- scripts/setup-classification.sh - Classification training setup automation
- scripts/setup-generation.sh - Generation training setup automation
- scripts/setup-fine-tuning.sh - Full fine-tuning setup automation
- scripts/setup-peft.sh - PEFT/LoRA training setup automation
- templates/classification-config.yaml - Classification training configuration
- templates/generation-config.yaml - Generation training configuration
- templates/peft-config.json - PEFT/LoRA configuration
- examples/sentiment-classifier.md - Complete sentiment classification example
- examples/text-generator.md - Complete text generation example

Use cases: Sentiment analysis, intent classification, topic categorization, spam detection, named entity recognition (NER)
Key characteristics:
Setup command:
```bash
./scripts/setup-classification.sh <project-name> <model-name> <num-classes>
```
Example:
```bash
./scripts/setup-classification.sh sentiment-model distilbert-base-uncased 3
```
Use cases: Summarization, question answering, chatbots, text completion, translation, code generation
Key characteristics:
Setup command:
```bash
./scripts/setup-generation.sh <project-name> <model-name> <generation-type>
```
Example:
```bash
./scripts/setup-generation.sh qa-bot t5-small question-answering
```
Use cases: When you have sufficient data and compute to retrain all model parameters
Key characteristics:
Setup command:
```bash
./scripts/setup-fine-tuning.sh <project-name> <model-name> <task-type>
```
Example:
```bash
./scripts/setup-fine-tuning.sh medical-classifier bert-base-uncased classification
```
Use cases: Limited compute resources, quick experimentation, domain adaptation with small datasets
Key characteristics:
Setup command:
```bash
./scripts/setup-peft.sh <project-name> <model-name> <peft-method>
```
Example:
```bash
./scripts/setup-peft.sh efficient-classifier roberta-base lora
```
File: templates/classification-config.yaml
Key parameters:
```yaml
model:
  name: distilbert-base-uncased
  num_labels: 3
  task_type: classification

dataset:
  train_file: data/train.csv
  validation_file: data/val.csv
  test_file: data/test.csv
  text_column: text
  label_column: label

training:
  output_dir: ./outputs
  num_epochs: 3
  batch_size: 16
  learning_rate: 2e-5
  warmup_steps: 500
  weight_decay: 0.01
  evaluation_strategy: epoch
  save_strategy: epoch
  logging_steps: 100
  fp16: true  # Mixed precision training
  gradient_accumulation_steps: 1

optimizer:
  name: adamw
  betas: [0.9, 0.999]
  epsilon: 1e-8
```
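For reference, a minimal sketch of how a training script might consume this template, assuming PyYAML is available (note that PyYAML parses `2e-5` as a string, hence the `float()` cast):

```python
import yaml

# Load the template and pull out what the training script needs
with open('templates/classification-config.yaml') as f:
    config = yaml.safe_load(f)

model_name = config['model']['name']        # distilbert-base-uncased
num_labels = config['model']['num_labels']  # 3
learning_rate = float(config['training']['learning_rate'])
```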
1. Dataset Preparation:
```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')

# Load from CSV
dataset = load_dataset('csv', data_files={
    'train': 'data/train.csv',
    'validation': 'data/val.csv',
    'test': 'data/test.csv'
})

# Tokenize, truncating/padding every example to a fixed length
def preprocess(examples):
    return tokenizer(
        examples['text'],
        truncation=True,
        padding='max_length',
        max_length=512
    )

dataset = dataset.map(preprocess, batched=True)
```
2. Model Initialization:
```python
from transformers import AutoModelForSequenceClassification

model_name = 'distilbert-base-uncased'
num_classes = 3

model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=num_classes,
    id2label={0: 'negative', 1: 'neutral', 2: 'positive'},
    label2id={'negative': 0, 'neutral': 1, 'positive': 2}
)
```
3. Training:
```python
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir='./outputs',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    warmup_steps=500,
    weight_decay=0.01,
    evaluation_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='accuracy',
    fp16=True,  # Enable mixed precision
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['validation'],
    compute_metrics=compute_metrics,  # defined in step 4 below
)

trainer.train()
```
4. Evaluation:
```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = predictions.argmax(axis=-1)
    accuracy = accuracy_score(labels, predictions)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predictions, average='weighted'
    )
    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1': f1
    }

# Evaluate on the held-out test set
results = trainer.evaluate(dataset['test'])
print(results)
```
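Once training finishes, the checkpoint can be used for inference. A minimal sketch, assuming both the model and tokenizer were saved to ./outputs (e.g. via trainer.save_model('./outputs') and tokenizer.save_pretrained('./outputs')):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint; pipeline uses id2label for readable outputs
classifier = pipeline('text-classification', model='./outputs')

print(classifier("This product is amazing!"))
# e.g. [{'label': 'positive', 'score': 0.98}]
```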
File: templates/generation-config.yaml
Key parameters:
```yaml
model:
  name: t5-small
  task_type: generation
  generation_type: question-answering  # or summarization, translation, etc.

dataset:
  train_file: data/train.json
  validation_file: data/val.json
  input_column: question
  target_column: answer
  max_input_length: 512
  max_target_length: 128

training:
  output_dir: ./outputs
  num_epochs: 5
  batch_size: 8
  learning_rate: 3e-4
  warmup_steps: 1000
  weight_decay: 0.01
  evaluation_strategy: steps
  eval_steps: 500
  save_steps: 500
  logging_steps: 100
  fp16: true
  gradient_accumulation_steps: 2
  predict_with_generate: true

generation:
  max_length: 128
  min_length: 10
  num_beams: 4
  length_penalty: 2.0
  early_stopping: true
  no_repeat_ngram_size: 3
```
1. Dataset Preparation:
```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('t5-small')

# Load from JSON (question-answer pairs)
dataset = load_dataset('json', data_files={
    'train': 'data/train.json',
    'validation': 'data/val.json'
})

# Preprocess for seq2seq: tokenize inputs and targets separately
def preprocess(examples):
    inputs = tokenizer(
        examples['question'],
        max_length=512,
        truncation=True,
        padding='max_length'
    )
    # Tokenize targets (text_target replaces the deprecated as_target_tokenizer())
    targets = tokenizer(
        text_target=examples['answer'],
        max_length=128,
        truncation=True,
        padding='max_length'
    )
    inputs['labels'] = targets['input_ids']
    return inputs

dataset = dataset.map(preprocess, batched=True)
```
2. Model & Training:
```python
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer

model = AutoModelForSeq2SeqLM.from_pretrained('t5-small')

training_args = Seq2SeqTrainingArguments(
    output_dir='./outputs',
    num_train_epochs=5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    learning_rate=3e-4,
    predict_with_generate=True,
    generation_max_length=128,
    generation_num_beams=4,
    fp16=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['validation'],
)

trainer.train()
```
3. Generation & Evaluation:
```python
# Generate predictions with beam search
def generate_answer(question):
    inputs = tokenizer(question, return_tensors='pt', max_length=512, truncation=True)
    outputs = model.generate(
        **inputs,
        max_length=128,
        num_beams=4,
        length_penalty=2.0,
        early_stopping=True
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Test
question = "What is machine learning?"
answer = generate_answer(question)
print(f"Q: {question}\nA: {answer}")
```
Traditional fine-tuning challenges:
PEFT/LoRA benefits:
File: templates/peft-config.json
```json
{
  "peft_type": "LORA",
  "task_type": "SEQ_CLS",
  "inference_mode": false,
  "r": 8,
  "lora_alpha": 16,
  "lora_dropout": 0.1,
  "target_modules": ["query", "key", "value", "dense"],
  "bias": "none",
  "modules_to_save": ["classifier"]
}
```
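A short sketch of turning this template into a LoraConfig at runtime (assumes the peft library; popping peft_type lets LoraConfig set it itself):

```python
import json
from peft import LoraConfig

with open('templates/peft-config.json') as f:
    cfg = json.load(f)

cfg.pop('peft_type', None)  # set automatically by LoraConfig
peft_config = LoraConfig(**cfg)
```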
Key parameters:
- r: LoRA rank (lower = fewer trainable parameters; typically 4-64)
- lora_alpha: Scaling factor (typically 2x the rank)
- lora_dropout: Dropout applied to the LoRA layers (0.05-0.1)
- target_modules: Which layers receive LoRA adapters (query, key, value, dense)

1. Install PEFT:
```bash
pip install peft
```
2. Setup PEFT Model:
```python
from transformers import AutoModelForSequenceClassification
from peft import get_peft_model, LoraConfig, TaskType

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    'roberta-base',
    num_labels=3
)

# Configure LoRA
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=['query', 'key', 'value', 'dense']
)

# Apply PEFT
model = get_peft_model(base_model, peft_config)
model.print_trainable_parameters()
# Output: trainable params: 296,448 || all params: 124,940,546 || trainable%: 0.237%
```
3. Training:
```python
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir='./peft_outputs',
    num_train_epochs=3,
    per_device_train_batch_size=16,  # Can use larger batch size!
    learning_rate=1e-3,              # Higher learning rate for PEFT
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['validation'],
)

trainer.train()
```
4. Save & Load Adapters:
```python
# Save only the LoRA adapters (tiny file, ~1-10MB)
model.save_pretrained('./lora_adapters')

# Load the adapters later
from peft import PeftModel

base_model = AutoModelForSequenceClassification.from_pretrained('roberta-base', num_labels=3)
model = PeftModel.from_pretrained(base_model, './lora_adapters')
```
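To deploy without a peft dependency at inference time, the LoRA weights can be merged back into the base model; merge_and_unload is the PEFT call for this:

```python
# Fold the LoRA deltas into the base weights and drop the PEFT wrappers
merged_model = model.merge_and_unload()
merged_model.save_pretrained('./merged_model')  # full-size checkpoint, loads without peft
```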
For even more memory efficiency with large models, combine LoRA with 4-bit quantization (the QLoRA approach):
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load model in 4-bit
model = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Llama-2-7b-hf',
    quantization_config=bnb_config,
    device_map='auto'
)

# Apply LoRA on top of the quantized model
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
# A 7B model can now be fine-tuned on a single 16GB GPU
```
```bash
cd /home/gotime2022/.claude/plugins/marketplaces/ai-dev-marketplace/plugins/ml-training/skills/training-patterns
./scripts/setup-classification.sh my-classifier distilbert-base-uncased 3
```
Creates:
Arguments:
- project-name: Name of the training project
- model-name: HuggingFace model identifier
- num-classes: Number of classification labels

```bash
./scripts/setup-generation.sh my-generator t5-small summarization
```
Creates:
Arguments:
- project-name: Name of the training project
- model-name: HuggingFace model identifier
- generation-type: summarization, question-answering, translation, etc.

```bash
./scripts/setup-fine-tuning.sh domain-model bert-base-uncased classification
```
Creates:
Arguments:
- project-name: Name of the training project
- model-name: HuggingFace model identifier
- task-type: classification or generation

```bash
./scripts/setup-peft.sh efficient-trainer roberta-base lora
```
Creates:
Arguments:
- project-name: Name of the training project
- model-name: HuggingFace model identifier
- peft-method: lora, qlora, prefix-tuning, or adapter

Classification data is a CSV with text and label columns:

```csv
text,label
"This product is amazing!",positive
"Terrible experience",negative
"It's okay, nothing special",neutral
```
Generation data is a JSON list of input/target pairs:

```json
[
  {
    "question": "What is the capital of France?",
    "answer": "The capital of France is Paris."
  },
  {
    "question": "How does photosynthesis work?",
    "answer": "Photosynthesis is the process where plants convert light energy into chemical energy..."
  }
]
```
```python
from datasets import load_dataset

# Load from the HuggingFace Hub
dataset = load_dataset('glue', 'sst2')            # Sentiment classification
dataset = load_dataset('squad')                   # Question answering
dataset = load_dataset('cnn_dailymail', '3.0.0')  # Summarization

# Load local files
dataset = load_dataset('csv', data_files='data.csv')
dataset = load_dataset('json', data_files='data.json')
```
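When a source provides only a single split, a validation set can be carved out with train_test_split; a quick sketch:

```python
from datasets import load_dataset

dataset = load_dataset('csv', data_files='data.csv')

# Hold out 10% of the data for validation (fixed seed for reproducibility)
splits = dataset['train'].train_test_split(test_size=0.1, seed=42)
train_ds, val_ds = splits['train'], splits['test']
```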
Learning Rate:
Batch Size:
Epochs:
Warmup Steps:
Techniques:
Example:
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    fp16=True,                      # Mixed precision
    gradient_checkpointing=True,    # Trade compute for memory
    gradient_accumulation_steps=4,  # Effective batch size = 4 × per-device size
    per_device_train_batch_size=4,  # Small batch per GPU
)
```
Track these metrics:
Use Weights & Biases:
```python
training_args = TrainingArguments(
    report_to='wandb',
    run_name='my-training-run',
)
```
```python
from transformers import EarlyStoppingCallback

# Requires load_best_model_at_end=True and metric_for_best_model in TrainingArguments
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['validation'],
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)]
)
```
```python
training_args = TrainingArguments(
    save_strategy='epoch',        # Save a checkpoint after each epoch
    save_total_limit=3,           # Keep at most 3 checkpoints on disk
    load_best_model_at_end=True,  # Reload the best checkpoint after training
    metric_for_best_model='f1',   # Select the best checkpoint by F1 score
)
```
- When: Testing ideas, limited compute, small datasets
- Approach: LoRA with a small rank (r=4-8)
- Time: Minutes to 1 hour
- Memory: Can fine-tune 7B models on a 16GB GPU
- When: Production deployment, sufficient labeled data
- Approach: Full fine-tuning with early stopping
- Time: 1-6 hours
- Memory: 16GB GPU for base models (110M-340M params)
- When: Adapting to a specific domain, followed by task-specific fine-tuning
- Approach:
- When: One model serving multiple tasks
- Approach: Train separate LoRA adapters per task and swap them at inference (see the sketch below)
- Time: 1-3 hours per task
- Memory: 16GB GPU; adapters are tiny (1-10MB each)
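A minimal sketch of adapter swapping with PEFT, assuming hypothetical adapter directories ./adapters/sentiment and ./adapters/topic and tasks that share the same base model and head shape:

```python
from transformers import AutoModelForSequenceClassification
from peft import PeftModel

base = AutoModelForSequenceClassification.from_pretrained('roberta-base', num_labels=3)

# Attach the first task's adapter, then register a second under its own name
model = PeftModel.from_pretrained(base, './adapters/sentiment', adapter_name='sentiment')
model.load_adapter('./adapters/topic', adapter_name='topic')

model.set_adapter('topic')      # route inference through the topic adapter
model.set_adapter('sentiment')  # switch back without reloading base weights
```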
Out of Memory (OOM) Errors:
Training Not Converging:
Overfitting:
Slow Training:
Poor Evaluation Metrics:
Supported Models:
Requirements:
Best Practice: Start with PEFT/LoRA for quick iteration, and switch to full fine-tuning only when necessary.