WHEN: Machine Learning/Deep Learning code review, PyTorch/TensorFlow patterns, Model training optimization, MLOps checks WHAT: Model architecture review + Training patterns + Data pipeline checks + GPU optimization + Experiment tracking WHEN NOT: Data analysis only → python-data-reviewer, General Python → python-reviewer
This skill inherits all available tools. When active, it can use any tool Claude has access to.
Reviews Machine Learning and Deep Learning code for PyTorch, TensorFlow, scikit-learn, and MLOps best practices.
**Detection triggers**:
- torch, tensorflow, keras, sklearn in requirements.txt / pyproject.toml
- .pt, .pth, .h5, .pkl model files
- train.py, model.py, dataset.py files

**Framework**: PyTorch / TensorFlow / scikit-learn
**Python**: 3.10+
**CUDA**: 11.x / 12.x
**Task**: Classification / Regression / NLP / CV
**Stage**: Research / Production
AskUserQuestion:
"Which areas to review?"
Options:
- Full ML pattern check (recommended)
- Model architecture review
- Training loop optimization
- Data pipeline efficiency
- MLOps/deployment patterns
multiSelect: true
### PyTorch Patterns

| Check | Risk | Severity |
|---|---|---|
| Missing model.eval() | Dropout/BatchNorm behave inconsistently at inference | HIGH |
| Missing torch.no_grad() | Autograd graph kept in memory, slower inference | HIGH |
| In-place operations in autograd | Gradient computation errors | CRITICAL |
| DataLoader num_workers=0 | CPU-bound data loading bottleneck | MEDIUM |
| Missing gradient clipping | Exploding gradients | MEDIUM |
```python
# BAD: Missing eval() and no_grad()
def predict(model, x):
    return model(x)  # Dropout/BatchNorm inconsistent!

# GOOD: Proper inference mode
def predict(model, x):
    model.eval()
    with torch.no_grad():
        return model(x)
```
```python
# BAD: In-place operation breaking autograd
x = torch.randn(10, requires_grad=True)
x += 1  # In-place op on a leaf tensor that requires grad raises a RuntimeError

# GOOD: Out-of-place operation
x = torch.randn(10, requires_grad=True)
x = x + 1
```
```python
# BAD: DataLoader bottleneck
loader = DataLoader(dataset, batch_size=32)  # num_workers=0

# GOOD: Parallel data loading
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,
    pin_memory=True,  # For GPU
    persistent_workers=True,
)

# BAD: No gradient clipping
optimizer.step()

# GOOD: Clip gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```
### TensorFlow/Keras Patterns

| Check | Risk | Severity |
|---|---|---|
| Missing @tf.function | No graph compilation, slower training steps | MEDIUM |
| Eager mode in production | Slow inference (see sketch below) | HIGH |
| Large model in memory | OOM risk | HIGH |
| Missing mixed precision | Wasted GPU throughput during training | MEDIUM |
```python
# BAD: No @tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x)
        loss = loss_fn(y, pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# GOOD: Use @tf.function
@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = loss_fn(y, pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
```
```python
# BAD: Missing mixed precision
model.fit(x_train, y_train, epochs=10)

# GOOD: Enable mixed precision
# Note: set the policy BEFORE building the model; it only affects layers
# created after the call.
tf.keras.mixed_precision.set_global_policy('mixed_float16')
model.fit(x_train, y_train, epochs=10)
```
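The "Eager mode in production" check above has no snippet; here is a minimal sketch of compiling the inference path with `@tf.function` and exporting a SavedModel. The toy `tf.Module`, input shape, and export path are placeholders standing in for the real model.

```python
import tensorflow as tf

# Hypothetical stand-in for a trained model; substitute your own.
class Model(tf.Module):
    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([32, 10]))

    @tf.function(input_signature=[tf.TensorSpec([None, 32], tf.float32)])
    def serve(self, x):
        # Traced once; later calls run the compiled graph, not eager Python.
        return tf.matmul(x, self.w)

model = Model()
# Export a SavedModel whose serving signature is the compiled function.
tf.saved_model.save(model, "export/1", signatures={"serving_default": model.serve})
```

Serving the exported graph (via TF Serving or `tf.saved_model.load`) avoids per-call eager overhead in production.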
### scikit-learn Patterns

| Check | Risk | Severity |
|---|---|---|
| fit_transform on test data | Data leakage | CRITICAL |
| Missing cross-validation | Overfitting risk | HIGH |
| No feature scaling | Degraded performance for scale-sensitive models | MEDIUM |
| Missing random_state | Non-reproducible results (see sketch below) | LOW |
```python
# BAD: Data leakage
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.fit_transform(X_test)  # LEAK! Re-fitting on test data

# GOOD: transform only on test
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # No re-fit

# BAD: No cross-validation
model.fit(X_train, y_train)
score = model.score(X_test, y_test)

# GOOD: Use cross-validation
from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X, y, cv=5)
print(f"CV Score: {scores.mean():.3f} (+/- {scores.std():.3f})")

# BAD: Pipeline without scaling
model = LogisticRegression()
model.fit(X_train, y_train)

# GOOD: Use Pipeline with scaling
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression()),
])
pipeline.fit(X_train, y_train)
```
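The random_state row likewise has no example; a small sketch of pinning seeds on the split and the estimator follows. The dataset, estimator choice, and the value 42 are arbitrary placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)

# Pin random_state on every stochastic step so reruns give identical splits and models.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```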
### Data Pipeline

| Check | Problem | Solution |
|---|---|---|
| Loading full dataset into memory | OOM | Use generators / tf.data |
| No data augmentation | Overfitting | Add augmentation (tf.data sketch below) |
| Unbalanced classes | Biased model | Oversampling/undersampling/class weights |
| No validation split | No early stopping | Hold out a validation set (see sketch below) |
```python
# BAD: Full dataset in memory
images = []
for path in all_image_paths:
    images.append(load_image(path))  # OOM for large datasets!

# GOOD: Use a generator
def data_generator(paths, batch_size):
    for i in range(0, len(paths), batch_size):
        batch_paths = paths[i:i+batch_size]
        yield np.array([load_image(p) for p in batch_paths])

# GOOD: Use tf.data
dataset = tf.data.Dataset.from_tensor_slices(paths)
dataset = dataset.map(load_and_preprocess)
dataset = dataset.batch(32).prefetch(tf.data.AUTOTUNE)
```
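The data-augmentation row in the table above has no snippet; a hedged sketch of adding augmentation to the same tf.data pipeline could look like the following. It assumes the `paths` and `load_and_preprocess` from the snippet above, with `load_and_preprocess` returning a single decoded image tensor; the flip/brightness choices are placeholders to tune per task.

```python
import tensorflow as tf

def augment(image):
    # Cheap, label-preserving augmentations; adjust for your task.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    return image

# Augment only the training pipeline: after decoding, before batching.
train_ds = (
    tf.data.Dataset.from_tensor_slices(paths)
    .map(load_and_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .map(augment, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(1000)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)
```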
```python
# BAD: No class weights for imbalanced data
model.fit(X_train, y_train)

# GOOD: Add class weights
from sklearn.utils.class_weight import compute_class_weight

weights = compute_class_weight('balanced', classes=np.unique(y), y=y)
class_weights = dict(enumerate(weights))
model.fit(X_train, y_train, class_weight=class_weights)
```
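For the validation-split row, a sketch using a Keras-style `model.fit` follows. It reuses `X_train`, `y_train`, and `class_weights` from the snippet above; the 20% split and patience of 5 are arbitrary choices.

```python
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,                   # stop after 5 epochs without improvement
    restore_best_weights=True,
)

# Hold out 20% of the training data so early stopping is driven by unseen examples.
model.fit(
    X_train, y_train,
    validation_split=0.2,
    epochs=100,
    class_weight=class_weights,
    callbacks=[early_stop],
)
```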
### GPU Optimization

| Check | Risk | Severity |
|---|---|---|
| Tensor operations left on CPU | GPU sits idle | HIGH |
| Frequent GPU-CPU transfers | Pipeline stalls on every sync | HIGH |
| No gradient accumulation | OOM when a large effective batch is needed | MEDIUM |
| Missing torch.cuda.empty_cache() between phases | Memory fragmentation (see sketch below) | LOW |
```python
# BAD: CPU operations
x = torch.randn(1000, 1000)
y = torch.randn(1000, 1000)
z = x @ y  # CPU computation

# GOOD: GPU operations
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.randn(1000, 1000, device=device)
y = torch.randn(1000, 1000, device=device)
z = x @ y  # GPU computation
```
```python
# BAD: Frequent CPU-GPU transfer
for x, y in dataloader:
    x = x.cuda()
    y = y.cuda()
    loss = model(x, y)
    print(loss.item())  # Forces a GPU sync every iteration!

# GOOD: Batch logging
losses = []
for step, (x, y) in enumerate(dataloader):
    x, y = x.to(device), y.to(device)
    loss = model(x, y)
    losses.append(loss.detach())  # detach so the autograd graph is not retained
    if (step + 1) % log_interval == 0:
        print(torch.stack(losses).mean().item())
        losses.clear()
```
```python
# Gradient accumulation for a large effective batch size
accumulation_steps = 4
optimizer.zero_grad()
for i, (x, y) in enumerate(dataloader):
    loss = model(x, y) / accumulation_steps  # scale so accumulated grads average correctly
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```
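For the empty_cache row, a minimal sketch of releasing the allocator cache between phases; the memory logging is optional and the 1e9 scaling is just for readable GB output.

```python
import torch

def log_gpu_memory(tag):
    # allocated = live tensors, reserved = blocks cached by the allocator
    allocated = torch.cuda.memory_allocated() / 1e9
    reserved = torch.cuda.memory_reserved() / 1e9
    print(f"{tag}: allocated={allocated:.2f} GB, reserved={reserved:.2f} GB")

if torch.cuda.is_available():
    log_gpu_memory("after training")
    # Only worth calling between distinct phases (e.g. training -> evaluation);
    # inside a training loop it just slows things down.
    torch.cuda.empty_cache()
    log_gpu_memory("after empty_cache")
```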
### MLOps & Reproducibility

| Check | Risk | Severity |
|---|---|---|
| No experiment tracking | Results hard to reproduce or compare | HIGH |
| Hardcoded hyperparameters | No config management, painful tuning | MEDIUM |
| No model versioning | Deployment and rollback issues (MLflow sketch below) | MEDIUM |
| Missing seed setting | Non-reproducible runs | HIGH |
```python
# BAD: No seed setting
model = train_model(X, y)

# GOOD: Set all seeds
import random
import numpy as np
import torch

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True

set_seed(42)
```
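When DataLoader workers are used (see num_workers above), each worker has its own RNG streams that `set_seed` alone does not cover. A sketch following the standard PyTorch reproducibility recipe, reusing the `dataset` from earlier:

```python
import random
import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_worker(worker_id):
    # Each worker gets a distinct but deterministic seed derived from the base seed.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(42)

loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,
    worker_init_fn=seed_worker,  # seeds numpy/random inside each worker
    generator=g,                 # controls shuffling and worker base seeds
)
```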
```python
# BAD: Hardcoded hyperparameters
lr = 0.001
batch_size = 32
epochs = 100

# GOOD: Use a config file or Hydra
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="train")
def train(cfg: DictConfig):
    model = build_model(cfg.model)
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg.lr)
```
```python
# GOOD: Use experiment tracking
import wandb

wandb.init(project="my-project", config=cfg)
for epoch in range(epochs):
    loss = train_epoch(model, dataloader)
    wandb.log({"loss": loss, "epoch": epoch})
wandb.finish()
```
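For the model-versioning row, MLflow (already listed under Recommended Actions) is one option. A minimal sketch, where the experiment name, logged values, and registry name are placeholders and a tracking server with a model registry is assumed:

```python
import mlflow
import mlflow.pytorch

mlflow.set_experiment("my-project")

with mlflow.start_run():
    mlflow.log_params({"lr": 0.001, "batch_size": 32})
    mlflow.log_metric("loss", float(loss))
    # Logs the model as a versioned artifact; registering it creates a new
    # version under the given registry name on every run.
    mlflow.pytorch.log_model(model, "model", registered_model_name="my-model")
```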
## ML Code Review Results
**Project**: [name]
**Framework**: PyTorch/TensorFlow/scikit-learn
**Task**: Classification/Regression/NLP/CV
**Files Analyzed**: X
### Model Architecture
| Status | File | Issue |
|--------|------|-------|
| MEDIUM | models/resnet.py | Missing dropout for regularization |
| LOW | models/transformer.py | Consider gradient checkpointing |
### Training Loop
| Status | File | Issue |
|--------|------|-------|
| HIGH | train.py | Missing model.eval() in validation (line 45) |
| HIGH | train.py | No gradient clipping (line 67) |
### Data Pipeline
| Status | File | Issue |
|--------|------|-------|
| CRITICAL | data/dataset.py | fit_transform on test data (line 23) |
| HIGH | data/loader.py | DataLoader num_workers=0 |
### MLOps
| Status | File | Issue |
|--------|------|-------|
| HIGH | train.py | No seed setting for reproducibility |
| MEDIUM | train.py | Hardcoded hyperparameters |
### Recommended Actions
1. [ ] Add model.eval() and torch.no_grad() for inference
2. [ ] Fix data leakage in preprocessing
3. [ ] Set random seeds for reproducibility
4. [ ] Add experiment tracking (wandb/mlflow)
**Related skills**:
- python-reviewer skill: General Python code quality
- python-data-reviewer skill: Data preprocessing patterns
- test-generator skill: ML test generation
- docker-reviewer skill: ML containerization