Generates standardized model cards in HuggingFace and NVIDIA Model Card++ formats for ML models, covering details, intended uses, training data, metrics, limitations, and ethics. Use when preparing models for deployment or handoff.
Slash command: `/ds:model-card`
Generate a standardized model card that documents a trained ML model's purpose, performance, limitations, and ethical considerations. Based on HuggingFace Model Card format and NVIDIA Model Card++ extensions.
| Field | Description |
|---|---|
| Name | Human-readable model name |
| Version | Model version (e.g., v1.0.0) |
| Type | Algorithm family (e.g., gradient boosting, neural network, linear regression) |
| Framework | Library used (scikit-learn, statsmodels, aeon, xgboost, etc.) |
| Task | What the model does (classification, regression, forecasting, anomaly detection, etc.) |
| Date trained | When the model was last trained |
| Author | Who developed the model |
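The fields above can be captured in a small structure and rendered as a Markdown table. The following is a minimal sketch, not part of the skill itself: the `ModelDetails` class, the `render_details_table` helper, and every example value are hypothetical placeholders.

```python
from dataclasses import dataclass, asdict


@dataclass
class ModelDetails:
    """Hypothetical container mirroring the model details fields."""
    name: str
    version: str
    type: str
    framework: str
    task: str
    date_trained: str
    author: str


def render_details_table(details: ModelDetails) -> str:
    """Render the details as a two-column Markdown table."""
    rows = ["| Field | Value |", "|---|---|"]
    for key, value in asdict(details).items():
        # "date_trained" becomes "Date trained", matching the field names above
        label = key.replace("_", " ").capitalize()
        rows.append(f"| {label} | {value} |")
    return "\n".join(rows)


# Placeholder values for illustration only
details = ModelDetails(
    name="Example churn model",
    version="v1.0.0",
    type="gradient boosting",
    framework="xgboost",
    task="classification",
    date_trained="2024-01-01",
    author="Example Team",
)
print(render_details_table(details))
```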
Document the model's intended use case clearly:
| Field | Description |
|---|---|
| Source | Where the training data comes from |
| Date range | Time period of training data |
| Size | Number of samples and features |
| Data hash | SHA-256 hash for version tracking |
| Preprocessing | Key transformations applied |
| Known biases | Any known biases in the training data |
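One way to produce the data hash is to stream the training file through SHA-256 using Python's standard `hashlib`. This is a sketch under assumptions: the training data is taken to be a single file, and the helper name `sha256_of_file` and the sample CSV content are made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a throwaway file; a real card would hash the actual training data
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "train.csv"
    sample.write_bytes(b"id,label\n1,0\n2,1\n")
    data_hash = sha256_of_file(sample)

print(f"| Data hash | {data_hash} |")
```

If the training set spans several files, hashing a sorted manifest of per-file digests is one option, though the choice depends on how the data is stored.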
| Field | Description |
|---|---|
| Source | Same or different from training? |
| Date range | Time period of evaluation data |
| Size | Number of samples |
| Split strategy | How train/eval was split |
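As an illustration of a documented split strategy, the sketch below performs a chronological split and formats the result as a table row. It is only one possible strategy (random or stratified splits are equally valid); the `chronological_split` helper, the record layout, and the cutoff date are assumptions for the example.

```python
from datetime import date


def chronological_split(records, cutoff):
    """Split records into train (before cutoff) and eval (on or after cutoff)."""
    train = [r for r in records if r["date"] < cutoff]
    evaluation = [r for r in records if r["date"] >= cutoff]
    return train, evaluation


# Ten synthetic daily records, 2024-01-01 through 2024-01-10
records = [{"date": date(2024, 1, d), "value": d} for d in range(1, 11)]
cutoff = date(2024, 1, 8)
train, evaluation = chronological_split(records, cutoff)

split_row = (
    f"| Split strategy | Chronological, cutoff {cutoff.isoformat()} "
    f"({len(train)} train / {len(evaluation)} eval) |"
)
print(split_row)
```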
Report performance metrics with context:
| Metric | Value | Baseline | Improvement | Confidence Interval |
|---|---|---|---|---|
| [Primary] | | | | |
| [Secondary] | | | | |
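The Baseline, Improvement, and Confidence Interval columns can be filled in along the following lines. This is a rough sketch using only the standard library: accuracy stands in for the primary metric, the baseline is a majority-class predictor, the interval is a percentile bootstrap, and the labels are toy data rather than real results.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a metric."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample indices with replacement and recompute the metric
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[max(0, int((alpha / 2) * n_resamples))]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Toy labels and predictions (illustrative only)
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
y_base = [1] * len(y_true)  # majority-class baseline

model_acc = accuracy(y_true, y_pred)
baseline_acc = accuracy(y_true, y_base)
improvement = model_acc - baseline_acc
lower, upper = bootstrap_ci(y_true, y_pred, accuracy)

print(
    f"| Accuracy | {model_acc:.2f} | {baseline_acc:.2f} | "
    f"{improvement:+.2f} | [{lower:.2f}, {upper:.2f}] |"
)
```

With so few samples the interval will be wide, which is itself worth reporting in the card.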
Include:
Document known limitations honestly:
Provide concrete usage examples:
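```python
# Example: loading and using the model
import joblib

# Load the serialized model (replace with the actual artifact path)
model = joblib.load("path/to/model.pkl")

# X_new: feature matrix with the same columns and preprocessing as the training data
predictions = model.predict(X_new)
```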
Include:
Before shipping, verify:
| Mistake | Impact | Fix |
|---|---|---|
| Vague limitations ("may not work for all data") | Users can't assess risk | Be specific: "Accuracy drops 15% on samples with >50% missing values" |
| Missing subgroup metrics | Hides fairness issues | Report metrics for all meaningful slices |
| No baseline comparison | Can't assess model value | Always include baseline performance |
| Outdated training data dates | Users assume data is fresh | Include data recency and staleness risk |
| Missing dependency versions | Can't recreate environment | Pin exact versions in requirements |
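To address the missing subgroup metrics row above, a card can report the primary metric per slice. The sketch below assumes accuracy as the metric and a single categorical grouping column; the `accuracy_by_slice` helper and the toy data are hypothetical.

```python
from collections import defaultdict


def accuracy_by_slice(y_true, y_pred, groups):
    """Compute accuracy separately for each group label."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        counts[g] += 1
        hits[g] += int(t == p)
    return {g: hits[g] / counts[g] for g in counts}


# Toy data (illustrative only)
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
groups = ["a", "a", "b", "b", "b", "a"]

per_slice = accuracy_by_slice(y_true, y_pred, groups)

print("| Slice | Accuracy | Samples |")
print("|---|---|---|")
for g in sorted(per_slice):
    print(f"| {g} | {per_slice[g]:.2f} | {groups.count(g)} |")
```

Reporting the sample count next to each slice helps readers judge how much weight a per-slice number can bear.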
npx claudepluginhub andikarachman/data-science-plugin --plugin ds