Cost estimation scripts and tools for calculating GPU hours, training costs, and inference pricing across the Modal, Lambda Labs, and RunPod platforms. Use when estimating ML training costs, comparing platform pricing, calculating GPU hours, budgeting for ML projects, or when the user mentions cost estimation, pricing comparison, GPU budgeting, training cost analysis, or inference cost optimization.
This skill is limited to using the following tools and bundled assets:

- README.md
- SKILL_SUMMARY.md
- examples/inference-cost-estimate.md
- examples/training-cost-estimate.md
- scripts/calculate-gpu-hours.sh
- scripts/compare-platforms.sh
- scripts/estimate-inference-cost.sh
- scripts/estimate-training-cost.sh
- templates/cost-breakdown.json
- templates/platform-pricing.yaml

Purpose: Provide production-ready cost estimation tools for ML training and inference across cloud GPU platforms (Modal, Lambda Labs, RunPod).
Activation Triggers: cost estimation, pricing comparison, GPU budgeting, training cost analysis, inference cost optimization.
Key Resources:
- scripts/estimate-training-cost.sh - Calculate training costs based on model size, dataset size, and GPU type
- scripts/estimate-inference-cost.sh - Estimate inference costs for production workloads
- scripts/calculate-gpu-hours.sh - Convert training parameters to GPU hours
- scripts/compare-platforms.sh - Compare costs across Modal, Lambda, and RunPod
- templates/cost-breakdown.json - Structured cost breakdown template
- templates/platform-pricing.yaml - Up-to-date platform pricing data
- examples/training-cost-estimate.md - Example training cost calculation
- examples/inference-cost-estimate.md - Example inference cost analysis

GPU Options: T4, L4, A10, A100 (40GB/80GB), H100, H200, B200
Free Credits: Modal includes $30/month in free credits and up to $50,000 for eligible startups; Lambda Labs and RunPod offer none.
Single GPU: from $0.31/hr (Lambda A10) to $3.95/hr (Modal H100); see templates/platform-pricing.yaml for the full table.
8x GPU Clusters: Lambda 8x A100 40GB at $10.32/hr (320 GB total VRAM).
Key Features: per-second billing on Modal, per-minute billing with zero egress fees and ~200ms FlashBoot cold starts on RunPod, and per-hour billing (1-hour minimum) on Lambda.
Script: scripts/estimate-training-cost.sh
Usage:
bash scripts/estimate-training-cost.sh \
--model-size 7B \
--dataset-size 10000 \
--epochs 3 \
--gpu t4 \
--platform modal
Parameters:
- --model-size: Model size (125M, 350M, 1B, 3B, 7B, 13B, 70B)
- --dataset-size: Number of training samples
- --epochs: Number of training epochs
- --batch-size: Training batch size (default: auto-calculated)
- --gpu: GPU type (t4, a10, a100-40gb, a100-80gb, h100)
- --platform: Cloud platform (modal, lambda, runpod)
- --peft: Use PEFT/LoRA (yes/no, default: no)
- --mixed-precision: Use FP16/BF16 (yes/no, default: yes)

Output:
{
"model": "7B",
"dataset_size": 10000,
"epochs": 3,
"gpu": "T4",
"platform": "Modal",
"estimated_hours": 4.2,
"cost_breakdown": {
"compute_cost": 2.48,
"storage_cost": 0.05,
"total_cost": 2.53
},
"cost_optimizations": {
"with_peft": 1.26,
"savings_percentage": 50
},
"alternative_platforms": {
"lambda_a10": 1.30,
"runpod_t4": 2.40
}
}
Calculation Methodology: total training tokens (dataset samples × epochs × average tokens per sample) are divided by the GPU's benchmark throughput to get GPU hours, which are then multiplied by the platform's hourly rate; a small storage charge is added on top. PEFT and mixed precision enter through higher effective throughput (see GPU Throughput Benchmarks below). A sketch of this arithmetic follows.
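A minimal bash sketch of the formula; the ~300 tokens per sample is an illustrative assumption (not a script flag), chosen so the result lines up with the 4.2-hour / ~$2.48 figures above:

```bash
# Sketch only: avg_tokens_per_sample is an assumption, not a script default.
samples=10000; epochs=3; avg_tokens_per_sample=300
throughput_tps=600        # T4, 7B with PEFT (see GPU Throughput Benchmarks)
price_per_hour=0.59       # Modal T4

total_tokens=$((samples * epochs * avg_tokens_per_sample))   # 9,000,000
hours=$(echo "scale=2; $total_tokens / $throughput_tps / 3600" | bc)
cost=$(echo "scale=2; $hours * $price_per_hour" | bc)
echo "tokens=$total_tokens hours=$hours cost=\$$cost"        # ~4.16 h, ~$2.45
```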
Script: scripts/estimate-inference-cost.sh
Usage:
bash scripts/estimate-inference-cost.sh \
--requests-per-day 1000 \
--avg-latency 2 \
--gpu t4 \
--platform modal \
--deployment serverless
Parameters:
- --requests-per-day: Expected daily requests
- --avg-latency: Average inference time (seconds)
- --gpu: GPU type
- --platform: Cloud platform
- --deployment: Deployment type (serverless, dedicated)
- --batch-inference: Batch requests (yes/no, default: no)

Output:
{
"requests_per_day": 1000,
"requests_per_month": 30000,
"avg_latency_sec": 2,
"gpu": "T4",
"platform": "Modal Serverless",
"cost_breakdown": {
"daily_compute_seconds": 2000,
"daily_cost": 0.33,
"monthly_cost": 9.90,
"cost_per_request": 0.00033
},
"scaling_analysis": {
"requests_10k_day": 99.00,
"requests_100k_day": 990.00
},
"dedicated_alternative": {
"monthly_cost": 442.50,
"break_even_requests_day": 4500
}
}
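The serverless figures follow directly from request volume, latency, and the per-second rate. A minimal sketch using the Modal T4 rate from templates/platform-pricing.yaml (the script rounds the daily cost to $0.33, giving $9.90/month):

```bash
req_per_day=1000; latency_sec=2; price_per_sec=0.000164    # Modal T4
daily_sec=$((req_per_day * latency_sec))                   # 2000 GPU-seconds
daily=$(echo "scale=4; $daily_sec * $price_per_sec" | bc)  # ~0.328/day
monthly=$(echo "scale=2; $daily * 30" | bc)                # ~9.84/month
echo "daily=\$$daily monthly=\$$monthly"
```

The dedicated break-even above is the dedicated monthly cost divided by the serverless cost per request over 30 days: $442.50 ÷ (30 × $0.00033) ≈ 44,700 requests/day.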
Script: scripts/calculate-gpu-hours.sh
Usage:
bash scripts/calculate-gpu-hours.sh \
--model-params 7B \
--tokens-total 30M \
--gpu a100-40gb
Parameters:
- --model-params: Model parameters (125M, 350M, 1B, 3B, 7B, 13B, 70B)
- --tokens-total: Total training tokens
- --gpu: GPU type
- --peft: Use PEFT (yes/no)
- --multi-gpu: Number of GPUs (default: 1)

GPU Throughput Benchmarks:
T4 (16GB):
- 7B full fine-tune: 150 tokens/sec
- 7B with PEFT: 600 tokens/sec
A100 40GB:
- 7B full fine-tune: 800 tokens/sec
- 7B with PEFT: 3200 tokens/sec
- 13B with PEFT: 1600 tokens/sec
A100 80GB:
- 13B full fine-tune: 600 tokens/sec
- 70B with PEFT: 400 tokens/sec
H100:
- 70B with PEFT: 1200 tokens/sec
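Converting tokens to GPU hours is then a single division by the relevant benchmark. A hypothetical sketch of that lookup; the associative array (hence the Bash 4.0+ prerequisite) is illustrative, and the real script's internals may differ:

```bash
#!/usr/bin/env bash
# Illustrative lookup table built from the benchmarks above.
declare -A tps=(
  [t4_7b_full]=150         [t4_7b_peft]=600
  [a100-40gb_7b_full]=800  [a100-40gb_7b_peft]=3200
  [h100_70b_peft]=1200
)
tokens=30000000                 # --tokens-total 30M
rate=${tps[a100-40gb_7b_full]}  # --gpu a100-40gb, full fine-tune
hours=$(echo "scale=2; $tokens / $rate / 3600" | bc)
echo "GPU hours: $hours"        # 30M tokens / 800 tps ≈ 10.41 hours
```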
Script: scripts/compare-platforms.sh
Usage:
bash scripts/compare-platforms.sh \
--training-hours 4 \
--gpu-type a100-40gb
Output:
# Platform Cost Comparison
## Training Job: 4 hours on A100 40GB
| Platform | GPU Cost | Egress Fees | Total | Notes |
|----------|----------|-------------|-------|-------|
| Modal | $8.40 | $0.00 | $8.40 | Serverless, pay-per-second |
| Lambda | $5.16 | $0.00 | $5.16 | Cheapest for dedicated |
| RunPod | $8.00 | $0.00 | $8.00 | Pay-per-minute |
## Winner: Lambda Labs ($5.16)
Savings: $3.24 (38.6% vs Modal)
Recommendation: Use Lambda for long-running dedicated training, Modal for
serverless/bursty workloads.
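The comparison itself is hours multiplied by each platform's hourly rate. A sketch with rates from templates/platform-pricing.yaml (the RunPod A100 rate of $2.00/hr is implied by the table above rather than listed in the template):

```bash
hours=4
for entry in "Modal:2.10" "Lambda:1.29" "RunPod:2.00"; do
  platform=${entry%%:*}; rate=${entry##*:}
  printf '%s: $%s\n' "$platform" "$(echo "scale=2; $hours * $rate" | bc)"
done
# Modal: $8.40  Lambda: $5.16  RunPod: $8.00
```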
Template: templates/cost-breakdown.json
{
"project_name": "ML Training Project",
"cost_estimate": {
"training": {
"model_size": "7B",
"training_runs": 4,
"hours_per_run": 4.2,
"gpu_type": "T4",
"platform": "Modal",
"cost_per_run": 2.48,
"total_training_cost": 9.92
},
"inference": {
"deployment_type": "serverless",
"expected_requests_month": 30000,
"gpu_type": "T4",
"platform": "Modal",
"monthly_cost": 9.90
},
"storage": {
"model_artifacts_gb": 14,
"dataset_storage_gb": 5,
"monthly_storage_cost": 0.50
},
"total_monthly_cost": 20.32,
"breakdown_percentage": {
"training": 48.8,
"inference": 48.7,
"storage": 2.5
}
},
"cost_optimizations_applied": {
"peft_lora": "50% training cost reduction",
"mixed_precision": "2x faster training",
"serverless_inference": "Pay only for actual usage",
"batch_inference": "Up to 10x reduction in inference cost"
},
"potential_savings": {
"without_optimizations": 45.00,
"with_optimizations": 20.32,
"total_savings": 24.68,
"savings_percentage": 54.8
}
}
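A filled-in breakdown can be queried directly with jq (listed under the prerequisites), for example to pull the headline figures:

```bash
jq '.cost_estimate.total_monthly_cost' templates/cost-breakdown.json      # 20.32
jq '.potential_savings.savings_percentage' templates/cost-breakdown.json  # 54.8
```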
Template: templates/platform-pricing.yaml
platforms:
modal:
billing: per-second
free_credits: 30 # USD per month
startup_credits: 50000 # USD for eligible startups
gpus:
t4:
price_per_sec: 0.000164
price_per_hour: 0.59
vram_gb: 16
a100_40gb:
price_per_sec: 0.000583
price_per_hour: 2.10
vram_gb: 40
h100:
price_per_sec: 0.001097
price_per_hour: 3.95
vram_gb: 80
lambda:
billing: per-hour
free_credits: 0
minimum_billing: 1-hour
gpus:
a10_1x:
price_per_hour: 0.31
vram_gb: 24
a100_40gb_1x:
price_per_hour: 1.29
vram_gb: 40
a100_40gb_8x:
price_per_hour: 10.32
total_vram_gb: 320
runpod:
billing: per-minute
free_credits: 0
features:
- zero_egress_fees
- flashboot_200ms
gpus:
t4:
price_per_hour: 0.60 # Approximate
vram_gb: 16
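Individual rates can be read straight from the template with yq. This assumes the pip-installed yq (a jq wrapper, as in the prerequisites); the Go-based yq uses a different syntax:

```bash
yq -r '.platforms.modal.gpus.t4.price_per_hour' templates/platform-pricing.yaml
# 0.59
yq -r '.platforms.lambda.gpus.a100_40gb_1x.price_per_hour' templates/platform-pricing.yaml
# 1.29
```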
File: examples/training-cost-estimate.md
Scenario: Fine-tune a 7B model on 10,000 samples for 3 epochs with PEFT/LoRA on a Modal T4.
Cost Calculation:
bash scripts/estimate-training-cost.sh \
--model-size 7B \
--dataset-size 10000 \
--epochs 3 \
--gpu t4 \
--platform modal \
--peft yes
Results:
Training Time: 4.2 hours
Modal T4 Cost: $2.48
Alternative (Lambda A10): $1.30 (47% cheaper)
Optimization Impact:
- Without PEFT: $12.40 (5x more expensive)
- With PEFT: $2.48
- Savings: $9.92 (80%)
Recommendation: Use Lambda A10 for the cheapest option, or Modal T4 for serverless convenience.
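The headline figures check out against the hourly rates; a quick bc verification (4.2 h is the estimated training time from the results above):

```bash
echo "scale=2; 4.2 * 0.59" | bc   # Modal T4:   2.47 (~the $2.48 above)
echo "scale=2; 4.2 * 0.31" | bc   # Lambda A10: 1.30
```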
File: examples/inference-cost-estimate.md
Scenario: Serve 1,000 requests/day at ~2 seconds of GPU time per request on a Modal T4.
Cost Calculation:
bash scripts/estimate-inference-cost.sh \
--requests-per-day 1000 \
--avg-latency 2 \
--gpu t4 \
--platform modal \
--deployment serverless
Current (1K requests/day):
Serverless Modal T4:
- Daily cost: $0.33
- Monthly cost: $9.90
- Cost per request: $0.00033
Dedicated Lambda A10:
- Monthly cost: $223 (24/7 instance)
- Break-even: ~22,500 requests/day ($223 ÷ 30 days ÷ $0.00033 per request)
- Not recommended for current traffic
After Growth (10K requests/day):
Serverless Modal T4:
- Monthly cost: $99.00
- Still cost-effective
Dedicated Lambda A10:
- Monthly cost: $223
- Still well below the ~22,500 requests/day break-even
- Recommendation: Stay serverless until traffic approaches the break-even point
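The break-even figure is the dedicated instance's daily cost divided by the serverless cost per request; a sketch:

```bash
dedicated_monthly=223.20   # Lambda A10 24/7: $0.31/hr * 720 hrs
per_request=0.00033        # 2 sec * $0.000164/sec on Modal T4
echo "scale=0; $dedicated_monthly / (30 * $per_request)" | bc
# 22545 -> break-even at roughly 22,500 requests/day
```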
Optimization: PEFT/LoRA
Savings: 50-90% training cost reduction
# Calculate savings
bash scripts/estimate-training-cost.sh --model-size 7B --peft no
# Cost: $12.40
bash scripts/estimate-training-cost.sh --model-size 7B --peft yes
# Cost: $2.48
# Savings: $9.92 (80%)
Optimization: Mixed Precision (FP16/BF16)
Savings: 2x faster training, roughly 50% cost reduction
Automatically enabled in cost estimations with --mixed-precision yes (the default).
Use Case Guidelines:
# Short jobs (<1 hour): Modal serverless
bash scripts/compare-platforms.sh --training-hours 0.5 --gpu-type t4
# Winner: Modal ($0.30 vs Lambda $0.31 minimum)
# Long jobs (4+ hours): Lambda dedicated
bash scripts/compare-platforms.sh --training-hours 4 --gpu-type a100-40gb
# Winner: Lambda ($5.16 vs Modal $8.40)
# Variable workloads: Modal serverless
# Pay only for actual usage, no idle cost
Optimization: Batch Inference
Savings: Up to 10x reduction in inference cost
# Single inference
bash scripts/estimate-inference-cost.sh \
--requests-per-day 1000 \
--avg-latency 2 \
--batch-inference no
# Cost: $9.90/month
# Batch inference (10 requests per batch)
bash scripts/estimate-inference-cost.sh \
--requests-per-day 1000 \
--avg-latency 0.3 \
--batch-inference yes
# Cost: $1.49/month
# Savings: $8.41 (85%)
Required for scripts:
# Bash 4.0+ (for associative arrays)
bash --version
# jq (for JSON processing)
sudo apt-get install jq
# bc (for floating-point calculations)
sudo apt-get install bc
# yq (for YAML processing)
pip install yq
Supported Platforms: Modal, Lambda Labs, RunPod
GPU Types: T4, L4, A10, A100 (40GB/80GB), H100, H200, B200
Output Format: JSON cost breakdowns and markdown reports
Version: 1.0.0