Structured workflow for cloud cost analysis and optimization including rightsizing, reserved capacity planning, and FinOps practices.
Inherits all available tools
Additional assets for this skill
This skill inherits all available tools. When active, it can use any tool Claude has access to.
name: ops-cost-optimization description: | Structured workflow for cloud cost analysis and optimization including rightsizing, reserved capacity planning, and FinOps practices.
trigger: |
skip_when: |
This skill defines the structured process for cloud cost optimization. Use it for systematic cost analysis and data-driven optimization.
| Phase | Focus | Output |
|---|---|---|
| 1. Cost Visibility | Understand current spend | Cost breakdown |
| 2. Anomaly Detection | Identify unusual spend | Anomaly report |
| 3. Optimization Analysis | Find savings opportunities | Opportunities list |
| 4. Risk Assessment | Evaluate optimization risks | Risk matrix |
| 5. Implementation | Execute optimizations | Cost reduction |
| 6. Monitoring | Track savings | Savings report |
Analyze costs across multiple dimensions:
| Dimension | Purpose | Tool |
|---|---|---|
| Service | Which AWS services cost most | Cost Explorer |
| Account | Which accounts spend most | Cost Explorer |
| Tag | Cost by team/project/environment | Cost Allocation Tags |
| Resource | Individual resource costs | Cost Explorer |
| Time | Cost trends over time | Cost Explorer |
## Cost Visibility Report
**Period:** [Month YYYY]
**Total Spend:** $XX,XXX
**Budget:** $XX,XXX
**Variance:** [+/-X%]
### Cost by Service
| Service | Cost | % of Total | MoM Change |
|---------|------|------------|------------|
| EC2 | $X,XXX | XX% | +X% |
| RDS | $X,XXX | XX% | +X% |
| S3 | $X,XXX | XX% | +X% |
| Data Transfer | $X,XXX | XX% | +X% |
| Other | $X,XXX | XX% | +X% |
### Cost by Environment
| Environment | Cost | % of Total |
|-------------|------|------------|
| Production | $X,XXX | XX% |
| Staging | $X,XXX | XX% |
| Development | $X,XXX | XX% |
### Cost by Team
| Team | Cost | % of Total |
|------|------|------------|
| Platform | $X,XXX | XX% |
| API | $X,XXX | XX% |
| Data | $X,XXX | XX% |
Minimum required tags for cost allocation:
| Tag | Purpose | Example |
|---|---|---|
Environment | Env separation | prod, staging, dev |
Team | Cost ownership | platform, api, data |
Service | Service identification | api-gateway, auth |
CostCenter | Financial allocation | CC-1234 |
| Rule | Threshold | Alert |
|---|---|---|
| Daily spend spike | >20% vs 7-day avg | Warning |
| Service cost jump | >50% vs last month | Critical |
| New service appears | Any new service >$100/day | Info |
| Tag coverage drop | <95% coverage | Warning |
When anomaly detected:
Identify the spike:
Check common causes:
Validate intentionality:
## Cost Anomaly Report
**Detected:** YYYY-MM-DD HH:MM
**Severity:** [Critical/Warning/Info]
### Anomaly Details
| Metric | Expected | Actual | Delta |
|--------|----------|--------|-------|
| Daily spend | $X,XXX | $X,XXX | +XX% |
### Investigation
**Root Cause:** [description]
**Contributing Factors:**
1. [Factor 1]
2. [Factor 2]
**Intentional:** [Yes/No]
### Action Required
- [ ] [Action if remediation needed]
- [ ] [Update budget if expected]
| Category | Typical Savings | Effort | Risk |
|---|---|---|---|
| Rightsizing | 20-40% | Low | Low |
| Reserved Capacity | 30-70% | Medium | Low-Medium |
| Spot Instances | 60-90% | Medium | Medium |
| Storage Tiering | 20-50% | Low | Low |
| Idle Resources | 100% | Low | None |
| Data Transfer | 10-30% | Medium | Low |
## Rightsizing Opportunities
### Underutilized Instances
| Instance | Type | Avg CPU | Avg Mem | Recommendation | Savings |
|----------|------|---------|---------|----------------|---------|
| api-prod-1 | m5.xlarge | 15% | 25% | m5.large | $70/mo |
| worker-2 | c5.2xlarge | 30% | 20% | c5.xlarge | $140/mo |
### Criteria Used
- CPU avg <40% over 14 days -> downsize candidate
- Memory avg <50% over 14 days -> downsize candidate
- Excluded: ASG instances (handled by ASG sizing)
## Reserved Instance Coverage
### Current Coverage
| Service | On-Demand | Reserved | Coverage |
|---------|-----------|----------|----------|
| EC2 | $5,000 | $3,000 | 38% |
| RDS | $2,000 | $0 | 0% |
| ElastiCache | $500 | $500 | 50% |
### RI Recommendations
| Resource Type | Term | Payment | Monthly Savings | Break-even |
|---------------|------|---------|-----------------|------------|
| 10x m5.large | 1 year | No upfront | $350 | 0 months |
| db.r5.xlarge | 1 year | Partial | $180 | 4 months |
### RI Purchase Criteria
- Stable workload for >80% of term
- Usage predictable for commitment period
- Consider convertible RIs for flexibility
## Idle Resources
### Unattached EBS Volumes
| Volume ID | Size | Cost/Month | Last Attached |
|-----------|------|------------|---------------|
| vol-xxx | 100GB | $10 | 90 days ago |
| vol-yyy | 500GB | $50 | Never |
### Unused Elastic IPs
| IP | Allocation ID | Associated | Cost/Month |
|----|---------------|------------|------------|
| x.x.x.x | eipalloc-xxx | No | $3.60 |
### Idle Load Balancers
| LB Name | Target Groups | Requests/Day | Cost/Month |
|---------|---------------|--------------|------------|
| old-api | 0 | 0 | $16.50 |
| Optimization | Risk Level | Potential Impact | Mitigation |
|---|---|---|---|
| Downsize instance | Low | Performance degradation | Monitor, quick rollback |
| Purchase RI | Low-Medium | Unused commitment | Convertible RIs |
| Spot instances | Medium | Instance interruption | Diversify, checkpointing |
| Delete idle | None-Low | Lost data (if EBS) | Snapshot first |
| Storage tiering | Low | Retrieval latency | Test access patterns |
| Priority | Criteria | Examples |
|---|---|---|
| Quick Wins | Low effort, no risk, immediate savings | Delete idle resources |
| High Impact | Significant savings, manageable risk | RI purchases |
| Medium Impact | Moderate savings, requires planning | Rightsizing |
| Long-term | Architectural changes | Spot migration |
## Savings Report
**Period:** [Month YYYY]
**Target Savings:** $X,XXX
**Actual Savings:** $X,XXX
**Achievement:** XX%
### Savings by Category
| Category | Target | Actual | Status |
|----------|--------|--------|--------|
| Rightsizing | $500 | $450 | 90% |
| Reserved Instances | $2,000 | $2,100 | 105% |
| Idle Resources | $200 | $200 | 100% |
### Monthly Trend
| Month | Spend | Savings | Cumulative |
|-------|-------|---------|------------|
| Jan | $50,000 | $0 | $0 |
| Feb | $48,000 | $2,000 | $2,000 |
| Mar | $47,500 | $2,500 | $4,500 |
| Rationalization | Why It's WRONG | Required Action |
|---|---|---|
| "Small savings not worth it" | Small savings compound | Evaluate ALL opportunities |
| "RIs are too risky" | RI risk is manageable | Analyze stable workloads |
| "Dev doesn't need optimization" | Dev is often 30%+ of cost | Optimize ALL environments |
| "Can't predict future usage" | Historical data helps | Use data-driven forecasting |
| "Optimization takes too much time" | ROI on optimization is high | Invest in systematic process |
| User Says | Your Response |
|---|---|
| "Just cut costs by 30%" | "Cannot proceed without analysis. Blind cuts cause outages. Will provide data-driven recommendations." |
| "Skip the analysis, buy RIs" | "RI purchases require usage analysis. Wrong RIs waste money. Analysis required first." |
| "Dev environment is fine as-is" | "Dev costs are significant. Optimization applies to all environments." |
For cost optimization tasks, dispatch:
Task tool:
subagent_type: "cloud-cost-optimizer"
model: "opus"
prompt: |
COST ANALYSIS REQUEST
Scope: [accounts/services to analyze]
Period: [time range]
Focus: [rightsizing/RI/general optimization]
Constraints: [budget targets, risk tolerance]