estimate-calibrator | armory

Stats

Actions

Tags

estimate-calibrator | armory

Estimate Calibrator

Replaces single-point guesses with structured three-point estimates: decomposes work into atomic units, estimates best/likely/worst case for each, identifies unknowns and assumptions, calculates aggregate ranges using PERT, and assigns confidence levels with explicit rationale.

Reference Files

File	Contents	Load When
`references/estimation-methods.md`	PERT formula, three-point estimation, Monte Carlo basics	Always
`references/unknown-categories.md`	Technical, scope, external, and organizational uncertainty types	Unknown identification
`references/calibration-tips.md`	Cognitive biases in estimation, historical calibration, buffer strategies	Always
`references/sizing-heuristics.md`	Common task size patterns, complexity indicators, reference class data	Quick sizing needed

Prerequisites

Work item description (feature, task, project)
Decomposed tasks (or use task-decomposer skill first)
Context: team familiarity, tech stack, existing codebase

Workflow

Phase 1: Decompose Work

If the work item is not already decomposed into atomic units:

Break into tasks — Each task should be estimable independently.
Right granularity — Tasks should be 1 hour to 3 days. Larger tasks have higher uncertainty; break them down further.
Identify dependencies — Tasks on the critical path determine the minimum duration.

Phase 2: Three-Point Estimate

For each task, estimate three scenarios:

Scenario	Definition	Mindset
Best case	Everything goes right. No surprises.	"If I've done this exact thing before"
Likely case	Normal friction. Some minor obstacles.	"Realistic expectation with typical setbacks"
Worst case	Significant problems. Not catastrophic.	"Murphy's law but not a disaster"

Key rule: Worst case is NOT "everything goes wrong." It's the realistic bad scenario (90th percentile), not the apocalyptic one (99th percentile).

Phase 3: Identify Unknowns

Categorize unknowns that affect estimates:

Category	Example	Impact
Technical	"Never used this library before"	Likely case inflated, worst case much higher
Scope	"Requirements may change"	All estimates may shift
External	"Depends on API access from partner"	Blocking risk — could delay entirely
Integration	"Haven't tested with production data"	Hidden complexity at integration
Organizational	"Need design approval"	Calendar time, not effort time

Phase 4: Calculate Ranges

For individual tasks, use the PERT formula:

Expected = (Best + 4 × Likely + Worst) / 6
Std Dev = (Worst - Best) / 6

For aggregate (project) estimates:

Sum of expected values for total expected duration
Root sum of squares of std devs for aggregate uncertainty

Phase 5: Assign Confidence

Confidence	Meaning	When
High	Likely case within ±20%	Well-understood task, team has done it before
Medium	Likely case within ±50%	Some unknowns, moderate familiarity
Low	Likely case within ±100% or more	Significant unknowns, new technology

Output Format

## Estimate: {Work Item}

### Summary
| Scenario | Duration |
|----------|----------|
| Best case | {time} |
| Likely case | {time} |
| Worst case | {time} |
| **PERT expected** | **{time}** |
| **Confidence** | **{High/Medium/Low}** |

### Task-Level Estimates

| # | Task | Best | Likely | Worst | PERT | Unknowns |
|---|------|------|--------|-------|------|----------|
| 1 | {task} | {time} | {time} | {time} | {time} | {key unknown or "None"} |
| 2 | {task} | {time} | {time} | {time} | {time} | {key unknown} |
| | **Total** | **{sum}** | **{sum}** | **{sum}** | **{pert}** | |

### Key Unknowns

| # | Unknown | Category | Impact on Estimate | Mitigation |
|---|---------|----------|-------------------|------------|
| 1 | {unknown} | {Technical/Scope/External} | +{time} if realized | {spike, prototype, early test} |

### Assumptions
- {Assumption 1 — what must be true for this estimate to hold}
- {Assumption 2}

### Risk Factors
- {Risk}: If realized, adds {time}. Likelihood: {High/Medium/Low}.

### Confidence Rationale
**{High/Medium/Low}** because:
- {Specific reason — e.g., "Team has built 3 similar features"}
- {Specific reason — e.g., "External API is a new integration"}

### Recommendation
{Commit to PERT expected with {X}% buffer, or spike the top unknown first.}

Calibration Rules

Three points, not one. Single-point estimates are always wrong. Three points communicate uncertainty — the most important part of any estimate.
Worst case is the 90th percentile, not the 99th. "Asteroid hits the office" is not a useful worst case. "The API documentation is wrong and we need to reverse-engineer the protocol" is realistic worst case.
Unknowns inflate estimates more than known difficulty. A hard but well-understood task is more predictable than an easy but novel one.
Estimates are not commitments. Communicate ranges, not deadlines. If stakeholders need a single number, give the PERT expected plus a buffer for confidence level.
Spike unknowns early. If a single unknown dominates the estimate range, invest 1-2 days spiking it before estimating the rest.

Error Handling

Problem	Resolution
Work item not decomposed	Decompose into 3-8 tasks first (or suggest task-decomposer skill).
No historical reference	Estimate relative to a known task: "This is about 2x the auth feature."
Stakeholder wants a single number	Provide PERT expected with buffer matching confidence level (High: +20%, Medium: +50%, Low: +100%).
Estimate seems too large	Check for scope creep in task list. Remove non-essential tasks. Identify what can be deferred.
Team has never done this type of work	Mark confidence as Low. Recommend a spike before committing to an estimate.

When NOT to Estimate

Push back if:

The work is exploratory (research, spikes) — timebox instead of estimating
Requirements are completely undefined — define scope first
The user wants precision (hours) for a large project — provide ranges, not false precision
The estimate will be used as a commitment without acknowledging uncertainty