PolicyEngine variable patterns - variable creation, no hard-coding principle, federal/state separation, metadata standards
Essential patterns for creating PolicyEngine variables for government benefit programs.
The law defines WHAT to implement. These patterns are just HOW to implement it.
1. READ the legal code/policy manual FIRST
2. UNDERSTAND what the law actually says
3. IMPLEMENT exactly what the law requires
4. USE these patterns as tools to implement correctly
Patterns are tools, not rules to blindly follow:
Every implementation decision should trace back to a specific legal citation.
CRITICAL: PolicyEngine uses single-period simulation architecture
The following CANNOT be implemented and should be SKIPPED when found in documentation:
Cannot simulate:
Why: Requires tracking benefit history across multiple periods. PolicyEngine simulates one period at a time with no state persistence.
What to do: Document in comments but DON'T parameterize or implement:
# NOTE: [State] has [X]-month lifetime limit on [Program] benefits
# This cannot be simulated in PolicyEngine's single-period architecture
Cannot simulate:
Why: Requires historical data from previous periods.
Cannot simulate:
Why: Requires tracking application dates and eligibility history.
Cannot simulate:
Why: Requires tracking violation history.
Cannot simulate:
Why: Requires tracking expenses and resources across periods.
PolicyEngine CAN simulate point-in-time eligibility and benefits:
Special Case: Time-limited deductions/disregards
When a deduction or disregard is only available for X months:
Example:
class state_tanf_countable_earned_income(Variable):
    def formula(spm_unit, period, parameters):
        p = parameters(period).gov.states.xx.tanf.income
        earned = spm_unit("tanf_gross_earned_income", period)
        # NOTE: In reality, this 75% disregard only applies for first 4 months
        # of employment. PolicyEngine cannot track employment duration, so we
        # apply the disregard assuming the household qualifies.
        # Actual rule: [State Code Citation]
        disregard_rate = p.earned_income_disregard_rate # 0.75
        return earned * (1 - disregard_rate)
Rule: If a provision requires history or future tracking, it CANNOT be fully simulated. Implement what can be simulated and document the limitations.
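To make the limitation concrete, here is a minimal plain-Python sketch of the arithmetic (not PolicyEngine code). The earnings figure is hypothetical; the 75% rate mirrors the example above. It compares what a single-period model returns with what the real rule would count once the time limit has passed.

```python
def countable_earned_income(earned, disregard_rate):
    """Apply the disregard unconditionally, as a single-period model must."""
    return earned * (1 - disregard_rate)


# Hypothetical values: $1,000 of earnings and a 75% disregard.
simulated = countable_earned_income(1_000, 0.75)

# Once the time limit has passed, the real rule would count all earnings.
after_time_limit = countable_earned_income(1_000, 0.0)

print(simulated, after_time_limit)
```

The gap between the two values is what the NOTE comment in the variable should document.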
Every numeric value MUST be parameterized
❌ FORBIDDEN:
return where(eligible, 1000, 0) # Hard-coded 1000
age < 15 # Hard-coded 15
benefit = income * 0.33 # Hard-coded 0.33
(month >= 10) | (month <= 3) # Hard-coded months
✅ REQUIRED:
return where(eligible, p.maximum_benefit, 0)
age < p.age_threshold.minor_child
benefit = income * p.benefit_rate
month >= p.season.start_month
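As a rough illustration of why this matters, the sketch below uses a plain `SimpleNamespace` as a stand-in for a parameter node. It is not the PolicyEngine API, and the dollar values are hypothetical. The formula contains no policy numbers, so a policy change only touches the parameter values.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for parameter nodes; values are made up.
current_law = SimpleNamespace(maximum_benefit=1_000)
proposed_law = SimpleNamespace(maximum_benefit=1_200)


def maximum_benefit(eligible, p):
    """Scalar analogue of where(eligible, p.maximum_benefit, 0)."""
    return p.maximum_benefit if eligible else 0


# The same formula serves both policies without edits.
print(maximum_benefit(True, current_law))
print(maximum_benefit(True, proposed_law))
print(maximum_benefit(False, current_law))
```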
Acceptable literals:
- 0, 1, -1 for basic math
- 12 for month conversion (/ 12, * 12)
Delete the file rather than leave placeholders
❌ NEVER:
def formula(entity, period, parameters):
    # TODO: Implement
    return 75 # Placeholder
✅ ALWAYS:
# Complete implementation or no file at all
adds or add() - NEVER Manual Addition
CRITICAL: NEVER manually fetch variables and add them with +. Always use adds or add().
adds attribute (no formula)
❌ WRONG - Writing a formula for simple sum:
class tx_tanf_gross_income(Variable):
    def formula(spm_unit, period, parameters):
        earned = spm_unit("tanf_gross_earned_income", period)
        unearned = spm_unit("tanf_gross_unearned_income", period)
        return earned + unearned # DON'T DO THIS!
✅ CORRECT - Use adds, no formula needed:
class tx_tanf_gross_income(Variable):
    value_type = float
    entity = SPMUnit
    definition_period = MONTH
    adds = ["tanf_gross_earned_income", "tanf_gross_unearned_income"]
    # NO formula method - adds handles it automatically!
add() function
❌ WRONG - Manual fetching and adding:
def formula(spm_unit, period, parameters):
    earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)
    gross = earned + unearned # DON'T manually add!
    return gross * p.rate
✅ CORRECT - Use add() function:
def formula(spm_unit, period, parameters):
    gross = add(spm_unit, period, ["tanf_gross_earned_income", "tanf_gross_unearned_income"])
    return gross * p.rate
Decision rule:
- Simple sum with no other operations: adds = [...] (no formula)
- Sum followed by other operations: add() function inside formula
See policyengine-aggregation-skill for detailed patterns.
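As a mental model only, the sketch below shows what summing a list of named variables amounts to. `add_sketch` is a hypothetical toy helper, not PolicyEngine's `add()`, and the amounts and the 0.5 rate are made up.

```python
def add_sketch(values, names):
    """Toy stand-in for add(): sum the named variables."""
    return sum(values[name] for name in names)


# Hypothetical monthly amounts for one SPM unit.
values = {
    "tanf_gross_earned_income": 800.0,
    "tanf_gross_unearned_income": 200.0,
}

# Simple sum: this is all an `adds` list needs to express.
gross = add_sketch(values, ["tanf_gross_earned_income", "tanf_gross_unearned_income"])

# Sum followed by another operation: the case for add() inside a formula.
hypothetical_rate = 0.5
scaled = gross * hypothetical_rate

print(gross, scaled)
```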
Follow established patterns:
class il_tanf_countable_earned_income(Variable):
    value_type = float
    entity = SPMUnit
    definition_period = MONTH
    label = "Illinois TANF countable earned income"
    unit = USD
    reference = "https://www.law.cornell.edu/regulations/illinois/..."
    defined_for = StateCode.IL
    # Use adds for simple sums
    adds = ["il_tanf_earned_income_after_disregard"]
Key rules:
- Always include a reference (clickable)
- For PDFs, add #page=XX
- For multiple references, use a tuple () not a list []
- Never use the documentation field - use reference instead
❌ WRONG - Don't use documentation field:
class some_variable(Variable):
    documentation = "This is the wrong field" # DON'T USE THIS
✅ CORRECT - Use reference field:
class some_variable(Variable):
    reference = "https://example.gov/rules.pdf#page=10" # USE THIS
Reference format:
# Single reference:
reference = "https://oregon.gov/dhs/tanf-manual.pdf#page=23"
# Multiple references - use TUPLE ():
reference = (
    "https://oregon.public.law/rules/oar_461-155-0030",
    "https://oregon.gov/dhs/tanf-manual.pdf#page=23",
)
# ❌ WRONG - Don't use list []:
reference = [
    "https://...",
    "https://...",
]
adds vs formula
CRITICAL: Never use both adds/subtracts AND a custom formula in the same variable!
This causes bugs when the two get out of sync. Choose one approach:
❌ FORBIDDEN - Mixing compositional and formula:
class household_net_income(Variable):
    subtracts = ["employee_pension_contributions"] # ❌ Has subtracts
    def formula(household, period): # ❌ AND has formula
        gross = household("household_gross_income", period)
        tax = household("income_tax", period)
        # BUG: Forgot to subtract employee_pension_contributions!
        return gross - tax
Use adds/subtracts when:
✅ BEST - Pure compositional:
class tanf_gross_income(Variable):
    adds = ["employment_income", "self_employment_income"]
✅ BEST - Compositional with subtracts:
class household_net_income(Variable):
    adds = ["household_gross_income"]
    subtracts = ["income_tax", "employee_pension_contributions"]
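The following plain-Python sketch illustrates the idea behind a compositional variable and why mixing it with a hand-written formula is risky. `compose` is a hypothetical helper written for this illustration, not part of PolicyEngine, and the amounts are made up.

```python
def compose(values, adds, subtracts=()):
    """Toy model of a compositional variable: sum of adds minus sum of subtracts."""
    return sum(values[n] for n in adds) - sum(values[n] for n in subtracts)


# Hypothetical household amounts.
values = {
    "household_gross_income": 3_000,
    "income_tax": 400,
    "employee_pension_contributions": 100,
}

# Pure compositional definition: both subtractions are declared in one place.
net = compose(
    values,
    adds=["household_gross_income"],
    subtracts=["income_tax", "employee_pension_contributions"],
)

# Hand-written formula that forgot the pension contributions.
buggy_net = values["household_gross_income"] - values["income_tax"]

print(net, buggy_net)
```

The two results differ by exactly the forgotten pension contribution, which is the kind of drift that keeping a single source of truth avoids.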
Use formula when:
✅ CORRECT - Pure formula:
def formula(entity, period, parameters):
    income = add(entity, period, ["income1", "income2"])
    return max_(0, income) # Need max_
MOST IMPORTANT: Always check the state's legal code or policy manual for the exact calculation order. The pattern below is typical but not universal.
The Typical Pattern:
- Use max_() to prevent negative earned income
This pattern is based on how MOST TANF programs work, but you MUST verify with the specific state's legal code.
def formula(spm_unit, period, parameters):
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)
    deductions = spm_unit("tanf_earned_income_deductions", period)
    # ❌ WRONG: Deductions applied to total income
    total_income = gross_earned + unearned
    countable = total_income - deductions
    return max_(countable, 0)
Why this is wrong:
Example error (earned $100, unearned $500, deductions $200):
- ❌ Wrong: max_($100 + $500 - $200, 0) = $400 (reduces unearned!)
- ✅ Correct: max_($100 - $200, 0) + $500 = $500
def formula(spm_unit, period, parameters):
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)
    deductions = spm_unit("tanf_earned_income_deductions", period)
    # ✅ CORRECT: Deductions applied to earned only, then add unearned
    return max_(gross_earned - deductions, 0) + unearned
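The difference between the two orderings can be checked with plain Python, using the built-in `max` as a scalar stand-in for `max_`. The figures are the ones from the example error above; the function names are for illustration only.

```python
def countable_wrong(earned, unearned, deductions):
    """Deductions applied to total income (incorrect ordering)."""
    return max(earned + unearned - deductions, 0)


def countable_correct(earned, unearned, deductions):
    """Deductions applied to earned income only, then unearned added."""
    return max(earned - deductions, 0) + unearned


wrong = countable_wrong(100, 500, 200)
correct = countable_correct(100, 500, 200)
print(wrong, correct)
```

When deductions exceed earned income, the wrong ordering lets the excess spill over and reduce unearned income.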
With multiple deduction steps:
def formula(spm_unit, period, parameters):
    p = parameters(period).gov.states.xx.tanf.income
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)
    # Step 1: Apply work expense deduction
    work_expense = min_(gross_earned * p.work_expense_rate, p.work_expense_max)
    after_work_expense = max_(gross_earned - work_expense, 0)
    # Step 2: Apply earnings disregard
    earnings_disregard = after_work_expense * p.disregard_rate
    countable_earned = max_(after_work_expense - earnings_disregard, 0)
    # Step 3: Add unearned (no deductions applied)
    return countable_earned + unearned
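A worked example of the three steps, in plain Python with the built-in `min` and `max` standing in for `min_` and `max_`. All rates and amounts are hypothetical and chosen only to make the arithmetic easy to follow; a real state's values come from its legal code.

```python
# Hypothetical parameter values.
work_expense_rate = 0.2
work_expense_max = 90
disregard_rate = 0.5

# Hypothetical monthly income.
gross_earned = 600
unearned = 150

# Step 1: work expense deduction, capped at the maximum.
work_expense = min(gross_earned * work_expense_rate, work_expense_max)
after_work_expense = max(gross_earned - work_expense, 0)

# Step 2: earnings disregard on what remains.
earnings_disregard = after_work_expense * disregard_rate
countable_earned = max(after_work_expense - earnings_disregard, 0)

# Step 3: add unearned income with no deductions applied.
countable_income = countable_earned + unearned

print(work_expense, after_work_expense, countable_earned, countable_income)
```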
With disregard percentage (simplified):
def formula(spm_unit, period, parameters):
    p = parameters(period).gov.states.xx.tanf.income
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    unearned = spm_unit("tanf_gross_unearned_income", period)
    # Apply disregard to earned (keep 33% = disregard 67%)
    countable_earned = gross_earned * (1 - p.earned_disregard_rate)
    return max_(countable_earned, 0) + unearned
Some states DO have unearned income deductions (rare). Handle separately:
def formula(spm_unit, period, parameters):
    gross_earned = spm_unit("tanf_gross_earned_income", period)
    gross_unearned = spm_unit("tanf_gross_unearned_income", period)
    earned_deductions = spm_unit("tanf_earned_income_deductions", period)
    unearned_deductions = spm_unit("tanf_unearned_income_deductions", period)
    # Apply each type of deduction to its respective income type
    countable_earned = max_(gross_earned - earned_deductions, 0)
    countable_unearned = max_(gross_unearned - unearned_deductions, 0)
    return countable_earned + countable_unearned
Standard TANF pattern:
Countable Income = max_(Earned - Earned Deductions, 0) + Unearned
NOT:
❌ max_(Earned + Unearned - Deductions, 0)
❌ max_(Earned - Deductions + Unearned, 0) # Excess deductions reduce unearned
Federal parameters location: /parameters/gov/{agency}/
State parameters location: /parameters/gov/states/{state}/
# Federal: parameters/gov/hhs/fpg/base.yaml
first_person: 14_580
# State: parameters/gov/states/ca/scale_factor.yaml
fpg_multiplier: 2.0 # 200% of FPG
❌ ANTI-PATTERN: Copy-pasting calculations
# File 1: calculates income after deduction
def formula(household, period, parameters):
    gross = add(household, period, ["income"])
    deduction = p.deduction * household.nb_persons()
    return max_(gross - deduction, 0)
# File 2: DUPLICATES same calculation
def formula(household, period, parameters):
    gross = add(household, period, ["income"]) # Copy-pasted
    deduction = p.deduction * household.nb_persons() # Copy-pasted
    after_deduction = max_(gross - deduction, 0) # Copy-pasted
    return after_deduction < p.threshold
✅ CORRECT: Reuse existing variables
# File 2: reuses calculation
def formula(household, period, parameters):
    countable_income = household("program_countable_income", period)
    return countable_income < p.threshold
When to create intermediate variables:
MANDATORY before implementing any TANF:
- /variables/gov/states/dc/dhs/tanf/
- /variables/gov/states/il/dhs/tanf/
- /variables/gov/states/tx/hhs/tanf/
Learn from them:
- adds vs formula
tanf/
├── eligibility/
│ ├── demographic_eligible.py
│ ├── income_eligible.py
│ └── eligible.py
├── income/
│ ├── earned/
│ ├── unearned/
│ └── countable_income.py
└── [state]_tanf.py
For simplified implementations:
DON'T create state-specific versions of:
❌ DON'T CREATE:
ca_tanf_demographic_eligible_person.py
ca_tanf_gross_earned_income.py
parameters/.../income/sources/earned.yaml
✅ DO USE:
# Federal demographic eligibility
is_demographic_tanf_eligible
# Federal income aggregation
tanf_gross_earned_income
Golden Rule: Only create a state variable if you're adding state-specific logic to it!
When studying reference implementations:
Before creating ANY state-specific variable, ask:
Decision:
Even without state-specific logic, create a variable if the SAME calculation is used in multiple places.
❌ Bad - Duplicating calculation across variables:
# Variable 1 - Income eligibility
class mo_tanf_income_eligible(Variable):
    def formula(spm_unit, period, parameters):
        # Duplicated calculation
        gross = add(spm_unit, period, ["tanf_gross_earned_income", "tanf_gross_unearned_income"])
        return gross <= p.income_limit
# Variable 2 - Countable income
class mo_tanf_countable_income(Variable):
    def formula(spm_unit, period, parameters):
        # SAME calculation repeated!
        gross = add(spm_unit, period, ["tanf_gross_earned_income", "tanf_gross_unearned_income"])
        deductions = spm_unit("mo_tanf_deductions", period)
        return max_(gross - deductions, 0)
# Variable 3 - Need standard
class mo_tanf_need_standard(Variable):
    def formula(spm_unit, period, parameters):
        # SAME calculation AGAIN!
        gross = add(spm_unit, period, ["tanf_gross_earned_income", "tanf_gross_unearned_income"])
        return where(gross < p.threshold, p.high, p.low)
✅ Good - Extract into reusable intermediate variable:
# Intermediate variable - used in multiple places
class mo_tanf_gross_income(Variable):
    adds = ["tanf_gross_earned_income", "tanf_gross_unearned_income"]
# Variable 1 - Reuses intermediate
class mo_tanf_income_eligible(Variable):
    def formula(spm_unit, period, parameters):
        gross = spm_unit("mo_tanf_gross_income", period) # Reuse
        return gross <= p.income_limit
# Variable 2 - Reuses intermediate
class mo_tanf_countable_income(Variable):
    def formula(spm_unit, period, parameters):
        gross = spm_unit("mo_tanf_gross_income", period) # Reuse
        deductions = spm_unit("mo_tanf_deductions", period)
        return max_(gross - deductions, 0)
# Variable 3 - Reuses intermediate
class mo_tanf_need_standard(Variable):
    def formula(spm_unit, period, parameters):
        gross = spm_unit("mo_tanf_gross_income", period) # Reuse
        return where(gross < p.threshold, p.high, p.low)
When to create intermediate variables for reuse:
When NOT to create (still a wrapper):
❌ INVALID - Pure wrapper, no state logic:
class in_tanf_assistance_unit_size(Variable):
    def formula(spm_unit, period):
        return spm_unit("spm_unit_size", period) # Just returns federal
❌ INVALID - Aggregation without transformation:
class in_tanf_countable_unearned_income(Variable):
    def formula(tax_unit, period):
        return tax_unit.sum(tax_unit.members("tanf_gross_unearned_income", period))
❌ INVALID - Pass-through with no modification:
class in_tanf_gross_income(Variable):
    def formula(entity, period):
        return entity("tanf_gross_income", period)
✅ VALID - Has state-specific disregard:
class in_tanf_countable_earned_income(Variable):
    def formula(spm_unit, period, parameters):
        # "in" is a Python keyword, so use bracket access for Indiana
        p = parameters(period).gov.states["in"].tanf.income
        earned = spm_unit("tanf_gross_earned_income", period)
        return earned * (1 - p.earned_income_disregard_rate) # STATE LOGIC
✅ VALID - Uses state-specific limits:
class in_tanf_income_eligible(Variable):
    def formula(spm_unit, period, parameters):
        # "in" is a Python keyword, so use bracket access for Indiana
        p = parameters(period).gov.states["in"].tanf
        income = spm_unit("tanf_countable_income", period)
        size = spm_unit("spm_unit_size", period.this_year)
        limit = p.income_limit[min_(size, p.max_household_size)] # STATE PARAMS
        return income <= limit
✅ VALID - IL has different counting rules:
class il_tanf_assistance_unit_size(Variable):
    adds = [
        "il_tanf_payment_eligible_child", # STATE-SPECIFIC
        "il_tanf_payment_eligible_parent", # STATE-SPECIFIC
    ]
For TANF implementations:
❌ DON'T create these (use federal directly):
- state_tanf_assistance_unit_size (unless different counting rules like IL)
- state_tanf_countable_unearned_income (unless state has disregards)
- state_tanf_gross_income (just use federal baseline)
- Any variable whose formula is only return entity("federal_variable", period)
✅ DO create these (when state has unique rules):
- state_tanf_countable_earned_income (if unique disregard %)
- state_tanf_income_eligible (state income limits)
- state_tanf_maximum_benefit (state payment standards)
- state_tanf (final benefit calculation)
Option 1: Use Federal (Simplified)
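class ca_tanf_eligible(Variable):
    def formula(spm_unit, period, parameters):
        # Use federal variable
        has_eligible = spm_unit.any(
            spm_unit.members("is_demographic_tanf_eligible", period)
        )
        # Assumed variable name; the original snippet left income_eligible undefined
        income_eligible = spm_unit("ca_tanf_income_eligible", period)
        return has_eligible & income_eligible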
Option 2: State-Specific (Different thresholds)
class ca_tanf_demographic_eligible_person(Variable):
    def formula(person, period, parameters):
        p = parameters(period).gov.states.ca.tanf
        age = person("age", period.this_year) # NOT monthly_age
        age_limit = where(
            person("is_full_time_student", period),
            p.age_threshold.student,
            p.age_threshold.minor_child
        )
        return age < age_limit
class program_income_eligible(Variable):
    value_type = bool
    entity = SPMUnit
    definition_period = MONTH
    def formula(spm_unit, period, parameters):
        p = parameters(period).gov.states.xx.program
        income = spm_unit("program_countable_income", period)
        size = spm_unit("spm_unit_size", period.this_year)
        # Get threshold from parameters
        threshold = p.income_limit[min_(size, p.max_household_size)]
        return income <= threshold
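The size-capped lookup can be sketched in plain Python. A dict stands in for the `income_limit` parameter and the built-in `min` for `min_`; the limits and the maximum size are hypothetical.

```python
# Hypothetical income limits by unit size; real values come from parameters.
income_limit = {1: 300, 2: 400, 3: 500}
max_household_size = 3


def threshold(size):
    """Look up the limit, treating larger units as the largest listed size."""
    return income_limit[min(size, max_household_size)]


def income_eligible(income, size):
    return income <= threshold(size)


print(threshold(2), threshold(5))
print(income_eligible(450, 2), income_eligible(450, 5))
```

Capping the size before the lookup avoids a missing-key error for units larger than the last row of the table.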
class program_benefit(Variable):
    value_type = float
    entity = SPMUnit
    definition_period = MONTH
    unit = USD
    def formula(spm_unit, period, parameters):
        p = parameters(period).gov.states.xx.program
        eligible = spm_unit("program_eligible", period)
        # Calculate benefit amount
        base = p.benefit_schedule.base_amount
        adjustment = p.benefit_schedule.adjustment_rate
        size = spm_unit("spm_unit_size", period.this_year)
        amount = base + (size - 1) * adjustment
        return where(eligible, amount, 0)
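For a quick check of the schedule arithmetic, here is a scalar sketch in plain Python. The base amount and per-person adjustment are hypothetical values, and the conditional expression stands in for `where`.

```python
# Hypothetical benefit schedule.
base_amount = 200
adjustment_rate = 75


def program_benefit(eligible, size):
    """Base amount plus an adjustment for each person beyond the first."""
    amount = base_amount + (size - 1) * adjustment_rate
    return amount if eligible else 0


print(program_benefit(True, 1))
print(program_benefit(True, 3))
print(program_benefit(False, 3))
```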
def formula(entity, period, parameters):
    p = parameters(period).gov.states.az.program
    federal_p = parameters(period).gov.hhs.fpg
    # Federal base with state scale
    size = entity("household_size", period.this_year)
    fpg = federal_p.first_person + federal_p.additional * (size - 1)
    state_scale = p.income_limit_scale # Often exists
    income_limit = fpg * state_scale
    return income_limit
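A numeric sketch of the federal-base-with-state-scale calculation in plain Python. The first-person amount (14_580) and the 2.0 multiplier are taken from the parameter snippets earlier in this document; the additional-person amount is an assumption added for illustration.

```python
# From the federal parameter snippet earlier in this document.
first_person = 14_580
# Assumed additional-person amount, for illustration only.
additional = 5_140
# From the state parameter snippet earlier in this document (200% of FPG).
state_scale = 2.0


def income_limit(size):
    """Federal poverty guideline for the unit size, scaled by the state factor."""
    fpg = first_person + additional * (size - 1)
    return fpg * state_scale


print(income_limit(1))
print(income_limit(3))
```

The federal amounts live in one place and the state contributes only its scale factor, which is the separation the parameter layout is meant to enforce.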
Some variables need to compare values under current law (baseline) vs a proposed reform. This is common for:
- Variables such as employer_NI_fixed_employer_cost_change
CRITICAL: When a simulation has a baseline (i.e., it's a reform simulation), you must explicitly access baseline parameters:
def formula(person, period, parameters):
    simulation = person.simulation
    # Check if this is a reform simulation with a baseline
    if simulation.baseline is not None:
        # Access baseline parameters through the baseline's tax benefit system
        baseline_parameters = simulation.baseline.tax_benefit_system.get_parameters_at_instant(period)
        baseline_value = baseline_parameters.gov.hmrc.national_insurance.some_rate
    else:
        # No baseline exists - use current parameters as baseline
        baseline_parameters = parameters(period)
        baseline_value = baseline_parameters.gov.hmrc.national_insurance.some_rate
    # Get reform (current) value
    reform_value = parameters(period).gov.hmrc.national_insurance.some_rate
    # Calculate difference
    return reform_value - baseline_value
❌ WRONG - Using current parameters for baseline:
def formula(person, period, parameters):
    p = parameters(period)
    # This gets REFORM parameters, not baseline!
    baseline_rate = p.gov.hmrc.national_insurance.some_rate
    reform_rate = p.gov.hmrc.national_insurance.some_rate
    return reform_rate - baseline_rate # Always returns 0!
✅ CORRECT - Properly accessing baseline:
def formula(person, period, parameters):
    simulation = person.simulation
    if simulation.baseline is not None:
        baseline_p = simulation.baseline.tax_benefit_system.get_parameters_at_instant(period)
    else:
        baseline_p = parameters(period)
    baseline_rate = baseline_p.gov.hmrc.national_insurance.some_rate
    reform_rate = parameters(period).gov.hmrc.national_insurance.some_rate
    return reform_rate - baseline_rate
This pattern is essential when:
Without this pattern, reform simulations will incorrectly show zero change because both "baseline" and "reform" values come from the same (reform) parameters.
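The branching logic can be exercised without PolicyEngine by using small stub objects. In the sketch below every object is a hypothetical stand-in built from `SimpleNamespace`; only the attribute names (`baseline`, `tax_benefit_system`, `get_parameters_at_instant`) follow the pattern shown above, and the rates are made up.

```python
from types import SimpleNamespace


def make_parameters(rate):
    """Stub: a callable returning a parameter node with a single rate."""
    return lambda period: SimpleNamespace(some_rate=rate)


def baseline_parameters(simulation, parameters, period):
    """Return baseline parameters, falling back to the current ones."""
    if simulation.baseline is not None:
        system = simulation.baseline.tax_benefit_system
        return system.get_parameters_at_instant(period)
    return parameters(period)


def rate_change(simulation, parameters, period):
    baseline_rate = baseline_parameters(simulation, parameters, period).some_rate
    reform_rate = parameters(period).some_rate
    return reform_rate - baseline_rate


# Hypothetical rates: 0.125 under the baseline, 0.25 under the reform.
reform_params = make_parameters(0.25)
baseline_system = SimpleNamespace(get_parameters_at_instant=make_parameters(0.125))
reform_sim = SimpleNamespace(
    baseline=SimpleNamespace(tax_benefit_system=baseline_system)
)
plain_sim = SimpleNamespace(baseline=None)

print(rate_change(reform_sim, reform_params, "2025"))
print(rate_change(plain_sim, reform_params, "2025"))
```

With a baseline attached the change is non-zero; without one, both lookups hit the same parameters and the change is zero, which is the intended fallback.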
Before creating any variable:
CRITICAL: Complete implementation means every parameter is used!
When you create parameters, you MUST create corresponding variables:
| Parameter Type | Required Variable(s) |
|---|---|
| resources/limit | state_program_resource_eligible |
| income/limit | state_program_income_eligible |
| payment_standard | state_program_maximum_benefit |
| income/disregard | state_program_countable_earned_income |
| categorical/requirements | state_program_categorically_eligible |
The main eligibility variable MUST combine ALL checks:
class state_program_eligible(Variable):
    def formula(spm_unit, period, parameters):
        income_eligible = spm_unit("state_program_income_eligible", period)
        resource_eligible = spm_unit("state_program_resource_eligible", period) # DON'T FORGET!
        categorical = spm_unit("state_program_categorically_eligible", period)
        return income_eligible & resource_eligible & categorical
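A small plain-Python sketch of why a forgotten check matters, using scalar booleans in place of arrays. The household below is hypothetical and the function names are for illustration only.

```python
def eligible(income_eligible, resource_eligible, categorically_eligible):
    """All three checks must pass."""
    return income_eligible and resource_eligible and categorically_eligible


def eligible_missing_resources(income_eligible, categorically_eligible):
    """Incomplete version that forgets the resource test."""
    return income_eligible and categorically_eligible


# Hypothetical household: passes income and categorical tests, fails resources.
complete = eligible(True, False, True)
incomplete = eligible_missing_resources(True, True)
print(complete, incomplete)
```

The incomplete version reports the household as eligible even though a resource limit parameter exists and would exclude it.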
Common Implementation Failures:
When implementing variables:
- Use adds when possible - cleaner than formula