Use to audit test quality with Google Fellow SRE scrutiny - identifies tautological tests, coverage gaming, weak assertions, missing corner cases. Creates bd epic with tasks for improvements, then runs SRE task refinement on each.
This skill inherits all available tools. When active, it can use any tool Claude has access to.
<skill_overview> Audit test suites for real effectiveness, not vanity metrics. Identify tests that provide false confidence (tautological, mock-testing, line hitters) and missing corner cases. Create bd epic with tracked tasks for improvements. Run SRE task refinement on each task before execution. </skill_overview>
<rigidity_level> MEDIUM FREEDOM - Follow the 5-phase analysis process exactly. Categorization criteria (RED/YELLOW/GREEN) are rigid. Corner case discovery adapts to the specific codebase. Output format is flexible but must include all sections. </rigidity_level>
<quick_reference>
| Phase | Action | Output |
|---|---|---|
| 1. Inventory | List all test files and functions | Test catalog |
| 2. Categorize | Apply RED/YELLOW/GREEN criteria to each test | Categorized tests |
| 3. Corner Cases | Identify missing edge cases per module | Gap analysis |
| 4. Prioritize | Rank by business criticality | Priority matrix |
| 5. bd Issues | Create epic + tasks, run SRE refinement | Tracked improvement plan |
Core Questions for Each Test:
- Would this test fail if the production code broke?
- Does the assertion verify specific behavior? (only checks != nil → weak)
- Does it exercise production code, or just the mock?

bd Integration (MANDATORY):
- Track all findings as a bd epic with child tasks
- Run SRE task refinement on every task before execution

Mutation Testing Validation:
- Java: PIT (`mvn org.pitest:pitest-maven:mutationCoverage`)
- JavaScript/TypeScript: Stryker (`npx stryker run`)
- Python: mutmut (`mutmut run`)
</quick_reference>

<when_to_use> Use this skill when:
- Auditing an existing test suite for real effectiveness
- Coverage is high but bugs still reach production
- You suspect tautological, mock-heavy, or assertion-free tests
- Planning a test quality improvement epic

Don't use when:
- Writing tests for brand-new code (test as you build instead)
- Debugging individual failing tests
</when_to_use>
<the_process>
Announce: "I'm using hyperpowers:analyzing-test-effectiveness to audit test quality with Google Fellow SRE-level scrutiny."
Phase 1: Test Inventory

Goal: Create a complete catalog of tests to analyze.
# Find all test files (adapt pattern to language)
fd -e test.ts -e spec.ts -e _test.go -e Test.java -e test.py .
# Or use grep to find test functions
rg "func Test|it\(|test\(|def test_|@Test" --type-add 'test:*test*' -t test
# Count tests per module (sum per-file match counts)
for dir in src/*/; do
  count=$(rg -c "func Test|it\(" "$dir" 2>/dev/null | awk -F: '{s+=$NF} END {print s+0}')
  echo "$dir: $count tests"
done
Create inventory TodoWrite:
- Analyze tests in src/auth/
- Analyze tests in src/api/
- Analyze tests in src/parser/
[... one per module]
Phase 2: Categorize Tests (RED / YELLOW / GREEN)

For each test, apply these criteria:
2.1 Tautological Tests (pass by definition)
// ❌ RED: Verifies non-optional return is not nil
test('builder returns value', () => {
const result = new Builder().build();
expect(result).not.toBeNull(); // Always passes - return type guarantees this
});
// ❌ RED: Verifies enum has cases (compiler checks this)
test('status enum has values', () => {
expect(Object.values(Status).length).toBeGreaterThan(0);
});
// ❌ RED: Duplicates implementation
test('add returns sum', () => {
expect(add(2, 3)).toBe(2 + 3); // Tautology: testing 2+3 == 2+3
});
Detection patterns:
# Find != nil / != null on non-optional types
rg "expect\(.*\)\.not\.toBeNull|assertNotNull|!= nil" tests/
# Find enum existence checks
rg "Object\.values.*length|cases\.count" tests/
# Find test files with exactly one expect call (likely no meaningful assertions)
rg -l "expect\(" tests/ | xargs -I {} sh -c 'grep -c "expect" {} | grep -q "^1$" && echo {}'
2.2 Mock-Testing Tests (test the mock, not production)
// ❌ RED: Only verifies mock was called, not actual behavior
test('service fetches data', () => {
const mockApi = { fetch: jest.fn().mockResolvedValue({ data: [] }) };
const service = new Service(mockApi);
service.getData();
expect(mockApi.fetch).toHaveBeenCalled(); // Tests mock, not service logic
});
// ❌ RED: Mock determines test outcome
test('processor handles data', () => {
const mockParser = { parse: jest.fn().mockReturnValue({ valid: true }) };
const result = processor.process(mockParser);
expect(result.valid).toBe(true); // Just returns what mock returns
});
Detection patterns:
# Find tests that only verify mock calls
rg "toHaveBeenCalled|verify\(mock|\.called" tests/
# Find heavy mock setup
rg -c "mock|Mock|jest\.fn|stub" tests/ | sort -t: -k2 -nr | head -20
2.3 Line Hitters (execute without asserting)
// ❌ RED: Calls function, doesn't verify outcome
test('processor runs', () => {
const processor = new Processor();
processor.run(); // No assertion - just verifies no crash
});
// ❌ RED: Assertion is trivial
test('config loads', () => {
const config = loadConfig();
expect(config).toBeDefined(); // Too weak - doesn't verify correct values
});
Detection patterns:
# Find tests with 0-1 assertions
rg -l "test\(|it\(" tests/ | while read f; do
assertions=$(rg -c "expect|assert" "$f" 2>/dev/null || echo 0)
tests=$(rg -c "test\(|it\(" "$f" 2>/dev/null || echo 1)
ratio=$((assertions / tests))
[ "$ratio" -lt 2 ] && echo "$f: low assertion ratio ($assertions assertions, $tests tests)"
done
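A sketch of the fix for the `Processor` line hitter above, assuming `run` returns the processed items (the API shape is an assumption): assert the observable outcome, not mere survival.

```typescript
// ✅ Fixed: verifies what run() did, not just that it didn't crash
test('processor marks pending items processed', () => {
  const processor = new Processor();
  const result = processor.run([{ id: 1, status: 'pending' }]);
  expect(result).toEqual([{ id: 1, status: 'processed' }]);
});
```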
2.4 Evergreen/Liar Tests (always pass)
// ❌ RED: Catches and ignores exceptions
test('parser handles input', () => {
try {
parser.parse(input);
expect(true).toBe(true); // Always passes
} catch (e) {
// Swallowed - test passes even on exception
}
});
// ❌ RED: Test setup bypasses code under test
test('validator validates', () => {
const validator = new Validator({ skipValidation: true }); // Oops
expect(validator.validate(badInput)).toBe(true);
});
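The fix is to let exceptions fail the test and give the error path its own test. A sketch, assuming `parse` returns the parsed value and throws on malformed input:

```typescript
// ✅ Fixed: exceptions propagate and fail the test; the result is verified exactly
test('parser parses valid input', () => {
  expect(parser.parse('{"ok": true}')).toEqual({ ok: true });
});

// ✅ Fixed: the failure branch is a first-class test, not a swallowed catch
test('parser throws on malformed input', () => {
  expect(() => parser.parse('{"ok":')).toThrow();
});
```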
2.5 Happy Path Only
// ⚠️ YELLOW: Only tests valid input
test('parse valid json', () => {
const result = parse('{"name": "test"}');
expect(result.name).toBe('test');
});
// Missing: empty string, malformed JSON, deeply nested, unicode, huge payload
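Those missing cases read directly as tests; a sketch, assuming `parse` throws on invalid input:

```typescript
test('parse rejects empty string', () => {
  expect(() => parse('')).toThrow();
});

test('parse rejects malformed JSON', () => {
  expect(() => parse('{"name":')).toThrow();
});

test('parse preserves unicode values', () => {
  expect(parse('{"name": "日本語"}').name).toBe('日本語');
});
```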
2.6 Weak Assertions
// ⚠️ YELLOW: Assertion too weak
test('fetch returns data', async () => {
const result = await fetch('/api/users');
expect(result).not.toBeNull(); // Should verify actual content
expect(result.length).toBeGreaterThan(0); // Should verify exact count or specific items
});
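A strengthened version, keeping the original's convention that the call resolves to the parsed list and assuming the test seeds two known users:

```typescript
// ✅ Strengthened: exact content, not just "something non-null came back"
test('fetch returns the seeded users', async () => {
  const result = await fetch('/api/users');
  expect(result).toEqual([
    { id: 1, name: 'alice' },
    { id: 2, name: 'bob' },
  ]);
});
```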
2.7 Partial Coverage
// ⚠️ YELLOW: Tests success, not failure
test('create user succeeds', () => {
const user = createUser({ name: 'test', email: 'test@example.com' });
expect(user.id).toBeDefined();
});
// Missing: duplicate email, invalid email, missing fields, database error
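The missing failure paths as tests; a sketch in which the error classes are assumptions:

```typescript
test('createUser rejects duplicate email', () => {
  createUser({ name: 'a', email: 'dup@example.com' });
  expect(() => createUser({ name: 'b', email: 'dup@example.com' }))
    .toThrow(DuplicateEmailError);
});

test('createUser rejects invalid email', () => {
  expect(() => createUser({ name: 'a', email: 'not-an-email' }))
    .toThrow(ValidationError);
});
```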
2.8 Behavior Verification
// ✅ GREEN: Verifies specific behavior with exact values
test('calculateTotal applies discount correctly', () => {
const cart = new Cart([{ price: 100, quantity: 2 }]);
cart.applyDiscount('SAVE20');
expect(cart.total).toBe(160); // 200 - 20% = 160
});
2.9 Edge Case Coverage
// ✅ GREEN: Tests boundary conditions
test('username rejects empty string', () => {
expect(() => new User({ username: '' })).toThrow(ValidationError);
});
test('username handles unicode', () => {
const user = new User({ username: '日本語ユーザー' });
expect(user.username).toBe('日本語ユーザー');
});
2.10 Error Path Testing
// ✅ GREEN: Verifies error handling
test('fetch returns specific error on 404', async () => {
mockServer.get('/api/user/999').reply(404);
await expect(fetchUser(999)).rejects.toThrow(UserNotFoundError);
});
Phase 3: Corner Case Discovery

For each module, identify missing corner case tests:
Input corner cases:
| Category | Examples | Tests to Add |
|---|---|---|
| Empty values | "", [], {}, null | test_empty_X_rejected/handled |
| Boundary values | 0, -1, MAX_INT, MAX_LEN | test_boundary_X_handled |
| Unicode | RTL, emoji, combining chars, null byte | test_unicode_X_preserved |
| Injection | SQL: '; DROP, XSS: <script>, cmd: ; rm | test_injection_X_escaped |
| Malformed | truncated JSON, invalid UTF-8, wrong type | test_malformed_X_error |
State and lifecycle corner cases:
| Category | Examples | Tests to Add |
|---|---|---|
| Uninitialized | Use before init, double init | test_uninitialized_X_error |
| Already closed | Use after close, double close | test_closed_X_error |
| Concurrent | Parallel writes, read during write | test_concurrent_X_safe |
| Re-entrant | Callback calls same method | test_reentrant_X_safe |
External dependency corner cases:
| Category | Examples | Tests to Add |
|---|---|---|
| Network | timeout, connection refused, DNS fail | test_network_X_timeout |
| Partial response | truncated, corrupted, slow | test_partial_response_handled |
| Rate limiting | 429, quota exceeded | test_rate_limit_handled |
| Service errors | 500, 503, malformed response | test_service_error_handled |
Resource corner cases:
| Category | Examples | Tests to Add |
|---|---|---|
| Exhaustion | OOM, disk full, max connections | test_resource_X_graceful |
| Contention | file locked, resource busy | test_contention_X_handled |
| Permissions | access denied, read-only | test_permission_X_error |
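One concurrency case from the tables above as a sketch (the `Store` API is hypothetical): fire writes in parallel, then verify every value survives intact.

```typescript
test('concurrent writes do not corrupt the store', async () => {
  const store = new Store();
  await Promise.all(
    Array.from({ length: 100 }, (_, i) => store.write(`key-${i}`, i))
  );
  for (let i = 0; i < 100; i++) {
    expect(await store.read(`key-${i}`)).toBe(i);
  }
});
```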
For each module, create corner case checklist:
### Module: src/auth/
**Covered Corner Cases:**
- [x] Empty password rejected
- [x] SQL injection in username escaped
**Missing Corner Cases (MUST ADD):**
- [ ] Unicode username preserved after roundtrip
- [ ] Concurrent login attempts don't corrupt session
- [ ] Password with null byte handled
- [ ] Very long password (10KB) rejected gracefully
- [ ] Login rate limiting enforced
**Priority:** HIGH (auth is business-critical)
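One of the missing cases above as a sketch; the `login` API and the expectation of distinct sessions are assumptions about this codebase's contract:

```typescript
test('concurrent logins do not corrupt sessions', async () => {
  const results = await Promise.all([
    login('alice', 'correct-password'),
    login('alice', 'correct-password'),
  ]);
  // Both succeed, and each yields its own usable session
  for (const r of results) expect(r.error).toBeUndefined();
  expect(new Set(results.map(r => r.sessionId)).size).toBe(2);
});
```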
Phase 4: Prioritize by Business Criticality

| Priority | Criteria | Action Timeline |
|---|---|---|
| P0 - Critical | Auth, payments, data integrity | This sprint |
| P1 - High | Core business logic, user-facing features | Next sprint |
| P2 - Medium | Internal tools, admin features | Backlog |
| P3 - Low | Utilities, non-critical paths | As time permits |
Rank modules:
1. P0: src/auth/ - 5 RED tests, 12 missing corner cases
2. P0: src/payments/ - 2 RED tests, 8 missing corner cases
3. P1: src/api/ - 8 RED tests, 15 missing corner cases
4. P2: src/admin/ - 3 RED tests, 6 missing corner cases
Phase 5: Create bd Issues and Run SRE Refinement

CRITICAL: All findings MUST be tracked in bd and go through SRE task refinement.
bd create "Test Quality Improvement: [Module/Project]" \
--type epic \
--priority 1 \
--design "$(cat <<'EOF'
## Goal
Improve test effectiveness by removing tautological tests, strengthening weak tests, and adding missing corner case coverage.
## Success Criteria
- [ ] All RED tests removed or replaced with meaningful tests
- [ ] All YELLOW tests strengthened with proper assertions
- [ ] All P0 missing corner cases covered
- [ ] Mutation score ≥80% for P0 modules
## Scope
[Summary of modules analyzed and findings]
## Anti-patterns
- ❌ Adding tests that only check `!= nil`
- ❌ Adding tests that verify mock behavior
- ❌ Adding happy-path-only tests
- ❌ Leaving tautological tests "for coverage"
EOF
)"
Task 1: Remove Tautological Tests (Immediate)
bd create "Remove tautological tests from [module]" \
--type task \
--priority 0 \
--design "$(cat <<'EOF'
## Goal
Remove tests that provide false confidence by passing regardless of code correctness.
## Tests to Remove
[List each RED test with file:line]
- tests/auth.test.ts:45 - testUserExists (tautological: verifies non-optional != nil)
- tests/auth.test.ts:67 - testEnumHasCases (tautological: compiler checks this)
## Success Criteria
- [ ] All listed tests deleted
- [ ] No new tautological tests introduced
- [ ] Test suite still passes
- [ ] Coverage may decrease (this is expected and good)
## Anti-patterns
- ❌ Keeping tests "just in case"
- ❌ Replacing with equally meaningless tests
- ❌ Adding coverage-only tests to compensate
EOF
)"
Task 2: Strengthen Weak Tests (This Sprint)
bd create "Strengthen weak assertions in [module]" \
--type task \
--priority 1 \
--design "$(cat <<'EOF'
## Goal
Replace weak assertions with meaningful ones that catch real bugs.
## Tests to Strengthen
[List each YELLOW test with current vs recommended assertion]
- tests/parser.test.ts:34 - testParse
- Current: `expect(result).not.toBeNull()`
- Strengthen: `expect(result).toEqual(expectedAST)`
- tests/validator.test.ts:56 - testValidate
- Current: `expect(isValid).toBe(true)` (happy path only)
- Add edge cases: empty input, unicode, max length
## Success Criteria
- [ ] All weak assertions replaced with exact value checks
- [ ] Edge cases added to happy-path-only tests
- [ ] Each test documents what bug it catches
## Anti-patterns
- ❌ Replacing `!= nil` with `!= undefined` (still weak)
- ❌ Adding edge cases without meaningful assertions
EOF
)"
Task 3: Add Missing Corner Cases (Per Module)
bd create "Add missing corner case tests for [module]" \
--type task \
--priority 1 \
--design "$(cat <<'EOF'
## Goal
Add tests for corner cases that could cause production bugs.
## Corner Cases to Add
[List each with the bug it prevents]
- test_empty_password_rejected - prevents auth bypass
- test_unicode_username_preserved - prevents encoding corruption
- test_concurrent_login_safe - prevents session corruption
## Implementation Checklist
- [ ] Write failing test first (RED)
- [ ] Verify test fails for the right reason
- [ ] Test catches the specific bug listed
- [ ] Test has meaningful assertion (not just `!= nil`)
## Success Criteria
- [ ] All corner case tests written and passing
- [ ] Each test documents the bug it catches in test name/comment
- [ ] No tautological tests added
## Anti-patterns
- ❌ Writing test that passes immediately (didn't test anything)
- ❌ Testing mock behavior instead of production code
- ❌ Happy path only (defeats the purpose)
EOF
)"
MANDATORY: After creating bd tasks, run SRE task refinement:
Announce: "I'm using hyperpowers:sre-task-refinement to review these test improvement tasks."
Use Skill tool: hyperpowers:sre-task-refinement
Apply all 8 categories to each task, especially Category 8 (Test Meaningfulness).
# Link all tasks as children of epic
bd dep add bd-2 bd-1 --type parent-child
bd dep add bd-3 bd-1 --type parent-child
bd dep add bd-4 bd-1 --type parent-child
# Set dependencies (remove before strengthen before add)
bd dep add bd-3 bd-2 # strengthen depends on remove
bd dep add bd-4 bd-3 # add depends on strengthen
bd create "Validate test improvements with mutation testing" \
--type task \
--priority 1 \
--design "$(cat <<'EOF'
## Goal
Verify test improvements actually catch more bugs using mutation testing.
## Validation Commands
```bash
# Java
mvn org.pitest:pitest-maven:mutationCoverage
# JavaScript/TypeScript
npx stryker run
# Python
mutmut run
# .NET
dotnet stryker
```
## Success Criteria
- [ ] Mutation score ≥80% for P0 modules
- [ ] Surviving mutants either killed by new tests or documented as acceptable
EOF
)"
---
## Output Format
```markdown
# Test Effectiveness Analysis: [Project Name]
## Executive Summary
| Metric | Count | % |
|--------|-------|---|
| Total tests analyzed | N | 100% |
| RED (remove/replace) | N | X% |
| YELLOW (strengthen) | N | X% |
| GREEN (keep) | N | X% |
| Missing corner cases | N | - |
**Overall Assessment:** [CRITICAL / NEEDS WORK / ACCEPTABLE / GOOD]
## Detailed Findings
### RED Tests (Must Remove/Replace)
#### Tautological Tests
| Test | File:Line | Problem | Action |
|------|-----------|---------|--------|
#### Mock-Testing Tests
| Test | File:Line | Problem | Action |
|------|-----------|---------|--------|
#### Line Hitters
| Test | File:Line | Problem | Action |
|------|-----------|---------|--------|
#### Evergreen Tests
| Test | File:Line | Problem | Action |
|------|-----------|---------|--------|
### YELLOW Tests (Must Strengthen)
#### Weak Assertions
| Test | File:Line | Current | Recommended |
|------|-----------|---------|-------------|
#### Happy Path Only
| Test | File:Line | Missing Edge Cases |
|------|-----------|-------------------|
### GREEN Tests (Exemplars)
[List 3-5 tests that exemplify good testing practices for this codebase]
## Missing Corner Cases by Module
### [Module: name] - Priority: P0
| Corner Case | Bug Risk | Recommended Test |
|-------------|----------|------------------|
[Repeat for each module]
## bd Issues Created
### Epic
- **bd-N**: Test Quality Improvement: [Project Name]
### Tasks
| bd ID | Task | Priority | Status |
|-------|------|----------|--------|
| bd-N | Remove tautological tests from [module] | P0 | Created |
| bd-N | Strengthen weak assertions in [module] | P1 | Created |
| bd-N | Add missing corner case tests for [module] | P1 | Created |
| bd-N | Validate with mutation testing | P1 | Created |
### Dependency Tree
bd-1 (Epic: Test Quality Improvement)
├── bd-2 (Remove tautological tests)
├── bd-3 (Strengthen weak assertions) ← depends on bd-2
├── bd-4 (Add corner case tests) ← depends on bd-3
└── bd-5 (Validate with mutation testing) ← depends on bd-4
## SRE Task Refinement Status
- [ ] All tasks reviewed with hyperpowers:sre-task-refinement
- [ ] Category 8 (Test Meaningfulness) applied to each task
- [ ] Success criteria are measurable
- [ ] Anti-patterns specified
## Next Steps
1. Run `bd ready` to see tasks ready for implementation
2. Implement tasks using hyperpowers:executing-plans
3. Run validation task to verify improvements
```
</the_process>
<examples>
<example>
<scenario>High coverage but production bugs keep appearing</scenario>
<code>
# Test suite stats
Coverage: 92%
Tests: 245 passing
</code>
<why_it_fails>
Line coverage says nothing about assertion quality: tautological tests and line hitters execute code without verifying behavior, so coverage stays high while bugs slip through.
</why_it_fails>
<correction>
Phase 1 - Inventory:
fd -e test.ts src/
# Found: auth.test.ts, user.test.ts, data.test.ts
Phase 2 - Categorize:
### auth.test.ts
| Test | Category | Problem |
|------|----------|---------|
| testAuthWorks | RED | Only checks `!= null` |
| testLoginFlow | YELLOW | Happy path only, no empty password |
| testTokenExpiry | GREEN | Verifies exact error |
### data.test.ts
| Test | Category | Problem |
|------|----------|---------|
| testDataSaves | RED | No assertion, just calls save() |
| testConcurrentWrites | MISSING | Not tested at all |
Phase 3 - Corner cases:
### auth module (P0)
Missing:
- [ ] test_empty_password_rejected
- [ ] test_unicode_username_preserved
- [ ] test_concurrent_login_safe
Phase 5 - Plan:
### Immediate
- Remove testAuthWorks (tautological)
- Remove testDataSaves (line hitter)
### This Sprint
- Add test_empty_password_rejected
- Add test_concurrent_writes_safe
- Strengthen testLoginFlow with edge cases
Result: Production bugs prevented by meaningful tests. </correction> </example>
<example>
<scenario>Mock-heavy test suite that breaks on every refactor</scenario>
<code>
# Every refactor breaks 50+ tests
# But bugs slip through to production
test('service processes data', () => {
  const mockDb = jest.fn().mockReturnValue({ data: [] });
  const mockCache = jest.fn().mockReturnValue(null);
  const mockLogger = jest.fn();
  const mockValidator = jest.fn().mockReturnValue(true);

  const service = new Service(mockDb, mockCache, mockLogger, mockValidator);
  service.process({ id: 1 });

  expect(mockDb).toHaveBeenCalled();
  expect(mockValidator).toHaveBeenCalled();
  // Tests mock wiring, not actual behavior
});
</code>
<why_it_fails>
Every assertion targets mock wiring, so refactors break tests even when behavior is unchanged, while real defects in production code pass unnoticed.
</why_it_fails>
<correction>
### service.test.ts
| Test | Category | Problem | Action |
|------|----------|---------|--------|
| testServiceProcesses | RED | Only verifies mocks called | Replace with integration test |
| testServiceValidates | RED | Mock determines outcome | Test real validator |
| testServiceCaches | RED | Tests mock cache | Use real cache with test data |
Replacement strategy:
// ❌ Before: Tests mock wiring
test('service validates', () => {
const mockValidator = jest.fn().mockReturnValue(true);
const service = new Service(mockValidator);
expect(mockValidator).toHaveBeenCalled();
});
// ✅ After: Tests real behavior
test('service rejects invalid data', () => {
const service = new Service(new RealValidator());
const result = service.process({ id: -1 }); // Invalid ID
expect(result.error).toBe('INVALID_ID');
});
test('service accepts valid data', () => {
const service = new Service(new RealValidator());
const result = service.process({ id: 1, name: 'test' });
expect(result.success).toBe(true);
expect(result.data.name).toBe('test');
});
Result: Tests verify behavior, not implementation. Refactoring doesn't break tests. Real bugs caught. </correction> </example> </examples>
<critical_rules>
Red flags that demand scrutiny:
- Assertion only checks != nil / not-null on a non-optional value
- Test only verifies that a mock was called
- Test has zero assertions, or a single trivial one
- try/catch that swallows exceptions
- Test's only justification is "it adds coverage"

All of these mean: STOP. The test is probably RED or YELLOW.
<verification_checklist> Before completing analysis:

Per module:
- [ ] All test files cataloged
- [ ] Every test categorized RED/YELLOW/GREEN
- [ ] Corner case checklist created with covered and missing items

Overall:
- [ ] Priority matrix built (P0-P3)
- [ ] Every RED and YELLOW finding maps to a bd task

bd Integration (MANDATORY):
- [ ] Epic created with goal, success criteria, and anti-patterns
- [ ] All tasks linked as children of the epic
- [ ] Task dependencies set (remove → strengthen → add → validate)

SRE Refinement Verification:
- [ ] Every task reviewed with hyperpowers:sre-task-refinement
- [ ] Category 8 (Test Meaningfulness) applied to each task

Validation:
- [ ] Mutation testing task created with a language-appropriate command
</verification_checklist>

<integration>
This skill calls (MANDATORY):
- hyperpowers:sre-task-refinement - on every created task

This skill creates:
- bd epic + child tasks tracking all findings

Workflow chain:
analyzing-test-effectiveness
↓ (creates bd issues)
sre-task-refinement (on each task)
↓ (refines tasks)
executing-plans (implements tasks)
↓ (runs validation)
review-implementation (verifies quality)
This skill informs:
- hyperpowers:executing-plans - implements the refined tasks
- hyperpowers:review-implementation - verifies the improvements
Mutation testing tools:
- Java: PIT (`mvn org.pitest:pitest-maven:mutationCoverage`)
- JavaScript/TypeScript: Stryker (`npx stryker run`)
- Python: mutmut (`mutmut run`)
- .NET: Stryker.NET (`dotnet stryker`)
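To make the metric concrete: a mutant is a small seeded defect, and a test "kills" it by failing. A sketch of the boundary mutants these tools typically generate:

```typescript
// Original
function isAdult(age: number): boolean {
  return age >= 18;
}

// Typical seeded mutants:
//   age >= 18  →  age > 18   (boundary mutant)
//   age >= 18  →  age < 18   (negation mutant)

// Kills the boundary mutant; a test that only checks age 30 would let it survive
test('isAdult accepts exactly 18', () => {
  expect(isAdult(18)).toBe(true);
});
```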
</integration>
Key insight from Google: "Coverage mainly tells you about code that has no tests: it doesn't tell you about the quality of testing for the code that's 'covered'."
When stuck:
- Ask: what production bug would make this test fail? If there is no answer, the test is RED.
- If a test resists categorization, run it against the detection patterns in Phase 2.