Validates that code actually works through sandbox testing, execution verification, and systematic debugging. Use this skill after code generation or modification to ensure functionality is genuine rather than assumed. The skill creates isolated test environments, executes code with realistic inputs, identifies bugs through systematic analysis, and applies best practices to fix issues without breaking existing functionality.
This skill inherits all available tools. When active, it can use any tool Claude has access to.

Additional assets for this skill: process-diagram.gv, process.md, readme.md

name: when-validating-code-works-use-functionality-audit
type: testing-quality
description: >
  Validates that code actually works through sandbox testing, execution verification, and
  systematic debugging. Use this skill after code generation or modification to ensure
  functionality is genuine rather than assumed. The skill creates isolated test environments,
  executes code with realistic inputs, identifies bugs through systematic analysis, and applies
  best practices to fix issues without breaking existing functionality.
agents:
Use this skill when:
- Code has just been generated or modified and its functionality needs verification
- Correct behavior must be confirmed through actual execution rather than assumed from static review
Do NOT use this skill for:
This skill succeeds when:
Handle these edge cases carefully:
CRITICAL RULES - ALWAYS FOLLOW:
Use multiple validation perspectives:
Validation Threshold: Findings require 2+ confirming signals before flagging as violations.
This skill integrates with:
Trigger Conditions:
Situations Requiring Functionality Audit:
This skill systematically validates that code delivers its intended behavior through actual execution rather than static analysis alone. It creates isolated testing environments (sandboxes), executes code with realistic inputs, captures outputs and errors, identifies root causes of failures through systematic debugging, and applies fixes using best practices that preserve existing functionality.
The skill emphasizes genuine functionality over appearance, detecting "theater code" that looks correct but fails during execution. It combines automated testing, manual validation, and debugging expertise to ensure code reliability.
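At its core, the audit is a simple loop: isolate, execute, capture, inspect. The sketch below is a minimal illustration of that loop, not the skill's prescribed implementation; the source path is a placeholder.

# Minimal sketch of the audit loop: isolate, execute, capture, inspect
mkdir -p /tmp/audit-sandbox && cd /tmp/audit-sandbox   # isolated environment
cp -r /path/to/code-under-test/. .                     # placeholder source path
npm install && npm test > run.log 2>&1                 # execute with realistic inputs
if [ $? -ne 0 ]; then                                  # non-zero exit = genuine failure to debug
  grep -E "FAIL|Error" run.log                         # surface evidence, not appearance
fi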
Phase 1: Setup Testing Environment
Agents: tester (lead), coder (support). Duration: 10-15 minutes
Scripts:
# Initialize phase
npx claude-flow hooks pre-task --description "Phase 1: Setup Testing Environment"
npx claude-flow swarm init --topology hierarchical --max-agents 2
# Spawn agents
npx claude-flow agent spawn --type tester --capabilities "sandbox-setup,environment-config,dependency-management"
npx claude-flow agent spawn --type coder --capabilities "tooling-setup,script-generation"
# Memory coordination - store environment config
npx claude-flow memory store --key "testing/functionality-audit/phase-1/tester/sandbox-config" --value '{"isolated":true,"snapshot_enabled":true}'
npx claude-flow memory store --key "testing/functionality-audit/phase-1/coder/dependencies" --value '{"package_manager":"npm","install_command":"npm install"}'
# Execute phase work
# 1. Create isolated sandbox environment
echo "Creating sandbox for code execution..."
mkdir -p /tmp/functionality-audit-sandbox
cd /tmp/functionality-audit-sandbox
# 2. Install dependencies and set up tools
npm init -y 2>/dev/null || true
npm install --save-dev jest @types/jest ts-jest typescript 2>/dev/null || true
npm pkg set scripts.test="jest"
# 3. Configure testing framework
cat > jest.config.js << 'EOF'
module.exports = {
preset: 'ts-jest',
testEnvironment: 'node',
collectCoverage: true,
coverageDirectory: 'coverage',
testMatch: ['**/*.test.ts', '**/*.test.js'],
verbose: true
};
EOF
# Complete phase
npx claude-flow hooks post-task --task-id "phase-1-setup"
npx claude-flow memory store --key "testing/functionality-audit/phase-1/output" --value '{"status":"complete","sandbox_ready":true}'
Memory Pattern:
- testing/functionality-audit/phase-0/user/code-to-validate
- testing/functionality-audit/phase-1/tester/sandbox-ready
- testing/functionality-audit/shared/environment-config
Success Criteria:
Deliverables:
Phase 2: Execute Code with Realistic Inputs
Agents: tester (lead), production-validator (support). Duration: 15-20 minutes
Scripts:
# Initialize phase
npx claude-flow hooks pre-task --description "Phase 2: Execute Code with Realistic Inputs"
npx claude-flow swarm scale --target-agents 2
# Retrieve sandbox config from Phase 1
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-1/output"
# Memory coordination - define test scenarios
npx claude-flow memory store --key "testing/functionality-audit/phase-2/tester/test-scenarios" --value '{"unit_tests":true,"integration_tests":true,"edge_cases":true}'
# Execute phase work
# 1. Copy code to sandbox
echo "Copying code to sandbox environment..."
# (Code paths retrieved from memory)
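# Hedged sketch of this step (assumes phase 0 stored a JSON value with a "path"
# field -- that field name is hypothetical; adjust to the actual stored shape.
# Requires jq for JSON parsing.)
CODE_PATH=$(npx claude-flow memory retrieve --key "testing/functionality-audit/phase-0/user/code-to-validate" | jq -r '.path')
cp -r "$CODE_PATH"/. /tmp/functionality-audit-sandbox/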
# 2. Generate test cases with realistic inputs
cat > sandbox-tests.test.js << 'EOF'
// Auto-generated test cases for functionality validation
describe('Functionality Audit Tests', () => {
test('Happy path with valid inputs', async () => {
// Test implementation with realistic data
});
test('Edge cases and boundary conditions', async () => {
// Test edge cases
});
test('Error handling with invalid inputs', async () => {
// Test error scenarios
});
});
EOF
# 3. Execute tests and capture output
npm test -- --coverage --verbose > test-output.log 2>&1
TEST_EXIT_CODE=$?
# 4. Store results in memory
npx claude-flow memory store --key "testing/functionality-audit/phase-2/tester/execution-results" --value "{\"exit_code\":$TEST_EXIT_CODE,\"timestamp\":\"$(date -Iseconds)\"}"
# Complete phase
npx claude-flow hooks post-task --task-id "phase-2-execute"
Memory Pattern:
- testing/functionality-audit/phase-1/tester/sandbox-ready
- testing/functionality-audit/phase-2/tester/execution-results
- testing/functionality-audit/shared/test-logs
Success Criteria:
Deliverables:
Phase 3: Debug Issues
Agents: coder (lead), tester (support), reviewer (validation). Duration: 20-30 minutes
Scripts:
# Initialize phase
npx claude-flow hooks pre-task --description "Phase 3: Debug Issues"
npx claude-flow swarm scale --target-agents 3
# Retrieve execution results from Phase 2
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-2/tester/execution-results"
# Memory coordination - analyze failures
npx claude-flow memory store --key "testing/functionality-audit/phase-3/coder/debugging-strategy" --value '{"method":"systematic-root-cause","tools":["debugger","logs","profiler"]}'
# Execute phase work
# 1. Analyze test failures systematically
echo "Analyzing failures and errors..."
grep -E "(FAIL|ERROR|Exception)" test-output.log > failures.txt || true
# 2. Identify root causes using debugging techniques
# - Stack trace analysis
# - Variable inspection
# - Logic flow validation
# - Dependency conflict detection
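# Hedged command-line sketches of these techniques (invocations are illustrative):
sed -n '/Error/,/^$/p' test-output.log > stack-traces.txt     # stack trace analysis
node --inspect-brk node_modules/.bin/jest --runInBand         # variable inspection under the debugger
npm ls 2>&1 | grep -iE "invalid|missing|unmet" || true        # dependency conflict detection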
# 3. Categorize issues by type
cat > issue-analysis.json << 'EOF'
{
"syntax_errors": [],
"runtime_errors": [],
"logic_errors": [],
"integration_failures": [],
"dependency_issues": []
}
EOF
# 4. Prioritize fixes by impact
# Critical: Code doesn't run at all
# High: Core functionality broken
# Medium: Edge cases failing
# Low: Minor issues, optimizations
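# Hedged sketch: bucket failures by log signature (patterns are illustrative, not exhaustive)
CRITICAL=$(grep -cE "SyntaxError|Cannot find module" test-output.log || true)   # code doesn't run at all
HIGH=$(grep -c "AssertionError" test-output.log || true)                        # core behavior broken
echo "critical=$CRITICAL high=$HIGH"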
# Store debugging results
npx claude-flow memory store --key "testing/functionality-audit/phase-3/coder/root-causes" --value "$(cat issue-analysis.json)"
# Complete phase
npx claude-flow hooks post-task --task-id "phase-3-debug"
Memory Pattern:
- testing/functionality-audit/phase-2/tester/execution-results
- testing/functionality-audit/phase-3/coder/root-causes
- testing/functionality-audit/shared/issue-tracker
Success Criteria:
Deliverables:
Phase 4: Validate Functionality
Agents: production-validator (lead), reviewer (quality gate), tester (regression). Duration: 15-25 minutes
Scripts:
# Initialize phase
npx claude-flow hooks pre-task --description "Phase 4: Validate Functionality"
npx claude-flow swarm scale --target-agents 3
# Retrieve fixes from Phase 3
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-3/coder/root-causes"
# Memory coordination - define validation criteria
npx claude-flow memory store --key "testing/functionality-audit/phase-4/validator/criteria" --value '{"functional_correctness":true,"performance_acceptable":true,"no_regressions":true}'
# Execute phase work
# 1. Apply fixes and rerun tests
echo "Applying fixes and validating..."
npm test -- --coverage --verbose > retest-output.log 2>&1
RETEST_EXIT_CODE=$?
# 2. Compare results with baseline
diff test-output.log retest-output.log > validation-diff.txt || true
# 3. Validate no regressions introduced
# - Check previously passing tests still pass
# - Verify no new failures introduced
# - Confirm fixes resolved original issues
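# Hedged sketch of the regression check (assumes Jest's verbose output marks passing tests with a leading checkmark)
grep '✓' test-output.log | sort > baseline-passing.txt
grep '✓' retest-output.log | sort > retest-passing.txt
comm -23 baseline-passing.txt retest-passing.txt > regressions.txt   # passed before, missing now
[ -s regressions.txt ] && echo "Regressions detected:" && cat regressions.txt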
# 4. Production readiness assessment
cat > production-readiness.json << 'EOF'
{
"all_tests_passing": false,
"coverage_threshold_met": false,
"performance_acceptable": false,
"security_validated": false,
"ready_for_deployment": false
}
EOF
# Update readiness based on results
if [ $RETEST_EXIT_CODE -eq 0 ]; then
  echo "Tests passing, updating readiness..."
  # Flip the test-related gate; the remaining gates still require their own checks
  sed -i 's/"all_tests_passing": false/"all_tests_passing": true/' production-readiness.json
fi
# Store validation results
npx claude-flow memory store --key "testing/functionality-audit/phase-4/validator/readiness" --value "$(cat production-readiness.json)"
# Complete phase
npx claude-flow hooks post-task --task-id "phase-4-validate"
Memory Pattern:
- testing/functionality-audit/phase-3/coder/root-causes
- testing/functionality-audit/phase-4/validator/readiness
- testing/functionality-audit/shared/validation-results
Success Criteria:
Deliverables:
Phase 5: Report Results
Agents: reviewer (lead), tester (metrics). Duration: 10-15 minutes
Scripts:
# Initialize phase
npx claude-flow hooks pre-task --description "Phase 5: Report Results"
npx claude-flow swarm scale --target-agents 2
# Retrieve all phase outputs
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-4/validator/readiness"
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-3/coder/root-causes"
npx claude-flow memory retrieve --key "testing/functionality-audit/phase-2/tester/execution-results"
# Execute phase work
# 1. Compile comprehensive audit report
cat > functionality-audit-report.md << 'EOF'
# Functionality Audit Report
## Executive Summary
- **Status**: [PASS/FAIL/PARTIAL]
- **Tests Executed**: X
- **Tests Passed**: Y
- **Code Coverage**: Z%
- **Critical Issues**: N
## Phase-by-Phase Results
### Phase 1: Environment Setup
- Status: Complete
- Issues: None
### Phase 2: Execution Testing
- Total Tests: X
- Passed: Y
- Failed: Z
- Coverage: N%
### Phase 3: Debugging
- Issues Identified: X
- Root Causes: Y categories
- Fixes Applied: Z
### Phase 4: Validation
- Regression Status: No regressions
- Production Ready: Yes/No
- Performance: Acceptable
## Recommendations
1. [Action items for addressing remaining issues]
2. [Performance optimization suggestions]
3. [Code quality improvements]
## Artifacts
- Test logs: test-output.log
- Coverage report: coverage/
- Issue analysis: issue-analysis.json
- Validation results: production-readiness.json
EOF
# 2. Generate metrics dashboard
cat > audit-metrics.json << 'EOF'
{
"timestamp": "",
"duration_minutes": 0,
"tests_total": 0,
"tests_passed": 0,
"coverage_percentage": 0,
"issues_found": 0,
"issues_fixed": 0,
"production_ready": false
}
EOF
# 3. Store final results
npx claude-flow memory store --key "testing/functionality-audit/final-report" --value "$(cat functionality-audit-report.md)"
# Complete phase and export metrics
npx claude-flow hooks post-task --task-id "phase-5-report" --export-metrics true
npx claude-flow hooks session-end --export-metrics true
Memory Pattern:
- testing/functionality-audit/phase-{1-4}/*/output
- testing/functionality-audit/final-report
- testing/functionality-audit/shared/complete-audit
Success Criteria:
Deliverables:
testing/functionality-audit/phase-{N}/{agent-type}/{data-type}
Examples:
- testing/functionality-audit/phase-1/tester/sandbox-config
- testing/functionality-audit/phase-2/tester/execution-results
- testing/functionality-audit/phase-3/coder/root-causes
- testing/functionality-audit/phase-4/validator/readiness
- testing/functionality-audit/shared/environment-config
// Store outputs for downstream agents
mcp__claude_flow__memory_store({
key: "testing/functionality-audit/phase-2/tester/output",
value: JSON.stringify({
status: "complete",
results: {
tests_run: 45,
tests_passed: 38,
tests_failed: 7,
coverage: 82.5,
execution_time_ms: 3420
},
timestamp: Date.now(),
artifacts: {
logs: "/tmp/functionality-audit-sandbox/test-output.log",
coverage: "/tmp/functionality-audit-sandbox/coverage/"
}
})
})
// Retrieve inputs from upstream agents
mcp__claude_flow__memory_retrieve({
key: "testing/functionality-audit/phase-1/tester/sandbox-ready"
}).then(data => {
const sandboxConfig = JSON.parse(data);
console.log(`Using sandbox: ${sandboxConfig.path}`);
})
# Pre-task hook - Initialize audit session
npx claude-flow hooks pre-task \
--description "Functionality audit for module: auth-service" \
--session-id "functionality-audit-$(date +%s)"
# Post-edit hook - Track code changes during debugging
npx claude-flow hooks post-edit \
--file "src/auth/login.ts" \
--memory-key "testing/functionality-audit/coder/edits"
# Post-task hook - Finalize and export metrics
npx claude-flow hooks post-task \
--task-id "functionality-audit" \
--export-metrics true
# Session end - Generate summary and persist state
npx claude-flow hooks session-end \
--export-metrics true \
--summary "Functionality audit completed: 38/45 tests passing, 7 issues debugged and fixed"
Before finalizing each phase, validate:
Does this approach align with successful past work?
Do the outputs support the stated objectives?
Is the chosen method appropriate for the context?
Are there any internal contradictions?
Define objective precisely
Decompose into sub-goals
Identify dependencies
Evaluate options
Synthesize solution
Planning Phase:
Validation Gate 1: Review strategy against objectives
Implementation Phase:
Validation Gate 2: Verify outputs and performance
Optimization Phase:
Validation Gate 3: Confirm targets met before concluding
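A hedged sketch of enforcing one of these gates between phases (it assumes `memory retrieve` prints the stored JSON value to stdout; adjust to the actual output format):

# Gate check: refuse to start the next phase until the previous one reported completion
PHASE1=$(npx claude-flow memory retrieve --key "testing/functionality-audit/phase-1/output")
echo "$PHASE1" | grep -q '"status":"complete"' || { echo "Gate failed: phase 1 incomplete"; exit 1; }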
Related Skills:
- when-detecting-fake-code-use-theater-detection - Pre-audit detection of non-functional code
- when-reviewing-code-comprehensively-use-code-review-assistant - Post-audit quality review
- when-verifying-quality-use-verification-quality - Comprehensive quality validation
- when-auditing-code-style-use-style-audit - Code style and conventions validation
Coordination Points:
Memory Sharing:
# Share audit results with code review skill
npx claude-flow memory store \
--key "code-review/input/functionality-audit-results" \
--value "$(cat functionality-audit-report.md)"
# Retrieve theater detection findings
npx claude-flow memory retrieve \
--key "theater-detection/output/suspicious-code"
Issue: Sandbox Environment Setup Fails
Symptoms: Dependency installation errors, configuration failures
Root Cause: Missing system dependencies, incompatible versions
Solution:
# Validate system dependencies
node --version
npm --version
# Clean install
rm -rf node_modules package-lock.json
npm install --force
# Use Docker for isolation if local setup problematic
docker run -it --rm -v "$(pwd)":/workspace node:18 bash
Issue: Missing Environment Variables
Symptoms: Runtime errors referencing undefined config
Root Cause: Environment variables not set in sandbox
Solution:
# Create .env file with required variables
cat > .env << 'EOF'
NODE_ENV=test
API_KEY=test-key-12345
DATABASE_URL=sqlite::memory:
EOF
# Load environment in test setup
require('dotenv').config();
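A minimal sketch of wiring this into Jest (the file name jest.setup.js is illustrative; setupFiles is Jest's standard option for code that runs before each test file):

cat > jest.setup.js << 'EOF'
// Runs before each test file, making .env values available via process.env
require('dotenv').config();
EOF
# then register it in jest.config.js:  setupFiles: ['./jest.setup.js']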
Issue: Test Coverage Gaps
Symptoms: Coverage report shows gaps in tested code
Root Cause: Missing test cases for edge cases, error paths
Solution: Add test cases for the uncovered branches, edge cases, and error paths that the coverage report identifies.
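To see exactly which lines are uncovered, Jest's text coverage reporter prints an "Uncovered Line #s" column per file:

npx jest --coverage --coverageReporters=text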
Issue: Fixes Introduce Regressions
Symptoms: Fixing one issue breaks other functionality
Root Cause: Tight coupling, shared state, hidden dependencies
Solution: Rerun the full suite after every fix, isolate shared state in tests, and decouple the affected modules before refactoring further.
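One hedged workflow: after each fix, run only the tests related to the touched file for fast feedback, then the full suite to catch cross-cutting breakage (--findRelatedTests is a standard Jest flag; the path is illustrative):

npx jest --findRelatedTests src/auth/login.ts   # fast feedback on the touched module
npx jest                                        # full regression sweep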
Issue: Performance Degradation After Fixes
Symptoms: Tests pass but execution time significantly increased
Root Cause: Inefficient fix implementation, excessive validation
Solution:
# Profile execution to identify bottlenecks
node --prof app.js
node --prof-process isolate-*.log > profile.txt
# Optimize hot paths
# - Cache repeated computations
# - Reduce unnecessary iterations
# - Use appropriate data structures
# Initialize audit for API endpoint
npx claude-flow hooks pre-task --description "Audit /api/users endpoint"
# Phase 1: Setup
mkdir -p /tmp/api-audit
cd /tmp/api-audit
npm init -y
npm install --save-dev jest supertest
npm pkg set scripts.test="jest"
# Phase 2: Execute tests
cat > users.test.js << 'EOF'
const request = require('supertest');
const app = require('../src/app'); // adjust the path to your app's entry point
describe('GET /api/users', () => {
test('returns 200 with user list', async () => {
const response = await request(app).get('/api/users');
expect(response.statusCode).toBe(200);
expect(Array.isArray(response.body)).toBe(true);
});
test('handles authentication', async () => {
const response = await request(app)
.get('/api/users')
.set('Authorization', 'Bearer invalid-token');
expect(response.statusCode).toBe(401);
});
});
EOF
npm test
# Phase 3: Debug failures (if any)
# Analyze error messages and stack traces
# Fix authentication logic or data access issues
# Phase 4: Validate fixes
npm test -- --coverage
# Phase 5: Report
echo "API audit complete: All endpoints functional"
// Phase 2: Create test with realistic data
const { processUserData } = require('./data-processor');
describe('processUserData', () => {
test('processes valid user data correctly', () => {
const input = {
name: 'John Doe',
email: 'john@example.com',
age: 30,
roles: ['user', 'admin']
};
const result = processUserData(input);
expect(result).toHaveProperty('id');
expect(result.email).toBe('john@example.com');
expect(result.roles).toContain('admin');
});
test('handles missing fields gracefully', () => {
const input = { name: 'Jane' };
const result = processUserData(input);
expect(result).toHaveProperty('email');
expect(result.email).toBe(''); // Default value
});
test('validates email format', () => {
const input = { email: 'invalid-email' };
expect(() => processUserData(input)).toThrow('Invalid email format');
});
});
// Phase 3: Debug - Found that email validation regex was incorrect
// Fix: Update regex to properly match email patterns
// Phase 4: Validate - All tests pass, coverage 95%
// Phase 2: Create component test with user interactions
import { render, screen, fireEvent } from '@testing-library/react';
import LoginForm from './LoginForm';
describe('LoginForm', () => {
test('submits form with valid credentials', async () => {
const handleSubmit = jest.fn();
render(<LoginForm onSubmit={handleSubmit} />);
fireEvent.change(screen.getByLabelText(/email/i), {
target: { value: 'test@example.com' }
});
fireEvent.change(screen.getByLabelText(/password/i), {
target: { value: 'password123' }
});
fireEvent.click(screen.getByRole('button', { name: /login/i }));
await screen.findByText(/logging in/i);
expect(handleSubmit).toHaveBeenCalledWith({
email: 'test@example.com',
password: 'password123'
});
});
test('displays validation errors', () => {
render(<LoginForm onSubmit={jest.fn()} />);
fireEvent.click(screen.getByRole('button', { name: /login/i }));
expect(screen.getByText(/email is required/i)).toBeInTheDocument();
expect(screen.getByText(/password is required/i)).toBeInTheDocument();
});
});
// Phase 3: Debug - Found that form validation wasn't triggering
// Fix: Add form validation logic with proper error state management
// Phase 4: Validate - Component renders and validates correctly