Machine Psychology Fieldkit

A Claude Code plugin with skills for behavioral evaluation of LLMs using Petri and Bloom.

Installation

Option 1: Add as a marketplace (recommended)

# Add the repo as a marketplace
claude plugin marketplace add https://github.com/k3nnethfrancis/machine-psychology-fieldkit

# Install the plugin
claude plugin install machine-psychology-fieldkit

Option 2: Clone and load locally

# Clone the repo
git clone https://github.com/k3nnethfrancis/machine-psychology-fieldkit.git

# Run Claude Code with the plugin directory
claude --plugin-dir /path/to/machine-psychology-fieldkit

Verify installation

claude plugin list

You should see machine-psychology-fieldkit in the list.
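
For scripted setups, the same check can be automated. The sketch below assumes only that the plugin name appears somewhere in the text printed by `claude plugin list`; the exact output format is not documented here, and the helper name `has_plugin` is hypothetical.

```shell
# Hypothetical helper: succeeds if the given plugin name appears in the
# listing read from stdin. Assumes the name is printed verbatim somewhere
# in the output; the real output format may differ.
has_plugin() {
  grep -qF -- "$1"
}

# Usage (requires the claude CLI on PATH):
#   claude plugin list | has_plugin machine-psychology-fieldkit && echo "installed"
```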

Prerequisites

1. Clone Petri and Bloom

# Clone both repos
git clone https://github.com/anthropics/petri.git
git clone https://github.com/anthropics/bloom.git

2. Set up Petri

cd petri

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -e ".[dev]"

# Set API key
export ANTHROPIC_API_KEY="your-key-here"

3. Set up Bloom

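# Run from the directory that contains both clones.
# If the Petri venv is still active in this shell, run `deactivate` first.
cd bloom

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate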

# Install dependencies
pip install -e .

# Set API key
export ANTHROPIC_API_KEY="your-key-here"
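
Both tools read the key from the environment, so a missing export usually surfaces only once an evaluation is already starting. A minimal guard, assuming a POSIX-style shell, can catch that earlier; the function name `require_api_key` is hypothetical and not part of Petri or Bloom.

```shell
# Hypothetical guard: fail early if the API key is missing or empty.
require_api_key() {
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set; export it before running an evaluation" >&2
    return 1
  fi
}

# Usage at the top of a wrapper script:
#   require_api_key || exit 1
```

The guard only checks that the variable is non-empty. It does not verify that the key is valid.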

Skills

petri-collaborator

Run adversarial audits with Petri. The skill helps you:

Design realistic seed instructions
Run evaluations against target models
Interpret the 36-dimension behavioral scores
Understand meta-scores that flag evaluation quality

Quick start:

cd petri
inspect eval src/petri/tasks/petri.py --model anthropic/claude-sonnet-4-20250514

bloom-collaborator

Generate evaluation scenarios with Bloom. The skill helps you:

Define behaviors to test
Configure variation dimensions for robustness testing
Run the four-stage pipeline (understanding → ideation → rollout → judgment)
Interpret elicitation rates and identify patterns

Quick start:

cd bloom
python -m bloom.run --config configs/your_config.yaml
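
To make the elicitation-rate idea concrete, the sketch below treats it as the share of rollouts whose judgment score meets a threshold. The input format (one numeric score per line), the function name `elicitation_rate`, and the threshold of 7 are all assumptions for illustration. This is not Bloom's actual output format or scoring code.

```shell
# Hypothetical helper: read one judgment score per line from stdin and print
# the fraction of scores at or above the threshold given as $1.
elicitation_rate() {
  awk -v t="$1" '
    $1 != "" { n++; if ($1 + 0 >= t) hits++ }   # skip blank lines
    END {
      if (n == 0) { print "no scores" > "/dev/stderr"; exit 1 }
      printf "%.2f\n", hits / n
    }'
}

# Five made-up rollout scores; three of them are >= 7.
printf '%s\n' 8 3 7 9 2 | elicitation_rate 7   # prints 0.60
```

Comparing this rate across variation dimensions (for example, different framings of the same scenario) is one simple way to spot where a behavior is most easily elicited.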

Usage

Once installed, Claude Code automatically activates these skills when you're working on behavioral evaluation tasks. You can also invoke them directly by typing /petri-collaborator or /bloom-collaborator.

When to Use Which

Use Case	Tool
Broad audit across 36 dimensions	Petri
Test a specific behavior hypothesis	Bloom
Compare models on standard battery	Petri
Measure robustness across framings	Bloom

Resources

Petri paper - Adversarial auditing methodology
Bloom paper - Automated scenario generation
Behavioral Evaluation for AI Systems - Methodology notes
Machine Psychology - Background on the field

License

MIT
