Set up GPU sandboxes for interpretability research. Use when writing setup.py scripts with Sandbox, SandboxConfig, ModelConfig, or create_notebook_session. Provides the exact API for Modal GPU environments - MUST read before writing any sandbox setup code.
This skill inherits all available tools. When active, it can use any tool Claude has access to.
Seer provides GPU-accelerated sandboxed environments for running interpretability experiments. You can set up remote environments with models pre-loaded, connect to Jupyter notebooks, and run experiments interactively.
DO NOT use start_new_session() directly. That tool is only for local Jupyter sessions without GPU.
For GPU sandboxes, you MUST:
1. Write a setup.py script that uses the src library to create a Modal sandbox
2. Run it with: uv run --with "seer @ git+https://github.com/ajobi-uhc/seer.git" python setup.py
3. Parse the JSON output for session_id and jupyter_url
4. Call attach_to_session(session_id, jupyter_url) to connect
# setup.py
from src.environment import Sandbox, SandboxConfig, ExecutionMode, ModelConfig
from src.workspace import Workspace
from src.execution import create_notebook_session
import json
config = SandboxConfig(
execution_mode=ExecutionMode.NOTEBOOK,
gpu="A100",
models=[ModelConfig(name="google/gemma-2-9b-it")],
python_packages=["torch", "transformers", "accelerate"],
)
sandbox = Sandbox(config).start()
session = create_notebook_session(sandbox, Workspace())
print(json.dumps({
"session_id": session.session_id,
"jupyter_url": session.jupyter_url,
"sandbox_id": sandbox.sandbox_id, # IMPORTANT: Save this for sandbox management
}))
uv run --with "seer @ git+https://github.com/ajobi-uhc/seer.git" python setup.py
# Output: {"session_id": "abc123", "jupyter_url": "https://...", "sandbox_id": "sb-xyz..."}
# Now use MCP tool to connect
attach_to_session(session_id="abc123", jupyter_url="https://...")
Only after attach_to_session() succeeds can you use execute_code().
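For example, a quick sanity check right after attaching (a minimal sketch; adapt the code to your experiment):
execute_code(session_id, """
import torch
print(torch.cuda.is_available())           # GPU visible in the sandbox
print(type(model).__name__, model.device)  # pre-loaded model is ready to use
""")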
Save the sandbox_id - you'll need it to manage the sandbox (terminate, snapshot, exec commands, etc.) using the modal-sandbox MCP tools.
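For example, using the modal-sandbox MCP tools documented later in this reference (a sketch; the sandbox_id placeholder comes from your setup script's JSON output):
get_gpu_status(sandbox_id="sb-xyz...")                                   # check GPU utilization
snapshot_sandbox(sandbox_id="sb-xyz...", description="before teardown")  # optional snapshot
terminate_sandbox(sandbox_id="sb-xyz...")                                # clean up when done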
Configuration for a Modal sandbox environment.
@dataclass
class SandboxConfig:
gpu: Optional[str] = None # "A100", "H100", "A10G", "L4", "T4", None for CPU
gpu_count: int = 1 # Number of GPUs (for multi-GPU setups)
execution_mode: ExecutionMode = ExecutionMode.NOTEBOOK # NOTEBOOK or CLI
models: List[ModelConfig] = [] # Models to pre-load
python_packages: List[str] = [] # pip packages
system_packages: List[str] = [] # apt packages
secrets: List[str] = [] # Env var names to pass from local env
repos: List[RepoConfig] = [] # Git repos to clone
env: Dict[str, str] = {} # Environment variables
timeout: int = 3600 # Timeout in seconds
local_files: List[Tuple[str, str]] = [] # (local_path, sandbox_path)
local_dirs: List[Tuple[str, str]] = [] # (local_dir, sandbox_dir)
debug: bool = False # Start code-server for debugging
GPU Options:
"H100" - NVIDIA H100 (80GB, fastest, best for 70B+ models)"A100-80GB" - NVIDIA A100 80GB (use for large models, 30B+)"A100-40GB" - NVIDIA A100 40GB (good default for most models)"A10G" - NVIDIA A10G (24GB, good for 7B-13B models)"L4" - NVIDIA L4 (24GB, cost-effective)"T4" - NVIDIA T4 (16GB, cheapest, good for small models)None - CPU onlyWhich GPU to use:
Multi-GPU Example:
config = SandboxConfig(
gpu="A100",
gpu_count=2, # 2x A100s for large models
models=[ModelConfig(name="meta-llama/Llama-3-70b-hf")],
)
Example:
config = SandboxConfig(
gpu="A100",
execution_mode=ExecutionMode.NOTEBOOK,
models=[ModelConfig(name="google/gemma-2-9b-it")],
python_packages=["torch", "transformers", "accelerate", "matplotlib"],
system_packages=["git"],
secrets=["HF_TOKEN", "OPENAI_API_KEY"],
env={"CUSTOM_VAR": "value"},
timeout=7200, # 2 hours
)
Configuration for a model to load in the sandbox.
@dataclass
class ModelConfig:
name: str # HuggingFace model ID (REQUIRED)
var_name: str = "model" # Variable name in notebook
hidden: bool = False # Hide details from agent
is_peft: bool = False # Is PEFT adapter
base_model: Optional[str] = None # Base model for PEFT
IMPORTANT: These are the ONLY valid parameters. Do not add load_kwargs, dtype, device_map, quantization, or any other parameters - they don't exist. Model loading configuration is handled automatically by the sandbox.
Examples:
Basic model:
ModelConfig(name="google/gemma-2-9b-it")
Multiple models with custom names:
models=[
ModelConfig(name="google/gemma-2-9b-it", var_name="model_a"),
ModelConfig(name="meta-llama/Llama-2-7b-hf", var_name="model_b"),
]
PEFT adapter (for investigating fine-tuned models):
ModelConfig(
name="user/gemma-adapter",
base_model="google/gemma-2-9b-it",
is_peft=True,
hidden=True # Hide the adapter name from the agent
)
Configuration for cloning Git repositories.
@dataclass
class RepoConfig:
url: str # Git URL (e.g., "owner/repo" or full URL)
dockerfile: Optional[str] = None # Path to Dockerfile in repo
install: bool = False # pip install -e the repo
Example:
repos=[RepoConfig(url="anthropics/circuits", install=True)]
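A fuller sketch (the repository URL and Dockerfile path here are hypothetical) showing a full URL plus the dockerfile option:
repos=[
    RepoConfig(
        url="https://github.com/example-org/interp-tools",  # hypothetical repo; full URLs also work
        dockerfile="docker/Dockerfile",                      # optional: build from a Dockerfile in the repo
        install=True,                                        # pip install -e the cloned repo
    )
]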
The Sandbox class manages GPU environments on Modal.
sandbox = Sandbox(config)
# Start the sandbox (required before any other operations)
sandbox.start(name="my-sandbox") # Returns self for chaining
# Execute shell commands
output = sandbox.exec("pip list")
output = sandbox.exec("nvidia-smi", timeout=30)
# Execute Python code directly
result = sandbox.exec_python("import torch; print(torch.cuda.is_available())")
# Write files to sandbox
sandbox.write_file("/workspace/script.py", "print('hello')")
# Create directories
sandbox.ensure_dir("/workspace/data/outputs")
# Snapshot current state (for resuming later)
snapshot = sandbox.snapshot("after training")
# Terminate (optionally save snapshot first)
sandbox.terminate()
# or
snapshot = sandbox.terminate(save_snapshot=True, snapshot_description="final state")
# Restore from snapshot
sandbox2 = Sandbox.from_snapshot(snapshot, config)
sandbox.jupyter_url # Jupyter server URL (if notebook mode)
sandbox.code_server_url # VS Code server URL (if debug=True)
sandbox.sandbox_id # Modal sandbox ID
sandbox.model_handles # List of ModelHandle objects
sandbox.repo_handles # List of RepoHandle objects
sandbox.modal_sandbox # Raw modal.Sandbox object (advanced)
Returned by create_notebook_session(). Represents a live Jupyter kernel.
session.session_id # Unique session ID (use with MCP tools)
session.jupyter_url # Jupyter server URL
session.sandbox # Parent Sandbox object
session.model_info_text # Formatted string describing loaded models
session.mcp_config # MCP server config dict for connecting agents
# Execute code in notebook kernel
result = session.exec("print('hello')")
result = session.exec("x = 42", hidden=True) # Hidden from notebook
# Execute a Python file
result = session.exec_file("script.py")
# Apply workspace (usually done automatically)
session.setup(workspace)
Libraries are Python files that get injected into the execution environment.
from src.workspace import Library
# From a single Python file
lib = Library.from_file("utils.py")
lib = Library.from_file("helpers.py", name="my_helpers") # Custom import name
# From a directory (Python package)
lib = Library.from_directory("my_package/") # Must have __init__.py
# From a skill directory (SKILL.md format)
lib = Library.from_skill_dir("skills/steering-hook/") # Loads code.py
# Manual construction
lib = Library(
name="tools",
files={"tools.py": "def helper(): pass"},
docs="Helper utilities for experiments",
)
lib.name # Import name
lib.files # Dict of filename -> source code
lib.docs # Documentation string
lib.is_single_file # True if single .py file (not package)
Workspace bundles libraries and configuration for a session.
from src.workspace import Workspace, Library
workspace = Workspace(
libraries=[
Library.from_file("steering_hook.py"),
Library.from_file("extract_activations.py"),
],
skills=[], # Skill objects to install
skill_dirs=[], # Paths to skill directories
local_files=[], # (local_path, workspace_path) for files
local_dirs=[], # (local_path, workspace_path) for directories
custom_init_code="", # Code to run during setup
preload_models=True, # Whether to load models into kernel
hidden_model_loading=True, # Hide model loading cells from notebook
)
# Get combined documentation from all libraries
docs = workspace.get_library_docs()
Standard GPU sandbox with Jupyter notebook. Use this for most experiments.
from src.environment import Sandbox, SandboxConfig, ExecutionMode, ModelConfig
from src.workspace import Workspace, Library
from src.execution import create_notebook_session
import json
config = SandboxConfig(
execution_mode=ExecutionMode.NOTEBOOK,
gpu="A100",
models=[ModelConfig(name="google/gemma-2-9b-it")],
python_packages=["torch", "transformers", "accelerate"],
)
sandbox = Sandbox(config).start()
workspace = Workspace(
libraries=[
Library.from_file("path/to/helper.py"),
]
)
session = create_notebook_session(sandbox, workspace)
# Output connection info as JSON
print(json.dumps({
"session_id": session.session_id,
"jupyter_url": session.jupyter_url,
"model_info": session.model_info_text,
}))
Isolated GPU sandbox with RPC interface. Use when you need to expose GPU functions to local code.
from src.environment import ScopedSandbox, SandboxConfig, ModelConfig
from src.workspace import Workspace
from src.execution import create_local_session
config = SandboxConfig(
gpu="A100",
models=[ModelConfig(name="google/gemma-2-9b")],
python_packages=["torch", "transformers"],
)
scoped = ScopedSandbox(config)
scoped.start()
# Serve an interface file via RPC
interface_lib = scoped.serve(
"path/to/interface.py",
expose_as="library", # or "mcp" for MCP server
name="model_tools"
)
workspace = Workspace(libraries=[interface_lib])
session = create_local_session(
workspace=workspace,
workspace_dir="./workspace",
name="experiment"
)
# Now the interface functions are available via RPC
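Presumably the served functions can then be called through the library name, the same way other workspace libraries are imported (an assumption based on the Library mechanism described below; the exact calling convention may differ):
# In code running inside the local session (assumed usage)
import model_tools                       # the name passed to scoped.serve(...)
result = model_tools.get_model_info()    # hypothetical @expose'd function; runs on the GPU via RPC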
ScopedSandbox Methods:
# Start with optional workspace (libraries the RPC code needs)
scoped.start(workspace=Workspace(libraries=[...]), name="my-sandbox")
# Serve code via RPC with different expose modes
lib = scoped.serve("interface.py", expose_as="library", name="tools") # Returns Library
mcp = scoped.serve("interface.py", expose_as="mcp", name="tools") # Returns MCP config dict
prompt = scoped.serve("interface.py", expose_as="prompt", name="tools") # Returns prompt string
skill = scoped.serve("interface.py", expose_as="skill", name="tools") # Returns Skill object
# Debug RPC server issues
scoped.show_rpc_logs(lines=100) # Print recent RPC server logs
Models are PRE-LOADED into the notebook kernel namespace.
When you run create_notebook_session(), the library:
- Loads each model into the kernel as model, tokenizer (or the custom var_name you specified)
- Generates session.model_info_text describing what's available
The session.model_info_text contains critical information:
### Pre-loaded Models
The following models are already loaded in the kernel:
**model** (google/gemma-2-9b-it)
- Type: Gemma2ForCausalLM
- Device: cuda:0
- Parameters: 9.24B
**tokenizer** (google/gemma-2-9b-it)
- Type: GemmaTokenizerFast
- Vocab size: 256,000
**IMPORTANT:** Do NOT reload these models. They are already loaded and ready to use.
execute_code(session_id, """
import torch
# Models are ALREADY loaded - just use them!
print(f"Model device: {model.device}")
print(f"Model type: {type(model).__name__}")
# Use directly
inputs = tokenizer("Hello world", return_tensors="pt").to(model.device)
outputs = model(**inputs)
print(f"Logits shape: {outputs.logits.shape}")
""")
# ❌ Don't load models manually!
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(...) # WRONG!
# The models are already loaded. Just use `model` and `tokenizer` directly.
When you load multiple models:
config = SandboxConfig(
models=[
ModelConfig(name="google/gemma-2-9b-it", var_name="gemma"),
ModelConfig(name="meta-llama/Llama-2-7b-hf", var_name="llama"),
]
)
Both will be pre-loaded and available:
execute_code(session_id, """
# Use both models
gemma_output = gemma.generate(**gemma_inputs)
llama_output = llama.generate(**llama_inputs)
""")
When using ScopedSandbox, you create an interface file that exposes functions via RPC. This lets you run functions on the GPU while your main code runs locally.
"""interface.py - Functions that run on GPU via RPC"""
from transformers import AutoModel, AutoTokenizer
import torch
# get_model_path() is injected by the RPC server
model_path = get_model_path("google/gemma-2-9b")
model = AutoModel.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)
@expose # @expose decorator is also injected by RPC server
def get_model_info() -> dict:
"""Get basic model information."""
config = model.config
return {
"num_layers": config.num_hidden_layers,
"hidden_size": config.hidden_size,
"vocab_size": config.vocab_size,
"device": str(model.device),
}
@expose
def analyze_text(text: str) -> dict:
"""Analyze text using the model."""
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model(**inputs, output_hidden_states=True)
# Return simple types (dict, list, str, int, float, bool)
return {
"text": text,
"num_tokens": len(inputs.input_ids[0]),
"logits_shape": list(outputs.logits.shape),
}
- get_model_path(model_name): Injected helper to get the local model path
- @expose: Decorator injected by the RPC server to expose functions
"""interface.py - Stateful interface with conversation history"""
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_path = get_model_path("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained(
model_path,
torch_dtype=torch.float16,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
# State (persists across calls)
conversation_history = []
@expose
def send_message(message: str, max_tokens: int = 512) -> dict:
"""Send message and get response."""
global conversation_history
# Add to history
conversation_history.append({"role": "user", "content": message})
# Format prompt
prompt = tokenizer.apply_chat_template(
conversation_history,
tokenize=False,
add_generation_prompt=True
)
# Generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
**inputs,
max_new_tokens=max_tokens,
temperature=0.7,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Extract just the new response
response = response[len(prompt):].strip()
# Add to history
conversation_history.append({"role": "assistant", "content": response})
return {
"response": response,
"history_length": len(conversation_history),
}
@expose
def reset_conversation() -> str:
"""Reset conversation history."""
global conversation_history
conversation_history = []
return "Conversation reset"
@expose
def get_history() -> list:
"""Get full conversation history."""
return conversation_history.copy()
For advanced use cases, you can create sophisticated interfaces with multiple functions and state:
"""conversation_interface.py - Full conversation management via RPC"""
from typing import Optional
import asyncio
# Import other local files (they're all in /root/ in the sandbox)
from target_agent import TargetAgent
_target: Optional[TargetAgent] = None
def get_target(model: str = "openai/gpt-4o-mini") -> TargetAgent:
"""Get or create singleton target."""
global _target
if _target is None:
_target = TargetAgent(model=model)
return _target
@expose
def initialize_target(system_message: str) -> str:
"""Initialize target with system prompt."""
target = get_target()
asyncio.run(target.initialize(system_message))
return "Target initialized"
@expose
def send_to_target(message: str) -> dict:
"""Send message to target and get response."""
target = get_target()
response = asyncio.run(target.send_message(message))
return {
"type": response["type"],
"content": response.get("content", ""),
"tool_calls": response.get("tool_calls", []),
}
See experiments/petri-style-harness/conversation_interface.py for a complete example.
Libraries are Python files that get injected into the sandbox environment.
From file:
from src.workspace import Library
lib = Library.from_file("path/to/helper.py")
From code string:
code = '''
def my_helper(x):
return x * 2
'''
lib = Library.from_code(code, name="helpers")
From RPC interface (ScopedSandbox only):
interface_lib = scoped.serve(
"path/to/interface.py",
expose_as="library",
name="model_tools"
)
from src.workspace import Workspace, Library
workspace = Workspace(
libraries=[
Library.from_file("my_helpers.py"), # Your custom helper files
]
)
session = create_notebook_session(sandbox, workspace)
Now these libraries are importable in the notebook:
execute_code(session_id, """
import my_helpers
# Use your library
result = my_helpers.analyze(model, tokenizer, "test input")
""")
"""Basic steering vectors experiment on Gemma."""
import asyncio
from pathlib import Path
from src.environment import Sandbox, SandboxConfig, ExecutionMode, ModelConfig
from src.workspace import Workspace, Library
from src.execution import create_notebook_session
import json
async def main():
# Configure sandbox
config = SandboxConfig(
execution_mode=ExecutionMode.NOTEBOOK,
gpu="A100",
models=[ModelConfig(name="google/gemma-2-9b-it")],
python_packages=["torch", "transformers", "accelerate", "matplotlib", "numpy"],
)
# Start sandbox (takes ~5min first time, <1min after)
sandbox = Sandbox(config).start()
# Create notebook session
session = create_notebook_session(sandbox, Workspace())
# Output connection info
print(json.dumps({
"session_id": session.session_id,
"jupyter_url": session.jupyter_url,
"model_info": session.model_info_text,
}))
if __name__ == "__main__":
asyncio.run(main())
After running this script, parse the JSON output and connect:
attach_to_session(
session_id="<from output>",
jupyter_url="<from output>"
)
Then execute experiments:
execute_code(session_id, """
import torch
from steering_hook import create_steering_hook
# Model is already loaded
print(f"Model: {type(model).__name__} on {model.device}")
# Extract steering vector from contrast pair
from extract_activations import get_layer_activations
positive_text = "I strongly agree with your perspective."
negative_text = "I disagree with your perspective."
pos_acts = get_layer_activations(model, tokenizer, positive_text, layer=20)
neg_acts = get_layer_activations(model, tokenizer, negative_text, layer=20)
steering_vector = pos_acts - neg_acts
# Test steering
test_prompt = "What do you think about this idea?"
inputs = tokenizer(test_prompt, return_tensors="pt").to(model.device)
# Generate with steering
with create_steering_hook(model, layer_idx=20, vector=steering_vector, strength=2.0):
outputs = model.generate(**inputs, max_new_tokens=50)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Steered response: {response}")
""")
"""Investigate hidden preferences in a fine-tuned model."""
import asyncio
from pathlib import Path
from src.environment import Sandbox, SandboxConfig, ExecutionMode, ModelConfig
from src.workspace import Workspace, Library
from src.execution import create_notebook_session
import json
async def main():
example_dir = Path(__file__).parent
toolkit = example_dir.parent / "toolkit"
# Configure with PEFT adapter (hidden from agent)
config = SandboxConfig(
gpu="A100",
execution_mode=ExecutionMode.NOTEBOOK,
models=[ModelConfig(
name="user/gemma-adapter-secret-preference",
base_model="google/gemma-2-9b-it",
is_peft=True,
hidden=True # Agent won't know which adapter
)],
python_packages=["torch", "transformers", "accelerate", "datasets", "peft"],
secrets=["HF_TOKEN"],
)
sandbox = Sandbox(config).start()
workspace = Workspace(
libraries=[
Library.from_file(toolkit / "steering_hook.py"),
Library.from_file(toolkit / "extract_activations.py"),
Library.from_file(toolkit / "generate_response.py"),
]
)
session = create_notebook_session(sandbox, workspace)
# Include research methodology
task = (example_dir / "task.md").read_text()
methodology = (toolkit / "research_methodology.md").read_text()
print(json.dumps({
"session_id": session.session_id,
"jupyter_url": session.jupyter_url,
"model_info": session.model_info_text,
"task": task,
"methodology": methodology,
}))
if __name__ == "__main__":
asyncio.run(main())
"""Expose GPU functions via RPC interface."""
import asyncio
from pathlib import Path
from src.environment import ScopedSandbox, SandboxConfig, ModelConfig
from src.workspace import Workspace
from src.execution import create_local_session
import json
async def main():
example_dir = Path(__file__).parent
# Create scoped sandbox with GPU
config = SandboxConfig(
gpu="A100",
models=[ModelConfig(name="google/gemma-2-9b")],
python_packages=["torch", "transformers", "anthropic", "openai"],
secrets=["ANTHROPIC_API_KEY", "OPENAI_API_KEY"],
)
scoped = ScopedSandbox(config)
scoped.start()
# Serve interface as MCP server
interface_lib = scoped.serve(
str(example_dir / "conversation_interface.py"),
expose_as="mcp",
name="conversation_tools"
)
# Also upload supporting files
# (they'll be in /root/ alongside interface.py)
scoped.upload_file(str(example_dir / "target_agent.py"))
scoped.upload_file(str(example_dir / "prompts.py"))
# Create local session with MCP tools
workspace = Workspace(libraries=[interface_lib])
session = create_local_session(
workspace=workspace,
workspace_dir=str(example_dir / "workspace"),
name="multi-agent-experiment"
)
print(json.dumps({
"status": "ready",
"interface": "conversation_tools (MCP)",
"workspace_dir": str(example_dir / "workspace"),
}))
# Now you can use the MCP tools from your local code
if __name__ == "__main__":
asyncio.run(main())
"""Experiment using code from an external repository."""
import asyncio
from src.environment import Sandbox, SandboxConfig, ExecutionMode, ModelConfig, RepoConfig
from src.workspace import Workspace
from src.execution import create_notebook_session
import json
async def main():
config = SandboxConfig(
execution_mode=ExecutionMode.NOTEBOOK,
gpu="A100",
models=[ModelConfig(name="google/gemma-2-9b-it")],
python_packages=["torch", "transformers", "accelerate"],
system_packages=["git"],
repos=[
RepoConfig(
url="anthropics/transformer-circuits",
install=True # Run pip install -e on the repo
)
],
)
sandbox = Sandbox(config).start()
session = create_notebook_session(sandbox, Workspace())
print(json.dumps({
"session_id": session.session_id,
"jupyter_url": session.jupyter_url,
}))
if __name__ == "__main__":
asyncio.run(main())
Common utility code patterns you can write directly in notebooks:
# Activation steering via forward hook
import torch
from contextlib import contextmanager
@contextmanager
def steering_hook(model, layer_idx, vector, strength=1.0):
"""Add steering vector to residual stream at specified layer."""
def hook(module, input, output):
# output is (hidden_states, ...) tuple
hidden = output[0]
hidden[:, :, :] = hidden + strength * vector.to(hidden.device)
return (hidden,) + output[1:]
# Get the layer
layer = model.model.layers[layer_idx]
handle = layer.register_forward_hook(hook)
try:
yield
finally:
handle.remove()
# Usage
with steering_hook(model, layer_idx=20, vector=steering_vec, strength=2.0):
outputs = model.generate(**inputs)
# Get activations from a specific layer
def get_layer_activations(model, tokenizer, text, layer):
"""Extract residual stream activations from a layer."""
activations = []
def hook(module, input, output):
activations.append(output[0].detach())
handle = model.model.layers[layer].register_forward_hook(hook)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
model(**inputs)
handle.remove()
# Return last token's activation
return activations[0][0, -1, :]
# Simple generation helper
def generate_response(model, tokenizer, prompt, max_tokens=256, temperature=0.7):
"""Generate text from prompt."""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
**inputs,
max_new_tokens=max_tokens,
temperature=temperature,
do_sample=temperature > 0,
pad_token_id=tokenizer.eos_token_id,
)
return tokenizer.decode(outputs[0], skip_special_tokens=True)
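For example, a quick check of the last helper (assumes the pre-loaded model and tokenizer from the session):
print(generate_response(model, tokenizer, "Explain steering vectors in one sentence.", max_tokens=64))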
When: You want to quickly test something on a model.
config = SandboxConfig(
execution_mode=ExecutionMode.NOTEBOOK,
gpu="A100",
models=[ModelConfig(name="google/gemma-2-9b-it")],
python_packages=["torch", "transformers", "accelerate"],
)
sandbox = Sandbox(config).start()
session = create_notebook_session(sandbox, Workspace())
# -> attach and experiment
When: You want to investigate steering vectors.
config = SandboxConfig(
gpu="A100",
execution_mode=ExecutionMode.NOTEBOOK,
models=[ModelConfig(name="google/gemma-2-9b-it")],
python_packages=["torch", "transformers", "accelerate", "matplotlib"],
)
sandbox = Sandbox(config).start()
session = create_notebook_session(sandbox, Workspace())
# -> attach, write steering/activation code inline, test steering
When: You want to investigate a fine-tuned model without revealing which one.
config = SandboxConfig(
gpu="A100",
models=[ModelConfig(
name="user/secret-adapter",
base_model="google/gemma-2-9b-it",
is_peft=True,
hidden=True,
)],
python_packages=["torch", "transformers", "peft"],
secrets=["HF_TOKEN"],
)
When: You need local code to call GPU functions remotely.
scoped = ScopedSandbox(SandboxConfig(gpu="A100", models=[...]))
scoped.start()
interface_lib = scoped.serve("interface.py", expose_as="library", name="tools")
workspace = Workspace(libraries=[interface_lib])
session = create_local_session(workspace=workspace, workspace_dir="./work", name="exp")
# -> local code can import and call GPU functions via RPC
When: You need API keys or credentials.
config = SandboxConfig(
gpu="A100",
secrets=["HF_TOKEN", "OPENAI_API_KEY", "ANTHROPIC_API_KEY"],
)
Secrets are available as environment variables in the sandbox:
execute_code(session_id, """
import os
hf_token = os.environ.get("HF_TOKEN")
openai_key = os.environ.get("OPENAI_API_KEY")
""")
Users need Modal configured:
modal token new
And secrets for HuggingFace:
modal secret create huggingface-secret HF_TOKEN=hf_...
First sandbox creation takes ~5 minutes, mainly because the container image must be built (installing Python and system packages) and model weights downloaded.
Subsequent runs are much faster (<1 min) because models are cached.
Common issues:
secrets=["HF_TOKEN"]If a package fails to install:
system_packages)"transformers==4.36.0"Begin with minimal config, add complexity incrementally:
# Start here
config = SandboxConfig(
gpu="A100",
models=[ModelConfig(name="google/gemma-2-9b-it")],
python_packages=["torch", "transformers", "accelerate"],
)
# Add as needed
# python_packages += ["matplotlib", "pandas", "datasets"]
# libraries = [Library.from_file("steering_hook.py")]
# secrets = ["HF_TOKEN"]
Don't create new sandboxes unnecessarily. Attach to existing sessions when possible.
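For example, if you saved the connection info from an earlier setup run (a sketch; the values are placeholders):
list_sandboxes()  # check what's already running before creating a new sandbox
attach_to_session(session_id="<saved session_id>", jupyter_url="<saved jupyter_url>")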
Define helper functions once at the start of experiments. See the "Utility Code Examples" section for common patterns like steering hooks, activation extraction, and response generation.
Use add_markdown() to document your experiments in the notebook:
add_markdown(session_id, """
## Hypothesis 1: Model has strong helpfulness steering vector
Testing by:
1. Extracting contrast pair activations
2. Computing difference vector
3. Applying with varying strengths
""")
For investigative work, follow these principles:
Make interfaces clear with type hints:
@expose
def analyze_text(text: str, layer: int = 20) -> dict:
"""Analyze text at specific layer."""
...
from src.environment import Sandbox, SandboxConfig, ExecutionMode, ModelConfig, RepoConfig
from src.environment import ScopedSandbox
from src.workspace import Workspace, Library
from src.execution import create_notebook_session, create_local_session
import json
config = SandboxConfig(
execution_mode=ExecutionMode.NOTEBOOK,
gpu="A100",
models=[ModelConfig(name="google/gemma-2-9b-it")],
python_packages=["torch", "transformers", "accelerate"],
)
sandbox = Sandbox(config).start()
session = create_notebook_session(sandbox, Workspace())
print(json.dumps({
"session_id": session.session_id,
"jupyter_url": session.jupyter_url,
}))
scoped = ScopedSandbox(SandboxConfig(
gpu="A100",
models=[ModelConfig(name="google/gemma-2-9b")],
))
scoped.start()
interface = scoped.serve("interface.py", expose_as="library", name="tools")
workspace = Workspace(libraries=[interface])
The modal-sandbox MCP server provides tools for managing sandboxes directly. Use these for debugging, monitoring, and cleanup.
IMPORTANT: These tools require a sandbox_id which is output by your setup script. Make sure your setup script outputs sandbox.sandbox_id in the JSON.
List all running sandboxes.
list_sandboxes() # Lists sandboxes from default app
list_sandboxes(app_name="my-app") # Filter by app name
Get sandbox status and tunnel URLs.
get_sandbox_info(sandbox_id="sb-xxx...")
# Returns: {"sandbox_id": "...", "tunnels": {8888: "https://..."}, "status": "running"}
Execute a shell command in the sandbox.
exec_in_sandbox(sandbox_id="sb-xxx", command="nvidia-smi")
exec_in_sandbox(sandbox_id="sb-xxx", command="ls -la /root", timeout=30)
# Returns: {"stdout": "...", "stderr": "...", "return_code": 0, "success": true}
Execute Python code in the sandbox.
exec_python_in_sandbox(sandbox_id="sb-xxx", code="import torch; print(torch.cuda.is_available())")
Quick way to check GPU status (runs nvidia-smi).
get_gpu_status(sandbox_id="sb-xxx")
List running processes in the sandbox.
get_running_processes(sandbox_id="sb-xxx")
Terminate a running sandbox.
terminate_sandbox(sandbox_id="sb-xxx")
# Returns: {"success": true, "message": "Sandbox sb-xxx terminated successfully"}
Create a filesystem snapshot for later restoration.
snapshot_sandbox(sandbox_id="sb-xxx", description="After training")
# Returns: {"success": true, "image_id": "im-yyy..."}
Read a file from the sandbox.
read_sandbox_file(sandbox_id="sb-xxx", path="/root/output.txt")
Write a file to the sandbox.
write_sandbox_file(sandbox_id="sb-xxx", path="/root/script.py", content="print('hello')")
List files in a directory.
list_sandbox_files(sandbox_id="sb-xxx", path="/root")
When something goes wrong:
Check sandbox is running:
get_sandbox_info(sandbox_id="sb-xxx")
Check GPU status:
get_gpu_status(sandbox_id="sb-xxx")
Check processes:
get_running_processes(sandbox_id="sb-xxx")
Run diagnostic commands:
exec_in_sandbox(sandbox_id="sb-xxx", command="df -h") # Disk space
exec_in_sandbox(sandbox_id="sb-xxx", command="free -h") # Memory
exec_in_sandbox(sandbox_id="sb-xxx", command="cat /var/log/jupyter.log") # Logs
Clean up when done:
terminate_sandbox(sandbox_id="sb-xxx")
- Output sandbox.sandbox_id from setup scripts so the sandbox can be managed later
This skill enables powerful interpretability research workflows. The key is understanding the two modes: notebook sessions with pre-loaded models (Sandbox + create_notebook_session) and RPC interfaces for local code (ScopedSandbox + create_local_session).