Schmidhuber's curiosity-driven learning: Intrinsic motivation via compression progress. Seek states that improve the world model.
/plugin marketplace add plurigrid/asi
/plugin install plurigrid-asi-skills@plurigrid/asi

This skill inherits all available tools. When active, it can use any tool Claude has access to.
"Curiosity is the desire to observe data that improves the observer's world model." — Jürgen Schmidhuber
Curiosity-driven learning provides intrinsic motivation:
Compression Progress = how much better we compress after seeing data.
Curiosity Reward = L(t-1) - L(t)
Where:
L(t) = description length of the agent's history under the model at time t (after the update)
L(t-1) = description length of the same history under the model before the update
Positive reward = "I learned something compressible!"
Negative/zero = "This is noise or already known"
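A quick worked example (made-up numbers): if the history compresses to L(t-1) = 1000 bits before a model update and to L(t) = 950 bits after it, the curiosity reward is 1000 - 950 = 50 bits of compression progress. An observation of pure noise leaves the description length unchanged (or worse), so its reward is zero or negative and the agent loses interest.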
```python
import torch
import torch.nn as nn
from torch import Tensor

Action = int  # placeholder action type; substitute your environment's action type


class CuriosityDrivenAgent:
    """
    Agent that seeks compression progress.

    Assumes `world_model` and `compressor` share parameters, so that updating
    the world model also changes the compressor's description lengths.
    """

    def __init__(self, world_model: nn.Module, compressor: nn.Module,
                 meta_model: nn.Module, action_space: list[Action]):
        self.world_model = world_model
        self.compressor = compressor
        self.meta_model = meta_model      # predicts how learnable an observation is
        self.action_space = action_space
        self.state: Tensor | None = None  # current state; set by the caller before explore()

    def compression_progress(self, observation: Tensor) -> float:
        """
        Curiosity = improvement in compression ability.
        """
        # Compress before learning
        with torch.no_grad():
            len_before = self.compressor.description_length(observation)

        # Update world model with observation (returns training loss; unused here)
        self.world_model.update(observation)

        # Compress after learning
        with torch.no_grad():
            len_after = self.compressor.description_length(observation)

        # Progress = reduction in description length
        return len_before - len_after

    def intrinsic_reward(self, obs: Tensor) -> float:
        """
        Intrinsic reward for an RL agent.
        """
        return self.compression_progress(obs)

    def explore(self) -> Action:
        """
        Seek states that maximize expected compression progress.

        This is NOT the same as seeking novel states!
        - Novel but random → no compression progress
        - Learnable patterns → high compression progress
        """
        best_action = None
        best_expected_progress = -float('inf')

        for action in self.action_space:
            # Predict resulting state
            predicted_obs = self.world_model.predict(self.state, action)

            # Estimate learnability (how much would we learn?)
            expected_progress = self.estimate_learnability(predicted_obs)

            if expected_progress > best_expected_progress:
                best_action = action
                best_expected_progress = expected_progress

        return best_action

    def estimate_learnability(self, obs: Tensor) -> float:
        """
        Predict how much we'd learn from this observation.

        High for: novel patterns, surprising regularities
        Low for: random noise, already-known patterns
        """
        # Use meta-learning: "how learnable is this?"
        return self.meta_model.predict_learnability(obs)
```
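A minimal usage sketch of the agent above. `WorldModel`, `Compressor`, and `MetaModel` are hypothetical stand-ins (not from any library) for modules implementing `update`, `description_length`, and `predict_learnability`, and `env` is assumed to be a gym-style environment:

```python
import torch

# Hypothetical modules implementing the interface the agent expects.
agent = CuriosityDrivenAgent(
    world_model=WorldModel(),
    compressor=Compressor(),
    meta_model=MetaModel(),
    action_space=list(range(env.action_space.n)),  # gym-style discrete actions, assumed
)

obs = env.reset()
for step in range(1_000):
    agent.state = torch.as_tensor(obs, dtype=torch.float32)
    action = agent.explore()
    next_obs, extrinsic_reward, done, info = env.step(action)
    # Intrinsic reward = compression progress on the new observation.
    r_int = agent.intrinsic_reward(torch.as_tensor(next_obs, dtype=torch.float32))
    total_reward = extrinsic_reward + 0.1 * r_int  # 0.1 = curiosity weight (free hyperparameter)
    # ...hand (obs, action, total_reward, next_obs, done) to any RL algorithm...
    obs = env.reset() if done else next_obs
```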
| Method | Reward Signal | Schmidhuber's View |
|---|---|---|
| ICM (Pathak) | Prediction error | Noise-sensitive |
| RND | Novelty | Doesn't distinguish learnable from random |
| Compression Progress | Learning improvement | True curiosity |
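To make the table's distinction concrete, here is a toy contrast (a sketch, not the cited methods' actual implementations): a tiny predictor is trained on a learnable stream and on pure noise, and we compare its final prediction error (an ICM-style signal) against its last-step loss improvement (a crude proxy for compression progress).

```python
import torch
import torch.nn as nn

def intrinsic_signals(stream, steps=200):
    """Train a tiny predictor on `stream`; return final prediction error
    (ICM-style reward) and last-step loss improvement (compression-progress proxy)."""
    model = nn.Linear(8, 8)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    prev_loss, progress = None, 0.0
    for _ in range(steps):
        x, y = stream()
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        progress = 0.0 if prev_loss is None else prev_loss - loss.item()
        prev_loss = loss.item()
    return prev_loss, progress

def learnable():   # pattern y = 2x: fully learnable
    x = torch.randn(8)
    return x, 2.0 * x

def noise():       # unlearnable: targets are independent noise
    return torch.randn(8), torch.randn(8)

print(intrinsic_signals(learnable))  # error shrinks; progress ~0 once the pattern is learned
print(intrinsic_signals(noise))      # error stays high (ICM keeps rewarding it); progress ~0
```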
yoneda-directed (-1) ⊗ cognitive-superposition (0) ⊗ curiosity-driven (+1) = 0 ✓
persistent-homology (-1) ⊗ self-evolving-agent (0) ⊗ curiosity-driven (+1) = 0 ✓
three-match (-1) ⊗ unworld (0) ⊗ curiosity-driven (+1) = 0 ✓
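A minimal check of the triads above, under the assumption (mine, about the notation) that each ⊗-composition is verified by the three trit polarities summing to zero in balanced ternary:

```python
# Assumption: "= 0 ✓" means the balanced-ternary trits (-1, 0, +1) sum to zero.
triads = {
    "yoneda-directed ⊗ cognitive-superposition ⊗ curiosity-driven": (-1, 0, +1),
    "persistent-homology ⊗ self-evolving-agent ⊗ curiosity-driven": (-1, 0, +1),
    "three-match ⊗ unworld ⊗ curiosity-driven": (-1, 0, +1),
}
for name, trits in triads.items():
    assert sum(trits) == 0, name
```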
```ruby
module CuriosityDriven
  # description_length(content, model) is assumed to be provided elsewhere
  # (e.g. negative log-likelihood of `content` under `model`, in bits).
  def self.compute_curiosity(content, world_model_before, world_model_after)
    # Description length before and after the model update
    len_before = description_length(content, world_model_before)
    len_after  = description_length(content, world_model_after)
    progress   = len_before - len_after

    {
      content: content,
      compression_progress: progress,
      curious: progress > 0,
      trit: 1 # Generator (creates new understanding)
    }
  end
end
```