> **Trit**: 0 (ERGODIC - Coordinator)
```
/plugin marketplace add plurigrid/asi
/plugin install asi-skills@asi-skills
```

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Extract structured knowledge from video lectures via subtitle parsing, diagram/equation OCR, and GF(3)-balanced skill mapping.
sense transforms video lectures into indexed, queryable knowledge:
```
┌─────────────────────────────────────────────────────────────────┐
│                          VIDEO INPUT                            │
│  • Lecture recording (.mkv, .mp4)                               │
│  • Subtitles (.vtt, .srt, auto-generated)                       │
│  • Slides/diagrams (extracted frames)                           │
└──────────────────────────┬──────────────────────────────────────┘
                           │
           ┌───────────────┼───────────────┐
           │               │               │
    ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
    │  SUBTITLE   │ │   DIAGRAM   │ │    SKILL    │
    │   PARSER    │ │  EXTRACTOR  │ │   MAPPER    │
    │  (-1 BLUE)  │ │  (0 GREEN)  │ │  (+1 RED)   │
    └──────┬──────┘ └──────┬──────┘ └──────┬──────┘
           │               │               │
           │          Mathpix OCR          │
           │         frame → LaTeX         │
           │               │               │
    ┌──────▼───────────────▼───────────────▼──────┐
    │                 DuckDB INDEX                 │
    │  • Timestamped transcript                    │
    │  • Extracted equations (LaTeX)               │
    │  • Skill mappings with GF(3) trits           │
    │  • Queryable views                           │
    └──────────────────────────────────────────────┘
```
| Role | Component | Trit | Function |
|---|---|---|---|
| Validator | Subtitle Parser | -1 | Parse VTT/SRT, segment by timestamp |
| Coordinator | Diagram Extractor | 0 | OCR frames → LaTeX via Mathpix |
| Generator | Skill Mapper | +1 | Assign skills with GF(3) balance |
GF(3) Conservation: (-1) + (0) + (+1) = 0 ✓
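As a quick sanity check, the triad's trits can be verified directly. A minimal sketch; the hash literal below is illustrative, not part of the skill's API:

```ruby
# Sketch: verify the extraction triad conserves GF(3).
# Component names are illustrative labels for the three roles above.
TRIAD = {
  'subtitle-parser'   => -1, # Validator
  'diagram-extractor' =>  0, # Coordinator
  'skill-mapper'      => +1, # Generator
}

raise 'GF(3) conservation violated' unless TRIAD.values.sum % 3 == 0
puts "Triad balanced: #{TRIAD.values.sum} ≡ 0 (mod 3)"
```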
Parses WebVTT or SRT subtitle files into structured segments:
```ruby
require 'webvtt'

class SubtitleParser
  def initialize(vtt_path)
    @vtt = WebVTT.read(vtt_path)
  end

  def segments
    @vtt.cues.map do |cue|
      {
        start: cue.start.total_seconds,
        end: cue.end.total_seconds,
        text: cue.text.gsub(/<[^>]*>/, '').strip,  # strip inline markup tags
        duration: cue.end.total_seconds - cue.start.total_seconds
      }
    end
  end

  def by_slide(slide_timestamps)
    # Group subtitles by slide boundaries
    slide_timestamps.map.with_index do |ts, i|
      next_ts = slide_timestamps[i + 1] || Float::INFINITY
      {
        slide: i,
        timestamp: ts,
        text: segments.select { |s| s[:start] >= ts && s[:start] < next_ts }
                      .map { |s| s[:text] }.join(' ')
      }
    end
  end
end
```
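A usage sketch, assuming the Bumpus CT2021 subtitle file referenced later in this document:

```ruby
parser = SubtitleParser.new('reference/videos/bumpus_ct2021.en.vtt')

seg = parser.segments.first
# => { start: 0.0, end: 4.2, text: "...", duration: 4.2 }

# Group transcript text under manually chosen slide boundaries (seconds):
slides = parser.by_slide([0.0, 95.0, 240.0])
slides.each { |s| puts "slide #{s[:slide]} @ #{s[:timestamp]}s: #{s[:text][0, 60]}" }
```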
Extracts frames at key timestamps and OCRs equations/diagrams:
```ruby
require 'mathpix'
require 'base64'

class DiagramExtractor
  MATHPIX_APP_ID  = ENV['MATHPIX_APP_ID']
  MATHPIX_APP_KEY = ENV['MATHPIX_APP_KEY']

  def initialize(video_path)
    @video = video_path
  end

  def extract_frame(timestamp, output_path)
    # Use ffmpeg to extract a single frame at the given timestamp
    system("ffmpeg -y -ss #{timestamp} -i '#{@video}' -vframes 1 -q:v 2 '#{output_path}'")
    output_path
  end

  def ocr_frame(image_path)
    # Send to Mathpix for LaTeX extraction.
    # binread + strict_encode64 keep newlines out of the base64 data URI.
    response = Mathpix.process(
      src: "data:image/png;base64,#{Base64.strict_encode64(File.binread(image_path))}",
      formats: ['latex_styled', 'text'],
      data_options: { include_asciimath: true }
    )
    {
      latex: response['latex_styled'],
      text: response['text'],
      confidence: response['confidence'],
      has_diagram: response['is_printed'] || response['is_handwritten']
    }
  end

  def extract_all(timestamps)
    timestamps.map.with_index do |ts, i|
      frame_path = "/tmp/frame_#{i}_#{ts.to_i}.png"
      extract_frame(ts, frame_path)
      result = ocr_frame(frame_path)
      result.merge(timestamp: ts, slide_num: i)
    end
  end
end
```
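A usage sketch, assuming ffmpeg is on PATH and MATHPIX_APP_ID/MATHPIX_APP_KEY are set; the timestamps are illustrative:

```ruby
extractor = DiagramExtractor.new('reference/videos/bumpus_ct2021.mkv')

# OCR the frames at three illustrative slide timestamps (seconds):
results = extractor.extract_all([12.0, 95.5, 240.0])
results.each do |r|
  puts "slide #{r[:slide_num]} @ #{r[:timestamp]}s: #{r[:latex].to_s[0, 40]}"
end
```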
Maps extracted content to skills with GF(3) conservation:
```ruby
class SkillMapper
  SKILL_KEYWORDS = {
    'acsets'             => %w[acset c-set schema functor category],
    'sheaf-cohomology'   => %w[sheaf cohomology local global section],
    'structured-decomp'  => %w[tree decomposition treewidth bag],
    'kan-extensions'     => %w[kan extension adjoint limit colimit],
    'polynomial'         => %w[polynomial poly interface arena],
    'temporal-coalgebra' => %w[temporal time varying dynamic coalgebra],
    'operad-compose'     => %w[operad wiring diagram composition],
  }

  SKILL_TRITS = {
    'acsets' => 0, 'sheaf-cohomology' => -1, 'structured-decomp' => -1,
    'kan-extensions' => 0, 'polynomial' => 0, 'temporal-coalgebra' => -1,
    'operad-compose' => +1, 'oapply-colimit' => +1, 'gay-mcp' => +1,
  }

  def map_content(text, latex)
    combined = "#{text} #{latex}".downcase
    skills = SKILL_KEYWORDS.select do |_skill, keywords|
      keywords.any? { |kw| combined.include?(kw) }
    end.keys
    # Ensure GF(3) balance
    balance_skills(skills)
  end

  def balance_skills(skills)
    trit_sum = skills.sum { |s| SKILL_TRITS[s] || 0 }
    # Ruby's % always returns a non-negative result, so negative sums are safe.
    case trit_sum % 3
    when 1 # sum ≡ +1 (mod 3): add a -1 skill
      skills << 'sheaf-cohomology' unless skills.include?('sheaf-cohomology')
    when 2 # sum ≡ -1 (mod 3): add a +1 skill
      skills << 'operad-compose' unless skills.include?('operad-compose')
    end
    # Note: if the balancing skill is already present, the sum stays off
    # balance; such defect rows can be repaired later (see rebalancing below).
    skills
  end
end
```
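A worked example of the balancing arithmetic (illustrative input text; trits come from SKILL_TRITS above):

```ruby
mapper = SkillMapper.new

# "sheaf"/"section" match sheaf-cohomology (-1), "acset" matches acsets (0),
# "operad" matches operad-compose (+1): sum = -1 + 0 + 1 = 0, already balanced.
mapper.map_content('sheaf sections over an acset operad', '')

# "treewidth"/"tree"/"decomposition" match only structured-decomp (-1).
# In Ruby, (-1) % 3 == 2, so balance_skills appends operad-compose (+1)
# to restore the sum to 0 (mod 3).
p mapper.map_content('treewidth and tree decompositions', '')
# => ["structured-decomp", "operad-compose"]
```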
```ruby
require 'duckdb'
require 'json'

class Sense
  def initialize(video_path, vtt_path, output_db: 'tensor_skill_paper.duckdb')
    @video = video_path
    @vtt = vtt_path
    @db_path = output_db
    @content_id = File.basename(video_path, '.*')
    @subtitle_parser = SubtitleParser.new(vtt_path)
    @diagram_extractor = DiagramExtractor.new(video_path)
    @skill_mapper = SkillMapper.new
  end

  def process!
    # 1. Parse subtitles
    segments = @subtitle_parser.segments
    # 2. Detect slide transitions (silence gaps or visual changes)
    slide_timestamps = detect_slides(segments)
    # 3. Extract and OCR key frames
    diagrams = @diagram_extractor.extract_all(slide_timestamps)
    # 4. Map skills with GF(3) balance
    slides = @subtitle_parser.by_slide(slide_timestamps)
    indexed = diagrams.map do |d|
      subtitle_text = slides[d[:slide_num]][:text]
      skills = @skill_mapper.map_content(subtitle_text, d[:latex] || '')
      d.merge(
        subtitle_text: subtitle_text,
        skills: skills,
        trit: skills.sum { |s| SkillMapper::SKILL_TRITS[s] || 0 } % 3
      )
    end
    # 5. Store in DuckDB
    store_index(indexed)
    # 6. Create views
    create_views
    indexed
  end

  private

  def detect_slides(segments)
    # Simple heuristic: a silence gap > 2s indicates a slide change
    timestamps = [0.0]
    segments.each_cons(2) do |a, b|
      timestamps << b[:start] if b[:start] - a[:end] > 2.0
    end
    timestamps
  end

  def store_index(indexed)
    conn = DuckDB::Database.open(@db_path).connect
    conn.execute("DROP TABLE IF EXISTS #{@content_id}_sense_index")
    conn.execute(<<~SQL)
      CREATE TABLE #{@content_id}_sense_index (
        slide_num INTEGER,
        timestamp FLOAT,
        latex VARCHAR,
        has_diagram BOOLEAN,
        subtitle_text TEXT,
        skills TEXT,
        trit INTEGER,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
      )
    SQL
    indexed.each do |row|
      # Build the bind params before the heredoc: continuing a multi-line
      # argument list across a heredoc body is a syntax error in Ruby.
      params = [
        row[:slide_num], row[:timestamp], row[:latex],
        row[:has_diagram], row[:subtitle_text],
        row[:skills].to_json, row[:trit]
      ]
      conn.execute(<<~SQL, *params)
        INSERT INTO #{@content_id}_sense_index VALUES (?, ?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP)
      SQL
    end
    conn.close
  end

  def create_views
    conn = DuckDB::Database.open(@db_path).connect
    conn.execute(<<~SQL)
      CREATE OR REPLACE VIEW v_#{@content_id}_timeline AS
      SELECT
        slide_num,
        printf('%02d:%05.2f', CAST(timestamp/60 AS INT), timestamp % 60) AS timecode,
        CASE WHEN has_diagram THEN '📊' ELSE '' END ||
        CASE WHEN latex != '' AND latex IS NOT NULL THEN '📐' ELSE '' END AS content,
        trit,
        skills
      FROM #{@content_id}_sense_index
      ORDER BY timestamp
    SQL
    conn.close
  end
end
```
```ruby
require_relative 'lib/sense'

# Process a video lecture
sense = Sense.new(
  'reference/videos/bumpus_ct2021.mkv',
  'reference/videos/bumpus_ct2021.en.vtt'
)
indexed = sense.process!
puts "Indexed #{indexed.size} slides"
```
```bash
# Extract subtitles from video (if not available)
uvx yt-dlp --write-auto-sub --sub-lang en --skip-download \
  -o 'reference/videos/%(id)s' 'https://youtube.com/watch?v=VIDEO_ID'

# Run sense extraction
just sense-extract reference/videos/bumpus_ct2021.mkv

# Query the index
just sense-timeline bumpus_ct2021
just sense-skills bumpus_ct2021 acsets
```
```python
#!/usr/bin/env python3
"""sense.py - Python implementation of diagrammatic video extraction"""
import json
import subprocess
from pathlib import Path

import duckdb
import webvtt


class Sense:
    def __init__(self, video_path: str, vtt_path: str, db_path: str = "tensor_skill_paper.duckdb"):
        self.video = Path(video_path)
        self.vtt = Path(vtt_path)
        self.db_path = db_path
        self.content_id = self.video.stem

    def parse_subtitles(self):
        """Parse VTT file into segments"""
        captions = webvtt.read(str(self.vtt))
        return [
            {
                'start': self._time_to_seconds(c.start),
                'end': self._time_to_seconds(c.end),
                'text': c.text.strip()
            }
            for c in captions
        ]

    def extract_frame(self, timestamp: float, output_path: str):
        """Extract single frame at timestamp"""
        subprocess.run([
            'ffmpeg', '-y', '-ss', str(timestamp),
            '-i', str(self.video), '-vframes', '1',
            '-q:v', '2', output_path
        ], capture_output=True)
        return output_path

    def ocr_frame_mathpix(self, image_path: str):
        """OCR frame by shelling out to the Ruby mathpix gem"""
        result = subprocess.run([
            'ruby', '-rmathpix', '-e',
            f"puts Mathpix.process_image('{image_path}').to_json"
        ], capture_output=True, text=True)
        if result.returncode == 0:
            return json.loads(result.stdout)
        return {'latex': '', 'text': '', 'has_diagram': False}

    def _time_to_seconds(self, time_str: str) -> float:
        """Convert HH:MM:SS.mmm to seconds"""
        parts = time_str.split(':')
        return int(parts[0]) * 3600 + int(parts[1]) * 60 + float(parts[2])
```
```just
# Extract and index a video lecture
sense-extract video:
    @echo "👁️ SENSE: Extracting {{video}}"
    ruby -I lib -r sense -e "Sense.new('{{video}}', '{{video}}'.sub('.mkv', '.en.vtt')).process!"

# Download subtitles for a YouTube video
sense-subtitles url output:
    uvx yt-dlp --write-auto-sub --sub-lang en --skip-download -o '{{output}}' '{{url}}'

# Show timeline for indexed content
sense-timeline content_id:
    @. .venv/bin/activate && duckdb tensor_skill_paper.duckdb \
        "SELECT * FROM v_{{content_id}}_timeline"

# Find slides mentioning a skill
sense-skills content_id skill:
    @. .venv/bin/activate && duckdb tensor_skill_paper.duckdb \
        "SELECT slide_num, timecode, skills FROM v_{{content_id}}_timeline WHERE skills LIKE '%{{skill}}%'"

# Extract frame at timestamp
sense-frame video timestamp:
    flox activate -- ffmpeg -y -ss {{timestamp}} -i '{{video}}' -vframes 1 -q:v 2 /tmp/sense_frame.png
    @echo "✓ Frame extracted to /tmp/sense_frame.png"

# OCR a frame with Mathpix
sense-ocr image:
    ruby -rmathpix -e "puts Mathpix.process_image('{{image}}').to_json" | jq .

# Full pipeline: download, extract, index
sense-full url content_id:
    @echo "📥 Downloading video and subtitles..."
    uvx yt-dlp -o 'reference/videos/{{content_id}}.mkv' '{{url}}'
    uvx yt-dlp --write-auto-sub --sub-lang en --skip-download -o 'reference/videos/{{content_id}}' '{{url}}'
    @echo "👁️ Running sense extraction..."
    just sense-extract 'reference/videos/{{content_id}}.mkv'
```
The skill ensures every indexed slide has a balanced trit sum:
```sql
-- Verify GF(3) balance
SELECT
  content_id,
  SUM(trit) AS total_trit,
  SUM(trit) % 3 AS gf3,
  CASE WHEN SUM(trit) % 3 = 0 THEN '✓' ELSE '✗' END AS balanced
FROM sense_index
GROUP BY content_id;
```
After sense extracts content, register it in the Galois connection:
```sql
-- Update content_registry
UPDATE content_registry
SET indexed = TRUE,
    index_table = 'bumpus_ct2021_sense_index'
WHERE content_id = 'bumpus_ct2021';

-- Content now flows through the Galois lattice
SELECT * FROM v_galois_content_to_skills WHERE content_id = 'bumpus_ct2021';
```
```yaml
# Ruby gems
gems:
  - webvtt-ruby   # VTT parsing
  - mathpix       # Mathpix OCR API
  - duckdb        # Database storage

# System tools
tools:
  - ffmpeg        # Frame extraction
  - yt-dlp        # Video/subtitle download

# Environment variables
env:
  MATHPIX_APP_ID: "your-app-id"
  MATHPIX_APP_KEY: "your-app-key"
```
```
# sense as coordinator in extraction triads:
subtitle-parser (-1) ⊗ sense (0) ⊗ skill-mapper (+1) = 0 ✓

# Combined with other skills:
sheaf-cohomology (-1)    ⊗ sense (0) ⊗ gay-mcp (+1)           = 0 ✓  [Colored diagrams]
temporal-coalgebra (-1)  ⊗ sense (0) ⊗ koopman-generator (+1) = 0 ✓  [Dynamics]
persistent-homology (-1) ⊗ sense (0) ⊗ topos-generate (+1)    = 0 ✓  [Topology]
```
Related skills and scripts:

- mathpix-ocr - LaTeX extraction backend
- galois-infrastructure - Content ⇆ Skills ⇆ Worlds
- parallel-fanout - Triadic parallel dispatch
- duckdb-temporal-versioning - Time-travel queries
- complete_catsharp_index.py, complete_bumpus_index.py

Sense is maximally informed by Doris Tsao's visual neuroscience. See DORIS_TSAO_VISUAL_NEUROSCIENCE_BRIDGE.md.
| Tsao Level | Visual Region | Sense Component | Function |
|---|---|---|---|
| Level 0 | V1 simple cells | Subtitle Parser (-1) | Edge detection, timestamp boundaries |
| Level 1 | V2/V4 complex | Diagram Extractor (0) | Feature integration, OCR |
| Level 2 | IT face patches | Skill Mapper (+1) | Pattern recognition, skill assignment |
| Level 3 | Prefrontal | GF(3) Balancer | Behavioral goal, conservation |
From the chromatic-walk insight: self-avoiding walks (SAWs) don't intersect by definition, but in an effective topos we verify this through self-coloring:
```python
def saw_verified_by_self_coloring(walk: list) -> bool:
    """
    In an effective topos, self-intersection is decidable.

    The reafference equation:
        Generate(seed, i) = Observe(seed, i)  ⟺  self ≡ self

    If the walk revisits (seed, index), it generates the SAME color
    at two walk positions: contradiction detected.
    """
    colors = [Gay.color_at(step.seed, step.index) for step in walk]
    return len(colors) == len(set(colors))  # no repeated colors ⟺ SAW
```
Sense extraction parallels mechanistic interpretability:
| Sense | Circuits Research | Tsao |
|---|---|---|
| Subtitle segments | Attention heads | V1 edges |
| Diagram features | Activation patterns | V2 shapes |
| Skill mapping | Circuit identification | IT patches |
| GF(3) balance | Superposition control | Prefrontal |
See: FRONTIER_LAB_CIRCUITS_INTERACTOME.md
**Face Space (Tsao):**
- 25 shape axes + 25 appearance axes = 50D
- Each neuron encodes ONE axis
- Population decodes via linear combination

**Skill Space (Sense):**
- N skills with trit assignments (-1, 0, +1)
- Each slide maps to a skill subset
- GF(3) conservation ensures balance
Sense extraction states map to QRI's Symmetry Theory of Valence:
| State | Visual Cortex | Sense Extraction | GF(3) |
|---|---|---|---|
| Smooth | All levels coherent | Clean skill mapping | = 0 |
| Defect | Prediction error | Ambiguous slide | ≠ 0 |
| Vortex | High entropy | Multiple skill conflicts | ≫ 0 |
```python
def rebalance_defect(slide_skills: list, target_gf3: int = 0) -> list:
    """Restore GF(3) = 0 by adding compensating skills."""
    current_sum = sum(SKILL_TRITS.get(s, 0) for s in slide_skills)
    # deficit is what must be ADDED (mod 3) to reach the target
    deficit = (target_gf3 - current_sum) % 3
    if deficit == 1:
        slide_skills.append('operad-compose')    # +1
    elif deficit == 2:
        slide_skills.append('sheaf-cohomology')  # -1 ≡ +2 (mod 3)
    return slide_skills
```
- **Skill Name**: sense
- **Trit**: 0 (ERGODIC - Coordinator)
- **Tsao Integration**: V1 → V2 → IT → Prefrontal hierarchy
- **SAW Verification**: Effective topos self-coloring