Skill

langfuse

Langfuse LLM observability operator, CLI guide, and RAG evaluation analyst. Use when the user asks to inspect traces, observations, sessions, prompts, datasets, scores, costs, latency, exceptions, agent tracing, OpenTelemetry integration, Langfuse CLI usage, or RAG evaluation results. Default to the official langfuse-cli through scripts/lf_cli.py for generic API discovery and operations; use bundled Python scripts for curated trace trees, reports, prompt extraction, chat export, and existing RAG-result interpretation. Do not run legacy FlowWise RAG execution scripts until their project-local config dependencies are ported.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/langfuse:langfuse


Supporting Files

SKILL.md

404 lines · ~6k tokens (exceeds 5k compaction limit)

Stats

Language: Python

Parent stars: 0

Maintenance: Good

Last Commit: Jun 26, 2026


Langfuse Analyst

Operator, guide, and RAG evaluation analyst for Langfuse projects. Four modes:

CLI Operator — Use the official langfuse-cli via scripts/lf_cli.py for current API coverage.
Curated Operator — Use bundled Python scripts for stable, packaged views and reports.
Guide — Answer conceptual, setup, instrumentation, and tracing questions using references/.
RAG Analyst — Interpret existing evaluation results, diagnose content gaps, compare runs.

Operating Principles

Docs and CLI first. For generic Langfuse API work, use scripts/lf_cli.py; it delegates to the official langfuse-cli, which tracks Langfuse's OpenAPI surface.
Curated scripts where useful. Use bundled scripts when they provide a higher-level workflow than raw API calls: trace trees, chat extraction, reports, prompt lifecycle, schema shortcuts.
No secrets in chat. Credentials stay in ~/.skills/langfuse/credentials.json; setup runs outside the model conversation.
No .env runtime reads. Runtime scripts must use the local credential loader or explicit non-secret project config. Project-local .env imports are blocked.
Separate app tracing from coding-session tracing. Instrument the application or agent runtime with the Langfuse SDK/OpenTelemetry; only use Claude/Codex host tracing when an explicit host-specific integration exists.

Setup Validation

Before first operation, verify connectivity:

python "${CLAUDE_SKILL_DIR}/scripts/lf_cli.py" --dry-run api __schema
python "${CLAUDE_SKILL_DIR}/scripts/lf_client.py" --action health
python "${CLAUDE_SKILL_DIR}/scripts/lf_client.py" --action list-profiles

Credentials

Credentials live at ~/.skills/langfuse/credentials.json (mode 600, in a mode-700 directory), structured as named profiles. This follows R50 v2.0.0 of the credential storage convention. There is no .env discovery and no --env flag.

First-run setup runs OUTSIDE the model conversation (secrets never enter the transcript):

python3 "${CLAUDE_SKILL_DIR}/scripts/setup_credentials.py" [PROFILE]

The setup script prompts for keys via getpass and writes atomically with mode 600. It does not discover, parse, or migrate project .env files.

File shape — ~/.skills/langfuse/credentials.json:

{
  "default":   { "secret_key": "sk-lf-...", "public_key": "pk-lf-...", "host": "https://cloud.langfuse.com", "project_name": "Default" },
  "iurfriend": { "secret_key": "sk-lf-...", "public_key": "pk-lf-...", "host": "https://cloud.langfuse.com", "project_name": "iurFriend" },
  "simulator": { "secret_key": "sk-lf-...", "public_key": "pk-lf-...", "host": "https://cloud.langfuse.com", "project_name": "Simulator" }
}
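The atomic, owner-only write described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in setup_credentials.py: `write_credentials` is a hypothetical helper name, and the permission handling assumes a POSIX filesystem.

```python
import json
import os
import tempfile
from pathlib import Path


def write_credentials(path: Path, profiles: dict) -> None:
    """Write the profiles JSON atomically with owner-only permissions (hypothetical helper)."""
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    os.chmod(path.parent, 0o700)  # mkdir's mode is filtered by umask, so set it explicitly
    # mkstemp creates the file with mode 600 in the target directory, so the
    # os.replace call below is an atomic rename on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".credentials-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(profiles, handle, indent=2)
        os.chmod(tmp_name, 0o600)
        os.replace(tmp_name, path)
    except BaseException:
        # Never leave a partially written secrets file behind.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Writing to a temporary file in the same directory and renaming it means a reader never observes a half-written credentials.json.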

Profile selection follows a five-tier precedence; tier 4 is skipped because langfuse is not host-aware:

1. --profile NAME flag on any script
2. LANGFUSE_PROFILE=NAME environment variable
3. Per-project selector file <project>/.skills/langfuse.profile (one line, safe to commit). The loader walks up from cwd, stops at the first project-root marker (.git, pyproject.toml, package.json, Cargo.toml, MANIFEST.md), and never goes above $HOME.
4. (skipped: host-awareness is opt-in via MANIFEST and langfuse does not declare it)
5. default profile
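The precedence ladder above can be sketched as a small resolver. This is a minimal illustration, not the loader in utilities/credentials.py: the function names are hypothetical, and the behavior when cwd lies outside $HOME (walking to the filesystem root) is an assumption.

```python
from pathlib import Path
from typing import Mapping, Optional

ROOT_MARKERS = (".git", "pyproject.toml", "package.json", "Cargo.toml", "MANIFEST.md")


def find_selector_profile(cwd: Path, home: Path) -> Optional[str]:
    """Walk up from cwd to the first project root and read its selector file, if any."""
    current = cwd.resolve()
    home = home.resolve()
    while True:
        if any((current / marker).exists() for marker in ROOT_MARKERS):
            selector = current / ".skills" / "langfuse.profile"
            if selector.is_file():
                return selector.read_text(encoding="utf-8").strip() or None
            return None  # stop at the first project root even without a selector
        # Never climb above $HOME (or past the filesystem root).
        if current == home or current.parent == current:
            return None
        current = current.parent


def resolve_profile(
    cli_profile: Optional[str], env: Mapping[str, str], cwd: Path, home: Path
) -> str:
    """Return the profile name chosen by the first tier that yields one."""
    if cli_profile:  # tier 1: --profile NAME
        return cli_profile
    if env.get("LANGFUSE_PROFILE"):  # tier 2: environment variable
        return env["LANGFUSE_PROFILE"]
    selector = find_selector_profile(cwd, home)  # tier 3: per-project selector file
    if selector:
        return selector
    # Tier 4 (host-aware selection) is skipped for langfuse.
    return "default"  # tier 5
```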

Per-key env-var overrides stack on top of the resolved profile (useful in CI / emergency rotations):

LANGFUSE_SECRET_KEY=sk-...
LANGFUSE_PUBLIC_KEY=pk-...
LANGFUSE_BASE_URL=https://...
LANGFUSE_HOST=https://...        # legacy alias; LANGFUSE_BASE_URL wins if both are set
LANGFUSE_PROJECT_NAME=...
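How these overrides stack on a resolved profile might be sketched as follows. The helper name `apply_env_overrides` and the mapping table are hypothetical; the field names come from the credentials.json shape shown earlier, and the ordering encodes the rule that LANGFUSE_BASE_URL wins over the legacy LANGFUSE_HOST.

```python
from typing import Dict, Mapping

# Environment variable -> profile field. Later entries win, so BASE_URL overrides HOST.
OVERRIDES = (
    ("LANGFUSE_SECRET_KEY", "secret_key"),
    ("LANGFUSE_PUBLIC_KEY", "public_key"),
    ("LANGFUSE_HOST", "host"),
    ("LANGFUSE_BASE_URL", "host"),
    ("LANGFUSE_PROJECT_NAME", "project_name"),
)


def apply_env_overrides(profile: Mapping[str, str], env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the profile with any non-empty env-var overrides applied."""
    merged = dict(profile)
    for variable, field in OVERRIDES:
        value = env.get(variable)
        if value:
            merged[field] = value
    return merged
```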

All scripts accept --profile <name>. No script reads .env files at runtime. The local loader lives at utilities/credentials.py (this plugin only — it does NOT import from any sibling plugin per the standalone-skill principle).

Decision Tree

User wants to DO something generic/current → CLI Operator mode

Use scripts/lf_cli.py first when the user asks for a Langfuse API operation that is not already a curated workflow below.

python "${CLAUDE_SKILL_DIR}/scripts/lf_cli.py" api __schema
python "${CLAUDE_SKILL_DIR}/scripts/lf_cli.py" api <resource> --help
python "${CLAUDE_SKILL_DIR}/scripts/lf_cli.py" --profile simulator api <resource> <action> [flags]

The wrapper resolves the selected profile, exports LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_BASE_URL to the subprocess, and never prints credential values. Prefer --dry-run before first use in a project:

python "${CLAUDE_SKILL_DIR}/scripts/lf_cli.py" --profile simulator --dry-run api __schema
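The credential bridge can be pictured as two small steps: build the subprocess environment from the selected profile, then describe the call without echoing secrets. The sketch below is an assumption about how such a wrapper might be written, not the contents of lf_cli.py; `build_cli_env` and `describe_dry_run` are hypothetical names, and the real --dry-run output format may differ.

```python
from typing import Dict, List, Mapping

REDACTED_VARS = {"LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY"}
EXPORTED_VARS = ("LANGFUSE_BASE_URL", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def build_cli_env(profile: Mapping[str, str], base_env: Mapping[str, str]) -> Dict[str, str]:
    """Copy the parent environment and add the three variables the CLI reads."""
    env = dict(base_env)
    env["LANGFUSE_PUBLIC_KEY"] = profile["public_key"]
    env["LANGFUSE_SECRET_KEY"] = profile["secret_key"]
    env["LANGFUSE_BASE_URL"] = profile["host"]
    return env


def describe_dry_run(argv: List[str], env: Mapping[str, str]) -> str:
    """Summarize the planned call; key values are replaced by a placeholder."""
    lines = ["would run: " + " ".join(argv)]
    for name in EXPORTED_VARS:
        shown = "<set>" if name in REDACTED_VARS else env[name]
        lines.append(f"{name}={shown}")
    return "\n".join(lines)
```

A real wrapper would then hand the same dictionary to `subprocess.run(argv, env=env)`, so the keys reach the child process only and never the transcript.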

User wants packaged views/reports → Curated Operator mode

| Intent | Script | Key actions |
| --- | --- | --- |
| Use current official API | `lf_cli.py` | Delegates to `langfuse-cli api ...` |
| Explore traces | `lf_traces.py` | `list`, `get`, `tree`, `search`, `stats` |
| Inspect observations | `lf_observations.py` | `list`, `get` (supports `--type GENERATION\|SPAN\|EVENT`) |
| Browse sessions | `lf_sessions.py` | `list`, `get`, `users`, `timeline` |
| Manage datasets | `lf_datasets.py` | `list`, `create`, `items`, `add-item`, `add-items-bulk`, `export`, `runs` |
| Manage prompts | `lf_prompts.py` | `list`, `get`, `create`, `from-file`, `from-trace`, `update-labels`, `diff`, `history` |
| Extract chat logs | `lf_extract_chat.py` | `session`, `trace`, `batch` → JSONL + Markdown |
| Manage scores | `lf_scores.py` | `list`, `create`, `bulk-create`, `delete`, `analyze`, `export` |
| Run reports | `lf_report.py` | `overview`, `cost`, `latency`, `quality`, `full` |
| Run RAG evaluations | quarantined pending port | FlowWise RAG execution scripts are project-specific until ported off `core.config` / `.env`; analyze existing result files instead |
| Find exceptions | `lf_exceptions.py` | `find`, `file`, `details`, `count` |
| Look up schema | `lf_schema.py` | `list`, `show`, `fields`, `hierarchy`, `endpoints` |
| Raw API call fallback | `lf_api.py` | Any `METHOD /path --params '{}' --body '{}'`; use only when CLI is unavailable |

User wants to KNOW something → Guide mode

| Question type | Reference file |
| --- | --- |
| "What is a trace/span/generation?" | `references/concepts.md` |
| "How do evaluations work?" | `references/evaluators.md` |
| "How to integrate with Python/JS SDK?" | `references/integrations.md` |
| "What API endpoints exist?" | `references/api_endpoints.md` |
| Setup, credentials, troubleshooting | `references/setup.md` |
| Official CLI usage and API discovery | `references/cli.md` |
| "How should development projects or agents trace to Langfuse?" | `references/agent-tracing.md` |
| Error/exception triage | `references/error-analysis.md` |
| Evaluator and judge calibration | `references/judge-calibration.md` |
| SDK v4 / OpenTelemetry upgrade planning | `references/sdk-upgrade.md` |
| User feedback ingestion | `references/user-feedback.md` |
| Full parameter reference for all tools | `references/tool_reference.md` |
| "How to migrate prompts to LangFuse?" | `references/prompt_migration.md` |
| "Full FlowWise prompt lifecycle (capture → promote)?" | FlowWise Prompt Lifecycle section below |
| "Why are traces nesting unexpectedly?" | `references/trace-nesting-validation-links.md` |
| "How to interpret RAG evaluation scores?" | `references/rag-eval-interpretation.md` |
| "What is needed to re-enable FlowWise RAG runners?" | `references/rag-flowwise-port-plan.md` |
| Object fields and relationships | `python "${CLAUDE_SKILL_DIR}/scripts/lf_schema.py" show <type>` |

User wants to ANALYZE RAG evaluation results → RAG Analyst mode

When the user asks to analyze RAG results, interpret evaluation scores, diagnose content gaps, or compare experiment runs, read references/rag-eval-interpretation.md and follow its structured analysis protocol. The guide covers score semantics, diagnostic patterns, category analysis, run comparison, root cause reasoning, and action recommendations.

Quick workflow:

1. Gather results: read data/rag-eval/results/<run-name>.json for local results, or run lf_datasets.py --profile simulator runs rag-eval-baseline-v01 to list Langfuse runs.
2. Gather metadata: read data/rag-eval/rag-eval-baseline-v01.json for item category/variant metadata and join it to the results by question text.
3. Analyze: follow the interpretation guide in order (score overview → category breakdown → pattern detection → root cause → recommendations).
4. Compare (if two runs): load both result files, compute per-category deltas, and flag regressions greater than 0.15.
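The comparison step can be sketched as a per-category mean followed by a delta check. The record shape used here (a list of items with `category` and `score` keys) is a hypothetical simplification; the actual result files may need to be joined with the item metadata first, as described in the workflow above.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Mapping

REGRESSION_THRESHOLD = 0.15  # a drop larger than this is flagged


def category_means(items: Iterable[Mapping]) -> Dict[str, float]:
    """Average the score of every item within each category."""
    buckets = defaultdict(list)
    for item in items:
        buckets[item["category"]].append(item["score"])
    return {category: mean(scores) for category, scores in buckets.items()}


def compare_runs(baseline, candidate, threshold: float = REGRESSION_THRESHOLD) -> Dict[str, dict]:
    """Report the per-category delta (candidate minus baseline) and flag regressions."""
    base = category_means(baseline)
    cand = category_means(candidate)
    report = {}
    for category in sorted(base.keys() & cand.keys()):
        delta = cand[category] - base[category]
        report[category] = {"delta": round(delta, 4), "regression": delta < -threshold}
    return report
```

Categories present in only one run are left out of the report in this sketch; a fuller analysis would probably list them separately.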

User wants to SWITCH project

python "${CLAUDE_SKILL_DIR}/scripts/lf_client.py" --action list-profiles
# Then either:
#   - Pass --profile NAME on each command, OR
#   - Set LANGFUSE_PROFILE=NAME in the session env, OR
#   - Drop a one-line ".skills/langfuse.profile" at the project root (auto-routes per-project).

Common Workflows

Trace Exploration

python "${CLAUDE_SKILL_DIR}/scripts/lf_cli.py" api __schema                         # discover current API surface
python "${CLAUDE_SKILL_DIR}/scripts/lf_traces.py" list --limit 20 --age 60        # last hour
python "${CLAUDE_SKILL_DIR}/scripts/lf_traces.py" list --name "agent-run" --user "user-123"
python "${CLAUDE_SKILL_DIR}/scripts/lf_traces.py" get TRACE_ID --compact           # truncated output
python "${CLAUDE_SKILL_DIR}/scripts/lf_traces.py" tree TRACE_ID                    # observation hierarchy
python "${CLAUDE_SKILL_DIR}/scripts/lf_traces.py" search "error" --age 1440        # text search

Observation Inspection

python "${CLAUDE_SKILL_DIR}/scripts/lf_observations.py" list --type GENERATION --age 60
python "${CLAUDE_SKILL_DIR}/scripts/lf_observations.py" list --trace-id TRACE_ID
python "${CLAUDE_SKILL_DIR}/scripts/lf_observations.py" get OBS_ID --compact

Session Analysis

python "${CLAUDE_SKILL_DIR}/scripts/lf_sessions.py" list --age 1440
python "${CLAUDE_SKILL_DIR}/scripts/lf_sessions.py" get SESSION_ID                 # traces in session
python "${CLAUDE_SKILL_DIR}/scripts/lf_sessions.py" timeline SESSION_ID            # chronological view
python "${CLAUDE_SKILL_DIR}/scripts/lf_sessions.py" users --age 1440               # group by user

Dataset Management

python "${CLAUDE_SKILL_DIR}/scripts/lf_datasets.py" create --name "eval-v1" --description "Core eval set"
python "${CLAUDE_SKILL_DIR}/scripts/lf_datasets.py" add-item "eval-v1" --input '{"q":"What is AI?"}' --expected '{"a":"..."}'
python "${CLAUDE_SKILL_DIR}/scripts/lf_datasets.py" add-items-bulk "eval-v1" data.json     # JSON or CSV
python "${CLAUDE_SKILL_DIR}/scripts/lf_datasets.py" items "eval-v1"
python "${CLAUDE_SKILL_DIR}/scripts/lf_datasets.py" export "eval-v1" --format json
python "${CLAUDE_SKILL_DIR}/scripts/lf_datasets.py" runs "eval-v1"
python "${CLAUDE_SKILL_DIR}/scripts/lf_datasets.py" run-items "eval-v1" "run-name"

Bulk import formats: JSON array of {input, expectedOutput, metadata}, or CSV with input/expected_output columns (or input_*/expected_* prefixed columns).
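To make the two CSV layouts concrete, the sketch below converts CSV text into the JSON array shape. It is an assumption about one reasonable mapping, not the parser in lf_datasets.py: `csv_to_items` is a hypothetical name, and stripping the `input_`/`expected_` prefix to form object keys is a guess at how prefixed columns are grouped.

```python
import csv
import io
from typing import List


def csv_to_items(csv_text: str) -> List[dict]:
    """Turn CSV rows into {input, expectedOutput, metadata} dataset items."""
    items = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if "input" in row:
            item_input = row["input"]
        else:
            # Group input_* columns into one object, keyed by the suffix.
            item_input = {k[len("input_"):]: v for k, v in row.items() if k.startswith("input_")}
        if "expected_output" in row:
            expected = row["expected_output"]
        else:
            # Group expected_* columns the same way.
            expected = {k[len("expected_"):]: v for k, v in row.items() if k.startswith("expected_")}
        items.append({"input": item_input, "expectedOutput": expected, "metadata": {}})
    return items
```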

FlowWise Prompt Lifecycle (capture → version → test → compare → promote)

This skill is the canonical tool for managing FlowWise prompt versions in Langfuse Prompt Management. Use the five-step workflow below together with references/prompt_migration.md; there is no separate packaged references/prompt-lifecycle.md file.

Quick reference:

# Step 1 — Capture baseline from a production trace
python "${CLAUDE_SKILL_DIR}/scripts/lf_prompts.py" from-trace TRACE_ID --upload --prefix "salesbot-v2" --labels "baseline"

# Step 2 — Create fix version (prompt text from the approved tuning workflow)
python "${CLAUDE_SKILL_DIR}/scripts/lf_prompts.py" create --name "salesbot-v2-interviewer" --prompt "..." --labels "fix-b2,candidate"

# Step 3 — Test via regression (see regression-runner skill)
uv run regression_runner.py --dataset <name> --prompt-name "salesbot-v2-interviewer" --prompt-label candidate

# Step 4 — Compare versions
python "${CLAUDE_SKILL_DIR}/scripts/lf_prompts.py" diff "salesbot-v2-interviewer" --v1 1 --v2 2

# Step 5 — Promote verified version
python "${CLAUDE_SKILL_DIR}/scripts/lf_prompts.py" update-labels "salesbot-v2-interviewer" --version 2 --add "production,latest"

Label conventions: production, latest, baseline, candidate, fix-<code> (e.g. fix-b2), deprecated.

For generic prompt migration, read references/prompt_migration.md. FlowWise-specific SimHuman lifecycle details (DNA catalog → simulation-tuner → apply_sim_tuning.py) are project-local and must not be assumed to ship with this plugin unless they are explicitly packaged in the consuming project.

Prompt Management

python "${CLAUDE_SKILL_DIR}/scripts/lf_prompts.py" list
python "${CLAUDE_SKILL_DIR}/scripts/lf_prompts.py" get "my-prompt" --label production
python "${CLAUDE_SKILL_DIR}/scripts/lf_prompts.py" create --name "qa" --prompt "Rate: {{answer}}"
python "${CLAUDE_SKILL_DIR}/scripts/lf_prompts.py" from-file --name "system" prompt.txt
python "${CLAUDE_SKILL_DIR}/scripts/lf_prompts.py" from-trace TRACE_ID                   # discover system prompts
python "${CLAUDE_SKILL_DIR}/scripts/lf_prompts.py" from-trace TRACE_ID --upload --prefix "mybot"
python "${CLAUDE_SKILL_DIR}/scripts/lf_prompts.py" from-trace --session SID --strategy max_coverage --upload
python "${CLAUDE_SKILL_DIR}/scripts/lf_prompts.py" update-labels "my-prompt" --version 3 --add production
python "${CLAUDE_SKILL_DIR}/scripts/lf_prompts.py" diff "my-prompt" --v1 1 --v2 3
python "${CLAUDE_SKILL_DIR}/scripts/lf_prompts.py" history "my-prompt"

Prompt types: text (string with {{vars}}), chat (JSON array of {role, content}). Auto-detected on create.
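One plausible way to auto-detect the prompt type is to try parsing the input as JSON and check for a list of role/content messages. The sketch below is an assumption about such a heuristic, not the logic in lf_prompts.py; `detect_prompt_type` and `template_variables` are hypothetical names.

```python
import json
import re
from typing import List

VARIABLE_PATTERN = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def detect_prompt_type(raw: str) -> str:
    """Return "chat" for a JSON array of {role, content} messages, else "text"."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return "text"
    is_chat = (
        isinstance(parsed, list)
        and bool(parsed)
        and all(isinstance(m, dict) and {"role", "content"} <= m.keys() for m in parsed)
    )
    return "chat" if is_chat else "text"


def template_variables(raw: str) -> List[str]:
    """List the distinct {{variable}} names in order of first appearance."""
    seen: List[str] = []
    for name in VARIABLE_PATTERN.findall(raw):
        if name not in seen:
            seen.append(name)
    return seen
```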

from-trace extracts system prompts from generation observations. Strategies for --session: latest (most recent trace), first_last, max_coverage (all traces, keep longest per node), all.

RAG Evaluation (run → analyze → compare → act)

Runtime status: execution scripts in this section are blocked until ported. scripts/lf_rag_eval.py, scripts/lf_rag_question_gen.py, and scripts/lf_rag_retrieval_audit.py currently import project-local core.config and load .env. That violates the runtime credential contract for an installed skill. Do not run these scripts from the shipped plugin until they are ported to the local credential loader and explicit non-secret project configuration.

Important: RAG datasets and experiment runs live in the Simulator Langfuse project (--profile simulator), not in the FlowWise project. This avoids polluting production traces.

Run an evaluation after the port is complete:

# Full run (all evaluators: semantic similarity + LLM judge + compliance)
uv run python "${CLAUDE_SKILL_DIR}/scripts/lf_rag_eval.py" --dataset-name rag-eval-baseline-v01 --run-name my-run

# Fast run (skip LLM judge, Tier 1 only — cheaper)
uv run python "${CLAUDE_SKILL_DIR}/scripts/lf_rag_eval.py" --dataset-name rag-eval-baseline-v01 --run-name my-fast-run --skip-llm-judge

# Test with small sample
uv run python "${CLAUDE_SKILL_DIR}/scripts/lf_rag_eval.py" --dataset-name rag-eval-baseline-v01 --run-name test-3 --limit 3

Access results:

# Local results (includes bot responses, judge reasoning)
# data/rag-eval/results/<run-name>.json

# List runs in Langfuse
python "${CLAUDE_SKILL_DIR}/scripts/lf_datasets.py" --profile simulator runs rag-eval-baseline-v01

# Get dataset item metadata (categories, variants)
# data/rag-eval/rag-eval-baseline-v01.json

Analyze existing results: Read references/rag-eval-interpretation.md for the full interpretation framework — score semantics, diagnostic patterns, category analysis protocol, run comparison methodology, root cause reasoning, and action recommendations. The guide teaches you how to reason about the scores, not just compute them.

Porting plan: Read references/rag-flowwise-port-plan.md before re-enabling any FlowWise RAG runner.

Evaluator tiers:

Tier 1 (always on): semantic_similarity (embedding cosine), response_quality (heuristics), no_marketing (compliance regex)
Tier 2 (opt-out via --skip-llm-judge): llm_relevance, llm_completeness, llm_accuracy (LLM-as-judge via OpenRouter)
Run-level: avg_semantic_similarity, avg_llm_quality, marketing_violation_rate
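Two of these metrics are simple enough to illustrate directly: cosine similarity between embedding vectors, and a run-level violation rate from a compliance regex. This is a sketch under assumptions; the pattern below is a made-up placeholder, not the rule set used by the no_marketing evaluator, and the function names are hypothetical.

```python
import math
import re
from typing import Sequence

# Hypothetical placeholder pattern; the real compliance rules are not shown in this document.
MARKETING_PATTERN = re.compile(r"\b(buy now|limited offer|best price)\b", re.IGNORECASE)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def marketing_violation_rate(responses: Sequence[str]) -> float:
    """Fraction of responses that match the compliance pattern."""
    if not responses:
        return 0.0
    violations = sum(1 for text in responses if MARKETING_PATTERN.search(text))
    return violations / len(responses)
```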

Score Management

python "${CLAUDE_SKILL_DIR}/scripts/lf_scores.py" list --name "accuracy" --age 1440
python "${CLAUDE_SKILL_DIR}/scripts/lf_scores.py" create --trace ID --name "accuracy" --value 0.85
python "${CLAUDE_SKILL_DIR}/scripts/lf_scores.py" bulk-create scores.json
python "${CLAUDE_SKILL_DIR}/scripts/lf_scores.py" analyze --name "accuracy"
python "${CLAUDE_SKILL_DIR}/scripts/lf_scores.py" export --format csv --output scores.csv
python "${CLAUDE_SKILL_DIR}/scripts/lf_scores.py" delete SCORE_ID

Score types: NUMERIC (float), CATEGORICAL (string), BOOLEAN (0/1). Sources: API (code), ANNOTATION (human), EVAL (automated evaluator).

Reporting & Analysis

python "${CLAUDE_SKILL_DIR}/scripts/lf_report.py" overview                         # project snapshot
python "${CLAUDE_SKILL_DIR}/scripts/lf_report.py" cost --age 1440                  # cost by trace name
python "${CLAUDE_SKILL_DIR}/scripts/lf_report.py" latency --age 1440 --name "agent-run"
python "${CLAUDE_SKILL_DIR}/scripts/lf_report.py" quality --score-name "accuracy"
python "${CLAUDE_SKILL_DIR}/scripts/lf_report.py" full --age 1440 --output report.json

Exception Analysis

python "${CLAUDE_SKILL_DIR}/scripts/lf_exceptions.py" find --age 1440 --group-by type
python "${CLAUDE_SKILL_DIR}/scripts/lf_exceptions.py" find --group-by filepath
python "${CLAUDE_SKILL_DIR}/scripts/lf_exceptions.py" file "src/agent.py" --age 1440
python "${CLAUDE_SKILL_DIR}/scripts/lf_exceptions.py" details TRACE_ID
python "${CLAUDE_SKILL_DIR}/scripts/lf_exceptions.py" count --age 60

Chat Extraction

Extract session conversations into simplified JSONL (machine-readable) + Markdown (human-readable):

python "${CLAUDE_SKILL_DIR}/scripts/lf_extract_chat.py" session SESSION_ID               # single session
python "${CLAUDE_SKILL_DIR}/scripts/lf_extract_chat.py" trace TRACE_ID                   # single trace
python "${CLAUDE_SKILL_DIR}/scripts/lf_extract_chat.py" batch sessions.txt               # batch from file
python "${CLAUDE_SKILL_DIR}/scripts/lf_extract_chat.py" session SID --output-dir ./chats # custom output dir

Output per session: {session_id}.jsonl with {session_id, source, conversation.turns[], metadata} and {session_id}.md with numbered turns in blockquotes.
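The two output files can be pictured with a small sketch. The top-level keys follow the description above, but the turn shape (`role` and `content`), the `source` value, and the exact Markdown layout are assumptions; `to_jsonl_record` and `render_markdown` are hypothetical names rather than functions from lf_extract_chat.py.

```python
import json
from typing import List, Mapping


def to_jsonl_record(session_id: str, turns: List[Mapping[str, str]], source: str = "langfuse") -> str:
    """Serialize one session as a single JSONL line."""
    record = {
        "session_id": session_id,
        "source": source,
        "conversation": {"turns": list(turns)},
        "metadata": {"turn_count": len(turns)},
    }
    return json.dumps(record, ensure_ascii=False)


def render_markdown(session_id: str, turns: List[Mapping[str, str]]) -> str:
    """Render numbered turns, with each line of content in a blockquote."""
    lines = [f"Session {session_id}", ""]
    for number, turn in enumerate(turns, start=1):
        lines.append(f"{number}. {turn['role']}")
        for text_line in turn["content"].splitlines():
            lines.append(f"   > {text_line}")
        lines.append("")
    return "\n".join(lines)
```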

Schema Lookup

python "${CLAUDE_SKILL_DIR}/scripts/lf_schema.py" list                             # all object types
python "${CLAUDE_SKILL_DIR}/scripts/lf_schema.py" show trace                       # fields + endpoints
python "${CLAUDE_SKILL_DIR}/scripts/lf_schema.py" show generation                  # alias for observation
python "${CLAUDE_SKILL_DIR}/scripts/lf_schema.py" fields userId                    # search across types
python "${CLAUDE_SKILL_DIR}/scripts/lf_schema.py" hierarchy                        # object tree
python "${CLAUDE_SKILL_DIR}/scripts/lf_schema.py" endpoints dataset                # API endpoints

Raw API Access

For any endpoint not covered by convenience scripts, prefer the official CLI:

python "${CLAUDE_SKILL_DIR}/scripts/lf_cli.py" api __schema
python "${CLAUDE_SKILL_DIR}/scripts/lf_cli.py" api <resource> <action> [flags]

Use the local raw HTTP script only when the official CLI is unavailable in the shell:

python "${CLAUDE_SKILL_DIR}/scripts/lf_api.py" GET /api/public/traces --params '{"limit":5}'
python "${CLAUDE_SKILL_DIR}/scripts/lf_api.py" POST /api/public/comments --body '{"objectId":"...","objectType":"TRACE","content":"flagged"}'
python "${CLAUDE_SKILL_DIR}/scripts/lf_api.py" GET /api/public/annotation-queues

Time Filtering

Most scripts support --age <minutes> for relative time ranges:

--age 60 → last hour
--age 1440 → last 24 hours
--age 10080 → last 7 days

Some also support --from-ts and --to for absolute ISO timestamps.
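The relative filter amounts to subtracting the age from the current time. A minimal sketch of that arithmetic follows; `age_to_window` is a hypothetical helper, and using UTC ISO 8601 strings is an assumption about what the scripts send.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def age_to_window(age_minutes: int, now: Optional[datetime] = None) -> Tuple[str, str]:
    """Convert an --age value in minutes into (start, end) ISO 8601 timestamps."""
    if age_minutes <= 0:
        raise ValueError("age must be a positive number of minutes")
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(minutes=age_minutes)
    return start.isoformat(), end.isoformat()
```

For example, with a fixed `now` of 2026-01-02 12:00 UTC, an age of 1440 yields a window starting at 2026-01-01 12:00 UTC.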

Script Architecture

Generic Langfuse API work goes through lf_cli.py, which delegates to the official langfuse-cli and maps this plugin's credential profile to the CLI environment. Curated scripts use lf_client.py — a zero-dependency HTTP client (stdlib only: urllib, json, base64). Project-specific FlowWise RAG execution scripts are excluded until their core.config / .env dependency is ported.
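A stdlib-only client of this kind can be quite small. The sketch below is not lf_client.py; `build_request` and `fetch_json` are hypothetical names. It relies on the Langfuse public API accepting HTTP Basic auth with the public key as username and the secret key as password, and it uses the /api/public/traces path from the raw API examples above.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Mapping, Optional


def build_request(
    host: str,
    public_key: str,
    secret_key: str,
    path: str,
    params: Optional[Mapping[str, object]] = None,
) -> urllib.request.Request:
    """Build an authenticated GET request without sending it."""
    token = base64.b64encode(f"{public_key}:{secret_key}".encode("utf-8")).decode("ascii")
    url = host.rstrip("/") + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the auth header easy to check in tests without network access.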

Key shared features:

Multi-profile canonical store at ~/.skills/langfuse/credentials.json (per R50 v2.0.0)
Local credential loader at utilities/credentials.py — five-tier precedence ladder; no walk-up .env discovery
Official CLI bridge via scripts/lf_cli.py — exports LANGFUSE_BASE_URL, LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY to the subprocess only
Paginated fetching via api_call_paginated()
Smart truncation via --compact flag (preserves essential fields, truncates large ones)
Per-key env-var overrides — LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_BASE_URL, LANGFUSE_HOST, LANGFUSE_PROJECT_NAME
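Paginated fetching generally follows a request-until-last-page loop. The sketch below illustrates that idea only; it is not the body of api_call_paginated(). The `fetch_page` callable is hypothetical, and the response shape (`data` list plus `meta.totalPages`) is an assumption about the list endpoints.

```python
from typing import Callable, Iterator


def iterate_pages(
    fetch_page: Callable[[int, int], dict], limit: int = 50, max_pages: int = 100
) -> Iterator[object]:
    """Yield every item across pages, stopping at the last page or at max_pages."""
    page = 1
    while page <= max_pages:
        payload = fetch_page(page, limit)
        yield from payload.get("data", [])
        # If the server omits totalPages, treat the current page as the last one.
        total_pages = payload.get("meta", {}).get("totalPages", page)
        if page >= total_pages:
            break
        page += 1
```

The `max_pages` cap is a safety valve so that a large project cannot turn one command into an unbounded crawl.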

Important Notes

Always verify the credential bridge with lf_cli.py --dry-run api __schema before first use
Then verify REST connectivity with lf_client.py --action health when using bundled scripts
Dataset names support virtual folders with / (e.g., eval/accuracy)
Prompt names are unique per project; creating with same name auto-increments version
The tree command shows ⚠ indicators on ERROR/FATAL observations
--compact on get actions truncates input/output/stacktrace for readability
Exception analysis extracts from OTEL span events (exception.type, exception.message, exception.stacktrace)
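The --compact behavior noted above can be approximated by shortening only the large fields of a record. This is a sketch under assumptions: `compact` is a hypothetical helper, and the 200-character limit and the suffix format are illustrative choices, not the values used by the bundled scripts.

```python
import json
from typing import Mapping, Sequence

LARGE_FIELDS = ("input", "output", "stacktrace")


def compact(record: Mapping[str, object], max_chars: int = 200,
            fields: Sequence[str] = LARGE_FIELDS) -> dict:
    """Return a copy of the record with oversized fields truncated to max_chars."""
    result = dict(record)
    for field in fields:
        value = result.get(field)
        if value is None:
            continue
        # Non-string payloads are measured by their JSON form.
        text = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
        if len(text) > max_chars:
            result[field] = text[:max_chars] + f"... [{len(text) - max_chars} more chars]"
    return result
```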

Language Handling

Use the user's working language for explanations, diagnostics, and recommendations unless the user asks otherwise.
Keep Langfuse API field names, trace/span/observation identifiers, environment variable names, and CLI flags exactly as emitted by Langfuse tooling.
When summarizing traces or prompts in a multilingual project, preserve quoted model/user text in its source language and explain findings in the user's working language.

End-of-run (conditional)

Standard runs end after credentials/tooling are verified, traces/prompts/datasets/scores are inspected or exported, and findings are surfaced to the user. No self-improvement prompt fires for uneventful work.

Prompt for skill improvement only when the run deviated from the documented flow: a Langfuse API surface was missing from lf_cli.py/local scripts, credential setup hit an unsupported profile shape, quarantined FlowWise RAG code was needed, a repeated analysis pattern should become a reference, or the skill had to improvise around CLI/runtime drift. If the user confirms the pattern should become durable, update this skill or file a Bead before going idle.
