Skill

traceway

From tw

Operates a Traceway observability instance via CLI: login, query exceptions/logs/endpoints/metrics, and debug production issues to root cause. Activated by /traceway commands.

Stats

Language: Go

Stars: 878

Forks: 26

Maintenance: Excellent

Last commit: Jun 24, 2026


Traceway

Drive a Traceway instance from the terminal with the `traceway` CLI. The first word of the argument decides the flow:

| Invocation | Flow |
| --- | --- |
| `/traceway login` | Login: install the CLI if missing, authenticate, select a project |
| `/traceway debug <issue ref or bug description>` | Debug: resolve the issue and investigate to root cause |
| `/traceway <anything else>` | Query: answer the observability question with CLI reads |
| `/traceway` (no argument) | Ask what they want: log in, debug an issue, or run a query |

The CLI is under active development. If a flag documented here does not appear in `traceway <command> --help`, trust the binary.

Ground Rules (All Flows)

- Reads are safe: any `list` / `show` / `query` subcommand may run freely; they never mutate server state.
- Writes require explicit user instruction: `exceptions archive` / `unarchive` are the only mutating data commands; only run them when the user asks by name, with `--yes` in non-interactive contexts. "Look at this error" means read it, not archive it.
- Output: piped output defaults to JSON (table on a TTY). Prefer JSON + `jq`, and `--fields a,b,c` to trim responses. Keep `--page-size` at 10 to 20 for triage.
- Time windows: always bound queries; default to `--since 1h` for "now" questions and `--since 24h` otherwise. `--since` accepts one number with one unit: `s`, `m`, `h`, or lowercase `d` (so `90m` or `7d`, but not `1w` or `7d2h`). Absolute windows go via `--from` / `--to` (RFC3339).
- Exit codes: 0 ok, 1 generic/API, 2 usage, 3 connection, 4 auth, 5 not found, 6 rate limited, 7 server 5xx. Errors emit `{"error":"<stable_id>","message":"...","hint":"...","exit_code":N}` on stderr; branch on the `error` field.
- On exit code 4 (auth), do not run `traceway login` yourself; switch to the Login flow and let the user enter credentials.
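
The exit-code contract above lends itself to a small dispatcher in wrapper scripts. The following is a minimal sketch, not part of the CLI: the function names and the action labels (`retry-later`, `report-error`, and so on) are hypothetical, and retrying on codes 3, 6, and 7 is an assumed policy. Only the numeric codes and the auth and not-found handling come from this document.

```shell
# Map a traceway exit code to a short category name.
# The category labels are illustrative; the numbers come from the ground rules.
tw_exit_category() {
  case "$1" in
    0) echo ok ;;
    1) echo generic ;;
    2) echo usage ;;
    3) echo connection ;;
    4) echo auth ;;
    5) echo not_found ;;
    6) echo rate_limited ;;
    7) echo server_error ;;
    *) echo unknown ;;
  esac
}

# Suggest a next step for a wrapper script (hypothetical policy labels).
tw_next_step() {
  case "$(tw_exit_category "$1")" in
    ok)                                   echo continue ;;
    auth)                                 echo switch-to-login-flow ;;  # never run login yourself
    not_found)                            echo fall-back-to-search ;;
    connection|rate_limited|server_error) echo retry-later ;;           # assumed policy
    *)                                    echo report-error ;;
  esac
}
```

A caller would run a read command, capture `$?`, and pass it to `tw_next_step` before deciding whether to continue.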

Resolving Dashboard URLs

Users paste dashboard URLs (https://<instance>/<route>) as references in any flow. Resolve by route family:

| URL path | Identifies | How to fetch it |
| --- | --- | --- |
| `/issues/<hash>` and `/issues/<hash>/events` | Exception group (hash = 16 hex chars) | `traceway exceptions show <hash>` |
| `/issues/<hash>/<occurrenceId>` (UUID) | One occurrence within the group | `traceway exceptions occurrence <occurrenceId> --recorded-at <t>` where `t` is the URL's `?t=` param. Direct and fast; also returns the occurrence's `sessionId` and session recording. No URL? Get `recordedAt` from the occurrences in `traceway exceptions show <hash>` |
| `/endpoints/<endpoint>` | Endpoint group; the segment is the URL-encoded endpoint name (`GET%20%2Fapi%2Fusers%2F%3Aid` is `GET /api/users/:id`) | Decode it, then `traceway endpoints list --search "<decoded name>"` (the group has no id; `endpoints show` is for one request, see the next row) |
| `/endpoints/<endpoint>/<endpointId>` | One request (transaction) of that endpoint | `traceway endpoints show <endpointId> --recorded-at <t>` (`t` = the URL's `?t=` param). Returns the request, its span waterfall, and any linked exception/messages |
| `/tasks/<task>` | Background task group | No CLI for the group; for one run use the next row |
| `/tasks/<task>/<taskId>` | Single task run | `traceway tasks show <taskId> --recorded-at <t>` (`t` = the URL's `?t=` param) |
| `/sessions/<sessionId>` | Session (the exceptions that fired during it; replay stays dashboard-only) | `traceway sessions show <sessionId> --started-at <t>`. The URL has no `?t=`; use the session's start, the URL's `from=`, or a linked occurrence's `recordedAt` (it falls inside the window). Occurrences reference sessions via their `sessionId` |
| `/ai-traces/<traceName>` | AI trace group | No CLI for the group; for one trace use the next row |
| `/ai-traces/<traceName>/<traceId>` | Single AI trace | `traceway ai-traces show <traceId> --recorded-at <t>` (`t` = the URL's `?t=` param); returns token/cost stats + the conversation |
| `/logs` | Logs page (its filters are not stored in the URL) | `traceway logs query` with flags taken from the user's description |
| `/issues`, `/endpoints`, `/metrics`, `/` | List and dashboard pages | The matching `list` / `query` command |
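
The route families in the table can be told apart with plain pattern matching on the URL path. This is a rough sketch under stated assumptions: `route_family` and the labels it prints are hypothetical names chosen for illustration, and it only inspects the path shape (it does not validate that a hash is 16 hex characters or that an id is a UUID).

```shell
# Classify a pasted dashboard URL by route family (labels are hypothetical).
route_family() {
  rest=${1#*://}                          # drop the scheme, if any
  case "$rest" in
    */*) path=/${rest#*/} ;;              # drop the host, keep the path
    *)   path=/ ;;
  esac
  path=${path%%\?*}                       # drop the query string
  case "$path" in                         # order matters: most specific first
    /issues/*/events) echo exception-group ;;
    /issues/*/*)      echo occurrence ;;
    /issues/*)        echo exception-group ;;
    /endpoints/*/*)   echo request ;;
    /endpoints/*)     echo endpoint-group ;;
    /tasks/*/*)       echo task-run ;;
    /tasks/*)         echo task-group ;;
    /sessions/*)      echo session ;;
    /ai-traces/*/*)   echo ai-trace ;;
    /ai-traces/*)     echo ai-trace-group ;;
    /logs)            echo logs ;;
    *)                echo list-page ;;
  esac
}
```

Because the endpoint name is URL-encoded, its slashes appear as `%2F` and do not interfere with the path split.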

Time window: most dashboard URLs carry `?preset=<p>` or `?from=<iso>&to=<iso>` (sticky across pages); honor them instead of the default window.

- `preset` values `5m` `30m` `60m` `3h` `6h` `12h` `24h` `3d` `7d` map directly to `--since`; the CLI has no month unit, so map `1M` to `--since 30d` and `3M` to `--since 90d`.
- `from`/`to` are ISO timestamps; pass via `--from`/`--to`, appending `Z` (or the correct offset) when missing, since the CLI requires RFC3339.
- No time params means the page was on its default; pick `--since` per the ground rules.
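
The two translations above are mechanical, so they can be sketched as small helpers. The function names `preset_to_since` and `with_zone` are hypothetical, and `with_zone` assumes a timestamp without a zone designator is UTC; use the correct offset instead if the dashboard was showing local time.

```shell
# Translate a dashboard ?preset= value into a --since value.
preset_to_since() {
  case "$1" in
    5m|30m|60m|3h|6h|12h|24h|3d|7d) echo "$1" ;;  # these map directly
    1M) echo 30d ;;                               # the CLI has no month unit
    3M) echo 90d ;;
    *)  return 1 ;;                               # unknown preset: caller decides
  esac
}

# Append Z when a ?from= / ?to= timestamp has no zone designator.
# Assumption: a bare timestamp is UTC.
with_zone() {
  case "$1" in
    *Z|*[+-][0-9][0-9]:[0-9][0-9]) echo "$1" ;;   # already RFC3339
    *) echo "${1}Z" ;;
  esac
}
```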

`preset`/`from`/`to` set the window for list/group views. Detail URLs additionally carry `?t=<iso>`, the single record's timestamp, URL-encoded. That `t` value is exactly what the by-id commands need as `--recorded-at` (or `--started-at` for sessions). See "Fast by-id lookups" next.
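
Pulling `t` out of a pasted URL and decoding it can be done without extra tooling. The sketch below uses only POSIX shell and `awk`; `urldecode` and `url_t_param` are hypothetical helper names, and the decoder handles ASCII escapes only, which is enough for timestamps and endpoint names such as `GET%20%2Fapi%2Fusers%2F%3Aid`.

```shell
# Percent-decode a string. Only ASCII escapes (%20, %2F, %3A, ...) are
# translated; escapes above %7F are dropped in this simplified version.
urldecode() {
  printf '%s\n' "$1" | awk '
    BEGIN { for (i = 1; i < 128; i++) chr[sprintf("%02X", i)] = sprintf("%c", i) }
    {
      out = ""; s = $0
      while (match(s, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
        out = out substr(s, 1, RSTART - 1) chr[toupper(substr(s, RSTART + 1, 2))]
        s = substr(s, RSTART + 3)
      }
      print out s
    }'
}

# Print the decoded value of the ?t= query parameter, or nothing if absent.
url_t_param() {
  case "$1" in
    *\?*) query=${1#*\?} ;;
    *)    return 0 ;;                      # no query string at all
  esac
  query=${query%%\#*}                      # ignore any fragment
  while [ -n "$query" ]; do
    pair=${query%%&*}                      # first key=value pair
    case "$pair" in
      t=*) urldecode "${pair#t=}"; return 0 ;;
    esac
    case "$query" in
      *\&*) query=${query#*&} ;;           # move on to the next pair
      *)    query= ;;
    esac
  done
}
```

The printed value can then be passed as `--recorded-at` to the matching by-id command.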

Fast by-id lookups (always pass the timestamp)

The by-id detail commands (`exceptions occurrence`, `endpoints show`, `tasks show`, `ai-traces show`, `sessions show`, `traces show`) require the record's timestamp (`--recorded-at`, or `--started-at` for sessions). Telemetry tables are partitioned by day: with the timestamp the lookup is bounded to a small window and ClickHouse prunes to a few partitions; without it the server scans every partition (slow cold load). The flag is mandatory for exactly this reason; never omit it. It can be approximate (within ±24h), and you can recover or estimate it when it isn't handed to you; see "When you don't have the timestamp" below.

Where the timestamp comes from, in order of preference:

1. A dashboard URL: the `?t=<iso>` param is the record's `recordedAt`; URL-decode it and pass it verbatim. (Sessions have no `t`; use the session start, `from=`, or a linked occurrence's `recordedAt`.)
2. A list/group you already fetched: every `exceptions show` occurrence carries `recordedAt`. Capture the id and its `recordedAt` together, then drill in.
3. A notification: see "Resolving an issue notification" below.

Query order when you hold an id: resolve its recordedAt first (URL, group, or notification), then call the by-id command with it.

When you don't have the timestamp

The flag is required, so you must supply something, but it can be approximate. The lookup window is ±24h around what you pass (±48h for `traces show`), and if the record isn't in that window the server falls back to an unbounded scan. So a timestamp within a day of the truth stays fast; a wrong guess still returns the right record, just slower. Resolve it in this order:

1. Recover it from an API. For an occurrence whose hash you know (e.g. `/issues/<hash>/<occurrenceId>` pasted without `?t=`), run `traceway exceptions show <hash>` and read that occurrence's `recordedAt`; the hash endpoint needs no timestamp. A group's `firstSeen`/`lastSeen` from `exceptions list` bound when its occurrences happened (`lastSeen` ≈ the most recent one).
2. Estimate from context. A notification's send time, the issue's `firstSeen`/`lastSeen`, or the URL's `preset`/`from` window all put you inside ±24h, which is good enough for a fast lookup.
3. Ask the user. If nothing pins it down (e.g. a bare occurrence/endpoint id with no hash, no time, and no list to recover from), ask roughly when it happened ("around when did this fire? within a day is enough") and pass that. Don't invent a placeholder like "now" when the issue is old; that defeats the pruning and can miss the ±24h window entirely.
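
The resolution order above is a simple fallback chain. As a rough illustration only, `pick_time_hint` (a hypothetical helper) takes whatever candidates are available and prints the first usable one along with where it came from; if all are empty it signals that the user must be asked rather than inventing a value.

```shell
# Choose the best available time hint, in the order described above.
# Arguments (any may be empty):
#   $1 recordedAt recovered from an API
#   $2 estimate from context (notification time, firstSeen/lastSeen, URL window)
#   $3 rough time supplied by the user
pick_time_hint() {
  if   [ -n "$1" ]; then echo "recovered $1"
  elif [ -n "$2" ]; then echo "estimated $2"
  elif [ -n "$3" ]; then echo "user $3"
  else
    echo "ask-user"       # nothing pins it down: ask, do not guess "now"
    return 1
  fi
}
```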

Resolving an issue notification

Traceway issue notifications (email / Slack / webhook) embed everything for a direct, fast lookup. The body contains:

- `Hash: <16-hex>`: the exception group; fetch it with `traceway exceptions show <hash>`.
- `Exception ID: <uuid>`: the specific occurrence.
- `Occurred at: 2006-01-02 15:04:05 UTC`: the occurrence timestamp. Convert to RFC3339 by replacing the space between date and time with `T` and the trailing ` UTC` (including its leading space) with `Z`, giving `2006-01-02T15:04:05Z`.
- `View details: /issues/<hash>`: the deep link.

So from a notification, go straight to the occurrence (fast), then pivot reusing the same timestamp:

traceway exceptions occurrence <Exception ID> --recorded-at <Occurred at → RFC3339> --output json
# the result carries distributedTraceId and sessionId → traces show / sessions show below
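
The timestamp conversion and field extraction for a notification can be sketched as two small helpers. Both names are hypothetical, and the parser assumes each field sits on its own line as `Label: value`, which matches the layout listed above but is not verified against every notification channel.

```shell
# Convert "2006-01-02 15:04:05 UTC" to "2006-01-02T15:04:05Z".
occurred_to_rfc3339() {
  case "$1" in
    *" UTC") ;;
    *) return 1 ;;                          # only the UTC form is handled here
  esac
  stamp=${1% UTC}                           # "2006-01-02 15:04:05"
  printf '%sT%sZ\n' "${stamp%% *}" "${stamp#* }"
}

# Print the value of one labelled field from a notification body.
# $1 = label ("Hash", "Exception ID", "Occurred at"), $2 = body text.
notification_field() {
  printf '%s\n' "$2" | sed -n "s/^[[:space:]]*$1:[[:space:]]*//p" | head -n 1
}
```

With those two values in hand, the occurrence lookup shown above takes the `Exception ID` as its argument and the converted timestamp as `--recorded-at`.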

Flow: Login

1. Check for an existing install

traceway version

If it prints a version, skip to authentication.

2. Install if missing

Prebuilt binaries are on the tracewayapp/traceway releases page under cli/vX.Y.Z tags (the latest release may be a Backend release, so filter for CLI tags):

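OS=$(uname -s | tr '[:upper:]' '[:lower:]')                 # e.g. linux or darwin
ARCH=$(uname -m); [ "$ARCH" = "aarch64" ] && ARCH=arm64     # normalize the ARM name
# Pick the first CLI tarball for this OS/arch from the recent releases.
URL=$(curl -fsS "https://api.github.com/repos/tracewayapp/traceway/releases?per_page=20" \
  | grep -o "https://[^\"]*traceway_[^\"]*_${OS}_${ARCH}\.tar\.gz" | head -1)
if [ -n "$URL" ]; then
  TMP=$(mktemp -d) && mkdir -p ~/.local/bin                 # the target dir may not exist yet
  curl -fsSL "$URL" | tar -xz -C "$TMP"
  install -m 755 "$TMP/traceway" ~/.local/bin/traceway && rm -rf "$TMP"
else
  echo "no CLI release asset found for ${OS}_${ARCH}; build from source instead" >&2
fi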

Make sure ~/.local/bin is on PATH (or install to /usr/local/bin). Fallback, build from source (requires Go):

git clone https://github.com/tracewayapp/traceway && cd traceway/cli
go build -o bin/traceway ./cmd/traceway && install -m 755 bin/traceway ~/.local/bin/traceway

Verify with traceway version.

3. Authenticate

Login prompts for the password interactively, so ask the user to run it themselves (in Claude Code, suggest typing ! traceway login --url https://<instance> so the output lands in the session):

traceway login --url https://<traceway-instance>

Non-interactive alternative when the password is in a secret store (never echo a password into the command line or shell history):

printf '%s' "$TRACEWAY_PASSWORD" | traceway login --url https://<instance> --username <email> --password-stdin

Multiple instances or accounts coexist via profiles: traceway login --url ... --profile work, then traceway profiles list / traceway profiles use work.

4. Select a project and smoke-check

traceway projects list
traceway projects use <project-id>
traceway exceptions list --since 24h

The selected project is used implicitly by all subsequent commands.

Flow: Debug

/traceway debug issue X or /traceway debug <free-form bug description>.

1. Resolve the issue reference

X can be several things; resolve it to an exception hash (16 hex chars):

| Reference looks like | How to resolve |
| --- | --- |
| Dashboard URL | See "Resolving Dashboard URLs" above; for `/issues/...` URLs the path segment right after `/issues/` is the hash, and `?preset`/`?from`/`?to` give the time window |
| Bare 16-char hex string | Already the hash |
| Anything else (title, error message, type, file name) | Search: `traceway exceptions list --since 7d --search "<text>"`; widen to `--since 30d` (and `--include-archived`) if empty |
| No issue reference, just a bug description | Skip to triage below |

When a search returns multiple groups, show a shortlist (hash, count, lastSeen, first stack line) and ask the user which one before drilling in.

traceway exceptions list --since 7d --search "checkout" --output json \
  | jq '.data[]? | {hash: .exceptionHash, count, lastSeen, top: (.stackTrace | split("\n")[0])}'

2. Drill into the issue

traceway exceptions show <hash>

This is the high-value call: full stack trace, occurrence list with recordedAt, attributes (user IDs, app versions, request context), and optional distributedTraceId / sessionId per occurrence. firstSeen correlates with deploys: a group that first appeared right after a release points at that release's diff. A bogus hash exits 5 with not_found; fall back to search.

3. Triage and correlate (also the entry point for free-form bug descriptions)

From the description extract symptom, affected endpoint/feature, and time window, then read several signals before forming a hypothesis:

traceway exceptions list --since 24h --order-by lastSeen        # what is erroring (firstSeen for regressions, count for volume)
traceway logs query --since 24h --min-severity 17               # errors and worse
traceway logs query --since 24h --search "payment declined"     # search log bodies
traceway logs query --since 24h --service checkout-api --min-severity 13
traceway endpoints list --since 24h --search "checkout"         # latency p50/p95/p99 and error counts, --order-by impact|count|p95|lastSeen

Severity is an OTel number, not a name: 1 TRACE, 5 DEBUG, 9 INFO, 13 WARN, 17 ERROR, 21 FATAL. The flag is --min-severity 17, never --severity error.
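
Since users tend to say "errors" or "warnings" rather than numbers, a name-to-number lookup avoids passing a name to the flag by mistake. `severity_number` is a hypothetical helper; the mapping itself is the one listed above.

```shell
# Translate a severity name into the OTel number expected by --min-severity.
severity_number() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    trace) echo 1 ;;
    debug) echo 5 ;;
    info)  echo 9 ;;
    warn)  echo 13 ;;
    error) echo 17 ;;
    fatal) echo 21 ;;
    *)     return 1 ;;      # unknown name: do not guess a number
  esac
}
```

A caller can then build the flag as `--min-severity "$(severity_number error)"`.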

Correlate by trace: when an occurrence or log line carries a trace ID, pull the whole request timeline; this is usually the fastest route to a root cause:

# "// empty" keeps a missing trace ID from reaching xargs as the literal string "null"
traceway exceptions show "$HASH" --output json | jq -r '.occurrences[0].distributedTraceId // empty' \
  | xargs -I{} traceway logs query --trace-id {} --output json

Pull the whole cross-service trace and the user's session, reusing the occurrence's recordedAt as the (mandatory) time hint so both lookups stay partition-bounded:

OCC=$(traceway exceptions show $HASH --output json | jq -c '.occurrences[0]')
TS=$(jq -r '.recordedAt' <<<"$OCC")
DT=$(jq -r '.distributedTraceId // empty' <<<"$OCC")
SID=$(jq -r '.sessionId // empty' <<<"$OCC")
[ -n "$DT" ]  && traceway traces show "$DT" --recorded-at "$TS"      # every endpoint/task/ai-trace/exception node across services
[ -n "$SID" ] && traceway sessions show "$SID" --started-at "$TS"    # the session + the exceptions that fired in it

traces show is usually the single highest-value RCA call: it stitches one logical request together end to end across services.

Check metrics for systemic causes (spikes lining up with firstSeen suggest saturation rather than a code bug):

traceway metrics query --name system.cpu.utilization --aggregation max --since 24h
traceway metrics query --name <name> --aggregation avg|sum|count|min|max [--tag key=value] [--group-by <tag>]

The CLI also accepts `p50|p95|p99`, but the server has no quantile aggregation for metric points and silently computes `avg` for them; never present those as percentiles. Latency percentiles come from `traceway endpoints list`, computed from raw request durations. There is no `metrics list`; a bogus name cleanly returns an empty `series: {}`, so probing names is safe. Host metrics from the Traceway OTel Agent live under `system.*` names, and OTLP histogram metrics are stored as two series, `<name>.avg` and `<name>.count`.
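
Because the percentile aggregations fail silently rather than loudly, a pre-flight check in a wrapper script can catch them before a misleading number reaches a report. This is a sketch with hypothetical helper names; the accepted list mirrors the aggregations named above.

```shell
# Accept only aggregations the server actually computes for metric points.
metric_aggregation_ok() {
  case "$1" in
    avg|sum|count|min|max) return 0 ;;
    p50|p95|p99)
      echo "server computes avg for $1; use 'traceway endpoints list' for latency percentiles" >&2
      return 1 ;;
    *)
      echo "unknown aggregation: $1" >&2
      return 1 ;;
  esac
}

# Print the two series names an OTLP histogram metric is stored under.
histogram_series() {
  printf '%s.avg\n%s.count\n' "$1" "$1"
}
```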

4. Correlate with the code

1. Open the files and lines named in the stack trace and read the failing path.
2. If the issue started at a known time, check what shipped just before it: `git log --since "<firstSeen - 24h>" --until "<firstSeen>"` (widen the window as needed) or the deploy history.
3. Form a hypothesis that explains ALL observations (error message, affected endpoint, timing, volume), not just the first stack frame.
4. Propose or implement the fix per the user's instruction.

5. Report and clean up

Summarize: symptom, evidence (exception hashes, log excerpts, metric anomalies), root cause, fix. Include traceway exceptions show <hash> references so the user can verify. After a fix is deployed and verified, archive only when the user asks:

traceway exceptions archive <hash> --yes

Flow: Query

For free-form requests ("what's broken in prod?", "is /api/checkout slow?", "show errors for service X"), use the read commands directly.

Command reference

| Command | Purpose |
| --- | --- |
| `traceway projects {list,use}` | List or select the active project |
| `traceway exceptions list` | Grouped exceptions; `--search`, `--search-type text\|regex`, `--order-by lastSeen\|firstSeen\|count`, `--include-archived` |
| `traceway exceptions show <hash>` | One group: full stack trace + occurrences |
| `traceway exceptions occurrence <id> --recorded-at <t>` | One occurrence by id (fast): full detail + `sessionId` + recording |
| `traceway exceptions archive/unarchive <hash>...` | Mutating; explicit user request + `--yes` only |
| `traceway logs query` | Logs; `--search` (`--search-type body\|attribute`), `--service`, `--min-severity <n>`, `--trace-id` |
| `traceway endpoints list` | Per-endpoint p50/p95/p99 and counts; `--search`, `--order-by impact\|count\|p95\|lastSeen` |
| `traceway endpoints show <id> --recorded-at <t>` | One request by id: span waterfall + linked errors |
| `traceway tasks show <id> --recorded-at <t>` | One background task run by id |
| `traceway ai-traces show <id> --recorded-at <t>` | One AI trace by id + its conversation |
| `traceway sessions show <id> --started-at <t>` | One session by id + the exceptions that fired in it |
| `traceway traces show <id> --recorded-at <t>` | Distributed trace: every service node sharing the id |
| `traceway metrics query --name <metric>` | Time series; `--aggregation`, `--tag`, `--group-by`, `--interval-minutes` |
| `traceway profiles {list,use}`, `login`, `logout`, `version` | Profile and session management |

The by-id `show` / `occurrence` commands take their id from a dashboard URL, a notification, or an `exceptions show` occurrence, and require the record's timestamp (`--recorded-at` / `--started-at`); see "Fast by-id lookups" above.

Not implemented yet (do not fabricate flags; point the user at the web UI): list verbs for tasks / sessions / ai-traces / traces (only by-id show exists for those), and metrics list/discover.

Recipes

# What's broken right now
traceway exceptions list --since 1h --order-by lastSeen --page-size 10 --output json \
  | jq '.data[]? | {hash: .exceptionHash, count, lastSeen}'

# Did anything NEW break since a deploy at 13:00 UTC
traceway exceptions list --from 2026-06-11T13:00:00Z --to "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --order-by firstSeen --output json \
  | jq '.data[]? | select(.firstSeen >= "2026-06-11T13:00:00Z") | {hash: .exceptionHash, firstSeen, count}'

# Worst endpoint by latency
traceway endpoints list --since 1h --order-by p95 --page-size 1 --output json | jq '.data[0]'

# Errors for one service (exceptions --search is free text, not a service filter; use logs)
traceway logs query --service checkout-api --min-severity 17 --since 1h --output json \
  | jq '.data[]? | {timestamp, body, traceId}'

Empty results (data: null or data: []) are not errors: widen the window, re-check the active project (traceway projects list), and if the app was never connected to Traceway, set it up first (the traceway-setup skill).
