From posthog
Audits all PostHog data warehouse source connections, sync schemas, and webhook channels, producing a prioritized report grouped by severity with recommended next steps.
Slash command
/posthog:auditing-warehouse-source-health
This skill produces a project-wide audit of the source and sync side of the data warehouse pipeline — source
connections, sync schemas, and webhook push channels. Use it when the user wants a summary of what's broken with
their imports, not a deep-dive on one sync. The deep-dive on individual failures is
diagnosing-failed-warehouse-syncs; this skill is the scan that tells them where to look first.
The same underlying endpoint (data-warehouse-data-health-issues-retrieve) also reports materialized-view,
batch-export-destination, and transformation issues. Materialized views are covered by
auditing-warehouse-view-health. Destinations (batch exports) and transformations are owned by other products — surface
them if they appear, but route them to the relevant team rather than diagnosing here.
| Tool | Purpose |
|---|---|
| data-warehouse-data-health-issues-retrieve | One-shot: all failed/degraded items across the whole pipeline |
| external-data-sources-list | All sources with status and latest error |
| external-data-schemas-list | All schemas with status, last_synced_at, latest_error |
| external-data-sources-webhook-info-retrieve | Check per-source webhook state (not covered by data-health-issues) |
The data-health-issues endpoint aggregates across the whole pipeline — it's the fastest path to a summary. Filter
its results to the source and external_data_sync types for this audit. Use the list endpoints when you need more
context than the summary provides (row counts, non-failing items, schema-level detail).
From the data-health endpoint, this audit cares about two of the five categories:
| type | Trigger | Typical urgency |
|---|---|---|
| source | ExternalDataSource.status = Error (whole source connection broken) | High |
| external_data_sync | Schema in Failed or BillingLimitReached state (the data-health endpoint returns status: "failed" or status: "billing_limit" respectively) | Medium–High |
Each entry includes id, name, type, status, error, failed_at, url, and source_type.
The other categories the endpoint returns are out of scope for this skill:
- materialized_view → auditing-warehouse-view-health
- destination (batch export) → owned by the batch exports / data pipelines product
- transformation (HogFunction) → owned by the CDP / ingestion side

Note the data-health endpoint only reports active failures. For source/sync health it doesn't flag:

- Schemas that are paused or disabled (should_sync = false)
- Stale schemas whose status still reads Completed
- Broken webhooks on sync_type: "webhook" schemas. The bulk-sync safety net can succeed while the webhook push channel is silently broken (deregistered, disabled on the remote side, failing signature verification). These don't surface in data-health-issues; check per-source with webhook-info-retrieve.

If the user asks about staleness or unused items, reach beyond this endpoint (see Step 4).
Call data-warehouse-data-health-issues-retrieve and keep the source and external_data_sync entries.
If there are no source/sync issues, tell the user their sources are healthy and stop. Don't invent problems.
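The filter-then-stop step above can be sketched as follows. This is a minimal illustration, assuming the tool's response has already been unwrapped into a list of dicts carrying the documented fields; the real response envelope may differ, and `split_issues` and the sample values are hypothetical.

```python
from typing import Any

# The two types this audit covers, per the table above.
IN_SCOPE_TYPES = {"source", "external_data_sync"}


def split_issues(
    issues: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split data-health entries into (in scope for this audit, to route elsewhere)."""
    in_scope = [i for i in issues if i.get("type") in IN_SCOPE_TYPES]
    out_of_scope = [i for i in issues if i.get("type") not in IN_SCOPE_TYPES]
    return in_scope, out_of_scope


# Hypothetical sample entries; the values are invented for illustration only.
sample = [
    {"id": "1", "name": "Stripe", "type": "source", "status": "failed"},
    {"id": "2", "name": "hubspot.contacts", "type": "external_data_sync", "status": "billing_limit"},
    {"id": "3", "name": "daily_revenue", "type": "materialized_view", "status": "failed"},
]

in_scope, out_of_scope = split_issues(sample)
if not in_scope:
    print("Sources are healthy.")  # stop here; don't invent problems
```

Keeping the out-of-scope entries in a separate list makes it easy to surface them and route them to the owning team without diagnosing them.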
When issues exist, triage them, separating:

- status: "billing_limit" entries (a billing issue, not a technical one; flag it and route to billing)
- Failed on heavily-used tables (ask the user, or check row counts via external-data-schemas-list if needed)
- Failed on less-used tables

Render a prioritized report. Don't dump the raw JSON; use a human-readable table per category:
## Data warehouse source health — 4 issues
### 🔴 Sources (1)
- Stripe — authentication failed (failed 2h ago). All 8 tables under it are currently dead.
→ `diagnosing-failed-warehouse-syncs` on this source
### 🟠 Sync schemas (3)
- postgres_prod.orders (Failed 6h ago) — column "updated_at" does not exist
- postgres_prod.invoices (Failed 6h ago) — column "updated_at" does not exist
- hubspot.contacts (BillingLimitReached) — team quota exceeded
Recommended order:
1. Stripe auth (everything under it is dead)
2. Schema-drift on postgres_prod.orders / invoices — looks like upstream renamed a column
3. Billing limit on hubspot
The exact format is less important than: prioritized, grouped, actionable, and hinting at the right next skill.
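One way to turn filtered entries into a grouped report is sketched below. It assumes each entry is a dict with the documented `type`, `name`, `status`, and `error` fields; the group order and headings simply mirror the example report, and `render_report` is a hypothetical helper, not part of any PostHog tool.

```python
from collections import defaultdict
from typing import Any

# Assumed display order and headings, mirroring the example report above.
GROUPS = [("source", "Sources"), ("external_data_sync", "Sync schemas")]


def render_report(issues: list[dict[str, Any]]) -> str:
    """Group in-scope issues by type and render one heading plus bullets per group."""
    by_type: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for issue in issues:
        by_type[issue["type"]].append(issue)

    total = sum(len(by_type[type_key]) for type_key, _ in GROUPS)
    lines = [f"## Data warehouse source health: {total} issues"]
    for type_key, heading in GROUPS:
        entries = by_type[type_key]
        if not entries:
            continue  # skip empty groups rather than printing "(0)"
        lines.append(f"### {heading} ({len(entries)})")
        for entry in entries:
            error = entry.get("error") or "no error message"
            lines.append(f"- {entry['name']} ({entry['status']}): {error}")
    return "\n".join(lines)
```

The recommended order and next-skill hints are left out here because they depend on judgment about which tables matter most to the user.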
If the user wants more than just "what's on fire" — e.g. "what else should I look at?" — cross-check:
Stale but "Completed" schemas:
Call external-data-schemas-list and look for schemas with old last_synced_at relative to their sync_frequency.
A schema on 1hour frequency that last synced 3 days ago is effectively broken even if status says Completed.
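The staleness heuristic can be sketched like this. It assumes `last_synced_at` is an ISO 8601 timestamp and that a schema counts as stale once it is more than three intervals overdue; both the tolerance and every frequency value other than `1hour` are assumptions, and `is_stale` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Any

# Only "1hour" appears in the text above; the other keys are assumed examples.
FREQUENCY_TO_INTERVAL = {
    "1hour": timedelta(hours=1),
    "6hour": timedelta(hours=6),    # assumed value
    "24hour": timedelta(hours=24),  # assumed value
}


def is_stale(schema: dict[str, Any], now: datetime, tolerance: float = 3.0) -> bool:
    """True if a Completed schema last synced more than `tolerance` intervals ago."""
    if schema.get("status") != "Completed":
        return False  # active failures are already reported by data-health-issues
    interval = FREQUENCY_TO_INTERVAL.get(schema.get("sync_frequency"))
    last = schema.get("last_synced_at")
    if interval is None or not last:
        return False  # unknown frequency or never synced: don't guess
    last_dt = datetime.fromisoformat(last.replace("Z", "+00:00"))
    if last_dt.tzinfo is None:
        last_dt = last_dt.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    return now - last_dt > interval * tolerance
```

Passing `now` in explicitly keeps the check deterministic, which helps when comparing many schemas from a single external-data-schemas-list call.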
Sources with zero sync activity:
Sources where every schema has should_sync: false or status = Paused. These were set up and then abandoned —
candidates for cleanup via external-data-sources-destroy.
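A sketch of the abandoned-source check, assuming each schema dict exposes `should_sync`, `status`, and a `source_id` key linking it to its parent source. The `source_id` field name is an assumption; use whatever the schemas-list response actually provides. `abandoned_source_ids` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any


def abandoned_source_ids(schemas: list[dict[str, Any]]) -> list[str]:
    """Return ids of sources where every schema is disabled or paused."""
    by_source: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for schema in schemas:
        by_source[schema["source_id"]].append(schema)  # "source_id" is an assumed field name

    return sorted(
        source_id
        for source_id, items in by_source.items()
        if all(
            not s.get("should_sync", False) or s.get("status") == "Paused"
            for s in items
        )
    )
```

The result is a list of cleanup candidates to report. Deleting a source is a destructive fix, so it still needs explicit user confirmation.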
Broken webhooks on webhook-type schemas:
Iterate the sources that have any schema with sync_type: "webhook" (visible via external-data-schemas-list). For
each, call external-data-sources-webhook-info-retrieve({source_id}):
- exists: false while a schema is sync_type: "webhook" → webhook was never registered, or was deleted. Push channel is dead; only the bulk fallback is ingesting.
- external_status.error present → remote service is reporting a problem (permission revoked, endpoint deleted on their dashboard).
- external_status.status not "enabled" → remote has disabled the endpoint (often after repeated delivery failures).

Report these separately from the primary audit: they're a different shape of problem than failed syncs, and the fix is a different skill (diagnosing-failed-warehouse-syncs scenario I, or setting-up-a-data-warehouse-source step 5.5).
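The three webhook conditions can be expressed as a small classifier. This sketch assumes the webhook-info response is a dict with an `exists` flag and an optional nested `external_status` object. `webhook_findings` is a hypothetical helper, and treating a missing `status` as "unknown, don't flag" is an assumption.

```python
from typing import Any


def webhook_findings(info: dict[str, Any]) -> list[str]:
    """Map one webhook-info response to human-readable findings (empty list = healthy)."""
    if not info.get("exists", False):
        # Never registered, or deleted: only the bulk fallback is ingesting.
        return ["webhook missing: push channel is dead, only the bulk fallback is ingesting"]

    findings: list[str] = []
    external = info.get("external_status") or {}
    if external.get("error"):
        findings.append(f"remote service reports a problem: {external['error']}")
    status = external.get("status")
    if status is not None and status != "enabled":
        findings.append(f"remote endpoint status is {status!r}, expected 'enabled'")
    return findings
```

Call it once per source that has at least one sync_type: "webhook" schema, and list the non-empty results in their own section of the report.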
Only run these extra checks if the user explicitly asks for a broader audit — they involve more tool calls and heuristics.
End the audit with a clear hand-off:
- diagnosing-failed-warehouse-syncs
- tuning-incremental-sync-config
- external-data-schemas-partial-update

Never start applying fixes autonomously from an audit; the audit's job is to report and recommend, not remediate. Any fix should be confirmed explicitly before executing.
- data-health-issues only surfaces active failures. For staleness or abandoned sources you need to cross-check the list endpoints. Only do this when the user explicitly asks for a deeper audit.
- Check webhook health with webhook-info-retrieve rather than inferring it from schema status.

npx claudepluginhub anthropics/claude-plugins-official --plugin posthog