From astronomer-data
Generates detailed profiles of database tables including metadata, row counts, column statistics, cardinality analysis, sample data, and quality checks for completeness, uniqueness, and freshness.
How this skill is triggered — by the user, by Claude, or both
Slash command
/astronomer-data:profiling-tablesThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Generate a comprehensive profile of a table that a new team member could use to understand the data.
Generate a comprehensive profile of a table that a new team member could use to understand the data.
Query column metadata:
SELECT COLUMN_NAME, DATA_TYPE, COMMENT
FROM <database>.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = '<schema>' AND TABLE_NAME = '<table>'
ORDER BY ORDINAL_POSITION
If the table name isn't fully qualified, search INFORMATION_SCHEMA.TABLES to locate it first.
Run via run_sql:
SELECT
COUNT(*) as total_rows,
COUNT(*) / 1000000.0 as millions_of_rows
FROM <table>
For each column, gather appropriate statistics based on data type:
SELECT
MIN(column_name) as min_val,
MAX(column_name) as max_val,
AVG(column_name) as avg_val,
STDDEV(column_name) as std_dev,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY column_name) as median,
SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as null_count,
COUNT(DISTINCT column_name) as distinct_count
FROM <table>
SELECT
MIN(LEN(column_name)) as min_length,
MAX(LEN(column_name)) as max_length,
AVG(LEN(column_name)) as avg_length,
SUM(CASE WHEN column_name IS NULL OR column_name = '' THEN 1 ELSE 0 END) as empty_count,
COUNT(DISTINCT column_name) as distinct_count
FROM <table>
SELECT
MIN(column_name) as earliest,
MAX(column_name) as latest,
DATEDIFF('day', MIN(column_name), MAX(column_name)) as date_range_days,
SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as null_count
FROM <table>
For columns that look like categorical/dimension keys:
SELECT
column_name,
COUNT(*) as frequency,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) as percentage
FROM <table>
GROUP BY column_name
ORDER BY frequency DESC
LIMIT 20
This reveals:
Get representative rows:
SELECT *
FROM <table>
LIMIT 10
If the table is large and you want variety, sample from different time periods or categories.
Summarize quality across dimensions:
Provide a structured profile:
2-3 sentences describing what this table contains, who uses it, and how fresh it is.
| Column | Type | Nulls% | Distinct | Description |
|---|---|---|---|---|
| ... | ... | ... | ... | ... |
List any data quality concerns discovered.
3-5 useful queries for common questions about this data.
npx claudepluginhub astronomer/agents --plugin astronomer-dataProfiles unfamiliar datasets: schema structure, column distributions, null rates, cardinality, outliers, table relationships, and temporal coverage. Onboard new data sources, audit freshness, or discover foreign keys.
Profiles tables or files (CSV, Excel, Parquet, JSON) to reveal shape, null rates, column distributions, top values, percentiles, data quality issues, and column categories.
Profiles an AIDP table via Spark SQL — row count, per-column null %, distinct count, min/max/mean, and top-K values. Use for data-quality snapshots or understanding dataset shape.