From oracle-ai-data-platform-workbench-engineer-agent
Validates AIDP tables against data-quality rules (not-null, uniqueness, range/set, referential integrity, freshness) using bounded Spark SQL. Reports pass/fail with violation counts and can persist rule sets for re-runs.
Slash command: /oracle-ai-data-platform-workbench-engineer-agent:aidp-data-quality
aidp-data-quality: rule checks via Spark SQL

Validate AIDP tables against explicit data-quality rules, each compiled to bounded Spark SQL and executed with the bundled helper. No MCP and no ai-data-engineer-agent repo is required.
| Rule | Check (counts violations) |
|---|---|
| not-null | `COUNT(*) WHERE col IS NULL` |
| unique | `COUNT(*) - COUNT(DISTINCT key)` (or `GROUP BY key HAVING COUNT(*) > 1`) |
| range / set | `COUNT(*) WHERE col NOT BETWEEN lo AND hi` (range) or `COUNT(*) WHERE col NOT IN (...)` (set) |
| referential | `COUNT(*) child LEFT JOIN parent ... WHERE parent.key IS NULL` |
| freshness | `MAX(ts)` vs. SLA (PASS when e.g. `datediff(current_date, MAX(ts)) <= N`) |
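The table above can be sketched as a small compiler from rule parameters to violation-count SQL. This is a minimal sketch, not the skill's own code: the function names (`not_null_sql`, `set_sql`, and so on) are hypothetical, and literal handling is deliberately limited to numbers and quote-free strings so the sketch does not have to guess at Spark string escaping. Every query returns a single column `v`, so PASS is simply `v == 0`.

```python
from typing import Sequence, Union

SqlValue = Union[int, float, str]


def _lit(value: SqlValue) -> str:
    """Render a value as a Spark SQL literal (sketch: no string escaping)."""
    if isinstance(value, str):
        # Refuse anything that would need escaping rather than guess at quoting rules.
        if "'" in value or "\\" in value:
            raise ValueError(f"unsupported string literal: {value!r}")
        return f"'{value}'"
    return repr(value)


def not_null_sql(table: str, col: str) -> str:
    return f"SELECT COUNT(*) AS v FROM {table} WHERE {col} IS NULL"


def unique_sql(table: str, key: str) -> str:
    # Extra rows per key value. COUNT(DISTINCT ...) ignores NULLs, so NULL keys count too.
    return f"SELECT COUNT(*) - COUNT(DISTINCT {key}) AS v FROM {table}"


def range_sql(table: str, col: str, lo: SqlValue, hi: SqlValue) -> str:
    # NULL NOT BETWEEN ... evaluates to NULL, so NULL rows are not counted here.
    return (f"SELECT COUNT(*) AS v FROM {table} "
            f"WHERE {col} NOT BETWEEN {_lit(lo)} AND {_lit(hi)}")


def set_sql(table: str, col: str, allowed: Sequence[SqlValue]) -> str:
    if not allowed:
        raise ValueError("allowed set must be non-empty")
    values = ", ".join(_lit(v) for v in allowed)
    # As with range_sql, NULL rows are not counted.
    return f"SELECT COUNT(*) AS v FROM {table} WHERE {col} NOT IN ({values})"


def referential_sql(child: str, fk: str, parent: str, pk: str) -> str:
    # Child rows with no matching parent key; child rows whose fk is NULL also count.
    return (f"SELECT COUNT(*) AS v FROM {child} c "
            f"LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
            f"WHERE p.{pk} IS NULL")


def freshness_sql(table: str, ts: str, max_age_days: int) -> str:
    # 1 (one violation) if the newest row is older than the SLA or the table is empty.
    return (f"SELECT CASE WHEN MAX({ts}) IS NULL "
            f"OR datediff(current_date(), MAX({ts})) > {int(max_age_days)} "
            f"THEN 1 ELSE 0 END AS v FROM {table}")


if __name__ == "__main__":
    print(set_sql("cat.sch.orders", "status", ["OPEN", "CLOSED"]))
```

Because `NOT BETWEEN` and `NOT IN` skip NULL rows, a range or set rule is usually paired with a not-null rule on the same column.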
… .aidp/catalog.md for referential checks (don't guess).
Pull rule definitions from .aidp/semantic.md value dictionaries where available.

… aidp-cluster-ops / oci raw-request), then for each rule run the violation-count SQL with the bundled helper (PASS if 0, else FAIL):
```bash
python "$PLUGIN_DIR/scripts/aidp_sql.py" --region <region> --datalake <DATALAKE_OCID> --workspace <ws> \
  --cluster <cluster-key> \
  --code "spark.sql('''SELECT COUNT(*) AS v FROM cat.sch.t WHERE col IS NULL''').show()"
```
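When checks are driven from a script rather than typed by hand, the same invocation can be assembled as an argument list. This is a sketch under assumptions: `build_argv` and `run_check` are hypothetical names, only the flags from the command above are used, and `run_check` assumes the helper prints a single JSON object (carrying `status`, `outputs`, and `spark_job_ids`) on stdout.

```python
import json
import os
import subprocess
import sys
from typing import List


def build_argv(plugin_dir: str, region: str, datalake: str, workspace: str,
               cluster: str, sql: str) -> List[str]:
    """Assemble the aidp_sql.py command line for one violation-count query."""
    # repr() yields a valid Python string literal for any SQL text, quotes included.
    code = f"spark.sql({sql!r}).show()"
    return [
        sys.executable,  # current interpreter instead of a bare "python"
        os.path.join(plugin_dir, "scripts", "aidp_sql.py"),
        "--region", region,
        "--datalake", datalake,
        "--workspace", workspace,
        "--cluster", cluster,
        "--code", code,
    ]


def run_check(argv: List[str]) -> dict:
    """Run the helper; assumes it prints a single JSON object on stdout."""
    proc = subprocess.run(argv, capture_output=True, text=True, check=False)
    if not proc.stdout.strip():
        raise RuntimeError(f"no JSON on stdout (exit {proc.returncode}): {proc.stderr}")
    return json.loads(proc.stdout)
```

Passing a list to `subprocess.run` avoids shell quoting entirely, and `repr(sql)` keeps SQL that contains single quotes (for example a set rule's `NOT IN ('A', 'B')`) intact on its way into `--code`. The parsed result's `status` should be `ok` before any count in it is trusted.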
It mints a UPST from the api_key DEFAULT profile, auto-creates a scratch notebook, and returns JSON with
status / outputs / spark_job_ids. No AIDP_SESSION is required (--session-profile is optional).

… LIMIT query.

… aidp-pipelines) as a gating task.

Register validated rules in .aidp/dq-rules.md so they can be re-run later (the quality analogue of
.aidp/verified-queries.md). One entry per rule records the target table/column, rule-type (the five types
above), the violation-SQL (counts violations → PASS when 0), and last-result / last-checked. To
re-run, execute each entry's stored violation-SQL via scripts/aidp_sql.py, set the result to PASS (0) or
FAIL (<count>), and record the cluster + date — never mark PASS without a status: ok run returning 0.
Format and re-run rules: references/dq-rules.md.
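The re-run bookkeeping can be sketched as a record plus one update function. The fields mirror the entry fields described above, but the class, `record_result`, the `"range/set"` spelling, and the `"<cluster> <date>"` format of `last_checked` are assumptions; the actual Markdown layout lives in references/dq-rules.md and is not reproduced here. The function refuses to record anything unless the run reported `status: ok`, so a rule can only become PASS from a successful query that returned 0.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional

# The five rule types from the table (hypothetical spelling).
RULE_TYPES = {"not-null", "unique", "range/set", "referential", "freshness"}


@dataclass(frozen=True)
class DQRule:
    """One hypothetical .aidp/dq-rules.md entry."""
    target: str                         # table/column, e.g. "cat.sch.t.col"
    rule_type: str                      # one of RULE_TYPES
    violation_sql: str                  # counts violations; PASS when it returns 0
    last_result: Optional[str] = None   # "PASS" or "FAIL (<count>)"
    last_checked: Optional[str] = None  # "<cluster> <ISO date>" (assumed format)

    def __post_init__(self) -> None:
        if self.rule_type not in RULE_TYPES:
            raise ValueError(f"unknown rule type: {self.rule_type!r}")


def record_result(rule: DQRule, status: str, violations: int, cluster: str,
                  on: Optional[date] = None) -> DQRule:
    """Return a copy of `rule` updated from one aidp_sql.py run."""
    if status != "ok":
        # Never record PASS (or FAIL) from a failed run: fix the SQL and retry instead.
        raise RuntimeError(f"run did not succeed (status={status!r}); fix the SQL and retry")
    if violations < 0:
        raise ValueError("violation count cannot be negative")
    result = "PASS" if violations == 0 else f"FAIL ({violations})"
    checked_on = (on or date.today()).isoformat()
    return replace(rule, last_result=result, last_checked=f"{cluster} {checked_on}")
```

Returning a new frozen record rather than mutating in place keeps the previous result available until the new one has been written back to the rules file.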
… scripts/aidp_sql.py; never assert a rule passed without a status: ok result.

… status: error, read the error, fix the SQL grounded in the catalog, and retry.

… .aidp/dq-rules.md rule-set format + re-run)

```bash
npx claudepluginhub anthropics/claude-plugins-official --plugin oracle-ai-data-platform-workbench-engineer-agent
```

2 plugins reuse this skill.
First indexed Jun 12, 2026
Defines DQX data quality rules for PySpark DataFrames or Delta tables using Python classes (DQRowRule, DQDatasetRule, DQForEachColRule) or YAML/JSON metadata. Supports filters, custom checks, and criticality levels.
Profiles an AIDP table via Spark SQL — row count, per-column null %, distinct count, min/max/mean, and top-K values. Use for data-quality snapshots or understanding dataset shape.
Validates data quality using Great Expectations, dbt tests, and data contracts for formal rules, expectation suites, checkpoints, and CI/CD pipelines.