From oracle-ai-data-platform-workbench-engineer-agent
Runs write-side Spark SQL on an AIDP lakehouse: DDL (CREATE/ALTER/DROP), DML (INSERT/UPDATE/DELETE/MERGE upsert), and Delta maintenance (OPTIMIZE, VACUUM, time travel, RESTORE, schema evolution, liquid clustering). Use for any non-SELECT SQL.
How this skill is triggered — by the user, by Claude, or both
Slash command
/oracle-ai-data-platform-workbench-engineer-agent:aidp-sql-ddlThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
AIDP runs **Spark 3.5 + Delta Lake 3.2** (platform-ref §39), so the full write grammar works through the
aidp-sql-ddl — lakehouse write SQL (DDL / DML / Delta maintenance)AIDP runs Spark 3.5 + Delta Lake 3.2 (platform-ref §39), so the full write grammar works through the
bundled scripts/aidp_sql.py helper — the same engine aidp-analyzing-data uses for SELECT. This skill
covers everything that changes data or schema. Live-verified on AIDP 2026-06-10 (USER cluster, via
aidp_sql.py, all un-wrapped → status: ok): CREATE TABLE, INSERT, UPDATE, DELETE, MERGE,
OPTIMIZE, VACUUM, DESCRIBE HISTORY, time travel (VERSION AS OF), DROP — the full Spark 3.5 / Delta
3.2 write grammar (platform-ref §39) executes end-to-end. (A tester once saw the first bare CREATE TABLE on a
brand-new instance throw a transient catalog-registration error; it was not reproducible on a fresh
instance — see Caveat below.)
aidp-analyzing-data. NOT for control-plane catalog/connection registration
→ aidp-table-management.Every statement here changes state. Before running any DML/DDL:
aidp-cluster-ops) and qualify objects fully (catalog.schema.table).DROP, DELETE,
TRUNCATE, INSERT OVERWRITE, VACUUM (VACUUM permanently removes old files → breaks time travel)..aidp/payloads/ per references/payloads.md.SELECT count(*) … WHERE <predicate> before the matching UPDATE/DELETE.python "$PLUGIN_DIR/scripts/aidp_sql.py" \
--region <region> --datalake <DATALAKE_OCID> --workspace <ws> --cluster <cluster-key> \
--code "spark.sql('''<SQL>''')"
Returns JSON {status, execution_count, outputs, spark_job_ids, error}. No MCP / no AIDP_SESSION needed.
| Need | SQL |
|---|---|
| Create managed Delta table | CREATE TABLE c.s.t (id INT, v STRING) USING DELTA |
| CTAS | CREATE TABLE c.s.t USING DELTA AS SELECT … |
| Liquid clustering | CREATE TABLE … CLUSTER BY (col) (or ALTER TABLE … CLUSTER BY) |
| Insert | INSERT INTO c.s.t VALUES (…) · full reload INSERT OVERWRITE c.s.t SELECT … |
| Update / Delete | UPDATE c.s.t SET v='x' WHERE … · DELETE FROM c.s.t WHERE … |
| Upsert (MERGE) | MERGE INTO c.s.t t USING src s ON t.id=s.id WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT * |
| Schema evolution | ALTER TABLE c.s.t ADD COLUMNS (x INT) · … RENAME COLUMN · … DROP COLUMN; or write with mergeSchema |
| Compact | OPTIMIZE c.s.t [ZORDER BY (col)] |
| Reclaim files | VACUUM c.s.t RETAIN 168 HOURS (destructive — confirm; <168h needs spark.databricks.delta.retentionDurationCheck.enabled=false) |
| History | DESCRIBE HISTORY c.s.t |
| Time travel | SELECT * FROM c.s.t VERSION AS OF 3 · … TIMESTAMP AS OF '2026-06-01' |
| Restore | RESTORE TABLE c.s.t TO VERSION AS OF 3 |
| View | CREATE VIEW c.s.v AS SELECT … · CREATE OR REPLACE VIEW … · DROP VIEW … |
| Schema (namespace) | CREATE SCHEMA IF NOT EXISTS c.s · DROP SCHEMA c.s [CASCADE] |
CREATE TABLE report (NOT reproduced)A tester once reported that the first bare CREATE TABLE … USING DELTA on a just-provisioned DataLake threw
ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0 (a suspected catalog-registration race,
not a SQL error). This could NOT be reproduced. On a genuinely freshly-provisioned DataLake + cluster, the
very first bare CREATE TABLE succeeded (live-checked 2026-06-12 on a brand-new instance — table created,
confirmed via SHOW TABLES; also fine on established instances). So treat it as a possible transient in a
narrow post-provision window, not a deterministic behavior. If you ever do hit it, don't hammer the DDL —
create the first table via the writer (spark.createDataFrame(rows).write.saveAsTable('c.s.t')) or
ingest-then-CTAS (aidp-ingest-file-to-table) to warm the catalog, then bare CREATE TABLE works.
.aidp/catalog.md or a DESCRIBE/SHOW COLUMNS cell.error field, fix grounded in the catalog, retry once — don't guess repeatedly.aidp-pipelines).oci:// table reads/writes from other sources → delegate to the
…-spark-connectors plugin (aidp-iceberg, aidp-object-storage); this skill is lakehouse-native SQL.2plugins reuse this skill
First indexed Jun 12, 2026
npx claudepluginhub anthropics/claude-plugins-official --plugin oracle-ai-data-platform-workbench-engineer-agentImplements GDPR compliance, PII tagging/masking, data retention policies, and deletion workflows in Databricks Delta Lake using Unity Catalog and PySpark.
Develops Lakeflow Spark Declarative Pipelines on Databricks for batch and streaming data pipelines using Python or SQL. Guides dataset types like Streaming Tables and features like Auto Loader, Auto CDC via decision tree.
Develops Microsoft Fabric Spark/data engineering workflows and writes code in Fabric Notebook cells (PySpark, Scala, SparkR, SQL). Manages workspaces, lakehouses, notebooks, pipelines, and Materialized Lake Views (MLVs).