Connect Oracle AIDP Spark notebooks to 23+ enterprise data sources (Oracle DB, AWS S3, Azure ADLS, Salesforce, Snowflake, PostgreSQL, MySQL, etc.) for read/write data pipelines using JDBC, REST, and cloud-native connectors.
Connect from an AIDP notebook to Oracle AI Lakehouse (ALH), Autonomous Data Warehouse (ADW), or Autonomous Transaction Processing (ATP) via Spark JDBC. Use when the user mentions ALH, AI Lakehouse, ADW, ATP, Autonomous Database, or wants to query a 26ai-backed Oracle Autonomous DB from Spark. Covers wallet (mTLS), IAM DB-Token (with on-executor refresh for long jobs), and API Key auth paths.
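A minimal sketch of the wallet (mTLS) path. It assumes the wallet zip has been unzipped to a directory that the driver and every executor can read, and that the Oracle JDBC driver (plus any wallet/PKI jars your driver version needs) is on the cluster classpath. `adb_wallet_jdbc_options`, the alias `mydb_high`, and the paths are illustrative, not the plugin's own helpers; the IAM DB-Token and API Key paths are not shown.

```python
def adb_wallet_jdbc_options(tns_alias: str, wallet_dir: str, user: str,
                            password: str, table: str) -> dict:
    """Build Spark JDBC reader options for an Autonomous DB wallet (mTLS) connection.

    Hypothetical helper: tns_alias is a service name from the wallet's
    tnsnames.ora (e.g. "mydb_high"); wallet_dir is the unzipped wallet folder.
    """
    if not tns_alias or not wallet_dir:
        raise ValueError("tns_alias and wallet_dir are required")
    # TNS_ADMIN tells the Oracle thin driver where to find tnsnames.ora,
    # sqlnet.ora and the wallet files for the mTLS handshake.
    url = f"jdbc:oracle:thin:@{tns_alias}?TNS_ADMIN={wallet_dir}"
    return {
        "url": url,
        "driver": "oracle.jdbc.OracleDriver",
        "dbtable": table,
        "user": user,
        "password": password,
    }
```

In a notebook the result would be passed as `spark.read.format("jdbc").options(**opts).load()`; keep the password in a credential store rather than in notebook source.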
Read and write AWS S3 (`s3a://`) from an AIDP notebook. Use when the user mentions S3, AWS S3 bucket, s3a, or has AWS access keys. Auth is access key + secret key via the Hadoop S3A connector. boto3 is also available for non-Spark management operations (list, copy).
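A sketch of the S3A settings for the access-key path, using property names from the Hadoop S3A connector. The helper names are hypothetical, and the region-to-endpoint mapping assumes standard commercial AWS regions. In a running notebook these keys are typically set on the Hadoop configuration, or given the `spark.hadoop.` prefix in the cluster's Spark config.

```python
from typing import Dict, Optional


def s3a_hadoop_conf(access_key: str, secret_key: str,
                    region: Optional[str] = None) -> Dict[str, str]:
    """Hadoop S3A properties for static access-key + secret-key auth."""
    conf = {
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Use only the static keys above, not the default provider chain.
        "fs.s3a.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",
    }
    if region:
        # Assumption: standard regional endpoint naming.
        conf["fs.s3a.endpoint"] = f"s3.{region}.amazonaws.com"
    return conf


def as_spark_conf(hadoop_conf: Dict[str, str]) -> Dict[str, str]:
    """Prefix Hadoop keys so they can go into a Spark config."""
    return {f"spark.hadoop.{k}": v for k, v in hadoop_conf.items()}


def to_s3a_uri(uri: str) -> str:
    """Rewrite s3:// or s3n:// URIs to the s3a:// scheme Spark reads through."""
    if uri.startswith("s3a://"):
        return uri
    for scheme in ("s3://", "s3n://"):
        if uri.startswith(scheme):
            return "s3a://" + uri[len(scheme):]
    raise ValueError(f"not an S3 URI: {uri}")
```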
Read and write Azure Data Lake Storage Gen2 (`abfss://`) from an AIDP notebook. Use when the user mentions ADLS, Azure Data Lake, abfss, or wants to ingest from a multi-cloud Azure source. Auth is OAuth client-credentials (Service Principal client_id + secret + tenant).
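A sketch of the Service Principal (OAuth client-credentials) configuration, using the per-account property names from the Hadoop ABFS connector. The helper names and the storage account, container, and path values are placeholders; how AIDP injects these into the session is not shown.

```python
from typing import Dict


def abfs_oauth_conf(account: str, client_id: str, client_secret: str,
                    tenant_id: str) -> Dict[str, str]:
    """Per-account ABFS properties for Service Principal OAuth."""
    suffix = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        # Microsoft Entra ID token endpoint for the tenant.
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an abfss:// URI (TLS) for a container path."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"
```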
First-time setup. Use when the user wants to install/upload the AIDP Spark connectors helper package into their AIDP workspace, or has just installed this plugin and asks "how do I set it up", "first-time setup", "install the helpers", "bootstrap aidp connectors". Drives the AIDP MCP tools to push the helper package to /Workspace/Shared/ and runs a sanity import.
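The sanity-import step might look roughly like this. It assumes the helper package is uploaded as an ordinary Python package directory under `/Workspace/Shared/`; `sanity_import` and the package name are placeholders, and the upload itself (done through the AIDP MCP tools) is not shown.

```python
import importlib
import sys
from pathlib import Path


def sanity_import(package: str, shared_dir: str = "/Workspace/Shared"):
    """Make shared_dir importable, then import `package` or fail with a clear message."""
    root = str(Path(shared_dir))
    if root not in sys.path:
        sys.path.insert(0, root)
    # Files uploaded after the interpreter started may not be visible to
    # cached path finders until caches are invalidated.
    importlib.invalidate_caches()
    try:
        return importlib.import_module(package)
    except ModuleNotFoundError as exc:
        raise RuntimeError(
            f"{package!r} could not be imported from {root}; upload it first"
        ) from exc
```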
Help the user pick the right connector skill for their data source from an AIDP notebook. Use as a router when the user mentions multiple sources, isn't sure which connector applies, or asks "how do I connect to X from AIDP". Covers 23 data sources — Oracle Autonomous DB family (ALH/ADW/ATP), generic Oracle DB, ExaCS, PeopleSoft, Siebel, Fusion ERP/BICC, EPM Cloud, Essbase, OCI Streaming, Object Storage, Iceberg, plus PostgreSQL, MySQL/HeatWave, SQL Server, Hive, Snowflake, Azure ADLS, AWS S3, Salesforce, generic REST, custom JDBC, Excel.
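A toy keyword router illustrating the idea. The skill identifiers and keyword lists are hypothetical and cover only a few of the 23 sources; a real router would also have to disambiguate overlapping terms such as a bare "Oracle".

```python
import re
from typing import Dict, List

# Hypothetical skill ids -> trigger keywords (lowercase).
ROUTES: Dict[str, List[str]] = {
    "autonomous-db": ["alh", "ai lakehouse", "adw", "atp", "autonomous database"],
    "aws-s3": ["s3", "s3a", "aws"],
    "azure-adls": ["adls", "azure data lake", "abfss"],
    "snowflake": ["snowflake"],
    "postgresql": ["postgres", "postgresql"],
    "salesforce": ["salesforce"],
}


def route(question: str) -> List[str]:
    """Return every skill whose keywords appear as whole words in the question."""
    text = question.lower()
    hits = []
    for skill, keywords in ROUTES.items():
        if any(re.search(rf"\b{re.escape(k)}\b", text) for k in keywords):
            hits.append(skill)
    return hits
```

Returning all matches, rather than the first one, lets a multi-source question ("join ADW with S3") fan out to several connector skills.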
This repository contains a curated collection of sample notebooks demonstrating how to build data pipelines, run machine learning workloads, and integrate AI capabilities using Oracle AI Data Platform (AIDP) Workbench — a unified, governed workspace for data engineering, ML, and AI development powered by Apache Spark.
Oracle AI Data Platform Workbench is a unified, governed workspace for building, managing, and deploying AI and data-driven solutions. It brings together notebooks, agent development, orchestration, and catalog management in a single collaborative platform — empowering teams to explore data, fine-tune models, and operationalize AI with trust and speed.
oracle-aidp-samples/
├── getting-started/          # Foundational notebooks for new users
│   ├── Delta_Lake/           # Delta Lake feature walkthroughs
│   └── migration/            # Migrating workloads to AIDP
├── data-engineering/
│   ├── ingestion/            # Connectors and data loading patterns
│   └── transformation/       # Pipeline architectures and table formats
│       ├── liquid-clustering/
│       ├── medallion-lake/
│       ├── scd/
│       └── streaming/
├── ai/
│   ├── agent-flows/          # Agent orchestration and scheduling
│   └── ml-datascience/       # ML, LLM, and AI service integrations
└── shared-utils/             # Reusable utilities and data generators
Foundational examples to help you get up and running on AIDP Workbench.
| Notebook | Description |
|---|---|
| Access ALH Data | Write and query data in Oracle Autonomous AI Lakehouse (ALH) using PySpark insertInto and SQL INSERT statements with external catalogs. |
| Access Object Storage Data | Read and write data from OCI Object Storage using direct access, external volumes, and external tables. |
| Analyse Data Using PySpark | PySpark fundamentals: catalog and schema setup, table creation, data insertion, schema exploration, and matplotlib visualizations. |
| Analyse Data Using SQL | Core SQL operations on AIDP including DataFrame creation, transformations, aggregations, and simple visualizations. |
| ALH External Catalog MERGE | End-to-end MERGE workflow into an ALH table via an AIDP external catalog: insert/update/delete with merge keys and OOS-staging skip optimization. |
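To make the MERGE workflow concrete, here is a sketch that builds a Spark SQL `MERGE INTO` statement from merge keys, with an optional soft-delete flag. The table names and the `is_deleted` column are hypothetical, identifiers are not quoted or escaped (do not pass untrusted input), and whether an ALH external catalog accepts this exact statement, as well as the notebook's OOS-staging skip optimization, are not represented here.

```python
from typing import List, Optional


def build_merge_sql(target: str, source: str, keys: List[str],
                    columns: List[str], delete_flag: Optional[str] = None) -> str:
    """Build an upsert (and optional delete) MERGE statement keyed on `keys`."""
    if not keys:
        raise ValueError("at least one merge key is required")
    non_keys = [c for c in columns if c not in keys]
    all_cols = list(keys) + non_keys
    on = " AND ".join(f"t.{k} = s.{k}" for k in keys)

    lines = [f"MERGE INTO {target} AS t", f"USING {source} AS s", f"ON {on}"]
    if delete_flag:
        # Conditional MATCHED clauses must come before the unconditional one.
        lines.append(f"WHEN MATCHED AND s.{delete_flag} THEN DELETE")
    if non_keys:
        updates = ", ".join(f"t.{c} = s.{c}" for c in non_keys)
        lines.append(f"WHEN MATCHED THEN UPDATE SET {updates}")
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"s.{c}" for c in all_cols)
    # Never insert rows that arrive already flagged as deleted.
    guard = f" AND NOT s.{delete_flag}" if delete_flag else ""
    lines.append(f"WHEN NOT MATCHED{guard} THEN INSERT ({insert_cols}) VALUES ({insert_vals})")
    return "\n".join(lines)
```

The generated string would then be run with `spark.sql(...)` against the staged source view.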

Delta Lake feature walkthroughs (`getting-started/Delta_Lake/`):

| Notebook | Description |
|---|---|
| Use Delta Lake Table | Comprehensive guide covering Delta table operations: updates, merges, time travel, liquid clustering, and vacuuming. |
| Delta Change Data Feed | Capture row-level changes (inserts, updates, deletes) from Delta tables for CDC, incremental processing, and streaming pipelines. |
| Handle Schema Evolution | Add and evolve columns in Delta tables without rewriting existing data, leveraging automatic schema evolution. |
| Delta UniForm Tables | Create Delta UniForm tables that automatically synchronize Iceberg metadata for cross-format interoperability. |
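Change Data Feed rows carry `_change_type` (`insert`, `update_preimage`, `update_postimage`, `delete`) and `_commit_version` columns. The pure-Python sketch below shows the usual incremental-processing step of collapsing them to one net change per key; in Spark you would do the same with a window over the DataFrame returned by `spark.read.format("delta").option("readChangeFeed", "true").option("startingVersion", n).table(...)`. The function name and sample rows are illustrative.

```python
from typing import Any, Dict, Iterable


def net_changes(change_rows: Iterable[Dict[str, Any]], key: str) -> Dict[Any, Dict[str, Any]]:
    """Collapse CDF rows to the latest net change per key.

    Result values are the last insert / update_postimage / delete row seen
    for each key, in commit-version order.
    """
    latest: Dict[Any, Dict[str, Any]] = {}
    # sorted() is stable, so rows within one commit keep their input order.
    for row in sorted(change_rows, key=lambda r: r["_commit_version"]):
        if row["_change_type"] == "update_preimage":
            continue  # old values; the matching postimage carries the new ones
        latest[row[key]] = row
    return latest
```

A downstream consumer would upsert keys whose net change is `insert` or `update_postimage` and delete keys whose net change is `delete`.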

Migrating workloads to AIDP (`getting-started/migration/`):

| Notebook | Description |
|---|---|
| Migrate Files from Databricks to AIDP | Recursively export notebooks and files from a Databricks workspace to AIDP using the databricks-sdk library. |
| Download from Git to AIDP | Download notebooks and files from a Git repository as a ZIP archive and extract them directly into an AIDP workspace volume. |
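The ZIP-based Git download can be sketched with the standard library alone. This assumes a public GitHub repository (private repositories need an auth token) and that the destination is a writable workspace volume path; `github_zip_url` and `extract_repo_zip` are hypothetical names, not the notebook's own code. The extractor drops the `<repo>-<branch>/` folder GitHub archives wrap everything in and rejects entries that would escape the destination.

```python
import io
import zipfile
from pathlib import Path
from typing import List


def github_zip_url(owner: str, repo: str, branch: str = "main") -> str:
    """Archive URL GitHub serves for a branch snapshot."""
    return f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.zip"


def extract_repo_zip(data: bytes, dest: str) -> List[str]:
    """Extract a repo archive into dest, stripping the top-level folder."""
    dest_path = Path(dest).resolve()
    written = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            parts = Path(info.filename).parts[1:]  # drop "<repo>-<branch>/"
            if info.is_dir() or not parts:
                continue
            target = dest_path.joinpath(*parts).resolve()
            # Guard against "zip slip" entries such as "../../etc/passwd".
            if dest_path not in target.parents:
                raise ValueError(f"unsafe path in archive: {info.filename}")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(info))
            written.append(target.relative_to(dest_path).as_posix())
    return written
```

Usage would be along the lines of `extract_repo_zip(urllib.request.urlopen(github_zip_url(owner, repo)).read(), volume_dir)`.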
Patterns for connecting to and loading data from a wide range of sources.
`npx claudepluginhub anthropics/claude-plugins-official --plugin oracle-ai-data-platform-workbench-spark-connectors`
Operate the entire Oracle AI Data Platform (AIDP) Workbench in natural language — a 37-skill agent (not a single-engine orchestrator). Discovers your catalog into a grounding cache (FK/join hints + value dictionaries), turns plain English into accurate Spark SQL, and runs the full lakehouse SQL lifecycle (CREATE/INSERT/UPDATE/DELETE/MERGE/OPTIMIZE/VACUUM/DESCRIBE HISTORY/time-travel on the Spark/Delta lakehouse). Ingests files, profiles data and sets quality rules, authors and repairs cron pipelines, provisions clusters (Compute/AI Compute), and debugs via the Spark UI. Governs the platform (roles + per-resource permissions, credential store, Delta Sharing, audit logs, user settings; plus native Git, bundles, and MLOps/MLflow in Preview) and ships AI — Agent Flows across 13 node types with guardrails (content moderation, prompt-attack prevention, PII detection), Knowledge Base RAG, high-code LangGraph/aidputils agents, and reusable Tools. A semantic model + verified-query repository are matched before free generation for accuracy. Signature differentiators: LLM-in-SQL via ai_generate('openai.gpt-5.4', '<prompt>') and cross-source federation in one Spark session. Runs via the official Oracle aidp CLI with an oci raw-request REST fallback — under either api_key or oci-session-token auth. Additive to your Oracle stack.
Databricks development toolkit with skills for data engineering, ML, and AI agents plus MCP tools for direct Databricks operations
Claude Code skill pack for Databricks (24 skills)
This plugin provides a specialized suite of skills for data engineers and database practitioners working on Google Cloud. It acts as an expert assistant, allowing you to use natural language prompts in your preferred coding agent to architect complex data pipelines, transform data with dbt, write Spark and BigQuery SQL notebooks, and orchestrate end-to-end workflows across GCP's data ecosystem.
Spec-Driven Development framework for Data Engineering — 58 agents, 24 KB domains, 5-phase SDD workflow, 31 commands
Data lake, analytics, and ETL workflows with S3 Tables, AWS Glue, and Athena. Covers managed Iceberg tables on S3 Tables, ingestion from JDBC databases (Oracle, SQL Server, PostgreSQL, MySQL, RDS), Amazon Redshift, Snowflake, BigQuery, and DynamoDB, AWS Glue Data Catalog inventory and asset discovery, federated Athena queries, and vector storage and semantic search on Amazon S3 Vectors.