From datafusion-skills
Registers Parquet, CSV, JSON, Arrow IPC, or Avro files as persistent external tables in DataFusion sessions. Auto-detects format, explores schema, and persists state for reuse across skills.
Slash command: /datafusion-skills:create-table
This skill is limited to the following tools:
You are helping the user register a data file as a persistent table in their DataFusion session.
File path given: $0
Additional arguments: ${1:-}
Follow these steps in order.
If $0 is a relative path, resolve it:
```bash
RESOLVED_PATH="$(cd "$(dirname "$0")" 2>/dev/null && pwd)/$(basename "$0")"
```
Check the file exists (for local files):
```bash
test -f "$RESOLVED_PATH" || test -d "$RESOLVED_PATH"
```
For directories (partitioned data), use the directory path as-is.
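The resolution step above can be wrapped in a small helper. This is a minimal sketch assuming bash, not part of the skill itself; the function name resolve_path is hypothetical. Unlike the one-liner, it returns an error when the parent directory is missing (the one-liner silently produces /<basename> in that case) and avoids a doubled slash for entries directly under the root directory.

```bash
#!/usr/bin/env bash
# resolve_path PATH: print PATH as an absolute path (hypothetical helper).
# Fails with status 1 if the parent directory does not exist.
resolve_path() {
  local input="$1"
  # The root directory is already absolute.
  if [ "$input" = "/" ]; then
    printf '/\n'
    return 0
  fi
  # Drop a trailing slash so "data/" resolves the same as "data".
  input="${input%/}"
  local parent base abs_parent
  parent="$(dirname -- "$input")"
  base="$(basename -- "$input")"
  # cd runs inside the command substitution's subshell, so the caller's
  # working directory is not changed.
  if ! abs_parent="$(cd -- "$parent" 2>/dev/null && pwd)"; then
    printf 'resolve_path: no such directory: %s\n' "$parent" >&2
    return 1
  fi
  # Avoid "//name" when the parent is the root directory.
  if [ "$abs_parent" = "/" ]; then
    printf '/%s\n' "$base"
  else
    printf '%s/%s\n' "$abs_parent" "$base"
  fi
}
```

A caller would then write RESOLVED_PATH="$(resolve_path "$0")" || exit 1 and still run the existence check afterwards.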
Check that datafusion-cli is installed:
```bash
command -v datafusion-cli
```
If not found, delegate to /datafusion-skills:install-datafusion.
If --format was specified, use that. Otherwise detect from extension:

| Extension | Format |
|---|---|
| .parquet, .pq | PARQUET |
| .csv, .tsv, .txt | CSV |
| .json, .jsonl, .ndjson | JSON |
| .arrow, .ipc, .feather | ARROW |
| .avro | AVRO |
| directory | PARQUET (default for partitioned data) |
If the extension is unknown, try Parquet first, then CSV.
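The extension table above maps directly onto a shell case statement. This is a sketch with a hypothetical function name, detect_format; it assumes extensions should match case-insensitively, which the table does not state. For an unknown extension it prints UNKNOWN and returns a non-zero status so the caller can fall back to trying Parquet, then CSV.

```bash
#!/usr/bin/env bash
# detect_format PATH: print the DataFusion format name for PATH.
detect_format() {
  local path="$1" ext
  # Directories are treated as partitioned Parquet data.
  if [ -d "$path" ]; then
    echo "PARQUET"
    return 0
  fi
  # Take the text after the last dot and lowercase it (assumption:
  # ".PARQUET" should be treated like ".parquet").
  ext="$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    parquet|pq)        echo "PARQUET" ;;
    csv|tsv|txt)       echo "CSV" ;;
    json|jsonl|ndjson) echo "JSON" ;;
    arrow|ipc|feather) echo "ARROW" ;;
    avro)              echo "AVRO" ;;
    *)                 echo "UNKNOWN"; return 1 ;;
  esac
}
```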
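If --name was specified, use that. Otherwise derive it from the filename: drop the extension, lowercase it, and replace spaces, hyphens, and other non-alphanumeric characters with underscores.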
Example: My-Data File.parquet → my_data_file
Confirm the name with the user.
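The name derivation can be sketched as follows, assuming bash plus tr and sed -E (available in both GNU and BSD userlands). The function name derive_table_name is hypothetical, and so are two of its rules, which the skill does not specify: names that start with a digit get a t_ prefix (so they remain valid unquoted SQL identifiers), and a filename that leaves nothing usable is rejected.

```bash
#!/usr/bin/env bash
# derive_table_name PATH: print a lowercase, underscore-separated table name.
derive_table_name() {
  local base
  base="$(basename -- "$1")"
  # Drop the last extension: "My-Data File.parquet" -> "My-Data File".
  base="${base%.*}"
  # Lowercase, collapse each run of non-alphanumerics into "_",
  # then trim leading and trailing underscores.
  base="$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/_/g; s/^_+//; s/_+$//')"
  # Nothing usable left (e.g. a file literally named ".parquet").
  [ -n "$base" ] || return 1
  # Hypothetical convention: unquoted SQL identifiers should not start
  # with a digit, so prefix those names with "t_".
  case "$base" in
    [0-9]*) base="t_$base" ;;
  esac
  printf '%s\n' "$base"
}
```

With this sketch, derive_table_name 'My-Data File.parquet' produces my_data_file, matching the example; the result should still be confirmed with the user.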
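Look for an existing state directory. The project directory is checked first and the home directory second, so if both exist the home directory wins:
```bash
STATE_DIR=""
test -f .datafusion-skills/state.sql && STATE_DIR=".datafusion-skills"
PROJECT_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
PROJECT_ID="$(echo "$PROJECT_ROOT" | tr '/' '-')"
test -f "$HOME/.datafusion-skills/$PROJECT_ID/state.sql" && STATE_DIR="$HOME/.datafusion-skills/$PROJECT_ID"
```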
If no state directory exists, ask the user where to store state (same as other skills):
- In the project directory (.datafusion-skills/)
- In your home directory (~/.datafusion-skills/<project-id>/)
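Set STATE_DIR to the chosen location (.datafusion-skills or "$HOME/.datafusion-skills/$PROJECT_ID") and create it:
```bash
mkdir -p "$STATE_DIR"
touch "$STATE_DIR/state.sql"
```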
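The lookup and creation steps above can be packaged as two helpers. This is a sketch assuming bash; find_state_dir and init_state_dir are hypothetical names. find_state_dir checks the home-directory location first, which gives the same precedence as the lookup above (where the second test overwrites STATE_DIR), and prints nothing when no state file exists yet, which is the cue to ask the user.

```bash
#!/usr/bin/env bash
# find_state_dir: print the existing state directory, or nothing.
# The home-directory location wins when both state files exist.
find_state_dir() {
  local project_root project_id
  project_root="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
  # "/home/me/proj" -> "-home-me-proj", as in the lookup above.
  project_id="$(printf '%s' "$project_root" | tr '/' '-')"
  if [ -f "$HOME/.datafusion-skills/$project_id/state.sql" ]; then
    printf '%s\n' "$HOME/.datafusion-skills/$project_id"
  elif [ -f ".datafusion-skills/state.sql" ]; then
    printf '%s\n' ".datafusion-skills"
  fi
}

# init_state_dir DIR: create DIR and an empty state.sql inside it.
# touch leaves an existing state.sql's contents unchanged.
init_state_dir() {
  mkdir -p -- "$1" && touch -- "$1/state.sql"
}
```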
Build the CREATE EXTERNAL TABLE statement:
For Parquet:
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS PARQUET LOCATION '<RESOLVED_PATH>';
```
For CSV:
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS CSV LOCATION '<RESOLVED_PATH>' OPTIONS ('has_header' 'true');
```
For JSON:
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS JSON LOCATION '<RESOLVED_PATH>';
```
For Arrow IPC:
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS ARROW LOCATION '<RESOLVED_PATH>';
```
For Avro:
```sql
CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS AVRO LOCATION '<RESOLVED_PATH>';
```
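The five statements differ only in the format keyword and the CSV header option, so they can be generated by one helper. This is a sketch with a hypothetical function name, build_create_statement. It also doubles any single quotes in the path, on the assumption (standard SQL, and how DataFusion's parser treats string literals as far as we know) that '' is the escape for a quote inside a quoted string; the templates above would otherwise break on a path such as /data/o'brien.

```bash
#!/usr/bin/env bash
# build_create_statement NAME FORMAT PATH (hypothetical helper)
# FORMAT is one of PARQUET, CSV, JSON, ARROW, AVRO.
build_create_statement() {
  local name="$1" format="$2" path="$3" location stmt
  # Double any single quotes so the path is a valid SQL string literal:
  # /data/o'brien -> /data/o''brien
  location="$(printf '%s' "$path" | sed "s/'/''/g")"
  stmt="CREATE EXTERNAL TABLE IF NOT EXISTS $name STORED AS $format LOCATION '$location'"
  # Only CSV gets the header option, matching the per-format statements.
  if [ "$format" = "CSV" ]; then
    stmt="$stmt OPTIONS ('has_header' 'true')"
  fi
  printf '%s;\n' "$stmt"
}
```

For example, build_create_statement sales CSV /data/sales.csv prints the CSV form with OPTIONS ('has_header' 'true').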
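Test it by loading the state file, running the new statement, and checking the schema, row count, and first five rows:
```bash
datafusion-cli --file "$STATE_DIR/state.sql" -c "
<CREATE_STATEMENT>
DESCRIBE <table_name>;
SELECT COUNT(*) AS row_count FROM <table_name>;
SELECT * FROM <table_name> LIMIT 5;
"
```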
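Check whether this table is already in the state file. Match the full "-- Table:" comment line rather than the bare name, so that a table named sales is not mistaken for sales_2024 (-F matches literally; -e is needed because the pattern starts with "-"):
```bash
grep -qF -e "-- Table: <table_name> (" "$STATE_DIR/state.sql" 2>/dev/null
```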
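If not present, append it. The heredoc delimiter is quoted, so the shell expands nothing inside it; fill in the placeholders before running:
```bash
cat >> "$STATE_DIR/state.sql" <<'SQL'
-- Table: <table_name> (<FORMAT> from <RESOLVED_PATH>)
<CREATE_STATEMENT>
SQL
```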
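The check-then-append step can be made idempotent in one helper. This is a sketch assuming bash; register_table_once is a hypothetical name. It uses the same "-- Table: <name> (" marker that the append writes, so registering the same table twice leaves the state file unchanged, while a table whose name merely contains another table's name is still added.

```bash
#!/usr/bin/env bash
# register_table_once STATE_FILE NAME FORMAT PATH STATEMENT
register_table_once() {
  local state_file="$1" name="$2" format="$3" path="$4" stmt="$5"
  local marker="-- Table: $name ("
  # -F: literal match; -e: the pattern starts with "-", so it must not
  # be parsed as an option.
  if grep -qF -e "$marker" "$state_file" 2>/dev/null; then
    return 0  # already registered; leave the file unchanged
  fi
  # Writes "-- Table: NAME (FORMAT from PATH)" followed by the statement.
  printf '%s%s from %s)\n%s\n' "$marker" "$format" "$path" "$stmt" >> "$state_file"
}
```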
Summarize:
<table_name>: this table is now available in all /datafusion-skills:query sessions. Try:
/datafusion-skills:query SELECT * FROM <table_name> LIMIT 10