Queries the Dewey Data academic data marketplace for foot traffic, POI, mobility, consumer, and real estate datasets via API key. Downloads partitioned Parquet/CSV files for local analysis with DuckDB.
Slash command: /workflows:dewey
Dewey Data is an academic data marketplace — one institutional Platform Subscription unlocks a catalog of ~300 datasets from ~40 providers (foot traffic, POI, mobility, consumer transactions, real estate, labor). UVA Library and NYU both hold the institutional subscription; SafeGraph and most providers are free under it.
Dewey is not a SQL warehouse like WRDS. Data is delivered as partitioned Parquet/CSV.gz files downloaded via an API key. You discover datasets, read metadata, sample, filter (by date partition + columns), then download. Think "S3 of presigned Parquet links," not "PostgreSQL."
| | WRDS | Dewey |
|---|---|---|
| Data | Finance/accounting | POI, foot traffic, mobility, consumer, real estate |
| Access | PostgreSQL / SAS on the grid | File download (Parquet/CSV.gz) via API key |
| Query engine | server-side SQL | DuckDB over the files (local or remote presigned URLs) |
| Licensing | per-vendor, negotiated | one platform subscription unlocks the catalog |
| AI access | none | MCP server (api.deweydata.io/mcp) |
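To make the "DuckDB over the files" row concrete, here is a minimal sketch of building the selective-pull statement as a string. The helper names (`sql_literal`, `sql_ident`, `build_selective_copy`) and the example URL are hypothetical. The statement itself uses DuckDB's `COPY (SELECT …) TO … (FORMAT PARQUET, PARTITION_BY …)` and `read_parquet([...])` forms, and it assumes you already hold a list of presigned file URLs.

```python
from typing import Optional, Sequence


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def sql_ident(name: str) -> str:
    """Quote a column name as a SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_selective_copy(
    urls: Sequence[str],
    columns: Sequence[str],
    out_dir: str,
    where: Optional[str] = None,
    partition_by: Sequence[str] = (),
) -> str:
    """Build a DuckDB COPY statement that writes a filtered subset of remote Parquet."""
    if not urls:
        raise ValueError("no file URLs given; fetch the file list first")
    if not columns:
        raise ValueError("name the columns to keep; selecting everything defeats the point")
    url_list = "[" + ", ".join(sql_literal(u) for u in urls) + "]"
    select = f"SELECT {', '.join(sql_ident(c) for c in columns)} FROM read_parquet({url_list})"
    if where:
        select += f" WHERE {where}"  # caller-supplied predicate, inserted verbatim
    options = ["FORMAT PARQUET"]
    if partition_by:
        options.append(f"PARTITION_BY ({', '.join(sql_ident(c) for c in partition_by)})")
    return f"COPY ({select}) TO {sql_literal(out_dir)} ({', '.join(options)})"


sql = build_selective_copy(
    urls=["https://example.invalid/part-0.parquet"],  # placeholder, not a real Dewey link
    columns=["PLACEKEY", "NAICS_CODE", "REGION"],
    out_dir="btm_subset",
    where="NAICS_CODE = '522320'",
    partition_by=["REGION"],
)
print(sql)
```

The resulting string can be passed to a DuckDB connection. Only the filtered rows and the named columns are written to disk, which is the point of preferring this over a full download.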
Never write apikey = "your_api_key" and run it: the call will return 401 and waste a round trip. Read the key from the DEWEY_API_KEY environment variable or from a gitignored file (~/.config/dewey/apikey). Guessing or hardcoding the key is NOT HELPFUL: every call returns 401, and a committed key is a security incident the user must rotate.
Each product (dataset) has its own product path / project ID (prj_…), obtained from the dataset page: Get Data → (Skip filtering) → Connect to API / Bulk API → API URL. One API key, many product paths. If you don't have the product path, discover it via the MCP server (search_datasets) rather than guessing.
Before downloading ANY Dewey dataset, you MUST:
1. Read the metadata: get_meta (deweydatapy) / get_download_info (MCP) to learn partition columns, date range, file count, and total size
2. Sample (read_sample / MCP sample_dataset) and INSPECT the schema before committing to a full pull
3. Filter by date partition (partition_key_after/before) AND columns; for selective pulls use DuckDB COPY TO over the presigned URLs, never download the whole catalog

This is not negotiable. Skipping the sample-and-filter step is NOT HELPFUL: Dewey datasets are routinely hundreds of GB to multiple TB, and an unfiltered pull burns hours of bandwidth and disk for data you'll immediately throw away.
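The three required steps can be enforced mechanically before any download call is made. The sketch below is illustrative only: `PullPlan`, `preflight_problems`, and `require_ready` are hypothetical names, not part of deweydatapy, deweypy, or the MCP server, and the flags are set by whatever code actually performed the metadata and sample calls.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class PullPlan:
    """Hypothetical record of what has been done before a download starts."""
    meta_checked: bool = False         # metadata read: partitions, date range, size
    sample_inspected: bool = False     # sample rows read and schema looked at
    date_after: Optional[str] = None   # lower bound on the date partition
    date_before: Optional[str] = None  # upper bound on the date partition
    columns: Sequence[str] = ()        # columns to keep, spelled as in the sample


def preflight_problems(plan: PullPlan) -> List[str]:
    """Return every reason the plan is not yet safe to run (empty list = ready)."""
    problems = []
    if not plan.meta_checked:
        problems.append("metadata not read: partition columns, date range and size unknown")
    if not plan.sample_inspected:
        problems.append("no sample inspected: column names and types are guesses")
    if plan.date_after is None and plan.date_before is None:
        problems.append("no date-partition filter: the pull is unbounded")
    if not plan.columns:
        problems.append("no column list: every column would be downloaded")
    return problems


def require_ready(plan: PullPlan) -> None:
    """Raise instead of letting an unscoped download start."""
    problems = preflight_problems(plan)
    if problems:
        raise RuntimeError("STOP: " + "; ".join(problems))


# A plan with metadata read but nothing else done still has three open problems.
partial = PullPlan(meta_checked=True)
for problem in preflight_problems(partial):
    print(problem)
```

Calling `require_ready(plan)` immediately before the download step turns the checklist into a hard stop rather than a convention.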
- Use DuckDB COPY TO with a WHERE clause on the remote parquet to pull only the rows/columns you need.
- Column names cannot be guessed (naics_code vs NAICS_CODE; opened_on may not exist at all). A full pull against guessed columns is the exact incompetence the sample step exists to prevent: run read_sample(nrows=100) BEFORE the full pull.
- Scope partition_key_after/before to the study window.
- Presigned links expire after 24h. For large multi-day pulls use download_files1 (page-by-page, refreshes links); a long job on download_files0 dies mid-pull.
- A guessed prj_ product path 404s or returns someone else's data. Get the path from Connect to API or MCP search_datasets; hardcoding a guessed path is an unverified claim presented as fact.
- Use deweypy.get_dataset_files, not deweydatapy.get_meta/get_file_list: the latter's external-api/v3 endpoint is dead (returns non-JSON / 500 → JSONDecodeError), confirmed 2026-06-10. See references/deweypy-client.md.
- COPY read_csv([...]). For filtered pulls: chunk (~20 files), retry per chunk re-minting fresh URLs, fall back to per-file skip; restartable via per-chunk parquet. Also run SET http_timeout=120000; SET http_retries=3;. Worked example in references/deweypy-client.md.
- prj_ path mints. Don't auto-accept a provider license without the user's OK.
- After claude mcp add … dewey-prod, the search_datasets/sample_dataset/etc. tools are NOT available in the current session; start a new session to use them.
- Calling download_files* without first calling get_meta + read_sample → STOP. Meta + sample first.
- No start_date/end_date / partition filter → STOP. Scope the date range.
- Use COPY TO … (FORMAT PARQUET, PARTITION_BY …) to persist a filtered subset to disk.
- Writing apikey="your_api_key" or any guessed key → STOP. Ask the user; read from env/file.

| Need | Method | Reference |
|---|---|---|
| Discover/search datasets, check schema, sample — from inside Claude | MCP server (api.deweydata.io/mcp) | references/mcp.md |
| Scripted Python bulk download | deweypy (recommended) or deweydatapy (legacy, product_path API) | references/deweypy-client.md |
| Selective pull — specific columns/rows from huge datasets | DuckDB over presigned URLs (read_parquet($urls) + COPY TO) | references/duckdb.md |
| R workflow | deweyr (download_dewey()) | references/deweypy-client.md |
| One-off, dataset < 2.0 GB | UI CSV download (platform → project) | references/access-options.md |
| Analyze data already on disk | DuckDB / pandas / polars over *.parquet or *.csv.gz | references/access-options.md |
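The chunk-and-retry pattern mentioned for filtered pulls (chunks of about 20 files, fresh URLs on each retry, per-file fallback, restart via per-chunk output) can be sketched independently of any client library. Everything here is an assumption for illustration: `pull_in_chunks` is a hypothetical helper, and `mint_urls` and `pull` are caller-supplied callables standing in for "get fresh presigned URLs" and "run the filtered COPY into this file". The worked example in references/deweypy-client.md may differ.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Sequence


def pull_in_chunks(
    file_ids: Sequence[str],
    mint_urls: Callable[[List[str]], List[str]],
    pull: Callable[[List[str], Path], None],
    out_dir: Path,
    chunk_size: int = 20,
    retries: int = 3,
) -> List[str]:
    """Pull files chunk by chunk; return the ids skipped after every fallback failed."""
    out_dir.mkdir(parents=True, exist_ok=True)

    def attempt(ids: List[str], target: Path) -> bool:
        tmp = target.with_name(target.name + ".tmp")
        for _ in range(retries):
            try:
                pull(mint_urls(ids), tmp)  # fresh URLs on every attempt
                tmp.replace(target)        # publish only complete output
                return True
            except Exception:              # sketch only: real code should narrow this
                tmp.unlink(missing_ok=True)
        return False

    skipped: List[str] = []
    for start in range(0, len(file_ids), chunk_size):
        chunk = list(file_ids[start:start + chunk_size])
        n = start // chunk_size
        target = out_dir / f"chunk_{n:05d}.parquet"
        if target.exists():
            continue  # finished in an earlier run
        # If an earlier run already fell back to per-file output, stay per-file
        # so the same rows are never written twice.
        per_file_started = any(out_dir.glob(f"chunk_{n:05d}_file_*.parquet"))
        if not per_file_started and attempt(chunk, target):
            continue
        for k, file_id in enumerate(chunk):
            single = out_dir / f"chunk_{n:05d}_file_{k:03d}.parquet"
            if single.exists():
                continue
            if not attempt([file_id], single):
                skipped.append(file_id)
    return skipped


# Stand-ins for demonstration; no network access and no real Parquet is written.
def fake_mint(ids: List[str]) -> List[str]:
    return [f"https://example.invalid/{i}" for i in ids]


def fake_pull(urls: List[str], target: Path) -> None:
    if any(u.endswith("/broken") for u in urls):
        raise IOError("simulated download failure")
    target.write_text("\n".join(urls))


with tempfile.TemporaryDirectory() as d:
    print(pull_in_chunks(["f1", "broken", "f3"], fake_mint, fake_pull, Path(d), chunk_size=2))
    # prints ['broken']
```

Writing to a temporary name and renaming on success means an interrupted run never leaves a half-written chunk that a restart would mistake for a finished one.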
Get the key once from app.deweydata.io → Connections → Add Connection → API Key. Store it out of source control:
```bash
mkdir -p ~/.config/dewey && echo 'YOUR_KEY' > ~/.config/dewey/apikey && chmod 600 ~/.config/dewey/apikey
# or: export DEWEY_API_KEY=... (add to .envrc, which should be gitignored)
```

```python
import os, pathlib
# Prefer the environment variable; fall back to the key file.
apikey = os.environ.get("DEWEY_API_KEY") or pathlib.Path("~/.config/dewey/apikey").expanduser().read_text().strip()
```
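A slightly more defensive variant of the same lookup is sketched below, assuming you want a clear error instead of a 401 when the key is missing or still a placeholder. `load_dewey_api_key` and `PLACEHOLDERS` are hypothetical names; the environment variable and file path are the ones used above.

```python
import os
from pathlib import Path

# Values that mean "nobody filled this in"; treated the same as a missing key.
PLACEHOLDERS = {"", "your_api_key", "YOUR_KEY"}


def load_dewey_api_key(
    env_var: str = "DEWEY_API_KEY",
    key_file: str = "~/.config/dewey/apikey",
) -> str:
    """Return the key from the environment or the key file; never guess one."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        path = Path(key_file).expanduser()
        if path.is_file():
            key = path.read_text().strip()
    if key in PLACEHOLDERS:
        raise RuntimeError(
            f"No usable API key: set {env_var} or write the key to {key_file}. "
            "Ask the user for it rather than inventing one."
        )
    return key


# Typical use (left commented out so importing this file has no side effects):
# apikey = load_dewey_api_key()
```

Failing locally keeps a placeholder from ever reaching the API, and the error message tells the user exactly where the key is expected.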
Institutional login (to browse the catalog / create the key) is via UVA NetBadge (use your UVA email) or NYU SSO. The Platform Subscription is what makes SafeGraph etc. free — see references/datasets.md.
| Provider | Dataset(s) | What it is |
|---|---|---|
| SafeGraph | Global Places (POI), Geometry, Spend, Patterns | POI master, building footprints, card spend, foot-traffic visit patterns |
| Advan Research | Monthly/Weekly Patterns, Home Panel | Foot traffic aggregated to place & census-block |
| dataplor | POI | Global POI, strong emerging-markets coverage |
| Veraset | Movement | Device-level mobility (institutional license only) |
| PassBy | Foot Traffic | Per-POI foot-traffic analytics |
| Consumer Edge / PDI | Spend / transactions | Card & product-level purchasing |
| LinkUp | Job postings | Labor-market activity |
| ATTOM / Dwellsy / RentHub | Real estate | Property records, rentals |
Full catalog (all ~250 datasets): references/catalog.md — every dataset grouped by category with time coverage, row count, size, and download access (machine-readable: references/catalog.csv). Featured-dataset detail + discovery workflow: references/datasets.md.
Core POI schema — columns are UPPERCASE, NAICS_CODE is a string, BRANDS is a JSON-array string (extract with json_extract_string(BRANDS,'$[0].safegraph_brand_name')). Always sample before filtering.
| Column | Meaning |
|---|---|
| PLACEKEY | Stable unique POI id (join key across SafeGraph products) |
| LOCATION_NAME | POI name |
| BRANDS | JSON array: [{"safegraph_brand_name":"…"}], not plain text |
| STREET_ADDRESS, CITY, REGION, POSTAL_CODE, ISO_COUNTRY_CODE | Address (REGION=US state) |
| LATITUDE, LONGITUDE | Coordinates |
| NAICS_CODE, NAICS_CODE_2022 | 6-digit NAICS (string) |
| TOP_CATEGORY, SUB_CATEGORY | Category labels |
| OPENED_ON, CLOSED_ON, TRACKING_CLOSED_SINCE | Open/close dates (exist but sparsely populated; NULL for BTMs) |
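Once sample rows are in Python, the two schema quirks above (NAICS_CODE as a string, BRANDS as a JSON-array string) can be handled as in the sketch below. The rows are made up for illustration and `brand_names` is a hypothetical helper. It mirrors what the `json_extract_string` expression does in DuckDB, except that it returns every brand in the array rather than only the first.

```python
import json
from typing import List, Optional


def brand_names(brands_cell: Optional[str]) -> List[str]:
    """Parse a BRANDS cell and return the safegraph_brand_name values it contains."""
    if not brands_cell:
        return []  # NULL or empty string
    try:
        parsed = json.loads(brands_cell)
    except json.JSONDecodeError:
        return []  # not valid JSON; treat as unbranded
    if not isinstance(parsed, list):
        return []
    return [
        entry["safegraph_brand_name"]
        for entry in parsed
        if isinstance(entry, dict) and "safegraph_brand_name" in entry
    ]


# Made-up rows shaped like the core POI schema; not real SafeGraph data.
rows = [
    {"PLACEKEY": "row-1", "NAICS_CODE": "522320",
     "BRANDS": '[{"safegraph_brand_name": "Example ATM Co"}]'},
    {"PLACEKEY": "row-2", "NAICS_CODE": "722511", "BRANDS": "[]"},
    {"PLACEKEY": "row-3", "NAICS_CODE": "522320", "BRANDS": None},
]

for row in rows:
    # Compare against a string: NAICS_CODE == 522320 (an int) would never match.
    if row["NAICS_CODE"] == "522320":
        print(row["PLACEKEY"], brand_names(row["BRANDS"]))
# prints:
# row-1 ['Example ATM Co']
# row-3 []
```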
Resolved empirically: crypto/Bitcoin ATMs do exist as standalone POIs under NAICS_CODE='522320'; all major operators are present. But OPENED_ON/CLOSED_ON are NULL for BTMs in the current release → it's a cross-section, not a time series. Full details, the 7 BTM operators, and the worked example: references/safegraph-places.md and examples/btm_safegraph_pull.py.
- references/access-options.md: all download methods (UI, deweypy, deweydatapy, DuckDB, MCP, R), 24h link expiry, partitioning, reading data on disk
- references/deweypy-client.md: deweypy (modern CLI + auth/download) and deweydatapy (get_meta, get_file_list, read_sample, download_files0/1) function reference; deweyr for R
- references/duckdb.md: selective remote-Parquet pulls, COPY TO … PARTITION_BY pattern, querying downloaded files
- references/mcp.md: Dewey MCP server URL, JSON config, the 9 tools, discovery → schema → sample workflow
- references/datasets.md: featured-dataset catalog, UVA NetBadge / NYU institutional access, discovery workflow
- references/catalog.md + catalog.csv: full enumerated catalog (~250 datasets / 39 partners) by category, with coverage / rows / column count / size / access
- references/schemas.json: full column schemas for all ~250 datasets (keyed by slug → columns[] with name/type/description; 11,264 columns). Look up a dataset's columns here before pulling, instead of a live get_dataset_schema call
- references/linkage.md: cross-dataset join-key map (placekey, ticker, cusip/cik, domain, person id, lat/long, fips, zip…), showing which datasets combine and on what spine
- references/safegraph-places.md: Global Places schema, NAICS 522320, BTM operator brands, opened_on/closed_on, the Bitcoin-ATM worked example
- examples/btm_safegraph_pull.py: acceptance test that filters SafeGraph Global Places to the 7 BTM operator brands + NAICS 522320, verifies standalone-POI / open-close coverage, and exports the US subset to ~/projects/batm/