By zytedata
Orchestrate end-to-end web scraping workflows: explore sites, define extraction schemas, generate Scrapy spiders with web-poet page objects, validate data quality, and deploy to Zyte Cloud—all from within Claude Code.
Add an empty web-poet page object to a Scrapy project
Extract structured data (all available fields with values) from a page saved locally as an HTML file, optionally following a schema. Use this skill only to process already downloaded files. Do not invoke when the user provides a URL. When invoking, pass the user's full request verbatim as args — do not pre-parse file paths and don't rephrase it.
Analyze an HTML page to produce field extraction instructions for code generation
Generate web-poet page object code from per-page extraction analyses
Generate web-poet page object code from an extraction spec
From a plain-English prompt to a working Scrapy spider.
<a href="https://github.com/zytedata/claude-skills">
<img src="https://img.shields.io/github/stars/zytedata/claude-skills?style=social" alt="GitHub stars">
</a>
Not using Claude Code exclusively? See Zyte Coding Agent Add-Ons for alternatives.
claude plugin marketplace add zytedata/claude-skills
claude plugin install zyte-web-data@zyte-ai
If Claude Code is already running, reload plugins in the active session:
/reload-plugins
If /reload-plugins isn't available (e.g. in the VS Code extension), restart Claude Code.
See also: Discovering and installing plugins
This is Zyte's official Claude Code plugin that generates production-ready Scrapy spiders with web-poet page objects from a plain-English prompt. Give it a URL and describe what you want to extract. It handles site exploration, schema discovery, code generation, and smoke testing: no boilerplate, no manual selector hunting.
The plugin explores the target site, discovers available fields, and presents a schema for your approval before generating a single line of code. After you confirm the schema, it creates a Scrapy project with all dependencies configured, generates web-poet page objects and test fixtures, wires up the spider, and runs a smoke test to verify that extraction is working before handing the project back to you.
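To make the page-object idea concrete, here is a minimal standard-library sketch of the pattern: one class per page type, one method per extracted field. It is not web-poet's API and not the code the plugin generates; `ProductPage`, the `name` and `price` fields, and the sample HTML are all hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class _ClassTextCollector(HTMLParser):
    """Map an element's class attribute to the first text found inside it.

    Deliberately simple: it only tracks the most recently opened element,
    so it does not handle nested markup. Real projects use proper selectors.
    """

    def __init__(self):
        super().__init__()
        self._current_class = None
        self.text_by_class = {}

    def handle_starttag(self, tag, attrs):
        self._current_class = dict(attrs).get("class")

    def handle_endtag(self, tag):
        self._current_class = None

    def handle_data(self, data):
        text = data.strip()
        if self._current_class and text:
            self.text_by_class.setdefault(self._current_class, text)


class ProductPage:
    """Hypothetical page object: each method extracts exactly one field."""

    def __init__(self, html):
        collector = _ClassTextCollector()
        collector.feed(html)
        self._text = collector.text_by_class

    def name(self):
        return self._text.get("name")

    def price(self):
        raw = self._text.get("price")
        return float(raw) if raw is not None else None

    def to_item(self):
        # Assemble all fields into one structured record.
        return {"name": self.name(), "price": self.price()}


# Hypothetical detail page, inlined so the sketch is self-contained.
SAMPLE_HTML = (
    '<html><body><h1 class="name">Blue Mug</h1>'
    '<span class="price">12.50</span></body></html>'
)

item = ProductPage(SAMPLE_HTML).to_item()
print(item)
```

Keeping extraction logic in a page class, separate from crawling logic, is what lets the same extraction code be tested against saved HTML without running a spider.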
Optionally, use /scrape-scrapy-cloud to deploy directly to Scrapy Cloud for scheduled runs, job history, and monitoring. A free tier is available.
The /scrape skill works on any website with repeating structured content: detail pages linked from a listing or category page. Examples from the skill:
The /scrape skill orchestrates five stages automatically:
1. Decide which fields to extract → /scrape-define
2. Analyze the website → /scrape-spec
3. Create the Scrapy project → /scrape-ensure-project
4. Generate the extraction code → /scrape-codegen
5. Generate the spider → /scrape-create-spider
Each stage feeds directly into the next. When the pipeline completes, you have a runnable spider and a passing test suite:
uv run scrapy crawl <spider_name>
uv run pytest fixtures/
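A fixture-based test suite of this kind is, conceptually, a regression check: extraction output for a saved page is compared against a stored expected item. The sketch below illustrates only that comparison step using the standard library. The fixture layout the plugin actually generates is not described here, and `diff_items` and the sample values are hypothetical.

```python
import json


def diff_items(expected, actual):
    """Return {field: (expected_value, actual_value)} for every differing field."""
    fields = sorted(set(expected) | set(actual))
    return {
        name: (expected.get(name), actual.get(name))
        for name in fields
        if expected.get(name) != actual.get(name)
    }


# Hypothetical stored expectation (what a fixture might record as JSON).
expected = json.loads('{"name": "Blue Mug", "price": 12.5}')

# Hypothetical output of the current extraction code for the same saved page.
actual = {"name": "Blue Mug", "price": 12.0}

mismatches = diff_items(expected, actual)
print(mismatches)
```

An empty result means the extraction code still reproduces the stored item; any entry points at the field whose extraction logic changed.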
| Skill | Description |
|---|---|
| scrape | End-to-end web scraping workflow: from URL to working spider with web-poet page objects |
| Skill | Description |
|---|---|
| scrape-define | Quick schema definition: explore one detail page, discover fields, fast approval loop |
| scrape-spec | Explore diverse pages and validate the extraction spec: downloads pages, compares variants, optional browser review |
| scrape-explore-site | Explore a website to find and save diverse pages (start, list, detail) with classified links |
| scrape-analyze-page | Extract all available fields with values from a detail page |
| scrape-ensure-project | Ensure a Scrapy project exists with scrapy-poet and Zyte API support |
| scrape-codegen | Generate web-poet page object code from an extraction spec |
| scrape-codegen-analyze | Analyze an HTML page to produce field extraction instructions for code generation |
| scrape-codegen-generate | Generate web-poet page object code from per-page extraction analyses |
| scrape-create-spider | Generate a Scrapy spider that wires page objects together |
| Skill | Description |
|---|---|
| scrape-add-page-object | Add an empty web-poet page object to a Scrapy project |
| scrape-review-schema | Generate an HTML review page for schema and extracted data verification |
| Skill | Description |
|---|---|
| scrape-scrapy-cloud | Deploy projects, schedule spiders, list/stop jobs, and view items or logs on Scrapy Cloud |
| scrape-zyte-login | Set up your Zyte account and credentials |