From oxylabs
Scrapes web pages and extracts structured JSON data from 40+ e-commerce and search targets with automatic anti-bot bypass and geo-targeting. Use for product data, search results, or any structured web data extraction.
How this skill is triggered — by the user, by Claude, or both
Slash command
/oxylabs:web-scraper-apiThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Requires HTTP Basic Auth with credentials from environment variables:
Requires HTTP Basic Auth with credentials from environment variables:
curl -u "$OXY_WSA_USERNAME:$OXY_WSA_PASSWORD" ...
POST https://realtime.oxylabs.io/v1/queries # immediate response
POST https://data.oxylabs.io/v1/queries # Push-Pull jobs, callbacks, storage
Content-Type: application/json
| Parameter | Required | Description |
|---|---|---|
source | Yes | Target scraper (e.g., universal, amazon_product, google_search) |
url | Conditional | URL to scrape (for universal and *_url sources) |
query | Conditional | Search query or product ID (for *_search and *_product sources) |
parse | No | Enable structured data parsing (recommended for supported sources) |
render | No | JavaScript rendering: html or png |
geo_location | No | Geographic targeting: country/state/city, ZIP/postcode, coordinates, or Criteria ID where supported |
session_id | No | Reuse the same proxy IP across multiple jobs |
content_encoding | No | Set to base64 when downloading image files via Realtime or Push-Pull |
user_agent_type | No | Device/browser preset, e.g., desktop_chrome, mobile_ios, tablet_android |
locale | No | Interface language / Accept-Language, e.g., de-DE |
callback_url | No | Push-Pull callback endpoint |
storage_type, storage_url | No | Push-Pull cloud upload target (gcs, s3, tos, s3_compatible) |
markdown, xhr | No | Enable markdown or captured XHR result types |
browser_instructions | No | Rendered browser actions; requires render: "html" |
parsing_instructions, parser_preset | No | Custom parser rules or saved preset; pair with parse: true |
client_notes | No | Client-side job tag saved with the job metadata |
domain, subdomain, start_page, pages, limit, store_id, delivery_zip, fulfillment_type | Source-specific | Marketplace/search/store localization and pagination fields |
user_agent_type values: desktop, desktop_chrome, desktop_edge, desktop_firefox, desktop_opera, desktop_safari, mobile, mobile_android, mobile_ios, tablet, tablet_android, tablet_ios.
Add these as { "key": "...", "value": ... } objects in context:
| Key | Use |
|---|---|
force_headers, headers | Merge custom headers with managed headers |
force_cookies, cookies | Merge custom cookies with managed cookies |
http_method, content | Use post with Base64-encoded body content |
follow_redirects | Follow 3xx redirect chains |
successful_status_codes | Treat specific non-standard HTTP codes as successful |
For multi-format output, enable types in the payload (parse, markdown, xhr, render: "png") and request them with ?type=raw,parsed,png,markdown,xhr.
For batch Push-Pull jobs, use POST /v1/queries/batch with arrays only for query or url; keep all other parameters singular. Maximum batch size is 5,000 values.
Scrape any URL:
curl -X POST 'https://realtime.oxylabs.io/v1/queries' \
-u "$OXY_WSA_USERNAME:$OXY_WSA_PASSWORD" \
-H 'Content-Type: application/json' \
-d '{"source": "universal", "url": "https://example.com"}'
Google search with parsing:
curl -X POST 'https://realtime.oxylabs.io/v1/queries' \
-u "$OXY_WSA_USERNAME:$OXY_WSA_PASSWORD" \
-H 'Content-Type: application/json' \
-d '{"source": "google_search", "query": "best laptops", "parse": true}'
Amazon product by ASIN:
curl -X POST 'https://realtime.oxylabs.io/v1/queries' \
-u "$OXY_WSA_USERNAME:$OXY_WSA_PASSWORD" \
-H 'Content-Type: application/json' \
-d '{"source": "amazon_product", "query": "B07FZ8S74R", "parse": true}'
amazon_product, google_search) - better parsing and reliabilityuniversal for unsupported sites - works with any URLparse: true for structured JSON output on supported sources{
"results": [{
"content": "...",
"status_code": 200,
"url": "https://..."
}]
}
With parse: true, content contains structured data (title, price, reviews, etc.) instead of raw HTML.
For the complete list of 40+ supported sources organized by category, see sources.md.
For detailed request/response examples including geo-location, JavaScript rendering, and custom headers, see examples.md.
| Code | Meaning |
|---|---|
| 200 | Success |
| 400 | Invalid parameters |
| 401 | Authentication failed |
| 403 | Access denied |
| 429 | Rate limit exceeded |
parse: true for supported sources to get structured data"90210")"California,United States")render: "html" for JavaScript-heavy pagesrender: "" only to disable automatic forced rendering for force-rendered pages; set client timeouts near 180 seconds for rendered Realtime or Proxy Endpoint requestscontent_encoding: "base64" when scraping image URLs, then decode results[0].content before saving the filenpx claudepluginhub oxylabs/agent-skills --plugin oxylabsExtracts public web data and search engine results via HasData APIs—SERP scraping, AI mode, Maps leads, and structured endpoints for ecommerce, travel, and local business data.
Scrapes web pages, SERPs, e-commerce, YouTube, and ChatGPT via ScrapingBee CLI with smart path extraction and AI rules for LLMs. Handles JS, CAPTCHAs, anti-bot automatically.
Scrapes websites, searches the web, extracts structured data from 40+ platforms (Amazon, LinkedIn, Instagram, TikTok, YouTube, Reddit), manages proxy zones, and checks account budget using the Bright Data CLI.