Browser automation with persistent page state. Use when users ask to navigate websites, fill forms, take screenshots, extract web data, test web apps, or automate browser workflows. Trigger phrases include "go to [url]", "click on", "fill out the form", "take a screenshot", "scrape", "automate", "test the website", "log into", or any browser interaction request.
This skill inherits all available tools. When active, it can use any tool Claude has access to.
Additional assets for this skill: bun.lock, package.json, scripts/start-server.ts, server.sh, src/client.ts, src/index.ts, src/snapshot/__tests__/snapshot.test.ts, src/snapshot/browser-script.ts, src/snapshot/index.ts, src/snapshot/inject.ts, src/types.ts, tsconfig.json, vitest.config.ts
Browser automation that maintains page state across script executions. Write small, focused scripts to accomplish tasks incrementally. Once you've proven out part of a workflow and there is repeated work to be done, you can write a script to do the repeated work in a single execution.
Local/source-available sites: If you have access to the source code (e.g., localhost or project files), read the code first to write selectors directly—no need for multi-script discovery.
Unknown page layouts: If you don't know the structure of the page, use getAISnapshot() to discover elements and selectSnapshotRef() to interact with them. The ARIA snapshot provides semantic roles (button, link, heading) and stable refs that persist across script executions.
Visual feedback: Take screenshots to see what the user sees and iterate on design or debug layout issues.
First, start the dev-browser server using the startup script:
./skills/dev-browser/server.sh &
The script will automatically install dependencies and start the server. It will also install Chromium on first run if needed.
The server script accepts the following flags:
--headless - Start the browser in headless mode (no visible browser window). Use if the user asks for it.
Wait for the Ready message before running scripts. On first run, the server will:
Create the tmp/ directory for scripts
Create the profiles/ directory for browser data persistence
The first run may take longer while dependencies are installed. Subsequent runs will start faster.
Important: Scripts must be run with bun x tsx (not bun run) due to Playwright WebSocket compatibility.
The server starts a Chromium browser with a REST API for page management (default: http://localhost:9222).
Execute scripts inline using heredocs—no need to write files for one-off automation:
cd skills/dev-browser && bun x tsx <<'EOF'
import { connect } from "@/client.js";
const client = await connect("http://localhost:9222");
const page = await client.page("main");
// Your automation code here
await client.disconnect();
EOF
Only write to tmp/ files when:
Use the @/client.js import path for all scripts.
cd skills/dev-browser && bun x tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";
const client = await connect("http://localhost:9222");
const page = await client.page("main"); // get or create a named page
// Your automation code here
await page.goto("https://example.com");
await waitForPageLoad(page); // Wait for page to fully load
// Always evaluate state at the end
const title = await page.title();
const url = page.url();
console.log({ title, url });
// Disconnect so the script exits (page stays alive on the server)
await client.disconnect();
EOF
Use descriptive page names such as "checkout", "login", "search-results".
Call await client.disconnect() at the end of your script so the process exits cleanly. Pages persist on the server.
Use plain JavaScript inside page.evaluate() callbacks, never TypeScript. The code runs in the browser, which doesn't understand TS syntax.
Scripts run with bun x tsx, which transpiles TypeScript but does NOT type-check. Type errors won't prevent execution; they're just ignored.
Code passed to page.evaluate(), page.evaluateHandle(), or similar methods runs in the browser. Use plain JavaScript only:
// ✅ Correct: plain JavaScript in evaluate
const text = await page.evaluate(() => {
  return document.body.innerText;
});
// ❌ Wrong: TypeScript syntax in evaluate (will fail at runtime)
const text = await page.evaluate(() => {
  const el: HTMLElement = document.body; // TS syntax - don't do this!
  return el.innerText;
});
Follow this pattern for complex tasks:
const client = await connect("http://localhost:9222");
const page = await client.page("name"); // Get or create named page
const pages = await client.list(); // List all page names
await client.close("name"); // Close a page
await client.disconnect(); // Disconnect (pages persist)
// ARIA Snapshot methods for element discovery and interaction
const snapshot = await client.getAISnapshot("name"); // Get ARIA accessibility tree
const element = await client.selectSnapshotRef("name", "e5"); // Get element by ref
The page object is a standard Playwright Page—use normal Playwright methods.
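The "get or create" behaviour of client.page(name) is what lets state survive between scripts. As a rough mental model only, the sketch below mirrors those semantics with an in-memory registry. PageRegistry and its createPage callback are hypothetical names invented for illustration; this is not how the dev-browser server is implemented.

```typescript
// Hypothetical in-memory model of named pages; NOT the dev-browser implementation.
class PageRegistry<T> {
  private pages = new Map<string, T>();
  private createPage: (name: string) => T;

  constructor(createPage: (name: string) => T) {
    this.createPage = createPage;
  }

  // Return the page registered under `name`, creating it on first use.
  page(name: string): T {
    let existing = this.pages.get(name);
    if (existing === undefined) {
      existing = this.createPage(name);
      this.pages.set(name, existing);
    }
    return existing;
  }

  // List all page names in creation order.
  list(): string[] {
    return Array.from(this.pages.keys());
  }

  // Forget a page; returns false if the name was unknown.
  close(name: string): boolean {
    return this.pages.delete(name);
  }
}

// Stand-in page objects that only track a counter.
const registry = new PageRegistry((name: string) => ({ name, visits: 0 }));
const first = registry.page("checkout");
first.visits += 1;
const second = registry.page("checkout"); // same object, so earlier state is still there
registry.page("login");
console.log(second.visits, registry.list());
```

Asking for the same name twice returns the same object, which is the property the real client relies on when a later script reconnects to "main" or "checkout".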
Use waitForPageLoad(page) after navigation (checks document.readyState and network idle):
import { waitForPageLoad } from "@/client.js";
// Preferred: Wait for page to fully load
await waitForPageLoad(page);
// Wait for specific elements
await page.waitForSelector(".results");
// Wait for specific URL
await page.waitForURL("**/success");
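When none of the waits above fits, a small polling loop is a common fallback. The sketch below is a generic helper, not part of @/client.js; pollUntil, timeoutMs, and intervalMs are illustrative names, and the condition is a stand-in counter so the example runs without a browser.

```typescript
// Generic polling helper: resolves true once `check` passes, false on timeout.
// All names here are illustrative; this is not dev-browser API.
async function pollUntil(
  check: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true;
    if (Date.now() >= deadline) return false;
    // Sleep before the next attempt.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Stand-in condition that becomes true on the third check.
let calls = 0;
const ready = pollUntil(() => ++calls >= 3, 1000, 10);
ready.then((ok) => console.log({ ok, calls }));
```

With a real page, the check could be a plain-JavaScript page.evaluate() call that returns a boolean, for example one that tests document.readyState.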
Take screenshots when you need to visually inspect the page:
await page.screenshot({ path: "tmp/screenshot.png" });
await page.screenshot({ path: "tmp/full.png", fullPage: true });
Use getAISnapshot() when you don't know the page layout and need to discover what elements are available. It returns a YAML-formatted accessibility tree with:
cd skills/dev-browser && bun x tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";
const client = await connect("http://localhost:9222");
const page = await client.page("main");
await page.goto("https://news.ycombinator.com");
await waitForPageLoad(page);
// Get the ARIA accessibility snapshot
const snapshot = await client.getAISnapshot("main");
console.log(snapshot);
await client.disconnect();
EOF
The snapshot is YAML-formatted with semantic structure:
- banner:
  - link "Hacker News" [ref=e1]
  - navigation:
    - link "new" [ref=e2]
    - link "past" [ref=e3]
    - link "comments" [ref=e4]
    - link "ask" [ref=e5]
    - link "submit" [ref=e6]
    - link "login" [ref=e7]
- main:
  - list:
    - listitem:
      - link "Article Title Here" [ref=e8]
      - text: "528 points by username 3 hours ago"
      - link "328 comments" [ref=e9]
- contentinfo:
  - textbox [ref=e10]
    - /placeholder: "Search"
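Because each snapshot entry follows the same shape (role, optional quoted name, optional [ref=eN]), it can be processed mechanically. The following is a minimal sketch that assumes the line format shown above; parseSnapshotLine and SnapshotNode are hypothetical names, and the regex does not handle names containing escaped quotes.

```typescript
interface SnapshotNode {
  role: string;
  name?: string;
  ref?: string;
}

// Parse one line such as `- link "new" [ref=e2]`.
// Returns null for lines that are not role entries (e.g. `- /placeholder: "Search"`).
function parseSnapshotLine(line: string): SnapshotNode | null {
  const match = /^\s*-\s+([a-z]+)(?:\s+"([^"]*)")?(.*)$/i.exec(line);
  if (!match) return null;
  const [, role, name, rest] = match;
  const ref = /\[ref=(e\d+)\]/.exec(rest)?.[1];
  return { role, name, ref };
}

const sample = [
  "- banner:",
  '  - link "Hacker News" [ref=e1]',
  "  - navigation:",
  '    - link "new" [ref=e2]',
  "- contentinfo:",
  "  - textbox [ref=e10]",
  '    - /placeholder: "Search"',
].join("\n");

// Keep only entries that carry a ref, i.e. the ones you can interact with.
const interactive = sample
  .split("\n")
  .map((line) => parseSnapshotLine(line))
  .filter((node): node is SnapshotNode => node !== null && node.ref !== undefined);

console.log(interactive);
```

Filtering on ref narrows a long snapshot down to the elements that can actually be passed to selectSnapshotRef().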
Roles: button, link, textbox, heading, listitem, etc.
Names (the quoted text after the role): link "Click me", button "Submit"
[ref=eN] - Element reference for interaction. Only assigned to visible, clickable elements
[checked] - Checkbox/radio is checked
[disabled] - Element is disabled
[expanded] - Expandable element (details, accordion) is open
[level=N] - Heading level (h1=1, h2=2, etc.)
/url: - Link URL (shown as a property)
/placeholder: - Input placeholder text
Use selectSnapshotRef() to get a Playwright ElementHandle for any ref:
cd skills/dev-browser && bun x tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";
const client = await connect("http://localhost:9222");
const page = await client.page("main");
await page.goto("https://news.ycombinator.com");
await waitForPageLoad(page);
// Get the snapshot to see available refs
const snapshot = await client.getAISnapshot("main");
console.log(snapshot);
// Output shows: - link "new" [ref=e2]
// Get the element by ref and click it
const element = await client.selectSnapshotRef("main", "e2");
await element.click();
await waitForPageLoad(page);
console.log("Navigated to:", page.url());
await client.disconnect();
EOF
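In the example above the ref (e2) was read off the printed snapshot by eye. If the same lookup recurs, it could be done in code. This sketch assumes the snapshot text format shown earlier; findRef is a hypothetical helper, not something exported by @/client.js.

```typescript
// Find the ref of the first snapshot entry matching a role and accessible name.
// Hypothetical helper; assumes lines shaped like `- link "new" [ref=e2]`.
function findRef(snapshot: string, role: string, name: string): string | undefined {
  for (const line of snapshot.split("\n")) {
    const match = /^\s*-\s+(\S+)\s+"([^"]*)".*\[ref=(e\d+)\]/.exec(line);
    if (match && match[1] === role && match[2] === name) return match[3];
  }
  return undefined;
}

const snapshotText = [
  "- navigation:",
  '  - link "new" [ref=e2]',
  '  - link "past" [ref=e3]',
].join("\n");

const pastRef = findRef(snapshotText, "link", "past");
console.log(pastRef);
```

The returned string is what you would pass as the second argument to client.selectSnapshotRef("main", ...). Check for undefined first, since the element may not be in the snapshot.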
If a script fails, the page state is preserved. You can:
cd skills/dev-browser && bun x tsx <<'EOF'
import { connect } from "@/client.js";
const client = await connect("http://localhost:9222");
const page = await client.page("main");
await page.screenshot({ path: "tmp/debug.png" });
console.log({
  url: page.url(),
  title: await page.title(),
  bodyText: await page.textContent("body").then((t) => t?.slice(0, 200)),
});
await client.disconnect();
EOF
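Since the page survives a failed script, one option is to capture debug state at the moment of failure rather than in a follow-up script. The wrapper below is a sketch of that idea; runStep and captureDebug are hypothetical names, and both callbacks are stand-ins for your own Playwright code so the example runs without a browser.

```typescript
type StepResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Run one automation step; if it throws, call `captureDebug` before reporting.
// Hypothetical helper; the callbacks are placeholders for real page interactions.
async function runStep<T>(
  label: string,
  step: () => Promise<T>,
  captureDebug: (error: unknown) => Promise<void>,
): Promise<StepResult<T>> {
  try {
    return { ok: true, value: await step() };
  } catch (error) {
    await captureDebug(error);
    const message = error instanceof Error ? error.message : String(error);
    return { ok: false, error: `${label}: ${message}` };
  }
}

const captured: string[] = [];
const outcome = runStep<string>(
  "click submit",
  async () => {
    throw new Error("selector not found"); // simulated failure
  },
  async () => {
    captured.push("tmp/debug.png"); // stand-in for page.screenshot({ path: "tmp/debug.png" })
  },
);
outcome.then((result) => console.log(result, captured));
```

Returning a result object instead of rethrowing lets the script still reach client.disconnect() and log state at the end, in line with the pattern used throughout this document.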