From s1-secops-skills
Authors, debugs, and runs SentinelOne PowerQuery (PQ) for threat hunting, EDR investigations, and detection rules. Triggered by PQ, Deep Visibility, or security telemetry queries.
How this skill is triggered — by the user, by Claude, or both
Slash command
/s1-secops-skills:powerqueryThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
PowerQuery (PQ) is SentinelOne's pipeline query language for the Singularity Data Lake. It reads like `filter | command | command | …` — events that match the initial filter flow through a sequence of piped transformations (group, let, join, sort, columns, etc.).
examples/behavioral-baselines.mdexamples/detection-library.mdexamples/investigations.mdexamples/o365-email-hunting.mdreferences/automatic-lookups.mdreferences/commands-reference.mdreferences/datasource-command.mdreferences/detection-rules.mdreferences/fields-and-schema.mdreferences/functions-reference.mdreferences/lrq-api.mdreferences/o365-fields.mdreferences/pitfalls.mdreferences/syntax-and-operators.mdPowerQuery (PQ) is SentinelOne's pipeline query language for the Singularity Data Lake. It reads like filter | command | command | … — events that match the initial filter flow through a sequence of piped transformations (group, let, join, sort, columns, etc.).
Use this skill to write correct, efficient, runnable PowerQueries for threat hunting, investigations, detection rule bodies, and dashboards.
Sandbox proxy blocked? If the LRQ API at
POST /sdl/v2/api/querieson your console host fails with a connection or proxy error inside the Claude sandbox, use thes1-secops-mcpserver instead. It runs locally vianodeand bypasses the sandbox proxy entirely. Setup: add it toclaude_desktop_config.json(sees1-secops-mcp/README.md). The MCP server exposespowerquery_run,powerquery_enumerate_sources, andpowerquery_schema_discover— all running through the LRQ API on your machine.
When the user asks you to write or investigate with a PowerQuery:
filter | group | sort | limit | columns as the default shape — it's what most real investigations need.POST /sdl/v2/api/queries on the tenant's console URL. LRQ is the fastest, highest-limit, most reliable path for any programmatic use and supersedes both /api/powerQuery and the Deep Visibility /dv/events/pq endpoint (both deprecated; sunset Feb 15 2027). It is async, supports cursor paging to essentially unlimited rows, has a 100 req/sec per-account cap, and lets you parallelize across time slices. Reach for the Purple MCP powerquery tool only for a single quick exploratory check when no API client is already wired up. See "Running queries (LRQ API by default)" below and references/lrq-api.md for the canonical runner, body schema, auth, rate limits, and the gotchas that make it fail silently with 0 rows. If the user's request is clear and low-risk (read-only query), just run it; don't ask permission.powerquery tool and it timed out or returned a server error (common for anything past 24h or with wide initial filters), don't retry and don't shrink the range to fit the MCP budget - switch to the LRQ API path (see "Fallback" under Running queries below).savelookup, transpose, compare), explain why you chose it.initial-filter-expression
| command
| command
| …
Initial filter (everything before the first |) is the only place where * contains "x" and * matches "regex" multi-field search works. It can be empty — start the query with | and it is treated as "all events" (e.g., | group ct=count() by event.type).
Commands (each starts with |):
filter expr — keep matching rows (initial filter implicit)columns f1, "Renamed f2"=f2, … — select, rename, compute output columns (creates a new record set — previous fields not accessible after)let f = expr, … — add computed fields without discarding existing onesgroup agg(x), name2=agg2(y) by f1, "grouped name"=f2 — aggregate; also creates a new record setsort +f1, -f2 — - = descendinglimit N — truncate (default shows 10 without it; output is capped at 1,000 rows if no limit/group)parse "…$field$…" from srcField — extract fields from unstructured textlookup col, … from tableName by key=expr — join against a CSV/JSON config data tabledataset 'config://datatables/<name>' — read a lookup table as the source of the pipelinedatasource <name> [from <dataset>] — read SentinelOne-managed inventory outside the event store (assets, alerts, vulnerabilities, misconfigurations, metering); the only PQ path to the Asset Inventory / AD identity attributes. See references/datasource-command.mdsavelookup 'tableName'[, 'merge'] — persist current result as a reusable lookup table| [inner|left|outer|sql inner|sql left|sql outer] join (q1), (q2), … on k1, a.x = b.y — correlate subqueries (must start | join, not just join)| union (q1), (q2), … — merge heterogeneous result sets (up to 10 queries; use when filter (…or…) can't express it)| transpose colName on keyCol, … — pivot a column into many columns (must be LAST command)| compare [name=]timeshift('-1w') — re-run the same query shifted in time and put both in one table (must be LAST command; only one compare allowed)| top K agg(x) by f1, f2 — probabilistic top-N (fast on huge ranges; count()/sum() are "(estimated)", min/max exact)| nolimit — raise the row cap to 3 GB (slow; one concurrent nolimit query at a time; never use in Dashboards or PowerQuery Alerts)Expressions use standard operators: = / == / !=, < / <= / > / >=, && / || / ! (or AND / OR / NOT), ternary a ? b : c (put spaces around the :), arithmetic + - * / %, and these text operators:
| Operator | Meaning |
|---|---|
x contains 'sub' | substring (case-insensitive) — also contains ('a','b','c') for OR |
x contains:matchcase 'Sub' | case-sensitive substring |
x matches 'regex' | regex (case-insensitive, double-escape) — matches ('a','b') for OR |
x matches:matchcase '…' | case-sensitive regex |
x in ('a','b',123,true) | exact equals any; case-sensitive; in:anycase for case-insensitive; does NOT match null |
x = * | field is present/non-null |
!(x = *) | field is null/missing |
$"regex" | shorthand for message matches "regex" (initial filter only) |
#shortcut = 'value' | pre-defined multi-field shortcut (e.g., #ip, #hash, #name, #cmdline, #storylineid, #username) |
Strings need quotes ('foo' or "foo"); numbers and booleans don't. Underscores in numbers are OK for readability (1_000_000).
These function names do not exist in PowerQuery. Using any of them produces Unknown function '<name>' (HTTP 500). Do not invent plausible-sounding names — if a function isn't in references/functions-reference.md, it doesn't exist.
| Do NOT write | Write this instead |
|---|---|
formattime(...) | strftime(ts) / strftime(ts, pattern) |
formatdate(...) | strftime(ts) / simpledateformat(ts, pattern) |
floor_time(...) | bucket=timebucket(unit) in group by |
date_trunc(...) | timebucket(unit) |
coalesce(a, b) | a ? a : b (bare-field ternary) |
ifnull(a, b) | a ? a : b |
if(cond, a, b) inside aggregates | count(predicate) |
percentile(x, N) | p50(x) / p95(x) / p99(x) |
first(x) / last(x) | min_by(x, timestamp) / max_by(x, timestamp) |
sort count desc / sort field asc | sort -count / sort +field — PowerQuery uses -/+ prefix, NOT SQL-style desc/asc. Using desc or asc causes HTTP 500 "Unable to parse the entire query". Purple AI frequently generates this wrong — always correct before running. |
`field.name` (backtick-quoted identifiers) | field.name — dotted field names are written bare, no backticks. Using backtick quoting returns HTTP 500 "Don't understand [`]". |
The only valid date/time functions are: strftime, simpledateformat, strptime, simpledateparse, timebucket, querystart, queryend, queryspan.
These are where queries go wrong. Internalize them before writing.
* alone is NOT a valid initial filter. * | limit 5 returns a 500 error ("Don't understand [*]"). There are three distinct * idioms — pick the right one for your intent:
dataSource.name=* means "field is present/non-null". Use as a query-opener for aggregations, e.g. dataSource.name=* | group count=count() by dataSource.name. This is NOT an all-column search.* contains 'evil.com' or * matches 'regex' in the initial filter (before the first |) searches ALL indexed fields — use when you need to find text anywhere in the event. Dramatically faster than message contains. Only works before the first |; not valid in | filter … after a pipe, and not valid in Alerts.|, e.g. | group ct=count() by event.type.Double-escape regex almost everywhere. src.process.cmdline matches "\\d+", tgt.file.path matches '^C:\\\\Windows\\\\Temp\\\\[a-z]{8}\\.tmp$'. The only place you don't double-escape is the $"…" shorthand (searches message).
Regex lazy quantifiers (?) are not supported. The SDL regex engine does not support lazy (non-greedy) quantifiers: .*?, .+?, [^x]*? etc. all return HTTP 500 "Dangling meta character '?'". Use a negated character class instead: [^"]* in place of .*?", [^ ]* in place of .*? , etc.
After columns or group, previous fields are gone. These commands create an entirely new record set. If you'll need a field later, carry it through: group ct=count(), host=any(endpoint.name) by src.process.storyline.id — don't expect endpoint.name to still be addressable after that group unless you aggregate it.
Subqueries can't go after group, sort, or limit. And the subquery must itself produce the column named in the in (...) expression (via columns or group). user in (action='login' | group 1 by user) is valid; user in (action='login') is not.
compare and transpose must be the LAST command. Put sort before compare if you want to order the non-shifted side.
join must start with a pipe. | join (…), (…) on x — without the |, "join" is interpreted as a search term. Inner/left joins allow up to 10 subqueries; sql inner and sql left allow only 2.
null behaves like false in boolean context. filter x = null works after the field is defined by a prior command; before then, use !(x = *) for is-null and x = * for is-not-null.
contains is case-insensitive by default; in is case-sensitive by default. The :matchcase / :anycase suffixes reverse this.
Performance: filter early, group narrow. Push filters above the first pipe when possible. In group, prefer low-cardinality fields; for long ranges, consider | top K … instead (probabilistic but orders of magnitude faster).
Alerts and Dashboards have tighter limits. A PowerQuery Alert is capped at 1,000 rows intermediate / 1 MB RAM. Don't put nolimit in a dashboard panel.
Shortcut fields (#cmdline, #name, #hash, …) don't work as initial filters on every tenant. They're documented but return 500 on many deployments. Prefer explicit field names (src.process.cmdline contains 'x') — they're as terse and always work. Save shortcuts for exploratory Event Search where you're not scripting against the API.
Aggregates to prefer: min_by / max_by over first / last. first(x) and last(x) are sometimes listed as aggregates but fail on many tenants. Use min_by(x, timestamp) and max_by(x, timestamp) — they're explicit about ordering and always work.
Percentiles: use p50/p95/p99, not percentile(x, N). The latter isn't a real function and returns 500.
Null-filter at the wrong stage: filter x = null before x is computed returns 500. Use filter !(x =*) for is-null until after a let/join/lookup has produced x.
Coalesce-style fallback uses bare-field truthy test, NOT (field = *) ? a : b. PQ has no coalesce() / ifnull() / nvl(). To pick the first non-null of several fields inside a let, chain bare-field ternaries — they evaluate the field's truthiness directly:
| let user_id = actor.user.email_addr
? actor.user.email_addr
: (actor.user.name ? actor.user.name : src.process.user)
The (field = *) ? a : b form (i.e. wrapping the field-presence test in parens before the ternary) returns HTTP 500 inside let on this engine — field = * is a filter operator, not a boolean expression usable in computed columns. Bare-field truthy is the only working coalesce idiom in PQ.
if(...) is not a function in aggregates. sum(if(cond, 1, 0)) returns 500. Use count(<predicate>) instead — count(severity_id == 5) evaluates the predicate per row and sums the truthy ones. Same for any "count where X" semantic.
Always filter field=* before projecting or inspecting any field. | limit N | columns field returns the first N events regardless of whether the field is populated — most rows will be null. Add field=* to the initial filter to scope to events that actually carry the field:
// Wrong — returns nulls; message may not be present on most events
dataSource.name='FortiGate' | limit 3 | columns message
// Correct — only events where message is present
dataSource.name='FortiGate' message=* | limit 3 | columns message
This applies to every field, not just message. Any time you want to sample, inspect, or aggregate a field, include field=* in the initial filter.
Statistical baselining is two queries plus a client-side merge, not one inline join. Subqueries inside a single | join share the parent query's time range. To compare a 24h live window against a 7d/30d baseline, run them as separate LRQs (or as separate savelookup+lookup rounds) and merge — there is no single-pass form. Pattern in examples/behavioral-baselines.md.
If the user asks for any of the following, you need MORE than this skill — load s1-secops-skills:mgmt-console-api alongside, because the runner, the schema discovery, and the source-agnostic key picker live there:
<source>" / "establish a baseline" / "build a 7d / 30d baseline"What mgmt-console-api adds:
scripts/inspect_source.py — auto-discovers field schema for any dataSource.name and classifies fields into principal_user / principal_host / principal_ip / action etc. via pick_keys(schema) → returns (prim_key, action_key). This means you don't hand-hardcode actor.user.email_addr for every source — the right principal field is picked from whatever the source actually carries (Okta uses email, FortiGate uses IP, SentinelOne uses process user, etc.).scripts/pq.py — run_pq() LRQ runner that handles auth, forward-tag, polling, slicing.scripts/baseline_anomaly.py — source-agnostic 30-day-DoW-stratified baseliner that takes a dataSource.name, discovers the schema, and produces anomalies. Read its source for the canonical end-to-end pattern.Use examples/behavioral-baselines.md in THIS skill for the PQ building blocks (per-day slice, live slice, z-score math, silent-pair detector). Use the mgmt-console-api skill for the runner, schema discovery, and the productionised baseliner script. Don't reinvent the schema-discovery or the daily-slice runner — both already exist there.
The primary execution path is the Long Running Query API. It is async (launch + poll + cancel), handles queries that would time out on any other endpoint, scales cleanly to 30-day aggregates, and is the only path that stays supported after Feb 15 2027 when both /api/powerQuery and /web/api/v2.1/dv/events/pq retire.
Three calls, in order:
POST https://<console>.sentinelone.net/sdl/v2/api/queries -> launch, returns {id, ...}
GET https://<console>.sentinelone.net/sdl/v2/api/queries/{id}?lastStepSeen=N -> poll every 1-2s
DELETE https://<console>.sentinelone.net/sdl/v2/api/queries/{id} -> cancel when done
Required body fields for a PowerQuery:
{
"queryType": "PQ",
"tenant": true,
"startTime": "2026-04-21T00:00:00Z",
"endTime": "2026-04-22T00:00:00Z",
"queryPriority": "HIGH",
"pq": { "query": "<your PQ>", "resultType": "TABLE" }
}
Five things that will bite you if you skip them (full list in references/lrq-api.md):
Authorization: Bearer <jwt>, not ApiToken. Same JWT from the mgmt console, different prefix. Wrong prefix returns HTTP 500 "Header must start with Bearer".queryType is required. Omit and you get 400 "Query type must be specified".tenant: true is required unless you pass accountIds. Without either, the query runs against a near-empty default scope and returns matchCount=0.X-Dataset-Query-Forward-Tag from the POST response header on every poll and the cancel. GET/DELETE without it is rejected.dataSource.name='SentinelOne' dataSource.category='security' (or i.scheme="edr") to your query. Without it you get Scalyr / infra logs mixed with everything, and on some tenants you get only infra.Rate limits: 100 req/sec per account, 3 req/sec per user (the tight one). A user token capped at 3 rps means a token-bucket limiter at ~2.5 rps with a pool of 3 parallel slices is the sweet spot. To push further, use two distinct service user JWTs and round-robin slices across them - each user identity has its own 3 rps budget.
Query expires 30s after launch or 30s after the last poll. Poll every 1-2s; don't let the deadline slip.
Sizing & parallelism. Two bottlenecks stack: per-user rate cap first, then slowest-slice server runtime. Measured on your-tenant for a 30d count-by-event.type over 574M events:
Once two tokens are in play the per-user rate cap stops being the bottleneck and the slowest slice's backend runtime (p95 ~9-17s) becomes the floor. Push further by adding a third service-user JWT (pool=9, ~7.5 rps budget, expected 18-22s), swapping | group for | top K on huge ranges, or narrowing the initial filter. Defaults table and the full benchmark live in references/lrq-api.md.
Canonical runner. The full Python implementation (rate limiter, two-token round-robin, aggregate merge across slices) is documented in references/lrq-api.md. Read that file before writing a new runner from scratch.
Quick one-shot exploration (no API client wired up): the Purple MCP mcp__purple-mcp__powerquery tool is fine for an interactive 24h hunt. It wraps the same engine but with lower limits, tighter timeouts, and no parallelism. Pair it with mcp__purple-mcp__get_timestamp_range(hours=24) for ISO-8601 ranges, and mcp__purple-mcp__purple_ai when you need a starting-point query draft from natural language. Prefer LRQ for anything programmatic, multi-slice, over long windows, or producing results the user will use downstream.
Fallback: when the Purple MCP powerquery tool times out or returns an error (common for ranges > 24h, large aggregates, or wide initial filters), do NOT retry with a tighter time range as a first resort. Instead, re-run the same query through the LRQ API. The mgmt console API's S1Client already holds a valid JWT (S1Client().api_token); swap the prefix from ApiToken to Bearer and POST to the same tenant's /sdl/v2/api/queries. Canonical inline fallback:
# Starting from the already-loaded S1Client used by the mgmt-console-api skill:
from sentinelone_sdl_lrq import LRQClient, run_lrq_pq, parallel_run_roundrobin, slice_window
s1 = S1Client() # same client the mgmt skill uses
jwt = s1.api_token # raw JWT - no prefix
base = s1.base_url # e.g. https://your-tenant.sentinelone.net
lrq = LRQClient(base, jwt, label="fallback", rps=2.5)
result = run_lrq_pq(lrq, query, start_iso, end_iso) # launches, polls 1s, cancels
If the window is longer than a couple of days or the aggregate is heavy, slice it and use parallel_run_roundrobin with two clients built from two service-user JWTs (see references/lrq-api.md). The LRQ path handles anything the MCP times out on; there's no need to shrink the user's requested range to fit the MCP budget.
Don't read these upfront. Read the one you need.
references/lrq-api.md - the canonical Long Running Query API runner: auth, body schema, forward-tag routing, rate-limit strategy, 30-second query expiration, slicing/parallelism patterns, two-JWT round-robin to exceed the per-user rate cap. Read before writing any programmatic PQ runner, or when a query silently returns matchCount=0.references/syntax-and-operators.md — full operator reference, identifier rules, shortcut fields, regex dialect, date/time formats, short-circuit ||.references/commands-reference.md — deep dive on every command (join variants, subqueries, lookup / dataset / savelookup, transpose, compare, top, nolimit). Read before writing anything non-trivial with join, transpose, or compare.references/functions-reference.md — all built-in functions: string, numeric, JSON, network, URL, aggregate, array (method chaining), geolocation, timestamp, time, string-formatting. Read when you need a function and can't remember the name.references/fields-and-schema.md — common EDR/XDR field paths (src.process.*, tgt.file.*, event.login.*, dst.ip.*, indicator.*, etc.) and OCSF conventions. Read when you're not sure what field holds the thing you want.references/o365-fields.md: Microsoft 365 / Exchange / Teams / SharePoint audit field shape, covering OCSF vs unmapped.* duality, fields that live only inside the JSON message blob, the discover-before-you-filter rule, send-style operations, RecordType, service-tier IP filtering, and investigation-noise separator. Read before writing any PQ against an M365 audit source.references/detection-rules.md — how to author PowerQuery Alerts / STAR / Custom Detection rule bodies, including the 1,000-row / 1 MB alert constraints and which PQ features are supported in alert context.references/pitfalls.md — curated list of common failures and their fixes (the *-as-filter trap, forgetting | before join, subquery position errors, memory-limit messages, message contains vs * contains on JSON-blob sources, and more).references/automatic-lookups.md — tenant-wide /automaticLookups enrichment that applies to every search and PowerQuery with no | lookup typed: config schema, the "output value fields must be unique across all specs" rule, the 100-row / 5 MB / 50-column limits, deploy-via-SDL-API flow, verified lookup/dataset gotchas, and a full Windows Event Logs SID-to-username worked example. Read when the user wants to add a lookup for SID/username (or any key) that everyone should see automatically, or asks about /automaticLookups.references/datasource-command.md — the | datasource <name> [from <dataset>] command for querying SentinelOne-managed inventory (Asset Inventory, Alerts, Vulnerabilities, Misconfigurations, Metering, SDL retention) that lives outside the event store. Covers datasource names, the assets/metering datasets, column discovery, time-series via *_aggregated_snapshots, and the tenant-validated specifics for asset enrichment: from 'surface/identity' vs sparse from identity, from 'surface/endpoint' vs sparse from device, single-quoting slash dataset names, empty-riskFactors ("[]") suppression, and the datasource ... | savelookup pattern for building enrichment lookup tables. Read whenever the user asks about assets, identities, vulnerabilities, alerts inventory, or building an asset-enrichment lookup.examples/investigations.md — ready-to-run investigation queries (PowerShell outbound, suspicious cmdline patterns, lateral movement, LOLBins, credential access, defense evasion, user-activity baselines, endpoint heartbeat, indicator prevalence). Each example includes a brief "what this finds" note and the full PQ.examples/detection-library.md — PQ bodies ready to paste into a STAR / Custom Detection / PowerQuery Alert, sized to stay within the 1,000-row/1 MB alert budget. Each entry names the MITRE technique and gives a threshold suggestion.examples/behavioral-baselines.md — statistical baselining recipes for any data source: per-day count slices, moving-average + stddev, z-score detection, silent-pair detection, day-of-week stratification. Read when the user asks to "baseline X behavior", "detect anomalies in Y", "find users / hosts / IPs behaving differently than usual", or any equivalent. Source-agnostic, works on EDR, identity, network, cloud, email, or any custom log source.examples/o365-email-hunting.md: workflow recipes for the most common M365 audit hunts, covering discovery-first patterns, "did user X send mail" questions, per-day cadence, outbound external mail with recipient extraction, top recipient domains, DLP correlation, "user appears but never as actor" attribution gap, and identity-investigation-noise separation. Use whenever the hunt is against an M365 audit source.These three blur together. Quick rules:
field in (inner | columns field)) — single-field "is this value in that set" filtering. Simplest and usually fastest. Use for allowlist / denylist / top-N-and-pivot patterns.on a.user = b.user, a.host = b.host) or you need to bring extra columns from the second query into your output.Prefer subqueries for exclusion/inclusion; reach for join when a row must "know" multiple things at once.
A PowerQuery used as a detection rule body (STAR / Custom Detection / PowerQuery Alert) is more constrained than a hunt query:
nolimit.compare, usually no transpose (depends on version).agent.uuid, endpoint.name, src.process.storyline.id, timestamp).For detection rule patterns and a checklist, see references/detection-rules.md and examples/detection-library.md.
Hunt: PowerShell that made an outbound connection to a non-RFC1918 IP in the last 24 hours, with command line.
src.process.name contains 'powershell' dst.ip.address = *
| let is_private = net_rfc1918(dst.ip.address)
| filter is_private = false
| group hits = count(),
ips = array_agg_distinct(dst.ip.address, 20),
cmdline = any(src.process.cmdline)
by endpoint.name, src.process.storyline.id
| sort -hits
| limit 50
Notice: filter early (dst.ip.address = * prunes events without a destination IP), net_rfc1918 is the right way to split internal vs external (don't hand-roll CIDRs), array_agg_distinct caps the array so the row stays small, any(src.process.cmdline) grabs a representative cmdline since we're collapsing per storyline.
PowerQuery execution uses the s1-secops-mcp MCP tools, which bypass the Cowork sandbox
proxy entirely. Use powerquery_run and powerquery_schema_discover directly instead of
falling back to the mgmt-console-api skill scripts. The MCP tools run locally
on your machine and make direct HTTPS calls to *.sentinelone.net without proxy interference.
npx claudepluginhub sentinel-one/ai-siem --plugin s1-secops-skillsExecutes threat hunting queries using SentinelOne PowerQuery on Singularity Data Lake, explains pipeline syntax, manages time ranges with get_timestamp_range and iso_to_unix_timestamp, analyzes results.
Develop, optimize, and troubleshoot CrowdStrike LogScale (Humio) security detection queries using CQL syntax. Use when writing LogScale queries, building security detections, creating threat hunting rules, fixing CQL syntax errors, working with CrowdStrike EDR/Falcon security monitoring, or building behavioral rules with the correlate() function. Handles case statements, risk categorization, multi-event correlation, investigation playbooks, and actionable security outputs.
Wraps SentinelOne Singularity Data Lake (SDL) API with a Python client and CLI for querying data lake and managing parsers, dashboards, alerts, lookups, and datatables.