Comprehensive analysis of BigQuery usage patterns, costs, and query performance
This skill inherits all available tools. When active, it can use any tool Claude has access to.
This skill performs comprehensive analysis of BigQuery usage patterns, costs, and query performance for a given project. It identifies expensive queries, heavy users, and provides actionable optimization recommendations.
This skill is automatically invoked by the /bigquery:analyze-usage command to perform usage analysis.
Prerequisites:
- The `bq` command-line tool must be installed
- Authentication with Google Cloud (`gcloud auth login`)
- The `bigquery.jobs.list` permission at minimum

When invoked, this skill expects a target project ID and an analysis timeframe in hours (used as `@hours` in the queries below).
First, verify the environment is ready:
- The `bq` command is available
- Authentication is active and the target project is accessible
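A minimal pre-flight sketch, assuming the project ID is provided in a `PROJECT_ID` variable (a placeholder, not a name defined elsewhere in this skill):

```bash
# Hypothetical pre-flight check; PROJECT_ID is a placeholder for the target project.
set -euo pipefail
PROJECT_ID="${PROJECT_ID:?set PROJECT_ID to the target project}"

# The bq CLI must be installed
command -v bq >/dev/null || { echo "bq not found; install the Google Cloud SDK" >&2; exit 1; }

# An active gcloud credential must exist
gcloud auth list --filter=status:ACTIVE --format="value(account)" | grep -q . \
  || { echo "no active credentials; run 'gcloud auth login'" >&2; exit 1; }

# The project must be readable
bq ls --project_id="$PROJECT_ID" >/dev/null || { echo "cannot access $PROJECT_ID" >&2; exit 1; }
```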
Then execute the following BigQuery queries against `INFORMATION_SCHEMA.JOBS_BY_PROJECT`. Cost estimates assume on-demand pricing of $6.25 per TB scanned. The first query summarizes overall usage:
SELECT
COUNT(*) as total_queries,
ROUND(SUM(total_bytes_processed) / POW(10, 12), 2) as total_tb_scanned,
ROUND(SUM(total_bytes_processed) / POW(10, 12) * 6.25, 2) as estimated_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL @hours HOUR)
AND job_type = 'QUERY'
AND state = 'DONE'
AND statement_type != 'SCRIPT'
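Each of these can be executed with the `bq` CLI. A sketch for the summary query above, assuming a 24-hour window and that the SQL (with its `@hours` parameter) has been saved to a hypothetical file `usage_summary.sql`:

```bash
# Run the summary with standard SQL, a bound @hours parameter, and JSON output for parsing.
bq query \
  --project_id="$PROJECT_ID" \
  --use_legacy_sql=false \
  --format=json \
  --parameter=hours:INT64:24 \
  "$(cat usage_summary.sql)"
```

The next query breaks usage down by user and service account: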
SELECT
user_email,
COUNT(*) as query_count,
ROUND(SUM(total_bytes_processed) / POW(10, 12), 2) as total_tb_scanned,
ROUND(SUM(total_bytes_processed) / POW(10, 12) * 6.25, 2) as estimated_cost_usd,
ROUND(AVG(total_bytes_processed) / POW(10, 9), 2) as avg_gb_per_query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL @hours HOUR)
AND job_type = 'QUERY'
AND state = 'DONE'
AND statement_type != 'SCRIPT'
GROUP BY user_email
ORDER BY total_tb_scanned DESC
LIMIT 20
The most expensive individual queries:

SELECT
creation_time,
user_email,
job_id,
ROUND(total_bytes_processed / POW(10, 12), 3) as tb_scanned,
ROUND(total_bytes_processed / POW(10, 12) * 6.25, 2) as cost_usd,
SUBSTR(query, 1, 200) as query_preview
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL @hours HOUR)
AND job_type = 'QUERY'
AND state = 'DONE'
AND statement_type != 'SCRIPT'
AND total_bytes_processed > 0
ORDER BY total_bytes_processed DESC
LIMIT 20
Frequently repeated query patterns (grouped on the first 200 characters of the query text):

SELECT
SUBSTR(query, 1, 200) as query_pattern,
COUNT(*) as execution_count,
ROUND(SUM(total_bytes_processed) / POW(10, 12), 3) as total_tb_scanned,
ROUND(AVG(total_bytes_processed) / POW(10, 9), 2) as avg_gb_per_execution,
ANY_VALUE(user_email) as sample_user
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL @hours HOUR)
AND job_type = 'QUERY'
AND state = 'DONE'
AND statement_type != 'SCRIPT'
AND total_bytes_processed > 0
GROUP BY query_pattern
HAVING execution_count > 10
ORDER BY total_tb_scanned DESC
LIMIT 15
For the top 2-3 users by data scanned, perform detailed query pattern analysis:
SELECT
SUBSTR(query, 1, 300) as query_pattern,
COUNT(*) as execution_count,
ROUND(SUM(total_bytes_processed) / POW(10, 12), 3) as total_tb_scanned,
ROUND(AVG(total_bytes_processed) / POW(10, 9), 2) as avg_gb_per_execution,
MIN(creation_time) as first_execution,
MAX(creation_time) as last_execution
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL @hours HOUR)
AND job_type = 'QUERY'
AND state = 'DONE'
AND statement_type != 'SCRIPT'
AND user_email = @user_email
AND total_bytes_processed > 0
GROUP BY query_pattern
ORDER BY total_tb_scanned DESC
LIMIT 20
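The same invocation pattern works for this drill-down, adding a string parameter for the user (`per_user_patterns.sql` is again a placeholder filename):

```bash
# Drill into one heavy user's query patterns over the same window.
bq query \
  --project_id="$PROJECT_ID" \
  --use_legacy_sql=false \
  --format=json \
  --parameter=hours:INT64:24 \
  --parameter=user_email:STRING:heavy-user@example.com \
  "$(cat per_user_patterns.sql)"
```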
The results reveal what each heavy user is actually doing: their dominant query patterns, how often each pattern runs, when it runs, and how much data each execution scans.
Look for common issues:

Query Anti-Patterns:
- `SELECT *` on large tables (a detection sketch follows this list)
- Repeated identical queries that could use cached or materialized results
- Large scans that clustering on commonly filtered columns would reduce

User Behavior Patterns:
- Which users or service accounts drive the bulk of the spend
- Whether heavy usage comes from automated jobs or ad-hoc human queries

Cost Drivers:
- Service accounts vs. human users
- A few very expensive queries vs. many cheap but frequent ones

For each issue found, provide:
- A description of the issue
- The specific fix
- Estimated savings ($/day or %)
- Difficulty (Easy/Medium/Hard)

Prioritization Framework:
- Priority 1: High impact, easy wins
- Priority 2: Medium impact
- Priority 3: Architectural improvements
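As a rough, optional sketch (not part of the query set above), something like the following could flag `SELECT *` usage in recent job history; the regular expression is deliberately loose and the region qualifier may need adjusting:

```bash
# Rough SELECT * detector over the last 24 hours of job history.
bq query \
  --project_id="$PROJECT_ID" \
  --use_legacy_sql=false \
  --format=json \
  "SELECT user_email,
          COUNT(*) AS select_star_queries,
          ROUND(SUM(total_bytes_processed) / POW(10, 12), 3) AS tb_scanned
   FROM \`region-us\`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
   WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
     AND job_type = 'QUERY'
     AND state = 'DONE'
     AND REGEXP_CONTAINS(UPPER(query), r'SELECT\s+\*\s+FROM')
   GROUP BY user_email
   ORDER BY tb_scanned DESC"
```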
Structure the output as:
# BigQuery Usage Analysis Report
**Project:** <project-id>
**Analysis Period:** Last <timeframe>
**Generated:** <timestamp>
## Executive Summary
- Total Queries Executed: <count>
- Total Data Scanned: <TB>
- Estimated Cost: $<amount>
### Key Findings
1. <Top issue with data/cost>
2. <Second major issue>
3. <Third issue>
## Usage by User/Service Account
<Table with top 10-20 users>
## Top Query Patterns
<Detailed breakdown of top patterns with recommendations>
## Per-User Analysis
### Top User 1: <user_email> (<TB> scanned, $<cost>)
<Summary of what this user does>
**Primary Query Types:**
1. <Pattern description> (<data scanned>)
- Execution count
- Average per query
- Specific optimization recommendation
### Top User 2: <user_email> (<TB> scanned, $<cost>)
<Summary of what this user does>
**Primary Query Types:**
1. <Pattern description> (<data scanned>)
- Execution count
- Average per query
- Specific optimization recommendation
## Top Individual Queries
<Table of most expensive queries>
## Optimization Recommendations
### Priority 1: High Impact, Easy Wins
1. **<Recommendation title>**
- Issue: <description>
- Fix: <specific steps>
- Estimated Savings: <$/day or %>
- Difficulty: Easy/Medium/Hard
### Priority 2: Medium Impact
...
### Priority 3: Architectural Improvements
...
## Cost Breakdown Summary
- Service Accounts: $<amount> (<percentage>)
- Human Users: $<amount> (<percentage>)
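The query set above doesn't compute the service-account vs. human split directly; a sketch that could produce it, assuming service accounts are identifiable by the `gserviceaccount.com` suffix:

```bash
# Split estimated cost between service accounts and human users.
bq query \
  --project_id="$PROJECT_ID" \
  --use_legacy_sql=false \
  --format=json \
  --parameter=hours:INT64:24 \
  "SELECT IF(ENDS_WITH(user_email, 'gserviceaccount.com'),
             'Service Accounts', 'Human Users') AS principal_type,
          ROUND(SUM(total_bytes_processed) / POW(10, 12) * 6.25, 2) AS estimated_cost_usd
   FROM \`region-us\`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
   WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL @hours HOUR)
     AND job_type = 'QUERY'
     AND state = 'DONE'
   GROUP BY principal_type"
```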
After presenting the analysis, ask the user if they want to save it to a markdown file named `bigquery-usage-<project-id>-<YYYYMMDD>.md`.
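A sketch of the naming convention, with `report.md` standing in for the generated analysis:

```bash
# Save the generated report under the dated naming convention.
cp report.md "bigquery-usage-${PROJECT_ID}-$(date +%Y%m%d).md"
```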
"bq command not found"
brew install google-cloud-sdk"Access Denied" errors
gcloud auth loginbq ls --project_id=<project-id>"Invalid project ID"
gcloud projects listNo data returned
region-us, US, EU, etc.)Query execution errors
region-us INFORMATION_SCHEMAUS or EU--format=json for easy parsing--use_legacy_sql=false for standard SQLGood User Analysis Example:
### openshift-ci-data-writer (4.07 TB, $25.41)
This service account runs test run analysis and backend disruption monitoring.
**Primary Query Types:**
1. **TestRuns_Summary_Last200Runs** (3.02 TB - 74% of usage)
- 96 executions using SELECT *
- 31.41 GB per query
- **Recommendation:** Replace SELECT * with specific columns.
Estimated savings: $9-15/day
2. **BackendDisruption Lookups** (0.68 TB - 17% of usage)
- 4,165 queries checking for job run names
- **Recommendation:** Add clustering on JobName + JobRunStartTime.
Implement result caching.
This level of detail helps users understand exactly what's driving their costs and how to fix it.
A successful analysis should quantify total usage and cost, attribute the spend to specific users and query patterns, and deliver prioritized, actionable recommendations with estimated savings.
Consider adding: