Use when code loads or uses collapse (library(collapse), collapse::), when performing fast grouped or weighted statistics in R, or when seeking faster alternatives to dplyr aggregation.
Slash command: /r-package-skills:r-collapse
**collapse provides C/C++-based high-performance grouped and weighted statistics.** It is 50-100x faster than dplyr for grouped operations and matches data.table speed, while working with any data frame type (tibbles, data.tables) and with xts time series.
Core principle: Fast aggregation, transformation, and panel data operations through vectorized C code.
Read references/API.md before writing code.
- references/API.md - Complete function reference
- references/collapse-for-tidyverse-users.md - Migration guide and patterns
- references/collapse-documentation.md - Core concepts and usage
- references/collapse-and-sf.md - Working with spatial data
- references/collapse-object-handling.md - Data structure handling

Use collapse when:
Don't use:
vs Alternatives:
| Scenario | Use This |
|---|---|
| Large grouped stats | collapse |
| Weighted computations | collapse |
| sf manipulation | dplyr |
| Reference semantics | data.table |
| Complex joins | data.table |
| Arbitrary group functions | dplyr |

| Task | Function/Example |
|---|---|
| Grouped stats | fmean(), fsum(), fsd(), fmedian() |
| Aggregation | collap(df, ~ by, list(fmean, fsd)) |
| Transform | ftransform(), fmutate() |
| Selection | fselect(), fsubset() (~100x faster) |
| Time series | flag(), fdiff(), fgrowth() |
| Panel data | fwithin(), fbetween(), qsu() |
| Grouping | fgroup_by(), GRP() |
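
The time series helpers in the table can be tried on a plain numeric vector. A minimal sketch, assuming a fresh session with collapse's default options; the vectors `x`, `y`, and `id` are made-up illustration data, not part of the skill:

```r
library(collapse)

# Hypothetical series that doubles each period
x <- c(1, 2, 4, 8)

lagged <- flag(x)      # shift by one period; the first value becomes NA
diffed <- fdiff(x)     # first difference: x[t] - x[t-1]
growth <- fgrowth(x)   # growth rate in percent: 100 * (x[t] / x[t-1] - 1)

# Panel version: pass a group vector so lags never cross unit boundaries
# (the rows are assumed to be ordered in time within each id)
y  <- c(1, 2, 4, 10, 20, 40)
id <- c(1, 1, 1, 2, 2, 2)
panel_lag <- flag(y, 1, g = id)
```

For this input the lag and the first difference both come out as NA, 1, 2, 4, and the panel lag restarts with NA at the first row of each `id`.
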
```r
library(collapse)

# Basic: grouped mean (50-100x faster than dplyr)
data |> fgroup_by(category) |> fmean()

# Weighted aggregation
data |> fgroup_by(region) |> fmean(w = weight_col)

# Multiple stats at once
collap(data, ~ category, list(fmean, fsd, fmedian))

# TRA transformations (key differentiator - single C pass)
data |> fgroup_by(id) |> fmean(TRA = "-")     # Demean: subtract group mean
data |> fgroup_by(id) |> fsd(TRA = "/")       # Scale: divide by group SD
data |> fgroup_by(id) |> fmean(TRA = "fill")  # Fill: overwrite every value, NAs included, with the group mean

# See references/API.md for full TRA options ("-", "/", "fill", "-+", "replace")
```
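
To check what these calls compute, the results can be compared against base R on a small built-in dataset. This is only a sketch: it assumes mtcars as a stand-in for `data`, with `cyl` as the grouping column and `wt` as the weight column, both chosen purely for illustration.

```r
library(collapse)

# mtcars stands in for `data`; cyl = grouping column, wt = weights (illustrative choices)
d <- mtcars[, c("cyl", "mpg", "hp", "wt")]

# Grouped mean through the pipe interface: one row per cyl value
grouped_means <- d |> fgroup_by(cyl) |> fmean()

# The same statistic for mpg with base R
base_means <- tapply(d$mpg, d$cyl, mean)

# Weighted grouped mean through the vector interface
w_means <- fmean(d$mpg, g = d$cyl, w = d$wt)
base_w  <- sapply(split(d, d$cyl), function(s) weighted.mean(s$mpg, s$wt))

# TRA = "-" subtracts the group mean and keeps the original length
mpg_demeaned  <- fmean(d$mpg, g = d$cyl, TRA = "-")
base_demeaned <- d$mpg - ave(d$mpg, d$cyl)
```

The aggregating calls return one value per group, while the TRA call returns a vector as long as the input, which is the difference between summarising and transforming.
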
| Mistake | Fix |
|---|---|
| Using group_by() with collapse functions | Use fgroup_by() or pass g = GRP(groupvar) |
| Forgetting that collap() applies to ALL numeric columns | Explicitly select columns before calling |
| Expecting na.rm = FALSE default | collapse defaults to na.rm = TRUE |
| Expecting fwithin()/fbetween() to collapse rows | They return the same number of rows (centered values / group means) |
| Global options affect behavior | Set arguments explicitly in package code |
| Ignoring sort = FALSE speedup | Add sort = FALSE when order doesn't matter (3x faster) |
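
Several of these pitfalls are easy to see on a toy vector. A hedged sketch, assuming collapse's default options are in effect (no global overrides); `x` and `g` are hypothetical values invented for the example:

```r
library(collapse)

# Toy data (hypothetical): one missing value in group "a"
x <- c(1, NA, 3, 10, 20)
g <- c("a", "a", "a", "b", "b")

# na.rm: base R propagates NA, collapse skips it unless told otherwise
base_mean     <- mean(x)                   # NA
collapse_mean <- fmean(x)                  # mean of the 4 non-missing values
strict_mean   <- fmean(x, na.rm = FALSE)   # NA, matching base R

# Grouping: build the grouping object once and reuse it across statistics
grp      <- GRP(g)
by_group <- fmean(x, g = grp)
sd_group <- fsd(x, g = grp)

# fwithin()/fbetween() transform rather than aggregate
centered <- fwithin(x, g = grp)    # x minus its group mean
averaged <- fbetween(x, g = grp)   # group mean in place of each value
```

Passing `na.rm` explicitly, as in `strict_mean`, also protects package code from a user who has changed the global defaults.
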
See references/ for API reference, vignette content (tidyverse comparison, sf integration, object handling, development guidelines), and panel data patterns.
Validator: lib/r-validators/numerical-validator.R
Resources: Docs
npx claudepluginhub arthurgailes/r-package-skills --plugin r-package-skills