This skill should be used when the user asks to "work with polars", "create a dataframe", "use lazy evaluation", "migrate from pandas", "optimize data pipelines", "read parquet files", "group by operations", or needs guidance on Polars DataFrame operations, expression API, performance optimization, or data transformation workflows.
Polars is a fast DataFrame library for Python built on Apache Arrow. It provides an expression-based API, a lazy evaluation framework, and automatic parallelization for high-performance data processing.
uv pip install polars
import polars as pl
# Create DataFrame
df = pl.DataFrame({
"name": ["Alice", "Bob", "Charlie"],
"age": [25, 30, 35],
"city": ["NY", "LA", "SF"]
})
# Select columns
df.select("name", "age")
# Filter rows
df.filter(pl.col("age") > 25)
# Add computed columns
df.with_columns(
age_plus_10=pl.col("age") + 10
)
Expressions are composable units describing data transformations. Use pl.col("column_name") to reference columns and chain methods for complex operations:
df.select(
pl.col("name"),
(pl.col("age") * 12).alias("age_in_months")
)
Expressions execute within contexts: select(), with_columns(), filter(), group_by().agg().
Eager (DataFrame): Operations execute immediately.
df = pl.read_csv("file.csv") # Reads immediately
result = df.filter(pl.col("age") > 25)
Lazy (LazyFrame): Operations build an optimized query plan.
lf = pl.scan_csv("file.csv") # Doesn't read yet
result = lf.filter(pl.col("age") > 25).select("name", "age")
df = result.collect() # Executes optimized query
Use lazy mode for large datasets, complex pipelines, and when performance is critical. Benefits include automatic query optimization, predicate pushdown, projection pushdown, and parallel execution.
For detailed concepts including data types, type casting, null handling, and parallelization, see references/core-concepts.md.
# Select specific columns
df.select("name", "age")
# Select with expressions
df.select(
pl.col("name"),
(pl.col("age") * 2).alias("double_age")
)
# Add new columns (preserves existing)
df.with_columns(
age_doubled=pl.col("age") * 2,
name_upper=pl.col("name").str.to_uppercase()
)
# Single condition
df.filter(pl.col("age") > 25)
# Multiple conditions (AND)
df.filter(
pl.col("age") > 25,
pl.col("city") == "NY"
)
# OR conditions
df.filter(
(pl.col("age") > 25) | (pl.col("city") == "LA")
)
df.group_by("city").agg(
pl.col("age").mean().alias("avg_age"),
pl.len().alias("count")
)
Apply aggregations while preserving row count:
df.with_columns(
avg_age_by_city=pl.col("age").mean().over("city"),
rank_in_city=pl.col("age").rank().over("city")
)
For comprehensive operations including sorting, conditionals, string/date operations, and list handling, see references/operations.md.
# Eager
df = pl.read_csv("file.csv")
df.write_csv("output.csv")
# Lazy (preferred for large files)
lf = pl.scan_csv("file.csv")
result = lf.filter(...).select(...).collect()
df = pl.read_parquet("file.parquet")
df.write_parquet("output.parquet")
# Lazy with predicate pushdown
lf = pl.scan_parquet("file.parquet")
For comprehensive I/O including JSON, Excel, databases, cloud storage, and streaming, see references/io-guide.md.
# Inner join
df1.join(df2, on="id", how="inner")
# Left join
df1.join(df2, on="id", how="left")
# Different column names
df1.join(df2, left_on="user_id", right_on="id")
# Vertical (stack rows)
pl.concat([df1, df2], how="vertical")
# Horizontal (add columns)
pl.concat([df1, df2], how="horizontal")
# Pivot (wide format)
df.pivot(on="product", index="date", values="sales")
# Unpivot (long format)
df.unpivot(index="id", on=["col1", "col2"])
For detailed transformation patterns including asof joins, exploding, and transposing, see references/transformations.md.
Use lazy evaluation for large datasets:
lf = pl.scan_csv("large.csv") # Not read_csv
result = lf.filter(...).select(...).collect()
Avoid Python functions in hot paths - stay within the expression API for parallelization:
# Good: Native expression (parallelized)
df.with_columns(result=pl.col("value") * 2)
# Avoid: Python function (sequential)
df.with_columns(result=pl.col("value").map_elements(lambda x: x * 2))
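Select only needed columns early. The lazy optimizer applies projection pushdown on its own, but an explicit early select keeps the intent clear and also helps in eager code: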
lf.select("col1", "col2").filter(...) # Good
lf.filter(...).select("col1", "col2") # Less optimal
Use streaming for very large data:
lf.collect(engine="streaming")  # older Polars releases: lf.collect(streaming=True)
Use appropriate data types - Categorical for low-cardinality strings, appropriate integer sizes.
pl.when(condition).then(value).otherwise(other_value)
pl.col("x").fill_null(0)
pl.col("x").is_null()
pl.col("x").drop_nulls()
For comprehensive best practices including anti-patterns, memory management, testing, and code organization, see references/best-practices.md.
Polars is typically faster than pandas, especially on multi-core machines, and replaces index-based operations with an expression API. Key differences:
| Operation | pandas | Polars |
|---|---|---|
| Select | df["col"] | df.select("col") |
| Filter | df[df["col"] > 10] | df.filter(pl.col("col") > 10) |
| Add column | df.assign(x=...) | df.with_columns(x=...) |
| Group by | df.groupby("col").agg(...) | df.group_by("col").agg(...) |
| Window | df.groupby("col")["x"].transform("mean") | df.with_columns(pl.col("x").mean().over("col")) |
For a comprehensive migration guide including operation mappings, migration patterns, and anti-patterns to avoid, see references/pandas-migration.md.
This skill includes comprehensive reference documentation:
references/core-concepts.md - Expressions, data types, lazy evaluation, parallelization
references/operations.md - Selection, filtering, grouping, window functions, string/date operations
references/best-practices.md - Performance optimization, anti-patterns, memory management
references/io-guide.md - CSV, Parquet, JSON, Excel, databases, cloud storage
references/transformations.md - Joins, concatenation, pivots, reshaping operations
references/pandas-migration.md - Migration guide from pandas to Polars
Load these references as needed for detailed information on specific topics.