Guides usage of fenic Python library for semantic DataFrames with LLM operators, embeddings, and text processing. Covers imports, namespace, model config, and known pitfalls.
Slash command: /fenic:fenic-mechanics
fenic looks like PySpark and you already write its DataFrame surface well
(select/filter/join/group_by/agg, semantic.extract/classify).
This skill covers the mechanics that don't transfer — where fenic differs
from PySpark/pandas intuition in ways that fail (often loudly, sometimes
silently). For full signatures see reference/*.md (generated from the
installed version); for the correction table and traps see gotchas.md.
Golden rule: after writing or editing a fenic pipeline, run
fenic check <file>, a static lint (no execution) that resolves your fc.*
symbols against the installed fenic and flags namespace/import mistakes
(fenic.functions, fc.array vs fc.arr, fc.explode, …). Fix what it reports.
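To make the idea of a static symbol check concrete, here is a toy sketch built on Python's ast module. It is not how fenic check is implemented (this document does not describe its internals); it only flags the few mistakes named above, and toy_lint plus both lookup tables are hypothetical names.

```python
import ast

# Hypothetical lookup tables, filled only with mistakes this document names.
BAD_MODULES = {"fenic.functions", "fenic.api.types"}
BAD_FC_ATTRS = {"explode": "explode is a DataFrame method: df.explode(...)"}


def toy_lint(source: str) -> list[str]:
    """Return one message per suspicious import or fc.* reference.

    The source is parsed, never executed, which is what makes the check static.
    """
    findings: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            if node.module in BAD_MODULES:
                findings.append((node.lineno, f"no module {node.module}"))
            for alias in node.names:
                # "from fenic import functions" refers to fenic.functions
                full = f"{node.module}.{alias.name}"
                if full in BAD_MODULES:
                    findings.append((node.lineno, f"no module {full}"))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name in BAD_MODULES:
                    findings.append((node.lineno, f"no module {alias.name}"))
        elif (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "fc"
            and node.attr in BAD_FC_ATTRS
        ):
            findings.append((node.lineno, BAD_FC_ATTRS[node.attr]))
    # ast.walk is breadth-first, so sort to report in line order.
    return [f"line {lineno}: {msg}" for lineno, msg in sorted(findings)]


sample = (
    "from fenic import functions as F\n"
    "import fenic as fc\n"
    "out = fc.explode(fc.col('tags'))\n"
)
for finding in toy_lint(sample):
    print(finding)
```

The real tool resolves symbols against the installed fenic rather than a hard-coded table, so treat this only as an illustration of the class of errors it can see.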
import fenic as fc. Everything hangs off the fc. prefix. There is no
fenic.functions (don't write from fenic import functions as F), no
fenic.api.types, and no unified OpenAIModelConfig.

Namespaces: fc.text.*, fc.json.*, fc.markdown.*,
fc.semantic.*, fc.embedding.*, fc.dt.*, and fc.arr.* for array ops.
⚠️ fc.array(...) is a constructor for array literals; the array-operations
namespace is fc.arr (fc.arr.size, fc.arr.contains, fc.arr.sort, …).

Directly on fc: free functions (fc.col, fc.lit, fc.when, fc.coalesce,
fc.count, fc.sum, fc.avg, fc.collect_list, fc.struct, fc.udf,
fc.async_udf, …), all types, and all model-config classes.

explode / unnest are DataFrame methods, not functions:
df.explode("col"), df.unnest("col"), never fc.explode(...).

The camelCase aliases (withColumn, groupBy, orderBy,
dropDuplicates) do exist and work, but prefer snake_case.

import fenic as fc
session = fc.Session.get_or_create(fc.SessionConfig(
    app_name="my_app",
    semantic=fc.SemanticConfig(
        language_models={
            "mini": fc.OpenAILanguageModel(model_name="gpt-4o-mini", rpm=500, tpm=200_000),
        },
        default_language_model="mini",
        # embeddings are a SEPARATE class + SEPARATE dict:
        embedding_models={
            "emb": fc.OpenAIEmbeddingModel(model_name="text-embedding-3-small", rpm=500, tpm=200_000),
        },
        default_embedding_model="emb",
    ),
))
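The namespace rules listed before the session example can be tried on a small in-memory frame. This is a sketch under assumptions: session.create_dataframe taking a dict of columns, and a SessionConfig with no semantic block, come from memory of fenic's API rather than from this document, and namespace_demo is a hypothetical name. fenic is imported inside the function so that pasting the block does not fail on a machine without it.

```python
def namespace_demo():
    """Build two small DataFrames that follow the namespace rules above."""
    import fenic as fc  # imported lazily; requires fenic to be installed

    # Assumption: a session without a semantic config is enough for non-LLM ops.
    session = fc.Session.get_or_create(fc.SessionConfig(app_name="namespace_demo"))

    # Assumption: create_dataframe accepts a dict of column name -> values.
    df = session.create_dataframe({"id": [1, 2], "tags": [["a", "b"], ["c"]]})

    # Array operations live under fc.arr (fc.array would build a literal instead).
    with_sizes = df.with_column("n_tags", fc.arr.size(fc.col("tags")))

    # explode is a DataFrame method: one output row per array element.
    one_tag_per_row = df.explode("tags")

    return with_sizes, one_tag_per_row
```

Running fenic check on a file containing this function is a quick way to confirm that every fc.* symbol resolves against the installed version.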
Language and embedding models are different classes (fc.OpenAILanguageModel
vs fc.OpenAIEmbeddingModel) and live in different config keys
(language_models vs embedding_models). No single unified model class.

default_language_model / default_embedding_model are required when more
than one of that kind is registered.

fc.AnthropicLanguageModel(model_name, rpm, input_tpm, output_tpm) has no
single tpm. OpenAI/Google/Cohere use tpm.

Expression-level operators (used in select/with_column): fc.semantic.map,
extract, classify, predicate, reduce, summarize, analyze_sentiment,
embed, parse_pdf.

DataFrame methods (df.semantic.*): join (LLM predicate), sim_join
(embedding similarity), with_cluster_labels (clustering). Only these three.

Prompt templates use Jinja {{ name }} placeholders, with columns passed as
keyword arguments:
fc.semantic.predicate("Is this a complaint? {{ msg }}", msg=fc.col("msg"))
fc.semantic.map("Summarize {{ body }}", body=fc.col("body"))
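As a sketch of how the two template calls above might sit in a pipeline: the function below filters with fc.semantic.predicate and derives a column with fc.semantic.map. It assumes a DataFrame with string columns msg and body, a session that already has a default language model registered, and that a predicate expression can be passed to df.filter; triage_messages is a hypothetical name.

```python
def triage_messages(df):
    """Keep rows an LLM judges to be complaints and add a short summary.

    `df` is assumed to be a fenic DataFrame with string columns `msg` and
    `body`, created from a session with a default language model.
    """
    import fenic as fc  # imported lazily; requires fenic to be installed

    # Expression-level operator used as a filter condition (assumed to be
    # accepted wherever a boolean column expression is).
    is_complaint = fc.semantic.predicate(
        "Is this a complaint? {{ msg }}", msg=fc.col("msg")
    )

    # Expression-level operator used to derive a new column.
    summary = fc.semantic.map("Summarize {{ body }}", body=fc.col("body"))

    return df.filter(is_complaint).with_column("summary", summary)
```

The double braces are what bind msg and body to their columns; each keyword argument name must match the placeholder name in the template.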
The same {{ name }} template syntax applies to fc.text.jinja(template, **columns)
and to df.semantic.join's predicate (which uses the literal placeholders
{{ left_on }} / {{ right_on }}).

fc.semantic.extract(col, MyPydanticModel): the schema is positional (or
response_format=). fc.semantic.classify(col, [..>=2 classes..]).

parse_pdf is fc.semantic.parse_pdf (under semantic, NOT markdown, because
it calls the model). Input is a column of PDF path strings (no cast needed):
fc.semantic.parse_pdf(fc.col("path"), page_separator="--- PAGE {page} ---").
Pass page_separator (the {page} placeholder is filled per page) when you
want page breaks; omit it and pages run together.

Traps fenic check can't catch

fenic check is a static lint (symbols & namespaces), so it doesn't see these.
The first three run clean and produce wrong output (truly silent); the
fourth errors only at execution. Get them right by hand:
1. fc.json.jq(col, query) returns an ARRAY (ArrayType(JsonType)), never a
scalar. Take one match before casting:
fc.json.jq(c, ".x").get_item(0).cast(fc.IntegerType).

2. A single-brace template such as "... {msg} ..." (one brace) is
not interpolated: the model receives the literal {msg}. Always {{ msg }}.

3. fc.dt.datediff(end, start) returns end - start. Reversed args →
silently negative/wrong. Order matters.

4. fc.dt.to_timestamp / to_date / date_format take Spark/Java patterns
(yyyy-MM-dd HH:mm:ss, MM-dd-yyyy), NOT Python/chrono %-tokens; fenic
converts the Spark pattern to chrono internally. A %-style string raises an
ExecutionError at materialization (so fenic check won't flag it). With no
format, to_timestamp expects ISO-8601-with-ms; datediff/date_trunc
take the resulting timestamp/date columns directly.

Use fenic's native operators (fc.json.jq, fc.markdown.*,
fc.text.parse_transcript, fc.text.extract templates, fc.text.compute_fuzzy_*)
rather than dropping to json.loads, re, or manual string parsing. The point
of fenic is a typed, inspectable, rerunnable pipeline — raw-Python escape hatches
throw that away and don't run in the engine.
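For the date-pattern and datediff traps described earlier, a small stdlib-only helper can translate common Python %-tokens into Spark/Java pattern letters before the format reaches fc.dt.to_timestamp. This is a sketch, not part of fenic: strftime_to_spark and its token table are hypothetical, cover only six tokens, and reject anything else rather than guess. The helper runs once on the format string, not per row, so the column work itself stays in the engine. add_age_days assumes string columns created_at and closed_at, and that the format is the second positional argument of to_timestamp.

```python
# Python strftime token -> Spark/Java pattern letters (deliberately small subset).
_TOKEN_MAP = {
    "%Y": "yyyy", "%m": "MM", "%d": "dd",
    "%H": "HH", "%M": "mm", "%S": "ss",
}


def strftime_to_spark(fmt: str) -> str:
    """Translate a simple strftime format into a Spark/Java date pattern."""
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%":
            token = fmt[i:i + 2]
            if token not in _TOKEN_MAP:
                raise ValueError(f"unsupported strftime token: {token!r}")
            out.append(_TOKEN_MAP[token])
            i += 2
        elif ch.isalpha():
            # Literal letters need quoting in Spark/Java patterns; out of scope.
            raise ValueError(f"literal letter {ch!r} is not supported")
        else:
            out.append(ch)  # separators such as '-', ':' and ' ' pass through
            i += 1
    return "".join(out)


def add_age_days(df, py_format="%Y-%m-%d %H:%M:%S"):
    """Add an age_days column as closed_at minus created_at."""
    import fenic as fc  # imported lazily; requires fenic to be installed

    pattern = strftime_to_spark(py_format)
    # Assumption: the format is accepted as the second positional argument.
    created = fc.dt.to_timestamp(fc.col("created_at"), pattern)
    closed = fc.dt.to_timestamp(fc.col("closed_at"), pattern)
    # datediff(end, start) returns end - start, so the later value goes first.
    return df.with_column("age_days", fc.dt.datediff(closed, created))


print(strftime_to_spark("%Y-%m-%d %H:%M:%S"))
```

Rejecting unknown tokens up front turns a materialization-time ExecutionError into an immediate ValueError at the point where the pattern is written.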
reference/functions.md, reference/dataframe.md, reference/config-and-types.md:
full signatures, generated from the installed fenic version.

gotchas.md: the "wrote X, meant Y" correction table (every real failure mode
observed) and the silent-trap deep dive.

npx claudepluginhub typedef-ai/fenic --plugin fenic