Natural language processing ML pipelines for text classification, NER, sentiment analysis, text generation, and embeddings. Activates for "nlp", "text classification", "sentiment analysis", "named entity recognition", "BERT", "transformers", "text preprocessing", "tokenization", "word embeddings". Builds NLP pipelines with transformers, integrated with SpecWeave increments.
This skill inherits all available tools. When active, it can use any tool Claude has access to.
Specialized ML pipelines for natural language processing. Handles text preprocessing, tokenization, transformer models (BERT, RoBERTa, GPT), fine-tuning, and deployment for production NLP systems.
from specweave import NLPPipeline
# Binary or multi-class text classification
pipeline = NLPPipeline(
    task="classification",
    classes=["positive", "negative", "neutral"],
    increment="0042"
)
# Automatically configures:
# - Text preprocessing (lowercase, clean)
# - Tokenization (BERT tokenizer)
# - Model (BERT, RoBERTa, DistilBERT)
# - Fine-tuning on your data
# - Inference pipeline
pipeline.fit(train_texts, train_labels)
# Extract entities from text
pipeline = NLPPipeline(
    task="ner",
    entities=["PERSON", "ORG", "LOC", "DATE"],
    increment="0042"
)
# Returns: [(entity_text, entity_type, start_pos, end_pos), ...]
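The tuple format noted above can be illustrated without any model. The sketch below merges token-level BIO tags into `(entity_text, entity_type, start_pos, end_pos)` tuples. The function name `bio_to_entities`, the BIO tagging scheme, and the choice of an exclusive `end_pos` are assumptions made for illustration; the source does not say how SpecWeave produces or indexes its spans.

```python
import re
from typing import List, Tuple

# (entity_text, entity_type, start_pos, end_pos); end_pos is exclusive here (assumption)
Entity = Tuple[str, str, int, int]


def bio_to_entities(text: str, offsets: List[Tuple[int, int]], tags: List[str]) -> List[Entity]:
    """Merge token-level BIO tags into character-level entity spans.

    offsets holds one (start, end) character range per token;
    tags holds one label per token, e.g. "B-PERSON", "I-PERSON", or "O".
    """
    entities: List[Entity] = []
    current = None  # (entity_type, start, end) of the span being built

    for (start, end), tag in zip(offsets, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current is not None and current[0] == label:
            # Continuation of the open span: extend its end offset.
            current = (label, current[1], end)
            continue
        # Any other tag closes the open span first.
        if current is not None:
            entities.append((text[current[1]:current[2]], *current))
            current = None
        if prefix in ("B", "I"):
            # "B-" starts a new span; a stray "I-" is treated as a start too.
            current = (label, start, end)

    if current is not None:
        entities.append((text[current[1]:current[2]], *current))
    return entities


text = "Ada Lovelace joined IBM in London"
# Whitespace tokenization stands in for a real subword tokenizer.
offsets = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
tags = ["B-PERSON", "I-PERSON", "O", "B-ORG", "O", "B-LOC"]

entities = bio_to_entities(text, offsets, tags)
print(entities)
# [('Ada Lovelace', 'PERSON', 0, 12), ('IBM', 'ORG', 20, 23), ('London', 'LOC', 27, 33)]
```

Keeping character offsets alongside the entity text lets downstream code highlight or redact spans in the original string without searching for them again.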
# Sentiment classification (specialized)
pipeline = NLPPipeline(
    task="sentiment",
    increment="0042"
)
# Fine-tuned for sentiment (positive/negative/neutral)
# Generate text continuations
pipeline = NLPPipeline(
    task="generation",
    model="gpt2",
    increment="0042"
)
# Fine-tune on your domain-specific text
from specweave import TextPreprocessor
preprocessor = TextPreprocessor(increment="0042")
# Standard preprocessing
preprocessor.add_steps([
    "lowercase",
    "remove_html",
    "remove_urls",
    "remove_emails",
    "remove_special_chars",
    "remove_extra_whitespace"
])
# Advanced preprocessing
preprocessor.add_advanced([
    "spell_correction",
    "lemmatization",
    "stopword_removal"
])
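To make the standard steps concrete, here is a minimal standard-library sketch of what each named step might do. The step names come from the list above; the regular expressions, the `STEPS` table, and the `preprocess` helper are illustrative assumptions, not SpecWeave's actual implementation.

```python
import html
import re

# Step names mirror the standard preprocessing list; each regex is an
# illustrative assumption about what the step does.
STEPS = {
    "lowercase": str.lower,
    # Strip tags first, then decode entities such as &amp;
    "remove_html": lambda s: html.unescape(re.sub(r"<[^>]+>", " ", s)),
    "remove_urls": lambda s: re.sub(r"(?:https?://|www\.)\S+", " ", s),
    "remove_emails": lambda s: re.sub(r"\S+@\S+\.\S+", " ", s),
    # Keeps letters, digits, and underscores (including non-ASCII letters)
    "remove_special_chars": lambda s: re.sub(r"[^\w\s]", " ", s),
    "remove_extra_whitespace": lambda s: re.sub(r"\s+", " ", s).strip(),
}

STANDARD_STEPS = [
    "lowercase",
    "remove_html",
    "remove_urls",
    "remove_emails",
    "remove_special_chars",
    "remove_extra_whitespace",
]


def preprocess(text: str, steps: list) -> str:
    """Apply the named steps to text in the order given."""
    for name in steps:
        text = STEPS[name](text)
    return text


raw = "<p>Great PRODUCT!! Details at https://example.com/docs or mail help@example.com</p>"
cleaned = preprocess(raw, STANDARD_STEPS)
print(cleaned)  # great product details at or mail
```

Order matters in this sketch: URLs and email addresses are removed before special characters, because stripping punctuation first would break them into fragments that the later patterns no longer match.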
Text Classification:
NER:
Sentiment:
# Start from pre-trained language models
pipeline = NLPPipeline(task="classification")
# Option 1: Use pre-trained (no fine-tuning)
pipeline.use_pretrained("distilbert-base-uncased")
# Option 2: Fine-tune on your data
pipeline.use_pretrained_and_finetune(
    model="bert-base-uncased",
    epochs=3,
    learning_rate=2e-5
)
# For text longer than 512 tokens
pipeline = NLPPipeline(
    task="classification",
    max_length=512,
    truncation_strategy="head_and_tail"  # Keep start + end
)
# Or use Longformer for long documents
pipeline.use_model("longformer")  # Handles up to 4096 tokens
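The `head_and_tail` strategy can be sketched on a plain token list. The function below and its `head_fraction` parameter (defaulting to one quarter of the budget) are assumptions for illustration; the source does not say how SpecWeave splits the budget between the start and end of the text. A real tokenizer would also reserve a few positions for special tokens such as `[CLS]` and `[SEP]`.

```python
from typing import List, Sequence


def head_and_tail(tokens: Sequence[str], max_length: int, head_fraction: float = 0.25) -> List[str]:
    """Keep the first and last tokens of a sequence longer than max_length."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0.0 <= head_fraction <= 1.0:
        raise ValueError("head_fraction must be between 0 and 1")
    if len(tokens) <= max_length:
        return list(tokens)  # Short inputs pass through unchanged

    head_len = int(max_length * head_fraction)
    tail_len = max_length - head_len
    # Slice from an explicit index so tail_len == 0 yields an empty tail.
    return list(tokens[:head_len]) + list(tokens[len(tokens) - tail_len:])


tokens = [f"tok{i}" for i in range(1000)]
truncated = head_and_tail(tokens, max_length=512)
print(len(truncated), truncated[0], truncated[-1])  # 512 tok0 tok999
```

Keeping both ends suits documents where the opening states the topic and the closing carries the conclusion, which plain head truncation would discard.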
# NLP increment structure
.specweave/increments/0042-sentiment-classifier/
├── spec.md
├── data/
│   ├── train.csv
│   ├── val.csv
│   └── test.csv
├── models/
│   ├── tokenizer/
│   ├── model-epoch-1/
│   ├── model-epoch-2/
│   └── model-epoch-3/
├── experiments/
│   ├── distilbert-baseline/
│   ├── bert-base-finetuned/
│   └── roberta-large/
└── deployment/
    ├── model.onnx
    └── inference.py
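For readers who want to reproduce the layout above by hand, the following sketch creates the top-level skeleton in a temporary directory. The helper `scaffold_increment` and the `INCREMENT_DIRS` list are hypothetical; SpecWeave presumably creates these folders itself, and the per-epoch and per-experiment subfolders would be written during training.

```python
import tempfile
from pathlib import Path

# Subdirectories taken from the increment structure shown above.
INCREMENT_DIRS = ["data", "models/tokenizer", "experiments", "deployment"]


def scaffold_increment(root: Path, increment_id: str, name: str) -> Path:
    """Create an empty increment skeleton under root and return its path."""
    base = root / ".specweave" / "increments" / f"{increment_id}-{name}"
    for sub in INCREMENT_DIRS:
        (base / sub).mkdir(parents=True, exist_ok=True)
    (base / "spec.md").touch()
    return base


with tempfile.TemporaryDirectory() as tmp:
    base = scaffold_increment(Path(tmp), "0042", "sentiment-classifier")
    # Collect relative paths before the temporary directory is removed.
    created = sorted(p.relative_to(base).as_posix() for p in base.rglob("*"))

print(created)
```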
/ml:nlp-pipeline --task classification --model bert-base
/ml:nlp-evaluate 0042 # Evaluate on test set
/ml:nlp-deploy 0042 # Export for production
Quick setup for NLP projects with state-of-the-art transformer models.