From clawbio
Runs end-to-end maximum-likelihood phylogenetic tree inference: multiple sequence alignment (MSA), trimming, ModelFinder model selection, IQ-TREE2 or RAxML-NG tree inference, rooting, and visualization.
Slash command: `/clawbio:phylogenetics-builder`
You are **Phylogenetics Builder**, a ClawBio agent for end-to-end maximum-likelihood phylogenetic tree inference. You run the full pipeline: MSA → trimming → model selection → tree inference → rooting → visualisation.
Maximum-likelihood phylogenetics requires correctly chaining at least five external tools (aligner → trimmer → model selector → tree engine → visualiser). Each has non-obvious CLI quirks:

- MUSCLE v3 and v5 use conflicting flags (v5 replaced `-in`/`-out` with `-align`/`-output`).
- IQ-TREE and RAxML-NG use incompatible model-name formats.
- Bootstrap modes use different support thresholds (UFBoot ≥ 95 vs standard ≥ 70).

This skill encapsulates the correct invocation for every supported tool and handles their output differences automatically.
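The MUSCLE v3/v5 flag split can be handled by building the command line from the detected major version. The sketch below is illustrative only, not the skill's code. The helpers `parse_muscle_major` and `muscle_argv` are hypothetical, and the sketch assumes the version banner contains a `major.minor` number (for example `MUSCLE v3.8.31` or `muscle 5.1`).

```python
import re


def parse_muscle_major(version_text: str) -> int:
    """Extract the major version from a MUSCLE version banner (banner format assumed)."""
    match = re.search(r"(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"no version number in {version_text!r}")
    return int(match.group(1))


def muscle_argv(in_fasta: str, out_fasta: str, major_version: int) -> list[str]:
    """Build a MUSCLE command line: v5 uses -align/-output, v3 uses -in/-out."""
    if major_version >= 5:
        return ["muscle", "-align", in_fasta, "-output", out_fasta]
    return ["muscle", "-in", in_fasta, "-out", out_fasta]


# A v3 banner selects the legacy flags.
major = parse_muscle_major("MUSCLE v3.8.31 by Robert C. Edgar")
print(muscle_argv("seqs.fa", "aln.fa", major))  # prints ['muscle', '-in', 'seqs.fa', '-out', 'aln.fa']
```

In practice the banner would come from running the installed binary. Keeping the argv construction pure makes it easy to test without MUSCLE on `PATH`.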
Fire when the user says:
Do NOT fire when:

- the user wants a fast distance tree without a full MSA (use `fastreer` instead);
- the user wants structure prediction (use `struct-predictor` instead).

One skill, one task. This skill infers a maximum-likelihood phylogenetic tree from DNA or protein sequences. It does not annotate variants, predict structures, or perform downstream comparative genomics. Each post-tree task chains to another skill.
Supported pipeline stages:
- Trimming: trimAl `-automated1` (removes gapped columns)
- Model selection: ModelFinder (`-m MFP`), BIC-selected
- Rooting: outgroup or midpoint (ETE3, with Bio.Phylo `root_at_midpoint` fallback)

Workflow:

1. Input check: when `--aligned` is set, verify that all sequences have the same length.
2. MSA (skipped with `--aligned`): run the chosen aligner. The default is `mafft --auto` for its speed/quality balance. Alternative aligners: `muscle`, `clustalw`, `kalign`, `tcoffee`, `prank`.
3. Trimming (skipped with `--no-trim`): run `trimal -automated1`, which removes gapped columns that add noise without information. Skip for protein alignments where all columns are informative.
4. Model selection (skipped when `--model` is provided): run `iqtree2 -m MFP`, parse the `Best-fit model according to BIC:` line from the `.iqtree` report, and pass the selected model directly to tree inference.
5. Tree inference:
   - IQ-TREE2: `iqtree2 -s aln.fa -m MODEL --prefix prefix [-bb 1000] [-b 100] [-bb 1000 -alrt 1000 -abayes]`
   - RAxML-NG: `raxml-ng --check` (validates input), then `raxml-ng --all --msa aln.fa --model MODEL --bs-trees N`
6. Rooting:
   - Outgroup: pass `--outgroup TAXON` to the tree builder (`-o` in IQ-TREE, `--outgroup` in RAxML-NG).
   - Midpoint (ETE3, after inference): `midpoint = t.get_midpoint_outgroup(); t.set_outgroup(midpoint)`
7. Output: `report.md`, `result.json` (ClawBio contract), `phylo_tree.nwk`, `figures/phylogram.png`, `tables/branch_support.csv`, `reproducibility/`.

**Demo fallback (MANDATORY):** If no binaries are installed, skip to the pre-computed tree in `examples/demo_tree.nwk`. Always show a result; never refuse.
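Step 4 hands the ModelFinder choice to tree inference by reading one line of IQ-TREE's report. The sketch below is a minimal version of that parse. It assumes the report contains the `Best-fit model according to BIC:` line exactly as quoted in step 4 and is named `{prefix}.iqtree`. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Matches the ModelFinder summary line; the exact wording is assumed stable across versions.
BIC_LINE = re.compile(r"Best-fit model according to BIC:\s*(\S+)")


def parse_bic_model(report_text: str) -> str:
    """Return the BIC-selected model name from IQ-TREE report text."""
    match = BIC_LINE.search(report_text)
    if match is None:
        raise ValueError("no 'Best-fit model according to BIC:' line found")
    return match.group(1)


def bic_model_from_report(prefix: str) -> str:
    """Read `{prefix}.iqtree` and parse the selected model."""
    return parse_bic_model(Path(f"{prefix}.iqtree").read_text())


example = "ModelFinder results\nBest-fit model according to BIC: TIM3+F+G4\n"
print(parse_bic_model(example))  # prints TIM3+F+G4
```

Raising on a missing line, rather than silently defaulting to a model, keeps the agent from inventing a substitution model when ModelFinder did not finish.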
Freedom level per step:
```bash
# Full pipeline: unaligned → MSA → trim → ModelFinder → IQ-TREE2
python skills/phylogenetics-builder/phylogenetics_builder.py \
  --input sequences.fasta --output /tmp/phylo

# Pre-aligned input (skip MSA)
python skills/phylogenetics-builder/phylogenetics_builder.py \
  --input aligned.fasta --output /tmp/phylo --aligned

# Choose MSA algorithm (mafft default)
python skills/phylogenetics-builder/phylogenetics_builder.py \
  --input sequences.fasta --output /tmp/phylo \
  --aligner muscle

# Standard bootstrap instead of UFBoot
python skills/phylogenetics-builder/phylogenetics_builder.py \
  --input aligned.fasta --output /tmp/phylo --aligned \
  --bootstrap standard

# Triple support: UFBoot + aLRT + aBayes
python skills/phylogenetics-builder/phylogenetics_builder.py \
  --input aligned.fasta --output /tmp/phylo --aligned \
  --bootstrap all

# Root by outgroup
python skills/phylogenetics-builder/phylogenetics_builder.py \
  --input aligned.fasta --output /tmp/phylo --aligned \
  --outgroup Mus_musculus,Rattus_norvegicus

# Midpoint rooting (ETE3; falls back to Bio.Phylo if ETE3 is absent)
python skills/phylogenetics-builder/phylogenetics_builder.py \
  --input aligned.fasta --output /tmp/phylo --aligned \
  --root midpoint

# Use RAxML-NG engine
python skills/phylogenetics-builder/phylogenetics_builder.py \
  --input aligned.fasta --output /tmp/phylo --aligned \
  --engine raxml-ng

# Skip trimming
python skills/phylogenetics-builder/phylogenetics_builder.py \
  --input aligned.fasta --output /tmp/phylo --aligned --no-trim

# Provide model explicitly (skip ModelFinder)
python skills/phylogenetics-builder/phylogenetics_builder.py \
  --input aligned.fasta --output /tmp/phylo --aligned \
  --model GTR+F+G4

# Demo mode (works offline, no binaries needed)
python skills/phylogenetics-builder/phylogenetics_builder.py \
  --demo --output /tmp/phylo_demo
```
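`--root midpoint` delegates to ETE3 (`get_midpoint_outgroup()` / `set_outgroup()`) or Bio.Phylo's `root_at_midpoint()`. To show what those calls compute, the sketch below finds the midpoint of the longest tip-to-tip path in an unrooted tree without any dependencies. It is illustrative only, not the skill's code. The adjacency-map representation and the `midpoint` helper are assumptions, and branch lengths are assumed non-negative.

```python
from __future__ import annotations

from collections.abc import Mapping

# Unrooted tree as an undirected weighted adjacency map: node -> {neighbour: branch length}.
Tree = Mapping[str, Mapping[str, float]]


def _distances_from(tree: Tree, start: str) -> tuple[dict[str, float], dict[str, str | None]]:
    """Path lengths and parent pointers from `start` (paths are unique in a tree)."""
    dist: dict[str, float] = {start: 0.0}
    parent: dict[str, str | None] = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour, length in tree[node].items():
            if neighbour not in dist:
                dist[neighbour] = dist[node] + length
                parent[neighbour] = node
                stack.append(neighbour)
    return dist, parent


def midpoint(tree: Tree) -> tuple[str, str, float]:
    """Locate the midpoint of the longest tip-to-tip path.

    Returns (u, v, offset): the midpoint lies on branch u-v, `offset` away from u.
    Double sweep: the node farthest from any start node is one end of the
    longest path, and the node farthest from that end is the other end.
    """
    start = next(iter(tree))
    d_start, _ = _distances_from(tree, start)
    end_a = max(d_start, key=d_start.__getitem__)
    d_a, parent = _distances_from(tree, end_a)
    end_b = max(d_a, key=d_a.__getitem__)
    half = d_a[end_b] / 2
    if half == 0:
        raise ValueError("tree has zero total length; midpoint is undefined")
    # Walk from end_b toward end_a while the next node is still past the halfway mark.
    node = end_b
    while d_a[parent[node]] > half:
        node = parent[node]
    u = parent[node]
    return u, node, half - d_a[u]


# Leaves A, B, C, D; internal nodes X and Y. The longest path is B-X-Y-C (length 8).
tree = {
    "A": {"X": 1.0}, "B": {"X": 2.0},
    "X": {"A": 1.0, "B": 2.0, "Y": 1.0},
    "Y": {"X": 1.0, "C": 5.0, "D": 1.0},
    "C": {"Y": 5.0}, "D": {"Y": 1.0},
}
print(midpoint(tree))  # prints ('C', 'Y', 4.0)
```

The root is then placed on branch C-Y, 4.0 from C. That point is equidistant (4.0) from the two most distant tips, which is what ETE3 and Bio.Phylo achieve by re-rooting their tree objects.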
| Flag | Default | Description |
|---|---|---|
| `--input FILE` | — | Input FASTA (unaligned or aligned) |
| `--output DIR` | — | Output directory |
| `--demo` | off | Run with built-in 12-taxon primate data |
| `--aligned` | off | Input is already aligned; skip MSA |
| `--aligner` | `mafft` | MSA algorithm: `mafft` / `muscle` / `clustalw` / `kalign` / `tcoffee` / `prank` |
| `--engine` | `iqtree2` | Tree engine: `iqtree2` / `raxml-ng` |
| `--model MODEL` | auto | Skip ModelFinder; use this substitution model |
| `--bootstrap` | `ufboot` | Bootstrap: `ufboot` / `standard` / `all` |
| `--outgroup TAXA` | — | Comma-separated outgroup taxon name(s) |
| `--root midpoint` | — | Midpoint rooting via ETE3 (post-inference) |
| `--no-trim` | off | Skip trimAl trimming |
| `--threads N` | 2 | CPU threads for tree inference |
| `--seed N` | 42 | Random seed for reproducibility |
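The flag table maps directly onto `argparse`. The sketch below is a hypothetical reconstruction of that CLI surface, not the script's actual parser. Splitting `--outgroup` on commas follows the table's "comma-separated" description, and the help strings are paraphrased from it.

```python
import argparse

ALIGNERS = ["mafft", "muscle", "clustalw", "kalign", "tcoffee", "prank"]


def parse_taxa(value: str) -> list[str]:
    """Split a comma-separated outgroup list, dropping empty entries."""
    return [name.strip() for name in value.split(",") if name.strip()]


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="phylogenetics_builder.py")
    p.add_argument("--input", metavar="FILE", help="input FASTA (unaligned or aligned)")
    p.add_argument("--output", metavar="DIR", help="output directory")
    p.add_argument("--demo", action="store_true", help="run with built-in demo data")
    p.add_argument("--aligned", action="store_true", help="input is already aligned")
    p.add_argument("--aligner", choices=ALIGNERS, default="mafft")
    p.add_argument("--engine", choices=["iqtree2", "raxml-ng"], default="iqtree2")
    p.add_argument("--model", default="auto", help="'auto' runs ModelFinder")
    p.add_argument("--bootstrap", choices=["ufboot", "standard", "all"], default="ufboot")
    p.add_argument("--outgroup", metavar="TAXA", type=parse_taxa)
    p.add_argument("--root", choices=["midpoint"])
    p.add_argument("--no-trim", action="store_true", help="skip trimAl")
    p.add_argument("--threads", type=int, default=2)
    p.add_argument("--seed", type=int, default=42)
    return p


args = build_parser().parse_args(
    ["--input", "aligned.fasta", "--output", "/tmp/phylo", "--aligned",
     "--outgroup", "Mus_musculus,Rattus_norvegicus"]
)
print(args.outgroup, args.no_trim, args.threads)  # prints ['Mus_musculus', 'Rattus_norvegicus'] False 2
```

Restricting `--aligner`, `--engine`, and `--bootstrap` with `choices` reflects the rule that only the documented flag surface reaches the underlying tools.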
| Aligner | Speed (10 seqs) | Speed (250 seqs) | Recommendation |
|---|---|---|---|
| mafft | 4.4 s | 42 s | Default — best speed/quality balance |
| kalign | 0.5 s | 8 s | Fastest for large datasets (>100 seqs) |
| muscle | 5 s | 30 min | Good for protein alignments |
| clustalw | 5.6 s | 49 min | Legacy; avoid for large datasets |
| tcoffee | slow | very slow | Most accurate; use for ≤20 sequences |
| prank | slow | very slow | Codon-aware; use with -codon for coding DNA |
Timings were benchmarked on the SUP35 gene dataset from the NGS Handbook.
| Mode | Flag | Speed | Use when |
|---|---|---|---|
| `ufboot` | `-bb 1000` | ~3 s | Default; fast and reliable (threshold: ≥ 95) |
| `standard` | `-b 100` | ~3 min | Publication standard; slower Felsenstein bootstrap |
| `all` | `-bb 1000 -alrt 1000 -abayes` | ~5 s | Triple validation needed; labels are `/`-delimited |
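Each bootstrap mode corresponds to a fixed set of IQ-TREE2 flags taken from the table. The sketch below assembles the full command for any mode. It is a hypothetical helper, not the skill's code. `-T`, `--seed`, and `-o` are IQ-TREE 2 options, but the exact way the skill passes `--threads`, `--seed`, and `--outgroup` is an assumption.

```python
from __future__ import annotations

BOOTSTRAP_FLAGS = {
    "ufboot": ["-bb", "1000"],
    "standard": ["-b", "100"],
    "all": ["-bb", "1000", "-alrt", "1000", "-abayes"],
}


def iqtree_argv(
    alignment: str,
    model: str,
    prefix: str,
    bootstrap: str = "ufboot",
    threads: int = 2,
    seed: int = 42,
    outgroup: list[str] | None = None,
) -> list[str]:
    """Assemble an IQ-TREE2 command for one of the three bootstrap modes."""
    argv = ["iqtree2", "-s", alignment, "-m", model, "--prefix", prefix]
    argv += BOOTSTRAP_FLAGS[bootstrap]  # KeyError for an unknown mode
    # An explicit thread count avoids relying on -T AUTO detection.
    argv += ["-T", str(threads), "--seed", str(seed)]
    if outgroup:
        argv += ["-o", ",".join(outgroup)]
    return argv


print(" ".join(iqtree_argv("aln.fa", "TIM3+F+G4", "phylo", bootstrap="all")))
# prints iqtree2 -s aln.fa -m TIM3+F+G4 --prefix phylo -bb 1000 -alrt 1000 -abayes -T 2 --seed 42
```

Returning an argv list instead of a shell string keeps taxon names and model strings from being reinterpreted by a shell.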
Triple-support labels have the format `{alrt}/{abayes}/{ufb}`. Thresholds: SH-aLRT > 70, aBayes > 0.7, UFBoot > 95.
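Triple-support labels have to be split before their values can be compared against the thresholds. The minimal sketch below uses the strict `>` thresholds stated above. Treating a node as supported only when all three values pass is an assumption, and `TripleSupport` / `parse_triple_label` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TripleSupport:
    alrt: float    # SH-aLRT, percent
    abayes: float  # aBayes posterior, 0-1
    ufb: float     # UFBoot, percent

    def is_supported(self, alrt_min: float = 70.0, abayes_min: float = 0.7,
                     ufb_min: float = 95.0) -> bool:
        """Assumed rule: all three values must exceed their thresholds."""
        return self.alrt > alrt_min and self.abayes > abayes_min and self.ufb > ufb_min


def parse_triple_label(label: str) -> TripleSupport:
    """Split an `alrt/abayes/ufb` node label into numbers."""
    parts = label.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected alrt/abayes/ufb, got {label!r}")
    alrt, abayes, ufb = (float(p) for p in parts)
    return TripleSupport(alrt, abayes, ufb)


print(parse_triple_label("80.5/0.85/97").is_supported())  # prints True
```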
```markdown
# Phylogenetics Builder Report
### Pipeline Summary
| Parameter | Value |
|-----------|-------|
| Input | `sequences.fasta` |
| Taxa | 12 |
| Aligner | mafft |
| Trimming | trimAl -automated1 |
| Substitution model | `TIM3+F+G4` |
| Tree engine | iqtree2 |
| Bootstrap | UFBoot (1 000 replicates) |
| Rooting | unrooted |
### Pipeline Steps
- `msa:mafft`
- `trim:trimal`
- `modelfinder:TIM3+F+G4`
- `tree:iqtree2:ufboot`
### Branch Lengths & Support Values
| Node / Taxon | Branch Length | Support |
|:-------------|:-------------:|:-------:|
| Homo_sapiens | 0.01000 | 100 |
| Pan_troglodytes | 0.00800 | 98 |
...
```
```text
output_directory/
├── report.md                 # Primary markdown report with pipeline summary
├── result.json               # Machine-readable ClawBio output contract
├── phylo_tree.nwk            # Newick format tree with bootstrap support
├── figures/
│   └── phylogram.png         # Proportional phylogram (matplotlib)
├── tables/
│   └── branch_support.csv    # Per-node branch lengths and support values
└── reproducibility/
    ├── commands.sh           # Exact CLI command used
    ├── environment.yml       # Conda environment definition
    └── checksums.sha256      # SHA-256 checksums of all outputs
```
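`reproducibility/checksums.sha256` can be produced with the standard library alone. The minimal sketch below assumes the file follows the `sha256sum` text layout (hex digest, two spaces, path relative to the output directory), so `sha256sum -c reproducibility/checksums.sha256` run from the output directory could verify it. `write_checksums` is a hypothetical helper, not the skill's code.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large alignments need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(output_dir: str | Path) -> Path:
    """Write sha256sum-style lines for every output file except the checksum file itself."""
    root = Path(output_dir)
    target = root / "reproducibility" / "checksums.sha256"
    files = sorted(p for p in root.rglob("*") if p.is_file() and p != target)
    lines = [f"{sha256_file(p)}  {p.relative_to(root).as_posix()}" for p in files]
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(lines) + "\n")
    return target
```

Excluding the checksum file makes repeated runs idempotent. A file cannot contain its own hash.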
- **PRANK output name:** PRANK writes its alignment to `{prefix}.best.fas`, not `{prefix}`. When `prank` is the chosen aligner, the skill renames this file automatically. If you call PRANK manually, look for the `.best.fas` suffix.
- **`+F` in model names:** IQ-TREE ModelFinder appends `+F` to models, and RAxML-NG rejects it. For example, `TIM3+F+G4` from ModelFinder must be stripped to `TIM3+G4` for RAxML-NG. The skill does this automatically via `adapt_model_for_engine()`. If you pass `--model` manually with `--engine raxml-ng`, omit the `+F`.
- **Triple-support labels contain `/`:** with `--bootstrap all`, node labels encode `alrt/abayes/ufb` (e.g. `80.5/0.85/97`). Standard Newick readers interpret the whole string as a single confidence value. Split the label with `strsplit(label, "/")` in R or `label.split("/")` in Python.
- **Over-trimming:** pass `--no-trim`, or use the `-nogaps` strategy instead of `-automated1`, which can remove too many columns.
- **Thread detection:** IQ-TREE's `-T AUTO` can block tests. Always specify an explicit thread count (`-T 2`) in automated or test contexts so that IQ-TREE does not hang on thread detection.
- **Pre-aligned input:** with `--aligned`, the skill validates that all sequences have the same length. Gaps (`-`) are allowed; just make sure no sequence was accidentally truncated.

The agent (LLM) dispatches to this skill and explains the results. The skill (Python script) executes all computation. The agent must NOT invent substitution model names, bootstrap values, or branch lengths.
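Two of these gotchas reduce to small pure functions. The sketch is illustrative only. `to_raxml_model` and `check_aligned_lengths` are hypothetical names, not the skill's `adapt_model_for_engine()`. The model helper removes only the bare `+F` component described here and leaves other IQ-TREE suffixes untouched.

```python
def to_raxml_model(iqtree_model: str) -> str:
    """Drop a bare +F component, e.g. TIM3+F+G4 -> TIM3+G4."""
    base, *modifiers = iqtree_model.split("+")
    return "+".join([base, *(m for m in modifiers if m != "F")])


def check_aligned_lengths(records: list[tuple[str, str]]) -> int:
    """Return the shared alignment length, or raise listing sequences that differ."""
    if not records:
        raise ValueError("alignment is empty")
    expected = len(records[0][1])
    bad = [name for name, seq in records if len(seq) != expected]
    if bad:
        raise ValueError(f"length differs from {expected}: {', '.join(bad)}")
    return expected  # gap characters ('-') count toward the length like any other column


print(to_raxml_model("TIM3+F+G4"))  # prints TIM3+G4
```

`check_aligned_lengths` takes `(name, sequence)` pairs so that it stays independent of the FASTA parser in use.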
Route to this skill when the query matches any `trigger_keywords` or the intent is maximum-likelihood tree inference. The orchestrator passes the FASTA path and any user-specified flags; the skill owns all tool decisions internally.
After the run, read `result.json`:

- `chat_summary_lines`: surface to the user verbatim.
- `preferred_artifacts`: open the figure and tree file for the user.
- `run_mode == "demo-fallback"`: surface `contract_alerts[0]` to prompt IQ-TREE2 installation.
- `workflow_state == "completed"`: no retry needed.

Do not pass raw tool flags from the user directly to the CLI without validation; use only the documented `--flag` surface.
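The `result.json` rules above can be applied mechanically. In the sketch below, the field names come from this section. The value types (lists of strings for `chat_summary_lines`, `preferred_artifacts`, and `contract_alerts`) are assumptions about the contract, and `plan_agent_actions` is a hypothetical helper.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_result(output_dir: str | Path) -> dict[str, Any]:
    """Read the ClawBio result contract from an output directory."""
    return json.loads((Path(output_dir) / "result.json").read_text())


def plan_agent_actions(result: dict[str, Any]) -> dict[str, Any]:
    """Map result.json fields to what the agent should show or do."""
    alert = None
    if result.get("run_mode") == "demo-fallback":
        alerts = result.get("contract_alerts") or []
        alert = alerts[0] if alerts else None  # prompts IQ-TREE2 installation
    return {
        "say_verbatim": list(result.get("chat_summary_lines", [])),
        "open": list(result.get("preferred_artifacts", [])),
        "alert": alert,
        # Only "completed" is documented as needing no retry; other states are flagged.
        "needs_attention": result.get("workflow_state") != "completed",
    }
```

The agent only relays values it reads from the contract, which keeps it from inventing models, support values, or branch lengths.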
| Skill | When to chain |
|---|---|
| `fastreer` | User wants a fast k-mer distance tree without full MSA |
| `variant-annotation` | Annotate variants found in sequences before building tree |
| `genome-compare` | Compare multiple genomes before phylogenetic inference |
| `profile-report` | Add evolutionary context to a patient profile |
| `claw-ancestry-pca` | Population structure analysis complements phylogenetics |
| Dependency | Version | Required | Purpose |
|---|---|---|---|
| python | ≥ 3.10 | yes | Runtime |
| biopython | ≥ 1.80 | yes | Newick I/O, root_at_midpoint, visualisation |
| matplotlib | ≥ 3.5 | yes | Phylogram rendering |
| pandas | ≥ 2.0 | yes | Branch support CSV export |
| iqtree2 | ≥ 2.0 | recommended | ModelFinder + default tree engine |
| raxml-ng | any | optional | Alternative tree engine |
| mafft | any | optional | Default MSA aligner |
| muscle | ≥ 5.0 | optional | Alternative MSA aligner (v5 -align/-output syntax) |
| trimal | any | optional | Alignment column trimming |
| ete3 | ≥ 3.1 | optional | Midpoint rooting (Bio.Phylo fallback if absent) |
| clustalw | any | optional | Legacy MSA aligner |
| kalign | ≥ 3 | optional | Fast MSA for large datasets |
| t_coffee | any | optional | High-accuracy MSA for ≤ 20 sequences |
| prank | any | optional | Codon-aware MSA |
Install all bioinformatics binaries:

```bash
conda install -c conda-forge -c bioconda iqtree raxml-ng mafft muscle trimal clustalw kalign3 t-coffee prank
```
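Before running, the skill needs to know which of these binaries are on `PATH`, for example to decide on the mandatory demo fallback. The sketch below uses `shutil.which`. The executable names are assumptions: bioconda's `iqtree` 2.x installs `iqtree2`, `kalign3` installs `kalign`, and T-Coffee's executable is `t_coffee`. Triggering the fallback only when no tree engine is found is also an assumption, and both helper names are hypothetical.

```python
from __future__ import annotations

import shutil
from collections.abc import Callable, Iterable

TOOLS = ["iqtree2", "raxml-ng", "mafft", "muscle", "trimal",
         "clustalw", "kalign", "t_coffee", "prank"]


def find_tools(names: Iterable[str] = TOOLS,
               which: Callable[[str], str | None] = shutil.which) -> dict[str, str | None]:
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


def needs_demo_fallback(found: dict[str, str | None]) -> bool:
    """Assumed rule: fall back to the pre-computed demo tree when no tree engine is installed."""
    return found.get("iqtree2") is None and found.get("raxml-ng") is None


print(needs_demo_fallback(find_tools()))
```

Injecting `which` as a parameter makes the detection logic testable without installing anything.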
Revisit this skill if IQ-TREE's `-m MFP` syntax changes or the RAxML-NG `--all` flag is renamed.