Language-independent tokenizer that treats text as a raw Unicode stream. Supports the BPE and Unigram algorithms. Fast (~50k sentences/sec), lightweight (~6 MB memory), with a deterministic vocabulary. Used by T5, ALBERT, XLNet, and mBART. Trains directly on raw text without pre-tokenization. Use it when you need multilingual support, CJK languages, or reproducible tokenization.
/plugin marketplace add zechenzhangAGI/AI-research-SKILLs
/plugin install sentencepiece@zechenzhangAGI/AI-research-SKILLs