Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.
/plugin marketplace add zechenzhangAGI/AI-research-SKILLs
/plugin install serving-llms-vllm@zechenzhangAGI/AI-research-SKILLs
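A minimal sketch of the serving workflow this skill targets, assuming a local vLLM install; the model name, port, and parallelism settings are illustrative assumptions, not part of the skill itself:

```python
# Sketch only: launch vLLM's OpenAI-compatible server, then query it with the
# standard OpenAI client. Server launch (run in a shell, not Python):
#
#   vllm serve meta-llama/Llama-3.1-8B-Instruct \
#       --tensor-parallel-size 2 \
#       --gpu-memory-utilization 0.90
#
# (Add --quantization awq/gptq only when pointing at a matching quantized checkpoint.)

from openai import OpenAI

# vLLM's OpenAI-compatible server listens on port 8000 by default; the API key
# is ignored unless the server was started with --api-key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
    messages=[{"role": "user", "content": "Summarize PagedAttention in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```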