Accelerate LLM inference with speculative decoding, Medusa multi-head decoding, and lookahead decoding. Use when optimizing inference speed (1.5-3.6x speedup), reducing latency for real-time applications, or deploying models on limited compute. Covers draft models, tree-based attention, Jacobi iteration, and parallel token generation.
/plugin marketplace add zechenzhangAGI/AI-research-SKILLs
/plugin install speculative-decoding@zechenzhangAGI/AI-research-SKILLs
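To illustrate the core draft-and-verify idea behind speculative decoding, here is a minimal sketch of one decoding step. The model interfaces (`draft_probs_fn`, `target_probs_fn`) are hypothetical stand-ins for a small draft model and a large target model that each return a next-token probability distribution; a real deployment would score all drafted tokens with a single batched target forward pass.

```python
import random

def sample(probs):
    """Sample a token from a probability dict {token: prob}."""
    r, acc = random.random(), 0.0
    for tok, p in probs.items():
        acc += p
        if r < acc:
            return tok
    return tok  # fallback for floating-point rounding

def speculative_step(draft_probs_fn, target_probs_fn, prefix, k=4):
    """One draft-and-verify step of speculative decoding.

    draft_probs_fn / target_probs_fn are hypothetical interfaces that
    map a token prefix to a next-token probability dict.
    Returns the tokens accepted this step (between 1 and k+1 tokens).
    """
    # 1. The cheap draft model proposes k tokens autoregressively.
    drafted, ctx = [], list(prefix)
    for _ in range(k):
        q = draft_probs_fn(ctx)
        tok = sample(q)
        drafted.append((tok, q))
        ctx.append(tok)

    # 2. The target model verifies the drafted tokens left to right.
    accepted, ctx = [], list(prefix)
    for tok, q in drafted:
        p = target_probs_fn(ctx)
        # Accept the drafted token with probability min(1, p/q),
        # which preserves the target model's output distribution.
        if random.random() < min(1.0, p.get(tok, 0.0) / q[tok]):
            accepted.append(tok)
            ctx.append(tok)
        else:
            # On rejection, resample from the residual distribution
            # max(0, p - q), renormalised, and stop this step.
            residual = {t: max(0.0, p.get(t, 0.0) - q.get(t, 0.0)) for t in p}
            z = sum(residual.values()) or 1.0
            accepted.append(sample({t: v / z for t, v in residual.items()}))
            return accepted

    # 3. All drafts accepted: take one bonus token from the target.
    accepted.append(sample(target_probs_fn(ctx)))
    return accepted

# Toy usage: both "models" are uniform over a three-token vocabulary.
uniform = lambda ctx: {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
print(speculative_step(uniform, uniform, prefix=["a"], k=4))
```

Medusa and lookahead decoding replace the separate draft model with extra decoding heads or Jacobi-style parallel guesses, but the verify-and-accept structure above is the same.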