Half-Quadratic Quantization (HQQ) for LLMs, with no calibration data required. Use it to quantize models to 4-, 3-, or 2-bit precision when no calibration dataset is available, when quantization needs to be fast, or when deploying with vLLM or HuggingFace Transformers.
/plugin marketplace add zechenzhangAGI/AI-research-SKILLs
/plugin install hqq-quantization@zechenzhangAGI/AI-research-SKILLs
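
To make "quantizing without calibration data" concrete, here is a minimal pure-Python sketch of group-wise round-to-nearest affine quantization at 4, 3, and 2 bits. It uses only the weights themselves, so no dataset is involved. This is an illustrative baseline, not the HQQ library's implementation: HQQ goes further by optimizing the quantization parameters with a half-quadratic solver, which is not reproduced here. The function names (`quantize_group`, `dequantize_group`, `max_abs_error`) and the sample weights are hypothetical.

```python
def quantize_group(weights, nbits):
    """Round-to-nearest affine quantization of one group of float weights.

    Returns integer codes in [0, 2**nbits - 1] plus the scale and zero-point
    needed to reconstruct approximate weights. Only the weights are used,
    so no calibration data is required.
    """
    qmax = (1 << nbits) - 1
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant group, where the value range is zero.
    scale = (w_max - w_min) / qmax if w_max > w_min else 1.0
    zero = -w_min / scale
    # Clamp so floating-point noise cannot push a code out of range.
    codes = [min(qmax, max(0, round(w / scale + zero))) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate float weights."""
    return [(q - zero) * scale for q in codes]


def max_abs_error(weights, nbits):
    """Largest reconstruction error for one group at the given bit width."""
    codes, scale, zero = quantize_group(weights, nbits)
    restored = dequantize_group(codes, scale, zero)
    return max(abs(a - b) for a, b in zip(weights, restored))


if __name__ == "__main__":
    # Hypothetical sample group of weights, for illustration only.
    group = [-0.9, -0.4, -0.1, 0.0, 0.2, 0.35, 0.6, 1.1]
    for nbits in (4, 3, 2):
        print(f"{nbits}-bit max abs error: {max_abs_error(group, nbits):.4f}")
```

With round-to-nearest, the reconstruction error of each weight is bounded by half the group's scale, which is why lower bit widths (larger scale) lose more precision and why methods such as HQQ spend extra effort choosing better quantization parameters.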