Exploiting LLM Quantization
23.04.25
Paper Link: https://arxiv.org/pdf/2405.18137
-
Methodology used: This paper presents a large-scale empirical study of the security implications of zero-shot weight quantization methods (LLM.int8(), NF4, FP4) for large language models. The attack framework has three stages: first, fine-tune the full-precision model to inject a target malicious behaviour; second, compute the region of full-precision weights that all map to the same (malicious) quantized model, expressed as interval constraints on the weights; third, use projected gradient descent (PGD) within those constraints to "repair" the full-precision model so that it behaves benignly and retains utility, while its quantized versions still exhibit the malicious behaviour. The attacks are evaluated across several popular LLMs (StarCoder, Phi-2, Gemma) on three scenarios (vulnerable code generation, over-refusal, content injection), measuring both attack success and model utility on standard benchmarks. The authors also investigate the effect of attack design choices and a Gaussian noise-based defense.
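To make the constraint-and-repair stage concrete, below is a minimal sketch of the PGD step, not the paper's implementation: it assumes a simplified per-tensor absmax int8 quantizer as a stand-in for LLM.int8()/NF4/FP4, and the names `quantization_box`, `pgd_repair`, and `repair_loss` are illustrative placeholders.

```python
# Minimal sketch of the constrained "repair" stage (illustrative, not the
# paper's code). A simplified per-tensor absmax int8 quantizer stands in for
# LLM.int8()/NF4/FP4; `repair_loss` is a placeholder for a clean training loss.
import torch


def quantize_int8(w: torch.Tensor):
    """Symmetric absmax int8 quantization of a weight tensor (simplified)."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127)
    return q, scale


def quantization_box(w_malicious: torch.Tensor):
    """Per-weight interval of full-precision values that round to the same
    int8 grid point, assuming the scale-defining absmax element is pinned."""
    q, scale = quantize_int8(w_malicious)
    lo, hi = (q - 0.5) * scale, (q + 0.5) * scale
    idx = w_malicious.abs().argmax()            # element that sets the scale
    lo.view(-1)[idx] = w_malicious.view(-1)[idx]
    hi.view(-1)[idx] = w_malicious.view(-1)[idx]
    return lo, hi


def pgd_repair(w_malicious: torch.Tensor, repair_loss, steps=100, lr=1e-4):
    """Remove the malicious behaviour in full precision while projecting every
    update back into the box, so the quantized weights stay unchanged."""
    lo, hi = quantization_box(w_malicious)
    w = w_malicious.detach().clone().requires_grad_(True)
    for _ in range(steps):
        loss = repair_loss(w)                   # e.g. loss on clean data
        loss.backward()
        with torch.no_grad():
            w -= lr * w.grad
            w.clamp_(min=lo, max=hi)            # PGD projection step
            w.grad = None
    return w.detach()
```

Because the repaired weights never leave the box, they quantize to exactly the same grid points as the malicious model, so the injected behaviour survives quantization by construction.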
-
New things introduced / Novelty: This work introduces the threat of "LLM quantization attacks", highlighting a security vulnerability arising from the discrepancy between a full-precision model and its quantized counterparts. It provides the first large-scale study demonstrating that such attacks are practical across multiple models and quantization methods. The three-stage framework for systematically injecting malicious behaviour into quantized LLMs is the core technical contribution, and the paper also presents an initial investigation into potential defenses against these attacks.
-
Key takeaways and results: The key takeaway is that zero-shot quantization can expose LLMs to significant security risks: an adversary can manipulate the full-precision weights so that the quantized versions exhibit malicious behaviours (e.g., generating vulnerable code, refusing to answer benign prompts, injecting specific content) while the full-precision model retains comparable utility. Attack effectiveness varies with the quantization method (e.g., lower-bit quantization may be more susceptible) and the model's characteristics. A simple Gaussian noise-based defense shows promise in mitigating these attacks.
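The noise-based defense amounts to perturbing the shared full-precision checkpoint before it is quantized, so that many weights leave the narrow intervals the attacker optimized for. A minimal sketch, assuming the defender operates on a plain PyTorch state dict; `sigma` is an illustrative value, not the paper's tuned setting:

```python
import torch


def noise_defense(state_dict: dict, sigma: float = 1e-3) -> dict:
    """Add small Gaussian noise to every floating-point weight tensor before
    quantization. The perturbation is likely to push weights out of the narrow
    intervals the attacker relied on, breaking the hidden behaviour, at the
    cost of a (hopefully small) utility drop controlled by sigma."""
    return {
        name: w + sigma * torch.randn_like(w) if w.is_floating_point() else w
        for name, w in state_dict.items()
    }
```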
-
Comparison with State of the Art (SOTA), how it is better, and under what circumstances: Existing research on LLM quantization has focused primarily on preserving accuracy and efficiency at inference time. This paper explores a new, security-critical dimension of quantized LLMs, demonstrating a vulnerability that was previously largely unexplored. The findings highlight a weakness in the widespread practice of quantizing shared models for local deployment: a model that appears benign and retains its utility in full precision can be manipulated to exhibit harmful behaviours once quantized.
-
Drawbacks that are discussed in the paper: The study did not extend to optimization-based quantization or activation quantization methods. It also faced computational limitations that prevented testing on the largest LLMs. The potential broader impact of the proposed Gaussian noise defense beyond benchmark performance requires further investigation. The paper notes the current lack of thorough evaluation and defense mechanisms against such attacks on model-sharing platforms.
-
Improvements that can be made: Future research should investigate quantization methods beyond zero-shot weight quantization. Scaling the attack framework to larger LLMs is important. A more comprehensive study of defense mechanisms, including the long-term effects and optimal application of noise-based defenses, is necessary. The development of robust evaluation and certification methods for quantized models on model-sharing platforms is crucial to address this security concern.