Quantization-Robust LLM Unlearning via Low-Rank Adaptation
2026-02-13 • Machine Learning
Machine Learning • Computation and Language
AI summary
The authors study a way to make large language models 'forget' certain information after training, which is called unlearning. They found that the usual unlearning method doesn't work well once the models are compressed to use less memory (quantization), because the small weight changes get lost. To fix this, the authors use a technique called LoRA that updates only small added parts of the model, making the changes stick even after compression. Their experiments show that this approach better preserves the model's performance and privacy protections under low-bit quantization.
Large Language Models • Unlearning • Post-Training Quantization • 4-bit Quantization • Low-Rank Adaptation (LoRA) • Model Compression • Privacy Leakage • Fine-Tuning
Authors
João Vitor Boer Abitante, Joana Meneguzzo Pasquali, Luan Fonseca Garcia, Ewerton de Oliveira, Thomas da Silva Paula, Rodrigo C. Barros, Lucas S. Kupssinskü
Abstract
Large Language Model (LLM) unlearning aims to remove targeted knowledge from a trained model, but practical deployments often require post-training quantization (PTQ) for efficient inference. However, aggressive low-bit PTQ can mask or erase unlearning updates, causing quantized models to revert to pre-unlearning behavior. We show that standard full-parameter fine-tuning often induces parameter changes that are too small to survive 4-bit quantization. We propose quantization-robust unlearning via low-rank adaptation (LoRA): we freeze the base model and concentrate unlearning into trainable adapters so that the effective update is preserved after quantization. On Llama-2-7B evaluated on the MUSE benchmark (BOOKS and NEWS), LoRA improves 4-bit utility by up to 7.93 points (NPO+GDR on BOOKS: 50.17 to 58.10) and yields higher 4-bit utility on NEWS for GA+GDR (40.06 to 44.82, an increase of 4.76). LoRA also substantially reduces privacy leakage under 4-bit PTQ, e.g., for GA+KLR on BOOKS, PrivLeak moves from -25.68 to -5.86 (closer to the ideal of 0), while maintaining strong forgetting (VerMem and KnowMem near 0). Thus, using LoRA for machine unlearning is beneficial in scenarios where quantization is necessary for model deployment.
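The sketch below illustrates the general recipe described in the abstract, not the authors' exact pipeline: the base model is frozen, LoRA adapters (here via the Hugging Face peft library) carry the unlearning update, and training ascends the loss on the forget set as in gradient-ascent (GA) unlearning. The model name, adapter rank, target modules, and learning rate are illustrative assumptions.

    # Minimal sketch: unlearning concentrated into LoRA adapters on a frozen base model.
    # All hyperparameters are illustrative, not the paper's reported settings.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model_name = "meta-llama/Llama-2-7b-hf"  # assumed base model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    base_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

    # Freeze the base weights and attach low-rank adapters; only the adapters are
    # trained, so the entire unlearning update lives in the adapter weights.
    lora_config = LoraConfig(
        r=16,                                  # illustrative rank
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],   # assumed attention projections
        lora_dropout=0.0,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base_model, lora_config)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    model.train()

    def unlearning_step(batch):
        """One gradient-ascent step on a forget-set batch (maximize the NLL)."""
        outputs = model(**batch, labels=batch["input_ids"])
        loss = -outputs.loss                   # negate to ascend on forget data
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return loss.item()

    # After unlearning, the adapters can be merged into the base weights and the
    # result quantized to 4 bits with a PTQ toolkit of choice:
    # merged = model.merge_and_unload()

The design intent is that the merged low-rank delta is concentrated and large enough per affected weight to survive 4-bit rounding, whereas full-parameter fine-tuning spreads many tiny changes that quantization erases.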