Bias Redistribution in Visual Machine Unlearning: Does Forgetting One Group Harm Another?

2026-04-09 · Machine Learning

Machine Learning · Computer Vision and Pattern Recognition
AI summary

The authors study how machine unlearning, which makes a model forget certain training data for privacy reasons, affects fairness. Testing three unlearning methods with CLIP models on the CelebA face dataset, they found that unlearning does not remove bias; instead, the bias shifts to related groups, mostly along gender lines. For example, forgetting young women shifted the model's performance toward older women, revealing a gender-dominant structure in how the model represents faces. One method, Refusal Vector, reduced this shift but did not fully forget the target group and made the model less accurate on the groups it kept. The authors conclude that current unlearning techniques do not account for how groups are arranged in the model's embedding space, which can unintentionally amplify bias in the remaining groups.

machine unlearning, fairness, bias redistribution, CLIP models, zero-shot classification, demographic parity, embedding space, prompt erasure, prompt reweighting, refusal vector
Authors
Yunusa Haruna, Adamu Lawan, Ibrahim Haruna Abdulhamid, Hamza Mohammed Dauda, Jiaquan Zhang, Chaoning Zhang, Shamsuddeen Hassan Muhammad
Abstract
Machine unlearning enables models to selectively forget training data, a capability driven by privacy regulations such as GDPR and CCPA. However, its fairness implications remain underexplored: when a model forgets a demographic group, does it neutralize that concept or redistribute it to correlated groups, potentially amplifying bias? We investigate this bias redistribution phenomenon on CelebA using CLIP models (ViT-B/32, ViT-L/14, ViT-B/16) in a zero-shot classification setting across intersectional groups defined by age and gender. We evaluate three unlearning methods (Prompt Erasure, Prompt Reweighting, and Refusal Vector) using per-group accuracy shifts, demographic parity gaps, and a redistribution score. Our results show that unlearning does not eliminate bias but redistributes it, primarily along gender rather than age boundaries. In particular, removing the dominant Young Female group consistently transfers performance to Old Female across all model scales, revealing a gender-dominant structure in CLIP's embedding space. While the Refusal Vector method reduces redistribution, it fails to achieve complete forgetting and significantly degrades retained performance. These findings highlight a fundamental limitation of current unlearning methods: without accounting for embedding geometry, they risk amplifying bias in retained groups.