STAMP: Selective Task-Aware Mechanism for Text Privacy
2026-03-12 • Machine Learning
Machine Learning, Cryptography and Security, Information Theory
AI summary
The authors created STAMP, a method that protects private information in text while keeping it useful for tasks like answering questions or classifying news. STAMP decides how much privacy protection to add to each word by looking at how important the word is for the task and how sensitive its content might be. They also developed a new way to add noise called the polar mechanism, which changes word embeddings carefully to keep their meaning intact. Tests showed that this approach balances privacy and usefulness better than older methods.
text privatization, privacy budget, token embeddings, polar mechanism, task-aware privacy, cosine similarity, embedding space, downstream task, noise mechanism, privacy-utility trade-off
Authors
Fengwei Tian, Payel Bhattacharjee, Heidi Hanson, Geoffrey D. Rubin, Joseph Y. Lo, Ravi Tandon
Abstract
We present STAMP (Selective Task-Aware Mechanism for Text Privacy), a new framework for task-aware text privatization that achieves an improved privacy-utility trade-off. STAMP selectively allocates privacy budgets across tokens by jointly considering (i) each token's importance to the downstream task (as measured via a task- or query-specific representation), and (ii) its privacy sensitivity (e.g., names, dates, identifiers). This token-level partitioning enables fine-grained, group-wise control over the level of noise applied to different parts of the input, balancing privacy protection with task relevance. To privatize individual token embeddings, we introduce the polar mechanism, which perturbs only the direction of embeddings on the unit sphere while preserving their magnitude. Decoding is performed via cosine nearest-neighbor search, aligning the perturbation geometry with the decoding geometry. Unlike isotropic noise mechanisms, the polar mechanism maintains semantic neighborhoods in the embedding space and better preserves downstream utility. Experimental evaluations on SQuAD, Yelp, and AG News datasets demonstrate that STAMP, when combined with the normalized polar mechanism, consistently achieves superior privacy-utility trade-offs across varying per-token privacy budgets.
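The pipeline described in the abstract (group-wise budget allocation, direction-only perturbation, cosine nearest-neighbor decoding) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the noise distribution or the budget values, so the angular noise below (a half-normal angle with scale 1/eps, clipped to pi) is a placeholder that carries no formal privacy guarantee, and the function names `allocate_budgets`, `polar_perturb`, and `cosine_decode`, the boolean token flags, and the budget table are all hypothetical. Embeddings are assumed to be non-zero.

```python
import numpy as np

def allocate_budgets(task_relevant, sensitive, eps_by_group):
    """Give each token the budget of its (task_relevant, sensitive) group.

    The two-flag grouping and the budget table are illustrative assumptions.
    """
    return np.array([eps_by_group[(bool(r), bool(s))]
                     for r, s in zip(task_relevant, sensitive)])

def polar_perturb(x, eps, rng):
    """Rotate x by a random angle on the sphere; its norm is unchanged.

    ASSUMPTION: the angle is half-normal with scale 1/eps, clipped to pi.
    This stands in for the paper's (unspecified here) noise distribution.
    """
    norm = np.linalg.norm(x)
    u = x / norm                          # unit direction of the embedding
    g = rng.normal(size=x.shape)
    g -= g.dot(u) * u                     # project onto the tangent space at u
    g /= np.linalg.norm(g)                # random unit tangent direction
    theta = min(abs(float(rng.normal(scale=1.0 / eps))), np.pi)
    v = np.cos(theta) * u + np.sin(theta) * g   # unit vector at angle theta from u
    return norm * v                       # restore the original magnitude

def cosine_decode(z, vocab):
    """Return the index of the vocabulary row with highest cosine similarity to z."""
    vocab_unit = vocab / np.linalg.norm(vocab, axis=1, keepdims=True)
    return int(np.argmax(vocab_unit @ (z / np.linalg.norm(z))))

# Usage with a random toy vocabulary (hypothetical data and budgets).
rng = np.random.default_rng(0)
vocab = rng.normal(size=(50, 16))
token_ids = [3, 17, 42]
budgets = allocate_budgets(
    task_relevant=[True, False, True],
    sensitive=[False, True, True],
    eps_by_group={(True, False): 8.0, (True, True): 2.0,
                  (False, False): 2.0, (False, True): 0.5},
)
private_ids = [cosine_decode(polar_perturb(vocab[t], e, rng), vocab)
               for t, e in zip(token_ids, budgets)]
```

In this sketch a smaller budget yields a larger expected rotation angle, so sensitive tokens that matter little to the task are the most likely to decode to a different vocabulary entry, while task-relevant, non-sensitive tokens tend to stay close to their original direction.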