SEAL: Semantic-aware Single-image Sticker Personalization with a Large-scale Sticker-tag Dataset

2026-04-29
Computer Vision and Pattern Recognition

AI summary

The authors study how to create personalized stickers from a single reference image using diffusion models. This is difficult because test-time fine-tuning on one image tends to overfit: background details get absorbed into the learned concept, and the model memorizes the reference's spatial layout, losing editing flexibility. They propose SEAL, a plug-and-play adaptation module that guides the model toward the subject's semantics and structure during embedding adaptation without modifying the diffusion backbone. They also build StickerBench, a large-scale sticker dataset with structured attribute tags for testing how well a model preserves a sticker's identity while allowing attribute edits. Experiments show SEAL improves both identity preservation and editing control.

diffusion models, personalized text-to-image generation, single-image synthesis, test-time fine-tuning, spatial attention, embedding adaptation, identity preservation, contextual controllability, sticker dataset, token strategy
Authors
Changhyun Roh, Yonghyun Jeong, Jonghyun Lee, Chanho Eom, Jihyong Oh
Abstract
Synthesizing a target concept from a single reference image is challenging in diffusion-based personalized text-to-image generation, particularly for sticker personalization where prompts often require explicit attribute edits. With only one reference, test-time fine-tuning (TTF) methods tend to overfit, producing visual entanglement, where background artifacts are absorbed into the learned concept, and structural rigidity, where the model memorizes reference-specific spatial configurations and loses contextual controllability. To address these issues, we introduce SEmantic-aware single-image sticker personALization (SEAL), a plug-and-play, architecture-agnostic adaptation module that integrates into existing personalization pipelines without modifying their U-Net-based diffusion backbones. SEAL applies three components during embedding adaptation: (1) a Semantic-guided Spatial Attention Loss, (2) a Split-merge Token Strategy, and (3) Structure-aware Layer Restriction. To support sticker-domain personalization with attribute-level control, we present StickerBench, a large-scale sticker image dataset with structured tags under a six-attribute schema (Appearance, Emotion, Action, Camera Composition, Style, Background). These annotations provide a consistent interface for varying context while keeping target identity fixed, enabling systematic evaluation of identity disentanglement and contextual controllability. Experiments show that SEAL consistently improves identity preservation while maintaining contextual controllability, highlighting the importance of explicit spatial and structural constraints during test-time adaptation. The code, StickerBench, and project page will be publicly released.
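The abstract describes StickerBench annotations as structured tags under a six-attribute schema that lets context vary while identity stays fixed. A minimal sketch of what one annotation record and an attribute-level edit could look like follows; the field layout, tag values, and `edit_prompt` helper are illustrative assumptions, not the released dataset format.

```python
from dataclasses import dataclass, replace

@dataclass
class StickerTags:
    """One hypothetical StickerBench annotation under the six-attribute
    schema named in the abstract: Appearance, Emotion, Action,
    Camera Composition, Style, Background."""
    appearance: str
    emotion: str
    action: str
    camera_composition: str
    style: str
    background: str

def edit_prompt(tags: StickerTags, **overrides: str) -> StickerTags:
    """Hypothetical helper: vary context by replacing only the requested
    attributes, leaving the identity-bearing ones untouched."""
    return replace(tags, **overrides)

# Example: change only the emotion; all other attributes stay fixed,
# mirroring the "consistent interface for varying context" idea.
base = StickerTags(
    appearance="orange cat with round glasses",
    emotion="happy",
    action="waving",
    camera_composition="close-up",
    style="flat cartoon",
    background="plain white",
)
edited = edit_prompt(base, emotion="surprised")
```

Such records would let an evaluation loop enumerate single-attribute edits systematically, which matches the abstract's framing of testing identity disentanglement against contextual controllability.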