AI summaryⓘ
The authors study how both sides in job-matching processes learn about their preferences through limited interviews, which give partial and noisy information. Unlike earlier work, they consider that employers might also be unsure about who they prefer and can make hiring mistakes early on. To address this, they allow employers to delay hiring decisions strategically, helping them correct mistakes later without needing coordination. They develop new algorithms for both a fully controlled setting and more realistic decentralized settings, showing better learning performance than previous methods. Their work shows that even with uncertainty and limited interviews, efficient and stable matches can be learned quickly.
two-sided matching marketspreferencesinterviewsbandit learningstrategic deferralcentralized algorithmdecentralized algorithmtime-independent regretstable matchingspartial preference information
Authors
Amirmahdi Mirfakhar, Xuchuang Wang, Mengfan Xu, Hedyeh Beyhaghi, Mohammad Hajiesmaili
Abstract
Two-sided matching markets rely on preferences from both sides, yet it is often impractical to evaluate preferences. Participants, therefore, conduct a limited number of interviews, which provide early, noisy impressions and shape final decisions. We study bandit learning in matching markets with interviews, modeling interviews as \textit{low-cost hints} that reveal partial preference information to both sides. Our framework departs from existing work by allowing firm-side uncertainty: firms, like agents, may be unsure of their own preferences and can make early hiring mistakes by hiring less preferred agents. To handle this, we extend the firm's action space to allow \emph{strategic deferral} (choosing not to hire in a round), enabling recovery from suboptimal hires and supporting decentralized learning without coordination. We design novel algorithms for (i) a centralized setting with an omniscient interview allocator and (ii) decentralized settings with two types of firm-side feedback. Across all settings, our algorithms achieve time-independent regret, a substantial improvement over the $O(\log T)$ regret bounds known for learning stable matchings without interviews. Also, under mild structured markets, decentralized performance matches the centralized counterpart up to polynomial factors in the number of agents and firms.