Variance Reduction for Heavy-Tailed Monetization Metrics in Ranking Experiments via Post-Stratification

2026-06-02 · Machine Learning

AI summary

The authors explain that evaluating online systems by revenue is difficult because a small fraction of users accounts for most of the revenue, which makes A/B test results noisy and unreliable. They propose combining two statistical techniques, post-stratification and CUPED, to reduce this variance and obtain clearer results without additional traffic. Deployed at ShareChat, the approach reached equivalent statistical confidence with about 45% less traffic. The authors also offer guidance on how and when to use the method in real-world recommendation systems.

online evaluation, ranking systems, monetization metrics, heavy-tailed distribution, A/B experiments, variance reduction, post-stratification, CUPED, statistical power, recommendation systems
Authors
Neeti Pokharna, Olivier Jeunen, Yatharth Saraf, Aleksei Ustimenko
Abstract
Online evaluation of ranking and retrieval systems often relies on downstream monetization metrics such as app revenue or creator earnings. These metrics are typically heavy-tailed, with a small fraction of users dominating both mean and variance, leading to low statistical power and unreliable conclusions in A/B experiments, especially under limited traffic. We present a practical framework for variance reduction in online experiments by combining post-stratification with CUPED. Our approach leverages pre-experiment covariates to improve the sensitivity of monetization experiments without requiring additional traffic. Deployed at ShareChat across ranking-driven monetization experiments, the method substantially reduces variance and improves decision stability, achieving equivalent statistical confidence with ~45% less traffic than standard metrics. We further discuss practical design choices, guardrails, and limitations, providing guidance on when post-stratification is appropriate for real-world information retrieval and recommendation systems.
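
To make the two ingredients concrete, the following is a minimal sketch of one plausible way to combine CUPED with post-stratification on synthetic data. The abstract does not specify how the two techniques are combined, so the ordering here (a pooled CUPED adjustment followed by a stratum-weighted difference in means), the quantile-based strata, and all names (`cuped_adjust`, `assign_strata`, `post_stratified_diff`, `pre`, `post`, `treat`) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Synthetic, illustrative data only: heavy-tailed (lognormal) pre-experiment
# revenue, and in-experiment revenue that is correlated with it.
pre = rng.lognormal(mean=0.0, sigma=1.5, size=n)
treat = rng.integers(0, 2, size=n)  # 0 = control, 1 = treatment
post = 0.8 * pre + rng.lognormal(mean=0.0, sigma=1.0, size=n) + 0.05 * treat


def cuped_adjust(y, x):
    """CUPED: subtract the part of y explained by the pre-experiment covariate x.

    theta is the pooled regression slope cov(x, y) / var(x). Centering x keeps
    the mean of the adjusted metric equal to the mean of y.
    """
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())


def assign_strata(x, n_strata=4):
    """Bucket users into quantile strata of the pre-experiment covariate (assumed scheme)."""
    edges = np.quantile(x, np.linspace(0, 1, n_strata + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")


def post_stratified_diff(y, treat, strata):
    """Treatment-minus-control difference, averaged over strata.

    Each stratum is weighted by its share of the pooled population, so chance
    imbalance in stratum sizes between arms does not leak into the estimate.
    """
    diff = 0.0
    for s in np.unique(strata):
        in_s = strata == s
        weight = in_s.mean()
        diff += weight * (y[in_s & (treat == 1)].mean() - y[in_s & (treat == 0)].mean())
    return diff


strata = assign_strata(pre)
y_cuped = cuped_adjust(post, pre)

naive = post[treat == 1].mean() - post[treat == 0].mean()
combined = post_stratified_diff(y_cuped, treat, strata)

print("raw metric variance:   ", post.var(ddof=1))
print("CUPED metric variance: ", y_cuped.var(ddof=1))
print("naive effect estimate: ", naive)
print("CUPED + post-strat:    ", combined)
```

In this sketch the covariate is the same user's pre-experiment revenue, which is unaffected by treatment assignment and therefore safe to adjust on. How much variance is removed depends on how strongly the pre-period covariate correlates with the in-experiment metric; the synthetic numbers above say nothing about the ~45% traffic reduction reported for ShareChat.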