Decision Quality Evaluation Framework at Pinterest

2026-02-17

Artificial Intelligence

AI summary

The authors developed a system at Pinterest to evaluate how good content moderation decisions are, whether they are made by people or by AI. They built a trusted set of expert-reviewed examples to compare decisions against, plus an automated way to pick additional examples so that more kinds of cases are covered. This lets Pinterest compare different AI tools, improve the instructions given to them, keep up with changing rules, and make sure statistics about content stay accurate. Overall, the authors moved the process from guesswork to measurement with real data for better content safety.

content moderation, large language models, decision quality evaluation, Golden Set, propensity scores, prompt optimization, policy evolution, data-driven validation, content safety, benchmarking
Authors
Yuqi Tian, Robert Paine, Attila Dobi, Kevin O'Sullivan, Aravindh Manickavasagam, Faisal Farooq
Abstract
Online platforms require robust systems to enforce content safety policies at scale. A critical component of these systems is the ability to evaluate the quality of moderation decisions made by both human agents and Large Language Models (LLMs). However, this evaluation is challenging due to the inherent trade-offs between cost, scale, and trustworthiness, along with the complexity of evolving policies. To address this, we present a comprehensive Decision Quality Evaluation Framework developed and deployed at Pinterest. The framework is centered on a high-trust Golden Set (GDS) curated by subject matter experts (SMEs), which serves as a ground truth benchmark. We introduce an automated intelligent sampling pipeline that uses propensity scores to efficiently expand dataset coverage. We demonstrate the framework's practical application in several key areas: benchmarking the cost-performance trade-offs of various LLM agents, establishing a rigorous methodology for data-driven prompt optimization, managing complex policy evolution, and ensuring the integrity of policy content prevalence metrics via continuous validation. The framework enables a shift from subjective assessments to a data-driven and quantitative practice for managing content safety systems.
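
To make the intelligent sampling idea concrete, the sketch below shows one way propensity scores can drive coverage expansion: items are bucketed by a model-predicted violation probability and drawn from every bucket so that rare decision regions still reach expert review. This is a minimal, hypothetical illustration, not the pipeline described in the paper; the function name, bucketing scheme, and parameters are all assumptions for demonstration.

```python
import numpy as np

def propensity_stratified_sample(item_ids, propensity_scores, n_samples,
                                 n_bins=10, seed=0):
    """Illustrative sketch: stratified sampling on propensity scores.

    Buckets items by their predicted probability of violating policy and
    draws roughly equal numbers from each bucket, so low-frequency,
    high-uncertainty regions are represented in the expert-labeled set.
    """
    rng = np.random.default_rng(seed)
    item_ids = np.asarray(item_ids)
    scores = np.asarray(propensity_scores)

    # Assign each item to a score bucket: [0.0-0.1), [0.1-0.2), ..., [0.9-1.0].
    bins = np.clip((scores * n_bins).astype(int), 0, n_bins - 1)

    per_bin = max(1, n_samples // n_bins)
    selected = []
    for b in range(n_bins):
        candidates = item_ids[bins == b]
        if len(candidates) == 0:
            continue  # no items fall in this score range
        take = min(per_bin, len(candidates))
        selected.extend(rng.choice(candidates, size=take, replace=False))
    return selected

# Example usage with synthetic data (hypothetical):
ids = list(range(1_000))
scores = np.random.default_rng(1).random(1_000)
review_batch = propensity_stratified_sample(ids, scores, n_samples=100)
```

Uniform random sampling would concentrate on the most common score range; stratifying by the score ensures the benchmark also covers borderline and rare cases, which is where moderation decisions are hardest to evaluate.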