Greening AI Inference with Accuracy and Latency-aware User Incentives

2026-05-26, Machine Learning

Machine Learning, Other Computer Science
AI summary

The authors discuss how running AI services creates carbon emissions, especially during AI inference (when a model processes user requests). They propose a framework that balances users' valuation of response quality and latency against their environmental concerns. The resulting incentives can be offered as a two-tier subscription in which users receive a discount for agreeing that some of their requests may be served at lower quality and higher latency during periods of high carbon intensity. This reduces emissions while still serving user requests.

AI inference, carbon emissions, quality of experience (QoE), latency, environmental sustainability, service subscription, carbon intensity, resource allocation, tradeoff, AI model complexity
Authors
Vasilios A. Siris, Adamantia Stamou, George D. Stamoulis, Konstantinos Varsos, Ramin Khalili
Abstract
The widespread use of AI services has raised concerns about their environmental sustainability, with recent studies identifying the carbon emissions of AI inference as the major contributor. This paper introduces a framework for designing AI inference incentives based on users' valuation of inference quality and latency, together with their environmental consciousness, while accounting for the tradeoff between carbon emissions and these two quality of experience (QoE) parameters. Our approach can accommodate different tradeoffs that depend on the size and complexity of the AI models and on the allocation of resources to serve inference requests. The incentives can be offered through a practical two-tier service subscription that offers users a discount in exchange for reduced carbon emissions. The discounted service option gives the AI provider the flexibility to serve some percentage of inference requests at a lower quality and higher latency during periods of high carbon intensity.
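
As an illustration only, the discounted service option described in the abstract could be sketched as a simple routing rule. This is not the authors' method: the class name GreenTierPolicy, the tier labels, the carbon-intensity threshold, and the cap on the degraded fraction are all hypothetical, and the paper's framework derives incentives from user valuations rather than from fixed constants like these.

```python
from dataclasses import dataclass


@dataclass
class GreenTierPolicy:
    """Hypothetical routing rule for a two-tier subscription (illustrative only)."""

    carbon_threshold: float       # assumed cutoff (e.g. gCO2/kWh) for "high carbon intensity"
    max_degraded_fraction: float  # assumed cap on the share of discounted requests degraded
    served: int = 0               # discounted-tier requests seen so far
    degraded: int = 0             # discounted-tier requests served in degraded mode

    def route(self, tier: str, carbon_intensity: float) -> str:
        """Return "full" or "degraded" for a single inference request."""
        # Standard-tier users always get full quality and latency.
        if tier != "green":
            return "full"
        self.served += 1
        high_carbon = carbon_intensity > self.carbon_threshold
        # Degrade only if doing so keeps the running share within the agreed cap.
        within_cap = (self.degraded + 1) <= self.max_degraded_fraction * self.served
        if high_carbon and within_cap:
            self.degraded += 1
            return "degraded"
        return "full"


policy = GreenTierPolicy(carbon_threshold=400.0, max_degraded_fraction=0.5)
decisions = [
    policy.route("standard", 500.0),
    policy.route("green", 500.0),
    policy.route("green", 500.0),
    policy.route("green", 500.0),
    policy.route("green", 500.0),
    policy.route("green", 100.0),
]
```

Here "degraded" stands for serving a request with a smaller model or fewer resources, which lowers quality and raises latency. The running cap ensures the provider never degrades more than the agreed share of a discounted user's requests, even when carbon intensity stays high.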