PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Applications

2026-05-18 | Distributed, Parallel, and Cluster Computing

Distributed, Parallel, and Cluster Computing | Artificial Intelligence | Programming Languages

Authors
Stephen Mell, David Mell, Konstantinos Kallas, Steve Zdancewic, Osbert Bastani
Abstract
Compound AI applications, which compose calls to ML models using a general-purpose programming language like Python, are widely used for a variety of user-facing tasks, from software engineering to enterprise automation, which makes their end-to-end latency a critical concern. In contrast to traditional applications, their execution time is dominated by external components, which traditional language optimization systems such as optimizing compilers cannot handle. To address this problem, we develop PopPy, a system that uncovers parallelization opportunities in Python applications that invoke these heavy external components, including those used in compound AI applications. PopPy supports a very expressive fragment of Python and requires minimal developer input to uncover parallelism. It combines an ahead-of-time compiler with a runtime, addressing three key challenges in extracting parallelism from Python applications: language complexity, dynamic dispatch, and variable mutation. On a set of real-world compound AI applications, PopPy achieves speedups of up to $6.4\times$ in end-to-end execution time compared to standard Python execution while preserving the sequential program semantics.