exaCB: Reproducible Continuous Benchmark Collections at Scale Leveraging an Incremental Approach

2026-03-23 · Distributed, Parallel, and Cluster Computing

AI summary

The authors explain that as supercomputers get more complex, it's important to check how well different programs perform all the time, not just once. They created exaCB, a tool that fits benchmarking tests into the usual coding workflow, making it easier to catch slowdowns or energy problems early. Using exaCB on a big new supercomputer called JUPITER, they tested over 70 programs to track performance and energy use continuously. This shows exaCB can help manage and improve software on very powerful computers consistently.

High-performance computing (HPC) · Exascale architectures · Continuous integration (CI) · Continuous benchmarking (CB) · Software performance evaluation · Energy efficiency · Benchmarking framework · JUPITER supercomputer · CI/CD pipelines · Scientific applications
Authors
Jayesh Badwaik, Mathis Bode, Michal Rajski, Andreas Herten
Abstract
The increasing heterogeneity of high-performance computing (HPC) systems and the transition to exascale architectures require systematic and reproducible performance evaluation across diverse workloads. While continuous integration (CI) ensures functional correctness in software engineering, performance and energy efficiency in HPC are typically evaluated outside CI workflows, motivating continuous benchmarking (CB) as a complementary approach. Integrating benchmarking into CI workflows enables reproducible evaluation, early detection of regressions, and continuous validation throughout the software development lifecycle. We present exaCB, a framework for continuous benchmarking developed in the context of the JUPITER exascale system. exaCB enables application teams to integrate benchmarking into their workflows while supporting large-scale, system-wide studies through reusable CI/CD components, established harnesses, and a shared reporting protocol. The framework supports incremental adoption, allowing benchmarks to be onboarded easily and to evolve from basic runnability to more advanced instrumentation and reproducibility. The approach is demonstrated in JUREAP, the early-access program for JUPITER, where exaCB enabled continuous benchmarking of over 70 applications at varying maturity levels, supporting cross-application analysis, performance tracking, and energy-aware studies. These results illustrate the practicality of using exaCB for continuous benchmarking on exascale HPC systems across large, diverse collections of scientific applications.
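The abstract highlights early detection of regressions as a core benefit of continuous benchmarking. As a rough illustration of the idea (not exaCB's actual API — the function name, data shapes, and threshold here are all hypothetical), a CB pipeline might compare each new run's timings against a stored baseline and flag benchmarks that slowed down beyond some tolerance:

```python
# Hypothetical sketch of regression detection in a CB pipeline.
# All names and the 5% tolerance are illustrative assumptions,
# not part of the exaCB framework described in the paper.

def find_regressions(baseline, current, tolerance=0.05):
    """Return benchmarks whose runtime grew by more than `tolerance`.

    baseline, current: dicts mapping benchmark name -> runtime in seconds.
    Result maps each regressed benchmark to its fractional slowdown.
    """
    regressions = {}
    for name, base_time in baseline.items():
        cur_time = current.get(name)
        if cur_time is None:
            continue  # benchmark was not run this time; skip it
        slowdown = (cur_time - base_time) / base_time
        if slowdown > tolerance:
            regressions[name] = round(slowdown, 3)
    return regressions

baseline = {"lbm_solver": 12.0, "fft_kernel": 3.0}
current = {"lbm_solver": 13.2, "fft_kernel": 3.05}
print(find_regressions(baseline, current))  # {'lbm_solver': 0.1}
```

In a real pipeline such a check would run as a CI job after each commit, with the baseline taken from a reference run and the result feeding a shared report, in line with the reporting protocol the abstract mentions.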