Wattlytics: A Web Platform for Co-Optimizing Performance, Energy, and TCO in HPC Clusters

2026-04-09 | Distributed, Parallel, and Cluster Computing

Distributed, Parallel, and Cluster Computing; Hardware Architecture; Emerging Technologies; Performance
AI summary

The authors created Wattlytics, a web tool that helps people decide how to design and run GPU computing systems while considering costs and energy use. Unlike simple calculators, it combines detailed performance data, power modeling, and long-term cost analysis all in one place. Users can choose different GPUs, scientific tasks, and conditions like energy prices to see which setups work best. The authors show that sometimes energy-efficient GPUs save more money overall than just the fastest ones. Wattlytics makes complex decisions about high-performance computing easier to understand and plan for.

GPU; Dynamic Voltage and Frequency Scaling (DVFS); Total Cost of Ownership (TCO); Benchmarking; High-Performance Computing (HPC); Energy Efficiency; Scientific Workloads; Power Modeling; Design-Space Exploration; Monte Carlo Simulation
Authors
Ayesha Afzal, Georg Hager, Gerhard Wellein
Abstract
The escalating computational demands and energy footprint of GPU-accelerated computing systems complicate informed design and operational decisions. We present the first release of Wattlytics (https://wattlytics.netlify.app), an interactive, browser-based decision-support system. Unlike existing procurement-oriented calculators, Wattlytics uniquely integrates benchmark-driven GPU performance scaling, dynamic voltage and frequency scaling (DVFS)-aware piecewise power modeling, and multi-year total cost of ownership (TCO) analysis within a single interactive environment. Users can configure heterogeneous systems across contemporary GPU architectures (GH200, H100, L40S, L40, A40, A100, and L4), select representative scientific workloads (e.g., GROMACS, AMBER), and explore deployment scenarios under constraints such as energy prices, system lifetime, and frequency scaling. Wattlytics computes multidimensional decision metrics (TCO breakdown, work-per-TCO, power-per-TCO, and work-per-watt-per-TCO) and supports design-space exploration, what-if scenarios, sensitivity metrics (elasticity, Sobol indices, Monte Carlo), and collaborative features to guide realistic cluster design and procurement under uncertainty. We demonstrate selected scenarios comparing deployment strategies under different operational modes: fixed budget, fixed GPU count, fixed performance, and fixed power. Our case studies show that, under budget or energy constraints, optimally deployed energy-efficient GPUs can outperform higher-performance alternatives in overall cost-effectiveness. Wattlytics helps users explore the design parameter space and distinguish between cost- and risk-driving factors, turning HPC design into a well-informed and explainable decision-making process.
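
To make the work-per-TCO idea concrete, the following is a minimal Python sketch of how such a metric could be computed in a fixed-budget comparison. It is not the Wattlytics model: it assumes TCO is only purchase cost plus energy cost at a constant average power, and it ignores DVFS, cooling, maintenance, and workload-specific scaling. The names `GpuOption`, `tco`, `work_per_tco`, and `gpus_for_budget`, as well as every number in the usage note, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760  # assumes continuous operation, a simplification


@dataclass
class GpuOption:
    """Hypothetical per-GPU parameters; none of these come from the paper."""
    name: str
    unit_price: float   # purchase price per GPU (currency units)
    avg_power_w: float  # average power draw per GPU (watts)
    perf: float         # benchmark throughput per GPU (work units per hour)


def tco(opt: GpuOption, n_gpus: int, years: float, price_per_kwh: float) -> dict:
    """Simplified TCO breakdown: capital cost plus energy cost only."""
    capex = n_gpus * opt.unit_price
    energy_kwh = n_gpus * opt.avg_power_w / 1000 * HOURS_PER_YEAR * years
    energy_cost = energy_kwh * price_per_kwh
    return {"capex": capex, "energy": energy_cost, "total": capex + energy_cost}


def work_per_tco(opt: GpuOption, n_gpus: int, years: float, price_per_kwh: float) -> float:
    """Total work delivered over the lifetime divided by total cost."""
    total_work = n_gpus * opt.perf * HOURS_PER_YEAR * years
    return total_work / tco(opt, n_gpus, years, price_per_kwh)["total"]


def gpus_for_budget(opt: GpuOption, budget: float, years: float, price_per_kwh: float) -> int:
    """Fixed-budget mode: largest GPU count whose lifetime TCO fits the budget."""
    per_gpu_total = tco(opt, 1, years, price_per_kwh)["total"]
    return int(budget // per_gpu_total)
```

A usage sketch with invented values: given `fast = GpuOption("fast", 30000, 700, 100)` and `efficient = GpuOption("efficient", 8000, 300, 40)`, calling `work_per_tco(opt, gpus_for_budget(opt, 1_000_000, 5, 0.30), 5, 0.30)` for each option lets the two be ranked under the same budget, lifetime, and energy price. With these particular made-up numbers the slower, cheaper option scores higher, which illustrates the kind of outcome the abstract describes without reproducing any result from it.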