A Precision Emulation Approach to the GPU Acceleration of Ab Initio Electronic Structure Calculations

2026-03-31

Distributed, Parallel, and Cluster Computing; Performance
AI summary

The authors studied how a low-precision number format (INT8) can accelerate high-precision (FP64) math on modern GPUs without changing the original code. Using an automatic BLAS offload tool, they emulated FP64 matrix multiplications with INT8 arithmetic in a scientific application. They found that by tuning the emulation precision, they could preserve accuracy while improving performance, unlike conventional mixed-precision methods that rewrite the algorithms. Their work suggests that adaptable precision could help future scientific computing run faster on AI-focused hardware.

Keywords
INT8, FP64, GPU, matrix multiplication, BLAS, Unified Memory Architecture, mixed-precision computing, automatic offload, scientific computing, precision emulation
Authors
Hang Liu, Junjie Li, Yinzhi Wang, Niraj K. Nepal, Yang Wang
Abstract
This study explores the use of INT8-based emulation for accelerating traditional FP64-based HPC workloads on modern GPU architectures. Using the SCILIB-Accel automatic BLAS offload tool for cache-coherent Unified Memory Architectures, we emulate FP64 matrix multiplications in the LSMS application of the MuST suite without code changes. We find that accuracy depends on both the arithmetic precision and the properties of the operator, which can be addressed through tunable precision emulation. Unlike traditional mixed-precision approaches, this method preserves the original algorithms while optimizing hardware utilization, and we demonstrate the potential to improve accuracy and performance at the same time. This work highlights the potential of AI-driven hardware to transform HPC, advocating for adaptive precision strategies in future scientific computing.
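To make the idea of emulating FP64 matrix multiplication with INT8 arithmetic concrete, the following is a minimal NumPy sketch of an Ozaki-style slicing scheme: each operand is quantized to fixed point, split into signed 7-bit digit slices that fit in INT8, the slice pairs are multiplied exactly with wide-integer accumulation (standing in for a GPU's INT8 tensor cores with INT32 accumulators), and the shifted partial products are recombined. All function names, the slice count, and the fixed-point scaling here are illustrative assumptions, not the paper's actual SCILIB-Accel implementation; real schemes also handle dynamic range per row/column.

```python
import numpy as np

def split_int8_slices(x, num_slices=2, bits=7):
    """Split a fixed-point int64 array into signed base-2**bits digit slices.

    Each slice fits in INT8 (digits lie in [-127, 127]); summing
    slice[i] << (bits * i) reconstructs x exactly, provided
    |x| < 2**(bits * num_slices).
    """
    sgn = np.sign(x)
    mag = np.abs(x)
    mask = (1 << bits) - 1
    return [((mag >> (bits * i)) & mask).astype(np.int8) * sgn.astype(np.int8)
            for i in range(num_slices)]

def emulated_matmul(A, B, frac_bits=10, num_slices=2, bits=7):
    """Emulate an FP64 matmul via exact INT8 slice products (illustrative).

    Inputs are quantized to fixed point with `frac_bits` fractional bits,
    so the result matches the quantized product bit-exactly and the true
    FP64 product up to quantization error.
    """
    scale = 1 << frac_bits
    Ai = np.round(A * scale).astype(np.int64)
    Bi = np.round(B * scale).astype(np.int64)
    A_slices = split_int8_slices(Ai, num_slices, bits)
    B_slices = split_int8_slices(Bi, num_slices, bits)

    C = np.zeros((A.shape[0], B.shape[1]), dtype=np.int64)
    for i, As in enumerate(A_slices):
        for j, Bs in enumerate(B_slices):
            # On GPU hardware this would be an INT8 GEMM with INT32
            # accumulation; here int64 keeps the demo exact.
            C += (As.astype(np.int64) @ Bs.astype(np.int64)) << (bits * (i + j))
    return C / float(scale * scale)
```

Because the slice products and their recombination are exact integer operations, the only error is the initial fixed-point quantization; raising `frac_bits` and `num_slices` trades more INT8 slice multiplications for tighter agreement with the FP64 result, which is the tunable accuracy/performance knob the abstract alludes to.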