Mitigating Artifacts in Pre-quantization Based Scientific Data Compressors with Quantization-aware Interpolation

2026-02-23 · Distributed, Parallel, and Cluster Computing

AI summary

The authors examined a fast technique for compressing large scientific data files that can degrade accuracy when the user-allowed error is moderate or large. They characterized why these inaccuracies (artifacts) arise and designed a new method to mitigate them during decompression. Their method improves the quality of the decompressed data without slowing down the process. They tested the approach on several real-world datasets and showed it works well with existing high-throughput compressors.

error-bounded lossy compression, pre-quantization, compression artifacts, quantization index, interpolation algorithm, decompressed data quality, parallel computing, shared-memory, distributed-memory, high-performance computing
Authors
Pu Jiao, Sheng Di, Jiannan Tian, Mingze Xia, Xuan Wu, Yang Zhang, Xin Liang, Franck Cappello
Abstract
Error-bounded lossy compression has been regarded as a promising way to address the ever-increasing amount of scientific data in today's high-performance computing systems. Pre-quantization, a critical technique to remove sequential dependency and enable high parallelism, is widely used to design and develop high-throughput error-controlled data compressors. Despite the extremely high throughput of pre-quantization based compressors, they generally suffer from low data quality with medium or large user-specified error bounds. In this paper, we investigate the artifacts generated by pre-quantization based compressors and propose a novel algorithm to mitigate them. Our contributions are fourfold: (1) We carefully characterize the artifacts in pre-quantization based compressors to understand the correlation between the quantization index and compression error; (2) We propose a novel quantization-aware interpolation algorithm to improve the decompressed data; (3) We parallelize our algorithm in both shared-memory and distributed-memory environments to obtain high performance; (4) We evaluate our algorithm and validate it with two leading pre-quantization based compressors using five real-world datasets. Experiments demonstrate that our artifact mitigation algorithm can effectively improve the quality of decompressed data produced by pre-quantization based compressors while maintaining their high compression throughput.
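To make the pre-quantization idea concrete, the sketch below shows a generic error-bounded uniform scalar quantizer of the kind such compressors build on: each value is mapped to an integer quantization index with step size 2×(error bound), so reconstruction is guaranteed to stay within the user-specified bound. This is a minimal illustration of the general technique, not the specific algorithm of the compressors evaluated in the paper; all function names are hypothetical.

```python
import numpy as np

def prequantize(data, eb):
    """Map each value to an integer quantization index.

    With step size 2*eb, round-to-nearest guarantees the
    reconstruction error of every element is at most eb.
    (Generic sketch; real compressors then predict and
    losslessly encode these indices.)
    """
    return np.round(data / (2.0 * eb)).astype(np.int64)

def dequantize(indices, eb):
    """Reconstruct values from quantization indices."""
    return indices * (2.0 * eb)

# Example: a small array with a relative-scale error bound.
data = np.array([0.13, -2.72, 5.01])
eb = 0.1                      # user-specified absolute error bound
q = prequantize(data, eb)     # integer indices, independently computable
recon = dequantize(q, eb)     # decompressed values
max_err = float(np.max(np.abs(recon - data)))
assert max_err <= eb          # error bound is respected
```

Because each index depends only on its own input value, the quantization step has no sequential dependency and parallelizes trivially, which is why pre-quantization enables such high throughput; the paper's contribution targets the artifacts this independence introduces at medium and large error bounds.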