Synthesizing real-world distributions from high-dimensional Gaussian Noise with Fully Connected Neural Network
2026-04-10 • Machine Learning
AI summary
The author proposes a fast way to generate synthetic data using a fully connected neural network and a randomized loss function that reshapes Gaussian noise to resemble real data. The method was tested on 25 real-world tabular datasets and was found to be much faster than current deep learning methods, and often better. The experiments measured how similar the synthetic data was to the real data and how well it worked for classification. PCA was also used to reduce the dimensionality of the data, which improves privacy and can improve classification quality while saving time and memory.
synthetic data, neural network, randomized loss function, Gaussian distribution, tabular data, MMD score, classification, PCA, data augmentation, privacy preservation
Authors
Joanna Komorniczak
Abstract
The use of synthetic data in machine learning applications and research offers many benefits, including performance improvements through data augmentation, privacy preservation of original samples, and reliable method assessment with fully synthetic data. This work proposes a time-efficient synthetic data generation method based on a fully connected neural network and a randomized loss function that transforms a random Gaussian distribution to approximate a target real-world dataset. The experiments conducted on 25 diverse tabular real-world datasets confirm that the proposed solution surpasses the state-of-the-art generative methods and achieves reference MMD scores orders of magnitude faster than modern deep learning solutions. The experiments involved analyzing distributional similarity, assessing the impact on classification quality, and using PCA for dimensionality reduction, which further enhances data privacy and can boost classification quality while reducing time and memory complexity.
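To make the distributional-similarity measure concrete, the following is a minimal sketch of how a squared MMD (maximum mean discrepancy) score between a real sample and a synthetic sample can be estimated. The abstract does not state which kernel or bandwidth was used, so the RBF kernel, the default `gamma = 1 / n_features`, and the function names `rbf_kernel` and `mmd2` are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """RBF kernel matrix between the rows of a and the rows of b."""
    # Squared Euclidean distance between every pair of rows.
    sq_dist = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd2(x, y, gamma=None):
    """Biased estimate of the squared MMD between samples x and y (rows are samples)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if gamma is None:
        # Assumed default bandwidth; the source does not specify one.
        gamma = 1.0 / x.shape[1]
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Toy data standing in for a real dataset and two synthetic candidates.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 5))
close = rng.normal(0.0, 1.0, size=(500, 5))  # same distribution as `real`
far = rng.normal(2.0, 1.0, size=(500, 5))    # shifted distribution

mmd_close = mmd2(real, close)
mmd_far = mmd2(real, far)
print(f"MMD^2 (matching distribution): {mmd_close:.4f}")
print(f"MMD^2 (shifted distribution):  {mmd_far:.4f}")
```

A lower score indicates that the synthetic sample is closer to the real one under the chosen kernel; in this toy setup the shifted sample should score noticeably higher than the matching one.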