Repurposing Image Diffusion Models for Adversarial Synthetic Structured Data: A Case Study of Ground Truth Drift

2026-05-01
Cryptography and Security
AI summary

The authors tested whether a popular AI image generator, Stable Diffusion, could be used to create fake tabular data by turning rows into tiny images. They explored how arranging features spatially affected results and introduced two ideas: distinguishing between fake data that looks good to machines versus humans, and defining AI-generated data meant for machines as "synthetic evidence." They warn that if machines accept such synthetic evidence as real without checks, it could quietly distort true data over time.

Keywords
Stable Diffusion, tabular data, UCI Adult Income dataset, U-Net architecture, inductive bias, spatial locality, statistical realism, perceptual realism, synthetic evidence, ground truth drift
Authors
Adam Arthur, Christopher Schwartz
Abstract
Public image diffusion models are now powerful enough that an attacker without the resources to train a tabular-specific generator may repurpose one off the shelf. This study tests that possibility directly. An unmodified Stable Diffusion U-Net is applied to the UCI Adult Income dataset by reshaping each row into a small single-channel pseudo-image. The architecture's inductive bias toward spatial locality makes feature placement a design variable, and several layouts are tested. Beyond this empirical test, the paper draws two philosophical distinctions. One separates statistical from perceptual realism: whether synthetic content holds up to a machine's correlation audits or to a human's sensory inspection. The other introduces synthetic evidence as a category alongside synthetic media: AI-generated material whose consumer is a machine in a closed evidentiary pipeline rather than a person in an open information system. An attacker succeeds with synthetic evidence by thinking like the machine that will receive it. The more such attacks succeed, the more they induce ground truth drift: the silent reclassification of AI-generated outputs as authentic when they are reused in pipelines that do not interrogate their provenance.
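The row-to-pseudo-image step described in the abstract can be sketched concretely. The following is a minimal illustration, not the authors' implementation: the feature names follow the UCI Adult Income dataset, but the label encodings, normalization ranges, grid size, and row-major layout are all assumptions made for the example. The `layout` mapping is the design variable the paper varies, since the U-Net's convolutions treat spatially adjacent cells as related.

```python
import numpy as np

# Hypothetical example row from the UCI Adult Income dataset.
# Feature names follow the dataset; the label encodings for
# categorical features below are assumptions for illustration.
row = {
    "age": 39, "education-num": 13, "hours-per-week": 40,
    "capital-gain": 2174, "capital-loss": 0, "fnlwgt": 77516,
    "workclass": 4, "marital-status": 2, "occupation": 1,
    "relationship": 0, "race": 4, "sex": 1,
    "native-country": 39, "income": 0,
}

# Assumed per-feature (min, max) ranges used to normalize into [0, 1].
ranges = {
    "age": (17, 90), "education-num": (1, 16), "hours-per-week": (1, 99),
    "capital-gain": (0, 99999), "capital-loss": (0, 4356),
    "fnlwgt": (12285, 1490400), "workclass": (0, 8),
    "marital-status": (0, 6), "occupation": (0, 13),
    "relationship": (0, 5), "race": (0, 4), "sex": (0, 1),
    "native-country": (0, 40), "income": (0, 1),
}

def row_to_pseudo_image(row, ranges, layout, side=4):
    """Place normalized feature values into a side x side single-channel grid.

    `layout` maps each feature name to a (row, col) cell. Varying this
    mapping changes which features land next to each other, which is
    exactly what a convolutional U-Net's spatial-locality bias reacts to.
    """
    img = np.zeros((side, side), dtype=np.float32)
    for name, (r, c) in layout.items():
        lo, hi = ranges[name]
        img[r, c] = (row[name] - lo) / (hi - lo)  # normalize to [0, 1]
    return img

# One possible layout: fill the grid in feature order, row-major.
features = list(row.keys())
layout_rowmajor = {f: divmod(i, 4) for i, f in enumerate(features)}

img = row_to_pseudo_image(row, ranges, layout_rowmajor)
print(img.round(3))  # 4x4 pseudo-image ready for an image diffusion pipeline
```

Decoding a generated pseudo-image back into a tabular row would invert this mapping: read each cell named by the layout, rescale by its (min, max) range, and round categorical features to the nearest valid code.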