Transcending the Annotation Bottleneck: AI-Powered Discovery in Biology and Medicine

2026-02-23 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence

AI summary

The author explains that reliance on expert-labeled data has been a major bottleneck for applying AI in medicine. The article discusses how methods that learn patterns from raw, unlabeled data, known as unsupervised and self-supervised learning, are making large biomedical datasets easier to analyze. These methods help uncover hidden traits, link physical features to genetics, and find signs of disease without human bias. The article highlights examples where these approaches match or outperform traditional methods that rely on expert labels.

artificial intelligence, biomedical data, supervised learning, unsupervised learning, self-supervised learning, MRI, genomics, phenotypes, histology, anomaly detection

Authors

Soumick Chatterjee

Abstract

The dependence on expert annotation has long constituted the primary rate-limiting step in the application of artificial intelligence to biomedicine. While supervised learning drove the initial wave of clinical algorithms, a paradigm shift towards unsupervised and self-supervised learning (SSL) is currently unlocking the latent potential of biobank-scale datasets. By learning directly from the intrinsic structure of data, whether pixels in a magnetic resonance image (MRI), voxels in a volumetric scan, or tokens in a genomic sequence, these methods facilitate the discovery of novel phenotypes, the linkage of morphology to genetics, and the detection of anomalies without human bias. This article synthesises seminal and recent advances in "learning without labels," highlighting how unsupervised frameworks can derive heritable cardiac traits, predict spatial gene expression in histology, and detect pathologies with performance that rivals or exceeds supervised counterparts.
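To make "detecting anomalies without labels" concrete, here is a minimal, illustrative sketch. It is not the author's method, which the abstract does not specify. It substitutes a classical linear baseline: principal component analysis (PCA) fitted only on unlabeled samples assumed to be mostly normal, with squared reconstruction error as the anomaly score. The data are synthetic. The function names `fit_pca` and `anomaly_scores` and the 99th-percentile threshold are hypothetical choices made for illustration.

```python
import numpy as np


def fit_pca(normal_data: np.ndarray, n_components: int):
    """Fit a linear subspace to unlabeled samples assumed to be mostly normal."""
    mean = normal_data.mean(axis=0)
    centred = normal_data - mean
    # Rows of vt are orthonormal principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:n_components]


def anomaly_scores(samples: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Squared reconstruction error: how far each sample lies from the learned subspace."""
    centred = samples - mean
    reconstructed = centred @ components.T @ components
    return np.sum((centred - reconstructed) ** 2, axis=1)


# Synthetic stand-in for unlabeled data: 10 features that really vary along only 2 directions.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 10))
train = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 10))

mean, components = fit_pca(train, n_components=2)

# Hypothetical decision rule: flag anything above the 99th percentile of training scores.
threshold = np.quantile(anomaly_scores(train, mean, components), 0.99)

test_normal = rng.normal(size=(5, 2)) @ basis   # lies on the learned subspace
test_anomaly = 3.0 * rng.normal(size=(5, 10))   # mostly off the subspace

flags_normal = anomaly_scores(test_normal, mean, components) > threshold
flags_anomaly = anomaly_scores(test_anomaly, mean, components) > threshold
print("normal flagged:", flags_normal)
print("anomaly flagged:", flags_anomaly)
```

No labels are used at any point. The model only learns what the bulk of the data looks like and scores how poorly a new sample fits that description. Deep approaches often replace the linear projection with a learned encoder-decoder, such as an autoencoder trained on healthy scans, but the scoring logic is the same: high reconstruction error signals a sample unlike the training distribution.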
