A unified multi-task framework enables interpretable chest radiograph analysis

2026-06-02
Computer Vision and Pattern Recognition

AI summary

The authors developed IMT-CXR, a new AI system that mimics how radiologists analyze chest X-rays by performing multiple connected tasks like detecting diseases, measuring key features, and creating detailed reports. Their approach uses a transformer model trained specifically for medical data to handle classification, lesion spotting, anatomy segmentation, and report writing all in one. Tests on various datasets showed the system works well, and doctors found most AI reports were as clear as or clearer than human-written ones. This method helps make AI decisions more understandable and trustworthy in clinical practice.

multimodal deep learning, transformer architecture, chest X-ray analysis, multi-label classification, lesion localization, anatomical segmentation, radiology report generation, medical-domain instruction tuning, explainable AI, clinical workflow
Authors
Lijian Xu, Ziyu Ni, Xinglong Liu, Xiaosong Wang, Hongsheng Li, Shaoting Zhang
Abstract
While multimodal deep learning has advanced medical imaging analysis, existing black-box systems may remain confined to isolated tasks, often overlooking the trust-sensitive nature of clinical diagnosis as a multi-task process. We propose IMT-CXR (Interpretable Multi-task Transformer for Chest X-ray Analysis), a framework that emulates radiologists' diagnostic workflow through three evidence-driven stages: 1) disease recognition; 2) attribute characterization (e.g., size, location, severity quantification); 3) evidence-integrated report generation with traceable decision pathways. The framework employs a unified transformer architecture optimized via medical-domain instruction tuning, sequentially executing four clinical tasks: multi-label disease classification, lesion localization, anatomical segmentation, and radiology report generation. Experimental validation demonstrates competitive performance on ten CXR benchmarks under direct inference and fine-tuning settings. In a blinded evaluation of 160 historical reports from four medical centers, three radiologists rated 66% of AI-generated reports as comparable to or surpassing original clinical reports in diagnostic clarity, highlighting the framework's translational potential. By establishing traceable diagnostic pathways from anatomical findings to conclusions, this work bridges the gap between AI technical metrics and clinical utility, advancing trustworthy AI systems in medical imaging.
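
The sequential, evidence-carrying workflow described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract only names the four tasks and their order, so the `Model` interface, the prompt wording, the `Trace` record, and the idea of passing earlier outputs forward as text are all assumptions made for illustration, and `stub_model` stands in for a real instruction-tuned transformer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Task order taken from the abstract; the prompt wording below is illustrative only.
TASKS: List[str] = [
    "multi-label disease classification",
    "lesion localization",
    "anatomical segmentation",
    "radiology report generation",
]

# Hypothetical model interface: (image identifier, instruction text) -> text output.
Model = Callable[[str, str], str]


@dataclass
class Trace:
    """Records every instruction and output so the final report can be traced back."""
    steps: List[Dict[str, str]] = field(default_factory=list)


def build_instruction(task: str, trace: Trace) -> str:
    """Compose an instruction that carries all earlier outputs forward as evidence."""
    evidence = "\n".join(f"- {s['task']}: {s['output']}" for s in trace.steps)
    prompt = f"Task: {task}."
    if evidence:
        prompt += f"\nEvidence so far:\n{evidence}"
    return prompt


def run_pipeline(model: Model, image_id: str) -> Trace:
    """Run the four tasks in order, feeding each result into the next instruction."""
    trace = Trace()
    for task in TASKS:
        instruction = build_instruction(task, trace)
        output = model(image_id, instruction)
        trace.steps.append(
            {"task": task, "instruction": instruction, "output": output}
        )
    return trace


def stub_model(image_id: str, instruction: str) -> str:
    """Stand-in for a real model: echoes the task name found on the first line."""
    first_line = instruction.splitlines()[0]   # e.g. "Task: lesion localization."
    task_name = first_line[len("Task: "):-1]   # strip the prefix and trailing period
    return f"<output for {task_name} on {image_id}>"


if __name__ == "__main__":
    trace = run_pipeline(stub_model, "cxr_0001")
    for step in trace.steps:
        print(step["task"], "->", step["output"])
```

In this sketch the report-generation instruction contains the outputs of the three earlier tasks, which is one simple way to make a conclusion traceable to the findings it rests on; a real system might pass structured boxes or masks rather than text.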