Conformalized Neural Networks for Federated Uncertainty Quantification under Dual Heterogeneity

2026-02-26

Machine Learning, Artificial Intelligence
AI summary

The authors study how to measure uncertainty in federated learning, where different agents train models without sharing data. They note current methods often ignore how differences in both data and models affect reliability. To solve this, they propose FedWQ-CP, a method that quickly calibrates uncertainty using a single communication round between agents and the server. Their experiments show FedWQ-CP reliably balances uncertainty estimates across agents while keeping predictions efficient.

Keywords
federated learning, uncertainty quantification, conformal prediction, data heterogeneity, model heterogeneity, coverage, calibration, prediction sets, regression, classification
Authors
Quang-Huy Nguyen, Jiaqi Wang, Wei-Shinn Ku
Abstract
Federated learning (FL) faces challenges in uncertainty quantification (UQ). Without reliable UQ, FL systems risk deploying overconfident models at under-resourced agents, leading to silent local failures despite seemingly satisfactory global performance. Existing federated UQ approaches often address data heterogeneity or model heterogeneity in isolation, overlooking their joint effect on coverage reliability across agents. Conformal prediction is a widely used distribution-free UQ framework, yet its application in heterogeneous FL settings remains underexplored. We propose FedWQ-CP, a simple yet effective approach that balances empirical coverage with efficiency at both the global and agent levels under this dual heterogeneity. FedWQ-CP performs agent-server calibration in a single communication round: on each agent, conformity scores are computed on calibration data and a local quantile threshold is derived. Each agent then transmits only its quantile threshold and calibration sample size to the server, which aggregates these thresholds through a weighted average to produce a global threshold. Experimental results on seven public datasets, covering both classification and regression, demonstrate that FedWQ-CP empirically maintains agent-wise and global coverage while producing the smallest prediction sets or intervals.
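The one-round calibration protocol described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' released implementation: the function names, the exponential toy conformity scores, and the choice of weighting local thresholds by calibration sample size are assumptions made here for demonstration; only the overall flow (local conformal quantile, transmit threshold and size, server-side weighted average) follows the abstract.

```python
import numpy as np

def local_threshold(scores, alpha):
    """Agent side: split-conformal quantile of conformity scores.

    Uses the standard finite-sample-adjusted level ceil((n+1)(1-alpha))/n
    so that the local threshold alone would give (1 - alpha) coverage.
    """
    n = len(scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(scores, q, method="higher"))

def aggregate_thresholds(thresholds, sizes):
    """Server side: weighted average of agent thresholds.

    Weighting by calibration sample size is an illustrative choice here;
    each agent only sends its scalar threshold and its sample count.
    """
    w = np.asarray(sizes, dtype=float)
    return float(np.dot(np.asarray(thresholds, dtype=float), w / w.sum()))

# Hypothetical example: three agents with heterogeneous synthetic scores.
rng = np.random.default_rng(0)
alpha = 0.1
agent_scores = [rng.exponential(scale=s, size=n)
                for s, n in [(1.0, 200), (2.0, 100), (0.5, 400)]]

locals_ = [local_threshold(s, alpha) for s in agent_scores]  # computed on-agent
sizes = [len(s) for s in agent_scores]                       # transmitted with threshold
tau_global = aggregate_thresholds(locals_, sizes)            # single aggregation step

# At prediction time, an agent would form the set {y : score(x, y) <= tau_global}
# (classification) or the interval of the same form (regression).
```

Note that the aggregate is a convex combination of the local thresholds, so it always lies between the smallest and largest agent threshold; this is what lets a scheme of this shape trade off per-agent coverage against prediction-set size.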