Network Recovery from Cascade Data: A Debiased Jacobian-Based Machine Learning Approach
2026-06-05 • Machine Learning
AI summary
The authors introduce CascadeNet, a method for recovering hidden influence networks from observed cascades of events without specifying how the spread occurs. Unlike earlier methods, which assume a particular diffusion model, CascadeNet estimates influence from the Jacobian of the one-step transition function and applies a Neyman-orthogonal debiasing step so that the estimates support formal statistical inference. In simulations, it achieves the highest recovery accuracy across nine data-generating processes. In an application to COVID-19 spread across Spanish provinces, the recovered networks correlate significantly with the inter-province mobility network, while those recovered by baseline methods do not. The approach makes it possible to study how diseases or information spread without committing to a specific diffusion mechanism.
cascade dynamics, influence network, Jacobian matrix, transition function, Neyman-orthogonal debiasing, Riesz representer, network recovery, COVID-19 transmission, machine learning, asymptotic normality
Authors
Lei Huang
Abstract
Many important outcomes unfold as dynamic cascades, including product adoption, disease spread, financial distress, and information diffusion. A central challenge is to recover the hidden influence network behind these cascades. Existing methods typically assume a specific diffusion model, and their performance degrades substantially when that assumption is misspecified. We propose CascadeNet, a Jacobian-based machine learning framework for network recovery that does not require specifying a diffusion mechanism. The key idea is that the underlying influence structure can be characterized by the Jacobian of the one-step transition function. CascadeNet first constructs a flexible estimator of the transition function and then applies Neyman-orthogonal debiasing via the Riesz representer, so that the debiased Jacobian is $\sqrt{n}$-consistent and asymptotically normal, enabling formal inference on the network structure. We validate CascadeNet in both a simulation exercise and a real-world empirical application. In simulations, where the data-generating process is known, CascadeNet achieves the highest network recovery accuracy across nine common data-generating processes. In an empirical application to COVID-19 transmission across Spain's 52 provinces, CascadeNet recovers transmission networks that are significantly correlated with the true inter-province mobility network, whereas networks recovered by baseline methods show no significant alignment with the ground truth.
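To make the key idea concrete, the following minimal sketch shows how the averaged Jacobian of a one-step transition function exposes which nodes influence which. It is an illustration only, not the authors' implementation. It assumes a toy sigmoid-linear transition with a known weight matrix, whereas CascadeNet estimates the transition function from data, and it omits the estimation and debiasing steps entirely. The names `transition`, `jacobian_fd`, `average_jacobian`, `W_TRUE`, and `B` are hypothetical.

```python
import math
import random


def transition(x, W, b):
    """Toy one-step transition (an assumption): x_next[i] = sigmoid(sum_j W[i][j] * x[j] + b[i])."""
    out = []
    for i in range(len(W)):
        z = sum(W[i][j] * x[j] for j in range(len(x))) + b[i]
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out


def jacobian_fd(f, x, eps=1e-5):
    """Central finite-difference Jacobian of f at x: J[i][j] ~ d f_i / d x_j."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J


def average_jacobian(f, states):
    """Average the Jacobian of f over a list of sampled states."""
    n = len(states[0])
    avg = [[0.0] * n for _ in range(n)]
    for x in states:
        J = jacobian_fd(f, x)
        for i in range(n):
            for j in range(n):
                avg[i][j] += J[i][j] / len(states)
    return avg


# Hidden toy network (assumed): W_TRUE[i][j] != 0 means node j influences node i.
W_TRUE = [[0.0, 1.5, 0.0],
          [0.0, 0.0, -2.0],
          [1.0, 0.0, 0.0]]
B = [-0.5, 0.2, 0.0]

rng = random.Random(0)
states = [[rng.random() for _ in range(3)] for _ in range(200)]

J_avg = average_jacobian(lambda x: transition(x, W_TRUE, B), states)

# An entry of the averaged Jacobian is nonzero exactly where an edge exists,
# and its sign matches the sign of the underlying influence.
recovered = [[abs(v) > 1e-6 for v in row] for row in J_avg]
```

In this toy setting the nonzero pattern of `J_avg` matches that of `W_TRUE`, and the sign of each entry matches the sign of the corresponding weight. With an estimated transition function, the entries would carry estimation error, which is where the debiasing and inference steps described in the abstract come in.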