LCMP: Distributed Long-Haul Cost-Aware Multi-Path Routing for Inter-Datacenter RDMA Networks

2026-04-09

Networking and Internet Architecture
AI summary

The authors address the routing of RDMA flows across long-haul inter-datacenter networks, where existing methods struggle with asymmetric paths, delayed congestion signals, and many flows choosing the same path at once. They introduce LCMP, which combines a control-plane path-quality score with on-switch congestion signals to spread flows over multiple routes. LCMP avoids simultaneous-decision collisions by filtering out high-cost candidate paths and then hashing flows across the remaining set to keep route choices diverse. On an 8-DC testbed, LCMP reduces median and tail flow completion time (FCT) slowdown by up to 76% and 64%, respectively, relative to state-of-the-art datacenter routing strategies, and large-scale NS-3 simulations show similar improvements.

RDMA, datacenter networking, multi-path routing, congestion control, long-haul networks, flow completion time, control plane, hashing, inter-datacenter communication
Authors
Dong-Yang Yu, Yuchao Zhang, Xiaodi Wang, Jun Wang, Wenfei Wu, Haipeng Yao, Wendong Wang, Ke Xu
Abstract
RDMA-empowered cloud services are increasingly deployed across datacenters (DCs) connected by multiple paths. These deployments exhibit new properties, namely path asymmetry, delayed congestion signals, and simultaneous flow routing collisions, which cause existing routing methods to fail. We present LCMP, a distributed long-haul cost-aware multi-path routing framework that places RDMA flows on multiple inter-DC paths to achieve low-cost, low-latency, and congestion-responsive transmission. LCMP combines a control-plane path-quality score with compact on-switch congestion signals: the former unifies quality assessment across asymmetric paths, and the latter enables a responsive reaction to path congestion. LCMP further resolves the simultaneous flow decision collision problem by filtering out high-cost candidates and performing a diversity-preserving hash within the reduced set. On an 8-DC testbed, LCMP reduces median and tail FCT slowdown by up to 76% and 64%, respectively, compared to state-of-the-art (SOTA) DCN routing strategies. Large-scale NS-3 simulations under a 2000 km inter-DC scenario confirm similar improvements.
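
The filter-then-hash idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the additive cost formula, the `slack` threshold, the `congestion_weight` parameter, and all names (`Path`, `combined_cost`, `filter_candidates`, `pick_path`) are assumptions chosen only to show how a static path-quality score and a congestion signal might be combined, how high-cost candidates might be dropped, and how a hash over the reduced set keeps simultaneous flows from all choosing the same path.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    quality_cost: float  # hypothetical control-plane score (lower is better)
    congestion: float    # hypothetical on-switch congestion signal in [0, 1]


def combined_cost(path, congestion_weight=1.0):
    # Assumed additive combination; the paper's actual formula may differ.
    return path.quality_cost + congestion_weight * path.congestion


def filter_candidates(paths, slack=0.2, congestion_weight=1.0):
    # Keep only paths whose cost is within (1 + slack) of the cheapest path.
    if not paths:
        raise ValueError("at least one path is required")
    costs = [combined_cost(p, congestion_weight) for p in paths]
    threshold = min(costs) * (1 + slack)
    return [p for p, c in zip(paths, costs) if c <= threshold]


def pick_path(flow_id, paths, slack=0.2, congestion_weight=1.0):
    # Hash the flow identifier into the reduced candidate set, so that
    # different flows deciding at the same time spread across low-cost paths.
    candidates = filter_candidates(paths, slack, congestion_weight)
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(candidates)
    return candidates[index]


if __name__ == "__main__":
    paths = [
        Path("A", quality_cost=1.0, congestion=0.1),
        Path("B", quality_cost=1.1, congestion=0.0),
        Path("C", quality_cost=3.0, congestion=0.0),
    ]
    for flow in ("10.0.0.1:4791->10.1.0.1:4791", "10.0.0.2:4791->10.1.0.7:4791"):
        print(flow, "->", pick_path(flow, paths).name)
```

In this example, path C is filtered out because its assumed cost exceeds the slack threshold, and each flow is then mapped deterministically to A or B. A larger `slack` trades cost for path diversity; a smaller one concentrates flows on the cheapest paths.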