PG-MDP: Profile-Guided Memory Dependence Prediction for Area-Constrained Cores
2026-04-09 • Programming Languages
Programming Languages, Hardware Architecture
AI summary
The authors study memory dependence prediction, which predicts which earlier stores, if any, a load instruction depends on so that the processor can avoid unnecessary delays. On small, area-constrained processors the prediction tables are small, so independent loads often collide with unrelated table entries and are wrongly predicted as dependent, causing unnecessary stalls. The authors introduce profile-guided memory dependence prediction (PG-MDP), which uses software to label loads that are consistently independent and removes them from prediction, reducing false dependencies and speeding up execution. The approach comes close to the performance of much larger predictors without adding hardware area.
Memory Dependence Prediction, Load instruction, Store instruction, Processor pipeline, Predictor table, False dependency, Profile-guided optimization, IPC (Instructions Per Cycle), SPEC2017 CPU benchmark, Area-constrained cores
Authors
Luke Panayi, Johan Jino, Sebastian S. Kim, Alberto Ros, Alexandra Jimborean, Jim Whittaker, Martin Berger, Paul Kelly
Abstract
Memory Dependence Prediction (MDP) is a speculative technique to determine which stores, if any, a given load will depend on. Area-constrained cores are increasingly relevant in various applications such as energy-efficient or edge systems, and often have limited space for MDP tables. This leads to a high rate of false dependencies as memory-independent loads alias with unrelated predictor entries, causing unnecessary stalls in the processor pipeline. The conventional way to address this problem is with greater predictor size or complexity, but this is unattractive on area-constrained cores. This paper proposes that targeting the predictor working set is as effective as growing the predictor, and can deliver performance competitive with large predictors while still using very small predictors. This paper introduces profile-guided memory dependence prediction (PG-MDP), a software co-design to label consistently memory-independent loads via their opcode and remove them from the MDP working set. These loads bypass querying the MDP when dispatched and always issue as soon as possible. Across SPEC2017 CPU intspeed, PG-MDP reduces the rate of MDP queries by 79%, false dependencies by 77%, and improves geomean IPC for a small simulated core by 1.47% (to within 0.5% of using 16x the predictor entries), with no area cost and no additional instruction bandwidth.
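The aliasing problem and the bypass idea described in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: it assumes a hypothetical untagged, direct-mapped table of one-bit entries indexed by load PC, and it stands in for the opcode label with a plain set of PCs (`labelled_independent`). The class name, index function, and trace are all invented for illustration.

```python
class TinyMDP:
    """Toy untagged, direct-mapped dependence predictor (hypothetical model)."""

    def __init__(self, entries):
        self.entries = entries
        self.table = [False] * entries  # True = "predict a store dependence"
        self.queries = 0

    def _index(self, pc):
        # Assumed index function: drop the 2 low PC bits, then wrap.
        return (pc >> 2) % self.entries

    def train_dependent(self, pc):
        self.table[self._index(pc)] = True

    def predict_dependent(self, pc):
        self.queries += 1
        return self.table[self._index(pc)]


def run(trace, truly_dependent, labelled_independent=frozenset(), entries=4):
    """Return (predictor queries, false dependencies) for a trace of load PCs."""
    mdp = TinyMDP(entries)
    for pc in truly_dependent:
        mdp.train_dependent(pc)
    false_deps = 0
    for pc in trace:
        if pc in labelled_independent:
            continue  # labelled load: skips the predictor and issues at once
        if mdp.predict_dependent(pc) and pc not in truly_dependent:
            false_deps += 1  # independent load stalled by an aliased entry
    return mdp.queries, false_deps


# 0x110 maps to the same entry as the truly dependent load at 0x100.
trace = [0x100, 0x104, 0x110, 0x108, 0x110]
dependent = {0x100}

baseline = run(trace, dependent)
with_labels = run(trace, dependent, labelled_independent={0x104, 0x108, 0x110})
print("baseline (queries, false deps):", baseline)
print("with labels (queries, false deps):", with_labels)
```

In this toy trace, labelling the independent loads removes both the extra predictor queries and the aliased false dependencies without growing the table, which is the effect the abstract attributes to shrinking the predictor working set. The numbers it produces are illustrative only and say nothing about the reported SPEC2017 results.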