DORA: Dataflow-Instruction Orchestration Architecture for DNN Acceleration

2026-05-22 • Hardware Architecture

AI summary

The authors address the challenge of running diverse, complex deep neural networks efficiently on hardware with DORA, an instruction-based overlay architecture whose ISA explicitly describes data movement, computation, and synchronization at the layer level. DORA pairs this ISA with on-chip memory management and computation parallelism management so that it stays efficient across different workloads. A companion compilation framework explores the design space and generates instructions, using either a MILP-based or a heuristic-based search engine to produce schedules under different needs and constraints. A prototype on the AMD Versal VCK190 keeps efficiency stable across workloads and delivers up to 5× higher throughput than state-of-the-art accelerators. The work is available as open source.

deep neural networks, instruction set architecture (ISA), dataflow, on-chip memory management, computation parallelism, compilation framework, MILP (Mixed Integer Linear Programming), heuristic scheduling, hardware accelerator, reconfigurable systems

Authors

Xingzhen Chen, Zhuoping Yang, Jinming Zhuang, Shixin Ji, Sarah Schultz, Zheng Dong, Weisong Shi, Peipei Zhou

Abstract

As deep neural networks grow significantly more diverse and complex, achieving high performance and efficiency on them becomes a pressing challenge. Modern DNN workloads are increasingly diverse in operation types, tensor shapes, and execution dependencies, making it difficult to sustain high hardware efficiency across models. In addition, a generic accelerator often incurs substantial overhead when executing diverse workloads. To address these problems, we propose DORA, an instruction-based overlay architecture that explicitly describes dataflow via a proposed ISA, enabling fine-grained control of data movement, computation, and synchronization at the layer level. To support flexibility while achieving high performance, DORA adopts novel mechanisms for on-chip memory management and computation parallelism management. DORA provides a compilation framework that generates instructions for given DNN workloads after a two-stage design space exploration. The DORA framework also incorporates a MILP-based and a heuristic-based search engine to generate scheduling solutions for different needs and constraints. We prototype DORA on the AMD Versal VCK190 platform, demonstrating its deployability on existing reconfigurable systems. Experimental results show that DORA maintains stable efficiency, with less than 5% variation on a single vector processor across workloads exhibiting up to 6× variation in operation counts. Compared to state-of-the-art accelerators, DORA consistently achieves higher performance, delivering up to 5× throughput improvement. The heuristic-based scheduler further achieves up to 90% optimality under practical time constraints. DORA is open-sourced at https://github.com/arc-research-lab/DORA.git.
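The abstract contrasts an exact (MILP-based) search with a heuristic one and reports the heuristic's "optimality". As a rough illustration of that comparison only, the sketch below pits a greedy heuristic against an exhaustive search on a toy problem. It is not DORA's scheduler: the function names, the independent-layer cost model, and the reading of "optimality" as the ratio of optimal to heuristic finish time are all assumptions made for this example, and the exhaustive search merely stands in for an exact solver.

```python
from itertools import product

def makespan(assignment, costs, n_units):
    """Finish time when each layer runs on its assigned unit and units run in parallel."""
    load = [0] * n_units
    for layer, unit in enumerate(assignment):
        load[unit] += costs[layer]
    return max(load)

def exhaustive_schedule(costs, n_units):
    """Stand-in for an exact solver (assumption): try every assignment. Tiny inputs only."""
    best = min(product(range(n_units), repeat=len(costs)),
               key=lambda a: makespan(a, costs, n_units))
    return list(best)

def greedy_schedule(costs, n_units):
    """Hypothetical heuristic: place the largest remaining layer on the least-loaded unit."""
    load = [0] * n_units
    assignment = [0] * len(costs)
    for layer in sorted(range(len(costs)), key=lambda i: -costs[i]):
        unit = load.index(min(load))
        assignment[layer] = unit
        load[unit] += costs[layer]
    return assignment

def optimality(costs, n_units):
    """Assumed metric: optimal finish time divided by heuristic finish time (1.0 is optimal)."""
    opt = makespan(exhaustive_schedule(costs, n_units), costs, n_units)
    heur = makespan(greedy_schedule(costs, n_units), costs, n_units)
    return opt / heur

if __name__ == "__main__":
    layer_costs = [3, 3, 2, 2, 2]   # made-up per-layer costs, no dependencies modeled
    print(f"optimality = {optimality(layer_costs, 2):.3f}")
```

On this made-up input the greedy schedule finishes at time 7 while the best possible finish time is 6, so the ratio is about 0.857. The exhaustive search grows exponentially with the number of layers, which is the usual reason to fall back on a heuristic when search time is limited.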
