A Direct Memory Access Controller (DMAC) for Irregular Data Transfers on RISC-V Linux Systems
By Thomas Benz ‡∗, Axel Vanoni ‡∗, Michael Rogenmoser ∗, Luca Benini ∗†
∗ ETH Zurich, Switzerland
† University of Bologna, Italy

Abstract
With the ever-growing heterogeneity in computing systems, driven by modern machine learning applications, pressure is increasing on memory systems to handle arbitrary and more demanding transfers efficiently. Descriptor-based direct memory access controllers (DMACs) allow such transfers to be executed by decoupling memory transfers from processing units. Classical descriptor-based DMACs are inefficient when handling arbitrary transfers of small unit sizes. Excessive descriptor size and the serialized nature of processing descriptors employed by the DMAC lead to large static overheads when setting up transfers. To tackle this inefficiency, we propose a descriptor-based DMAC optimized to efficiently handle arbitrary transfers of small unit sizes. We implement a lightweight descriptor format in an AXI4-based DMAC. We further increase performance by implementing a low-overhead speculative descriptor prefetching scheme without additional latency penalties in the case of a misprediction. Our DMAC is integrated into a 64-bit Linux-capable RISC-V SoC and emulated on a Kintex FPGA to evaluate its performance. Compared to an off-the-shelf descriptor-based DMAC IP, we achieve 1.66x less latency launching transfers, increase bus utilization up to 2.5x in an ideal memory system with 64-byte-length transfers while requiring 11% fewer lookup tables, 23% fewer flip-flops, and no block RAMs. We can extend our lead in bus utilization to 3.6x with 64-byte-length transfers in deep memory systems. We synthesized our DMAC in GlobalFoundries' GF12LP+ node, achieving a clock frequency of over 1.44 GHz while occupying only 49.5 kGE.
To read the full article, click here
Related Semiconductor IP
- DMA Controller
- DMA Controller
- Multi-Channel Streaming DMA Controller
- AHB/AXI/Wishbone DMA Controller
- DMA Controller with TileLink IIP
Related Articles
- Efficient Hardware-Assisted Heap Memory Safety for Embedded RISC-V Systems
- A 16 nm 1.60TOPS/W High Utilization DNN Accelerator with 3D Spatial Data Reuse and Efficient Shared Memory Access
- An FPGA-Based SoC Architecture with a RISC-V Controller for Energy-Efficient Temporal-Coding Spiking Neural Networks
- A Persistent-State Dataflow Accelerator for Memory-Bound Linear Attention Decode on FPGA
Latest Articles
- ZK-Flex: A Flexible and Scalable Framework for Accelerating Zero-Knowledge Proofs
- ITP-STDP: An Intrinsic-Timing Power-of-Two Learning Engine for On-Chip SNN Training
- OpenEye: A Scalable Open-Source Hardware Accelerator for DNNs
- CHIMERA: A Flexible and Scalable 3.1 TOPS/W AI-MCU with Transformer Accelerator and 563 Gb/s Shared-L2 Memory Subsystem with QoS Guarantees
- CXL-ClusterSim: Modeling CXL-based Disaggregated Memory Cluster for Pooling and Sharing using gem5 and SST