Bare-Metal RISC-V + NVDLA SoC for Efficient Deep Learning Inference
By Vineet Kumar 1, Ajay Kumar M 1, Yike Li 1, Shreejith Shanker 2, Deepu John 1
1 School of Electrical and Electronic Engineering, University College Dublin, Ireland,
2 Department of Electronic and Electrical Engineering, Trinity College Dublin, Ireland
This paper presents a novel System-on-Chip (SoC) architecture for accelerating complex deep learning models for edge computing applications through a combination of hardware and software optimisations. The hardware architecture tightly couples the open-source NVIDIA Deep Learning Accelerator (NVDLA) to a 32-bit, 4-stage pipelined RISC-V core from Codasip called µRISC_V. To offload the model acceleration in software, our toolflow generates bare-metal application code (in assembly), overcoming complex OS overheads of previous works that have explored similar architectures. This tightly coupled architecture and bare-metal flow leads to improvements in execution speed and storage efficiency, making it suitable for edge computing solutions. We evaluate the architecture on AMD's ZCU102 FPGA board using NVDLA-small configuration and test the flow using LeNet-5, ResNet-18 and ResNet-50 models. Our results show that these models can perform inference in 4.8 ms, 16.2 ms and 1.1 s respectively, at a system clock frequency of 100 MHz.
Index Terms — System-on-chip, RISC-V, NVDLA, Hardware accelerators, Deep learning, FPGA
To read the full article, click here
Related Semiconductor IP
- RISC-V IOPMP IP
- RISC-V Debug & Trace IP
- Gen#2 of 64-bit RISC-V core with out-of-order pipeline based complex
- 64-bit RISC-V core with in-order single issue pipeline. Tiny Linux-capable processor for IoT applications.
- Tiny, Ultra-Low-Power Embedded RISC-V Processor
Related Articles
- FPGA-Accelerated RISC-V ISA Extensions for Efficient Neural Network Inference on Edge Devices
- FeNN-DMA: A RISC-V SoC for SNN acceleration
- MultiVic: A Time-Predictable RISC-V Multi-Core Processor Optimized for Neural Network Inference
- In-Pipeline Integration of Digital In-Memory-Computing into RISC-V Vector Architecture to Accelerate Deep Learning
Latest Articles
- RISC-V Functional Safety for Autonomous Automotive Systems: An Analytical Framework and Research Roadmap for ML-Assisted Certification
- Emulation-based System-on-Chip Security Verification: Challenges and Opportunities
- A 129FPS Full HD Real-Time Accelerator for 3D Gaussian Splatting
- SkipOPU: An FPGA-based Overlay Processor for Large Language Models with Dynamically Allocated Computation
- TensorPool: A 3D-Stacked 8.4TFLOPS/4.3W Many-Core Domain-Specific Processor for AI-Native Radio Access Networks