RISC-V Based TinyML Accelerator for Depthwise Separable Convolutions in Edge AI
By Muhammed Yildirim 1, Ozcan Ozturk 2
1 Ihsan Dogramaci Bilkent University, Turkey
2 Sabancı University, Turkey

Abstract
The increasing demand for on-device intelligence in Edge AI and TinyML applications requires the efficient execution of modern Convolutional Neural Networks (CNNs). While lightweight architectures like MobileNetV2 employ Depthwise Separable Convolutions (DSC) to reduce computational complexity, their multi-stage design introduces a critical performance bottleneck inherent to layer-by-layer execution: the high energy and latency cost of transferring intermediate feature maps to either large on-chip buffers or off-chip DRAM. To address this memory wall, this paper introduces a novel hardware accelerator architecture that utilizes a fused pixel-wise dataflow. Implemented as a Custom Function Unit (CFU) for a RISC-V processor, our architecture eliminates the need for intermediate buffers entirely, reducing the data movement up to 87\% compared to conventional layer-by-layer execution. It computes a single output pixel to completion across all DSC stages-expansion, depthwise convolution, and projection-by streaming data through a tightly-coupled pipeline without writing to memory. Evaluated on a Xilinx Artix-7 FPGA, our design achieves a speedup of up to 59.3x over the baseline software execution on the RISC-V core. Furthermore, ASIC synthesis projects a compact 0.284 mm2 footprint with 910 mW power at 2 GHz in 28 nm, and a 1.20 mm2 footprint with 233 mW power at 300 MHz in 40 nm. This work confirms the feasibility of a zero-buffer dataflow within a TinyML resource envelope, offering a novel and effective strategy for overcoming the memory wall in edge AI accelerators.
To read the full article, click here
Related Semiconductor IP
- Tiny, Ultra-Low-Power Embedded RISC-V Processor
- Low-Power Embedded RISC-V Processor
- Enhanced-Processing Embedded RISC-V Processor
- Vector-Capable Embedded RISC-V Processor
- Compact Embedded RISC-V Processor
Related Articles
- MIPI in next generation of AI IoT devices at the edge
- Boosting RISC-V SoC performance for AI and ML applications
- FPGA-Accelerated RISC-V ISA Extensions for Efficient Neural Network Inference on Edge Devices
- ioPUF+: A PUF Based on I/O Pull-Up/Down Resistors for Secret Key Generation in IoT Nodes
Latest Articles
- Croc: Training the Next Generation Chip Designers on Domain-Specific End-to-End Open Source Silicon
- Design and Development of a Neuromorphic Silicon Suite: PVT Sensing, Stochastic LIF Inference, On-Chip STDP Learning, and Crossbar Programming
- LLM4RTL: Tool-Assisted LLM for RTL Generation
- Towards Delta Aware Training: Efficient DNN Weight Storage for Resource-Constrained FPGAs
- CHERI-D: Secure and efficient inline object ID for CHERI temporal memory safety