Efficiently Packing Neural Network AI Models for the Edge
Packing applications into constrained on-chip memory is a familiar problem in embedded design, and it is now equally important when compacting neural network AI models into constrained storage. In some ways this problem is even more challenging than for conventional software, because working memory in neural network-based systems is all "inner loop", where any demand to page out to DDR memory can kill performance. Equally bad, repetitive DDR accesses during inferencing will blow through the typical low-power budgets of edge devices. A larger on-chip memory is one way to resolve the problem, but that adds to product cost. The best option, where possible, is to pack the model as efficiently as possible into the available memory.
When compiling a neural network AI model to run on an edge device, there are well-known quantization techniques to reduce its size: converting floating-point data and weight values to fixed point, then shrinking further to INT8 or smaller values. Imagine if you could go further. In this article I want to introduce a couple of graph optimization techniques that will allow you to fit a wider range of quantized models into, say, a 2 MB L2 memory, where they would not have fit after quantization alone.
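To make the quantization step concrete, here is a minimal sketch of symmetric per-tensor INT8 weight quantization in Python with NumPy. The function names and the 512×1024 example layer are illustrative assumptions, not taken from any particular compiler toolchain; production flows typically also calibrate activation ranges and may quantize per-channel.

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map FP32 weights to INT8 with a single per-tensor scale."""
    # Scale so the largest weight magnitude maps to the INT8 limit (127).
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an FP32 approximation, e.g. for accuracy checks."""
    return q.astype(np.float32) * scale

# Illustrative layer: ~512k weights shrink from ~2.1 MB (FP32) to ~0.5 MB (INT8).
w = np.random.randn(512, 1024).astype(np.float32)
q, scale = quantize_int8(w)
print(f"FP32: {w.nbytes / 1e6:.1f} MB -> INT8: {q.nbytes / 1e6:.1f} MB")
print(f"max abs error: {np.max(np.abs(w - dequantize(q, scale))):.4f}")
```

Even with the 4x saving shown here, a model a few megabytes in size can still overflow a 2 MB L2, which is where the graph optimizations discussed next come in.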