Benefits of pruning and clustering a neural network before deploying on Arm Ethos-U NPU
Pruning and clustering are optimization techniques:
- Pruning: setting weights to zero
- Clustering: grouping weights together into clusters
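To make the two definitions concrete, here is a minimal pure-Python sketch. It assumes magnitude-based pruning (zeroing the smallest weights) and a one-dimensional k-means for clustering. Both are common choices, but the article does not say which variants it uses, and the function names `prune_by_magnitude` and `cluster_weights` are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero."""
    n_zero = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_zero]:
        pruned[i] = 0.0
    return pruned


def cluster_weights(weights, n_clusters, n_iters=20):
    """Replace each weight with the centroid of its cluster (1-D k-means).

    Assumes n_clusters >= 2; centroids start evenly spaced over the weight range.
    """
    lo, hi = min(weights), max(weights)
    centroids = [lo + (hi - lo) * k / (n_clusters - 1) for k in range(n_clusters)]
    for _ in range(n_iters):
        groups = [[] for _ in centroids]
        for w in weights:
            nearest = min(range(n_clusters), key=lambda k: abs(w - centroids[k]))
            groups[nearest].append(w)
        # Keep the old centroid when a cluster ends up empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return [min(centroids, key=lambda c: abs(w - c)) for w in weights]


weights = [0.9, -0.05, 0.42, 0.01, -0.88, 0.4, -0.02, 0.45]
pruned = prune_by_magnitude(weights, sparsity=0.5)
clustered = cluster_weights(weights, n_clusters=3)
print(pruned)
print(sorted(set(clustered)))
```

After pruning, half of the example weights are exactly zero. After clustering, the eight weights take at most three distinct values.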
These techniques modify the weights of a Machine Learning model. In some cases, they enable:
- Significant speed-up of the inference execution
- Reduction of the memory footprint
- Reduction in the overall power consumption of the system
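One way to build intuition for the memory-footprint claim is that weights with many zeros or few distinct values compress better under lossless compression. The sketch below is only an illustration under stated assumptions: it uses Python's `zlib` as a stand-in compressor (not the scheme used by the Ethos-U toolchain), random 8-bit weights, a hypothetical pruning threshold of 96, and 16 evenly spaced cluster levels.

```python
import random
import zlib

random.seed(0)
N = 4096

# Hypothetical dense layer: random signed 8-bit weights.
dense = [random.randint(-127, 127) for _ in range(N)]

# Pruning: zero every weight whose magnitude is below an assumed threshold.
pruned = [w if abs(w) >= 96 else 0 for w in dense]

# Clustering: snap every weight down to one of 16 evenly spaced levels.
clustered = [(w // 16) * 16 for w in dense]


def compressed_size(ws):
    """Size in bytes after packing weights as int8 and compressing with zlib."""
    raw = bytes(w & 0xFF for w in ws)
    return len(zlib.compress(raw, 9))


size_dense = compressed_size(dense)
size_pruned = compressed_size(pruned)
size_clustered = compressed_size(clustered)

print("dense:    ", size_dense, "bytes")
print("pruned:   ", size_pruned, "bytes")
print("clustered:", size_clustered, "bytes")
```

The exact sizes depend on the data and the compressor. The point is only the ordering: the pruned and clustered weight sets compress to fewer bytes than the dense one.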
We assume that you can optimize your workload without loss of accuracy and that you target an Arm® Ethos-U NPU. You can therefore prune and cluster your neural network before compiling it with the Vela compiler and deploying it on the Ethos-U hardware.