Benefits of pruning and clustering a neural network before deploying on an Arm Ethos-U NPU
Pruning and clustering are optimization techniques:
- Pruning: setting weights to zero
- Clustering: grouping weights together into clusters
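As a rough illustration of the two definitions above, the following standard-library Python sketch prunes a flat list of weights by magnitude and then clusters them with a simple 1-D k-means. The function names, the magnitude criterion, the k-means procedure, and the sample values are all assumptions made for this sketch. They are not taken from the article and are not the API of any Arm or TensorFlow tool.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_zero = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_zero]:
        pruned[i] = 0.0
    return pruned


def cluster_weights(weights, n_clusters, n_iters=10):
    """Replace each weight with the centroid of its cluster (naive 1-D k-means)."""
    lo, hi = min(weights), max(weights)
    if n_clusters == 1:
        centroids = [(lo + hi) / 2.0]
    else:
        # Start from centroids spaced evenly across the weight range.
        step = (hi - lo) / (n_clusters - 1)
        centroids = [lo + step * k for k in range(n_clusters)]

    def nearest(w):
        return min(range(n_clusters), key=lambda k: abs(w - centroids[k]))

    for _ in range(n_iters):
        assignment = [nearest(w) for w in weights]
        for k in range(n_clusters):
            members = [w for w, a in zip(weights, assignment) if a == k]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[k] = sum(members) / len(members)

    return [centroids[nearest(w)] for w in weights]


# Hypothetical weights for one small layer.
weights = [0.9, -0.05, 0.52, 0.02, -0.88, 0.48, 0.01, -0.5]
pruned = prune_by_magnitude(weights, sparsity=0.5)
clustered = cluster_weights(pruned, n_clusters=4)

print("pruned:   ", pruned)
print("clustered:", clustered)
```

After both steps the layer holds at most four distinct values, several of them zero, which is the kind of regularity that weight compression can exploit. This naive clustering does not guarantee in general that pruned zeros stay exactly zero, so a production flow would need to handle sparsity explicitly.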
These techniques modify the weights of a Machine Learning model. In some cases, they enable:
- Significant speed-up of the inference execution
- Reduction of the memory footprint
- Reduction in the overall power consumption of the system
We assume that you can optimize your workload without loss in accuracy and that you target an Arm® Ethos-U NPU. You can therefore prune and cluster your neural network before compiling it with the Vela compiler and deploying it on the Ethos-U hardware. The full article has more information on optimizing your workload.