Benefits of pruning and clustering a neural network before deploying it on an Arm Ethos-U NPU

Pruning and clustering are optimization techniques:

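  • Pruning: setting selected weights to zero (typically those with the smallest magnitude)
  • Clustering: grouping weights into a small number of clusters, so that all weights in a cluster share a single value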
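The two definitions above can be illustrated on a flat list of weights. This is a minimal, post-training sketch in plain Python, not how any particular toolkit implements these techniques. It assumes magnitude-based pruning and 1-D k-means clustering with evenly spaced (linear) initial centroids, and the helper names `magnitude_prune` and `kmeans_cluster` are hypothetical. In practice, pruning and clustering are usually applied during training with a toolkit such as the TensorFlow Model Optimization Toolkit, so that the model can recover accuracy.

```python
import random

def magnitude_prune(weights, sparsity):
    """Pruning: set the `sparsity` fraction of weights with the smallest magnitude to zero."""
    n_zero = int(len(weights) * sparsity)
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_zero]:
        pruned[i] = 0.0
    return pruned

def kmeans_cluster(weights, k, iterations=20):
    """Clustering: replace every weight by the nearest of k shared centroids (k >= 2)."""
    lo, hi = min(weights), max(weights)
    # Linear initialization: k centroids evenly spaced between the min and max weight.
    centroids = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    for _ in range(iterations):
        buckets = [[] for _ in range(k)]
        for w in weights:
            nearest = min(range(k), key=lambda j: abs(w - centroids[j]))
            buckets[nearest].append(w)
        # Move each centroid to the mean of its weights; an empty cluster keeps its centroid.
        centroids = [sum(b) / len(b) if b else c for b, c in zip(buckets, centroids)]
    return [min(centroids, key=lambda c: abs(w - c)) for w in weights]

rng = random.Random(0)
weights = [rng.gauss(0.0, 0.1) for _ in range(1000)]
pruned = magnitude_prune(weights, sparsity=0.5)
clustered = kmeans_cluster(weights, k=16)
print(sum(w == 0.0 for w in pruned))  # → 500
print(len(set(clustered)))            # at most 16
```

After pruning, half of the weights are exactly zero; after clustering, the tensor contains at most 16 distinct values, however many weights it holds.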

These techniques modify the weights of a machine learning model. In some cases, they enable:

  • A significant speed-up of inference
  • A smaller memory footprint
  • Lower overall power consumption of the system
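Why more zeros and fewer distinct values can shrink the memory footprint is easiest to see by measuring how compressible an int8 weight tensor is. The sketch below does not reproduce Vela's own weight encoding; instead it uses the empirical Shannon entropy (bits per weight) and the output size of Python's `zlib` as generic stand-ins for compressibility. The function `prune_and_snap` is a hypothetical, deliberately crude stand-in for real pruning and clustering: it zeroes the smallest half of the weights and snaps the rest onto 16 evenly spaced values.

```python
import math
import random
import zlib
from collections import Counter

def entropy_bits(values):
    """Empirical Shannon entropy of the value distribution, in bits per weight."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def to_int8(values, scale):
    """Symmetric int8 quantization with a shared per-tensor scale."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def prune_and_snap(values, sparsity, levels):
    """Crude stand-in for pruning + clustering: zero the smallest-magnitude
    fraction, then snap every surviving weight onto `levels` evenly spaced values."""
    threshold = sorted(abs(v) for v in values)[int(len(values) * sparsity)]
    lo, hi = min(values), max(values)
    step = (hi - lo) / (levels - 1)
    return [0.0 if abs(v) < threshold else lo + round((v - lo) / step) * step
            for v in values]

rng = random.Random(0)
weights = [rng.gauss(0.0, 0.05) for _ in range(4096)]
scale = max(abs(w) for w in weights) / 127  # same scale for both tensors

dense_q = to_int8(weights, scale)
optimized_q = to_int8(prune_and_snap(weights, sparsity=0.5, levels=16), scale)

for name, q in (("dense", dense_q), ("pruned+clustered", optimized_q)):
    raw = bytes(v & 0xFF for v in q)  # int8 values as two's-complement bytes
    print(f"{name:>16}: {len(set(q)):3d} distinct values, "
          f"{entropy_bits(q):.2f} bits/weight, zlib {len(zlib.compress(raw, 9))} bytes")
```

The optimized tensor has at most 17 distinct values (zero plus 16 levels), so its entropy cannot exceed log2(17) bits per weight, well below that of the dense tensor. The actual saving on an Ethos-U system depends on the model and on Vela's encoder, not on zlib.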

We assume that you can optimize your workload without a loss in accuracy and that you are targeting an Arm® Ethos-U NPU. You can therefore prune and cluster your neural network before compiling it with the Vela compiler and deploying it on the Ethos-U hardware.
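The order of operations in this workflow can be sketched end to end on a toy weight tensor: prune, cluster, then quantize, since Vela compiles a TensorFlow Lite model that is already quantized (typically to int8). The sketch checks that the zeros from pruning and the small set of shared values from clustering both survive int8 quantization. It assumes symmetric per-tensor quantization and a clustering step that leaves zeros untouched, an approach the TensorFlow Model Optimization Toolkit also offers as sparsity-preserving clustering. All helper names are hypothetical and the code does not call any Arm or TensorFlow tooling.

```python
import random

def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude `sparsity` fraction of the weights."""
    n_zero = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_zero]:
        pruned[i] = 0.0
    return pruned

def cluster_preserving_zeros(weights, k, iterations=20):
    """1-D k-means over the non-zero weights only (k >= 2); zeros stay exactly zero."""
    nonzero = [w for w in weights if w != 0.0]
    lo, hi = min(nonzero), max(nonzero)
    centroids = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    for _ in range(iterations):
        buckets = [[] for _ in range(k)]
        for w in nonzero:
            nearest = min(range(k), key=lambda j: abs(w - centroids[j]))
            buckets[nearest].append(w)
        centroids = [sum(b) / len(b) if b else c for b, c in zip(buckets, centroids)]
    return [0.0 if w == 0.0 else min(centroids, key=lambda c: abs(w - c))
            for w in weights]

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: q = round(w / scale), scale = max|w| / 127."""
    scale = max(abs(w) for w in weights) / 127
    return [max(-127, min(127, round(w / scale))) for w in weights], scale

rng = random.Random(0)
weights = [rng.gauss(0.0, 0.05) for _ in range(2000)]

pruned = magnitude_prune(weights, sparsity=0.5)    # step 1: prune
clustered = cluster_preserving_zeros(pruned, k=8)  # step 2: cluster, keeping zeros
q, scale = quantize_int8(clustered)                # step 3: quantize before compiling

zero_fraction = sum(v == 0 for v in q) / len(q)
print(f"zero fraction {zero_fraction:.2f}, distinct int8 values {len(set(q))}")
```

If the pruned tensor were clustered with ordinary k-means over all weights, the zeros would generally be snapped onto a small non-zero centroid and the sparsity would be lost, which is why the zeros are excluded from clustering here. Because zero quantizes exactly to zero, both properties carry through to the int8 tensor. The resulting quantized model would then be compiled for the target, for example with `vela model.tflite --accelerator-config ethos-u55-128`, where the file name and accelerator configuration are placeholders.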
