Self-Compressing Neural Networks
The last decade of AI research has been characterised by exploring the potential of deep neural networks. The advances we have seen in recent years can be at least partly attributed to the increasing size of networks. Considerable effort has been put into creating larger and more complex architectures capable of increasingly impressive feats, from text generation with GPT-3 [1] to image generation with Imagen [2]. Moreover, the success of modern neural networks has led to their deployment in a wide variety of applications. Even as I'm writing this, a neural network is attempting to predict the next word I'm about to write, albeit not accurately enough to replace me anytime soon!
Performance optimisation, on the other hand, has received relatively little attention in the field, and this neglect is a significant obstacle to the wider deployment of neural networks. A likely reason is that large networks can be trained in data centres across thousands of GPUs or other accelerators at once, so there has been little pressure to make them efficient. This contrasts with computer graphics, for example, where the constraint of running in real time on a single computer created a strong incentive to optimise algorithms without sacrificing quality.
Research on neural network capacity suggests that the capacity needed to discover a high-accuracy solution is greater than the capacity needed to represent that solution. In their paper The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, Frankle and Carbin [3] found that only a small fraction of a network's weights is needed to represent a good solution, yet directly training a network of that reduced capacity does not reach the same accuracy. Similarly, Hinton et al. [4] found that transferring "knowledge" from a high-accuracy network to a low-capacity one can produce a network more accurate than one trained directly with the same loss function used for the high-capacity network.
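To make the distillation idea concrete, here is a minimal sketch of the softened-logits loss from Hinton et al. [4], written in PyTorch. The function name, the temperature T, and the weighting alpha are illustrative choices on our part, not details taken from the article:

```python
# A minimal sketch of knowledge distillation (Hinton et al. [4]):
# the student is trained to match the teacher's softened output
# distribution in addition to the ground-truth labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T=4.0, alpha=0.5):
    # Soften both distributions with temperature T; a higher T exposes
    # more of the teacher's knowledge about the relative probabilities
    # of the wrong classes.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # The KL term is scaled by T^2 so its gradient magnitude stays
    # comparable to the hard-label term as T varies.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

Raising T spreads the teacher's probability mass across the wrong classes, which is exactly the extra signal a low-capacity student cannot extract from hard labels alone.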
In this blog post, we ask whether it is possible to dynamically reduce a network’s parameters while training.
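One way to picture what "dynamically reducing parameters" might mean is to make the size of the representation itself differentiable. The sketch below is written under our own assumptions (the class, its parameters, and the straight-through quantisation scheme are illustrative, not the article's method): each layer carries a trainable weight bit depth and exposes a differentiable size estimate that can be penalised in the loss.

```python
# Hypothetical sketch: a linear layer whose weight bit depth is itself
# a trainable parameter, letting the optimiser trade accuracy for size.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfCompressingLinear(nn.Module):
    def __init__(self, in_features, out_features, init_bits=8.0):
        super().__init__()
        self.weight = nn.Parameter(0.02 * torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Continuous, trainable bit depth shared by this layer's weights.
        self.bits = nn.Parameter(torch.tensor(init_bits))

    def forward(self, x):
        b = self.bits.clamp(1.0, 16.0)
        # Step size for a signed b-bit grid spanning the weight range;
        # the range itself is treated as fixed for the gradient.
        scale = self.weight.abs().max().detach() / (2.0 ** (b - 1.0))
        u = self.weight / scale
        # Straight-through estimator: round in the forward pass, but let
        # gradients flow through u (and hence into bits via scale).
        u_q = u + (u.round() - u).detach()
        return F.linear(x, u_q * scale, self.bias)

    def size_in_bits(self):
        # Differentiable storage-cost estimate: bits per weight x count.
        return self.bits.clamp(1.0, 16.0) * self.weight.numel()
```

During training, one would add `gamma * sum(m.size_in_bits() for m in model.modules() if isinstance(m, SelfCompressingLinear))` to the task loss, where gamma trades accuracy against model size; the optimiser can then shed bits wherever the task tolerates it, shrinking the network as it trains.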