Self-Compressing Neural Networks
The last decade of AI research has been characterised by exploring the potential of deep neural networks. The advances we have seen in recent years can be at least partly attributed to the increasing size of networks. Considerable effort has been put into creating larger and more complex architectures capable of increasingly impressive feats, from text generation with GPT-3 [1] to image generation with Imagen [2]. Moreover, the success of modern neural networks has led to their deployment in a wide variety of applications. Even as I'm writing this, a neural network is attempting to predict the next word I'm about to write, albeit not accurately enough to replace me anytime soon!
Performance optimisation, on the other hand, has received relatively little attention in the field, and this neglect is a significant obstacle to the wider deployment of neural networks. A likely reason is that large neural networks can be trained in data centres on thousands of GPUs or other accelerators simultaneously, which reduces the pressure to make them efficient. This contrasts with the field of computer graphics, for example, where the constraint of having to run in real time on a single computer created a strong incentive to optimise algorithms without sacrificing quality.
Research on neural network capacity suggests that the capacity needed to discover a high-accuracy solution is greater than the capacity needed to represent it. In their paper, The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, Frankle and Carbin [3] found that only a small fraction of the weights in a network are needed to represent a good solution, but that directly training a reduced-capacity network doesn’t lead to the same level of accuracy. Similarly, Hinton et al. [4] found that transferring “knowledge” from a high-accuracy network to a low-capacity one can produce a more accurate network than training the low-capacity network directly with the same loss function that was used for the high-capacity network.
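To make the first of these findings concrete, here is a minimal sketch of one-shot magnitude pruning followed by a reset to the initial weights, which is the general idea behind finding a "winning ticket". It is a simplified illustration and not the authors' exact procedure: the function names (magnitude_mask, winning_ticket), the flat list of weights, and the example numbers are all assumptions made for this example.

```python
def magnitude_mask(weights, keep_fraction):
    """Return a 0/1 mask that keeps the largest-magnitude weights.

    `weights` is a flat list of floats (hypothetical stand-in for a layer).
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, round(len(weights) * keep_fraction))
    # Indices ordered from largest to smallest absolute value.
    order = sorted(range(len(weights)),
                   key=lambda i: abs(weights[i]),
                   reverse=True)
    kept = set(order[:n_keep])
    return [1 if i in kept else 0 for i in range(len(weights))]


def winning_ticket(initial_weights, trained_weights, keep_fraction):
    """Prune by trained magnitude, then reset survivors to their initial values.

    The returned sparse weights would then be retrained with the mask
    held fixed; that training loop is outside the scope of this sketch.
    """
    mask = magnitude_mask(trained_weights, keep_fraction)
    ticket = [w * m for w, m in zip(initial_weights, mask)]
    return ticket, mask


# Made-up numbers purely for illustration.
initial = [0.1, 0.2, -0.3, 0.4, 0.5]
trained = [0.9, -0.05, 0.4, -1.2, 0.01]
ticket, mask = winning_ticket(initial, trained, keep_fraction=0.4)
print(mask)
```

Note that the mask is chosen from the trained weights, so the sparse network can only be identified after the dense network has been trained. That is exactly the gap between the capacity needed to find a solution and the capacity needed to represent it.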
In this blog post, we ask whether it is possible to dynamically reduce a network’s parameters while training.