High-Fidelity Conversion of Floating-Point Networks for Low-Precision Inference using Distillation
A major challenge for fast, low-power inference on neural network accelerators is the size of the models. There is a trend in recent years towards deeper neural networks with more activations and coefficients, and with this rise in model size, comes a corresponding rise in inference time and energy consumption per inference. This is particularly significant in resource-constrained mobile and automotive applications. Low-precision inference helps reduce inference cost by reducing DRAM bandwidth (which is a significant contributor to device energy consumption), compute logic cost and power consumption.
In this context, the following question naturally arises: what is an optimal bit depth at which to encode neural network weights and activations? There are several proposed number formats that reduce the bit depth, including Nvidia’s TensorFloat , Google’s 8-bit asymmetric fixed point (Q8A) and bfloat16 . However, while being a step in the right direction, these formats cannot be said to be optimal: for example, most of these formats store an exponent for each individual value to be represented, which can be redundant when multiple values share the same range. More importantly, they do not consider the fact that different parts of a neural network often have different bit-depth requirements. Some layers can be encoded with a low bit depth, while other layers (like input and output layers) need a higher bit depth. An example of this is MobileNet v3, which can be converted from 32-bit floating point to bit depths mostly in the 5-12 bits range (see Figure 1).
To read the full article, click here
Related Semiconductor IP
- nQrux Secure Boot
- 4K/8K Multiformat IP supporting AV2 decoder
- Ultra Ethernet MAC & PCS 100G/200G/400G/800G
- Ethernet PCS 100G/200G/400G/800G/1.6T
- Ethernet MAC 100G/200G/400G/800G/1.6T
Related Blogs
- Verification of UALink (UAL) and Ultra Ethernet (UEC) Protocols for Scalable HPC/AI Networks using Synopsys VIP
- Take your neural networks to the next level with Arm's Machine Learning Inference Advisor
- The Growing Importance of PVT Monitoring for Silicon Lifecycle Management
- Maximizing the Usability of Your Chip Development: Design with Flexibility for the Future
Latest Blogs
- A Repeatable Framework for Hardware Security Assurance
- Inside the SiFive Performance™ P570 Gen 3: High Performance Efficiency for Next-Generation Consumer and Commercial Applications
- What the steam engine can teach us about modern chip design
- Automotive silicon in the era of AI, functional safety, and cybersecurity
- JPEG XS Officially Joins GenICam, The Machine Vision Standard Managed By EMVA