Making the most of Arm NN for GPU inference: FP16 and FastMath
Most operations in deep learning involve massive amounts of data but only simple control logic. As parallel processors, GPUs are well suited to this type of workload. Current high-end mobile GPUs can deliver substantial throughput thanks to hundreds of Arithmetic Logic Units (ALUs). In fact, GPUs were built for a single purpose: parallel data processing, initially for 3D graphics and later for more general-purpose computing.
Additionally, GPUs are energy-efficient processors. Nowadays, the number of tera-operations per second per watt (TOPS/W) is used to evaluate the energy efficiency of mobile processors and embedded devices. GPUs achieve a higher TOPS/W than general-purpose CPUs, owing to their relatively simple control units and lower clock frequencies.
One of the biggest challenges facing mobile inference (and training) of deep neural networks (DNNs) is memory. A neural network (NN) needs memory to store input data, weight parameters and activations as an input propagates through the network. As an example, the 50-layer ResNet network has ~26 million weight parameters and computes ~16 million activations in the forward pass. Adding on-chip memory is one way of easing this bottleneck, since it allows higher memory bandwidth; however, on-chip memory is an expensive feature.
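Reduced precision is one way Arm NN eases this memory pressure on the GPU: converting FP32 weights and activations to FP16 roughly halves the footprint (ResNet's ~26 million weights shrink from about 104 MB to about 52 MB), and the FastMath option lets the GpuAcc backend choose faster convolution algorithms such as Winograd. The sketch below shows how these options might be enabled through the Arm NN C++ API; it is not taken from the article, "model.tflite" is a hypothetical path, and the option names reflect the public OptimizerOptions/BackendOptions interface, which has evolved across Arm NN releases.

```cpp
// Minimal sketch (not from the article): enabling FP16 reduction and FastMath
// on the GpuAcc backend with the Arm NN C++ API. "model.tflite" is a
// hypothetical model path; option names may differ between Arm NN releases.
#include <armnn/ArmNN.hpp>
#include <armnnTfLiteParser/ITfLiteParser.hpp>

#include <utility>
#include <vector>

int main()
{
    // Parse a TensorFlow Lite model into an Arm NN network graph.
    auto parser = armnnTfLiteParser::ITfLiteParser::Create();
    armnn::INetworkPtr network = parser->CreateNetworkFromBinaryFile("model.tflite");

    // Create the runtime and target the GPU (GpuAcc) backend.
    armnn::IRuntime::CreationOptions runtimeOptions;
    armnn::IRuntimePtr runtime = armnn::IRuntime::Create(runtimeOptions);
    std::vector<armnn::BackendId> backends = { armnn::Compute::GpuAcc };

    // Ask the optimizer to convert FP32 layers to FP16 where the backend
    // supports it, and enable FastMath (e.g. Winograd convolutions).
    armnn::OptimizerOptions optimizerOptions;
    optimizerOptions.m_ReduceFp32ToFp16 = true;
    optimizerOptions.m_ModelOptions.push_back(
        armnn::BackendOptions("GpuAcc", { { "FastMathEnabled", true } }));

    armnn::IOptimizedNetworkPtr optimizedNet =
        armnn::Optimize(*network, backends, runtime->GetDeviceSpec(), optimizerOptions);

    // Load the optimized network into the runtime; inference is then driven
    // with runtime->EnqueueWorkload(...) on input/output tensors.
    armnn::NetworkId networkId;
    runtime->LoadNetwork(networkId, std::move(optimizedNet));
    return 0;
}
```

Both m_ReduceFp32ToFp16 and FastMathEnabled trade a small amount of numerical accuracy for throughput and memory savings, so results should be validated against an FP32 baseline.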