High performance-efficient deep learning accelerator for edge and end-point inference
Overview
AndesAIRE™ AnDLA™ I350 is a deep learning accelerator (DLA) designed for performance-efficient, cost-sensitive AI solutions for edge and end-point inference. It supports popular deep learning frameworks such as TensorFlow Lite, PyTorch, and ONNX, and performs a range of neural network operations in the int8 data type, including convolution, fully-connected, element-wise, activation, pooling, channel padding, upsample, and concatenation. It also features an internal Direct Memory Access (DMA) engine and local memory to make the most of the computing power of the hardware engines. Operator fusion is used in the AnDLA™ I350 to execute the most common operator sequences more efficiently. The key configurable parameters of the AnDLA™ I350 are the number of MACs (32 to 4096) and the SRAM size (16KB to 4MB), which provide flexible computing power from 64 GOPS to 8 TOPS (at 1 GHz) for a wide range of applications.
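The quoted range of 64 GOPS to 8 TOPS is consistent with the usual convention of counting each MAC as two operations (one multiply and one add) per clock cycle. The sketch below reproduces that arithmetic. The function name `peak_gops` and the two-operations-per-MAC factor are assumptions made for illustration; the page itself does not state how the figures are derived.

```python
def peak_gops(num_macs: int, clock_ghz: float = 1.0, ops_per_mac: int = 2) -> float:
    """Peak throughput in GOPS.

    Assumes every MAC unit completes one multiply and one add per cycle
    (ops_per_mac = 2), which is a common convention, not a vendor statement.
    """
    return num_macs * ops_per_mac * clock_ghz


if __name__ == "__main__":
    # Smallest and largest MAC configurations listed for the I350.
    for macs in (32, 4096):
        gops = peak_gops(macs)
        print(f"{macs:>4} MACs at 1 GHz: {gops:.0f} GOPS ({gops / 1000:.3f} TOPS)")
```

Under these assumptions, 32 MACs give 64 GOPS and 4096 MACs give 8192 GOPS, which the page rounds to 8 TOPS. These are peak figures; sustained throughput depends on how well a given model keeps the MAC array busy.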
Key Features
- Configurable MACs from 32 to 4096 (INT8)
- Maximum performance: 8 TOPS at 1 GHz
- Configurable local memory: 16KB to 4MB
- Multi-dimension DMA
- Four 64-bit AXI bus interfaces
- NN type: CNN inference
- NN models:
  - Image and Video: AlexNet, VGG-16/19, MobileNet-v1/v2/v3, ResNet-8/50, Tiny YOLO v1/v2, YOLO v1/v2/v3/v4/v5, SSD
  - MobileNet v1/v2, Inception v2, EfficientNet-lite, MobileFaceNet, BlazeNet
  - Speech/Voice and audio: LSTM, RNN, GRU
- Operators: Conv2d, depthwise convolution, pointwise convolution, transpose convolution, dilated convolution, element-wise (add, sub, mul), fully-connected, activation (ReLU, leaky ReLU, sigmoid, tanh, ReLU6, SiLU), pooling (max, average), upsample, concatenation, batch normalization, channel padding
- Operator fusion
- NHWC data format
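Since the accelerator works on tensors in NHWC layout, data produced in NCHW order (the default in some frameworks) has to be reordered before it reaches the hardware. The following is a minimal pure-Python sketch of the NHWC addressing rule and an NCHW-to-NHWC conversion on flat buffers. The helper names `nhwc_offset` and `nchw_to_nhwc` are hypothetical and are not part of any Andes tool; in practice a model converter would normally perform this step.

```python
def nhwc_offset(n: int, h: int, w: int, c: int, H: int, W: int, C: int) -> int:
    """Flat index of element (n, h, w, c) in an NHWC buffer.

    Channels are the fastest-varying dimension, so all channels of one
    pixel sit next to each other in memory.
    """
    return ((n * H + h) * W + w) * C + c


def nchw_to_nhwc(data: list, N: int, C: int, H: int, W: int) -> list:
    """Reorder a flat NCHW buffer into a flat NHWC buffer."""
    out = [0] * (N * H * W * C)
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    src = ((n * C + c) * H + h) * W + w  # NCHW flat index
                    out[nhwc_offset(n, h, w, c, H, W, C)] = data[src]
    return out


if __name__ == "__main__":
    # One 2x2 image with 2 channels: channel 0 holds 0..3, channel 1 holds 4..7.
    nchw = list(range(8))
    print(nchw_to_nhwc(nchw, N=1, C=2, H=2, W=2))
```

With int8 data each element occupies one byte, so the flat index is also the byte offset, and a tensor of shape N×H×W×C needs N·H·W·C bytes of storage.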
Block Diagram
Applications
- AIoT device / TinyML on edge and end-point
- Smart camera
- Smart sensor
- Sensor hub
- Wearable
- Smart home appliance
- Robotics
Technical Specifications
Related IPs
- Deep Learning Accelerator
- Deeply Embedded AI Accelerator for Microcontrollers and End-Point IoT Devices
- High-performance 2D (sprite graphics) GPU IP combining high pixel processing capacity and minimum gate count.
- 2D (vector graphics) & 3D GPU IP: a GPU IP combining 3D and 2D rendering features with high performance, low power consumption, and minimum CPU load
- High performance 8-bit micro-controller with 256 bytes on-chip Data RAM, three 16-bit timer/counters, and two 16-bit dptr; 0.25um UMC Logic process.
- Unified Deep Learning Processor