Single-core neural network accelerator offering 0.5 to 4 TOPS. Optimized for machine learning inference applications.
The Cadence® Tensilica® NNA 110 accelerator incorporates a custom hardware accelerator engine (NNE) coupled with a Tensilica Vision P6 or P1 DSP. The specialized compute block inside the NNA 110 hardware leverages features such as random sparsity and tensor compression/decompression to provide a best-in-class embedded AI accelerator solution.
A single-core NNA 110 accelerator supports 256 to 2K 8x8-bit MAC computations and has various user-configurable options. The NNA 110 accelerator can run all neural network layers, including but not limited to convolution, fully connected, LSTM, LRN, and pooling operations. The accompanying Tensilica DSP in the NNA 110 can run any operation that is not native to the accelerator, which makes the NNA 110 a flexible, robust, and future-proof offering. NNA 110 solution deliverables comprise turnkey soft RTL IP, a software compiler toolchain, and an accurate simulator for benchmarking.
Tensilica AI Max - NNA 110 Single Core
Overview
Key Features
- Supports scalable NNE MAC configurations: 256, 512, 1024, and 2048 8-bit MACs (the number of 16-bit MACs is one quarter of the number of 8-bit MACs)
- Supports UBUF configurations: 256KB to 2MB
- Supports various bandwidth configurations: 32/16/8/4 bytes/clock and AXI bus width of 128 or 256 bits
- Supports clock rates up to 1GHz
- Run-time sparsity-based cycle speedup
- 4-bit weight clustering
- Runtime tensor bandwidth compression/decompression
- Asymmetric quantization support
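The MAC configurations and maximum clock rate listed above can be related to the quoted 0.5 to 4 TOPS range with a little arithmetic. The following is a minimal sketch, not a vendor tool. It assumes that one MAC counts as two operations (a multiply and an add) and that every MAC unit completes one MAC per clock; neither assumption is stated in this brief. The function names `peak_tops` and `macs_16bit` are hypothetical and exist only for illustration.

```python
# MAC configurations and maximum clock rate taken from the key features list.
MAC_CONFIGS_8BIT = (256, 512, 1024, 2048)
MAX_CLOCK_HZ = 1_000_000_000  # "up to 1GHz"

# Assumption (not stated in the brief): one MAC = 2 operations
# (multiply + add), and each MAC unit finishes one MAC per clock.
OPS_PER_MAC = 2


def peak_tops(macs_8bit, clock_hz=MAX_CLOCK_HZ):
    """Theoretical peak 8-bit throughput in tera-operations per second."""
    return macs_8bit * OPS_PER_MAC * clock_hz / 1e12


def macs_16bit(macs_8bit):
    """16-bit MAC count, given as one quarter of the 8-bit MAC count."""
    return macs_8bit // 4


if __name__ == "__main__":
    for macs in MAC_CONFIGS_8BIT:
        print(f"{macs:5d} x 8-bit MACs "
              f"({macs_16bit(macs):3d} x 16-bit): "
              f"{peak_tops(macs):.3f} peak TOPS")
```

Under these assumptions the smallest configuration works out to about 0.5 TOPS and the largest to about 4 TOPS, which is consistent with the range quoted for the single-core NNA 110. These are theoretical peaks only; sustained throughput depends on the network, the memory bandwidth configuration, and how much sparsity and compression can be exploited.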
Block Diagram
Applications
- Automotive
- Communications
- Consumer Electronics
- Data Processing
- Industrial and Medical
- Military/Civil Aerospace
- Others
Technical Specifications
Maturity: Available on request