The Cadence® Tensilica® NNE 110 is an energy-efficient, hardware-based AI engine that can be paired with a Tensilica-based DSP. The NNE 110 targets a variety of applications, including audio, voice, and speech AI; lightweight vision AI; and always-on multi-sensory applications.
The architecture natively supports the network layers most common in these applications, including convolution, depth-wise separable convolution, fully connected, LSTM, pooling, reshaping, and concatenation layers. Other layers run on the host Tensilica DSP, where they can be further accelerated using TIE (Tensilica Instruction Extension). The NNE 110 scales from 32 to 128 MACs for 8×8-bit MAC computation, suiting a variety of low-power AI needs. Its AI-specific features include hardware-based sparsity support, which reduces compute and bandwidth, and on-the-fly weight decompression for a smaller system footprint.
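The page does not describe how the NNE 110 exploits sparsity or what its weight-compression format is. As a rough illustration of why zero weights can save both compute and bandwidth, the Python sketch below shows two generic techniques: zero-skipping in a dot product, and a simple zero-value compression that stores a 1-bit presence mask plus the nonzero values. All function names are hypothetical, and neither technique should be read as the NNE 110's actual mechanism or storage format.

```python
from typing import List, Tuple


def dense_dot(weights: List[int], activations: List[int]) -> Tuple[int, int]:
    """Dense dot product: every weight, zero or not, consumes one MAC."""
    acc = 0
    for w, a in zip(weights, activations):
        acc += w * a
    return acc, len(weights)


def zero_skipping_dot(weights: List[int], activations: List[int]) -> Tuple[int, int]:
    """Same result as dense_dot, but only nonzero weights consume a MAC."""
    acc = 0
    macs = 0
    for w, a in zip(weights, activations):
        if w != 0:
            acc += w * a
            macs += 1
    return acc, macs


def compress_zeros(weights: List[int]) -> Tuple[List[int], List[int]]:
    """Generic zero-value compression: a presence mask plus the nonzero values."""
    mask = [1 if w != 0 else 0 for w in weights]
    values = [w for w in weights if w != 0]
    return mask, values


def decompress_zeros(mask: List[int], values: List[int]) -> List[int]:
    """Rebuild the dense weight vector from the mask and the nonzero values."""
    it = iter(values)
    return [next(it) if bit else 0 for bit in mask]


def compressed_bytes(mask: List[int], values: List[int]) -> int:
    """Storage size: mask packed 8 bits per byte, plus one byte per int8 value."""
    return (len(mask) + 7) // 8 + len(values)


if __name__ == "__main__":
    weights = [3, 0, 0, -2, 0, 5, 0, 0]  # int8 weights, 5 of 8 are zero
    acts = [1, 4, 7, 2, 9, 1, 3, 8]
    print(dense_dot(weights, acts))          # (4, 8)
    print(zero_skipping_dot(weights, acts))  # (4, 3)
    mask, values = compress_zeros(weights)
    print(compressed_bytes(mask, values))    # 4 bytes instead of 8
    assert decompress_zeros(mask, values) == weights
```

With five of eight weights at zero, zero-skipping issues 3 MACs instead of 8 for the same result, and the mask-plus-values form needs 4 bytes instead of 8. Real savings depend on how sparse the trained network is and on the hardware's own encoding.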
Compact neural network engine offering scalable performance (32, 64, or 128 MACs) at a very low energy footprint
Benefits
- Best-in-Class Energy: high MAC utilization and sparsity acceleration deliver significant energy improvements over CPUs and DSPs.
- Enables Compelling Use Cases and Advanced Concurrency: resource-intensive AI applications such as advanced noise suppression and speech recognition can run concurrently with the rest of the system's workloads.
- Scalable IP for Various Workloads: area and performance can be balanced against system and use-case requirements.
- Small System Footprint: a compact design area, together with weight compression/decompression, reduces system memory and bandwidth requirements.
- Fast Time to Market: fully verified IP packages include comprehensive software solutions that build on existing programming paradigms.
Key Features
- Supported NNE MAC configurations: 32, 64, or 128 8-bit MACs (the number of 16-bit MACs is one quarter of the number of 8-bit MACs)
- Supported UBUF (local memory) configurations: 32 KB, 64 KB, 128 KB, or 256 KB
- Bandwidth configurations: 32, 16, 8, or 4 bytes/clock, with a 128-bit AXI bus
- Clock rates up to 1 GHz
- Runtime sparsity-based cycle speedup
- Runtime weight compression
- Asymmetric quantization support
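The feature list gives enough numbers for a back-of-the-envelope peak-throughput estimate. The Python sketch below multiplies the MAC count by the clock rate, applies the stated one-quarter ratio for 16-bit MACs, and converts bytes per clock into bytes per second. These are theoretical peaks that assume every MAC unit is busy every cycle; counting each MAC as two operations is a common convention rather than a figure from this page, and the sketch assumes the bytes-per-clock figure refers to the same clock. The function names are hypothetical.

```python
# Values taken from the feature list; achievable clock depends on the implementation.
SUPPORTED_8BIT_MACS = (32, 64, 128)
SUPPORTED_BYTES_PER_CLOCK = (4, 8, 16, 32)


def peak_macs_per_second(num_8bit_macs: int, clock_hz: float, bits: int = 8) -> float:
    """Theoretical peak MAC rate, assuming full utilization every cycle."""
    if num_8bit_macs not in SUPPORTED_8BIT_MACS:
        raise ValueError(f"unsupported MAC configuration: {num_8bit_macs}")
    if bits == 8:
        macs_per_cycle = num_8bit_macs
    elif bits == 16:
        macs_per_cycle = num_8bit_macs // 4  # 16-bit MACs = 1/4 of 8-bit MACs
    else:
        raise ValueError("only 8-bit and 16-bit MACs are modeled")
    return macs_per_cycle * clock_hz


def peak_ops_per_second(num_8bit_macs: int, clock_hz: float, bits: int = 8) -> float:
    """Counts each MAC as two operations (multiply + add), a common convention."""
    return 2 * peak_macs_per_second(num_8bit_macs, clock_hz, bits)


def peak_bandwidth_bytes_per_second(bytes_per_clock: int, clock_hz: float) -> float:
    """Theoretical peak transfer rate for one bandwidth configuration."""
    if bytes_per_clock not in SUPPORTED_BYTES_PER_CLOCK:
        raise ValueError(f"unsupported bandwidth configuration: {bytes_per_clock}")
    return bytes_per_clock * clock_hz


if __name__ == "__main__":
    clock = 1e9  # 1 GHz, the maximum clock rate listed
    for macs in SUPPORTED_8BIT_MACS:
        print(f"{macs:>3} MACs: "
              f"{peak_macs_per_second(macs, clock) / 1e9:.0f} GMAC/s (8-bit), "
              f"{peak_macs_per_second(macs, clock, bits=16) / 1e9:.0f} GMAC/s (16-bit)")
    print(f"32 bytes/clock: {peak_bandwidth_bytes_per_second(32, clock) / 1e9:.0f} GB/s")
```

At 1 GHz, the 128-MAC configuration peaks at 128 GMAC/s (256 GOPS under the two-operations convention) for 8-bit data and 32 GMAC/s for 16-bit data, and a 32 bytes/clock configuration moves at most 32 GB/s. Sustained figures will be lower, while sparsity-based speedup can raise effective throughput on sparse networks.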
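The feature list mentions asymmetric quantization support without describing the scheme. The Python sketch below shows the common affine formulation, in which a real value x is approximated as scale × (q - zero_point). Because the zero point can be nonzero, the int8 code range can be shifted to cover an asymmetric real range such as [0, 6]. The specific choices here (signed int8, per-tensor min/max calibration, Python's round-half-to-even rounding) and all names are assumptions; the NNE 110's actual quantization rules are not given on this page.

```python
from typing import Tuple

QMIN, QMAX = -128, 127  # signed int8 code range (assumed)


def asymmetric_params(x_min: float, x_max: float) -> Tuple[float, int]:
    """Scale and zero point mapping [x_min, x_max] onto the full int8 range."""
    # Widen the range to include 0.0 so that real zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    if x_max == x_min:  # all-zero tensor: any positive scale works
        return 1.0, 0
    scale = (x_max - x_min) / (QMAX - QMIN)
    zero_point = round(QMIN - x_min / scale)
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(x: float, scale: float, zero_point: int) -> int:
    """q = clamp(round(x / scale) + zero_point); round() is round-half-to-even."""
    return max(QMIN, min(QMAX, round(x / scale) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Approximate real value: scale * (q - zero_point)."""
    return scale * (q - zero_point)


if __name__ == "__main__":
    # A ReLU6-style activation range [0, 6] contains no negative values.
    scale, zp = asymmetric_params(0.0, 6.0)
    print(scale, zp)  # zero point is -128 for this range
    for x in (0.0, 1.5, 6.0):
        q = quantize(x, scale, zp)
        print(x, q, dequantize(q, scale, zp))
```

With a symmetric scheme (zero point fixed at 0), the all-positive range [0, 6] could use only the 128 non-negative codes; a zero point of -128 lets it use all 256 codes, roughly halving the quantization step.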