Compact neural network engine offering scalable performance (32, 64, or 128 MACs) at very low energy footprints

Overview

The Cadence® Tensilica® NNE 110 is an energy-efficient, hardware-based AI engine that can be paired with a Tensilica-based DSP. It targets a variety of applications, including audio, voice, and speech AI; lightweight vision AI; and always-on multi-sensory applications.


The architecture natively supports the network layers most common in these applications, including convolution, depth-wise separable convolution, fully connected, LSTM, pooling, reshaping, and concatenation layers. Other layers can run on the host Tensilica DSP, where they can be further accelerated using TIE. The NNE 110 scales from 32 to 128 MACs for 8x8-bit MAC computation, suiting a variety of low-power AI needs. It also offers hardware-based sparsity, which reduces compute and bandwidth, and on-the-fly weight decompression, which shrinks the system footprint.
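To make the scaling and sparsity claims concrete, the following is a back-of-envelope sketch rather than a description of how the NNE 110 actually schedules work. It assumes perfect MAC utilization and that zero-valued weights can be skipped at no cost. The function name `ideal_cycles` and the example layer are hypothetical.

```python
import math

MAC_CONFIGS = (32, 64, 128)  # 8x8-bit MAC options listed for the NNE 110


def ideal_cycles(weights, num_macs, skip_zeros=False):
    """Idealized cycle count for one pass over `weights`.

    Assumption: each MAC unit completes one multiply-accumulate per cycle
    with perfect utilization. With skip_zeros=True, zero-valued weights
    are assumed to cost no cycles. This is an illustrative model only,
    not the engine's real scheduler.
    """
    work = sum(1 for w in weights if w != 0) if skip_zeros else len(weights)
    return math.ceil(work / num_macs)


# Hypothetical layer: 1024 weights, every other one pruned to zero.
weights = [0 if i % 2 else 1 for i in range(1024)]

for macs in MAC_CONFIGS:
    dense = ideal_cycles(weights, macs)
    sparse = ideal_cycles(weights, macs, skip_zeros=True)
    print(f"{macs:>3} MACs: dense={dense} cycles, sparse={sparse} cycles")
```

Under these assumptions, doubling the MAC count halves the cycle count, and 50% weight sparsity halves it again. Real gains depend on utilization, memory bandwidth, and how the hardware exploits sparsity.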

Key Features

  • Best-in-Class Energy.
    • High MAC utilization and sparsity acceleration deliver significant energy improvements over CPUs and DSPs.
  • Enables Compelling Use Cases and Advanced Concurrency.
    • Resource-intensive AI applications such as advanced noise suppression and speech recognition can run concurrently with other workloads.
  • Scalable IP for Various Workloads.
    • Balance area and performance based on system and use case requirements.
  • Small System Footprint.
    • Compact design area and compression/decompression reduce system memory and bandwidth requirements.
  • Fast Time to Market.
    • Fully verified IP packages including comprehensive software solutions that leverage existing programming paradigms.

Benefits

  • Supported NNE MAC configurations: 32, 64, or 128 8-bit MACs (the number of 16-bit MACs is one quarter of the number of 8-bit MACs)
  • Supported UBUF (local memory) configurations: 32 KB, 64 KB, 128 KB, or 256 KB
  • Bandwidth configurations: 32, 16, 8, or 4 bytes/clock; 128-bit-wide AXI bus
  • Clock rates up to 1 GHz
  • Runtime sparsity-based cycle speedup
  • Runtime weight compression
  • Asymmetric quantization support
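
As a rough illustration of what the MAC and clock figures above imply, the sketch below computes a theoretical peak throughput for each configuration. It assumes every MAC unit fires on every cycle at the maximum 1 GHz clock, which real workloads will not sustain. The function name `peak_gmacs` is hypothetical.

```python
MAC_CONFIGS_8BIT = (32, 64, 128)   # 8-bit MAC options from the list above
MAX_CLOCK_HZ = 1_000_000_000       # "up to 1 GHz"


def peak_gmacs(num_8bit_macs, clock_hz=MAX_CLOCK_HZ, bits=8):
    """Theoretical peak in GMAC/s, assuming every MAC fires each cycle."""
    if bits == 8:
        macs = num_8bit_macs
    elif bits == 16:
        # The list above states 16-bit MACs = 1/4 of the 8-bit MACs.
        macs = num_8bit_macs // 4
    else:
        raise ValueError("only 8-bit and 16-bit MACs are listed")
    return macs * clock_hz / 1e9


for n in MAC_CONFIGS_8BIT:
    print(f"{n:>3} MACs: {peak_gmacs(n):.0f} GMAC/s (8-bit), "
          f"{peak_gmacs(n, bits=16):.0f} GMAC/s (16-bit)")
```

These are upper bounds only. Delivered throughput depends on MAC utilization, the chosen bandwidth configuration, and the achieved clock rate in a given process.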

Block Diagram

[Figure: NNE 110 block diagram]

Technical Specifications
