Highly scalable inference NPU IP for next-gen AI applications

Overview

OPENEDGES, the total memory subsystem IP provider, introduces ENLIGHT Pro, a state-of-the-art inference neural processing unit (NPU) IP that outperforms its previous generation, ENLIGHT (also called ENLIGHT Classic). It is an ideal solution for high-performance edge devices such as automotive systems and cameras. ENLIGHT Pro is engineered to deliver greater flexibility, scalability, and configurability, improving overall efficiency in a compact footprint.

ENLIGHT Pro supports transformer models, a key requirement in modern artificial intelligence (AI) applications, particularly Large Language Models (LLMs). LLMs, which are trained with deep learning techniques on extensive datasets, are instrumental in tasks such as text recognition and generation. The automotive industry is expected to adopt LLMs to offer instant, personalized, and accurate responses to customers' inquiries.

ENLIGHT Pro sets itself apart by achieving 4096 MACs/cycle for 8-bit integer operations, four times that of its predecessor, while operating at up to 1.0 GHz on a 14 nm process node. It offers performance ranging from 8 TOPS (tera operations per second) to hundreds of TOPS, optimized for flexibility and scalability.
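The 8 TOPS figure follows from the MAC rate and clock frequency quoted above. The sketch below reproduces that arithmetic, assuming the common convention that one multiply-accumulate counts as two operations; the function name `peak_tops` is illustrative and not part of any OPENEDGES tool.

```python
def peak_tops(macs_per_cycle: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Peak throughput in TOPS (10**12 operations per second).

    Assumption: each MAC is counted as two operations (one multiply, one add).
    """
    return macs_per_cycle * ops_per_mac * clock_hz / 1e12


# Figures quoted in this document: 4096 int8 MACs/cycle at up to 1.0 GHz.
single_core_int8 = peak_tops(4096, 1.0e9)

# 1024 int16 MACs/cycle, as listed under Key Features.
single_core_int16 = peak_tops(1024, 1.0e9)

print(f"int8 peak:  {single_core_int8:.3f} TOPS")
print(f"int16 peak: {single_core_int16:.3f} TOPS")
```

These are theoretical peaks at the maximum quoted clock; sustained throughput depends on the network, memory bandwidth, and core configuration.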

ENLIGHT Pro supports tensor shape transformation operations, including slicing, splitting, and transposing. It also supports a wide variety of data types, namely 8-, 16-, and 32-bit integer and 16- and 32-bit floating point (FP16 and FP32), to ensure flexibility across computational tasks. The vector processor achieves 64 MACs/cycle in 16-bit floating point and includes a 32x2 KB vector register file (VRF). Additionally, single-core, dual-core, and quad-core configurations are available, with scalable task mappings such as multiple models, data parallelism, and tensor parallelism.
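For readers unfamiliar with the three tensor shape transformations named above, the following plain-Python sketch shows what each one does to a small 2-D tensor. It only illustrates the operations' semantics; the helper names are hypothetical and it says nothing about how the ENLIGHT Pro hardware implements them.

```python
from typing import List

Matrix = List[List[int]]


def slice_rows(t: Matrix, start: int, stop: int) -> Matrix:
    """Slice: keep rows start..stop-1."""
    return [row[:] for row in t[start:stop]]


def split_cols(t: Matrix, parts: int) -> List[Matrix]:
    """Split: cut the tensor into `parts` equal-width column blocks."""
    width = len(t[0])
    if width % parts != 0:
        raise ValueError("column count must be divisible by parts")
    step = width // parts
    return [[row[i * step:(i + 1) * step] for row in t] for i in range(parts)]


def transpose(t: Matrix) -> Matrix:
    """Transpose: swap the row and column axes."""
    return [list(col) for col in zip(*t)]


x = [[0, 1, 2, 3],
     [4, 5, 6, 7],
     [8, 9, 10, 11]]

sliced = slice_rows(x, 1, 3)    # last two rows
halves = split_cols(x, 2)       # two 3x2 blocks
flipped = transpose(x)          # 4x3 tensor
```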

ENLIGHT Pro incorporates a RISC-V CPU with the vector extension and custom instructions, including support for Softmax and local storage access, which enhances its overall flexibility. It comes with a software toolkit that supports widely used network formats such as ONNX (PyTorch), TFLite (TensorFlow), and CFG (Darknet). The ENLIGHT SDK streamlines the conversion of floating-point networks to integer networks through a network compiler, and generates NPU commands and network parameters. Notably, ENLIGHT Pro had already secured customer adoption at its launch.
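To make the float-to-integer conversion step concrete, here is a minimal sketch of symmetric per-tensor int8 quantization, a widely used generic scheme. It is an assumption chosen for illustration: this document does not describe the quantization method the ENLIGHT SDK actually uses, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single scale factor."""
    max_abs = max(abs(v) for v in values)
    # Avoid division by zero for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer representation."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# Rounding error per element is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Production quantization flows typically add calibration data, per-channel scales, and zero points; none of that is shown here.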

Key Features

  • Matrix multiplication: 4096 MACs/cycle (int8), 1024 MACs/cycle (int16)
  • Vector processor: RISC-V with RVV 1.0
  • Custom instructions for softmax and local storage access
  • Dedicated HW for tensor shape transformation operations (slice/split/transpose)
  • Higher efficiency in PPA (power, performance, area) and DRAM bandwidth
  • Scalable cluster (single, dual, quad-core) and scalable task mapping (multiple models/data parallelism, tensor parallelism)
  • Supports various network formats: ONNX (PyTorch), TFLite (TensorFlow), and CFG (Darknet)
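
The two parallel task mappings listed above can be contrasted with a small matrix-multiply sketch. Data parallelism gives each core a share of the inputs and the full weights; tensor parallelism gives each core all inputs and a slice of the weights. This is a conceptual illustration in plain Python under the assumption of a simple even split; the function names are hypothetical and the actual ENLIGHT Pro scheduling is not described in this document.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Reference matrix multiply: (m x k) times (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def data_parallel(batch: Matrix, weights: Matrix, cores: int) -> Matrix:
    """Each core processes a contiguous chunk of input rows with full weights."""
    chunk = -(-len(batch) // cores)  # ceiling division
    out: Matrix = []
    for c in range(cores):
        part = batch[c * chunk:(c + 1) * chunk]
        if part:
            out.extend(matmul(part, weights))
    return out


def tensor_parallel(batch: Matrix, weights: Matrix, cores: int) -> Matrix:
    """Each core processes all input rows with one slice of weight columns."""
    chunk = -(-len(weights[0]) // cores)  # ceiling division
    partials: List[Matrix] = []
    for c in range(cores):
        w_part = [row[c * chunk:(c + 1) * chunk] for row in weights]
        if w_part[0]:
            partials.append(matmul(batch, w_part))
    # Concatenate each core's output columns back into full rows.
    return [sum((p[i] for p in partials), []) for i in range(len(batch))]


batch = [[1, 2], [3, 4], [5, 6], [7, 8]]
weights = [[1, 0, 2, 1], [0, 1, 1, 3]]
reference = matmul(batch, weights)
```

Both mappings produce the same result as the single-core reference; they differ in what is partitioned and therefore in what each core must hold locally.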

Benefits

  • Easy customization of different core configurations and performance
  • Automated network quantization flow
  • DRAM traffic minimization through inter-layer optimization
  • Higher efficiency in PPA (power, performance, area) and DRAM bandwidth

Block Diagram

Figure: Block diagram of the highly scalable inference NPU IP for next-gen AI applications

Applications

  • Automotive
  • Camera
  • Person, vehicle, bike, traffic sign detection
  • Parking lot vehicle location detection & recognition
  • License plate detection & recognition
  • Detection, tracking, and action recognition for surveillance, etc.

Deliverables

  • RTL design for synthesis
  • SW toolkits and device driver
  • User guide
  • Integration guide

Technical Specifications

Availability
Now