AI accelerator

Overview

The GenAI v1 IP core is designed specifically for the demands of generative AI, the most challenging AI workload to date. It is engineered to maximize processing and memory efficiency, setting new standards in AI inference speed.

This IP core is readily available for all AMD Versal FPGA device families.

The GenAI v1 IP core has also been verified on AMD UltraScale+ devices (including on AWS F1 instances) running Meta’s Llama 2, 3.0, 3.1 and 3.2 LLM models with state-of-the-art efficiency.

GenAI v1-Q IP core with Quantization support

The GenAI v1-Q IP core adds 4-bit and 5-bit quantization support (Q4_K and Q5_K) to the extraordinary efficiency of the base GenAI v1.

It is ideal for systems built around low-cost DDR and LPDDR memories, increasing inference speed by 276%.

4-bit quantization lowers memory requirements by up to 75%, allowing the largest and most capable LLM models to fit into smaller systems. This reduces overall cost and energy consumption while maintaining real-time speed, with minimal impact on model accuracy and perceived intelligence.
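The 75% figure follows directly from the bit widths: 4-bit weights occupy a quarter of the space of 16-bit weights. A minimal back-of-envelope sketch (the 70B parameter count and per-weight bit widths below are illustrative assumptions, not vendor specifications):

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate LLM weight storage in GiB for a given precision."""
    return n_params * bits_per_weight / 8 / 2**30

n = 70e9  # e.g. a hypothetical 70B-parameter model
fp16 = weight_memory_gib(n, 16)  # ~130 GiB at FP16
q4 = weight_memory_gib(n, 4)     # ~33 GiB at 4 bits per weight
print(f"FP16: {fp16:.1f} GiB, 4-bit: {q4:.1f} GiB, saving {1 - q4/fp16:.0%}")
```

Note that real Q4_K blocks carry per-block scale metadata, so the effective bits per weight are slightly above 4; the 75% saving is the upper bound quoted above.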

Key Features

  • Massive Floating Point (FP) Parallelism: To handle extensive computations simultaneously.
  • Optimized Memory Bandwidth Utilization: Ensuring peak efficiency in data handling.
  • Industry-Leading Normalized Throughput: tokens per second per unit of memory bandwidth. This metric isolates the quality of each accelerator design from the memory technology and bandwidth selected by each vendor, and on it the GenAI v1 outperforms all major competitors:
    • +37% over Intel Gaudi
    • +28% beyond Nvidia’s cloud GPUs
    • +25% above Google’s latest TPU
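The normalized metric above can be sketched in a few lines. All throughput and bandwidth numbers here are hypothetical placeholders chosen only to show how the metric separates design efficiency from raw memory speed:

```python
def tokens_per_gbps(tokens_per_s: float, mem_bw_gbps: float) -> float:
    """Normalized throughput: tokens/s per GB/s of memory bandwidth."""
    return tokens_per_s / mem_bw_gbps

# Two hypothetical accelerators on different memory technologies:
a = tokens_per_gbps(200, 100)  # 2.0 tokens/s per GB/s
b = tokens_per_gbps(600, 400)  # 1.5 tokens/s per GB/s
# b has higher raw throughput, but a extracts more tokens from each
# GB/s of bandwidth, which is what the normalized metric captures.
```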

Block Diagram

[AI accelerator block diagram]

Technical Specifications
