High-performance AI dataflow processor with scalable vector compute capabilities

Overview

The XM Series is a highly scalable and efficient AI compute engine. By integrating scalar, vector, and matrix engines, it makes efficient use of memory bandwidth, and it continues SiFive's legacy of high performance per watt for compute-intensive applications.

Key Features

  • SiFive Matrix Engine
    • Fat Outer Product design
    • Tightly integrated with 4 X-Cores
    • Deep fusion with vector units
  • 4 X-Cores per cluster
    • Each with dual vector units
    • Executes all other layers, e.g., activation functions
    • New exponential acceleration instructions
  • New matrix instructions (a functional sketch follows this list)
    • Fetched by the scalar unit
    • Source data comes from vector registers
    • Destination written to each matrix accumulator
  • 1 Cluster = 16 TOPS (INT8), 8 TFLOPS (BF16) per GHz (a worked example follows this list)
  • 1 TB/s sustained bandwidth per XM Series cluster
  • XM clusters connect to memory in two ways:
    • CHI port for coherent memory access
    • High-bandwidth port connected to SRAM for model data
  • Host CPU can be RISC-V, x86, or Arm (or not present)
  • System can scale across multiple dies using CHI
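
To make the matrix-instruction dataflow above concrete, the following is a minimal functional sketch in Python/NumPy, not XM Series code: each step reads two source vectors (standing in for vector registers) and accumulates their outer product into a matrix accumulator. The vector length, data types, and function names are illustrative assumptions, not XM Series parameters.

    import numpy as np

    VLEN = 16  # illustrative vector length, not an XM Series parameter

    def matrix_outer_accumulate(acc, vec_a, vec_b):
        """One hypothetical matrix instruction: acc[i, j] += vec_a[i] * vec_b[j]."""
        acc += np.outer(vec_a, vec_b)
        return acc

    # Accumulating outer products over the shared dimension K reproduces a
    # full matrix multiply A @ B, which is how an outer-product engine
    # consumes streaming vector operands.
    rng = np.random.default_rng(0)
    K = 8
    A = rng.standard_normal((VLEN, K)).astype(np.float32)
    B = rng.standard_normal((K, VLEN)).astype(np.float32)

    acc = np.zeros((VLEN, VLEN), dtype=np.float32)
    for k in range(K):
        acc = matrix_outer_accumulate(acc, A[:, k], B[k, :])

    assert np.allclose(acc, A @ B, atol=1e-4)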

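As a worked example of the per-GHz throughput and bandwidth figures above, the short calculation below scales them to an assumed 2.0 GHz clock; the clock value is an illustration only, not a published XM Series specification.

    # Throughput and bandwidth figures are taken from the feature list above.
    tops_int8_per_ghz = 16      # TOPS (INT8) per GHz per cluster
    tflops_bf16_per_ghz = 8     # TFLOPS (BF16) per GHz per cluster
    sustained_bw_tb_s = 1.0     # sustained bandwidth per cluster, TB/s

    freq_ghz = 2.0              # assumed clock for illustration only
    int8_tops = tops_int8_per_ghz * freq_ghz      # 32 TOPS INT8
    bf16_tflops = tflops_bf16_per_ghz * freq_ghz  # 16 TFLOPS BF16

    # Arithmetic intensity needed to stay compute-bound at this clock:
    # INT8 operations per byte of sustained memory traffic.
    ops_per_byte = (int8_tops * 1e12) / (sustained_bw_tb_s * 1e12)  # 32 ops/byte

    print(f"{int8_tops:.0f} TOPS INT8, {bf16_tflops:.0f} TFLOPS BF16, "
          f"~{ops_per_byte:.0f} INT8 ops per byte to remain compute-bound")

Under these assumptions, workloads with lower arithmetic intensity would be limited by the 1 TB/s sustained bandwidth rather than by the matrix engine.
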
Benefits

  • Matrix Engine
  • 4 X-Cores per cluster
  • 1 Cluster = 16 TOPS (INT8)

Block Diagram

[XM Series block diagram]

Applications

  • AI workloads, dataflow management, object detection, and speech and recommendation processing.
