High-performance AI dataflow processor with scalable vector compute capabilities

Overview

The new XM Series offers a highly scalable and efficient AI compute engine. By integrating scalar, vector, and matrix engines, the XM Series lets customers make highly efficient use of available memory bandwidth, and it continues SiFive's legacy of delivering very high performance per watt for compute-intensive applications.
Key Features
- SiFive Matrix Engine
  - Fat Outer Product design
  - Tightly integrated with 4 X-Cores
  - Deep fusion with vector units
- 4 X-Cores per cluster
  - Each with dual vector units
  - Executes all other layers, e.g. activation functions
  - New exponential acceleration instructions
- New matrix instructions
  - Fetched by the scalar unit
  - Source data comes from vector registers
  - Destination to each matrix accumulator
- 1 cluster = 16 TOPS (INT8), 8 TFLOPS (BF16) per GHz
- 1 TB/s sustained bandwidth per XM Series cluster
- XM clusters connect to memory in two ways:
  - CHI port for coherent memory access
  - High-bandwidth port connected to SRAM for model data
- Host CPU can be RISC-V, x86, or Arm (or not present)
- System can scale across multiple dies using CHI
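The outer-product dataflow and the per-GHz throughput figures above can be illustrated with a small NumPy sketch. This is a behavioral model only, not the XM Series ISA: the function name, tile shapes, and accumulator type are illustrative assumptions. Each loop step plays the role of one matrix instruction, taking a column and a row from "vector registers" and accumulating their outer product into a matrix accumulator.

```python
import numpy as np

def outer_product_matmul(A, B, acc_dtype=np.int32):
    # Behavioral model of an outer-product matrix engine:
    # C = sum over k of outer(A[:, k], B[k, :]), built up in a
    # wide matrix accumulator (hypothetical shapes/dtypes).
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    acc = np.zeros((M, N), dtype=acc_dtype)  # matrix accumulator
    for k in range(K):
        # One "matrix instruction": sources come from vector
        # registers; the destination is the matrix accumulator.
        acc += np.outer(A[:, k].astype(acc_dtype),
                        B[k, :].astype(acc_dtype))
    return acc

# Peak-throughput arithmetic from the figures above:
# 16 INT8 TOPS per GHz per cluster, so at a 2.0 GHz clock
# (an assumed frequency) one cluster reaches 32 INT8 TOPS.
tops_per_ghz = 16
clock_ghz = 2.0
print(tops_per_ghz * clock_ghz)  # 32.0
```

The outer-product formulation is equivalent to a standard matrix multiply; its advantage in hardware is that each step reuses two short vectors to update the entire accumulator tile, which keeps operand bandwidth low relative to compute.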
Applications
- AI workloads, dataflow management, object detection, speech processing, and recommendation processing.